Some articles (especially the recent year-in-review ones) include a YouTube
video. The server-side rendered articles do not include the YouTube iframe,
but they do have a script that looks like this:
    <script type="text/javascript" data-reactid="6">
    window.__APOLLO_STATE__ = {
    ...
    youtube_id: "9uASADiYe_8",
We add a reformatting function that tries to detect obvious JavaScript code
that sets a field or variable called youtube_id to an 11-character
double-quoted value, and adds the referenced YouTube videos at the beginning
of the article. This is slightly more general than needed for Quanta, in the
hope that it could be useful for similar sites.
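The detection step described above could be sketched roughly as follows. This is a minimal illustration in Python, not the actual reformatting function: the names YOUTUBE_ID_RE, find_youtube_ids and prepend_videos are hypothetical, the restriction of the value to letters, digits, "_" and "-" is an assumption, and the embed markup is only one plausible way to reference a video.

```python
import re

# Assumed pattern: a field or variable named youtube_id (optionally quoted),
# then ':' or '=', then a double-quoted value of exactly 11 characters.
# Limiting the value to [A-Za-z0-9_-] is an assumption, not from the source.
YOUTUBE_ID_RE = re.compile(r'''\byoutube_id["']?\s*[:=]\s*"([A-Za-z0-9_-]{11})"''')


def find_youtube_ids(page_html):
    """Return the video IDs found in the page, in order, without duplicates."""
    ids = []
    for match in YOUTUBE_ID_RE.finditer(page_html):
        video_id = match.group(1)
        if video_id not in ids:
            ids.append(video_id)
    return ids


def prepend_videos(article_html, page_html):
    """Put one embed per detected video at the beginning of the article."""
    embeds = "".join(
        '<iframe src="https://www.youtube.com/embed/%s"></iframe>\n' % video_id
        for video_id in find_youtube_ids(page_html)
    )
    return embeds + article_html
```

Matching on the field name rather than on window.__APOLLO_STATE__ is what keeps the sketch usable for sites other than Quanta, at the cost of possible false positives on unrelated scripts.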
This is a somewhat complex React site, so the rules could be a little fragile.
Text content always seems to be inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.
Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so we take both sets and keep only one
copy of any caption that appears in both.
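As an illustration of collecting captions from both places without duplicating them, here is a small Python sketch using the standard library's ElementTree on well-formed markup. It is not the rule syntax used for the site: the function name collect_captions is hypothetical, and treating captions as figcaption elements is an assumption.

```python
import xml.etree.ElementTree as ET


def collect_captions(root):
    """Return captions found under figure or div.outer--content, each once."""
    # Captions reachable through any figure element (figcaption is assumed).
    under_figure = [c for f in root.iter("figure") for c in f.iter("figcaption")]
    # Captions reachable through any div whose class list has outer--content.
    under_content = [
        c
        for d in root.iter("div")
        if "outer--content" in d.get("class", "").split()
        for c in d.iter("figcaption")
    ]
    picked, seen = [], set()
    for caption in under_figure + under_content:
        # The same element can be reached by both routes; keep it only once.
        if id(caption) not in seen:
            seen.add(id(caption))
            picked.append(caption)
    return picked
```

Deduplicating by element identity rather than by caption text avoids dropping two distinct figures that happen to share the same caption.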
The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use MathJax,
which we don't add.