19 commits

Author SHA1 Message Date
Davide Masserut
5d8a8878d5 Update scraping rules for ilpost.it 2023-05-02 17:07:25 -07:00
Emiel Wiedijk
5a88e0465e Update rewrite rules for theverge.com
Articles on The Verge sometimes contain a section for related articles.
This section can be distracting in reader mode. Therefore, filter the
related article section using the scraper rules.
2023-04-07 16:12:19 -07:00
Davide Masserut
755c9af47d Update scraping rules for ilpost.it 2023-03-01 20:04:25 -08:00
Marie Ramlow
48acd1feca Add rewrite and scraper rules for blog.cloudflare.com 2023-02-05 21:01:42 -08:00
Davide Masserut
690d66ce0b Update scraping rules for ilpost.it 2022-12-27 13:33:41 -08:00
Romain de Laage
eb86773039 Recalbox rewrite rule 2022-10-19 20:13:44 -07:00
jgbresson
aa47789f55 Add add_dynamic_image rewrite rule for theverge.com 2022-10-16 11:57:01 -07:00
Gabe Cook
405d4febd9 Parse markdown by default for blog.laravel.com 2022-07-30 20:19:09 -07:00
Romain de Laage
8329e9b46c Make Invidious instance configurable 2022-01-05 20:43:03 -08:00
Jouni K. Seppänen
bb0d2bf675 Add Youtube videos in Quanta articles
Some articles (especially the recent year-in-review ones) include a Youtube
video. The server-side rendered articles do not include the Youtube iframe,
but they do have a script that looks like

    <script type="text/javascript" data-reactid="6">
      window.__APOLLO_STATE__ = {
        ...
          youtube_id: "9uASADiYe_8",

We add a reformatting function that tries to detect obvious JavaScript code
that has a field or variable called youtube_id that has an 11-character
double-quoted value, and adds the referenced Youtube videos in the beginning of
the article. This is slightly more general than needed for Quanta, in the hope
that it could be useful for similar sites.
2022-01-03 10:10:13 -08:00
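Note on bb0d2bf675: the detection described in that commit message can be sketched roughly as below. This is an illustrative sketch only, not the code from the commit. The function names, the regular expression, the assumed ID alphabet (letters, digits, `-`, `_`), and the embed markup are all assumptions made for the example.

```python
import re
from typing import List

# Assumption: a field or variable named youtube_id, followed by ':' or '=',
# then an 11-character double-quoted value. The allowed ID characters are a guess.
YOUTUBE_ID_RE = re.compile(r'youtube_id"?\s*[:=]\s*"([A-Za-z0-9_-]{11})"')


def find_youtube_ids(html: str) -> List[str]:
    """Return the distinct video IDs found in the page source, in order of appearance."""
    seen: List[str] = []
    for match in YOUTUBE_ID_RE.finditer(html):
        video_id = match.group(1)
        if video_id not in seen:
            seen.append(video_id)
    return seen


def prepend_youtube_embeds(html: str) -> str:
    """Put one iframe per detected video at the beginning of the article content."""
    embeds = "".join(
        '<iframe src="https://www.youtube.com/embed/{}"></iframe>'.format(video_id)
        for video_id in find_youtube_ids(html)
    )
    return embeds + html
```

Matching on the field name plus a fixed-length quoted value keeps the check independent of the surrounding `window.__APOLLO_STATE__` structure, which is what would let the same approach apply to other sites that inline a similar field.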
Jouni K. Seppänen
dcf87bd642 Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.

Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.

The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use Mathjax,
which we don't add.
2022-01-03 10:10:13 -08:00
Thiago Perrotta
28d036434f Add rewrite rule: monkeyuser.com
Comics site, uses alt image text similarly to xkcd.com.
2021-12-16 11:50:26 -08:00
Thiago Perrotta
4b12043cea Sort rewrite rules 2021-12-16 11:50:26 -08:00
Frédéric Guillot
31435ef83e Add rewrite rule to fix Medium.com images 2020-09-29 22:27:32 -07:00
Frédéric Guillot
592151bdb6 Add support for Invidious
- Embed Invidious player for invidio.us feeds
- Add new rewrite rule to use Invidious player for Youtube feeds
2020-03-20 20:56:59 -07:00
Frédéric Guillot
c43c9458a9 Add rewrite functions: convert_text_link and nl2br 2019-11-28 21:33:12 -08:00
Peter De Wachter
b6f3160dbc add_mailto_subject: New rewrite function
Dinosaur Comics (qwantz.com) likes to hide jokes in mailto: links, but
miniflux's sanitizer strips those out.
2019-08-19 19:42:47 -07:00
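Note on b6f3160dbc: one way to keep text that is hidden in a mailto: link from being lost to the sanitizer is to copy the subject into the visible content. The sketch below illustrates that idea and is not the code from the commit. The function names, the href regular expression, and the choice of a `<blockquote>` for the output are assumptions made for the example.

```python
import re
from html import escape, unescape
from typing import List
from urllib.parse import parse_qs, urlsplit

# Assumption: links are written as href="mailto:..." with double quotes.
MAILTO_HREF_RE = re.compile(r'href="(mailto:[^"]*)"')


def mailto_subjects(html: str) -> List[str]:
    """Collect the decoded subject parameter of every mailto: link in the content."""
    subjects: List[str] = []
    for match in MAILTO_HREF_RE.finditer(html):
        href = unescape(match.group(1))  # undo HTML entities such as &amp;
        query = urlsplit(href).query
        subject = parse_qs(query).get("subject", [""])[0]
        if subject:
            subjects.append(subject)
    return subjects


def append_mailto_subjects(html: str) -> str:
    """Append each subject as visible, escaped text after the original content."""
    extra = "".join(
        "<blockquote>{}</blockquote>".format(escape(subject))
        for subject in mailto_subjects(html)
    )
    return html + extra
```

Because the subject is emitted as escaped text rather than as a link, it would survive even if the mailto: link itself is later stripped.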
Frédéric Guillot
dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
Frédéric Guillot
33445e5b68 Add the possibility to define rewrite rules for each feed 2017-12-11 22:16:32 -08:00