36 commits

Author SHA1 Message Date
Frédéric Guillot
c234903255 Rename Miniflux package name to follow Go module naming convention
For reference: https://go.dev/ref/mod#major-version-suffixes
2023-08-09 22:10:44 -07:00
David Izquierdo
4fdef7b837 Add scrape and rewrite rules for webtoons
Although the only source I have for the rewrite rule is, in fact, https://github.com/miniflux/v2/pull/892, it does work when combined with add_dynamic_image and scraping the right element. I have not investigated further.

Works around https://github.com/miniflux/v2/issues/775 and https://github.com/miniflux/v2/issues/1871 (as in, gives us working webtoons feeds but referer spoofing would still be a nice tool to have).

Fixes https://github.com/miniflux/v2/issues/256.
2023-07-10 21:25:48 -07:00
fred
8646d61182 Replace copyright header with SPDX identifier 2023-06-19 15:00:45 -07:00
Marie Ramlow
48acd1feca Add rewrite and scraper rules for blog.cloudflare.com 2023-02-05 21:01:42 -08:00
Davide Masserut
690d66ce0b Update scraping rules for ilpost.it 2022-12-27 13:33:41 -08:00
Davide Masserut
ef312ef770 Update scraping rule for ilpost.it 2022-12-16 15:07:10 -08:00
Davide Masserut
c0bed53b42 Add scraping rule for ilpost.it 2022-12-15 19:53:12 -08:00
jgbresson
7f6ce16d85 Add scraping rules for theverge.com 2022-10-16 11:58:35 -07:00
Adam B
4d847c6a74 Add scraping rule for royalroad.com
This is what I use for several stories I follow, and I thought it might be useful to other miniflux users.
2022-08-17 19:25:39 -07:00
Owen Valentine
f404ddde91 Add swordscomic.com 2022-08-17 19:23:29 -07:00
Owen Valentine
c8a3d953cf Add smbc-comics.com 2022-08-17 19:23:29 -07:00
Owen Valentine
f851ecac78 Sort alphabetically 2022-08-17 19:23:29 -07:00
Gabe Cook
bd1dc3149e Add explosm.net scraper rule 2022-07-30 20:10:52 -07:00
Jouni K. Seppänen
bb0d2bf675 Add YouTube videos in Quanta articles
Some articles (especially the recent year-in-review ones) include a YouTube
video. The server-side rendered articles do not include the YouTube iframe,
but they do have a script that looks like

    <script type="text/javascript" data-reactid="6">
      window.__APOLLO_STATE__ = {
        ...
          youtube_id: "9uASADiYe_8",

We add a reformatting function that tries to detect obvious JavaScript code
containing a field or variable called youtube_id with an 11-character
double-quoted value, and adds the referenced YouTube videos at the beginning of
the article. This is slightly more general than needed for Quanta, in the hope
that it could be useful for similar sites.
2022-01-03 10:10:13 -08:00
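The detection described in bb0d2bf675 can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the names youtubeIDPattern, extractYouTubeIDs and prependYouTubeVideos are hypothetical, and both the restriction of the ID to letters, digits, "_" and "-" and the exact iframe markup are assumptions.

```go
package main

import (
	"regexp"
	"strings"
)

// youtubeIDPattern (hypothetical name) matches a field or variable called
// youtube_id, followed by ':' or '=', followed by an 11-character
// double-quoted value. Limiting the value to [a-zA-Z0-9_-] is an assumption.
var youtubeIDPattern = regexp.MustCompile(`youtube_id"?\s*[:=]\s*"([a-zA-Z0-9_-]{11})"`)

// extractYouTubeIDs returns the distinct video IDs found in the page source,
// in order of first appearance.
func extractYouTubeIDs(pageHTML string) []string {
	var ids []string
	seen := make(map[string]bool)
	for _, match := range youtubeIDPattern.FindAllStringSubmatch(pageHTML, -1) {
		id := match[1]
		if !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	return ids
}

// prependYouTubeVideos puts one embed iframe per detected ID in front of the
// article content. The iframe markup here is illustrative only.
func prependYouTubeVideos(pageHTML, articleHTML string) string {
	var b strings.Builder
	for _, id := range extractYouTubeIDs(pageHTML) {
		b.WriteString(`<iframe src="https://www.youtube.com/embed/` + id + `"></iframe>`)
	}
	b.WriteString(articleHTML)
	return b.String()
}
```

Calling prependYouTubeVideos with the raw page source and the scraped article content returns the article unchanged when no youtube_id field is present, so a function like this is harmless on pages without embedded videos.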
Jouni K. Seppänen
dcf87bd642 Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.

Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.

The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use MathJax,
which we don't add.
2022-01-03 10:10:13 -08:00
Jouni K. Seppänen
2fedd8f234 Add scraper rule for ikiwiki.iki.fi
Feed: https://ikiwiki.iki.fi/feed.php?linkto=current&ns=uutiset%3Ablog&num=5

Example page: https://ikiwiki.iki.fi/uutiset/blog/20210923100421viiveita

(To clarify, I'm not a representative of iki.fi although I have an email address in the domain. This is a nonprofit association that offers email forwarding addresses, and the RSS feed in question contains news for their members.)
2021-12-27 20:51:37 -08:00
Frédéric Guillot
b7c229f30f Update scraper rule for theregister.com 2021-08-16 20:04:02 -07:00
Frédéric Guillot
31435ef83e Add rewrite rule to fix Medium.com images 2020-09-29 22:27:32 -07:00
Manuel Müller
ca918bc7e3 Added scraper rule for dilbert.com and turnoff.us 2020-06-10 20:15:46 -07:00
Corey McCaffrey
25d4b9fc0c Added scraper rule for financialsamurai.com
The default rule results in blank content.
2020-05-24 13:29:28 -07:00
Corey McCaffrey
0683074b8b Added scraper rule for TheOatmeal.com
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
Corey McCaffrey
8f6c07afd6 Added scraper rule for RayWenderlich.com
RayWenderlich.com is a popular community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Andrew Williams
9974e0f458 Addition of scraper rule for wdwnt.com
By default, fetching original content for wdwnt.com results in a snippet of the comments section; this rule captures the article content instead.
2020-02-28 20:24:58 -08:00
somini
30f22fbd78 Update scraper rule for "Le Monde" 2019-12-19 18:35:29 -08:00
Neo Ng
90064a8cf0 Update scraper rule for openingsource.org 2019-11-28 19:40:26 -08:00
Tom Matthews
8b40778ee1 Add BBC News scraping rule 2018-12-13 20:25:30 -08:00
Frédéric Guillot
6f5d93cbbe Update scraper rule for lemonde.fr 2018-12-02 20:53:22 -08:00
mapl
e47188eab2 Update scraper rule for heise.de 2018-12-01 11:49:30 -08:00
Frédéric Guillot
df2bebaf3d Update scraper rule for heise.de 2018-08-25 10:33:18 -07:00
Frédéric Guillot
dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
Frédéric Guillot
1d7fe892e1 Add scraper rule for darkreading.com 2018-01-06 13:25:12 -08:00
Frédéric Guillot
48aa0d07ef Add more scraper rules 2018-01-04 19:32:24 -08:00
Frédéric Guillot
c454f67037 Add scraper rules for version2.dk and ing.dk 2017-12-27 19:44:23 -08:00
Frédéric Guillot
d4839b5597 Add more scraper rules 2017-12-27 13:36:07 -08:00
Frédéric Guillot
c6d9eb3614 Improve content scraper 2017-12-13 21:30:40 -08:00
Frédéric Guillot
87ccad5c7f Add scraper rules 2017-12-10 20:51:04 -08:00