263 commits

Author SHA1 Message Date
Davide Masserut
034e46700c Process older entries first
Feed entries are usually ordered from most to least recent.

Processing older entries first ensures that their creation timestamp
is lower than that of newer entries.

This is useful when we order by creation, because then we get a
consistent timeline.
2023-03-25 16:19:07 -07:00
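The ordering described in commit 034e46700c can be illustrated with a minimal sketch. It assumes the feed lists entries from most to least recent; the Entry type and the oldestFirst helper are hypothetical names for illustration, not the project's actual code.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a parsed feed entry.
type Entry struct {
	Title string
}

// oldestFirst returns a reversed copy of entries. Assuming the feed lists
// entries from most to least recent, the result starts with the oldest one.
func oldestFirst(entries []Entry) []Entry {
	out := make([]Entry, len(entries))
	for i, e := range entries {
		out[len(entries)-1-i] = e
	}
	return out
}

func main() {
	feed := []Entry{{"newest"}, {"middle"}, {"oldest"}}
	for _, e := range oldestFirst(feed) {
		// Storing entries in this order gives older entries earlier
		// creation timestamps than newer ones.
		fmt.Println(e.Title)
	}
}
```

Reversing a copy leaves the parsed feed untouched, so any code that still expects the original feed order is unaffected.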
Davide Masserut
755c9af47d Update scraping rules for ilpost.it 2023-03-01 20:04:25 -08:00
Frédéric Guillot
02e4b8eadc Update GitHub Actions to use Go 1.20 2023-03-01 19:56:06 -08:00
Frédéric Guillot
aaa1625724 Ignore empty link when discovering feeds 2023-02-26 17:19:26 -08:00
privatmamtora
8f9ccc6540 Parse <category> from Feeds (RSS, Atom and JSON) 2023-02-24 20:52:45 -08:00
Marie Ramlow
48acd1feca Add rewrite and scraper rules for blog.cloudflare.com 2023-02-05 21:01:42 -08:00
xdavidwu
08f7835f5d sanitizer: allow id in <sup>
One of the blogs I read uses an anchor on <sup> to link a footnote back to its
reference.
2023-01-31 17:53:45 -08:00
Davide Masserut
690d66ce0b Update scraping rules for ilpost.it 2022-12-27 13:33:41 -08:00
Davide Masserut
ef312ef770 Update scraping rule for ilpost.it 2022-12-16 15:07:10 -08:00
Davide Masserut
c0bed53b42 Add scraping rule for ilpost.it 2022-12-15 19:53:12 -08:00
Harry Cheng
d9777f1439 Skip integrations if there are no entries to push 2022-12-04 12:58:10 -08:00
Frédéric Guillot
93715b542c Revert "scraper follow the only link"
This reverts commit 10207967c4.
2022-11-14 17:45:40 -08:00
Frédéric Guillot
de1a06e3e8 Add missing check in followTheOnlyLink() that leads to a panic
Bug introduced in PR #1290. Fixes #1631.
2022-11-14 16:44:02 -08:00
jebbs
10207967c4 scraper follow the only link
* In some cases the scraper only gets a landing page; the user can use scraper rules to extract the link from the landing page and follow it.
* It also fixes the wrong scraper rule being applied when the server redirects to another host.
2022-10-31 19:49:34 -07:00
Romain de Laage
550e7d0415 Add matrix bot support 2022-10-27 17:53:19 -07:00
Romain de Laage
eb86773039 Recalbox rewrite rule 2022-10-19 20:13:44 -07:00
jgbresson
7f6ce16d85 Add scraping rules for theverge.com 2022-10-16 11:58:35 -07:00
jgbresson
aa47789f55 Add add_dynamic_image rewrite rule for theverge.com 2022-10-16 11:57:01 -07:00
Frédéric Guillot
d947b0194b Handle RSS entries with only a GUID permalink 2022-10-09 16:58:25 -07:00
Frédéric Guillot
138fd926ee Do not convert anchors to absolute links 2022-09-11 22:40:52 -07:00
Adam B
4d847c6a74 Add scraping rule for royalroad.com
This is what I use for several stories I follow, and I thought it might be useful to other Miniflux users.
2022-08-17 19:25:39 -07:00
Owen Valentine
f404ddde91 Add swordscomic.com 2022-08-17 19:23:29 -07:00
Owen Valentine
c8a3d953cf Add smbc-comics.com 2022-08-17 19:23:29 -07:00
Owen Valentine
f851ecac78 Sort alphabetically 2022-08-17 19:23:29 -07:00
Frédéric Guillot
cecab91298 Fix some linter issues 2022-08-08 22:06:38 -07:00
Frédéric Guillot
13fa08ad39 Handle Atom links with a text/html type defined 2022-07-31 17:43:03 -07:00
Gabe Cook
405d4febd9 Parse markdown by default for blog.laravel.com 2022-07-30 20:19:09 -07:00
Gabe Cook
36df7b36ec Add parse_markdown rewrite function 2022-07-30 20:19:09 -07:00
Gabe Cook
bd1dc3149e Add explosm.net scraper rule 2022-07-30 20:10:52 -07:00
Gabriel Augendre
6e50ce3293 Make reading speed user-configurable 2022-07-17 19:35:24 -07:00
Carsten
2659883ce5 Add rewrite rules for article URL before fetching content 2022-07-11 21:12:26 -07:00
Frédéric Guillot
c0eab5ebc5 Avoid stretched image if specified width is larger than Miniflux's layout 2022-07-04 20:10:07 -07:00
Frédéric Guillot
f0a698c6fe Add support for OPML files with several nested outlines 2022-07-04 16:02:49 -07:00
Frédéric Guillot
806a069785 sanitizer: handle image URLs in srcset attribute with comma 2022-07-04 13:50:09 -07:00
Frédéric Guillot
d85908e3de Allow width and height attributes for img tags 2022-07-03 17:44:12 -07:00
nemunaire
5a07fd8932 Add new rewrite rule to decode base64 content 2022-05-25 20:44:04 -07:00
lf94
fa8431c5c6 Try to use outermost element text when title is empty 2022-04-13 21:51:54 -07:00
Frédéric Guillot
f6825c1c60 Fix invalid parsing of data URL
Fetching icons crashes with "slice bounds out of range" error if no encoding is specified.
2022-03-25 22:30:20 -07:00
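The failure mode in commit f6825c1c60 (a data URL with no encoding specified) can be illustrated with a defensive parser. This is a sketch under stated assumptions, not the project's actual fix: parseDataURL is a hypothetical helper, and it only handles the two common shapes, "data:<type>;base64,<payload>" and "data:<type>,<percent-encoded payload>".

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseDataURL is a hypothetical helper that splits a data URL into its
// media type and decoded payload without slicing at assumed offsets.
func parseDataURL(u string) (string, []byte, error) {
	if !strings.HasPrefix(u, "data:") {
		return "", nil, errors.New("not a data URL")
	}
	meta, payload, found := strings.Cut(u[len("data:"):], ",")
	if !found {
		return "", nil, errors.New("missing comma separator")
	}
	if strings.HasSuffix(meta, ";base64") {
		data, err := base64.StdEncoding.DecodeString(payload)
		return strings.TrimSuffix(meta, ";base64"), data, err
	}
	// No encoding specified: treat the payload as percent-encoded text.
	decoded, err := url.PathUnescape(payload)
	return meta, []byte(decoded), err
}

func main() {
	for _, u := range []string{"data:image/png;base64,aGVsbG8=", "data:,hello%20world"} {
		mediaType, data, err := parseDataURL(u)
		fmt.Printf("%q %q %v\n", mediaType, data, err)
	}
}
```

Checking for the separator and the optional ";base64" suffix before slicing avoids out-of-range panics on inputs that omit the encoding.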
Frédéric Guillot
1eb01b39e7 Use truncated entry description as title if unavailable 2022-03-04 17:10:32 -08:00
Frédéric Guillot
c9e0f0b3e4 Do not fallback to InnerXML if XHTML title is empty 2022-03-04 14:28:56 -08:00
Romain de Laage
808635e314 Add a rewrite rule for castopod episodes 2022-01-30 16:33:17 -08:00
Adrian Smith
cc3e65dd3c Handle atom feed with space around CDATA
Trim space around CDATA elements before extracting the CharData.

This problem was discovered when reading https://www.sethvargo.com/feed.xml.
Title and Summary fields have newlines and space between the <title>
element and the CDATA element. e.g.

  <title>
    <![CDATA[Entry title here]]>
  </title>

This meant the title of the feed was coming into Miniflux as:
  <![CDATA[Entry title here]]>
2022-01-17 15:25:22 -08:00
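The trimming described in commit cc3e65dd3c can be sketched as follows. This is an illustration only: stripCDATA is a hypothetical helper that works on the raw inner text of an element, and the actual change may be implemented differently inside the Atom parser.

```go
package main

import (
	"fmt"
	"strings"
)

// stripCDATA is a hypothetical helper: it trims surrounding whitespace
// first, then unwraps a CDATA section if one encloses the whole value.
func stripCDATA(inner string) string {
	s := strings.TrimSpace(inner)
	if strings.HasPrefix(s, "<![CDATA[") && strings.HasSuffix(s, "]]>") {
		s = strings.TrimSuffix(strings.TrimPrefix(s, "<![CDATA["), "]]>")
	}
	return s
}

func main() {
	// Inner text of a <title> element with newlines around the CDATA section.
	raw := "\n    <![CDATA[Entry title here]]>\n  "
	fmt.Printf("%q\n", stripCDATA(raw))
}
```

Without the initial TrimSpace, the prefix check fails on the leading newline and the CDATA markers leak into the title.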
Frédéric Guillot
f18ded6117 Add support for multiple authors in Atom feeds 2022-01-14 20:20:55 -08:00
Frédéric Guillot
2309b27458 Use custom feed user agent to fetch website icon 2022-01-08 15:20:18 -08:00
Romain de Laage
8329e9b46c Make Invidious instance configurable 2022-01-05 20:43:03 -08:00
Jouni K. Seppänen
bb0d2bf675 Add Youtube videos in Quanta articles
Some articles (especially the recent year-in-review ones) include a Youtube
video. The server-side rendered articles do not include the Youtube iframe,
but they do have a script that looks like

    <script type="text/javascript" data-reactid="6">
      window.__APOLLO_STATE__ = {
        ...
          youtube_id: "9uASADiYe_8",

We add a reformatting function that tries to detect obvious JavaScript code
that has a field or variable called youtube_id that has an 11-character
double-quoted value, and adds the referenced Youtube videos in the beginning of
the article. This is slightly more general than needed for Quanta, in the hope
that it could be useful for similar sites.
2022-01-03 10:10:13 -08:00
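The detection described in commit bb0d2bf675 can be sketched with a regular expression. The pattern, the allowed ID characters, the function names, and the iframe markup below are assumptions for illustration; the commit message only says that an 11-character double-quoted youtube_id value is matched and the videos are added at the beginning of the article.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// youtubeIDPattern matches a field or variable named youtube_id followed by
// an 11-character double-quoted value. The character set is an assumption.
var youtubeIDPattern = regexp.MustCompile(`youtube_id\s*[:=]\s*"([A-Za-z0-9_-]{11})"`)

// extractYouTubeIDs returns every video ID found in the given page source.
func extractYouTubeIDs(source string) []string {
	var ids []string
	for _, m := range youtubeIDPattern.FindAllStringSubmatch(source, -1) {
		ids = append(ids, m[1])
	}
	return ids
}

// prependVideos puts one embed per detected ID before the article content.
func prependVideos(article, source string) string {
	var b strings.Builder
	for _, id := range extractYouTubeIDs(source) {
		b.WriteString(`<iframe src="https://www.youtube.com/embed/` + id + `"></iframe>`)
	}
	return b.String() + article
}

func main() {
	script := `window.__APOLLO_STATE__ = { youtube_id: "9uASADiYe_8", }`
	fmt.Println(prependVideos("<p>Article text</p>", script))
}
```

Requiring exactly 11 characters between the quotes keeps the match narrow, so unrelated strings in the script are unlikely to be picked up.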
Jouni K. Seppänen
dcf87bd642 Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.

Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.

The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use Mathjax,
which we don't add.
2022-01-03 10:10:13 -08:00
Jouni K. Seppänen
2fedd8f234 Add scraper rule for ikiwiki.iki.fi
Feed: https://ikiwiki.iki.fi/feed.php?linkto=current&ns=uutiset%3Ablog&num=5

Example page: https://ikiwiki.iki.fi/uutiset/blog/20210923100421viiveita

(To clarify, I'm not a representative of iki.fi although I have an email address in the domain. This is a nonprofit association that offers email forwarding addresses, and the rss feed in question contains news for their members.)
2021-12-27 20:51:37 -08:00
Thiago Perrotta
28d036434f Add rewrite rule: monkeyuser.com
Comics site, uses alt image text similarly to xkcd.com.
2021-12-16 11:50:26 -08:00
Thiago Perrotta
4b12043cea Sort rewrite rules 2021-12-16 11:50:26 -08:00