Jouni K. Seppänen
dcf87bd642
Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.
Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.
The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use MathJax,
which we don't add.
2022-01-03 10:10:13 -08:00
Jouni K. Seppänen
2fedd8f234
Add scraper rule for ikiwiki.iki.fi
Feed: https://ikiwiki.iki.fi/feed.php?linkto=current&ns=uutiset%3Ablog&num=5
Example page: https://ikiwiki.iki.fi/uutiset/blog/20210923100421viiveita
(To clarify, I'm not a representative of iki.fi although I have an email address in the domain. This is a nonprofit association that offers email forwarding addresses, and the RSS feed in question contains news for their members.)
2021-12-27 20:51:37 -08:00
Thiago Perrotta
28d036434f
Add rewrite rule: monkeyuser.com
Comics site, uses alt image text similarly to xkcd.com.
2021-12-16 11:50:26 -08:00
Thiago Perrotta
4b12043cea
Sort rewrite rules
2021-12-16 11:50:26 -08:00
Frédéric Guillot
0f6f4c8c60
Add <head> tag to OPML export
2021-12-16 11:49:50 -08:00
Artémis
b585dab6b4
Add data-srcset support to "add_dynamic_image" rewrite rule
2021-10-22 18:12:23 -07:00
Frank Steinborn
2dcabc840c
Fix minor typo
2021-10-17 16:58:42 -07:00
Frédéric Guillot
5f9d6fd81b
Handle srcset images with no space after comma
2021-10-13 21:31:08 -07:00
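Commit 5f9d6fd81b deals with srcset values such as `small.jpg 1x,large.jpg 2x`, where no space follows the comma. A minimal Go sketch of tolerant parsing, assuming for simplicity that image URLs contain no commas; `parseSrcset` and `imageCandidate` are hypothetical names, not Miniflux's code:

```go
package main

import (
	"fmt"
	"strings"
)

// imageCandidate is one "URL [descriptor]" entry of a srcset attribute.
type imageCandidate struct {
	URL        string
	Descriptor string // e.g. "1x" or "480w"; empty when omitted
}

// parseSrcset splits a srcset value on commas, whether or not a space
// follows them. Simplification: URLs containing commas are not supported.
func parseSrcset(srcset string) []imageCandidate {
	var candidates []imageCandidate
	for _, part := range strings.Split(srcset, ",") {
		fields := strings.Fields(part)
		if len(fields) == 0 {
			continue // skip empty entries such as a trailing comma
		}
		c := imageCandidate{URL: fields[0]}
		if len(fields) > 1 {
			c.Descriptor = fields[1]
		}
		candidates = append(candidates, c)
	}
	return candidates
}

func main() {
	for _, c := range parseSrcset("small.jpg 1x,large.jpg 2x") {
		fmt.Println(c.URL, c.Descriptor)
	}
}
```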
三三
34dd358eb0
Add Telegram integration
2021-09-07 20:04:22 -07:00
Lukas Dietrich
93596c1218
Add rewrite rule to remove dom elements
2021-09-06 09:47:05 -07:00
hulb
01f678c3b1
add proxy arg in scraper.Fetch
2021-08-28 21:57:11 -07:00
James Loh
2f6895e118
Fix finding JSON feeds with new MIME type
Version 1.1 of JSON Feed (https://jsonfeed.org/version/1.1) specifies that feeds should have the MIME type `application/feed+json`, which Miniflux wasn't searching for.
2021-08-21 13:01:08 -07:00
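A sketch of how feed discovery might accept the JSON Feed 1.1 MIME type next to the older ones. The `feedMIMETypes` list and the `isFeedLinkType` helper are hypothetical; only `mime.ParseMediaType` is standard library, used here to ignore case and parameters such as `charset`:

```go
package main

import (
	"fmt"
	"mime"
)

// feedMIMETypes lists <link rel="alternate" type="..."> values treated as
// feeds during discovery. Illustrative list; application/feed+json is the
// JSON Feed 1.1 type, application/json the older JSON Feed 1.0 one.
var feedMIMETypes = map[string]bool{
	"application/rss+xml":   true,
	"application/atom+xml":  true,
	"application/json":      true,
	"application/feed+json": true,
}

// isFeedLinkType reports whether a link type designates a feed.
// mime.ParseMediaType lowercases the type and strips parameters.
func isFeedLinkType(linkType string) bool {
	mediaType, _, err := mime.ParseMediaType(linkType)
	if err != nil {
		return false
	}
	return feedMIMETypes[mediaType]
}

func main() {
	for _, t := range []string{"application/feed+json", "application/json", "text/html"} {
		fmt.Println(t, isFeedLinkType(t))
	}
}
```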
Frédéric Guillot
b7c229f30f
Update scraper rule for theregister.com
2021-08-16 20:04:02 -07:00
Alexandros Kosiaris
b8b16c3bdf
Add /rss/ in finder's wellKnownUrls
ATCOM netvolution WCM, a CMS powering several high-profile and
high-traffic Greek news sites, among other sites, publishes its RSS feed
under /rss/. Add it to the list. The path is generic enough that other
software probably uses it too.
On a select set of 627 Greek news media sites (the infamous Petsas list),
adding this rule increased the number of sites with a discoverable RSS feed
by 2.61% (from 498 to 511).
2021-07-22 19:46:40 -07:00
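The 2.61% figure is the relative increase over the 498 sites whose feeds were already discoverable; as a share of all 627 sites, coverage went from about 79.4% to 81.5%:

```latex
\frac{511 - 498}{498} = \frac{13}{498} \approx 0.0261 = 2.61\%,
\qquad \frac{498}{627} \approx 79.4\%, \qquad \frac{511}{627} \approx 81.5\%
```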
Dave Marquard
fc766de02d
Use authors entry for JSON Feed 1.1
2021-07-21 21:28:37 -07:00
Jan-Lukas Else
20cd023c07
Use runes instead of bytes to truncate JSON feed titles
This fix avoids breaking Unicode strings.
It solves this error:
pq: invalid byte sequence for encoding "UTF8": 0xf0 0x9f 0x9a 0x2e
2021-05-31 11:42:59 -07:00
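The bytes in that error, 0xf0 0x9f 0x9a, are the first three bytes of a four-byte UTF-8 character such as U+1F680 (F0 9F 9A 80), followed by a period (0x2e). A minimal Go sketch contrasting byte and rune truncation; the helper names and the "..." suffix are assumptions, not Miniflux's code:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateBytes keeps the first limit bytes of s. It can cut a multi-byte
// UTF-8 character in half, producing an invalid string.
func truncateBytes(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	return s[:limit] + "..."
}

// truncateRunes keeps the first limit runes (code points) of s, so the
// result stays valid UTF-8 when s is valid.
func truncateRunes(s string, limit int) string {
	runes := []rune(s)
	if len(runes) <= limit {
		return s
	}
	return string(runes[:limit]) + "..."
}

func main() {
	title := "Launch 🚀 day"
	fmt.Println(utf8.ValidString(truncateBytes(title, 10))) // false: contains 0xf0 0x9f 0x9a 0x2e
	fmt.Println(utf8.ValidString(truncateRunes(title, 10))) // true
}
```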
Frédéric Guillot
5b8eb4735c
Handle RSS feed title with encoded Unicode entities
2021-04-30 22:57:29 -07:00
yue
18e414ec45
Fix typo in reader/json/doc.go
2021-04-02 19:00:06 -07:00
Frédéric Guillot
6e2e2d1665
Set up golangci-lint GitHub Action
2021-03-22 21:34:48 -07:00
Darius
9242350f0e
Add per feed cookies option
2021-03-22 20:27:58 -07:00
Frédéric Guillot
e60e0ba3c4
Add workaround to handle some invalid dates
2021-03-21 10:52:27 -07:00
Frédéric Guillot
5877048749
Improve handling of Atom text content with CDATA
2021-03-20 20:47:35 -07:00
Frédéric Guillot
c8c1f05328
Add better support of Atom text constructs
- Note that Miniflux does not render entry titles with HTML tags as of now
- Omit XHTML div element because it should not be part of the content
2021-03-19 22:05:00 -07:00
Frédéric Guillot
96f3e888cf
Handle RDF feed with HTML encoded entry title
Example: http://rss.slashdot.org/Slashdot/slashdotMain
2021-03-19 18:49:51 -07:00
Frédéric Guillot
14888f1cb8
Fix incorrect parsing of Atom entry content of type HTML
2021-03-18 21:43:59 -07:00
Gabriel Augendre
1d80c12e18
Prevent Youtube scraping if entry already exists
2021-03-08 20:10:53 -08:00
hykhd
053b1d0f8d
Handle RSS feeds with CDATA in author item element
2021-02-28 12:26:52 -08:00
Frédéric Guillot
ec3c604a83
Add option to allow self-signed or invalid certificates
2021-02-21 13:58:52 -08:00
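A minimal sketch of such an option with the Go standard library, assuming it is a per-client switch: `tls.Config.InsecureSkipVerify` disables certificate checks. `newHTTPClient` and `fetchFromSelfSignedServer` are hypothetical names; the demo uses a local `httptest` TLS server, whose self-signed certificate the default client rejects:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newHTTPClient returns a client that, when allowSelfSigned is true, skips
// TLS certificate verification (insecure by design, hence opt-in).
func newHTTPClient(allowSelfSigned bool) *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	if allowSelfSigned {
		transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
	}
	return &http.Client{Transport: transport}
}

// fetchFromSelfSignedServer starts a local HTTPS server with a self-signed
// certificate and reports whether a GET with the given setting succeeds.
func fetchFromSelfSignedServer(allowSelfSigned bool) bool {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<rss/>")
	}))
	defer srv.Close()

	resp, err := newHTTPClient(allowSelfSigned).Get(srv.URL)
	if err != nil {
		return false // e.g. certificate signed by unknown authority
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	fmt.Println(fetchFromSelfSignedServer(false), fetchFromSelfSignedServer(true)) // false true
}
```

Skipping verification trades security for reachability, which is why it makes sense as an option rather than a default.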
Ilya Mateyko
c3f871b49b
Use YouTube video duration as read time
This feature works by scraping the YouTube website.
To enable it, set the FETCH_YOUTUBE_WATCH_TIME environment variable to 1.
Resolves #972.
2021-02-21 11:13:52 -08:00
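The commit does not say how the duration is read from the page. Assuming it is an ISO 8601 duration such as `PT4M13S`, and assuming the read time is rounded up to whole minutes, a Go sketch could look like this (`parseISO8601Duration` and `readingTimeMinutes` are hypothetical):

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strconv"
	"time"
)

// isoDuration matches a subset of ISO 8601 durations: days, hours, minutes, seconds.
var isoDuration = regexp.MustCompile(`^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$`)

// parseISO8601Duration converts values such as "PT1H2M3S" to a time.Duration.
func parseISO8601Duration(value string) (time.Duration, error) {
	m := isoDuration.FindStringSubmatch(value)
	if m == nil {
		return 0, fmt.Errorf("unsupported duration %q", value)
	}
	units := []time.Duration{24 * time.Hour, time.Hour, time.Minute, time.Second}
	var total time.Duration
	for i, unit := range units {
		if m[i+1] == "" {
			continue // component absent
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return 0, err
		}
		total += time.Duration(n) * unit
	}
	return total, nil
}

// readingTimeMinutes rounds a video duration up to whole minutes (assumed rule).
func readingTimeMinutes(d time.Duration) int {
	return int(math.Ceil(d.Minutes()))
}

func main() {
	d, err := parseISO8601Duration("PT4M13S")
	if err != nil {
		panic(err)
	}
	fmt.Println(d, readingTimeMinutes(d)) // 4m13s 5
}
```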
hykhd
3cb04b2c56
Update whitelist to fix bilibili videos
2021-02-20 10:29:42 -08:00
Frédéric Guillot
a352aff93b
Remove deprecated io/ioutil package
Miniflux now requires at least Go 1.16, in which io/ioutil is deprecated.
https://golang.org/doc/go1.16#ioutil
2021-02-16 21:25:21 -08:00
Frédéric Guillot
04f9c456d5
Handle entry title with double encoded entities in Atom feeds
2021-02-14 11:19:21 -08:00
Frédéric Guillot
0413daf76b
Remove iframe inner HTML contents
An iframe element never has fallback content, as it will always create a nested
browsing context, regardless of whether the specified initial contents are
successfully used.
https://www.w3.org/TR/2010/WD-html5-20101019/the-iframe-element.html#the-iframe-element
2021-02-13 14:00:21 -08:00
Frédéric Guillot
5043749b9f
Add workaround for entry title with double encoded entities
Example: &#39;Text&#39;
2021-02-13 13:33:59 -08:00
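One possible workaround, not necessarily the one in the commit: decode entities twice. As a stand-in for XML decoding followed by an extra HTML-entity pass, this sketch applies `html.UnescapeString` twice to the raw text, so `&amp;#39;Text&amp;#39;` (which reads `&#39;Text&#39;` after one pass) ends up as `'Text'`. `decodeEntryTitle` is a hypothetical name:

```go
package main

import (
	"fmt"
	"html"
)

// decodeEntryTitle decodes HTML entities twice so that a double-encoded
// title becomes plain text. Trade-off: a title that deliberately shows an
// entity as literal text is decoded one level too far.
func decodeEntryTitle(title string) string {
	return html.UnescapeString(html.UnescapeString(title))
}

func main() {
	fmt.Println(decodeEntryTitle("&amp;#39;Text&amp;#39;")) // 'Text'
}
```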
Nick Chitwood
793f475edd
Update date parser to fix another time zone issue
The Washington Post publishes its feeds with EST timestamps, which Miniflux was parsing as UTC, so entries showed up 8 hours off.
See http://feeds.washingtonpost.com/rss/politics for an example.
This fix applies a similar workaround for EST/EDT as was done for PST/PDT.
2021-02-10 22:45:02 -08:00
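Go's `time.Parse` documentation explains the symptom: when a zone abbreviation such as EST has no known offset in the current location, the time is recorded in a fabricated location with a zero offset, i.e. as UTC. A sketch of the workaround idea, replacing the abbreviation with a numeric offset before parsing; the zone table and `parseFeedDate` are illustrative, not Miniflux's code:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// zoneOffsets maps US zone abbreviations to numeric offsets. Only the
// four zones mentioned in these commits are listed.
var zoneOffsets = map[string]string{
	"EST": "-0500",
	"EDT": "-0400",
	"PST": "-0800",
	"PDT": "-0700",
}

// parseFeedDate parses an RFC 1123 date, replacing a known trailing zone
// abbreviation with its numeric offset so the result does not depend on
// the server's local time zone.
func parseFeedDate(value string) (time.Time, error) {
	value = strings.TrimSpace(value)
	for abbr, offset := range zoneOffsets {
		if strings.HasSuffix(value, " "+abbr) {
			return time.Parse(time.RFC1123Z, strings.TrimSuffix(value, abbr)+offset)
		}
	}
	return time.Parse(time.RFC1123, value)
}

func main() {
	parsed, err := parseFeedDate("Wed, 10 Feb 2021 22:45:02 EST")
	if err != nil {
		panic(err)
	}
	fmt.Println(parsed.UTC()) // 2021-02-11 03:45:02 +0000 UTC
}
```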
Frédéric Guillot
864dd9f219
Allow images with data URLs
Only URLs with a MIME type of image/* are allowed.
2021-02-06 14:46:01 -08:00
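A minimal sketch of the check described above; `isImageDataURL` is a hypothetical helper that only inspects the media type before the first `;` or `,` and does not validate the payload:

```go
package main

import (
	"fmt"
	"strings"
)

// isImageDataURL reports whether src is a data URL whose media type is
// image/*, e.g. "data:image/png;base64,...". Other data URLs are rejected.
func isImageDataURL(src string) bool {
	const prefix = "data:"
	if len(src) < len(prefix) || !strings.EqualFold(src[:len(prefix)], prefix) {
		return false
	}
	rest := src[len(prefix):]
	end := strings.IndexAny(rest, ";,")
	if end < 0 {
		return false // malformed: no data section
	}
	mediaType := strings.ToLower(strings.TrimSpace(rest[:end]))
	return strings.HasPrefix(mediaType, "image/")
}

func main() {
	fmt.Println(isImageDataURL("data:image/png;base64,iVBORw0KGgo=")) // true
	fmt.Println(isImageDataURL("data:text/html;base64,PHNjcmlwdD4=")) // false
}
```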
Ilya Mateyko
4464802947
Reformat some Go files
When working on #994 I noticed that some Go files are not formatted with
`gofmt`.
This PR fixes this.
2021-01-27 18:13:58 -08:00
Frédéric Guillot
806b9545a9
Refactor feed validator
2021-01-04 14:47:25 -08:00
Frédéric Guillot
4468ef1410
Refactor category validation
2021-01-03 22:50:24 -08:00
Frédéric Guillot
291bf96d15
Do not strip tags for entry title
Some technical blogs have titles like "</some-title>" or "This is some <code>source code</code>".
Miniflux was removing these elements, which prevented the titles from rendering correctly.
2021-01-03 11:44:07 -08:00
Frédéric Guillot
f0610bdd9c
Refactor feed creation to allow setting most fields via API
Allow API clients to create disabled feeds or define fields like "ignore_http_cache".
2021-01-02 16:48:22 -08:00
Frédéric Guillot
1908c84fbe
Handle invalid French date
2020-12-02 20:59:14 -08:00
Frédéric Guillot
f722fd1208
Handle invalid feeds with relative URLs
2020-12-02 20:58:18 -08:00
Pacman99
b8b6c74d86
Add rewrite rule replace for custom search and replace
2020-11-29 10:32:26 -08:00
Frédéric Guillot
de7a613098
Calculate reading time during feed processing
The goal is to speed up the user interface.
Detecting the language based on the content is pretty slow.
2020-11-18 17:43:24 -08:00
Frédéric Guillot
b1c9977711
Handle more invalid dates
2020-11-17 17:12:12 -08:00
Frédéric Guillot
a108cb7808
Handle various invalid dates
2020-11-16 21:37:33 -08:00
Frédéric Guillot
246a48359c
Do not follow redirects when trying known feed URLs
Some websites redirect unknown URLs to the home page.
As a result, all the known URLs were returned in the subscription list.
We don't want the user to choose between invalid feed URLs.
2020-11-06 17:46:54 -08:00
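In Go, a client stops following redirects when its `CheckRedirect` function returns `http.ErrUseLastResponse`, so a 3xx answer stays visible and can be rejected. A sketch with hypothetical helper names, probing a local `httptest` server that redirects unknown paths to its home page:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newProbeClient returns a client that stops at the first response instead
// of following redirects.
func newProbeClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

// answersDirectly reports whether url responds with 200 OK without redirecting.
func answersDirectly(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// probeDemo simulates a site that serves /feed.xml and redirects every other
// path to the home page, then probes two known feed URLs.
func probeDemo() (feedFound, rssFound bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/", "/feed.xml":
			fmt.Fprint(w, "ok")
		default:
			http.Redirect(w, r, "/", http.StatusFound)
		}
	}))
	defer srv.Close()

	client := newProbeClient()
	return answersDirectly(client, srv.URL+"/feed.xml"), answersDirectly(client, srv.URL+"/rss/")
}

func main() {
	fmt.Println(probeDemo()) // true false
}
```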
Frédéric Guillot
40e983664c
Trim spaces around icon URLs
2020-11-06 17:18:58 -08:00
Frédéric Guillot
4f358aa0f3
Do not escape HTML for Atom 1.0 text content during parsing
Avoid encoding single quotes to HTML entities (&#39;).
Feed contents are sanitized after parsing.
2020-10-30 23:41:33 -07:00
Frédéric Guillot
b30a045a4e
Refactor entry filtering
Avoid looping multiple times across entries
2020-10-19 22:18:41 -07:00
Frédéric Guillot
b50778d3eb
Add rewrite rule to use noscript content for images rendered with JavaScript
2020-10-19 21:31:10 -07:00
Manuel Garrido
84b83fc3c8
Add feed filters (Keeplist and Blocklist)
2020-10-16 14:40:56 -07:00
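A minimal sketch of keeplist/blocklist filtering in a single pass over the entries, assuming both rules are regular expressions matched against the entry title only (the real feature may match other fields); `keepEntry` and `filterTitles` are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// keepEntry applies a blocklist and a keeplist (either may be nil) to a
// title: blocklist matches are dropped first; when a keeplist is set,
// only titles matching it are kept.
func keepEntry(title string, blocklist, keeplist *regexp.Regexp) bool {
	if blocklist != nil && blocklist.MatchString(title) {
		return false
	}
	if keeplist != nil {
		return keeplist.MatchString(title)
	}
	return true
}

// filterTitles compiles the rules (an empty rule means "not set") and
// returns the surviving titles, looping over the entries only once.
func filterTitles(titles []string, blockRule, keepRule string) ([]string, error) {
	var blocklist, keeplist *regexp.Regexp
	var err error
	if blockRule != "" {
		if blocklist, err = regexp.Compile(blockRule); err != nil {
			return nil, err
		}
	}
	if keepRule != "" {
		if keeplist, err = regexp.Compile(keepRule); err != nil {
			return nil, err
		}
	}
	var kept []string
	for _, title := range titles {
		if keepEntry(title, blocklist, keeplist) {
			kept = append(kept, title)
		}
	}
	return kept, nil
}

func main() {
	titles := []string{"Golang 1.16 released", "Sponsored: Golang course", "Weekly recipes"}
	kept, err := filterTitles(titles, `(?i)sponsored`, `(?i)golang|rust`)
	fmt.Println(kept, err) // [Golang 1.16 released] <nil>
}
```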
Frédéric Guillot
3afdf25012
Do not proxy image data url
2020-10-14 22:26:54 -07:00
Frédéric Guillot
31435ef83e
Add rewrite rule to fix Medium.com images
2020-09-29 22:27:32 -07:00
Frédéric Guillot
d75ff0c5ab
Add sanitizer support for responsive images
- Add support for picture HTML tag
- Add support for srcset, media, and sizes attributes to img and source tags
2020-09-28 23:22:08 -07:00
Frédéric Guillot
c394a61a4e
Add Prometheus exporter
2020-09-27 20:04:48 -07:00
Frédéric Guillot
16b7b3bc3e
http client: remove dependency on global config options
2020-09-27 14:37:46 -07:00
Dave Marquard
eb026ae4ac
handle Pacific Daylight Time in addition to Pacific Standard Time
2020-09-22 19:47:36 -07:00
Frédéric Guillot
0d0395b4e3
Do not try to update a duplicated feed after a refresh
2020-09-20 23:42:18 -07:00
Frédéric Guillot
e6c6ee441a
Use a transaction to refresh and create entries
Also includes a few database improvements:
- Speed up entry cleanup with an index and a goroutine
- Avoid the accumulation of enclosures for some feeds
2020-09-20 23:12:23 -07:00
Frédéric Guillot
bfb96d536e
Add workaround for parsing an invalid date
2020-09-14 21:23:26 -07:00
Kebin Liu
cf7712acea
Add HTTP proxy option for subscriptions
2020-09-09 23:28:54 -07:00
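Go's standard library can route one client's requests through a proxy with `http.ProxyURL`. A sketch, assuming one client per subscription; `newProxiedClient` and `proxyFor` are hypothetical, and the proxy address is a placeholder:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newProxiedClient returns a client that sends every request through proxyURL.
func newProxiedClient(proxyURL string) (*http.Client, error) {
	u, err := url.Parse(proxyURL)
	if err != nil {
		return nil, err
	}
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.Proxy = http.ProxyURL(u)
	return &http.Client{Transport: transport}, nil
}

// proxyFor reports which proxy the client would use for target, without
// sending any request.
func proxyFor(client *http.Client, target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	p, err := client.Transport.(*http.Transport).Proxy(req)
	if err != nil || p == nil {
		return "", err
	}
	return p.String(), nil
}

func main() {
	client, err := newProxiedClient("http://127.0.0.1:3128")
	if err != nil {
		panic(err)
	}
	fmt.Println(proxyFor(client, "https://example.org/feed.xml")) // http://127.0.0.1:3128 <nil>
}
```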
alex
0f258fd55b
Make add_invidious_video rule applicable for different invidious instances
2020-09-06 13:41:42 -07:00
Frédéric Guillot
fc75b0cd8e
Add workaround to get YouTube feed from video page
2020-08-02 12:24:46 -07:00
Frédéric Guillot
7380c64141
Add workaround to find YouTube channel feeds
YouTube doesn't expose RSS links anymore for new-style URLs.
2020-08-02 11:37:07 -07:00
Frédéric Guillot
1d6b0491a7
Ignore <media:title> in RSS 2.0 feeds
In the vast majority of cases, the default entry title is correct.
Ignoring <media:title> avoids overriding the default title when the two differ.
2020-06-29 18:24:06 -07:00
Gabriel Augendre
e44b4b2540
Try known urls if no link alternate
I came across a few blogs that didn't have a link rel="alternate"
but offered an RSS/Atom feed.
This aims to solve this issue for "well-known" feed URLs, since
these URLs are often the same.
2020-06-21 20:34:59 -07:00
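A sketch of building candidate URLs from well-known paths with `url.URL.ResolveReference`. The path list is illustrative (it includes the `/rss/` path mentioned in commit b8b16c3bdf) and is not Miniflux's exact `wellKnownUrls` list; `candidateFeedURLs` is hypothetical:

```go
package main

import (
	"fmt"
	"net/url"
)

// knownFeedPaths is an illustrative list of conventional feed locations.
var knownFeedPaths = []string{"/atom.xml", "/feed.xml", "/feed/", "/rss.xml", "/rss/"}

// candidateFeedURLs resolves each known path against the website URL.
// Absolute paths replace the page path and query, keeping scheme and host.
func candidateFeedURLs(websiteURL string) ([]string, error) {
	base, err := url.Parse(websiteURL)
	if err != nil {
		return nil, err
	}
	candidates := make([]string, 0, len(knownFeedPaths))
	for _, path := range knownFeedPaths {
		ref, err := url.Parse(path)
		if err != nil {
			return nil, err
		}
		candidates = append(candidates, base.ResolveReference(ref).String())
	}
	return candidates, nil
}

func main() {
	urls, err := candidateFeedURLs("https://example.org/blog/some-post?page=2")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```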
Manuel Müller
ca918bc7e3
Added scraper rule for dilbert.com and turnoff.us
2020-06-10 20:15:46 -07:00
Frédéric Guillot
6c6ca69141
Add feed option to ignore HTTP cache
2020-06-05 22:04:52 -07:00
Frédéric Guillot
7e5157f218
Rename alternative scheduler to entry_frequency
2020-05-25 15:12:47 -07:00
Shizun Ge
cead85b165
Add alternative scheduler based on the number of entries
2020-05-25 14:06:56 -07:00
Corey McCaffrey
25d4b9fc0c
Added scraper rule for financialsamurai.com
The default rule results in blank content.
2020-05-24 13:29:28 -07:00
Corey McCaffrey
0683074b8b
Added scraper rule for TheOatmeal.com
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
Corey McCaffrey
8f6c07afd6
Added scraper rule for RayWenderlich.com
RayWenderlich.com is a popular community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Frédéric Guillot
619aa58fb3
Handle more invalid dates
Fixes #617
2020-04-25 20:15:18 -07:00
Frédéric Guillot
592151bdb6
Add support for Invidious
- Embed Invidious player for invidio.us feeds
- Add new rewrite rule to use Invidious player for Youtube feeds
2020-03-20 20:56:59 -07:00
Andrew Williams
9974e0f458
Addition of scraper rule for wdwnt.com
By default fetching original content for wdwnt.com results in a snippet of the comments section, this rule captures the article content.
2020-02-28 20:24:58 -08:00
Frédéric Guillot
997e9422eb
Ignore enclosures without URL
2020-01-30 21:18:49 -08:00
Frédéric Guillot
61f0c8aa66
Allow application/xhtml+xml links as comments URL in Atom replies
2020-01-04 16:07:06 -08:00
Frédéric Guillot
bf632fad2e
Allow only absolute URLs in comments URL
Some feeds are using invalid URLs (random text).
2020-01-04 15:54:16 -08:00
Kebin Liu
8cebd985a2
Use internal XML workarounds to detect feed format
2020-01-02 22:19:15 -08:00
Frédéric Guillot
ac3c936820
Make sure whitelisted URI schemes are handled properly by the sanitizer
2020-01-02 11:03:51 -08:00
Frédéric Guillot
3debf75eb9
Normalize URL query string before executing HTTP requests
- Make sure query string parameters are encoded
- Unlike the standard library, do not append an equals sign for query parameters with an empty value
- Strip URL fragments, as web browsers do
2019-12-26 15:56:59 -08:00
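Go's `url.Values.Encode` writes `key=` for empty values, which is the behavior this commit avoids. A sketch of the normalization listed above; sorting keys mirrors `Encode` and is an assumption about Miniflux, and `normalizeURL` is hypothetical:

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// normalizeURL re-encodes the query string and drops the fragment.
// Keys are sorted (as url.Values.Encode does) and an empty value is
// written as "key" rather than "key=".
func normalizeURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.Fragment = "" // browsers never send the fragment to the server
	u.RawFragment = ""

	values := u.Query()
	keys := make([]string, 0, len(values))
	for key := range values {
		keys = append(keys, key)
	}
	sort.Strings(keys)

	var parts []string
	for _, key := range keys {
		for _, value := range values[key] {
			if value == "" {
				parts = append(parts, url.QueryEscape(key))
			} else {
				parts = append(parts, url.QueryEscape(key)+"="+url.QueryEscape(value))
			}
		}
	}
	u.RawQuery = strings.Join(parts, "&")
	return u.String(), nil
}

func main() {
	normalized, err := normalizeURL("https://example.org/feed?b=2&a&c=x/y#section")
	if err != nil {
		panic(err)
	}
	fmt.Println(normalized) // https://example.org/feed?a&b=2&c=x%2Fy
}
```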
Frédéric Guillot
200b1c304b
Improve Dublin Core support for RDF feeds
2019-12-23 14:45:58 -08:00
Frédéric Guillot
1b33bb3d1c
Improve Podcast support (iTunes and Google Play feeds)
- Add support for Google Play XML namespace
- Improve existing iTunes namespace implementation
2019-12-23 13:51:42 -08:00
Frédéric Guillot
33fdb2c489
Add support for Atom 0.3
2019-12-22 22:42:00 -08:00
Frédéric Guillot
cfb6ddfcea
Add support for Atom 'replies' link relation
Show comments URL for Atom feeds as per RFC 4685.
See https://tools.ietf.org/html/rfc4685#section-4
Note that only the first link with type "text/html" is taken into consideration.
2019-12-22 18:03:04 -08:00
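A sketch of this rule with `encoding/xml`: pick the first `replies` link whose type is `text/html`. The structs, `commentsURL`, and `commentsURLFromXML` are hypothetical and only decode the attributes needed here:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type atomLink struct {
	Rel  string `xml:"rel,attr"`
	Type string `xml:"type,attr"`
	Href string `xml:"href,attr"`
}

// atomEntry decodes only the <link> children of an Atom <entry>.
type atomEntry struct {
	Links []atomLink `xml:"link"`
}

// commentsURL returns the href of the first rel="replies" link of type
// text/html (RFC 4685); other replies links, e.g. Atom comment feeds, are skipped.
func commentsURL(entry atomEntry) string {
	for _, link := range entry.Links {
		if link.Rel == "replies" && link.Type == "text/html" {
			return link.Href
		}
	}
	return ""
}

// commentsURLFromXML decodes a single <entry> element and extracts its comments URL.
func commentsURLFromXML(data string) (string, error) {
	var entry atomEntry
	if err := xml.Unmarshal([]byte(data), &entry); err != nil {
		return "", err
	}
	return commentsURL(entry), nil
}

func main() {
	const entry = `<entry xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" type="text/html" href="https://example.org/post"/>
  <link rel="replies" type="application/atom+xml" href="https://example.org/post/comments.atom"/>
  <link rel="replies" type="text/html" href="https://example.org/post#comments"/>
</entry>`
	link, err := commentsURLFromXML(entry)
	fmt.Println(link, err) // https://example.org/post#comments <nil>
}
```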
cinput
8e1ed8bef3
Return outer HTML when scraping elements
2019-12-21 21:18:31 -08:00
somini
30f22fbd78
Update scraper rule for "Le Monde"
2019-12-19 18:35:29 -08:00
Jebbs
a155ab6deb
Filter valid XML characters for UTF-8 XML documents before decoding
This change should reduce "illegal character code" XML errors.
2019-12-19 18:31:52 -08:00
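XML 1.0 restricts documents to its `Char` production (tab, LF, CR, and code points from U+0020 up, excluding surrogates, U+FFFE and U+FFFF), and Go's `encoding/xml` reports "illegal character code" for anything else. A sketch of filtering before decoding, with hypothetical helper names:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// isXMLChar implements the Char production of XML 1.0 (section 2.2).
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// filterValidXMLChars drops every rune that XML 1.0 does not allow.
// Invalid UTF-8 bytes reach the mapping as U+FFFD and are replaced by it.
func filterValidXMLChars(s string) string {
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return -1 // a negative value removes the rune
	}, s)
}

// decodeTitle unmarshals the <title> child of an <item> document.
func decodeTitle(doc string) (string, error) {
	var item struct {
		Title string `xml:"title"`
	}
	err := xml.Unmarshal([]byte(doc), &item)
	return item.Title, err
}

func main() {
	doc := "<item><title>Breaking\x0bNews</title></item>"
	_, err := decodeTitle(doc)
	fmt.Println(err) // mentions "illegal character code U+000B"
	title, err := decodeTitle(filterValidXMLChars(doc))
	fmt.Println(title, err) // BreakingNews <nil>
}
```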
Frédéric Guillot
a4ebb33cd5
Trim spaces for RDF entry links
2019-12-01 15:06:01 -08:00
Frédéric Guillot
120d6ec7d8
Do not rewrite YouTube description twice in "add_youtube_video" rule
This is already done before in <media:description>.
2019-11-30 22:56:06 -08:00
Frédéric Guillot
69aa650203
Add the possibility to add rules during feed creation
2019-11-29 11:27:58 -08:00
Frédéric Guillot
912a98788e
Add support of media elements for Atom feeds
2019-11-28 23:55:40 -08:00
Frédéric Guillot
f90e9dfab0
Add support of media elements for RSS 2 feeds
2019-11-28 21:33:32 -08:00
Frédéric Guillot
c43c9458a9
Add rewrite functions: convert_text_link and nl2br
2019-11-28 21:33:12 -08:00
Neo Ng
90064a8cf0
Update scraper rule for openingsource.org
2019-11-28 19:40:26 -08:00
Tony Wang
2eb2441f2b
Improve XML decoder to remove illegal characters
2019-10-22 20:32:35 -07:00
Tony Wang
5517eebafe
Add new formats to date parser
2019-10-20 09:52:18 -07:00