196 commits

Author SHA1 Message Date
Frédéric Guillot
5877048749 Improve handling of Atom text content with CDATA 2021-03-20 20:47:35 -07:00
Frédéric Guillot
c8c1f05328 Add better support of Atom text constructs
- Note that Miniflux does not render entry title with HTML tags as of now
- Omit XHTML div element because it should not be part of the content
2021-03-19 22:05:00 -07:00
Frédéric Guillot
96f3e888cf Handle RDF feed with HTML encoded entry title
Example: http://rss.slashdot.org/Slashdot/slashdotMain
2021-03-19 18:49:51 -07:00
Frédéric Guillot
14888f1cb8 Fix incorrect parsing of Atom entry content of type HTML 2021-03-18 21:43:59 -07:00
Gabriel Augendre
1d80c12e18 Prevent YouTube scraping if entry already exists 2021-03-08 20:10:53 -08:00
hykhd
053b1d0f8d Handle RSS feeds with CDATA in author item element 2021-02-28 12:26:52 -08:00
Frédéric Guillot
ec3c604a83 Add option to allow self-signed or invalid certificates 2021-02-21 13:58:52 -08:00
Ilya Mateyko
c3f871b49b Use YouTube video duration as read time
This feature works by scraping YouTube website.

To enable it, set the FETCH_YOUTUBE_WATCH_TIME environment variable to
1.

Resolves #972.
2021-02-21 11:13:52 -08:00
hykhd
3cb04b2c56 Update whitelist to fix bilibili videos 2021-02-20 10:29:42 -08:00
Frédéric Guillot
a352aff93b Remove deprecated io/ioutil package
Miniflux now requires at least Go 1.16, and io/ioutil is deprecated.

https://golang.org/doc/go1.16#ioutil
2021-02-16 21:25:21 -08:00
Frédéric Guillot
04f9c456d5 Handle entry title with double encoded entities in Atom feeds 2021-02-14 11:19:21 -08:00
Frédéric Guillot
0413daf76b Remove iframe inner HTML contents
An iframe element never has fallback content, as it will always create a nested
browsing context, regardless of whether the specified initial contents are
successfully used.

https://www.w3.org/TR/2010/WD-html5-20101019/the-iframe-element.html#the-iframe-element
2021-02-13 14:00:21 -08:00
Frédéric Guillot
5043749b9f Add workaround for entry title with double encoded entities
Example: 'Text'
2021-02-13 13:33:59 -08:00
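
A minimal sketch of how double encoded entities can be decoded, assuming the example title was originally encoded as `&amp;#39;Text&amp;#39;`. The function name `decodeTwice` is hypothetical and this is not necessarily how the commit implements the workaround; it only illustrates unescaping twice with Go's standard `html` package.

```go
package main

import (
	"fmt"
	"html"
)

// decodeTwice is a hypothetical helper: it unescapes HTML entities twice,
// so that "&amp;#39;" first becomes "&#39;" and then "'".
func decodeTwice(title string) string {
	return html.UnescapeString(html.UnescapeString(title))
}

func main() {
	// The input below is an assumed example of a double encoded title.
	fmt.Println(decodeTwice("&amp;#39;Text&amp;#39;"))
}
```

Titles that are only encoded once pass through unchanged on the second call, since there is nothing left to unescape.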
Nick Chitwood
793f475edd Update date parser to fix another time zone issue
The Washington Post publishes its feeds with EST timestamps, which Miniflux parses as UTC, so entries show up 8 hours off.

See http://feeds.washingtonpost.com/rss/politics for an example.

This fix applies a similar workaround for EST/EDT as was done for PST/PDT.
2021-02-10 22:45:02 -08:00
Frédéric Guillot
864dd9f219 Allow images with data URLs
Only URLs with an image/* MIME type are allowed.
2021-02-06 14:46:01 -08:00
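
The check described above could look roughly like the following sketch. The function name `isImageDataURL` is hypothetical and the sanitizer's actual logic may differ; the sketch only shows one way to accept a data URL when its media type starts with `image/`.

```go
package main

import (
	"fmt"
	"strings"
)

// isImageDataURL is a hypothetical helper that reports whether rawURL is a
// data URL whose media type starts with "image/".
func isImageDataURL(rawURL string) bool {
	const prefix = "data:"
	if !strings.HasPrefix(rawURL, prefix) {
		return false
	}
	rest := rawURL[len(prefix):]

	// The media type and optional parameters come before the first comma.
	comma := strings.IndexByte(rest, ',')
	if comma < 0 {
		return false
	}
	mediaType := strings.ToLower(strings.SplitN(rest[:comma], ";", 2)[0])
	return strings.HasPrefix(mediaType, "image/")
}

func main() {
	fmt.Println(isImageDataURL("data:image/png;base64,iVBORw0KGgo="))
	fmt.Println(isImageDataURL("data:text/html,<b>x</b>"))
}
```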
Ilya Mateyko
4464802947 Reformat some Go files
When working on #994 I noticed that some Go files are not formatted with
`gofmt`.

This PR fixes this.
2021-01-27 18:13:58 -08:00
Frédéric Guillot
806b9545a9 Refactor feed validator 2021-01-04 14:47:25 -08:00
Frédéric Guillot
4468ef1410 Refactor category validation 2021-01-03 22:50:24 -08:00
Frédéric Guillot
291bf96d15 Do not strip tags for entry title
Some technical blogs have titles like "</some-title>" or "This is some <code>source code</code>".

Miniflux was removing these elements, which prevented the title from rendering correctly.
2021-01-03 11:44:07 -08:00
Frédéric Guillot
f0610bdd9c Refactor feed creation to allow setting most fields via API
Allow API clients to create disabled feeds or define fields like "ignore_http_cache".
2021-01-02 16:48:22 -08:00
Frédéric Guillot
1908c84fbe Handle invalid French date 2020-12-02 20:59:14 -08:00
Frédéric Guillot
f722fd1208 Handle invalid feeds with relative URLs 2020-12-02 20:58:18 -08:00
Pacman99
b8b6c74d86 Add rewrite rule replace for custom search and replace 2020-11-29 10:32:26 -08:00
Frédéric Guillot
de7a613098 Calculate reading time during feed processing
The goal is to speed up the user interface.

Detecting the language based on the content is pretty slow.
2020-11-18 17:43:24 -08:00
Frédéric Guillot
b1c9977711 Handle more invalid dates 2020-11-17 17:12:12 -08:00
Frédéric Guillot
a108cb7808 Handle various invalid dates 2020-11-16 21:37:33 -08:00
Frédéric Guillot
246a48359c Do not follow redirects when trying known feed URLs
Some websites redirect unknown URLs to the home page.
As a result, the whole list of known URLs is returned to the subscription list.
We don't want the user to choose between invalid feed URLs.
2020-11-06 17:46:54 -08:00
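
One way to probe candidate URLs without following redirects is sketched below using Go's standard `net/http` client. The helpers `newNoRedirectClient`, `probe`, and `newTestServer` are hypothetical names for illustration and the commit's actual client code may be structured differently. Returning `http.ErrUseLastResponse` from `CheckRedirect` makes the client hand back the 3xx response instead of following it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newNoRedirectClient returns a client that never follows redirects.
func newNoRedirectClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

// probe reports whether the URL answers directly with 200 OK.
func probe(client *http.Client, url string) (bool, error) {
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

// newTestServer simulates a site that redirects unknown paths to its home page.
func newTestServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/", "/atom.xml":
			fmt.Fprint(w, "ok")
		default:
			http.Redirect(w, r, "/", http.StatusFound)
		}
	}))
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	client := newNoRedirectClient()
	for _, path := range []string{"/atom.xml", "/rss.xml"} {
		ok, err := probe(client, srv.URL+path)
		if err != nil {
			panic(err)
		}
		fmt.Println(path, ok)
	}
}
```

With a default client, the redirected `/rss.xml` request would end at the home page with a 200 status and look like a valid candidate.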
Frédéric Guillot
40e983664c Trim spaces around icon URLs 2020-11-06 17:18:58 -08:00
Frédéric Guillot
4f358aa0f3 Do not escape HTML for Atom 1.0 text content during parsing
Avoid encoding single quotes to HTML entities (&#39;).

Feed contents are sanitized after parsing.
2020-10-30 23:41:33 -07:00
Frédéric Guillot
b30a045a4e Refactor entry filtering
Avoid looping multiple times across entries
2020-10-19 22:18:41 -07:00
Frédéric Guillot
b50778d3eb Add rewrite rule to use noscript content for images rendered with JavaScript 2020-10-19 21:31:10 -07:00
Manuel Garrido
84b83fc3c8 Add feed filters (Keeplist and Blocklist) 2020-10-16 14:40:56 -07:00
Frédéric Guillot
3afdf25012 Do not proxy image data URLs 2020-10-14 22:26:54 -07:00
Frédéric Guillot
31435ef83e Add rewrite rule to fix Medium.com images 2020-09-29 22:27:32 -07:00
Frédéric Guillot
d75ff0c5ab Add sanitizer support for responsive images
- Add support for picture HTML tag
- Add support for srcset, media, and sizes attributes to img and source tags
2020-09-28 23:22:08 -07:00
Frédéric Guillot
c394a61a4e Add Prometheus exporter 2020-09-27 20:04:48 -07:00
Frédéric Guillot
16b7b3bc3e http client: remove dependency on global config options 2020-09-27 14:37:46 -07:00
Dave Marquard
eb026ae4ac Handle Pacific Daylight Time in addition to Pacific Standard Time 2020-09-22 19:47:36 -07:00
Frédéric Guillot
0d0395b4e3 Do not try to update a duplicated feed after a refresh 2020-09-20 23:42:18 -07:00
Frédéric Guillot
e6c6ee441a Use a transaction to refresh and create entries
Also includes a few database improvements:

- Speed up entries clean up with an index and a goroutine
- Avoid the accumulation of enclosures for some feeds
2020-09-20 23:12:23 -07:00
Frédéric Guillot
bfb96d536e Add workaround for parsing an invalid date 2020-09-14 21:23:26 -07:00
Kebin Liu
cf7712acea Add HTTP proxy option for subscriptions 2020-09-09 23:28:54 -07:00
alex
0f258fd55b Make add_invidious_video rule applicable for different invidious instances 2020-09-06 13:41:42 -07:00
Frédéric Guillot
fc75b0cd8e Add workaround to get YouTube feed from video page 2020-08-02 12:24:46 -07:00
Frédéric Guillot
7380c64141 Add workaround to find YouTube channel feeds
YouTube doesn't expose RSS links anymore for new-style URLs.
2020-08-02 11:37:07 -07:00
Frédéric Guillot
1d6b0491a7 Ignore <media:title> in RSS 2.0 feeds
In the vast majority of cases, the default entry title is correct.

Ignoring <media:title> avoids overriding the default title when the two differ.
2020-06-29 18:24:06 -07:00
Gabriel Augendre
e44b4b2540 Try known URLs if no link alternate
I came across a few blogs that didn't have a link rel alternate
but offered an RSS/Atom feed.
This aims at solving this issue for "well known" feed URLs, since
these URLs are often the same.
2020-06-21 20:34:59 -07:00
Manuel Müller
ca918bc7e3 Added scraper rule for dilbert.com and turnoff.us 2020-06-10 20:15:46 -07:00
Frédéric Guillot
6c6ca69141 Add feed option to ignore HTTP cache 2020-06-05 22:04:52 -07:00
Frédéric Guillot
7e5157f218 Rename alternative scheduler to entry_frequency 2020-05-25 15:12:47 -07:00