Frédéric Guillot
1908c84fbe
Handle invalid French date
2020-12-02 20:59:14 -08:00
Frédéric Guillot
f722fd1208
Handle invalid feeds with relative URLs
2020-12-02 20:58:18 -08:00
Pacman99
b8b6c74d86
Add rewrite rule replace for custom search and replace
2020-11-29 10:32:26 -08:00
Frédéric Guillot
de7a613098
Calculate reading time during feed processing
...
The goal is to speed up the user interface.
Detecting the language based on the content is pretty slow.
2020-11-18 17:43:24 -08:00
Frédéric Guillot
b1c9977711
Handle more invalid dates
2020-11-17 17:12:12 -08:00
Frédéric Guillot
a108cb7808
Handle various invalid dates
2020-11-16 21:37:33 -08:00
Frédéric Guillot
246a48359c
Do not follow redirects when trying known feed URLs
...
Some websites redirect unknown URLs to the home page.
As a result, the list of known URLs is returned to the subscription list.
We don't want the user to choose between invalid feed URLs.
2020-11-06 17:46:54 -08:00
Frédéric Guillot
40e983664c
Trim spaces around icon URLs
2020-11-06 17:18:58 -08:00
Frédéric Guillot
4f358aa0f3
Do not escape HTML for Atom 1.0 text content during parsing
...
Avoid encoding single quotes to HTML entities (&#39;).
Feed contents are sanitized after parsing.
2020-10-30 23:41:33 -07:00
Frédéric Guillot
b30a045a4e
Refactor entry filtering
...
Avoid looping multiple times across entries
2020-10-19 22:18:41 -07:00
Frédéric Guillot
b50778d3eb
Add rewrite rule to use noscript content for images rendered with Javascript
2020-10-19 21:31:10 -07:00
Manuel Garrido
84b83fc3c8
Add feed filters (Keeplist and Blocklist)
2020-10-16 14:40:56 -07:00
Frédéric Guillot
3afdf25012
Do not proxy image data url
2020-10-14 22:26:54 -07:00
Frédéric Guillot
31435ef83e
Add rewrite rule to fix Medium.com images
2020-09-29 22:27:32 -07:00
Frédéric Guillot
d75ff0c5ab
Add sanitizer support for responsive images
...
- Add support for picture HTML tag
- Add support for srcset, media, and sizes attributes to img and source tags
2020-09-28 23:22:08 -07:00
Frédéric Guillot
c394a61a4e
Add Prometheus exporter
2020-09-27 20:04:48 -07:00
Frédéric Guillot
16b7b3bc3e
http client: remove dependency on global config options
2020-09-27 14:37:46 -07:00
Dave Marquard
eb026ae4ac
handle Pacific Daylight Time in addition to Pacific Standard Time
2020-09-22 19:47:36 -07:00
Frédéric Guillot
0d0395b4e3
Do not try to update a duplicated feed after a refresh
2020-09-20 23:42:18 -07:00
Frédéric Guillot
e6c6ee441a
Use a transaction to refresh and create entries
...
Also includes a few database improvements:
- Speed up entries clean up with an index and a goroutine
- Avoid the accumulation of enclosures for some feeds
2020-09-20 23:12:23 -07:00
Frédéric Guillot
bfb96d536e
Add workaround for parsing an invalid date
2020-09-14 21:23:26 -07:00
Kebin Liu
cf7712acea
Add HTTP proxy option for subscriptions
2020-09-09 23:28:54 -07:00
alex
0f258fd55b
Make add_invidious_video rule applicable for different invidious instances
2020-09-06 13:41:42 -07:00
Frédéric Guillot
fc75b0cd8e
Add workaround to get YouTube feed from video page
2020-08-02 12:24:46 -07:00
Frédéric Guillot
7380c64141
Add workaround to find YouTube channel feeds
...
YouTube doesn't expose RSS links anymore for new-style URLs.
2020-08-02 11:37:07 -07:00
Frédéric Guillot
1d6b0491a7
Ignore <media:title> in RSS 2.0 feeds
...
In the vast majority of cases, the default entry title is correct.
Ignoring <media:title> avoids overriding the default title if they are different.
2020-06-29 18:24:06 -07:00
Gabriel Augendre
e44b4b2540
Try known urls if no link alternate
...
I came across a few blogs that didn't have a link rel alternate
but offered an RSS/Atom feed.
This aims at solving this issue for "well known" feed urls, since
these urls are often the same.
2020-06-21 20:34:59 -07:00
Manuel Müller
ca918bc7e3
Added scraper rule for dilbert.com and turnoff.us
2020-06-10 20:15:46 -07:00
Frédéric Guillot
6c6ca69141
Add feed option to ignore HTTP cache
2020-06-05 22:04:52 -07:00
Frédéric Guillot
7e5157f218
Rename alternative scheduler to entry_frequency
2020-05-25 15:12:47 -07:00
Shizun Ge
cead85b165
Add alternative scheduler based on the number of entries
2020-05-25 14:06:56 -07:00
Corey McCaffrey
25d4b9fc0c
Added scraper rule for financialsamurai.com
...
The default rule results in blank content.
2020-05-24 13:29:28 -07:00
Corey McCaffrey
0683074b8b
Added scraper rule for TheOatmeal.com
...
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
Corey McCaffrey
8f6c07afd6
Added scraper rule for RayWenderlich.com
...
RayWenderlich.com is a popular developer's community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Frédéric Guillot
619aa58fb3
Handle more invalid dates
...
Fixes #617
2020-04-25 20:15:18 -07:00
Frédéric Guillot
592151bdb6
Add support for Invidious
...
- Embed Invidious player for invidio.us feeds
- Add new rewrite rule to use Invidious player for Youtube feeds
2020-03-20 20:56:59 -07:00
Andrew Williams
9974e0f458
Addition of scraper rule for wdwnt.com
...
By default, fetching original content for wdwnt.com results in a snippet of the comments section; this rule captures the article content.
2020-02-28 20:24:58 -08:00
Frédéric Guillot
997e9422eb
Ignore enclosures without URL
2020-01-30 21:18:49 -08:00
Frédéric Guillot
61f0c8aa66
Allow application/xhtml+xml links as comments URL in Atom replies
2020-01-04 16:07:06 -08:00
Frédéric Guillot
bf632fad2e
Allow only absolute URLs in comments URL
...
Some feeds are using invalid URLs (random text).
2020-01-04 15:54:16 -08:00
Kebin Liu
8cebd985a2
Use internal XML workarounds to detect feed format
2020-01-02 22:19:15 -08:00
Frédéric Guillot
ac3c936820
Make sure whitelisted URI schemes are handled properly by the sanitizer
2020-01-02 11:03:51 -08:00
Frédéric Guillot
3debf75eb9
Normalize URL query string before executing HTTP requests
...
- Make sure query strings parameters are encoded
- As opposed to the standard library, do not append equal sign
for query parameters with empty value
- Strip URL fragments like Web browsers
2019-12-26 15:56:59 -08:00
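The three points in the commit above can be sketched roughly as follows. This is an illustration only: `normalizeURL` is a hypothetical name, and the sketch sorts parameters by key for deterministic output, which the original change may not do. It also treats `key=` and a bare `key` the same way, which is a simplification.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// normalizeURL (hypothetical helper) encodes query parameters, omits the
// equal sign for parameters with an empty value, and strips the fragment.
func normalizeURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}

	// Strip the fragment, as Web browsers do before sending a request.
	u.Fragment = ""

	values := u.Query()
	keys := make([]string, 0, len(values))
	for key := range values {
		keys = append(keys, key)
	}
	sort.Strings(keys) // assumption: sorted keys, for a stable result

	var parts []string
	for _, key := range keys {
		for _, value := range values[key] {
			if value == "" {
				// url.Values.Encode() would emit "key=" here.
				parts = append(parts, url.QueryEscape(key))
			} else {
				parts = append(parts, url.QueryEscape(key)+"="+url.QueryEscape(value))
			}
		}
	}
	u.RawQuery = strings.Join(parts, "&")

	return u.String(), nil
}

func main() {
	normalized, err := normalizeURL("https://example.org/feed?tag=go lang&debug#top")
	fmt.Println(normalized, err)
	// prints: https://example.org/feed?debug&tag=go+lang <nil>
}
```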
Frédéric Guillot
200b1c304b
Improve Dublin Core support for RDF feeds
2019-12-23 14:45:58 -08:00
Frédéric Guillot
1b33bb3d1c
Improve Podcast support (iTunes and Google Play feeds)
...
- Add support for Google Play XML namespace
- Improve existing iTunes namespace implementation
2019-12-23 13:51:42 -08:00
Frédéric Guillot
33fdb2c489
Add support for Atom 0.3
2019-12-22 22:42:00 -08:00
Frédéric Guillot
cfb6ddfcea
Add support for Atom 'replies' link relation
...
Show comments URL for Atom feeds as per RFC 4685.
See https://tools.ietf.org/html/rfc4685#section-4
Note that only the first link with type "text/html" is taken into consideration.
2019-12-22 18:03:04 -08:00
cinput
8e1ed8bef3
Return outer HTML when scraping elements
2019-12-21 21:18:31 -08:00
somini
30f22fbd78
Update scraper rule for "Le Monde"
2019-12-19 18:35:29 -08:00
Jebbs
a155ab6deb
Filter valid XML characters for UTF-8 XML documents before decoding
...
This change should reduce "illegal character code" XML errors.
2019-12-19 18:31:52 -08:00
Frédéric Guillot
a4ebb33cd5
Trim spaces for RDF entry links
2019-12-01 15:06:01 -08:00
Frédéric Guillot
120d6ec7d8
Do not rewrite Youtube description twice in "add_youtube_video" rule
...
This is already done before in <media:description>.
2019-11-30 22:56:06 -08:00
Frédéric Guillot
69aa650203
Add the possibility to add rules during feed creation
2019-11-29 11:27:58 -08:00
Frédéric Guillot
912a98788e
Add support of media elements for Atom feeds
2019-11-28 23:55:40 -08:00
Frédéric Guillot
f90e9dfab0
Add support of media elements for RSS 2 feeds
2019-11-28 21:33:32 -08:00
Frédéric Guillot
c43c9458a9
Add rewrite functions: convert_text_link and nl2br
2019-11-28 21:33:12 -08:00
Neo Ng
90064a8cf0
Update scraper rule for openingsource.org
2019-11-28 19:40:26 -08:00
Tony Wang
2eb2441f2b
Improve XML decoder to remove illegal characters
2019-10-22 20:32:35 -07:00
Tony Wang
5517eebafe
Add new formats to date parser
2019-10-20 09:52:18 -07:00
Frédéric Guillot
36d7732234
Disable strict XML parsing
...
This change should improve parsing of broken XML feeds.
See https://golang.org/pkg/encoding/xml/#Decoder
2019-09-18 22:45:56 -07:00
Frédéric Guillot
934385ff55
Replace Travis by GitHub Actions
2019-09-15 11:48:15 -07:00
Frédéric Guillot
8d8f78241d
Add native lazy loading for images and iframes
...
This feature is available only in Chrome >= 76 for now.
See https://web.dev/native-lazy-loading
2019-09-10 21:22:19 -07:00
Peter De Wachter
b6f3160dbc
add_mailto_subject: New rewrite function
...
Dinosaur Comics (qwantz.com) likes to hide jokes in mailto: links, but
miniflux's sanitizer strips those out.
2019-08-19 19:42:47 -07:00
Frédéric Guillot
ac45307da6
Add test case for parsing HTML entities
2019-08-15 21:42:13 -07:00
Peter De Wachter
ea2b6e3608
addImageTitle: Fix HTML injection
...
This rewrite rule would change this:
<img title="<foo>">
to this:
<figure><img><figcaption><foo></figcaption></figure>
The image title needs to be properly escaped.
2019-08-15 21:39:41 -07:00
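The fix described above comes down to escaping the title before it is placed in element content. A minimal sketch, assuming string concatenation is used to build the markup; `figureWithCaption` is a hypothetical name and not the function from the rewrite rule itself.

```go
package main

import (
	"fmt"
	"html"
)

// figureWithCaption (hypothetical helper) wraps an image in a figure tag and
// uses the escaped title as caption, so that markup in the title is rendered
// as text instead of being injected into the page.
func figureWithCaption(src, title string) string {
	return `<figure><img src="` + html.EscapeString(src) + `"><figcaption>` +
		html.EscapeString(title) + `</figcaption></figure>`
}

func main() {
	fmt.Println(figureWithCaption("https://example.org/a.png", "<foo>"))
	// prints: <figure><img src="https://example.org/a.png"><figcaption>&lt;foo&gt;</figcaption></figure>
}
```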
Peter De Wachter
3a39d110f0
Accept HTML entities when parsing XML
...
Every once in a while, one of my feeds would throw an XML parse error
because it used `&nbsp;` or some other HTML entity. I feel Miniflux
should be lenient here, and Go already has a handy hook to make this
work.
2019-08-15 21:26:07 -07:00
Ilya Glotov
c840268678
Sort feed categories before serialization
...
A function is added to normalize feeds and their categories.
The test will ensure that the order is right.
2019-07-05 20:34:49 +03:00
Frédéric Guillot
129f1bf3da
Add support for OPML v1 import
2019-03-26 20:09:31 -07:00
Jeremy Apthorp
304b43cb30
Add 'allow-popups' to iframe sandbox permissions
2019-03-26 18:26:56 -07:00
Frédéric Guillot
6764a420b0
Make parser compatible with Go 1.12
...
See changes in strings.Map(): https://golang.org/doc/go1.12#strings
2019-02-28 21:23:33 -08:00
Frédéric Guillot
f3fc8b7072
Use feed ID instead of user ID to check entry URLs presence
2019-02-28 20:43:33 -08:00
Frédéric Guillot
ed6ae7e0d2
Prefer the published date for Atom feeds
...
YouTube feeds use the published date for the original creation date.
2019-01-29 20:01:36 -08:00
Peter De Wachter
0cdcec10ca
More robust Atom text handling
...
Miniflux couldn't deal with XHTML Summary elements.
- Make Summary an 'atomContent' field
- Define an atomContentToString function rather than inlining it three times
- Also properly escape special characters in plain text fields.
2019-01-07 17:55:02 -08:00
Frédéric Guillot
56efd2eb3f
Add workaround for non GMT dates (RFC822, RFC850, and RFC1123)
...
RFC822, RFC850, and RFC1123 are supposed to be always in GMT.
This is a workaround for dates defined in the PST timezone.
2018-12-26 20:24:38 -08:00
Frédéric Guillot
012138179c
Add function storage.UpdateFeedError()
2018-12-15 13:04:38 -08:00
Tom Matthews
8b40778ee1
Add BBC News scraping rule
2018-12-13 20:25:30 -08:00
Frederic Guillot
61bfb3cfa8
Make password prompt compatible with Windows
2018-12-09 17:44:33 -08:00
Frédéric Guillot
1bc8535dbb
Move image proxy filter to template functions
2018-12-02 21:09:53 -08:00
Frédéric Guillot
6f5d93cbbe
Update scraper rule for lemonde.fr
2018-12-02 20:53:22 -08:00
Frédéric Guillot
311a133ab8
Refactor manual entry scraper
2018-12-02 20:51:06 -08:00
mapl
e47188eab2
Update scraper rule for heise.de
2018-12-01 11:49:30 -08:00
Frédéric Guillot
487852f07e
Replace daemon and scheduler package with service package
2018-11-11 15:32:48 -08:00
Frédéric Guillot
3b6e44c331
Allow the scraper to parse XHTML documents
...
Only "text/html" was authorized before.
2018-11-03 13:44:13 -07:00
Frédéric Guillot
ae1dc1a91e
Handle more encoding conversion edge cases
2018-10-29 23:00:03 -07:00
Frédéric Guillot
7d1b471d88
Add test case to check different feed encoding and HTTP headers
2018-10-29 19:04:36 -07:00
Frédéric Guillot
85d48c8a71
Add entries storage error to feed errors count
2018-10-21 11:44:29 -07:00
Frédéric Guillot
b8f874a37d
Simplify feed entries filtering
...
- Rename processor package to filter
- Remove boilerplate code
2018-10-14 22:33:19 -07:00
Frédéric Guillot
778346b0b0
Simplify feed fetcher
...
- Add browser package to handle HTTP errors
- Reduce code duplication
2018-10-14 21:43:48 -07:00
Frédéric Guillot
5870f04260
Simplify feed parser and format detection
...
- Avoid doing multiple buffer copies
- Move parser and format detection logic to its own package
2018-10-14 11:46:41 -07:00
Frédéric Guillot
9606126196
Convert text links and line feeds to HTML in YouTube channels
2018-10-08 20:47:10 -07:00
Frédéric Guillot
9dc38a0803
Add missing package descriptions for GoDoc
2018-10-08 17:32:17 -07:00
Frédéric Guillot
11dfcdd3d6
Fix typo in license header
2018-10-08 15:50:15 -07:00
Frédéric Guillot
b1e8f534ef
Simplify locale package usage (refactoring)
2018-09-22 15:04:55 -07:00
Frédéric Guillot
beb7a0cfcb
Use unique translation IDs instead of English text as key
2018-09-21 22:23:23 -07:00
Patrick
2538eea177
Add the possibility to override default user agent for each feed
2018-09-19 18:19:24 -07:00
Frédéric Guillot
df2bebaf3d
Update scraper rule for heise.de
2018-08-25 10:33:18 -07:00
Frédéric Guillot
dbcc5d8a97
Use canonical imports
2018-08-24 21:56:39 -07:00
neepl
5365f31e90
Add support for published tag in Atom feeds
2018-07-17 21:52:05 -07:00
Frédéric Guillot
a786e78aca
Add embedly.com to iframe whitelist
2018-07-10 20:56:54 -07:00
dzaikos
6d25e02cb5
New add_dynamic_image rewriter for JavaScript-loaded images.
...
Searches tags for various `data-*` attributes and sets `img` tag `src` attribute appropriately. Falls back to searching `noscript` for `img` tags.
Includes unit tests.
2018-07-09 01:22:48 -04:00
dzaikos
e1c56b2e53
Processor: Do rewriter before sanitizer for entry.Content.
...
Addresses #163.
2018-07-06 00:17:07 -04:00
Frédéric Guillot
de1a4aad30
Add support for protocol relative YouTube URLs
2018-07-04 22:45:44 -07:00
dzaikos
7d4a195519
Sandbox iframes when sanitizing.
...
Updated iframe unit tests.
Refactored sanitizer.getExtraAttributes() to use `switch` instead of multiple `if` statements.
2018-07-03 12:55:18 -07:00
Frédéric Guillot
9c0f882ba0
Add specific 404 and 401 error messages
2018-06-30 12:42:12 -07:00
dzaikos
45d7105ed1
Refactor AddImageTitle rewriter.
...
* Only processes images with `src` **and** `title` attributes (others are ignored).
* Processes **all** images in the document (not just the first one).
* Wraps the image and its title attribute in a `figure` tag with the title attribute's contents in a `figcaption` tag.
Updated xkcd rewriter unit test.
Added another xkcd rewriter unit test to check rendering of images without title tags.
2018-06-26 17:50:18 -04:00
dzaikos
c9131b0e89
Improve sanitizer to remove style tag contents.
...
See #157.
Refactored how blacklisted tags are handled so they're easier to manage in the future.
2018-06-24 19:53:23 -07:00
Dave Z
d847b10e32
Improve sanitizer to remove script and noscript contents
...
These tags were removed but the content was rendered as escaped HTML.
See #157
2018-06-23 17:50:43 -07:00
Frédéric Guillot
bddca15b69
Add new fields for feed username/password
2018-06-19 22:58:29 -07:00
Frédéric Guillot
c719cf7df0
Rewrite iframe Youtube URLs to https://www.youtube-nocookie.com
2018-06-12 18:45:09 -07:00
Frédéric Guillot
0c2e5ff0dc
Handle feeds with dates formatted as Unix timestamp
2018-05-08 20:41:24 -07:00
Frédéric Guillot
5cacae6cf2
Add API endpoint to import OPML file
2018-04-29 18:56:40 -07:00
Frédéric Guillot
1eba1730d1
Move HTTP client to its own package
2018-04-28 10:51:07 -07:00
aniran
322b265d7a
Scrape parent element for iframe
...
Current behavior: if you have an `iframe` scraper rule, `scrapContent`
tries to return the inner HTML of the `iframe`, which turns up blank.
New behavior: like `img` elements, if an `iframe` is matched by a scraper rule,
the parent element's inner HTML (i.e. the `iframe`) is returned.
2018-04-27 17:57:22 -07:00
aniran
920dda79b7
Add soundcloud and bandcamp iframe sources
2018-04-27 17:55:58 -07:00
Frédéric Guillot
dcbb5047b1
Add support for Dublin Core date in RDF feeds
2018-04-10 18:13:05 -07:00
Frédéric Guillot
02ba735ba9
Handle some non-english date formats
2018-04-09 21:27:15 -07:00
Frédéric Guillot
e2d02bac5a
Rename RSS parser getters
2018-04-09 20:38:12 -07:00
Frédéric Guillot
f76093690c
Get the right comments URL when having multiple namespaces
2018-04-09 20:30:55 -07:00
Frédéric Guillot
702256bcc0
Add unit test for comments url and French translation
2018-04-07 13:56:11 -07:00
Ben Brooks
538d08c16c
Add CommentsURL to entry
2018-04-07 13:50:45 -07:00
Frédéric Guillot
6ea4da3bce
Handle RSS author elements with inner HTML
2018-03-18 11:57:46 -07:00
Frédéric Guillot
482785c5e6
Convert enclosure size field to bigint
2018-03-14 20:09:06 -07:00
Frédéric Guillot
ec08f45bf5
Fix broken OPML import with Go 1.10
2018-03-14 18:50:06 -07:00
Frédéric Guillot
f110384f11
Improve parser error messages
2018-02-27 21:19:59 -08:00
Frédéric Guillot
953d0a2dc0
Support localized feed errors generated by background workers
2018-02-27 21:08:32 -08:00
Frédéric Guillot
9292d5d604
Handle Atom feeds with HTML title
2018-02-17 12:21:58 -08:00
Frédéric Guillot
dda9114692
Improve error handling for HTTP client
2018-02-08 18:16:54 -08:00
Frédéric Guillot
7b0bfd9308
Strip invalid XML characters to avoid parsing errors
2018-02-07 20:57:56 -08:00
Frédéric Guillot
c6fd9eb9b1
Remove period for feed errors
2018-02-07 19:10:36 -08:00
Frédéric Guillot
0fb87eba3f
Improve error handling when the response is empty
2018-02-07 18:47:47 -08:00
Frédéric Guillot
b78172033f
Show API URL endpoints in user interface
2018-01-31 21:57:20 -08:00
Frédéric Guillot
ffabb009b8
Do not override existing entries when the crawler is enabled
2018-01-20 14:04:19 -08:00
Frédéric Guillot
713b38e34c
Handle more encoding edge cases
...
- Feeds with charset specified only in Content-Type header and not in XML document
- Feeds with charset specified in both places
- Feeds with charset specified only in XML document and not in HTTP header
2018-01-20 13:25:21 -08:00
Frédéric Guillot
3b62f904d6
Do not crawl existing entry URLs
2018-01-20 13:25:20 -08:00
Frédéric Guillot
9652dfa1fe
Add more comments (GoDoc)
2018-01-11 19:21:20 -08:00
Frédéric Guillot
1d7fe892e1
Add scraper rule for darkreading.com
2018-01-06 13:25:12 -08:00
Frédéric Guillot
48aa0d07ef
Add more scraper rules
2018-01-04 19:32:24 -08:00
Frédéric Guillot
7d278d49f1
Add content length check when refreshing feeds
2018-01-04 18:41:23 -08:00
Frédéric Guillot
efac11e082
Handle more date formats
2018-01-03 18:59:29 -08:00
Frédéric Guillot
ec63cbe7bb
If the website URL is empty, assign the feed URL
2018-01-03 18:23:21 -08:00
Frédéric Guillot
c39f2e1a8d
Rename helper packages
2018-01-02 19:15:08 -08:00
Frédéric Guillot
3c3f397bf5
Make sure the scraper parses only HTML documents
2018-01-02 18:32:01 -08:00
Frédéric Guillot
c454f67037
Add scraper rules for version2.dk and ing.dk
2017-12-27 19:44:23 -08:00
Frédéric Guillot
d4839b5597
Add more scraper rules
2017-12-27 13:36:07 -08:00
Frédéric Guillot
f6a5d7d6ed
Add support for data URL favicons
2017-12-22 19:01:39 -08:00
Frédéric Guillot
e7afec7eca
Handle more date formats
2017-12-22 17:59:28 -08:00
Frédéric Guillot
1d8193b892
Add logger
2017-12-15 18:55:57 -08:00
Frédéric Guillot
c6d9eb3614
Improve content scraper
2017-12-13 21:30:40 -08:00
Frédéric Guillot
827683ab59
Make sure that item URLs are absolute
2017-12-13 20:16:15 -08:00
Frédéric Guillot
84d912c979
Rewrite imports
2017-12-12 21:48:13 -08:00