Frédéric Guillot
4f358aa0f3
Do not escape HTML for Atom 1.0 text content during parsing
...
Avoid encoding single quotes to HTML entities (&#39;).
Feed contents are sanitized after parsing.
2020-10-30 23:41:33 -07:00
Frédéric Guillot
b30a045a4e
Refactor entry filtering
...
Avoid looping multiple times across entries
2020-10-19 22:18:41 -07:00
Frédéric Guillot
b50778d3eb
Add rewrite rule to use noscript content for images rendered with JavaScript
2020-10-19 21:31:10 -07:00
Manuel Garrido
84b83fc3c8
Add feed filters (Keeplist and Blocklist)
2020-10-16 14:40:56 -07:00
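A minimal sketch of keeplist/blocklist filtering in a single pass over the entries, in the spirit of the "Refactor entry filtering" change above. It assumes both rules are regular expressions matched against the entry title only. The `Entry` type and `filterEntries` function are hypothetical and are not Miniflux's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Entry is a hypothetical, trimmed-down feed entry used only for this sketch.
type Entry struct {
	Title string
}

// filterEntries returns the entries that survive both rules in one pass.
// An empty rule is ignored. Assumption: rules are regular expressions
// matched against the title only.
func filterEntries(entries []Entry, keepRule, blockRule string) ([]Entry, error) {
	var keepRe, blockRe *regexp.Regexp
	var err error
	if keepRule != "" {
		if keepRe, err = regexp.Compile(keepRule); err != nil {
			return nil, err
		}
	}
	if blockRule != "" {
		if blockRe, err = regexp.Compile(blockRule); err != nil {
			return nil, err
		}
	}

	kept := make([]Entry, 0, len(entries))
	for _, e := range entries {
		if blockRe != nil && blockRe.MatchString(e.Title) {
			continue // blocklist match: drop the entry
		}
		if keepRe != nil && !keepRe.MatchString(e.Title) {
			continue // keeplist set but not matched: drop the entry
		}
		kept = append(kept, e)
	}
	return kept, nil
}

func main() {
	entries := []Entry{{"Go 1.15 released"}, {"Sponsored: buy now"}, {"Rust news"}, {"Python tips"}}
	kept, err := filterEntries(entries, "(?i)go|rust", "(?i)sponsored")
	if err != nil {
		panic(err)
	}
	for _, e := range kept {
		fmt.Println(e.Title)
	}
}
```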
Frédéric Guillot
3afdf25012
Do not proxy image data url
2020-10-14 22:26:54 -07:00
Frédéric Guillot
31435ef83e
Add rewrite rule to fix Medium.com images
2020-09-29 22:27:32 -07:00
Frédéric Guillot
d75ff0c5ab
Add sanitizer support for responsive images
...
- Add support for picture HTML tag
- Add support for srcset, media, and sizes attributes to img and source tags
2020-09-28 23:22:08 -07:00
Frédéric Guillot
c394a61a4e
Add Prometheus exporter
2020-09-27 20:04:48 -07:00
Frédéric Guillot
16b7b3bc3e
http client: remove dependency on global config options
2020-09-27 14:37:46 -07:00
Dave Marquard
eb026ae4ac
handle Pacific Daylight Time in addition to Pacific Standard Time
2020-09-22 19:47:36 -07:00
Frédéric Guillot
0d0395b4e3
Do not try to update a duplicated feed after a refresh
2020-09-20 23:42:18 -07:00
Frédéric Guillot
e6c6ee441a
Use a transaction to refresh and create entries
...
Also includes a few database improvements:
- Speed up entry cleanup with an index and a goroutine
- Avoid the accumulation of enclosures for some feeds
2020-09-20 23:12:23 -07:00
Frédéric Guillot
bfb96d536e
Add workaround for parsing an invalid date
2020-09-14 21:23:26 -07:00
Kebin Liu
cf7712acea
Add HTTP proxy option for subscriptions
2020-09-09 23:28:54 -07:00
alex
0f258fd55b
Make add_invidious_video rule applicable for different invidious instances
2020-09-06 13:41:42 -07:00
Frédéric Guillot
fc75b0cd8e
Add workaround to get YouTube feed from video page
2020-08-02 12:24:46 -07:00
Frédéric Guillot
7380c64141
Add workaround to find YouTube channel feeds
...
YouTube doesn't expose RSS links anymore for new-style URLs.
2020-08-02 11:37:07 -07:00
Frédéric Guillot
1d6b0491a7
Ignore <media:title> in RSS 2.0 feeds
...
In the vast majority of cases, the default entry title is correct.
Ignoring <media:title> avoids overriding the default title when the two differ.
2020-06-29 18:24:06 -07:00
Gabriel Augendre
e44b4b2540
Try well-known URLs if there is no alternate link
...
I came across a few blogs that didn't have a rel="alternate" link
but offered an RSS/Atom feed.
This aims to solve the issue for "well-known" feed URLs, since
these URLs are often the same from one site to another.
2020-06-21 20:34:59 -07:00
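A sketch of the "well-known URLs" fallback: when a page has no alternate link, candidate feed URLs are derived from the website URL and could then be fetched one by one. The path list and the `candidateFeedURLs` name are assumptions for illustration, not the exact list Miniflux uses.

```go
package main

import (
	"fmt"
	"net/url"
)

// knownFeedPaths is a hypothetical list of common feed locations; the
// list actually used by Miniflux may differ.
var knownFeedPaths = []string{"/atom.xml", "/feed.xml", "/feed/", "/rss.xml", "/index.xml"}

// candidateFeedURLs resolves each well-known path against the website URL.
// Absolute paths replace whatever path the website URL had.
func candidateFeedURLs(websiteURL string) ([]string, error) {
	base, err := url.Parse(websiteURL)
	if err != nil {
		return nil, err
	}
	candidates := make([]string, 0, len(knownFeedPaths))
	for _, p := range knownFeedPaths {
		ref, err := url.Parse(p)
		if err != nil {
			return nil, err
		}
		candidates = append(candidates, base.ResolveReference(ref).String())
	}
	return candidates, nil
}

func main() {
	candidates, err := candidateFeedURLs("https://example.org/blog/post")
	if err != nil {
		panic(err)
	}
	for _, c := range candidates {
		fmt.Println(c)
	}
}
```

Each candidate would still need to be downloaded and checked with the feed format detector before being offered as a subscription.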
Manuel Müller
ca918bc7e3
Added scraper rule for dilbert.com and turnoff.us
2020-06-10 20:15:46 -07:00
Frédéric Guillot
6c6ca69141
Add feed option to ignore HTTP cache
2020-06-05 22:04:52 -07:00
Frédéric Guillot
7e5157f218
Rename alternative scheduler to entry_frequency
2020-05-25 15:12:47 -07:00
Shizun Ge
cead85b165
Add alternative scheduler based on the number of entries
2020-05-25 14:06:56 -07:00
Corey McCaffrey
25d4b9fc0c
Added scraper rule for financialsamurai.com
...
The default rule results in blank content.
2020-05-24 13:29:28 -07:00
Corey McCaffrey
0683074b8b
Added scraper rule for TheOatmeal.com
...
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
Corey McCaffrey
8f6c07afd6
Added scraper rule for RayWenderlich.com
...
RayWenderlich.com is a popular developer community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Frédéric Guillot
619aa58fb3
Handle more invalid dates
...
Fixes #617
2020-04-25 20:15:18 -07:00
Frédéric Guillot
592151bdb6
Add support for Invidious
...
- Embed Invidious player for invidio.us feeds
- Add new rewrite rule to use Invidious player for YouTube feeds
2020-03-20 20:56:59 -07:00
Andrew Williams
9974e0f458
Addition of scraper rule for wdwnt.com
...
By default, fetching original content for wdwnt.com results in a snippet of the comments section; this rule captures the article content instead.
2020-02-28 20:24:58 -08:00
Frédéric Guillot
997e9422eb
Ignore enclosures without URL
2020-01-30 21:18:49 -08:00
Frédéric Guillot
61f0c8aa66
Allow application/xhtml+xml links as comments URL in Atom replies
2020-01-04 16:07:06 -08:00
Frédéric Guillot
bf632fad2e
Allow only absolute URLs in comments URL
...
Some feeds are using invalid URLs (random text).
2020-01-04 15:54:16 -08:00
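One way to enforce absolute comment URLs with the standard library, assuming "absolute" means the URL has a scheme (`url.URL.IsAbs`). The `absoluteURLOrEmpty` helper is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// absoluteURLOrEmpty returns the normalized URL when raw parses as an
// absolute URL (one with a scheme), and "" otherwise, so random text or
// relative paths are discarded.
func absoluteURLOrEmpty(raw string) string {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil || !u.IsAbs() {
		return ""
	}
	return u.String()
}

func main() {
	for _, raw := range []string{"https://example.org/post#comments", "random text", "/comments"} {
		fmt.Printf("%q -> %q\n", raw, absoluteURLOrEmpty(raw))
	}
}
```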
Kebin Liu
8cebd985a2
Use internal XML workarounds to detect feed format
2020-01-02 22:19:15 -08:00
Frédéric Guillot
ac3c936820
Make sure whitelisted URI schemes are handled properly by the sanitizer
2020-01-02 11:03:51 -08:00
Frédéric Guillot
3debf75eb9
Normalize URL query string before executing HTTP requests
...
- Make sure query string parameters are encoded
- Unlike the standard library, do not append an equals sign
for query parameters with an empty value
- Strip URL fragments, as web browsers do
2019-12-26 15:56:59 -08:00
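A sketch of the query-string normalization described above, using only `net/url`. The `normalizeURL` name is hypothetical, and sorting the keys is an assumption borrowed from `url.Values.Encode`; Miniflux may preserve the original parameter order.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// normalizeURL re-encodes the query string and strips the fragment.
// Unlike url.Values.Encode, a parameter with an empty value is written as
// "key" instead of "key=". Keys are sorted, as url.Values.Encode does.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Fragment = "" // browsers never send the fragment to the server

	values := u.Query()
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var parts []string
	for _, k := range keys {
		for _, v := range values[k] {
			if v == "" {
				parts = append(parts, url.QueryEscape(k))
			} else {
				parts = append(parts, url.QueryEscape(k)+"="+url.QueryEscape(v))
			}
		}
	}
	u.RawQuery = strings.Join(parts, "&")
	return u.String(), nil
}

func main() {
	normalized, err := normalizeURL("https://example.org/path?b=x%2Fy&a=&c=1#section")
	if err != nil {
		panic(err)
	}
	fmt.Println(normalized)
}
```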
Frédéric Guillot
200b1c304b
Improve Dublin Core support for RDF feeds
2019-12-23 14:45:58 -08:00
Frédéric Guillot
1b33bb3d1c
Improve Podcast support (iTunes and Google Play feeds)
...
- Add support for Google Play XML namespace
- Improve existing iTunes namespace implementation
2019-12-23 13:51:42 -08:00
Frédéric Guillot
33fdb2c489
Add support for Atom 0.3
2019-12-22 22:42:00 -08:00
Frédéric Guillot
cfb6ddfcea
Add support for Atom 'replies' link relation
...
Show comments URL for Atom feeds as per RFC 4685.
See https://tools.ietf.org/html/rfc4685#section-4
Note that only the first link with type "text/html" is taken into consideration.
2019-12-22 18:03:04 -08:00
cinput
8e1ed8bef3
Return outer HTML when scraping elements
2019-12-21 21:18:31 -08:00
somini
30f22fbd78
Update scraper rule for "Le Monde"
2019-12-19 18:35:29 -08:00
Jebbs
a155ab6deb
Filter valid XML characters for UTF-8 XML documents before decoding
...
This change should reduce "illegal character code" XML errors.
2019-12-19 18:31:52 -08:00
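A sketch of filtering a UTF-8 document down to the characters allowed by the XML 1.0 `Char` production before handing it to the decoder. It relies on `strings.Map`, which drops a rune when the mapping function returns a negative value. `filterValidXMLChars` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidXMLChar implements the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
func isValidXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// filterValidXMLChars removes every rune that XML 1.0 does not allow.
func filterValidXMLChars(s string) string {
	return strings.Map(func(r rune) rune {
		if isValidXMLChar(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Printf("%q\n", filterValidXMLChars("<title>a\x00b\x0bc</title>"))
}
```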
Frédéric Guillot
a4ebb33cd5
Trim spaces for RDF entry links
2019-12-01 15:06:01 -08:00
Frédéric Guillot
120d6ec7d8
Do not rewrite YouTube description twice in "add_youtube_video" rule
...
This is already done before in <media:description>.
2019-11-30 22:56:06 -08:00
Frédéric Guillot
69aa650203
Add the possibility to add rules during feed creation
2019-11-29 11:27:58 -08:00
Frédéric Guillot
912a98788e
Add support for media elements in Atom feeds
2019-11-28 23:55:40 -08:00
Frédéric Guillot
f90e9dfab0
Add support for media elements in RSS 2 feeds
2019-11-28 21:33:32 -08:00
Frédéric Guillot
c43c9458a9
Add rewrite functions: convert_text_link and nl2br
2019-11-28 21:33:12 -08:00
Neo Ng
90064a8cf0
Update scraper rule for openingsource.org
2019-11-28 19:40:26 -08:00
Tony Wang
2eb2441f2b
Improve XML decoder to remove illegal characters
2019-10-22 20:32:35 -07:00
Tony Wang
5517eebafe
Add new formats to date parser
2019-10-20 09:52:18 -07:00
Frédéric Guillot
36d7732234
Disable strict XML parsing
...
This change should improve parsing of broken XML feeds.
See https://golang.org/pkg/encoding/xml/#Decoder
2019-09-18 22:45:56 -07:00
Frédéric Guillot
934385ff55
Replace Travis by GitHub Actions
2019-09-15 11:48:15 -07:00
Frédéric Guillot
8d8f78241d
Add native lazy loading for images and iframes
...
This feature is available only in Chrome >= 76 for now.
See https://web.dev/native-lazy-loading
2019-09-10 21:22:19 -07:00
Peter De Wachter
b6f3160dbc
add_mailto_subject: New rewrite function
...
Dinosaur Comics (qwantz.com) likes to hide jokes in mailto: links, but
Miniflux's sanitizer strips those out.
2019-08-19 19:42:47 -07:00
Frédéric Guillot
ac45307da6
Add test case for parsing HTML entities
2019-08-15 21:42:13 -07:00
Peter De Wachter
ea2b6e3608
addImageTitle: Fix HTML injection
...
This rewrite rule would change this:
<img title="<foo>">
to this:
<figure><img><figcaption><foo></figcaption></figure>
The image title needs to be properly escaped.
2019-08-15 21:39:41 -07:00
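A sketch of the escaping fix, assuming a hypothetical `figureWithCaption` helper that builds the markup with `fmt.Sprintf`. The HTML parser has already decoded entities in the `title` attribute, so the value must be escaped again with `html.EscapeString` before it is written into the caption.

```go
package main

import (
	"fmt"
	"html"
)

// figureWithCaption is a hypothetical helper that wraps an image in a
// <figure> and uses its title as the caption. Both values are escaped, so
// a title such as "<foo>" is rendered as text instead of injected as markup.
func figureWithCaption(src, title string) string {
	return fmt.Sprintf(
		`<figure><img src="%s"><figcaption>%s</figcaption></figure>`,
		html.EscapeString(src),
		html.EscapeString(title),
	)
}

func main() {
	fmt.Println(figureWithCaption("https://example.org/comic.png", "<foo>"))
}
```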
Peter De Wachter
3a39d110f0
Accept HTML entities when parsing XML
...
Every once in a while, one of my feeds would throw an XML parse error
because it used `&nbsp;` or some other HTML entity. I feel Miniflux
should be lenient here, and Go already has a handy hook to make this
work.
2019-08-15 21:26:07 -07:00
Ilya Glotov
c840268678
Sort feed categories before serialization
...
A function is added to normalize feeds and their categories.
A test ensures that the order is correct.
2019-07-05 20:34:49 +03:00
Frédéric Guillot
129f1bf3da
Add support for OPML v1 import
2019-03-26 20:09:31 -07:00
Jeremy Apthorp
304b43cb30
Add 'allow-popups' to iframe sandbox permissions
2019-03-26 18:26:56 -07:00
Frédéric Guillot
6764a420b0
Make parser compatible with Go 1.12
...
See changes in strings.Map(): https://golang.org/doc/go1.12#strings
2019-02-28 21:23:33 -08:00
Frédéric Guillot
f3fc8b7072
Use feed ID instead of user ID to check entry URLs presence
2019-02-28 20:43:33 -08:00
Frédéric Guillot
ed6ae7e0d2
Prefer the published date for Atom feeds
...
YouTube feeds use the published date for the original creation date.
2019-01-29 20:01:36 -08:00
Peter De Wachter
0cdcec10ca
More robust Atom text handling
...
Miniflux couldn't deal with XHTML Summary elements.
- Make Summary an 'atomContent' field
- Define an atomContentToString function rather than inlining it three times
- Also properly escape special characters in plain text fields.
2019-01-07 17:55:02 -08:00
Frédéric Guillot
56efd2eb3f
Add workaround for non GMT dates (RFC822, RFC850, and RFC1123)
...
RFC822, RFC850, and RFC1123 are supposed to be always in GMT.
This is a workaround for dates defined in the PST timezone.
2018-12-26 20:24:38 -08:00
Frédéric Guillot
012138179c
Add function storage.UpdateFeedError()
2018-12-15 13:04:38 -08:00
Tom Matthews
8b40778ee1
Add BBC News scraping rule
2018-12-13 20:25:30 -08:00
Frederic Guillot
61bfb3cfa8
Make password prompt compatible with Windows
2018-12-09 17:44:33 -08:00
Frédéric Guillot
1bc8535dbb
Move image proxy filter to template functions
2018-12-02 21:09:53 -08:00
Frédéric Guillot
6f5d93cbbe
Update scraper rule for lemonde.fr
2018-12-02 20:53:22 -08:00
Frédéric Guillot
311a133ab8
Refactor manual entry scraper
2018-12-02 20:51:06 -08:00
mapl
e47188eab2
Update scraper rule for heise.de
2018-12-01 11:49:30 -08:00
Frédéric Guillot
487852f07e
Replace daemon and scheduler package with service package
2018-11-11 15:32:48 -08:00
Frédéric Guillot
3b6e44c331
Allow the scraper to parse XHTML documents
...
Only "text/html" was authorized before.
2018-11-03 13:44:13 -07:00
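A sketch of a content-type check that accepts both HTML and XHTML documents, using `mime.ParseMediaType` to lowercase the media type and drop parameters such as `charset`. The `isHTMLDocument` name is hypothetical.

```go
package main

import (
	"fmt"
	"mime"
)

// isHTMLDocument reports whether a Content-Type header value denotes an
// HTML or XHTML document. Unparseable or empty values are rejected.
func isHTMLDocument(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/html" || mediaType == "application/xhtml+xml"
}

func main() {
	for _, ct := range []string{"text/html; charset=utf-8", "application/xhtml+xml", "application/json"} {
		fmt.Println(ct, isHTMLDocument(ct))
	}
}
```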
Frédéric Guillot
ae1dc1a91e
Handle more encoding conversion edge cases
2018-10-29 23:00:03 -07:00
Frédéric Guillot
7d1b471d88
Add test case to check different feed encoding and HTTP headers
2018-10-29 19:04:36 -07:00
Frédéric Guillot
85d48c8a71
Add entries storage error to feed errors count
2018-10-21 11:44:29 -07:00
Frédéric Guillot
b8f874a37d
Simplify feed entries filtering
...
- Rename processor package to filter
- Remove boilerplate code
2018-10-14 22:33:19 -07:00
Frédéric Guillot
778346b0b0
Simplify feed fetcher
...
- Add browser package to handle HTTP errors
- Reduce code duplication
2018-10-14 21:43:48 -07:00
Frédéric Guillot
5870f04260
Simplify feed parser and format detection
...
- Avoid doing multiple buffer copies
- Move parser and format detection logic to its own package
2018-10-14 11:46:41 -07:00
Frédéric Guillot
9606126196
Convert text links and line feeds to HTML in YouTube channels
2018-10-08 20:47:10 -07:00
Frédéric Guillot
9dc38a0803
Add missing package descriptions for GoDoc
2018-10-08 17:32:17 -07:00
Frédéric Guillot
11dfcdd3d6
Fix typo in license header
2018-10-08 15:50:15 -07:00
Frédéric Guillot
b1e8f534ef
Simplify locale package usage (refactoring)
2018-09-22 15:04:55 -07:00
Frédéric Guillot
beb7a0cfcb
Use unique translation IDs instead of English text as key
2018-09-21 22:23:23 -07:00
Patrick
2538eea177
Add the possibility to override default user agent for each feed
2018-09-19 18:19:24 -07:00
Frédéric Guillot
df2bebaf3d
Update scraper rule for heise.de
2018-08-25 10:33:18 -07:00
Frédéric Guillot
dbcc5d8a97
Use canonical imports
2018-08-24 21:56:39 -07:00
neepl
5365f31e90
Add support for published tag in Atom feeds
2018-07-17 21:52:05 -07:00
Frédéric Guillot
a786e78aca
Add embedly.com to iframe whitelist
2018-07-10 20:56:54 -07:00
dzaikos
6d25e02cb5
New add_dynamic_image rewriter for JavaScript-loaded images.
...
Searches tags for various `data-*` attributes and sets `img` tag `src` attribute appropriately. Falls back to searching `noscript` for `img` tags.
Includes unit tests.
2018-07-09 01:22:48 -04:00
dzaikos
e1c56b2e53
Processor: Run rewriter before sanitizer for entry.Content.
...
Addresses #163.
2018-07-06 00:17:07 -04:00
Frédéric Guillot
de1a4aad30
Add support for protocol relative YouTube URLs
2018-07-04 22:45:44 -07:00
dzaikos
7d4a195519
Sandbox iframes when sanitizing.
...
Updated iframe unit tests.
Refactored sanitizer.getExtraAttributes() to use `switch` instead of multiple `if` statements.
2018-07-03 12:55:18 -07:00
Frédéric Guillot
9c0f882ba0
Add specific 404 and 401 error messages
2018-06-30 12:42:12 -07:00
dzaikos
45d7105ed1
Refactor AddImageTitle rewriter.
...
* Only processes images with `src` **and** `title` attributes (others are ignored).
* Processes **all** images in the document (not just the first one).
* Wraps the image and its title attribute in a `figure` tag with the title attribute's contents in a `figcaption` tag.
Updated xkcd rewriter unit test.
Added another xkcd rewriter unit test to check rendering of images without title attributes.
2018-06-26 17:50:18 -04:00
dzaikos
c9131b0e89
Improve sanitizer to remove style tag contents.
...
See #157.
Refactored how blacklisted tags are handled so they're easier to manage in the future.
2018-06-24 19:53:23 -07:00
Dave Z
d847b10e32
Improve sanitizer to remove script and noscript contents
...
These tags were removed but their content was rendered as escaped HTML.
See #157
2018-06-23 17:50:43 -07:00
Frédéric Guillot
bddca15b69
Add new fields for feed username/password
2018-06-19 22:58:29 -07:00
Frédéric Guillot
c719cf7df0
Rewrite iframe YouTube URLs to https://www.youtube-nocookie.com
2018-06-12 18:45:09 -07:00
Frédéric Guillot
0c2e5ff0dc
Handle feeds with dates formatted as Unix timestamp
2018-05-08 20:41:24 -07:00
Frédéric Guillot
5cacae6cf2
Add API endpoint to import OPML file
2018-04-29 18:56:40 -07:00
Frédéric Guillot
1eba1730d1
Move HTTP client to its own package
2018-04-28 10:51:07 -07:00
aniran
322b265d7a
Scrape parent element for iframe
...
Current behavior: if you have an `iframe` scraper rule, `scrapContent`
tries to return the inner HTML of the `iframe`, which turns up blank.
New behavior: as with `img` elements, if an `iframe` is matched by a scraper rule,
the parent element's inner HTML (i.e., the `iframe` itself) is returned.
2018-04-27 17:57:22 -07:00
aniran
920dda79b7
Add soundcloud and bandcamp iframe sources
2018-04-27 17:55:58 -07:00
Frédéric Guillot
dcbb5047b1
Add support for Dublin Core date in RDF feeds
2018-04-10 18:13:05 -07:00
Frédéric Guillot
02ba735ba9
Handle some non-English date formats
2018-04-09 21:27:15 -07:00
Frédéric Guillot
e2d02bac5a
Rename RSS parser getters
2018-04-09 20:38:12 -07:00
Frédéric Guillot
f76093690c
Get the right comments URL when having multiple namespaces
2018-04-09 20:30:55 -07:00
Frédéric Guillot
702256bcc0
Add unit test for comments url and French translation
2018-04-07 13:56:11 -07:00
Ben Brooks
538d08c16c
Add CommentsURL to entry
2018-04-07 13:50:45 -07:00
Frédéric Guillot
6ea4da3bce
Handle RSS author elements with inner HTML
2018-03-18 11:57:46 -07:00
Frédéric Guillot
482785c5e6
Convert enclosure size field to bigint
2018-03-14 20:09:06 -07:00
Frédéric Guillot
ec08f45bf5
Fix broken OPML import with Go 1.10
2018-03-14 18:50:06 -07:00
Frédéric Guillot
f110384f11
Improve parser error messages
2018-02-27 21:19:59 -08:00
Frédéric Guillot
953d0a2dc0
Support localized feed errors generated by background workers
2018-02-27 21:08:32 -08:00
Frédéric Guillot
9292d5d604
Handle Atom feeds with HTML title
2018-02-17 12:21:58 -08:00
Frédéric Guillot
dda9114692
Improve error handling for HTTP client
2018-02-08 18:16:54 -08:00
Frédéric Guillot
7b0bfd9308
Strip invalid XML characters to avoid parsing errors
2018-02-07 20:57:56 -08:00
Frédéric Guillot
c6fd9eb9b1
Remove period for feed errors
2018-02-07 19:10:36 -08:00
Frédéric Guillot
0fb87eba3f
Improve error handling when the response is empty
2018-02-07 18:47:47 -08:00
Frédéric Guillot
b78172033f
Show API URL endpoints in user interface
2018-01-31 21:57:20 -08:00
Frédéric Guillot
ffabb009b8
Do not override existing entries when the crawler is enabled
2018-01-20 14:04:19 -08:00
Frédéric Guillot
713b38e34c
Handle more encoding edge cases
...
- Feeds with charset specified only in Content-Type header and not in XML document
- Feeds with charset specified in both places
- Feeds with charset specified only in XML document and not in HTTP header
2018-01-20 13:25:21 -08:00
Frédéric Guillot
3b62f904d6
Do not crawl existing entry URLs
2018-01-20 13:25:20 -08:00
Frédéric Guillot
9652dfa1fe
Add more comments (GoDoc)
2018-01-11 19:21:20 -08:00
Frédéric Guillot
1d7fe892e1
Add scraper rule for darkreading.com
2018-01-06 13:25:12 -08:00
Frédéric Guillot
48aa0d07ef
Add more scraper rules
2018-01-04 19:32:24 -08:00
Frédéric Guillot
7d278d49f1
Add content length check when refreshing feeds
2018-01-04 18:41:23 -08:00
Frédéric Guillot
efac11e082
Handle more date formats
2018-01-03 18:59:29 -08:00
Frédéric Guillot
ec63cbe7bb
If the website URL is empty, assign the feed URL
2018-01-03 18:23:21 -08:00
Frédéric Guillot
c39f2e1a8d
Rename helper packages
2018-01-02 19:15:08 -08:00
Frédéric Guillot
3c3f397bf5
Make sure the scraper parses only HTML documents
2018-01-02 18:32:01 -08:00
Frédéric Guillot
c454f67037
Add scraper rules for version2.dk and ing.dk
2017-12-27 19:44:23 -08:00
Frédéric Guillot
d4839b5597
Add more scraper rules
2017-12-27 13:36:07 -08:00
Frédéric Guillot
f6a5d7d6ed
Add support for data URL favicons
2017-12-22 19:01:39 -08:00
Frédéric Guillot
e7afec7eca
Handle more date formats
2017-12-22 17:59:28 -08:00
Frédéric Guillot
1d8193b892
Add logger
2017-12-15 18:55:57 -08:00
Frédéric Guillot
c6d9eb3614
Improve content scraper
2017-12-13 21:30:40 -08:00
Frédéric Guillot
827683ab59
Make sure that item URLs are absolute
2017-12-13 20:16:15 -08:00
Frédéric Guillot
84d912c979
Rewrite imports
2017-12-12 21:48:13 -08:00
Frédéric Guillot
ef097f02fe
Add the possibility to enable crawler for feeds
2017-12-12 19:19:36 -08:00
Frédéric Guillot
33445e5b68
Add the possibility to define rewrite rules for each feed
2017-12-11 22:16:32 -08:00
Frédéric Guillot
87ccad5c7f
Add scraper rules
2017-12-10 20:51:04 -08:00
Frédéric Guillot
7a35c58f53
Add readability package to fetch original content
2017-12-10 19:01:38 -08:00
Frédéric Guillot
6f5350a497
Move packages http and url
2017-12-02 20:26:21 -08:00
Frédéric Guillot
2356ddad28
Add Pinboard integration
2017-12-02 19:32:14 -08:00
Frédéric Guillot
fb2a73c91e
Proxify image enclosures
2017-12-01 22:29:18 -08:00
Frédéric Guillot
bb8e61c7c5
Make sure golint passes on the code base
2017-11-27 21:40:05 -08:00