Peter De Wachter
3a39d110f0
Accept HTML entities when parsing XML
...
Every once in a while, one of my feeds would throw an XML parse error
because it used `&nbsp;` or some other HTML entity. I feel Miniflux
should be lenient here, and Go already has a handy hook to make this
work.
2019-08-15 21:26:07 -07:00
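The hook mentioned in the commit above is most likely the `Entity` field of `xml.Decoder`, which can be set to the predefined `xml.HTMLEntity` map. A minimal sketch, assuming a hypothetical `item` type and `parseLenient` helper that are not taken from the Miniflux code base:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// item is a hypothetical element used only for this illustration.
type item struct {
	Title string `xml:"title"`
}

// parseLenient decodes XML while accepting HTML named entities such as &nbsp;.
// Without the Entity map, the strict decoder rejects unknown entities.
func parseLenient(data string) (item, error) {
	var it item
	decoder := xml.NewDecoder(strings.NewReader(data))
	decoder.Entity = xml.HTMLEntity
	err := decoder.Decode(&it)
	return it, err
}

func main() {
	it, err := parseLenient("<item><title>Hello&nbsp;World</title></item>")
	fmt.Printf("%q %v\n", it.Title, err)
}
```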
Ilya Glotov
c840268678
Sort feed categories before serialization
...
Add a function that normalizes feeds and their categories.
The test ensures that the order is right.
2019-07-05 20:34:49 +03:00
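One way to make serialized output deterministic is to sort by category first and by feed title second. The sketch below only illustrates that idea; the `feed` type, its fields, and the `normalize` function are hypothetical and do not come from the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// feed is a hypothetical, simplified subscription record.
type feed struct {
	Category string
	Title    string
}

// normalize sorts feeds in place by category name, then by title,
// so that repeated serializations produce the same order.
func normalize(feeds []feed) {
	sort.SliceStable(feeds, func(i, j int) bool {
		if feeds[i].Category != feeds[j].Category {
			return feeds[i].Category < feeds[j].Category
		}
		return feeds[i].Title < feeds[j].Title
	})
}

func main() {
	feeds := []feed{{"Tech", "B"}, {"News", "Z"}, {"Tech", "A"}}
	normalize(feeds)
	fmt.Println(feeds)
}
```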
Frédéric Guillot
129f1bf3da
Add support for OPML v1 import
2019-03-26 20:09:31 -07:00
Jeremy Apthorp
304b43cb30
Add 'allow-popups' to iframe sandbox permissions
2019-03-26 18:26:56 -07:00
Frédéric Guillot
6764a420b0
Make parser compatible with Go 1.12
...
See changes in strings.Map(): https://golang.org/doc/go1.12#strings
2019-02-28 21:23:33 -08:00
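The linked release note concerns how `strings.Map` treats invalid UTF-8. A parser that wants the same result on every Go version can filter characters explicitly by returning -1 from the mapping function, which drops the rune. This is a sketch of that pattern using the XML 1.0 character ranges; the function names are assumptions and the actual change in the commit may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// filterValidXMLChar keeps runes allowed by XML 1.0 and drops the rest.
// Returning a negative value tells strings.Map to remove the rune.
func filterValidXMLChar(r rune) rune {
	if r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF) {
		return r
	}
	return -1
}

// stripInvalidXMLChars removes characters that an XML parser would reject.
func stripInvalidXMLChars(s string) string {
	return strings.Map(filterValidXMLChar, s)
}

func main() {
	fmt.Printf("%q\n", stripInvalidXMLChars("a\x00b\x0bc"))
}
```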
Frédéric Guillot
f3fc8b7072
Use feed ID instead of user ID to check entry URLs presence
2019-02-28 20:43:33 -08:00
Frédéric Guillot
ed6ae7e0d2
Prefer the published date for Atom feeds
...
YouTube feeds use the published date for the original creation date.
2019-01-29 20:01:36 -08:00
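The preference described above can be sketched as a simple fallback: use `published` when it is present, otherwise `updated`. The `atomEntry` type and `entryDate` function are hypothetical names, and the sketch assumes RFC 3339 timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// atomEntry is a hypothetical, reduced view of an Atom entry.
type atomEntry struct {
	Published string `xml:"published"`
	Updated   string `xml:"updated"`
}

// entryDate returns the published date when available and falls back
// to the updated date otherwise.
func entryDate(e atomEntry) (time.Time, error) {
	value := e.Published
	if value == "" {
		value = e.Updated
	}
	return time.Parse(time.RFC3339, value)
}

func main() {
	e := atomEntry{Published: "2017-01-01T10:00:00Z", Updated: "2019-01-29T20:00:00Z"}
	fmt.Println(entryDate(e))
}
```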
Peter De Wachter
0cdcec10ca
More robust Atom text handling
...
Miniflux couldn't deal with XHTML Summary elements.
- Make Summary an 'atomContent' field
- Define an atomContentToString function rather than inlining it three times
- Also properly escape special characters in plain text fields.
2019-01-07 17:55:02 -08:00
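A sketch of what such a helper might look like, assuming the usual Atom text construct types (`text`, `html`, `xhtml`). The field layout of `atomContent` below is an assumption for illustration, not a copy of the committed code.

```go
package main

import (
	"fmt"
	"html"
)

// atomContent models an Atom text construct (assumed layout).
type atomContent struct {
	Type string `xml:"type,attr"`
	Data string `xml:",chardata"`
	XML  string `xml:",innerxml"`
}

// atomContentToString returns HTML for any of the three construct types.
// Plain text is escaped so special characters are not read as markup.
func atomContentToString(c atomContent) string {
	switch c.Type {
	case "xhtml":
		return c.XML
	case "html":
		return c.Data
	default: // "text" or a missing type attribute
		return html.EscapeString(c.Data)
	}
}

func main() {
	fmt.Println(atomContentToString(atomContent{Type: "text", Data: "a < b"}))
	fmt.Println(atomContentToString(atomContent{Type: "xhtml", XML: "<div>ok</div>"}))
}
```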
Frédéric Guillot
56efd2eb3f
Add workaround for non GMT dates (RFC822, RFC850, and RFC1123)
...
RFC822, RFC850, and RFC1123 dates are supposed to always be in GMT.
This is a workaround for dates expressed in the PST timezone.
2018-12-26 20:24:38 -08:00
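Go's `time.Parse` records an unknown zone abbreviation with a zero offset, so a date ending in `PST` can silently be read as if it were GMT. The sketch below shows one possible workaround for RFC1123 only; the function name is hypothetical, the fixed -8 hour offset is an assumption, and the committed fix may work differently.

```go
package main

import (
	"fmt"
	"time"
)

// parseRFC1123 parses an RFC1123 date and corrects the offset when the
// "PST" abbreviation was not resolved by the time package.
func parseRFC1123(value string) (time.Time, error) {
	parsed, err := time.Parse(time.RFC1123, value)
	if err != nil {
		return parsed, err
	}
	// An unresolved abbreviation is stored with a zero offset.
	if name, offset := parsed.Zone(); name == "PST" && offset == 0 {
		pst := time.FixedZone("PST", -8*60*60) // assumed offset for PST
		parsed = time.Date(parsed.Year(), parsed.Month(), parsed.Day(),
			parsed.Hour(), parsed.Minute(), parsed.Second(), parsed.Nanosecond(), pst)
	}
	return parsed, nil
}

func main() {
	d, err := parseRFC1123("Wed, 26 Dec 2018 10:00:00 PST")
	fmt.Println(d.UTC(), err)
}
```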
Frédéric Guillot
012138179c
Add function storage.UpdateFeedError()
2018-12-15 13:04:38 -08:00
Tom Matthews
8b40778ee1
Add BBC News scraping rule
2018-12-13 20:25:30 -08:00
Frederic Guillot
61bfb3cfa8
Make password prompt compatible with Windows
2018-12-09 17:44:33 -08:00
Frédéric Guillot
1bc8535dbb
Move image proxy filter to template functions
2018-12-02 21:09:53 -08:00
Frédéric Guillot
6f5d93cbbe
Update scraper rule for lemonde.fr
2018-12-02 20:53:22 -08:00
Frédéric Guillot
311a133ab8
Refactor manual entry scraper
2018-12-02 20:51:06 -08:00
mapl
e47188eab2
Update scraper rule for heise.de
2018-12-01 11:49:30 -08:00
Frédéric Guillot
487852f07e
Replace daemon and scheduler package with service package
2018-11-11 15:32:48 -08:00
Frédéric Guillot
3b6e44c331
Allow the scraper to parse XHTML documents
...
Only "text/html" was authorized before.
2018-11-03 13:44:13 -07:00
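A content-type check that admits XHTML as well as HTML could look like the following. This is a sketch: the function name is hypothetical, and `application/xhtml+xml` is assumed to be the media type meant by "XHTML documents".

```go
package main

import (
	"fmt"
	"mime"
)

// isAllowedContentType reports whether a Content-Type header value
// describes an HTML or XHTML document, ignoring parameters such as charset.
func isAllowedContentType(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/html" || mediaType == "application/xhtml+xml"
}

func main() {
	fmt.Println(isAllowedContentType("application/xhtml+xml; charset=utf-8"))
	fmt.Println(isAllowedContentType("application/json"))
}
```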
Frédéric Guillot
ae1dc1a91e
Handle more encoding conversion edge cases
2018-10-29 23:00:03 -07:00
Frédéric Guillot
7d1b471d88
Add test case to check different feed encoding and HTTP headers
2018-10-29 19:04:36 -07:00
Frédéric Guillot
85d48c8a71
Add entries storage error to feed errors count
2018-10-21 11:44:29 -07:00
Frédéric Guillot
b8f874a37d
Simplify feed entries filtering
...
- Rename processor package to filter
- Remove boilerplate code
2018-10-14 22:33:19 -07:00
Frédéric Guillot
778346b0b0
Simplify feed fetcher
...
- Add browser package to handle HTTP errors
- Reduce code duplication
2018-10-14 21:43:48 -07:00
Frédéric Guillot
5870f04260
Simplify feed parser and format detection
...
- Avoid doing multiple buffer copies
- Move parser and format detection logic to its own package
2018-10-14 11:46:41 -07:00
Frédéric Guillot
9606126196
Convert text links and line feeds to HTML in YouTube channels
2018-10-08 20:47:10 -07:00
Frédéric Guillot
9dc38a0803
Add missing package descriptions for GoDoc
2018-10-08 17:32:17 -07:00
Frédéric Guillot
11dfcdd3d6
Fix typo in license header
2018-10-08 15:50:15 -07:00
Frédéric Guillot
b1e8f534ef
Simplify locale package usage (refactoring)
2018-09-22 15:04:55 -07:00
Frédéric Guillot
beb7a0cfcb
Use unique translation IDs instead of English text as key
2018-09-21 22:23:23 -07:00
Patrick
2538eea177
Add the possibility to override default user agent for each feed
2018-09-19 18:19:24 -07:00
Frédéric Guillot
df2bebaf3d
Update scraper rule for heise.de
2018-08-25 10:33:18 -07:00
Frédéric Guillot
dbcc5d8a97
Use canonical imports
2018-08-24 21:56:39 -07:00
neepl
5365f31e90
Add support for published tag in Atom feeds
2018-07-17 21:52:05 -07:00
Frédéric Guillot
a786e78aca
Add embedly.com to iframe whitelist
2018-07-10 20:56:54 -07:00
dzaikos
6d25e02cb5
New add_dynamic_image rewriter for JavaScript-loaded images.
...
Searches tags for various `data-*` attributes and sets `img` tag `src` attribute appropriately. Falls back to searching `noscript` for `img` tags.
Includes unit tests.
2018-07-09 01:22:48 -04:00
dzaikos
e1c56b2e53
Processor: Do rewriter before sanitizer for entry.Content.
...
Addresses #163.
2018-07-06 00:17:07 -04:00
Frédéric Guillot
de1a4aad30
Add support for protocol relative YouTube URLs
2018-07-04 22:45:44 -07:00
dzaikos
7d4a195519
Sandbox iframes when sanitizing.
...
Updated iframe unit tests.
Refactored sanitizer.getExtraAttributes() to use `switch` instead of multiple `if` statements.
2018-07-03 12:55:18 -07:00
Frédéric Guillot
9c0f882ba0
Add specific 404 and 401 error messages
2018-06-30 12:42:12 -07:00
dzaikos
45d7105ed1
Refactor AddImageTitle rewriter.
...
* Only processes images with `src` **and** `title` attributes (others are ignored).
* Processes **all** images in the document (not just the first one).
* Wraps the image and its title attribute in a `figure` tag with the title attribute's contents in a `figcaption` tag.
Updated xkcd rewriter unit test.
Added another xkcd rewriter unit test to check rendering of images without title attributes.
2018-06-26 17:50:18 -04:00
dzaikos
c9131b0e89
Improve sanitizer to remove style tag contents.
...
See #157.
Refactored how blacklisted tags are handled so they're easier to manage in the future.
2018-06-24 19:53:23 -07:00
Dave Z
d847b10e32
Improve sanitizer to remove script and noscript contents
...
These tags were removed but the content was rendered as escaped HTML.
See #157
2018-06-23 17:50:43 -07:00
Frédéric Guillot
bddca15b69
Add new fields for feed username/password
2018-06-19 22:58:29 -07:00
Frédéric Guillot
c719cf7df0
Rewrite iframe YouTube URLs to https://www.youtube-nocookie.com
2018-06-12 18:45:09 -07:00
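The rewrite named in the commit title above can be sketched with a regular expression over the iframe `src` value. The pattern and the `rewriteIframeURL` name are assumptions made for illustration; only the target host comes from the commit title.

```go
package main

import (
	"fmt"
	"regexp"
)

// youtubeEmbed matches YouTube embed URLs with or without a scheme
// (assumed pattern, not taken from the commit).
var youtubeEmbed = regexp.MustCompile(`^(?:https?:)?//(?:www\.)?youtube\.com/embed/(.+)$`)

// rewriteIframeURL points YouTube embeds at the no-cookie domain and
// leaves every other URL untouched.
func rewriteIframeURL(link string) string {
	if m := youtubeEmbed.FindStringSubmatch(link); len(m) == 2 {
		return "https://www.youtube-nocookie.com/embed/" + m[1]
	}
	return link
}

func main() {
	fmt.Println(rewriteIframeURL("http://www.youtube.com/embed/abc"))
	fmt.Println(rewriteIframeURL("https://example.org/video"))
}
```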
Frédéric Guillot
0c2e5ff0dc
Handle feeds with dates formatted as Unix timestamp
2018-05-08 20:41:24 -07:00
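Some feeds publish dates as a bare number of seconds since the epoch. A minimal sketch of handling that case before falling back to a textual format; `parseDate` is a hypothetical name and RFC 3339 stands in for whatever list of layouts the real parser tries.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseDate first tries to read the value as a Unix timestamp in seconds,
// then falls back to RFC 3339 (a stand-in for other supported layouts).
func parseDate(value string) (time.Time, error) {
	value = strings.TrimSpace(value)
	if seconds, err := strconv.ParseInt(value, 10, 64); err == nil {
		return time.Unix(seconds, 0).UTC(), nil
	}
	return time.Parse(time.RFC3339, value)
}

func main() {
	fmt.Println(parseDate("1525837284"))
	fmt.Println(parseDate("2018-05-08T20:41:24-07:00"))
}
```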
Frédéric Guillot
5cacae6cf2
Add API endpoint to import OPML file
2018-04-29 18:56:40 -07:00
Frédéric Guillot
1eba1730d1
Move HTTP client to its own package
2018-04-28 10:51:07 -07:00
aniran
322b265d7a
Scrape parent element for iframe
...
Current behavior: if you have an `iframe` scraper rule, `scrapContent`
tries to return the inner HTML of the `iframe`, which turns up blank.
New behavior: like `img` elements, if an `iframe` is matched by a scraper rule,
the parent element's inner HTML (i.e. the `iframe`) is returned.
2018-04-27 17:57:22 -07:00
aniran
920dda79b7
Add soundcloud and bandcamp iframe sources
2018-04-27 17:55:58 -07:00
Frédéric Guillot
dcbb5047b1
Add support for Dublin Core date in RDF feeds
2018-04-10 18:13:05 -07:00