10 commits

Author SHA1 Message Date
Frédéric Guillot
ca48f7612a Ignore invalid content type 2019-09-18 22:32:29 -07:00
Peter De Wachter
15505ee4a2 Make UTF-8 the default encoding for XML feeds
Consider the feed http://planet.haskell.org/atom.xml
- This is a UTF-8 encoded XML file
- No encoding declaration in the XML header
- No Unicode byte order mark
- Served with HTTP Content-Type "text/xml" (no charset parameter)

Miniflux lets charset.NewReader handle this. The charset package
implements the HTML5 character encoding algorithm, which, in this
situation, defaults to windows-1252 encoding if there are no UTF-8
characters in the first 1000 bytes. So for this feed, we get the wrong
encoding.

I inserted an explicit "utf8.Valid()" check, which fixes this problem.
2019-01-02 21:05:05 -08:00
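
The check described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `toUTF8` is a hypothetical helper name, and the fallback branch decodes bytes as ISO-8859-1 using only the standard library, as a simplified stand-in for the `charset.NewReader` detection that the commit message mentions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUTF8 is a hypothetical helper (not taken from the commit).
// If the input is already valid UTF-8, it is returned unchanged, so the
// encoding-detection fallback never gets a chance to misread it.
// Otherwise each byte is mapped to the code point with the same value,
// which is ISO-8859-1 decoding. That fallback is an assumption made for
// this sketch; the commit message says the real code defers to
// charset.NewReader at this point.
func toUTF8(data []byte) string {
	if utf8.Valid(data) {
		return string(data)
	}
	runes := make([]rune, len(data))
	for i, b := range data {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	// Valid UTF-8 with no encoding declaration: kept as-is.
	fmt.Println(toUTF8([]byte("<title>caf\xc3\xa9</title>")))
	// Invalid UTF-8 (a lone 0xE9 byte): decoded by the fallback.
	fmt.Println(toUTF8([]byte("<title>caf\xe9</title>")))
}
```

Running the validity check first means a feed that is already UTF-8 is never passed through encoding detection, which is the situation the commit message describes for feeds served without a charset parameter or encoding declaration.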
Frédéric Guillot
6ae935309a Ignore JSON feeds from EnsureUnicode() 2018-12-12 21:37:39 -08:00
Frédéric Guillot
82e08d0f69 Update XML encoding regex to take single quotes into consideration 2018-12-12 21:13:06 -08:00
Frédéric Guillot
f3bff76aa1 Make sure slice is not out of range when reading XML prolog 2018-11-24 12:17:00 -08:00
Frédéric Guillot
ae1dc1a91e Handle more encoding conversion edge cases 2018-10-29 23:00:03 -07:00
Frédéric Guillot
5870f04260 Simplify feed parser and format detection
- Avoid doing multiple buffer copies
- Move parser and format detection logic to its own package
2018-10-14 11:46:41 -07:00
Frédéric Guillot
dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
Frédéric Guillot
9c0f882ba0 Add specific 404 and 401 error messages 2018-06-30 12:42:12 -07:00
Frédéric Guillot
1eba1730d1 Move HTTP client to its own package 2018-04-28 10:51:07 -07:00
Renamed from http/response.go