Frédéric Guillot
713b38e34c
Handle more encoding edge cases
- Feeds with charset specified only in Content-Type header and not in XML document
- Feeds with charset specified in both places
- Feeds with charset specified only in XML document and not in HTTP header
2018-01-20 13:25:21 -08:00
Frédéric Guillot
3b62f904d6
Do not crawl existing entry URLs
2018-01-20 13:25:20 -08:00
Frédéric Guillot
9652dfa1fe
Add more comments (GoDoc)
2018-01-11 19:21:20 -08:00
Frédéric Guillot
1d7fe892e1
Add scraper rule for darkreading.com
2018-01-06 13:25:12 -08:00
Frédéric Guillot
48aa0d07ef
Add more scraper rules
2018-01-04 19:32:24 -08:00
Frédéric Guillot
7d278d49f1
Add content length check when refreshing feeds
2018-01-04 18:41:23 -08:00
Frédéric Guillot
efac11e082
Handle more date formats
2018-01-03 18:59:29 -08:00
Frédéric Guillot
ec63cbe7bb
If the website URL is empty, assign the feed URL
2018-01-03 18:23:21 -08:00
Frédéric Guillot
c39f2e1a8d
Rename helper packages
2018-01-02 19:15:08 -08:00
Frédéric Guillot
3c3f397bf5
Make sure the scraper parses only HTML documents
2018-01-02 18:32:01 -08:00
Frédéric Guillot
c454f67037
Add scraper rules for version2.dk and ing.dk
2017-12-27 19:44:23 -08:00
Frédéric Guillot
d4839b5597
Add more scraper rules
2017-12-27 13:36:07 -08:00
Frédéric Guillot
f6a5d7d6ed
Add support for data URL favicons
2017-12-22 19:01:39 -08:00
Frédéric Guillot
e7afec7eca
Handle more date formats
2017-12-22 17:59:28 -08:00
Frédéric Guillot
1d8193b892
Add logger
2017-12-15 18:55:57 -08:00
Frédéric Guillot
c6d9eb3614
Improve content scraper
2017-12-13 21:30:40 -08:00
Frédéric Guillot
827683ab59
Make sure that item URLs are absolute
2017-12-13 20:16:15 -08:00
Frédéric Guillot
84d912c979
Rewrite imports
2017-12-12 21:48:13 -08:00
Frédéric Guillot
ef097f02fe
Add the possibility to enable crawler for feeds
2017-12-12 19:19:36 -08:00
Frédéric Guillot
33445e5b68
Add the possibility to define rewrite rules for each feed
2017-12-11 22:16:32 -08:00
Frédéric Guillot
87ccad5c7f
Add scraper rules
2017-12-10 20:51:04 -08:00
Frédéric Guillot
7a35c58f53
Add readability package to fetch original content
2017-12-10 19:01:38 -08:00
Frédéric Guillot
6f5350a497
Move packages http and url
2017-12-02 20:26:21 -08:00
Frédéric Guillot
2356ddad28
Add Pinboard integration
2017-12-02 19:32:14 -08:00
Frédéric Guillot
fb2a73c91e
Proxify image enclosures
2017-12-01 22:29:18 -08:00
Frédéric Guillot
bb8e61c7c5
Make sure golint passes on the code base
2017-11-27 21:40:05 -08:00
Frédéric Guillot
bd663b43a0
Improve HTML sanitizer
2017-11-25 18:08:59 -08:00
Frédéric Guillot
71bf7e4358
Improve API
2017-11-24 22:29:20 -08:00
Frédéric Guillot
2b641cc224
Improve feed parsers
2017-11-22 14:52:31 -08:00
Frédéric Guillot
99dfbdbb47
Convert feed encoding only if the charset is specified
2017-11-21 22:55:19 -08:00
Frédéric Guillot
5f0ae8196c
Add timeout for HTTP client
2017-11-20 19:44:28 -08:00
Frédéric Guillot
eb9f588216
Make sure RDF entries have a date
2017-11-20 19:25:30 -08:00
Frédéric Guillot
d5838b6734
Move feed parsers packages in reader package
2017-11-20 19:17:04 -08:00
Frédéric Guillot
c26787f476
Improve OPML package to be more idiomatic
2017-11-20 19:11:06 -08:00
Frédéric Guillot
e91a9b4f13
Export only necessary structs in JsonFeed package
2017-11-20 18:57:54 -08:00
Frédéric Guillot
6618caca81
Use more idiomatic code for Atom parser
2017-11-20 18:50:16 -08:00
Frédéric Guillot
89307010ad
Add parser for RDF feeds
2017-11-20 18:34:11 -08:00
Frédéric Guillot
c5cd38de83
Add unit test for HTTP client response functions
2017-11-20 17:25:45 -08:00
Frédéric Guillot
aecda64030
Make sure XML feeds are always encoded in UTF-8
2017-11-20 17:12:37 -08:00
Frédéric Guillot
0e6717b7c8
Ensure that LocalizedError are returned by parsers
2017-11-20 16:11:55 -08:00
Frédéric Guillot
557cf9c21d
Handle RSS entries with Atom links
2017-11-20 15:48:26 -08:00
Frédéric Guillot
cf8af56a99
Handle RSS feeds without entry links
2017-11-20 15:15:10 -08:00
Frédéric Guillot
a76c2a8c22
Improve OPML import/export
2017-11-20 14:35:11 -08:00
Frédéric Guillot
8ffb773f43
First commit
2017-11-19 22:01:46 -08:00