Frédéric Guillot
31435ef83e
Add rewrite rule to fix Medium.com images
2020-09-29 22:27:32 -07:00
Frédéric Guillot
c394a61a4e
Add Prometheus exporter
2020-09-27 20:04:48 -07:00
Frédéric Guillot
16b7b3bc3e
http client: remove dependency on global config options
2020-09-27 14:37:46 -07:00
Manuel Müller
ca918bc7e3
Added scraper rule for dilbert.com and turnoff.us
2020-06-10 20:15:46 -07:00
Corey McCaffrey
25d4b9fc0c
Added scraper rule for financialsamurai.com
...
The default rule results in blank content.
2020-05-24 13:29:28 -07:00
Corey McCaffrey
0683074b8b
Added scraper rule for TheOatmeal.com
...
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
Corey McCaffrey
8f6c07afd6
Added scraper rule for RayWenderlich.com
...
RayWenderlich.com is a popular community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Andrew Williams
9974e0f458
Addition of scraper rule for wdwnt.com
...
By default, fetching original content for wdwnt.com results in a snippet of the comments section; this rule captures the article content.
2020-02-28 20:24:58 -08:00
cinput
8e1ed8bef3
Return outer HTML when scraping elements
2019-12-21 21:18:31 -08:00
somini
30f22fbd78
Update scraper rule for "Le Monde"
2019-12-19 18:35:29 -08:00
Neo Ng
90064a8cf0
Update scraper rule for openingsource.org
2019-11-28 19:40:26 -08:00
Tom Matthews
8b40778ee1
Add BBC News scraping rule
2018-12-13 20:25:30 -08:00
Frédéric Guillot
6f5d93cbbe
Update scraper rule for lemonde.fr
2018-12-02 20:53:22 -08:00
Frédéric Guillot
311a133ab8
Refactor manual entry scraper
2018-12-02 20:51:06 -08:00
mapl
e47188eab2
Update scraper rule for heise.de
2018-12-01 11:49:30 -08:00
Frédéric Guillot
3b6e44c331
Allow the scraper to parse XHTML documents
...
Only "text/html" was authorized before.
2018-11-03 13:44:13 -07:00
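The content-type check described in the commit above can be illustrated with a minimal Go sketch. It assumes the scraper decides from the response's Content-Type header, and that XHTML is served as "application/xhtml+xml". The function name isScrapableContentType and the exact allow-list are hypothetical and are not taken from the commit itself.

```go
package main

import (
	"fmt"
	"mime"
)

// isScrapableContentType reports whether a Content-Type header value names
// an HTML or XHTML document. The name and the allow-list are assumptions
// made for this sketch, not code taken from the commit.
func isScrapableContentType(header string) bool {
	// ParseMediaType drops parameters such as "charset=utf-8" and
	// lowercases the media type, so the comparison below is exact.
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}

	switch mediaType {
	case "text/html", "application/xhtml+xml":
		return true
	}
	return false
}

func main() {
	headers := []string{
		"text/html; charset=utf-8",
		"application/xhtml+xml",
		"application/json",
	}
	for _, h := range headers {
		fmt.Printf("%q -> %v\n", h, isScrapableContentType(h))
	}
}
```

Parsing the header rather than comparing raw strings keeps the check independent of parameters and letter case, which is one reasonable way to widen the previous "text/html"-only rule.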
Frédéric Guillot
5870f04260
Simplify feed parser and format detection
...
- Avoid doing multiple buffer copies
- Move parser and format detection logic to its own package
2018-10-14 11:46:41 -07:00
Frédéric Guillot
9dc38a0803
Add missing package descriptions for GoDoc
2018-10-08 17:32:17 -07:00
Patrick
2538eea177
Add the possibility to override the default user agent for each feed
2018-09-19 18:19:24 -07:00
Frédéric Guillot
df2bebaf3d
Update scraper rule for heise.de
2018-08-25 10:33:18 -07:00
Frédéric Guillot
dbcc5d8a97
Use canonical imports
2018-08-24 21:56:39 -07:00
Frédéric Guillot
1eba1730d1
Move HTTP client to its own package
2018-04-28 10:51:07 -07:00
aniran
322b265d7a
Scrape parent element for iframe
...
Current behavior: if you have an `iframe` scraper rule, `scrapContent`
tries to return the inner HTML of the `iframe`, which turns up blank.
New behavior: like `img` elements, if an `iframe` is matched by a scraper rule,
the parent element's inner HTML (i.e. the `iframe`) is returned.
2018-04-27 17:57:22 -07:00
Frédéric Guillot
1d7fe892e1
Add scraper rule for darkreading.com
2018-01-06 13:25:12 -08:00
Frédéric Guillot
48aa0d07ef
Add more scraper rules
2018-01-04 19:32:24 -08:00
Frédéric Guillot
3c3f397bf5
Make sure the scraper parses only HTML documents
2018-01-02 18:32:01 -08:00
Frédéric Guillot
c454f67037
Add scraper rules for version2.dk and ing.dk
2017-12-27 19:44:23 -08:00
Frédéric Guillot
d4839b5597
Add more scraper rules
2017-12-27 13:36:07 -08:00
Frédéric Guillot
1d8193b892
Add logger
2017-12-15 18:55:57 -08:00
Frédéric Guillot
c6d9eb3614
Improve content scraper
2017-12-13 21:30:40 -08:00
Frédéric Guillot
84d912c979
Rewrite imports
2017-12-12 21:48:13 -08:00
Frédéric Guillot
ef097f02fe
Add the possibility to enable crawler for feeds
2017-12-12 19:19:36 -08:00
Frédéric Guillot
87ccad5c7f
Add scraper rules
2017-12-10 20:51:04 -08:00
Frédéric Guillot
7a35c58f53
Add readability package to fetch original content
2017-12-10 19:01:38 -08:00