miniflux

Author	SHA1	Message	Date
Frédéric Guillot	93715b542c	Revert "scraper follow the only link" This reverts commit `10207967c4`.	2022-11-14 17:45:40 -08:00
Frédéric Guillot	de1a06e3e8	Add missing check in followTheOnlyLink() that leads to a panic Bug introduced in PR #1290. Fixes #1631.	2022-11-14 16:44:02 -08:00
jebbs	10207967c4	scraper follow the only link * in some cases, what the scraper gets is only a landing page; users can use scraper rules to extract the link from the landing page and follow it * it also fixes the wrong scraper rule being applied when the server redirects to another host	2022-10-31 19:49:34 -07:00
jgbresson	7f6ce16d85	Add scraping rules for theverge.com	2022-10-16 11:58:35 -07:00
Adam B	4d847c6a74	Add scraping rule for royalroad.com This is what I use for several stories I follow, and I thought it might be useful to other miniflux users.	2022-08-17 19:25:39 -07:00
Owen Valentine	f404ddde91	Add swordscomic.com	2022-08-17 19:23:29 -07:00
Owen Valentine	c8a3d953cf	Add smbc-comics.com	2022-08-17 19:23:29 -07:00
Owen Valentine	f851ecac78	Sort alphabetically	2022-08-17 19:23:29 -07:00
Frédéric Guillot	cecab91298	Fix some linter issues	2022-08-08 22:06:38 -07:00
Gabe Cook	bd1dc3149e	Add explosm.net scraper rule	2022-07-30 20:10:52 -07:00
Jouni K. Seppänen	bb0d2bf675	Add Youtube videos in Quanta articles Some articles (especially the recent year-in-review ones) include a Youtube video. The server-side rendered articles do not include the Youtube iframe, but they do have a script that looks like <script type="text/javascript" data-reactid="6"> window.__APOLLO_STATE__ = { ... youtube_id: "9uASADiYe_8", We add a reformatting function that tries to detect obvious JavaScript code that has a field or variable called youtube_id that has an 11-character double-quoted value, and adds the referenced Youtube videos in the beginning of the article. This is slightly more general than needed for Quanta, in the hope that it could be useful for similar sites.	2022-01-03 10:10:13 -08:00
Jouni K. Seppänen	dcf87bd642	Add scrape and rewrite rules for quantamagazine This is a somewhat complex React site so the rules could be a little fragile. Text content seems to be always inside .outer--content, and most h6 elements are fluff like "read later" or pointers to other articles. However, h6.byline and h6.post__title__kicker are relevant to the current article. Figure captions are sometimes inside both figure and div.outer--content elements, sometimes only inside figure, so take both and remove the intersection. The figure elements sometimes contain multiple copies of images or videos, and we just take them all. Math articles seem to use Mathjax, which we don't add.	2022-01-03 10:10:13 -08:00
Jouni K. Seppänen	2fedd8f234	Add scraper rule for ikiwiki.iki.fi Feed: https://ikiwiki.iki.fi/feed.php?linkto=current&ns=uutiset%3Ablog&num=5 Example page: https://ikiwiki.iki.fi/uutiset/blog/20210923100421viiveita (To clarify, I'm not a representative of iki.fi although I have an email address in the domain. This is a nonprofit association that offers email forwarding addresses, and the rss feed in question contains news for their members.)	2021-12-27 20:51:37 -08:00
hulb	01f678c3b1	add proxy arg in scraper.Fetch	2021-08-28 21:57:11 -07:00
Frédéric Guillot	b7c229f30f	Update scraper rule for theregister.com	2021-08-16 20:04:02 -07:00
Darius	9242350f0e	Add per feed cookies option	2021-03-22 20:27:58 -07:00
Frédéric Guillot	ec3c604a83	Add option to allow self-signed or invalid certificates	2021-02-21 13:58:52 -08:00
Frédéric Guillot	a352aff93b	Remove deprecated io/ioutil package Miniflux now requires at least Go 1.16 and io/ioutil is deprecated. https://golang.org/doc/go1.16#ioutil	2021-02-16 21:25:21 -08:00
Frédéric Guillot	31435ef83e	Add rewrite rule to fix Medium.com images	2020-09-29 22:27:32 -07:00
Frédéric Guillot	c394a61a4e	Add Prometheus exporter	2020-09-27 20:04:48 -07:00
Frédéric Guillot	16b7b3bc3e	http client: remove dependency on global config options	2020-09-27 14:37:46 -07:00
Manuel Müller	ca918bc7e3	Added scraper rule for dilbert.com and turnoff.us	2020-06-10 20:15:46 -07:00
Corey McCaffrey	25d4b9fc0c	Added scraper rule for financialsamurai.com The default rule results in blank content.	2020-05-24 13:29:28 -07:00
Corey McCaffrey	0683074b8b	Added scraper rule for TheOatmeal.com The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".	2020-05-13 21:28:00 -07:00
Corey McCaffrey	8f6c07afd6	Added scraper rule for RayWenderlich.com RayWenderlich.com is a popular developer community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.	2020-05-13 21:28:00 -07:00
Andrew Williams	9974e0f458	Addition of scraper rule for wdwnt.com By default fetching original content for wdwnt.com results in a snippet of the comments section, this rule captures the article content.	2020-02-28 20:24:58 -08:00
cinput	8e1ed8bef3	Return outer HTML when scraping elements	2019-12-21 21:18:31 -08:00
somini	30f22fbd78	Update scraper rule for "Le Monde"	2019-12-19 18:35:29 -08:00
Neo Ng	90064a8cf0	Update scraper rule for openingsource.org	2019-11-28 19:40:26 -08:00
Tom Matthews	8b40778ee1	Add BBC News scraping rule	2018-12-13 20:25:30 -08:00
Frédéric Guillot	6f5d93cbbe	Update scraper rule for lemonde.fr	2018-12-02 20:53:22 -08:00
Frédéric Guillot	311a133ab8	Refactor manual entry scraper	2018-12-02 20:51:06 -08:00
mapl	e47188eab2	Update scraper rule for heise.de	2018-12-01 11:49:30 -08:00
Frédéric Guillot	3b6e44c331	Allow the scraper to parse XHTML documents Only "text/html" was authorized before.	2018-11-03 13:44:13 -07:00
Frédéric Guillot	5870f04260	Simplify feed parser and format detection - Avoid doing multiple buffer copies - Move parser and format detection logic to its own package	2018-10-14 11:46:41 -07:00
Frédéric Guillot	9dc38a0803	Add missing package descriptions for GoDoc	2018-10-08 17:32:17 -07:00
Patrick	2538eea177	Add the possibility to override default user agent for each feed	2018-09-19 18:19:24 -07:00
Frédéric Guillot	df2bebaf3d	Update scraper rule for heise.de	2018-08-25 10:33:18 -07:00
Frédéric Guillot	dbcc5d8a97	Use canonical imports	2018-08-24 21:56:39 -07:00
Frédéric Guillot	1eba1730d1	Move HTTP client to its own package	2018-04-28 10:51:07 -07:00
aniran	322b265d7a	Scrape parent element for iframe Current behavior: if you have an `iframe` scraper rule, `scrapContent` tries to return the inner HTML of the `iframe`, which turns up blank. New behavior: like `img` elements, if an `iframe` is matched by a scraper rule, the parent element's inner HTML (i.e. the `iframe` itself) is returned.	2018-04-27 17:57:22 -07:00
Frédéric Guillot	1d7fe892e1	Add scraper rule for darkreading.com	2018-01-06 13:25:12 -08:00
Frédéric Guillot	48aa0d07ef	Add more scraper rules	2018-01-04 19:32:24 -08:00
Frédéric Guillot	3c3f397bf5	Make sure the scraper parse only HTML documents	2018-01-02 18:32:01 -08:00
Frédéric Guillot	c454f67037	Add scraper rules for version2.dk and ing.dk	2017-12-27 19:44:23 -08:00
Frédéric Guillot	d4839b5597	Add more scraper rules	2017-12-27 13:36:07 -08:00
Frédéric Guillot	1d8193b892	Add logger	2017-12-15 18:55:57 -08:00
Frédéric Guillot	c6d9eb3614	Improve content scraper	2017-12-13 21:30:40 -08:00
Frédéric Guillot	84d912c979	Rewrite imports	2017-12-12 21:48:13 -08:00
Frédéric Guillot	ef097f02fe	Add the possibility to enable crawler for feeds	2017-12-12 19:19:36 -08:00

52 commits
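Commits 3c3f397bf5 and 3b6e44c331 first restrict the scraper to HTML documents and then also allow XHTML ("Only "text/html" was authorized before."). One way to express that check in Go uses the standard `mime` package, as in the sketch below. The function name `isScrapableContentType` and the allow-list map are hypothetical and are not the code from those commits.

```go
package main

import (
	"fmt"
	"mime"
)

// Hypothetical allow-list: HTML, plus XHTML as permitted by commit 3b6e44c331.
var allowedScraperTypes = map[string]bool{
	"text/html":             true,
	"application/xhtml+xml": true,
}

// isScrapableContentType (hypothetical helper) reports whether a Content-Type
// header value names an HTML or XHTML document. mime.ParseMediaType lowercases
// the media type and strips parameters such as charset; malformed or empty
// headers are rejected.
func isScrapableContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return false
	}
	return allowedScraperTypes[mediaType]
}

func main() {
	for _, ct := range []string{"text/html; charset=utf-8", "application/xhtml+xml", "application/pdf"} {
		fmt.Println(ct, isScrapableContentType(ct))
	}
}
```

Parsing the header instead of comparing raw strings means values like `text/html; charset=utf-8` or `Text/HTML` are still accepted.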