miniflux

Author	SHA1	Message	Date
Gabe Cook	bd1dc3149e	Add explosm.net scraper rule	2022-07-30 20:10:52 -07:00
Jouni K. Seppänen	bb0d2bf675	Add Youtube videos in Quanta articles. Some articles (especially the recent year-in-review ones) include a Youtube video. The server-side rendered articles do not include the Youtube iframe, but they do have a script that looks like <script type="text/javascript" data-reactid="6"> window.__APOLLO_STATE__ = { ... youtube_id: "9uASADiYe_8", We add a reformatting function that tries to detect obvious JavaScript code containing a field or variable named youtube_id with an 11-character double-quoted value, and adds the referenced Youtube videos at the beginning of the article. This is slightly more general than needed for Quanta, in the hope that it could be useful for similar sites.	2022-01-03 10:10:13 -08:00
Jouni K. Seppänen	dcf87bd642	Add scrape and rewrite rules for quantamagazine. This is a somewhat complex React site, so the rules could be a little fragile. Text content seems to always be inside .outer--content, and most h6 elements are fluff such as "read later" or pointers to other articles. However, h6.byline and h6.post__title__kicker are relevant to the current article. Figure captions are sometimes inside both figure and div.outer--content elements and sometimes only inside figure, so we take both and remove the intersection. The figure elements sometimes contain multiple copies of images or videos, and we just take them all. Math articles seem to use MathJax, which we don't add.	2022-01-03 10:10:13 -08:00
Jouni K. Seppänen	2fedd8f234	Add scraper rule for ikiwiki.iki.fi. Feed: https://ikiwiki.iki.fi/feed.php?linkto=current&ns=uutiset%3Ablog&num=5 Example page: https://ikiwiki.iki.fi/uutiset/blog/20210923100421viiveita (To clarify, I'm not a representative of iki.fi, although I have an email address in the domain. This is a nonprofit association that offers email forwarding addresses, and the RSS feed in question contains news for its members.)	2021-12-27 20:51:37 -08:00
Frédéric Guillot	b7c229f30f	Update scraper rule for theregister.com	2021-08-16 20:04:02 -07:00
Frédéric Guillot	31435ef83e	Add rewrite rule to fix Medium.com images	2020-09-29 22:27:32 -07:00
Manuel Müller	ca918bc7e3	Added scraper rule for dilbert.com and turnoff.us	2020-06-10 20:15:46 -07:00
Corey McCaffrey	25d4b9fc0c	Added scraper rule for financialsamurai.com. The default rule results in blank content.	2020-05-24 13:29:28 -07:00
Corey McCaffrey	0683074b8b	Added scraper rule for TheOatmeal.com. The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".	2020-05-13 21:28:00 -07:00
Corey McCaffrey	8f6c07afd6	Added scraper rule for RayWenderlich.com. RayWenderlich.com is a popular community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.	2020-05-13 21:28:00 -07:00
Andrew Williams	9974e0f458	Addition of scraper rule for wdwnt.com. By default, fetching original content for wdwnt.com results in a snippet of the comments section; this rule captures the article content.	2020-02-28 20:24:58 -08:00
somini	30f22fbd78	Update scraper rule for "Le Monde"	2019-12-19 18:35:29 -08:00
Neo Ng	90064a8cf0	Update scraper rule for openingsource.org	2019-11-28 19:40:26 -08:00
Tom Matthews	8b40778ee1	Add BBC News scraping rule	2018-12-13 20:25:30 -08:00
Frédéric Guillot	6f5d93cbbe	Update scraper rule for lemonde.fr	2018-12-02 20:53:22 -08:00
mapl	e47188eab2	Update scraper rule for heise.de	2018-12-01 11:49:30 -08:00
Frédéric Guillot	df2bebaf3d	Update scraper rule for heise.de	2018-08-25 10:33:18 -07:00
Frédéric Guillot	dbcc5d8a97	Use canonical imports	2018-08-24 21:56:39 -07:00
Frédéric Guillot	1d7fe892e1	Add scraper rule for darkreading.com	2018-01-06 13:25:12 -08:00
Frédéric Guillot	48aa0d07ef	Add more scraper rules	2018-01-04 19:32:24 -08:00
Frédéric Guillot	c454f67037	Add scraper rules for version2.dk and ing.dk	2017-12-27 19:44:23 -08:00
Frédéric Guillot	d4839b5597	Add more scraper rules	2017-12-27 13:36:07 -08:00
Frédéric Guillot	c6d9eb3614	Improve content scraper	2017-12-13 21:30:40 -08:00
Frédéric Guillot	87ccad5c7f	Add scraper rules	2017-12-10 20:51:04 -08:00

24 commits