Commit graph

37 commits

Author SHA1 Message Date
Davide Masserut
690d66ce0b Update scraping rules for ilpost.it 2022-12-27 13:33:41 -08:00
Romain de Laage
eb86773039 Recalbox rewrite rule 2022-10-19 20:13:44 -07:00
jgbresson
aa47789f55 Add add_dynamic_image rewrite rule for theverge.com 2022-10-16 11:57:01 -07:00
Frédéric Guillot
cecab91298 Fix some linter issues 2022-08-08 22:06:38 -07:00
Gabe Cook
405d4febd9 Parse markdown by default for blog.laravel.com 2022-07-30 20:19:09 -07:00
Gabe Cook
36df7b36ec Add parse_markdown rewrite function 2022-07-30 20:19:09 -07:00
nemunaire
5a07fd8932 Add new rewrite rule to decode base64 content 2022-05-25 20:44:04 -07:00
Romain de Laage
808635e314 Add a rewrite rule for castopod episodes 2022-01-30 16:33:17 -08:00
Romain de Laage
8329e9b46c Make Invidious instance configurable 2022-01-05 20:43:03 -08:00
Jouni K. Seppänen
bb0d2bf675 Add Youtube videos in Quanta articles
Some articles (especially the recent year-in-review ones) include a Youtube
video. The server-side rendered articles do not include the Youtube iframe,
but they do have a script that looks like

    <script type="text/javascript" data-reactid="6">
      window.__APOLLO_STATE__ = {
        ...
          youtube_id: "9uASADiYe_8",

We add a reformatting function that tries to detect obvious JavaScript code
that has a field or variable called youtube_id that has an 11-character
double-quoted value, and adds the referenced Youtube videos in the beginning of
the article. This is slightly more general than needed for Quanta, in the hope
that it could be useful for similar sites.
2022-01-03 10:10:13 -08:00
Jouni K. Seppänen
dcf87bd642 Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.

Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.

The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use Mathjax,
which we don't add.
2022-01-03 10:10:13 -08:00
Thiago Perrotta
28d036434f Add rewrite rule: monkeyuser.com
Comics site, uses alt image text similarly to xkcd.com.
2021-12-16 11:50:26 -08:00
Thiago Perrotta
4b12043cea Sort rewrite rules 2021-12-16 11:50:26 -08:00
Artémis
b585dab6b4 Add data-srcset support to "add_dynamic_image rewrite" rewrite rule 2021-10-22 18:12:23 -07:00
Lukas Dietrich
93596c1218 Add rewrite rule to remove dom elements 2021-09-06 09:47:05 -07:00
Ilya Mateyko
4464802947 Reformat some Go files
When working on #994 I noticed that some Go files are not formatted with
`gofmt`.

This PR fixes this.
2021-01-27 18:13:58 -08:00
Pacman99
b8b6c74d86 Add rewrite rule replace for custom search and replace 2020-11-29 10:32:26 -08:00
Frédéric Guillot
b50778d3eb Add rewrite rule to use noscript content for images rendered with Javascript 2020-10-19 21:31:10 -07:00
Frédéric Guillot
31435ef83e Add rewrite rule to fix Medium.com images 2020-09-29 22:27:32 -07:00
alex
0f258fd55b Make add_invidious_video rule applicable for different invidious instances 2020-09-06 13:41:42 -07:00
Frédéric Guillot
592151bdb6 Add support for Invidious
- Embed Invidious player for invidio.us feeds
- Add new rewrite rule to use Invidious player for Youtube feeds
2020-03-20 20:56:59 -07:00
Frédéric Guillot
120d6ec7d8 Do no rewrite Youtube description twice in "add_youtube_video" rule
This is already done before in <media:description>.
2019-11-30 22:56:06 -08:00
Frédéric Guillot
69aa650203 Add the possibility to add rules during feed creation 2019-11-29 11:27:58 -08:00
Frédéric Guillot
c43c9458a9 Add rewrite functions: convert_text_link and nl2br 2019-11-28 21:33:12 -08:00
Peter De Wachter
b6f3160dbc add_mailto_subject: New rewrite function
Dinosaur Comics (qwantz.com) likes to hide jokes in mailto: links, but
miniflux's sanitizer strips those out.
2019-08-19 19:42:47 -07:00
Peter De Wachter
ea2b6e3608 addImageTitle: Fix HTML injection
This rewrite rule would change this:

    <img title="<foo>">

to this:

    <figure><img><figcaption><foo></figcaption></figure>

The image title needs to be properly escaped.
2019-08-15 21:39:41 -07:00
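The following is a minimal Go sketch of the escaping that this commit calls for. It assumes the title has already been read from the parsed `img` element. `captionedFigure` is a hypothetical name. `html.EscapeString` comes from Go's standard library.

```go
package main

import "html"

// captionedFigure (hypothetical) wraps an already-serialized <img> tag in a
// <figure> whose <figcaption> shows the image title. Because the title is
// escaped, any markup inside it is displayed as text instead of being injected.
func captionedFigure(imgTag, title string) string {
	return "<figure>" + imgTag + "<figcaption>" + html.EscapeString(title) + "</figcaption></figure>"
}
```

With the title `<foo>` from the example above, the caption becomes `&lt;foo&gt;`, which the browser renders as literal text.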
Frédéric Guillot
311a133ab8 Refactor manual entry scraper 2018-12-02 20:51:06 -08:00
Frédéric Guillot
9606126196 Convert text links and line feeds to HTML in YouTube channels 2018-10-08 20:47:10 -07:00
Frédéric Guillot
9dc38a0803 Add missing package descriptions for GoDoc 2018-10-08 17:32:17 -07:00
Frédéric Guillot
dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
dzaikos
6d25e02cb5 New add_dynamic_image rewriter for JavaScript-loaded images.
Searches tags for various `data-*` attributes and sets `img` tag `src` attribute appropriately. Falls back to searching `noscript` for `img` tags.

Includes unit tests.
2018-07-09 01:22:48 -04:00
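The attribute lookup can be sketched in Go as shown below. This sketch assumes the `img` attributes have already been parsed into a map. Both the helper `resolveDynamicSrc` and the candidate list are hypothetical examples, not the rewriter's actual attribute list. The `noscript` fallback is omitted; the sketch only reports when it would be needed.

```go
package main

// candidateSrcAttrs is a hypothetical, illustrative list of lazy-loading
// attributes, checked in order. It is not the exact list any real rewriter uses.
var candidateSrcAttrs = []string{"data-src", "data-original", "data-lazy-src"}

// resolveDynamicSrc (hypothetical) returns the first non-empty candidate
// attribute value to use as the img src. ok is false when no candidate is
// present, which is the point where a rewriter would fall back to searching
// <noscript> for an <img> tag.
func resolveDynamicSrc(attrs map[string]string) (src string, ok bool) {
	for _, name := range candidateSrcAttrs {
		if v := attrs[name]; v != "" {
			return v, true
		}
	}
	return "", false
}
```

Checking attributes in a fixed order keeps the result deterministic when a page sets several of them at once.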
dzaikos
45d7105ed1 Refactor AddImageTitle rewriter.
* Only processes images with `src` **and** `title` attributes (others are ignored).
* Processes **all** images in the document (not just the first one).
* Wraps the image and its title attribute in a `figure` tag with the title attribute's contents in a `figcaption` tag.

Updated xkcd rewriter unit test.

Added another xkcd rewriter unit test to check rendering of images without title tags.
2018-06-26 17:50:18 -04:00
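The three behaviors listed above can be illustrated with a Go sketch: require both `src` and `title`, visit every image, and wrap matching images in `figure`/`figcaption`. This is a hypothetical stand-in, not the refactored rewriter. `wrapTitledImages` is an invented name, a regular expression replaces a real HTML parser, and only double-quoted attributes are recognized. The caption is also escaped, as the later `addImageTitle` HTML injection fix requires.

```go
package main

import (
	"html"
	"regexp"
)

// Simplifying assumptions for this sketch: attributes are double-quoted, and a
// regular expression stands in for a real HTML parser.
var (
	imgTag    = regexp.MustCompile(`<img(?:\s+[\w:-]+(?:\s*=\s*"[^"]*")?)*\s*/?>`)
	srcAttr   = regexp.MustCompile(`\ssrc\s*=\s*"`)
	titleAttr = regexp.MustCompile(`\stitle\s*=\s*"([^"]*)"`)
)

// wrapTitledImages (hypothetical) visits every <img> in content. Images that
// have both src and title are wrapped in a <figure> with the title as an
// escaped <figcaption>. All other images are left untouched.
func wrapTitledImages(content string) string {
	return imgTag.ReplaceAllStringFunc(content, func(tag string) string {
		m := titleAttr.FindStringSubmatch(tag)
		if m == nil || !srcAttr.MatchString(tag) {
			return tag
		}
		// Decode entities in the raw attribute value, then escape once for the caption.
		title := html.EscapeString(html.UnescapeString(m[1]))
		return "<figure>" + tag + "<figcaption>" + title + "</figcaption></figure>"
	})
}
```

Images without a title, such as the case covered by the second xkcd test, pass through unchanged.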
Frédéric Guillot
c6d9eb3614 Improve content scraper 2017-12-13 21:30:40 -08:00
Frédéric Guillot
84d912c979 Rewrite imports 2017-12-12 21:48:13 -08:00
Frédéric Guillot
33445e5b68 Add the possibility to define rewrite rules for each feed 2017-12-11 22:16:32 -08:00
Frédéric Guillot
bb8e61c7c5 Make sure golint pass on the code base 2017-11-27 21:40:05 -08:00
Frédéric Guillot
8ffb773f43 First commit 2017-11-19 22:01:46 -08:00