33 commits

Author SHA1 Message Date
Romain de Laage
33c4b5188c Add a rewrite rule to remove clickbait titles 2023-04-15 18:25:43 -07:00
Davide Masserut
034e46700c Process older entries first
Feed entries are usually ordered from most to least recent.

Processing older entries first ensures that their creation timestamp
is lower than that of newer entries.

This is useful when we order by creation, because then we get a
consistent timeline.
2023-03-25 16:19:07 -07:00
Harry Cheng
d9777f1439 Skip integrations if there are no entries to push 2022-12-04 12:58:10 -08:00
Romain de Laage
550e7d0415 Add matrix bot support 2022-10-27 17:53:19 -07:00
Frédéric Guillot
cecab91298 Fix some linter issues 2022-08-08 22:06:38 -07:00
Gabriel Augendre
6e50ce3293 Make reading speed user-configurable 2022-07-17 19:35:24 -07:00
Carsten
2659883ce5 Add rewrite rules for article URL before fetching content 2022-07-11 21:12:26 -07:00
三三
34dd358eb0 Add Telegram integration 2021-09-07 20:04:22 -07:00
hulb
01f678c3b1 add proxy arg in scraper.Fetch 2021-08-28 21:57:11 -07:00
Darius
9242350f0e Add per feed cookies option 2021-03-22 20:27:58 -07:00
Gabriel Augendre
1d80c12e18 Prevent Youtube scraping if entry already exists 2021-03-08 20:10:53 -08:00
Frédéric Guillot
ec3c604a83 Add option to allow self-signed or invalid certificates 2021-02-21 13:58:52 -08:00
Ilya Mateyko
c3f871b49b Use YouTube video duration as read time
This feature works by scraping YouTube website.

To enable it, set the FETCH_YOUTUBE_WATCH_TIME environment variable to
1.

Resolves #972.
2021-02-21 11:13:52 -08:00
Frédéric Guillot
de7a613098 Calculate reading time during feed processing
The goal is to speed up the user interface.

Detecting the language based on the content is pretty slow.
2020-11-18 17:43:24 -08:00
Frédéric Guillot
b30a045a4e Refactor entry filtering
Avoid looping multiple times across entries
2020-10-19 22:18:41 -07:00
Manuel Garrido
84b83fc3c8 Add feed filters (Keeplist and Blocklist) 2020-10-16 14:40:56 -07:00
Frédéric Guillot
c394a61a4e Add Prometheus exporter 2020-09-27 20:04:48 -07:00
Frédéric Guillot
6c6ca69141 Add feed option to ignore HTTP cache 2020-06-05 22:04:52 -07:00
Frédéric Guillot
f3fc8b7072 Use feed ID instead of user ID to check entry URLs presence 2019-02-28 20:43:33 -08:00
Frédéric Guillot
1bc8535dbb Move image proxy filter to template functions 2018-12-02 21:09:53 -08:00
Frédéric Guillot
311a133ab8 Refactor manual entry scraper 2018-12-02 20:51:06 -08:00
Frédéric Guillot
b8f874a37d Simplify feed entries filtering
- Rename processor package to filter
- Remove boilerplate code
2018-10-14 22:33:19 -07:00
Frédéric Guillot
9dc38a0803 Add missing package descriptions for GoDoc 2018-10-08 17:32:17 -07:00
Patrick
2538eea177 Add the possibility to override default user agent for each feed 2018-09-19 18:19:24 -07:00
Frédéric Guillot
dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
dzaikos
e1c56b2e53 Processor: Do rewriter before sanitizer for entry.Content.
Addresses #163.
2018-07-06 00:17:07 -04:00
Frédéric Guillot
3b62f904d6 Do not crawl existing entry URLs 2018-01-20 13:25:20 -08:00
Frédéric Guillot
1d8193b892 Add logger 2017-12-15 18:55:57 -08:00
Frédéric Guillot
84d912c979 Rewrite imports 2017-12-12 21:48:13 -08:00
Frédéric Guillot
ef097f02fe Add the possibility to enable crawler for feeds 2017-12-12 19:19:36 -08:00
Frédéric Guillot
33445e5b68 Add the possibility to define rewrite rules for each feed 2017-12-11 22:16:32 -08:00
Frédéric Guillot
bb8e61c7c5 Make sure golint pass on the code base 2017-11-27 21:40:05 -08:00
Frédéric Guillot
8ffb773f43 First commit 2017-11-19 22:01:46 -08:00