miniflux

Author	SHA1	Message	Date
Scott Leggett	bf1c851093	fetcher: use ETag as a stronger validator than Last-Modified. As per the MDN article on HTTP caching: "During cache revalidation, if both If-Modified-Since and If-None-Match are present, then If-None-Match takes precedence for the validator." (https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching) Previously, Miniflux would consider a resource unmodified if the Last-Modified header had not changed, even if the ETag had changed. With this commit, Miniflux will consider a resource modified if the ETag header has changed, even if Last-Modified has not. This fixes Bug 1 in https://rachelbythebay.com/w/2024/06/11/fsr/	2024-07-02 22:05:49 -07:00
Scott Leggett	c787bb5b48	fetcher: add tests for IsModified behaviour. In particular, add a failing test for the case where ETag changes but Last-Modified does not.	2024-07-02 22:05:49 -07:00
privatmamtora	1a81866bb9	Add global block and keep filters	2024-07-02 21:03:49 -07:00
JohnnyJayJay	ee5e18ea9f	sanitizer: add support for HTML `hidden` attribute. This commit adjusts the `Sanitize` function to skip tags with the `hidden` attribute, similar to how it skips blocked tags and their contents.	2024-06-21 14:00:40 -07:00
Ztec	9f3a8e7f1b	Request builder: Allow the use of insecure TLS ciphers when `Allow self-signed or invalid certificates` is used. Some servers in the wild are badly configured, either by mistake or through lack of maintenance, and which ciphers count as safe or unsafe changes over time as new weaknesses are discovered. This change also accepts ciphers considered unsafe when `Allow self-signed or invalid certificates` is used. It could be put into a separate option, but I felt it fits here. Fix #2671	2024-06-13 20:23:37 -07:00
Ztec	e54825bf02	Improve YouTube page feed detection. In order to be more resilient to YouTube URL variations and to address this feature request: https://github.com/miniflux/v2/issues/2628 I've reworked the way the YouTube feed extraction is done. I've kept all the `FindSubscriptionsFromYouTube*` functions in order to keep all the existing unit tests as-is, ensuring little to no regression. As a result, `youtubeURLIDExtractor` is called twice: a small performance penalty for peace of mind, in my opinion. `youtubeURLIDExtractor` is written so that only one kind of page can be detected at a time. This means I can solve the "video in a playlist" feature request by prioritizing the playlist ID over the video ID. Also, using `url.Parse()` to get IDs makes detection more robust to URL mangling and variations. The most common variation is the `t=42` parameter that starts playback at a given position; previously, this kind of URL would not be detected as a YouTube URL. I deliberately ignored the URL parsing error to keep the previous behavior (skip the YouTube analysis and continue with the other analyses). I also tried to keep the debug logs the same as before as much as I could. I manually tested all the YouTube cases (video, channel, playlist) and they all work as expected except for the video one, but that one does not work on main either: the `meta` HTML tag that was searched for does not seem to exist anymore. Fix: #2628	2024-06-13 20:18:47 -07:00
x	839fc3843a	Add pitchfork.com scraping rule	2024-06-10 21:08:59 -07:00
x	0bab8fac8e	Update theverge.com rewrite rule: fix duplicate image. See: https://github.com/miniflux/v2/issues/1979	2024-06-10 21:08:59 -07:00
Ankit Pandey	b68b05c64c	reader/processor: error out for improper rewrite regexp. It's possible to specify a rewrite regex that validates but doesn't compile, such as: `rewrite("(((unmatched-capture-group"|"rewrite)))")`. In case we encounter one, exit early instead of letting the server panic.	2024-06-01 10:37:02 -07:00
Zhizhen He	ae432bc9c6	reader/readingtime: fix incorrect package name	2024-05-21 18:12:24 -07:00
Jan-Lukas Else	a33b1adf13	Add description field to feed settings. This adds a new "description" field to the feed settings, which allows saving a custom description for a feed. It is also exported and imported as "description" in OPML.	2024-05-06 15:40:36 -07:00
fin444	a631bd527d	options: add FETCH_NEBULA_WATCH_TIME	2024-05-02 16:30:01 -07:00
Frédéric Guillot	fb075b60b5	reader/processor: minifier is breaking HTML entry content	2024-04-23 20:31:52 -07:00
Frédéric Guillot	2c4c845cd2	http/response: add brotli compression support	2024-04-19 12:16:49 -07:00
Frédéric Guillot	771f9d2b5f	reader/fetcher: add brotli content encoding support	2024-04-19 10:50:46 -07:00
jvoisin	b205b5aad0	reader/processor: minimize the feed entries' HTML. Compress the HTML of feed entries before storing it. This should reduce the size of the database a bit but, more importantly, reduce the amount of data sent to clients. Since minify is [stupidly fast](https://github.com/tdewolff/minify/?tab=readme-ov-file#performance), the performance impact should be at noise level.	2024-04-10 19:48:48 -07:00
Frédéric Guillot	38b80d96ea	storage: change GetReadTime() function to use entries_feed_id_hash_key index	2024-04-09 20:37:30 -07:00
Frédéric Guillot	fdd1b3f18e	database: entry URLs can exceed btree index size limit	2024-04-04 20:22:23 -07:00
Evan Elias Young	1b8c45d162	finder: Find feed from YouTube playlist. The feed from a YouTube playlist page is derived in practically the same way as a feed from a YouTube channel page.	2024-04-01 21:16:32 -07:00
jvoisin	19ce519836	reader/rewrite: add a rule for oglaf.com. By default, Oglaf shows a disclaimer/warning about its content, and this doesn't play well with RSS readers, so let's rewrite it to show the actual comic instead of a placeholder.	2024-04-01 21:05:01 -07:00
jvoisin	f109e3207c	reader/rss: don't add empty tags to RSS items. This commit adds a bunch of checks to prevent reader/rss from adding empty tags to RSS items, as well as some minor refactors like nested conditions and loop unrolling.	2024-03-24 19:46:56 -07:00
Frédéric Guillot	ad1d349a0c	rss: use Channel tags only if there are no Item tags	2024-03-23 13:46:48 -07:00
jvoisin	fc4bdf3ab0	Inline a one-liner function. No need to expose a symbol for this.	2024-03-20 17:21:30 -07:00
Frédéric Guillot	08640b27d5	Ensure enclosure URLs are always absolute	2024-03-19 21:57:46 -07:00
jvoisin	4be993e055	Minor refactoring of internal/reader/atom/atom_10_adapter.go. - Move the population of the feed's entries into a new function, to make `BuildFeed` easier to understand and to separate concerns from implementation details - Use `sort+compact` instead of `compact+sort` to remove duplicates - Change `if !a { a = } if !a {a = }` constructs into `if !a { a = ; if !a {a = }}`. This reduces the number of comparisons and also slightly improves control-flow readability.	2024-03-19 20:41:44 -07:00
Jean Khawand	a78d1c79da	Add `FILTER_ENTRY_MAX_AGE_DAYS` config option to limit fetching all feed items	2024-03-20 02:58:53 +00:00
Frédéric Guillot	fa9697b972	Remove trailing space in SiteURL and FeedURL	2024-03-18 17:51:06 -07:00
jvoisin	91f5522ce0	Minor simplification of internal/reader/media/media.go. - Simplify a switch-case by moving a common condition above it. - Remove a superfluous error check: `strconv.ParseInt` returns `0` when passed an empty string.	2024-03-18 16:09:32 -07:00
Frédéric Guillot	8212f16aa2	atom: avoid debug message when the date is empty	2024-03-17 15:29:50 -07:00
Frédéric Guillot	b1e73fafdf	Enable go-critic linter and fix various issues detected	2024-03-17 13:52:34 -07:00
jvoisin	c29ca0e313	Minor simplifications of the rewriter. - Inline some one-line functions - Transform a free-standing function into a method - Massively simplify `removeClickbait` - Use a proper constant instead of a magic number in `applyFuncOnTextContent`	2024-03-17 12:15:46 -07:00
jvoisin	02a074ed26	Compile block/keep regex only once per feed. No need to compile them once for matching on the URL, once per tag, once per title, once per author, …; one time is enough. It also simplifies error handling, since regexp compilation can fail but matching can't.	2024-03-17 12:08:03 -07:00
Frédéric Guillot	309fdbb9fc	Fix force refresh	2024-03-15 19:42:09 -07:00
Frédéric Guillot	4834e934f2	Remove some duplicated code in RSS parser	2024-03-15 18:40:06 -07:00
Frédéric Guillot	dd4fb660c1	Refactor Atom parser to use an adapter	2024-03-15 17:27:16 -07:00
Frédéric Guillot	5948786b15	Add support for RSS <media:category> element	2024-03-13 21:35:39 -07:00
Frédéric Guillot	648b9a8f6f	Refactor RSS Parser to use an adapter	2024-03-13 21:25:09 -07:00
Frédéric Guillot	8429c6b0ab	Refactor JSON Feed parser to use an adapter	2024-03-12 22:37:14 -07:00
Frédéric Guillot	6bc4b35e38	Refactor RDF parser to use an adapter. Avoid tight coupling between `model.Feed` and the original XML RDF feed.	2024-03-12 20:54:05 -07:00
jvoisin	45d486b919	When detecting the format, detect its version as well. There is no need to detect the format and then the version when both can be done at the same time. Add a benchmark as well, on large and small Atom and RSS files.	2024-03-12 18:56:56 -07:00
Frédéric Guillot	6d97f8b458	Parse podcast categories	2024-03-11 22:30:27 -07:00
Frédéric Guillot	f8e50947f2	Move iTunes and GooglePlay XML definitions to their own packages	2024-03-11 22:09:31 -07:00
Frédéric Guillot	9a637ce95e	Refactor RSS parser to use default namespace. This change avoids some limitations of the Go XML parser regarding XML namespaces.	2024-03-11 21:07:13 -07:00
jvoisin	a074773e6c	Use an io.ReadSeeker instead of an io.Reader to parse feeds. This allows using func (*Reader) Seek instead of re-creating a new reader. It's a large commit for a small change, but anything that simplifies the reader/buffer/ReadAll/… mess is a step in the right direction, I think, and it should enable more follow-up simplifications.	2024-03-06 20:13:39 -08:00
jvoisin	3d0126be0b	Speed the sanitizer up a bit, again. - allow YouTube URLs to start with `www` - use `strings.Builder` instead of a `bytes.Buffer` - use a `strings.NewReader` instead of a `bytes.NewBufferString` - sprinkle a couple of `continue` statements to make the code flow more obvious - inline calls to `inList`, and put their parameters in the right order - simplify `isPixelTracker` - simplify `isValidIframeSource` by extracting the hostname and comparing it directly, instead of using the full URL and checking whether it starts with multiple variations of the same one (`//`, `http:`, `https://` multiplied by ``/`www.`) - add a benchmark	2024-03-05 19:31:50 -08:00
jvoisin	111e3f2106	Reuse a Reader instead of copying to a buffer when parsing an Atom feed	2024-03-04 17:36:10 -08:00
jvoisin	3339d9d3d7	Preallocate memory when exporting to OPML. This should marginally increase performance when exporting a large number of feeds to OPML.	2024-03-03 20:34:37 -08:00
jvoisin	347740dce1	Speed up removeUnlikelyCandidates. `.Not` returns a brand new Selection, copied element by element.	2024-02-29 19:38:43 -08:00
jvoisin	ab85d4d678	Improve EstimateReadingTime's speed by a factor of 7. - Refactor the tests and add some - Use 250 characters instead of the whole text - Only check for Korean, Chinese and Japanese scripts - Add a benchmark - Use a more idiomatic control flow ```console $ # main branch $ go test -bench=. goos: linux goarch: amd64 pkg: miniflux.app/v2/internal/reader/readingtime BenchmarkEstimateReadingTime-12 267 4821268 ns/op PASS ok miniflux.app/v2/internal/reader/readingtime 1.754s $ # speed_up_reading_time branch $ go test -bench=. goos: linux goarch: amd64 pkg: miniflux.app/v2/internal/reader/readingtime cpu: 12th Gen Intel(R) Core(TM) i7-1265U BenchmarkEstimateReadingTime-12 1941 653312 ns/op PASS ok miniflux.app/v2/internal/reader/readingtime 1.342s $ ```	2024-02-29 19:24:15 -08:00
jvoisin	31ac62f410	Don't compute reading time when unused. If the user doesn't display reading times, there is no need to compute them. This should speed things up a bit, since `whatlanggo.Detect` is abysmally slow.	2024-02-29 19:14:17 -08:00
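
Commit bf1c851093 changes how the fetcher decides whether a feed has changed: when the server sends an ETag, the ETag alone decides, and Last-Modified is only consulted when no ETag is available. The Go sketch below illustrates that precedence rule under stated assumptions; it is not Miniflux's actual `IsModified` code, and the function `isModified` and its parameters are hypothetical.

```go
package main

import "fmt"

// isModified is a hypothetical helper sketching the validator precedence
// described in commit bf1c851093; it is not Miniflux's real implementation.
// cachedETag/cachedLastModified come from the previous fetch,
// etag/lastModified come from the latest response.
func isModified(cachedETag, cachedLastModified, etag, lastModified string) bool {
	// ETag is the stronger validator: if the server sent one, it decides,
	// so a changed ETag means "modified" even if Last-Modified is unchanged.
	if etag != "" {
		return etag != cachedETag
	}
	// Fall back to Last-Modified only when no ETag is available.
	if lastModified != "" {
		return lastModified != cachedLastModified
	}
	// With no validator at all, assume the resource changed.
	return true
}

func main() {
	lm := "Tue, 02 Jul 2024 10:00:00 GMT"
	// ETag changed, Last-Modified unchanged: the case the commit fixes.
	fmt.Println(isModified(`"v1"`, lm, `"v2"`, lm)) // true
	// Same ETag: treated as unmodified regardless of Last-Modified.
	fmt.Println(isModified(`"v1"`, lm, `"v1"`, "Wed, 03 Jul 2024 10:00:00 GMT")) // false
}
```

This mirrors the MDN rule quoted in the commit: when both If-None-Match and If-Modified-Since apply, the ETag-based check wins.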


112 commits