- Inline some one-line functions
- Transform a free-standing function into a method
- Massively simplify `removeClickbait`
- Use a proper constant instead of a magic number in `applyFuncOnTextContent`
There is no need to compile the regular expressions once for matching on the
URL, once per tag, once per title, once per author, … compiling them a single
time is enough. It also simplifies error handling: while regexp compilation
can fail, matching can't.
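A minimal sketch of the idea, with a hypothetical `entry` type and
`matchesRule` helper standing in for the real code:

```go
package filter

import "regexp"

// entry is a hypothetical stand-in for the real feed entry type.
type entry struct {
	URL, Title, Author string
	Tags               []string
}

// matchesRule compiles the pattern a single time and reuses it for every
// field, instead of recompiling once per URL, per tag, per title, per author.
// Compilation is the only step that can fail; matching cannot.
func matchesRule(pattern string, e entry) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	if re.MatchString(e.URL) || re.MatchString(e.Title) || re.MatchString(e.Author) {
		return true, nil
	}
	for _, tag := range e.Tags {
		if re.MatchString(tag) {
			return true, nil
		}
	}
	return false, nil
}
```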
There is no need to detect the format and then the version when both can be
done at the same time.
Add a benchmark as well, on large and small Atom and RSS files.
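For illustration, a rough sketch of deriving format and version from the root
element in a single pass; the package name, return conventions and element
handling here are assumptions, not the project's actual detector:

```go
package rss

import (
	"encoding/xml"
	"io"
	"strings"
)

// detectFormatAndVersion reads tokens until the first start element and
// derives both the format and the version from it in one pass.
func detectFormatAndVersion(r io.Reader) (format, version string) {
	decoder := xml.NewDecoder(r)
	for {
		token, err := decoder.Token()
		if err != nil {
			return "unknown", ""
		}
		element, ok := token.(xml.StartElement)
		if !ok {
			continue
		}
		switch strings.ToLower(element.Name.Local) {
		case "rss":
			for _, attr := range element.Attr {
				if attr.Name.Local == "version" {
					return "rss", attr.Value
				}
			}
			return "rss", "2.0"
		case "rdf":
			return "rss", "1.0"
		case "feed":
			return "atom", "1.0"
		default:
			return "unknown", ""
		}
	}
}
```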
This makes it possible to use `func (*Reader) Seek` instead of re-creating a
new reader. It's a large commit for a small change, but anything that
simplifies the reader/buffer/ReadAll/… mess is a step in the right direction
I think, and it should enable more follow-up simplifications.
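As a sketch of the intent (the buffer handling shown here is illustrative,
not the actual reader code):

```go
package parser

import (
	"bytes"
	"io"
)

// peekThenReuse reads the beginning of the document (for example to sniff the
// character set), then rewinds with Seek instead of building a new reader or
// calling ReadAll a second time.
func peekThenReuse(data []byte) ([]byte, error) {
	reader := bytes.NewReader(data)

	header := make([]byte, 1024)
	if _, err := reader.Read(header); err != nil && err != io.EOF {
		return nil, err
	}

	// *bytes.Reader implements io.Seeker, so the same reader can be handed
	// to the next stage from offset zero.
	if _, err := reader.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return io.ReadAll(reader)
}
```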
- allow YouTube URLs to start with `www`
- use `strings.Builder` instead of a `bytes.Buffer`
- use a `strings.NewReader` instead of a `bytes.NewBufferString`
- sprinkle a couple of `continue` statements to make the code flow more obvious
- inline calls to `inList`, and put their parameters in the right order
- simplify `isPixelTracker`
- simplify `isValidIframeSource`, by extracting the hostname and comparing it
directly, instead of using the full URL and checking whether it starts with
multiple variations of the same prefix (`//`, `http://`, `https://`, each with
and without `www.`); see the sketch after this list
- add a benchmark
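A minimal sketch of the hostname-based iframe check; the set of allowed hosts
below is illustrative:

```go
package sanitizer

import "net/url"

// Illustrative allow-list, keyed by hostname.
var allowedIframeHosts = map[string]bool{
	"www.youtube.com":     true,
	"player.vimeo.com":    true,
	"www.dailymotion.com": true,
}

// isValidIframeSource compares the parsed hostname directly, instead of
// testing the raw URL against every scheme/www prefix combination.
func isValidIframeSource(iframeSourceURL string) bool {
	parsedURL, err := url.Parse(iframeSourceURL)
	if err != nil {
		return false
	}
	return allowedIframeHosts[parsedURL.Hostname()]
}
```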
- Refactor the tests and add new ones
- Use only the first 250 characters instead of the whole text
- Only check for the Korean, Chinese and Japanese scripts (a sketch of the
idea follows the benchmark output below)
- Add a benchmark
- Use a more idiomatic control flow
```console
$ # main branch
$ go test -bench=.
goos: linux
goarch: amd64
pkg: miniflux.app/v2/internal/reader/readingtime
BenchmarkEstimateReadingTime-12 267 4821268 ns/op
PASS
ok miniflux.app/v2/internal/reader/readingtime 1.754s
$ # speed_up_reading_time branch
$ go test -bench=.
goos: linux
goarch: amd64
pkg: miniflux.app/v2/internal/reader/readingtime
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
BenchmarkEstimateReadingTime-12 1941 653312 ns/op
PASS
ok miniflux.app/v2/internal/reader/readingtime 1.342s
$
```
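For illustration, a minimal sketch of the script check on a 250-character
prefix, using only the standard library (the actual implementation builds on
`whatlanggo`, so treat this purely as a sketch of the idea):

```go
package readingtime

import "unicode"

// cjkRanges covers the Korean, Chinese and Japanese scripts.
var cjkRanges = []*unicode.RangeTable{
	unicode.Han,      // Chinese characters (also used in Japanese)
	unicode.Hangul,   // Korean
	unicode.Hiragana, // Japanese
	unicode.Katakana, // Japanese
}

// usesCJKScript inspects at most the first 250 runes of the text.
func usesCJKScript(text string) bool {
	inspected := 0
	for _, r := range text {
		if inspected >= 250 {
			return false
		}
		inspected++
		for _, table := range cjkRanges {
			if unicode.Is(table, r) {
				return true
			}
		}
	}
	return false
}
```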
If the user doesn't display reading times, there is no need to compute them.
This should speed things up a bit, since `whatlanggo.Detect` is abysmally slow.
Instead of repeatedly allocating a map of roughly 100 keys whose values may be
dynamic (at least as far as the Go compiler is concerned), allocate it once in
a global variable. This significantly speeds things up by reducing garbage
collector and allocator involvement.
Local synthetic benchmarks have shown an improvement from 38% of wall time
down to only 12%.
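A sketch of the pattern; the map name and its contents are placeholders, not
the project's actual table:

```go
package urlcleaner

// Allocated once at program start; the garbage collector no longer has to
// build and discard a map of roughly 100 entries on every call.
var trackingParams = map[string]bool{
	"utm_source":   true,
	"utm_medium":   true,
	"utm_campaign": true,
	// … roughly a hundred more entries in the real table …
}

func isTrackingParam(name string) bool {
	return trackingParams[name]
}
```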
Go 1.22 introduced a new [for-range](https://go.dev/ref/spec#For_range)
construct that looks a tad better than the usual `for i := 0; i < N; i++`
construct. I also took the liberty of replacing some
`for i := 0; i < len(myitemsarray); i++ { … myitemsarray[i] … }` loops
with `for _, item := range myitemsarray` when `myitemsarray` contains only
pointers.
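A small, self-contained illustration of both forms (an assumed example, not
project code):

```go
package main

import "fmt"

func main() {
	// Go 1.22 range-over-int replaces the classic three-clause counter loop.
	for i := range 3 {
		fmt.Println("iteration", i) // prints 0, 1, 2
	}

	// Ranging directly over a slice of pointers instead of indexing it.
	items := []*string{ptr("a"), ptr("b"), ptr("c")}
	for _, item := range items {
		fmt.Println(*item)
	}
}

func ptr(s string) *string { return &s }
```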
- Use a simple regex to parse data URIs instead of a hand-rolled parser, and
document which fields are considered mandatory (a sketch follows this list).
- Use case-insensitive matching to find (fav)icons, instead of doing the same
query twice with different letter cases
- Add 'apple-touch-icon-precomposed.png' as a fallback favicon
- Reorder the queries to have `icon` first, since it seems to be the most
popular one. It used to be last, meaning that pages had to be parsed
completely 4 times, instead of once now.
- Minor factorisation in `findIconURLsFromHTMLDocument`
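A possible shape for such a data URI regex, shown as a sketch only; treating
the mediatype and the data as mandatory is an assumption of this example, not
necessarily the project's exact rules:

```go
package icon

import (
	"fmt"
	"regexp"
)

// Mediatype and data are mandatory here; parameters (including ";base64")
// remain optional.
var dataURIRegexp = regexp.MustCompile(`^data:([^;,]+)((?:;[^;,]+)*),(.+)$`)

func parseDataURI(uri string) (mediaType, params, data string, err error) {
	matches := dataURIRegexp.FindStringSubmatch(uri)
	if matches == nil {
		return "", "", "", fmt.Errorf("icon: invalid data URI %q", uri)
	}
	return matches[1], matches[2], matches[3], nil
}
```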
- Split the date formats into those that require local time
and those that don't, so that there is no need to have a switch-case in the
for loop, which runs around 250 iterations at most (sketched after this list).
- Be stricter about timezones: previously invalid offsets like -13 were
accepted. Also add a test for this.
- Bail out early if the date is an empty string.
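A condensed sketch of the resulting structure; the format lists are heavily
abbreviated, and the -12h…+14h bounds used for the stricter timezone check are
an assumption of this sketch:

```go
package date

import (
	"errors"
	"time"
)

// Formats are split up front, so the parsing loop no longer needs a
// per-iteration switch; both lists are heavily abbreviated here.
var (
	absoluteDateFormats = []string{time.RFC1123Z, time.RFC3339}
	localDateFormats    = []string{"2006-01-02 15:04:05", "January 2, 2006"}
)

func parseDate(value string) (time.Time, error) {
	if value == "" {
		return time.Time{}, errors.New("date: empty value")
	}

	for _, layout := range absoluteDateFormats {
		if parsed, err := time.Parse(layout, value); err == nil {
			if err := validateTimezone(parsed); err != nil {
				return time.Time{}, err
			}
			return parsed, nil
		}
	}
	for _, layout := range localDateFormats {
		if parsed, err := time.ParseInLocation(layout, value, time.Local); err == nil {
			return parsed, nil
		}
	}
	return time.Time{}, errors.New("date: unsupported format")
}

// validateTimezone rejects offsets outside the real-world -12h…+14h window,
// which is how an offset like -13 gets refused.
func validateTimezone(parsed time.Time) error {
	_, offset := parsed.Zone()
	if offset < -12*3600 || offset > 14*3600 {
		return errors.New("date: invalid timezone offset")
	}
	return nil
}
```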
- make `findContentUsingCustomRules` more idiomatic:
in Go, a function that returns an error may return
garbage in its other return values. Moreover, ignoring
errors is bad practice.
- `getPredefinedScraperRules` now runs in constant time,
instead of iterating over a list of around 50 items
(see the sketch after this list).
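A minimal sketch of the constant-time lookup; the domains and rules below are
placeholders:

```go
package scraper

import (
	"net/url"
	"strings"
)

// Keyed by domain; the real table holds around 50 entries.
var predefinedScraperRules = map[string]string{
	"example.org": "article, div.content",
	"example.com": "div#main",
}

// getPredefinedScraperRules is a single map lookup instead of a loop over
// every entry.
func getPredefinedScraperRules(websiteURL string) string {
	parsedURL, err := url.Parse(websiteURL)
	if err != nil {
		return ""
	}
	domain := strings.TrimPrefix(parsedURL.Hostname(), "www.")
	return predefinedScraperRules[domain]
}
```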
- Surface `localizedError` in `FindSubscriptionsFromWellKnownURLs` via `slog`
- Use an inline declaration for new subscriptions, as is done elsewhere in the
file, if only for consistency's sake
- Preallocate the `subscriptions` slice when using an RSS-Bridge;
it's good practice, and it might even marginally improve
performance when adding __a lot__ of feeds via an RSS-Bridge instance, wooo!
(A minimal sketch follows this list.)
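A minimal sketch of the preallocation; the types stand in for the real ones:

```go
package subscription

// Hypothetical types for this sketch.
type Subscription struct {
	Title string
	URL   string
}

type bridgeItem struct {
	Title string
	URL   string
}

// fromBridgeResponse sizes the slice once, so append never has to grow and
// copy the backing array, however many feeds the RSS-Bridge instance returns.
func fromBridgeResponse(items []bridgeItem) []*Subscription {
	subscriptions := make([]*Subscription, 0, len(items))
	for _, item := range items {
		subscriptions = append(subscriptions, &Subscription{Title: item.Title, URL: item.URL})
	}
	return subscriptions
}
```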
- Use constant-time map access instead of iterating over the maps
- Build a rather large whitelist map as a single literal instead of
constructing it item by item (and remove a duplicate key/value pair)
- Use the `slices` package instead of hand-rolled loops (see the sketch below)
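For example, a membership test with the standard `slices` package (Go 1.21+);
the list of schemes is illustrative:

```go
package sanitizer

import "slices"

var allowedSchemes = []string{"http", "https", "mailto"}

// isAllowedScheme replaces a hand-rolled loop with slices.Contains.
func isAllowedScheme(scheme string) bool {
	return slices.Contains(allowedSchemes, scheme)
}
```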