miniflux

Author	SHA1	Message	Date
jvoisin	b205b5aad0	reader/processor: minimize the feed's entries html. Compress the HTML of feed entries before storing it. This should reduce the size of the database a bit but, more importantly, reduce the amount of data sent to clients. Since minify is [stupidly fast](https://github.com/tdewolff/minify/?tab=readme-ov-file#performance), the performance impact should be within noise.	2024-04-10 19:48:48 -07:00
Frédéric Guillot	38b80d96ea	storage: change GetReadTime() function to use entries_feed_id_hash_key index	2024-04-09 20:37:30 -07:00
Frédéric Guillot	fdd1b3f18e	database: entry URLs can exceed btree index size limit	2024-04-04 20:22:23 -07:00
Evan Elias Young	1b8c45d162	finder: Find feed from YouTube playlist. The feed from a YouTube playlist page is derived in practically the same way as a feed from a YouTube channel page.	2024-04-01 21:16:32 -07:00
jvoisin	19ce519836	reader/rewrite: add a rule for oglaf.com. By default, Oglaf shows a disclaimer/warning about its content, which doesn't play well with RSS readers, so rewrite it to show the actual comic instead of a placeholder.	2024-04-01 21:05:01 -07:00
jvoisin	f109e3207c	reader/rss: don't add empty tags to RSS items. This commit adds a number of checks to prevent reader/rss from adding empty tags to RSS items, as well as some minor refactors such as nested conditions and loop unrolling.	2024-03-24 19:46:56 -07:00
Frédéric Guillot	ad1d349a0c	rss: use Channel tags only if there are no Item tags	2024-03-23 13:46:48 -07:00
jvoisin	fc4bdf3ab0	Inline a one-liner function. No need to expose a symbol for this.	2024-03-20 17:21:30 -07:00
Frédéric Guillot	08640b27d5	Ensure enclosure URLs are always absolute	2024-03-19 21:57:46 -07:00
jvoisin	4be993e055	Minor refactoring of internal/reader/atom/atom_10_adapter.go - Move the population of the feed's entries into a new function, to make `BuildFeed` easier to understand/separate concerns/implementation details - Use `sort+compact` instead of `compact+sort` to remove duplicates - Change `if !a { a = } if !a {a = }` constructs into `if !a { a = ; if !a {a = }}`. This reduces the number of comparisons and also slightly improves control-flow readability.	2024-03-19 20:41:44 -07:00
Jean Khawand	a78d1c79da	Add `FILTER_ENTRY_MAX_AGE_DAYS` config option to limit fetching all feed items	2024-03-20 02:58:53 +00:00
Frédéric Guillot	fa9697b972	Remove trailing space in SiteURL and FeedURL	2024-03-18 17:51:06 -07:00
jvoisin	91f5522ce0	Minor simplification of internal/reader/media/media.go - Simplify a switch-case by moving a common condition above it. - Remove a superfluous error-check: `strconv.ParseInt` returns `0` when passed an empty string.	2024-03-18 16:09:32 -07:00
Frédéric Guillot	8212f16aa2	atom: avoid debug message when the date is empty	2024-03-17 15:29:50 -07:00
Frédéric Guillot	b1e73fafdf	Enable go-critic linter and fix various issues detected	2024-03-17 13:52:34 -07:00
jvoisin	c29ca0e313	Minor simplifications of the rewriter - Inline some one-line functions - Transform a free-standing function into a method - Massively simplify `removeClickbait` - Use a proper constant instead of a magic number in `applyFuncOnTextContent`	2024-03-17 12:15:46 -07:00
jvoisin	02a074ed26	Compile block/keep regex only once per feed. There is no need to compile them once for matching on the URL, once per tag, once per title, once per author, …; once is enough. It also simplifies error handling: regexp compilation can fail, but matching can't.	2024-03-17 12:08:03 -07:00
Frédéric Guillot	309fdbb9fc	Fix force refresh	2024-03-15 19:42:09 -07:00
Frédéric Guillot	4834e934f2	Remove some duplicated code in RSS parser	2024-03-15 18:40:06 -07:00
Frédéric Guillot	dd4fb660c1	Refactor Atom parser to use an adapter	2024-03-15 17:27:16 -07:00
Frédéric Guillot	5948786b15	Add support for RSS <media:category> element	2024-03-13 21:35:39 -07:00
Frédéric Guillot	648b9a8f6f	Refactor RSS Parser to use an adapter	2024-03-13 21:25:09 -07:00
Frédéric Guillot	8429c6b0ab	Refactor JSON Feed parser to use an adapter	2024-03-12 22:37:14 -07:00
Frédéric Guillot	6bc4b35e38	Refactor RDF parser to use an adapter. Avoid tight coupling between `model.Feed` and the original XML RDF feed.	2024-03-12 20:54:05 -07:00
jvoisin	45d486b919	When detecting the format, detect its version as well. There is no need to detect the format and then the version when both can be done at the same time. Also add a benchmark on large and small Atom and RSS files.	2024-03-12 18:56:56 -07:00
Frédéric Guillot	6d97f8b458	Parse podcast categories	2024-03-11 22:30:27 -07:00
Frédéric Guillot	f8e50947f2	Move iTunes and GooglePlay XML definitions to their own packages	2024-03-11 22:09:31 -07:00
Frédéric Guillot	9a637ce95e	Refactor RSS parser to use default namespace. This change avoids some limitations of the Go XML parser regarding XML namespaces.	2024-03-11 21:07:13 -07:00
jvoisin	a074773e6c	Use an io.ReadSeeker instead of an io.Reader to parse feeds. This makes it possible to use `func (*Reader) Seek` instead of re-creating a new reader. It's a large commit for a small change, but anything that simplifies the reader/buffer/ReadAll/… mess is a step in the right direction, I think, and it should enable more follow-up simplifications.	2024-03-06 20:13:39 -08:00
jvoisin	3d0126be0b	Speed the sanitizer up a bit, again - allow YouTube URLs to start with `www` - use `strings.Builder` instead of a `bytes.Buffer` - use a `strings.NewReader` instead of a `bytes.NewBufferString` - sprinkle a couple of `continue` to make the code flow more obvious - inline calls to `inList`, and put their parameters in the right order - simplify isPixelTracker - simplify `isValidIframeSource` by extracting the hostname and comparing it directly, instead of using the full URL and checking whether it starts with multiple variations of the same one (`//`, `http:`, `https://`, each with and without `www.`) - add a benchmark	2024-03-05 19:31:50 -08:00
jvoisin	111e3f2106	Reuse a Reader instead of copying to a buffer when parsing an atom feed	2024-03-04 17:36:10 -08:00
jvoisin	3339d9d3d7	Preallocate memory when exporting to OPML. This should marginally improve performance when exporting a large number of feeds to OPML.	2024-03-03 20:34:37 -08:00
jvoisin	347740dce1	Speed up removeUnlikelyCandidates. `.Not` returns a brand new Selection, copied element by element.	2024-02-29 19:38:43 -08:00
jvoisin	ab85d4d678	Improve EstimateReadingTime's speed by a factor of 7 - Refactor the tests and add some - Use 250 characters instead of the whole text - Only check for Korean, Chinese and Japanese script - Add a benchmark - Use a more idiomatic control flow. Benchmark (`go test -bench=.` in `miniflux.app/v2/internal/reader/readingtime`, goos: linux, goarch: amd64): main branch `BenchmarkEstimateReadingTime-12 267 4821268 ns/op` (PASS, ok in 1.754s); speed_up_reading_time branch (cpu: 12th Gen Intel(R) Core(TM) i7-1265U) `BenchmarkEstimateReadingTime-12 1941 653312 ns/op` (PASS, ok in 1.342s).	2024-02-29 19:24:15 -08:00
jvoisin	31ac62f410	Don't compute reading time when unused. If the user doesn't display reading times, there is no need to compute them. This should speed things up a bit, since `whatlanggo.Detect` is abysmally slow.	2024-02-29 19:14:17 -08:00
Frédéric Guillot	97765b93a9	Revert "Minor internal/reader/readability/readability.go speedup" This reverts commit `4db138d4b8`. ``` panic: runtime error: index out of range [-1] goroutine 49 [running]: miniflux.app/v2/internal/reader/readability.getArticle.func1(0x8?, 0xc000b56570) /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:120 +0x2ac github.com/PuerkitoBio/goquery.(*Selection).Each(0xc000b56510, 0xc000892fa8) /home/fred/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.9.0/iteration.go:10 +0x62 miniflux.app/v2/internal/reader/readability.getArticle(0xc00044f1f0, 0xc000a04a50) /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:101 +0x15d miniflux.app/v2/internal/reader/readability.ExtractContent({0x1005d00?, 0xc0001522d0?}) /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:91 +0x211 miniflux.app/v2/internal/reader/scraper.ScrapeWebsite(0xc000893688?, {0xc0007ce720, 0x54}, {0x0, 0x0}) /home/fred/repos/miniflux/v2/internal/reader/scraper/scraper.go:63 +0x859 miniflux.app/v2/internal/reader/processor.ProcessFeedEntries(0xc000133188, 0xc000502c40, 0xc0003e6360, 0x0) /home/fred/repos/miniflux/v2/internal/reader/processor/processor.go:77 +0x8ea miniflux.app/v2/internal/reader/handler.RefreshFeed(0xc000133188, 0x10cf, 0x52d5c, 0x0) /home/fred/repos/miniflux/v2/internal/reader/handler/handler.go:301 +0x1485 miniflux.app/v2/internal/cli.refreshFeeds.func1(0x0) /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:59 +0x2d7 created by miniflux.app/v2/internal/cli.refreshFeeds in goroutine 1 /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:50 +0x5d5 ```	2024-02-29 19:06:03 -08:00
Frédéric Guillot	c493f8921e	Add missing regex anchor detected by CodeQL	2024-02-28 20:50:17 -08:00
jvoisin	4db138d4b8	Minor internal/reader/readability/readability.go speedup - Don't use a capturing group in `divToPElementsRegexp` - Remove a duplicate condition - Replace a regex with a fixed-comparison and a `Contains`	2024-02-28 20:03:14 -08:00
jvoisin	f12d5131b0	Divide the sanitization time by 3. Instead of allocating a ~100-key map containing possibly dynamic values (at least as far as the Go compiler can tell) every time, allocate it once in a global variable. This significantly speeds things up by reducing garbage collector/allocator involvement. Local synthetic benchmarks showed an improvement from 38% of wall time to only 12%.	2024-02-28 20:00:13 -08:00
jvoisin	645a817685	Use modern for loops. Go 1.22 introduced a new [for-range](https://go.dev/ref/spec#For_range) construct (ranging over an integer) that looks a tad better than the usual `for i := 0; i < N; i++` construct. I also took the liberty of replacing some `for i := 0; i < len(myitemsarray); i++ { … myitemsarray[i] …}` with `for _, item := range myitemsarray` when `myitemsarray` contains only pointers.	2024-02-28 19:55:28 -08:00
jvoisin	543a690bfd	Close resources as soon as possible, instead of using defer() in a loop. This way, resources are freed as soon as they're no longer used, instead of waiting for the two nested loops to finish.	2024-02-28 19:47:30 -08:00
jvoisin	c4e5dad549	Remove superfluous escaping in a regex	2024-02-28 19:47:30 -08:00
jvoisin	fa12c23d79	Use strings.ReplaceAll instead of strings.Replace(…, -1)	2024-02-28 19:47:30 -08:00
jvoisin	4fe902a5d2	Use `strings.EqualFold` instead of `strings.ToLower(…) ==`	2024-02-28 19:47:30 -08:00
jvoisin	61af08a721	Use `.WriteString(…)` instead of `.Write([]byte(…))`	2024-02-28 19:47:30 -08:00
jvoisin	b04550e2f2	Use `%q` instead of `"%s"`	2024-02-28 19:47:30 -08:00
jvoisin	b94756bbf0	Add a warning for StripTags	2024-02-27 20:41:47 -08:00
jvoisin	db6ae707ef	Add some tests for add_image_title. I'm not sure if the behaviour is expected, but I didn't manage to get the content injection to work in my browser, so I guess it's alright?	2024-02-27 20:41:15 -08:00
jvoisin	06e256e5ef	Simplify internal/reader/icon/finder.go - Use a simple regex to parse data URIs instead of a hand-rolled parser, and document which fields are considered mandatory. - Use case-insensitive matching to find (fav)icons, instead of doing the same query twice with different letter cases - Add 'apple-touch-icon-precomposed.png' as a fallback favicon - Reorder the queries to have `icon` first, since it seems to be the most popular one. It used to be last, meaning that pages had to be parsed completely 4 times, instead of once now. - Minor factorisation in findIconURLsFromHTMLDocument	2024-02-26 18:18:04 -08:00
jvoisin	040938ff6d	Small refactoring of internal/reader/date/parser.go - Split date formats into those that require local times and those that don't, so that there is no need for a switch-case inside the for loop, which has around 250 iterations at most. - Be stricter about timezones: previously, invalid ones like -13 were accepted. Also add a test for this. - Bail out early if the date is an empty string.	2024-02-26 18:08:04 -08:00

97 commits
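Commit `08640b27d5` ensures enclosure URLs are always absolute. The sketch below shows one way to do that with the standard `net/url` package. It assumes relative enclosure URLs are resolved against the feed URL; the log does not say which base miniflux actually uses, and `absoluteURL` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// absoluteURL resolves ref against base using RFC 3986 reference resolution.
// Absolute references come back unchanged; relative and protocol-relative
// ones inherit the missing parts (scheme, host, directory) from base.
func absoluteURL(base, ref string) (string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	refURL, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return baseURL.ResolveReference(refURL).String(), nil
}

func main() {
	feedURL := "https://example.org/podcast/feed.xml"
	for _, enclosure := range []string{
		"episodes/1.mp3",
		"/media/2.mp3",
		"//cdn.example.net/3.mp3",
		"https://cdn.example.net/4.mp3",
	} {
		abs, err := absoluteURL(feedURL, enclosure)
		if err != nil {
			fmt.Println("skipping invalid URL:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```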
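Commit `02a074ed26` compiles the block/keep regexes once per feed rather than once per URL, tag, title and author. The sketch below illustrates the idea. `entryFilter`, `newEntryFilter` and `allow` are hypothetical names, and checking the block rule before the keep rule is an assumption. All error handling sits in the constructor, because `regexp.Compile` can fail while `MatchString` cannot.

```go
package main

import (
	"fmt"
	"regexp"
)

// entryFilter holds per-feed rules compiled a single time (hypothetical type).
type entryFilter struct {
	block *regexp.Regexp // nil when no block rule is configured
	keep  *regexp.Regexp // nil when no keep rule is configured
}

// newEntryFilter compiles both rules once; this is the only step that can fail.
func newEntryFilter(blockRule, keepRule string) (*entryFilter, error) {
	f := &entryFilter{}
	var err error
	if blockRule != "" {
		if f.block, err = regexp.Compile(blockRule); err != nil {
			return nil, fmt.Errorf("invalid block rule: %w", err)
		}
	}
	if keepRule != "" {
		if f.keep, err = regexp.Compile(keepRule); err != nil {
			return nil, fmt.Errorf("invalid keep rule: %w", err)
		}
	}
	return f, nil
}

// matchesAny reports whether re matches any field; matching never errors.
func matchesAny(re *regexp.Regexp, fields ...string) bool {
	for _, field := range fields {
		if re.MatchString(field) {
			return true
		}
	}
	return false
}

// allow reuses the same compiled regexps for the URL, title, author and tags.
func (f *entryFilter) allow(url, title, author string, tags []string) bool {
	fields := append([]string{url, title, author}, tags...)
	if f.block != nil && matchesAny(f.block, fields...) {
		return false
	}
	if f.keep != nil && !matchesAny(f.keep, fields...) {
		return false
	}
	return true
}

func main() {
	f, err := newEntryFilter(`(?i)sponsored`, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.allow("https://example.org/a", "Sponsored post", "bob", nil))          // false
	fmt.Println(f.allow("https://example.org/b", "Release notes", "bob", []string{"go"})) // true
}
```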
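Commit `3d0126be0b` simplifies `isValidIframeSource` by extracting the hostname and comparing it directly, instead of testing many URL prefixes. A sketch of that approach follows. The allow-list and the `iframeHostAllowed` name are hypothetical and do not reproduce miniflux's actual implementation or host list.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// allowedIframeHosts is a hypothetical allow-list for this sketch.
var allowedIframeHosts = map[string]bool{
	"youtube.com":      true,
	"player.vimeo.com": true,
}

// iframeHostAllowed parses src once, lowercases the hostname, strips an
// optional "www." and looks the result up. This covers "//", "http://" and
// "https://" forms, with or without "www.", in a single comparison.
func iframeHostAllowed(src string) bool {
	u, err := url.Parse(src)
	if err != nil {
		return false
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	return allowedIframeHosts[host]
}

func main() {
	for _, src := range []string{
		"//www.youtube.com/embed/abc",
		"https://youtube.com.evil.example/embed/abc",
	} {
		fmt.Println(src, iframeHostAllowed(src))
	}
}
```

Comparing the parsed hostname also rejects look-alike hosts such as `youtube.com.evil.example`, which a careless prefix check could let through.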
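Commit `3339d9d3d7` preallocates memory when exporting to OPML. When the number of output elements is known up front, giving the slice that capacity means `append` never needs to grow and copy the backing array. In this sketch the `feed` and `outline` types and `toOutlines` are hypothetical stand-ins, not miniflux's OPML types.

```go
package main

import "fmt"

// feed and outline are hypothetical stand-ins for a stored feed and an OPML <outline>.
type feed struct {
	Title   string
	FeedURL string
}

type outline struct {
	Text   string
	XMLURL string
}

// toOutlines allocates the backing array once, with room for every feed.
func toOutlines(feeds []feed) []outline {
	outlines := make([]outline, 0, len(feeds))
	for _, f := range feeds {
		outlines = append(outlines, outline{Text: f.Title, XMLURL: f.FeedURL})
	}
	return outlines
}

func main() {
	out := toOutlines([]feed{
		{Title: "Go Blog", FeedURL: "https://go.dev/blog/feed.atom"},
		{Title: "Example", FeedURL: "https://example.org/rss"},
	})
	fmt.Println(len(out), cap(out)) // 2 2
}
```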
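Commit `f12d5131b0` moves the sanitizer's ~100-key map into a global variable so it is built once rather than on every call. The sketch below shows the pattern (Go 1.21+ for `slices`). The tag/attribute entries are a small hypothetical subset, not miniflux's real allow-list, and `isAllowedAttribute` is an illustrative name.

```go
package main

import (
	"fmt"
	"slices"
)

// allowedAttributes is initialized once, when the package is loaded.
// Declaring the same literal inside isAllowedAttribute would build a fresh
// map on every call and add allocator and garbage collector work.
var allowedAttributes = map[string][]string{
	"a":      {"href", "title"},
	"img":    {"alt", "height", "src", "title", "width"},
	"iframe": {"allowfullscreen", "height", "src", "width"},
}

// isAllowedAttribute reports whether attr may be kept on tag. Unknown tags
// yield a nil slice, for which slices.Contains returns false.
func isAllowedAttribute(tag, attr string) bool {
	return slices.Contains(allowedAttributes[tag], attr)
}

func main() {
	fmt.Println(isAllowedAttribute("img", "src"))     // true
	fmt.Println(isAllowedAttribute("img", "onerror")) // false
}
```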
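Commit `543a690bfd` closes resources explicitly instead of using `defer` in a loop. Deferred calls run when the surrounding function returns, not at the end of each iteration. With `defer rc.Close()` in the loop, every resource would therefore stay open until the function exits. The sketch uses a hypothetical `tracker` to count simultaneously open readers and show that closing right after use keeps that count at one.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// tracker counts how many readers are open at once (hypothetical helper).
type tracker struct{ current, peak int }

// trackedReader decrements the tracker's open count when closed.
type trackedReader struct {
	io.Reader
	t *tracker
}

func (r *trackedReader) Close() error {
	r.t.current--
	return nil
}

// acquire "opens" a resource and records the new number of open readers.
func (t *tracker) acquire(s string) io.ReadCloser {
	t.current++
	if t.current > t.peak {
		t.peak = t.current
	}
	return &trackedReader{Reader: strings.NewReader(s), t: t}
}

// totalLength reads every input, closing each reader as soon as it has been
// consumed rather than deferring the Close calls to function exit.
func totalLength(t *tracker, inputs []string) (int, error) {
	total := 0
	for _, in := range inputs {
		rc := t.acquire(in)
		data, err := io.ReadAll(rc)
		rc.Close() // release now, not when totalLength returns
		if err != nil {
			return 0, err
		}
		total += len(data)
	}
	return total, nil
}

func main() {
	t := &tracker{}
	n, err := totalLength(t, []string{"ab", "cde", "f"})
	fmt.Println(n, err, t.peak) // 6 <nil> 1
}
```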
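Commits `fa12c23d79`, `4fe902a5d2`, `61af08a721` and `b04550e2f2` apply four small standard-library idioms. The hypothetical `renderTags` helper below exercises all of them in one place: `strings.ReplaceAll`, `strings.EqualFold`, `WriteString` on a `strings.Builder`, and the `%q` verb. It illustrates the idioms only; it is not code from miniflux.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTags skips tags equal to skip (ignoring case), replaces spaces with
// dashes, and writes each remaining tag, quoted with %q, into a Builder.
func renderTags(tags []string, skip string) string {
	var sb strings.Builder
	for _, tag := range tags {
		// EqualFold instead of strings.ToLower(tag) == strings.ToLower(skip):
		// no lowercased copies are allocated.
		if strings.EqualFold(tag, skip) {
			continue
		}
		if sb.Len() > 0 {
			sb.WriteString(", ") // instead of sb.Write([]byte(", "))
		}
		// ReplaceAll(s, old, new) is shorthand for Replace(s, old, new, -1).
		slug := strings.ReplaceAll(tag, " ", "-")
		// %q escapes embedded quotes; "\"%s\"" would emit them unescaped.
		fmt.Fprintf(&sb, "%q", slug)
	}
	return sb.String()
}

func main() {
	fmt.Println(renderTags([]string{"Go News", "ads", `say "hi"`}, "ADS")) // "Go-News", "say-\"hi\""
}
```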