- Inline some one-line functions
- Transform a free-standing function into a method
- Massively simplify `removeClickbait`
- Use a proper constant instead of a magic number in `applyFuncOnTextContent`
There is no need to compile the regular expressions once for matching on the
URL, once per tag, once per title, once per author, … compiling them a single
time is enough. It also simplifies error handling: while regexp compilation
can fail, matching can't.
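A minimal sketch of the idea, with a hypothetical `entry` type and
`matchesRule` helper standing in for the real code:

```go
package filter

import "regexp"

// entry is a hypothetical stand-in for the real feed entry type.
type entry struct {
	URL, Title, Author string
	Tags               []string
}

// matchesRule compiles the pattern a single time and reuses it for every
// field, instead of recompiling once per URL, per tag, per title, per author.
// Compilation is the only step that can fail; matching cannot.
func matchesRule(pattern string, e entry) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	if re.MatchString(e.URL) || re.MatchString(e.Title) || re.MatchString(e.Author) {
		return true, nil
	}
	for _, tag := range e.Tags {
		if re.MatchString(tag) {
			return true, nil
		}
	}
	return false, nil
}
```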
There is no need to detect the format and then the version when both can be
done at the same time.
Add a benchmark as well, on large and small Atom and RSS files.
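For illustration, a rough sketch of deriving format and version from the root
element in a single pass; the package name, return conventions and element
handling here are assumptions, not the project's actual detector:

```go
package rss

import (
	"encoding/xml"
	"io"
	"strings"
)

// detectFormatAndVersion reads tokens until the first start element and
// derives both the format and the version from it in one pass.
func detectFormatAndVersion(r io.Reader) (format, version string) {
	decoder := xml.NewDecoder(r)
	for {
		token, err := decoder.Token()
		if err != nil {
			return "unknown", ""
		}
		element, ok := token.(xml.StartElement)
		if !ok {
			continue
		}
		switch strings.ToLower(element.Name.Local) {
		case "rss":
			for _, attr := range element.Attr {
				if attr.Name.Local == "version" {
					return "rss", attr.Value
				}
			}
			return "rss", "2.0"
		case "rdf":
			return "rss", "1.0"
		case "feed":
			return "atom", "1.0"
		default:
			return "unknown", ""
		}
	}
}
```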
This makes it possible to use `func (*Reader) Seek` instead of re-creating a
new reader. It's a large commit for a small change, but anything that
simplifies the reader/buffer/ReadAll/… mess is a step in the right direction
I think, and it should enable more follow-up simplifications.
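As a sketch of the intent (the buffer handling shown here is illustrative,
not the actual reader code):

```go
package parser

import (
	"bytes"
	"io"
)

// peekThenReuse reads the beginning of the document (for example to sniff the
// character set), then rewinds with Seek instead of building a new reader or
// calling ReadAll a second time.
func peekThenReuse(data []byte) ([]byte, error) {
	reader := bytes.NewReader(data)

	header := make([]byte, 1024)
	if _, err := reader.Read(header); err != nil && err != io.EOF {
		return nil, err
	}

	// *bytes.Reader implements io.Seeker, so the same reader can be handed
	// to the next stage from offset zero.
	if _, err := reader.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return io.ReadAll(reader)
}
```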
- allow YouTube URLs to start with `www`
- use `strings.Builder` instead of a `bytes.Buffer`
- use a `strings.NewReader` instead of a `bytes.NewBufferString`
- sprinkle a couple of `continue` statements to make the code flow more obvious
- inline calls to `inList`, and put their parameters in the right order
- simplify `isPixelTracker`
- simplify `isValidIframeSource`, by extracting the hostname and comparing it
directly, instead of using the full URL and checking whether it starts with
multiple variations of the same prefix (`//`, `http://`, `https://`, each with
and without `www.`); see the sketch after this list
- add a benchmark
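A minimal sketch of the hostname-based iframe check; the set of allowed hosts
below is illustrative:

```go
package sanitizer

import "net/url"

// Illustrative allow-list, keyed by hostname.
var allowedIframeHosts = map[string]bool{
	"www.youtube.com":     true,
	"player.vimeo.com":    true,
	"www.dailymotion.com": true,
}

// isValidIframeSource compares the parsed hostname directly, instead of
// testing the raw URL against every scheme/www prefix combination.
func isValidIframeSource(iframeSourceURL string) bool {
	parsedURL, err := url.Parse(iframeSourceURL)
	if err != nil {
		return false
	}
	return allowedIframeHosts[parsedURL.Hostname()]
}
```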
- Refactor the tests and add new ones
- Use only the first 250 characters instead of the whole text
- Only check for the Korean, Chinese and Japanese scripts (a sketch of the
idea follows the benchmark output below)
- Add a benchmark
- Use a more idiomatic control flow
```console
$ # main branch
$ go test -bench=.
goos: linux
goarch: amd64
pkg: miniflux.app/v2/internal/reader/readingtime
BenchmarkEstimateReadingTime-12 267 4821268 ns/op
PASS
ok miniflux.app/v2/internal/reader/readingtime 1.754s
$ # speed_up_reading_time branch
$ go test -bench=.
goos: linux
goarch: amd64
pkg: miniflux.app/v2/internal/reader/readingtime
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
BenchmarkEstimateReadingTime-12 1941 653312 ns/op
PASS
ok miniflux.app/v2/internal/reader/readingtime 1.342s
$
```
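For illustration, a minimal sketch of the script check on a 250-character
prefix, using only the standard library (the actual implementation builds on
`whatlanggo`, so treat this purely as a sketch of the idea):

```go
package readingtime

import "unicode"

// cjkRanges covers the Korean, Chinese and Japanese scripts.
var cjkRanges = []*unicode.RangeTable{
	unicode.Han,      // Chinese characters (also used in Japanese)
	unicode.Hangul,   // Korean
	unicode.Hiragana, // Japanese
	unicode.Katakana, // Japanese
}

// usesCJKScript inspects at most the first 250 runes of the text.
func usesCJKScript(text string) bool {
	inspected := 0
	for _, r := range text {
		if inspected >= 250 {
			return false
		}
		inspected++
		for _, table := range cjkRanges {
			if unicode.Is(table, r) {
				return true
			}
		}
	}
	return false
}
```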
If the user doesn't display reading times, there is no need to compute them.
This should speed things up a bit, since `whatlanggo.Detect` is abysmally slow.
Instead of repeatedly allocating a map of roughly 100 keys whose values may be
dynamic (at least as far as the Go compiler is concerned), allocate it once in
a global variable. This significantly speeds things up by reducing garbage
collector and allocator involvement.
Local synthetic benchmarks have shown an improvement from 38% of wall time
down to only 12%.
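A sketch of the pattern; the map name and its contents are placeholders, not
the project's actual table:

```go
package urlcleaner

// Allocated once at program start; the garbage collector no longer has to
// build and discard a map of roughly 100 entries on every call.
var trackingParams = map[string]bool{
	"utm_source":   true,
	"utm_medium":   true,
	"utm_campaign": true,
	// … roughly a hundred more entries in the real table …
}

func isTrackingParam(name string) bool {
	return trackingParams[name]
}
```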
Go 1.22 introduced a new [for-range](https://go.dev/ref/spec#For_range)
construct that looks a tad better than the usual `for i := 0; i < N; i++`
construct. I also took the liberty of replacing some
`for i := 0; i < len(myitemsarray); i++ { … myitemsarray[i] … }` loops
with `for _, item := range myitemsarray` when `myitemsarray` contains only
pointers.
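A small, self-contained illustration of both forms (an assumed example, not
project code):

```go
package main

import "fmt"

func main() {
	// Go 1.22 range-over-int replaces the classic three-clause counter loop.
	for i := range 3 {
		fmt.Println("iteration", i) // prints 0, 1, 2
	}

	// Ranging directly over a slice of pointers instead of indexing it.
	items := []*string{ptr("a"), ptr("b"), ptr("c")}
	for _, item := range items {
		fmt.Println(*item)
	}
}

func ptr(s string) *string { return &s }
```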
- Use a simple regex to parse data URIs instead of a hand-rolled parser, and
document which fields are considered mandatory (a sketch follows this list).
- Use case-insensitive matching to find (fav)icons, instead of doing the same
query twice with different letter cases
- Add 'apple-touch-icon-precomposed.png' as a fallback favicon
- Reorder the queries to have `icon` first, since it seems to be the most
popular one. It used to be last, meaning that pages had to be parsed
completely 4 times, instead of once now.
- Minor factorisation in `findIconURLsFromHTMLDocument`
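A possible shape for such a data URI regex, shown as a sketch only; treating
the mediatype and the data as mandatory is an assumption of this example, not
necessarily the project's exact rules:

```go
package icon

import (
	"fmt"
	"regexp"
)

// Mediatype and data are mandatory here; parameters (including ";base64")
// remain optional.
var dataURIRegexp = regexp.MustCompile(`^data:([^;,]+)((?:;[^;,]+)*),(.+)$`)

func parseDataURI(uri string) (mediaType, params, data string, err error) {
	matches := dataURIRegexp.FindStringSubmatch(uri)
	if matches == nil {
		return "", "", "", fmt.Errorf("icon: invalid data URI %q", uri)
	}
	return matches[1], matches[2], matches[3], nil
}
```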
- Split the date formats into those that require local time
and those that don't, so that there is no need to have a switch-case in the
for loop, which runs around 250 iterations at most (sketched after this list).
- Be stricter about timezones: previously invalid offsets like -13 were
accepted. Also add a test for this.
- Bail out early if the date is an empty string.
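A condensed sketch of the resulting structure; the format lists are heavily
abbreviated, and the -12h…+14h bounds used for the stricter timezone check are
an assumption of this sketch:

```go
package date

import (
	"errors"
	"time"
)

// Formats are split up front, so the parsing loop no longer needs a
// per-iteration switch; both lists are heavily abbreviated here.
var (
	absoluteDateFormats = []string{time.RFC1123Z, time.RFC3339}
	localDateFormats    = []string{"2006-01-02 15:04:05", "January 2, 2006"}
)

func parseDate(value string) (time.Time, error) {
	if value == "" {
		return time.Time{}, errors.New("date: empty value")
	}

	for _, layout := range absoluteDateFormats {
		if parsed, err := time.Parse(layout, value); err == nil {
			if err := validateTimezone(parsed); err != nil {
				return time.Time{}, err
			}
			return parsed, nil
		}
	}
	for _, layout := range localDateFormats {
		if parsed, err := time.ParseInLocation(layout, value, time.Local); err == nil {
			return parsed, nil
		}
	}
	return time.Time{}, errors.New("date: unsupported format")
}

// validateTimezone rejects offsets outside the real-world -12h…+14h window,
// which is how an offset like -13 gets refused.
func validateTimezone(parsed time.Time) error {
	_, offset := parsed.Zone()
	if offset < -12*3600 || offset > 14*3600 {
		return errors.New("date: invalid timezone offset")
	}
	return nil
}
```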
- make `findContentUsingCustomRules` more idiomatic:
in Go, a function that returns an error may return
garbage in its other return values. Moreover, ignoring
errors is bad practice.
- `getPredefinedScraperRules` now runs in constant time,
instead of iterating over a list of around 50 items
(see the sketch after this list).
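A minimal sketch of the constant-time lookup; the domains and rules below are
placeholders:

```go
package scraper

import (
	"net/url"
	"strings"
)

// Keyed by domain; the real table holds around 50 entries.
var predefinedScraperRules = map[string]string{
	"example.org": "article, div.content",
	"example.com": "div#main",
}

// getPredefinedScraperRules is a single map lookup instead of a loop over
// every entry.
func getPredefinedScraperRules(websiteURL string) string {
	parsedURL, err := url.Parse(websiteURL)
	if err != nil {
		return ""
	}
	domain := strings.TrimPrefix(parsedURL.Hostname(), "www.")
	return predefinedScraperRules[domain]
}
```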
- Surface `localizedError` in `FindSubscriptionsFromWellKnownURLs` via `slog`
- Use an inline declaration for new subscriptions, as is done elsewhere in the
file, if only for consistency's sake
- Preallocate the `subscriptions` slice when using an RSS-Bridge;
it's good practice, and it might even marginally improve
performance when adding __a lot__ of feeds via an RSS-Bridge instance, wooo!
(A minimal sketch follows this list.)
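A minimal sketch of the preallocation; the types stand in for the real ones:

```go
package subscription

// Hypothetical types for this sketch.
type Subscription struct {
	Title string
	URL   string
}

type bridgeItem struct {
	Title string
	URL   string
}

// fromBridgeResponse sizes the slice once, so append never has to grow and
// copy the backing array, however many feeds the RSS-Bridge instance returns.
func fromBridgeResponse(items []bridgeItem) []*Subscription {
	subscriptions := make([]*Subscription, 0, len(items))
	for _, item := range items {
		subscriptions = append(subscriptions, &Subscription{Title: item.Title, URL: item.URL})
	}
	return subscriptions
}
```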
- Use constant-time map access instead of iterating over the maps
- Build a rather large whitelist map as a single literal instead of
constructing it item by item (and remove a duplicate key/value pair)
- Use the `slices` package instead of hand-rolled loops (see the sketch below)
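For example, a membership test with the standard `slices` package (Go 1.21+);
the list of schemes is illustrative:

```go
package sanitizer

import "slices"

var allowedSchemes = []string{"http", "https", "mailto"}

// isAllowedScheme replaces a hand-rolled loop with slices.Contains.
func isAllowedScheme(scheme string) bool {
	return slices.Contains(allowedSchemes, scheme)
}
```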