Commit graph

16 commits

Author SHA1 Message Date
Ztec
e54825bf02 Improve YouTube page feed detection
In order to be more resilient to YouTube URLs variation and
to address this feature_request: https://github.com/miniflux/v2/issues/2628
I've reworked a bit the way the YouTube feed extraction is done.

I've kept all the `FindSubscriptionsFromYouTube*` in order
to keep all the existing unit tests as-is ensuring little to no
regressions. By doing so, I had to call twice `youtubeURLIDExtractor`.
Small performance penalty for peace of mind in my opinion.

`youtubeURLIDExtractor` is made in a way only one kind
of page can be detected at a time. This mean I can
solve the "video in a playlist" feature_request
by prioritizing the playlist ID over the Video ID

Also, by using `url.Parse()` to get ids, it's safer
to url mangle and variation. The most common variation
being the `t=42` parameters that start the playback
at a given position. Previously, this kind of url
would not be detected as "YouTube URL".

I deliberately ignored the url parsing error
to keep previous behavior (skip the YouTube analysis and follow with the other analysis)

I also tried to keep debug logs the same as before as much as I could.

I manually tested all the YouTube cases (video,channel,playlist)
and they all work as expected except for the video. But this one
does not work either on main. The `meta` html tag that was searched for
does not seem to exist anymore.

fix: #2628
2024-06-13 20:18:47 -07:00
Evan Elias Young
1b8c45d162 finder: Find feed from YouTube playlist
The feed from a YouTube playlist page is derived in practically the same way as a feed from a YouTube channel page.
2024-04-01 21:16:32 -07:00
jvoisin
fc4bdf3ab0 Inline a one-liner function
No need to expose a symbol for this.
2024-03-20 17:21:30 -07:00
jvoisin
45d486b919 When detecting the format, detect its version as well
There is no need to detect the format and then the version when both can be
done at the same time.

Add a benchmark as well, on large and small atom and rss files.
2024-03-12 18:56:56 -07:00
Frédéric Guillot
c493f8921e Add missing regex anchor detected by CodeQL 2024-02-28 20:50:17 -08:00
jvoisin
543a690bfd Close resources as soon as possible, instead of using defer() in a loop
So that resources can be freed as soon as they're not used anymore, instead of
waiting for the two nested loops to finish.
2024-02-28 19:47:30 -08:00
jvoisin
b04550e2f2 Use %q instead of "%s" 2024-02-28 19:47:30 -08:00
jvoisin
5b2558bf92 Miscellaneous improvements to internal/reader/subscription/finder.go
- Surface `localizedError` in FindSubscriptionsFromWellKnownURLs via slog
- Use an inline declaration for new subscriptions, like done elsewhere in the
  file, if only for consistency's sake
- Preallocate the `subscriptions` slice when using an RSS-bridge,
  it's a good practise, and it might even marginally improve
  performances when adding __a lot__ of feeds via an rss-bridge instance, wooo!
2024-02-26 17:52:21 -08:00
jvoisin
ecd59009fb Add a couple of new possible locations for feeds
- Hugo likes to generate index.xml
- feed.atom and feed.rss are used by enterprise-scale/old-school gigantic CMS
2024-02-26 17:43:51 -08:00
Frédéric Guillot
d0f99cee1a Regression: ensure all HTML documents are encoded in UTF-8
Fixes #2196
2023-12-01 16:52:03 -08:00
Frédéric Guillot
5de0714256 Deduplicate feed URLs when parsing HTML document during discovery process
Fixes #2232
2023-12-01 13:57:05 -08:00
Frédéric Guillot
eeaab72a9f Refactor feed discovery and avoid an extra HTTP request if the url provided is the feed 2023-10-22 18:05:37 -07:00
Frédéric Guillot
14e25ab9fe Refactor HTTP Client and LocalizedError packages 2023-10-22 13:09:30 -07:00
Ryan Stafford
120aabfbce
Add RSS-Bridge integration 2023-10-22 11:10:56 -07:00
Frédéric Guillot
e5d9f2f5a0 Rename internal url package to avoid overlap with net/url 2023-08-13 19:57:04 -07:00
Frédéric Guillot
168a870c02 Move internal packages to an internal folder
For reference: https://go.dev/doc/go1.4#internalpackages
2023-08-10 20:29:34 -07:00