Commit graph

56 commits

Author SHA1 Message Date
jvoisin
c4e5dad549 Remove superfluous escaping in a regex 2024-02-28 19:47:30 -08:00
jvoisin
fa12c23d79 Use strings.ReplaceAll instead of strings.Replace(…, -1) 2024-02-28 19:47:30 -08:00
jvoisin
4fe902a5d2 Use strings.EqualFold instead of strings.ToLower(…) == 2024-02-28 19:47:30 -08:00
jvoisin
61af08a721 Use .WriteString( instead of .Write([]byte(… 2024-02-28 19:47:30 -08:00
jvoisin
b04550e2f2 Use %q instead of "%s" 2024-02-28 19:47:30 -08:00
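The four idiom clean-ups above (fa12c23d79, 4fe902a5d2, 61af08a721, b04550e2f2) can be illustrated with a minimal sketch. The helper names below are hypothetical and do not come from the repository; only the standard-library calls are real. Note that `%q` is not a drop-in replacement for `"%s"` in every case, since it also escapes quotes and control characters inside the value.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceSpaces uses strings.ReplaceAll, which is equivalent to
// strings.Replace(s, " ", "-", -1) but states the intent directly.
func replaceSpaces(s string) string {
	return strings.ReplaceAll(s, " ", "-")
}

// sameFold compares two strings case-insensitively without building
// lower-cased copies first.
func sameFold(a, b string) bool {
	return strings.EqualFold(a, b)
}

// build writes strings straight into the builder, avoiding the
// []byte(...) conversion that Write would need.
func build(parts ...string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

// quote uses %q, which adds the surrounding quotes and escapes
// any quote characters inside the value.
func quote(s string) string {
	return fmt.Sprintf("%q", s)
}

func main() {
	fmt.Println(replaceSpaces("a b c"))
	fmt.Println(sameFold("Content-Type", "content-type"))
	fmt.Println(build("foo", "bar"))
	fmt.Println(quote(`say "hi"`))
}
```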
jvoisin
b94756bbf0 Add a warning for StripTags 2024-02-27 20:41:47 -08:00
jvoisin
db6ae707ef Add some tests for add_image_title
I'm not sure if the behaviour is expected, but I didn't manage to
get the content injection to work in my browser, so I guess it's alright?
2024-02-27 20:41:15 -08:00
jvoisin
06e256e5ef Simplify internal/reader/icon/finder.go
- Use a simple regex to parse data uri instead of a hand-rolled parser, and
  document what fields are considered mandatory.
- Use case-insensitive matching to find (fav)icons, instead of doing the same
  query twice with different letter cases
- Add 'apple-touch-icon-precomposed.png' as a fallback favicon
- Reorder the queries to have `icon` first, since it seems to be the most
  popular one. It used to be last, meaning that pages had to be parsed
  completely 4 times, instead of once now.
- Minor factorisation in findIconURLsFromHTMLDocument
2024-02-26 18:18:04 -08:00
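As a rough illustration of the first bullet of 06e256e5ef, here is a sketch of parsing a data URI with a single regex. The pattern and the `parseDataURI` helper are assumptions made for this example, not the code in finder.go. In this sketch only the `data:` prefix and the comma separator are mandatory, which may differ from what the commit documents.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dataURI is a hypothetical pattern, not the one used in finder.go.
// Groups: 1 = media type, 2 = key=value parameters, 3 = ";base64", 4 = payload.
// Only the "data:" scheme and the comma are mandatory here.
var dataURI = regexp.MustCompile(
	`(?is)^data:([a-z0-9.+-]+/[a-z0-9.+-]+)?((?:;[a-z0-9-]+=[^;,]*)*)(;base64)?,(.*)$`)

// parseDataURI returns the lower-cased media type, whether the payload is
// base64-encoded, and the raw payload. ok is false if s is not a data URI.
func parseDataURI(s string) (mediaType string, isBase64 bool, payload string, ok bool) {
	m := dataURI.FindStringSubmatch(s)
	if m == nil {
		return "", false, "", false
	}
	return strings.ToLower(m[1]), m[3] != "", m[4], true
}

func main() {
	mt, b64, payload, ok := parseDataURI("data:image/png;base64,iVBORw0KGgo=")
	fmt.Println(mt, b64, payload, ok)
}
```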
jvoisin
040938ff6d Small refactoring of internal/reader/date/parser.go
- Split date formats into those that require local times
  and those that don't, so that there is no need to have a switch-case in the
  for loop with around 250 iterations at most.
- Be stricter about timezones: previously, invalid ones like -13
  were accepted. Also add a test for this.
- Bail out early if the date is an empty string.
2024-02-26 18:08:04 -08:00
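The three points of 040938ff6d can be sketched as follows. This is a simplified illustration, not the code in parser.go: the format lists are shortened, the function names are hypothetical, and the accepted offset range (UTC-12:00 to UTC+14:00) is an assumption about what "strict" means here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Hypothetical, heavily shortened format lists. Formats carrying their own
// zone are kept apart from those that must be read in the local zone,
// so the loops need no per-format switch.
var (
	formatsWithZone = []string{time.RFC3339, time.RFC1123Z}
	formatsLocal    = []string{"2006-01-02 15:04:05", "2006-01-02"}
)

// checkOffset rejects offsets outside UTC-12:00..UTC+14:00 (assumed range).
func checkOffset(t time.Time) (time.Time, error) {
	_, offset := t.Zone()
	if offset < -12*3600 || offset > 14*3600 {
		return time.Time{}, fmt.Errorf("invalid timezone offset: %d seconds", offset)
	}
	return t, nil
}

// parseDate bails out early on empty input, then tries each format list.
func parseDate(raw string) (time.Time, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return time.Time{}, errors.New("empty date")
	}
	for _, layout := range formatsWithZone {
		if t, err := time.Parse(layout, raw); err == nil {
			return checkOffset(t)
		}
	}
	for _, layout := range formatsLocal {
		if t, err := time.ParseInLocation(layout, raw, time.Local); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unsupported date format: %q", raw)
}

func main() {
	_, err := parseDate("2024-02-26T18:08:04-13:00")
	fmt.Println(err)
}
```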
jvoisin
c2d2f31438 Improve internal/reader/scraper/scraper.go a bit
- Make `findContentUsingCustomRules` more idiomatic,
  since in Go a function returning an error might
  return garbage in its other return values. Moreover, ignoring
  errors is bad practice.
- getPredefinedScraperRules now runs in constant time,
  instead of iterating over a list with around 50 items in it.
2024-02-26 18:00:23 -08:00
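A minimal sketch of both points in c2d2f31438, assuming the predefined rules can be keyed by host name. The rule table, `getPredefinedRule`, `applyRule` and `findContent` are hypothetical stand-ins, not the functions in scraper.go.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// predefinedRules is a hypothetical rule table keyed by host name,
// so a lookup is a single map access instead of a scan over a list.
var predefinedRules = map[string]string{
	"example.org": "article.post",
	"example.com": "div#content",
}

// getPredefinedRule normalises the host and looks it up in the map.
func getPredefinedRule(host string) (string, bool) {
	host = strings.TrimPrefix(strings.ToLower(host), "www.")
	rule, ok := predefinedRules[host]
	return rule, ok
}

// applyRule is a stand-in for the real extraction step.
func applyRule(html, rule string) (string, error) {
	if strings.TrimSpace(html) == "" {
		return "", errors.New("empty document")
	}
	return fmt.Sprintf("content selected by %q", rule), nil
}

// findContent checks every error before touching the other return value.
func findContent(host, html string) (string, error) {
	rule, ok := getPredefinedRule(host)
	if !ok {
		return "", fmt.Errorf("no predefined rule for %q", host)
	}
	content, err := applyRule(html, rule)
	if err != nil {
		// Return the zero value alongside the error; content is not trusted.
		return "", fmt.Errorf("applying rule %q: %w", rule, err)
	}
	return content, nil
}

func main() {
	fmt.Println(findContent("www.example.org", "<html>...</html>"))
}
```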
jvoisin
5b2558bf92 Miscellaneous improvements to internal/reader/subscription/finder.go
- Surface `localizedError` in FindSubscriptionsFromWellKnownURLs via slog
- Use an inline declaration for new subscriptions, like done elsewhere in the
  file, if only for consistency's sake
- Preallocate the `subscriptions` slice when using an RSS-bridge,
  it's good practice, and it might even marginally improve
  performance when adding __a lot__ of feeds via an rss-bridge instance, wooo!
2024-02-26 17:52:21 -08:00
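The preallocation and inline-declaration points of 5b2558bf92 might look like the sketch below. The `subscription` and `bridgeFeed` types and the `toSubscriptions` helper are hypothetical, not the types used in finder.go.

```go
package main

import "fmt"

// subscription and bridgeFeed are hypothetical types for this example.
type subscription struct {
	Title string
	URL   string
}

type bridgeFeed struct {
	Name string
	URL  string
}

// toSubscriptions preallocates the result with the final capacity, so
// append never has to grow the backing array.
func toSubscriptions(feeds []bridgeFeed) []subscription {
	subs := make([]subscription, 0, len(feeds))
	for _, f := range feeds {
		// Inline declaration: build the value directly in the append call.
		subs = append(subs, subscription{Title: f.Name, URL: f.URL})
	}
	return subs
}

func main() {
	subs := toSubscriptions([]bridgeFeed{
		{Name: "A", URL: "https://example.org/a.xml"},
		{Name: "B", URL: "https://example.org/b.xml"},
	})
	fmt.Println(len(subs), cap(subs))
}
```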
jvoisin
ecd59009fb Add a couple of new possible locations for feeds
- Hugo likes to generate index.xml
- feed.atom and feed.rss are used by enterprise-scale/old-school gigantic CMS
2024-02-26 17:43:51 -08:00
jvoisin
4a943b722d Add a couple of fuzzers 2024-02-26 17:23:49 -08:00
jvoisin
54b5be5e7d Significantly simplify/speed up the sanitizer
- Use constant time access for maps instead of iterating on them
- Build a ~large whitelist map inline instead of constructing it item by item
  (and remove a duplicate key/value pair)
- Use `slices` instead of hand-rolled loops
2024-02-25 17:29:46 -08:00
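A small sketch of the sanitizer approach described in 54b5be5e7d: an allow-list declared inline as a map literal, looked up by key, with `slices.Contains` (Go 1.21 or later) replacing a hand-rolled loop. The table contents and the `isAllowed` helper are illustrative assumptions, not the sanitizer's actual list.

```go
package main

import (
	"fmt"
	"slices"
)

// allowedAttributes is a hypothetical, much smaller allow-list,
// declared inline rather than filled item by item.
var allowedAttributes = map[string][]string{
	"a":   {"href", "title"},
	"img": {"alt", "src", "title"},
	"p":   {},
}

// isAllowed looks the tag up directly in the map instead of iterating
// over it, then checks the attribute with slices.Contains.
func isAllowed(tag, attr string) bool {
	attrs, ok := allowedAttributes[tag]
	return ok && slices.Contains(attrs, attr)
}

func main() {
	fmt.Println(isAllowed("img", "src"))
	fmt.Println(isAllowed("img", "onerror"))
	fmt.Println(isAllowed("script", "src"))
}
```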
Frédéric Guillot
eae4cb1417 Add feed option to disable HTTP/2 to avoid fingerprinting 2024-02-24 22:30:26 -08:00
jvoisin
b48ad6dbfb Make use of go≥1.21 slices package instead of hand-rolled loops
This makes the code a tad smaller, more modern,
and maybe even marginally faster, yay!
2024-02-24 20:22:53 -08:00
jvoisin
c544dadd55 Fix categories import from Thunderbird's OPML
Thunderbird OPML exports look like this:

```xml
<opml version="1.0" xmlns:fz="urn:forumzilla:">
<head>
	<title>Thunderbird OPML Export - RSS</title>
	<dateCreated>Sat, 24 Feb 2024 11:31:13 GMT</dateCreated>
</head>
<body>
	<outline title="News">
		<outline type="rss" ...>
		<outline type="rss" ...>
		...
	</outline>
	<outline title="Blogs">
		<outline type="rss" ...>
		<outline type="rss" ...>
		...
	</outline>
</body>
```

This commit makes it so that categories are now correctly imported.
2024-02-24 19:43:33 -08:00
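To show how nested outlines like the ones in c544dadd55 could be mapped to categories, here is a sketch using `encoding/xml`. The struct layout and the `feedsByCategory` helper are assumptions for this example and not the importer's actual code. The parent outline's `title` is used as the category name, falling back to `text` when `title` is missing.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// outline is a hypothetical, recursive mapping of an OPML <outline> element.
type outline struct {
	Title    string    `xml:"title,attr"`
	Text     string    `xml:"text,attr"`
	Type     string    `xml:"type,attr"`
	XMLURL   string    `xml:"xmlUrl,attr"`
	Children []outline `xml:"outline"`
}

type opml struct {
	Outlines []outline `xml:"body>outline"`
}

const sample = `<opml version="1.0">
<body>
  <outline title="News">
    <outline type="rss" title="A" xmlUrl="https://example.org/a.xml"/>
  </outline>
  <outline title="Blogs">
    <outline type="rss" title="B" xmlUrl="https://example.org/b.xml"/>
    <outline type="rss" title="C" xmlUrl="https://example.org/c.xml"/>
  </outline>
</body>
</opml>`

// feedsByCategory groups feed URLs by the name of their parent outline.
// Top-level feeds are filed under the empty category name.
func feedsByCategory(data string) (map[string][]string, error) {
	var doc opml
	if err := xml.Unmarshal([]byte(data), &doc); err != nil {
		return nil, err
	}
	result := make(map[string][]string)
	for _, top := range doc.Outlines {
		if top.XMLURL != "" {
			result[""] = append(result[""], top.XMLURL)
			continue
		}
		name := top.Title
		if name == "" {
			name = top.Text
		}
		for _, child := range top.Children {
			if child.XMLURL != "" {
				result[name] = append(result[name], child.XMLURL)
			}
		}
	}
	return result, nil
}

func main() {
	feeds, err := feedsByCategory(sample)
	fmt.Println(feeds, err)
}
```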
Frédéric Guillot
c595c80356 Handle RDF feeds with duplicated <title> elements 2024-02-23 17:40:58 -08:00
Matt Stobo
4a50ca9122 Allow filtering feeds on entry.Author 2024-01-31 19:42:07 -08:00
Dave
1159dd6982 Add addDynamicIframe rewrite function.
Add unit tests for `add_dynamic_iframe` rewrite.
2024-01-23 19:23:57 -08:00
dzaikos
d68f2306c6 Add attribute to add_dynamic_image rewrite candidates. 2024-01-21 14:27:06 -08:00
Frédéric Guillot
8553188ae4 Add missing translation argument 2024-01-20 10:48:27 -08:00
Frédéric Guillot
ce32d181d5 Change default Accept header 2024-01-13 13:53:57 -08:00
Filipe de Luna
1441dc7600 Update entry processor to allow blocking/keeping entries by tags 2024-01-09 21:15:11 -08:00
Jan Tojnar
074393d3bf fix: Include type for OPML subscriptions
As per [OPML 2.0 specification]:

> Each sub-element of the body of the OPML document is a node of type rss or an outline element that contains nodes of type rss.

> Required attributes: type, text, xmlUrl.

[OPML 2.0 specification]: http://opml.org/spec2.opml#subscriptionLists
2023-12-31 10:00:50 -08:00
Darwin
d90667777f request_builder.go: fetcher: Force try HTTP/2 2023-12-15 16:27:00 -08:00
Kristof Mattei
0465f9b188 fix: tests for allow popups to escape sandbox 2023-12-10 16:59:58 -08:00
Kristof Mattei
d53ad3b79a fix: clicking youtube links in iframes returns ERR_BLOCKED_BY_RESPONSE 2023-12-10 16:59:58 -08:00
Frédéric Guillot
d0f99cee1a Regression: ensure all HTML documents are encoded in UTF-8
Fixes #2196
2023-12-01 16:52:03 -08:00
Frédéric Guillot
5de0714256 Deduplicate feed URLs when parsing HTML document during discovery process
Fixes #2232
2023-12-01 13:57:05 -08:00
Shizun Ge
27ec6dbd7d Setting NextCheckAt due to TTL of a feed in feed.go.
Add unit tests.
2023-12-01 12:22:30 -08:00
Thomas J Faughnan Jr
fe0ef8b579 Fix conditional requests regression
The recent HTTP client refactor in 14e25ab9fe
caused feed refreshes to no longer make conditional requests. Prior to
the refactor, `client.WithCacheHeaders` handled this. Now this function
is split into `fetcher.WithETag` and `fetcher.WithLastModified` but
these functions are only declared and never actually used. Fix this by
calling them inside `handler.RefreshFeed`.
2023-11-29 19:46:50 -08:00
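For context on fe0ef8b579, a conditional request sends the validators saved from the previous response so the server can answer 304 Not Modified. The sketch below uses only `net/http`; `newConditionalRequest` is a hypothetical helper and does not reproduce the signatures of `fetcher.WithETag` or `fetcher.WithLastModified`.

```go
package main

import (
	"fmt"
	"net/http"
)

// newConditionalRequest builds a GET request carrying the validators stored
// from the previous fetch. Empty validators are simply skipped.
func newConditionalRequest(url, etag, lastModified string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	if lastModified != "" {
		req.Header.Set("If-Modified-Since", lastModified)
	}
	return req, nil
}

func main() {
	req, err := newConditionalRequest(
		"https://example.org/feed.xml",
		`"abc123"`,
		"Wed, 29 Nov 2023 19:46:50 GMT",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("If-None-Match"))
	fmt.Println(req.Header.Get("If-Modified-Since"))
}
```

A caller would then treat a response with status `http.StatusNotModified` as "feed unchanged" and skip parsing.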
Thomas J Faughnan Jr
7a03291442 Fix default User-Agent regression
The recent HTTP client refactor in 14e25ab9fe
introduced a bug in which the global default User-Agent is no longer
used for requests. Unless a per-feed User-Agent exists, the Go standard
library's default User-Agent is used, which looks something like
"Go-http-client/1.1". To fix this, make RequestBuilder.WithUserAgent
take an additional argument, the default User-Agent, which will be used
if there is no per-feed User-Agent (i.e. it is an empty string).

Fixes #2188
Fixes #2189
2023-11-18 20:57:47 +01:00
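The fallback described in 7a03291442 boils down to the logic sketched here. `pickUserAgent` is a hypothetical helper used for illustration; the real `RequestBuilder.WithUserAgent` signature is not reproduced, and the User-Agent strings are made up.

```go
package main

import (
	"fmt"
	"net/http"
)

// pickUserAgent returns the per-feed User-Agent when one is set,
// and the global default otherwise.
func pickUserAgent(feedUserAgent, defaultUserAgent string) string {
	if feedUserAgent != "" {
		return feedUserAgent
	}
	return defaultUserAgent
}

func main() {
	req, err := http.NewRequest(http.MethodGet, "https://example.org/feed.xml", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Without this header, Go's HTTP client sends its own default User-Agent.
	req.Header.Set("User-Agent", pickUserAgent("", "ExampleReader/1.0"))
	fmt.Println(req.Header.Get("User-Agent"))
}
```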
Frédéric Guillot
e3eaaea15a Update date parser to parse more invalid date formats 2023-11-01 20:55:35 +01:00
Frédéric Guillot
500c60b807 Fix error handling and logging issue after refactoring 2023-11-01 19:59:12 +01:00
Nicholas Parker
257e8c4761 Allow iframes pointing to Twitch videos
Docs: https://dev.twitch.tv/docs/embed/video-and-clips/#non-interactive-inline-frames-for-live-streams-and-vods
2023-10-27 10:02:57 -07:00
Tianfeng Wang
a1537f4b0d Filter feed entries based on url or title 2023-10-25 19:38:08 -07:00
Frédéric Guillot
eeaab72a9f Refactor feed discovery and avoid an extra HTTP request if the url provided is the feed 2023-10-22 18:05:37 -07:00
Frédéric Guillot
14e25ab9fe Refactor HTTP Client and LocalizedError packages 2023-10-22 13:09:30 -07:00
Ryan Stafford
120aabfbce Add RSS-Bridge integration 2023-10-22 11:10:56 -07:00
Frédéric Guillot
5e6c054345 Take RSS TTL field into consideration to schedule next check date 2023-10-20 20:11:05 -07:00
Frédéric Guillot
5ac3489ee5 Do not log website without icon as warning 2023-10-19 20:36:51 -07:00
Frédéric Guillot
9fd2dfa680 Refactor icon finder
Changes:

- Continue the discovery process when the feed icon is invalid
- Search all icons from the HTML document and do not stop on the first one
2023-10-18 22:24:56 -07:00
Frédéric Guillot
7650c81ad9 Add support for SVG icons with data url without encoding 2023-10-18 20:46:46 -07:00
Frédéric Guillot
7b541af253 Replace github.com/rylans/getlang with github.com/abadojack/whatlanggo
github.com/rylans/getlang doesn't seem to be updated anymore
2023-10-06 22:04:31 -07:00
Frédéric Guillot
c0e954f19d Implement structured logging using log/slog package 2023-09-24 22:37:33 -07:00
Adriano Di Luzio
54cb8fa028 Added new rewrite rules add_hn_links_using_hack and add_hn_links_using_opener to open HN comments with iOS apps 2023-09-23 13:54:48 -07:00
Frédéric Guillot
3b94217fb7 Make sure icon URLs are always absolute
Regression introduced in #1907
2023-09-09 14:59:44 -07:00
Frédéric Guillot
48f6885f44 Add generic webhook integration 2023-09-09 13:11:42 -07:00
fuchsrot
32d33104a4 Apprise Service Urls per feed 2023-09-09 10:59:04 -07:00