~nickbp/force-rss

Proxy that converts websites and Twitch accounts into RSS feeds

#Force-RSS

Generates RSS feeds for Twitch accounts and preconfigured web pages, acting as a proxy that forces RSS support for services that otherwise don't have support.

It can be run in two modes:

  • HTTP: A caching proxy to Twitch and configured websites, presenting an HTTP endpoint that can be added to RSS readers.
  • CLI: Fetches once and writes the generated feed to stdout, for testing or for publishing to static sites.

#Quickstart

To install force-rss, you can run cargo install force-rss.

Docker images for amd64/arm64 are also available, tagged by git SHA; see the Dockerfile.

#Twitch Prerequisites

Twitch RSS generation uses the Twitch API. To query Twitch accounts, create a Twitch App and note its Client ID and Client Secret. force-rss uses those credentials to authenticate its queries to Twitch.

#Website Prerequisites

For website scraping, each distinct domain to support must be manually configured with CSS selectors in a TOML config.

The TOML config for a given domain will look like this, with separate site sections for each domain:

```
[site."www.example.com"]
# this selector should fire for each article on the page
articles = "div" # example.com only has one <div>, but let's pretend there could be one per "article"

# the following selectors are for extracting the elements within each article:

# the element selected here should be a link with a 'href' attribute:
link = "p a" # get the <a> within the <p>
# the element selected here should be an img with a 'src' attribute:
#image = ".image img" # not applicable for example.com
# a timestamp string for when the article was posted:
#timestamp = ".published" # not applicable for example.com
# an author string for who wrote the article:
#author = ".author a" # not applicable for example.com
# a title string to show as the article headline:
title = "h1" # example.com: get the h1 under the <div>
# a block of text summarizing the article:
summary = "p:first-of-type" # example.com: get the first <p> under the <div>

# optional timezone to assume for article timestamps that lack timezones:
#timezone = "Pacific/Auckland"
```
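
Since each section is keyed by domain, a request URL has to be matched to its `[site."..."]` section before any selectors can be applied. The following is a minimal sketch of that lookup, not the project's actual code: `SiteSelectors`, `host_of`, and `selectors_for` are hypothetical names, and it assumes that the section is chosen by exact hostname match and that the keys commented out above are optional.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one `[site."..."]` section. Field names follow the
/// TOML keys above; the real types in force-rss may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct SiteSelectors {
    articles: String,
    link: String,
    title: String,
    summary: String,
    image: Option<String>,     // assumed optional
    timestamp: Option<String>, // assumed optional
    author: Option<String>,    // assumed optional
    timezone: Option<String>,  // documented as optional
}

/// Extracts the host from an absolute URL (IPv6 literals are not handled).
fn host_of(url: &str) -> Option<&str> {
    let (_scheme, rest) = url.split_once("://")?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c| c == '/' || c == '?' || c == '#').next()?;
    // Drop any "user:pass@" prefix and ":port" suffix.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Looks up the selectors for a URL by exact hostname (an assumption).
fn selectors_for<'a>(
    sites: &'a HashMap<String, SiteSelectors>,
    url: &str,
) -> Option<&'a SiteSelectors> {
    sites.get(host_of(url)?)
}

fn main() {
    let mut sites = HashMap::new();
    sites.insert(
        "www.example.com".to_string(),
        SiteSelectors {
            articles: "div".into(),
            link: "p a".into(),
            title: "h1".into(),
            summary: "p:first-of-type".into(),
            image: None,
            timestamp: None,
            author: None,
            timezone: None,
        },
    );
    match selectors_for(&sites, "https://www.example.com/") {
        Some(site) => println!("articles selector: {}", site.articles),
        None => println!("no site section for that domain"),
    }
}
```

Under this assumption, `example.com` and `www.example.com` would count as different domains and each would need its own section.
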
#Launch

Start force-rss:

  • To query websites: Include the path to your TOML config containing CSS selectors
  • To query Twitch: Include TWITCH_CLIENT_ID and TWITCH_CLIENT_SECRET envvars containing Twitch App credentials
  • To customize the listen endpoint: Specify a LISTEN envvar. The default is 0.0.0.0:8080.

For example:

$ TWITCH_CLIENT_ID=1234...abcd \
    TWITCH_CLIENT_SECRET=5678...efgh \
    LISTEN=127.0.0.1:8080 \
    ./force-rss config.toml
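
To make the LISTEN default concrete, here is a rough sketch of how an optional envvar can fall back to 0.0.0.0:8080. It is illustrative only: `resolve_listen` and `DEFAULT_LISTEN` are hypothetical names, and it assumes the value is parsed as a plain `IP:port` socket address, which force-rss may or may not do internally.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Default taken from the README; the constant name is hypothetical.
const DEFAULT_LISTEN: &str = "0.0.0.0:8080";

/// Uses the supplied value if present, otherwise the default, and parses it
/// as an `IP:port` socket address.
fn resolve_listen(value: Option<&str>) -> Result<SocketAddr, AddrParseError> {
    value.unwrap_or(DEFAULT_LISTEN).parse()
}

fn main() {
    // Read LISTEN from the environment; an unset variable becomes None.
    let from_env = std::env::var("LISTEN").ok();
    match resolve_listen(from_env.as_deref()) {
        Ok(addr) => println!("would listen on {addr}"),
        Err(err) => eprintln!("invalid LISTEN value: {err}"),
    }
}
```

With this kind of parsing, a bare hostname such as `localhost` would be rejected, so an explicit address like `127.0.0.1:8080` is the safer choice.
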

#Use

Once force-rss is running, you can then query it to get RSS feeds:

For a Twitch feed, try /twitch?account=ACCOUNT_NAME_HERE:

$ curl -v 'http://127.0.0.1:8080/twitch?account=ACCOUNT_NAME_HERE'

For a website that's been configured with CSS selectors in the TOML file:

$ curl -v 'http://127.0.0.1:8080/site?url=https://www.example.com'
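
If the page URL itself contains characters such as `?` or `&`, it probably needs percent-encoding so that it arrives as a single `url` parameter. The sketch below builds such a feed URL with only the standard library. `percent_encode` and `site_feed_url` are hypothetical helpers, and it assumes force-rss decodes the query parameter in the usual way.

```rust
/// Percent-encodes every byte except the RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Builds the `/site?url=...` feed URL for a running force-rss instance.
fn site_feed_url(base: &str, page_url: &str) -> String {
    format!(
        "{}/site?url={}",
        base.trim_end_matches('/'),
        percent_encode(page_url)
    )
}

fn main() {
    let feed = site_feed_url(
        "http://127.0.0.1:8080",
        "https://www.example.com/news?page=2",
    );
    println!("{feed}");
}
```

The resulting string is what would be pasted into an RSS reader as the feed address.
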

#CLI/stdout

force-rss supports one-off RSS generation for a single Twitch account or website URL as an argument. The resulting RSS payload will be written to stdout. This is mainly useful for testing, or for using an external webserver to serve the RSS.

$ TWITCH_CLIENT_ID=1234...abcd \
    TWITCH_CLIENT_SECRET=5678...efgh \
    ./force-rss twitch ACCOUNT_NAME_HERE
$ ./force-rss site config.toml URL_HERE
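
For the external-webserver case, the stdout payload has to end up in a file that the webserver serves. One possible approach is sketched here: run the command, then write its output to a temporary file and rename it into place, so readers never see a half-written feed. `write_feed`, the `feed.xml` destination, and the temp-file naming are all assumptions for illustration; only the `./force-rss site config.toml URL_HERE` invocation comes from the example above.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Runs `program` with `args` and stores its stdout at `dest`.
/// The data goes to a sibling ".tmp" file first and is then renamed over `dest`.
fn write_feed(program: &str, args: &[&str], dest: &Path) -> io::Result<()> {
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    let tmp = dest.with_extension("tmp");
    fs::write(&tmp, &output.stdout)?;
    fs::rename(&tmp, dest)
}

fn main() {
    // URL_HERE is the same placeholder as in the CLI example; replace it with a real URL.
    let args = ["site", "config.toml", "URL_HERE"];
    match write_feed("./force-rss", &args, Path::new("feed.xml")) {
        Ok(()) => println!("feed.xml updated"),
        Err(err) => eprintln!("could not update feed: {err}"),
    }
}
```

Run periodically, for example from cron, this keeps a static copy of the feed fresh without running force-rss in HTTP mode.
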

#Options

force-rss exposes options via a mix of envvars and a TOML config file.

These are the supported envvars:

  • TWITCH_CLIENT_ID/TWITCH_CLIENT_SECRET: Your Twitch App credentials for optionally querying the Twitch API.
  • LISTEN: The listen address when running in HTTP mode. Defaults to 0.0.0.0:8080 (port 8080 on all interfaces).
  • LOG_LEVEL: Amount of logging you'd like to have. Defaults to info, set to debug/trace for more logs or warn/error/off for fewer logs.

The TOML file is mainly for configuring CSS selectors for scraping websites, but it also accepts some additional undocumented parameters for things like cache TTLs. These already have reasonable defaults; the full list is in TomlConfig.

#License

This project is licensed under the FAFOL. This is intended to restrict use of the project for purposes that would be considered unethical by its authors.