~q3cpma/haggle

                                     Haggle
                                     ======

        Overview
        --------

A tool to monitor the price of products across online stores. When a price
change is detected, a notification is written to an Atom feed; all you then
need is an Atom reader that accepts local feeds (e.g. newsboat via the file://
protocol).

Prices are only checked when haggle runs, so schedule it to run regularly,
e.g. via cron.

+--------------------------------------------------------------------------+
| NAME                                                                     |
|     haggle.tcl - monitor online prices                                   |
|                                                                          |
| SYNOPSIS                                                                 |
|     haggle.tcl [OPTIONS] CATALOG                                         |
|                                                                          |
| DESCRIPTION                                                              |
|     Read product data from CATALOG, a file containing a Tcl list using   |
|     the following syntax:                                                |
|        {SHOP ?noproxy? METHOD METHOD_ARG ?PRODUCT URL ...?} ...          |
|     Everything between a # and a newline is ignored.                     |
|                                                                          |
|     For each PRODUCT, the corresponding URL is downloaded and the        |
|     extraction method applied with its argument on the body to produce   |
|     the price.                                                           |
|                                                                          |
|     Thus the lowest price for PRODUCT is found across the whole CATALOG  |
|     and if a change is detected compared to the last run, an Atom entry  |
|     is written into a designated feed.                                   |
|                                                                          |
|     The files for the Atom feed and database holding the price stats are |
|     created next to the given CATALOG.                                   |
|                                                                          |
|     Available methods are:                                               |
|      * xpath:  XPath query                                               |
|      * regexp: Tcl regexp's first capturing group of the first match     |
|                                                                          |
| OPTIONS                                                                  |
|     -help                                                                |
|         Print this help message and exit.                                |
|                                                                          |
|     -proxy PROXY_URL                                                     |
|         Set the curl HTTP/HTTPS proxy.                                   |
|                                                                          |
+--------------------------------------------------------------------------+
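
To make the catalog syntax and the regexp method more concrete, here is a
minimal sketch, not haggle's actual code. The procs parse_catalog and
extract_regexp, the shop, the product, the URL and the page body are all
made up for illustration; only the entry layout, the comment rule and the
"first capturing group of the first match" behaviour come from the help text
above.

```tcl
# Hypothetical catalog; the shop, product and URL are made up.
set catalog_text {
# Everything from a # to the end of the line is a comment
{ExampleShop noproxy regexp {"price": "([0-9.]+)"}
	{Some Product} https://shop.example/some-product
}
}

# Split a catalog into one dict per {SHOP ?noproxy? METHOD METHOD_ARG ...} entry.
proc parse_catalog {text} {
    # Drop comments first, then treat what is left as a Tcl list.
    regsub -all -line {#.*$} $text {} text
    set entries {}
    foreach entry $text {
        set rest [lassign $entry shop]
        # The optional noproxy flag sits between SHOP and METHOD.
        set noproxy [expr {[lindex $rest 0] eq "noproxy"}]
        if {$noproxy} {
            set rest [lrange $rest 1 end]
        }
        set products [lassign $rest method method_arg]
        lappend entries [dict create shop $shop noproxy $noproxy \
            method $method arg $method_arg products $products]
    }
    return $entries
}

# regexp method: first capturing group of the first match, or "" if no match.
proc extract_regexp {pattern body} {
    if {[regexp -- $pattern $body -> price]} {
        return $price
    }
    return ""
}

# Stand-in for a downloaded page body.
set body {{"name": "Some Product", "price": "649.00"}}

foreach entry [parse_catalog $catalog_text] {
    dict with entry {
        foreach {product url} $products {
            puts "$shop / $product: [extract_regexp $arg $body]"
        }
    }
}
```

Since comments are stripped before the list is parsed, a # inside a URL or a
pattern would also be cut off in this sketch.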

Two small helper scripts, test_xpath.tcl and test_regexp.tcl, are included to
try extraction arguments against local (downloaded) HTML until you find the
one that yields the price.
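
For reference, trying an XPath query against local HTML with tdom could look
like the sketch below. This is not the content of test_xpath.tcl; the HTML
fragment is a made-up stand-in for a downloaded page, and the query is the
one used for Bax in the examples.

```tcl
package require tdom

# Hypothetical fragment standing in for a downloaded product page.
set html {<html><head><meta itemprop="price" content="649.00"></head><body></body></html>}

# Parse with the lenient HTML parser, then evaluate the XPath expression
# on the root element.
set doc [dom parse -html $html]
set root [$doc documentElement]
set price [$root selectNodes {number(//meta[@itemprop="price"]/@content)}]
$doc delete

puts $price
```

Wrapping the query in number() makes XPath return a number rather than an
attribute node, which is why the result can be used as a price directly.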

The price database contains enough data to plot the evolution of the minimum
price (including the store where it's sold) for each product over time.
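
The lowest-price selection and change detection described in the help text
can be sketched as follows. The dict layouts used here are assumptions made
for the example, not the format of pricesdb.tcldict, and the proc name
lowest, the shops and the prices are hypothetical.

```tcl
# Hypothetical prices gathered in one run: product -> {shop price ...}
set run {
    {Some Product} {ShopA 649.00 ShopB 629.90 ShopC 655.00}
}

# Hypothetical record of the previous minimum: product -> {shop price}
set previous {
    {Some Product} {ShopA 649.00}
}

# Return {shop price} for the cheapest offer, or {} if there are none.
proc lowest {offers} {
    set best {}
    dict for {shop price} $offers {
        if {$best eq {} || $price < [lindex $best 1]} {
            set best [list $shop $price]
        }
    }
    return $best
}

# Report a product only when its minimum price differs from the last run.
dict for {product offers} $run {
    lassign [lowest $offers] shop price
    if {![dict exists $previous $product] ||
        [lindex [dict get $previous $product] 1] != $price} {
        puts "$product: now $price at $shop"
    }
}
```

Appending each run's {shop price} pair together with a timestamp, instead of
overwriting the previous one, would give the kind of history needed to plot
the minimum price over time.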


        Examples
        --------

$ ls
catalog.tcllist
$ cat catalog.tcllist
{Bax xpath {number(//meta[@itemprop="price"]/@content)}
	{RME Babyface Pro FS} https://www.bax-shop.fr/carte-son-externe/rme-babyface-pro-fs-interface-audio
}

{Thomann xpath {substring-before(//div[@class="prod-pricebox-price-primary"]/div/span[@class="primary"], " ")}
	{Shure SRH1540}       https://www.thomann.de/fr/shure_srh1540.htm
	{RME Babyface Pro FS} https://www.thomann.de/fr/rme_babyface_pro_fs.htm
}

{Sonovente regexp {^'prdAmount':'([^']+)',$}
	{Shure SRH1540} https://www.sonovente.com/shure-srh1540-p41063.html
}

{{Global Audio Store} xpath {translate(substring-before(//span[@id="our_price_display"], " "), ",", ".")}
	{Shure SRH1540} https://www.global-audio-store.fr/en/8328-shure-srh1540.html
}

{Woodbrass noproxy regexp {"price": "([0-9.]+)"}
	{Shure SRH1540} https://www.woodbrass.com/casques-studio-fermes-shure-srh1540-p340698.html
}

# Tried with XPath, but tdom can't parse the HTML
{Amazon.fr regexp {^<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">([0-9]+)(,[0-9]+).€</span>$}
	{Shure SRH1540} https://www.amazon.fr/Shure-SRH1540-construction-daluminium-oreillettes/dp/B00FR8DMR8
}
$ haggle.tcl catalog.tcllist
...
$ ls
catalog.tcllist
haggle.xml
pricesdb.tcldict


        Dependencies
        ------------

* Tcl 8.6
* curl
* tdom