A script to bulk download files from Wikimedia Commons.
If you don't have all the utilities above installed on your machine, then you really need to get a better shell or rebuild BusyBox or something; they're all pretty basic (other than wget, jq, and xmllint, I suppose).
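If you want to double-check, something like this will flag anything missing (the tool list here is assumed from the dependency section above; adjust it to match yours):

# report any required tools that aren't on $PATH
for cmd in wget jq xmllint; do
	command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
done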
Download the script directly: https://git.sr.ht/~nytpu/commons-downloader/blob/master/commons-downloader
Or clone the repo:
git clone https://git.sr.ht/~nytpu/commons-downloader
Then run the script in place with something like ./commons-downloader, or symlink it into your $PATH.
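For example, a one-time install might look like this (assuming ~/.local/bin is on your $PATH; substitute your preferred directory):

# fetch the script, make it executable, and link it onto $PATH
wget -O commons-downloader 'https://git.sr.ht/~nytpu/commons-downloader/blob/master/commons-downloader'
chmod +x commons-downloader
ln -s "$PWD/commons-downloader" ~/.local/bin/commons-downloader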
Usage: commons-downloader [-chns] [-o outdir] [-q query]... [-r file] [-u agent] <category>
-c Download all images in a given category.
-h Display this help information.
-n No output or progress information.
-o outdir Download all images to the given directory (will be created).
-q query Additional queries to add when downloading from a search.
-r file Resume downloading URIs from a given file.
-s Download all images from a search for the given category and queries.
-u agent Change the user agent to use for requests.
category The formal category name you wish to download from.
The main options are -c, -s, and -r. -c will download all matches in a category, and -s will download all matches for a search; they can be combined, and the downloaded files will be deduplicated, so an intersection between them is not an issue. -r <URL list file> will resume a download given a list of URLs, and is mutually exclusive with -c and -s.
The URLs for a given download will be automatically saved in _URLS.txt in the directory holding the downloaded photos.
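For instance, you can peek at the saved list before resuming a run (the snep/ directory here is borrowed from the example further down; substitute your own output directory):

# show the first few queued URLs, then how many there are in total
head -n 5 snep/_URLS.txt
wc -l < snep/_URLS.txt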
At least one of -c, -s, and -r is required to be passed.
Multiple -q <add'l query> flags can be added when using -s to add additional queries to a search; -q has no effect if -s is not also passed.
For example,
commons-downloader -s -q Q173651 -q "African Wild Dog" Lycaon pictus
is equivalent to the search
"Lycaon pictus" OR "Q173651" OR "African Wild Dog"
-o <out directory> will download all files to the given directory, creating it if necessary. The current directory is the default if -o is not passed.
The mandatory argument is a category. If only -s is passed it can be an arbitrary search query, but if -c is passed then it must be an official Wikimedia Commons category. A category can be verified by visiting
https://commons.wikimedia.org/wiki/Category:<category_name>
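A rough scripted check is also possible, since Wikimedia wikis generally return a 404 for nonexistent pages (this snippet is just an illustration, not part of the script; the Panthera_uncia category comes from the example below):

# exits zero only if the category page exists
wget -q --spider 'https://commons.wikimedia.org/wiki/Category:Panthera_uncia' \
	&& echo 'category exists' || echo 'no such category'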
You can often find a new category by going to the bottom of a Wikipedia page and looking for a box that says:
Wikimedia Commons has media related to: <article name> (category)
You can then click the (category) link to find the Wikimedia Commons category.
Download all files in the Panthera uncia category and all results for the search
"Panthera uncia" OR "Q30197" OR "snow leopard" OR "Uncia uncia"
to the snep/ subdirectory in the current folder:
commons-downloader -cs -o snep -q Q30197 -q "snow leopard" -q "Uncia uncia" Panthera uncia
If the download in the previous command was interrupted, it could be resumed with:
commons-downloader -o snep -r snep/_URLS.txt
The upstream URL of this project is https://sr.ht/~nytpu/commons-downloader. Send suggestions, bugs, patches, and other contributions to ~nytpu/public-inbox@lists.sr.ht or alex@nytpu.com. For help sending a patch through email, see https://git-send-email.io. You can browse the list archives at https://lists.sr.ht/~nytpu/public-inbox.
Written in 2021–2022 by nytpu <alex [at] nytpu.com>
To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.
You can view a copy of the CC0 Public Domain Dedication in COPYING or at http://creativecommons.org/publicdomain/zero/1.0/.