~jamesponddotco/wikiextract

A word extractor for Wikipedia articles.

618c9d5 Update manpage to reflect latest changes

3 months ago


#wikiextract


wikiextract is a word extractor for Wikipedia articles. It extracts words longer than four characters from a given Wikipedia page or list of pages and saves them to a file, which you can later use as a source for generating diceware passwords.
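To make the length filter described above concrete, here is a minimal Go sketch. It is not taken from the wikiextract source: the function name extractWords, the letter-based splitting, the lowercasing, and the deduplication are all assumptions made for illustration. Only the "longer than four characters" rule comes from the description.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// minLength mirrors the rule in the description: a word is kept only if it
// is longer than four characters.
const minLength = 4

// extractWords is a hypothetical helper, not wikiextract's actual code.
// It splits text on every non-letter rune, lowercases each word, and returns
// the unique words longer than minLength in order of first appearance.
// Lowercasing and deduplication are assumptions made for this sketch.
func extractWords(text string) []string {
	fields := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r)
	})

	seen := make(map[string]struct{})
	var words []string
	for _, f := range fields {
		w := strings.ToLower(f)
		// Count runes rather than bytes so accented letters count once.
		if utf8.RuneCountInString(w) <= minLength {
			continue
		}
		if _, ok := seen[w]; ok {
			continue
		}
		seen[w] = struct{}{}
		words = append(words, w)
	}
	return words
}

func main() {
	sample := "Wikipedia is a free online encyclopedia, written by volunteers."
	for _, w := range extractWords(sample) {
		fmt.Println(w)
	}
}
```

Counting runes instead of bytes matters for non-English articles, where a single letter can occupy more than one byte.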

#Installation

#From source

First install the dependencies:

  • Go 1.22 or above.
  • make.
  • scdoc.

Switch to the latest stable tag, v1.0.0, then compile and install:

git checkout v1.0.0
make
sudo make install

#Usage

$ wikiextract --help
NAME:
   wikiextract - a simple word extractor for Wikipedia articles

USAGE:
   wikiextract [global options] 

VERSION:
   1.0.0

GLOBAL OPTIONS:
   --input-url value, -u value [ --input-url value, -u value ]  the URL of the Wikipedia page
   --input-file value, -f value                                 a file containing a list of URLs
   --output value, -o value                                     the path to the output file
   --help, -h                                                   show help
   --version, -v                                                print the version

$ wikiextract -u 'https://en.wikipedia.org/wiki/Wikipedia' -o 'output.txt'

See wikiextract(1) after installing for more information.

#Contributing

Anyone can help make wikiextract better. Send patches to the mailing list and report bugs on the issue tracker.

You must sign off your work using git commit --signoff. See the Linux kernel Developer's Certificate of Origin for more details.

All contributions are made under the GPL-2.0 license.

#Resources

The following resources are available:


Released under the GPL-2.0 license.