~sotirisp/tsvutils

Utilities for working with TSV files


# tsvutils

A collection of scripts for working with tab separated value (TSV) files.

## Installation

The dependencies are:

  • A POSIX system with the standard utilities (sh, awk, etc.).
  • Python 3 for csv2tsv and tsv2csv.
  • gnuplot for tsvplot.
  • scdoc for building the manpages.

To build the manpages, run:

```sh
make
```

To install the scripts and their manpages in `/usr/local/`, run the following as root:

```sh
make install
```

To install them to another directory, e.g. `~/.local/`, run:

```sh
env PREFIX="$HOME/.local/" make install
```

## Usage

The tools are designed to work together through pipelines and input/output redirection. Some examples:

```sh
# Plot the data in file.tsv using the first column as the x-axis data and all
# other columns as y-axis data and save the result to plot.png.
tsvplot file.tsv > plot.png

# Convert the data in file.csv to TSV, keep the columns whose names match one
# of the extended regular expressions Time and Distance and then plot the data.
csv2tsv file.csv | tsvcut Time Distance | tsvplot

# Keep the columns of file.tsv whose names contain (m), format them as an HTML
# table and display it in the lynx browser. The parentheses have to be
# backslash escaped because they are extended regular expression special
# characters.
tsvcut '\(m\)' < file.tsv | tsv2html | lynx -stdin

# Sort the rows of file.tsv in descending order based on the values of the
# column whose name is Temperature, keep the first 5 and save them as a
# Markdown table in top_5.md.
tsvsort -r '^Temperature$' < file.tsv | tsvtail -n 5 | tsv2md > top_5.md
```
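For comparison, the header-based column selection done by `tsvcut` can be approximated with plain POSIX `awk`. The sketch below is hypothetical and is not how `tsvcut` is implemented. The function name `select_columns` and the file `sample.tsv` are made up for illustration. Unlike `tsvcut`, it takes a single extended regular expression, so alternation such as `Time|Speed` is used to match several column names.

```sh
#!/bin/sh
# Hypothetical sketch: keep the columns of a TSV file (read from stdin)
# whose header name matches an extended regular expression. This only
# approximates tsvcut; it is not tsvcut's implementation.

# Hypothetical sample data: TAB separated, the first line is the header.
printf 'Time\tDistance (m)\tSpeed\n0\t0\t0\n1\t5\t5\n2\t12\t7\n' > sample.tsv

select_columns() {
    # The regex is passed through the environment rather than with -v,
    # because awk processes backslash escapes in -v values and would
    # turn a pattern like \(m\) into (m).
    RE="$1" awk -F '\t' -v OFS='\t' '
    NR == 1 {
        # Record the indices of the header fields that match the regex.
        re = ENVIRON["RE"]
        n = 0
        for (i = 1; i <= NF; i++)
            if ($i ~ re)
                cols[++n] = i
    }
    {
        # Print only the selected fields, in their original order.
        line = ""
        for (j = 1; j <= n; j++)
            line = line (j > 1 ? OFS : "") $(cols[j])
        print line
    }'
}

select_columns 'Time|Speed' < sample.tsv
```

With the sample file, the last command prints the `Time` and `Speed` columns, and `select_columns '\(m\)' < sample.tsv` keeps only the `Distance (m)` column.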

## Other useful tools

Many standard and non-standard Unix tools also work well with TSV files.

  • Use `cut` to select columns of a TSV file by index rather than by header name. Its default field delimiter is the TAB character.
  • Use `paste` to merge multiple TSV files line by line, joining their corresponding lines into one. Its default delimiter is also the TAB character.
  • Use `awk` for more complex processing by setting both the input and output field separators to the TAB character: `FS = "\t"` and `OFS = "\t"` (e.g. `awk -F '\t' -v OFS='\t' ...`).
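A short sketch of the three tools above, run on made-up files `a.tsv` and `b.tsv` (hypothetical data). The function name `add_fahrenheit` is likewise only for illustration.

```sh
#!/bin/sh
# Hypothetical sample files: TAB separated, the first line is the header.
printf 'Time\tTemperature\tHumidity\n0\t21.5\t40\n1\t22.0\t42\n' > a.tsv
printf 'Pressure\n1013\n1012\n' > b.tsv

# cut: keep the 1st and 3rd columns by index (TAB is the default delimiter).
cut -f 1,3 a.tsv

# paste: join corresponding lines of both files with a TAB (the default).
paste a.tsv b.tsv

# awk: append a column with the temperature converted to Fahrenheit,
# using TAB as both the input (-F) and output (OFS) field separator.
add_fahrenheit() {
    awk -F '\t' -v OFS='\t' '
    NR == 1 { print $0, "Temperature (F)"; next }
            { print $0, $2 * 9 / 5 + 32 }'
}

add_fahrenheit < a.tsv
```

Both `-F '\t'` and `-v OFS='\t'` yield a real TAB character because awk processes escape sequences in these option values.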

## License

Copyright 2022 Sotiris Papatheodorou

This program is Free Software: you can use, study, share and improve it at will. Specifically, you can redistribute and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.