~nabijaczleweli/snappy-tools

snappy(1): Snappy compression and decompression with and without framing

445963b 1-1

10 days ago

snappy-tools
https://builds.sr.ht/~nabijaczleweli/snappy-tools
https://todo.sr.ht/~nabijaczleweli/snappy-tools  (report at <mailto:~nabijaczleweli/snappy-tools@todo.sr.ht>)
https://lists.sr.ht/~nabijaczleweli/snappy-tools <mailto:~nabijaczleweli/snappy-tools@lists.sr.ht>
https://man.sr.ht/~nabijaczleweli/snappy-tools

I saw some "snappy framed data"-compressed files in my Firefox profile,
and there didn't seem to be a decompressor for them in Debian.
Upstream homepage at https://google.github.io/snappy/; you need libsnappy-dev to build.

snappy(1): Snappy compression and decompression with and without framing
usage: snappy    [-f]   data > snappy.sn|.sz
       snappy    [-f] < data > snappy.sn|.sz
       snappy -d [-i]          snappy.sn|.sz
       snappy -d [-i] <        snappy.sn|.sz
usage: unsnappy [-i]   snappy.sn|.sz
       unsnappy [-i] < snappy.sn|.sz

Manual available online at https://srhtcdn.githack.com/~nabijaczleweli/snappy-tools/blob/man/snappy-tools.pdf
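
To tell by hand whether a file is in the framed format, it should be enough
to look at its first ten bytes: a framed stream opens with the stream identifier
chunk, ff 06 00 00 followed by "sNaPpY". A minimal sketch in POSIX sh;
is_snappy_framed is a made-up helper name and not part of snappy-tools:

```sh
# Succeeds if the file named by $1 starts with the Snappy framing-format
# stream identifier (chunk type 0xff, length 6, "sNaPpY").
# Hypothetical helper, for illustration only.
is_snappy_framed() {
  magic=$(head -c 10 "$1" | od -An -v -tx1 | tr -d ' \n')
  [ "$magic" = "ff060000734e61507059" ]
}
```

Usage would be along the lines of `is_snappy_framed somefile && unsnappy somefile`.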


Naturally, the snappy framing format requires CRC32Cing the data.
There is a hardware-accelerated implementation for arm64,
SSE4.2-capable i386/x32/amd64, and CRCC-capable loong64 hosts:
          soft      crc32q      CRC32C
  E5645   357 MiB/s  6598 MiB/s
  5800X3D 527 MiB/s 11085 MiB/s
  MT8173  246 MiB/s             5016 MiB/s (Cortex-A72)
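
For reference, the checksum the framing format stores with each data chunk
is the CRC-32C (Castagnoli) of the uncompressed data, rotated right by 15 bits,
with 0xa282ead8 added. Below is a deliberately slow, bit-at-a-time sketch in POSIX sh.
It is for illustration only and is not how snappy-tools computes it;
crc32c and mask_crc are hypothetical helper names:

```sh
# CRC-32C of stdin, printed as 8 hex digits.
# Bitwise, reflected polynomial 0x82F63B78; illustration only, very slow.
crc32c() {
  crc=$((0xFFFFFFFF))
  for byte in $(od -An -v -tu1); do
    crc=$(( crc ^ byte ))
    for bit in 1 2 3 4 5 6 7 8; do
      # Shift right; XOR in the polynomial if the bit shifted out was 1.
      crc=$(( (crc >> 1) ^ (0x82F63B78 * (crc & 1)) ))
    done
  done
  printf '%08x\n' $(( crc ^ 0xFFFFFFFF ))
}

# Mask a CRC (given as hex digits without 0x) as the framing format stores it:
# rotate right by 15 bits, add 0xa282ead8, keep the low 32 bits.
mask_crc() {
  crc=$((0x$1))
  printf '%08x\n' $(( (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF ))
}

printf 123456789 | crc32c   # e3069283, the standard CRC-32C check value
```

This needs 64-bit shell arithmetic, which is what current dash and bash provide.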


Note that upstream's spiel is 250M/s compression and 500M/s decompression (on "an i7"),
and ~.55 of the ratio of gzip -1 for HTML:
  $ curl -SL https://github.com/google/snappy/ > data
  $ snappy data > /dev/null
  data: 286555 -> 81458 (28.43%)
  $ snappy -f data > /dev/null
  data: 286555 -> 81520 (28.45%)
  $ gzip -c1 data | wc -c
  58254                 (20%)
  $ lz4 -vc1 data > /dev/null
  *** LZ4 command line interface 64-bits v1.9.4, by Yann Collet ***
  Compressed 286555 bytes into 73853 bytes ==> 25.77%
  $ zstd -vc1 data > /dev/null
  *** Zstandard CLI (64-bit) v1.5.4, by Yann Collet ***
  data                 : 17.47%   (   280 KiB =>   48.9 KiB, /*stdout*\)
Compare with the Silesia compression corpus' HTML-formatted "webster" file
(https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia):
  $ curl -SL https://sun.aei.polsl.pl//~sdeor/corpus/webster.bz2 | bzcat > webster
  $ snappy webster > /dev/null
  webster: 41458703 -> 20211213 (48.75%)
  $ snappy -f webster > /dev/null
  webster: 41458703 -> 20218182 (48.77%)
  $ gzip -c1 webster | wc -c
  14977104                      (36.13%)
  $ lz4 -vc1 webster > /dev/null
  *** LZ4 command line interface 64-bits v1.9.4, by Yann Collet ***
  Compressed 41458703 bytes into 20149249 bytes ==> 48.60%
  $ zstd -vc1 webster > /dev/null
  *** Zstandard CLI (64-bit) v1.5.4, by Yann Collet ***
  webster              : 32.97%   (  39.5 MiB =>   13.0 MiB, /*stdout*\)
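
The percentages above are compressed size over original size.
For tools that only report byte counts (like the gzip | wc -c lines),
the same figure can be computed with a small awk wrapper;
pct is a hypothetical helper name, used here for illustration only:

```sh
# Compressed size as a percentage of the original size, two decimals.
# $1 = original size in bytes, $2 = compressed size in bytes.
pct() {
  awk -v in_bytes="$1" -v out_bytes="$2" \
    'BEGIN { printf "%.2f%%\n", 100 * out_bytes / in_bytes }'
}

pct 286555 81458        # snappy on the GitHub page
pct 286555 58254        # gzip -1 on the same file
pct 41458703 14977104   # gzip -1 on webster
```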


Speed testing on tmpfs on bookworm/E5645/DDR3-1600, which is very much not an i7:
  $ while :; do cat data || break; done | head -c 100M > 100M
  $ hyperfine -N 'snappy 100M' 'snappy -f 100M' 'gzip -c1 100M' 'lz4 -c1 100M' 'zstd -c1 100M'
  snappy 100M          (mean ± σ):  272.9 ms ±  2.5 ms  [User: 248.2 ms, System: 24.4 ms]
  snappy -f 100M       (mean ± σ):  298.8 ms ±  5.6 ms  [User: 275.7 ms, System: 22.8 ms]
  gzip -c1 100M        (mean ± σ): 1733   ms ± 10   ms  [User:  1.709 s, System: 0.024 s]
  lz4 -c1 100M         (mean ± σ):  329.9 ms ±  8.1 ms  [User: 298.1 ms, System: 31.6 ms]
  zstd -c1 100M        (mean ± σ):  128.1 ms ± 10.0 ms  [User: 140.2 ms, System: 58.0 ms]
  Summary
    'zstd -c1 100M' ran                                     780.640 MiB/s
      2.13 ± 0.17 times faster than 'snappy 100M'           366.434 MiB/s
      2.33 ± 0.19 times faster than 'snappy -f 100M'        334.672 MiB/s
      2.58 ± 0.21 times faster than 'lz4 -c1 100M'          303.122 MiB/s
     13.53 ± 1.06 times faster than 'gzip -c1 100M'          57.703 MiB/s

  snappy -d 100M.sn    (mean ± σ):  206.8 ms ±  3.4 ms  [User: 132.9 ms, System: 73.6 ms]
  snappy -d 100M.sz    (mean ± σ):  153.5 ms ±  4.9 ms  [User: 141.6 ms, System: 11.6 ms]
  gzip -dc 100M.gz     (mean ± σ):  709.0 ms ±  5.6 ms  [User: 698.3 ms, System: 10.4 ms]
  lz4 -dc 100M.lz4     (mean ± σ):  133.8 ms ±  6.1 ms  [User: 118.5 ms, System: 15.1 ms]
  zstd -dc 100M.zst    (mean ± σ):   61.3 ms ±  5.5 ms  [User:  59.8 ms, System: 10.2 ms]
  Summary
    'zstd -dc 100M.zst' ran                                1631.321 MiB/s
      2.18 ± 0.22 times faster than 'lz4 -dc 100M.lz4'      747.384 MiB/s
      2.50 ± 0.24 times faster than 'snappy -d 100M.sz'     651.465 MiB/s
      3.37 ± 0.31 times faster than 'snappy -d 100M.sn'     483.558 MiB/s
     11.57 ± 1.05 times faster than 'gzip -dc 100M.gz'      141.043 MiB/s


  snappy webster       (mean ± σ):  199.5 ms ±  5.3 ms  [User: 189.7 ms, System: 9.6 ms]
  snappy -f webster    (mean ± σ):  205.8 ms ±  4.7 ms  [User: 191.6 ms, System: 13.9 ms]
  gzip -c1 webster     (mean ± σ): 1032   ms ±  7   ms  [User:  1.019 s, System: 0.012 s]
  lz4 -c1 webster      (mean ± σ):  216.9 ms ±  5.7 ms  [User: 202.0 ms, System: 14.6 ms]
  zstd -c1 webster     (mean ± σ):  367.4 ms ±  5.2 ms  [User: 368.9 ms, System: 35.3 ms]
  Summary
    'snappy webster' ran                                    198.185 MiB/s
      1.03 ± 0.04 times faster than 'snappy -f webster'     192.119 MiB/s
      1.09 ± 0.04 times faster than 'lz4 -c1 webster'       182.287 MiB/s
      1.84 ± 0.05 times faster than 'zstd -c1 webster'      107.615 MiB/s
      5.17 ± 0.14 times faster than 'gzip -c1 webster'       38.312 MiB/s

  snappy -d webster.sn (mean ± σ):  134.0 ms ±  3.9 ms  [User: 100.9 ms, System: 32.9 ms]
  snappy -d webster.sz (mean ± σ):  109.2 ms ±  2.3 ms  [User: 100.1 ms, System: 8.9 ms]
  gzip -dc webster.gz  (mean ± σ):  394.8 ms ±  5.0 ms  [User: 389.3 ms, System: 5.2 ms]
  lz4 -dc webster.lz4  (mean ± σ):   67.8 ms ±  2.1 ms  [User:  57.7 ms, System: 9.9 ms]
  zstd -dc webster.zst (mean ± σ):  110.1 ms ±  3.1 ms  [User: 109.2 ms, System: 11.3 ms]
  Summary
    'lz4 -dc webster.lz4' ran                               583.157 MiB/s
      1.61 ± 0.06 times faster than 'snappy -d webster.sz'  362.070 MiB/s
      1.62 ± 0.07 times faster than 'zstd -dc webster.zst'  359.110 MiB/s
      1.98 ± 0.08 times faster than 'snappy -d webster.sn'  295.060 MiB/s
      5.82 ± 0.19 times faster than 'gzip -dc webster.gz'   100.147 MiB/s
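
The MiB/s column is not something hyperfine prints. It appears to be
the uncompressed size divided by the mean run time; this is an assumption
based on the numbers lining up (100 MiB / 272.9 ms is about 366 MiB/s).
A sketch of that arithmetic, with mibps as a hypothetical helper name:

```sh
# Throughput in MiB/s.
# $1 = uncompressed size in bytes, $2 = mean run time in milliseconds.
mibps() {
  awk -v bytes="$1" -v ms="$2" \
    'BEGIN { printf "%.3f\n", bytes / 1048576 / (ms / 1000) }'
}

mibps 104857600 272.9   # snappy 100M
mibps 41458703 199.5    # snappy webster
```

The last digit can differ from the tables, since the means shown there are already rounded.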


Speed testing on tmpfs on sid@2024-02-14/5800X3D/DDR4-3600, which is several i7s:
  snappy 100M          (mean ± σ):  100.1 ms ±  1.6 ms  [User:  88.7 ms, System: 11.3 ms]
  snappy -f 100M       (mean ± σ):  108.8 ms ±  0.6 ms  [User:  98.7 ms, System: 10.1 ms]
  gzip -c1 100M        (mean ± σ):  710.5 ms ±  8.4 ms  [User: 696.3 ms, System: 14.0 ms]
  lz4 -c1 100M         (mean ± σ):  148.2 ms ±  1.8 ms  [User: 111.8 ms, System: 36.3 ms]
  zstd -c1 100M        (mean ± σ):   40.2 ms ±  1.2 ms  [User:  32.8 ms, System: 40.6 ms]
  Summary
    zstd -c1 100M ran                                      2487.562 MiB/s
      2.49 ± 0.08 times faster than snappy 100M             999.000 MiB/s
      2.71 ± 0.08 times faster than snappy -f 100M          919.117 MiB/s
      3.69 ± 0.12 times faster than lz4 -c1 100M            674.763 MiB/s
     17.69 ± 0.57 times faster than gzip -c1 100M           140.745 MiB/s

  snappy -d 100M.sn    (mean ± σ):  111.1 ms ±  1.2 ms  [User:  67.3 ms, System: 43.6 ms]
  snappy -d 100M.sz    (mean ± σ):   81.9 ms ±  1.3 ms  [User:  76.1 ms, System: 5.7 ms]
  gzip -dc 100M.gz     (mean ± σ):  351.8 ms ±  2.6 ms  [User: 348.1 ms, System: 3.6 ms]
  lz4 -dc 100M.lz4     (mean ± σ):   45.7 ms ±  0.6 ms  [User:  41.3 ms, System: 4.2 ms]
  zstd -dc 100M.zst    (mean ± σ):   16.7 ms ±  0.2 ms  [User:  16.2 ms, System: 3.5 ms]
  Summary
    zstd -dc 100M.zst ran                                  5988.023 MiB/s
      2.73 ± 0.05 times faster than lz4 -dc 100M.lz4       2188.183 MiB/s
      4.89 ± 0.10 times faster than snappy -d 100M.sz      1221.001 MiB/s
      6.63 ± 0.11 times faster than snappy -d 100M.sn       900.090 MiB/s
     21.00 ± 0.30 times faster than gzip -dc 100M.gz        284.252 MiB/s


  snappy webster       (mean ± σ):   75.7 ms ±  1.1 ms  [User:  72.2 ms, System: 3.4 ms]
  snappy -f webster    (mean ± σ):   79.4 ms ±  1.1 ms  [User:  74.2 ms, System: 5.1 ms]
  gzip -c1 webster     (mean ± σ):  408.4 ms ±  4.7 ms  [User: 405.9 ms, System: 2.4 ms]
  lz4 -c1 webster      (mean ± σ):   93.8 ms ±  2.1 ms  [User:  78.3 ms, System: 15.4 ms]
  zstd -c1 webster     (mean ± σ):  118.7 ms ±  2.3 ms  [User: 119.7 ms, System: 19.7 ms]
  Summary
    snappy webster ran                                      522.299 MiB/s
      1.05 ± 0.02 times faster than snappy -f webster       497.960 MiB/s
      1.24 ± 0.03 times faster than lz4 -c1 webster         421.514 MiB/s
      1.57 ± 0.04 times faster than zstd -c1 webster        333.092 MiB/s
      5.39 ± 0.10 times faster than gzip -c1 webster         96.812 MiB/s

  snappy -d webster.sn (mean ± σ):   67.8 ms ±  1.1 ms  [User:  50.1 ms, System: 17.7 ms]
  snappy -d webster.sz (mean ± σ):   57.2 ms ±  1.0 ms  [User:  54.1 ms, System: 3.0 ms]
  gzip -dc webster.gz  (mean ± σ):  180.9 ms ±  1.7 ms  [User: 178.1 ms, System: 2.7 ms]
  lz4 -dc webster.lz4  (mean ± σ):   25.3 ms ±  0.5 ms  [User:  21.7 ms, System: 3.5 ms]
  zstd -dc webster.zst (mean ± σ):   34.2 ms ±  0.3 ms  [User:  32.9 ms, System: 4.1 ms]
  Summary
    lz4 -dc webster.lz4 ran                                1562.770 MiB/s
      1.35 ± 0.03 times faster than zstd -dc webster.zst   1156.084 MiB/s
      2.26 ± 0.06 times faster than snappy -d webster.sz    691.225 MiB/s
      2.68 ± 0.07 times faster than snappy -d webster.sn    583.157 MiB/s
      7.16 ± 0.16 times faster than gzip -dc webster.gz     218.563 MiB/s


Speed testing on tmpfs on the Cortex-A72 half of sid@2024-01-07/MT8173/LPDDR3-1866:
  zstd -c1 100M ran                                       513.083 MiB/s   194.9 ms
    1.92 ± 0.11 times faster than snappy 100M             267.809 MiB/s   373.4 ms
    1.96 ± 0.14 times faster than snappy -f 100M          261.985 MiB/s   381.7 ms
    2.50 ± 0.13 times faster than lz4 -c1 100M            205.212 MiB/s   487.3 ms
   11.51 ± 0.25 times faster than gzip -c1 100M            44.563 MiB/s  2244   ms

  zstd -dc 100M.zst ran                                   938.967 MiB/s   106.5 ms
    1.88 ± 0.25 times faster than lz4 -dc 100M.lz4        499.001 MiB/s   200.4 ms
    2.92 ± 0.41 times faster than snappy -d 100M.sz       322.061 MiB/s   310.5 ms
    3.08 ± 0.34 times faster than snappy -d 100M.sn       304.692 MiB/s   328.2 ms
    8.84 ± 0.54 times faster than gzip -dc 100M.gz        106.157 MiB/s   942.0 ms

  snappy webster ran                                      146.545 MiB/s   269.8 ms
    1.03 ± 0.13 times faster than snappy -f webster       142.891 MiB/s   276.7 ms
    1.25 ± 0.14 times faster than lz4 -c1 webster         117.672 MiB/s   336.0 ms
    1.77 ± 0.17 times faster than zstd -c1 webster         82.750 MiB/s   477.8 ms
    5.06 ± 0.49 times faster than gzip -c1 webster         28.965 MiB/s  1365   ms

  lz4 -dc webster.lz4 ran                                 325.416 MiB/s   121.5 ms
    1.44 ± 0.14 times faster than zstd -dc webster.zst    226.579 MiB/s   174.5 ms
    1.90 ± 0.24 times faster than snappy -d webster.sz    171.383 MiB/s   230.7 ms
    2.01 ± 0.26 times faster than snappy -d webster.sn    161.842 MiB/s   244.3 ms
    4.59 ± 0.45 times faster than gzip -dc webster.gz      70.945 MiB/s   557.3 ms

Unabridged measurements at https://man.sr.ht/~nabijaczleweli/snappy-tools/measurements/2024-02-13-1.md


Release tarballs are signed with nabijaczleweli@nabijaczleweli.xyz
  (pull with WKD, but 7D69 474E 8402 8C5C C0C4  4163 BCFD 0B01 8D26 58F1)
and stored in git notes, as-if via the example program provided at
  https://man.sr.ht/git.sr.ht/#signing-tagsx27-tarballs
and are thus available on the refs listing/tag page as .tar.gz.asc.