mirror of https://github.com/facebook/zstd.git synced 2025-07-30 22:23:13 +03:00

Man Page Tweaks, Edits, Formatting Fixes

This started as an application of the edits suggested in #3201 and expanded
from there.
W. Felix Handte
2022-12-22 14:04:36 -05:00
parent 40a7188130
commit 382026f096
3 changed files with 88 additions and 70 deletions

@ -4,7 +4,7 @@ zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
SYNOPSIS SYNOPSIS
-------- --------
`zstd` [*OPTIONS*] [-|_INPUT-FILE_] [-o _OUTPUT-FILE_] `zstd` [<OPTIONS>] [-|<INPUT-FILE>] [-o <OUTPUT-FILE>]
`zstdmt` is equivalent to `zstd -T0` `zstdmt` is equivalent to `zstd -T0`
@ -16,7 +16,7 @@ SYNOPSIS
DESCRIPTION DESCRIPTION
----------- -----------
`zstd` is a fast lossless compression algorithm and data compression tool, `zstd` is a fast lossless compression algorithm and data compression tool,
with command line syntax similar to `gzip (1)` and `xz (1)`. with command line syntax similar to `gzip`(1) and `xz`(1).
It is based on the **LZ77** family, with further FSE & huff0 entropy stages. It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed, `zstd` offers highly configurable compression speed,
from fast modes at > 200 MB/s per core, from fast modes at > 200 MB/s per core,
@ -35,12 +35,13 @@ but features the following differences :
Use `-q` to turn it off. Use `-q` to turn it off.
- `zstd` does not accept input from console, - `zstd` does not accept input from console,
though it does accept `stdin` when it's not the console. though it does accept `stdin` when it's not the console.
- `zstd` does not store the input's filename or attributes, only its contents.
`zstd` processes each _file_ according to the selected operation mode. `zstd` processes each _file_ according to the selected operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output. and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output `zstd` will refuse to write compressed data to standard output
if it is a terminal : it will display an error message and skip the _file_. if it is a terminal: it will display an error message and skip the file.
Similarly, `zstd` will refuse to read compressed data from standard input Similarly, `zstd` will refuse to read compressed data from standard input
if it is a terminal. if it is a terminal.
@ -52,14 +53,15 @@ whose name is derived from the source _file_ name:
* When decompressing, the `.zst` suffix is removed from the source filename to * When decompressing, the `.zst` suffix is removed from the source filename to
get the target filename get the target filename
### Concatenation with .zst files ### Concatenation with .zst Files
It is possible to concatenate multiple `.zst` files. `zstd` will decompress It is possible to concatenate multiple `.zst` files. `zstd` will decompress
such agglomerated file as if it was a single `.zst` file. such agglomerated file as if it was a single `.zst` file.
OPTIONS OPTIONS
------- -------
### Integer suffixes and special values ### Integer Suffixes and Special Values
In most places where an integer argument is expected, In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers. an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix. There must be no space between the integer and the suffix.
@ -71,7 +73,8 @@ There must be no space between the integer and the suffix.
Multiply the integer by 1,048,576 (2\^20). Multiply the integer by 1,048,576 (2\^20).
`Mi`, `M`, and `MB` are accepted as synonyms for `MiB`. `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.
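
As an illustration of the suffix rules described in this section, here is a minimal Python sketch. It is not how `zstd` parses its arguments; `parse_size` is a hypothetical helper, and the `KiB` family of synonyms (`Ki`, `K`, `KB`) is assumed to mirror the `MiB` family shown in this hunk.

```python
import re

# Multipliers for the suffixes; the KiB synonyms are assumed to
# mirror the MiB synonyms listed in the man page text.
SUFFIXES = {
    "KiB": 1 << 10, "Ki": 1 << 10, "K": 1 << 10, "KB": 1 << 10,
    "MiB": 1 << 20, "Mi": 1 << 20, "M": 1 << 20, "MB": 1 << 20,
}


def parse_size(text):
    """Hypothetical helper: expand a string such as '4MiB' to an integer."""
    # No space is allowed between the integer and the suffix,
    # so the whole string must match digits followed by letters.
    match = re.fullmatch(r"([0-9]+)([A-Za-z]*)", text)
    if match is None:
        raise ValueError("not an integer with an optional suffix: %r" % text)
    number, suffix = match.groups()
    if suffix == "":
        return int(number)
    if suffix not in SUFFIXES:
        raise ValueError("unknown suffix: %r" % suffix)
    return int(number) * SUFFIXES[suffix]


if __name__ == "__main__":
    for example in ("19", "128K", "4MiB"):
        print(example, "->", parse_size(example))
```

Note that `MB` here means 1,048,576 bytes, not 1,000,000, which is why the edits in this commit change several "MB" figures to "MiB".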
### Operation mode ### Operation Mode
If multiple operation mode options are given, If multiple operation mode options are given,
the last one takes effect. the last one takes effect.
@ -88,19 +91,21 @@ the last one takes effect.
decompressed data is discarded and checksummed for errors. decompressed data is discarded and checksummed for errors.
No files are created or removed. No files are created or removed.
* `-b#`: * `-b#`:
Benchmark file(s) using compression level # Benchmark file(s) using compression level _#_.
* `--train FILEs`: See _BENCHMARK_ below for a description of this operation.
Use FILEs as a training set to create a dictionary. * `--train FILES`:
Use _FILES_ as a training set to create a dictionary.
The training set should contain a lot of small files (> 100). The training set should contain a lot of small files (> 100).
See _DICTIONARY BUILDER_ below for a description of this operation.
* `-l`, `--list`: * `-l`, `--list`:
Display information related to a zstd compressed file, such as size, ratio, and checksum. Display information related to a zstd compressed file, such as size, ratio, and checksum.
Some of these fields may not be available. Some of these fields may not be available.
This command's output can be augmented with the `-v` modifier. This command's output can be augmented with the `-v` modifier.
### Operation modifiers ### Operation Modifiers
* `-#`: * `-#`:
`#` compression level \[1-19] (default: 3) selects `#` compression level \[1-19\] (default: 3)
* `--ultra`: * `--ultra`:
unlocks high compression levels 20+ (maximum 22), using a lot more memory. unlocks high compression levels 20+ (maximum 22), using a lot more memory.
Note that decompression will also require more memory when using these levels. Note that decompression will also require more memory when using these levels.
@ -122,7 +127,9 @@ the last one takes effect.
As compression is serialized with I/O, this can be slightly slower. As compression is serialized with I/O, this can be slightly slower.
Single-thread mode features significantly lower memory usage, Single-thread mode features significantly lower memory usage,
which can be useful for systems with limited amount of memory, such as 32-bit systems. which can be useful for systems with limited amount of memory, such as 32-bit systems.
Note 1: this mode is the only available one when multithread support is disabled. Note 1: this mode is the only available one when multithread support is disabled.
Note 2: this mode is different from `-T1`, which spawns 1 compression thread in parallel with I/O. Note 2: this mode is different from `-T1`, which spawns 1 compression thread in parallel with I/O.
Final compressed result is also slightly different from `-T1`. Final compressed result is also slightly different from `-T1`.
* `--auto-threads={physical,logical} (default: physical)`: * `--auto-threads={physical,logical} (default: physical)`:
@ -134,9 +141,10 @@ the last one takes effect.
Adaptation can be constrained between supplied `min` and `max` levels. Adaptation can be constrained between supplied `min` and `max` levels.
The feature works when combined with multi-threading and `--long` mode. The feature works when combined with multi-threading and `--long` mode.
It does not work with `--single-thread`. It does not work with `--single-thread`.
It sets window size to 8 MB by default (can be changed manually, see `wlog`). It sets window size to 8 MiB by default (can be changed manually, see `wlog`).
Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible. Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible.
_note_ : at the time of this writing, `--adapt` can remain stuck at low speed
_Note_: at the time of this writing, `--adapt` can remain stuck at low speed
when combined with multiple worker threads (>=2). when combined with multiple worker threads (>=2).
* `--long[=#]`: * `--long[=#]`:
enables long distance matching with `#` `windowLog`, if `#` is not enables long distance matching with `#` `windowLog`, if `#` is not
@ -153,17 +161,20 @@ the last one takes effect.
* `--patch-from FILE`: * `--patch-from FILE`:
Specify the file to be used as a reference point for zstd's diff engine. Specify the file to be used as a reference point for zstd's diff engine.
This is effectively dictionary compression with some convenient parameter This is effectively dictionary compression with some convenient parameter
selection, namely that windowSize > srcSize. selection, namely that _windowSize_ > _srcSize_.
Note: cannot use both this and -D together Note: cannot use both this and `-D` together.
Note: `--long` mode will be automatically activated if chainLog < fileLog
(fileLog being the windowLog required to cover the whole file). You Note: `--long` mode will be automatically activated if _chainLog_ < _fileLog_
(_fileLog_ being the _windowLog_ required to cover the whole file). You
can also manually force it. can also manually force it.
Note: for all levels, you can use --patch-from in --single-thread mode
to improve compression ratio at the cost of speed Note: for all levels, you can use `--patch-from` in `--single-thread` mode
to improve compression ratio at the cost of speed.
Note: for level 19, you can get increased compression ratio at the cost Note: for level 19, you can get increased compression ratio at the cost
of speed by specifying `--zstd=targetLength=` to be something large of speed by specifying `--zstd=targetLength=` to be something large
(i.e. 4096), and by setting a large `--zstd=chainLog=` (i.e. 4096), and by setting a large `--zstd=chainLog=`.
* `--rsyncable`: * `--rsyncable`:
`zstd` will periodically synchronize the compression state to make the `zstd` will periodically synchronize the compression state to make the
compressed file more rsync-friendly. There is a negligible impact to compressed file more rsync-friendly. There is a negligible impact to
@ -177,22 +188,22 @@ the last one takes effect.
* `--[no-]content-size`: * `--[no-]content-size`:
enable / disable whether or not the original size of the file is placed in enable / disable whether or not the original size of the file is placed in
the header of the compressed file. The default option is the header of the compressed file. The default option is
--content-size (meaning that the original size will be placed in the header). `--content-size` (meaning that the original size will be placed in the header).
* `--no-dictID`: * `--no-dictID`:
do not store dictionary ID within frame header (dictionary compression). do not store dictionary ID within frame header (dictionary compression).
The decoder will have to rely on implicit knowledge about which dictionary to use, The decoder will have to rely on implicit knowledge about which dictionary to use,
it won't be able to check if it's correct. it won't be able to check if it's correct.
* `-M#`, `--memory=#`: * `-M#`, `--memory=#`:
Set a memory usage limit. By default, `zstd` uses 128 MB for decompression Set a memory usage limit. By default, `zstd` uses 128 MiB for decompression
as the maximum amount of memory the decompressor is allowed to use, but you can as the maximum amount of memory the decompressor is allowed to use, but you can
override this manually if need be in either direction (i.e. you can increase or override this manually if need be in either direction (i.e. you can increase or
decrease it). decrease it).
This is also used during compression when using with --patch-from=. In this case, This is also used during compression when using with `--patch-from=`. In this case,
this parameter overrides that maximum size allowed for a dictionary. (128 MB). this parameter overrides that maximum size allowed for a dictionary. (128 MiB).
Additionally, this can be used to limit memory for dictionary training. This parameter Additionally, this can be used to limit memory for dictionary training. This parameter
overrides the default limit of 2 GB. zstd will load training samples up to the memory limit overrides the default limit of 2 GiB. zstd will load training samples up to the memory limit
and ignore the rest. and ignore the rest.
* `--stream-size=#`: * `--stream-size=#`:
Sets the pledged source size of input coming from a stream. This value must be exact, as it Sets the pledged source size of input coming from a stream. This value must be exact, as it
@ -207,7 +218,7 @@ the last one takes effect.
Exact guesses result in better compression ratios. Overestimates result in slightly Exact guesses result in better compression ratios. Overestimates result in slightly
degraded compression ratios, while underestimates may result in significant degradation. degraded compression ratios, while underestimates may result in significant degradation.
* `-o FILE`: * `-o FILE`:
save result into `FILE` save result into `FILE`.
* `-f`, `--force`: * `-f`, `--force`:
disable input and output checks. Allows overwriting existing files, input disable input and output checks. Allows overwriting existing files, input
from console, output to stdout, operating on links, block devices, etc. from console, output to stdout, operating on links, block devices, etc.
@ -227,11 +238,11 @@ the last one takes effect.
enable / disable passing through uncompressed files as-is. During enable / disable passing through uncompressed files as-is. During
decompression when pass-through is enabled, unrecognized formats will be decompression when pass-through is enabled, unrecognized formats will be
copied as-is from the input to the output. By default, pass-through will copied as-is from the input to the output. By default, pass-through will
occur when the output destination is stdout and the force (-f) option is occur when the output destination is stdout and the force (`-f`) option is
set. set.
* `--rm`: * `--rm`:
remove source file(s) after successful compression or decompression. If used in combination with remove source file(s) after successful compression or decompression. If used in combination with
-o, will trigger a confirmation prompt (which can be silenced with -f), as this is a destructive operation. `-o`, will trigger a confirmation prompt (which can be silenced with `-f`), as this is a destructive operation.
* `-k`, `--keep`: * `-k`, `--keep`:
keep source file(s) after successful compression or decompression. keep source file(s) after successful compression or decompression.
This is the default behavior. This is the default behavior.
@ -281,15 +292,13 @@ the last one takes effect.
* `--no-progress`: * `--no-progress`:
do not display the progress bar, but keep all other messages. do not display the progress bar, but keep all other messages.
* `--show-default-cparams`: * `--show-default-cparams`:
Shows the default compression parameters that will be used for a shows the default compression parameters that will be used for a particular input file, based on the provided compression level and the input size.
particular src file. If the provided src file is not a regular file If the provided file is not a regular file (e.g. a pipe), this flag will output the parameters used for inputs of unknown size.
(e.g. named pipe), the cli will just output the default parameters.
That is, the parameters that are used when the src size is unknown.
* `--`: * `--`:
All arguments after `--` are treated as files All arguments after `--` are treated as files
### gzip Operation modifiers ### gzip Operation Modifiers
When invoked via a `gzip` symlink, `zstd` will support further When invoked via a `gzip` symlink, `zstd` will support further
options that intend to mimic the `gzip` behavior: options that intend to mimic the `gzip` behavior:
@ -300,7 +309,7 @@ options that intend to mimic the `gzip` behavior:
alias to the option `-9`. alias to the option `-9`.
### Interactions with Environment Variables ### Environment Variables
Employing environment variables to set parameters has security implications. Employing environment variables to set parameters has security implications.
Therefore, this avenue is intentionally limited. Therefore, this avenue is intentionally limited.
@ -341,7 +350,7 @@ Compression of small files similar to the sample set will be greatly improved.
Since dictionary compression is mostly effective for small files, Since dictionary compression is mostly effective for small files,
the expectation is that the training set will only contain small files. the expectation is that the training set will only contain small files.
In the case where some samples happen to be large, In the case where some samples happen to be large,
only the first 128 KB of these samples will be used for training. only the first 128 KiB of these samples will be used for training.
`--train` supports multithreading if `zstd` is compiled with threading support (default). `--train` supports multithreading if `zstd` is compiled with threading support (default).
Additional advanced parameters can be specified with `--train-fastcover`. Additional advanced parameters can be specified with `--train-fastcover`.
@ -394,6 +403,8 @@ Compression of small files similar to the sample set will be greatly improved.
and an ID < 65536 will only need 2 bytes. and an ID < 65536 will only need 2 bytes.
This compares favorably to 4 bytes default. This compares favorably to 4 bytes default.
Note that RFC8878 reserves IDs less than 32768 and greater than or equal to 2\^31, so they should not be used in public.
* `--train-cover[=k#,d=#,steps=#,split=#,shrink[=#]]`: * `--train-cover[=k#,d=#,steps=#,split=#,shrink[=#]]`:
Select parameters for the default dictionary builder algorithm named cover. Select parameters for the default dictionary builder algorithm named cover.
If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8. If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
@ -499,9 +510,10 @@ This minimum is either 512 KB, or `overlapSize`, whichever is largest.
Different job sizes will lead to non-identical compressed frames. Different job sizes will lead to non-identical compressed frames.
### --zstd[=options]: ### --zstd[=options]:
`zstd` provides 22 predefined compression levels. `zstd` provides 22 predefined regular compression levels plus the fast levels.
The selected or default predefined compression level can be changed with This compression level is translated internally into a number of specific parameters that actually control the behavior of the compressor.
advanced compression options. (You can see the result of this translation with `--show-default-cparams`.)
These specific parameters can be overridden with advanced compression options.
The _options_ are provided as a comma-separated list. The _options_ are provided as a comma-separated list.
You may specify only the options you want to change and the rest will be You may specify only the options you want to change and the rest will be
taken from the selected or default compression level. taken from the selected or default compression level.
@ -510,10 +522,10 @@ The list of available _options_:
- `strategy`=_strat_, `strat`=_strat_: - `strategy`=_strat_, `strat`=_strat_:
Specify a strategy used by a match finder. Specify a strategy used by a match finder.
There are 9 strategies numbered from 1 to 9, from faster to stronger: There are 9 strategies numbered from 1 to 9, from fastest to strongest:
1=ZSTD\_fast, 2=ZSTD\_dfast, 3=ZSTD\_greedy, 1=`ZSTD_fast`, 2=`ZSTD_dfast`, 3=`ZSTD_greedy`,
4=ZSTD\_lazy, 5=ZSTD\_lazy2, 6=ZSTD\_btlazy2, 4=`ZSTD_lazy`, 5=`ZSTD_lazy2`, 6=`ZSTD_btlazy2`,
7=ZSTD\_btopt, 8=ZSTD\_btultra, 9=ZSTD\_btultra2. 7=`ZSTD_btopt`, 8=`ZSTD_btultra`, 9=`ZSTD_btultra2`.
- `windowLog`=_wlog_, `wlog`=_wlog_: - `windowLog`=_wlog_, `wlog`=_wlog_:
Specify the maximum number of bits for a match distance. Specify the maximum number of bits for a match distance.
@ -533,19 +545,20 @@ The list of available _options_:
Bigger hash tables cause fewer collisions which usually makes compression Bigger hash tables cause fewer collisions which usually makes compression
faster, but requires more memory during compression. faster, but requires more memory during compression.
The minimum _hlog_ is 6 (64 B) and the maximum is 30 (1 GiB). The minimum _hlog_ is 6 (64 entries / 256 B) and the maximum is 30 (1B entries / 4 GiB).
- `chainLog`=_clog_, `clog`=_clog_: - `chainLog`=_clog_, `clog`=_clog_:
Specify the maximum number of bits for a hash chain or a binary tree. Specify the maximum number of bits for the secondary search structure,
whose form depends on the selected `strategy`.
Higher numbers of bits increases the chance to find a match which usually Higher numbers of bits increases the chance to find a match which usually
improves compression ratio. improves compression ratio.
It also slows down compression speed and increases memory requirements for It also slows down compression speed and increases memory requirements for
compression. compression.
This option is ignored for the ZSTD_fast strategy. This option is ignored for the `ZSTD_fast` `strategy`, which only has the primary hash table.
The minimum _clog_ is 6 (64 B) and the maximum is 29 (524 Mib) on 32-bit platforms The minimum _clog_ is 6 (64 entries / 256 B) and the maximum is 29 (512M entries / 2 GiB) on 32-bit platforms
and 30 (1 Gib) on 64-bit platforms. and 30 (1B entries / 4 GiB) on 64-bit platforms.
- `searchLog`=_slog_, `slog`=_slog_: - `searchLog`=_slog_, `slog`=_slog_:
Specify the maximum number of searches in a hash chain or a binary tree Specify the maximum number of searches in a hash chain or a binary tree
@ -567,19 +580,19 @@ The list of available _options_:
- `targetLength`=_tlen_, `tlen`=_tlen_: - `targetLength`=_tlen_, `tlen`=_tlen_:
The impact of this field vary depending on selected strategy. The impact of this field vary depending on selected strategy.
For ZSTD\_btopt, ZSTD\_btultra and ZSTD\_btultra2, it specifies For `ZSTD_btopt`, `ZSTD_btultra` and `ZSTD_btultra2`, it specifies
the minimum match length that causes match finder to stop searching. the minimum match length that causes match finder to stop searching.
A larger `targetLength` usually improves compression ratio A larger `targetLength` usually improves compression ratio
but decreases compression speed. but decreases compression speed.
t
For ZSTD\_fast, it triggers ultra-fast mode when > 0. For `ZSTD_fast`, it triggers ultra-fast mode when > 0.
The value represents the amount of data skipped between match sampling. The value represents the amount of data skipped between match sampling.
Impact is reversed: a larger `targetLength` increases compression speed Impact is reversed: a larger `targetLength` increases compression speed
but decreases compression ratio. but decreases compression ratio.
For all other strategies, this field has no impact. For all other strategies, this field has no impact.
The minimum _tlen_ is 0 and the maximum is 128 Kib. The minimum _tlen_ is 0 and the maximum is 128 KiB.
- `overlapLog`=_ovlog_, `ovlog`=_ovlog_: - `overlapLog`=_ovlog_, `ovlog`=_ovlog_:
Determine `overlapSize`, amount of data reloaded from previous job. Determine `overlapSize`, amount of data reloaded from previous job.
@ -641,6 +654,11 @@ similar to predefined level 19 for files bigger than 256 KB:
`--zstd`=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6 `--zstd`=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6
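
To make the revised size figures concrete, the following Python sketch splits a `--zstd=` option list and converts the `hlog` and `clog` values into table sizes. The helper names are hypothetical, and the 4 bytes per entry is an assumption inferred from the "64 entries / 256 B" wording above. The real memory use of the compressor depends on more than these two tables.

```python
# Assumption: each table entry occupies 4 bytes, inferred from the
# man page wording "64 entries / 256 B" for a log value of 6.
ENTRY_BYTES = 4


def parse_zstd_options(spec):
    """Hypothetical helper: 'wlog=23,clog=23' -> {'wlog': 23, 'clog': 23}."""
    options = {}
    for item in spec.split(","):
        name, value = item.split("=", 1)
        options[name] = int(value)
    return options


def table_bytes(log):
    """Size in bytes of a table with 2**log entries."""
    return (1 << log) * ENTRY_BYTES


opts = parse_zstd_options("wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6")

if __name__ == "__main__":
    mib = 1 << 20
    print("window size:", (1 << opts["wlog"]) // mib, "MiB")
    print("hash table :", table_bytes(opts["hlog"]) // mib, "MiB")
    print("chain table:", table_bytes(opts["clog"]) // mib, "MiB")
```

Under that assumption, the example option list above corresponds to an 8 MiB window, a 16 MiB hash table, and a 32 MiB chain table.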
SEE ALSO
--------
`zstdgrep`(1), `zstdless`(1), `gzip`(1), `xz`(1)
The <zstandard> format is specified in Y. Collet, "Zstandard Compression and the 'application/zstd' Media Type", https://www.ietf.org/rfc/rfc8878.txt, Internet RFC 8878 (February 2021).
BUGS BUGS
---- ----

@ -4,16 +4,16 @@ zstdgrep(1) -- print lines matching a pattern in zstandard-compressed files
SYNOPSIS SYNOPSIS
-------- --------
`zstdgrep` [*grep-flags*] [--] _pattern_ [_files_ ...] `zstdgrep` [<grep-flags>] [--] <pattern> [<files> ...]
DESCRIPTION DESCRIPTION
----------- -----------
`zstdgrep` runs `grep (1)` on files, or `stdin` if no files argument is given, after decompressing them with `zstdcat (1)`. `zstdgrep` runs `grep`(1) on files, or `stdin` if no files argument is given, after decompressing them with `zstdcat`(1).
The grep-flags and pattern arguments are passed on to `grep (1)`. If an `-e` flag is found in the `grep-flags`, `zstdgrep` will not look for a pattern argument. The <grep-flags> and <pattern> arguments are passed on to `grep`(1). If an `-e` flag is found in the <grep-flags>, `zstdgrep` will not look for a <pattern> argument.
Note that modern `grep` alternatives such as `ripgrep` (`rg`) support `zstd`-compressed files out of the box, Note that modern `grep` alternatives such as `ripgrep` (`rg`(1)) support `zstd`-compressed files out of the box,
and can prove better alternatives than `zstdgrep` notably for unsupported complex pattern searches. and can prove better alternatives than `zstdgrep` notably for unsupported complex pattern searches.
Note though that such alternatives may also feature some minor command line differences. Note though that such alternatives may also feature some minor command line differences.
@ -23,7 +23,7 @@ In case of missing arguments or missing pattern, 1 will be returned, otherwise 0
SEE ALSO SEE ALSO
-------- --------
`zstd (1)` `zstd`(1)
AUTHORS AUTHORS
------- -------

@ -4,13 +4,13 @@ zstdless(1) -- view zstandard-compressed files
SYNOPSIS SYNOPSIS
-------- --------
`zstdless` [*flags*] [_file_ ...] `zstdless` [<flags>] [<file> ...]
DESCRIPTION DESCRIPTION
----------- -----------
`zstdless` runs `less (1)` on files or stdin, if no files argument is given, after decompressing them with `zstdcat (1)`. `zstdless` runs `less`(1) on files or stdin, if no <file> argument is given, after decompressing them with `zstdcat`(1).
SEE ALSO SEE ALSO
-------- --------
`zstd (1)` `zstd`(1)