```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f -print);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' "$f";
done
```
Summary:
Freeing an uninitialized pointer is undefined behavior. This caused a segfault
when the benchmark was compiled with Clang -O3 and decompression was benched.
V2: always create compressInstructions, but check whether cctxParams is NULL before
setting CCtx parameters, to avoid the segfault.
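A minimal sketch of the V2 approach; the struct layout and function signature are
illustrative, not the exact largeNbDicts.c code:
```
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

typedef struct {
    ZSTD_CCtx* cctx;
    ZSTD_CCtx_params* cctxParams;   /* may be NULL when only decompression is benched */
} compressInstructions;

compressInstructions createCompressInstructions(ZSTD_CCtx_params* cctxParams)
{
    compressInstructions ci;
    ci.cctx = ZSTD_createCCtx();
    ci.cctxParams = cctxParams;
    /* V2: always create the instructions, but only apply parameters
     * when they were actually provided, instead of dereferencing NULL. */
    if (cctxParams != NULL) {
        ZSTD_CCtx_setParametersUsingCCtxParams(ci.cctx, cctxParams);
    }
    return ci;
}
```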
Test Plan:
make and run
Summary:
Added an option -p#, where -p0 (default) sets the aggregation method to fastest
speed while -p1 sets the aggregation method to median. Also added a new column
in the CSV file to report this option's value.
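A minimal sketch of the two aggregation methods over per-run speed samples; the
function names are illustrative, not the actual largeNbDicts.c code:
```
#include <stdlib.h>

static int compareDouble(const void* a, const void* b)
{
    double const x = *(const double*)a;
    double const y = *(const double*)b;
    return (x > y) - (x < y);
}

/* -p0: report the fastest observed speed */
static double aggregateFastest(const double* speeds, size_t n)
{
    double best = speeds[0];
    for (size_t i = 1; i < n; i++)
        if (speeds[i] > best) best = speeds[i];
    return best;
}

/* -p1: report the median speed (average of the two middle samples when n is even) */
static double aggregateMedian(double* speeds, size_t n)
{
    qsort(speeds, n, sizeof(double), compareDouble);
    return (n & 1) ? speeds[n/2]
                   : (speeds[n/2 - 1] + speeds[n/2]) / 2.0;
}
```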
Test Plan:
```
$ ./largeNbDicts -1 --nbDicts=1 -D ~/benchmarks/html/html_8_16K.32K.dict
~/benchmarks/html/html_8_16K/*
loading 7450 files...
created src buffer of size 83.4 MB
split input into 7450 blocks
loading dictionary /home/zhuhan/benchmarks/html/html_8_16K.32K.dict
compressing at level 1 without dictionary : Ratio=3.03 (28827863 bytes)
compressed using a 32768 bytes dictionary : Ratio=4.28 (20410262 bytes)
generating 1 dictionaries, using 0.1 MB of memory
Compression Speed : 306.0 MB/s
Fastest Speed : 310.6 MB/s
$ ./largeNbDicts -1 --nbDicts=1 -p1 -D ~/benchmarks/html/html_8_16K.32K.dict
~/benchmarks/html/html_8_16K/*
loading 7450 files...
created src buffer of size 83.4 MB
split input into 7450 blocks
loading dictionary /home/zhuhan/benchmarks/html/html_8_16K.32K.dict
compressing at level 1 without dictionary : Ratio=3.03 (28827863 bytes)
compressed using a 32768 bytes dictionary : Ratio=4.28 (20410262 bytes)
generating 1 dictionaries, using 0.1 MB of memory
Compression Speed : 306.9 MB/s
Median Speed : 298.4 MB/s
```
Summary:
Add column headers and data for whether it's a compression or a decompression
run, the compression level, nbDicts and dictAttachPref, in addition to the
compression/decompression speed.
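A minimal sketch of how such rows could be emitted; the field names and format
string are illustrative, not the actual largeNbDicts.c code:
```
#include <stdio.h>

static void printCsvHeader(FILE* out)
{
    fprintf(out, "Compression/Decompression,Level,nbDicts,dictAttachPref,Speed\n");
}

static void printCsvRow(FILE* out, int isCompression, int level,
                        unsigned nbDicts, int dictAttachPref, double speedMBps)
{
    fprintf(out, "%s,%d,%u,%d,%.1f\n",
            isCompression ? "Compression" : "Decompression",
            level, nbDicts, dictAttachPref, speedMBps);
}
```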
Test Plan:
Example output:
```
./largeNbDicts
Compression/Decompression,Level,nbDicts,dictAttachPref,Speed
Compression,1,1,0,300.9
Compression,1,1,1,296.4
Compression,1,1,2,307.8
Compression,1,10,0,292.3
Compression,1,100,0,293.3
Compression,3,110,0,106.0
Decompression,-1,110,-1,155.6
Decompression,-1,110,-1,709.4
Decompression,-1,120,-1,709.1
Decompression,-1,120,-1,734.6
```
Benchmarking decompression results in a segfault in `createCompressInstructions`
because `cctxParams` is NULL. Skip running that function if we are not benching
compression.
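A short sketch of that guard, with illustrative names and parameter lists (the
surrounding benchmark setup in largeNbDicts.c is more involved):
```
/* Sketch only: names and signatures are illustrative. */
if (benchCompression) {
    /* createCompressInstructions() reads cctxParams, so only call it
     * when compression is actually benched. */
    compressInstructions cIns = createCompressInstructions(cctxParams);
    /* ... compression benchmark using cIns ... */
    freeCompressInstructions(cIns);
}
/* decompression-only runs never touch cctxParams, avoiding the segfault */
```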
Compiling with clang-8 fails with the following errors:
```
largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
U64 const dTime_ns = result.nanoSecPerRun;
~~~~~~~~ ~~~~~~~^~~~~~~~~~~~~
zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 1 means that cover parameters were correct
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 0 in case of malformed parameters
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
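A hedged sketch of one way to silence both classes of error (the actual fix in
the commit may differ):
```
#include <stdint.h>
typedef uint64_t U64;   /* stands in for zstd's U64 from mem.h */

/* largeNbDicts.c: an explicit cast makes the intended narrowing conversion
 * clear and satisfies -Wfloat-conversion. */
static U64 nanoSecToU64(double nanoSecPerRun)
{
    return (U64)nanoSecPerRun;
}

/* zstdcli.c: -Wdocumentation rejects '@return' in a comment that is not
 * attached to a declaration; rephrasing the comment as plain text
 * ("returns 1 if cover parameters were correct, 0 in case of malformed
 * parameters") or attaching it to the function declaration avoids the error. */
```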
benchfn used to rely on mem.h and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required bringing in all these dependencies.
Now the dependency is reduced to timefn only.
This required splitting timefn out of util,
and rewriting benchfn and timefn so that they no longer need mem.h.
Separating timefn from util has a wide effect across the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
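As an illustration of the reduced dependency, a standalone timing loop now only
needs timefn. A minimal sketch, assuming the UTIL_getTime()/UTIL_getSpanTimeNano()
interface exposed by programs/timefn.h (exact types and return values may differ
by version):
```
#include <stdio.h>
#include "timefn.h"   /* the only remaining dependency; no mem.h / util.h / platform.h */

int main(void)
{
    UTIL_time_t const clockStart = UTIL_getTime();
    /* ... workload to measure ... */
    unsigned long long const elapsedNs =
        (unsigned long long)UTIL_getSpanTimeNano(clockStart, UTIL_getTime());
    printf("elapsed: %llu ns\n", elapsedNs);
    return 0;
}
```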
by triggering the prefetching decoder path
(which used to be dedicated to long-range offsets only).
Figures on my laptop :
no content prefetch : ~300 MB/s (for reference)
full content prefetch : ~325 MB/s (before this patch)
new prefetch path : ~375 MB/s (after this patch)
The speed gain in this benchmark is already significant,
but another side effect is that this version
prefetches less data into memory,
since it only prefetches what's needed, instead of the full dictionary.
This is expected to help highly active environments,
such as active databases,
which can't be properly measured in a benchmark environment (too clean).
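A minimal sketch of the idea: prefetch only the dictionary bytes the next match
will actually read, rather than the whole dictionary. This uses the GCC/Clang
builtin directly instead of zstd's internal prefetch macros, and the function
name is illustrative:
```
#include <stddef.h>

/* Prefetch just the [matchPtr, matchPtr+matchLength) range, one cache line
 * at a time, instead of streaming the entire dictionary into memory. */
static void prefetchMatch(const char* matchPtr, size_t matchLength)
{
    size_t const cacheLine = 64;
    for (size_t pos = 0; pos < matchLength; pos += cacheLine) {
        __builtin_prefetch(matchPtr + pos, /* rw */ 0, /* locality */ 1);
    }
}
```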
Also :
fixed the largeNbDicts test program,
which was working improperly when setting nbBlocks > nbFiles.
It will now automatically truncate or repeat input as needed
to create the requested number of blocks.
Default: the number of files, possibly increased if blockSize is set.
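A minimal sketch of that truncate-or-repeat selection, with illustrative names
(the real splitting logic in largeNbDicts.c also deals with block sizes and
buffer ownership):
```
#include <stddef.h>

typedef struct { const char* ptr; size_t size; } buffer_t;

/* Fill dstBlocks[0..nbBlocksRequested) from srcBlocks[0..nbSrcBlocks),
 * truncating when fewer blocks are requested and cycling (repeating)
 * source blocks when more are requested. */
static void selectBlocks(buffer_t* dstBlocks, size_t nbBlocksRequested,
                         const buffer_t* srcBlocks, size_t nbSrcBlocks)
{
    for (size_t n = 0; n < nbBlocksRequested; n++) {
        dstBlocks[n] = srcBlocks[n % nbSrcBlocks];
    }
}
```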
measures :
- compression ratio with / without dictionary
- creates one dictionary per block
- memory budget for dictionaries
- decompression speed, using a different dictionary per block
current limitations :
- only one file
- 4K blocks only
- automatic dictionary built with 4K size
The dictionary can be selected on the command line, with -D.