mirror of https://github.com/facebook/zstd.git synced 2025-08-07 06:23:00 +03:00
28 Commits

Author SHA1 Message Date
Dave Vasilevsky
448a09ff78 seekable_format: Fix conversion warnings in parallel_compression 2025-05-07 22:01:49 -07:00
Dave Vasilevsky
01c973de8d seekable_format: Fix race in parallel_processing
There was no memory barrier between writing and reading `done`, which
would allow reordering to cause races. With so little data to handle
after each job completes, we might as well just join.
2025-05-07 22:01:49 -07:00
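The fix described above (join instead of polling an unsynchronized `done` flag) can be sketched as follows. This is only an illustration using plain POSIX threads; the `Job`, `runJob`, and `runAndCollect` names are hypothetical and are not taken from the example programs, which use zstd's own thread pool. The point is that `pthread_join` is a synchronization point, so results written by the worker are safely visible once it returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job type, for illustration only. */
typedef struct {
    int input;
    int result;   /* written by the worker, read by the caller after join */
} Job;

static void* runJob(void* arg)
{
    Job* job = arg;
    job->result = job->input * 2;   /* stand-in for compressing one frame */
    return NULL;
}

/* Start one thread per job, then join each thread before reading its result.
 * No `done` flag is polled: joining provides the needed memory ordering. */
static int runAndCollect(Job* jobs, pthread_t* threads, size_t count)
{
    int total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        int err = pthread_create(&threads[i], NULL, runJob, &jobs[i]);
        assert(err == 0);
        (void)err;
    }
    for (i = 0; i < count; i++) {
        pthread_join(threads[i], NULL);
        total += jobs[i].result;
    }
    return total;
}
```

Joining costs little here because, as the commit message notes, there is very little left to do after each job completes.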
Dave Vasilevsky
6fc8455a72 seekable_format: Cleanup POOL in parallel_compression 2025-05-07 22:01:49 -07:00
Dave Vasilevsky
2d4cff69c4 seekable_format: Make parallel_compression use memory properly
Previously, parallel_compression would only handle each job's results
after ALL jobs were successfully queued. This caused all src/dst
buffers to remain in memory until then!

It also polled to check whether a job completed, which is racy without
any memory barrier.

Now, we flush results as a side effect of completing a job. Completed
frames are placed in an ordered linked-list, and any eligible frames
are flushed. This may be zero or multiple frames, depending on the
order in which jobs finish.

This design also makes it simple to support streaming input, so that
is now available. Just pass `-` as the filename, and stdin/stdout will
be used for I/O.
2025-05-07 22:01:49 -07:00
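The ordered flush described in the commit above might look roughly like the sketch below. All names (`Frame`, `FlushQueue`, `completeFrame`) are hypothetical, the frame payload is omitted, and locking is left to the caller; this is not the code from the example program, only a minimal model of "insert in order, then flush whatever is contiguous".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types, for illustration only. */
typedef struct Frame {
    unsigned index;        /* position of this frame in the output */
    struct Frame* next;
} Frame;

typedef struct {
    Frame* pending;        /* completed frames, sorted by ascending index */
    unsigned nextToFlush;  /* index of the next frame the output expects */
    unsigned flushed;      /* total number of frames written so far */
} FlushQueue;

/* Record a completed frame, then flush every frame that is now contiguous
 * with the output. Returns how many frames this call flushed, which may be
 * zero. The caller must hold a lock if workers can call this concurrently. */
static unsigned completeFrame(FlushQueue* q, unsigned index)
{
    unsigned n = 0;
    Frame** link;
    Frame* f;

    assert(q != NULL);
    f = malloc(sizeof *f);
    if (f == NULL) abort();
    f->index = index;

    /* Ordered insert: walk to the first node with a larger index. */
    link = &q->pending;
    while (*link != NULL && (*link)->index < index) link = &(*link)->next;
    f->next = *link;
    *link = f;

    /* Flush from the head while it is the frame the output expects. */
    while (q->pending != NULL && q->pending->index == q->nextToFlush) {
        Frame* head = q->pending;
        q->pending = head->next;
        /* A real implementation would write the frame's bytes here. */
        free(head);
        q->nextToFlush++;
        n++;
    }
    q->flushed += n;
    return n;
}
```

If frames 2 and 1 finish before frame 0, the first two calls flush nothing and the third flushes all three, which matches the "zero or multiple frames" behavior the message describes.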
Dave Vasilevsky
f5b6531902 seekable_format: Link against multi-threaded libzstd.a
Some of these examples are intended to be parallel, so it doesn't make
sense to link them against single-threaded libzstd.

The filenames of the mt and nomt libzstd are identical, so it's still
possible to link against the single-threaded one, just harder.
2025-05-07 22:01:49 -07:00
Dave Vasilevsky
6b0039abcf seekable_format: Build with $(MAKE)
This passes make flags, such as `-jN` for building in parallel, to
the underlying make.
2025-05-07 22:01:49 -07:00
Dimitri Papadopoulos
585aaa0ed3 Do not test WIN32, instead test _WIN32
To the best of my knowledge:
* `_WIN32` and `_WIN64` are defined by the compiler,
* `WIN32` and `WIN64` are defined by the user, to indicate whatever
  the user chooses them to indicate. They mean 32-bit and 64-bit Windows
  compilation by convention only.

See:
https://accu.org/journals/overload/24/132/wilson_2223/

Windows compilers in general, and MSVC in particular, have been defining
`_WIN32` and `_WIN64` for a long time, provably at least since Visual Studio
2015, and in practice as early as in the days of 16-bit Windows.

See:
https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-140
https://learn.microsoft.com/en-us/windows/win32/winprog64/the-tools

Tests used to be inconsistent, sometimes testing `_WIN32`, sometimes
`_WIN32` and `WIN32`. This brings consistency to Windows detection.
2023-09-23 19:03:18 +02:00
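As a minimal illustration of the convention this commit settles on, the sketch below tests only the compiler-defined `_WIN32` macro. The `isWindowsBuild` helper is hypothetical and exists only to make the check observable at run time.

```c
/* Hypothetical helper, for illustration only.
 * `_WIN32` is predefined by the compiler for both 32-bit and 64-bit
 * Windows targets, so no user-supplied `WIN32` define is needed. */
static int isWindowsBuild(void)
{
#if defined(_WIN32)
    return 1;
#else
    return 0;
#endif
}
```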
Yann Collet
1df9f36c6c Improved seekable format ingestion speed for small frame size
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.

Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35 sec (before this PR) to 2.5 sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45 sec to 0.5 sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allow setting the compression level,
in order to produce these performance results.
2023-03-09 18:00:30 -08:00
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
7f12f24cf4 Rewrite Copyright Date Ranges from -present to -2022
Apparently it's better. Somehow.

```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do echo $f; sed -i 's/\-present/-2022/' $f; done

g co HEAD -- build/meson/
```
2022-12-20 12:44:56 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
sen
d6be7659b0 Add seekable roundtrip fuzzer (#2617) 2021-05-06 10:08:21 -04:00
Azat Khuzhin
53a60e98de seekable decompression fixes (#2594)
* seekable_format: fix from-file reading (not in-memory)

It tries to check the buffer boundary, but there is no buffer for
from-file reading.

* seekable_decompression: break when ZSTD_seekable_decompress() returns zero

* seekable_decompression_mem: break when ZSTD_seekable_decompress() returns zero

* seekable_format: cap the offset+len up to the last dOffset

This allows reading the whole file without getting a corruption error if
the requested range extends past the data left in the file, e.g.:

    $ ./seekable_compression seekable_compression.c 8192 | head
    $ zstd -cdq seekable_compression.c.zst | wc -c
    4737

Before this patch:

    $ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
    ZSTD_seekable_decompress() error : Corrupted block detected
    0

After:

    $ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
    4737
2021-05-05 10:05:41 -04:00
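The capping behavior from the last bullet of the commit above can be sketched as a small helper. The name `clampReadLength` and its parameters are hypothetical; the actual change lives inside the seekable decompression code and may be structured differently.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: given a request for `len`
 * bytes starting at decompressed `offset`, return how many bytes can
 * actually be served when the decompressed data is `total` bytes long. */
static size_t clampReadLength(unsigned long long offset, size_t len,
                              unsigned long long total)
{
    unsigned long long remaining;
    if (offset >= total) return 0;          /* nothing left to read */
    remaining = total - offset;
    return (len < remaining) ? len : (size_t)remaining;
}
```

With the numbers from the message, a request for 10000000 bytes at offset 0 against 4737 bytes of data is capped to 4737 rather than reported as corruption.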
W. Felix Handte
15da57820d Add New Seekable Compression Example to .gitignore 2019-07-24 18:22:20 -04:00
Sean Purcell
671d533ea7 Fix seekable decompression in-memory api 2019-07-21 23:22:25 -04:00
Yann Collet
34f01e600f fixed multiple conversions
from 64-bit to 32-bit
2018-12-13 14:02:22 -08:00
Azat Khuzhin
d707692e05 seekable_decompression: support offset greater than UINT_MAX 2018-09-16 18:05:32 +03:00
Yann Collet
36d6165a2d Makefile: added variable SCANBUILD
so that a different version of scan-build can be selected
2018-08-16 16:44:13 -07:00
Yann Collet
30ee23e905 ensure seekable_format/examples generated libzstd.a
when it's not already present in the expected directory
2018-06-06 12:09:58 -07:00
Yann Collet
e9dc204f42 fixed a bunch of headers after license change (#825) 2017-08-31 11:24:54 -07:00
Yann Collet
394bdd7db9 changed license for examples
intentionally this time
2017-08-29 09:24:11 -07:00
Sean Purcell
470993c9b1 Add raw seek table construction API and parallel compression example 2017-04-28 12:17:09 -07:00
Sean Purcell
11dc940e72 Add parallel processing example for seekable API 2017-04-21 12:23:06 -07:00
Sean Purcell
0f7bd772e6 Update seekable API to simplify IO 2017-04-18 16:48:30 -07:00
Sean Purcell
9626cf1ac6 Address @terrelln's comments 2017-04-13 17:48:35 -07:00
Sean Purcell
5ee1135f30 s/chunk/frame/ 2017-04-12 11:15:50 -07:00
Sean Purcell
e80f1d74b3 Address PR comments and minor fixes 2017-04-12 11:15:46 -07:00
Sean Purcell
d048fefef7 Move seekable format content to /contrib 2017-04-11 14:38:56 -07:00