1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-29 11:21:22 +03:00

77 Commits

Author SHA1 Message Date
de8d9e8914 Fix several locations with potential memory leak 2025-06-09 21:23:23 +08:00
448a09ff78 seekable_format: Fix conversion warnings in parallel_compression 2025-05-07 22:01:49 -07:00
13cb7a10ae seekable_format: Add test for parallel_compression memory usage
Use ulimit to fail the test if we use O(filesize) memory, rather than
O(threads).
2025-05-07 22:01:49 -07:00
01c973de8d seekable_format: Fix race in parallel_processing
There was no memory barrier between writing and reading `done`, which
would allow reordering to cause races. With so little data to handle
after each job completes, we might as well just join.
2025-05-07 22:01:49 -07:00
6fc8455a72 seekable_format: Cleanup POOL in parallel_compression 2025-05-07 22:01:49 -07:00
2d4cff69c4 seekable_format: Make parallel_compression use memory properly
Previously, parallel_compression would only handle each job's results
after ALL jobs were successfully queued. This caused all src/dst
buffers to remain in memory until then!

It also polled to check whether a job completed, which is racy without
any memory barrier.

Now, we flush results as a side effect of completing a job. Completed
frames are placed in an ordered linked-list, and any eligible frames
are flushed. This may be zero or multiple frames, depending on the
order in which jobs finish.

This design also makes it simple to support streaming input, so that
is now available. Just pass `-` as the filename, and stdin/stdout will
be used for I/O.
2025-05-07 22:01:49 -07:00
f5b6531902 seekable_format: Link against multi-threaded libzstd.a
Some of these examples are intended to be parallel, and don't make
sense to link against single-threaded libzstd.

The filename of mt and nomt libzstd are identical, so it's still
possible to link against the single-threaded one, just harder.
2025-05-07 22:01:49 -07:00
6b0039abcf seekable_format: Build with $(MAKE)
This passes make flags, such as `-jN` for building in parallel, to
the underlying make.
2025-05-07 22:01:49 -07:00
a610550e2c Merge pull request #4218 from facebook/externC
Move #includes out of `extern "C"` blocks
2025-01-07 10:06:08 -08:00
6b046f5841 PR feedback 2025-01-02 15:05:58 -08:00
54c3d998a0 Support for libc variants without fseeko/ftello
Some older Android libc implementations don't support `fseeko` or `ftello`.
This commit adds a new compile-time macro `LIBC_NO_FSEEKO` as well as a usage in CMake for old Android APIs.
2025-01-02 14:02:10 -08:00
fc726da774 Move #includes out of extern "C" blocks
Do some include shuffling for `**.h` files within lib, programs, tests, and zlibWrapper.
`lib/legacy` and `lib/deprecated` are untouched.
`#include`s within `extern "C"` blocks in .cpp files are untouched.

todo: shuffling for `xxhash.h`
2024-12-17 17:55:07 -08:00
b683c0dbe2 prevent possible segfault when creating seek table
Add a check whether the seek table of a `ZSTD_seekable` is initialized
before creating a new seek table from it. Return `NULL`, if the check
fails.
2024-11-25 08:57:25 +01:00
585aaa0ed3 Do not test WIN32, instead test _WIN32
To the best of my knowledge:
* `_WIN32` and `_WIN64` are defined by the compiler,
* `WIN32` and `WIN64` are defined by the user, to indicate whatever
  the user chooses them to indicate. They mean 32-bit and 64-bit Windows
  compilation by convention only.

See:
https://accu.org/journals/overload/24/132/wilson_2223/

Windows compilers in general, and MSVC in particular, have been defining
`_WIN32` and `_WIN64` for a long time, provably at least since Visual Studio
2015, and in practice as early as in the days of 16-bit Windows.

See:
https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-140
https://learn.microsoft.com/en-us/windows/win32/winprog64/the-tools

Tests used to be inconsistent, sometimes testing `_WIN32`, sometimes
`_WIN32` and `WIN32`. This brings consistency to Windows detection.
2023-09-23 19:03:18 +02:00
649a9c85c3 seekable_format: Add unit test for multiple decompress calls
This does the following:
1. Compress test data into multiple frames
2. Perform a series of small decompressions and seeks forward, checking
   that compressed data wasn't reread unnecessarily.
3. Perform some seeks forward and backward to ensure correctness.
2023-03-29 21:35:52 -07:00
618bf84e0d seekable_format: Prevent rereading frame when seeking forward
When decompressing a seekable file, if seeking forward within
a frame (by issuing multiple ZSTD_seekable_decompress calls
with a small gap between them), the frame will be unnecessarily
reread from the beginning. This patch makes it continue using
the current frame data and simply skip over the unneeded bytes.
2023-03-29 21:24:12 -07:00
dd8cb5a0f1 added documentation for the seekable format
and notably provide additional context for the
Maximum Frame Size parameter.

requested by @P-E-Meunier
at 1df9f36c6c (commitcomment-103856979).
2023-03-10 15:54:31 -08:00
1df9f36c6c Improved seekable format ingestion speed for small frame size
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.

Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allows setting compression level,
in order to produce these performance results.
2023-03-09 18:00:30 -08:00
63042f1f11 fix 32bit build errors in zstd seekable 2023-01-24 15:53:59 -08:00
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
7f12f24cf4 Rewrite Copyright Date Ranges from -present to -2022
Apparently it's better. Somehow.

```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do echo $f; sed -i 's/\-present/-2022/' $f; done

g co HEAD -- build/meson/
```
2022-12-20 12:44:56 -05:00
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
e2fc93340f Merge branch 'dev' into http-to-https 2022-12-15 10:46:13 -05:00
4dffc35f2e Convert references to https from http 2022-12-14 06:58:35 -08:00
aece0f258a free memory in test case 2022-12-13 08:15:16 -08:00
f17652931c seekable_format no header when compressing empty string to stream 2022-02-08 14:06:00 +01:00
03903f5701 fixed minor compression difference in btlazy2
subtle dependency on sumtype numeric representation
2021-12-29 18:51:03 -08:00
sen
d6be7659b0 Add seekable roundtrip fuzzer (#2617) 2021-05-06 10:08:21 -04:00
53a60e98de seekable decompression fixes (#2594)
* seekable_format: fix from-file reading (not in-memory)

It tries to check the buffer boundary, but there is no buffer for
from-file reading.

* seekable_decompression: break when ZSTD_seekable_decompress() returns zero

* seekable_decompression_mem: break when ZSTD_seekable_decompress() returns zero

* seekable_format: cap the offset+len up to the last dOffset

This will allow to read the whole file w/o gotting corruption error if
the offset is more then the data left in file, i.e.:

    $ ./seekable_compression seekable_compression.c 8192 | head
    $ zstd -cdq seekable_compression.c.zst | wc -c
    4737

Before this patch:

    $ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
    ZSTD_seekable_decompress() error : Corrupted block detected
    0

After:

    $ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
    4737
2021-05-05 10:05:41 -04:00
3c6f5d5eca Fix seekable test to provide valid descriptor 2021-03-13 00:00:08 +02:00
21697b9c9e Fix seek table descriptor check when loading 2021-03-12 23:07:15 +02:00
2fa4c8c405 added code comments for new API ZSTD_seekTable 2021-03-03 22:54:04 -08:00
6e390ced1f Merge branch 'seekTable' of github.com:facebook/zstd into seekTable 2021-03-03 22:44:38 -08:00
16ec1cf355 added test case for seekTable API
and simple roundtrip test
2021-03-03 18:56:23 -08:00
713d4953f7 fixed gcc-7 conversion warning 2021-03-03 18:00:41 -08:00
6c0bfc468c fixed wrong assert condition 2021-03-03 15:30:55 -08:00
a1d7b9d654 fixed gcc conversion warnings 2021-03-03 15:17:12 -08:00
24d59a655d Merge branch 'dev' into seekTable 2021-03-03 15:08:40 -08:00
ac95a30455 various minor style fixes 2021-03-02 16:03:18 -08:00
029f974ddc strengthen compilation flags 2021-03-02 15:43:52 -08:00
c7e42e147b fixed const guarantees
read-only objects are properly const-ified in parameters
2021-03-02 15:24:30 -08:00
a80b10f5e6 fix potential leak on exit 2021-03-02 15:03:37 -08:00
527a20c3cd Fix seekable decompress hanging 2021-03-02 14:30:03 -08:00
3cbdbb888b ZSTD_seekable_decompress() example that hangs. 2021-03-02 14:25:17 -08:00
ce6d1b9376 Merge pull request #2113 from mdittmer/expose-seek-table
[contrib] Support seek table-only API
2021-03-02 10:50:47 -08:00
adb54293ab Stop using deprecated reset?Stream functions
These are replaced by the corresponding context resets. When
converting resetCStream, CCtx_setPledgedSrcSize isn't called if the
source size is "unknown".

This helps reduce the reliance on "static only" symbols, as well as
reducing the use of deprecated functions.

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-02-23 21:56:01 +01:00
0b39531d75 moving all references to release branch
was previously `master`
2020-12-16 23:00:35 -08:00
26f89d47aa Clean up makefile for seekable tests 2020-12-03 09:25:04 -05:00
152b55879c Add unit tests to seekable 2020-12-02 15:33:12 -05:00
9db49a3989 Add a forward progress requirement bound to seekable streaming decompression 2020-12-02 12:24:16 -05:00