1
0
mirror of https://github.com/facebook/zstd.git synced 2025-04-21 23:05:51 +03:00

394 Commits

Author SHA1 Message Date
Yann Collet
04a2a0219c update type names
naming convention: Type names should start with a Capital letter (after the prefix)
2024-12-29 14:25:33 -08:00
Yann Collet
a2ff6ea784 improve ZSTD_getFrameHeader on skippable frames
now reports:
- the header size
- the magic variant (within @dictID field)
2024-12-29 12:26:04 -08:00
Yann Collet
f8a2b352d6 clarify doc on Frame-level methods when invoked on a skippable frame
following discussion at #4226
2024-12-29 02:41:09 -08:00
Yann Collet
b7a9e69d8d added parameter litCapacity
to ZSTD_compressSequencesAndLiterals()
to enforce the litCapacity >= litSize+8 condition.
2024-12-20 10:37:01 -08:00
Yann Collet
ab0f1798e8 ensure that srcSize is controlled 2024-12-20 10:37:00 -08:00
Yann Collet
a80f55f47d added a test for ZSTD_compressSequencesAndLiterals
checks that srcSize is present in the frame header
and bounds the window size.
2024-12-20 10:37:00 -08:00
Yann Collet
0a54f6f288 ZSTD_compressSequencesAndLiterals requires srcSize as parameter
this makes it possible to adjust windowSize to its tightest.
2024-12-20 10:37:00 -08:00
Yann Collet
12c47d3262 improved speed of the Sequences converter 2024-12-20 10:37:00 -08:00
Yann Collet
f0d0d95234 added tests
check that ZSTD_compressAndLiterals() also controls that the `srcSize` field is exact.
2024-12-20 10:37:00 -08:00
Yann Collet
31b5ef2539 ZSTD_compressSequencesAndLiterals() now supports multi-blocks frames. 2024-12-20 10:36:59 -08:00
Yann Collet
0b013b2688 added unit tests to ZSTD_compressSequencesAndLiterals()
seems to work as expected,
correctly control that `litSize` and `srcSize` are exactly correct.
2024-12-20 10:36:58 -08:00
Yann Collet
c97522f7fb codemod: ZSTD_sequenceFormat_e -> ZSTD_SequenceFormat_e
since it's a type name.

Note: in contrast with previous names, this one is on the Public API side.
So there is a #define, so that existing programs using ZSTD_sequenceFormat_e still work.
2024-12-20 10:36:56 -08:00
Yann Collet
bbaba45589 change experimental parameter name
from ZSTD_c_useBlockSplitter to ZSTD_c_splitAfterSequences.
2024-10-31 13:43:40 -07:00
Yann Collet
4f93206d62 changed variable name to ZSTD_c_blockSplitterLevel
suggested by @terrelln
2024-10-29 11:12:09 -07:00
Yann Collet
f593ccda04 removed trace left over 2024-10-28 16:57:01 -07:00
Yann Collet
37706a677c added a test
test both that the new parameter works as intended,
and that the over-split protection works as intended
2024-10-28 16:31:15 -07:00
Yann Collet
f83ed087f6 fixed RLE detection test 2024-10-23 11:50:56 -07:00
Yann Collet
730d2dce41 fix test 2024-10-15 18:44:40 -07:00
Nick Terrell
731f4b70fc Fix & fuzz ZSTD_generateSequences
This function was seriously flawed:
* It didn't do output bounds checks
* It produced invalid sequences when an uncompressed or RLE block was emitted
* It produced invalid sequences when the block splitter was enabled
* It produced invalid sequences when ZSTD_c_targetCBlockSize was enabled

I've attempted to fix these issues, but this function is just a bad idea,
so I've marked it as deprecated and unsafe. We should replace it with
`ZSTD_extractSequences()` which operates on a compressed frame.
2024-03-21 07:18:05 -07:00
Yann Collet
2abe8d63e0 fix LLU->ULL
LLU is a correct prefix according to C99 & C11 standards (but not C90).
However, older versions of Visual Studio do not work with it.
Replace by ULL, which doesn't have this issue.

Fixes https://github.com/facebook/zstd/issues/3647
2024-03-04 00:16:01 -08:00
Yann Collet
6b11fc436c fix issue with incompressible sections 2024-02-23 14:53:56 -08:00
Yann Collet
e8ff7d18eb removed FlexArray pattern from CCtxPool
within ZSTDMT_.
This pattern is flagged by less forgiving variants of ubsan
notably used during compilation of the Linux Kernel.

There are 2 other places in the code where this pattern is used.
This fixes just one of them.
2023-10-07 21:30:08 -07:00
Yann Collet
c1e588fcb4
Merge pull request #3771 from DimitriPapadopoulos/codespell
Fix new typos found by codespell
2023-10-07 19:29:41 -07:00
Nick Terrell
43118da8a7 Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
  pointer-overflow at a per-function level. This is a superior approach
  because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
  The end goal is to only tag these functions with
  `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions
  that rely on pointer overflow, and gradually transition to using
  these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
  more, but these are the ones that came up within a few minutes of
  running the fuzzers, and while running GitHub CI.
2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
W. Felix Handte
b12e8cb3e7 Merge Ultra and Ultra2 Exclusion
Ultra2 does not exist for dict compression, and so uses ultra. So ultra must
be present if ultra2 is.
2023-05-04 12:18:58 -04:00
W. Felix Handte
16bbd7437c Avoid Ratio Regression Tests When Compressors are Excluded 2023-05-04 12:18:58 -04:00
Nick Terrell
61efb2a047 Add ZSTD_d_maxBlockSize parameter
Reduces memory when blocks are guaranteed to be smaller than allowed by
the format. This is useful for streaming compression in conjunction with
ZSTD_c_maxBlockSize.

This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming.
Once it is rebased on top of PR #3616 it will save
3 * (formatMaxBlockSize - paramMaxBlockSize).
2023-04-17 22:06:44 -07:00
Yonatan Komornik
33e39094e7
Reduce RowHash's tag space size by x2 (#3543)
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
2023-03-10 14:15:04 -08:00
Nick Terrell
07a2a33135 Add ZSTD_set{C,F,}Params() helper functions
* Add ZSTD_setFParams() and ZSTD_setParams()
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters
* Add unit tests
* Update documentation to suggest using them to replace deprecated functions

Fixes #3396.
2023-03-08 09:57:35 -08:00
Nick Terrell
395a2c5462 [bug-fix] Fix rare corruption bug affecting the block splitter
The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It also is only triggered if the
block splitter decided to split the block.

All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.

The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.

Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
2023-02-23 10:54:31 -08:00
Nick Terrell
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
Nick Terrell
b4467c1061 Fix bufferless API with attached dictionary
Fixes #3102.
2023-01-20 16:15:16 -08:00
Nick Terrell
666944fbe6 Cap hashLog & chainLog to ensure that we only use 32 bits of hash
* Cap shortCache chainLog to 24
* Cap row match finder hashLog so that rowLog <= 24
* Add unit tests to expose all cases. The row match finder unit tests
  are only run in 64-bit mode, because they allocate ~1GB.

Fixes #3336
2023-01-20 14:05:26 -08:00
daniellerozenblit
dc1c6cc5df
Merge pull request #3418 from daniellerozenblit/fuzz-max-block-size
Fuzz on maxBlockSize
2023-01-19 08:18:04 -05:00
Danielle Rozenblit
8353a4b095 fix maxBlockSize resolution + add test cases 2023-01-17 12:24:18 -08:00
Yann Collet
2086e7396e missing #include for Windows 2023-01-13 11:38:27 -08:00
Yann Collet
bcfb7ad03c refactor timefn
The timer storage type is no longer dependent on OS.
This will make it possible to re-enable posix precise timers
since the timer storage type will no longer be sensible to #include order.
See #3168 for details of pbs of previous interface.

Suggestion by @terrelln
2023-01-12 19:24:31 -08:00
Nick Terrell
5b266196a4 Add support for in-place decompression
* Add a function and macro ZSTD_decompressionMargin() that computes the
  decompression margin for in-place decompression. The function computes
  a tight margin that works in all cases, and the macro computes an upper
  bound that will only work if flush isn't used.
* When doing in-place decompression, make sure that our output buffer
  doesn't overlap with the input buffer. This ensures that we don't
  decide to use the portion of the output buffer that overlaps the input
  buffer for temporary memory, like for literals.
* Add a simple unit test.
* Add in-place decompression to the simple_round_trip and
  stream_round_trip fuzzers. This should help verify that our margin stays
  correct.
2023-01-12 16:28:08 -08:00
Danielle Rozenblit
908e812733 initial commit 2023-01-04 13:01:54 -08:00
Yann Collet
89342d1e07 New xp library symbol : ZSTD_CCtx_setCParams()
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
2022-12-27 23:49:22 -08:00
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
Yonatan Komornik
500f02eb66 Fixes two bugs in the Windows thread / pthread translation layer
1. If threads are resized the threads' `ZSTD_pthread_t` might move
while the worker still holds a pointer into it (see more details in #3120).
2. The join operation was waiting for a thread and then return its `thread.arg`
as a return value, but since the `ZSTD_pthread_t thread` was passed by value it
would have a stale `arg` that wouldn't match the thread's actual return value.

This fix changes the `ZSTD_pthread_join` API and removes support for returning
a value. This means that we are diverging from the `pthread_join` API and this
is no longer just an alias.
In the future, if needed, we could return a Windows thread's return value using
`GetExitCodeThread`, but as this path wouldn't be excised in any case, it's
preferable to not add it right now.
2022-12-17 13:38:02 -08:00
Yann Collet
2f4238e47a make ZSTD_DECOMPRESSBOUND() compatible with input size 0
for environments with stringent compilation warnings.
2022-12-16 16:05:39 -08:00
Yann Collet
ea24b88667 decompressBound() tests
fixed an overflow in an intermediate result on 32-bit platform.
Checked that the new test catch this bug in 32-bit mode.
2022-12-16 15:43:26 -08:00
Yann Collet
97f63ce2b5 added unit tests for compressBound()
and rephrased the code documentation, as suggested by @terrelln
2022-12-16 12:35:14 -08:00
Elliot Gorokhovsky
bb3c01c853 Migrate other test usages of boolean LDM flag to paramSwitch enum 2022-11-21 16:20:38 -05:00
Elliot Gorokhovsky
747e06f4f6 Add tests 2022-06-22 17:05:23 -04:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00