lib/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-07-28 00:01:53 +03:00

Author	SHA1	Message	Date
Yann Collet	e6f4b46493	playTests.sh no longer needs grep -E. This makes the test script more portable across POSIX systems, because `grep -E` is not guaranteed while `grep` is fairly common.	2024-01-15 11:16:46 -08:00
Yann Collet	7f76d37044	Merge pull request #3850 from KapJI/better-errors cli: better errors on argument parsing	2024-01-13 11:37:25 -08:00
Elliot Gorokhovsky	c6cabf9441	Make offload API compatible with static CCtx (#3854 ) * Add ZSTD_CCtxParams_registerSequenceProducer() to public API * add unit test * add docs to zstd.h * nits * Add ZSTDLIB_STATIC_API prefix * Add asserts	2023-12-28 14:48:46 -05:00
Ruslan Sayfutdinov	8052cd0131	cli: better errors on argument parsing	2023-12-18 13:59:33 +00:00
Yann Collet	e8ff7d18eb	removed FlexArray pattern from CCtxPool within ZSTDMT_. This pattern is flagged by less forgiving variants of ubsan notably used during compilation of the Linux Kernel. There are 2 other places in the code where this pattern is used. This fixes just one of them.	2023-10-07 21:30:08 -07:00
Yann Collet	c1e588fcb4	Merge pull request #3771 from DimitriPapadopoulos/codespell Fix new typos found by codespell	2023-10-07 19:29:41 -07:00
Nick Terrell	43118da8a7	Stop suppressing pointer-overflow UBSAN errors * Remove all pointer-overflow suppressions from our UBSAN builds/tests. * Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress pointer-overflow at a per-function level. This is a superior approach because it also applies to users who build zstd with UBSAN. * Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions. The end goal is to only tag these functions with `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions that rely on pointer overflow, and gradually transition to using these. * Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the pointer may be `NULL`. * Fix all the fuzzer issues that came up. I'm sure there will be a lot more, but these are the ones that came up within a few minutes of running the fuzzers, and while running GitHub CI.	2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos	fe34776c20	Fix new typos found by codespell	2023-09-23 18:56:01 +02:00
Yann Collet	f4dbfce79c	define LIB_SRCDIR and LIB_BINDIR	2023-09-12 13:46:03 -07:00
Yann Collet	0fcb28c5d2	Merge pull request #3720 from QBos07/cygwin-msys2-support Updated Makefiles for full MSYS2 and Cygwin installation and testing …	2023-08-22 16:29:34 -07:00
Yann Collet	a07d7c4e29	added ZSTD_decompressDCtx() benchmark option to fullbench useful to compare the difference between ZSTD_decompress and ZSTD_decompressDCtx().	2023-08-16 10:43:39 -07:00
Quentin Boswank	78dbba76b8	Updated Makefiles for full MSYS2 and Cygwin installation and testing support. They are Linux-like environments under Windows and have all the tools needed to support staged installation and testing. Beware: this only affects the make build system.	2023-08-13 19:44:15 +02:00
jysh1214	e99d554903	Fixed typo	2023-08-02 11:29:35 +08:00
Yann Collet	b46236278a	detect extraneous bytes in the Sequences section when nbSeq == 0. Reported by @ip7z	2023-06-13 11:43:45 -07:00
Yann Collet	3732a08f5b	fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes The sequence section starts with a number, which tells how many sequences are present in the section. If this number is 0, the section automatically ends. The number 0 can be represented using the 1-byte or the 2-byte format. That's because the 2-byte format fully overlaps the 1-byte format. However, when 0 is represented using the 2-byte format, the decoder was expecting the sequence section to continue, and was looking for FSE tables, which is incorrect. Fixed this behavior, in both the reference decoder and the educational decoder. In practice, this behavior never happens, because the encoder will always select the 1-byte format to represent 0, since this is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.	2023-06-05 16:03:00 -07:00
Yann Collet	1f83b7cfc4	fix a minor inefficiency in compress_superblock and in `decodecorpus`: the specific case `nbSeq=127` can be represented using the 1-byte format. Note that both the 1-byte and the 2-bytes formats are valid to represent this case, so there was no "error", produced data remains valid, it's just that the 1-byte format is more efficient. fix #3667 Credit to @ip7z for finding this issue.	2023-06-05 09:51:52 -07:00
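The two commits above both concern how the Sequences section header encodes its sequence count. A minimal Python sketch of that variable-length header follows, based on the layout in the Zstandard format specification. The names `encode_nb_seq`, `decode_nb_seq`, and `LONG_NB_SEQ` are hypothetical and chosen for illustration; this is not the reference decoder's code.

```python
from typing import Tuple

LONG_NB_SEQ = 0x7F00  # counts at or above this need the 3-byte form


def encode_nb_seq(nb_seq: int) -> bytes:
    """Encode a sequence count using the shortest valid form."""
    if not 0 <= nb_seq <= LONG_NB_SEQ + 0xFFFF:
        raise ValueError("sequence count out of range")
    if nb_seq < 128:
        # 1-byte form: covers 0..127, so 127 does not need 2 bytes.
        return bytes([nb_seq])
    if nb_seq < LONG_NB_SEQ:
        # 2-byte form: high bit of the first byte is set.
        return bytes([(nb_seq >> 8) + 0x80, nb_seq & 0xFF])
    # 3-byte form: marker 0xFF followed by a little-endian 16-bit value.
    return bytes([0xFF]) + (nb_seq - LONG_NB_SEQ).to_bytes(2, "little")


def decode_nb_seq(section: bytes) -> Tuple[int, int]:
    """Return (nb_seq, header_size) for the start of a Sequences section."""
    first = section[0]
    if first < 128:
        return first, 1
    if first < 255:
        return ((first - 128) << 8) + section[1], 2
    return int.from_bytes(section[1:3], "little") + LONG_NB_SEQ, 3


# The 2-byte form can also spell 0. A decoder has to treat the section as
# finished whenever the decoded count is 0, whatever the header size was.
nb_seq, header_size = decode_nb_seq(bytes([0x80, 0x00]))
section_ends_here = (nb_seq == 0)
```

Because the 2-byte form overlaps the 1-byte range, a decoder in this sketch decides whether the section ends from the decoded value, not from which form was used.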
W. Felix Handte	698af84fcf	Add CI Test for Excluding Matchfinders	2023-05-04 12:18:58 -04:00
W. Felix Handte	bae174960b	Add ZSTD_LIB_EXCLUDE_COMPRESSORS_DFAST_AND_UP Build Variable	2023-05-04 12:18:58 -04:00
W. Felix Handte	b12e8cb3e7	Merge Ultra and Ultra2 Exclusion Ultra2 does not exist for dict compression, and so uses ultra. So ultra must be present if ultra2 is.	2023-05-04 12:18:58 -04:00
W. Felix Handte	16bbd7437c	Avoid Ratio Regression Tests When Compressors are Excluded	2023-05-04 12:18:58 -04:00
Yann Collet	504c4a1f36	Merge pull request #3620 from facebook/errata_128k [doc] add decoder errata paragraph	2023-04-19 11:31:16 -07:00
Yann Collet	0d6954b4cc	added golden file for the new decompressor erratum	2023-04-19 00:24:35 -07:00
Nick Terrell	61efb2a047	Add ZSTD_d_maxBlockSize parameter Reduces memory when blocks are guaranteed to be smaller than allowed by the format. This is useful for streaming compression in conjunction with ZSTD_c_maxBlockSize. This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming. Once it is rebased on top of PR #3616 it will save 3 * (formatMaxBlockSize - paramMaxBlockSize).	2023-04-17 22:06:44 -07:00
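The saving quoted in the commit above is simple arithmetic. A small Python sketch, assuming the format's maximum block size of 128 KB; the function name `streaming_savings` and its `factor` argument are hypothetical and only mirror the multipliers 2 and 3 mentioned in the message:

```python
FORMAT_MAX_BLOCK_SIZE = 128 * 1024  # largest block size the format allows


def streaming_savings(param_max_block_size: int, factor: int = 2) -> int:
    """Bytes saved when blocks are guaranteed not to exceed param_max_block_size.

    factor=2 matches the saving stated for the commit itself; factor=3 is
    the figure the message gives for after the rebase it mentions.
    """
    if not 0 < param_max_block_size <= FORMAT_MAX_BLOCK_SIZE:
        raise ValueError("block size must be in (0, 128 KB]")
    return factor * (FORMAT_MAX_BLOCK_SIZE - param_max_block_size)


# Example: blocks limited to 4 KB.
saved_now = streaming_savings(4 * 1024)               # 253952 bytes (248 KB)
saved_after_rebase = streaming_savings(4 * 1024, 3)   # 380928 bytes (372 KB)
```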
Nick Terrell	0abf2baef9	Reduce streaming decompression memory by 128KB The split literals buffer patch increased streaming decompression memory by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This patch removes the added 128KB buffer, because it isn't necessary. The buffer was there because the literals compression code didn't know the true `blockSizeMax` of the frame, and always put split literals so they ended 128KB - 32 from the beginning of the block. Instead, we can pass down the true `blockSizeMax` and ensure that the split literals end up at `blockSizeMax - 32` from the beginning of the block. We already reserve a full `blockSizeMax` bytes in streaming mode, so we won't be overwriting the extDict window.	2023-04-17 16:31:02 -07:00
Nick Terrell	e72e13ac6c	[oss-fuzz] Fix simple_round_trip fuzzer with overlapping decompression When `ZSTD_c_maxBlockSize` is set, we weren't computing the decompression margin correctly, leading to `dstSize_tooSmall` errors. Fix that computation. This is just a bug in the fuzzer, not a bug in the library itself. Credit to OSS-Fuzz	2023-04-13 10:14:29 -07:00
daniellerozenblit	fcaf06ddb4	Check that `dest` is valid for decompression (#3555 ) * add check for valid dest buffer and fuzz on random dest ptr when malloc 0 * add uptrval to linux-kernel * remove bin files * get rid of uptrval * restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)	2023-03-31 23:00:55 -07:00
Yann Collet	c1024af3e3	Merge pull request #3540 from dvoropaev/tests_timeout Increase tests timeout	2023-03-31 12:25:38 -07:00
Elliot Gorokhovsky	57e1b45920	Merge pull request #3551 from embg/seq_prod_fuzz Provide an interface for fuzzing sequence producer plugins	2023-03-28 14:20:54 -07:00
Elliot Gorokhovsky	a810e1eeb7	Provide an interface for fuzzing sequence producer plugins	2023-03-28 12:02:57 -07:00
Nick Terrell	a3c3a38b9b	[lazy] Skip over incompressible data Every 256 bytes the lazy match finders process without finding a match, they will increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) they search every other position, and so on. However, they currently still insert every position into their hash tables. This is different from fast & dfast, which only insert the positions they search. This PR changes that, so now after we've searched 2KB without finding any matches, at which point we'll only be searching one in 9 positions, we'll stop inserting every position, and only insert the positions we search. The exact cutoff of 2KB isn't terribly important, I've just selected a cutoff that is reasonably large, to minimize the impact on "normal" data. This PR only adds skipping to greedy, lazy, and lazy2, but does not touch btlazy2.

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  | 5     | clang-14.0.6 | 0.0%    | +704%   |
| Random  | 5     | gcc-12.2.0   | 0.0%    | +670%   |
| Random  | 7     | clang-14.0.6 | 0.0%    | +679%   |
| Random  | 7     | gcc-12.2.0   | 0.0%    | +657%   |
| Random  | 12    | clang-14.0.6 | 0.0%    | +1355%  |
| Random  | 12    | gcc-12.2.0   | 0.0%    | +1331%  |
| Silesia | 5     | clang-14.0.6 | +0.002% | +0.35%  |
| Silesia | 5     | gcc-12.2.0   | +0.002% | +2.45%  |
| Silesia | 7     | clang-14.0.6 | +0.001% | -1.40%  |
| Silesia | 7     | gcc-12.2.0   | +0.007% | +0.13%  |
| Silesia | 12    | clang-14.0.6 | +0.011% | +22.70% |
| Silesia | 12    | gcc-12.2.0   | +0.011% | -6.68%  |
| Enwik8  | 5     | clang-14.0.6 | 0.0%    | -1.02%  |
| Enwik8  | 5     | gcc-12.2.0   | 0.0%    | +0.34%  |
| Enwik8  | 7     | clang-14.0.6 | 0.0%    | -1.22%  |
| Enwik8  | 7     | gcc-12.2.0   | 0.0%    | -0.72%  |
| Enwik8  | 12    | clang-14.0.6 | 0.0%    | +26.19% |
| Enwik8  | 12    | gcc-12.2.0   | 0.0%    | -5.70%  |

The speed difference for clang at level 12 is real, but is probably caused by some sort of alignment or codegen issues. clang is significantly slower than gcc before this PR, but gets up to parity with it. I also measured the ratio difference for the HC match finder, and it looks basically the same as the row-based match finder. The speedup on random data looks similar. And performance is about neutral, without the big difference at level 12 for either clang or gcc.	2023-03-20 11:18:29 -07:00
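The step-size rule described in the commit above can be sketched in a few lines of Python. This is a model of the arithmetic only, not the library's code: the names `step_size`, `inserts_every_position`, `SEARCH_STRENGTH`, and `SKIP_INSERT_CUTOFF` are hypothetical, and treating exactly 2048 bytes as the switch-over point is an assumption.

```python
SEARCH_STRENGTH = 8        # step grows by 1 every 2**8 = 256 bytes without a match
SKIP_INSERT_CUTOFF = 2048  # the "2KB" cutoff from the commit message (assumed inclusive)


def step_size(bytes_without_match: int) -> int:
    """Distance to the next searched position after a run with no match."""
    return (bytes_without_match >> SEARCH_STRENGTH) + 1


def inserts_every_position(bytes_without_match: int) -> bool:
    """True while every position is still inserted into the hash table."""
    return bytes_without_match < SKIP_INSERT_CUTOFF


# Bytes [0, 256) are searched at every position, [256, 512) at every other
# position, and at the 2 KB cutoff only one position in nine is searched.
steps = [step_size(n) for n in (0, 255, 256, 511, 512, 2048)]
```

In this model `steps` evaluates to `[1, 1, 2, 2, 3, 9]`, which matches the "one in 9 positions" figure quoted for the 2KB mark.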
Peter Pentchev	3b001a38fe	Simplify line splitting in the CLI tests	2023-03-20 11:17:43 -07:00
Peter Pentchev	29b8a3d8f2	Fix a Python bytes/int mismatch in CLI tests In Python 3.x, a single element of a bytes array is returned as an integer number. Thus, NEWLINE is an int variable, and attempting to add it to the line array will fail with a type mismatch error that may be demonstrated as follows: [roam@straylight ~]$ python3 -c 'b"hello" + b"\n"[0]' Traceback (most recent call last): File "<string>", line 1, in <module> TypeError: can't concat int to bytes [roam@straylight ~]$	2023-03-20 11:17:43 -07:00
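The mismatch described in the commit above is easy to reproduce, and there are a few ways to avoid it. A short Python 3 sketch; the names below are hypothetical and this does not claim to be the fix the commit actually applied:

```python
NEWLINE_INT = b"\n"[0]      # indexing a bytes object yields an int (10)
NEWLINE_BYTES = b"\n"[0:1]  # slicing yields a one-element bytes object


def append_newline(line: bytes) -> bytes:
    """Concatenate using a bytes value, which is always type-safe."""
    return line + NEWLINE_BYTES


# Reproduce the failure from the commit message.
try:
    b"hello" + NEWLINE_INT
    concat_failed = False
except TypeError:
    concat_failed = True

# An int can still be turned back into bytes explicitly when needed.
rebuilt = bytes([NEWLINE_INT])
```

Comparing against the int remains useful when iterating over a bytes object, since iteration also yields ints; it is only concatenation that needs a bytes operand.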
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs * Mark all bufferless and block level functions as deprecated * Update documentation to suggest not using these functions * Add `_deprecated()` wrappers for functions that we use internally and call those instead	2023-03-16 10:04:15 -07:00
daniellerozenblit	53bad103ce	patch-from speed optimization (#3545 ) * patch-from speed optimization: only load portion of dictionary into normal matchfinders * test regression for x8 multiplier * fix off-by-one error for bit shift bound * restrict patchfrom speed optimization to strategy < ZSTD_btultra * update results.csv * update regression test	2023-03-14 20:36:56 -04:00
Yonatan Komornik	a91e91d614	[Bugfix] row hash tries to match position 0 (#3548 ) #3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag. Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.	2023-03-13 10:00:03 -07:00
Yonatan Komornik	33e39094e7	Reduce RowHash's tag space size by x2 (#3543 ) Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index). The result is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).	2023-03-10 14:15:04 -08:00
W. Felix Handte	957a0ae52d	Add CLI Test	2023-03-09 12:48:11 -05:00
W. Felix Handte	50e8f55e7d	Fix Python 3.6 Incompatibility in CLI Tests	2023-03-09 12:46:37 -05:00
Dmitriy Voropaev	b7080f4c67	Increase tests timeout Current timeout is too small for some slower machines, e.g. most modern riscv64 boards, where tests fail with the following diagnostics: Traceback (most recent call last): File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 734, in <module> success = run_tests(tests, opts) File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 601, in run_tests tests[test_case.name] = test_case.run() File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 285, in run return self.analyze() File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 275, in analyze self._join_test() File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 330, in _join_test (stdout, stderr) = self._test_process.communicate(timeout=self._opts.timeout) File "/usr/lib64/python3.10/subprocess.py", line 1154, in communicate stdout, stderr = self._communicate(input, endtime, timeout) File "/usr/lib64/python3.10/subprocess.py", line 2006, in _communicate self._check_timeout(endtime, orig_timeout, stdout, stderr) File "/usr/lib64/python3.10/subprocess.py", line 1198, in _check_timeout raise TimeoutExpired( subprocess.TimeoutExpired: Command '['/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/cli-tests/compression/window-resize.sh']' timed out after 60 seconds	2023-03-09 16:31:05 +04:00
Nick Terrell	07a2a33135	Add ZSTD_set{C,F,}Params() helper functions * Add ZSTD_setFParams() and ZSTD_setParams() * Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters * Add unit tests * Update documentation to suggest using them to replace deprecated functions Fixes #3396.	2023-03-08 09:57:35 -08:00
Yann Collet	db7d7b6974	Merge pull request #3516 from dloidolt/fullbench_2_files fullbench with two files	2023-03-06 11:56:30 -08:00
Yann Collet	bd86e24637	Merge pull request #3513 from DimitriPapadopoulos/codespell Fix typos found by codespell	2023-02-27 11:44:31 -08:00
Nick Terrell	395a2c5462	[bug-fix] Fix rare corruption bug affecting the block splitter The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. It also is only triggered if the block splitter decided to split the block. All that to say: This is a rare bug, and requires quite a few conditions to trigger. However, the good news is that if you have a way to validate that the decompressed data is correct, e.g. you've enabled zstd's checksum or have a checksum elsewhere, the original data is very likely recoverable. So if you were affected by this bug please reach out. The fix is to remind the block splitter that the literal length is actually 64K. The test case is a bit tricky to set up, but I've managed to reproduce the issue. Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!	2023-02-23 10:54:31 -08:00
Dominik Loidolt	4b9e3d11a6	When benchmarking two files with fullbench, the second file will not be benchmarked because the benchNb has not been reset to zero.	2023-02-20 16:36:26 +01:00
Dimitri Papadopoulos	547794ef40	Fix typos found by codespell	2023-02-18 10:31:48 +01:00
Felix Handte	1c42844668	Merge pull request #3479 from felixhandte/faster-file-ops Use `f`-variants of `chmod()` and `chown()`	2023-02-16 13:07:34 -05:00
Danielle Rozenblit	7da1c6ddbf	fix cli-tests issues	2023-02-14 11:33:26 -08:00
Yonatan Komornik	c78f434aa4	Fix zstd-dll build missing dependencies (#3496 ) * Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492): - Adds pool.o and threading.o dependency to the zstd-dll target - Moves custom allocation functions into header to avoid needing to add dependency on common.o - Adds test target for zstd-dll - Adds github workflow that builds zstd-dll	2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky	ff42ed1582	Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484 ) * change "external matchfinder" to "external sequence producer" * migrate contrib/ to new naming convention * fix contrib build * fix error message * update debug strings * fix def of invalid sequences in zstd.h * nit * update CHANGELOG * fix .gitignore	2023-02-09 17:01:17 -05:00
Nick Terrell	83f8a05f87	Fix empty-block.zst golden decompression file This frame is invalid because the `Window_Size = 0`, and the `Block_Maximum_Size = min(128 KB, Window_Size) = 0`. But the empty compressed block has a `Block_Content` size of 2, which is invalid. The fix is to switch to using a `Window_Descriptor` instead of the `Single_Segment_Flag`. This sets the `Window_Size = 1024`. Hexdump before this PR: `28b5 2ffd 2000 1500 0000 00` Hexdump after this PR: `28b5 2ffd 0000 1500 0000 00` For issue #3482.	2023-02-08 14:11:22 -08:00
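The two hexdumps in the commit above can be decoded by hand. The Python sketch below parses just enough of a frame to check the first block against `Block_Maximum_Size`, following the frame header layout in the Zstandard format specification. The function name `inspect_first_block` and the returned dictionary keys are hypothetical; checksums, dictionaries, and later blocks are ignored.

```python
MAGIC = 0xFD2FB528
FORMAT_BLOCK_MAX = 128 * 1024


def inspect_first_block(frame: bytes) -> dict:
    """Decode the frame header and the first block header of a zstd frame."""
    if int.from_bytes(frame[:4], "little") != MAGIC:
        raise ValueError("not a Zstandard frame")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6
    single_segment = bool(descriptor & 0x20)
    dict_id_size = (0, 1, 2, 4)[descriptor & 0x03]
    pos = 5

    window_size = None
    if not single_segment:
        # Window_Descriptor: 5-bit exponent, 3-bit mantissa.
        exponent, mantissa = frame[pos] >> 3, frame[pos] & 0x07
        base = 1 << (10 + exponent)
        window_size = base + (base // 8) * mantissa
        pos += 1
    pos += dict_id_size

    # Frame_Content_Size field: 1 byte is implied when Single_Segment is set.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size:
        content_size = int.from_bytes(frame[pos:pos + fcs_size], "little")
        if fcs_size == 2:
            content_size += 256
        pos += fcs_size
        if single_segment:
            window_size = content_size  # no separate window in this mode

    # Block_Header: bit 0 = last block, bits 1-2 = type, bits 3+ = size.
    block_header = int.from_bytes(frame[pos:pos + 3], "little")
    block_size = block_header >> 3
    block_max = min(FORMAT_BLOCK_MAX, window_size)
    return {
        "single_segment": single_segment,
        "window_size": window_size,
        "block_type": (block_header >> 1) & 0x03,
        "block_size": block_size,
        "valid": block_size <= block_max,
    }


before = inspect_first_block(bytes.fromhex("28b5 2ffd 2000 1500 0000 00"))
after = inspect_first_block(bytes.fromhex("28b5 2ffd 0000 1500 0000 00"))
```

Under these assumptions `before` reports a window size of 0 with a 2-byte block, which fails the check, while `after` reports a window size of 1024 and passes.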


2058 Commits