lib/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-09-11 11:51:02 +03:00

Author	SHA1	Message	Date
W. Felix Handte	81b86a2024	NULL Out Block Compressor Table Entries When Excluded Don't check about excluding `ZSTD_fast`. It's always included so that we know we can resolve downwards and hit a strategy that's present.	2023-05-04 12:18:58 -04:00
W. Felix Handte	cbf3e26316	Allow `ZSTD_selectBlockCompressor()` to Return NULL Return an error rather than segfaulting.	2023-05-04 12:18:58 -04:00
Daniel Kutenin	4c25ea329b	Disable unused variable warning in msan configurations	2023-04-20 11:14:08 +01:00
Yann Collet	2e29728797	fix #3583 As reported by @georgmu, the previous fix is undone by the later initialization. Switch order, so that initialization is adjusted by special case.	2023-04-03 09:45:11 -07:00
daniellerozenblit	3e0550ee52	fix window update (#3556)	2023-03-21 13:28:26 -04:00
Nick Terrell	a3c3a38b9b	[lazy] Skip over incompressible data Every 256 bytes the lazy match finders process without finding a match, they will increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) they search every other position, and so on. However, they currently still insert every position into their hash tables. This is different from fast & dfast, which only insert the positions they search. This PR changes that, so now after we've searched 2KB without finding any matches, at which point we'll only be searching one in 9 positions, we'll stop inserting every position, and only insert the positions we search. The exact cutoff of 2KB isn't terribly important, I've just selected a cutoff that is reasonably large, to minimize the impact on "normal" data. This PR only adds skipping to greedy, lazy, and lazy2, but does not touch btlazy2. \| Dataset \| Level \| Compiler \| CSize ∆ \| Speed ∆ \| \|---------\|-------\|--------------\|---------\|---------\| \| Random \| 5 \| clang-14.0.6 \| 0.0% \| +704% \| \| Random \| 5 \| gcc-12.2.0 \| 0.0% \| +670% \| \| Random \| 7 \| clang-14.0.6 \| 0.0% \| +679% \| \| Random \| 7 \| gcc-12.2.0 \| 0.0% \| +657% \| \| Random \| 12 \| clang-14.0.6 \| 0.0% \| +1355% \| \| Random \| 12 \| gcc-12.2.0 \| 0.0% \| +1331% \| \| Silesia \| 5 \| clang-14.0.6 \| +0.002% \| +0.35% \| \| Silesia \| 5 \| gcc-12.2.0 \| +0.002% \| +2.45% \| \| Silesia \| 7 \| clang-14.0.6 \| +0.001% \| -1.40% \| \| Silesia \| 7 \| gcc-12.2.0 \| +0.007% \| +0.13% \| \| Silesia \| 12 \| clang-14.0.6 \| +0.011% \| +22.70% \| \| Silesia \| 12 \| gcc-12.2.0 \| +0.011% \| -6.68% \| \| Enwik8 \| 5 \| clang-14.0.6 \| 0.0% \| -1.02% \| \| Enwik8 \| 5 \| gcc-12.2.0 \| 0.0% \| +0.34% \| \| Enwik8 \| 7 \| clang-14.0.6 \| 0.0% \| -1.22% \| \| Enwik8 \| 7 \| gcc-12.2.0 \| 0.0% \| -0.72% \| \| Enwik8 \| 12 \| clang-14.0.6 \| 0.0% \| +26.19% \| \| Enwik8 \| 12 \| gcc-12.2.0 \| 0.0% \| -5.70% \| The speed difference for clang at level 12 is real, but is probably caused by some sort of alignment or codegen issues. clang is significantly slower than gcc before this PR, but gets up to parity with it. I also measured the ratio difference for the HC match finder, and it looks basically the same as the row-based match finder. The speedup on random data looks similar. And performance is about neutral, without the big difference at level 12 for either clang or gcc.	2023-03-20 11:18:29 -07:00
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs * Mark all bufferless and block level functions as deprecated * Update documentation to suggest not using these functions * Add `_deprecated()` wrappers for functions that we use internally and call those instead	2023-03-16 10:04:15 -07:00
daniellerozenblit	53bad103ce	patch-from speed optimization (#3545) * patch-from speed optimization: only load portion of dictionary into normal matchfinders * test regression for x8 multiplier * fix off-by-one error for bit shift bound * restrict patchfrom speed optimization to strategy < ZSTD_btultra * update results.csv * update regression test	2023-03-14 20:36:56 -04:00
Yonatan Komornik	91f4c23e63	Add salt into row hash (#3528 part 2) (#3533) Part 2 of #3528 Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).	2023-03-13 15:34:13 -07:00
Yonatan Komornik	9420bce8a4	Add init once memory (#3528) (#3529) - Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime. - Changes tag space in row hash to be based on init once memory.	2023-03-13 13:20:49 -07:00
Yonatan Komornik	a91e91d614	[Bugfix] row hash tries to match position 0 (#3548) #3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag. Although position 0 stopped being a valid match, it still persisted in mask calculation, resulting in the match loops possibly terminating earlier than they should. The fix skips position 0 to solve this problem.	2023-03-13 10:00:03 -07:00
Yonatan Komornik	33e39094e7	Reduce RowHash's tag space size by x2 (#3543) Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index). The result is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).	2023-03-10 14:15:04 -08:00
Nick Terrell	07a2a33135	Add ZSTD_set{C,F,}Params() helper functions * Add ZSTD_setFParams() and ZSTD_setParams() * Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters * Add unit tests * Update documentation to suggest using them to replace deprecated functions Fixes #3396.	2023-03-08 09:57:35 -08:00
Yonatan Komornik	988ce61a0c	Adds initialization of clevel to static cdict (#3525) (#3527) - Initializes clevel in `ZSTD_CCtxParams_init` - Adds CI workflow for msan fuzzers runs without optimization (`-O0`) - Fixes Makefile to correctly pass on user defined `MOREFLAGS` and `FUZZER_FLAGS` in cases they have been overwritten	2023-03-06 18:05:12 -08:00
Nick Terrell	395a2c5462	[bug-fix] Fix rare corruption bug affecting the block splitter The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. It also is only triggered if the block splitter decided to split the block. All that to say: This is a rare bug, and requires quite a few conditions to trigger. However, the good news is that if you have a way to validate that the decompressed data is correct, e.g. you've enabled zstd's checksum or have a checksum elsewhere, the original data is very likely recoverable. So if you were affected by this bug please reach out. The fix is to remind the block splitter that the literal length is actually 64K. The test case is a bit tricky to set up, but I've managed to reproduce the issue. Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!	2023-02-23 10:54:31 -08:00
Yonatan Komornik	c78f434aa4	Fix zstd-dll build missing dependencies (#3496) * Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492): - Adds pool.o and threading.o dependency to the zstd-dll target - Moves custom allocation functions into header to avoid needing to add dependency on common.o - Adds test target for zstd-dll - Adds github workflow that builds zstd-dll	2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky	ff42ed1582	Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484) * change "external matchfinder" to "external sequence producer" * migrate contrib/ to new naming convention * fix contrib build * fix error message * update debug strings * fix def of invalid sequences in zstd.h * nit * update CHANGELOG * fix .gitignore	2023-02-09 17:01:17 -05:00
Elliot Gorokhovsky	3fe5f1fbb9	assert externalRepSearch != ZSTD_ps_auto	2023-02-01 18:24:46 -08:00
Elliot Gorokhovsky	7f8189ca57	add ZSTD_c_fastExternalSequenceParsing cctxParam	2023-02-01 09:09:53 -08:00
Elliot Gorokhovsky	64052ef57d	Guard against invalid sequences from external matchfinders (#3465)	2023-01-31 13:55:48 -05:00
daniellerozenblit	00176638e3	Merge pull request #3460 from daniellerozenblit/fix-long-offsets-resolution-pointer fix long offset resolution	2023-01-30 14:02:51 -05:00
daniellerozenblit	2bde9fbf85	Update lib/compress/zstd_compress.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com>	2023-01-27 16:58:53 -05:00
Nick Terrell	423a74986f	[fse] Delete unused functions Delete all unused FSE functions, now that we are no longer syncing to/from upstream. This avoids confusion about Zstd's stack usage like in Issue #3453. It also removes dead code, which is always a plus.	2023-01-27 13:15:07 -08:00
Danielle Rozenblit	9e4c66b9e9	record long offsets in ZSTD_symbolEncodingTypeStats_t + add test case	2023-01-27 12:04:29 -08:00
Danielle Rozenblit	814f4bfb99	fix long offset resolution	2023-01-27 08:21:47 -08:00
daniellerozenblit	f3255bfeff	Merge pull request #3447 from daniellerozenblit/fuzz-sequence-compression Fuzz large offsets through sequence compression api	2023-01-25 09:27:34 -05:00
Yonatan Komornik	1d636b4ba0	Bug fix redzones by unpoisoning only the intended buffer and not the followup redzone.	2023-01-24 12:54:43 -08:00
Danielle Rozenblit	7d600c628a	fix bound check for ZSTD_copySequencesToSeqStoreNoBlockDelim()	2023-01-24 06:40:40 -08:00
daniellerozenblit	9116000be6	Merge pull request #3439 from daniellerozenblit/sequence-validation-bug-fix Fix sequence validation and seqStore bounds check	2023-01-23 13:50:37 -05:00
Danielle Rozenblit	815d1d4eda	update external sequence error to fit error naming scheme	2023-01-23 09:58:34 -08:00
Danielle Rozenblit	1b65727e74	fix nits and add new error code for invalid external sequences	2023-01-23 07:59:02 -08:00
Nick Terrell	b4467c1061	Fix bufferless API with attached dictionary Fixes #3102.	2023-01-20 16:15:16 -08:00
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Yann Collet	6742f20a7f	Merge pull request #3435 from facebook/c89build added c89 build test to CI	2023-01-20 14:07:12 -08:00
Nick Terrell	666944fbe6	Cap hashLog & chainLog to ensure that we only use 32 bits of hash * Cap shortCache chainLog to 24 * Cap row match finder hashLog so that rowLog <= 24 * Add unit tests to expose all cases. The row match finder unit tests are only run in 64-bit mode, because they allocate ~1GB. Fixes #3336	2023-01-20 14:05:26 -08:00
Danielle Rozenblit	aa385ece13	fix sequence validation and bounds check in ZSTD_copySequencesToSeqStore()	2023-01-20 10:32:35 -08:00
Yann Collet	ea684c335a	added c89 build test to CI	2023-01-19 14:59:30 -08:00
Elliot Gorokhovsky	bce0382c82	Bugfixes for the External Matchfinder API (#3433) * external matchfinder bugfixes + tests * small doc fix	2023-01-19 10:41:24 -05:00
daniellerozenblit	dc1c6cc5df	Merge pull request #3418 from daniellerozenblit/fuzz-max-block-size Fuzz on maxBlockSize	2023-01-19 08:18:04 -05:00
Danielle Rozenblit	8353a4b095	fix maxBlockSize resolution + add test cases	2023-01-17 12:24:18 -08:00
Yann Collet	ac45e078a5	add explanation about new test as requested by @terrelln	2023-01-12 15:49:01 -08:00
Yann Collet	796699c0bc	fix root cause of #3416 A minor change in `5434de0` changed a `<=` into a `<`, and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress (previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely a heuristic, but it exposed a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believe that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.	2023-01-12 15:41:08 -08:00
Danielle Rozenblit	06b096db47	additional tests and documentation updates + allow maxBlockSize to be set to 0 (goes to default)	2023-01-12 13:41:50 -08:00
Danielle Rozenblit	53eb5a758c	add simple test for maxBlockSize expected functionality	2023-01-12 08:55:39 -08:00
Danielle Rozenblit	1fffcfe01d	update minimum threshold for max block size	2023-01-11 11:09:57 -08:00
Danielle Rozenblit	fe08137d9a	resolve max block value in cctx and use when calculating the max block size	2023-01-09 07:53:53 -08:00
Yann Collet	71dbe8f9d4	minor: fix conversion warnings	2023-01-04 20:00:04 -08:00
daniellerozenblit	d913417f72	Merge branch 'dev' into fuzz-max-block-size	2023-01-04 16:34:07 -05:00
Danielle Rozenblit	908e812733	initial commit	2023-01-04 13:01:54 -08:00

2480 Commits