mirror of https://github.com/facebook/zstd.git synced 2025-08-10 04:43:07 +03:00
Commit Graph

10020 Commits

Author SHA1 Message Date
Yann Collet
ac0746ac19 Merge pull request #3470 from facebook/bench_zstd_only
ensure that benchmark mode can only be invoked with zstd format
2023-01-31 16:22:20 -08:00
Elliot Gorokhovsky
64052ef57d Guard against invalid sequences from external matchfinders (#3465) 2023-01-31 13:55:48 -05:00
Yann Collet
af09777b24 ensure that benchmark mode can only be invoked with zstd format
fix #3463
2023-01-31 09:04:29 -08:00
Yann Collet
4794bbfe00 Merge pull request #3469 from facebook/updateVersion
bump version number to v1.5.4
2023-01-30 19:58:22 -08:00
Yann Collet
71c911da36 Merge pull request #3464 from facebook/dependabot/github_actions/github/codeql-action-2.2.1
Bump github/codeql-action from 2.1.39 to 2.2.1
2023-01-30 19:07:33 -08:00
Yann Collet
39ceef27f9 bump version number to v1.5.4
start preparation for release
2023-01-30 19:06:39 -08:00
Nick Terrell
2f74507bbd Simplify 32-bit long offsets decoding logic
The previous code had an issue when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.

Fix this issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is unlikely, but this should be faster anyways. And if not, it is
only 32-bit decoding, so performance isn't as critical.

Credit to OSS-Fuzz
2023-01-30 12:21:42 -08:00
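The precondition described in 2f74507bbd can be illustrated with a toy bit reader. This is a minimal sketch, not zstd's code: `BitReader`, `readBitsFast`, `readSplit` and `EXTRA_BITS` are hypothetical names chosen for the example. It shows why a fast read of 0 bits is invalid, and how splitting a read at a fixed number of extra bits keeps both pieces non-empty.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bit reader for illustration only; names and layout are hypothetical. */
typedef struct {
    uint64_t container;    /* bits are consumed from the most significant end */
    unsigned bitsConsumed; /* number of bits already read, 0..64 */
} BitReader;

/* Fast read, valid only for 1 <= nbBits <= 64 - bitsConsumed.
 * With nbBits == 0 the right shift below would be by 64, which is
 * undefined behavior for a 64-bit type in C. */
static uint64_t readBitsFast(BitReader* r, unsigned nbBits)
{
    uint64_t value;
    assert(r->bitsConsumed < 64);
    assert(nbBits >= 1 && nbBits <= 64 - r->bitsConsumed);
    value = (r->container << r->bitsConsumed) >> (64 - nbBits);
    r->bitsConsumed += nbBits;
    return value;
}

#define EXTRA_BITS 5u /* hypothetical fixed size of the second read */

/* Reads nbBits in two pieces: (nbBits - EXTRA_BITS) high bits, then
 * EXTRA_BITS low bits. Because the split point is fixed, neither read
 * can be empty as long as nbBits > EXTRA_BITS. */
static uint64_t readSplit(BitReader* r, unsigned nbBits)
{
    uint64_t high, low;
    assert(nbBits > EXTRA_BITS);
    high = readBitsFast(r, nbBits - EXTRA_BITS);
    /* a real decoder would refill the container between the two reads */
    low = readBitsFast(r, EXTRA_BITS);
    return (high << EXTRA_BITS) | low;
}
```

When the split point is instead computed from `bitsConsumed`, one piece can shrink to 0 bits, which is the situation the commit message describes for corrupted streams.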
daniellerozenblit
00176638e3 Merge pull request #3460 from daniellerozenblit/fix-long-offsets-resolution-pointer
fix long offset resolution
2023-01-30 14:02:51 -05:00
Danielle Rozenblit
0843d9bedf Merge branch 'fix-long-offsets-resolution-pointer' of github.com:daniellerozenblit/zstd into fix-long-offsets-resolution-pointer 2023-01-30 06:26:21 -08:00
Danielle Rozenblit
66fae56c86 remove big test around large offset with small window size 2023-01-30 06:26:03 -08:00
dependabot[bot]
dd7fdc98c8 Bump github/codeql-action from 2.1.39 to 2.2.1
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.39 to 2.2.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](a34ca99b46...3ebbd71c74)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 05:11:42 +00:00
daniellerozenblit
295724b515 Update .github/workflows/dev-long-tests.yml
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2023-01-28 12:14:48 -05:00
Nick Terrell
b3b43f2893 Fix invalid assert in 32-bit decoding
The assert is only correct for valid sequences, so disable it for
everything except round trip fuzzers.
2023-01-27 14:40:38 -08:00
Danielle Rozenblit
5ec77add20 Merge branch 'fix-long-offsets-resolution-pointer' of github.com:daniellerozenblit/zstd into fix-long-offsets-resolution-pointer 2023-01-27 14:18:44 -08:00
Danielle Rozenblit
da589a134a update CI 2023-01-27 14:18:29 -08:00
daniellerozenblit
2bde9fbf85 Update lib/compress/zstd_compress.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2023-01-27 16:58:53 -05:00
Nick Terrell
423a74986f [fse] Delete unused functions
Delete all unused FSE functions, now that we are no longer syncing
to/from upstream.

This avoids confusion about Zstd's stack usage like in Issue #3453.
It also removes dead code, which is always a plus.
2023-01-27 13:15:07 -08:00
Danielle Rozenblit
9e4c66b9e9 record long offsets in ZSTD_symbolEncodingTypeStats_t + add test case 2023-01-27 12:04:29 -08:00
Danielle Rozenblit
d210628b0b initialize long offsets in decodecorpus 2023-01-27 09:52:00 -08:00
Danielle Rozenblit
814f4bfb99 fix long offset resolution 2023-01-27 08:21:47 -08:00
Yann Collet
88b7088d2e Merge pull request #3458 from facebook/stderr_finalStatus
Update logic when `stderr` is not the console
2023-01-26 17:22:38 -08:00
Nick Terrell
bda947e17a [huf] Fix bug in fast C decoders
The input bounds checks were buggy because they were only breaking from
the inner loop, not the outer loop. The fuzzers found this immediately.
The fix is to use `goto _out` instead of `break`.

This condition can happen on corrupted inputs.

I've benchmarked before and after on x86-64 and there were small changes
in performance, some positive, and some negative, and they end up about
balancing out.

Credit to OSS-Fuzz
2023-01-26 14:39:13 -08:00
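The `break` versus `goto _out` distinction in bda947e17a is a general property of nested loops in C. The sketch below is not the Huffman decoder: `decodeRounds` and `NUM_STREAMS` are hypothetical, and each stream is modeled as a simple counter of symbols left.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STREAMS 4

/* Consumes one symbol per stream per round and stops for good as soon as
 * any stream is exhausted. remaining[s] is the number of symbols stream s
 * still holds (toy model). Returns the number of symbols consumed. */
static size_t decodeRounds(size_t remaining[NUM_STREAMS], size_t maxRounds)
{
    size_t decoded = 0;
    size_t round;
    int s;
    assert(remaining != NULL);
    for (round = 0; round < maxRounds; ++round) {
        for (s = 0; s < NUM_STREAMS; ++s) {
            if (remaining[s] == 0)
                goto _out; /* `break` would leave only the inner loop, and
                              the next round would keep consuming symbols */
            remaining[s]--;
            decoded++;
        }
    }
_out:
    return decoded;
}
```

With `remaining = {3, 1, 3, 3}`, the function stops in the second round as soon as stream 1 is found empty. A `break` in the same place would let later rounds continue to drain stream 0.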
Yann Collet
82ca00811a change logic when stderr is not console : don't update progress status
but keep warnings and final operation statement.

updated tests/cli-tests/ accordingly
2023-01-26 13:00:52 -08:00
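The policy described in 82ca00811a can be sketched as a small filter. This is an assumption-laden illustration, not the CLI's code: `MsgKind` and `emitStatus` are hypothetical, and the caller is expected to say whether the output is a console.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical message categories for this sketch. */
typedef enum { MSG_PROGRESS, MSG_WARNING, MSG_FINAL } MsgKind;

/* Writes text to out unless it is a live progress update and out is not
 * a console. Warnings and the final statement are always written.
 * Returns true if the message was written. */
static bool emitStatus(FILE* out, bool outIsConsole, MsgKind kind, const char* text)
{
    assert(out != NULL && text != NULL);
    if (kind == MSG_PROGRESS && !outIsConsole)
        return false; /* skip progress updates when output is redirected */
    fputs(text, out);
    return true;
}
```

On a POSIX system, one plausible way to compute `outIsConsole` is `isatty(fileno(stderr))`. How the zstd CLI actually detects a console is not stated in the commit message.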
Yann Collet
3c215220e3 modify cli-test logic : ignore stderr message by default
Previously, cli-test would, by default, check that a stderr output is strictly identical to a saved outcome.
When there were no instructions on how to interpret stderr, it would default to requiring it to be empty.

There are many test cases though where stderr content doesn't matter, and we are mainly interested in the return code of the cli.
For these cases, it was possible to set a .ignore document, which would instruct to ignore stderr content.

This PR updates the logic, to make .ignore the default.
When willing to check that stderr content is empty, one must now add an empty .strict file.

This will allow status messages to evolve without triggering many cli-tests errors.
This is especially important when some of these status messages include compression results, which may change as a result of compression optimizations.
It also makes it easier to add new tests which only care about the CLI's return code.
2023-01-26 10:57:41 -08:00
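The new default in 3c215220e3 amounts to a three-way decision. The sketch below models it in C for illustration only, and it is not the test harness's own code. `StderrCheck`, `chooseStderrCheck` and `stderrPasses` are hypothetical names. Treating a saved expected output as taking priority over the `.strict` marker is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { CHECK_IGNORE, CHECK_EMPTY, CHECK_EXACT } StderrCheck;

/* hasSavedOutput:  a saved expected stderr exists for the test (assumed to win)
 * hasStrictMarker: an empty .strict file exists for the test */
static StderrCheck chooseStderrCheck(bool hasSavedOutput, bool hasStrictMarker)
{
    if (hasSavedOutput)  return CHECK_EXACT;
    if (hasStrictMarker) return CHECK_EMPTY;
    return CHECK_IGNORE; /* new default: stderr content does not matter */
}

/* saved may be NULL unless check is CHECK_EXACT. */
static bool stderrPasses(StderrCheck check, const char* actual, const char* saved)
{
    assert(actual != NULL);
    switch (check) {
    case CHECK_EXACT: return saved != NULL && strcmp(actual, saved) == 0;
    case CHECK_EMPTY: return actual[0] == '\0';
    default:          return true;
    }
}
```

Under this model, a test with neither file passes regardless of what the CLI prints, so a changed compression ratio in a status line no longer fails it.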
Yonatan Komornik
7b3f03bc9d Merge pull request #3457 from yoniko/fix-rowhash-cli
[Bugfix] CLI row hash flags set the wrong values


`--[no-]row-match-finder` do the opposite of what they are supposed to.
In effect, the `no` option would activate row hash while the other option would disable it.
This commit fixes the issue and changes the code to use the more readable enum values.
2023-01-25 22:40:25 -08:00
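The benefit of named enum values mentioned in 7b3f03bc9d can be shown with a small flag parser. This is a sketch under assumptions: `ParamSwitch` is a local stand-in for a three-state switch, and `parseRowMatchFinderFlag` is hypothetical. Neither is taken from the zstd CLI.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for an auto / enable / disable switch (hypothetical names). */
typedef enum { ps_auto = 0, ps_enable = 1, ps_disable = 2 } ParamSwitch;

/* Maps a CLI argument to a switch value.
 * Returns 1 and sets *out if the argument is recognized, 0 otherwise. */
static int parseRowMatchFinderFlag(const char* arg, ParamSwitch* out)
{
    assert(arg != NULL && out != NULL);
    if (strcmp(arg, "--row-match-finder") == 0) {
        *out = ps_enable;  /* the positive flag enables */
        return 1;
    }
    if (strcmp(arg, "--no-row-match-finder") == 0) {
        *out = ps_disable; /* the `no` flag disables */
        return 1;
    }
    return 0;
}
```

With raw integers such as `1` and `2` in place of `ps_enable` and `ps_disable`, swapping the two assignments is easy to miss in review. The named values make each branch read as a statement of intent.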
Yonatan Komornik
6422d1d7a8 Bugfix: --[no-]row-match-finder do the opposite of what they are supposed to 2023-01-25 17:59:35 -08:00
Yann Collet
a82e0aac44 Merge pull request #3450 from facebook/no_rm_on_o
disable --rm on -o command
2023-01-25 17:51:53 -08:00
Yann Collet
02434e0867 enforce a hard fail when input files are set to be erased
in scenarios where it's supposed to not be possible.

suggested by @terrelln
2023-01-25 16:18:20 -08:00
Yann Collet
8c85b29e32 disable --rm on -o command
make it more similar to -c (aka `stdout`) convention.
2023-01-25 16:09:25 -08:00
Yann Collet
efc9ae3480 Merge pull request #3455 from facebook/fix3454
Provide more accurate error codes for busy-loop scenarios
2023-01-25 15:22:51 -08:00
Nick Terrell
321490cd5b [version-test] Work around bugs in v0.7.3 dict builder
Before calling a dictionary good, make sure that it can compress an
input. If v0.7.3 rejects v0.7.3's dictionary, fall back to the v1.0
dictionary. It is not the job of the version test to test this, because
we cannot fix this code.
2023-01-25 13:47:51 -08:00
Nick Terrell
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
Yann Collet
db18a62f89 Provide more accurate error codes for busy-loop scenarios
fixes #3454
2023-01-25 13:07:53 -08:00
daniellerozenblit
f3255bfeff Merge pull request #3447 from daniellerozenblit/fuzz-sequence-compression
Fuzz large offsets through sequence compression api
2023-01-25 09:27:34 -05:00
daniellerozenblit
29a4c8cc4a Merge pull request #3452 from daniellerozenblit/fix-seekable-32bit
Fix 32-bit build errors in zstd seekable format
2023-01-25 09:23:34 -05:00
Danielle Rozenblit
63042f1f11 fix 32bit build errors in zstd seekable 2023-01-24 15:53:59 -08:00
Yonatan Komornik
2baac04110 Merge pull request #3451 from yoniko/red-zones-bugfix
Bugfix redzone unpoisoning
2023-01-24 14:32:56 -08:00
Yonatan Komornik
1d636b4ba0 Bug fix redzones by unpoisoning only the intended buffer and not the followup redzone. 2023-01-24 12:54:43 -08:00
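The idea behind 1d636b4ba0 can be modeled with a toy shadow map. This sketch does not use a real sanitizer and is not zstd's workspace code: `Arena`, `arenaInit`, `arenaReserve` and the sizes are hypothetical. It shows an allocator that unpoisons exactly the bytes it hands out and leaves the redzone after them poisoned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE   64
#define REDZONE_SIZE 8

typedef struct {
    unsigned char poisoned[ARENA_SIZE]; /* toy shadow map: 1 = access forbidden */
    size_t used;                        /* bytes reserved so far, redzones included */
} Arena;

static void arenaInit(Arena* a)
{
    assert(a != NULL);
    memset(a->poisoned, 1, sizeof a->poisoned); /* everything starts poisoned */
    a->used = 0;
}

/* Reserves size usable bytes followed by a redzone.
 * Returns the offset of the usable bytes, or (size_t)-1 if they do not fit. */
static size_t arenaReserve(Arena* a, size_t size)
{
    size_t start;
    assert(a != NULL);
    if (size > ARENA_SIZE - REDZONE_SIZE ||
        a->used > ARENA_SIZE - REDZONE_SIZE - size)
        return (size_t)-1;
    start = a->used;
    memset(a->poisoned + start, 0, size);  /* unpoison the buffer only */
    a->used = start + size + REDZONE_SIZE; /* the redzone stays poisoned */
    return start;
}
```

Unpoisoning `size + REDZONE_SIZE` bytes instead would silently disable the guard that the redzone is meant to provide.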
Danielle Rozenblit
7d600c628a fix bound check for ZSTD_copySequencesToSeqStoreNoBlockDelim() 2023-01-24 06:40:40 -08:00
Elliot Gorokhovsky
41682e6293 Merge pull request #3448 from facebook/embg-doc-fix
Fix ZSTD_estimate* and ZSTD_initCStream() docs
2023-01-23 15:04:53 -05:00
Danielle Rozenblit
0a91b31b17 Merge branch 'dev' into fuzz-sequence-compression
for testing
2023-01-23 11:11:33 -08:00
daniellerozenblit
9116000be6 Merge pull request #3439 from daniellerozenblit/sequence-validation-bug-fix
Fix sequence validation and seqStore bounds check
2023-01-23 13:50:37 -05:00
Danielle Rozenblit
7fc00c18b8 calloc dictionary in sequence compression fuzzer rather than generating a random buffer 2023-01-23 10:42:09 -08:00
Elliot Gorokhovsky
3bfd3be5fb Fix ZSTD_estimate* and ZSTD_initCStream() docs
Fix the following documentation bugs:
* Note that `ZSTD_estimate*` functions are not compatible with the external matchfinder API
* Note that `ZSTD_estimateCStreamSize_usingCCtxParams()` is not compatible with `nbWorkers >= 1`
* Remove incorrect warning that the legacy streaming API is incompatible with advanced parameters and/or dictionary compression
* Note that `ZSTD_initCStream()` is incompatible with dictionary compression
* Warn that
2023-01-23 13:28:36 -05:00
Yann Collet
ced0882e45 Merge pull request #3443 from facebook/no_rm_w_stdout
refactor : --rm ignored with stdout
2023-01-23 10:22:11 -08:00
Nick Terrell
dc2b3e8876 Fix -Wstringop-overflow warning
Backported from kernel patch [0].

I wasn't able to reproduce the warning locally, but could repro it in
the kernel.

[0] https://lore.kernel.org/lkml/20220330193352.GA119296@embeddedor/
2023-01-23 10:12:25 -08:00
Danielle Rozenblit
815d1d4eda update external sequence error to fit error naming scheme 2023-01-23 09:58:34 -08:00
Elliot Gorokhovsky
6aee603d0e Merge pull request #3446 from facebook/dependabot/github_actions/github/codeql-action-2.1.39
Bump github/codeql-action from 2.1.38 to 2.1.39
2023-01-23 11:55:13 -05:00
Danielle Rozenblit
f75afb613f merge dev 2023-01-23 08:12:19 -08:00
Danielle Rozenblit
1b65727e74 fix nits and add new error code for invalid external sequences 2023-01-23 07:59:02 -08:00