1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-19 05:23:17 +03:00
Commit Graph

11277 Commits

Author SHA1 Message Date
2295826266 update tests duration indications 2025-06-21 12:01:07 -07:00
d77a7b6895 added test-largeDictionary to dev-long CI script 2025-06-21 11:34:10 -07:00
528132e9a0 Merge pull request #4402 from mugitya03/tests
Release resources in error paths via cleanup
2025-06-21 11:33:44 -07:00
878be1c8f0 fix 2025-06-21 13:43:47 -04:00
16e13ebdeb delete 2025-06-21 13:03:13 -04:00
a74f7fcabd merge 2025-06-21 12:57:12 -04:00
7eefc22169 Merge pull request #4367 from ClickHouse/cfi
Add unwind information in huf_decompress_amd64.S
2025-06-19 23:41:38 -07:00
354cede369 Merge pull request #4412 from Cyan4973/rm_bd
remove duplicate
2025-06-19 14:32:32 -07:00
e315155cc2 removed duplicate
this file is already present as `largeDictionary.c`
2025-06-18 15:07:32 -07:00
429dc891b2 Merge pull request #4411 from arpadpanyik-arm/hist_sve2
AArch64: Add SVE2 implementation of histogram computation
2025-06-18 13:48:54 -07:00
2082749775 Merge pull request #4409 from bgilbert/meson-license
meson: use SPDX expression for license
2025-06-16 10:54:43 -07:00
4255c5ea89 Merge pull request #4408 from mugitya03/MLK-3
Ensure BMK_timedFnState is always freed in benchMem
2025-06-16 09:01:58 -07:00
57bd0eb6a7 meson: use SPDX expression for license
This is the format recommended by Meson documentation.
2025-06-14 19:48:40 -07:00
d28a737750 Add unit tests for HIST_count_wksp
The following tests are included:
- Empty input scenario test.
- Workspace size and alignment tests.
- Symbol out-of-range tests.
- Cover multiple input sizes, vary permitted maximum symbol
  values, and include diverse symbol distributions.

These tests verifies count table correctness, maxSymbolValuePtr
updates, and error-handling paths. It enables automated regression
of core histogram logic as well.
2025-06-13 22:55:53 +00:00
cad0b72ad8 Ensure BMK_timedFnState is always freed in benchMem
When an error occurs in BMK_isSuccessful_runOutcome, the code
previously skipped the call to BMK_freeTimedFnState(tfs),
leaking the allocated tfs object.
Fiexed by calling BMK_freeTimedFnState(tfs) before goto _cleanOut.
2025-06-12 19:52:58 -04:00
7e4937bc75 AArch64: Add SVE2 implementation of histogram computation
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions -
which compute a histogram for 16 byte chunks in a vector register.

On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram for the whole symbol space (0..255) of 16
bytes input. However we can only accumulate 15 of such 16 byte strips
before possible overflow. So we need to extend and save the 8-bit
histogram accumulators to 16-bit after every 240 byte chunks of input.
To store all in registers we would need 32 128-bit registers. Longer
SVE2 vectors could help here, if such machines become available.

The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough. However an LZ pass will prepend the histogram
calculation, so it is impossible (my assumption) to overflow the 16-bit
accumulators.

The symbol distribution is also not uniform, the lower values are more
common, so we used a 3 pass algorithm to prevent stack spilling. In the
first pass we only compute histograms for 64 symbols (4-way SIMD) while
also computing the maximum symbol value. If we have symbol values
larger than 64 we start the second pass to compute the next 96 elements
of the histogram. The final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of histogram
generation gave the best overall results for performance.

This implementation is the best performing of a number of different
cache blocking schemes tested.

Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-20    GCC-14
 1#silesia.tar:   +6.173%   +5.987%
 2#silesia.tar:   +5.200%   +5.011%
 3#silesia.tar:   +4.332%   +5.031%
 4#silesia.tar:   +2.789%   +3.064%
 5#silesia.tar:   +2.028%   +1.838%
 6#silesia.tar:   +1.562%   +1.340%
 7#silesia.tar:   +1.160%   +0.959%
2025-06-11 12:14:22 +00:00
5e6bdf5e3d Merge pull request #4406 from Cyan4973/separate-cmake-tests
cmake CI tests refactor
2025-06-09 15:19:47 -07:00
9a6fe9a428 remove global variable
overkill and leaky to transport a test result just in one place.
2025-06-09 21:55:06 +00:00
88bea95d11 Merge pull request #4403 from dloidolt/fix_FUZZ_malloc_rand
fuzz: Fix FUZZ_malloc_rand() to return non-NULL for zero-size allocations
2025-06-09 10:57:59 -07:00
39c091bc9e Merge pull request #4397 from xiaoge1001/free
Fix several locations with potential memory leak
2025-06-09 10:06:36 -07:00
de8d9e8914 Fix several locations with potential memory leak 2025-06-09 21:23:23 +08:00
472acf5d83 fix #4405 2025-06-09 07:24:03 +00:00
7e0324e124 fixed cmake + windows + visual + clang-cl
by removing processing of resource files in this case
2025-06-09 07:09:51 +00:00
b6dc2924f8 remove fail-fast so that the outcome of other tests can be observed 2025-06-09 06:59:18 +00:00
49fe2ec793 refactor: modularize CMakeLists.txt for better maintainability
- Split monolithic 235-line CMakeLists.txt into focused modules
- Main file reduced to 78 lines with clear section organization
- Created 5 specialized modules:
  * ZstdVersion.cmake - CMake policies and version management
  * ZstdOptions.cmake - Build options and platform configuration
  * ZstdDependencies.cmake - External dependency management
  * ZstdBuild.cmake - Build targets and validation
  * ZstdPackage.cmake - Package configuration generation

Benefits:
- Improved readability and maintainability
- Better separation of concerns
- Easier debugging and modification
- Preserved 100% backward compatibility
- All existing build options and targets unchanged

The refactored build system passes all tests and maintains
identical functionality while being much easier to understand
and maintain.
2025-06-09 03:47:33 +00:00
75abb8bc1c add cmake build test with ZSTD_BUILD_TESTS disabled
should reproduce #4405 and fail
2025-06-09 00:05:19 +00:00
c826c572cf added macos arm64 tests
and comment out windows arm64 tests due to unacceptably long queue time
2025-06-08 22:40:15 +00:00
a168ae7232 added windows arm64 runner to cmake tests 2025-06-08 22:19:57 +00:00
b922774602 refactor CMake tests workflow for readability 2025-06-08 21:44:21 +00:00
a2dba85fd1 ci: separate cmake tests into dedicated workflow file
- Create new .github/workflows/cmake-tests.yml with all cmake-related jobs
- Move cmake-build-and-test-check, cmake-source-directory-with-spaces, and cmake-visual-2022 jobs
- Remove cmake tests from dev-short-tests.yml to improve organization
- Maintain same trigger conditions and test configurations
- Add dedicated concurrency group for cmake tests

This separation allows cmake tests to run independently and makes
the CI configuration more modular and easier to maintain.
2025-06-08 20:25:25 +00:00
ec8ad2e552 Merge pull request #4384 from xiaoge1001/dev
update `--rm` cmd help info
2025-06-08 12:39:26 -07:00
5b6fbf5f96 Merge pull request #4392 from mugitya03/MLK
Fix potential memory leak in function `benchMem`
2025-06-08 12:38:31 -07:00
e23d0a9616 Merge pull request #4399 from zijianli1234/dev
Improve speed of convertSequences() and get1BlockSummary() using RVV
2025-06-08 12:38:02 -07:00
a480191f9e Fix Darwin build of huf_decompress_amd64.S 2025-06-08 05:07:09 +00:00
80cac404c7 Add unwind information in huf_decompress_amd64.S 2025-06-08 05:07:09 +00:00
4be08ba122 fuzz: Fix FUZZ_malloc_rand() to return non-NULL for zero-size allocations
The FUZZ_malloc_rand() function was incorrectly always returning NULL for
zero-size allocations. The random offset generated by
FUZZ_dataProducer_int32Range() was not being added to the pointer variable,
causing the function to always return (void *)0.
2025-06-05 17:28:30 +02:00
a81ffe11d4 Release resources in error paths via cleanup
Replace direct returns in error-handling branches with a unified
cleanup block that frees allocated resources before returning,
improving code quality and robustness.
2025-06-04 18:08:11 -04:00
bd894054c0 Merge pull request #4401 from mugitya03/MLK-1
Release resources before returning
2025-06-04 12:49:38 -07:00
dd4cee9190 Release resources before returning
In main, resources were freed on the success path but not in the error path.
This change ensures all allocated resources are released before returning.
2025-06-03 15:28:11 -04:00
d95123f2e6 Improve speed of ZSTD_compressSequencesAndLiterals() using RVV 2025-06-02 17:21:02 +08:00
4bd5654e72 update --rm cmd help info
Starting from cee6bec9fa, --rm is ignored when the output is `stdout`.
2025-05-30 06:49:52 +08:00
4618b255ea Fix memory leak in function benchMem
`speedPerRound` is allocated at the start of benchMem to collect per-round speeds,
but is never freed, causing a leak on each invocation.
2025-05-25 15:21:23 -04:00
f9938c217d lz4: Remove ancient test helpers
Building lz4 as root was causing `make clean` to fail with permission
errors.

We used to have to install lz4 from source back in Ubuntu 14.04, but
nowadays the installed lz4 is fine. Get rid of ancient helpers and
cruft!
2025-05-07 22:01:49 -07:00
448a09ff78 seekable_format: Fix conversion warnings in parallel_compression 2025-05-07 22:01:49 -07:00
13cb7a10ae seekable_format: Add test for parallel_compression memory usage
Use ulimit to fail the test if we use O(filesize) memory, rather than
O(threads).
2025-05-07 22:01:49 -07:00
01c973de8d seekable_format: Fix race in parallel_processing
There was no memory barrier between writing and reading `done`, which
would allow reordering to cause races. With so little data to handle
after each job completes, we might as well just join.
2025-05-07 22:01:49 -07:00
6fc8455a72 seekable_format: Cleanup POOL in parallel_compression 2025-05-07 22:01:49 -07:00
2d4cff69c4 seekable_format: Make parallel_compression use memory properly
Previously, parallel_compression would only handle each job's results
after ALL jobs were successfully queued. This caused all src/dst
buffers to remain in memory until then!

It also polled to check whether a job completed, which is racy without
any memory barrier.

Now, we flush results as a side effect of completing a job. Completed
frames are placed in an ordered linked-list, and any eligible frames
are flushed. This may be zero or multiple frames, depending on the
order in which jobs finish.

This design also makes it simple to support streaming input, so that
is now available. Just pass `-` as the filename, and stdin/stdout will
be used for I/O.
2025-05-07 22:01:49 -07:00
f5b6531902 seekable_format: Link against multi-threaded libzstd.a
Some of these examples are intended to be parallel, and don't make
sense to link against single-threaded libzstd.

The filename of mt and nomt libzstd are identical, so it's still
possible to link against the single-threaded one, just harder.
2025-05-07 22:01:49 -07:00
6b0039abcf seekable_format: Build with $(MAKE)
This passes make flags, such as `-jN` for building in parallel, to
the underlying make.
2025-05-07 22:01:49 -07:00