lib/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-07-29 11:21:22 +03:00

Author	SHA1	Message	Date
Yann Collet	e128976193	Merge pull request #4448 from Cyan4973/install_oses regroup list of OSes for install inside common variable	2025-07-28 11:01:58 -08:00
Yann Collet	8bca04ba9f	regroup list of OSes for install inside common variable within lib/install_oses.mk. fixes #4445	2025-07-28 11:33:22 -07:00
Yann Collet	5b89189741	Merge pull request #4450 from facebook/dependabot/github_actions/github/codeql-action-3.29.4 Bump github/codeql-action from 3.28.9 to 3.29.4	2025-07-28 07:33:09 -08:00
dependabot[bot]	96f316a246	Bump github/codeql-action from 3.28.9 to 3.29.4 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.9 to 3.29.4. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](`9e8d0789d4...4e828ff8d4`) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 3.29.4 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-28 06:30:43 +00:00
Yann Collet	9bf5d340ae	Merge pull request #4447 from facebook/android-cmake added android cmake build	2025-07-24 10:07:16 -08:00
Yann Collet	34f3a0ab11	Merge pull request #4413 from arpadpanyik-arm/huf_decode2x AArch64: Enhance struct access in Huffman decode 2X	2025-07-23 15:03:37 -08:00
Yann Collet	6f1cb87ade	Merge pull request #4443 from facebook/opt_simplify_4442 simplify sequence resolution in zstd_opt	2025-07-23 15:01:36 -08:00
Yann Collet	3b23f0c673	added android cmake build is expected to fail, due to #4444	2025-07-23 15:07:20 -07:00
Yann Collet	0055ce7a02	simplify sequence resolution in zstd_opt initially hinted by @pitaj in #4442	2025-07-18 21:21:47 -07:00
Yann Collet	f9e26bb42b	Merge pull request #4394 from AZero13/zstd Remove redundant setting of allJobsCompleted to 1	2025-07-18 18:55:47 -08:00
Yann Collet	8c651868ff	Merge pull request #4418 from arpadpanyik-arm/decode_seq_opt AArch64: Improve ZSTD_decodeSequence performance	2025-07-18 18:54:49 -08:00
Yann Collet	a1e11db08a	Merge pull request #4435 from zijianli1234/dev add riscv ci	2025-07-18 18:54:24 -08:00
Yann Collet	afa96bbf25	Merge pull request #4429 from arpadpanyik-arm/convertSequences_Neon Improve speed of ZSTD_compressSequencesAndLiterals using Neon	2025-07-13 23:52:48 -08:00
Yann Collet	c768d7b94b	Merge pull request #4436 from facebook/dependabot/github_actions/cygwin/cygwin-install-action-6 Bump cygwin/cygwin-install-action from 5 to 6	2025-07-13 23:52:32 -08:00
dependabot[bot]	3ce4d1cba3	Bump cygwin/cygwin-install-action from 5 to 6 Bumps [cygwin/cygwin-install-action](https://github.com/cygwin/cygwin-install-action) from 5 to 6. - [Release notes](https://github.com/cygwin/cygwin-install-action/releases) - [Commits](`f61179d722...f200932376`) --- updated-dependencies: - dependency-name: cygwin/cygwin-install-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-14 06:27:46 +00:00
Yann Collet	9a41990883	Merge pull request #4433 from facebook/vs2025 removed VS2019 runners	2025-07-12 19:44:28 -08:00
ZijianLi	534860c90b	add -DMEM_FORCE_MEMORY_ACCESS=0 in CI RVV test	2025-07-13 10:51:08 +08:00
Yann Collet	7325384a68	removed VS2019 runners replaced by one vs2025 runner, which is badly named since it is still running MSVC 2022, but it's a good test that shows that the matrix is able to handle multiple MSVC versions.	2025-07-11 10:29:07 -07:00
Arpad Panyik	703f855734	AArch64: Enable optimized QEMU CI builds Add missing `-O3` flag to the compilation of AArch64 SVE2 builds executed by QEMU. This can decrease the CI job runtime considerably.	2025-07-10 18:20:57 +00:00
Arpad Panyik	07cd78d366	AArch64: Add Neon path for convertSequences_noRepcodes Add a 4-way Neon implementation for the convertSequences_noRepcodes function. Remove 'static' keywords from all of its implementations to be able to add unit tests. Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5` Neoverse-V2 before after Clang-18: 100.000% 311.703% Clang-19: 100.191% 311.714% Clang-20: 100.181% 311.723% GCC-13: 107.520% 252.309% GCC-14: 107.652% 253.158% GCC-15: 107.674% 253.168% Cortex-A720 before after Clang-18: 100.000% 204.512% Clang-19: 102.825% 204.600% Clang-20: 102.807% 204.558% GCC-13: 110.668% 203.594% GCC-14: 110.684% 203.978% GCC-15: 102.864% 204.299% Co-authored by, Thomas Daubney <Thomas.Daubney@arm.com>	2025-07-10 18:20:57 +00:00
Arpad Panyik	8e4400463a	Improve ZSTD_get1BlockSummary Add a faster scalar implementation of ZSTD_get1BlockSummary which removes the data dependency of the accumulators in the hot loop to leverage the superscalar potential of recent out-of-order CPUs. The new algorithm leverages SWAR (SIMD Within A Register) methodology to exploit the capabilities of 64-bit architectures. It achieves this by packing two 32-bit data elements into a single 64-bit register, enabling parallel operations on these subcomponents while ensuring that the 32-bit boundaries prevent overflow, thereby optimizing computational efficiency. Corresponding unit tests are included. Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5` Neoverse-V2 before after GCC-13: 100.000% 290.527% GCC-14: 100.000% 291.714% GCC-15: 99.914% 291.495% Clang-18: 148.072% 264.524% Clang-19: 148.075% 264.512% Clang-20: 148.062% 264.490% Cortex-A720 before after GCC-13: 100.000% 235.261% GCC-14: 101.064% 234.903% GCC-15: 112.977% 218.547% Clang-18: 127.135% 180.359% Clang-19: 127.149% 180.297% Clang-20: 127.154% 180.260% Co-authored by, Thomas Daubney <Thomas.Daubney@arm.com>	2025-07-10 18:20:49 +00:00
ZijianLi	d04e7944dd	add compiler version check.	2025-07-07 23:07:39 +08:00
ZijianLi	2c3f23b018	fix dereferencing type-punned pointer error	2025-06-29 15:36:25 +08:00
ZijianLi	40f64f3493	add riscv rvv ci	2025-06-29 15:33:50 +08:00
Yann Collet	1dbc2e0908	Merge pull request #4414 from arpadpanyik-arm/copy8 AArch64: Use better block COPY8	2025-06-25 07:47:01 -04:00
Rose	50f169411b	Remove redundant setting of allJobsCompleted to 1 This will do it automatically.	2025-06-24 14:04:21 -04:00
Arpad Panyik	a28e8182b1	AArch64: Improve ZSTD_decodeSequence performance LLVM's alias-analysis sometimes fails to see that a static-array member of a struct cannot alias other members. This patch: - Reduces array accesses via struct indirection to aid load/store alias analysis under Clang. - Converts dynamic array indexing into conditional-move arithmetic, eliminating branches and extra loads/stores on out-of-order CPUs. - Reloads the bitstream only when match-length bits are consumed (assuming each reload only needs to happen once per match-length read), improving branch-prediction rates. - Removes the UNLIKELY() hint, which recent compilers already handle well without cost. Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2": Clang-19 Clang-20 Clang-* GCC-14 GCC-15 1#silesia.tar: +11.556% +16.203% +0.240% +2.216% +7.891% 2#silesia.tar: +15.493% +21.140% -0.041% +2.850% +9.926% 3#silesia.tar: +16.887% +22.570% -0.183% +3.056% +10.660% 4#silesia.tar: +17.785% +23.315% -0.262% +3.343% +11.187% 5#silesia.tar: +18.125% +24.175% -0.466% +3.350% +11.228% 6#silesia.tar: +17.607% +23.339% -0.591% +3.175% +10.851% 7#silesia.tar: +17.463% +22.837% -0.486% +3.292% +10.868% * Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing) Co-authored by: David Sherwood, David.Sherwood@arm.com Ola Liljedahl, Ola.Liljedahl@arm.com	2025-06-24 12:22:23 +00:00
Arpad Panyik	bd38fc2c5f	AArch64: Enhance struct access in Huffman decode 2X In the multi-stream multi-symbol Huffman decoder GCC generates suboptimal code - emitting more loads for HUF_DEltX2 struct member accesses. Forcing it to use 32-bit loads and bit arithmetic to extract the necessary parts (UBFX) improves the overall decode speed. Also avoid integer type conversions in the symbol decodes, which leads to better instruction selection in table lookup accesses. On AArch64 the decoder no longer runs into register-pressure limits, so we can simplify the hot path and improve throughput. Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2": Clang-20 Clang-* GCC-13 GCC-14 GCC-15 1#silesia.tar: +0.820% +1.365% +2.480% +1.348% +0.987% 2#silesia.tar: +0.426% +0.784% +1.218% +0.665% +0.554% 3#silesia.tar: +0.112% +0.389% +0.508% +0.188% +0.261% * Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing)	2025-06-23 14:16:25 +00:00
Yann Collet	3c3b8274c5	Merge pull request #4417 from facebook/dependabot/github_actions/msys2/setup-msys2-2.28.0 Bump msys2/setup-msys2 from 2.27.0 to 2.28.0	2025-06-23 06:32:14 -07:00
dependabot[bot]	7b1b6a0d2d	Bump msys2/setup-msys2 from 2.27.0 to 2.28.0 Bumps [msys2/setup-msys2](https://github.com/msys2/setup-msys2) from 2.27.0 to 2.28.0. - [Release notes](https://github.com/msys2/setup-msys2/releases) - [Changelog](https://github.com/msys2/setup-msys2/blob/main/CHANGELOG.md) - [Commits](`61f9e5e925...40677d36a5`) --- updated-dependencies: - dependency-name: msys2/setup-msys2 dependency-version: 2.28.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2025-06-23 06:24:00 +00:00
Yann Collet	bdceb81271	Merge pull request #4415 from bgilbert/buildtype meson: drop unused variable	2025-06-21 20:31:26 -07:00
Yann Collet	2e8ec28b30	Merge pull request #4416 from facebook/test_largeDictionary added test-largeDictionary to dev-long CI script	2025-06-21 12:37:08 -07:00
Yann Collet	2295826266	update tests duration indications	2025-06-21 12:01:07 -07:00
Yann Collet	d77a7b6895	added test-largeDictionary to dev-long CI script	2025-06-21 11:34:10 -07:00
Yann Collet	528132e9a0	Merge pull request #4402 from mugitya03/tests Release resources in error paths via cleanup	2025-06-21 11:33:44 -07:00
jinyaoguo	878be1c8f0	fix	2025-06-21 13:43:47 -04:00
jinyaoguo	16e13ebdeb	delete	2025-06-21 13:03:13 -04:00
jinyaoguo	a74f7fcabd	merge	2025-06-21 12:57:12 -04:00
Benjamin Gilbert	a4b9ebcbeb	meson: drop unused variable	2025-06-20 23:34:13 -07:00
Arpad Panyik	1e9d2006ae	AArch64: Use better block copy8 The vector copy is only necessary for 16-byte blocks on AArch64. Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2": Clang-19 Clang-20 GCC-14 GCC-15 1#silesia.tar: +0.316% +0.865% +0.025% +0.096% 2#silesia.tar: +0.689% +1.374% +0.027% +0.065% 3#silesia.tar: +0.811% +1.654% +0.034% +0.033% 4#silesia.tar: +0.912% +1.755% +0.027% +0.042% 5#silesia.tar: +0.995% +1.826% +0.062% +0.094% 6#silesia.tar: +0.976% +1.777% +0.065% +0.104% 7#silesia.tar: +0.910% +1.738% +0.077% +0.110%	2025-06-20 17:05:41 +00:00
Yann Collet	7eefc22169	Merge pull request #4367 from ClickHouse/cfi Add unwind information in huf_decompress_amd64.S	2025-06-19 23:41:38 -07:00
Yann Collet	354cede369	Merge pull request #4412 from Cyan4973/rm_bd remove duplicate	2025-06-19 14:32:32 -07:00
Yann Collet	e315155cc2	removed duplicate this file is already present as `largeDictionary.c`	2025-06-18 15:07:32 -07:00
Yann Collet	429dc891b2	Merge pull request #4411 from arpadpanyik-arm/hist_sve2 AArch64: Add SVE2 implementation of histogram computation	2025-06-18 13:48:54 -07:00
Yann Collet	2082749775	Merge pull request #4409 from bgilbert/meson-license meson: use SPDX expression for license	2025-06-16 10:54:43 -07:00
Yann Collet	4255c5ea89	Merge pull request #4408 from mugitya03/MLK-3 Ensure BMK_timedFnState is always freed in benchMem	2025-06-16 09:01:58 -07:00
Benjamin Gilbert	57bd0eb6a7	meson: use SPDX expression for license This is the format recommended by Meson documentation.	2025-06-14 19:48:40 -07:00
Arpad Panyik	d28a737750	Add unit tests for HIST_count_wksp The following tests are included: - Empty input scenario test. - Workspace size and alignment tests. - Symbol out-of-range tests. - Cover multiple input sizes, vary permitted maximum symbol values, and include diverse symbol distributions. These tests verify count table correctness, maxSymbolValuePtr updates, and error-handling paths. They also enable automated regression testing of the core histogram logic.	2025-06-13 22:55:53 +00:00
jinyaoguo	cad0b72ad8	Ensure BMK_timedFnState is always freed in benchMem When an error occurs in BMK_isSuccessful_runOutcome, the code previously skipped the call to BMK_freeTimedFnState(tfs), leaking the allocated tfs object. Fixed by calling BMK_freeTimedFnState(tfs) before goto _cleanOut.	2025-06-12 19:52:58 -04:00
Arpad Panyik	7e4937bc75	AArch64: Add SVE2 implementation of histogram computation The existing scalar implementation uses a 4-way pipelined histogram calculation which is very efficient on out-of-order CPUs. However, this can be further accelerated using the SVE2 HISTSEG instructions - which compute a histogram for 16-byte chunks in a vector register. On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions to compute the histogram for the whole symbol space (0..255) of 16 bytes of input. However, we can only accumulate 15 such 16-byte strips before possible overflow. So we need to extend and save the 8-bit histogram accumulators to 16-bit after every 240-byte chunk of input. To store all in registers we would need 32 128-bit registers. Longer SVE2 vectors could help here, if such machines become available. The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators would not be enough. However, an LZ pass will precede the histogram calculation, so it is impossible (my assumption) to overflow the 16-bit accumulators. The symbol distribution is also not uniform, the lower values are more common, so we used a 3-pass algorithm to prevent stack spilling. In the first pass we only compute histograms for 64 symbols (4-way SIMD) while also computing the maximum symbol value. If we have symbol values larger than 64 we start the second pass to compute the next 96 elements of the histogram. The final pass calculates the remaining part of the histogram (256 symbols in total) if needed. This split of histogram generation gave the best overall results for performance. This implementation is the best performing of a number of different cache blocking schemes tested. 
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8 (`e26dde3d`) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2": Clang-20 GCC-14 1#silesia.tar: +6.173% +5.987% 2#silesia.tar: +5.200% +5.011% 3#silesia.tar: +4.332% +5.031% 4#silesia.tar: +2.789% +3.064% 5#silesia.tar: +2.028% +1.838% 6#silesia.tar: +1.562% +1.340% 7#silesia.tar: +1.160% +0.959%	2025-06-11 12:14:22 +00:00
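
The overflow bound described in commit 7e4937bc75 can be illustrated in plain C. The sketch below is not the SVE2 code from that commit: it is a scalar stand-in that only shows why 8-bit accumulators must be widened after every 15 strips of 16 bytes (15 * 16 = 240, which still fits in a byte, while 16 * 16 = 256 does not). The function name `hist_narrow_then_widen` and the macro names are hypothetical, and the wide counters are 32-bit here rather than the 16-bit ones the commit message mentions, so the sketch stays correct for any input size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIP_BYTES      16   /* bytes covered by one strip in the commit message */
#define STRIPS_PER_FLUSH 15   /* 15 * 16 = 240 <= 255, so a uint8_t cannot wrap */
#define CHUNK_BYTES      (STRIP_BYTES * STRIPS_PER_FLUSH)

/* Hypothetical helper: counts byte occurrences using narrow 8-bit
 * accumulators that are widened and reset after every 240-byte chunk. */
static void hist_narrow_then_widen(const uint8_t* src, size_t srcSize,
                                   uint32_t counts[256])
{
    uint8_t narrow[256];
    size_t pos = 0;

    memset(counts, 0, 256 * sizeof(counts[0]));
    while (pos < srcSize) {
        size_t const remaining = srcSize - pos;
        size_t const chunk = remaining < CHUNK_BYTES ? remaining : CHUNK_BYTES;
        size_t i;

        assert(chunk <= CHUNK_BYTES);
        memset(narrow, 0, sizeof(narrow));

        /* Worst case: all bytes of the chunk are equal, so one bucket
         * reaches 240, which is still representable in 8 bits. */
        for (i = 0; i < chunk; i++) narrow[src[pos + i]]++;

        /* Widen: fold the narrow counters into the wide ones. */
        for (i = 0; i < 256; i++) counts[i] += narrow[i];

        pos += chunk;
    }
}
```

Calling the helper on a 1000-byte buffer filled with a single value yields a count of 1000 for that value, which a single pass over 8-bit counters could not represent without the periodic widening step.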

11311 Commits
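
Commit 8e4400463a in the log above describes a SWAR (SIMD Within A Register) approach that packs two 32-bit values into one 64-bit register. The following is a minimal sketch of that general idea only, not the actual `ZSTD_get1BlockSummary` implementation: the `Seq` and `Sums` types and the function `swar_sum_lengths` are hypothetical, and the code assumes that each of the two running totals stays below 2^32 so that the low lane never carries into the high lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sequence record (not the zstd type). */
typedef struct { uint32_t litLength; uint32_t matchLength; } Seq;
typedef struct { uint64_t litSum; uint64_t matchSum; } Sums;

/* Sums both fields with a single 64-bit addition per sequence:
 * the low 32 bits accumulate litLength, the high 32 bits matchLength.
 * Assumption: each total fits in 32 bits, otherwise the low lane
 * would carry into the high lane and corrupt it. */
static Sums swar_sum_lengths(const Seq* seqs, size_t nbSeqs)
{
    uint64_t packed = 0;
    Sums result;
    size_t i;

    assert(seqs != NULL || nbSeqs == 0);
    for (i = 0; i < nbSeqs; i++) {
        packed += ((uint64_t)seqs[i].matchLength << 32) | seqs[i].litLength;
    }
    result.litSum   = packed & 0xFFFFFFFFu;  /* low lane */
    result.matchSum = packed >> 32;          /* high lane */
    return result;
}
```

For three sequences with literal lengths 1, 2, 3 and match lengths 10, 20, 30, the two lanes unpack to 6 and 60. The design trade-off is that one addition does the work of two, at the cost of a stricter range requirement on the inputs.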