Add a 4-way Neon implementation of the convertSequences_noRepcodes
function. Remove the 'static' keyword from all of its implementations
so that unit tests can be added.
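For illustration, a minimal sketch of what 4-way Neon processing means
here (a generic example, not the actual convertSequences_noRepcodes
kernel):

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Hedged sketch: handle four 32-bit elements per iteration with single
 * vector instructions instead of four scalar operations. */
void addOffset4(uint32_t* dst, const uint32_t* src, uint32_t offset, size_t n)
{
    size_t i = 0;
    uint32x4_t voff = vdupq_n_u32(offset);
    for (; i + 4 <= n; i += 4) {
        uint32x4_t v = vld1q_u32(src + i);       /* load 4 lanes at once */
        vst1q_u32(dst + i, vaddq_u32(v, voff));  /* add + store 4 lanes */
    }
    for (; i < n; i++) dst[i] = src[i] + offset; /* scalar tail */
}
```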
Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5`
Neoverse-V2     before       after
Clang-18:     100.000%    311.703%
Clang-19:     100.191%    311.714%
Clang-20:     100.181%    311.723%
GCC-13:       107.520%    252.309%
GCC-14:       107.652%    253.158%
GCC-15:       107.674%    253.168%

Cortex-A720     before       after
Clang-18:     100.000%    204.512%
Clang-19:     102.825%    204.600%
Clang-20:     102.807%    204.558%
GCC-13:       110.668%    203.594%
GCC-14:       110.684%    203.978%
GCC-15:       102.864%    204.299%
Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
Add a faster scalar implementation of ZSTD_get1BlockSummary which
removes the data dependency between the accumulators in the hot loop,
exposing the superscalar potential of recent out-of-order CPUs.
The new algorithm uses SWAR (SIMD Within A Register) to exploit 64-bit
architectures: it packs two 32-bit data elements into a single 64-bit
register, so both can be updated with one operation, while each value
stays within its 32-bit lane and no carry can cross into the
neighboring element.
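A minimal sketch of the SWAR idea (the struct and function below are
hypothetical stand-ins, not the actual ZSTD_get1BlockSummary code):

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for a sequence record. */
typedef struct { uint32_t litLength; uint32_t matchLength; } SeqLike;

/* Hedged SWAR sketch: both 32-bit sums are carried in one 64-bit
 * accumulator and updated with a single addition per element. Because
 * block sizes are bounded well below 2^32, the low-lane sum can never
 * carry into the high lane. */
uint64_t sumLitAndMatch(const SeqLike* seqs, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += ((uint64_t)seqs[i].matchLength << 32) | seqs[i].litLength;
    }
    return acc; /* low 32 bits: total litLength; high 32 bits: total matchLength */
}
```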
Corresponding unit tests are included.
Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5`
Neoverse-V2     before       after
GCC-13:       100.000%    290.527%
GCC-14:       100.000%    291.714%
GCC-15:        99.914%    291.495%
Clang-18:     148.072%    264.524%
Clang-19:     148.075%    264.512%
Clang-20:     148.062%    264.490%

Cortex-A720     before       after
GCC-13:       100.000%    235.261%
GCC-14:       101.064%    234.903%
GCC-15:       112.977%    218.547%
Clang-18:     127.135%    180.359%
Clang-19:     127.149%    180.297%
Clang-20:     127.154%    180.260%
Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
ZSTDMT_freeCCtx calls ZSTDMT_releaseAllJobResources. However, ZSTDMT_freeCCtx may also be called when initialization has failed, in which case ZSTDMT_releaseAllJobResources dereferences a NULL pointer.
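A hedged sketch of the defensive shape of the fix (the struct and field
names are illustrative stand-ins, not the actual zstd definitions):

```c
#include <stddef.h>

/* Illustrative stand-in for the real ZSTDMT_CCtx; the field name is an
 * assumption for this sketch. */
typedef struct { void* jobs; } MTCtx_like;

/* Bail out early when the context was only partially initialized, so the
 * release path cannot dereference a NULL jobs table. */
void releaseAllJobResources_sketch(MTCtx_like* mtctx)
{
    if (mtctx == NULL || mtctx->jobs == NULL) return;
    /* ... free per-job buffers here, as the real function does ... */
}
```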
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However, it
can be further accelerated with the SVE2 HISTSEG instructions, which
compute a histogram for a 16-byte chunk in a vector register.
On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to cover the whole symbol space (0..255) for 16 bytes of input.
However, we can accumulate at most 15 such 16-byte strips before the
8-bit counters may overflow, so the 8-bit histogram accumulators have
to be widened and saved to 16-bit after every 240-byte chunk of input.
Keeping the whole histogram in registers would need 32 128-bit
registers; longer SVE2 vectors could help here, if such machines become
available.
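A hedged sketch of one HISTSEG strip under the VL128 assumption, using
the SVE2 ACLE intrinsics (this is not the actual zstd kernel):

```c
#include <arm_sve.h>
#include <stdint.h>

/* One 16-byte strip: for each of the 16 symbols base..base+15, count how
 * many input bytes match it. The real kernel repeats this for all 16
 * symbol sub-ranges and widens the 8-bit counts to 16-bit every 15 strips
 * (240 bytes) to avoid overflow. */
svuint8_t histsegStrip(const uint8_t* src, uint8_t base)
{
    svuint8_t data = svld1_u8(svptrue_b8(), src); /* 16 input bytes (VL128) */
    svuint8_t syms = svindex_u8(base, 1);         /* base, base+1, ..., base+15 */
    return svhistseg_u8(syms, data); /* lane i = occurrences of syms[i] in data */
}
```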
The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough in general. However, an LZ pass precedes the
histogram calculation, so it should be impossible (my assumption) to
overflow the 16-bit accumulators.
The symbol distribution is also not uniform: lower values are more
common. We therefore use a 3-pass algorithm to prevent stack spilling.
In the first pass we only compute histograms for the first 64 symbols
(4-way SIMD) while also computing the maximum symbol value. If symbol
values of 64 or above are present, a second pass computes the next 96
elements of the histogram. The final pass calculates the remaining part
of the histogram (256 symbols in total) if needed. This split of
histogram generation gave the best overall performance.
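A hedged outline of the 3-pass split (scalar, structure only; the helper
below is hypothetical and the real passes are SIMD):

```c
#include <stddef.h>

/* Hypothetical scalar helper: histogram symbols in [lo, hi) into count[]
 * (assumed zero-initialized by the caller) and return the maximum symbol
 * value seen in the input. */
static unsigned histPass(unsigned count[256], unsigned lo, unsigned hi,
                         const unsigned char* src, size_t n)
{
    unsigned maxSymbol = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = src[i];
        if (s > maxSymbol) maxSymbol = s;
        if (lo <= s && s < hi) count[s]++;
    }
    return maxSymbol;
}

/* Pass 1 covers symbols 0..63; later passes run only when the data needs them. */
void hist3PassSketch(unsigned count[256], const unsigned char* src, size_t n)
{
    unsigned maxSymbol = histPass(count, 0, 64, src, n);
    if (maxSymbol >= 64)  histPass(count, 64, 160, src, n);  /* next 96 symbols */
    if (maxSymbol >= 160) histPass(count, 160, 256, src, n); /* remaining 96 */
}
```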
This implementation is the best performing of a number of different
cache blocking schemes tested.
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-20    GCC-14
1#silesia.tar:    +6.173%   +5.987%
2#silesia.tar:    +5.200%   +5.011%
3#silesia.tar:    +4.332%   +5.031%
4#silesia.tar:    +2.789%   +3.064%
5#silesia.tar:    +2.028%   +1.838%
6#silesia.tar:    +1.562%   +1.340%
7#silesia.tar:    +1.160%   +0.959%
The row based match finder is slower without SIMD. We used to detect
the presence of SIMD to set the lower bound to 17, but that breaks
determinism (the same input could compress differently depending on the
build). Instead, opt into the higher bound specifically for the kernel,
because it is one of the rare targets that doesn't have SIMD support.
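A hedged sketch of the compile-time opt-in (all macro names and the
default value are hypothetical, not zstd's actual identifiers): the
bound is fixed explicitly per build target rather than derived from SIMD
detection, so the choice is deterministic and visible.

```c
/* Hedged sketch; names and the non-kernel value are placeholders. */
#define DEFAULT_LOWER_BOUND 16             /* illustrative default only */
#ifdef BUILD_FOR_LINUX_KERNEL              /* rare target without SIMD support */
#  define ROW_MATCHFINDER_LOWER_BOUND 17   /* opt into the higher bound */
#else
#  define ROW_MATCHFINDER_LOWER_BOUND DEFAULT_LOWER_BOUND
#endif
```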
This is a prototype definition error:
`_mm_storeu_si128()` should accept a `void*` pointer,
since its documentation explicitly states that it accepts unaligned addresses;
yet requiring a `__m128i*` suggests otherwise, and asks the compiler to enforce this alignment.
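The practical consequence at call sites, for illustration:

```c
#include <emmintrin.h>

/* The destination does not need to be aligned, yet the prototype forces a
 * cast to __m128i* at every call site. */
void store16(void* dst, __m128i v)
{
    _mm_storeu_si128((__m128i*)dst, v); /* cast imposed by the prototype */
}
```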
Seems like a prototype interface error:
the input parameter should have been `const void*`,
since the documentation is explicit that the input doesn't have to be aligned,
but `const __m256i*` makes the compiler enforce alignment.
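Assuming the intrinsic in question is `_mm256_loadu_si256` (an
assumption; the note doesn't name it), the same cast pattern appears on
the load side:

```c
#include <immintrin.h>

/* The source does not need to be aligned, yet the prototype forces a cast
 * to const __m256i* at every call site. */
__m256i load32(const void* src)
{
    return _mm256_loadu_si256((const __m256i*)src); /* cast imposed by the prototype */
}
```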
Do not solve the equation, even though some terms cancel each other;
this is done for clarity.
We'll let the compiler do the resolution at compile time.
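A hedged, purely illustrative example of the point (names and the
expression are hypothetical, not the actual code):

```c
#include <stddef.h>

/* The windowSize terms cancel, leaving 2*blockSize, but the unsimplified
 * form documents where each term comes from; the compiler folds it anyway. */
size_t neededSpace(size_t windowSize, size_t blockSize)
{
    return (windowSize + blockSize) - windowSize + blockSize;
}
```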