Add a 4-way Neon implementation for the convertSequences_noRepcodes
function. Remove 'static' keywords from all of its implementations to
be able to add unit tests.
Performance relative to Clang-18 (before), using: `./fullbench -b18 -l5 enwik5`

Neoverse-V2    before      after
Clang-18:      100.000%    311.703%
Clang-19:      100.191%    311.714%
Clang-20:      100.181%    311.723%
GCC-13:        107.520%    252.309%
GCC-14:        107.652%    253.158%
GCC-15:        107.674%    253.168%

Cortex-A720    before      after
Clang-18:      100.000%    204.512%
Clang-19:      102.825%    204.600%
Clang-20:      102.807%    204.558%
GCC-13:        110.668%    203.594%
GCC-14:        110.684%    203.978%
GCC-15:        102.864%    204.299%
Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
Add a faster scalar implementation of ZSTD_get1BlockSummary which
removes the data dependency of the accumulators in the hot loop to
leverage the superscalar potential of recent out-of-order CPUs.
The new algorithm leverages SWAR (SIMD Within A Register) methodology
to exploit the capabilities of 64-bit architectures. It achieves this
by packing two 32-bit data elements into a single 64-bit register,
enabling parallel operations on these subcomponents while ensuring
that the 32-bit boundaries prevent overflow, thereby optimizing
computational efficiency.
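The packing idea can be sketched as follows. This is a minimal illustration, not the code from this change: the `Seq` and `Sums` types and the `swar_sum` name are hypothetical, and the sketch assumes each total fits in 32 bits so the low lane never carries into the high lane.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sequence record, not the zstd type. */
typedef struct { uint32_t litLength; uint32_t matchLength; } Seq;
typedef struct { uint32_t litSum; uint32_t matchSum; } Sums;

/* Sums both fields at once: matchLength lives in the high 32 bits,
 * litLength in the low 32 bits of one 64-bit accumulator.
 * Two accumulators are used so consecutive additions do not depend
 * on each other. Assumes each total fits in 32 bits. */
static Sums swar_sum(const Seq* seqs, size_t n)
{
    uint64_t acc0 = 0;
    uint64_t acc1 = 0;
    size_t i = 0;
    Sums s;

    for (; i + 2 <= n; i += 2) {
        acc0 += ((uint64_t)seqs[i].matchLength << 32) | seqs[i].litLength;
        acc1 += ((uint64_t)seqs[i + 1].matchLength << 32) | seqs[i + 1].litLength;
    }
    if (i < n) {  /* odd element left over */
        acc0 += ((uint64_t)seqs[i].matchLength << 32) | seqs[i].litLength;
    }

    acc0 += acc1;                        /* merge the two accumulators */
    s.litSum = (uint32_t)acc0;           /* low lane */
    s.matchSum = (uint32_t)(acc0 >> 32); /* high lane */
    return s;
}
```

Each loop iteration performs one 64-bit addition per sequence instead of two 32-bit additions, and the two accumulators can be updated in parallel by an out-of-order core.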
Corresponding unit tests are included.
Performance relative to GCC-13 (before), using: `./fullbench -b19 -l5 enwik5`

Neoverse-V2    before      after
GCC-13:        100.000%    290.527%
GCC-14:        100.000%    291.714%
GCC-15:         99.914%    291.495%
Clang-18:      148.072%    264.524%
Clang-19:      148.075%    264.512%
Clang-20:      148.062%    264.490%

Cortex-A720    before      after
GCC-13:        100.000%    235.261%
GCC-14:        101.064%    234.903%
GCC-15:        112.977%    218.547%
Clang-18:      127.135%    180.359%
Clang-19:      127.149%    180.297%
Clang-20:      127.154%    180.260%
Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
The following tests are included:
- Empty input scenario test.
- Workspace size and alignment tests.
- Symbol out-of-range tests.
- Tests covering multiple input sizes, varying permitted maximum
symbol values, and diverse symbol distributions.
These tests verify count table correctness, maxSymbolValuePtr
updates, and error-handling paths. They also enable automated
regression testing of the core histogram logic.
When an error occurs in BMK_isSuccessful_runOutcome, the code
previously skipped the call to BMK_freeTimedFnState(tfs),
leaking the allocated tfs object.
Fixed by calling BMK_freeTimedFnState(tfs) before goto _cleanOut.
The FUZZ_malloc_rand() function was incorrectly always returning NULL for
zero-size allocations. The random offset generated by
FUZZ_dataProducer_int32Range() was not being added to the pointer variable,
causing the function to always return (void *)0.
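The shape of the fix can be sketched as below. This is an illustration only: `fake_int32_range` is a hypothetical stand-in for the fuzzer's data producer that returns a fixed value, `malloc_rand_sketch` is not the real function, and the range bounds are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the data producer: always returns 4096. */
static int32_t fake_int32_range(int32_t lo, int32_t hi)
{
    (void)lo;
    (void)hi;
    return 4096;
}

/* Sketch of the corrected behaviour for zero-size requests. */
static void* malloc_rand_sketch(size_t size)
{
    uintptr_t ptr = 0;
    if (size > 0) {
        return malloc(size);
    }
    /* The bug: the generated offset was computed but discarded, so ptr
     * stayed 0. The fix: add the offset to ptr. */
    ptr += (uintptr_t)(intptr_t)fake_int32_range(-1000000, 1000000);
    return (void*)ptr;  /* fake, non-dereferenceable pointer */
}
```

The returned pointer for a zero-size request is never meant to be dereferenced; it only lets the fuzzer exercise code paths with a non-NULL pointer to an empty buffer.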
Replace direct returns in error-handling branches with a unified
cleanup block that frees allocated resources before returning,
improving code quality and robustness.
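A minimal sketch of the single-exit cleanup pattern, assuming a hypothetical function `roundtrip_copy` that needs two temporary buffers (none of these names come from the change itself):

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 on any failure. Every error branch jumps to
 * the same cleanup label instead of returning directly, so both buffers
 * are always released. */
static int roundtrip_copy(const char* src, size_t len)
{
    int result = -1;
    char* first = NULL;
    char* second = NULL;

    if (src == NULL || len == 0) goto cleanup;

    first = (char*)malloc(len);
    if (first == NULL) goto cleanup;

    second = (char*)malloc(len);
    if (second == NULL) goto cleanup;  /* first is still freed below */

    memcpy(first, src, len);
    memcpy(second, first, len);
    if (memcmp(second, src, len) != 0) goto cleanup;

    result = 0;

cleanup:
    free(second);  /* free(NULL) is a no-op, so partial setup is safe */
    free(first);
    return result;
}
```

Initializing the pointers to NULL is what makes the shared block safe: whichever branch jumps to `cleanup`, only the buffers that were actually allocated are released.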
Building lz4 as root was causing `make clean` to fail with permission
errors.
We used to have to install lz4 from source back in Ubuntu 14.04, but
nowadays the installed lz4 is fine. Get rid of ancient helpers and
cruft!
Checks that ZSTD_NBTHREADS triggers the expected verbose message.
Also: checked that the new test script fails on the current `dev` branch, and is fixed by this branch.
so that a human reading the test log can determine everything was fine without consulting the shell error code.
Also: made `make check` slightly shorter by moving one longer test to `make test`