1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-29 11:21:22 +03:00

20 Commits

Author SHA1 Message Date
1df9f36c6c Improved seekable format ingestion speed for small frame size
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.

Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allows setting compression level,
in order to produce these performance results.
2023-03-09 18:00:30 -08:00
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
7f12f24cf4 Rewrite Copyright Date Ranges from -present to -2022
Apparently it's better. Somehow.

```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do echo $f; sed -i 's/\-present/-2022/' $f; done

g co HEAD -- build/meson/
```
2022-12-20 12:44:56 -05:00
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
f17652931c seekable_format no header when compressing empty string to stream 2022-02-08 14:06:00 +01:00
sen
d6be7659b0 Add seekable roundtrip fuzzer (#2617) 2021-05-06 10:08:21 -04:00
16ec1cf355 added test case for seekTable API
and simple roundtrip test
2021-03-03 18:56:23 -08:00
a1d7b9d654 fixed gcc conversion warnings 2021-03-03 15:17:12 -08:00
ac95a30455 various minor style fixes 2021-03-02 16:03:18 -08:00
adb54293ab Stop using deprecated reset?Stream functions
These are replaced by the corresponding context resets. When
converting resetCStream, CCtx_setPledgedSrcSize isn't called if the
source size is "unknown".

This helps reduce the reliance on "static only" symbols, as well as
reducing the use of deprecated functions.

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-02-23 21:56:01 +01:00
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
34f01e600f fixed multiple conversions
from 64-bit to 32-bit
2018-12-13 14:02:22 -08:00
b83d1e7714 removed some static const variables
and replaced by traditional macro constants.

Unfortunately, C doesn't consider `static const` to mean "constant"
2018-11-13 16:56:32 -08:00
355cb645bf fixed seekable format example 2018-03-15 16:29:28 -07:00
b0cb081dc8 last batch of header files changed to reflect new license (#825)
only remains to update contrib/linux-kernel (@terrelln)
2017-08-31 12:20:50 -07:00
470993c9b1 Add raw seek table construction API and parallel compression example 2017-04-28 12:17:09 -07:00
9626cf1ac6 Address @terrelln's comments 2017-04-13 17:48:35 -07:00
5ee1135f30 s/chunk/frame/ 2017-04-12 11:15:50 -07:00
e80f1d74b3 Address PR comments and minor fixes 2017-04-12 11:15:46 -07:00
d048fefef7 Move seekable format content to /contrib 2017-04-11 14:38:56 -07:00