lib/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-08-01 09:47:01 +03:00

Author	SHA1	Message	Date
Yann Collet	477a01067f	codemod: symbolEncodingType_e -> SymbolEncodingType_e	2024-12-20 10:36:56 -08:00
Yann Collet	a2245721ca	codemod: seqStore_t -> SeqStore_t same idea, SeqStore_t is a type name, it should start with a Capital letter.	2024-12-20 10:36:55 -08:00
Yann Collet	9671813375	codemod: seqDef -> SeqDef SeqDef is a type name, so it should start with a Capital letter. It's an internal symbol, no impact on public API.	2024-12-20 10:36:55 -08:00
Pawel Czarnecki	1f5df587fa	tests/decodecorpus: add more advanced options - add option to force a specific block type - add option to force a specific literal tyle - add option to generate only frame headers - add option to skip generating magic numbers Co-authored-by: Maciej Dudek <mdudek@antmicro.com> Co-authored-by: Pawel Czarnecki <pczarnecki@antmicro.com> Co-authored-by: Robert Winkler <rwinkler@antmicro.com> Co-authored-by: Roman Dobrodii <rdobrodii@antmicro.com>	2024-07-26 09:47:16 +02:00
Nick Terrell	43118da8a7	Stop suppressing pointer-overflow UBSAN errors * Remove all pointer-overflow suppressions from our UBSAN builds/tests. * Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress pointer-overflow at a per-function level. This is a superior approach because it also applies to users who build zstd with UBSAN. * Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions. The end goal is to only tag these functions with `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions that rely on pointer overflow, and gradually transition to using these. * Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the pointer may be `NULL`. * Fix all the fuzzer issues that came up. I'm sure there will be a lot more, but these are the ones that came up within a few minutes of running the fuzzers, and while running GitHub CI.	2023-09-28 17:35:05 -04:00
Yann Collet	1f83b7cfc4	fix a minor inefficiency in compress_superblock and in `decodecorpus`: the specific case `nbSeq=127` can be represented using the 1-byte format. Note that both the 1-byte and the 2-bytes formats are valid to represent this case, so there was no "error", produced data remains valid, it's just that the 1-byte format is more efficient. fix #3667 Credit to @ip7z for finding this issue.	2023-06-05 09:51:52 -07:00
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs * Mark all bufferless and block level functions as deprecated * Update documentation to suggest not using these functions * Add `_deprecated()` wrappers for functions that we use internally and call those instead	2023-03-16 10:04:15 -07:00
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Yann Collet	bcfb7ad03c	refactor timefn The timer storage type is no longer dependent on OS. This will make it possible to re-enable posix precise timers since the timer storage type will no longer be sensible to #include order. See #3168 for details of pbs of previous interface. Suggestion by @terrelln	2023-01-12 19:24:31 -08:00
Yann Collet	8b130009e3	minor simplification refactoring for timefn `UTIL_getSpanTimeMicro()` can be factored in a generic way, reducing OS-dependent code.	2023-01-06 16:12:54 -08:00
Nick Terrell	40a7188130	Fix `make clangbuild` & add CI Fix the errors for: * `-Wdocumentation` * `-Wconversion` except `-Wsign-conversion`	2022-12-21 17:31:04 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . $ -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache $ -prune -o -type f); do sed -i '/Copyright .* $Yann Collet$\\|$Meta Platforms$/ s/Copyright ./Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0.c lib/legacy/zstd_v0*.h nano ./programs/windres/zstd.rc nano ./build/VS2010/zstd/zstd.rc nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . $ -path ./.git -o -path ./tests/fuzz/corpora $ -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Yann Collet	7a18d709ae	updated all names to offBase convention	2021-12-29 17:30:43 -08:00
Yann Collet	681c81f06c	abstracted storeSeq() sumtype numeric representation from decodecorpus.c	2021-12-28 11:58:33 -08:00
Yann Collet	1aed962216	introduce macros STORE_OFFSET() and STORE_REPCODE() this meant to abstract the sumtype representation required to transfert `offcode` to `ZSTD_storeSeq()`. Unfortunately, the sumtype numeric representation is currently a leaky abstraction that has permeated many other parts of the code, especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`. While this PR makes a good job a transfering a large nb of call sites to using the new macros, there are still a few sites where this transformation is more complex, or where the numeric representation itself it used "as is". One of the problematics area is the decision to use the numeric format of the sumtype within the match finders of `zstd_lazy`. This commit doesn't change the behavior, it only introduces and employes the macros, but eventually the resulting code remains identical. At target, if the numeric representation of the sumtype can be completely abstracted and no other part of the code depends on it, it will be possible to move it towards something slightly more efficient.	2021-12-23 22:03:30 -08:00
Yann Collet	aeff128331	change seqDef.offset into seqDef.offBase to better reflect the value stored in this field.	2021-12-23 17:56:08 -08:00
Yann Collet	e145b58cfd	changed seqDef.matchLength into seqDef.mlBase since this is effectively what is stored in this field (== matchLength - MINMATCH). This makes it clearer what needs to be done when reading from / writing to this field.	2021-12-23 13:39:46 -08:00
Yann Collet	b77fcac61f	change ZSTD_storeSeq() interface to accept matchLength instead of mlBase. This removes the need to do `- MINMATCH` at every call site. The new interface contract is checked with an `assert()`.	2021-12-23 12:03:33 -08:00
Nick Terrell	46f2710562	[HUF] Improve Huffman encoding speed Improve Huffman encoding speed by 20% for gcc and 10% for clang. \| Compiler \| Benchmark \| Config \| Dataset \| Ratio \| Speed MB/s (dev) \| Speed MB/s (huf-cspeed) \| Speed MB/s (huf-cspeed - dev) \| \|----------\|-------------------\|---------\|-------------\|-------\|------------------\|-------------------------\|-------------------------------\| \| gcc \| compress \| level_1 \| enwik7 \| 2.43 \| 253.70 \| 258.72 \| 2.0% \| \| gcc \| compress \| level_1 \| silesia \| 2.88 \| 341.90 \| 348.15 \| 1.8% \| \| gcc \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 761.83 \| 912.76 \| 19.8% \| \| gcc \| compress_literals \| level_1 \| silesia \| 1.28 \| 754.83 \| 902.37 \| 19.5% \| \| gcc \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 502.81 \| 552.79 \| 9.9% \| \| gcc \| compress_literals \| level_7 \| silesia \| 1.11 \| 675.97 \| 776.44 \| 14.9% \| \| clang \| compress \| level_1 \| enwik7 \| 2.43 \| 277.54 \| 280.98 \| 1.2% \| \| clang \| compress \| level_1 \| silesia \| 2.88 \| 369.98 \| 375.46 \| 1.5% \| \| clang \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 828.83 \| 918.41 \| 10.8% \| \| clang \| compress_literals \| level_1 \| silesia \| 1.28 \| 815.81 \| 905.41 \| 11.0% \| \| clang \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 533.13 \| 553.30 \| 3.8% \| \| clang \| compress_literals \| level_7 \| silesia \| 1.11 \| 714.52 \| 775.38 \| 8.5% \|	2021-07-27 15:10:35 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Yann Collet	5c0a3489a5	fix aliasing warning in decodecorpus	2020-12-04 19:21:40 -08:00
Nick Terrell	a90779397a	[lib] Reduce zstd stack usage by 1KB	2020-09-09 14:35:39 -07:00
Nick Terrell	8f8bd2d1ac	[regression] Update results.csv	2020-08-20 12:41:35 -07:00
Meghna Malhotra	cc7c29595d	Fixed tests to use correct workspace size	2020-05-01 13:45:48 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Nick Terrell	e068bd01df	[tests] Fix decodecorpus	2019-09-20 01:09:47 -07:00
Vivek Miglani	a3ce0c9d04	Fixing decodecorpus test issue	2019-07-18 14:32:09 -07:00
Ephraim Park	734eff70b8	enable repeat mode on rle	2019-06-26 16:39:00 -07:00
Josh Soref	a880ca239b	Spelling (#1582 ) * spelling: accidentally * spelling: across * spelling: additionally * spelling: addresses * spelling: appropriate * spelling: assumed * spelling: available * spelling: builder * spelling: capacity * spelling: compiler * spelling: compressibility * spelling: compressor * spelling: compression * spelling: contract * spelling: convenience * spelling: decompress * spelling: description * spelling: deflate * spelling: deterministically * spelling: dictionary * spelling: display * spelling: eliminate * spelling: preemptively * spelling: exclude * spelling: failure * spelling: independence * spelling: independent * spelling: intentionally * spelling: matching * spelling: maximum * spelling: meaning * spelling: mishandled * spelling: memory * spelling: occasionally * spelling: occurrence * spelling: official * spelling: offsets * spelling: original * spelling: output * spelling: overflow * spelling: overridden * spelling: parameter * spelling: performance * spelling: probability * spelling: receives * spelling: redundant * spelling: recompression * spelling: resources * spelling: sanity * spelling: segment * spelling: series * spelling: specified * spelling: specify * spelling: subtracted * spelling: successful * spelling: return * spelling: translation * spelling: update * spelling: unrelated * spelling: useless * spelling: variables * spelling: variety * spelling: verbatim * spelling: verification * spelling: visited * spelling: warming * spelling: workers * spelling: with	2019-04-12 11:18:11 -07:00
Yann Collet	59a7116cc2	benchfn dependencies reduced to only timefn benchfn used to rely on mem.h, and util, which in turn relied on platform.h. Using benchfn outside of zstd required to bring all these dependencies. Now, dependency is reduced to timefn only. This required to create a separate timefn from util, and rewrite benchfn and timefn to no longer need mem.h. Separating timefn from util has a wide effect accross the code base, as usage of time functions is widespread. A lot of build scripts had to be updated to also include timefn.	2019-04-10 12:37:03 -07:00
W. Felix Handte	03e040a966	Replace Uses of CHECK_E with RETURN_ERROR_IF(*_isError(...	2019-01-28 17:33:01 -05:00
Yann Collet	ededcfca57	fix confusion between unsigned <-> U32 as suggested in #1441. generally U32 and unsigned are the same thing, except when they are not ... case : 32-bit compilation for MIPS (uint32_t == unsigned long) A vast majority of transformation consists in transforming U32 into unsigned. In rare cases, it's the other way around (typically for internal code, such as seeds). Among a few issues this patches solves : - some parameters were declared with type `unsigned` in .h, but with type `U32` in their implementation .c . - some parameters have type unsigned*, but the caller user a pointer to U32 instead. These fixes are useful. However, the bulk of changes is about %u formating, which requires unsigned type, but generally receives U32 values instead, often just for brevity (U32 is shorter than unsigned). These changes are generally minor, or even annoying. As a consequence, the amount of code changed is larger than I would expect for such a patch. Testing is also a pain : it requires manually modifying `mem.h`, in order to lie about `U32` and force it to be an `unsigned long` typically. On a 64-bit system, this will break the equivalence unsigned == U32. Unfortunately, it will also break a few static_assert(), controlling structure sizes. So it also requires modifying `debug.h` to make `static_assert()` a noop. And then reverting these changes. So it's inconvenient, and as a consequence, this property is currently not checked during CI tests. Therefore, these problems can emerge again in the future. I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests. It's another restriction for coding, adding more frustration during merge tests, since most platforms don't need this distinction (hence contributor will not see it), and while this can matter in theory, the number of platforms impacted seems minimal. Thoughts ?	2018-12-21 18:09:41 -08:00
Yann Collet	7b74405150	refactor HUF_compress_internal for clarity changed workspace parameter convention to always provide workspaceSize, so that size can be explicitly checked. Also, use more enum to make the meaning of some parameters more explicit.	2018-10-26 13:21:37 -07:00
Yann Collet	f181799082	fix decodecorpus incorrect frame generation fix #1379 decodecorpus was generating one extraneous byte when `nbSeq==0`. This is disallowed by the specification. The reference decoder was just skipping the extraneous byte. It is now stricter, and flag such situation as an error.	2018-10-20 18:56:21 -07:00
Nick Terrell	e3b5286197	Fix decodecorpus	2018-08-28 13:56:47 -07:00
Nick Terrell	3b56bb1e4c	Fix decodecorpus	2018-08-23 17:48:06 -07:00
Yann Collet	2d76defbfe	grouped all histogram functions into hist.c renamed functions with HIST_* prefix	2018-06-13 19:49:31 -04:00
Nick Terrell	dab8cfa3c7	Combine definitions of SEC_TO_MICRO	2017-11-30 19:40:53 -08:00
Nick Terrell	9a2f6f477b	Use util.h for timing	2017-11-30 14:57:25 -08:00
Yann Collet	7f6a783862	fixed a small error in decodeCorpus a compressed block must be strictly smaller than its decompressed size.	2017-10-07 15:19:52 -07:00
Nick Terrell	bbe77212ef	[libzstd] Increase MaxOff	2017-09-25 13:36:18 -07:00
Stella Lau	963558a072	Fix implicit conversion error	2017-09-13 16:01:16 -07:00
Stella Lau	40bf0ced7d	Add flag to limit max decompressed size in decodeCorpus	2017-09-13 15:16:56 -07:00
Stella Lau	e89065506e	Make decodecorpus generate raw compressed blocks	2017-09-12 17:18:45 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00
Yann Collet	32fb407c9d	updated a bunch of headers for the new license	2017-08-18 16:52:05 -07:00
Paul Cruz	7ac4724bd2	removed fnum from DISPLAY statements	2017-06-28 13:00:49 -07:00

1 2 3

129 Commits