From a9fb8d4c41bf3cc829adf20aea3768863d03cd0d Mon Sep 17 00:00:00 2001 From: Yann Collet Date: Fri, 8 Mar 2024 14:55:38 -0800 Subject: [PATCH 1/3] new method to deal with offset==0 in this new method, when an `offset==0` is detected, it's converted into (size_t)(-1), instead of 1. The logic is that (size_t)(-1) is effectively an extremely large positive number, which will not pass the offset distance test at next stage (`execSequence()`). Checked the source code, and offset is always checked (as it should), using a formula which is not vulnerable to arithmetic overflow: ``` RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart), ``` The benefit is that such a case (offset==0) is always detected as corrupted data as opposed to relying on the checksum to detect the error. --- lib/decompress/zstd_decompress_block.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/decompress/zstd_decompress_block.c b/lib/decompress/zstd_decompress_block.c index 8d9fea5fd..76d7332e8 100644 --- a/lib/decompress/zstd_decompress_block.c +++ b/lib/decompress/zstd_decompress_block.c @@ -1305,7 +1305,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffsets, c } else { offset = ofBase + ll0 + BIT_readBitsFast(&seqState->DStream, 1); { size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset]; - temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */ + temp -= !temp; /* 0 is not valid: input corrupted => force offset to -1 => corruption detected at execSequence */ if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1]; seqState->prevOffset[1] = seqState->prevOffset[0]; seqState->prevOffset[0] = offset = temp; From d2f56ba44208f56b5370a9ef6ce0d2c32f283131 Mon Sep 17 00:00:00 2001 From: Yann Collet Date: Fri, 8 Mar 2024 15:55:30 -0800 Subject: [PATCH 2/3] update documentation --- doc/decompressor_accepted_invalid_data.md | 14 ----- doc/decompressor_permissive.md | 62 +++++++++++++++++++++++ 2 files changed, 62 insertions(+), 14 deletions(-) delete mode 100644 doc/decompressor_accepted_invalid_data.md create mode 100644 doc/decompressor_permissive.md diff --git a/doc/decompressor_accepted_invalid_data.md b/doc/decompressor_accepted_invalid_data.md deleted file mode 100644 index f08f963d9..000000000 --- a/doc/decompressor_accepted_invalid_data.md +++ /dev/null @@ -1,14 +0,0 @@ -Decompressor Accepted Invalid Data -================================== - -This document describes the behavior of the reference decompressor in cases -where it accepts an invalid frame instead of reporting an error. - -Zero offsets converted to 1 ---------------------------- -If a sequence is decoded with `literals_length = 0` and `offset_value = 3` -while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is -invalid. - -The reference decompressor will process this case as if the computed -offset was `1`, including inserting `1` into the repeated offset list. \ No newline at end of file diff --git a/doc/decompressor_permissive.md b/doc/decompressor_permissive.md new file mode 100644 index 000000000..29846c31a --- /dev/null +++ b/doc/decompressor_permissive.md @@ -0,0 +1,62 @@ +Decompressor Permissiveness to Invalid Data +=========================================== + +This document describes the behavior of the reference decompressor in cases +where it accepts formally invalid data instead of reporting an error. + +While the reference decompressor *must* decode any compliant frame following +the specification, its ability to detect erroneous data is on a best effort +basis: the decoder may accept input data that would be formally invalid, +when it causes no risk to the decoder, and which detection would cost too much +complexity or speed regression. + +In practice, the vast majority of invalid data are detected, if only because +many corruption events are dangerous for the decoder process (such as +requesting an out-of-bound memory access) and many more are easy to check. + +This document lists a few known cases where invalid data was formerly accepted +by the decoder, and what has changed since. + + +Offset == 0 +----------- + +**Last affected version**: v1.5.5 + +**Produced by the reference compressor**: No + +**Example Frame**: `28b5 2ffd 2000 1500 0000 00` + +If a sequence is decoded with `literals_length = 0` and `offset_value = 3` +while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is +invalid. + +The reference decompressor up to v1.5.5 processes this case as if the computed +offset was `1`, including inserting `1` into the repeated offset list. +This prevents the output buffer from remaining uninitialized, thus denying a +potential attack vector from an untrusted source. +However, in the rare case where this scenario would be the outcome of a +transmission or storage error, the decoder relies on the checksum to detect +the error. + +In newer versions, this case is always detected and reported as a corruption error. + + +Non-zeroes reserved bits +------------------------ + +**Last affected version**: v1.5.5 + +**Produced by the reference compressor**: No + +**Example Frame**: `28b5 2ffd 2000 1500 0000 00` + +The Sequences section of each block has a header, and one of its elements is a +byte, which describes the compression mode of each symbol. +This byte contains 2 reserved bits which must be set to zero. + +The reference decompressor up to v1.5.5 just ignores these 2 bits. +This behavior has no consequence for the rest of the frame decoding process. + +In newer versions, the 2 reserved bits are actively checked for value zero, +and the decoder reports a corruption error if they are not. From eb5f7a7fa278ab76c3390555f36162c638f63b53 Mon Sep 17 00:00:00 2001 From: Yann Collet Date: Sat, 9 Mar 2024 00:33:44 -0800 Subject: [PATCH 3/3] produced golden sample for the offset==0 decoder test is correctly detected as corrupted by new version, and is accepted (changed into offset==1) by older version. updated documentation accordingly, with an hexadecimal representation. --- doc/decompressor_permissive.md | 4 +--- tests/golden-decompression-errors/off0.bin.zst | Bin 0 -> 17 bytes 2 files changed, 1 insertion(+), 3 deletions(-) create mode 100644 tests/golden-decompression-errors/off0.bin.zst diff --git a/doc/decompressor_permissive.md b/doc/decompressor_permissive.md index 29846c31a..bd77165f0 100644 --- a/doc/decompressor_permissive.md +++ b/doc/decompressor_permissive.md @@ -25,7 +25,7 @@ Offset == 0 **Produced by the reference compressor**: No -**Example Frame**: `28b5 2ffd 2000 1500 0000 00` +**Example Frame**: `28b5 2ffd 0000 4500 0008 0002 002f 430b ae` If a sequence is decoded with `literals_length = 0` and `offset_value = 3` while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is @@ -49,8 +49,6 @@ Non-zeroes reserved bits **Produced by the reference compressor**: No -**Example Frame**: `28b5 2ffd 2000 1500 0000 00` - The Sequences section of each block has a header, and one of its elements is a byte, which describes the compression mode of each symbol. This byte contains 2 reserved bits which must be set to zero. diff --git a/tests/golden-decompression-errors/off0.bin.zst b/tests/golden-decompression-errors/off0.bin.zst new file mode 100644 index 0000000000000000000000000000000000000000..13493fb336c6e3e339c1c224a1a2ada1a8b57a0b GIT binary patch literal 17 YcmdPcs{faP!Igo5gMo=b-