diff --git a/doc/decompressor_accepted_invalid_data.md b/doc/decompressor_accepted_invalid_data.md deleted file mode 100644 index f08f963d9..000000000 --- a/doc/decompressor_accepted_invalid_data.md +++ /dev/null @@ -1,14 +0,0 @@ -Decompressor Accepted Invalid Data -================================== - -This document describes the behavior of the reference decompressor in cases -where it accepts an invalid frame instead of reporting an error. - -Zero offsets converted to 1 ---------------------------- -If a sequence is decoded with `literals_length = 0` and `offset_value = 3` -while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is -invalid. - -The reference decompressor will process this case as if the computed -offset was `1`, including inserting `1` into the repeated offset list. \ No newline at end of file diff --git a/doc/decompressor_permissive.md b/doc/decompressor_permissive.md new file mode 100644 index 000000000..bd77165f0 --- /dev/null +++ b/doc/decompressor_permissive.md @@ -0,0 +1,60 @@ +Decompressor Permissiveness to Invalid Data +=========================================== + +This document describes the behavior of the reference decompressor in cases +where it accepts formally invalid data instead of reporting an error. + +While the reference decompressor *must* decode any compliant frame following +the specification, its ability to detect erroneous data is on a best effort +basis: the decoder may accept input data that would be formally invalid, +when it causes no risk to the decoder, and which detection would cost too much +complexity or speed regression. + +In practice, the vast majority of invalid data are detected, if only because +many corruption events are dangerous for the decoder process (such as +requesting an out-of-bound memory access) and many more are easy to check. + +This document lists a few known cases where invalid data was formerly accepted +by the decoder, and what has changed since. + + +Offset == 0 +----------- + +**Last affected version**: v1.5.5 + +**Produced by the reference compressor**: No + +**Example Frame**: `28b5 2ffd 0000 4500 0008 0002 002f 430b ae` + +If a sequence is decoded with `literals_length = 0` and `offset_value = 3` +while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is +invalid. + +The reference decompressor up to v1.5.5 processes this case as if the computed +offset was `1`, including inserting `1` into the repeated offset list. +This prevents the output buffer from remaining uninitialized, thus denying a +potential attack vector from an untrusted source. +However, in the rare case where this scenario would be the outcome of a +transmission or storage error, the decoder relies on the checksum to detect +the error. + +In newer versions, this case is always detected and reported as a corruption error. + + +Non-zeroes reserved bits +------------------------ + +**Last affected version**: v1.5.5 + +**Produced by the reference compressor**: No + +The Sequences section of each block has a header, and one of its elements is a +byte, which describes the compression mode of each symbol. +This byte contains 2 reserved bits which must be set to zero. + +The reference decompressor up to v1.5.5 just ignores these 2 bits. +This behavior has no consequence for the rest of the frame decoding process. + +In newer versions, the 2 reserved bits are actively checked for value zero, +and the decoder reports a corruption error if they are not. diff --git a/lib/decompress/zstd_decompress_block.c b/lib/decompress/zstd_decompress_block.c index 8d9fea5fd..76d7332e8 100644 --- a/lib/decompress/zstd_decompress_block.c +++ b/lib/decompress/zstd_decompress_block.c @@ -1305,7 +1305,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffsets, c } else { offset = ofBase + ll0 + BIT_readBitsFast(&seqState->DStream, 1); { size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset]; - temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */ + temp -= !temp; /* 0 is not valid: input corrupted => force offset to -1 => corruption detected at execSequence */ if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1]; seqState->prevOffset[1] = seqState->prevOffset[0]; seqState->prevOffset[0] = offset = temp; diff --git a/tests/golden-decompression-errors/off0.bin.zst b/tests/golden-decompression-errors/off0.bin.zst new file mode 100644 index 000000000..13493fb33 Binary files /dev/null and b/tests/golden-decompression-errors/off0.bin.zst differ