fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes

The sequence section starts with a number that tells how many sequences are present in the section. If this number is 0, the section ends immediately. The number 0 can be represented using either the 1-byte or the 2-byte format, because the 2-byte format fully overlaps the 1-byte format. However, when 0 was represented using the 2-byte format, the decoder expected the sequence section to continue and looked for FSE tables, which is incorrect. Fixed this behavior in both the reference decoder and the educational decoder. In practice, this situation never occurs with the reference encoder, which always selects the 1-byte format to represent 0, since it is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.
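
For illustration, the intended rule can be sketched as below. This is a minimal sketch, not the code changed by this commit: `read_nb_seqs` and `sequence_section_is_empty` are hypothetical names, and the byte layout follows the Number_of_Sequences description in the format specification. The point is that the "no sequences" decision is made on the decoded value rather than on the first byte alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the zstd API): parses the
 * Number_of_Sequences field at the start of a sequence section.
 * Returns the number of header bytes consumed (1, 2 or 3),
 * or 0 if the input is too short. */
static size_t read_nb_seqs(const uint8_t* src, size_t srcSize, unsigned* nbSeqs)
{
    assert(src != NULL && nbSeqs != NULL);
    if (srcSize < 1) return 0;
    unsigned const b0 = src[0];
    if (b0 < 128) {                 /* 1-byte format: values 0..127 */
        *nbSeqs = b0;
        return 1;
    }
    if (b0 < 255) {                 /* 2-byte format: values 0..32511 */
        if (srcSize < 2) return 0;
        *nbSeqs = ((b0 - 128) << 8) + src[1];
        return 2;
    }
    if (srcSize < 3) return 0;      /* 3-byte format: values 32512 and up */
    *nbSeqs = src[1] + ((unsigned)src[2] << 8) + 0x7F00;
    return 3;
}

/* Hypothetical helper: returns 1 when the sequence section ends right
 * after the count (no FSE tables follow), 0 when more data follows,
 * and -1 on truncated input. */
static int sequence_section_is_empty(const uint8_t* src, size_t srcSize)
{
    unsigned nbSeqs = 0;
    if (read_nb_seqs(src, srcSize, &nbSeqs) == 0) return -1;
    return nbSeqs == 0;             /* decided on the value, not the encoding */
}
```

Under this layout, a zero count in the 2-byte format is the byte pair `0x80 0x00`, and the sketch treats it exactly like the single byte `0x00`.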
2025-07-30 22:23:13 +03:00 · 2023-06-05 16:03:00 -07:00
parent 3e815f5b3a
commit 3732a08f5b
6 changed files with 44 additions and 18 deletions
--- a/doc/decompressor_errata.md
+++ b/doc/decompressor_errata.md
@@ -12,6 +12,26 @@ Each entry will contain:
 The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.


+No sequence using the 2-bytes format
+------------------------------------------------
+
+**Last affected version**: v1.5.5
+
+**Affected decompressor component(s)**: Library & CLI
+
+**Produced by the reference compressor**: No
+
+**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst
+
+The zstd decoder incorrectly expects FSE tables when there are 0 sequences present in the block
+if the value 0 is encoded using the 2-byte format.
+Instead, it should immediately end the sequence section and move on to the next block.
+
+This situation was never generated by the reference compressor,
+because representing 0 sequences with the 2-bytes format is inefficient
+(the 1-byte format is always used in this case).
+
+
 Compressed block with a size of exactly 128 KB
 ------------------------------------------------

@@ -32,6 +52,7 @@ These blocks used to be disallowed by the spec up until spec version 0.3.2 when

 > A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).

+
 Compressed block with 0 literals and 0 sequences
 ------------------------------------------------

@@ -51,6 +72,7 @@ Additionally, these blocks were disallowed by the spec up until spec version 0.3

 > A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).

+
 First block is RLE block
 ------------------------

@@ -72,6 +94,7 @@ block.

 https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535

+
 Tiny FSE Table & Block
 ----------------------