1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-18 17:42:25 +03:00

Fix two issues in TOAST decompression.

pglz_maximum_compressed_size() potentially underestimated the amount
of compressed data required to produce N bytes of decompressed data;
this is a fault in commit 11a078cf8.

Separately from that, pglz_decompress() failed to protect itself
against corrupt compressed data, particularly off == 0 in a match
tag.  Commit c60e520f6 turned such a situation into an infinite loop,
where before it'd just have resulted in garbage output.

The combination of these two bugs seems like it may explain bug 
from Tom Vijlbrief, though it's impossible to be quite sure without
direct inspection of the failing session.  (One needs to assume that
the pglz_maximum_compressed_size() bug caused us to fail to fetch the
second byte of a match tag, and what happened to be there instead was
a zero.  The reported infinite loop is hard to explain without off == 0,
though.)

Aside from fixing the bugs, rewrite associated comments for more
clarity.

Back-patch to v13 where both these commits landed.

Discussion: https://postgr.es/m/16694-f107871e499ec114@postgresql.org
This commit is contained in:
Tom Lane
2020-11-01 18:38:42 -05:00
parent 7f4235032f
commit dfc797730f

@ -710,7 +710,6 @@ pglz_decompress(const char *source, int32 slen, char *dest,
for (ctrlc = 0; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++) for (ctrlc = 0; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)
{ {
if (ctrl & 1) if (ctrl & 1)
{ {
/* /*
@ -732,41 +731,62 @@ pglz_decompress(const char *source, int32 slen, char *dest,
len += *sp++; len += *sp++;
/* /*
* Now we copy the bytes specified by the tag from OUTPUT to * Check for corrupt data: if we fell off the end of the
* OUTPUT (copy len bytes from dp - off to dp). The copied * source, or if we obtained off = 0, we have problems. (We
* areas could overlap, to preven possible uncertainty, we * must check this, else we risk an infinite loop below in the
* copy only non-overlapping regions. * face of corrupt data.)
*/
if (sp > srcend || off == 0)
break;
/*
* Don't emit more data than requested.
*/ */
len = Min(len, destend - dp); len = Min(len, destend - dp);
/*
* Now we copy the bytes specified by the tag from OUTPUT to
* OUTPUT (copy len bytes from dp - off to dp). The copied
* areas could overlap, so to avoid undefined behavior in
* memcpy(), be careful to copy only non-overlapping regions.
*
* Note that we cannot use memmove() instead, since while its
* behavior is well-defined, it's also not what we want.
*/
while (off < len) while (off < len)
{ {
/*--------- /*
* When offset is smaller than length - source and * We can safely copy "off" bytes since that clearly
* destination regions overlap. memmove() is resolving * results in non-overlapping source and destination.
* this overlap in an incompatible way with pglz. Thus we
* resort to memcpy()-ing non-overlapping regions.
*
* Consider input: 112341234123412341234
* At byte 5 here ^ we have match with length 16 and
* offset 4. 11234M(len=16, off=4)
* We are decoding first period of match and rewrite match
* 112341234M(len=12, off=8)
*
* The same match is now at position 9, it points to the
* same start byte of output, but from another position:
* the offset is doubled.
*
* We iterate through this offset growth until we can
* proceed to usual memcpy(). If we would try to decode
* the match at byte 5 (len=16, off=4) by memmove() we
* would issue memmove(5, 1, 16) which would produce
* 112341234XXXXXXXXXXXX, where series of X is 12
* undefined bytes, that were at bytes [5:17].
*---------
*/ */
memcpy(dp, dp - off, off); memcpy(dp, dp - off, off);
len -= off; len -= off;
dp += off; dp += off;
/*----------
* This bit is less obvious: we can double "off" after
* each such step. Consider this raw input:
* 112341234123412341234
* This will be encoded as 5 literal bytes "11234" and
* then a match tag with length 16 and offset 4. After
* memcpy'ing the first 4 bytes, we will have emitted
* 112341234
* so we can double "off" to 8, then after the next step
* we have emitted
* 11234123412341234
* Then we can double "off" again, after which it is more
* than the remaining "len" so we fall out of this loop
* and finish with a non-overlapping copy of the
* remainder. In general, a match tag with off < len
* implies that the decoded data has a repeat length of
* "off". We can handle 1, 2, 4, etc repetitions of the
* repeated string per memcpy until we get to a situation
* where the final copy step is non-overlapping.
*
* (Another way to understand this is that we are keeping
* the copy source point dp - off the same throughout.)
*----------
*/
off += off; off += off;
} }
memcpy(dp, dp - off, len); memcpy(dp, dp - off, len);
@ -820,21 +840,32 @@ pglz_decompress(const char *source, int32 slen, char *dest,
int32 int32
pglz_maximum_compressed_size(int32 rawsize, int32 total_compressed_size) pglz_maximum_compressed_size(int32 rawsize, int32 total_compressed_size)
{ {
int32 compressed_size; int64 compressed_size;
/* /*
* pglz uses one control bit per byte, so we need (rawsize * 9) bits. We * pglz uses one control bit per byte, so if the entire desired prefix is
* care about bytes though, so we add 7 to make sure we include the last * represented as literal bytes, we'll need (rawsize * 9) bits. We care
* incomplete byte (integer division rounds down). * about bytes though, so be sure to round up not down.
* *
* XXX Use int64 to prevent overflow during calculation. * Use int64 here to prevent overflow during calculation.
*/ */
compressed_size = (int32) ((int64) rawsize * 9 + 7) / 8; compressed_size = ((int64) rawsize * 9 + 7) / 8;
/*
* The above fails to account for a corner case: we could have compressed
* data that starts with N-1 or N-2 literal bytes and then has a match tag
* of 2 or 3 bytes. It's therefore possible that we need to fetch 1 or 2
* more bytes in order to have the whole match tag. (Match tags earlier
* in the compressed data don't cause a problem, since they should
* represent more decompressed bytes than they occupy themselves.)
*/
compressed_size += 2;
/* /*
* Maximum compressed size can't be larger than total compressed size. * Maximum compressed size can't be larger than total compressed size.
* (This also ensures that our result fits in int32.)
*/ */
compressed_size = Min(compressed_size, total_compressed_size); compressed_size = Min(compressed_size, total_compressed_size);
return compressed_size; return (int32) compressed_size;
} }