mirror of
https://github.com/postgres/postgres.git
synced 2025-11-10 17:42:29 +03:00
Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compressed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compression types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT .. SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allowing a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Przyby helped quite a bit with testing and reviewing and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
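The bit-stealing scheme described in the commit message can be sketched as follows. This is a minimal illustration, not the PostgreSQL macros: the names `SKETCH_EXTSIZE_BITS`, `sketch_pack_extinfo`, `sketch_extinfo_size`, and `sketch_extinfo_method` are hypothetical, and putting the method id in the two high-order bits is an assumption derived from the 1GB (30-bit) length limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the commit's real macros live in PostgreSQL headers. */
#define SKETCH_EXTSIZE_BITS 30      /* a 1GB limit leaves 30 bits for the size */
#define SKETCH_EXTSIZE_MASK ((UINT32_C(1) << SKETCH_EXTSIZE_BITS) - 1)

/* Pack a payload size (below 1GB) and a 2-bit method id into one 32-bit word. */
static uint32_t
sketch_pack_extinfo(uint32_t size, uint32_t method)
{
    assert(size <= SKETCH_EXTSIZE_MASK);
    assert(method < 4);             /* only two bits are available */
    return (method << SKETCH_EXTSIZE_BITS) | size;
}

/* Recover the size from the low 30 bits. */
static uint32_t
sketch_extinfo_size(uint32_t extinfo)
{
    return extinfo & SKETCH_EXTSIZE_MASK;
}

/* Recover the method id from the top two bits. */
static uint32_t
sketch_extinfo_method(uint32_t extinfo)
{
    return extinfo >> SKETCH_EXTSIZE_BITS;
}
```

Two bits encode four values; with pglz and lz4 each taking one, this matches the message's remark that bit space remains for only two more built-in methods.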
@@ -213,7 +213,10 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
                     (atttype->typstorage == TYPSTORAGE_EXTENDED ||
                      atttype->typstorage == TYPSTORAGE_MAIN))
                 {
-                    Datum cvalue = toast_compress_datum(value);
+                    Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc,
+                                                          keyno);
+                    Datum cvalue = toast_compress_datum(value,
+                                                        att->attcompression);

                     if (DatumGetPointer(cvalue) != NULL)
                     {
@@ -25,6 +25,7 @@ OBJS = \
     scankey.o \
     session.o \
     syncscan.o \
+    toast_compression.o \
     toast_internals.o \
     tupconvert.o \
     tupdesc.o
@@ -240,14 +240,20 @@ detoast_attr_slice(struct varlena *attr,
          */
         if (slicelimit >= 0)
         {
-            int32 max_size;
+            int32 max_size = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);

             /*
              * Determine maximum amount of compressed data needed for a prefix
              * of a given length (after decompression).
+             *
+             * At least for now, if it's LZ4 data, we'll have to fetch the
+             * whole thing, because there doesn't seem to be an API call to
+             * determine how much compressed data we need to be sure of being
+             * able to decompress the required slice.
              */
-            max_size = pglz_maximum_compressed_size(slicelimit,
-                                                    toast_pointer.va_extsize);
+            if (VARATT_EXTERNAL_GET_COMPRESSION(toast_pointer) ==
+                TOAST_PGLZ_COMPRESSION_ID)
+                max_size = pglz_maximum_compressed_size(slicelimit, max_size);

             /*
              * Fetch enough compressed slices (compressed marker will get set
@@ -347,7 +353,7 @@ toast_fetch_datum(struct varlena *attr)
     /* Must copy to access aligned fields */
     VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

-    attrsize = toast_pointer.va_extsize;
+    attrsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);

     result = (struct varlena *) palloc(attrsize + VARHDRSZ);

@@ -408,7 +414,7 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,
      */
     Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);

-    attrsize = toast_pointer.va_extsize;
+    attrsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);

     if (sliceoffset >= attrsize)
     {
@@ -418,8 +424,8 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,

     /*
      * When fetching a prefix of a compressed external datum, account for the
-     * rawsize tracking amount of raw data, which is stored at the beginning
-     * as an int32 value).
+     * space required by va_tcinfo, which is stored at the beginning as an
+     * int32 value.
      */
     if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) && slicelength > 0)
         slicelength = slicelength + sizeof(int32);
@@ -464,21 +470,24 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,
 static struct varlena *
 toast_decompress_datum(struct varlena *attr)
 {
-    struct varlena *result;
+    ToastCompressionId cmid;

     Assert(VARATT_IS_COMPRESSED(attr));

-    result = (struct varlena *)
-        palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
-    SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
-
-    if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
-                        TOAST_COMPRESS_SIZE(attr),
-                        VARDATA(result),
-                        TOAST_COMPRESS_RAWSIZE(attr), true) < 0)
-        elog(ERROR, "compressed data is corrupted");
-
-    return result;
+    /*
+     * Fetch the compression method id stored in the compression header and
+     * decompress the data using the appropriate decompression routine.
+     */
+    cmid = TOAST_COMPRESS_METHOD(attr);
+    switch (cmid)
+    {
+        case TOAST_PGLZ_COMPRESSION_ID:
+            return pglz_decompress_datum(attr);
+        case TOAST_LZ4_COMPRESSION_ID:
+            return lz4_decompress_datum(attr);
+        default:
+            elog(ERROR, "invalid compression method id %d", cmid);
+    }
 }


@@ -492,22 +501,24 @@ toast_decompress_datum(struct varlena *attr)
 static struct varlena *
 toast_decompress_datum_slice(struct varlena *attr, int32 slicelength)
 {
-    struct varlena *result;
-    int32 rawsize;
+    ToastCompressionId cmid;

     Assert(VARATT_IS_COMPRESSED(attr));

-    result = (struct varlena *) palloc(slicelength + VARHDRSZ);
-
-    rawsize = pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
-                              VARSIZE(attr) - TOAST_COMPRESS_HDRSZ,
-                              VARDATA(result),
-                              slicelength, false);
-    if (rawsize < 0)
-        elog(ERROR, "compressed data is corrupted");
-
-    SET_VARSIZE(result, rawsize + VARHDRSZ);
-    return result;
+    /*
+     * Fetch the compression method id stored in the compression header and
+     * decompress the data slice using the appropriate decompression routine.
+     */
+    cmid = TOAST_COMPRESS_METHOD(attr);
+    switch (cmid)
+    {
+        case TOAST_PGLZ_COMPRESSION_ID:
+            return pglz_decompress_datum_slice(attr, slicelength);
+        case TOAST_LZ4_COMPRESSION_ID:
+            return lz4_decompress_datum_slice(attr, slicelength);
+        default:
+            elog(ERROR, "invalid compression method id %d", cmid);
+    }
 }

 /* ----------
@@ -589,7 +600,7 @@ toast_datum_size(Datum value)
         struct varatt_external toast_pointer;

         VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
-        result = toast_pointer.va_extsize;
+        result = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
     }
     else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
     {
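The prefix-fetch decision in the `detoast_attr_slice` hunk can be sketched in isolation. This is an illustration under stated assumptions, not PostgreSQL code: all `sketch_` names are hypothetical, and the pglz bound used here (9 bits per literal byte plus 2 bytes of slack, clamped to the total compressed size) is an assumed worst case; the real `pglz_maximum_compressed_size()` may compute it differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the compression ids used in the diff. */
enum sketch_method { SKETCH_PGLZ, SKETCH_LZ4 };

/*
 * Assumed worst case for pglz: every byte of the prefix is a literal costing
 * 9 bits (8 data + 1 control), plus 2 bytes of slack for a match tag that
 * straddles the cut. The result never exceeds the total compressed size.
 */
static int32_t
sketch_pglz_bound(int32_t rawsize, int32_t total_compressed)
{
    int64_t bound = ((int64_t) rawsize * 9 + 7) / 8 + 2;

    return bound < total_compressed ? (int32_t) bound : total_compressed;
}

/* Compressed bytes to fetch in order to decompress a prefix of slicelimit bytes. */
static int32_t
sketch_prefix_fetch_size(enum sketch_method method, int32_t slicelimit,
                         int32_t extsize)
{
    int32_t max_size = extsize;     /* default: fetch the whole datum */

    if (method == SKETCH_PGLZ)
        max_size = sketch_pglz_bound(slicelimit, max_size);
    return max_size;
}
```

The LZ4 branch deliberately does nothing: as the added comment in the hunk says, without an API to bound the compressed input needed for a slice, the whole external value is fetched.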
@@ -103,7 +103,8 @@ index_form_tuple(TupleDesc tupleDescriptor,
             (att->attstorage == TYPSTORAGE_EXTENDED ||
              att->attstorage == TYPSTORAGE_MAIN))
         {
-            Datum cvalue = toast_compress_datum(untoasted_values[i]);
+            Datum cvalue = toast_compress_datum(untoasted_values[i],
+                                                att->attcompression);

             if (DatumGetPointer(cvalue) != NULL)
             {
313
src/backend/access/common/toast_compression.c
Normal file
@@ -0,0 +1,313 @@
+/*-------------------------------------------------------------------------
+ *
+ * toast_compression.c
+ *    Functions for toast compression.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *    src/backend/access/common/toast_compression.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+#include "access/detoast.h"
+#include "access/toast_compression.h"
+#include "common/pg_lzcompress.h"
+#include "fmgr.h"
+#include "utils/builtins.h"
+
+/* Compile-time default */
+char *default_toast_compression = DEFAULT_TOAST_COMPRESSION;
+
+/*
+ * Compress a varlena using PGLZ.
+ *
+ * Returns the compressed varlena, or NULL if compression fails.
+ */
+struct varlena *
+pglz_compress_datum(const struct varlena *value)
+{
+    int32 valsize,
+          len;
+    struct varlena *tmp = NULL;
+
+    valsize = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
+
+    /*
+     * No point in wasting a palloc cycle if value size is outside the allowed
+     * range for compression.
+     */
+    if (valsize < PGLZ_strategy_default->min_input_size ||
+        valsize > PGLZ_strategy_default->max_input_size)
+        return NULL;
+
+    /*
+     * Figure out the maximum possible size of the pglz output, add the bytes
+     * that will be needed for varlena overhead, and allocate that amount.
+     */
+    tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) +
+                                    VARHDRSZ_COMPRESS);
+
+    len = pglz_compress(VARDATA_ANY(value),
+                        valsize,
+                        (char *) tmp + VARHDRSZ_COMPRESS,
+                        NULL);
+    if (len < 0)
+    {
+        pfree(tmp);
+        return NULL;
+    }
+
+    SET_VARSIZE_COMPRESSED(tmp, len + VARHDRSZ_COMPRESS);
+
+    return tmp;
+}
+
+/*
+ * Decompress a varlena that was compressed using PGLZ.
+ */
+struct varlena *
+pglz_decompress_datum(const struct varlena *value)
+{
+    struct varlena *result;
+    int32 rawsize;
+
+    /* allocate memory for the uncompressed data */
+    result = (struct varlena *) palloc(VARRAWSIZE_4B_C(value) + VARHDRSZ);
+
+    /* decompress the data */
+    rawsize = pglz_decompress((char *) value + VARHDRSZ_COMPRESS,
+                              VARSIZE(value) - VARHDRSZ_COMPRESS,
+                              VARDATA(result),
+                              VARRAWSIZE_4B_C(value), true);
+    if (rawsize < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_DATA_CORRUPTED),
+                 errmsg_internal("compressed pglz data is corrupt")));
+
+    SET_VARSIZE(result, rawsize + VARHDRSZ);
+
+    return result;
+}
+
+/*
+ * Decompress part of a varlena that was compressed using PGLZ.
+ */
+struct varlena *
+pglz_decompress_datum_slice(const struct varlena *value,
+                            int32 slicelength)
+{
+    struct varlena *result;
+    int32 rawsize;
+
+    /* allocate memory for the uncompressed data */
+    result = (struct varlena *) palloc(slicelength + VARHDRSZ);
+
+    /* decompress the data */
+    rawsize = pglz_decompress((char *) value + VARHDRSZ_COMPRESS,
+                              VARSIZE(value) - VARHDRSZ_COMPRESS,
+                              VARDATA(result),
+                              slicelength, false);
+    if (rawsize < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_DATA_CORRUPTED),
+                 errmsg_internal("compressed pglz data is corrupt")));
+
+    SET_VARSIZE(result, rawsize + VARHDRSZ);
+
+    return result;
+}
+
+/*
+ * Compress a varlena using LZ4.
+ *
+ * Returns the compressed varlena, or NULL if compression fails.
+ */
+struct varlena *
+lz4_compress_datum(const struct varlena *value)
+{
+#ifndef USE_LZ4
+    NO_LZ4_SUPPORT();
+#else
+    int32 valsize;
+    int32 len;
+    int32 max_size;
+    struct varlena *tmp = NULL;
+
+    valsize = VARSIZE_ANY_EXHDR(value);
+
+    /*
+     * Figure out the maximum possible size of the LZ4 output, add the bytes
+     * that will be needed for varlena overhead, and allocate that amount.
+     */
+    max_size = LZ4_compressBound(valsize);
+    tmp = (struct varlena *) palloc(max_size + VARHDRSZ_COMPRESS);
+
+    len = LZ4_compress_default(VARDATA_ANY(value),
+                               (char *) tmp + VARHDRSZ_COMPRESS,
+                               valsize, max_size);
+    if (len <= 0)
+        elog(ERROR, "lz4 compression failed");
+
+    /* data is incompressible so just free the memory and return NULL */
+    if (len > valsize)
+    {
+        pfree(tmp);
+        return NULL;
+    }
+
+    SET_VARSIZE_COMPRESSED(tmp, len + VARHDRSZ_COMPRESS);
+
+    return tmp;
+#endif
+}
+
+/*
+ * Decompress a varlena that was compressed using LZ4.
+ */
+struct varlena *
+lz4_decompress_datum(const struct varlena *value)
+{
+#ifndef USE_LZ4
+    NO_LZ4_SUPPORT();
+#else
+    int32 rawsize;
+    struct varlena *result;
+
+    /* allocate memory for the uncompressed data */
+    result = (struct varlena *) palloc(VARRAWSIZE_4B_C(value) + VARHDRSZ);
+
+    /* decompress the data */
+    rawsize = LZ4_decompress_safe((char *) value + VARHDRSZ_COMPRESS,
+                                  VARDATA(result),
+                                  VARSIZE(value) - VARHDRSZ_COMPRESS,
+                                  VARRAWSIZE_4B_C(value));
+    if (rawsize < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_DATA_CORRUPTED),
+                 errmsg_internal("compressed lz4 data is corrupt")));
+
+
+    SET_VARSIZE(result, rawsize + VARHDRSZ);
+
+    return result;
+#endif
+}
+
+/*
+ * Decompress part of a varlena that was compressed using LZ4.
+ */
+struct varlena *
+lz4_decompress_datum_slice(const struct varlena *value, int32 slicelength)
+{
+#ifndef USE_LZ4
+    NO_LZ4_SUPPORT();
+#else
+    int32 rawsize;
+    struct varlena *result;
+
+    /* slice decompression not supported prior to 1.8.3 */
+    if (LZ4_versionNumber() < 10803)
+        return lz4_decompress_datum(value);
+
+    /* allocate memory for the uncompressed data */
+    result = (struct varlena *) palloc(slicelength + VARHDRSZ);
+
+    /* decompress the data */
+    rawsize = LZ4_decompress_safe_partial((char *) value + VARHDRSZ_COMPRESS,
+                                          VARDATA(result),
+                                          VARSIZE(value) - VARHDRSZ_COMPRESS,
+                                          slicelength,
+                                          slicelength);
+    if (rawsize < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_DATA_CORRUPTED),
+                 errmsg_internal("compressed lz4 data is corrupt")));
+
+    SET_VARSIZE(result, rawsize + VARHDRSZ);
+
+    return result;
+#endif
+}
+
+/*
+ * Extract compression ID from a varlena.
+ *
+ * Returns TOAST_INVALID_COMPRESSION_ID if the varlena is not compressed.
+ */
+ToastCompressionId
+toast_get_compression_id(struct varlena *attr)
+{
+    ToastCompressionId cmid = TOAST_INVALID_COMPRESSION_ID;
+
+    /*
+     * If it is stored externally then fetch the compression method id from the
+     * external toast pointer. If compressed inline, fetch it from the toast
+     * compression header.
+     */
+    if (VARATT_IS_EXTERNAL_ONDISK(attr))
+    {
+        struct varatt_external toast_pointer;
+
+        VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+        if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+            cmid = VARATT_EXTERNAL_GET_COMPRESSION(toast_pointer);
+    }
+    else if (VARATT_IS_COMPRESSED(attr))
+        cmid = VARCOMPRESS_4B_C(attr);
+
+    return cmid;
+}
+
+/*
+ * Validate a new value for the default_toast_compression GUC.
+ */
+bool
+check_default_toast_compression(char **newval, void **extra, GucSource source)
+{
+    if (**newval == '\0')
+    {
+        GUC_check_errdetail("%s cannot be empty.",
+                            "default_toast_compression");
+        return false;
+    }
+
+    if (strlen(*newval) >= NAMEDATALEN)
+    {
+        GUC_check_errdetail("%s is too long (maximum %d characters).",
+                            "default_toast_compression", NAMEDATALEN - 1);
+        return false;
+    }
+
+    if (!CompressionMethodIsValid(CompressionNameToMethod(*newval)))
+    {
+        /*
+         * When source == PGC_S_TEST, don't throw a hard error for a
+         * nonexistent compression method, only a NOTICE. See comments in
+         * guc.h.
+         */
+        if (source == PGC_S_TEST)
+        {
+            ereport(NOTICE,
+                    (errcode(ERRCODE_UNDEFINED_OBJECT),
+                     errmsg("compression method \"%s\" does not exist",
+                            *newval)));
+        }
+        else
+        {
+            GUC_check_errdetail("Compression method \"%s\" does not exist.",
+                                *newval);
+            return false;
+        }
+    }
+
+    return true;
+}
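The order of checks in `check_default_toast_compression` can be tried outside the server with a small stand-alone sketch. Everything prefixed `sketch_` is hypothetical: the name lookup, the `'p'` and `'l'` method characters, and the length limit of 64 are assumptions made for illustration rather than values taken from this diff, and the `test_mode` flag stands in for `source == PGC_S_TEST`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_NAMEDATALEN 64   /* assumed value of NAMEDATALEN */

/* Hypothetical name lookup: returns a method character, or '\0' if unknown. */
static char
sketch_name_to_method(const char *name)
{
    if (strcmp(name, "pglz") == 0)
        return 'p';
    if (strcmp(name, "lz4") == 0)
        return 'l';
    return '\0';
}

/* Same order of checks as the hook: empty, too long, then unknown name. */
static bool
sketch_check_compression_name(const char *newval, bool test_mode)
{
    if (newval[0] == '\0')
        return false;
    if (strlen(newval) >= SKETCH_NAMEDATALEN)
        return false;
    if (sketch_name_to_method(newval) == '\0')
        return test_mode;       /* test mode: a notice only, value is accepted */
    return true;
}
```

In the real hook the rejected cases also report a detail message through `GUC_check_errdetail`; the sketch keeps only the accept/reject outcome.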
@@ -44,46 +44,54 @@ static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
  * ----------
  */
 Datum
-toast_compress_datum(Datum value)
+toast_compress_datum(Datum value, char cmethod)
 {
-    struct varlena *tmp;
-    int32 valsize = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
-    int32 len;
+    struct varlena *tmp = NULL;
+    int32 valsize;
+    ToastCompressionId cmid = TOAST_INVALID_COMPRESSION_ID;

     Assert(!VARATT_IS_EXTERNAL(DatumGetPointer(value)));
     Assert(!VARATT_IS_COMPRESSED(DatumGetPointer(value)));

+    Assert(CompressionMethodIsValid(cmethod));
+
+    valsize = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
+
     /*
-     * No point in wasting a palloc cycle if value size is out of the allowed
-     * range for compression
+     * Call appropriate compression routine for the compression method.
      */
-    if (valsize < PGLZ_strategy_default->min_input_size ||
-        valsize > PGLZ_strategy_default->max_input_size)
+    switch (cmethod)
+    {
+        case TOAST_PGLZ_COMPRESSION:
+            tmp = pglz_compress_datum((const struct varlena *) value);
+            cmid = TOAST_PGLZ_COMPRESSION_ID;
+            break;
+        case TOAST_LZ4_COMPRESSION:
+            tmp = lz4_compress_datum((const struct varlena *) value);
+            cmid = TOAST_LZ4_COMPRESSION_ID;
+            break;
+        default:
+            elog(ERROR, "invalid compression method %c", cmethod);
+    }
+
+    if (tmp == NULL)
         return PointerGetDatum(NULL);

-    tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) +
-                                    TOAST_COMPRESS_HDRSZ);
-
     /*
-     * We recheck the actual size even if pglz_compress() reports success,
-     * because it might be satisfied with having saved as little as one byte
-     * in the compressed data --- which could turn into a net loss once you
-     * consider header and alignment padding. Worst case, the compressed
-     * format might require three padding bytes (plus header, which is
-     * included in VARSIZE(tmp)), whereas the uncompressed format would take
-     * only one header byte and no padding if the value is short enough. So
-     * we insist on a savings of more than 2 bytes to ensure we have a gain.
+     * We recheck the actual size even if compression reports success, because
+     * it might be satisfied with having saved as little as one byte in the
+     * compressed data --- which could turn into a net loss once you consider
+     * header and alignment padding. Worst case, the compressed format might
+     * require three padding bytes (plus header, which is included in
+     * VARSIZE(tmp)), whereas the uncompressed format would take only one
+     * header byte and no padding if the value is short enough. So we insist
+     * on a savings of more than 2 bytes to ensure we have a gain.
      */
-    len = pglz_compress(VARDATA_ANY(DatumGetPointer(value)),
-                        valsize,
-                        TOAST_COMPRESS_RAWDATA(tmp),
-                        PGLZ_strategy_default);
-    if (len >= 0 &&
-        len + TOAST_COMPRESS_HDRSZ < valsize - 2)
+    if (VARSIZE(tmp) < valsize - 2)
     {
-        TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
-        SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
         /* successful compression */
+        Assert(cmid != TOAST_INVALID_COMPRESSION_ID);
+        TOAST_COMPRESS_SET_SIZE_AND_METHOD(tmp, valsize, cmid);
         return PointerGetDatum(tmp);
     }
     else
@@ -152,19 +160,21 @@ toast_save_datum(Relation rel, Datum value,
                                     &num_indexes);

     /*
-     * Get the data pointer and length, and compute va_rawsize and va_extsize.
+     * Get the data pointer and length, and compute va_rawsize and va_extinfo.
      *
      * va_rawsize is the size of the equivalent fully uncompressed datum, so
      * we have to adjust for short headers.
      *
-     * va_extsize is the actual size of the data payload in the toast records.
+     * va_extinfo stored the actual size of the data payload in the toast
+     * records and the compression method in first 2 bits if data is
+     * compressed.
      */
     if (VARATT_IS_SHORT(dval))
     {
         data_p = VARDATA_SHORT(dval);
         data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
         toast_pointer.va_rawsize = data_todo + VARHDRSZ; /* as if not short */
-        toast_pointer.va_extsize = data_todo;
+        toast_pointer.va_extinfo = data_todo;
     }
     else if (VARATT_IS_COMPRESSED(dval))
     {
@@ -172,7 +182,10 @@ toast_save_datum(Relation rel, Datum value,
         data_todo = VARSIZE(dval) - VARHDRSZ;
         /* rawsize in a compressed datum is just the size of the payload */
         toast_pointer.va_rawsize = VARRAWSIZE_4B_C(dval) + VARHDRSZ;
-        toast_pointer.va_extsize = data_todo;
+
+        /* set external size and compression method */
+        VARATT_EXTERNAL_SET_SIZE_AND_COMPRESSION(toast_pointer, data_todo,
+                                                 VARCOMPRESS_4B_C(dval));
         /* Assert that the numbers look like it's compressed */
         Assert(VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
     }
@@ -181,7 +194,7 @@ toast_save_datum(Relation rel, Datum value,
         data_p = VARDATA(dval);
         data_todo = VARSIZE(dval) - VARHDRSZ;
         toast_pointer.va_rawsize = VARSIZE(dval);
-        toast_pointer.va_extsize = data_todo;
+        toast_pointer.va_extinfo = data_todo;
     }

     /*
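The acceptance rule that `toast_compress_datum` now applies to the result of either compressor is a one-line comparison. The helper below is a hypothetical stand-alone restatement of that rule, with plain integers in place of varlena macros; the name `sketch_keep_compressed` does not appear in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * compressed_total: size of the compressed datum including its header
 *                   (what VARSIZE(tmp) reports in the diff).
 * raw_payload:      size of the uncompressed payload without header (valsize).
 *
 * The compressed form is kept only if it saves more than 2 bytes, which
 * covers the header and padding overhead described in the code comment.
 */
static bool
sketch_keep_compressed(int32_t compressed_total, int32_t raw_payload)
{
    return compressed_total < raw_payload - 2;
}
```

For a 100-byte payload, a 97-byte compressed datum is kept while a 98-byte one is discarded and the value stays uncompressed.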
@@ -20,6 +20,7 @@
 #include "postgres.h"

 #include "access/htup_details.h"
+#include "access/toast_compression.h"
 #include "access/tupdesc_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
@@ -664,6 +665,11 @@ TupleDescInitEntry(TupleDesc desc,
     att->attstorage = typeForm->typstorage;
     att->attcollation = typeForm->typcollation;

+    if (IsStorageCompressible(typeForm->typstorage))
+        att->attcompression = GetDefaultToastCompression();
+    else
+        att->attcompression = InvalidCompressionMethod;
+
     ReleaseSysCache(tuple);
 }

@@ -19,6 +19,7 @@
  */
 #include "postgres.h"

+#include "access/detoast.h"
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/heaptoast.h"
@@ -26,6 +27,7 @@
 #include "access/rewriteheap.h"
 #include "access/syncscan.h"
 #include "access/tableam.h"
+#include "access/toast_compression.h"
 #include "access/tsmapi.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
@@ -2469,6 +2471,44 @@ reform_and_rewrite_tuple(HeapTuple tuple,
     {
         if (TupleDescAttr(newTupDesc, i)->attisdropped)
             isnull[i] = true;
+
+        /*
+         * Use this opportunity to force recompression of any data that's
+         * compressed with some TOAST compression method other than the one
+         * configured for the column. We don't actually need to perform the
+         * compression here; we just need to decompress. That will trigger
+         * recompression later on.
+         */
+        else if (!isnull[i] && TupleDescAttr(newTupDesc, i)->attlen == -1)
+        {
+            struct varlena *new_value;
+            ToastCompressionId cmid;
+            char cmethod;
+
+            new_value = (struct varlena *) DatumGetPointer(values[i]);
+            cmid = toast_get_compression_id(new_value);
+
+            /* nothing to be done for uncompressed data */
+            if (cmid == TOAST_INVALID_COMPRESSION_ID)
+                continue;
+
+            /* convert compression id to compression method */
+            switch (cmid)
+            {
+                case TOAST_PGLZ_COMPRESSION_ID:
+                    cmethod = TOAST_PGLZ_COMPRESSION;
+                    break;
+                case TOAST_LZ4_COMPRESSION_ID:
+                    cmethod = TOAST_LZ4_COMPRESSION;
+                    break;
+                default:
+                    elog(ERROR, "invalid compression method id %d", cmid);
+            }
+
+            /* if compression method doesn't match then detoast the value */
+            if (TupleDescAttr(newTupDesc, i)->attcompression != cmethod)
+                values[i] = PointerGetDatum(detoast_attr(new_value));
+        }
     }

     copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
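The decision made per column in the `reform_and_rewrite_tuple` hunk (used by VACUUM FULL and CLUSTER, per the commit message) reduces to a small predicate. The sketch below is hypothetical: the `sketch_` enum values and the `'p'`/`'l'` method characters are placeholders chosen for illustration, not the constants defined in PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ids stored with a value; NONE means "not compressed". */
enum sketch_cmid { SKETCH_ID_PGLZ, SKETCH_ID_LZ4, SKETCH_ID_NONE };

/* Map a stored id to the per-column method character (placeholder values). */
static char
sketch_id_to_method(enum sketch_cmid id)
{
    switch (id)
    {
        case SKETCH_ID_PGLZ:
            return 'p';
        case SKETCH_ID_LZ4:
            return 'l';
        default:
            return '\0';
    }
}

/* True when the rewrite should detoast the value so it is recompressed later. */
static bool
sketch_needs_recompress(enum sketch_cmid stored, char column_method)
{
    if (stored == SKETCH_ID_NONE)
        return false;           /* uncompressed data is left alone */
    return sketch_id_to_method(stored) != column_method;
}
```

As the comment in the hunk notes, the rewrite itself only decompresses mismatched values; compression with the column's configured method happens later in the normal TOAST path.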
@@ -54,6 +54,7 @@ toast_tuple_init(ToastTupleContext *ttc)

         ttc->ttc_attr[i].tai_colflags = 0;
         ttc->ttc_attr[i].tai_oldexternal = NULL;
+        ttc->ttc_attr[i].tai_compression = att->attcompression;

         if (ttc->ttc_oldvalues != NULL)
         {
@@ -226,9 +227,11 @@ void
 toast_tuple_try_compression(ToastTupleContext *ttc, int attribute)
 {
     Datum *value = &ttc->ttc_values[attribute];
-    Datum new_value = toast_compress_datum(*value);
+    Datum new_value;
     ToastAttrInfo *attr = &ttc->ttc_attr[attribute];

+    new_value = toast_compress_datum(*value, attr->tai_compression);
+
     if (DatumGetPointer(new_value) != NULL)
     {
         /* successful compression */