mirror of
https://github.com/postgres/postgres.git
synced 2025-11-01 21:31:19 +03:00
Compression in pg_dump is abstracted using an API with multiple
implementations which can be selected at runtime by the user.
The API and its implementations have evolved over time; notable
commits include bf9aa490db, e9960732a9, 84adc8e20, and 0da243fed.
The error handling defined by the API was however problematic, and
the implementations had a few bugs and/or were not following the
API specification. This commit modifies the API to ensure that
callers can perform error handling efficiently and fixes all the
implementations such that they all implement the API in the same
way. A full list of the changes can be seen below.

 * write_func:
   - Make write_func throw an error on all error conditions. All
     callers of write_func were already checking for success and
     calling pg_fatal on all errors, so we might as well make the
     API support that case directly with simpler error handling as
     a result.

 * open_func:
   - zstd: Move stream initialization from the open function to
     the read and write functions as they can have fatal errors.
     Also ensure to dup the file descriptor like none and gzip.
   - lz4: Ensure to dup the file descriptor like none and gzip.

 * close_func:
   - zstd: Ensure to close the file descriptor even if closing
     down the compressor fails, and clean up state allocation on
     fclose failures. Make sure to capture errors set by fclose.
   - lz4: Ensure to close the file descriptor even if closing
     down the compressor fails, and instead of calling pg_fatal
     log the failures using pg_log_error. Make sure to capture
     errors set by fclose.
   - none: Make sure to catch errors set by fclose.

 * read_func / gets_func:
   - Make read_func unconditionally return the number of read
     bytes instead of making it optional per implementation.
   - lz4: Make sure to throw an error and not return -1.
   - gzip: gzread returning zero cannot be assumed to indicate
     EOF as it is documented to return zero for some types of
     errors.
   - lz4, zstd: Convert the _read_internal helper functions to
     not call pg_fatal on errors to be able to handle gets_func
     returning NULL on error.

 * getc_func:
   - zstd: Use an unsigned char rather than an int to read char
     into.

 * LZ4Stream_init:
   - Make sure to not switch to inited state until we know that
     initialization succeeded and reset errno just in case.

On top of these changes there are minor comment cleanups and
improvements as well as an attempt to consistently reset errno
in codepaths where it is inspected.

This work was initiated by a report of API misuse, which turned
into a larger body of work. As this is an internal API these
changes can be backpatched into all affected branches.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Evgeniy Gorbanev <gorbanyoves@basealt.ru>
Discussion: https://postgr.es/m/517794.1750082166@sss.pgh.pa.us
Backpatch-through: 16
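The read_func rule in the commit message (always return the number of bytes read, and do not treat a zero return as EOF unless an error has been ruled out) can be illustrated with plain stdio. This is a minimal sketch under stated assumptions, not the pg_dump implementation: demo_read is a hypothetical name, fread/ferror stand in for the compression library calls, and exit() stands in for pg_fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a pg_dump function): read up to "size" bytes
 * from fp and return the number of bytes read.  Any stream error is
 * treated as fatal, so a return value of zero can only mean that
 * end-of-file was reached.
 */
static size_t
demo_read(void *buf, size_t size, FILE *fp)
{
	size_t		nread;

	assert(buf != NULL && fp != NULL);

	/* Reset errno so that a stale value is never reported. */
	errno = 0;
	nread = fread(buf, 1, size, fp);

	/* A short or zero read is only EOF if the error flag is clear. */
	if (ferror(fp))
	{
		fprintf(stderr, "could not read from input file: %s\n",
				errno ? strerror(errno) : "unknown error");
		exit(EXIT_FAILURE);
	}

	return nread;
}
```

A caller can loop on demo_read until it returns zero and needs no separate error branch, which is the simplification the commit message aims for.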
302 lines
8.7 KiB
C
/*-------------------------------------------------------------------------
 *
 * compress_io.c
 *	 Routines for archivers to write an uncompressed or compressed data
 *	 stream.
 *
 * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * This file includes two APIs for dealing with compressed data. The first
 * provides more flexibility, using callbacks to read/write data from the
 * underlying stream. The second API is a wrapper around fopen and
 * friends, providing an interface similar to those, but abstracts away
 * the possible compression. The second API is aimed for the resulting
 * files to be easily manipulated with an external compression utility
 * program.
 *
 * Compressor API
 * --------------
 *
 * The interface for writing to an archive consists of three functions:
 * AllocateCompressor, writeData, and EndCompressor. First you call
 * AllocateCompressor, then write all the data by calling writeData as many
 * times as needed, and finally EndCompressor. writeData will call the
 * WriteFunc that was provided to AllocateCompressor for each chunk of
 * compressed data.
 *
 * The interface for reading an archive consists of the same three functions:
 * AllocateCompressor, readData, and EndCompressor. First you call
 * AllocateCompressor, then read all the data by calling readData to read the
 * whole compressed stream which repeatedly calls the given ReadFunc. ReadFunc
 * returns the compressed data one chunk at a time. Then readData decompresses
 * it and passes the decompressed data to ahwrite(), until ReadFunc returns 0
 * to signal EOF. The interface is the same for compressed and uncompressed
 * streams.
 *
 * Compressed stream API
 * ----------------------
 *
 * The compressed stream API is providing a set of function pointers for
 * opening, reading, writing, and finally closing files. The implemented
 * function pointers are documented in the corresponding header file and are
 * common for all streams. It allows the caller to use the same functions for
 * both compressed and uncompressed streams.
 *
 * The interface consists of three functions, InitCompressFileHandle,
 * InitDiscoverCompressFileHandle, and EndCompressFileHandle. If the
 * compression is known, then start by calling InitCompressFileHandle,
 * otherwise discover it by using InitDiscoverCompressFileHandle. Then call
 * the function pointers as required for the read/write operations. Finally
 * call EndCompressFileHandle to end the stream.
 *
 * InitDiscoverCompressFileHandle tries to infer the compression by the
 * filename suffix. If the suffix is not yet known then it tries to simply
 * open the file and if it fails, it tries to open the same file with
 * compressed suffixes (.gz, .lz4 and .zst, in this order).
 *
 * IDENTIFICATION
 *	   src/bin/pg_dump/compress_io.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres_fe.h"

#include <sys/stat.h>
#include <unistd.h>

#include "compress_gzip.h"
#include "compress_io.h"
#include "compress_lz4.h"
#include "compress_none.h"
#include "compress_zstd.h"
#include "pg_backup_utils.h"

/*----------------------
 * Generic functions
 *----------------------
 */

/*
 * Checks whether support for a compression algorithm is implemented in
 * pg_dump/restore.
 *
 * On success returns NULL, otherwise returns a malloc'ed string which can be
 * used by the caller in an error message.
 */
char *
supports_compression(const pg_compress_specification compression_spec)
{
	const pg_compress_algorithm algorithm = compression_spec.algorithm;
	bool		supported = false;

	if (algorithm == PG_COMPRESSION_NONE)
		supported = true;
#ifdef HAVE_LIBZ
	if (algorithm == PG_COMPRESSION_GZIP)
		supported = true;
#endif
#ifdef USE_LZ4
	if (algorithm == PG_COMPRESSION_LZ4)
		supported = true;
#endif
#ifdef USE_ZSTD
	if (algorithm == PG_COMPRESSION_ZSTD)
		supported = true;
#endif

	if (!supported)
		return psprintf(_("this build does not support compression with %s"),
						get_compress_algorithm_name(algorithm));

	return NULL;
}

/*----------------------
 * Compressor API
 *----------------------
 */

/*
 * Allocate a new compressor.
 */
CompressorState *
AllocateCompressor(const pg_compress_specification compression_spec,
				   ReadFunc readF, WriteFunc writeF)
{
	CompressorState *cs;

	cs = (CompressorState *) pg_malloc0(sizeof(CompressorState));
	cs->readF = readF;
	cs->writeF = writeF;

	if (compression_spec.algorithm == PG_COMPRESSION_NONE)
		InitCompressorNone(cs, compression_spec);
	else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
		InitCompressorGzip(cs, compression_spec);
	else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
		InitCompressorLZ4(cs, compression_spec);
	else if (compression_spec.algorithm == PG_COMPRESSION_ZSTD)
		InitCompressorZstd(cs, compression_spec);

	return cs;
}

/*
 * Terminate compression library context and flush its buffers.
 */
void
EndCompressor(ArchiveHandle *AH, CompressorState *cs)
{
	cs->end(AH, cs);
	pg_free(cs);
}

/*----------------------
 * Compressed stream API
 *----------------------
 */

/*
 * Private routines
 */
static int
hasSuffix(const char *filename, const char *suffix)
{
	int			filenamelen = strlen(filename);
	int			suffixlen = strlen(suffix);

	if (filenamelen < suffixlen)
		return 0;

	return memcmp(&filename[filenamelen - suffixlen],
				  suffix,
				  suffixlen) == 0;
}

/* free() without changing errno; useful in several places below */
static void
free_keep_errno(void *p)
{
	int			save_errno = errno;

	free(p);
	errno = save_errno;
}

/*
 * Public interface
 */

/*
 * Initialize a compress file handle for the specified compression algorithm.
 */
CompressFileHandle *
InitCompressFileHandle(const pg_compress_specification compression_spec)
{
	CompressFileHandle *CFH;

	CFH = pg_malloc0(sizeof(CompressFileHandle));

	if (compression_spec.algorithm == PG_COMPRESSION_NONE)
		InitCompressFileHandleNone(CFH, compression_spec);
	else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
		InitCompressFileHandleGzip(CFH, compression_spec);
	else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
		InitCompressFileHandleLZ4(CFH, compression_spec);
	else if (compression_spec.algorithm == PG_COMPRESSION_ZSTD)
		InitCompressFileHandleZstd(CFH, compression_spec);

	return CFH;
}

/*
 * Checks if a compressed file (with the specified extension) exists.
 *
 * The filename of the tested file is stored to fname buffer (the existing
 * buffer is freed, new buffer is allocated and returned through the pointer).
 */
static bool
check_compressed_file(const char *path, char **fname, char *ext)
{
	free_keep_errno(*fname);
	*fname = psprintf("%s.%s", path, ext);
	return (access(*fname, F_OK) == 0);
}

/*
 * Open a file for reading. 'path' is the file to open, and 'mode' should
 * be either "r" or "rb".
 *
 * If the file at 'path' contains the suffix of a supported compression method,
 * currently this includes ".gz", ".lz4" and ".zst", then this compression will be used
 * throughout. Otherwise the compression will be inferred by iteratively trying
 * to open the file at 'path', first as is, then by appending known compression
 * suffixes. So if you pass "foo" as 'path', this will open either "foo" or
 * "foo.{gz,lz4,zst}", trying in that order.
 *
 * On failure, return NULL with an error code in errno.
 */
CompressFileHandle *
InitDiscoverCompressFileHandle(const char *path, const char *mode)
{
	CompressFileHandle *CFH = NULL;
	struct stat st;
	char	   *fname;
	pg_compress_specification compression_spec = {0};

	compression_spec.algorithm = PG_COMPRESSION_NONE;

	Assert(strcmp(mode, PG_BINARY_R) == 0);

	fname = pg_strdup(path);

	if (hasSuffix(fname, ".gz"))
		compression_spec.algorithm = PG_COMPRESSION_GZIP;
	else if (hasSuffix(fname, ".lz4"))
		compression_spec.algorithm = PG_COMPRESSION_LZ4;
	else if (hasSuffix(fname, ".zst"))
		compression_spec.algorithm = PG_COMPRESSION_ZSTD;
	else
	{
		if (stat(path, &st) == 0)
			compression_spec.algorithm = PG_COMPRESSION_NONE;
		else if (check_compressed_file(path, &fname, "gz"))
			compression_spec.algorithm = PG_COMPRESSION_GZIP;
		else if (check_compressed_file(path, &fname, "lz4"))
			compression_spec.algorithm = PG_COMPRESSION_LZ4;
		else if (check_compressed_file(path, &fname, "zst"))
			compression_spec.algorithm = PG_COMPRESSION_ZSTD;
	}

	CFH = InitCompressFileHandle(compression_spec);
	errno = 0;
	if (!CFH->open_func(fname, -1, mode, CFH))
	{
		free_keep_errno(CFH);
		CFH = NULL;
	}
	free_keep_errno(fname);

	return CFH;
}

/*
 * Close an open file handle and release its memory.
 *
 * On failure, returns false and sets errno appropriately.
 */
bool
EndCompressFileHandle(CompressFileHandle *CFH)
{
	bool		ret = false;

	errno = 0;
	if (CFH->private_data)
		ret = CFH->close_func(CFH);

	free_keep_errno(CFH);

	return ret;
}
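The suffix-matching step that InitDiscoverCompressFileHandle performs can be tried in isolation. The sketch below is illustrative only: demo_algorithm, demo_has_suffix, and demo_algorithm_from_name are hypothetical names standing in for pg_compress_algorithm and hasSuffix, and the stat()/access() fallback that probes for "foo.gz", "foo.lz4", and "foo.zst" is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_compress_algorithm. */
typedef enum
{
	DEMO_NONE,
	DEMO_GZIP,
	DEMO_LZ4,
	DEMO_ZSTD
} demo_algorithm;

/* Returns 1 if filename ends with suffix, 0 otherwise. */
static int
demo_has_suffix(const char *filename, const char *suffix)
{
	size_t		filenamelen = strlen(filename);
	size_t		suffixlen = strlen(suffix);

	if (filenamelen < suffixlen)
		return 0;

	return memcmp(filename + filenamelen - suffixlen,
				  suffix,
				  suffixlen) == 0;
}

/*
 * Map a file name to an algorithm using only its suffix.  Names without
 * a recognized suffix are reported as uncompressed.
 */
static demo_algorithm
demo_algorithm_from_name(const char *filename)
{
	assert(filename != NULL);

	if (demo_has_suffix(filename, ".gz"))
		return DEMO_GZIP;
	if (demo_has_suffix(filename, ".lz4"))
		return DEMO_LZ4;
	if (demo_has_suffix(filename, ".zst"))
		return DEMO_ZSTD;

	return DEMO_NONE;
}
```

Because the comparison is anchored at the end of the name, a file called "gz" or "archive.gzip" is treated as uncompressed, while "toc.dat.gz" maps to gzip.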