Add a test case to cover pg_dump with --compress=none. This
brings the coverage of compress_none.c up from about 64% to 90%,
in particular covering the new code added in a previous patch.
Include compression of toc.dat in manually-compressed test cases.
We would have found the bug fixed in commit a239c4a0c much sooner
if we'd done this. As far as I can tell, this doesn't reduce test
coverage at all, since there are other tests of directory format
that still use an uncompressed toc.dat.
Widen the wide row used to verify correct (de) compression.
Commit 1a05c1d25 advises us (not without reason) to ensure that
this test case fully fills DEFAULT_IO_BUFFER_SIZE, so that loops
within the compression logic will iterate completely. To follow
that advice with the proposed DEFAULT_IO_BUFFER_SIZE of 128K,
we need something close to this. This does indeed increase the
reported code coverage by a few lines.
While here, fix a glitch that I noticed in testing: the
$glob_patterns tests were incapable of failing, because glob()
will return 'foo' as 'foo' whether there is a matching file or
not. (Indeed, the stanza just above that one relies on that.)
In my testing, this patch adds approximately as much runtime
as was saved by the previous patch, so that it's about a wash
compared to the old code. However, we get better test coverage.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Add a new test script 006_pg_dump_compress.pl, containing just
the pg_dump tests specifically concerned with compression, and
remove those tests from 002_pg_dump.pl. We can also drop some
infrastructure in 002_pg_dump.pl that was used only for these tests.
The point of this is to avoid the cost of running these test
cases over and over in all the scenarios (runs) that 002_pg_dump.pl
exercises. We don't learn anything more about the behavior of the
compression code that way, and we expend significant amounts of
time, since one of these test cases is quite large and due to get
larger.
The intent of this specific patch is to provide exactly the same
coverage as before, except that I went back to using --no-sync
in all the test runs moved over to 006_pg_dump_compress.pl.
I think that avoiding that had basically been cargo-culted into
these test cases as a result of modeling them on the
defaults_custom_format test case; again, doing that over and over
isn't going to teach us anything new.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
After commit fe8192a95, compress_zstd.c tends to produce data block
sizes around 128K, and we don't really have any control over that
unless we want to overrule ZSTD_CStreamOutSize(). Which seems like
a bad idea. But let's try to align the other compression modes to
produce block sizes roughly comparable to that, so that pg_restore's
skip-data performance isn't enormously different for different modes.
gzip compression can be brought in line simply by setting
DEFAULT_IO_BUFFER_SIZE = 128K, which this patch does. That
increases some unrelated buffer sizes, but none of them seem
problematic for modern platforms.
lz4's idea of appropriate block size is highly nonlinear:
if we just increase DEFAULT_IO_BUFFER_SIZE then the output
blocks end up around 200K. I found that adjusting the slop
factor in LZ4State_compression_init was a not-too-ugly way
of bringing that number roughly into line.
With compress = none you get data blocks the same sizes as the
table rows, which seems potentially problematic for narrow tables.
Introduce a layer of buffering to make that case match the others.
Comments in compress_io.h and 002_pg_dump.pl suggest that if
we increase DEFAULT_IO_BUFFER_SIZE then we need to increase the
amount of data fed through the tests in order to improve coverage.
I've not done that here, leaving it for a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
EvalPlanQualStart() failed to propagate es_partition_directory into
the child EState used for EPQ rechecks. When execution time partition
pruning ran during the EPQ scan, executor code dereferenced a NULL
partition directory and crashed.
Previously, propagating es_partition_directory into the EPQ EState was
unnecessary because CreatePartitionPruneState(), which sets it on
demand, also initialized the exec-pruning context. After commit
d47cbf474, CreatePartitionPruneState() now initializes only the init-
time pruning context, leaving exec-pruning context initialization to
ExecInitNode(). Since EvalPlanQualStart() runs only ExecInitNode() and
not CreatePartitionPruneState(), it can encounter a NULL
es_partition_directory. Other executor fields initialized during
CreatePartitionPruneState() are already copied into the child EState
thanks to commit 8741e48e5d, but es_partition_directory was missed.
Fix by borrowing the parent estate's es_partition_directory in
EvalPlanQualStart(), and by clearing that field in EvalPlanQualEnd()
so the parent remains responsible for freeing the directory.
Add an isolation test permutation that triggers EPQ with execution-
time partition pruning, the case that reproduces this crash.
Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
Per report from buildfarm member prion. The CI does not use this
parameter, and this buildfarm member sets log_error_verbosity to
"verbose". This would generate extra LOCATION entries in the logs,
causing the regexps of the test to fail.
Trying to support log_error_verbosity=verbose in the test would mean to
tweak all the regexps used in the test to detect an optional set of
LOCATION lines, at least. This would not improve the coverage, and
forcing the GUC value is simpler.
Oversight in 76bba03312.
Discussion: https://postgr.es/m/aPBaNNGiYT3xMBN1@paquier.xyz
Temporary file usage is sometimes attributed to the wrong query in the logs
output. One identified reason is that unnamed portal cleanup (and
consequently temp file logging) happens during the next BIND message as
a, after debug_query_string has already been updated to the new query.
Dropping an unnamed portal in the next BIND message is a rather old
protocol behavior (fe19e56c57, also mentioned in the docs).
log_temp_files is a bit newer than that, as of be8a431881.
This commit adds tests to track which query is displayed next to the
temporary file(s) removed when a portal is dropped, and in some cases if
a query is displayed or not. We have not concluded how to improve the
situation yet; these tests will at least help in checking what changes
in the logs depending on the proposal discussed and how it affects the
scenarios tracked by this new test.
Author: Sami Imseih <samimseih@gmail.com>
Author: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3d07ee43-8855-42db-97e0-bad5db82d972@dalibo.com
This commit adjusts RangeVarCallbackForReindexIndex() to handle an
extremely unlikely race condition involving concurrent OID reuse.
In short, if REINDEX INDEX is executed at the same time that the
index is re-created with the same name and OID but a different
parent table OID, we might lock the wrong parent table. To fix,
simply detect when this happens and emit an ERROR. Unfortunately,
we can't gracefully handle this situation because we will have
already locked the index, and we must lock the parent table before
the index to avoid deadlocks.
While at it, I've replaced all but one early return in this
callback function with ERRORs that should be unreachable. While I
haven't verified the presence of a live bug, the checks in question
appear to be unnecessary, and the early returns seem prone to
breaking the parent table locking code in subtle ways. If nothing
else, this simplifies the code a bit.
This is a bug fix and could be back-patched, but given the presumed
rarity of the race condition and the lack of reports, I'm not going
to bother.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Per-character pg_locale_t APIs. Useful for tsearch parsing and
potentially other places.
Significant overlap with the regc_wc_isalpha() and related functions
in regc_pg_locale.c, but this change leaves those intact for
now.
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
Presently, these functions look up the relation's OID, lock it, and
then check privileges. Not only does this approach provide no
guarantee that the locked relation matches the arguments of the
lookup, but it also allows users to briefly lock relations for
which they do not have privileges, which might enable
denial-of-service attacks. This commit adjusts these functions to
use RangeVarGetRelidExtended(), which is purpose-built to avoid
both of these issues. The new RangeVarGetRelidCallback function is
somewhat complicated because it must handle both tables and
indexes, and for indexes, we must check privileges on the parent
table and lock it first. Also, it needs to handle a couple of
extremely unlikely race conditions involving concurrent OID reuse.
A downside of this change is that the coding doesn't allow for
locking indexes in AccessShare mode anymore; everything is locked
in ShareUpdateExclusive mode. Per discussion, the original choice
of lock levels was intended for a now defunct implementation that
used in-place updates, so we believe this change is okay.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Backpatch-through: 18
The log output functionality of log_autovacuum_min_duration applies to
both VACUUM and ANALYZE, so it is not possible to separate the VACUUM
and ANALYZE log output thresholds. Logs are likely to be output only for
VACUUM and not for ANALYZE.
Therefore, we decided to separate the threshold for log output of VACUUM
by autovacuum (log_autovacuum_min_duration) and the threshold for log
output of ANALYZE by autovacuum (log_autoanalyze_min_duration).
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com
If inside an EPQ recheck, ExecScanFetch would run the recheck method
function for foreign/custom joins even if they aren't descendant nodes
in the EPQ recheck plan tree, which is problematic at least in the
foreign-join case, because such a foreign join isn't guaranteed to have
an alternative local-join plan required for running the recheck method
function; in the postgres_fdw case this could lead to a segmentation
fault or an assert failure in an assert-enabled build when running the
recheck method function.
Even if inside an EPQ recheck, any scan nodes that aren't descendant
ones in the EPQ recheck plan tree should be normally processed by using
the access method function; fix by modifying ExecScanFetch so that if
inside an EPQ recheck, it runs the recheck method function for
foreign/custom joins that are descendant nodes in the EPQ recheck plan
tree as before and runs the access method function for foreign/custom
joins that aren't.
This fix also adds to postgres_fdw an isolation test for an EPQ recheck
that caused issues stated above.
Oversight in commit 385f337c9.
Reported-by: Kristian Lejao <kristianlejao@gmail.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com
Backpatch-through: 13
This essentially reverts commit 866566a690, which installed
safeguards against loading plpython2 and plpython3 into the same
process. We don't support plpython2 anymore, so this is obsolete.
The Python and PL/Python initialization now happens again in
_PG_init() rather than the first time a PL/Python call handler is
invoked. (Often, these will be very close together.)
I kept the separate PLy_initialize() function introduced by
866566a690 to keep _PG_init() a bit modular.
Reviewed-by: Mario González Troncoso <gonzalemario@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9eb9feb6-1df3-4f0c-a0dc-9bcf35273111%40eisentraut.org
pg_createsubscriber is documented as requiring the same major version as
the target clusters. Attempting to use this tool on a cluster where the
control file version read does not match with the version compiled with
would lead to the following error message:
pg_createsubscriber: error: control file appears to be corrupt
This is confusing as the control file is correct: only the version
expected does not match. This commit integrates pg_createsubscriber
with the facility added by cd0be131ba, where the contents of
PG_VERSION are read and compared with the value of PG_MAJORVERSION_NUM
expected by the tool. This puts pg_createsubscriber in line with the
documentation, with a better error message when the version does not
match.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aONDWig0bIGilixs@paquier.xyz
pg_resetwal's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba. This removes some code.
One thing specific to pg_resetwal is that the first line of PG_VERSION
is read and reported in the error report generated when the major
version read does not match with the version pg_resetwal has been
compiled with. The new logic preserves this property, without changes
to neither the error message nor the data used in the error report.
Note that as a chdir() is done within the data folder before checking the
data of PG_VERSION, get_pg_version() needs to be tweaked to look for
PG_VERSION in the current folder.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
pg_combinebackup's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba. This removes some code.
One thing specific to this tool is that backend versions older than v10
are not supported. The new code does the same checks as the previous
code.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
This reverts commit 74ac377d75.
The previous change contained a misconception about how publications
are cleaned up on the subscriber. The newly added log message could
confuse users, particularly when running pg_createsubscriber with
--dry-run - users would see a "dropping publication" message
immediately followed by a "no publications found" message.
Discussion: https://postgr.es/m/CAHut+Pu7xz1LqNvyQyvSHrV0Sw6D=e6T-Jm=gh1MRJrkuWGyBQ@mail.gmail.com
After scanning the line pointers on a heap page during the first phase
of vacuum, we use the information collected to decide whether to use
the assembled freeze plans.
Move this decision logic into a helper function to improve readability.
While here, rename a PruneState member and disambiguate some local
variables in heap_page_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
The present coding of dblink's get_rel_from_relname() predates the
introduction of RangeVarGetRelidExtended(), which provides a way to
check permissions before locking the relation. This commit adjusts
get_rel_from_relname() to use that function.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/aOgmi6avE6qMw_6t%40nathan
Previously WinCheckAndInitializeNullTreatment() used elog() to emit an
error message. ereport() should be used instead because it's a
user-facing error. Also use existing get_func_name() to get a
function's name, rather than own implementation.
Moreover add an assertion to validate winobj parameter, just like
other window function API.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us
get_pg_version() is able to return a version number, that can be used
for comparisons based on PG_VERSION_NUM. A macro is added to convert
the result to a major version number, to work with PG_MAJORVERSION_NUM.
It is possible to pass to the routine an optional argument, where the
contents retrieved from PG_VERSION are saved. This requirement matters
for some of the frontend code (one example: pg_upgrade wants that for
tablespace paths with a version number strictly older than v10).
This will be used by a set of follow-up patches, to be consumed in
various frontend tools that duplicate a logic similar to do what this
new routine does, like:
- pg_resetwal
- pg_combinebackup
- pg_createsubscriber
- pg_upgrade
This routine supports both the post-v10 version number and the older
flavor (aka 9.6), as required at least by pg_upgrade.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
The version number calculated by read_pg_version_file() is multiplied
once by 10000, to be able to do comparisons based on PG_VERSION_NUM or
equivalents with a minor version included. However, the version number
given sync_pgdata() was multiplied by 10000 a second time, leading to an
overestimated number.
This issue was harmless (still incorrect) as pg_combinebackup does not
support versions of Postgres older than v10, and sync_pgdata() only
includes a version check due to the rename of pg_xlog/ to pg_wal/. This
folder rename happened in the development cycle of v10. This would
become a problem if in the future sync_pgdata() is changed to have more
version-specific checks.
Oversight in dc21234005, so backpatch down to v17.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aOil5d0y87ZM_wsZ@paquier.xyz
Backpatch-through: 17
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.
This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).
The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13
There was some confusion around how to adjust the n_distinct estimates
for partitioned tables. Here we try and clarify that
n_distinct_inherited needs to be adjusted rather than n_distinct.
Also fix some slightly misleading text which was talking about table
size rather than table rows, fix a grammatical error, and adjust some
text which indicated that ANALYZE was performing calculations based on
the n_distinct settings. Really it's the query planner that does this
and ANALYZE only stores the overridden n_distinct estimate value in
pg_statistic.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvrL7a-ZytM1SP8Uk9nEw9bR2CPzVb+uP+bcNj=_q-ZmVw@mail.gmail.com
I was distressed to find that reading an LZ4-compressed toc.dat
file was hundreds of times slower than it ought to be. On
investigation, the blame mostly affixes to LZ4Stream_read_overflow's
habit of memmove'ing all the remaining buffered data after each read
operation. Since reading a TOC file tends to involve a lot of small
(even one-byte) decompression calls, that amounts to an O(N^2) cost.
This could have been fixed with a minimal patch, but to my
eyes LZ4Stream_read_internal and LZ4Stream_read_overflow are
badly-written spaghetti code; in particular the eol_flag logic
is inefficient and duplicative. I chose to throw the code
away and rewrite from scratch. This version is about sixty
lines shorter as well as not having the performance issue.
Fortunately, AFAICT the only way to get to this problem is to
manually LZ4-compress the toc.dat and/or blobs.toc files within a
directory-style archive; in the main data files, we read blocks
that are large enough that the O(N^2) behavior doesn't manifest.
Few people do that, which likely explains the lack of field
complaints. Otherwise this performance bug might be considered
bad enough to warrant back-patching.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Both of these modules dumped each bit of output that they got from
the underlying compression library as a separate "data block" in
the emitted archive file. In the case of zstd this'd frequently
result in block sizes well under 100 bytes; lz4 is a little better
but still produces blocks around 300 bytes, at least in the test
case I tried. This bloats the archive file a little bit compared
to larger block sizes, but the real problem is that when pg_restore
has to skip each data block rather than seeking directly to some
target data, tiny block sizes are enormously inefficient.
Fix both modules so that they fill their allocated buffer reasonably
well before dumping a data block. In the case of lz4, also delete
some redundant logic that caused the lz4 frame header to be emitted
as a separate data block. (That saves little, but I see no reason
to expend extra code to get worse results.)
I fixed the "stream API" code too. In those cases, feeding small
amounts of data to fwrite() probably doesn't have any meaningful
performance consequences. But it seems like a bad idea to leave
the two sets of code doing the same thing in two different ways.
In passing, remove unnecessary "extra paranoia" check in
_ZstdWriteCommon. _CustomWriteFunc (the only possible referent
of cs->writeF) already protects itself against zero-length writes,
and it's really a modularity violation for _ZstdWriteCommon to know
that the custom format disallows empty data blocks. Also, fix
Zstd_read_internal to do less work when passed size == 0.
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
pg_dump expects a read request of zero bytes to be a no-op; see for
example ReadStr(). Gzip_read got this wrong and falsely supposed
that the resulting gzret == 0 indicated an error. We could complicate
that error-checking logic some more, but it seems best to just fall
out immediately when passed size == 0.
This bug breaks the nominally-supported case of manually gzip'ing
the toc.dat file within a directory-style dump, so back-patch to v16
where this code came in. (Prior branches already have a short-circuit
for size == 0 before their only gzread call.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 16
Since protocol version 3.2 the CancelRequest does not have a fixed size
length anymore. The protocol docs still listed the length field to be a
constant number though. This fixes that.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reported-by: Dmitry Igrishin <dmitigr@gmail.com>
Backpatch-through: 18
Remove a variable that is no longer in use following commit 9a2e2a28.
It's not immediately clear why there were no compiler warnings about
this oversight.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
In commit a45c78e32 I removed the only regression test case that
reaches this function, because it turns out that we only use it
if reading an LZ4-compressed blobs.toc file in a directory dump,
and that is a state that has to be created manually. That seems
like a bad thing to not test, not so much for LZ4Stream_gets()
itself as because it means the squirrely eol_flag logic in
LZ4Stream_read_internal() is not tested.
The reason for the change was that I thought the lz4 program did not
have any way to perform compression without explicit specification
of the output file name. However, it turns out that the syntax
synopsis in its man page is a lie, and if you read enough of the
man page you find out that with "-m" it will do what's needful.
So restore the manual compression step in that test case.
Noted while testing some proposed changes in pg_dump's compression
logic.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 17
Commit 71f4c8c6f7 (which implemented DETACH CONCURRENTLY) added code
to create a separate table constraint when a table is detached
concurrently, identical to the partition constraint, on the theory that
such a constraint was needed in case the optimizer had constructed any
query plans that depended on the constraint being there. However, that
theory was apparently bogus because any such plans would be invalidated.
For hash partitioning, those constraints are problematic, because their
expressions reference the OID of the parent partitioned table, to which
the detached table is no longer related; this causes all sorts of
problems (such as inability of restoring a pg_dump of that table, and
the table no longer working properly if the partitioned table is later
dropped).
We'd like to get rid of all those constraints. In fact, for branch
master, do that -- no longer create any substitute constraints.
However, out of fear that some users might somehow depend on these
constraints for other partitioning strategies, for stable branches
(back to 14, which added DETACH CONCURRENTLY), only do it for hash
partitioning.
(If you repeatedly DETACH CONCURRENTLY and then ATTACH a partition, then
with this constraint addition you don't need to scan the table in the
ATTACH step, which presumably is good. But if users really valued this
feature, they would have requested that it worked for non-concurrent
DETACH also.)
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reported-by: Fei Changhong <feichanghong@qq.com>
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/18371-7fef49f63de13f02@postgresql.org
Discussion: https://postgr.es/m/19070-781326347ade7c57@postgresql.org
An assertion in _bt_killitems expected the scan's currPos state to
contain a valid LSN, saved from when currPos's page was initially read.
The assertion failed to account for the fact that even logged relations
can have leaf pages with an invalid LSN when built with wal_level set to
"minimal". Remove the faulty assertion.
Oversight in commit e6eed40e (though note that the assertion was
backpatched to stable branches before 18 by commit 7c319f54).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl>
Bug: #19082
Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org
Backpatch-through: 13
An error happening while a slot data is saved on disk in
SaveSlotToPath() could cause a state.tmp file (temporary file holding
the slot state data, renamed to its permanent name at the end of the
function) to remain around after it has been created. This temporary
file is created with O_EXCL, meaning that if an existing state.tmp is
found, its creation would fail. This would prevent the slot data to be
saved, requiring a manual intervention to remove state.tmp before being
able to save again a slot. Possible scenarios where this temporary file
could remain on disk is for example a ENOSPC case (no disk space) while
writing, syncing or renaming it. The bug reports point to a write
failure as the principal cause of the problems.
Using O_TRUNC has been argued back in 2019 as a potential solution to
discard any temporary file that could exist. This solution was rejected
as O_EXCL can also act as a safety measure when saving the slot state,
crash recovery offering cleanup guarantees post-crash. This commit uses
the alternative approach that has been suggested by Andres Freund back
in 2019. When the temporary state file cannot be written, synced,
closed or renamed (note: not when created!), an unlink() is used to
remove the temporary state file while holding the in-progress I/O
LWLock, so as any follow-up attempts to save a slot's data would not
choke on an existing file that remained around because of a previous
failure.
This problem has been reported a few times across the years, going back
to 2019, but for some reason I have never come back to do something
about it and it has been forgotten. A recent report has reminded me
that this was still a problem.
Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Reported-by: Sergei Kornilov <sk@zsrv.org>
Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru>
Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com
Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
Backpatch-through: 13