regexp_matches, regexp_split_to_table and regexp_split_to_array all
work by compiling a list of match positions as character offsets (NOT
byte positions) in the source string.
Formerly, they then used text_substr to extract the matched text; but
in a multi-byte encoding, that counts the characters in the string,
and the characters needed to reach the starting byte position, on
every call. Accordingly, the performance degraded as the product of
the input string length and the number of match positions, such that
splitting a string of a few hundred kbytes could take many minutes.
Repair by keeping the wide-character copy of the input string
available (only in the case where encoding_max_length is not 1) after
performing the match operation, and extracting substrings from that
instead. This reduces the complexity to being linear in the number of
result bytes, discounting the actual regexp match itself (which is not
affected by this patch).
In passing, remove cleanup using retail pfree() which was obsoleted by
commit ff428cded (Feb 2008) which made cleanup of SRF multi-call
contexts automatic. Also increase (to ~134 million) the maximum number
of matches and provide an error message when it is reached.
Backpatch all the way because this has been wrong forever.
Analysis and patch by me; review by Kaiting Chen.
Discussion: https://postgr.es/m/87pnyn55qh.fsf@news-spur.riddles.org.uk
see also https://postgr.es/m/87lg996g4r.fsf@news-spur.riddles.org.uk
The previous coding figured it'd be good enough to postpone opening
the first CSV log file until we got a message we needed to write there.
This is unsafe, though, because if the open fails we end up in infinite
recursion trying to report the failure. Instead make the CSV log file
management code look as nearly as possible like the longstanding logic
for the stderr log file. In particular, open it immediately at postmaster
startup (if enabled), or when we get a SIGHUP in which we find that
log_destination has been changed to enable CSV logging.
It seems OK to fail if a postmaster-start-time open attempt fails, as
we've long done for the stderr log file. But we can't die if we fail
to open a CSV log file during SIGHUP, so we're still left with a problem.
In that case, write any output meant for the CSV log file to the stderr
log file. (This will also cover race-condition cases in which backends
send CSV log data before or after we have the CSV log file open.)
This patch also fixes an ancient oversight that, if CSV logging was
turned off during a SIGHUP, we never actually closed the last CSV
log file.
In passing, remember to reset whereToSendOutput = DestNone during syslogger
start, since (unlike all other postmaster children) it's forked before the
postmaster has done that. This made for a platform-dependent difference
in error reporting behavior between the syslogger and other children:
except on Windows, it'd report problems to the original postmaster stderr
as well as the normal error log file(s). It's barely possible that that
was intentional at some point; but it doesn't seem likely to be desirable
in production, and the platform dependency definitely isn't desirable.
Per report from Alexander Kukushkin. It's been like this for a long time,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com
The lexer's handling of operators contained an O(N^3) hazard when
dealing with long strings of + or - characters; it seems hard to
prevent this case from being O(N^2), but the additional N multiplier
was not needed.
Backpatch all the way since this has been there since 7.x, and it
presents at least a mild hazard in that trying to do Bind, PREPARE or
EXPLAIN on a hostile query could take excessive time (without
honouring cancels or timeouts) even if the query was never executed.
While monitoring the code, a couple of issues related to string
translation has showed up:
- Some routines for auto-updatable views return an error string, which
sometimes missed the shot. A comment regarding string translation is
added for each routine to help with future features.
- GSSAPI authentication missed two translations.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 9.3
Previously, this code blindly followed the common coding pattern of
passing PQserverVersion(AH->connection) as the server-version parameter
of fmtQualifiedId. That works as long as we have a connection; but in
pg_restore with text output, we don't. Instead we got a zero from
PQserverVersion, which fmtQualifiedId interpreted as "server is too old to
have schemas", and so the name went unqualified. That still accidentally
managed to work in many cases, which is probably why this ancient bug went
undetected for so long. It only became obvious in the wake of the changes
to force dump/restore to execute with restricted search_path.
In HEAD/v11, let's deal with this by ripping out fmtQualifiedId's server-
version behavioral dependency, and just making it schema-qualify all the
time. We no longer support pg_dump from servers old enough to need the
ability to omit schema name, let alone restoring to them. (Also, the few
callers outside pg_dump already didn't work with pre-schema servers.)
In older branches, that's not an acceptable solution, so instead just
tweak the DISABLE/ENABLE TRIGGER logic to ensure it will schema-qualify
its output regardless of server version.
Per bug #15338 from Oleg somebody. Back-patch to all supported branches.
Discussion: https://postgr.es/m/153452458706.1316.5328079417086507743@wrigleys.postgresql.org
When executing a SubPlan in an expression, the EState's direction
field was left alone, resulting in an attempt to execute the subplan
backwards if it was encountered during a backwards scan of a cursor.
Also, though much less likely, it was possible to reach the execution
of an InitPlan while in backwards-scan state.
Repair by saving/restoring estate->es_direction and forcing forward
scan mode in the relevant places.
Backpatch all the way, since this has been broken since 8.3 (prior to
commit c7ff7663e, SubPlans had their own EStates rather than sharing
the parent plan's, so there was no confusion over scan direction).
Per bug #15336 reported by Vladimir Baranoff; analysis and patch by
me, review by Tom Lane.
Discussion: https://postgr.es/m/153449812167.1304.1741624125628126322@wrigleys.postgresql.org
Commit 777e6ddf1723306bd2bf8fe6f804863f459b0323 checked for a shut down
message from a standby and allowed it to continue. This patch reports a
helpful error message in these cases, suggesting to use rsync as
documented.
Diagnosed-by: Martín Marqués
Discussion: https://postgr.es/m/CAPdiE1xYCow-reLjrhJ9DqrMu-ppNq0ChUUEvVdxhdjGRD5_eA@mail.gmail.com
Backpatch-through: 9.3
The function was forgetting to close the file descriptor, resulting
in failures like this:
ERROR: 53000: exceeded maxAllocatedDescs (492) while trying to open
file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
LOCATION: OpenTransientFile, fd.c:2161
Simply close the file at the end, and backpatch to 9.4 (where logical
decoding was introduced). While at it, fix a nearby typo.
Discussion: https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com
C99 says that the result should be the number of bytes that would have
been emitted given a large enough buffer, not the number we actually
were able to put in the buffer. It's time to make our substitute
implementation comply with that. Not doing so results in inefficiency
in buffer-enlargement cases, and also poses a portability hazard for
third-party code that might expect C99-compliant snprintf behavior
within Postgres.
In passing, remove useless tests for str == NULL; neither C99 nor
predecessor standards ever allowed that except when count == 0,
so I see no reason to expend cycles on making that a non-crash case
for this implementation. Also, don't waste a byte in pg_vfprintf's
local I/O buffer; this might have performance benefits by allowing
aligned writes during flushbuffer calls.
Back-patch of commit 805889d7d. There was some concern about this
possibly breaking code that assumes pre-C99 behavior, but there is
much more risk (and reality, in our own code) of code that assumes
C99 behavior and hence fails to detect buffer overrun without this.
Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
Fix a small number of places that were testing the result of snprintf()
but doing so incorrectly. The right test for buffer overrun, per C99,
is "result >= bufsize" not "result > bufsize". Some places were also
checking for failure with "result == -1", but the standard only says
that a negative value is delivered on failure.
(Note that this only makes these places correct if snprintf() delivers
C99-compliant results. But at least now these places are consistent
with all the other places where we assume that.)
Also, make psql_start_test() and isolation_start_test() check for
buffer overrun while constructing their shell commands. There seems
like a higher risk of overrun, with more severe consequences, here
than there is for the individual file paths that are made elsewhere
in the same functions, so this seemed like a worthwhile change.
Also fix guc.c's do_serialize() to initialize errno = 0 before
calling vsnprintf. In principle, this should be unnecessary because
vsnprintf should have set errno if it returns a failure indication ...
but the other two places this coding pattern is cribbed from don't
assume that, so let's be consistent.
These errors are all very old, so back-patch as appropriate. I think
that only the shell command overrun cases are even theoretically
reachable in practice, but there's not much point in erroneous error
checks.
Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
The example output of pg_replication_slot is wrong. Correct it and make
the output stable by explicitly listing columns to output.
Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20180731.190909.42582169.horiguchi.kyotaro@lab.ntt.co.jp
Commit 244142d32afd02e7408a2ef1f249b00393983822 only tested for the
pg_controldata output for primary servers, but standby servers have
different "Database cluster state" output, so check for that too.
Diagnosed-by: Michael Paquier
Discussion: https://postgr.es/m/20180810164240.GM13638@paquier.xyz
Backpatch-through: 9.3
exit() is not async-signal safe. Even if the libc implementation is, 3rd
party libraries might have installed unsafe atexit() callbacks. After
receiving SIGQUIT, we really just want to exit as quickly as possible, so
we don't really want to run the atexit() callbacks anyway.
The original report by Jimmy Yih was a self-deadlock in startup_die().
However, this patch doesn't address that scenario; the signal handling
while waiting for the startup packet is more complicated. But at least this
alleviates similar problems in the SIGQUIT handlers, like that reported
by Asim R P later in the same thread.
Backpatch to 9.3 (all supported versions).
Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
CreateUserMapping has a recordDependencyOnCurrentExtension call that's
been there since extensions were introduced (very possibly my fault).
However, there's no support anywhere else for user mappings as members
of extensions, nor are they listed as a possible member object type in
the documentation. Nor does it really seem like a good idea for user
mappings to belong to extensions when roles don't. Hence, remove the
bogus call.
(As we saw in bug #15310, the lack of any pg_dump support for this case
ensures that any such membership record would silently disappear during
pg_upgrade. So there's probably no need for us to do anything else
about cleaning up after this mistake.)
Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us
Since commit c8e8b5a6e, this has been zeroed out using the wrong length.
In practice the length would always be too small, leading to not zeroing
the whole buffer rather than clobbering additional memory; and that's
pretty harmless, both because shmem would likely start out as zeroes
and because we'd reinitialize any given entry before use. Still,
it's bogus, so fix it.
Reported by Petru-Florin Mihancea (bug #15312)
Discussion: https://postgr.es/m/153363913073.1303.6518849192351268091@wrigleys.postgresql.org
pg_dump with --binary-upgrade must emit ALTER EXTENSION ADD commands
for all objects that are members of extensions. It forgot to do so for
event triggers, as per bug #15310 from Nick Barnes. Back-patch to 9.3
where event triggers were introduced.
Haribabu Kommi
Discussion: https://postgr.es/m/153360083872.1395.4593932457718151600@wrigleys.postgresql.org
The original coding here (which is, I believe, my fault) supposed that
it didn't need to concern itself with the possibility that one object
of a given type-priority has a namespace while another doesn't. But
that's not reliably true anymore, if it ever was; and if it does happen
then it's possible that DOTypeNameCompare returns self-inconsistent
comparison results. That leads to unspecified behavior in qsort()
and a resultant weird output order from pg_dump.
This should end up being only a cosmetic problem, because any ordering
constraints that actually matter should be enforced by the later
dependency-based sort. Still, it's a bug, so back-patch.
Report and fix by Jacob Champion, though I editorialized on his
patch to the extent of making NULL sort after non-NULL, for consistency
with our usual sorting definitions.
Discussion: https://postgr.es/m/CABAq_6Hw+V-Kj7PNfD5tgOaWT_-qaYkc+SRmJkPLeUjYXLdxwQ@mail.gmail.com
The logic in PQconnectPoll() did not take care to ensure that all of
a PGconn's internal state variables were reset before trying a new
connection attempt. If we got far enough in the connection sequence
to have changed any of these variables, and then decided to try a new
server address or server name, the new connection might be completed
with some state that really only applied to the failed connection.
While this has assorted bad consequences, the only one that is clearly
a security issue is that password_needed didn't get reset, so that
if the first server asked for a password and the second didn't,
PQconnectionUsedPassword() would return an incorrect result. This
could be leveraged by unprivileged users of dblink or postgres_fdw
to allow them to use server-side login credentials that they should
not be able to use.
Other notable problems include the possibility of forcing a v2-protocol
connection to a server capable of supporting v3, or overriding
"sslmode=prefer" to cause a non-encrypted connection to a server that
would have accepted an encrypted one. Those are certainly bugs but
it's harder to paint them as security problems in themselves. However,
forcing a v2-protocol connection could result in libpq having a wrong
idea of the server's standard_conforming_strings setting, which opens
the door to SQL-injection attacks. The extent to which that's actually
a problem, given the prerequisite that the attacker needs control of
the client's connection parameters, is unclear.
These problems have existed for a long time, but became more easily
exploitable in v10, both because it introduced easy ways to force libpq
to abandon a connection attempt at a late stage and then try another one
(rather than just giving up), and because it provided an easy way to
specify multiple target hosts.
Fix by rearranging PQconnectPoll's state machine to provide centralized
places to reset state properly when moving to a new target host or when
dropping and retrying a connection to the same host.
Tom Lane, reviewed by Noah Misch. Our thanks to Andrew Krasichkov
for finding and reporting the problem.
Security: CVE-2018-10915
6cb3372 enforces errno to ENOSPC when less bytes than what is expected
have been written when it is unset, though it forgot to properly reset
errno before doing a system call to write(), causing errno to
potentially come from a previous system call.
Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us
It's necessary to make sure that owning tables have a relcache
invalidation prior to advancing the command counter to make
newly-entered catalog tuples for the index visible. inval.c must be
able to maintain the consistency of the local caches in the event of
transaction abort. There is usually only a problem when CREATE INDEX
transactions abort, since there is a generic invalidation once we reach
index_update_stats().
This bug is of long standing. Problems were made much more likely by
the addition of parallel CREATE INDEX (commit 9da0cc35284), but it is
strongly suspected that similar problems can be triggered without
involving plan_create_index_workers(). (plan_create_index_workers()
triggers a relcache build or rebuild, which previously only happened in
rare edge cases.)
Author: Peter Geoghegan
Reported-By: Luca Ferrari
Diagnosed-By: Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAKoxK+5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf=HEQ@mail.gmail.com
Backpatch: 9.3-
Commits 742869946 et al turn out to be a couple bricks shy of a load.
We were dumping the stored values of GUC_LIST_QUOTE variables as they
appear in proconfig or setconfig catalog columns. However, although that
quoting rule looks a lot like SQL-identifier double quotes, there are two
critical differences: empty strings ("") are legal, and depending on which
variable you're considering, values longer than NAMEDATALEN might be valid
too. So the current technique fails altogether on empty-string list
entries (as reported by Steven Winfield in bug #15248) and it also risks
truncating file pathnames during dump/reload of GUC values that are lists
of pathnames.
To fix, split the stored value without any downcasing or truncation,
and then emit each element as a SQL string literal.
This is a tad annoying, because we now have three copies of the
comma-separated-string splitting logic in varlena.c as well as a fourth
one in dumputils.c. (Not to mention the randomly-different-from-those
splitting logic in libpq...) I looked at unifying these, but it would
be rather a mess unless we're willing to tweak the API definitions of
SplitIdentifierString, SplitDirectoriesString, or both. That might be
worth doing in future; but it seems pretty unsafe for a back-patched
bug fix, so for now accept the duplication.
Back-patch to all supported branches, as the previous fix was.
Discussion: https://postgr.es/m/7585.1529435872@sss.pgh.pa.us
pg_dump knew about printing ALTER TABLE ... REPLICA IDENTITY USING INDEX
for indexes declared as indexes, but it failed to print that for indexes
declared as unique or primary-key constraints. Per report from Achilleas
Mantzios.
This has been broken since the feature was introduced, AFAICS.
Back-patch to 9.4.
Discussion: https://postgr.es/m/1e6cc5ad-b84a-7c07-8c08-a4d0c3cdc938@matrix.gatewaynet.com
Affected test queries have been testing the wrong thing since their
introduction in commit 4c1383efd132e4f532213c8a8cc63a455f55e344.
Back-patch to 9.3 (all supported versions).
Commit 5770172cb0c9df9e6ce27c507b449557e5b45124 documented secure schema
usage, and that advice suffices for using unqualified names securely.
Document, in typeconv-func primarily, the additional issues that arise
with qualified names. Back-patch to 9.3 (all supported versions).
Reviewed by Jonathan S. Katz.
Discussion: https://postgr.es/m/20180721012446.GA1840594@rfd.leadboat.com
Previously pg_upgrade checked for the pid file and started/stopped the
server to force a clean shutdown. However, "pg_ctl -m immediate"
removes the pid file but doesn't do a clean shutdown, so check
pg_controldata for a clean shutdown too.
Diagnosed-by: Vimalraj A
Discussion: https://postgr.es/m/CAFKBAK5e4Q-oTUuPPJ56EU_d2Rzodq6GWKS3ncAk3xo7hAsOZg@mail.gmail.com
Backpatch-through: 9.3
I blew the dust off a Bourne shell (file date 1996, yea verily) and
tried to run test.sh with it. It mostly worked, but I found that the
temp-directory creation code introduced by commit be76a6d39 was not
compatible, for a couple of reasons: this shell thinks "set -e" should
force an exit if a command within backticks fails, and it also thinks code
within braces should be executed by a sub-shell, meaning that variable
settings don't propagate back up to the parent shell. In view of Victor
Wagner's report that Solaris is still using pre-POSIX shells, seems like
we oughta make this case work. It's not like the code is any less
idiomatic this way; the prior coding technique appeared nowhere else.
(There is a remaining bash-ism here, which is that $RANDOM doesn't do
what the code hopes in non-bash shells. But the use of $$ elsewhere in
that path should be enough to ensure uniqueness and some amount of
randomness, so I think it's okay as-is.)
Back-patch to all supported branches, as the previous commit was.
Discussion: https://postgr.es/m/20180720153820.69e9ae6c@fafnir.local.vm
PostgreSQL 9.4 introduces posting list compression in GIN. This feature
supports online upgrade, so that after pg_upgrade uncompressed posting
lists are compressed on-the-fly. Underlying code appears to always
expect at least one item on uncompressed posting list page. But there
could be completely empty pages, because VACUUM never deletes leftmost
and rightmost pages from posting trees. This commit fixes that.
Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com
Author: Sivasubramanian Ramasubramanian, Alexander Korotkov
Backpatch-through: 9.4
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.
Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
lca_inner() wasn't prepared for the possibility of getting no inputs.
Fix that, and make some cosmetic improvements to the code while at it.
Also, I thought the documentation of this function as returning the
"longest common prefix" of the paths was entirely misleading; it really
returns a path one shorter than the longest common prefix, for the typical
definition of "prefix". Don't use that term in the docs, and adjust the
examples to clarify what really happens.
This has been broken since its beginning, so back-patch to all supported
branches.
Per report from Hailong Li. Thanks to Pierre Ducroquet for diagnosing
and for the initial patch, though I whacked it around some and added
test cases.
Discussion: https://postgr.es/m/5b0d8e4f-f2a3-1305-d612-e00e35a7be66@qunar.com
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking. There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.
This does not add any locking overhead in the normal code path where the
page is OK. It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.
Problem noted by R P Asim. It's been like this for a long time, so
back-patch to all supported branches.
Discussion: https://postgr.es/m/CANXE4Te4G0TGq6cr0-TvwP0H4BNiK_-hB5gHe8mF+nz0mcYfMQ@mail.gmail.com
Explain that you can use any integer expression for the "count" in
pl/pgsql's versions of FETCH/MOVE, unlike the SQL versions which only
allow a constant.
Remove the duplicate version of this para under MOVE. I don't see
a good reason to maintain two identical paras when we just said that
MOVE works exactly like FETCH.
Per Pavel Stehule, though I didn't use his text.
Discussion: https://postgr.es/m/CAFj8pRAcvSXcNdUGx43bOK1e3NNPbQny7neoTLN42af+8MYWEA@mail.gmail.com
WAL senders sending logically-decoded data fail to properly report in
"streaming" state when starting up, hence as long as one extra record is
not replayed, such WAL senders would remain in a "catchup" state, which
is inconsistent with the physical cousin.
This can be easily reproduced by for example using pg_recvlogical and
restarting the upstream server. The TAP tests have been slightly
modified to detect the failure and strengthened so as future tests also
make sure that a node is in streaming state when waiting for its
catchup.
Backpatch down to 9.4 where this code has been introduced.
Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko
Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
Commit fafa374f2 caused _bt_getbuf() to possibly emit a WAL record for
a page that it was about to recycle. However, it failed to distinguish
all-zero pages from dead pages, which is important because only the
latter have valid btpo.xact values, or indeed any special space at all.
Recycling an all-zero page with XLogStandbyInfoActive() enabled therefore
led to an Assert failure, or to emission of a WAL record containing a
bogus cutoff XID, which might lead to unnecessary query cancellations
on hot standby servers.
Per reports from Antonin Houska and 自己. Amit Kapila was first to
propose this fix, and Robert Haas, myself, and Kyotaro Horiguchi
reviewed it at various times.
This is an old bug, so back-patch to all supported branches.
Discussion: https://postgr.es/m/2628.1474272158@localhost
Discussion: https://postgr.es/m/48875502.f4a0.1635f0c27b0.Coremail.zoulx1982@163.com
Back-patch commit dddfc4cb2, which broke LDFLAGS and related Makefile
variables into two parts, one for within-build-tree library references and
one for external libraries, to ensure that the order of -L flags has all
of the former before all of the latter. This turns out to fix a problem
recently noted on buildfarm member peripatus, that we attempted to
incorporate code from libpgport.a into a shared library. That will fail on
platforms that are sticky about putting non-PIC code into shared libraries.
(It's quite surprising we hadn't seen such failures before, since the code
in question has been like that for a long time.)
I think that peripatus' problem could have been fixed with just a subset
of this patch; but since the previous issue of accidentally linking to the
wrong copy of a Postgres shlib seems likely to bite people in the field,
let's just back-patch the whole change. Now that commit dddfc4cb2 has
survived some beta testing, I'm less afraid to back-patch it than I was
at the time.
This also fixes undesired inclusion of "-DFRONTEND" in pg_config's CPPFLAGS
output (in 9.6 and up) and undesired inclusion of "-L../../src/common" in
its LDFLAGS output (in all supported branches).
Back-patch to v10 and older branches; this is already in v11.
Discussion: https://postgr.es/m/20180704234304.bq2dxispefl65odz@ler-imac.local
If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position older than the first end-of-recovery checkpoint while
all the WAL available should be replayed. This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed. Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.
Pavan Deolasee has reported and diagnosed the failure in the first
place, and the base fix idea to rely on the local copy of
minRecoveryPoint comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me. The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to make it faster and more reliable with sleep phases.
Backpatch down to all supported versions where the bug appears, aka 9.3
which is where the end-of-recovery checkpoint is not run by the startup
process anymore. The test gets easily supported down to 10, still it
has been tested on all branches.
Reported-by: Pavan Deolasee
Diagnosed-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
When deleting pages the nbtree code has to walk through siblings of a
tree node. When those sibling links are corrupted that can lead to
endless loops - which are currently not interruptible. This is
especially problematic if autovacuum is repeatedly blocked on such
indexes, as it can be hard to get out of that situation without
resorting to single user mode.
Thus add interrupt checks to appropriate places in such
loops. Unfortunately in one of the cases it's it's not easy to do so.
Between 9.3 and 9.4 the page deletion (and page split) code changed
significantly. Before it was significantly less robust against
interruptions. Therefore don't backpatch to 9.3.
Author: Andres Freund
Discussion: https://postgr.es/m/20180627191629.wkunw2qbibnvlz53@alap3.anarazel.de
Backpatch: 9.4-
When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
previously, during recovery, smgrdounlink() (not smgrdounlinkall()) was
called for each file to delete, which led to scan shared_buffers
multiple times. Obviously this could cause to increase the WAL replay
time very much especially when shared_buffers was huge.
To alleviate this situation, this commit changes the recovery so that
it also calls smgrdounlinkall() only one time to delete multiple
relation files.
This is just fix for oversight of commit 279628a0a7, not new feature.
So, per discussion on pgsql-hackers, we concluded to backpatch this
to all supported versions.
Author: Fujii Masao
Reviewed-by: Michael Paquier, Andres Freund, Thomas Munro, Kyotaro Horiguchi, Takayuki Tsunakawa
Discussion: https://postgr.es/m/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com
When these programs call pg_catalog.set_config, they need to check for
PGRES_TUPLES_OK instead of PGRES_COMMAND_OK. Fix for
5770172cb0c9df9e6ce27c507b449557e5b45124.
Reported-by: Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com>