DST law changes in Morocco and the Canadian Yukon.
Historical corrections for Shanghai.
The America/Godthab zone is renamed to America/Nuuk to reflect
current English usage; however, the old name remains available as a
compatibility link.
The test is proving to have timing issues when looking at archive status
files on standbys after crash recovery, while other parts of the test
rely on pg_stat_archiver as a wait point to make sure that a given state
of the archiving is reached. The coverage is not heavily impacted by
the removal those extra tests.
Per reports from several buildfarm animals, like crake, piculet,
culicidae and francolin.
Discussion: https://postgr.es/m/20200424005929.GK33034@paquier.xyz
Backpatch-through: 9.5
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode. However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.
This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not. Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.
A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.
Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
In a9c35cf85ca I changed ExecMakeTableFunctionResult() to dynamically
allocate the FunctionCallInfo used to call the SRF. Unfortunately I
did not account for the fact that the surrounding memory context has
query lifetime, leading to a leak till the end of the query.
In most cases the leak is fairly inconsequential, but if the
FunctionScan is done many times in the query, the leak can add
up. This happens e.g. if the function scan is on the inner side of a
nested loop, due to a lateral join.
EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f;
quickly shows the leak.
Instead of explicitly freeing the FunctionCallInfo it seems better to
make sure all the per-set temporary state in
ExecMakeTableFunctionResult() is cleaned up wholesale. Currently
that's probably just the FunctionCallInfo allocation, but since
there's some initialization work, and since there's already an
appropriate context, this seems like a more robust approach.
Bug: #16112
Reported-By: Ben Cornett
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org
Backpatch: 12, a9c35cf85ca
Checking if Subject Alternative Names (SANs) from a certificate match
with the hostname connected to leaked memory after each lookup done.
This is broken since acd08d7 that added support for SANs in SSL
certificates, so backpatch down to 9.5.
Author: Roman Peshkurov
Reviewed-by: Hamid Akhtar, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CALLDf-pZ-E3mjxd5=bnHsDu9zHEOnpgPgdnO84E2RuwMCjjyPw@mail.gmail.com
Backpatch-through: 9.5
index.c supposed that it could just use a PG_TRY block to clean up the
state associated with an active REINDEX operation. However, that code
doesn't run if we do a FATAL exit --- for example, due to a SIGTERM
shutdown signal --- while the REINDEX is happening. And that state does
get consulted during catalog accesses, which makes it problematic if we
do any catalog accesses during shutdown --- for example, to clean up any
temp tables created in the session.
If this combination of circumstances occurred, we could find ourselves
trying to access already-freed memory. In debug builds that'd fairly
reliably cause an assertion failure. In production we might often
get away with it, but with some bad luck it could cause a core dump.
Another possible bad outcome is an erroneous conclusion that an
index-to-be-accessed is being reindexed; but it looks like that would
be unlikely to have any consequences worse than failing to drop temp
tables right away. (They'd still get dropped by the next session that
uses that temp schema.)
To fix, get rid of the use of PG_TRY here, and instead hook into
the transaction abort mechanisms to clean up reindex state.
Per bug #16378 from Alexander Lakhin. This has been wrong for a
very long time, so back-patch to all supported branches.
Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org
Working on commit 1c455078b led me to check through FunctionCallInvoke
call sites to see if every one was being honest about (a) making sure
that fcinfo.isnull is initially false, and (b) checking its state after
the call. Sure enough, I found some violations.
The main one is that finalize_partialaggregate re-used serialfn_fcinfo
without resetting isnull, even though it clearly intends to cater for
serialfns that return NULL. There would only be an issue with a
non-strict serialfn, since it's unlikely that a serialfn would return
NULL for non-null input. We have no non-strict serialfns in core, and
there may be none in the wild either, which would account for the lack
of complaints. Still, it's clearly wrong, so back-patch that fix to
9.6 where finalize_partialaggregate was introduced.
Also, arrayfuncs.c and rowtypes.c contained various callers that were
not bothering to check for result nulls. While what's being called is
a comparison or hash function that probably *shouldn't* return null,
that's a lousy excuse for not having any check at all. There are
existing places that just Assert(!fcinfo->isnull) in comparable
situations, so I added that to the places that were calling btree
comparison or hash support functions. In the places calling
boolean-returning equality functions, it's quite cheap to have them
treat isnull as FALSE, so make those places do that. Also remove some
"locfcinfo->isnull = false" assignments that are unnecessary given the
assumption that no previous call returned null. These changes seem like
mostly neatnik-ism or debugging support, so I didn't back-patch.
When a partition is detached, any triggers that had been cloned from its
parent were not properly disentangled from its parent triggers.
This resulted in triggers that could not be dropped because they
depended on the trigger in the trigger in the no-longer-parent table:
ALTER TABLE t DETACH PARTITION t1;
DROP TRIGGER trig ON t1;
ERROR: cannot drop trigger trig on table t1 because trigger trig on table t requires it
HINT: You can drop trigger trig on table t instead.
Moreover the table can no longer be re-attached to its parent, because
the trigger name is already taken:
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
ERROR: trigger "trig" for relation "t1" already exists
The former is a bug introduced in commit 86f575948c77. (The latter is
not necessarily a bug, but it makes the bug more uncomfortable.)
To avoid the complexity that would be needed to tell whether the trigger
has a local definition that has to be merged with the one coming from
the parent table, establish the behavior that the trigger is removed
when the table is detached.
Backpatch to pg11.
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com
The views pg_stat_progress_* had not gotten the memo that
pg_read_all_stats is supposed to be able to read all statistics. Also
make a pass over all text-returning pg_stat_xyz functions that could
return "insufficient privilege" and make sure they also respect
pg_read_all_status.
Reported-by: Andrey M. Borodin
Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi
Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru
We have repeatedly seen the buildfarm reach the Assert(false) in
SyncRepGetSyncStandbysPriority. This apparently is due to failing to
consider the possibility that the sync_standby_priority values in
shared memory might be inconsistent; but they will be whenever only
some of the walsenders have updated their values after a change in
the synchronous_standby_names setting. That function is vastly too
complex for what it does, anyway, so rewriting it seems better than
trying to apply a band-aid fix.
Furthermore, the API of SyncRepGetSyncStandbys is broken by design:
it returns a list of WalSnd array indexes, but there is nothing
guaranteeing that the contents of the WalSnd array remain stable.
Thus, if some walsender exits and then a new walsender process
takes over that WalSnd array slot, a caller might make use of
WAL position data that it should not, potentially leading to
incorrect decisions about whether to release transactions that
are waiting for synchronous commit.
To fix, replace SyncRepGetSyncStandbys with a new function
SyncRepGetCandidateStandbys that copies all the required data
from shared memory while holding the relevant mutexes. If the
associated walsender process then exits, this data is still safe to
make release decisions with, since we know that that much WAL *was*
sent to a valid standby server. This incidentally means that we no
longer need to treat sync_standby_priority as protected by the
SyncRepLock rather than the per-walsender mutex.
SyncRepGetSyncStandbys is no longer used by the core code, so remove
it entirely in HEAD. However, it seems possible that external code is
relying on that function, so do not remove it from the back branches.
Instead, just remove the known-incorrect Assert. When the bug occurs,
the function will return a too-short list, which callers should treat
as meaning there are not enough sync standbys, which seems like a
reasonably safe fallback until the inconsistent state is resolved.
Moreover it's bug-compatible with what has been happening in non-assert
builds. We cannot do anything about the walsender-replacement race
condition without an API/ABI break.
The bogus assertion exists back to 9.6, but 9.6 is sufficiently
different from the later branches that the patch doesn't apply at all.
I chose to just remove the bogus assertion in 9.6, feeling that the
probability of a bad outcome from the walsender-replacement race
condition is too low to justify rewriting the whole patch for 9.6.
Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us
In some corner cases, this could also lead to corrupted values being
included in the tuple.
Users who are concerned that they are affected by this should first
upgrade and then perform a base backup of their database and restore onto
an off-line server. They should then query each table with generated
columns to ensure there are no rows where the generated expression does
not match a newly calculated version of the GENERATED ALWAYS expression.
If no crashes occur and no rows are returned then you're not affected.
Fixes bug #16369.
Reported-by: Cameron Ezell
Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org
Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)
Apparently in some language versions of Visual Studio nmake outputs some
material after the version number and before the end of the line. This
has been seen in Chinese versions. Therefore, we no longer demand that
the version string comes at the end of a line.
Per complaint from Cuiping Lin.
Backpatch to all live branches.
The result of the query used to retrieve the WAL segment size from the
backend was not getting freed in two code paths. Both pg_basebackup and
pg_receivewal exit immediately if a failure happened on this query, so
this was not an actual problem, but it could be an issue if this code
gets used for other tools in different ways, be they future tools in
this code tree or external, existing, ones.
Oversight in commit fc49e24, so backpatch down to 11.
Author: Jie Zhang
Discussion: https://postgr.es/m/970ad9508461469b9450b64027842331@G08CNEXMBPEKD06.g08.fujitsu.local
Backpatch-through: 11
ExecReScanHashJoin will destroy the join's hash table if it expects
that the inner relation will produce different rows on rescan.
Up to now it's not bothered to clear the additional pointer to that
hash table that exists in the child HashState node. However, it's
possible for the query to terminate without building a fresh hash
table (this happens if the outer relation is found to be empty
during the final rescan). So we can end with a dangling pointer
to a deleted hash table. That was harmless originally, but since
9.0 EXPLAIN ANALYZE has used that pointer to print hash table
statistics. In debug builds this reproducibly results in garbage
statistics. In non-debug builds there's frequently no ill effects,
but in principle one could get wrong EXPLAIN ANALYZE output, or
perhaps even a crash if free() has released the hashtable memory
back to the OS.
To fix, just make sure we clear the additional pointer when destroying
the hash table. In problematic cases, EXPLAIN ANALYZE will then print
no hashtable statistics (reverting to its pre-9.0 behavior). This isn't
ideal, but since the problem manifests only in unusual corner cases,
it's hard to justify taking any risks to do better in the back
branches. A follow-on patch will improve matters in HEAD.
Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.
Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
Suppress a probably-meaningless uninitialized-variable warning
(induced by my previous patch, I'm sorry to say).
Improve mark_hl_fragments()'s test for overlapping cover strings:
it failed to consider the possibility that the current string is
strictly within another one. That's unlikely given the preceding
splitting into MaxWords fragments, but I don't think it's impossible.
Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
This code could produce very poor results when asked to highlight a
string based on a query using phrase-match operators. The root cause
is that hlCover(), which is supposed to find a minimal substring that
matches the query, was written assuming that word position is not
significant. I'm only 95% convinced that its algorithm was correct even
for plain AND/OR queries; but it definitely fails completely for phrase
matches, causing it to possibly not identify a cover string at all.
Hence, rewrite hlCover() with a less-tense algorithm that just tries
all the possible substrings, earlier and shorter ones first. (This is
not as bad as it sounds performance-wise, because all of the string
matching has been done already: the repeated tsquery match checks
boil down to pointer comparisons.)
Unfortunately, since that approach produces more candidate cover
strings than before, it also exposes that there were bugs in the
heuristics in mark_hl_words() for selecting a best cover string.
Fixes there include:
* Do not apply the ShortWord filter to words that appear in the query.
* Remove a misguided optimization for quickly rejecting a cover.
* Fix order-of-operation bug that could cause computation of a
wrong figure of merit (poslen) when shortening a cover.
* Change the preference rule so that candidate headlines that do not
include their whole cover string (after MaxWords trimming) are lowest
priority, since they may not actually satisfy the user's query.
This results in some changes in existing regression test cases,
but they all seem reasonable. Note in particular that the tests
involving strings like "1 2 3" were previously being affected by
the ShortWord filter, masking the normal matching behavior.
Per bug #16345 from Augustinas Jokubauskas; the new test cases are
based on that example. Back-patch to 9.6 where phrase search was
added to tsquery.
Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
This code was woefully unreadable and under-commented. Try to improve
matters by adding comments, using some macros to make complicated
if-tests more readable, using boolean type where appropriate, etc.
There are a couple of tiny coding improvements too, but this commit
includes (I hope) no behavioral change.
Nonetheless, back-patch as far as 9.6, because a followup bug-fixing
commit depends on this.
Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
CREATE TABLE LIKE INCLUDING GENERATED would fail if a generated column
referred to a column with a higher attribute number. This is because
the column mapping mechanism created the mapping incrementally as
columns are added. This was sufficient for previous uses of that
mechanism (omitting dropped columns), and it also happened to work if
generated columns only referred to columns with lower attribute
numbers, but here it failed.
This fix is to build the attribute mapping in a separate loop before
processing the columns in detail.
Bug: #16342
Reported-by: Ethan Waldo <ewaldo@healthetechs.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Currently, we don't account for buffer usage incurred by parallel workers
for parallel create index. This commit allows each worker to record the
buffer usage stats and leader backend to accumulate that stats at the
end of the operation. This will allow pg_stat_statements to display
correct buffer usage stats for (parallel) create index command.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Julien Rouhaud and Amit Kapila
Backpatch-through: 11, where this was introduced
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
Repair an oversight in commit 8728b2c70: if we're postponing restore
of event triggers to the end, we must also postpone restoring any
comments on them, since of course we cannot create the comments first.
(This opens yet another opportunity for an event trigger to bollix
the restore, but there's no help for that.)
Per bug #16346 from Alexander Lakhin.
Like the previous commit, back-patch to all supported branches.
Hamid Akhtar and Tom Lane
Discussion: https://postgr.es/m/16346-6210ad7a0ea81be1@postgresql.org
Attempting to use a COLLATE clause with a type that it not collatable in
a partition bound expression could crash the server. This commit fixes
the code by adding more checks similar to what is done when computing
index or partition attributes by making sure that there is a collation
iff the type is collatable.
Backpatch down to 12, as 7c079d7 introduced this problem.
Reported-by: Alexander Lakhin
Author: Dmitry Dolgov
Discussion: https://postgr.es/m/16325-809194cf742313ab@postgresql.org
Backpatch-through: 12
Our documentation describes four allowed input syntaxes for circles,
but the regression tests tried only three ... with predictable
consequences. Remarkably, this has been wrong since the circle
datatype was added in 1997, but nobody noticed till now.
David Zhang, with some help from me
Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value. However, it'd be good if they could do at
least that much, and not fall over entirely for longer bytea values.
Adjust the comparisons to be done in int64 arithmetic so that works.
Also tweak the error reports to show sane values in case of overflow.
Also add some test cases to improve the miserable code coverage
of these functions.
Apply patch to back branches only; HEAD has a better solution
as of commit 26a944cf2.
Extracted from a much larger patch by Movead Li
Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER. This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly. Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.
Author: Amit Langote, Justin Pryzby
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
Backpatch-through: 9.5
There's a very low risk that RecentGlobalXmin could be far enough in
the past to be older than relfrozenxid, or even wrapped
around. Luckily the consequences of that having happened wouldn't be
too bad - the page wouldn't be pruned for a while.
Avoid that risk by using TransactionXmin instead. As that's announced
via MyPgXact->xmin, it is protected against wrapping around (see code
comments for details around relfrozenxid).
Author: Andres Freund
Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de
Backpatch: 9.5-
entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.
The posting-tree path failed to advance "advancePast" after having
decided to filter an item. If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page. Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take awhile
with small gin_fuzzy_search_limit. To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.
The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.
The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement. I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.
Refactor all three loops so that the termination conditions are
more alike and less unreadable.
The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.
Back-patch to all supported branches.
Adé Heyward and Tom Lane
Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
contrib/lo's lo_manage() thought it could use
trigdata->tg_trigger->tgname in its error message about
not being called as a trigger. That naturally led to a core dump.
unique_key_recheck() figured it could Assert that fcinfo->context
is a TriggerData node in advance of having checked that it's
being called as a trigger. That's harmless in production builds,
and perhaps not that easy to reach in any case, but it's logically
wrong.
The first of these per bug #16340 from William Crowell;
the second from manual inspection of other CALLED_AS_TRIGGER
call sites.
Back-patch the lo.c change to all supported branches, the
other to v10 where the thinko crept in.
Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
We require the partition key to be a subset of the set of columns
being made unique, so that physically-separate indexes on the different
partitions are sufficient to enforce the uniqueness constraint.
The existing code checked that the listed columns appear, but did not
inquire into the index semantics, which is a serious oversight given
that different index opclasses might enforce completely different
notions of uniqueness.
Ideally, perhaps, we'd just match the partition key opfamily to the
index opfamily. But hash partitioning uses hash opfamilies which we
can't directly match to btree opfamilies. Hence, look up the equality
operator in each family, and accept if it's the same operator. This
should be okay in a fairly general sense, since the equality operator
ought to precisely represent the opfamily's notion of uniqueness.
A remaining weak spot is that we don't have a cross-index-AM notion of
which opfamily member is "equality". But we know which one to use for
hash and btree AMs, and those are the only two that are relevant here
at present. (Any non-core AMs that know how to enforce equality are
out of luck, for now.)
Back-patch to v11 where this feature was introduced.
Guancheng Luo, revised a bit by me
Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com
In a psql session, if the connection to the server is abruptly cut, the
referenced connection would become NULL as of CheckConnection(). This
could cause a hard crash with psql if attempting to connect by reusing
the past connection's data because of a null-pointer dereference with
either PQhost() or PQdb(). This issue is fixed by making sure that no
reuse of the past connection is done if it does not exist.
Issue has been introduced by 6e5f8d4, so backpatch down to 12.
Reported-by: Hugh Wang
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/16330-b34835d83619e25d@postgresql.org
Backpatch-through: 12
Must hold some lock on the pg_statistic_ext_data catalog *before*
we look up the tuple we aim to replace. Otherwise a concurrent
VACUUM FULL or similar operation could move it to a different TID,
leaving us trying to replace the wrong tuple.
Back-patch to v12 where this got broken.
Credit goes to Dean Rasheed; I'm just doing the clerical work.
Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
Buildfarm experience shows that this function can fail with ENOENT
if some other process unlinks a file between when we read the directory
entry and when we try to stat() it. The problem is old but we had
not noticed it until 085b6b667 added regression test coverage.
To fix, just ignore ENOENT failures. There is one other case that
this might hide: a symlink that points to nowhere. That seems okay
though, at least better than erroring.
Back-patch to v10 where this function was added, since the regression
test cases were too.
Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
This reverts commit 2aa6e33, that added a fast path to skip
anti-wraparound and non-aggressive autovacuum jobs (these have no sense
as anti-wraparound implies aggressive). With a cluster using a high
amount of relations with a portion of them being heavily updated, this
could cause autovacuum to lock down, with autovacuum workers attempting
repeatedly those jobs on the same relations for the same database, that
just kept being skipped. This lock down can be solved with a manual
VACUUM FREEZE.
Justin King has reported one environment where the issue happened, and
Julien Rouhaud and I have been able to reproduce it in a second
environment. With a very aggressive autovacuum_freeze_max_age,
triggering those jobs with pgbench is a matter of minutes, and hitting
the lock down is a lot harder (my local tests failed to do that).
Note that anti-wraparound and non-aggressive jobs can only be triggered
on a subset of shared catalogs:
- pg_auth_members
- pg_authid
- pg_database
- pg_replication_origin
- pg_shseclabel
- pg_subscription
- pg_tablespace
While the lock down was possible down to v12, the root cause of those
jobs is a much older issue, which needs more analysis.
Bonus thanks to Andres Freund for the discussion.
Reported-by: Justin King
Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com
Backpatch-through: 12
INCLUDE indexes failed to have their non-key attributes physically
truncated away in certain rare cases. This led to physically larger
pivot tuples that contained useless non-key attribute values. The
impact on users should be negligible, but this is still clearly a
regression (Postgres 11 supports INCLUDE indexes, and yet was not
affected).
The bug appeared in commit dd299df8, which introduced "true" suffix
truncation of key attributes.
Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com
Backpatch: 12-, where "true" suffix truncation was introduced.
GetLocaleInfoEx() can fail on strings that setlocale() was perfectly
happy with. A common way for that to happen is if the locale string
is actually a Unix-style string, say "et_EE.UTF-8". In that case,
what's after the dot is an encoding name, not a Windows codepage number;
blindly treating it as a codepage number led to failure, with a fairly
silly error message. Hence, check to see if what's after the dot is
all digits, and if not, treat it as a literal encoding name rather than
a codepage number. This will do the right thing with many Unix-style
locale strings, and produce a more sensible error message otherwise.
Somewhat independently of that, treat a zero (CP_ACP) result from
GetLocaleInfoEx() as meaning that we must use UTF-8 encoding.
Back-patch to all supported branches.
Juan José Santamaría Flecha
Discussion: https://postgr.es/m/24905.1585445371@sss.pgh.pa.us
In 9.4 I added support to use a historical snapshot in
ScanPgRelation(), while adding logical decoding. Unfortunately a
conflict with the concurrent removal of SnapshotNow was incorrectly
resolved, leading to an unregistered snapshot being used.
It is not correct to use an unregistered (or non-active) snapshot for
anything non-trivial, because catalog invalidations can cause the
snapshot to be invalidated.
Luckily it seems unlikely to actively cause problems in practice, as
ScanPgRelation() requires that we already have a lock on the relation,
we only look for a single row, and we don't appear to rely on the
result's tid to be correct. It however is clearly wrong and potential
negative consequences would likely be hard to find. So it seems worth
backpatching the fix, even without a concrete hazard.
Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de
Backpatch: 9.5-
plpgsql_xact_cb ought to treat events XACT_EVENT_PARALLEL_COMMIT and
XACT_EVENT_PARALLEL_ABORT like XACT_EVENT_COMMIT and XACT_EVENT_ABORT
respectively, since its goal is to do process-local cleanup. This
oversight caused plpgsql's end-of-transaction cleanup to not get done
in parallel workers. Since a parallel worker will exit just after the
transaction cleanup, the effects of this are limited. I couldn't find
any case in the core code with user-visible effects, but perhaps there
are some in extensions. In any case it's wrong, so let's fix it before
it bites us not after.
In passing, add some comments around the handling of expression
evaluation resources in DO blocks. There's no live bug there, but it's
quite unobvious what's happening; at least I thought so. This isn't
related to the other issue, except that I found both things while poking
at expression-evaluation performance.
Back-patch the plpgsql_xact_cb fix to 9.5 where those event types
were introduced, and the DO-block commentary to v11 where DO blocks
gained the ability to issue COMMIT/ROLLBACK.
Discussion: https://postgr.es/m/10353.1585247879@sss.pgh.pa.us
When SaveSlotToPath() is called with elevel=LOG, the early exits didn't
release the slot's io_in_progress_lock.
This could result in a walsender being stuck on the lock forever. A
possible way to get into this situation is if the offending code paths
are triggered in a low disk space situation.
Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com>
Reported-by: Craig Ringer <craig@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com
Now that we require C99, we can depend on __VA_ARGS__ to work, and
revising ereport() to use it has several significant benefits:
* The extra parentheses around the auxiliary function calls are now
optional. Aside from being a bit less ugly, this removes a common
gotcha for new contributors, because in some cases the compiler errors
you got from forgetting them were unintelligible.
* The auxiliary function calls are now evaluated as a comma expression
list rather than as extra arguments to errfinish(). This means that
compilers can be expected to warn about no-op expressions in the list,
allowing detection of several other common mistakes such as forgetting
to add errmsg(...) when converting an elog() call to ereport().
* Unlike the situation with extra function arguments, comma expressions
are guaranteed to be evaluated left-to-right, so this removes platform
dependency in the order of the auxiliary function calls. While that
dependency hasn't caused us big problems in the past, this change does
allow dropping some rather shaky assumptions around errcontext() domain
handling.
There's no intention to make wholesale changes of existing ereport
calls, but as proof-of-concept this patch removes the extra parens
from a couple of calls in postgres.c.
While new code can be written either way, code intended to be
back-patched will need to use extra parens for awhile yet. It seems
worth back-patching this change into v12, so as to reduce the window
where we have to be careful about that by one year. Hence, this patch
is careful to preserve ABI compatibility; a followup HEAD-only patch
will make some additional simplifications.
Andres Freund and Tom Lane
Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
src/port/getopt_long.c failed on such an argument, always seeing it
as an unrecognized switch. This is unhelpful; better is to treat such
an item as a non-switch argument. That behavior is what we find in
GNU's getopt_long(); it's what src/port/getopt.c does; and it is
required by POSIX for getopt(), which getopt_long() ought to be
generally a superset of. Moreover, it's expected by ecpg, which
intends an argument of "-" to mean "read from stdin". So fix it.
Also add some documentation about ecpg's behavior in this area, since
that was miserably underdocumented. I had to reverse-engineer it
from the code.
Per bug #16304 from James Gray. Back-patch to all supported branches,
since this has been broken forever.
Discussion: https://postgr.es/m/16304-c662b00a1322db7f@postgresql.org
This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64. Back-patch to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring
the actual arguments. Existing callers passed exactly those values, so
there's no live bug. Back-patch to v12, where the function first
appeared, because another fix needs this.
Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com
swap_relation_files() calls toast_get_valid_index() to find and lock
this index, just before swapping with the rebuilt TOAST index. The
latter function releases the lock before returning. Potential for
mischief is low; a concurrent session can issue ALTER INDEX ... SET
(fillfactor = ...), which is not alarming. Nonetheless, changing
pg_class.relfilenode without a lock is unconventional. Back-patch to
9.5 (all supported versions), because another fix needs this.
Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com