Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
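For illustration, the duplicate probe then has roughly this shape (a
sketch only; the struct fields and helper names here are illustrative,
and the real code lives in src/backend/commands/async.c):

    static bool
    AsyncExistsPendingNotify(Notification *n, NotificationList *pending)
    {
        if (pending->hashtab != NULL)
        {
            /* hash table exists once >16 distinct events are queued */
            bool        found;

            (void) hash_search(pending->hashtab, &n, HASH_FIND, &found);
            return found;
        }
        else
        {
            /* small lists: the old O(N) linear scan is still cheapest */
            ListCell   *l;

            foreach(l, pending->events)
            {
                Notification *oldn = (Notification *) lfirst(l);

                if (oldn->channel_len == n->channel_len &&
                    oldn->payload_len == n->payload_len &&
                    memcmp(oldn->data, n->data,
                           n->channel_len + n->payload_len + 2) == 0)
                    return true;
            }
            return false;
        }
    }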
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
ALTER SYSTEM itself normally won't make duplicate entries (although
up till this patch, it was possible to confuse it by writing case
variants of a GUC's name). However, if some external tool has appended
entries to the file, that could result in duplicate entries for a single
GUC name. In such a situation, ALTER SYSTEM did exactly the wrong thing,
because it replaced or removed only the first matching entry, leaving
the later one(s) still there and hence still determining the active value.
This patch fixes that by making ALTER SYSTEM sweep through the file and
remove all matching entries, then (if not ALTER SYSTEM RESET) append the
new setting to the end. This means entries will be in order of last
setting rather than first setting, but that shouldn't hurt anything.
Also, make the comparisons case-insensitive so that the right things
happen if you do, say, ALTER SYSTEM SET "TimeZone" = 'whatever'.
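Schematically, the new logic does something like this (a sketch, not
the actual guc.c code; ConfigVariable is the parser's type, while
make_config_entry is a made-up helper):

    ListCell   *lc;

    /* sweep out every entry whose name matches, ignoring case */
    foreach(lc, lines)
    {
        ConfigVariable *item = (ConfigVariable *) lfirst(lc);

        if (pg_strcasecmp(item->name, name) == 0)
            lines = foreach_delete_current(lines, lc);
    }

    /* then, unless this is ALTER SYSTEM RESET, append the new value */
    if (value != NULL)
        lines = lappend(lines, make_config_entry(name, value));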
This has been broken since ALTER SYSTEM was invented, so back-patch
to all supported branches.
Ian Barwick, with minor mods by me
Discussion: https://postgr.es/m/aed6cc9f-98f3-2693-ac81-52bb0052307e@2ndquadrant.com
The nbtree stack downlink block number recorded during the initial
descent of the tree wasn't actually used: both _bt_getstackbuf() callers
overwrote it with a value of their own.
Remove the block number field from the stack struct, and add a child
block number argument to _bt_getstackbuf() in its place. This makes the
overall design of _bt_getstackbuf() clearer.
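Roughly, the interface change is:

    /* before: callers had to overwrite the stack's block number field */
    Buffer _bt_getstackbuf(Relation rel, BTStack stack);

    /* after: the child block being sought is an explicit argument */
    Buffer _bt_getstackbuf(Relation rel, BTStack stack, BlockNumber child);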
Author: Peter Geoghegan
Reviewed-By: Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzmx+UbXt2YNOUCZ-a04VdXU=S=OHuAuD7Z8uQq-PXTYUg@mail.gmail.com
This is a fix similar to 2d7d67cc, where a slight plan alteration could
cause a random failure of this regression test because of an incorrect
tuple ordering, except that this one involves lookups of pg_type.
Similarly to the other case, add ORDER BY clauses to ensure the output
order.
The failure has been seen at least once on buildfarm member skink.
Reported-by: Thomas Munro
Discussion: https://postgr.es/m/CA+hUKGLjR9ZBvhXcr9b-NSBHPw9aRgbjyzGE+kqLsT4vwX+nkQ@mail.gmail.com
Backpatch-through: 12
Commit d2086b08b02 removed almost all cases where nbtree must release a
read buffer lock and acquire a write buffer lock instead, so remaining
cases in which that's still necessary are not notable enough to appear
in the nbtree README.
More importantly, holding on to a buffer pin in cases where nbtree must
trade a read lock for a write lock is very unlikely to save any I/O.
This seems to have been a long overlooked throwback to a time when
nbtree cared about write-ordering dependencies, and performed
synchronous buffer writes. It hasn't worked that way in many years.
Commit 07b39083c inserted an unconditional reference to pg_opfamily,
which of course fails on servers predating that catalog. Fortunately,
the case it's trying to solve can't occur on such old servers (AFAIK).
Hence, just skip the additional code when the source predates 8.3.
Per bug #15955 from sly. Back-patch to all supported branches,
like the previous patch.
Discussion: https://postgr.es/m/15955-1daa2e676e903d87@postgresql.org
Use the PageIndexTupleOverwrite() bufpage.c routine within nbtree
instead of deleting a tuple and re-inserting its replacement. This
makes the intent of affected code slightly clearer. It also makes
CREATE INDEX slightly faster, since there is no longer a need to shift
every leaf page's line pointer array back and forth during index builds.
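For instance, the affected code changes along these lines (a sketch;
the error texts are illustrative):

    /* old approach: delete, then re-insert at the same offset, shifting
     * the line pointer array twice */
    PageIndexTupleDelete(page, offnum);
    if (PageAddItem(page, (Item) itup, IndexTupleSize(itup), offnum,
                    false, false) == InvalidOffsetNumber)
        elog(ERROR, "failed to add tuple to page");

    /* new approach: overwrite the tuple in place */
    if (!PageIndexTupleOverwrite(page, offnum, (Item) itup,
                                 IndexTupleSize(itup)))
        elog(ERROR, "failed to overwrite tuple on page");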
Author: Peter Geoghegan, Anastasia Lubennikova
Reviewed-By: Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wz=Zk=B9+Vwm376WuO7YTjFc2SSskifQm4Nme3RRRPtOSQ@mail.gmail.com
We only need to invoke constraint exclusion on partitioned tables when
they are a partition, and they themselves contain a default partition;
it's not necessary otherwise, and it's expensive, so avoid it. Also, we
were trying it once for each clause separately, but we can do it for all
the clauses at once.
While at it, centralize setting of RelOptInfo->partition_qual instead of
computing it in slightly different ways in different places.
Per complaints from Simon Riggs about 4e85642d935e; reviewed by Yuzuko
Hosoya, Kyotaro Horiguchi.
Author: Amit Langote. I (Álvaro) again mangled the patch somewhat.
Discussion: https://postgr.es/m/CANP8+j+tMCY=nEcQeqQam85=uopLBtX-2vHiLD2bbp7iQQUKpA@mail.gmail.com
This test case could fail because of an incorrect result ordering when
looking up pg_class entries. This commit adds an ORDER BY to the
culprit query. The failure was likely caused by a plan switch: by
default, the planner would choose an index-only scan or an index scan,
but even a small change in the startup cost could have caused a bitmap
heap scan to be chosen instead, causing the failure.
While at it, switch some filtering quals to a regular expression, per
an idea of Tom Lane. As previously shaped, the quals would have
selected any relations whose name begins with "temp". And that could
cause failures if another test running in parallel began to use similar
relation names.
Per report from buildfarm member anole, though the failure was very
rare. This test was introduced by 319a810, so backpatch down to
v10.
Discussion: https://postgr.es/m/20190807132422.GC15695@paquier.xyz
Backpatch-through: 10
contrib/amcheck failed to consider the possibility that unlogged
relations will not have any main relation fork files when running in hot
standby mode. This led to low-level "can't happen" errors that
complained about the absence of a relfilenode file.
To fix, simply skip verification of unlogged index relations during
recovery. In passing, add a direct check for the presence of a main
fork just before verification proper begins, so that a missing main
relation fork is reported cleanly instead of with a low-level error.
Author: Andrey Borodin, Peter Geoghegan
Reported-By: Andrey Borodin
Diagnosed-By: Andrey Borodin
Discussion: https://postgr.es/m/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru
Backpatch: 10-, where amcheck was introduced.
As coded, the ICU-collation path in pattern_char_isalpha() failed
to consider regular ASCII letters to be case-varying. This led to
like_fixed_prefix treating too much of an ILIKE pattern as being a
fixed prefix, so that indexscans derived from an ILIKE clause might
miss entries that they should find.
Per bug #15892 from James Inform. This is an oversight in the original
ICU patch (commit eccfef81e), so back-patch to v10 where that came in.
Discussion: https://postgr.es/m/15892-e5d2bea3e8a04a1b@postgresql.org
Now that list_nth is O(1), there's no good reason to maintain a
separate array of RTE pointers rather than indexing into
estate->es_range_table. Deleting the array doesn't save all that
much either; but just on cleanliness grounds, it's better not to
have duplicate representations of the identical information.
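With the array gone, fetching an RTE is a plain list_nth() call on the
EState field, along the lines of executor.h's exec_rt_fetch():

    static inline RangeTblEntry *
    exec_rt_fetch(Index rti, EState *estate)
    {
        return (RangeTblEntry *) list_nth(estate->es_range_table, rti - 1);
    }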
Discussion: https://postgr.es/m/14960.1565384592@sss.pgh.pa.us
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input. Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".
To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input. (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)
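In terms of call-site idioms:

    x = list_concat(x, list_copy(y));   /* old: defend against cell sharing */
    x = list_concat(x, y);              /* new: y's cells are never shared */

    z = list_concat_copy(x, y);         /* when x must not be modified either */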
In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it. We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing the
size of the leak, not whether there is one. I did adjust two or three
places that had been taking the trouble to free that header, so that
they now free the whole second List.)
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
This reverts much of commit f03a9ca4366d064d89b7cf7ed75d4e43f2ed0667,
but leaves the relpages/reltuples probe in select_parallel.sql.
The pg_stat_all_tables probes are unstable enough to be annoying,
and it no longer seems likely that they will teach us anything more
about the underlying problem. I'd still like some more confirmation
though that the observed plan instability is caused by VACUUM leaving
relpages/reltuples as zero for one of these tables.
Discussion: https://postgr.es/m/CA+hUKG+0CxrKRWRMf5ymN3gm+BECHna2B-q1w8onKBep4HasUw@mail.gmail.com
We have implemented jsonpath string comparison using the default
database locale. However, the standard requires us to compare Unicode
codepoints. This commit implements that, but for performance reasons we
still use per-byte comparison for the "==" operator. Thus, for
consistency, the other comparison operators do a per-byte comparison
when the Unicode codepoints compare equal.
In some edge cases, when the same Unicode codepoints have different
binary representations in the database encoding, we diverge from the
standard to achieve better performance of the "==" operator. In the
future, strict standard conformance could be achieved by normalizing
input JSON strings.
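The resulting comparison has roughly this shape (a sketch;
compare_codepoints stands in for the real codepoint-walking logic in
jsonpath_exec.c):

    static int
    compare_jsonpath_strings(const char *s1, int len1,
                             const char *s2, int len2)
    {
        int         cmp;

        /* bytewise equality settles "==" cheaply */
        if (len1 == len2 && memcmp(s1, s2, len1) == 0)
            return 0;

        /* otherwise order by Unicode codepoints, per the standard */
        cmp = compare_codepoints(s1, len1, s2, len2);   /* hypothetical */

        /* per-byte tie-break when the codepoints compare equal */
        return (cmp != 0) ? cmp : memcmp(s1, s2, Min(len1, len2));
    }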
The original patch was written by Nikita Glukhov and rewritten by me.
Reported-by: Markus Winand
Discussion: https://postgr.es/m/8B7FA3B4-328D-43D7-95A8-37B8891B8C78%40winand.at
Author: Nikita Glukhov, Alexander Korotkov
Backpatch-through: 12
This failed with either "tuple already updated by self" or "duplicate
key value violates unique constraint", depending on whether the table
had previously been analyzed or not. The reason is that ANALYZE tried
to insert or update the same pg_statistic rows twice, and there was no
CommandCounterIncrement between. So add one. The same case works fine
outside a transaction block, because then there's a whole transaction
boundary between, as a consequence of the way VACUUM works.
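The fix amounts to bumping the command counter between targets, roughly
as follows (a sketch of the loop in vacuum.c; the argument list follows
the v12 signature as best I recall it):

    foreach(lc, relations)
    {
        VacuumRelation *vrel = lfirst_node(VacuumRelation, lc);

        analyze_rel(vrel->oid, vrel->relation, params, vrel->va_cols,
                    in_outer_xact, vac_strategy);

        /*
         * Make this target's pg_statistic updates visible before the
         * next one; otherwise "ANALYZE t, t" touches the same rows
         * twice within a single command counter value.
         */
        CommandCounterIncrement();
    }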
This issue has been latent all along, but the problem was unreachable
before commit 11d8d72c2 added the ability to specify multiple tables
in ANALYZE. We could, perhaps, alternatively fix it by adding code to
de-duplicate the list of VacuumRelations --- but that would add a
lot of overhead to work around dumb commands, so it's not attractive.
Per bug #15946 from Yaroslav Schekin. Back-patch to v11.
(Note: in v11 I also back-patched the test added by commit 23224563d;
otherwise the problem doesn't manifest in the test I added, because
"vactst" is empty when the tests for multiple ANALYZE targets are
reached. That seems like not a very good thing anyway, so I did this
rather than rethinking the choice of test case.)
Discussion: https://postgr.es/m/15946-5c7570a2884a26cf@postgresql.org
Rename the "tupindex" field from tuplesort.c's SortTuple struct to
"srctape", since it can only ever be used to store a source/input tape
number when merging external sort runs. This has been the case since
commit 8b304b8b72b, which removed replacement selection sort from
tuplesort.c.
Merge setup_append_rel_array into setup_simple_rel_arrays. There's no
particularly good reason to keep them separate, and it's inconsistent
with the lack of separation in expand_planner_arrays. The only apparent
benefit was that the fast path for trivial queries in query_planner()
doesn't need to set up the append_rel_array; but all we're saving there
is an if-test and NULL assignment, which surely ought to be negligible.
Also improve some obsolete comments.
Discussion: https://postgr.es/m/17220.1565301350@sss.pgh.pa.us
b654714 has reworked the way trailing CR/LF characters are removed
from strings. This commit introduces a new routine in common/string.c
and refactors the code so that the logic lives in a single place,
mostly.
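Assuming the new routine is pg_strip_crlf() in common/string.h, a
typical call site becomes:

    #include "common/string.h"

    char        buf[1024];

    if (fgets(buf, sizeof(buf), fp) != NULL)
    {
        /* truncate any trailing \r and/or \n; returns the new length */
        (void) pg_strip_crlf(buf);
    }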
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/20190801031820.GF29334@paquier.xyz
READTUP() routines do not and cannot use the resettable "tuplecontext"
memory context, since it is deleted when merging begins. Update an
obsolete comment that claimed otherwise. This was an oversight in
commit e94568ecc10.
In passing, fix an unrelated tuplesort typo.
I left PG_CMD_PUTS around even though it could be handled by
PG_CMD_PRINTF, since PG_CMD_PUTS is sometimes called with non-literal
arguments, and those would be hazardous as format strings if they
contained percent signs.
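The hazard, in generic stdio terms:

    fputs(line, cmdfd);          /* safe: line is emitted verbatim */
    fprintf(cmdfd, line);        /* unsafe: any '%' in line is taken as a
                                  * conversion specification */
    fprintf(cmdfd, "%s", line);  /* safe, at the cost of a formatting pass */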
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
VACUUM's reference page had this text, but ANALYZE's didn't. That's
a clear oversight given that section 5.7 explicitly delegates the
responsibility to define permissions requirements to the individual
commands' man pages.
Per gripe from Isaac Morland. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAMsGm5fp3oBUs-2iRfii0iEO=fZuJALVyM2zJLhNTjG34gpAVQ@mail.gmail.com
We were applying constraint exclusion on the partition constraint when
generating pruning steps for a clause, but only for the rather
restricted situation of them being boolean OR operators; however it is
possible to have differently shaped clauses that also benefit from
constraint exclusion. This applies particularly to the default
partition, since its constraint is in essence a long list of OR'ed
subclauses ... but it applies to other cases too. So in certain cases
we're scanning partitions that we don't need to scan.
Remove the specialized code in OR clauses, and add a generally
applicable test of the clause refuting the partition constraint; mark
the whole pruning operation as contradictory if it hits.
This has the unwanted side-effect of testing some (most? all?)
constraints more than once if constraint_exclusion=on. That seems
unavoidable as far as I can tell without some additional work, but
that's not the recommended setting for that parameter anyway.
However, because this imposes additional processing cost for all
queries using partitioned tables, I decided not to backpatch this
change.
Author: Amit Langote, Yuzuko Hosoya, Álvaro Herrera
Reviewers: Shawn Wang, Thibaut Madeleine, Yoshikazu Imai, Kyotaro
Horiguchi; they were also uncredited reviewers for commit 489247b0e615.
Discussion: https://postgr.es/m/9bb31dfe-b0d0-53f3-3ea6-e64b811424cf@lab.ntt.co.jp
In serializable mode, heap_hot_search_buffer() incorrectly acquired a
predicate lock on the root tuple, not the returned tuple that satisfied
the visibility checks. As explained in README-SSI, the predicate lock does
not need to be copied or extended to other tuple versions, but for that to
work, the correct, visible, tuple version must be locked in the first
place.
The original SSI commit had this bug in it, but it was fixed back in 2013,
in commit 81fbbfe335. But unfortunately, it was reintroduced a few months
later in commit b89e151054. Wising up from that, add a regression test
to cover this, so that it doesn't get reintroduced. Also, move the
code that sets 't_self', so that it happens at the same time that the
other HeapTuple fields are set, to make it clearer that all the code in
the loop operates on the "current" tuple in the chain, not the root
tuple.
Bug spotted by Andres Freund, analysis and original fix by Thomas Munro,
test case and some additional changes to the fix by Heikki Linnakangas.
Backpatch to all supported versions (9.4).
Discussion: https://www.postgresql.org/message-id/20190731210630.nqhszuktygwftjty%40alap3.anarazel.de
When parsing a timetz string with a dynamic timezone abbreviation, or
with no timezone specified, it was possible to generate incorrect
timestamps from non-initialized variables if the input string did not
fully specify a date. This was already checked when a full timezone
spec was included in the input string, but the two other cases
mentioned above missed the same checks.
Fix by generating an error for this invalid input, that is, whenever
the date is not fully specified.
Valgrind was complaining about this problem.
Bug: #15910
Author: Alexander Lakhin
Discussion: https://postgr.es/m/15910-2eba5106b9aa0c61@postgresql.org
Backpatch-through: 9.4
Up to now, logical decoding of a multi-insert record has been scanning
its xl_multi_insert_tuple entries only if XLH_INSERT_CONTAINS_NEW_TUPLE
was set in the record. This is not an issue on HEAD, as multi-insert
records are not used for system catalogs, but the logical decoding
logic includes all the code necessary to handle that properly, except
that it failed to iterate correctly over all xl_multi_insert_tuple
entries when the flag is not set. Hence, when trying to use
multi-insert for system catalogs, an assertion would be triggered.
An upcoming patch is going to make use of multi-insert for system
catalogs, and this fixes the logic to make sure that all entries are
scanned correctly without softening the existing assertions.
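Schematically, the corrected loop advances over every entry whether or
not the flag is set (a simplified sketch, not the actual decode.c code):

    for (i = 0; i < xlrec->ntuples; i++)
    {
        xl_multi_insert_tuple *xlhdr;

        /* step over this entry's header unconditionally */
        xlhdr = (xl_multi_insert_tuple *) SHORTALIGN(data);
        data = ((char *) xlhdr) + SizeOfMultiInsertTuple;

        if (xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE)
            DecodeXLogTuple(data, xlhdr->datalen, tuple);

        /* ... and over its payload, whether or not it was decoded */
        data += xlhdr->datalen;
    }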
Reported-by: Daniel Gustafsson
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CBFFD532-C033-49EB-9A5A-F67EAEE9EB0B@yesql.se
contrib/intarray considers "arraycol <@ constant-array" to be indexable,
but its GiST opclass code fails to reliably find index entries for empty
array values (which of course should trivially match such queries).
This is because the test condition to see whether we should descend
through a non-leaf node is wrong.
Unfortunately, empty array entries could be anywhere in the index,
as these index opclasses are currently designed. So there's no way
to fix this except by lobotomizing <@ indexscans to scan the whole
index ... which is what this patch does. That's pretty unfortunate:
the performance is now actually worse than a seqscan, in most cases.
We'd be better off to remove <@ from the GiST opclasses entirely,
and perhaps a future non-back-patchable patch will do so.
In the meantime, applications whose performance is adversely impacted
have a couple of options. They could switch to a GIN index, which
doesn't have this bug, or they could replace "arraycol <@ constant-array"
with "arraycol <@ constant-array AND arraycol && constant-array".
That will provide about the same performance as before, and it will find
all non-empty subsets of the given constant-array, which is all that
could reliably be expected of the query before.
While at it, add some more regression test cases to improve code
coverage of contrib/intarray.
In passing, adjust resize_intArrayType so that when it's returning an
empty array, it uses construct_empty_array for that rather than
cowboy hacking on the input array. While the hack produces an array
that looks valid for most purposes, it isn't bitwise equal to empty
arrays produced by other code paths, which could have subtle odd
effects. I don't think this code path is performance-critical
enough to justify such shortcuts. (Back-patch this part only as far
as v11; before commit 01783ac36 we were not careful about this in
other intarray code paths either.)
Back-patch the <@ fixes to all supported versions, since this was
broken from day one.
Patch by me; thanks to Alexander Korotkov for review.
Discussion: https://postgr.es/m/458.1565114141@sss.pgh.pa.us
src/test/kerberos and src/test/ldap try to run private authentication
servers, which of course might fail. The logs from these servers
were being dropped into the tmp_check/ subdirectory, but they should
be put in tmp_check/log/, because the buildfarm will only capture
log files in that subdirectory. Without the log output there's
little hope of diagnosing buildfarm failures related to these servers.
Backpatch to v11 where these test suites were added.
Discussion: https://postgr.es/m/16017.1565047605@sss.pgh.pa.us
Commit a6417078 established a new project policy around OID assignment:
new patches are encouraged to choose a random OID in the 8000..9999
range when a manually-assigned OID is required (if multiple OIDs are
required, a consecutive block of OIDs starting from the random point
should be used). Catalog entries added by committed patches that use
OIDs from this "unstable" range are renumbered after feature freeze.
This practice minimizes OID collisions among concurrently-developed
patches.
Show a specific random OID suggestion when the unused_oids script is
run. This makes it easy for patch authors to use a random OID from the
unstable range, per the new policy.
Author: Julien Rouhaud, Peter Geoghegan
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/CAH2-WzkkRs2ScmuBQ7xWi7xzp7fC1B3w0Nt8X+n4rBw5k+Z=zA@mail.gmail.com
Commit bf6c614a2 rearranged the lookup of the comparison operators
needed in a hashed subplan, and in so doing, broke the cross-type
case: it caused the original LHS-vs-RHS operator to be used to compare
hash table entries too (which of course are all of the RHS type).
This leads to C functions being passed a Datum that is not of the
type they expect, with the usual hazards of crashes and unauthorized
server memory disclosure.
For the set of hashable cross-type operators present in v11 core
Postgres, this bug is nearly harmless on 64-bit machines, which
may explain why it escaped earlier detection. But it is a live
security hazard on 32-bit machines; and of course there may be
extensions that add more hashable cross-type operators, which
would increase the risk.
Reported by Andreas Seltenreich. Back-patch to v11 where the
problem came in.
Security: CVE-2019-10209
Commit aa27977fe21a7dfa4da4376ad66ae37cb8f0d0b5 introduced this
restriction for pg_temp.function_name(arg); do likewise for types
created in temporary schemas. Programs that this breaks should add
"pg_temp." schema qualification or switch to arg::type_name syntax.
Back-patch to 9.4 (all supported versions).
Reviewed by Tom Lane. Reported by Tom Lane.
Security: CVE-2019-10208
Those data types use parsing and/or calculation wrapper routines which
can generate some generic error messages in the event of a failure. The
caller of these routines can also pass a pointer variable, settable by
the routine, to track whether an error has happened, letting the caller
decide what to do in the event of an error and what error message to
generate.
Those routines have been slack about initializing the tracking flag,
which can be confusing when reading the code, so add some safeguards
against calls of these parsing routines that could lead to a dubious
result.
The LSN parsing gains an assertion to make sure that the tracking flag
is set, while numeric and float paths initialize the flag to a saner
state.
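The caller-side pattern this enforces looks roughly like the following
(using float8in_internal_opt_error as the wrapper, if I have the name
right; the error report is illustrative):

    bool        have_error = false;     /* must start false: the callee
                                         * only ever sets it to true */
    double      val;

    val = float8in_internal_opt_error(num, NULL, "double precision",
                                      num, &have_error);
    if (have_error)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid input syntax for type double precision: \"%s\"",
                        num)));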
Author: Jeevan Ladhe
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CAOgcT0NOM9oR0Hag_3VpyW0uF3iCU=BDUFSPfk9JrWXRcWQHqw@mail.gmail.com
OWNER_TO was used for the completion, which is not supported grammar,
but OWNER TO is.
This error was introduced by d37b816, so backpatch down to 9.6.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
Backpatch-through: 9.6
This reverts commit 88bdbd3f746049834ae3cc972e6e650586ec3c9d.
As committed, statement sampling used the existing duration threshold
(log_min_duration_statement) when deciding which statements to sample.
The issue is that even the longest statements are subject to sampling,
and so may not end up logged. An improvement was proposed, introducing
a second duration threshold, but it would not be backwards compatible.
So we've decided to revert this feature - the separate threshold should
be part of the feature itself.
Discussion: https://postgr.es/m/CAFj8pRDS8tQ3Wviw9%3DAvODyUciPSrGeMhJi_WPE%2BEB8%2B4gLL-Q%40mail.gmail.com
This reverts commit 9dc122585551516309c9362e673effdbf3bd79bd.
As committed, statement sampling used the existing duration threshold
(log_min_duration_statement) when deciding which statements to sample.
The issue is that even the longest statements are subject to sampling,
and so may not end up logged. An improvement was proposed, introducing
a second duration threshold, but it would not be backwards compatible.
So we've decided to revert this feature - the separate threshold should
be part of the feature itself.
Discussion: https://postgr.es/m/CAFj8pRDS8tQ3Wviw9%3DAvODyUciPSrGeMhJi_WPE%2BEB8%2B4gLL-Q%40mail.gmail.com
Perl has multiple internal representations of "undef", and just
testing for SvTYPE(x) == SVt_NULL doesn't recognize all of them,
leading to "cannot transform this Perl type to jsonb" errors.
Use the approved test SvOK() instead.
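The change amounts to (sketch; make_jsonb_null is a stand-in for
whatever the surrounding code does with a detected undef):

    /* wrong: only one of Perl's undef representations has SVt_NULL */
    if (SvTYPE(in) == SVt_NULL)
        return make_jsonb_null();

    /* right: SvOK() is false for every flavor of undef */
    if (!SvOK(in))
        return make_jsonb_null();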
Report and patch by Ivan Panchenko. Back-patch to v11 where
this module was added.
Discussion: https://postgr.es/m/1564783533.324795401@f193.i.mail.ru