Partitioned tables do not have relation options yet, but, similarly to
what's done for views, which have their own parsing table, it could make
sense to introduce new parameters for some of the existing default ones
like fillfactor, autovacuum, etc. Splitting things this way has the
advantage that rd_options stores only the necessary information,
reducing the amount of memory used for a relcache entry of a partitioned
table if new reloptions are introduced at this level.
Author: Nikolay Shaplov
Reviewed-by: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/1627387.Qykg9O6zpu@x200m
This new option terminates the other sessions connected to the target
database and then drops it. To terminate other sessions, the current
user must have the required permissions (the same as for
pg_terminate_backend()). We don't allow terminating the sessions if
prepared transactions, active logical replication slots or subscriptions
are present in the target database.
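For example, with a throwaway database name:
    DROP DATABASE mydb WITH (FORCE);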
Author: Pavel Stehule with changes by me
Reviewed-by: Dilip Kumar, Vignesh C, Ibrar Ahmed, Anthony Nowocien,
Ryan Lambert and Amit Kapila
Discussion: https://postgr.es/m/CAP_rwwmLJJbn70vLOZFpxGw3XD7nLB_7+NKz46H5EOO2k5H7OQ@mail.gmail.com
SET CONSTRAINTS ... DEFERRED failed on partitioned tables, because of a
sanity check that ensures that the affected constraints have triggers.
On partitioned tables, the triggers are in the leaf partitions, not in
the partitioned relations themselves, so the sanity check fails.
Removing the sanity check solves the problem, because the code needed to
support the case is already there.
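A minimal way to reproduce the failure, assuming the auto-generated
constraint name:
    CREATE TABLE ref (a int PRIMARY KEY);
    CREATE TABLE part (a int REFERENCES ref DEFERRABLE)
        PARTITION BY LIST (a);
    CREATE TABLE part1 PARTITION OF part FOR VALUES IN (1);
    BEGIN;
    SET CONSTRAINTS part_a_fkey DEFERRED;  -- failed before this fix
    COMMIT;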
Backpatch to 11.
Note: deferred unique constraints are not affected by this bug, because
they do have triggers in the parent partitioned table. I did not add a
test for this scenario.
Discussion: https://postgr.es/m/20191105212915.GA11324@alvherre.pgsql
When maintaining or merging patches, one of the most common sources of
conflicts is the list of objects in makefiles. Especially when the
split across lines has been changed on both sides, which is somewhat
common when trying to stay below 80 columns, those conflicts are
unnecessarily laborious to resolve.
By splitting OBJS-style lines into one object per line and sorting them
alphabetically, conflicts should be less frequent, and easier to
resolve when they still occur.
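For illustration, an OBJS assignment in the new style looks like this
(object names are just examples):
    OBJS = \
        aclchk.o \
        amcmds.o \
        analyze.o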
Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
get_relkind_objtype, and hence get_object_type, failed when applied to a
toast table. This is not a good thing, because it prevents reporting of
perfectly legitimate permissions errors. (At present, these functions
are in fact *only* used to determine the ObjectType argument for
aclcheck_error() calls.) It seems best to have them fall back to
returning OBJECT_TABLE in every case where they can't determine an
object type for a pg_class entry, so do that.
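In code terms, the fallback amounts to something like this sketch (not
the exact code):
    switch (relkind)
    {
        case RELKIND_SEQUENCE:
            return OBJECT_SEQUENCE;
        case RELKIND_VIEW:
            return OBJECT_VIEW;
        ...
        default:
            /* TOAST tables and other pg_class entries land here */
            return OBJECT_TABLE;
    }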
In passing, make some edits to alter.c to make it more obvious that
those calls of get_object_type() are used only for error reporting.
This might save a few cycles in the non-error code path, too.
Back-patch to v11 where this issue originated.
John Hsu, Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/C652D3DF-2B0C-4128-9420-FB5379F6B1E4@amazon.com
When using CREATE TABLE for a new partition, the partitioned indexes of
the parent are created automatically in a fashion similar to LIKE ...
INCLUDING INDEXES. The new partition and its parent use a mapping of
attribute numbers for this operation, and while the mapping was built
correctly, its length was defined as the number of attributes of the
newly-created child, not of the parent. If the parent includes dropped
columns, this could cause failures.
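The problem could be reproduced along these lines:
    CREATE TABLE parent (a int, b int, c int) PARTITION BY LIST (a);
    ALTER TABLE parent DROP COLUMN b;
    CREATE INDEX ON parent (c);
    CREATE TABLE child PARTITION OF parent
        FOR VALUES IN (1);  -- could fail before this fix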
This is wrong since 8b08f7d which has introduced the concept of
partitioned indexes, so backpatch down to 11.
Reported-by: Wyatt Alt
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAGem3qCcRmhbs4jYMkenYNfP2kEusDXvTfw-q+eOhM0zTceG-g@mail.gmail.com
Backpatch-through: 11
This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases. So instead of
    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_CATCH();
    {
        cleanup();
        PG_RE_THROW();
    }
    PG_END_TRY();
    cleanup();
one can write
    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_FINALLY();
    {
        cleanup();
    }
    PG_END_TRY();
Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
Phases 2 (building the new index) and 3 (validating the new index)
checked for interrupts outside a transaction context, with the
consequence that session-level locks taken on the parent relation and on
the old and new indexes being processed were not released. This could,
for example, be triggered by statement_timeout with bad timing, and
would issue confusing error messages when shutting down the session
still holding the locks (note that an assertion failure would be
triggered first), on top of more issues with concurrent sessions trying
to take a lock that would interfere with the SHARE UPDATE EXCLUSIVE
locks held here.
This moves all the interruption checks inside a transaction context.
Note that I have manually tested all interruptions to make sure that
invalid indexes can be cleaned up properly. Partition indexes still
have issues on their own with some missing dependency handling, which
will be dealt with in a follow-up patch.
Reported-by: Justin Pryzby
Author: Michael Paquier
Discussion: https://postgr.es/m/20191013025145.GC4475@telsasoft.com
Backpatch-through: 12
In the first transaction run by REINDEX CONCURRENTLY, a thinko in the
existing logic caused two session locks to be taken on the old index,
while the session lock on the newly-created index was missed. This made
it possible to run concurrent DDL commands (like ALTER INDEX) on the new
index while REINDEX CONCURRENTLY was still processing, from the point
where the first internal transaction committed.
This issue has been discovered while digging into another bug.
Author: Michael Paquier
Discussion: https://postgr.es/m/20191021074323.GB1869@paquier.xyz
Backpatch-through: 12
Since pluggable storage was introduced, those two routines have been
replaced by table_open/close, with some compatibility macros still
present to allow extensions to compile correctly with v12. Some code
paths using the old routines still remained, so replace them. Based on
the discussion, the consensus reached is that it is better to remove the
compatibility macros as well, so that nothing new uses the old routines.
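The mechanical replacement looks like this (rel and relid being
placeholders):
    /* old, via the now-removed compatibility macros */
    rel = heap_open(relid, AccessShareLock);
    heap_close(rel, AccessShareLock);
    /* new */
    rel = table_open(relid, AccessShareLock);
    table_close(rel, AccessShareLock);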
Discussion: https://postgr.es/m/20191017014706.GF5605@paquier.xyz
Apparently while this code was being developed,
ReindexRelationConcurrently operated on multiple relations. The version
that was ultimately pushed doesn't, so this comment's use of plural is
inaccurate.
Commits 801c2dc7 and 801c2dc7 made it possible for vacuum to
try to freeze a multixact that is still running. A check
prevented the freeze, but raised an error. Repair.
Back-patch all the way.
Author: Nathan Bossart, Jeremy Schneider
Reported-by: Jeremy Schneider
Reviewed-by: Jim Nasby, Thomas Munro
Discussion: https://postgr.es/m/DAFB8AFF-2F05-4E33-AD7F-FF8B0F760C17%40amazon.com
A race condition can make us try to dereference a NULL pointer to the
PGPROC struct of a process that's already finished. That results in
crashes during REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.
This was introduced in ab0dfc961b, so backpatch to pg12.
Reported by: Justin Pryzby
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/20191012004446.GT10470@telsasoft.com
When dropping a column on a partitioned table which has one or more
partitioned indexes, the operation failed because the dependencies
between the partitioned indexes using the dropped column and the columns
involved were not removed consistently across all the relations of the
inheritance tree.
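A sketch of the failing scenario:
    CREATE TABLE parent (a int, b int) PARTITION BY LIST (a);
    CREATE TABLE child PARTITION OF parent FOR VALUES IN (1);
    CREATE INDEX ON parent (b);
    ALTER TABLE parent DROP COLUMN b;  -- failed before this fix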
This commit refactors the code executing the column drop so that all
the columns to remove across the inheritance tree are gathered first and
dropped together at the end. This way, we let the dependency machinery
sort out by itself the deletion of all the columns with the partitioned
indexes across the partition tree.
This issue has been introduced by 1d92a0c, so backpatch down to
REL_12_STABLE.
Author: Amit Langote, Michael Paquier
Reviewed-by: Álvaro Herrera, Ashutosh Sharma
Discussion: https://postgr.es/m/CA+HiwqE9kuBsZ3b5pob2-cvE8ofzPWs-og+g8bKKGnu6b4-yTQ@mail.gmail.com
Backpatch-through: 12
In c2fe139c20 I made ATRewriteTable() use tuple slots. Unfortunately I
did not notice that a rewrite can add columns that do not have a
default, when another column being added or altered requires one.
Initialize such columns to NULL again, and add tests.
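The affected case looks roughly like this (the volatile default on the
second column is what forces the rewrite):
    CREATE TABLE t (a int);
    INSERT INTO t VALUES (1);
    ALTER TABLE t ADD COLUMN b int, ADD COLUMN c float8 DEFAULT random();
    SELECT b FROM t;  -- must be NULL; was uninitialized before this fix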
Bug: #16038
Reported-By: anonymous
Author: Andres Freund
Discussion: https://postgr.es/m/16038-5c974541f2bf6749@postgresql.org
Backpatch: 12, where the bug was introduced in c2fe139c20
When ExecBRUpdateTriggers()'s GetTupleForTrigger() follows an EPQ
chain the former needs to run the result tuple through the junkfilter
again, and update the slot containing the new version of the tuple to
contain that new version. The input tuple may already be in the
junkfilter's output slot, which used to be OK - we don't need the
previous version anymore. Unfortunately ff11e7f4b9 started to use
ExecCopySlot() to update newslot, and ExecCopySlot() doesn't support
copying a slot into itself, leading to a slot in a corrupt
state, which then can cause crashes or other symptoms.
Fix this by skipping the ExecCopySlot() when copying into itself.
While we could have easily made ExecCopySlot() handle that case, it
seems better to add an assert forbidding doing so instead. As the goal
of copying might be to make the contents of one slot independent from
another, it seems failure prone to handle doing so silently.
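A sketch of the resulting logic, with assumed slot variable names:
    /* in ExecBRUpdateTriggers(), skip the self-copy */
    if (newslot != epqslot_clean)
        ExecCopySlot(newslot, epqslot_clean);
    /* in ExecCopySlot(), forbid self-copy outright */
    Assert(dstslot != srcslot);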
A follow-up commit will add tests for the obviously under-covered
combination of EPQ and triggers. Done as a separate commit as it might
make sense to backpatch them further than this bug.
Also rename the confusing slot variable names in ExecBRDeleteTriggers()
and ExecBRUpdateTriggers().
Bug: #16036
Reported-By: Антон Власов
Author: Andres Freund
Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org
Backpatch: 12-, where ff11e7f4b9 was merged
All our current in-core relation options of type string (not many,
admittedly) in reality behave like enums. But after seeing an
implementation for enum reloptions, it's clear that strings are messier,
so introduce the new reloption type, and switch all string options to be
enums instead.
Fortunately we have a recently introduced test module for reloptions, so
we don't lose coverage of string reloptions, which may still be used by
third-party modules.
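For reference, an enum reloption's set of allowed values is declared
roughly like this (names modeled on the GiST buffering option):
    static relopt_enum_elt_def gistBufferingOptValues[] =
    {
        {"auto", GIST_OPTION_BUFFERING_AUTO},
        {"on", GIST_OPTION_BUFFERING_ON},
        {"off", GIST_OPTION_BUFFERING_OFF},
        {(const char *) NULL}   /* list terminator */
    };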
Authors: Nikolay Shaplov, Álvaro Herrera
Reviewed-by: Nikita Glukhov, Aleksandr Parfenov
Discussion: https://postgr.es/m/43332102.S2V5pIjXRx@x200m
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
The progress state was being clobbered once the first index finished
being rebuilt, causing the final phases of the operation to show nothing
in the progress view. This was inadvertently broken in 03f9e5cba0,
which added progress tracking for REINDEX.
(The reason this bugfix is this small is that I had already noticed this
problem when writing monitoring for CREATE INDEX, and had already worked
around it, as can be seen in the discussion starting at
https://postgr.es/m/20190329150218.GA25010@alvherre.pgsql
Fixing the problem is just a matter of fixing one place touched by the
REINDEX monitoring.)
Reported by: Álvaro Herrera
Author: Álvaro Herrera
Discussion: https://postgr.es/m/20190801184333.GA21369@alvherre.pgsql
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define a statistics target for extended
statistics objects; the value was always computed from the per-attribute
targets, with a fallback to the system-wide default statistics target.
That's a bit inconvenient, as it ties together the statistics targets
set for per-column and extended statistics. In some cases it may be
useful to require a larger sample / higher accuracy for extended
statistics (or the other way around), but with this approach that's not
possible.
So this commit introduces a new command, allowing users to specify a
statistics target for individual extended statistics objects, overriding
the value derived from per-attribute targets (and the system default).
    ALTER STATISTICS stat_name SET STATISTICS target_value;
When determining the statistics target for an extended statistics
object, we first look at the explicitly set value. When this value is
-1, we fall back to the old formula, looking at the per-attribute
targets first and then the system default. This means the behavior is
backwards compatible with older PostgreSQL releases.
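For example (object names are illustrative):
    CREATE STATISTICS s1 (dependencies) ON a, b FROM t1;
    ALTER STATISTICS s1 SET STATISTICS 1000;  -- explicit target
    ANALYZE t1;
    ALTER STATISTICS s1 SET STATISTICS -1;    -- back to the derived value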
Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive rather than
shared lock, and it can take a while if there are many other listening
backends. We could improve the latter issue by using a doubly- rather
than singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slots, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples would be inefficient
for future tableams. But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.
As a secondary issue, 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow fetching the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.
As a third issue, the test coverage for EPQ was clearly insufficient.
Fixing the first issue is unfortunately not trivial: non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug it seems better to delay fetching
non-locking rowmarks until they're actually needed, rather than doing
so eagerly. They're referenced at most once, and in cases where EPQ
fails, they might never be referenced. Fetching them when needed also
increases locality a bit.
To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains the necessary data. To do so, move EPQ-related data from
EState to EPQState, and, only for EStates created as part of EPQ,
reference the associated EPQState from EState.
To fix the second issue, change EPQ initialization to allow
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).
As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.
Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.
Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
Discussion: https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
Commit 6f6b99d13 stuck an INFO message into the fast path for
checking partition constraints, for no very good reason except
that it made it easy for the regression tests to verify that
that path was taken. Assorted later patches did likewise,
increasing the unsuppressable-chatter level from ALTER TABLE
even more. This isn't good for the user experience, so let's
drop these messages down to DEBUG1 where they belong. So as
not to have a loss of test coverage, create a TAP test that
runs the relevant queries with client_min_messages = DEBUG1
and greps for the expected messages.
This testing method is a bit brute-force --- in particular,
it duplicates the execution of a fair amount of the core
create_table and alter_table tests. We experimented with
other solutions, but running any significant amount of
standard testing with client_min_messages = DEBUG1 seems
to have a lot of output-stability pitfalls, cf commits
bbb96c370 and 5655565c0. Possibly at some point we'll look
into whether we can reduce the amount of test duplication.
Backpatch into v12, because some of these messages are new
in v12 and we don't really want to ship it that way.
Sergei Kornilov
Discussion: https://postgr.es/m/81911511895540@web58j.yandex.ru
Discussion: https://postgr.es/m/4859321552643736@myt5-02b80404fd9e.qloud-c.yandex.net
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.
toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.
heaptoast.c/h contain code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.
detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap. At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.
Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
The message was included as a parameter when this function was added in
dcb2bda9b7, but I don't think it has ever served any useful purpose.
Let's stop spreading it pointlessly.
Reviewed by Amit Langote and Peter Eisentraut.
Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql
If a table inherits from multiple unrelated parents, we must disallow
changing the type of a column inherited from multiple such parents, else
it would be out of step with the other parents. However, it's possible
for the column to ultimately be inherited from just one common ancestor,
in which case a change starting from that ancestor should still be
allowed. (I would not be excited about preserving that option, were
it not that we have regression test cases exercising it already ...)
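The case that must keep working looks like this:
    CREATE TABLE p (a int);
    CREATE TABLE c1 () INHERITS (p);
    CREATE TABLE c2 () INHERITS (p);
    CREATE TABLE d () INHERITS (c1, c2);  -- d.a comes from p via two paths
    ALTER TABLE p ALTER COLUMN a TYPE bigint;  -- allowed: p is the sole
                                               -- common ancestor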
It's slightly annoying that this patch looks different from the logic
with the same end goal in renameatt(), and more annoying that it
requires an extra syscache lookup to make the test. However, the
recursion logic is quite different in the two functions, and a
back-patched bug fix is no place to be trying to unify them.
Per report from Manuel Rigger. Back-patch to 9.5. The bug exists in
9.4 too (and doubtless much further back); but the way the recursion
is done in 9.4 is a good bit different, so that substantial refactoring
would be needed to fix it in 9.4. I'm disinclined to do that, or risk
introducing new bugs, for a bug that has escaped notice for this long.
Discussion: https://postgr.es/m/CA+u7OA4qogDv9rz1HAb-ADxttXYPqQdUdPY_yd4kCzywNxRQXA@mail.gmail.com
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
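So, for example, a listener now receives a single notification from:
    BEGIN;
    NOTIFY pets, 'fido';
    SAVEPOINT s1;
    NOTIFY pets, 'fido';  -- duplicate, merged up and dropped at commit
    RELEASE SAVEPOINT s1;
    COMMIT;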
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input. Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".
To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input. (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)
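In code terms, the resulting idioms are:
    /* OK now that list_concat() no longer shares y's cells: */
    result = list_concat(x, y);      /* was list_concat(x, list_copy(y)) */
    /* when both inputs must stay usable afterwards: */
    result = list_concat_copy(x, y); /* was list_concat(list_copy(x), y) */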
In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it. We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak not whether there is one. I did adjust two or
three places that had been troubling to free that header so that
they manually free the whole second List.)
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
This failed with either "tuple already updated by self" or "duplicate
key value violates unique constraint", depending on whether the table
had previously been analyzed or not. The reason is that ANALYZE tried
to insert or update the same pg_statistic rows twice, and there was no
CommandCounterIncrement between. So add one. The same case works fine
outside a transaction block, because then there's a whole transaction
boundary between, as a consequence of the way VACUUM works.
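The failing case was essentially:
    BEGIN;
    ANALYZE vactst, vactst;  -- "tuple already updated by self"
    COMMIT;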
This issue has been latent all along, but the problem was unreachable
before commit 11d8d72c2 added the ability to specify multiple tables
in ANALYZE. We could, perhaps, alternatively fix it by adding code to
de-duplicate the list of VacuumRelations --- but that would add a
lot of overhead to work around dumb commands, so it's not attractive.
Per bug #15946 from Yaroslav Schekin. Back-patch to v11.
(Note: in v11 I also back-patched the test added by commit 23224563d;
otherwise the problem doesn't manifest in the test I added, because
"vactst" is empty when the tests for multiple ANALYZE targets are
reached. That seems like not a very good thing anyway, so I did this
rather than rethinking the choice of test case.)
Discussion: https://postgr.es/m/15946-5c7570a2884a26cf@postgresql.org
These were introduced by pgindent due to fixes to broken
indentation (cf. 8255c7a5ee). Previously the mis-indentation of
function prototypes was creatively used to reduce indentation in a few
places.
As that formatting only exists in master and REL_12_STABLE, it seems
better to fix it in both, rather than having some odd indentation in
v12 that somebody might copy for future patches or such.
Author: Andres Freund
Discussion: https://postgr.es/m/20190728013754.jwcbe5nfyt3533vx@alap3.anarazel.de
Backpatch: 12-
When performing ANALYZE on inheritance trees, we collect two samples for
each relation - one for the relation alone, and one for the inheritance
subtree (relation and its child relations). And then we build statistics
on each sample, so for each relation we get two sets of statistics.
For regular (per-column) statistics this works fine, because the catalog
includes a flag differentiating statistics built from those two samples.
But we don't have such a flag in the extended statistics catalogs, and
we ended up updating the same row twice, triggering this error:
    ERROR:  tuple already updated by self
The simplest solution is to disable extended statistics on inheritance
trees, which is what this commit does. In the future we may need to do
something similar to what per-column statistics do, but that requires
adding a flag to the catalog - and that's not backpatchable. Moreover,
the current selectivity estimation code only works with individual
relations, so building statistics on inheritance trees would be
pointless anyway.
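The error could be reproduced along these lines:
    CREATE TABLE parent (a int, b int);
    CREATE TABLE child () INHERITS (parent);
    CREATE STATISTICS s (dependencies) ON a, b FROM parent;
    ANALYZE parent;  -- updated the same catalog row twice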
Author: Tomas Vondra
Backpatch-to: 10-
Discussion: https://postgr.es/m/20190618231233.GA27470@telsasoft.com
Reported-by: Justin Pryzby
When copying the definition of an index rebuilt concurrently for the new
entry, the index information was taken directly from the old index using
the relation cache. In this case, predicates and expressions have
some post-processing to prepare things for the planner, which loses some
information including the collations added in any of them.
This inconsistency can cause issues, for example when attempting a
table rewrite, and makes the new indexes rebuilt concurrently
inconsistent with the old entries.
In order to fix the problem, fetch the expressions and predicates
directly from the catalog entry of the old index, and use them to fill
in the IndexInfo of the new index. This makes the process more
consistent with DefineIndex(), and the code is refactored with the
addition of a routine to create an IndexInfo node.
Reported-by: Manuel Rigger
Author: Michael Paquier
Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com
Backpatch-through: 12
If the user creates a deferred constraint in a partition, and in a
transaction they cause the constraint's trigger execution to be deferred
until commit time *and* drop the constraint, then when commit time comes
the queued trigger will fail to run because the trigger object will have
been dropped.
This happens because, when a constraint gets dropped in a
partitioned table, the recursion to drop the ones in partitions is done
by the dependency mechanism, not by ALTER TABLE traversing the recursion
tree as in all other cases. In the non-partitioned case, this problem
is avoided by checking that the table is not "in use" by alter-table;
other alter-table subcommands that recurse to partitions do that check
for each partition. But the dependency mechanism doesn't have a way to
do that. Fix the problem by applying the same check to all partitions
during ALTER TABLE's "prep" phase, which correctly raises the necessary
error.
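A sketch of the problematic sequence (constraint and table names are
illustrative):
    BEGIN;
    SET CONSTRAINTS part1_fk DEFERRED;
    INSERT INTO part1 VALUES (1);            -- queues the deferred trigger
    ALTER TABLE parent DROP CONSTRAINT fk;   -- recurses via dependencies
    COMMIT;                                  -- queued trigger object is gone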
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6nZiO9-eEpr1ZD84bT1mBoVmeZkfont8iSpcmYrjhGWgA@mail.gmail.com
The logic in ATExecDropColumn that rejects dropping partition key
columns is quite an inadequate defense, because it doesn't execute
in cases where a column needs to be dropped due to cascade from
something that only the column, not the whole partitioned table,
depends on. That leaves us with a badly broken partitioned table;
even an attempt to load its relcache entry will fail.
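For instance, with a domain as a hypothetical way to create such a
cascade:
    CREATE DOMAIN keydom AS int;
    CREATE TABLE p (a keydom) PARTITION BY LIST (a);
    DROP DOMAIN keydom CASCADE;  -- before this fix, dropped only the key
                                 -- column, leaving a broken table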
We really need to have explicit pg_depend entries that show that the
column can't be dropped without dropping the whole table. Hence,
add those entries. In v12 and HEAD, bump catversion to ensure that
partitioned tables will have such entries. We can't do that in
released branches of course, so in v10 and v11 this patch affords
protection only to partitioned tables created after the patch is
installed. Given the lack of field complaints (this bug was found
by fuzz-testing not by end users), that's probably good enough.
In passing, fix ATExecDropColumn and ATPrepAlterColumnType
messages to be more specific about which partition key column
they're complaining about.
Per report from Manuel Rigger. Back-patch to v10 where partitioned
tables were added.
Discussion: https://postgr.es/m/CA+u7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg@mail.gmail.com
Discussion: https://postgr.es/m/31920.1562526703@sss.pgh.pa.us
Some code could get confused when certain catalog state involving both
identity and serial sequences was present, perhaps during an attempt
to upgrade the latter to the former. Specifically, dropping the
default of a serial column keeps the sequence owned by the column, so
it was then possible to afterwards make the column an identity column
that would now own two sequences. This caused the code that looks up
the identity sequence to error out, making the new identity column
inoperable until the ownership of the previous sequence was released.
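The problematic sequence of events, roughly:
    CREATE TABLE t (id serial);
    ALTER TABLE t ALTER COLUMN id DROP DEFAULT;
        -- t_id_seq remains owned by the column
    ALTER TABLE t ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;
        -- the column now owns two sequences; the identity lookup
        -- errored out before this fix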
To fix this, make the identity sequence lookup only consider sequences
with the appropriate dependency type for an identity sequence, so it
only ever finds one (unless something else is broken). In the above
example, the old serial sequence would then be ignored. Reorganize
the various owned-sequence-lookup functions a bit to make this
clearer.
Reported-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/470c54fc8590be4de0f41b0d295fd6390d5e8a6c.camel@cybertec.at
The current extended statistics code was a bit confused about which collation
to use. When building the statistics, the collations defined as default
for the data types were used (since commit 5e0928005). The MCV code was
however using the column collations for MCV serialization, and then
DEFAULT_COLLATION_OID when computing estimates. So overall the code was
using all three possible options, inconsistently.
This commit uses the column collation everywhere - this makes it
consistent with what 5e0928005 did for regular stats. We however do not
track the
collations in a catalog, because we can derive them from column-level
information. This may need to change in the future, e.g. after allowing
statistics on expressions.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12