This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
This improves the user experience when it comes to restrict several
flavors of REINDEX CONCURRENTLY. First, for INDEX, remove a restriction
on shared relations as we already check after catalog relations. Then,
for TABLE, add a proper error message when attempting to run the command
on system catalogs. The code path of CREATE INDEX CONCURRENTLY already
complains about that, but if a REINDEX is issued then then the error
generated is confusing.
While on it, add more tests to check restrictions on catalog indexes and
on toast table/index for catalogs. Some error messages are improved,
with wording suggestion coming from Tom Lane.
Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/23694.1556806002@sss.pgh.pa.us
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.
The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed. That's even sillier. With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.
With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID. Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)". But keep
IsCatalogRelation as a convenience function.
This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid. The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.
While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.
Also improve the comments in catalog.c.
There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId. I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId. But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.
Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
Some messages related to foreign servers were reporting the server name
without quotes, or not at all; our style is to have all names be quoted,
and the server name already appears quoted in a few other messages, so
just add quotes and make them all consistent.
Remove an extra "s" in other messages (typos introduced by myself in
f56f8f8da6af).
REINDEX CONCURRENTLY locks tables with ShareUpdateExclusiveLock rather
than the ShareLock used by a plain REINDEX. However,
RangeVarCallbackForReindexIndex() was not updated for that and still
used the ShareLock only. This would lead to lock upgrades later,
leading to possible deadlocks.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.
This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.
Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result. A tiny minority of calls
creatively did it differently. Bring them into line with the rest.
This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.
Back-patch to v11 just to keep the branches in sync. (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)
Per searching to see if anyplace else had made the same error
repaired in 62148c352.
Omitted in commit 05b38c7e6 (though it looks like the original blame
belongs to 9e9befac4). A failure is admittedly unlikely, but if it
did happen, SIGSEGV is not the approved method of reporting it.
Per Coverity. Back-patch to v11 where the broken code originated.
Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".
1) To pass data to the relation_set_new_filenode()
RelationSetNewRelfilenode() was made to update RelationData.rd_rel
directly. That's not OK however, as it makes the relcache entries
temporarily inconsistent. Which among other scenarios is a problem
if a REINDEX targets an index on pg_class - the
CatalogTupleUpdate() in RelationSetNewRelfilenode(). Presumably
that was introduced because other places in the code do so - while
those aren't "good practice" they don't appear to be actively
buggy (e.g. because system tables may not be targeted).
I (Andres) should have caught this while reviewing and signficantly
evolving the code in that commit, mea culpa.
Fix that by instead passing in the new RelFileNode as separate
argument to relation_set_new_filenode() and rely on the relcache to
update the catalog entry. Also revert that the
RelationMapUpdateMap() call was changed to immediate, and undo some
other more unnecessary changes.
2) Document that the relation_set_new_filenode cannot rely on the
whole relcache entry to be valid. It might be worthwhile to
refactor the code to never have to rely on that, but given the way
heap_create() is currently coded, that'd be a large change.
3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
table AM might not use shared buffers at all. Move to
index_copy_data() and heapam_relation_copy_data().
4) heapam_relation_set_new_filenode() previously sometimes accessed
rel->rd_rel->relpersistence rather than the `persistence`
argument. Code movement mistake.
5) Previously heapam_relation_set_new_filenode() re-opened the smgr
relation to create the init for, if necesary. Instead have
RelationCreateStorage() return the SMgrRelation and use it to
create the init fork.
6) Add a note about the danger of modifying the relcache directly to
ATExecSetTableSpace() - it's currently not a bug because there's a
check ERRORing for catalog tables.
Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.
Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused
index relfilenode to a child index creation, and fix TryReuseIndex
to not think that reuse is sensible for a partitioned index.
In v11, this fixes a problem where ALTER TABLE on a partitioned table
could assign the same relfilenode to several different child indexes,
causing very nasty catalog corruption --- in fact, attempting to DROP
the partitioned table then leads not only to a database crash, but to
inability to restart because the same crash will recur during WAL replay.
Either of these two changes would be enough to prevent the failure, but
since neither action could possibly be sane, let's put in both changes
for future-proofing.
In HEAD, no such bug manifests, but that's just an accidental consequence
of having changed the pg_class representation of partitioned indexes to
have relfilenode = 0. Both of these changes still seem like smart
future-proofing.
This is only a stop-gap because the code for ALTER TABLE on a partitioned
table with a no-op type change still leaves a great deal to be desired.
As the added regression tests show, it gets things wrong for comments on
child indexes/constraints, and it is regenerating child indexes it doesn't
have to. However, fixing those problems will take more work which may not
get back-patched into v11. We need a fix for the corruption problem now.
Per bug #15672 from Jianing Yang.
Patch by me, regression test cases based on work by Amit Langote,
who also did a lot of the investigative work.
Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
When an existing index in a partition is attached to a new index on
its parent, we forgot to set the "relispartition" flag correctly, which
meant that it was not possible to find the index in various operations,
such as adding a foreign key constraint that references that partitioned
table. One of four places that was assigning the parent index was
forgetting to do that, so fix by shifting responsibility of updating the
flag to the routine that changes the parent.
Author: Amit Langote, Álvaro Herrera
Reported-by: Hubert "depesz" Lubaczewski
Discussion: https://postgr.es/m/CA+HiwqHMsRtRYRWYTWavKJ8x14AFsv7bmAV46mYwnfD3vy8goQ@mail.gmail.com
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.
Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry. This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value. Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).
Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
Up to now, DefineIndex() was responsible for adding attnotnull constraints
to the columns of a primary key, in any case where it hadn't been
convenient for transformIndexConstraint() to mark those columns as
is_not_null. It (or rather its minion index_check_primary_key) did this
by executing an ALTER TABLE SET NOT NULL command for the target table.
The trouble with this solution is that if we're creating the index due
to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional
sub-commands, the inner ALTER TABLE's operations executed at the wrong
time with respect to the outer ALTER TABLE's operations. In particular,
the inner ALTER would perform a validation scan at a point where the
table's storage might be inconsistent with its catalog entries. (This is
on the hairy edge of being a security problem, but AFAICS it isn't one
because the inner scan would only be interested in the tuples' null
bitmaps.) This can result in unexpected failures, such as the one seen
in bug #15580 from Allison Kaptur.
To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(),
reducing index_check_primary_key's role to verifying that the columns are
already not null. (It shouldn't ever see such a case, but it seems wise
to keep the check for safety.) Instead, make transformIndexConstraint()
generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of
the ADD PRIMARY KEY operation in every case where it can't force the
column to be created already-not-null. This requires only minor surgery
in parse_utilcmd.c, and it makes for a much more satisfying spec for
transformIndexConstraint(): it's no longer having to take it on faith
that someone else will handle addition of NOT NULL constraints.
To make that work, we have to move the execution of AT_SetNotNull into
an ALTER pass that executes ahead of AT_PASS_ADD_INDEX. I moved it to
AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure
when the column is being added in the same command. This incidentally
fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for
AT_SetIdentity: it didn't work either for a newly-added column.
Playing around with this exposed a separate bug in ALTER TABLE ONLY ...
ADD PRIMARY KEY for partitioned tables. The intent of the ONLY modifier
in that context is to prevent doing anything that would require holding
lock for a long time --- but the implied SET NOT NULL would recurse to
the child partitions, and do an expensive validation scan for any child
where the column(s) were not already NOT NULL. To fix that, invent a
new ALTER subcommand AT_CheckNotNull that just insists that a child
column be already NOT NULL, and apply that, not AT_SetNotNull, when
recursing to children in this scenario. This results in a slightly laxer
definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables,
too: that command will now work as long as all children are already NOT
NULL, whereas before it just threw up its hands if there were any
partitions.
In passing, clean up the API of generateClonedIndexStmt(): remove a
useless argument, ensure that the output argument is not left undefined,
update the header comment.
A small side effect of this change is that no-such-column errors in ALTER
TABLE ADD PRIMARY KEY now produce a different message that includes the
table name, because they are now detected by the SET NOT NULL step which
has historically worded its error that way. That seems fine to me, so
I didn't make any effort to avoid the wording change.
The basic bug #15580 is of very long standing, and these other bugs
aren't new in v12 either. However, this is a pretty significant change
in the way ALTER TABLE ADD PRIMARY KEY works. On balance it seems best
not to back-patch, at least not till we get some more confidence that
this patch has no new bugs.
Patch by me, but thanks to Jie Zhang for a preliminary version.
Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org
Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true. Fix that by freeing the tuple in that case.
Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.
The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.
Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
When such a trigger returns the old row version, it naturally get
stored in the slot for the trigger result. When a table AMs doesn't
store HeapTuples internally, ExecBRUpdateTriggers() frees the old row
version passed to triggers - but before this fix it might still be
referenced by the slot holding the new tuple.
Noticed when running the out-of-core zheap AM against the in-core
version of tableam.
Author: Andres Freund
Per discussion with others, allowing REINDEX INDEX CONCURRENTLY to work
for invalid indexes when working directly on them can have a lot of
value to unlock situations with invalid indexes without having to use a
dance involving DROP INDEX followed by an extra CREATE INDEX
CONCURRENTLY (which would not work for indexes with constraint
dependency anyway). This also does not create extra bloat on the
relation involved as this works on individual indexes, so let's enable
it.
Note that REINDEX TABLE CONCURRENTLY still bypasses invalid indexes as
we don't want to bloat the number of indexes defined on a relation in
the event of multiple and successive failures of REINDEX CONCURRENTLY.
More regression tests are added to cover those behaviors, using an
invalid index created with CREATE INDEX CONCURRENTLY.
Reported-by: Dagfinn Ilmari Mannsåker, Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20190411134947.GA22043@alvherre.pgsql
Such calls can confuse the reader as strcmp() uses an integer as result.
The places patched here have been spotted by Thomas Munro, David Rowley
and myself.
Author: Michael Paquier
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20190411021946.GG2728@paquier.xyz
This uses the same infrastructure that the CREATE INDEX progress
reporting uses. Add a column to pg_stat_progress_create_index to
report the OID of the index being worked on. This was not necessary
for CREATE INDEX, but it's useful for REINDEX.
Also edit the phase descriptions a bit to be more consistent with the
source code comments.
Discussion: https://www.postgresql.org/message-id/ef6a6757-c36a-9e81-123f-13b19e36b7d7%402ndquadrant.com
The foreign-key-checking loop in ATRewriteTables failed to ignore
relations without storage (e.g., partitioned tables), unlike the
initial loop. This accidentally worked as long as RI_Initial_Check
succeeded, which it does in most practical cases (including all the
ones exercised in the existing regression tests :-(). However, if
that failed, as for instance when there are permissions issues,
then we entered the slow fire-the-trigger-on-each-tuple path.
And that would try to read from the referencing relation, and fail
if it lacks storage.
A second problem, recently introduced in HEAD, was that this loop
had been broken by sloppy refactoring for the tableam API changes.
Repair both issues, and add a regression test case so we have some
coverage on this code path. Back-patch as needed to v11.
(It looks like this code could do with additional bulletproofing,
but let's get a working test case in place first.)
Hadi Moshayedi, Tom Lane, Andres Freund
Discussion: https://postgr.es/m/CAK=1=WrnNmBbe5D9sm3t0a6dnAq3cdbF1vXY816j1wsMqzC8bw@mail.gmail.com
Discussion: https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
Discussion: https://postgr.es/m/20190325180405.jytoehuzkeozggxx%40alap3.anarazel.de
Previously it was allowed to set default_table_access_method to an
empty string. That makes sense for default_tablespace, where that was
copied from, as it signals falling back to the database's default
tablespace. As there is no equivalent for table AMs, forbid that.
Also make sure to throw a usable error when creating a table using an
index AM, by using get_am_type_oid() to implement get_table_am_oid()
instead of a separate copy. Previously we'd error out only later, in
GetTableAmRoutine().
Thirdly remove GetTableAmRoutineByAmId() - it was only used in an
earlier version of 8586bf7ed8.
Add tests for the above (some for index AMs as well).
This adds table_multi_insert(), and converts COPY FROM, the only user
of heap_multi_insert, to it.
A simple conversion of COPY FROM use slots would have yielded a
slowdown when inserting into a partitioned table for some
workloads. Different partitions might need different slots (both slot
types and their descriptors), and dropping / creating slots when
there's constant partition changes is measurable.
Thus instead revamp the COPY FROM buffering for partitioned tables to
allow to buffer inserts into multiple tables, flushing only when
limits are reached across all partition buffers. By only dropping
slots when there've been inserts into too many different partitions,
the aforementioned overhead is gone. By allowing larger batches, even
when there are frequent partition changes, we actuall speed such cases
up significantly.
By using slots COPY of very narrow rows into unlogged / temporary
might slow down very slightly (due to the indirect function calls).
Author: David Rowley, Andres Freund, Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.dehttps://postgr.es/m/20190327054923.t3epfuewxfqdt22e@alap3.anarazel.de
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default. It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.
Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.
Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.
For now only md.c uses the new interface, but other users are being
proposed. Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.
Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key. This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).
Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values. With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan. Only options affecting planning,
with values different from the built-in default are printed.
This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.
Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys. Now it is possible to do that.
Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
This uses the progress reporting infrastructure added by c16dc1aca5e0,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific. The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan. (The index validation index scan requires
patching each AM, which has not been included here.)
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
This replaces the previous calls of heap_sync() in places using
bulk-insert. By passing in the flags used for bulk-insert the AM can
decide (first at insert time and then during the finish call) which of
the optimizations apply to it, and what operations are necessary to
finish a bulk insert operation.
Also change HEAP_INSERT_* flags to TABLE_INSERT, and rename hi_options
to ti_options.
These changes are made even in copy.c, which hasn't yet been converted
to tableam. There's no harm in doing so.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
This is a relatively straightforward move of the current
implementation to sit below tableam. As the current analyze sampling
implementation is pretty inherently block based, the tableam analyze
interface is as well. It might make sense to generalize that at some
point, but that seems like a larger project that shouldn't be
undertaken at the same time as the introduction of tableam.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.
This implements one kind of generated column: stored (computed on
write). Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
The previous code was adding pointers to transient variables to a
list, but by the time the list was read, the variable might be gone,
depending on the compiler. Fix it by making copies in the proper
memory context.
This adds the CONCURRENTLY option to the REINDEX command. A REINDEX
CONCURRENTLY on a specific index creates a new index (like CREATE
INDEX CONCURRENTLY), then renames the old index away and the new index
in place and adjusts the dependencies, and then drops the old
index (like DROP INDEX CONCURRENTLY). The REINDEX command also has
the capability to run its other variants (TABLE, DATABASE) with the
CONCURRENTLY option (but not SYSTEM).
The reindexdb command gets the --concurrently option.
Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut
Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
This moves the responsibility for:
- creating the storage necessary for a relation, including creating a
new relfilenode for a relation with existing storage
- non-transactional truncation of a relation
- VACUUM FULL / CLUSTER's rewrite of a table
below tableam.
This is fairly straight forward, with a bit of complexity smattered in
to move the computation of xid / multixid horizons below the AM, as
they don't make sense for every table AM.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
ALTER INDEX .. ATTACH PARTITION fails if the partitioned table where the
index is defined contains more dropped columns than its partition, with
this message:
ERROR: incorrect attribute map
The cause was that one caller of CompareIndexInfo was passing the number
of attributes of the partition rather than the parent, which confused
the length check. Repair.
This can cause pg_upgrade to fail when used on such a database. Leave
some more objects around after regression tests, so that the case is
detected by pg_upgrade test suite.
Remove some spurious empty lines noticed while looking for other cases
of the same problem.
Discussion: https://postgr.es/m/20190326213924.GA2322@alvherre.pgsql
This uses the same progress reporting infrastructure added in commit
c16dc1aca5e01e6acaadfcf38f5fc964a381dc62 and extends it to these
additional cases. We lack the ability to track the internal progress
of sorts and index builds so the information reported is
coarse-grained for some parts of the operation, but it still seems
like a significant improvement over having nothing at all.
Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael
Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me. A fair
amount of polishing also by me.
Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp
This is essentially the tableam version of heapam_fetch(),
i.e. fetching a tuple identified by a tid, performing visibility
checks.
Note that this different from table_index_fetch_tuple(), which is for
index lookups. It therefore has to handle a tid pointing to an earlier
version of a tuple if the AM uses an optimization like heap's HOT. Add
comments to that end.
This commit removes the stats_relation argument from heap_fetch, as
it's been unused for a long time.
Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
Previously those directly performed a heap_insert(). Use
table_insert() instead. The input slot of those routines is not of
the target relation - we could fix that by copying if necessary, but
that'd not be beneficial for performance. As those codepaths don't
access any AM specific tuple fields (say xmin/xmax), there's no need
to use an AM specific slot.
Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.
Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one. It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.
The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.
As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).
Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.
Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.
Author: Andres Freund and Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.dehttps://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.dehttps://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com