Commit 325f2ec555 introduced pg_class.relwrite to skip operations on
tables created as part of a heap rewrite during DDL. It links such
transient heaps to the original relation OID via this new field in
pg_class but forgot to do anything about toast tables. So, logical
decoding was not able to skip operations on internally created toast
tables. This leads to an error when we tried to decode the WAL for the
next operation for which it appeared that there is a toast data where
actually it didn't have any toast data.
To fix this, we set pg_class.relwrite for internally created toast tables
as well which allowed skipping operations on them during logical decoding.
Author: Bertrand Drouvot
Reviewed-by: David Zhang, Amit Kapila
Backpatch-through: 11, where it was introduced
Discussion: https://postgr.es/m/b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com
Commit 8ff1c94649 allowed TRUNCATE command to truncate foreign tables.
Previously the information about "ONLY" options specified in TRUNCATE
command were passed to the foreign data wrapper. Then postgres_fdw
constructed the TRUNCATE command to issue the remote server and
included "ONLY" options in it based on the passed information.
On the other hand, "ONLY" options specified in SELECT, UPDATE or DELETE
have no effect when accessing or modifying the remote table, i.e.,
are not passed to the foreign data wrapper. So it's inconsistent to
make only TRUNCATE command pass the "ONLY" options to the foreign data
wrapper. Therefore this commit changes the TRUNCATE command so that
it doesn't pass the "ONLY" options to the foreign data wrapper,
for the consistency with other statements. Also this commit changes
postgres_fdw so that it always doesn't include "ONLY" options in
the TRUNCATE command that it constructs.
Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi, Justin Pryzby, Zhihong Yu
Discussion: https://postgr.es/m/551ed8c1-f531-818b-664a-2cecdab99cd8@oss.nttdata.com
This commit introduces new foreign data wrapper API for TRUNCATE.
It extends TRUNCATE command so that it accepts foreign tables as
the targets to truncate and invokes that API. Also it extends postgres_fdw
so that it can issue TRUNCATE command to foreign servers, by adding
new routine for that TRUNCATE API.
The information about options specified in TRUNCATE command, e.g.,
ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to
truncate is also passed to FDW. FDW truncates the foreign data sources
that the passed foreign tables specify, based on those information.
For example, postgres_fdw constructs TRUNCATE command using them
and issues it to the foreign server.
For performance, TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that foreign tables to truncate belong to.
Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
Two code paths of tablecmds.c (for relations with storage and without
storage) use the same logic to check if the move of a relation to a
new tablespace is allowed or not and to update pg_class.reltablespace
and pg_class.relfilenode.
A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs
similar checks to make sure that nothing is moved around in illegal ways
(no mapped relations, shared relations only in pg_global, no move of
temp tables owned by other backends).
This reorganizes the existing code of ALTER TABLE so as all this logic
is controlled by two new routines that can be reused for the other
commands able to move relations across tablespaces, limiting the number
of code paths in need of the same protections. This also removes some
code that was duplicated for tables with and without storage for ALTER
TABLE.
Author: Alexey Kondratov, Michael Paquier
Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz
We've had numerous bug reports about how (1) IF NOT EXISTS clauses in
ALTER TABLE don't behave as-expected, and (2) combining certain actions
into one ALTER TABLE doesn't work, though executing the same actions as
separate statements does. This patch cleans up all of the cases so far
reported from the field, though there are still some oddities associated
with identity columns.
The core problem behind all of these bugs is that we do parse analysis
of ALTER TABLE subcommands too soon, before starting execution of the
statement. The root of the bugs in group (1) is that parse analysis
schedules derived commands (such as a CREATE SEQUENCE for a serial
column) before it's known whether the IF NOT EXISTS clause should cause
a subcommand to be skipped. The root of the bugs in group (2) is that
earlier subcommands may change the catalog state that later subcommands
need to be parsed against.
Hence, postpone parse analysis of ALTER TABLE's subcommands, and do
that one subcommand at a time, during "phase 2" of ALTER TABLE which
is the phase that does catalog rewrites. Thus the catalog effects
of earlier subcommands are already visible when we analyze later ones.
(The sole exception is that we do parse analysis for ALTER COLUMN TYPE
subcommands during phase 1, so that their USING expressions can be
parsed against the table's original state, which is what we need.
Arguably, these bugs stem from falsely concluding that because ALTER
COLUMN TYPE must do early parse analysis, every other command subtype
can too.)
This means that ALTER TABLE itself must deal with execution of any
non-ALTER-TABLE derived statements that are generated by parse analysis.
Add a suitable entry point to utility.c to accept those recursive
calls, and create a struct to pass through the information needed by
the recursive call, rather than making the argument lists of
AlterTable() and friends even longer.
Getting this to work correctly required a little bit of fiddling
with the subcommand pass structure, in particular breaking up
AT_PASS_ADD_CONSTR into multiple passes. But otherwise it's mostly
a pretty straightforward application of the above ideas.
Fixing the residual issues for identity columns requires refactoring of
where the dependency link from an identity column to its sequence gets
set up. So that seems like suitable material for a separate patch,
especially since this one is pretty big already.
Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys. Now it is possible to do that.
Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
Change lock level for renaming index (either ALTER INDEX or implicitly
via some other commands) from AccessExclusiveLock to
ShareUpdateExclusiveLock.
One reason we need a strong lock for relation renaming is that the
name change causes a rebuild of the relcache entry. Concurrent
sessions that have the relation open might not be able to handle the
relcache entry changing underneath them. Therefore, we need to lock
the relation in a way that no one can have the relation open
concurrently. But for indexes, the relcache handles reloads specially
in RelationReloadIndexInfo() in a way that keeps changes in the
relcache entry to a minimum. As long as no one keeps pointers to
rd_amcache and rd_options around across possible relcache flushes,
which is the case, this ought to be safe.
We also want to use a self-exclusive lock for correctness, so that
concurrent DDL doesn't overwrite the rename if they start updating
while still seeing the old version. Therefore, we use
ShareUpdateExclusiveLock, which is already used by other DDL commands
that want to operate in a concurrent manner.
The reason this is interesting at all is that renaming an index is a
typical part of a concurrent reindexing workflow (CREATE INDEX
CONCURRENTLY new + DROP INDEX CONCURRENTLY old + rename back). And
indeed a future built-in REINDEX CONCURRENTLY might rely on the ability
to do concurrent renames as well.
Reviewed-by: Andrey Klychkov <aaklychkov@mail.ru>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1531767486.432607658@f357.i.mail.ru
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
As 'ALTER TABLESPACE .. MOVE ALL' really didn't change the tablespace
but instead changed objects inside tablespaces, it made sense to
rework the syntax and supporting functions to operate under the
'ALTER (TABLE|INDEX|MATERIALIZED VIEW)' syntax and to be in
tablecmds.c.
Pointed out by Alvaro, who also suggested the new syntax.
Back-patch to 9.4.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts. At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.
This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older). In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches. A field has also been added
to the Constraint node (FkConstraint in 8.4). Third-party code calling
these functions or using the Constraint node will require updating.
Report by Andres Freund. Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.
Security: CVE-2014-0062
Revert the matview-related changes in explain.c's API, as per recent
complaint from Robert Haas. The reason for these appears to have been
principally some ill-considered choices around having intorel_startup do
what ought to be parse-time checking, plus a poor arrangement for passing
it the view parsetree it needs to store into pg_rewrite when creating a
materialized view. Do the latter by having parse analysis stick a copy
into the IntoClause, instead of doing it at runtime. (On the whole,
I seriously question the choice to represent CREATE MATERIALIZED VIEW as a
variant of SELECT INTO/CREATE TABLE AS, because that means injecting even
more complexity into what was already a horrid legacy kluge. However,
I didn't go so far as to rethink that choice ... yet.)
I also moved several error checks into matview parse analysis, and
made the check for external Params in a matview more accurate.
In passing, clean things up a bit more around interpretOidsOption(),
and fix things so that we can use that to force no-oids for views,
sequences, etc, thereby eliminating the need to cons up "oids = false"
options when creating them.
catversion bump due to change in IntoClause. (I wonder though if we
really need readfuncs/outfuncs support for IntoClause anymore.)
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table, references in queries refer to the
materialized data.
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
Extracted from a larger patch by Dimitri Fontaine. It is hoped that
this will provide infrastructure for enriching the new event trigger
functionality, but it seems possibly useful for other purposes as
well.
In its original conception, it was leaving some objects into the old
schema, but without their proper pg_depend entries; this meant that the
old schema could be dropped, causing future pg_dump calls to fail on the
affected database. This was originally reported by Jeff Frost as #6704;
there have been other complaints elsewhere that can probably be traced
to this bug.
To fix, be more consistent about altering a table's subsidiary objects
along the table itself; this requires some restructuring in how tables
are relocated when altering an extension -- hence the new
AlterTableNamespaceInternal routine which encapsulates it for both the
ALTER TABLE and the ALTER EXTENSION cases.
There was another bug lurking here, which was unmasked after fixing the
previous one: certain objects would be reached twice via the dependency
graph, and the second attempt to move them would cause the entire
operation to fail. Per discussion, it seems the best fix for this is to
do more careful tracking of objects already moved: we now maintain a
list of moved objects, to avoid attempting to do it twice for the same
object.
Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
If a CHECK constraint or index definition contained a whole-row Var (that
is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or
table inheritance produced incorrect results: the copied Var still claimed
to have the rowtype of the source table, rather than the created table.
For the LIKE case, it seems reasonable to just throw error for this
situation, since the point of LIKE is that the new table is not permanently
coupled to the old, so there's no reason to assume its rowtype will stay
compatible. In the inheritance case, we should ideally allow such
constraints, but doing so will require nontrivial refactoring of CREATE
TABLE processing (because we'd need to know the OID of the new table's
rowtype before we adjust inherited CHECK constraints). In view of the lack
of previous complaints, that doesn't seem worth the risk in a back-patched
bug fix, so just make it throw error for the inheritance case as well.
Along the way, replace change_varattnos_of_a_node() with a more robust
function map_variable_attnos(), which is capable of being extended to
handle insertion of ConvertRowtypeExpr whenever we get around to fixing
the inheritance case nicely, and in the meantime it returns a failure
indication to the caller so that a helpful message with some context can be
thrown. Also, this code will do the right thing with subselects (if we
ever allow them in CHECK or indexes), and it range-checks varattnos before
using them to index into the map array.
Per report from Sergey Konoplev. Back-patch to all supported branches.
ALTER TABLE (and ALTER VIEW, ALTER SEQUENCE, etc.) now use a
RangeVarGetRelid callback to check permissions before acquiring a table
lock. We also now use the same callback for all forms of ALTER TABLE,
rather than having separate, almost-identical callbacks for ALTER TABLE
.. SET SCHEMA and ALTER TABLE .. RENAME, and no callback at all for
everything else.
I went ahead and changed the code so that no form of ALTER TABLE works
on foreign tables; you must use ALTER FOREIGN TABLE instead. In 9.1,
it was possible to use ALTER TABLE .. SET SCHEMA or ALTER TABLE ..
RENAME on a foreign table, but not any other form of ALTER TABLE, which
did not seem terribly useful or consistent.
Patch by me; review by Noah Misch.
In the previous coding, a user could queue up for an AccessExclusiveLock
on a table they did not have permission to cluster, thus potentially
interfering with access by authorized users who got stuck waiting behind
the AccessExclusiveLock. This approach avoids that. cluster() has the
same permissions-checking requirements as REINDEX TABLE, so this commit
moves the now-shared callback to tablecmds.c and renames it, per
discussion with Noah Misch.
Previously, renaming a table, sequence, view, index, foreign table,
column, or trigger checked permissions before locking the object, which
meant that if permissions were revoked during the lock wait, we would
still allow the operation. Similarly, if the original object is dropped
and a new one with the same name is created, the operation will be allowed
if we had permissions on the old object; the permissions on the new
object don't matter. All this is now fixed.
Along the way, attempting to rename a trigger on a foreign table now gives
the same error message as trying to create one there in the first place
(i.e. that it's not a table or view) rather than simply stating that no
trigger by that name exists.
Patch by me; review by Noah Misch.
In the past, relhassubclass always remained true if a relation had ever had
child relations, even if the last subclass was long gone. While this had
only marginal performance implications in most cases, it was annoying, and
I'm now considering some planner changes that would raise the cost of a
false positive. It was previously impractical to fix this because of race
condition concerns. However, given the recent change that made tablecmds.c
take ShareExclusiveLock on relations that are gaining a child (commit
fbcf4b92aa), we can now allow ANALYZE to
clear the flag when it's no longer relevant. There is no additional
locking cost to do so, since ANALYZE takes ShareExclusiveLock anyway.
This syntax allows a standalone table to be made into a typed table,
or a typed table to be made standalone. This is possibly a mildly
useful feature in its own right, but the real motivation for this
change is that we need it to make pg_upgrade work with typed tables.
This doesn't actually fix that problem, but it's necessary
infrastructure.
Noah Misch
Per discussion with Noah Misch, the previous coding, introduced by
my commit 65377e0b9c on 2011-02-06,
was really an abuse of RELKIND_COMPOSITE_TYPE, since the caller in
typecmds.c is actually passing the name of a domain. So go back
having a type name argument, but make the first argument a Relation
rather than just a string so we can tell whether it's a table or
a foreign table and emit the proper error message.
If the foreign table's rowtype is being used as the type of a column in
another table, we can't just up and change its data type. This was
already checked for composite types and ordinary tables, but we
previously failed to enforce it for foreign tables.