This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)
The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.
Michael Paquier, with plenty of kibitzing by me.
A few places assumed they could pass NULL for the argtypes array when
looking up functions known to have zero arguments. At first glance
it seems that this should be safe enough, since memcmp() is surely not
allowed to fetch any bytes if its count argument is zero. However,
close reading of the C standard says that such calls have undefined
behavior, so we'd probably best avoid it.
Since the number of places doing this is quite small, and some other
places looking up zero-argument functions were already passing dummy
arrays, let's standardize on the latter solution rather than hacking
the function lookup code to avoid calling memcmp() in these cases.
I also added Asserts to catch any future violations of the new rule.
Given the utter lack of any evidence that this actually causes any
problems in the field, I don't feel a need to back-patch this change.
Per report from Piotr Stefaniak, though this is not his patch.
Commit b488c580ae, which added the DDL command collection feature,
neglected to update the code that commit cac7658205 had previously
added two weeks earlier for the TRANSFORM feature.
Reported by Michael Paquier.
Don't apply rmtree(), which will gleefully remove an entire subtree,
and don't even apply unlink() unless it's symlink or a directory,
the only things that we expect to find.
Amit Kapila, with minor tweaks by me, per extensive discussions
involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas,
at least some of whom also reviewed the code.
Allow CustomPath to have a list of paths, CustomPlan a list of plans,
and CustomPlanState a list of planstates known to the core system, so
that custom path/plan providers can more reasonably use this
infrastructure for nodes with multiple children.
KaiGai Kohei, per a design suggestion from Tom Lane, with some
further kibitzing by me.
We already tried to improve this once, but the "improved" text was rather
off-target if you had provided a USING clause. Also, it seems helpful
to provide the exact text of a suggested USING clause, so users can just
copy-and-paste it when needed. Per complaint from Keith Rarick and a
suggestion from Merlin Moncure.
Back-patch to 9.2 where the current wording was adopted.
Seems to have been an oversight in the original leakproofness patch.
Per report and patch from Jeevan Chalke.
In passing, prettify some awkward leakproof-related code in AlterFunction.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.
Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.
Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
The plain C string language name needs to be wrapped in makeString() so
that the parse tree is copyable. This is detectable by
-DCOPY_PARSE_PLAN_TREES. Add a test case for the COMMENT case.
Also make the quoting in the error messages more consistent.
discovered by Tom Lane
These were "text", but that's a bad idea because it has collation-dependent
ordering. No index in template0 should have collation-dependent ordering,
especially not indexes on shared catalogs. There was general agreement
that provider names don't need to be longer than other identifiers, so we
can fix this at a small waste of table space by changing from text to name.
There's no way to fix the problem in the back branches, but we can hope
that security labels don't yet have widespread-enough usage to make it
urgent to fix.
There needs to be a regression sanity test to prevent us from making this
same mistake again; but before putting that in, we'll need to get rid of
similar brain fade in the recently-added pg_replication_origin catalog.
Note: for lack of a suitable testing environment, I've not really exercised
this change. I trust the buildfarm will show up any mistakes.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
Petr Jelinek
Reviewed by Michael Paquier and Simon Riggs
When this option is specified, a progress report is printed as each index
is reindexed.
Per discussion, we agreed on the following syntax for the extensibility of
the options.
REINDEX (flexible options) { INDEX | ... } name
Sawada Masahiko.
Reviewed by Robert Haas, Fabrízio Mello, Alvaro Herrera, Kyotaro Horiguchi,
Jim Nasby and me.
Discussion: CAD21AoA0pK3YcOZAFzMae+2fcc3oGp5zoRggDyMNg5zoaWDhdQ@mail.gmail.com
This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.
The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object. This is sufficient for many
uses of this feature. For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code. The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.
There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension. The intention is that we will add some deparsing
code in a later release, as an in-core extension.
A new test module is included. It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.
Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.
Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.
Based on original work by Dimitri Fontaine, though I didn't use his
code.
Discussion:
https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.frhttps://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.orghttps://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org
INSERT acquires RowExclusiveLock during normal operation and therefore
it makes sense to allow LOCK TABLE .. ROW EXCLUSIVE MODE to be executed
by users who have INSERT rights on a table (even if they don't have
UPDATE or DELETE).
Not back-patching this as it's a behavior change which, strictly
speaking, loosens security restrictions.
Per discussion with Tom and Robert (circa 2013).
If a row that potentially violates a deferred exclusion constraint is
HOT-updated later in the same transaction, the exclusion constraint would
be reported as violated when the check finally occurs, even if the row(s)
the new row originally conflicted with have since been removed. This
happened because the wrong TID was passed to check_exclusion_constraint(),
causing the live HOT-updated row to be seen as a conflicting row rather
than recognized as the row-under-test.
Per bug #13148 from Evan Martin. It's been broken since exclusion
constraints were invented, so back-patch to all supported branches.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments. Clean up
as follows:
* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function. In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs. Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.
* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.
* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that. Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.
* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries. The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks. It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway. I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.
* Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.
* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.
* Lots of cleanup of documentation and missed comments. Re-order some
code additions into more logical places.
The logic introduced in commit b69bf30b9b
and repaired in commits 669c7d20e6 and
7be47c56af helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.
To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases. This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.
This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off. We'll need to fix
that via a separate patch.
Thomas Munro and Robert Haas
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held. As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.
The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.
Bumps catversion as stored rules change.
Author: Peter Geoghegan
Reviewed-By: Andres Freund
When altering the deferredness state of a foreign key constraint, we
correctly updated the catalogs and then invalidated the relcache state for
the target relation ... but that's not the only relation with relevant
triggers. Must invalidate the other table as well, or the state change
fails to take effect promptly for operations triggered on the other table.
Per bug #13224 from Christian Ullrich.
In passing, reorganize regression test case for this feature so that it
isn't randomly injected into the middle of an unrelated test sequence.
Oversight in commit f177cbfe67. Back-patch
to 9.4 where the faulty code was added.
Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.
Custom scan providers can use this in a similar way. Previously,
it was only possible for a custom scan provider to scan a single
relation. Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.
KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
This does four basic things. First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers. Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes. Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.
Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
This is necessary in view of the changes to allow foreign tables to be
full members of inheritance hierarchies, but I (tgl) unaccountably missed
it in commit cb1ca4d800.
Noted by Amit Langote, patch by Etsuro Fujita
This provides a mechanism for specifying conversions between SQL data
types and procedural languages. As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.
reviews by Pavel Stěhule and Andres Freund
pg_dump has historically assumed that default_with_oids affects only plain
tables and not other relkinds. Conceivably we could make it apply to some
newly invented relkind if we did so from the get-go, but changing the
behavior for existing object types will break existing dump scripts.
Add code comments warning about this interaction.
Also, make sure that default_with_oids doesn't cause parse_utilcmd.c to
think that CREATE FOREIGN TABLE will create an OID column. I think this is
only a latent bug right now, since we don't allow UNIQUE/PKEY constraints
in CREATE FOREIGN TABLE, but it's better to be consistent and future-proof.
Commit a2e35b53c3 neglected to update the type OID to use further
down in DefineType when TypeShellMake was changed to return
ObjectAddress instead of OID (it got it right in DefineRange, however.)
This resulted in an internal error message being issued when looking up
I/O functions.
Author: Michael Paquier
Also add Asserts() to a couple of other places to ensure that the type
OID being used is as expected.
Previously, these functions were created in a schema "binary_upgrade",
which was deleted after pg_upgrade was finished. Because we don't want
to keep that schema around permanently, move them to pg_catalog but
rename them with a binary_upgrade_... prefix.
The provided functions are only small wrappers around global variables
that were added specifically for pg_upgrade use, so keeping the module
separate does not create any modularity.
The functions still check that they are only called in binary upgrade
mode, so it is not possible to call these during normal operation.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
When certain event-trigger-only functions are called when not in the
wrong context, they were reporting the "feature not supported" SQLSTATE,
which is somewhat misleading. Create a new custom error code for such
uses instead.
Not backpatched since it may be seen as an undesirable behavioral
change.
Author: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAB7nPqQ-5NAkHQHh_NOm7FPep37NCiLKwPoJ2Yxb8TDoGgbYYA@mail.gmail.com
Commit a2e35b53 should have removed the variable declaration in the
inner block, but didn't. As a result, the returned address might end up
not being what was intended.
It now also reports temporary objects dropped that are local to the
backend. Previously we weren't reporting any temp objects because it
was deemed unnecessary; but as it turns out, it is necessary if we want
to keep close track of DDL command execution inside one session. Temp
objects are reported as living in schema pg_temp, which works because
such a schema-qualification always refers to the temp objects of the
current session.
Reduce lock levels to ShareRowExclusive for the following SQL
CREATE TRIGGER (but not DROP or ALTER)
ALTER TABLE ENABLE TRIGGER
ALTER TABLE DISABLE TRIGGER
ALTER TABLE … ADD CONSTRAINT FOREIGN KEY
Original work by Simon Riggs, extracted and refreshed by Andreas Karlsson
New test cases added by Andreas Karlsson
Reviewed by Noah Misch, Andres Freund, Michael Paquier and Simon Riggs
This lets later stages have access to the transformed expression; in
particular it allows DDL-deparsing code during event triggers to pass
the transformed expression to ruleutils.c, so that the complete command
can be deparsed.
This shuffles the timing of the transform calls a bit: previously,
nothing was transformed during parse analysis, and only the
RELKIND_RELATION case was being handled during execution. After this
patch, all expressions are transformed during parse analysis (including
those for relkinds other than RELATION), and the error for other
relation kinds is thrown only during execution. So we do more work than
before to reject some bogus cases. That seems acceptable.
This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.
Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.
ReindexIndex() trusts a parser-built RangeVar with the persistence to
use for the new copy of the index; but the parser naturally does not
know what's the persistence of the original index. To find out the
correct persistence, grab it from relcache.
This bug was introduced by commit 85b506bbfc, and therefore no
backpatch is necessary.
Bug reported by Thom Brown, analysis and patch by Michael Paquier; test
case provided by Fabrízio de Royes Mello.
Slow functions in index expressions might cause this loop to take long
enough to make it worth being cancellable. Probably it would be enough
to call CHECK_FOR_INTERRUPTS here, but for consistency with other
per-sample-row loops in this file, let's use vacuum_delay_point.
Report and patch by Jeff Janes. Back-patch to all supported branches.
Since commit a2e35b53c3, most CREATE and ALTER commands return the
ObjectAddress of the affected object. This is useful for event triggers
to try to figure out exactly what happened. This patch extends this
idea a bit further to cover ALTER TABLE as well: an auxiliary
ObjectAddress is returned for each of several subcommands of ALTER
TABLE. This makes it possible to decode with precision what happened
during execution of any ALTER TABLE command; for instance, which
constraint was added by ALTER TABLE ADD CONSTRAINT, or which parent got
dropped from the parents list by ALTER TABLE NO INHERIT.
As with the previous patch, there is no immediate user-visible change
here.
This is all really just continuing what c504513f83 started.
Reviewed by Stephen Frost.
Foreign tables can now be inheritance children, or parents. Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.
As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior. Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it. And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.
An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:
Update on pt1 (cost=0.00..321.05 rows=3541 width=46)
Update on pt1
Foreign Update on ft1
Foreign Update on ft2
Update on child3
-> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46)
-> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46)
-> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46)
-> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46)
This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.
Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me