Correct a misstatement about how things used to work: we did allow nested
SRFs before, as long as no function had more than one set-returning input.
Also, attempt to document the fact that the new implementation changes the
behavior for SRFs within conditional constructs (eg CASE): the conditional
construct no longer gates whether the SRF is run, and thus cannot affect
the number of rows emitted. We might want to change this behavior, but
first it behooves us to see if we can explain it.
Minor other wordsmithing on what I wrote yesterday, too.
Discussion: https://postgr.es/m/20170118214702.54b2mdbxce5piwv5@alap3.anarazel.de
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.
This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.
That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.
To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist. The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.
We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.
While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.
As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.
Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node. As that's quite confusing, we're
likely to instead prohibit SRFs in those places. But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.
There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
This makes it possible to delete multiple keys from a jsonb value by
passing in an array of text values, which makes the operaiton much
faster than individually deleting the keys (which would require copying
the jsonb structure over and over again.
Reviewed by Dmitry Dolgov and Michael Paquier
Move this logic out of initdb into a user-callable function. This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Euler Taveira <euler@timbira.com.br>
Temporary replication slots will be used by default when wal streaming
is used and no slot name is specified with -S. If a slot name is
specified, then a permanent slot with that name is used. If --no-slot is
specified, then no permanent or temporary slot will be used.
Temporary slots are only used on 10.0 and newer, of course.
This changes the default values of the following parameters:
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
in order to make it possible to make a backup and set up simple
replication on the default settings, without requiring a system restart.
Discussion: https://postgr.es/m/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com
Reviewed by Peter Eisentraut. Benchmark help from Tomas Vondra.
The different actions in pg_ctl had different defaults for -w and -W,
mostly for historical reasons. Most users will want the -w behavior, so
make that the default.
Remove the -w option in most example and test code, so avoid confusion
and reduce verbosity. pg_upgrade is not touched, so it can continue to
work with older installations.
Reviewed-by: Beena Emerson <memissemerson@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
Various example and test code used -m fast explicitly, but since it's
the default, this can be omitted now or should be replaced by a better
example.
pg_upgrade is not touched, so it can continue to operate with older
installations.
The previous --path documentation and --help output were wrong in both
its meaning and the defaults.
Reviewed-by: Michael Paquier
Backpatch-through: 9.6
Be more clear that we represent timestamps in microseconds when
integer timestamps are used, and in fractional seconds when
floating-point timestamps are used.
Discussion: http://postgr.es/m/20161212135045.GB15488@e733.localdomain
Report by Alexander Alekseev. Wording by me with a suggestion
from Tom Lane.
Allow pg_recvlogical to specify an ending LSN, complementing
the existing -—startpos=LSN option.
Craig Ringer, reviewed by Euler Taveira and Naoki Okano
Leave OpenSSL's default passphrase collection callback in place during
the first call of secure_initialize() in server startup. Although that
doesn't work terribly well in daemon contexts, some people feel we should
not break it for anyone who was successfully using it before. We still
block passphrase demands during SIGHUP, meaning that you can't adjust SSL
configuration on-the-fly if you used a passphrase, but this is no worse
than what it was before commit de41869b6. And we block passphrase demands
during EXEC_BACKEND reloads; that behavior wasn't useful either, but at
least now it's documented.
Tweak some related log messages for more readability, and avoid issuing
essentially duplicate messages about reload failure caused by a passphrase.
Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us
Since streaming is now supported for all output formats, make this the
default as this is what most people want.
To get the old behavior, the parameter -X none can be specified to turn
it off.
This also removes the parameter -x for fetch, now requiring -X fetch to
be specified to use that.
Reviewed by Vladimir Rusinov, Michael Paquier and Simon Riggs
OpenSSL's default behavior when loading a passphrase-protected key file
is to open /dev/tty and demand the password from there. It was kinda
sorta okay to allow that to happen at server start, but really that was
never workable in standard daemon environments. And it was a complete
fail on Windows, where the same thing would happen at every backend launch.
Yesterday's commit de41869b6 put the final nail in the coffin by causing
that to happen at every SIGHUP; even if you've still got a terminal acting
as the server's TTY, having the postmaster freeze until you enter the
passphrase again isn't acceptable.
Hence, override the default behavior with a callback that returns an empty
string, ensuring failure. Change the documentation to say that you can't
have a passphrase-protected server key, period.
If we can think of a production-grade way of collecting a passphrase from
somewhere, we might do that once at server startup and use this callback
to feed it to OpenSSL, but it's far from clear that anyone cares enough
to invest that much work in the feature. The lack of complaints about
the existing fractionally-baked behavior suggests nobody's using it anyway.
Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us
That was written when we still had "crypt" authentication, and it was
referring to the fact that an older client might support "crypt"
authentication but not "md5". But we haven't supported "crypt" for years.
(As soon as we add a new authentication mechanism that doesn't work with
MD5 hashes, we'll need a similar notice again. But this text as it's worded
now is just wrong.)
Backpatch to all supported versions.
Discussion: https://www.postgresql.org/message-id/9a7263eb-0980-2072-4424-440bb2513dc7@iki.fi
It is no longer necessary to restart the server to enable, disable,
or reconfigure SSL. Instead, we just create a new SSL_CTX struct
(by re-reading all relevant files) whenever we get SIGHUP. Testing
shows that this is fast enough that it shouldn't be a problem.
In conjunction with that, downgrade the logic that complains about
pg_hba.conf "hostssl" lines when SSL isn't active: now that's just
a warning condition not an error.
An issue that still needs to be addressed is what shall we do with
passphrase-protected server keys? As this stands, the server would
demand the passphrase again on every SIGHUP, which is certainly
impractical. But the case was only barely supported before, so that
does not seem a sufficient reason to hold up committing this patch.
Andreas Karlsson, reviewed by Michael Banck and Michael Paquier
Discussion: https://postgr.es/m/556A6E8A.9030400@proxel.se
For aggregated logging, pg_bench supposed that printing the integer part of
INSTR_TIME_GET_DOUBLE() would produce a Unix timestamp. That was already
broken on Windows, and it's about to get broken on most other platforms as
well. As in commit 74baa1e3b, we can remove the entanglement at the price
of one extra syscall per transaction; though here it seems more convenient
to use time(NULL) instead of gettimeofday(), since we only need
integral-second precision.
I took the time to do some wordsmithing on the documentation about
pgbench's logging features, too.
Discussion: https://postgr.es/m/8837.1483216839@sss.pgh.pa.us
In addition to space accounted for by tuple_len, dead_tuple_len and
free_space, the table_len includes page overhead, the item pointers
table and padding bytes.
Backpatch to live branches.
We had an index entry for "median" attached to the percentile_cont function
entry, which was pretty useless because a person following the link would
never realize that that function was the one they were being hinted to use.
Instead, make the index entry point at the example in syntax-aggregates,
and add a <seealso> link to "percentile".
Also, since that example explicitly claims to be calculating the median,
make it use percentile_cont not percentile_disc. This makes no difference
in terms of the larger goals of that section, but so far as I can find,
nearly everyone thinks that "median" means the continuous not discrete
calculation.
Per gripe from Steven Winfield. Back-patch to 9.4 where we introduced
percentile_cont.
Discussion: https://postgr.es/m/20161223102056.25614.1166@wrigleys.postgresql.org
I got a little annoyed by reading documentation paragraphs containing
both spellings within a few lines of each other. My dictionary says
"descendant" is the preferred spelling, and it's certainly the majority
usage in our tree, so standardize on that.
For one usage in parallel.sgml, I thought it better to rewrite to avoid
the term altogether.
Documentation for pg_restore said COPY TO does not support row security
when in fact it should say COPY FROM. Fix that.
While at it, make it clear that "COPY FROM" does not allow RLS to be
enabled and INSERT should be used instead. Also that SELECT policies
will apply to COPY TO statements.
Back-patch to 9.5 where RLS first appeared.
Author: Joe Conway
Reviewed-By: Dean Rasheed and Robert Haas
Discussion: https://postgr.es/m/5744FA24.3030008%40joeconway.com
Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.
To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.
Use the new facility for checkpoints, archive timeout and standby
snapshots.
The lack of a facility causes some issues in older releases, but in my
opinion the consequences (superflous checkpoints / archived segments)
aren't grave enough to warrant backpatching.
Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
https://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.orghttps://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com
Backpatch: -
In an attempt to simplify the tsquery matching engine, the original
phrase search patch invented rewrite rules that would rearrange a
tsquery so that no AND/OR/NOT operator appeared below a PHRASE operator.
But this approach had numerous problems. The rearrangement step was
missed by ts_rewrite (and perhaps other places), allowing tsqueries
to be created that would cause Assert failures or perhaps crashes at
execution, as reported by Andreas Seltenreich. The rewrite rules
effectively defined semantics for operators underneath PHRASE that were
buggy, or at least unintuitive. And because rewriting was done in
tsqueryin() rather than at execution, the rearrangement was user-visible,
which is not very desirable --- for example, it might cause unexpected
matches or failures to match in ts_rewrite.
As a somewhat independent problem, the behavior of nested PHRASE operators
was only sane for left-deep trees; queries like "x <-> (y <-> z)" did not
behave intuitively at all.
To fix, get rid of the rewrite logic altogether, and instead teach the
tsquery execution engine to manage AND/OR/NOT below a PHRASE operator
by explicitly computing the match location(s) and match widths for these
operators.
This requires introducing some additional fields into the publicly visible
ExecPhraseData struct; but since there's no way for third-party code to
pass such a struct to TS_phrase_execute, it shouldn't create an ABI problem
as long as we don't move the offsets of the existing fields.
Another related problem was that index searches supposed that "!x <-> y"
could be lossily approximated as "!x & y", which isn't correct because
the latter will reject, say, "x q y" which the query itself accepts.
This required some tweaking in TS_execute_ternary along with the main
tsquery engine.
Back-patch to 9.6 where phrase operators were introduced. While this
could be argued to change behavior more than we'd like in a stable branch,
we have to do something about the crash hazards and index-vs-seqscan
inconsistency, and it doesn't seem desirable to let the unintuitive
behaviors induced by the rewriting implementation stand as precedent.
Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us
Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us
The ALTER TABLE documentation wasn't terribly clear when it came to
which commands could be combined together and what it meant when they
were.
In particular, SET TABLESPACE *can* be combined with other commands,
when it's operating against a single table, but not when multiple tables
are being moved with ALL IN TABLESPACE. Further, the actions are
applied together but not really in 'parallel', at least today.
Pointed out by: Amit Langote
Improved wording from Tom.
Back-patch to 9.4, where the ALL IN TABLESPACE option was added.
Discussion: https://www.postgresql.org/message-id/14c535b4-13ef-0590-1b98-76af355a0763%40lab.ntt.co.jp
On AIX, doubles are aligned at 4 bytes, but int64 is aligned at 8 bytes.
Our code assumes that doubles have alignment that can also be applied to
int64, but that fails in this case. One effect is that
heap_form_tuple() writes tuples in a different layout than
Form_pg_sequence expects.
Rather than rewrite the whole alignment code, work around the issue by
reordering the columns in pg_sequence so that the first int64 column
naturally comes out at an 8-byte boundary.
Move sequence metadata (start, increment, etc.) into a proper system
catalog instead of storing it in the sequence heap object. This
separates the metadata from the sequence data. Sequence metadata is now
operated on transactionally by DDL commands, whereas previously
rollbacks of sequence-related DDL commands would be ignored.
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
This feature is also known as "quorum commit" especially in discussion
on pgsql-hackers.
This commit adds the following new syntaxes into synchronous_standby_names
GUC. By using FIRST and ANY keywords, users can specify the method to
choose synchronous standbys from the listed servers.
FIRST num_sync (standby_name [, ...])
ANY num_sync (standby_name [, ...])
The keyword FIRST specifies a priority-based synchronous replication
which was available also in 9.6 or before. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.
The keyword ANY specifies a quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of sync_state.pg_stat_replication for the listed standbys
are reported as "quorum". The priority is still assigned to each standby,
but not used in this method.
The existing syntaxes having neither FIRST nor ANY keyword are still
supported. They are the same as new syntax with FIRST keyword, i.e.,
a priorirty-based synchronous replication.
Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>
Many thanks to the various individuals who were involved in
discussing and developing this feature.
Per discussion, we will not at this time remove trailing whitespace in
psql output displays where it is part of the actual psql output.
From: Vladimir Rusinov <vrusinov@google.com>
array_base and array_stride were added so that we could identify the
offset of an LWLock within a tranche, but this facility is only very
marginally used apart from the main tranche. So, give every lock in
the main tranche its own tranche ID and get rid of array_base,
array_stride, and all that's attached. For debugging facilities
(Trace_lwlocks and LWLOCK_STATS) print the pointer address of the
LWLock using %p instead of the offset. This is arguably more useful,
and certainly a lot cheaper. Drop the offset-within-tranche from
the information reported to dtrace and from one can't-happen message
inside lwlock.c.
The main user-visible impact of this change is that pg_stat_activity
will now report all waits for LWLocks as "LWLock" rather than
reporting some as "LWLockTranche" and others as "LWLockNamed".
The main motivation for this change is that the need to specify an
array_base and an array_stride is awkward for parallel query. There
is only a very limited supply of tranche IDs so we can't just keep
allocating new ones, and if we try to use the same tranche IDs every
time then we run into trouble when multiple parallel contexts are
use simultaneously. So if we didn't get rid of this mechanism we'd
have to make it even more complicated. By simplifying it in this
way, we instead reduce the size of the generated code for lwlock.c
by about 5%.
Discussion: http://postgr.es/m/CA+TgmoYsFn6NUW1x0AZtupJGUAs1UDY4dJtCN47_Q6D0sP80PA@mail.gmail.com
The description of effective_io_concurrency option was missing in ALTER
TABLESPACE docs though it's included in CREATE TABLESPACE one.
Back-patch to 9.6 where effective_io_concurrency tablespace option was added.
Michael Paquier, reported by Marc-Olaf Jaschke
Commit f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63 implemented table
partitioning, but failed to mention the "no row movement"
restriction in the documentation. Fix that and a few other issues.
Amit Langote, with some additional wordsmithing by me.
This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.
From: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
If the PAGER environment variable is set but contains an empty string,
psql would pass it to "sh" which would silently exit, causing whatever
query output we were printing to vanish entirely. This is quite
mystifying; it took a long time for us to figure out that this was the
cause of Joseph Brenner's trouble report. Rather than allowing that
to happen, we should treat this as another way to specify "no pager".
(We could alternatively treat it as selecting the default pager, but
it seems more likely that the former is what the user meant to achieve
by setting PAGER this way.)
Nonempty, but all-white-space, PAGER values have the same behavior, and
it's pretty easy to test for that, so let's handle that case the same way.
Most other cases of faulty PAGER values will result in the shell printing
some kind of complaint to stderr, which should be enough to diagnose the
problem, so we don't need to work harder than this. (Note that there's
been an intentional decision not to be very chatty about apparent failure
returns from the pager process, since that may happen if, eg, the user
quits the pager with control-C or some such. I'd just as soon not start
splitting hairs about which exit codes might merit making our own report.)
libpq's old PQprint() function was already on board with ignoring empty
PAGER values, but for consistency, make it ignore all-white-space values
as well.
It's been like this a long time, so back-patch to all supported branches.
Discussion: https://postgr.es/m/CAFfgvXWLOE2novHzYjmQK8-J6TmHz42G8f3X0SORM44+stUGmw@mail.gmail.com
Previously, a prepared statement created via a Parse message could get
a parallel plan, but one created with a PREPARE statement could not.
This state of affairs was due to confusion on my (rhaas) part: I
erroneously believed that a CREATE TABLE .. AS EXECUTE statement could
only be performed with a prepared statement by PREPARE, but in fact
one created by a Prepare message works just as well. Therefore, it
makes no sense to allow parallel query in one case but not the other.
To fix, allow parallel query with all prepared statements, but run
the parallel plan serially (i.e. without workers) in the case of
CREATE TABLE .. AS EXECUTE. Also, document this.
Amit Kapila and Tobias Bussman, plus an extra sentence of
documentation by me.
Add the descriptions of possible values in "state" and "sync_state" columns
of pg_stat_replication view.
Author: Michael Paquier, slightly modified by me
Discussion: <CAB7nPqT7APWrvPFZrcjKEHoq4=g3z2ErxtTdojSf+sDALzuemA@mail.gmail.com>
We have had support for restrictive RLS policies since 9.5, but they
were only available through extensions which use the appropriate hooks.
This adds support into the grammer, catalog, psql and pg_dump for
restrictive RLS policies, thus reducing the cases where an extension is
necessary.
In passing, also move away from using "AND"d and "OR"d in comments.
As pointed out by Alvaro, it's not really appropriate to attempt
to make verbs out of "AND" and "OR", so reword those comments which
attempted to.
Reviewed By: Jeevan Chalke, Dean Rasheed
Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net
Commit b460f5d6693103076dc554aa7cbb96e1e53074f9 -- at my suggestion --
increased the default value of max_worker_processes from 8 to 16, on
the theory that this would be harmless and convenient for users.
Unfortunately, this caused some buildfarm machines with low connection
limits to start failing, so apparently it's not harmless after all.