PostgreSQL nowadays offers some kind of dynamic shared memory feature on
all supported platforms. Having the choice of "none" prevents us from
relying on DSM in core features. So this patch removes the choice of
"none".
Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
vacuum_cleanup_index_scale_factor was located in autovacuum group of
GUCs. However, it affects not only autovacuum, but also manually run
VACUUM. It appears that "client connection defaults" group of GUCs
is more appropriate for vacuum_cleanup_index_scale_factor, because
vacuum_*_age options are already located there.
Also, vacuum_cleanup_index_scale_factor was missed in
postgresql.conf.sample. So, add it there with appropriate comment.
Author: Masahiko Sawada with minor editorization by me
Discussion: https://postgr.es/m/CAD21AoArsoXMLKudXSKN679FRzs6oubEchM53bHwn8Tp%3D2boNg%40mail.gmail.com
This controls both plan-time and execution-time new-style partition
pruning. While finer-grain control is possible (maybe using an enum GUC
instead of boolean), there doesn't seem to be much need for that.
This new parameter controls partition pruning for all queries:
trivially, SELECT queries that affect partitioned tables are naturally
under its control since they are using the new technology. However,
while UPDATE/DELETE queries do not use the new code, we make the new GUC
control their behavior also (stealing control from
constraint_exclusion), because it is more natural, and it leads to a
more natural transition to the future in which those queries will also
use the new pruning code.
Constraint exclusion still controls pruning for regular inheritance
situations (those not involving partitioned tables).
Author: David Rowley
Review: Amit Langote, Ashutosh Bapat, Justin Pryzby, David G. Johnston
Discussion: https://postgr.es/m/CAKJS1f_0HwsxJG9m+nzU+CizxSdGtfe6iF_ykPYBiYft302DCw@mail.gmail.com
This provides infrastructure to allow JITed code to inline code
implemented in C. This e.g. can be postgres internal functions or
extension code.
This already speeds up long running queries, by allowing the LLVM
optimizer to optimize across function boundaries. The optimization
potential currently doesn't reach its full potential because LLVM
cannot optimize the FunctionCallInfoData argument fully away, because
it's allocated on the heap rather than the stack. Fixing that is
beyond what's realistic for v11.
To be able to do that, use CLANG to convert C code to LLVM bitcode,
and have LLVM build a summary for it. That bitcode can then be used to
to inline functions at runtime. For that the bitcode needs to be
installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres,
extensions go into equivalent directories. PGXS has been modified so
that happens automatically if postgres has been compiled with LLVM
support.
Currently this isn't the fastest inline implementation, modules are
reloaded from disk during inlining. That's to work around an apparent
LLVM bug, triggering an apparently spurious error in LLVM assertion
enabled builds. Once that is resolved we can remove the superfluous
read from disk.
Docs will follow in a later commit containing docs for the whole JIT
feature.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
This adds simple cost based plan time decision about whether JIT
should be performed. jit_above_cost, jit_optimize_above_cost are
compared with the total cost of a plan, and if the cost is above them
JIT is performed / optimization is performed respectively.
For that PlannedStmt and EState have a jitFlags (es_jit_flags) field
that stores information about what JIT operations should be performed.
EState now also has a new es_jit field, which can store a
JitContext. When there are no errors the context is released in
standard_ExecutorEnd().
It is likely that the default values for jit_[optimize_]above_cost
will need to be adapted further, but in my test these values seem to
work reasonably.
Author: Andres Freund, with feedback by Peter Eisentraut
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
If the partition keys of input relation are part of the GROUP BY
clause, all the rows belonging to a given group come from a single
partition. This allows aggregation/grouping over a partitioned
relation to be broken down * into aggregation/grouping on each
partition. This should be no worse, and often better, than the normal
approach.
If the GROUP BY clause does not contain all the partition keys, we can
still perform partial aggregation for each partition and then finalize
aggregation after appending the partial results. This is less certain
to be a win, but it's still useful.
Jeevan Chalke, Ashutosh Bapat, Robert Haas. The larger patch series
of which this patch is a part was also reviewed and tested by Antonin
Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin
Knizhnik, Pascal Legrand, and Rafia Sabih.
Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
This commit introduces:
1) JIT provider abstraction, which allows JIT functionality to be
implemented in separate shared libraries. That's desirable because
it allows to install JIT support as a separate package, and because
it allows experimentation with different forms of JITing.
2) JITContexts which can be, using functions introduced in follow up
commits, used to emit JITed functions, and have them be cleaned up
on error.
3) The outline of a LLVM JIT provider, which will be fleshed out in
subsequent commits.
Documentation for GUCs added, and for JIT in general, will be added in
later commits.
Author: Andres Freund, with architectural input from Jeff Davis
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
This allows specifying an external command for prompting for or
otherwise obtaining passphrases for SSL key files. This is useful
because in many cases there is no TTY easily available during service
startup.
Also add a setting ssl_passphrase_command_supports_reload, which allows
supporting SSL configuration reload even if SSL files need passphrases.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
Split the "Authentication and Security" section into two separate
sections "Authentication" and "SSL". The latter part has gotten much
longer over time, and doesn't primarily have to do with authentication.
Also, the row_security parameter was inconsistently categorized, so
clean that up while we're here.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash. While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.
After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory. If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.
The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:
* avoids wasting memory on duplicated hash tables
* avoids wasting disk space on duplicated batch files
* divides the work of building the hash table over the CPUs
One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables. This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes. Another is that
outer batch 0 must be written to disk if multiple batches are required.
A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.
A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.
Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.comhttps://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
A handful of settings, most notably shared_preload_libraries, were
just plain the wrong place compared to their assigned config_group
value in guc.c (and thus pg_settings). In other cases the names of
the sections in postgresql.conf.sample were mildly different from
the corresponding entries in config_group_names[]. Make it all
consistent.
Adrián Escoms, reviewed by me.
Discussion: http://postgr.es/m/CACksPC2veEmFRYqwYepWYO9U7aFhAx6sYq+WqjTyHw7uV=E=pw@mail.gmail.com
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
At the time replacement_sort_tuples was introduced, there were still
cases where replacement selection sort noticeably outperformed using
quicksort even for the first run. However, those cases seem to have
evaporated as a result of further improvements made since that time
(and perhaps also advances in CPU technology). So remove replacement
selection and the controlling GUC entirely. This makes tuplesort.c
noticeably simpler and probably paves the way for further
optimizations someone might want to do later.
Peter Geoghegan, with review and testing by Tomas Vondra and me.
Discussion: https://postgr.es/m/CAH2-WzmmNjG_K0R9nqYwMq3zjyJJK+hCbiZYNGhAy-Zyjs64GQ@mail.gmail.com
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
1024 bits is considered weak these days, but OpenSSL always passes 1024 as
the key length to the tmp_dh callback. All the code to handle other key
lengths is, in fact, dead.
To remedy those issues:
* Only include hard-coded 2048-bit parameters.
* Set the parameters directly with SSL_CTX_set_tmp_dh(), without the
callback
* The name of the file containing the DH parameters is now a GUC. This
replaces the old hardcoded "dh1024.pem" filename. (The files for other
key lengths, dh512.pem, dh2048.pem, etc. were never actually used.)
This is not a new problem, but it doesn't seem worth the risk and churn to
backport. If you care enough about the strength of the DH parameters on
old versions, you can create custom DH parameters, with as many bits as you
wish, and put them in the "dh1024.pem" file.
Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier.
Discussion: https://www.postgresql.org/message-id/CAMxBoUyjOOautVozN6ofzym828aNrDjuCcOTcCquxjwS-L2hGQ@mail.gmail.com
This goes together with the changes made to enable replication on the
sending side by default (wal_level, max_wal_senders etc) by making the
receiving stadby node also enable it by default.
Huong Dangminh
This commit also does
- add REPLICATION_SUBSCRIBERS into config_group
- mark max_logical_replication_workers and max_sync_workers_per_subscription
as REPLICATION_SUBSCRIBERS parameters
- move those parameters into "Subscribers" section in postgresql.conf.sample
Author: Masahiko Sawada, Petr Jelinek and me
Reported-by: Masahiko Sawada
Discussion: http://postgr.es/m/CAD21AoAonSCoa=v=87ZO3vhfUZA1k_E2XRNHTt=xioWGUa+0ug@mail.gmail.com
Defaults match the fixed behavior of prior releases, but now DBAs
have better options to tune serializable workloads.
It might be nice to be able to set this per relation, but that part
will need to wait for another release.
Author: Dagfinn Ilmari Mannsåker
The previous default 'pg_log' might have indicated by its "pg_" prefix
that it is an internal system directory. The new default is more in
line with the typical naming of directories with user-facing log files.
Together with the renaming of pg_clog and pg_xlog, this should clear up
that difference.
Author: Andreas Karlsson <andreas@proxel.se>
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.
Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later.
The SASLPrep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble, if you use a password with
non-ASCII characters, and a client library that does implement SASLprep.
That will hopefully be added later.
Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.
If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb, and stored in the control file. We go through these
motions, in order to not give away the information on whether the user
exists, to unauthenticated users.
Bumps PG_CONTROL_VERSION, because of the new field in control file.
Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
This changes the default values of the following parameters:
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
in order to make it possible to make a backup and set up simple
replication on the default settings, without requiring a system restart.
Discussion: https://postgr.es/m/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com
Reviewed by Peter Eisentraut. Benchmark help from Tomas Vondra.
It is no longer necessary to restart the server to enable, disable,
or reconfigure SSL. Instead, we just create a new SSL_CTX struct
(by re-reading all relevant files) whenever we get SIGHUP. Testing
shows that this is fast enough that it shouldn't be a problem.
In conjunction with that, downgrade the logic that complains about
pg_hba.conf "hostssl" lines when SSL isn't active: now that's just
a warning condition not an error.
An issue that still needs to be addressed is what shall we do with
passphrase-protected server keys? As this stands, the server would
demand the passphrase again on every SIGHUP, which is certainly
impractical. But the case was only barely supported before, so that
does not seem a sufficient reason to hold up committing this patch.
Andreas Karlsson, reviewed by Michael Banck and Michael Paquier
Discussion: https://postgr.es/m/556A6E8A.9030400@proxel.se
This feature is also known as "quorum commit" especially in discussion
on pgsql-hackers.
This commit adds the following new syntaxes into synchronous_standby_names
GUC. By using FIRST and ANY keywords, users can specify the method to
choose synchronous standbys from the listed servers.
FIRST num_sync (standby_name [, ...])
ANY num_sync (standby_name [, ...])
The keyword FIRST specifies a priority-based synchronous replication
which was available also in 9.6 or before. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.
The keyword ANY specifies a quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of sync_state.pg_stat_replication for the listed standbys
are reported as "quorum". The priority is still assigned to each standby,
but not used in this method.
The existing syntaxes having neither FIRST nor ANY keyword are still
supported. They are the same as new syntax with FIRST keyword, i.e.,
a priorirty-based synchronous replication.
Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>
Many thanks to the various individuals who were involved in
discussing and developing this feature.
Commit b460f5d6693103076dc554aa7cbb96e1e53074f9 -- at my suggestion --
increased the default value of max_worker_processes from 8 to 16, on
the theory that this would be harmless and convenient for users.
Unfortunately, this caused some buildfarm machines with low connection
limits to start failing, so apparently it's not harmless after all.
Increase the default value of the existing max_worker_processes GUC
from 8 to 16, and add a new max_parallel_workers GUC with a maximum
of 8. This way, even if the maximum amount of parallel query is
happening, there is still room for background workers that do other
things, as originally envisioned when max_worker_processes was added.
Julien Rouhaud, reviewed by Amit Kapila and by revised by me.
Instead of confusingly stating platform-dependent defaults for these
parameters in the comments in postgresql.conf.sample (with the main
entry being a lie on Linux), teach initdb to install the correct
platform-dependent value in postgresql.conf, similarly to the way
we handle other platform-dependent defaults. This won't do anything
for existing 9.6 installations, but since it's effectively only a
documentation improvement, that seems OK.
Since this requires initdb to have access to the default values,
move the #define's for those to pg_config_manual.h; the original
placement in bufmgr.h is unworkable because that file can't be
included by frontend programs.
Adjust the default value for wal_writer_flush_after so that it is 1MB
regardless of XLOG_BLCKSZ, conforming to what is stated in both the
SGML docs and postgresql.conf. (We could alternatively make it scale
with XLOG_BLCKSZ, but I'm not sure I see the point.)
Copy-edit related SGML documentation.
Fabien Coelho and Tom Lane, per a gripe from Tomas Vondra.
Discussion: <30ebc6e3-8358-09cf-44a8-578252938424@2ndquadrant.com>
This value might not be to everyone's taste; in particular, some
people might prefer %t to %m, and others may want %u, %d, or other
fields. However, it's a vast improvement on the old default of ''.
Christoph Berg
This makes the parameter easier to extend, to support other password-based
authentication protocols than MD5. (SCRAM is being worked on.)
The GUC still accepts on/off as aliases for "md5" and "plain", although
we may want to remove those once we actually add support for another
password hash type.
Michael Paquier, reviewed by David Steele, with some further edits by me.
Discussion: <CAB7nPqSMXU35g=W9X74HVeQp0uvgJxvYOuA4A-A3M+0wfEBv-w@mail.gmail.com>
runtime.sgml used to contain a table of estimated shared memory consumption
rates for max_connections and some other GUCs. Commit 390bfc643 removed
that on the well-founded grounds that (a) we weren't maintaining the
entries well and (b) it no longer mattered so much once we got out from
under SysV shmem limits. But it missed that there were even-more-obsolete
versions of some of those numbers in comments in postgresql.conf.sample.
Remove those too. Back-patch to 9.3 where the aforesaid commit went in.