It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that. However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)
For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.
The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data. What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.
Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
A proposed patch, also by Thomas and in the same thread, would change
the output order of these. Independent of the follow-up patches
getting committed, nailing down the order in these specific tests at
worst seems harmless.
Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=1D4-tP7j7UAgT_j4ZX2j4Ehe1qgZQWFKBMb8F76UW5Rg@mail.gmail.com
When converting a float value to integer microseconds, we should be careful
to round the value to the nearest integer, typically with rint(); simply
assigning to an int64 variable will truncate, causing apparently off-by-one
values in cases that should work. Most places in the datetime code got
this right, but not these two.
float8_timestamptz() is new as of commit e511d878f (9.6). Previous
versions effectively depended on interval_mul() to do roundoff correctly,
which it does, so this fixes an accuracy regression in 9.6.
The problem in make_interval() dates to its introduction in 9.4. Aside
from being careful to round not truncate, let's incorporate the hours and
minutes inputs into the result with exact integer arithmetic, rather than
risk introducing roundoff error where there need not have been any.
float8_timestamptz() problem reported by Erik Nordström, though this is
not his proposed patch. make_interval() problem found by me.
Discussion: https://postgr.es/m/CAHuQZDS76jTYk3LydPbKpNfw9KbACmD=49dC4BrzHcfPv6yA1A@mail.gmail.com
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images). Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.
Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas. Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
Add link to description of libpq connection strings. Add link to
explanation of replication access control. This currently points to the
description of streaming replication access control, which is currently
the same as for logical replication, but that might be refined later.
Also remove plain-text passwords from the examples, to not encourage
that dubious practice.
based on suggestions from Simon Riggs
In the large DO block, collect row TIDs into array variables instead of
creating and dropping a pile of temporary tables. In a normal build,
this reduces the brin test script's runtime from about 1.1 sec to 0.4 sec
on my workstation. That's not all that exciting perhaps, but in a
CLOBBER_CACHE_ALWAYS test build, the runtime drops from 20 min to 17 min,
which is a little more useful. In combination with some other changes
I plan to propose, this will help provide a noticeable reduction in
cycle time for CLOBBER_CACHE_ALWAYS buildfarm critters.
This is infrastructure for a pending patch to allow parallel bitmap
heap scans.
Dilip Kumar, reviewed (in earlier versions) by Andres Freund and
(more recently) by me. Some further renaming by me, also.
This avoids a very significant amount of buffer manager traffic and
contention when scanning hash indexes, because it's no longer
necessary to lock and pin the metapage for every scan. We do need
some way of figuring out when the cache is too stale to use any more,
so that when we lock the primary bucket page to which the cached
metapage points us, we can tell whether a split has occurred since we
cached the metapage data. To do that, we use the hash_prevblkno field
in the primary bucket page, which would otherwise always be set to
InvalidBuffer.
This patch contains code so that it will continue working (although
less efficiently) with hash indexes built before this change, but
perhaps we should consider bumping the hash version and ripping out
the compatibility code. That decision can be made later, though.
Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me.
Before committing, I made a number of cosmetic changes to the last
posted version of the patch, adjusted _hash_getcachedmetap to be more
careful about order of operation, and made some necessary updates to
the pageinspect documentation and regression tests.
The CREATE INDEX CONCURRENTLY bug can only be triggered by row updates,
not inserts, since the problem would arise from an update incorrectly
being made HOT. Noted by Alvaro.
Document the privileges required for each of the sequence functions.
This was already in the GRANT reference page, but also add it to the
function description for easier reference.
Before, reading pg_sequences.last_value would fail unless the user had
appropriate sequence permissions, which would make the pg_sequences view
cumbersome to use. Instead, return null instead of the real value when
there are no permissions.
From: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Shinoda, Noriyoshi <noriyoshi.shinoda@hpe.com>
Add item for last-minute CREATE INDEX CONCURRENTLY fix.
Repair a couple of misspellings of patch authors' names.
Back-branch updates will follow shortly, but I thought I'd
commit this separately just to make it more visible.
The problem with the original coding here is that we might receive (and
clear) a relcache invalidation signal for the target relation down inside
one of the index_open calls we're doing. Since the target is open, we
would not drop the relcache entry, just reset its rd_indexvalid and
rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and
would eventually cache and return potentially-obsolete attribute bitmaps.
The case where this matters is where the inval signal was from a CREATE
INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed
column. (In all other cases, the lock we hold on the target rel should
prevent any concurrent change in index state.) Even just returning the
stale attribute bitmap is not such a problem, because it shouldn't matter
during the transaction in which we receive the signal. What hurts is
caching the stale data, because it can survive into later transactions,
breaking CREATE INDEX CONCURRENTLY's expectation that later transactions
will not create new broken HOT chains. The upshot is that there's a window
for building corrupted indexes during CREATE INDEX CONCURRENTLY.
This patch fixes the problem by rechecking that the set of index OIDs
is still the same at the end of RelationGetIndexAttrBitmap() as it was
at the start. If not, we loop back and try again. That's a little
more than is strictly necessary to fix the bug --- in principle, we
could return the stale data but not cache it --- but it seems like a
bad idea on general principles for relcache to return data it knows
is stale.
There might be more hazards of the same ilk, or there might be a better
way to fix this one, but this patch definitely improves matters and seems
unlikely to make anything worse. So let's push it into today's releases
even as we continue to study the problem.
Pavan Deolasee and myself
Discussion: https://postgr.es/m/CABOikdM2MUq9cyZJi1KyLmmkCereyGp5JQ4fuwKoyKEde_mzkQ@mail.gmail.com
The example of using CREATE DATABASE with the ENCODING option did not
work anymore (except in special circumstances) and did not represent a
good general-purpose example, so write some new examples.
Reported-by: marc+pgsql@milestonerdl.com
Commit 665d1fad9 introduced rd_pkindex, and made RelationGetIndexList
responsible for updating it, but didn't bother to fix
RelationGetIndexList's header comment to say so.
Uniformly expose unsigned quantities using the next-wider signed
integer type (since we have no unsigned types at the SQL level).
At the SQL level, this results a change to report itemoffset as
int4 rather than int2. Also at the SQL level, report one value
that is an OID as type oid. Under the hood, uniformly use macros
that match the SQL output type as to both width and signedness.
Since pgstattuple v1.5 hasn't been released yet, no need for a new
extension version. The new function exposes statistics about hash
indexes similar to what other pgstatindex functions return for other
index types.
Ashutosh Sharma, reviewed by Kuntal Ghosh. Substantial further
revisions by me.
On machines with MAXALIGN = 8, the payload of a bytea is not maxaligned,
since it will start 4 bytes into a palloc'd value. On alignment-picky
hardware, this will cause failures in accesses to 8-byte-wide values
within the page. We already encountered this problem when we introduced
GIN index inspection functions, and fixed it in commit 84ad68d64. Make
use of the same function for hash indexes.
A small difficulty is that up to now contrib/pageinspect has not shared
any functions at all across files. To support that, introduce a common
header file "pageinspect.h" for the module.
Also, move get_page_from_raw() out of ginfuncs.c, where it didn't
especially belong, and put it in rawpage.c which seems a more natural home.
Per buildfarm.
Discussion: https://postgr.es/m/17311.1486134714@sss.pgh.pa.us
Per a report from Tom Lane, the ffactor reported by hash_metapage_info
and the free_size reported by hash_page_stats vary by platform.
Ashutosh Sharma and Robert Haas
It seems like somebody used a dartboard while choosing integer widths
for the various values taken and returned by these functions ... and
then threw a fresh set of darts while writing the SQL declarations.
This patch brings the C code into line with what the SQL declarations
say, which is enough to make it not dump core on the particular 32-bit
machine I'm testing on. But I think we could do with another round
of looking at what the datum widths *should* be. For instance, it's
not all that sensible that hash_bitmap_info decided to use int64 to
represent a BlockNumber input when get_raw_page doesn't do it that way.
There's also a remaining problem that the expected outputs from the
test script are platform-dependent, but I'll leave that issue for
somebody else.
Per buildfarm.
Commit 08bf6e529587e1e9075d013d859af2649c32a511 seems not to have
used the correct *GetDatum and PG_GETARG_* macros for the SQL types
in some cases, and some of the SQL types seem to have been poorly
chosen, too. Try to fix it. I'm not sure if this is the reason
why the buildfarm is currently unhappy with this code, but it
seems like a good place to start.
Buildfarm unhappiness reported by Tom Lane.
Modify FETCH_COUNT to always have a defined value, like other control
variables, mainly so it will always appear in "\set" output.
Add hooks to force HISTSIZE to be defined and require it to have an
integer value. (I don't see any point in allowing it to be set to
non-integral values.)
Add hooks to force IGNOREEOF to be defined and require it to have an
integer value. Unlike the other cases, here we're trying to be
bug-compatible with a rather bogus externally-defined behavior, so I think
we need to continue to allow "\set IGNOREEOF whatever". Fix it so that
the substitution hook silently replace non-numeric values with "10",
so that the stored value always reflects what we're really doing.
Add a dummy assign hook for HISTFILE, just so it's always in
variables.c's list. We can't require it to be defined always, because
that would break the interaction with the PSQL_HISTORY environment
variable, so there isn't any change in visible behavior here.
Remove tab-complete.c's private list of known variable names, since that's
really a maintenance nuisance. Given the preceding changes, there are no
control variables it won't show anyway. This does mean that if for some
reason you've unset one of the status variables (DBNAME, HOST, etc), that
variable would not appear in tab completion for \set. But I think that's
fine, for at least two reasons: we shouldn't be encouraging people to use
those variables as regular variables, and if someone does do so anyway,
why shouldn't it act just like a regular variable?
Remove ugly and no-longer-used-anywhere GetVariableNum(). In general,
future additions of integer-valued control variables should follow the
paradigm of adding an assign hook using ParseVariableNum(), so there's
no reason to expect we'd need this again later.
Discussion: https://postgr.es/m/17516.1485973973@sss.pgh.pa.us
Coverity complained that we might pass a null pointer to strcmp()
if PQresultErrorField were to return NULL. That shouldn't be possible,
since the server is supposed to always provide some SQLSTATE or other
in an error message. But we usually defend against such hazards, and
it only takes a little more code to do so here.
There's no good reason to think this is a live bug, so no back-patch.
If we forcibly place a Material node atop a finished subplan, we need
to move any initPlans attached to the subplan up to the Material node,
in order to keep SS_finalize_plan() happy. I'd figured this out in
commit 7b67a0a49 for the case of materializing a cursor plan, but out of
an abundance of caution, I put the initPlan movement hack at the call
site for that case, rather than inside materialize_finished_plan().
That was the wrong thing, because it turns out to also be necessary for
the only other caller of materialize_finished_plan(), ie subselect.c.
We lacked any test cases that exposed the mistake, but bug#14524 from
Wei Congrui shows that it's possible to get an initPlan reference into
the top tlist in that case too, and then SS_finalize_plan() complains.
Hence, move the hack into materialize_finished_plan().
In HEAD, also relocate some recently-added tests in subselect.sql, which
I'd unthinkingly dropped into the middle of a sequence of related tests.
Report: https://postgr.es/m/20170202060020.1400.89021@wrigleys.postgresql.org
It's not broken because the header file is included via other headers,
but for better style we should be more explicit.
Reported-by: mthrockmorton@hme.com
Given a targetlist like "srf(x), f(srf(x))", split_pathtarget_at_srfs()
decided that it needed two levels of ProjectSet nodes, failing to notice
that the two SRF calls are textually equal(). Because of that, setrefs.c
would convert the upper ProjectSet's tlist to "Var1, f(Var1)" (where Var1
represents a reference to the srf(x) output of the lower ProjectSet).
This triggered an assertion in nodeProjectSet.c complaining that it found
no SRFs to evaluate, as reported by Erik Rijkers.
What we want in such a case is to evaluate srf(x) only once and use a plain
Result node to compute "Var1, f(Var1)"; that gives results similar to what
previous versions produced, whereas allowing srf(x) to be evaluated again
in an upper ProjectSet would square the number of rows emitted.
Furthermore, even if the SRF calls aren't textually identical, we want them
to be evaluated in lockstep, because that's what happened in the old
implementation. But split_pathtarget_at_srfs() got this completely wrong,
using two levels of ProjectSet for a case like "srf(x), f(srf(y))".
Hence, rewrite split_pathtarget_at_srfs() from the ground up so that it
groups SRFs according to the depth of nesting of SRFs in their arguments.
This is pretty much how we envisioned that working originally, but I blew
it when it came to implementation.
In passing, optimize the case of target == input_target, which I noticed
is not only possible but quite common.
Discussion: https://postgr.es/m/dcbd2853c05d22088766553d60dc78c6@xs4all.nl
There is no particularly good reason to limit this value to 1000,
so increase the limit to INT_MAX / 2, the same limit we use for
shared_buffers. It's not clear how much practical effect larger
settings will have, but there seems no harm in letting people try it.
Jim Nasby, less a comment change I stripped out.
Discussion: http://postgr.es/m/f6e58a22-030b-eb8a-5457-f62fb08d701c@BlueTreble.com
Patch by Jesper Pedersen and Ashutosh Sharma, with some error handling
improvements by me. Tests from Peter Eisentraut. Reviewed by Álvaro
Herrera, Michael Paquier, Jesper Pedersen, Jeff Janes, Peter
Eisentraut, Amit Kapila, Mithun Cy, and me.
Discussion: http://postgr.es/m/e2ac6c58-b93f-9dd9-f4e6-d6d30add7fdf@redhat.com
Remove $(pkglibdir) from $(rpathdir), since commits
d51924be886c2a05e691fa05b16cb6b30ab8370f and
eda04886c1e048d695728206504ab4198462168e removed direct linkage to
objects stored there. Users are unlikely to notice the difference.
Accompany every $(python_libspec) with $(python_additional_libs); this
doesn't fix a demonstrated bug, but it might do so on rare Python
configurations. With these changes, AIX ceases to be a special case.
Doing so doesn't seem to be within the purpose of the per user
connection limits, and has particularly unfortunate effects in
conjunction with parallel queries.
Backpatch to 9.6 where parallel queries were introduced.
David Rowley, reviewed by Robert Haas and Albe Laurenz.
Add CatalogTupleInsertWithInfo and CatalogTupleUpdateWithInfo to let
callers use the CatalogTupleXXX abstraction layer even in cases where
we want to share the results of CatalogOpenIndexes across multiple
inserts/updates for efficiency. This finishes the job begun in commit
2f5c9d9c9, by allowing some remaining simple_heap_insert/update
calls to be replaced. The abstraction layer is now complete enough
that we don't have to export CatalogIndexInsert at all anymore.
Also, this fixes several places in which 2f5c9d9c9 introduced performance
regressions by using retail CatalogTupleInsert or CatalogTupleUpdate even
though the previous coding had been able to amortize CatalogOpenIndexes
work across multiple tuples.
A possible future improvement is to arrange for the indexing.c functions
to cache the CatalogIndexState somewhere, maybe in the relcache, in which
case we could get rid of CatalogTupleInsertWithInfo and
CatalogTupleUpdateWithInfo again. But that's a task for another day.
Discussion: https://postgr.es/m/27502.1485981379@sss.pgh.pa.us
This extends the work done in commit 2f5c9d9c9 to provide a more nearly
complete abstraction layer hiding the details of index updating for catalog
changes. That commit only invented abstractions for catalog inserts and
updates, leaving nearby code for catalog deletes still calling the
heap-level routines directly. That seems rather ugly from here, and it
does little to help if we ever want to shift to a storage system in which
indexing work is needed at delete time.
Hence, create a wrapper function CatalogTupleDelete(), and replace calls
of simple_heap_delete() on catalog tuples with it. There are now very
few direct calls of [simple_]heap_delete remaining in the tree.
Discussion: https://postgr.es/m/462.1485902736@sss.pgh.pa.us
"\set" with no arguments displays all defined variables, but it does so
in the order that they appear in variables.c's list, which previously
was mostly creation order. That makes the list ugly and hard to find
things in, and it exposes some psql implementation details to users.
(For instance, ordinary variables will move to the bottom of the list
if unset and set again, but variables that have hooks won't.)
Fix that by keeping the list in alphabetical order at all times, which
isn't much more complicated than breaking out of the insertion search
loops once we reach an entry that should be after the one to be inserted.
Discussion: https://postgr.es/m/31785.1485900786@sss.pgh.pa.us