In a complete brown paper bag moment, I forgot to include equalfuncs
in my previous fix of copy/out/readfuncs. Thanks Tom for noticing.
Discussion: https://postgr.es/m/1659.1551903210@sss.pgh.pa.us
This includes a catversion bump, as IntoClause is theoretically
speaking part of storable rules. In practice I don't think that can
happen, but there's no reason to be stingy here.
Per buildfarm member calliphoridae.
This adds pg_dump support for table AMs in a similar manner to how
tablespaces are handled. That is, instead of specifying the AM for
every CREATE TABLE etc, emit SET default_table_access_method
statements. That makes it easier to change the AM for all/most tables
in a dump, and allows restore to succeed even if some AM is not
available.
This increases the dump archive version, as a tables/matview's AM
needs to be tracked therein.
Author: Dimitri Dolgov, Andres Freund
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.dehttps://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
This introduces the concept of table access methods, i.e. CREATE
ACCESS METHOD ... TYPE TABLE and
CREATE TABLE ... USING (storage-engine).
No table access functionality is delegated to table AMs as of this
commit, that'll be done in following commits.
Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.
Docs will be updated at the end, as adding them incrementally would
likely make them less coherent, and definitely is a lot more work,
without a lot of benefit.
Table access methods are specified similar to index access methods,
i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with
callbacks. In contrast to index AMs that struct needs to live as long
as a backend, typically that's achieved by just returning a pointer to
a constant struct.
Psql's \d+ now displays a table's access method. That can be disabled
with HIDE_TABLEAM=true, which is mainly useful so regression tests can
be run against different AMs. It's quite possible that this behaviour
still needs to be fine tuned.
For now it's not allowed to set a table AM for a partitioned table, as
we've not resolved how partitions would inherit that. Disallowing
allows us to introduce, if we decide that's the way forward, such a
behaviour without a compatibility break.
Catversion bumped, to add the heap table AM and references to it.
Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.dehttps://postgr.es/m/20160812231527.GA690404@alvherre.pgsqlhttps://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.dehttps://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
I broke/typoed this in 4da597edf1bae0c. Astonishingly this mostly
doesn't cause breakage, except when trying to change the tuple
descriptor of a slot (because TTS_FLAG_FIXED is assumed to be set).
Author: Andres Freund
The original 200 default value was set back in f425b605f4e when the cost
delay settings were first added. Hardware has improved quite a bit since
then and we've also made improvements such as sorting buffers during
checkpoints (9cd00c457e6) which should result in less random writes.
This low default value was reportedly causing problems with badly
configured servers and in the absence of a native method to remove
excessive bloat from tables without incurring an AccessExclusiveLock, this
often made cleaning up the damage caused by badly configured auto-vacuums
difficult.
It seems more likely that someone will notice that auto-vacuum is running
too quickly than too slowly, so let's go all out and multiple the default
value for the setting by 10. With the default vacuum_cost_page_dirty and
autovacuum_vacuum_cost_delay (assuming a page size of 8192 bytes), this
allows autovacuum a theoretical maximum dirty write rate of around 39MB/s
instead of just 3.9MB/s.
Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_rYvw@mail.gmail.com
This test fails if the containing directory contains a funny character
such as a space or some perl metacharacter. To avoid that, we check for
files names using readdir and a regex, rather than using a glob pattern.
Discussion: https://postgr.es/m/CAM6_UM6dGdU39PKAC24T+HD9ouy0jLN9vH6163K8QEEzr__iZw@mail.gmail.com
Author: Fabien COELHO
Reviewed-by: Raúl Marín Rodríguez
Scanning an index in physical order is faster than walking it in logical
order, because sequential I/O is faster than random I/O. The idea and code
structure is borrowed from B-tree vacuum code.
Patch by Andrey Borodin, with changes by me. Based on early work by
Konstantin Kuznetsov, although the patch has been rewritten multiple times
since his original version.
Discussion: https://www.postgresql.org/message-id/1B9FAC6F-FA19-4A24-8C1B-F4F574844892%40yandex-team.ru
For some reason the dump test with names with high bits set fails on
Msys2 (although not Msys1). Disable the tests for now, so that other
tests can run.
This is the only test that fails when run as an admin user. The reason
is that when Postgres is started via pg_ctl its admin privileges are
lowered. However, this test called 'postgres -D datadir' directly,
resulting in a failure. Replace that by calling pg_ctl and then checking
the result for the expected failure, and the logfile for the expected
error message.
Currently only frogmouth in the buildfarm uses the 32bit params, and
it's not able to build past release 10, so put those last, saving
substantial configure time on more modern systems. Even if we get a
modern 32 bit Windows system at some stage we should probably prefer the
64 bit interface here these days.
The implementation of readdir() in src/port/ which gets used by MSVC has
been added in 399a36a, and since the beginning it considers all errors
on the first file lookup as ENOENT, setting errno accordingly and
letting the routine caller think that the directory is empty. While
this is normally enough for the case of the backend, this can confuse
callers of this routine on Windows as all errors would map to the same
behavior. So, for example, even permission errors would be thought as
having an empty directory, while there could be contents in it.
This commit changes the error handling so as readdir() gets a behavior
similar to native implementations: force errno=0 when seeing
ERROR_FILE_NOT_FOUND as error and consider other errors as plain
failures.
While looking at the patch, I noticed that MinGW does not enforce
errno=0 when looking at the first file, but it gets enforced on the next
file lookups. A comment related to that was incorrect in the code.
Reported-by: Yuri Kurenkov
Diagnosed-by: Yuri Kurenkov, Grigory Smolkin
Author: Konstantin Knizhnik
Reviewed-by: Andrew Dunstan, Michael Paquier
Discussion: https://postgr.es/m/2cad7829-8d66-e39c-b937-ac825db5203d@postgrespro.ru
Backpatch-through: 9.4
The test crashes and burns quite badly, for some reason, but even if it
didn't it wouldn't work, since Windows doesn't let you rename a file
held by a running process.
StoreIndexTuple was a loop over index_getattr, which is O(N^2)
if the index columns are variable-width, and the performance
impact is already quite visible at ten columns. The obvious
move is to replace that with a call to index_deform_tuple ...
but that's *also* a loop over index_getattr. Improve it to
be essentially a clone of heap_deform_tuple.
(There are a few other places that loop over all index columns
with index_getattr, and perhaps should be changed likewise,
but most of them don't seem performance-critical. Anyway, the
rest would mostly only be interested in the index key columns,
which there aren't likely to be so many of. Wide index tuples
are a new thing with INCLUDE.)
Konstantin Knizhnik
Discussion: https://postgr.es/m/e06b2d27-04fc-5c0e-bb8c-ecd72aa24959@postgrespro.ru
Commit f092de05 added a test for pg_dumpall --exclude-database including
the wildcard pattern '*dump*' which matches some files in the source
directory. The test library on msys uses the shell which expands this
and thus the program gets incorrect arguments. This doesn't happen if
the pattern doesn't match any files, so here the pattern is set to
'*dump_test*' which is such a pattern.
Per buildfarm animal jacana.
Previously, rewriteTargetListIU() generated a list of attribute
numbers from the targetlist, which were passed to rewriteValuesRTE(),
which expected them to contain the same number of entries as there are
columns in the VALUES RTE, and to be in the same order. That was fine
when the target relation was a table, but for an updatable view it
could be broken in at least three different ways ---
rewriteTargetListIU() could insert additional targetlist entries for
view columns with defaults, the view columns could be in a different
order from the columns of the underlying base relation, and targetlist
entries could be merged together when assigning to elements of an
array or composite type. As a result, when recursing to the base
relation, the list of attribute numbers generated from the rewritten
targetlist could no longer be relied upon to match the columns of the
VALUES RTE. We got away with that prior to 41531e42d3 because it used
to always be the case that rewriteValuesRTE() did nothing for the
underlying base relation, since all DEFAULTS had already been replaced
when it was initially invoked for the view, but that was incorrect
because it failed to apply defaults from the base relation.
Fix this by examining the targetlist entries more carefully and
picking out just those that are simple Vars referencing the VALUES
RTE. That's sufficient for the purposes of rewriteValuesRTE(), which
is only responsible for dealing with DEFAULT items in the VALUES
RTE. Any DEFAULT item in the VALUES RTE that doesn't have a matching
simple-Var-assignment in the targetlist is an error which we complain
about, but in theory that ought to be impossible.
Additionally, move this code into rewriteValuesRTE() to give a clearer
separation of concerns between the 2 functions. There is no need for
rewriteTargetListIU() to know about the details of the VALUES RTE.
While at it, fix the comment for rewriteValuesRTE() which claimed that
it doesn't support array element and field assignments --- that hasn't
been true since a3c7a993d5 (9.6 and later).
Back-patch to all supported versions, with minor differences for the
pre-9.6 branches, which don't support array element and field
assignments to the same column in multi-row VALUES lists.
Reviewed by Amit Langote.
Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
This changes the partition functions so as tables and indexes which are
not part of partition trees are handled the same way as what is done for
undefined objects and unsupported relkinds: pg_partition_tree() returns
no rows and pg_partition_root() returns a NULL result. Hence,
partitioned tables, partitioned indexes and relations whose flag
pg_class.relispartition is set are considered as valid objects to
process.
Previously, tables and indexes not included in a partition tree were
processed the same way as a partition or a partitioned table, which
caused the functions to return inconsistent results for inherited
tables, especially when inheriting from multiple tables.
Reported-by: Álvaro Herrera
Author: Amit Langote, Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20190228193203.GA26151@alvherre.pgsql
Future-proofing against a common mistake in attempts to optimize NOT IN.
We don't have such an optimization right now, but attempts to do so
are in the works, and some of 'em are buggy. Add a regression test case
covering the point.
David Rowley
Discussion: https://postgr.es/m/CAKJS1f90E9agVZryVyUpbHQbjTt5ExqS2Fsodmt5_A7E_cEyVA@mail.gmail.com
In particular, make it possible to prove/refute "x IS NULL" and
"x IS NOT NULL" predicates from a clause involving a ScalarArrayOpExpr
even when we are unable or unwilling to deconstruct the expression
into an AND/OR tree. This avoids a former unexpected degradation of
plan quality when the size of an ARRAY[] expression or array constant
exceeded the arbitrary MAX_SAOP_ARRAY_SIZE limit. For IS-NULL proofs,
we don't really care about the values of the individual array elements;
at most, we care whether there are any, and for some common cases we
needn't even know that.
The main user-visible effect of this is to let the optimizer recognize
applicability of partial indexes with "x IS NOT NULL" predicates to
queries with "x IN (array)" clauses in some cases where it previously
failed to recognize that. The structure of predtest.c is such that a
bunch of related proofs will now also succeed, but they're probably
much less useful in the wild.
James Coleman, reviewed by David Rowley
Discussion: https://postgr.es/m/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg@mail.gmail.com
It turns out that different getopt implementations spell the error for
missing arguments different ways. This test is of fairly marginal
value, so instead of trying to keep up with the different error
messages just remove the test.
For the upcoming pluggable table access methods it's quite
inconvenient to store tuples as HeapTuples, as that'd require
converting tuples from a their native format into HeapTuples. Instead
use slots to manage epq tuples.
To fit into that scheme, change the foreign data wrapper callback
RefetchForeignRow, to store the tuple in a slot. Insist on using the
caller provided slot, so it conveniently can be stored in the
corresponding EPQ slot. As there is no in core user of
RefetchForeignRow, that change was done blindly, but we plan to test
that soon.
To avoid duplicating that work for row locks, move row locks to just
directly use the EPQ slots - it previously temporarily stored tuples
in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given
we'd possibly end up with a significant number of additional slots.
The behaviour of es_epqTupleSet[rti -1] is now checked by
es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a
slot containing an empty tuple.
Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
This option functions similarly to pg_dump's --exclude-table option, but
for database names. The option can be given once, and the argument can
be a pattern including wildcard characters.
Author: Andrew Dunstan.
Reviewd-by: Fabien Coelho and Michael Paquier
Discussion: https://postgr.es/m/43a54a47-4aa7-c70e-9ca6-648f436dd6e6@2ndQuadrant.com
After commit b0eaa4c51b, we use a local map of pages to find the required
space for small relations. We do clear this map when we have found a block
with enough free space, when we extend the relation, or on transaction
abort so that it can be used next time. However, we miss to clear it when
we didn't find any pages to try from the map which leads to an assertion
failure when we later tried to use it after relation extension.
In the passing, I have improved some comments in this area.
Reported-by: Tom Lane based on buildfarm results
Author: Amit Kapila
Reviewed-by: John Naylor
Tested-by: Kuntal Ghosh
Discussion: https://postgr.es/m/32368.1551114120@sss.pgh.pa.us
The function was tweaked so as it returned one row full of NULLs when
working on an unsupported relkind or an undefined object as of cc53123,
and after discussion with Amit and Álvaro it looks more natural to make
it return no rows.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Amit Langote
Discussion: https://postgr.es/m/20190227184808.GA17357@alvherre.pgsql
After 5408e233f0667478 it's not necessary to force materializing the
target slot when copying from one buffer slot to another. Previously
that was required because the HeapTupleData portion of the source slot
wasn't guaranteed to stay valid long enough, but now we can simply
copy that part into the destination slot's tupdata.
Author: Andres Freund
When backend functions were added to expose controldata via SQL,
reading of pg_control was consolidated under src/common so that
both frontend and backend could share the same code. That move
from frontend-only to shared frontend-backend failed to recognize
the risk (and coding standards violation) of using a bare open().
In particular, it risked leaking file descriptors if transient
errors occurred while reading the file. Fix that by using
OpenTransientFile() instead in the backend case, which is
purpose-built for this type of usage.
Since there have been no complaints from the field, and an intermittent
failure low risk, no backpatch. Hard failure would of course be bad, but
in that case these functions are probably the least of your worries.
Author: Joe Conway
Reviewed-By: Michael Paquier
Reported by: Michael Paquier
Discussion: https://postgr.es/m/20190227074728.GA15710@paquier.xyz
Commit f831d4acc changed what pg_dump emits for some empty fields: they
were output as empty strings before, NULL pointer afterwards. That
makes old pg_restore unable to work (crash) with such files, which is
unacceptable. Return to the original representation by explicitly
setting those struct members to "" where needed; remove some no longer
needed checks for NULL input.
We can declutter the code a little by returning to NULLs when we next
update the archive version, so add a note to remind us later.
Discussion: https://postgr.es/m/20190225074539.az6j3u464cvsoxh6@depesz.com
Reported-by: hubert depesz lubaczewski
Author: Dmitry Dolgov
We have forboth() and forthree() macros that simplify iterating
through several parallel lists, but not everyplace that could
reasonably use those was doing so. Also invent forfour() and
forfive() macros to do the same for four or five parallel lists,
and use those where applicable.
The immediate motivation for doing this is to reduce the number
of ad-hoc lnext() calls, to reduce the footprint of a WIP patch.
However, it seems like good cleanup and error-proofing anyway;
the places that were combining forthree() with a manually iterated
loop seem particularly illegible and bug-prone.
There was some speculation about restructuring related parsetree
representations to reduce the need for parallel list chasing of
this sort. Perhaps that's a win, or perhaps not, but in any case
it would be considerably more invasive than this patch; and it's
not particularly related to my immediate goal of improving the
List infrastructure. So I'll leave that question for another day.
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
There was a mix of old_slot/oldslot, new_slot/newslot. Since we've
changed everything from row to slot, we might as well take this
opportunity to clean this up.
Also update some more comments for the slot change.