Amit Langote reported that partition pruning was unable to work with
arrays, enums, etc., which led him to research the appropriate way to
match query clauses to partition keys: instead of searching for an exact
match of the expression's type, it is better to rely on the fact that
the expression qual has already been resolved to a specific operator,
and that the partition key is linked to a specific operator family.
With that info, it's possible to figure out the strategy and comparison
function to use for the pruning clause in a manner that works reliably
for pseudo-types also.
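In code form, the idea is roughly the following (a hedged sketch: the
function and variable names are illustrative, not the actual pruning code,
and only the btree-ordered case is shown):

    #include "postgres.h"
    #include "access/nbtree.h"
    #include "utils/lsyscache.h"

    /*
     * Given the operator already resolved in the clause (clause_opno) and
     * the partition key's opfamily and input type, derive the strategy and
     * comparison function without ever inspecting the expression's type.
     * That is what makes the logic work for pseudo-types as well.
     */
    static Oid
    lookup_prune_cmpfn(Oid clause_opno, Oid partopfamily, Oid partopcintype)
    {
        int         strategy;
        Oid         lefttype;
        Oid         righttype;

        /* Ask the opfamily what role the clause's operator plays in it. */
        get_op_opfamily_properties(clause_opno, partopfamily, false,
                                   &strategy, &lefttype, &righttype);

        /* Fetch the comparison support procedure for these input types. */
        return get_opfamily_proc(partopfamily, partopcintype, righttype,
                                 BTORDER_PROC);
    }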
Include new test cases that demonstrate pruning where pseudo-types are
involved.
Author: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/2b02f1e9-9812-9c41-972d-517bdc0f815d@lab.ntt.co.jp
It appears that new fields introduced in 857f9c36 have inconsistent datatypes:
BTMetaPageData.btm_last_cleanup_num_heap_tuples is of float4 type,
while xl_btree_metadata.last_cleanup_num_heap_tuples is of double type.
IndexVacuumInfo.num_heap_tuples, which is the source of values for
both of the former fields, is of double type. So, make both those fields in
BTMetaPageData and xl_btree_metadata use the float8 type in order to match
the precision of the source. Plain double is not an option, because we
always use types with explicit widths in WAL.
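In declaration form, the intended end state looks roughly like this (a
hedged sketch showing only the fields touched):

    typedef struct BTMetaPageData
    {
        /* ... other metapage fields ... */
        float8      btm_last_cleanup_num_heap_tuples;  /* was float4 */
    } BTMetaPageData;

    typedef struct xl_btree_metadata
    {
        /* ... other WAL fields ... */
        float8      last_cleanup_num_heap_tuples;      /* was double */
    } xl_btree_metadata;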
The patch introduces an on-disk format incompatibility relative to commit
857f9c36, but that version was never released, so just bump the catalog
version to avoid possible confusion.
Author: Alexander Korotkov
Remove an obsolete reference to the 'afteritem' argument, which was
removed by commit bc292937. Add a comment that clarifies how
_bt_insertonpg() indirectly handles the insertion of high key items.
Author: Peter Geoghegan
The new WAL record XLOG_BTREE_META_CLEANUP, introduced in 857f9c36, had no
handling in btree_desc() and btree_identify(). This patch implements the
corresponding handling.
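The additions follow the existing rmgrdesc pattern; a hedged sketch of the
identify side (the exact descriptor string is an assumption):

    const char *
    btree_identify(uint8 info)
    {
        const char *id = NULL;

        switch (info & ~XLR_INFO_MASK)
        {
            /* ... existing XLOG_BTREE_* cases ... */
            case XLOG_BTREE_META_CLEANUP:
                id = "META_CLEANUP";
                break;
        }

        return id;
    }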
Alexander Korotkov
Add several assertions that ensure that we're dealing with a pivot tuple
without non-key attributes where that's expected. Also, remove the
assertion within _bt_isequal(), restoring the v10 function signature. A
similar check will be performed for the page highkey within
_bt_moveright() in most cases. Also avoid dropping all objects within
regression tests, to increase pg_dump test coverage for INCLUDE indexes.
Rather than using infrastructure that's generally intended to be used
with reference counted heap tuple descriptors during truncation, use the
same function that was introduced to store flat TupleDescs in shared
memory (we use a temp palloc'd buffer). This isn't strictly necessary,
but seems more future-proof than the old approach. It also lets us
avoid including rel.h within indextuple.c, which was arguably a
modularity violation. Also, we now call index_deform_tuple() with the
truncated TupleDesc, not the source TupleDesc, since that's more robust,
and saves a few cycles.
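Roughly, the new truncation flow looks like this (a hedged sketch; the real
function is structured differently, but the TupleDescCopy-into-a-temp-buffer
idea is the point):

    #include "postgres.h"
    #include "access/itup.h"
    #include "access/tupdesc.h"

    static IndexTuple
    truncate_to_key_atts(TupleDesc sourceDesc, IndexTuple source, int nkeyatts)
    {
        TupleDesc   truncdesc;
        Datum       values[INDEX_MAX_KEYS];
        bool        isnull[INDEX_MAX_KEYS];
        IndexTuple  truncated;

        /* Flat copy into a temp palloc'd buffer; no refcounting involved. */
        truncdesc = palloc(TupleDescSize(sourceDesc));
        TupleDescCopy(truncdesc, sourceDesc);
        truncdesc->natts = nkeyatts;

        /* Deform with the truncated descriptor: more robust, fewer cycles. */
        index_deform_tuple(source, truncdesc, values, isnull);
        truncated = index_form_tuple(truncdesc, values, isnull);

        pfree(truncdesc);
        return truncated;
    }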
In passing, fix a memory leak by pfree'ing truncated pivot tuple memory
during CREATE INDEX. Also pfree during a page split, just to be
consistent.
Refactor _bt_check_natts() to be more readable.
Author: Peter Geoghegan, with some editing by me
Reviewed by: Alexander Korotkov, Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CAH2-Wz%3DkCWuXeMrBCopC-tFs3FbiVxQNjjgNKdG2sHxZ5k2y3w%40mail.gmail.com
Clean up error messages relating to mistakes in .dat files: make sure they
provide the .dat file name and line number, not the place in the Perl
script that's reporting the problem. Adopt more uniform message phrasing,
too.
Make genbki.pl spit up on unrecognized field names in the input hashes.
Previously, it just silently ignored such fields, which could make a
misspelled field name into a very hard-to-decipher problem. (This is in
genbki.pl, *not* Catalog.pm, because we don't want reformat_dat_file.pl to
complain about unrecognized fields. We'd rather it silently dropped them,
to facilitate removing unwanted fields after a reorganization.)
Commit 54eff5311 did not account for the possibility that we'd have
a transaction snapshot due to default_transaction_isolation being
set high enough to require one. The transaction snapshot is enough
to hold back our advertised xmin and thus risk deadlock anyway.
The only way to get rid of that snapshot is to start a new transaction,
so let's do that instead. Also throw in an assert checking that we
really have gotten to a state where no xmin is being advertised.
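In code form, the fix is essentially this (a hedged sketch; the Assert
relies on the PGXACT layout of this era's tree):

    /*
     * The transaction snapshot can only be released by exiting the
     * transaction, so start a fresh one, then check that we are no
     * longer advertising an xmin.
     */
    CommitTransactionCommand();
    StartTransactionCommand();

    Assert(MyPgXact->xmin == InvalidTransactionId);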
Back-patch to 9.4, like the previous commit.
Discussion: https://postgr.es/m/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWLGMeVBDx71GHNwZQ@mail.gmail.com
Change things around so that proper quoting of values interpolated into
the BKI data by initdb is the responsibility of initdb, not something
we half-heartedly handle by putting double quotes into the raw BKI data.
(Note: experimentation shows that it still doesn't work to put a double
quote into the initial superuser username, but that's the fault of
inadequate quoting while interpolating the name into SQL scripts;
the BKI aspect of it works fine now.)
Having done that, we can remove the special-case handling of values
that look like "something" from genbki.pl, and instead teach it to
escape both double and single quotes properly. This removes the
nowhere-documented need to treat those specially in the BKI source
data; whatever you write will be passed through unchanged into the
inserted data value, modulo Perl's rules about single-quoted strings.
Add documentation explaining the (pre-existing) handling of backslashes
in the BKI data.
Per an earlier discussion with John Naylor.
Discussion: https://postgr.es/m/CAJVSVGUNao=-Q2-vAN3PYcdF5tnL5JAHwGwzZGuYHtq+Mk_9ng@mail.gmail.com
Formerly, Catalog.pm turned a C array type declaration in the catalog
header files into a SQL type, e.g., 'foo[]'. Along the way, genbki.pl
turned this into '_foo' for the purpose of type lookups, but wrote 'foo[]'
to postgres.bki. During bootstrap, bootscanner.l had to have a special
case rule to tokenize this, and then MapArrayTypeName() would turn 'foo[]'
into '_foo' one more time.
This seems unnecessarily complicated, especially since nobody cares that
much about the readability of postgres.bki. Instead, make Catalog.pm
convert the C declaration into '_foo' to start with, and preserve that
representation of the type name throughout bootstrap data processing.
Then rip out the special-case code in bootscanner.l and bootstrap.c.
This changes postgres.bki to the extent that array fields are now
declared like
proconfig = _text ,
rather than
proconfig = text[] ,
No documentation update, since the SGML docs didn't mention any of this
in the first place, and it's all pretty transparent to writers of
catalog header files anyway.
John Naylor
Discussion: https://postgr.es/m/CAJVSVGUNao=-Q2-vAN3PYcdF5tnL5JAHwGwzZGuYHtq+Mk_9ng@mail.gmail.com
During the bootstrap data format conversion, it seemed important for
verifiability's sake that the generated postgres.bki file stayed the same
as before. That resulted in adding a bunch of ad-hoc rules about when to
quote emitted data values, to match previous manual decisions that had
often quoted values unnecessarily. Now that the conversion is complete,
it seems fine to remove all those ad-hoc rules. The net actual effect on
the current contents of postgres.bki is that some fields that had been
quoted despite containing only digits or only "-" lose their unnecessary
quotes.
Also, now that genbki.pl will always quote values containing a backslash,
there's no need for bootscanner.l to allow unquoted octal escapes;
so simplify its production for "id" by removing that possibility.
John Naylor, slightly modified by me
Discussion: https://postgr.es/m/CAJVSVGUNao=-Q2-vAN3PYcdF5tnL5JAHwGwzZGuYHtq+Mk_9ng@mail.gmail.com
Review of commit 1eb6d652: It's pointless to add padding to the GID fields,
when the code that follows assumes that there is no alignment, and uses
memcpy(). Remove the pointless padding.
Update comments to note the new fields in the WAL records.
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/33b787bf-dc20-1161-54e9-3f3b607bf59d%40iki.fi
It turns out that after runtime partition pruning, Append's
first_partial_plan does not accurately represent partial plans to run,
if any of those got pruned. This could limit participation of workers
in some partial subplans, if other subplans got pruned. Fix it by
keeping an index of the first valid partial subplan in the state node,
determined at execnode Init time.
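A hedged sketch of the Init-time computation (variable names are
illustrative; 'i' indexes the planner's subplan list, 'j' the surviving
subplans after pruning):

    firstvalid = nvalidplans;   /* default: no valid partial subplans */
    j = 0;
    i = -1;
    while ((i = bms_next_member(validsubplans, i)) >= 0)
    {
        /* The first surviving subplan at or past first_partial_plan wins. */
        if (i >= node->first_partial_plan && j < firstvalid)
            firstvalid = j;
        j++;
    }
    appendstate->as_first_partial_plan = firstvalid;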
Author: David Rowley, with cosmetic changes by me.
Discussion: https://postgr.es/m/CAKJS1f8o2Yd=rOP=Et3A0FWgF+gSAOkFSU6eNhnGzTPV7nN8sQ@mail.gmail.com
A coverage report indicated that mark_invalid_subplans_as_finished() and
nearby code were not getting exercised by any tests. Add a new test which
uses execution-time Params rather than only external Params to fix this.
In passing, David noticed that ab_q6 tests were not actually required to
have a generic plan. The tests were testing exec Params not external
Params, so there was no need for the PREPARE. Remove the PREPARE,
making these plain queries. (The new queries are called from
explain_parallel_append, which may be unnecessary since they don't
actually have a Parallel Append node, just an Append. But it doesn't
seem to hurt anything, either.)
Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f--hopb6JBSDY4wiXTS3ZcDp-wparXjTQ1nzNdBa04Fog@mail.gmail.com
This reverts commit 4d0f6d3f207d ("Attempt to stabilize partition_prune
test output (2)"), and attempts to stabilize the test by using string
replacement to hide any loop count difference in parallel nodes.
Discussion: https://postgr.es/m/4475.1523628300@sss.pgh.pa.us
spg_text_leaf_consistent() supposed that it should compare only
Min(querylen, entrylen) bytes of the two strings, and then deal with
any excess bytes in one string or the other by assuming the longer
string is greater if the prefixes are equal. Quite aside from the
fact that that's just wrong in some locales (e.g., 'ch' is not less
than 'd' in cs_CZ), it also risked passing incomplete multibyte
characters to strcoll(), with ensuing bad results.
Instead, just pass the full strings to varstr_cmp, and let it decide
what to do about unequal-length strings.
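In code form (a hedged sketch; variable names are illustrative):

    /*
     * Old, broken approach: compare only the shared prefix, then break
     * ties on length.  Wrong in locales such as cs_CZ, and it can hand
     * incomplete multibyte characters to strcoll().
     */
    r = memcmp(fullValue, queryStr, Min(queryLen, fullLen));
    if (r == 0 && queryLen != fullLen)
        r = (fullLen > queryLen) ? 1 : -1;

    /* Fix: pass the full strings and let varstr_cmp() sort it out. */
    r = varstr_cmp(fullValue, fullLen, queryStr, queryLen,
                   PG_GET_COLLATION());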
Fortunately, this error doesn't imply any index corruption; it's just
that searches might return the wrong set of entries.
Per report from Emre Hasegeli, though this is not his patch.
Thanks to Peter Geoghegan for review and discussion.
This code was born broken, so back-patch to all supported branches.
In HEAD, I failed to resist the temptation to do a bit of cosmetic
cleanup/pgindent'ing on 710d90da1, too.
Discussion: https://postgr.es/m/CAE2gYzzb6K51VnTq5i5p52z+j9p2duEa-K1T3RrC_GQEynAKEg@mail.gmail.com
We had an Assert() preventing whole-row expressions from being used in
the SET clause of INSERT ON CONFLICT, but testing suggests it is
unnecessary, so remove it. Add a new test to exercise the case.
Also in ExecInitPartitionInfo, we used map_partition_varattnos (which
constructs an attribute map and then calls map_variable_attnos) with the
same two relations many times, in different expressions and with different
parameters. Constructing the map over and over is a waste. To avoid this
repeated work, construct the map once and use map_variable_attnos()
directly instead.
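A hedged sketch of the resulting pattern (the two functions exist in this
tree; the surrounding variables are illustrative):

    /* Build the attribute map exactly once ... */
    part_attnos = convert_tuples_by_name_map(RelationGetDescr(partrel),
                                             RelationGetDescr(parentrel),
                                             gettext_noop("could not convert row type"));

    /* ... then reuse it for each expression that needs translation. */
    expr = map_variable_attnos(expr, parent_varno, 0,
                               part_attnos,
                               RelationGetDescr(parentrel)->natts,
                               RelationGetForm(partrel)->reltype,
                               &found_whole_row);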
Author: Amit Langote, per comments by me (Álvaro)
Discussion: https://postgr.es/m/20180326142016.m4st5e34chrzrknk@alvherre.pgsql
Spelling access(2)'s second argument as "2" is just horrid.
POSIX makes no promises as to the numeric values of W_OK and related
macros. Even if it accidentally works as intended on every supported
platform, it's still unreadable and inconsistent with adjacent code.
In passing, don't spell "NULL" as "0" either. Yes, that's legal C;
no, it's not project style.
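A hedged sketch of the preferred spelling (the helper function is made up
for illustration):

    #include <stdbool.h>
    #include <unistd.h>

    static bool
    dir_is_writable(const char *path)
    {
        /*
         * Not access(path, 2): POSIX does not promise that W_OK == 2,
         * and the macro is what the reader should see anyway.
         */
        return access(path, W_OK) == 0;
    }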
Back-patch, just in case the unportability is real and not theoretical.
(Most likely, even if a platform had different bit assignments for
access()'s modes, there'd not be an observable behavior difference
here; but I'm being paranoid today.)
Coverity complained about the lack of a check on the return value in
parse_jsonb_index_flags' last call of JsonbIteratorNext. Seems like
a reasonable gripe to me, especially since the code is depending on
that being WJB_DONE to not leak memory, so add a check.
In passing, improve a couple other places where the result was being
ignored, either by adding an assert or at least a cast to void.
Also, don't spell "WJB_DONE" as "0". That's horrid coding style,
and it wasn't consistent either.
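A hedged sketch of the pattern (variable names are illustrative):

    JsonbIteratorToken tok;

    tok = JsonbIteratorNext(&it, &v, false);
    if (tok != WJB_DONE)
        elog(ERROR, "unexpected jsonb iterator token: %d", (int) tok);

    /* Where the result is intentionally unused, say so explicitly: */
    (void) JsonbIteratorNext(&it, &v, false);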
Teach both base backups and pg_verify_checksums that if a page is new,
it does not have a checksum yet, so it shouldn't be verified.
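The rule in code form (a hedged sketch; the helper name is made up):

    #include "postgres.h"
    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    static bool
    page_checksum_ok(char *page, BlockNumber blkno)
    {
        /* A new (all-zeroes) page has no checksum set on it yet. */
        if (PageIsNew(page))
            return true;

        return pg_checksum_page(page, blkno) ==
            ((PageHeader) page)->pd_checksum;
    }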
Noted by Tomas Vondra, review by David Steele.
They were accidentally excluded when reverting the backend online
checksum functionality, and since they weren't built, the incorrect
reference to a removed section also did not trigger a problem.
Author: Christoph Berg
Make it clear that a cluster has to be shut down cleanly before
pg_verify_checksums can be run against it.
Author: Michael Paquier
Review: Daniel Gustafsson
This option makes no sense when the cluster checksum state cannot be
changed, and should have been removed in the revert.
Author: Daniel Gustafsson
Review: Michael Paquier
In the wake of commit 50c6bb022, it's not necessary for ApplyRetrieveRule
to have a forUpdatePushedDown parameter. By the time control gets here for
any given view-referencing RTE, we should already have pushed down the
effects of any FOR UPDATE/SHARE clauses affecting the view from outer query
levels. Hence if we don't find a RowMarkClause at the current query level,
that's sufficient proof that there is no outer one either. This in turn
means we need no forUpdatePushedDown parameter for fireRIRrules.
I wonder whether we oughtn't also revert commit cba2d2717, since it now
seems likely that that was band-aiding around the bad effects of doing
FOR UPDATE pushdown and view expansion in the wrong order. However,
in the absence of evidence that the current coding of markQueryForLocking
is actually buggy (i.e. missing RTEs it ought to mark), it seems best to
leave it alone.
Discussion: https://postgr.es/m/24db7b8f-3de5-e25f-7ab9-d8848351d42c@gmail.com
I was dissatisfied with the code coverage report for expand_tuple() in the
wake of commit 7c44c46de: while better than no coverage at all, it was
still not exercising the core function of inserting out-of-line default
values, nor was the HeapTuple-output path covered. So far as I can find,
the only code path that reaches the latter at present is EvalPlanQual
fetches for non-locked tables. Hence, extend eval-plan-qual.spec to
test cases where out-of-line defaults must be inserted into a tuple
fetched from a non-locked table.
Discussion: https://postgr.es/m/87woxi24uw.fsf@ansel.ydns.eu
SELECT FOR UPDATE on a view should require UPDATE (as well as SELECT)
permissions on the view, and then the view's owner needs those same
permissions against the relations it references, and so on all the way
down to base tables. But ApplyRetrieveRule did things in the wrong order,
resulting in failure to mark intermediate view levels as needing UPDATE
permission. Thus for example, if user A creates a table T and an updatable
view V1 on T, then grants only SELECT permissions on V1 to user B, B could
create a second view V2 on V1 and then would be allowed to perform SELECT
FOR UPDATE via V2 (since V1 wouldn't be checked for UPDATE permissions).
To fix, just switch the order of expanding sub-views and marking referenced
objects as needing UPDATE permission. I think additional simplifications
are now possible, but that's distinct from the bug fix proper.
This is certainly a security issue, but the consequences are pretty minor
(just the ability to lock rows that shouldn't be lockable). Against that
we have a small risk of breaking applications that are working as-desired,
since nested views have behaved this way since such cases worked at all.
On balance I'm inclined not to back-patch.
Per report from Alexander Lakhin.
Discussion: https://postgr.es/m/24db7b8f-3de5-e25f-7ab9-d8848351d42c@gmail.com
MaxIndexTuplesPerPage ignores the fact that btree indexes sometimes
store tuples with no data payload. But it also ignores the possibility
of "special space" on index pages, which offsets that, so that the
result isn't an underestimate. This all seems worth documenting, though.
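For reference, the macro under discussion reads essentially as follows
(from itup.h; note there is no term for the special space, and the "+ 1"
assumes at least one byte of data payload):

    #define MaxIndexTuplesPerPage   \
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
                (MAXALIGN(sizeof(IndexTupleData) + 1) + sizeof(ItemIdData))))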
In passing, remove #define MinIndexTupleSize, which was added by
commit 2c03216d8 but not used in that commit nor later ones.
Comment text by me; issue noticed by Peter Geoghegan.
Discussion: https://postgr.es/m/CAH2-WzkQmb54Kbx-YHXstRKXcNc+_87jwV3DRb54xcybLR7Oig@mail.gmail.com
We need to call expand_function_arguments() to expand named and default
arguments.
In PL/pgSQL, we also need to deal with named and default INOUT arguments
when receiving the output values into variables.
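In code form, the normalization step is roughly this (a hedged sketch;
'fexpr' stands for the procedure's FuncExpr, and names are illustrative):

    HeapTuple   proctuple;

    proctuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(fexpr->funcid));
    if (!HeapTupleIsValid(proctuple))
        elog(ERROR, "cache lookup failed for function %u", fexpr->funcid);

    /* Rewrite named and defaulted arguments into positional order. */
    fexpr->args = expand_function_arguments(fexpr->args,
                                            fexpr->funcresulttype,
                                            proctuple);

    ReleaseSysCache(proctuple);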
Author: Pavel Stehule <pavel.stehule@gmail.com>
Commit 16828d5c forgot to check that it had a set of missing values
before trying to retrieve a value from it.
An additional query is added to the regression tests to provide coverage
for this code.
Per bug report from Andreas Seltenreich.
In passing, throw an error if the AF count is too small, rather than
just silently discarding extra affix entries.
Note that the new regression test cases require installing the
updated src/backend/tsearch/dicts files.
Arthur Zakirov
Discussion: https://postgr.es/m/20180413113447.GA32474@zakirov.localdomain
We'd throw away the partial result anyway after parsing the error message.
Throwing it away beforehand costs nothing and reduces the risk of
out-of-memory failure. Also, at least in systems that behave like
glibc/Linux, if the partial result was very large then the error PGresult
would get allocated at high heap addresses, preventing the heap storage
used by the partial result from being released to the OS until the error
PGresult is freed.
In psql >= 9.6, we hold onto the error PGresult until another error is
received (for \errverbose), so that this behavior causes a seeming
memory leak to persist for awhile, as in a recent complaint from
Darafei Praliaskouski. This is a potential performance regression from
older versions, justifying back-patching at least that far. But similar
behavior may occur in other client applications, so it seems worth just
back-patching to all supported branches.
Discussion: https://postgr.es/m/CAC8Q8tJ=7cOkPePyAbJE_Pf691t8nDFhJp0KZxHvnq_uicfyVg@mail.gmail.com
This custom opclass was already in use in other tests -- defined
independently in every such file. Move the definition to the earliest
test that uses it, and keep it around so that later tests can reuse it.
Use it in the tests for pruning of hash partitioning, and since this
makes the second expected file unnecessary, put those tests back in
partition_prune.sql whence they sprang.
Author: Amit Langote
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
NISortAffixes() compared successive compound affixes incorrectly,
thus possibly failing to merge identical affixes, or (less likely)
merging ones that shouldn't be merged. The user-visible effects
of this are unclear, to me anyway.
Per bug #15150 from Alexander Lakhin. It's been broken for a long time,
so back-patch to all supported branches.
Arthur Zakirov
Discussion: https://postgr.es/m/152353327780.31225.13445405496721177988@wrigleys.postgresql.org
We've made multiple attempts to stabilize the plans shown by commit
1bc0100d2, with little success so far. The reason for the remaining
instability seems to be that if a transaction (such as auto-analyze)
is running concurrently with the test, then get_actual_variable_range may
return a maximum value for "T 1"."C 1" that's far away from the actual max,
as a result of our having transiently inserted such a value earlier in
the test. Because we use a non-MVCC snapshot to fetch the value (for
performance reasons), the presence of other transactions can cause that
function to return entries that are actually dead.
To fix, use a less extreme value in the earlier transient insertion, so
that whether it is visible or not won't affect the selectivity estimate.
The use of 9999 there seems to have been picked with the aid of a
dartboard anyway, rather than having a specific reason.
Discussion: https://postgr.es/m/16962.1523551784@sss.pgh.pa.us
We were using CurrentMemoryContext to put the partsupfunc fmgr_info
into, which isn't right, because we want the PartitionKey as a whole to
be in the isolated Relation->rd_partkeycxt context. This can cause a
crash with user-defined support functions in the operator classes used
by partitioning keys. (Maybe this can cause problems with core-supplied
opclasses too, not sure.)
This is demonstrably broken in Postgres 10, too, but the initial
proposed fix runs afoul of a problem discussed back when 8a0596cb656e
("Get rid of copy_partition_key") reorganized that code: namely that it
is possible to jump out of RelationBuildPartitionKey because of some
error and leave a dangling memory context child of CacheMemoryContext.
Also, while reviewing this I noticed that the removed-in-pg11
copy_partition_key was doing something wrong, unfixed in pg10, namely
doing memcpy() on the FmgrInfo, which is bogus (should be doing
fmgr_info_copy). Therefore, in branch pg10, the sane fix seems to be to
backpatch both the aforementioned 8a0596cb656e and its followup
be2343221fb7 ("Protect against hypothetical memory leaks in
RelationGetPartitionKey"), so do that, then apply the fmgr_info memcxt
bugfix on top.
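The two coding rules at issue, in hedged sketch form (the variables are
illustrative):

    /* Build the fmgr_info directly in the PartitionKey's own context ... */
    fmgr_info_cxt(funcid, &key->partsupfunc[i], partkeycxt);

    /* ... and when a copy is needed, never memcpy() an FmgrInfo. */
    fmgr_info_copy(&newkey->partsupfunc[i], &oldkey->partsupfunc[i],
                   newcxt);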
Add a test case exercising btree-based custom operator classes, which
causes a crash prior to this fix. This is not a security problem,
because in order to create an operator class you need superuser
privileges anyway.
Authors: Álvaro Herrera and Amit Langote
Reported and diagnosed by: Amit Langote
Discussion: https://postgr.es/m/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp
The bug is caused by the original IndexStmt that DefineIndex receives
being overwritten when processing the INCLUDE columns. Use a separate list
of index params to propagate to child tables. Add tests covering this case.
Amit Langote and Alexander Korotkov.
Re-commit 5c6110c6a960ad6fe1b0d0fec6ae36ef4eb913f5, now that the bug it
uncovered has been fixed in c266ed31a8a3beed3533e6a78faeca78234cbd43.
Discussion: https://www.postgresql.org/message-id/CAJGNTeO%3DBguEyG8wxMpU_Vgvg3nGGzy71zUQ0RpzEn_mb0bSWA%40mail.gmail.com
- Explicitly forbid opclass, collation and indoptions (like DESC/ASC etc.)
for included columns. Throw an error if the user specifies one.
- Truncate storage arrays for such attributes to store only key attributes,
and add assertion checks.
- Do not check opfamily and collation for included columns in
CompareIndexInfo().
Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru
This reverts commits d204ef63776b8a00ca220adec23979091564e465,
83454e3c2b28141c0db01c7d2027e01040df5249 and a few more commits thereafter
(complete list at the end) related to MERGE feature.
While the feature was fully functional, with sufficient test coverage and
the necessary documentation, it was felt that some parts of the executor
and parse-analyzer could use a different design, and it wasn't possible to
do that in the available time. So it was decided to revert the patch for
PG11 and try again in the future.
Thanks again to all reviewers and bug reporters.
List of commits reverted, in reverse chronological order:
f1464c5380 Improve parse representation for MERGE
ddb4158579 MERGE syntax diagram correction
530e69e59b Allow cpluspluscheck to pass by renaming variable
01b88b4df5 MERGE minor errata
3af7b2b0d4 MERGE fix variable warning in non-assert builds
a5d86181ec MERGE INSERT allows only one VALUES clause
4b2d44031f MERGE post-commit review
4923550c20 Tab completion for MERGE
aa3faa3c7a WITH support in MERGE
83454e3c2b New files for MERGE
d204ef6377 MERGE SQL Command following SQL:2016
Author: Pavan Deolasee
Reviewed-by: Michael Paquier