The logic just added by commit e3899ffd falls back on a 50:50 page split
in the event of a new item that's just to the right of our provisional
"many duplicates" split point. Fix a comment that incorrectly claimed
that the new item had to be just to the left of our provisional split
point.
Backpatch: 12-, just like commit e3899ffd.
Specific ever-decreasing insertion patterns could cause successive
unbalanced nbtree page splits. Problem cases involve a large group of
duplicates to the left, and ever-decreasing insertions to the right.
To fix, detect the situation by considering the newitem offset before
performing a split using nbtsplitloc.c's "many duplicates" strategy. If
the new item was inserted just to the right of our provisional "many
duplicates" split point, infer ever-decreasing insertions and fall back
on a 50:50 (space delta optimal) split. This seems to barely affect
cases that already had acceptable space utilization.
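As a rough sketch of the fallback check (all names here are hypothetical; this is not the actual nbtsplitloc.c code, only a model of the decision):

    #include <stdio.h>

    typedef enum SplitStrategySketch
    {
        SPLIT_MANY_DUPLICATES,  /* split at the right edge of a duplicate group */
        SPLIT_FIFTY_FIFTY       /* space-delta-optimal 50:50 split */
    } SplitStrategySketch;

    /*
     * If the incoming tuple would land immediately to the right of the
     * provisional "many duplicates" split point, infer an ever-decreasing
     * insertion pattern and fall back on a 50:50 split rather than keep
     * producing lopsided splits.
     */
    static SplitStrategySketch
    choose_split_strategy(int provisional_split_off, int newitemoff)
    {
        if (newitemoff == provisional_split_off + 1)
            return SPLIT_FIFTY_FIFTY;
        return SPLIT_MANY_DUPLICATES;
    }

    int
    main(void)
    {
        /* new item just to the right of the split point: fall back */
        printf("%d\n", choose_split_strategy(100, 101));   /* 1 */
        /* new item elsewhere: keep the many-duplicates split */
        printf("%d\n", choose_split_strategy(100, 250));   /* 0 */
        return 0;
    }
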
An alternative fix also seems possible. Instead of changing
nbtsplitloc.c split choice logic, we could instead teach _bt_truncate()
to generate a new value for new high keys by interpolating from the
lastleft and firstright key values. That would certainly be a more
elegant fix, but it isn't suitable for backpatching.
Discussion: https://postgr.es/m/CAH2-WznCNvhZpxa__GqAa1fgQ9uYdVc=_apArkW2nc-K3O7_NA@mail.gmail.com
Backpatch: 12-, where the nbtree page split enhancements were introduced.
Commit 578b229718e8f15fa779e20f086c4b6bb3776106 replaced it with a
concurrent "nextval" test. That version does not detect PostgreSQL's
incompatibility with xlc 13.1.3, so bring back an OID-based test that
does. Back-patch to v12, where that commit first appeared.
Discussion: https://postgr.es/m/20190707170035.GA1485546@rfd.leadboat.com
Commit 3ca930fc3 modified get_actual_variable_range() to use a new
"SnapshotNonVacuumable" snapshot type for selecting tuples that it
would consider valid. However, because that snapshot type can accept
recently-dead tuples, this caused a bug when using a recently-created
index: we might accept a recently-dead tuple that is an early member
of a broken HOT chain and does not actually match the index entry.
Then, the data extracted from the heap tuple would not necessarily be
an endpoint value of the column; it could even be NULL, leading to
get_actual_variable_range() itself reporting "found unexpected null
value in index". Even without an error, this could lead to poor
plan choices due to an erroneous notion of the endpoint value.
We can improve matters by changing the code to use the index-only
scan technique (which didn't exist when get_actual_variable_range was
originally written). If any of the tuples in a HOT chain are live
enough to satisfy SnapshotNonVacuumable, we take the data from the
index entry, ignoring what is in the heap. This fixes the problem
without changing the live-vs-dead-tuple behavior from what was
intended by commit 3ca930fc3.
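A standalone model of that decision, with hypothetical types (the real code goes through the index AM and SnapshotNonVacuumable; only the control flow and the "use the index value" point are illustrated here):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for the backend structures */
    typedef struct IndexEntrySketch
    {
        long        key;        /* value of the leading index column */
    } IndexEntrySketch;

    typedef struct HotChainMember
    {
        bool        satisfies_nonvacuumable;
        struct HotChainMember *next;
    } HotChainMember;

    /*
     * Sketch: walk the index entries in the wanted direction; for each one,
     * check whether any member of its HOT chain is live enough to satisfy
     * SnapshotNonVacuumable.  If so, report the value stored in the *index*
     * entry -- never the heap tuple, which could be an early member of a
     * broken HOT chain that no longer matches the index key.
     */
    static bool
    get_extremal_value(IndexEntrySketch *entries, HotChainMember **chains,
                       int nentries, long *result)
    {
        for (int i = 0; i < nentries; i++)
        {
            for (HotChainMember *m = chains[i]; m != NULL; m = m->next)
            {
                if (m->satisfies_nonvacuumable)
                {
                    *result = entries[i].key;   /* taken from the index entry */
                    return true;
                }
            }
            /* whole chain is vacuumable: skip this entry and keep scanning */
        }
        return false;           /* index is empty or entirely dead */
    }
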
A side benefit is that for static tables we might not have to touch
the heap at all (when the extremal value is in an all-visible page).
In addition, we can save some overhead by not having to create a
complete ExecutorState, and we don't need to run FormIndexDatum,
avoiding more cycles as well as the possibility of failure for
indexes on expressions. (I'm not sure that this code would ever
be used to determine the extreme value of an expression, in the
current state of the planner; but it's definitely possible that
lower-order columns of the selected index could be expressions.
So one could construct perhaps-artificial examples in which the
old code unexpectedly failed due to trying to compute an
expression's value for a now-dead row.)
Per report from Manuel Rigger. Back-patch to v11 where commit
3ca930fc3 came in.
Discussion: https://postgr.es/m/CA+u7OA7W4NWEhCvftdV6_8bbm2vgypi5nuxfnSEJQqVKFSUoMg@mail.gmail.com
match_clause_to_partition_key would incorrectly return
PARTCLAUSE_UNSUPPORTED if a bool qual could not be matched to the current
partition key. This was a problem, as it causes the calling function to
discard the qual and not try to match it to any other partition key. If
there was another partition key which did match this qual, then the qual
would not be checked again and we could fail to prune some partitions.
The worst this could do was to cause partitions not to be pruned when they
could have been, so there was no danger of incorrect query results here.
Fix this by changing match_boolean_partition_clause to have it return a
PartClauseMatchStatus rather than a boolean value. This allows it to
communicate if the qual is unsupported or if it just does not match this
particular partition key; previously these two cases were treated the
same. Now, if match_clause_to_partition_key is unable to match the qual
to any other qual type, we can simply return the value from the
match_boolean_partition_clause call so that the calling function properly
treats the qual as either unmatched or unsupported.
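A reduced sketch of the shape of the fix; the enum values follow the PostgreSQL names, but the function bodies and the BoolQualSketch struct are illustrative stand-ins only:

    #include <stdbool.h>

    /* Reduced version of PartClauseMatchStatus, for illustration only */
    typedef enum PartClauseMatchStatus
    {
        PARTCLAUSE_NOMATCH,      /* clause doesn't match this partition key */
        PARTCLAUSE_MATCH_CLAUSE, /* clause matched; usable for pruning */
        PARTCLAUSE_UNSUPPORTED   /* clause can never be used for pruning */
    } PartClauseMatchStatus;

    typedef struct BoolQualSketch
    {
        int         keyno;      /* which partition key the qual refers to */
    } BoolQualSketch;

    /*
     * Before the fix this effectively returned a bool, so "doesn't match
     * this key" and "can't be used at all" looked the same to the caller.
     */
    static PartClauseMatchStatus
    match_boolean_partition_clause_sketch(const BoolQualSketch *qual,
                                          int partkeyno)
    {
        if (qual->keyno != partkeyno)
            return PARTCLAUSE_NOMATCH;  /* caller may try other partition keys */
        return PARTCLAUSE_MATCH_CLAUSE;
    }

    /*
     * Caller sketch: only discard the qual entirely on UNSUPPORTED; NOMATCH
     * lets it be retried against the remaining partition keys.
     */
    static PartClauseMatchStatus
    match_clause_to_partition_key_sketch(const BoolQualSketch *qual,
                                         int partkeyno)
    {
        PartClauseMatchStatus boolmatchstatus;

        boolmatchstatus = match_boolean_partition_clause_sketch(qual, partkeyno);
        if (boolmatchstatus != PARTCLAUSE_NOMATCH)
            return boolmatchstatus;

        /* ... attempts to match the qual as other clause types go here ... */

        return boolmatchstatus; /* i.e. PARTCLAUSE_NOMATCH */
    }
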
Reported-by: Rares Salcudean
Reviewed-by: Amit Langote
Backpatch-through: 11 where partition pruning was introduced
Discussion: https://postgr.es/m/CAHp_FN2xwEznH6oyS0hNTuUUZKp5PvegcVv=Co6nBXJ+mC7Y5w@mail.gmail.com
This can cause valgrind to complain, as the flag marking a buffer as a
temporary copy was not getting initialized.
While on it, fill newly-created buffer pages with zeros. This does
not matter when loading a block from a temporary file, but it makes the
push of an index tuple into a new buffer page safer.
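For illustration, a hedged sketch of the zero-fill (hypothetical allocator, not the actual GiST buffering-build code):

    #include <stdlib.h>
    #include <string.h>

    #define BLCKSZ 8192         /* PostgreSQL's default block size */

    /*
     * Hypothetical allocator for an in-memory buffer page.  Zero-filling up
     * front gives every byte a defined value, so valgrind stays quiet and
     * pushing an index tuple into a fresh page never reads stale memory.
     */
    static char *
    alloc_buffer_page(void)
    {
        char   *page = malloc(BLCKSZ);

        if (page == NULL)
            return NULL;
        memset(page, 0, BLCKSZ);    /* don't leave the new page uninitialized */
        return page;
    }
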
This problem was introduced by 1d27dcf, so backpatch all the way down to
9.4.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/15899-0d24fb273b3dd90c@postgresql.org
Backpatch-through: 9.4
86b85044e abstracted calls to heap functions in COPY FROM to support a
generic table AM. However, when performing a copy into a partitioned
table, this commit neglected to call table_finish_bulk_insert for each
partition. Before 86b85044e, when we always called the heap functions,
there was no need to call heapam_finish_bulk_insert for partitions since
it only did any work when performing a copy without WAL. For partitioned
tables, this was unsupported anyway, so there was no issue. With
pluggable storage, we can't make any assumptions about what the table AM
might want to do in its equivalent function, so we'd better ensure we
always call table_finish_bulk_insert for each partition that's received a row.
For now, we make the table_finish_bulk_insert call whenever we evict a
CopyMultiInsertBuffer out of the CopyMultiInsertInfo. This does mean
that it's possible that we call table_finish_bulk_insert multiple times
per partition, which is not a problem other than being an inefficiency.
Improving this requires a more invasive patch, so let's leave that for
another day.
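A simplified model of that eviction path (the struct and helper names below are stand-ins for the copy.c and table AM ones, shown only to illustrate the call ordering):

    /* Simplified stand-ins; the real code works with ResultRelInfo/Relation */
    typedef struct PartitionBufferSketch
    {
        void       *relation;   /* the partition this buffer feeds */
        int         nused;      /* rows currently queued in the buffer */
        int         ti_options; /* bulk-insert options handed to the table AM */
    } PartitionBufferSketch;

    /* Hypothetical hooks standing in for the multi-insert flush and for
     * table_finish_bulk_insert(); only the call ordering matters here. */
    extern void flush_queued_rows(void *relation, int nused);
    extern void finish_bulk_insert(void *relation, int ti_options);

    /*
     * Evicting one buffer: flush any queued rows, then let the table AM
     * finish its bulk insert for that partition.  A partition's buffer can
     * be evicted and recreated several times during one COPY, so the finish
     * call may run more than once per partition -- inefficient, not wrong.
     */
    static void
    evict_partition_buffer(PartitionBufferSketch *buf)
    {
        if (buf->nused > 0)
            flush_queued_rows(buf->relation, buf->nused);
        finish_bulk_insert(buf->relation, buf->ti_options);
        buf->nused = 0;
    }
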
This also changes things so that we no longer needlessly call
table_finish_bulk_insert when performing a COPY FROM for a non-partitioned
table when not using multi-inserts.
Reported-by: Robert Haas
Backpatch-through: 12
Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
Otherwise the regressplans.sh tests generate extremely slow nested
loop joins. Back-patch to 11 where the hash join tests came in.
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/20190708055256.GB2709%40paquier.xyz
This code failed to account for the possibility that malloc() would
change errno, resulting in wrong output for %m, not to mention the
possibility of message truncation. Such a change is obviously
expected when malloc fails, but there's reason to fear that on some
platforms even a successful malloc call can modify errno.
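A minimal standalone example of the defensive pattern, assuming nothing beyond standard C: capture errno before the allocation and format the message (here via strerror) from the saved value:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Sketch of formatting a message that may reference errno: snapshot
     * errno before calling malloc(), since malloc() may clobber it --
     * always on failure, and on some platforms even on success.
     */
    static char *
    format_with_errno(const char *prefix)
    {
        int     saved_errno = errno;    /* capture before malloc() runs */
        size_t  len = strlen(prefix) + 128;
        char   *buf = malloc(len);

        if (buf == NULL)
            return NULL;
        snprintf(buf, len, "%s: %s", prefix, strerror(saved_errno));
        return buf;
    }

    int
    main(void)
    {
        errno = ENOENT;
        char   *msg = format_with_errno("could not open file");

        if (msg)
        {
            puts(msg);      /* "could not open file: No such file or directory" */
            free(msg);
        }
        return 0;
    }
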
Discussion: https://postgr.es/m/2576.1527382833@sss.pgh.pa.us
In normal interactive mode, psql's log messages accidentally got a
"psql:" prefix that was not supposed to be there. This only happened
if there was no .psqlrc file being read, so it wasn't discovered for a
while. Fix this by adding the appropriate logging format
configuration call in the right code path.
Discussion: https://www.postgresql.org/message-id/7586.1560540361@sss.pgh.pa.us
The itemlen variable used to be referenced in multiple places, but since
the serialization code was reworked it's used only in one assert. Fixed by
removing the variable and calling the macro from the assert directly.
Backpatch to 12, where this code was introduced.
Reported-by: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1zc_ovH9NZd_9ovuiEWkF9yX06URUDdXCmgDydf-bqB5A@mail.gmail.com
The serialization format of multivariate MCV lists included alignment in
order to allow direct access to part of the serialized data, but despite
multiple fixes (see for example commits d85e0f366a and ea4e1c0e8f) this
proved to be problematic.
This commit abandons alignment in the serialized format, and just copies
everything during deserialization. We now also track the amount of memory
needed after deserialization (including alignment), which allows us to
deserialize the MCV list in a single pass.
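A hedged sketch of the single-pass idea with a made-up item layout (not the actual statistics/mcv.c structures): size one aligned chunk up front, then memcpy each packed field into it.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define ALIGN8(x)   (((x) + 7) & ~((size_t) 7))     /* stand-in for MAXALIGN */

    /* Hypothetical in-memory item: fixed-size header plus per-item values */
    typedef struct MCVItemSketch
    {
        double      frequency;
        double      base_frequency;
        int         nvalues;
        int64_t    *values;     /* points into the same single allocation */
    } MCVItemSketch;

    /*
     * The serialized bytes carry no alignment padding, so nothing in them
     * can be accessed in place.  Size one chunk that already accounts for
     * alignment, then copy each packed field into it.
     */
    static MCVItemSketch *
    deserialize_item(const char *ser, int nvalues)
    {
        size_t      header = ALIGN8(sizeof(MCVItemSketch));
        size_t      total = header + ALIGN8(sizeof(int64_t) * nvalues);
        MCVItemSketch *item = malloc(total);

        if (item == NULL)
            return NULL;

        /* the source may be unaligned; memcpy is always safe */
        memcpy(&item->frequency, ser, sizeof(double));
        memcpy(&item->base_frequency, ser + sizeof(double), sizeof(double));
        item->nvalues = nvalues;
        item->values = (int64_t *) ((char *) item + header);
        memcpy(item->values, ser + 2 * sizeof(double),
               sizeof(int64_t) * nvalues);
        return item;
    }
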
Bump catversion, as this affects contents of pg_statistic_ext_data.
Backpatch to 12, where multi-column MCV lists were introduced.
Author: Tomas Vondra
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/2201.1561521148@sss.pgh.pa.us
The function pg_mcv_list_items() returns values stored in MCV items. The
items may contain columns with different data types, so the function was
generating text array-like representation, but in an ad-hoc way without
properly escaping various characters etc.
Fixed by simply building a text[] array, which also makes it easier to
use from queries etc.
Requires changes to pg_proc entry, so bump catversion.
Backpatch to 12, where multi-column MCV lists were introduced.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed
Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development
When building multi-column MCV lists, we compute the base frequency for
each item, i.e. a product of per-column frequencies for values from the item.
As a value may be in multiple groups, the code was scanning the whole
array of groups while adding items to the MCV list. This works fine as
long as the number of distinct groups is small, but it's easy to trigger
O(N^2) behavior, especially after increasing the statistics target.
This commit precomputes frequencies for values in all columns, so that
when computing the base frequency it's enough to make a simple bsearch
lookup in the array.
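A standalone sketch of the precompute-then-bsearch approach using generic types (not the actual extended-statistics code): sort each column's (value, frequency) pairs once, then every base-frequency term is a logarithmic lookup.

    #include <stdlib.h>

    typedef struct ValueFreqSketch
    {
        long        value;
        double      frequency;  /* per-column frequency of this value */
    } ValueFreqSketch;

    static int
    cmp_valuefreq(const void *a, const void *b)
    {
        long        va = ((const ValueFreqSketch *) a)->value;
        long        vb = ((const ValueFreqSketch *) b)->value;

        return (va > vb) - (va < vb);
    }

    /* Look up one value's frequency in a column's array, which the caller
     * has already sorted with qsort(..., cmp_valuefreq). */
    static double
    lookup_frequency(const ValueFreqSketch *arr, int n, long value)
    {
        ValueFreqSketch key = {value, 0.0};
        const ValueFreqSketch *hit = bsearch(&key, arr, n,
                                             sizeof(ValueFreqSketch),
                                             cmp_valuefreq);

        return hit ? hit->frequency : 0.0;
    }

    /*
     * Base frequency of an MCV item = product of the per-column frequencies
     * of its values.  With the sorted per-column arrays this costs
     * O(ncolumns * log nvalues) per item instead of a scan over all groups.
     */
    static double
    base_frequency(ValueFreqSketch *const *columns, const int *nvalues,
                   const long *item, int ncolumns)
    {
        double      base = 1.0;

        for (int i = 0; i < ncolumns; i++)
            base *= lookup_frequency(columns[i], nvalues[i], item[i]);
        return base;
    }
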
Backpatch to 12, where multi-column MCV lists were introduced.
Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development
A function that is declared to return a named composite type must
return tuple datums that are physically marked as having that type.
The plpgsql code path that allowed directly returning an expanded-record
datum forgot to check that, so that an expanded record marked as type
RECORDOID could be returned if it had a physically-compatible tupdesc.
This'd be harmless, I think, if the record value never escaped the
current session --- but it's possible for it to get stored into a table,
and then subsequent sessions can't interpret the anonymous record type.
Fix by flattening the record into a tuple datum and overwriting its
type/typmod fields, if its declared type doesn't match the function's
declared type. (In principle it might be possible to just change the
expanded record's stored type ID info, but there are enough tricky
consequences that I didn't want to mess with that, especially not in
a back-patched bug fix.)
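A standalone model of the stamping step (hypothetical structs; the backend does the equivalent on the flattened tuple datum's header fields):

    typedef unsigned int OidSketch;     /* stand-in for the backend's Oid */

    /* Hypothetical stand-in for the type fields of a composite datum header */
    typedef struct CompositeHeaderSketch
    {
        OidSketch   type_id;    /* composite type OID stamped on the datum */
        int         typmod;     /* registered typmod, or -1 */
    } CompositeHeaderSketch;

    /*
     * After flattening the expanded record into a plain tuple datum, make
     * sure the datum is stamped with the function's declared composite type
     * rather than the anonymous RECORD type it may have carried internally;
     * otherwise a stored copy of the value cannot be interpreted by other
     * sessions.
     */
    static void
    stamp_declared_type(CompositeHeaderSketch *tup,
                        OidSketch declared_typid, int declared_typmod)
    {
        if (tup->type_id != declared_typid || tup->typmod != declared_typmod)
        {
            tup->type_id = declared_typid;
            tup->typmod = declared_typmod;
        }
    }
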
Per bug report from Steve Rogerson. Back-patch to v11 where the bug
was introduced.
Discussion: https://postgr.es/m/cbaecae6-7b87-584e-45f6-4d047b92ca2a@yewtc.demon.co.uk
d4c3a156c added code to remove columns that were not part of a table's
PRIMARY KEY constraint from the GROUP BY clause when all the primary key
columns were present in the GROUP BY. This is fine to do since we know
that there will only be one row per group coming from this relation.
However, the logic failed to consider inheritance parent relations. These
can have child relations without a primary key, but even if they did, they
could duplicate one of the parent's rows or one from another child
relation. In this case, those additional GROUP BY columns are required.
Fix this by disabling the optimization for inheritance parent tables.
In v11 and beyond, partitioned tables are fine since partitions cannot
overlap, and before v11 partitioned tables could not have a primary key.
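A hedged sketch of the guard; the struct below merely stands in for the relevant RangeTblEntry information and is not the actual planner code:

    #include <stdbool.h>

    /* Stand-in for the relevant RangeTblEntry information */
    typedef struct RelEntrySketch
    {
        bool        inh;            /* parent in an inheritance hierarchy? */
        bool        is_partitioned; /* declaratively partitioned table? */
    } RelEntrySketch;

    /*
     * Only drop non-PK GROUP BY columns when the relation cannot contribute
     * duplicate rows.  A traditional inheritance parent can, because its
     * children may lack the primary key or repeat rows from elsewhere in
     * the hierarchy; a partitioned table cannot, since partitions never
     * overlap.
     */
    static bool
    can_remove_nonpk_group_columns(const RelEntrySketch *rel)
    {
        if (rel->inh && !rel->is_partitioned)
            return false;           /* keep all GROUP BY columns */
        return true;
    }
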
Reported-by: Manuel Rigger
Discussion: http://postgr.es/m/CA+u7OA7VLKf_vEr6kLF3MnWSA9LToJYncgpNX2tQ-oWzYCBQAw@mail.gmail.com
Backpatch-through: 9.6
This also fixes a set of inconsistencies in the documentation and the
scripts related to the supported versions of the Windows SDK.
Author: Haribabu Kommi
Reviewed-by: Andrew Dunstan, Juan José Santamaría Flecha, Michael
Paquier
Discussion: https://postgr.es/m/CAJrrPGcfqXhfPyMrny9apoDU7M1t59dzVAvoJ9AeAh5BJi+UzA@mail.gmail.com
Backpatch-through: 9.4
Commit 4f3b38fe2 supposed that complete_from_const() is equivalent to
the one-element-list case of complete_from_list(), but that's not
really true at all. complete_from_const() supposes that the completion
is certain enough to justify wiping out whatever the user typed, while
complete_from_list() will only provide completions that match the
word-so-far.
In practice, given the lame parsing technology used by tab-complete.c,
it's fairly hard to believe that we're *ever* certain enough about
a completion to justify auto-correcting user input that doesn't match.
Hence, remove the inappropriate unification of the two cases.
As things now stand, complete_from_const() is used only for the
situation where we have no matches and we need to keep readline
from applying its default complete-with-file-names behavior.
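To illustrate the behavioral difference with simplified signatures (not the actual tab-complete.c code): list completion offers only completions that extend the word typed so far, while const completion returns its string no matter what was typed.

    #include <stdio.h>
    #include <string.h>
    #include <strings.h>

    /* List completion: offer an entry only if it extends the word-so-far. */
    static const char *
    complete_from_list_sketch(const char *const *words, const char *text)
    {
        for (int i = 0; words[i] != NULL; i++)
        {
            if (strncasecmp(words[i], text, strlen(text)) == 0)
                return words[i];
        }
        return NULL;            /* nothing matches what was typed */
    }

    /* Const completion: the answer is fixed, regardless of what was typed. */
    static const char *
    complete_from_const_sketch(const char *constant, const char *text)
    {
        (void) text;            /* input ignored -- hence the "auto-correct" */
        return constant;
    }

    int
    main(void)
    {
        static const char *const words[] = {"SELECT", "SET", NULL};

        printf("%s\n", complete_from_list_sketch(words, "se"));        /* SELECT */
        printf("%s\n", complete_from_const_sketch("DEFAULT", "xyz"));  /* DEFAULT */
        return 0;
    }
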
This (mis?) behavior actually exists much further back, but
I'm hesitant to change it in released branches. It's not too
late for v12, though, especially seeing that the aforesaid
commit is new in v12.
Per gripe from Ken Tanzer.
Discussion: https://postgr.es/m/CAD3a31XpXzrZA9TT3BqLSHghdTK+=cXjNCE+oL2Zn4+oWoc=qA@mail.gmail.com
Don't think that the context "UPDATE tab SET var =" is a GUC-setting
command.
If we have "SET var =" but the "var" is not a known GUC variable,
don't offer any completions. The most likely explanation is that
we've misparsed the context and it's not really a GUC-setting command.
Per gripe from Ken Tanzer. Back-patch to 9.6. The issue exists
further back, but before 9.6 the code looks very different and it
doesn't actually know whether the "var" name matches anything,
so I desisted from trying to fix it.
Discussion: https://postgr.es/m/CAD3a31XpXzrZA9TT3BqLSHghdTK+=cXjNCE+oL2Zn4+oWoc=qA@mail.gmail.com
This reverts commit f03a9ca4366d064d89b7cf7ed75d4e43f2ed0667,
in the v12 branch only. We don't want to ship v12 with that,
since it causes occasional test failures (as a result of statistics
transmission not being entirely reliable).
I'll leave it in HEAD though, in hopes that we'll eventually
capture an instance of the original problematic behavior.
4de60244e added the call to table_finish_bulk_insert to the
CopyMultiInsertBufferCleanup function. We use a CopyMultiInsertBuffer even
for non-partitioned tables, so having the cleanup do that meant we would
call table_finish_bulk_insert twice when performing COPY FROM with
a non-partitioned table.
Here we can just remove the direct call in CopyFrom and let
CopyMultiInsertBufferCleanup handle the call instead.
86b85044e abstracted calls to heap functions in COPY FROM to support a
generic table AM. However, when performing a copy into a partitioned
table, this commit neglected to call table_finish_bulk_insert for each
partition. Before 86b85044e, when we always called the heap functions,
there was no need to call heapam_finish_bulk_insert for partitions since
it only did any work when performing a copy without WAL. For partitioned
tables, this was unsupported anyway, so there was no issue. With pluggable
storage, we can't make any assumptions about what the table AM might want
to do in its equivalent function, so we'd better ensure we always call
table_finish_bulk_insert for each partition that's received a row.
For now, we make the table_finish_bulk_insert call whenever we evict a
CopyMultiInsertBuffer out of the CopyMultiInsertInfo. This does mean
that it's possible that we call table_finish_bulk_insert multiple times
per partition, which is not a problem other than being an inefficiency.
Improving this requires a more invasive patch, so let's leave that for
another day.
In passing, move the table_finish_bulk_insert for the target of the COPY
command so that it's only called when we're actually performing bulk
inserts. We don't need to call this when inserting 1 row at a time.
Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
UBSan complains about this. Instead, cast to a suitable type requiring
only 4-byte alignment. DatumGetAnyArrayP() already assumes one can cast
between AnyArrayType and ArrayType, so this doesn't introduce a new
assumption. Back-patch to 9.5, where AnyArrayType was introduced.
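A standalone sketch of the hazard with loosely modelled structs (not the real ArrayType/AnyArrayType definitions): the union variant carries pointers and so requires 8-byte alignment, which a detoasted datum need not have, whereas the flat struct needs only 4-byte alignment.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Flat header whose members need only 4-byte alignment
     * (loosely modelled on ArrayType) */
    typedef struct FlatArraySketch
    {
        int32_t     vl_len_;
        int32_t     ndim;
        int32_t     elemtype;
    } FlatArraySketch;

    /* Union that also carries pointers, so it needs 8-byte alignment on
     * most platforms (loosely modelled on AnyArrayType) */
    typedef union AnyArraySketch
    {
        FlatArraySketch flt;
        struct { void *hdr; void *ptr; } expanded;
    } AnyArraySketch;

    int
    main(void)
    {
        char   *buf = malloc(64);   /* malloc memory is maximally aligned */
        char   *p;

        if (buf == NULL)
            return 1;
        p = buf + 4;                /* 4-byte aligned, but not 8-byte aligned */

        /* OK: FlatArraySketch itself only requires 4-byte alignment */
        FlatArraySketch *flat = (FlatArraySketch *) p;
        flat->ndim = 1;

        /*
         * Not OK: AnyArraySketch requires 8-byte alignment, so accessing the
         * data through such a pointer is undefined behavior that UBSan
         * reports, even though the member touched is only 4 bytes wide:
         *
         *     AnyArraySketch *any = (AnyArraySketch *) p;
         *     printf("%d\n", any->flt.ndim);
         */

        printf("%d\n", flat->ndim);
        free(buf);
        return 0;
    }
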
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20190629210334.GA1244217@rfd.leadboat.com
The logic in reorder_grouping_sets to order grouping set elements to
match a pre-specified sort ordering was defective, resulting in
unnecessary sort nodes (though the query output would still be
correct). Repair, simplifying the code a little, and add a test.
Per report from Richard Guo, though I didn't use their patch. Original
bug seems to have been my fault.
Backpatch back to 9.5 where grouping sets were introduced.
Discussion: https://postgr.es/m/CAN_9JTzyjGcUjiBHxLsgqfk7PkdLGXiM=pwM+=ph2LsWw0WO1A@mail.gmail.com