When pulling up a subquery that is under an outer join, if the
subquery's target list contains a strict expression that uses a
subquery variable, it's okay to pull up the expression without
wrapping it in a PlaceHolderVar: if the subquery variable is forced to
NULL by the outer join, the expression result will come out as NULL
too.
If the strict expression does not contain any subquery variables, the
current code always wraps it in a PlaceHolderVar. While this is not
incorrect, the analysis could be tighter: if the strict expression
contains any variables of rels that are under the same lowest nulling
outer join as the subquery, we can also avoid wrapping it. This is
safe because if the subquery variable is forced to NULL by the outer
join, the variables of rels that are under the same lowest nulling
outer join will also be forced to NULL, resulting in the expression
evaluating to NULL as well. Therefore, it's not necessary to force
the expression to be evaluated below the outer join. It could be
beneficial to get rid of such PHVs because they could imply lateral
dependencies, which force us to resort to nestloop joins.
This patch checks if the lateral references in the strict expression
contain any variables of rels under the same lowest nulling outer join
as the subquery, and avoids wrapping the expression in that case.
This is fundamentally a generalization of the optimizations for bare
Vars and PHVs introduced in commit f64ec81a8.
No backpatch as this could result in plan changes.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_ENtfRdLaM_bXAxiKRYO7DmwDBDG4_2=VTDi0mJP-jAw@mail.gmail.com
Commit 84db9a0eb1 has added the incorrect link to
'initial data synchronization'. It was a subsection of Row Filter and
didn't provide the required information.
Author: Peter Smith
Reviewed-by: Vignesh C, Pavel Luzanov
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHut+PtnA4DB_pcv4TDr4NjUSM1=P2N_cuZx5DX09k7LVmaqUA@mail.gmail.com
The following commands gain some information about the error position in
the query, should they fail when looking at the type used:
- CREATE TYPE (LIKE)
- CREATE TABLE OF
Both are related to typenameType() where the type name lookup is done.
These calls gain the ParseState that already exists in these paths.
Author: Kirill Reshke, Jian He
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
The test was creating both the dumps to compare from the same database
on the same node, so it would never detect any mismatches when comparing
the logical dumps of the two servers.
Fixing this issue has revealed that there is a difference in the dumps:
the tablespaces paths are different. This commit uses compare_text()
with a custom comparison function to erase the difference (slightly
tweaked to be able to work with WIN32 and non-WIN32 paths). This way,
the non-relevant parts of the tablespace path are ignored from the check
with the basic structure of the query string still compared.
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87h67653ns.fsf@wibble.ilmari.org
Backpatch-through: 17
Two places in the documentation suggest B-tree is the only index access
method allowing parallel builds. Commit b4375717 added parallel builds
for BRIN too, but failed to update the docs. So fix that, and backpatch
to 17, where parallel BRIN builds were introduced.
Author: Egor Rogov
Backpatch-through: 17
Discussion: https://postgr.es/m/114e2d5d-125e-07d8-94aa-5ad175fb7443@postgrespro.ru
Create API entry points pg_strlower(), etc., that work with any
provider and give the caller control over the destination
buffer. Then, move provider-specific logic into pg_locale_builtin.c,
pg_locale_icu.c, and pg_locale_libc.c as appropriate.
Discussion: https://postgr.es/m/7aa46d77b377428058403723440862d12a8a129a.camel@j-davis.com
Updates table completion for ALTER TYPE to offer CASCADE/RESTRICT for a
number of actions on attributes:
ALTER TYPE ... ADD/DROP/RENAME ATTRIBUTE ... [CASCADE|RESTRICT]
ALTER TYPE ... TYPE ... [CASCADE|RESTRICT]
Author: Kirill Reshke
Reviewed-By: Karina Litskevich
Discussion: https://postgr.es/m/CALdSSPhVELkvutquqrDB=Ujfq_Pjz=6jn-kzh+291KPNViLTfw@mail.gmail.com
The test failed if you ran the regression tests with TEMP_CONFIG with
recovery_min_apply_delay = '500ms'. Fix the race condition by waiting
for transaction to be applied in the replica, like in a few other
tests.
The failing test was introduced in commit cbfbda7841. Backpatch to all
supported versions like that commit (except v12, which is no longer
supported).
Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/09e2a70a-a6c2-4b5c-aeae-040a7449c9f2@gmail.com
This is simply done by pushing down the ParseState available in
ProcessUtility() to DefineDomain(), giving more information about the
position of an error when running a CREATE DOMAIN query.
Most of the queries impacted by this change have been added previously
in 0172b4c9449e.
Author: Kirill Reshke, Jian He
Reviewed-by: Álvaro Herrera, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
This adds a couple of tests to trigger encoding conversion when input
and server encodings do not match in COPY FROM/TO, or need_transcoding
set to true in the COPY state data. These tests rely on UTF8 <-> LATIN1
for the valid cases as LATIN1 accepts any bytes, and UTF8 <-> EUC_JP for
some of the invalid cases where a character cannot be understood,
causing a conversion failure.
Both ENCODING and client_encoding are covered. Test suggested by Andres
Freund.
Author: Sutou Kouhei
Discussion: https://postgr.es/m/20240206222445.hzq22pb2nye7rm67@awork3.anarazel.de
I went through the buildfarm's reports of "warning: variable 'foo'
might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]". As usual,
none of them are live problems according to my understanding of the
effects of setjmp/longjmp, to wit that local variables might revert
to their values as of PG_TRY entry, due to being kept in registers.
But I did happen to notice that XmlTableGetValue's "cstr" variable
doesn't need to be declared outside the PG_TRY block at all (thus
giving further proof that the -Wclobbered warning has little
connection to real problems). We might as well move it inside,
and "cur" too, in hopes of eliminating one of the bogus warnings.
An \if command appearing within a false (not-to-be-executed) \if
branch was incorrectly treated the same as \elif. This could allow
statements within the inner \if to be executed when they should
not be. Also the missing inner \if stack entry would result in an
assertion failure (in assert-enabled builds) when the final \endif
is reached.
Report and patch by Michail Nikolaev. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/CANtu0oiA1ke=SP6tauhNqkUdv5QFsJtS1p=aOOf_iU+EhyKkjQ@mail.gmail.com
The documentation in wal.sgml explains that old WAL files cannot be
removed or recycled until they are archived (when WAL archiving is used)
or replicated (when using replication slots). However, it did not mention
that, similarly, old WAL files are also kept until they are summarized
if WAL summarization is enabled. This commit adds that clarification
to the documentation.
Back-patch to v17 where WAL summarization was added.
Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/fd0eb0a5-f43b-4e06-b450-cbca011b6cff@oss.nttdata.com
The @extschema:name@ feature added by 72a5b1fc8 allows us to
make earthdistance's references to the cube extension fully
search-path-secure, so long as all those references are
resolved at extension installation time not runtime.
To do that, we must convert earthdistance's SQL functions to
the new SQL-standard style; but we wanted to do that anyway.
The functions can be updated in our customary style by running
CREATE OR REPLACE FUNCTION in an extension update script.
However, there's still problems in the "CREATE DOMAIN earth"
command: its references to cube functions could be captured
by hostile objects in earthdistance's installation schema,
if that's not where the cube extension is. Worse, the reference
to the cube type itself as the domain's base could be captured,
and that's not something we could fix after-the-fact in the
update script.
What I've done about that is to change the "CREATE DOMAIN earth"
command in the base script earthdistance--1.1.sql. Ordinarily,
changing a released extension script is forbidden; but I think
it's okay here since the results of successful (non-trojaned)
script execution will be identical to before.
A good deal of care is still needed to make the extension's scripts
proof against search-path-based attacks. We have to make sure that
all the function and operator invocations have exact argument-type
matches, to forestall attacks based on supplying a better match.
Fortunately earthdistance isn't very big, so I've just gone through
it and inspected each call to be sure of that. The only actual code
changes needed were to spell all floating-point constants in the style
'-1'::float8, rather than depending on runtime type conversions and/or
negations. (I'm not sure that the shortcuts previously used were
attackable, but removing run-time effort is a good thing anyway.)
I believe that this fixes earthdistance enough that we could
mark it trusted and remove the warnings about it that were
added by 7eeb1d986; but I've not done that here.
The primary reason for dealing with this now is that we've
received reports of pg_upgrade failing for databases that use
earthdistance functions in contexts like generated columns.
That's a consequence of 2af07e2f7 having restricted the search_path
used while evaluating such expressions. The only way to fix that
is to make the earthdistance functions independent of run-time
search_path. This patch is very much nicer than the alternative of
attaching "SET search_path" clauses to earthdistance's functions:
it is more secure and doesn't create a run-time penalty. Therefore,
I've chosen to back-patch this to v16 where @extschema:name@
was added. It won't help unless users update to 16.7 and issue
"ALTER EXTENSION earthdistance UPDATE" before upgrading to 17,
but at least there's now a way to deal with the problem without
manual intervention in the dump/restore process.
Tom Lane and Ronan Dunklau
Discussion: https://postgr.es/m/3316564.aeNJFYEL58@aivenlaptop
Discussion: https://postgr.es/m/6a6439f1-8039-44e2-8fb9-59028f7f2014@mailbox.org
POSIX says that the global variable environ shouldn't be declared in a
header, and that you have to declare it yourself. MinGW declares it in
<stdlib.h> with some macrology that messes up our declarations. Visual
Studio doesn't warn (there are clues that it may also declare it, but if
so, apparently compatibly). Suppress our declarations, on MinGW only.
This clears the last warnings on CI's optional MinGW task, and hopefully
on build farm animal fairywren too.
Like 1319997d, no back-patch for now as it's not known to be breaking
anything, and my humble goal is just to keep the MinGW build clean going
forward.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJLMh%2B6W5E4M_jSFb43gnrA_-Q6-%2BBf3HkBXyGfRFcBsQ%40mail.gmail.com
Commits 7bb3102c and 3eb77eba removed the only user of the
EXTENSION_DONT_CHECK_SIZE flag, which had previously been required to
checkpoint truncated relations. Since 7bb3102c, segments have been
opened directly for synchronization without calling _mdfd_getseg(), so
it doesn't need a mode that tolerates non-final short segments. Remove
the redundant flag and associated comments.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/nyj4k7yur5t27rtygvx2i2lrlp6rqfvvhoiiwx4fznynksf2et%404hj2sp42alpe
If an owned sequence is considered interesting, force its owning
table to be marked interesting too. This ensures, in particular,
that we'll fetch the owning table's column names so we have the
data needed for ALTER TABLE ... ADD GENERATED. Previously there were
edge cases where pg_dump could get SIGSEGV due to not having filled in
the column names. (The known case is where the owning table has been
made part of an extension while its identity sequence is not a member;
but there may be others.)
Also, if it's an identity sequence, force its dumped-components mask
to exactly match the owning table: dump definition only if we're
dumping the table's definition, dump data only if we're dumping the
table's data, etc. This generalizes the code introduced in commit
b965f2617 that set the sequence's dump mask to NONE if the owning
table's mask is NONE. That's insufficient to prevent failures,
because for example the table's mask might only request dumping ACLs,
which would lead us to still emit ALTER TABLE ADD GENERATED even
though we didn't create the table. It seems better to treat an
identity sequence as though it were an inseparable aspect of the
table, matching the treatment used in the backend's dependency logic.
Perhaps this policy needs additional refinement, but let's wait to
see some field use-cases before changing it further.
While here, add a comment in pg_dump.h warning against writing tests
like "if (dobj->dump == DUMP_COMPONENT_NONE)", which was a bug in this
case. There is one other example in getPublicationNamespaces, which
if it's not a bug is at least remarkably unclear and under-documented.
Changing that requires a separate discussion, however.
Per report from Artur Zakirov. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAKNkYnwXFBf136=u9UqUxFUVagevLQJ=zGd5BsLhCsatDvQsKQ@mail.gmail.com
With not-null constraints defined in child tables for columns that are
coming from their parent tables, we were printing ALTER TABLE SET NOT
NULL commands that were missing the constraint name, so the original
constraint name was being lost, which is bogus. Fix by instead adding
a table-constraint constraint declaration with the correct constraint
name in the CREATE TABLE instead.
Oversight in commit 14e87ffa5c54.
We could have fixed it by changing the ALTER TABLE SET NOT NULL to ALTER
TABLE ADD CONSTRAINT, but I'm not sure that's any better. A potential
problem here might be that if sent to a non-Postgres server, the new
pg_dump output would fail because the "CONSTRAINT foo NOT NULL colname"
syntax isn't SQL-conforming. However, Postgres' implementation of
inheritance is already non-SQL-conforming, so that'd likely fail anyway.
This problem was only noticed by Ashutosh's proposed test framework for
pg_dump, https://postgr.es/m/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAExHW5tbdgAKDfqjDJ-7Fk6PJtHg8D4zUF6FQ4H2Pq8zK38Nyw@mail.gmail.com
This reverts commit 562bee0fc13dc95710b8db6a48edad2f3d052f2e.
We received a report from the field about this change in behavior,
so it seems best to revert this commit and to add proper
multibyte-aware truncation as a follow-up exercise.
Fixes bug #18711.
Reported-by: Adam Rauch
Reviewed-by: Tom Lane, Bertrand Drouvot, Bruce Momjian, Thomas Munro
Discussion: https://postgr.es/m/18711-7503ee3e449d2c47%40postgresql.org
Backpatch-through: 17
One comment of PgStat_TableCounts mentioned that its pending stats use
memcmp() to check for the all-zero case if there is any activity. This
is not true since 07e9e28b56, as pg_memory_is_all_zeros() is used.
PgStat_FunctionCounts incorrectly documented that it relied on memcpy().
This has never been correct, and not relevant because function
statistics do not have an all-zero check for pending stats.
Checkpoint and bgwriter statistics have been always relying on memcmp()
or pg_memory_is_all_zeros() (since 07e9e28b56 for the latter), and never
mentioned the dependency on event counters for their all-zero checks.
Let's document these properties, like the table statistics.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Z1hNLvcPgVLPxCoc@ip-10-97-1-34.eu-west-3.compute.internal
d4c3a156c added support that when the GROUP BY contained all of the
columns belonging to a relation's PRIMARY KEY, all other columns
belonging to that relation would be removed from the GROUP BY clause.
That's possible because all other columns are functionally dependent on
the PRIMARY KEY and those columns alone ensure the groups are distinct.
Here we expand on that optimization and allow it to work for any unique
indexes on the table rather than just the PRIMARY KEY index. This
normally requires that all columns in the index are defined with NOT NULL,
however, we can relax that requirement when the index is defined with
NULLS NOT DISTINCT.
When there are multiple suitable indexes to allow columns to be removed,
we prefer the index with the least number of columns as this allows us
to remove the highest number of GROUP BY columns. One day, we may want to
revisit that decision as it may make more sense to use the narrower set of
columns in terms of the width of the data types and stored/queried data.
This also adjusts the code to make use of RelOptInfo.indexlist rather
than looking up the catalog tables.
In passing, add another short-circuit path to allow bailing out earlier
in cases where it's certainly not possible to remove redundant GROUP BY
columns. This early exit is now cheaper to do than when this code was
originally written as 00b41463c made it cheaper to check for empty
Bitmapsets.
Patch originally by Zhang Mingli and later worked on by jian he, but after
I (David) worked on it, there was very little of the original left.
Author: Zhang Mingli, jian he, David Rowley
Reviewed-by: jian he, Andrei Lepikhov
Discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0%40Spark
In commit 5668a857d, we fixed an issue with incorrect results in right
semi joins and introduced a test case to verify the fix. The test
case involves SubPlans and InitPlans, which may not be immediately
apparent in relation to the issue we addressed.
This patch simplifies the test case with a more straightforward query.
Per discussion with Melanie Plageman.
Author: Richard Guo
Discussion: https://postgr.es/m/CAAKRu_a-Cip2XCXp13fmxq+T9BhLAVApHTyjr94awL2mbXHC-Q@mail.gmail.com
The following commands gain increased coverage for some of the errors
they can trigger:
- ALTER TABLE .. ALTER COLUMN
- CREATE DOMAIN
- CREATE TYPE (LIKE)
This has come up while discussing the possibility to add more
information about the location of the error in such queries, and it
is useful on its own as there was no coverage until now for the
patterns added in this commit.
Author: Jian He, Kirill Reshke
Reviewed-By: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
Traditionally, remove_useless_groupby_columns() was called during
grouping_planner() directly after the call to preprocess_groupclause().
While in many ways, it made sense to populate the field and remove the
functionally dependent columns from processed_groupClause at the same
time, it's just that doing so had the disadvantage that
remove_useless_groupby_columns() was being called before the RelOptInfos
were populated for the relations mentioned in the query. Not having
RelOptInfos available meant we needed to manually query the catalog tables
to get the required details about the primary key constraint for the
table.
Here we move the remove_useless_groupby_columns() call to
query_planner() and put it directly after the RelOptInfos are populated.
This is fine to do as processed_groupClause still isn't final at this
point as it can still be modified inside standard_qp_callback() by
make_pathkeys_for_sortclauses_extended().
This commit is just a refactor and simply moves
remove_useless_groupby_columns() into initsplan.c. A planned follow-up
commit will adjust that function so it uses RelOptInfo instead of doing
catalog lookups and also teach it how to use unique indexes as proofs to
expand the cases where we can remove functionally dependent columns from
the GROUP BY.
Reviewed-by: Andrei Lepikhov, jian he
Discussion: https://postgr.es/m/CAApHDvqLezKwoEBBQd0dp4Y9MDkFBDbny0f3SzEeqOFoU7Z5+A@mail.gmail.com
This commit introduces the uuidv7() SQL function, which generates UUID
version 7 as specified in RFC 9652. UUIDv7 combines a Unix timestamp
in milliseconds and random bits, offering both uniqueness and
sortability.
In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, in the space referred to as
"rand_a" in the RFC. This ensures additional monotonicity within a
millisecond. The rand_a bits also function as a counter. We select a
sub-millisecond timestamp so that it monotonically increases for
generated UUIDs within the same backend, even when the system clock
goes backward or when generating UUIDs at very high
frequency. Therefore, the monotonicity of generated UUIDs is ensured
within the same backend.
This commit also expands the uuid_extract_timestamp() function to
support UUID version 7.
Additionally, an alias uuidv4() is added for the existing
gen_random_uuid() SQL function to maintain consistency.
Bump catalog version.
Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Przemysław Sztoch, Nikolay Samokhvalov
Reviewed-by: Peter Eisentraut, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Masahiko Sawada, Lukas Fittl, Michael Paquier, Japin Li
Reviewed-by: Marcos Pegoraro, Junwang Zhao, Stepan Neretin
Reviewed-by: Daniel Vérité
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
c2a4078eb adjusted EXPLAIN ANALYZE to default the BUFFERS to ON. This
(hopefully) fixes the last remaining issue with regression test failures
with -D RELCACHE_FORCE_RELEASE -D CATCACHE_FORCE_RELEASE builds, where
the planner accesses more buffers due to the cold caches.
Discussion: https://postgr.es/m/CAApHDvqLdzgz77JsE-yTki3w9UiKQ-uTMLRctazcu+99-ips3g@mail.gmail.com
There are a few places in this file that use memset() and memcmp()
to determine whether a section of memory is all zeros. This commit
modifies them to use pg_memory_is_all_zeros() instead. These
aren't expected to be hot code paths, but this may optimize them a
bit. Plus, this allows us to remove some variables that were only
needed for the memset() and memcmp().
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/Z1hNubHfvMxlW6eu%40ip-10-97-1-34.eu-west-3.compute.internal
The functions without arguments don't need to be marked
leakproof. This commit unmarks gen_random_uuid() leakproof for
consistency with upcoming UUID generation functions. Also, this commit
adds a regression test to prevent reintroducing such cases.
Bump catalog version.
Reported-by: Peter Eisentraut
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAD21AoBE1ePPWY1NQEgk3DkqjYzLPZwYTzCySHm0e%2B9a69PfZw%40mail.gmail.com
The gneration of the dump clause for functions with TRANSFORM
calls would leak the memory for holding the result of the Oid
array parsing. Fix by freeing.
While in there, switch to using pg_malloc instead of palloc in
order to be consistent with the rest of the file.
Author: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/baf1ae4511288e5b421f41e79a3df1a0@postgrespro.ru
c2a4078eb adjusted EXPLAIN ANALYZE to include BUFFERS by default, but
a few tests in select_into.sql neglected to add BUFFERS OFF. The
failing tests seem unlikely to ever access buffers during execution, but
they certainly could during planning.
Per buildfarm member kestrel, tayra and calliphoridae.
Discussion: https://postgr.es/m/CANNMO++W7MM8T0KyXN3ZheXXt-uLVM3aEtZd+WNfZ=obxffUiA@mail.gmail.com
The topic of turning EXPLAIN's BUFFERS option on with the ANALYZE option
has come up a few times over the past few years. In many ways, doing this
seems like a good idea as it may be more obvious to users why a given
query is running more slowly than they might expect. Also, from my own
(David's) personal experience, I've seen users posting to the mailing
lists with two identical plans, one slow and one fast asking why their
query is sometimes slow. In many cases, this is due to additional reads.
Having BUFFERS on by default may help reduce some of these questions, and
if not, make it more obvious to the user before they post, or save a
round-trip to the mailing list when additional I/O effort is the cause of
the slowness.
The general consensus is that we want BUFFERS on by default with
ANALYZE. However, there were more than zero concerns raised with doing
so. The primary reason against is the additional verbosity, making it
harder to read large plans. Another concern was that buffer information
isn't always useful so may not make sense to have it on by default.
It's currently December, so let's commit this to see if anyone comes
forward with a strong objection against making this change. We have over
half a year remaining in the v18 cycle where we could still easily consider
reverting this if someone were to come forward with a convincing enough
reason as to why doing this is a bad idea.
There were two patches independently submitted to achieve this goal, one
by me and the other by Guillaume. This commit is a mix of both of these
patches with some additional work done by me to adjust various
additional places in the documentation which include EXPLAIN ANALYZE
output.
Author: Guillaume Lelarge, David Rowley
Reviewed-by: Robert Haas, Greg Sabino Mullane, Michael Christofides
Discussion: https://postgr.es/m/CANNMO++W7MM8T0KyXN3ZheXXt-uLVM3aEtZd+WNfZ=obxffUiA@mail.gmail.com
This speeds up obtaining hash values for GROUP BY and hashed SubPlans by
using the ExprState support for hashing, thus allowing JIT compilation for
obtaining hash values for these operations.
This, even without JIT compilation, has been shown to improve Hash
Aggregate performance in some cases by around 15% and hashed NOT IN
queries in one case by over 30%, however, real-world cases are likely to
see smaller gains as the test cases used were purposefully designed to
have high hashing overheads by keeping the hash table small to prevent
additional memory overheads that would be a factor when working with large
hash tables.
In passing, fix a hypothetical bug in ExecBuildHash32Expr() so that the
initial value is stored directly in the ExprState's result field if
there are no expressions to hash. None of the current users of this
function use an initial value, so the bug is only hypothetical.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpYSO3kc9UryMevWqthTBrxgfd9djiAjKHMPUSQeX9vdQ@mail.gmail.com
On failure, the pg_upgrade log files are automatically appended to the
test log file, but the information reported was inconsistent.
A header, with the log file name, was reported with note(), while the
log contents and a footer used print(), making it harder to diagnose
failures when these are split into console output and test log file
because the pg_upgrade log file path in the header may not be included
in the test log file.
The output is now consolidated so as the header uses print() rather than
note(). An extra note() is added to inform that the contents of a
pg_upgrade log file are appended to the test log file.
The diffs from the regression test suite and dump files all use print()
to show their contents on failure.
Author: Joel Jacobson
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/49f7e64a-b9be-4a90-a9fe-210a7740405e@app.fastmail.com
Backpatch-through: 15
Hashing of a single Var is a very common operation for ExprState to
perform. Here we add dedicated ExecJust* functions which helps speed up
Hash Joins by removing the interpretation overhead in ExecInterpExpr().
This change currently only affects Hash Joins on a single column. Hash
Joins with multiple join keys or an expression still run through
ExecInterpExpr().
Some testing has shown up to 10% query performance increases on recent AMD
hardware and nearly 7% increase on an Apple M2 for a query performing a
hash join with a large number of lookups on a small hash table.
This change was extracted from a larger patch which adjusts GROUP BY /
hashed subplans / hashed set operations to use ExprState hashing.
Discussion: https://postgr.es/m/CAApHDvr8Zc0ZgzVoCZLdHGOFNhiJeQ6vrUcS9V7N23zMWQb-eA@mail.gmail.com
Robert Haas expressed concern about whether commit 3eea7a0c9 exposed
the parallel-execution machinery to a case it isn't tested for, namely
a second non-parallel execution of a plan after a parallel execution.
Investigation shows that that can't happen because of pquery.c's
manipulation of the scan direction, but it sure wasn't obvious to
start with. Add some commentary about that.
Discussion: https://postgr.es/m/CA+TgmoagyKQy=HFw+wLo0AKTYWHui+iKswZ8Jnqqd-cFby-WVg@mail.gmail.com
Since commit 97550c0711972a9856b5db751539bbaf2f88884c, these failed with
"PANIC: proc_exit() called in child process" due to uninitialized or
stale MyProcPid. That was reachable if close() failed in
ClosePostmasterPorts() or setlocale(category, "C") failed, both
unlikely. Back-patch to v13 (all supported versions).
Discussion: https://postgr.es/m/20241208034614.45.nmisch@google.com
A WITHOUT OVERLAPS primary key or unique constraint is accepted as a
REPLICA IDENTITY, since it guarantees uniqueness. But subscribers
applying logical decoding messages would fail because there was not
support for looking up the equals operator for a gist index. This
fixes that: For GiST indexes we can use the stratnum GiST support
function.
Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com