We have an assertion to ensure that a command tag has been assigned by
the time we're done executing, but if we happen to execute a command
with no queries, the assertion would fail. Per discussion, rather than
contort things to get a tag assigned, just remove the assertion.
Oversight in 2f9661311b. That commit also retained a comment explaining
logic that had once been adjacent to it but has since been diffused into
various places, leaving no single spot apt to keep that part of the
comment. Remove that part, and rewrite what remains for extra clarity.
Bug: #18984
Backpatch-through: 13
Reported-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18984-0f4778a6599ac3ae@postgresql.org
Letting the rewriter keep RangeTblEntry.relid when expanding a view
RTE, without making the outfuncs/readfuncs changes that went along
with that originally, is more problematic than I realized. It causes
WRITE_READ_PARSE_PLAN_TREES testing to fail because outfuncs/readfuncs
don't think relid need be saved in an RTE_SUBQUERY RTE.
There doesn't seem to be any other good route to fixing the whole-row
Var problem solved at f4e7756ef, so we just have to deal with the
consequences. We can make the eventually-produced plan tree safe
for WRITE_READ_PARSE_PLAN_TREES by clearing the relid field at the
end of planning, as was already being done for the functions field.
(The functions field is not problematic here because our abuse of it
is strictly local to the planner.) However, there is no nice fix for
the post-rewrite WRITE_READ_PARSE_PLAN_TREES test.
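For illustration, a rough sketch of that cleanup step (hypothetical;
the exact spot in the planner's final range-table pass and the field
handling may differ):

    /* While flattening the range table at the end of planning, drop
     * planner-only state so outfuncs/readfuncs stay consistent. */
    if (rte->rtekind == RTE_SUBQUERY)
        rte->relid = InvalidOid;    /* kept only for whole-row Var fixup */
    rte->functions = NIL;           /* already cleared this way before */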
The solution adopted here is to remove the post-rewrite test in the
affected branches. That's surely less than ideal, but a couple of
arguments can be made why it's not unacceptable. First, the behavior
of outfuncs/readfuncs for parsetrees in these branches is frozen no
matter what, because of catalog stability requirements. So we're not
testing anything that is going to change. Second, testing
WRITE_READ_PARSE_PLAN_TREES at this particular time doesn't correspond
to any direct system functionality requirement, neither rule storage
nor plan transmission.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
Backpatch-through: 13-15
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution. The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun. This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented. That led to assertion failures in some corner
cases. The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan. (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)
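As a hedged sketch (variable names assumed, not the exact committed
code), the check in ExecutePlan amounts to:

    /* Disable parallel mode unless this is the first, complete
     * execution of the plan; partial or repeated execution is not
     * supported by the parallel machinery. */
    if (queryDesc->already_executed || numberTuples != 0)
        use_parallel_mode = false;
    queryDesc->already_executed = true;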
This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.
Per report from Yugo Nagata, who also reviewed and tested
this patch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
When using a pipeline, a transaction starts from the first command and
is committed with a Sync message or when the pipeline ends.
Functions like IsInTransactionBlock() or PreventInTransactionBlock()
were already able to understand a pipeline as being in a transaction
block, but that was not the case for CheckTransactionBlock(). This
function is called, for example, to generate a WARNING for SET LOCAL,
complaining that it is used outside of a transaction block.
The current state of the code caused multiple problems, like:
- SET LOCAL executed at any stage of a pipeline issued a WARNING, even
if the command was second or later in line, where the pipeline is
already in a transaction state.
- LOCK TABLE failed when invoked at any step of a pipeline, even
though it should be able to work within a transaction block.
The pipeline protocol assumes that the first command of a pipeline is
not part of a transaction block, while any follow-up command is
considered to be within a transaction block.
This commit changes the backend so that an implicit transaction block
is started each time the first Execute message of a pipeline has
finished processing, with this implicit transaction block ended once a
Sync is processed. The checks based on XACT_FLAGS_PIPELINING in the
routines checking whether we are in a transaction block are no longer
necessary: it is enough to rely on the existing checks.
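As a rough sketch (placement assumed; these xact.c entry points
already exist for multi-statement simple queries):

    /* After the first Execute message of a pipeline has finished
     * processing: */
    BeginImplicitTransactionBlock();

    /* Once a Sync message is processed: */
    EndImplicitTransactionBlock();
    finish_xact_command();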
Some tests are added to pgbench that can be backpatched down to v17
when \syncpipeline is involved and down to v14, where \startpipeline
and \endpipeline are available. This is unfortunately limited
regarding the error patterns that can be checked, but it provides
coverage for various pipeline combinations to check whether these
succeed or fail. These tests are able to capture the case of SET
LOCAL's WARNING. The author has proposed a separate feature to
improve the coverage by adding similar meta-commands to psql, where
error messages could be checked; that would be more useful for the
cases where commands cannot be used in transaction blocks, like
REINDEX CONCURRENTLY or VACUUM. This is considered future work for
v18~.
Author: Anthonin Bonnefoy
Reviewed-by: Jelte Fennema-Nio, Michael Paquier
Discussion: https://postgr.es/m/CAO6_XqrWO8uNBQrSu5r6jh+vTGi5Oiyk4y8yXDORdE2jbzw8xw@mail.gmail.com
Backpatch-through: 13
This commit adds query ID reports for two code paths when processing
extended query protocol messages:
- When receiving a bind message, setting it to the first Query retrieved
from the cached plan.
- When receiving an execute message, setting it to the first PlannedStmt
stored in a portal.
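For the bind-message side, a hedged sketch (variable names such as
psrc assumed):

    /* Report the query ID of the first Query carried by the cached
     * plan source, if one was computed. */
    foreach(lc, psrc->query_list)
    {
        Query   *query = lfirst_node(Query, lc);

        if (query->queryId != UINT64CONST(0))
        {
            pgstat_report_query_id(query->queryId, false);
            break;
        }
    }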
An advantage of this method is that it is able to cover all the types
of portals handled in the extended query protocol, particularly these
two cases where the report done in ExecutorStart() is not enough (nor
would an addition in ExecutorRun() be, for the second one):
- Multiple execute messages, with multiple ExecutorRun() calls.
- Portal with execute/fetch messages, like a query with a RETURNING
clause and a fetch size that stores the tuples in a first execute
message going through ExecutorStart() and ExecutorRun(), followed by
one or more execute messages doing only fetches from the tuplestore
created in the first message. This corresponds to the case where
execute_is_fetch is set, for example.
Note that the query ID reporting done in ExecutorStart() is still
necessary, as an EXECUTE requires it. Query ID reporting is optimistic,
and extra calls to pgstat_report_query_id() are harmless, as the first
report takes priority unless the report is forced. The comment in
ExecutorStart() is adjusted to better reflect the reality with the
extended query protocol.
The test added in pg_stat_statements is a courtesy of Robert Haas. This
uses psql's \bind metacommand, hence this part is backpatched down to
v16.
Reported-by: Kaido Vaikla, Erik Wienhold
Author: Sami Imseih
Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier
Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org
Backpatch-through: 14
When pg_dump retrieves the list of database objects and performs the
data dump, there was a possibility that objects would be replaced with
others of the same name, such as views, and then accessed. This
vulnerability could result in code execution with superuser privileges
during the pg_dump process.
This issue can arise when dumping data of sequences, foreign
tables (only 13 or later), or tables registered with a WHERE clause in
the extension configuration table.
To address this, pg_dump now utilizes the newly introduced
restrict_nonsystem_relation_kind GUC parameter to restrict the
accesses to non-system views and foreign tables during the dump
process. This new GUC parameter is added to back branches too, but
these changes do not require cluster recreation.
Back-patch to all supported branches.
Reviewed-by: Noah Misch
Security: CVE-2024-7348
Backpatch-through: 12
Since PostgresMain calls sigsetjmp, any local variables that are not
marked "volatile" have a risk of unspecified behavior. In practice
this means that when control returns via longjmp, such variables might
get reset to their values as of the time of sigsetjmp, depending on
whether the compiler chose to put them in registers or on the stack.
We were careful about this for "send_ready_for_query", but not the
other local variables.
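The hazard is easy to reproduce standalone (a minimal sketch, not
PostgreSQL code):

    #include <setjmp.h>
    #include <stdbool.h>
    #include <stdio.h>

    int
    main(void)
    {
        sigjmp_buf    buf;
        volatile bool safe = false;   /* keeps its value across longjmp */
        bool          unsafe = false; /* may revert to sigsetjmp-time value */

        if (sigsetjmp(buf, 1) == 0)
        {
            safe = true;
            unsafe = true;
            siglongjmp(buf, 1);
        }
        /* "safe" is reliably true here; "unsafe" is unspecified */
        printf("safe=%d unsafe=%d\n", (int) safe, (int) unsafe);
        return 0;
    }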
In the case of the timeout_enabled flags, resetting them to
their initial "false" states is actually good, since we do
"disable_all_timeouts()" in the longjmp cleanup code path. If that
does not happen, we risk uselessly calling "disable_timeout()" later,
which is harmless but a little bit expensive. Let's explicitly reset
these flags so that the behavior is correct and platform-independent.
(This change means that we really don't need the new "volatile"
markings after all, but let's install them anyway since any change
in this logic could re-introduce a problem.)
There is no issue for "firstchar" and "input_message" because those
are explicitly reinitialized each time through the query processing
loop. To make that clearer, move them to be declared inside the loop.
That leaves us with all the function-lifespan locals except the
sigjmp_buf itself marked as volatile, which seems like a good policy
to have going forward.
Because of the possibility of extra disable_timeout() calls, this
seems worth back-patching.
Sergey Shinderuk and Tom Lane
Discussion: https://postgr.es/m/2eda015b-7dff-47fd-d5e2-f1a9899b90a6@postgrespro.ru
Commits f92944137 et al. made IsInTransactionBlock() set the
XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false",
on the grounds that that kept its API promises equivalent to those of
PreventInTransactionBlock(). This turns out to be a bad idea though,
because it allows an ANALYZE in a pipelined series of commands to
cause an immediate commit, which is unexpected.
Furthermore, if we return "false" then we have another issue,
which is that ANALYZE will decide it's allowed to do internal
commit-and-start-transaction sequences, thus possibly unexpectedly
committing the effects of previous commands in the pipeline.
To fix the latter situation, invent another transaction state flag
XACT_FLAGS_PIPELINING, which explicitly records the fact that we
have executed some extended-protocol command and not yet seen a
commit for it. Then, require that flag to not be set before allowing
IsInTransactionBlock() to return "false".
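Roughly sketched (exact placement assumed):

    /* Set once an extended-protocol Execute has completed: */
    MyXactFlags |= XACT_FLAGS_PIPELINING;

    /* ...and checked before IsInTransactionBlock() may say "false": */
    if (MyXactFlags & XACT_FLAGS_PIPELINING)
        return true;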
Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT
without fear of causing problems. This means that the API guarantees
of IsInTransactionBlock now diverge from PreventInTransactionBlock,
which is mildly annoying, but it seems OK given the very limited usage
of IsInTransactionBlock. (In any case, a caller preferring the old
behavior could always set NEEDIMMEDIATECOMMIT for itself.)
For consistency also require XACT_FLAGS_PIPELINING to not be set
in PreventInTransactionBlock. This too is meant to prevent commands
such as CREATE DATABASE from silently committing previous commands
in a pipeline.
Per report from Peter Eisentraut. As before, back-patch to all
supported branches (which sadly no longer includes v10).
Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol. In extended protocol, we didn't commit
until receiving a Sync message. Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK. In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.
This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could already
reach it.
To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit. This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.
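A minimal sketch of the resulting behavior (placement in
exec_execute_message assumed):

    /* After the command completes: commit right away instead of
     * waiting for a Sync message that might never arrive. */
    if (MyXactFlags & XACT_FLAGS_NEEDIMMEDIATECOMMIT)
        finish_xact_command();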
While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining. That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.
Per bug #17434 from Yugo Nagata. It's been wrong for ages,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
Previously we did this after InitPostgres, at a somewhat randomly chosen
place within PostgresMain. However, since commit a0ffa885e doing this
outside a transaction can cause a crash, if we need to check permissions
while replacing a placeholder GUC. (Besides which, a preloaded library
could itself want to do database access within _PG_init.)
To avoid needing an additional transaction start/end in every session,
move the process_session_preload_libraries call to within InitPostgres's
transaction. That requires teaching the code not to call it when
InitPostgres is called from somewhere other than PostgresMain, since
we don't want session_preload_libraries to affect background workers.
The most future-proof solution here seems to be to add an additional
flag parameter to InitPostgres; fortunately, we're not yet very worried
about API stability for v15.
Doing this also exposed the fact that we're currently honoring
session_preload_libraries in walsenders, even those not connected to
any database. This seems, at minimum, a POLA violation: walsenders
are not interactive sessions. Let's stop doing that.
(All these comments also apply to local_preload_libraries, of course.)
Per report from Gurjeet Singh (thanks also to Nathan Bossart and Kyotaro
Horiguchi for review). Backpatch to v15 where a0ffa885e came in.
Discussion: https://postgr.es/m/CABwTF4VEpwTHhRQ+q5MiC5ucngN-whN-PdcKeufX7eLSoAfbZA@mail.gmail.com
Without processing shared_preload_libraries, it's impossible to
recover if custom WAL resource managers are needed. It may also pose a
problem running VACUUM on a table with a custom AM, if the module
implementing the AM is expecting to be loaded by
shared_preload_libraries.
The reason this wasn't done before was just the general principle of
doing fewer things in single-user mode. But it's easy enough to just set
shared_preload_libraries to empty, for the same effect.
Discussion: https://postgr.es/m/9decc18a42634f8a2f15c97a385a0f51a752f396.camel%40j-davis.com
Reviewed-by: Tom Lane, Andres Freund
Backpatch-through: 15
Previously the timer was enabled whenever there were any pending stats after
executing a statement, just to then be disabled again when not idle
anymore. That led to an increase in GetCurrentTimestamp() calls from within
timeout.c compared to 14.
To avoid that increase, leave the timer enabled until stats are reported,
rather than until idle. The timer is only disabled once the pending stats have
been reported.
For me this fixes the increase in GetCurrentTimestamp() calls: in the
previously slowed-down workload there now are fewer calls in 15 than in 14.
While at it, also update the assertion in pgstat_report_stat() to be more
precise.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Backpatch: 15-
Previously the statistics collector received statistics updates via UDP and
shared statistics data by writing them out to temporary files regularly. These
files can reach tens of megabytes and are written out up to twice a
second. This has repeatedly prevented us from adding additional useful
statistics.
Now statistics are stored in shared memory. Statistics for variable-numbered
objects are stored in a dshash hashtable (backed by dynamic shared
memory). Fixed-numbered stats are stored in plain shared memory.
The header for pgstat.c contains an overview of the architecture.
The stats collector is not needed anymore, so remove it.
By utilizing the transactional statistics drop infrastructure introduced in a
prior commit, statistics entries cannot "leak" anymore. Previously leaked
statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On
systems with many small relations pgstat_vacuum_stat() could be quite
expensive.
Now that replicas drop statistics entries for dropped objects, it is not
necessary anymore to reset stats when starting from a cleanly shut down
replica.
Subsequent commits will perform some further code cleanup, adapt docs and add
tests.
Bumps PGSTAT_FILE_FORMAT_ID.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version)
Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version)
Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version)
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de
MERGE performs actions that modify rows in the target table using a
source table or query. MERGE provides a single SQL statement that can
conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise
require multiple PL statements. For example,
MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;
MERGE works with regular tables, partitioned tables and inheritance
hierarchies, including column and row security enforcement, as well as
support for row and statement triggers and transition tables therein.
MERGE is optimized for OLTP and is parameterizable, though also useful
for large scale ETL/ELT. MERGE is not intended to be used in preference
to existing single SQL commands for INSERT, UPDATE or DELETE since there
is some overhead. MERGE can be used from PL/pgSQL.
MERGE does not support targeting updatable views or foreign tables, and
RETURNING clauses are not allowed either. These limitations are likely
fixable with sufficient effort. Rewrite rules are also not supported,
but it's not clear that we'd want to support them.
Author: Pavan Deolasee <pavan.deolasee@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
This new function, pg_analyze_and_rewrite_varparams(), extracts common
code from PrepareQuery() and exec_parse_message(). It is then exactly
analogous to the existing
pg_analyze_and_rewrite_fixedparams() and
pg_analyze_and_rewrite_withcb().
To unify these two code paths, this makes PrepareQuery() now subject
to log_parser_stats. Also, both paths now invoke
TRACE_POSTGRESQL_QUERY_REWRITE_START(). PrepareQuery() no longer
checks whether a utility statement was specified. The grammar doesn't
allow that anyway, and exec_parse_message() supports it, so
restricting it doesn't seem necessary.
This also adds QueryEnvironment support to the *varparams functions,
for consistency with their cousins, even though it is not used right
now.
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
There are three parallel ways to call parse/analyze: with fixed
parameters, with variable parameters, and by supplying your own parser
callback. Some of the involved functions were confusingly named and
made this API structure more confusing. This patch renames some
functions to make this clearer:
parse_analyze() -> parse_analyze_fixedparams()
pg_analyze_and_rewrite() -> pg_analyze_and_rewrite_fixedparams()
(Otherwise one might think this variant doesn't accept parameters, but
in fact all three ways accept parameters.)
pg_analyze_and_rewrite_params() -> pg_analyze_and_rewrite_withcb()
(Before, and also when considering pg_analyze_and_rewrite(), one might
think this is the only way to pass parameters. Moreover, the parser
callback doesn't necessarily need to parse only parameters, it's just
one of the things it could do.)
parse_fixed_parameters() -> setup_parse_fixed_parameters()
parse_variable_parameters() -> setup_parse_variable_parameters()
(These functions don't actually do any parsing, they just set up
callbacks to use during parsing later.)
This patch also adds some const decorations to the fixed-parameters
API, so the distinction from the variable-parameters API is more
clear.
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
SPI_commit previously left it up to the caller to recover from any error
occurring during commit. Since that's complicated and requires use of
low-level xact.c facilities, it's not too surprising that no caller got
it right. Let's move the responsibility for cleanup into spi.c. Doing
that requires redefining SPI_commit as starting a new transaction, so
that it becomes equivalent to SPI_commit_and_chain except that you get
default transaction characteristics instead of preserving the prior
transaction's characteristics. We can make this pretty transparent
API-wise by redefining SPI_start_transaction() as a no-op. Callers
that expect to do something in between might be surprised, but
available evidence is that no callers do so.
Having made that API redefinition, we can fix this mess by having
SPI_commit[_and_chain] trap errors and start a new, clean transaction
before re-throwing the error. Likewise for SPI_rollback[_and_chain].
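The error-trapping shape is roughly this (a simplified sketch; the
real code must also deal with portals, error-data copying, and chained
transactions):

    PG_TRY();
    {
        CommitTransactionCommand();
    }
    PG_CATCH();
    {
        /* commit failed: get into a clean transaction, then re-throw */
        AbortCurrentTransaction();
        StartTransactionCommand();
        PG_RE_THROW();
    }
    PG_END_TRY();
    StartTransactionCommand();  /* SPI_commit now starts the next xact */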
Some cleanup is also needed in AtEOXact_SPI, which was nowhere near
smart enough to deal with SPI contexts nested inside a committing
context.
While plperl and pltcl need no changes beyond removing their now-useless
SPI_start_transaction() calls, plpython needs some more work because it
hadn't gotten the memo about catching commit/rollback errors in the
first place. Such an error resulted in longjmp'ing out of the Python
interpreter, which leaks Python stack entries at present and is reported
to crash Python 3.11 altogether. Add the missing logic to catch such
errors and convert them into Python exceptions.
We are probably going to have to back-patch this once Python 3.11 ships,
but it's a sufficiently basic change that I'm a bit nervous about doing
so immediately. Let's let it bake awhile in HEAD first.
Peter Eisentraut and Tom Lane
Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com
Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org
GCC 12 complains that set_stack_base is storing the address of
a local variable in a long-lived pointer. This is an entirely
reasonable warning (indeed, it just helped us find a bug);
but that behavior is intentional here. We can work around it
by using __builtin_frame_address(0) instead of a specific local
variable; that produces an address a dozen or so bytes different,
in my testing, but we don't care about such a small difference.
Maybe someday a compiler lacking that function will start to issue
a similar warning, but we'll worry about that when it happens.
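Sketch of the workaround (configure symbol assumed):

    #ifdef HAVE__BUILTIN_FRAME_ADDRESS
        stack_base_ptr = __builtin_frame_address(0);
    #else
        stack_base_ptr = &stack_base_here;  /* a local's address, as before */
    #endif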
Patch by me, per a suggestion from Andres Freund. Back-patch to
v12, which is as far back as the patch will go without some pain.
(Recently-established project policy would permit a back-patch as
far as 9.2, but I'm disinclined to expend the work until GCC 12
is much more widespread.)
Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us
Previously, replication slots were released in ProcKill() on error,
resulting in the drop of ephemeral slots being reported after the stats
subsystem had already been shut down.
To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.
Patch by Masahiko Sawada, with very minor polishing by me.
I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
This adds to database objects the same version tracking that collation
objects have. There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.
This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
at the database level is not currently supported. But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values, not just 48-bit ones,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
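For reference, the xoroshiro128** core step is tiny (the public-domain
algorithm by Blackman and Vigna; this standalone transcription is not
PostgreSQL's code):

    #include <stdint.h>

    static inline uint64_t
    rotl64(uint64_t x, int k)
    {
        return (x << k) | (x >> (64 - k));
    }

    /* One xoroshiro128** step; s[0] and s[1] must not both be zero. */
    static uint64_t
    xoroshiro128ss(uint64_t *s)
    {
        uint64_t    s0 = s[0];
        uint64_t    s1 = s[1];
        uint64_t    result = rotl64(s0 * 5, 7) * 9;

        s1 ^= s0;
        s[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
        s[1] = rotl64(s1, 37);
        return result;
    }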
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
Commit 84f5c2908 forgot to consider the possibility that
EnsurePortalSnapshotExists could run inside a subtransaction with
lifespan shorter than the Portal's. In that case, the new active
snapshot would be popped at the end of the subtransaction, leaving
a dangling pointer in the Portal, with mayhem ensuing.
To fix, make sure the ActiveSnapshot stack entry is marked with
the same subtransaction nesting level as the associated Portal.
It's certainly safe to do so since we won't be here at all unless
the stack is empty; hence we can't create an out-of-order stack.
Let's also apply this logic in the case where PortalRunUtility
sets portalSnapshot, just to be sure that path can't cause similar
problems. It's slightly less clear that that path can't create
an out-of-order stack, so add an assertion guarding it.
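Sketched (helper and field names assumed):

    /* Pin the new ActiveSnapshot entry to the Portal's own
     * subtransaction nesting level, so a shorter-lived subxact
     * cannot pop it out from under the Portal. */
    PushActiveSnapshotWithLevel(GetTransactionSnapshot(),
                                portal->createLevel);
    portal->portalSnapshot = GetActiveSnapshot();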
Report and patch by Bertrand Drouvot (with kibitzing by me).
Back-patch to v11, like the previous commit.
Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
It was harder than necessary to understand PostgresMain() because the code for
a normal backend was interspersed with single-user mode specific code. Split
most of the single-user mode code into its own function
PostgresSingleUserMain(), which does all the necessary setup for
single-user mode, and then hands off to PostgresMain().
There still is some single-user mode code in InitPostgres(), and it'd likely
be worth moving at least some of it out. But that's for later.
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
Session statistics, as introduced by 960869da08, had several shortcomings:
- an additional GetCurrentTimestamp() call that also impaired the
  accuracy of the data collected

  This can be avoided by passing the current timestamp we already have
  in pgstat_report_stat().

- an additional statistics UDP packet sent every 500ms

  This is solved by adding the new statistics to PgStat_MsgTabstat.
  This is conceptually ugly, because session statistics are not table
  statistics. But the struct already contains data unrelated to
  tables, so there is not much damage done.

  Connection and disconnection are reported in separate messages,
  which limits the cost to two additional messages per session plus a
  slight increase in PgStat_MsgTabstat size (but the same number of
  table stats still fit).

- Session time computation could overflow on systems where long is
  32 bit.
Reported-By: Andres Freund <andres@anarazel.de>
Author: Andres Freund <andres@anarazel.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/20210801205501.nyxzxoelqoo4x2qc%40alap3.anarazel.de
Backpatch: 14-, where the feature was introduced.
An upcoming patch splits single user mode handling out of PostgresMain(). The
startup time only needs to be determined in single user mode. Currently the
initialization happens late, which makes the split a bit harder. As postmaster
determines the time earlier, it makes sense to determine the time for
single-user mode at a roughly similar point.
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
It's possible to execute user-defined SQL in some background processes;
for example, logical replication workers can fire triggers. This opens
the possibility that someone would try to execute LISTEN in such a
context. But since only regular backends ever call
ProcessNotifyInterrupt, no messages would actually be received, and
thus the registered listener would simply prevent the message queue
from being cleaned. Eventually NOTIFY would stop working, which is bad.
Perhaps someday somebody will invent infrastructure to make listening
in a background worker actually useful. In the meantime, forbid it.
Back-patch to v13, which is where we introduced the MyBackendType
variable. It'd be a lot harder to implement the check without that,
and it doesn't seem worth the trouble.
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for a while,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
We have long forbidden fetching backwards from a NO SCROLL cursor,
but the prohibition didn't extend to cases in which we rewind the
query altogether and then re-fetch forwards. I think the reason is
that this logic was mainly meant to protect plan nodes that can't
be run in the reverse direction. However, re-reading the query output
is problematic if the query is volatile (which includes SELECT FOR
UPDATE, not just queries with volatile functions): the re-read can
produce different results, which confuses the cursor navigation logic
completely. Another reason for disliking this approach is that some
code paths will either fetch backwards or rewind-and-fetch-forwards
depending on the distance to the target row; so that seemingly
identical use-cases may or may not draw the "cursor can only scan
forward" error. Hence, let's clean things up by disallowing rewind
as well as fetch-backwards in a NO SCROLL cursor.
Ordinarily we'd only make such a definitional change in HEAD, but
there is a third reason to consider this change now. Commit ba2c6d6ce
created some new user-visible anomalies for non-scrollable cursors
WITH HOLD, in that navigation in the cursor result got confused if the
cursor had been partially read before committing. The only good way
to resolve those anomalies is to forbid rewinding such a cursor, which
allows removal of the incorrect cursor state manipulations that
ba2c6d6ce added to PersistHoldablePortal.
To minimize the behavioral change in the back branches (including
v14), refuse to rewind a NO SCROLL cursor only when it has a holdStore,
ie has been held over from a previous transaction due to WITH HOLD.
This should avoid breaking most applications that have been sloppy
about whether to declare cursors as scrollable. We'll enforce the
prohibition across-the-board beginning in v15.
Back-patch to v11, as ba2c6d6ce was.
Discussion: https://postgr.es/m/3712911.1631207435@sss.pgh.pa.us
For EXEC_BACKEND, InitProcess()/InitAuxiliaryProcess() needs to have been
called well before we call BaseInit(), as SubPostmasterMain() needs LWLocks to
work. Having the order of initialization differ between platforms makes it
unnecessarily hard to understand the system and to add initialization points
for new subsystems without a lot of duplication.
To be able to change the order, BaseInit() cannot trigger
CreateSharedMemoryAndSemaphores() anymore - obviously that needs to have
happened before we can call InitProcess(). It seems cleaner to create shared
memory explicitly in single user/bootstrap mode anyway.
After this change the separation of bufmgr initialization into
InitBufferPoolAccess() / InitBufferPoolBackend() is not meaningful anymore so
the latter is removed.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
As of commit 84f5c2908, executing SQL commands (via SPI or otherwise)
requires having either an active Portal, or a caller-established
active snapshot. We were simply Assert'ing that that's the case.
But we've now had a couple different reports of people testing
extensions that didn't meet this requirement, and were confused by
the resulting crash. Let's convert the Assert to a test-and-elog,
in hopes of making the issue clearer for extension authors.
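The conversion is a one-liner of roughly this shape (message wording
assumed):

    if (!ActiveSnapshotSet())
        elog(ERROR, "cannot execute SQL without an outer snapshot or portal");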
Per gripes from Liu Huailing and RekGRpth. Back-patch to v11,
like the prior commit.
Discussion: https://postgr.es/m/OSZPR01MB6215671E3C5956A034A080DFBEEC9@OSZPR01MB6215.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/17035-14607d308ac8643c@postgresql.org
When reporting "conflicting or redundant options" errors, try to
ensure that errposition() is used, to help the user identify the
offending option.
Formerly, errposition() was invoked in less than 60% of cases. This
patch raises that to over 90%, but there remain a few places where the
ParseState is not readily available. Using errdetail() might improve
the error in such cases, but that is left as a task for the future.
Additionally, since this error is thrown from over 100 places in the
codebase, introduce a dedicated function to throw it, reducing code
duplication.
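The dedicated function reduces each call site to one line; a hedged
sketch of its shape (signature assumed):

    void
    errorConflictingDefElem(DefElem *defel, ParseState *pstate)
    {
        ereport(ERROR,
                errcode(ERRCODE_SYNTAX_ERROR),
                errmsg("conflicting or redundant options"),
                parser_errposition(pstate, defel->location));
    }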
Extracted from a slightly larger patch by Vignesh C. Reviewed by
Bharath Rupireddy, Alvaro Herrera, Dilip Kumar, Hou Zhijie, Peter
Smith, Daniel Gustafsson, Julien Rouhaud and me.
Discussion: https://postgr.es/m/CALDaNm33FFSS5tVyvmkoK2cCMuDVxcui=gFrjti9ROfynqSAGA@mail.gmail.com
In the "simple Query" code path, it's fine for parse analysis or
execution of a utility statement to scribble on the statement's node
tree, since that'll just be thrown away afterwards. However it's
not fine if the node tree is in the plan cache, as then it'd be
corrupted for subsequent executions. Up to now we've dealt with
that by having individual utility-statement functions apply
copyObject() if they were going to modify the tree. But that's
prone to errors of omission. Bug #17053 from Charles Samborski
shows that CREATE/ALTER DOMAIN didn't get this memo, and can
crash if executed repeatedly from plan cache.
In the back branches, we'll just apply a narrow band-aid for that,
but in HEAD it seems prudent to have a more principled fix that
will close off the possibility of other similar bugs in future.
Hence, let's hoist the responsibility for doing copyObject up into
ProcessUtility from its children, thus ensuring that it happens for
all utility statement types.
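In effect the copy becomes one guarded step at the top (parameter name
assumed):

    /* At the start of ProcessUtility(): callers tell us whether the
     * tree is shared (e.g. in the plan cache) and must not be
     * scribbled on. */
    if (readOnlyTree)
        pstmt = copyObject(pstmt);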
Also, modify ProcessUtility's API so that its callers can tell it
whether a copy step is necessary. It turns out that in all cases,
the immediate caller knows whether the node tree is transient, so
this doesn't involve a huge amount of code thrashing. In this way,
while we lose a little bit in the execute-from-cache code path due
to sometimes copying node trees that wouldn't be mutated anyway,
we gain something in the simple-Query code path by not copying
throwaway node trees. Statements that are complex enough to be
expensive to copy are almost certainly ones that would have to be
copied anyway, so the loss in the cache code path shouldn't be much.
(Note that this whole problem applies only to utility statements.
Optimizable statements don't have the issue because we long ago made
the executor treat Plan trees as read-only. Perhaps someday we will
make utility statement execution act likewise, but I'm not holding
my breath.)
Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us
Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal. In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.
Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve. So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.
We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK. As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands. Hence, teach
spi.c about doing this at the right time. (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)
This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.
replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix. But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.
This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).
Per bug #15990 from Andreas Wicht. Back-patch to v11 where
intra-procedure COMMIT was added.
Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
Allowing only on/off meant that either all existing configuration
guides would become obsolete if we disabled it by default, or that we
would have to accept a performance loss in the default config if we
enabled it by default. By allowing 'auto' as a middle ground, the
performance cost is only paid by those who enable pg_stat_statements and
similar modules.
I only edited the release notes to comment-out a paragraph that is now
factually wrong; further edits are probably needed to describe the
related change in more detail.
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210513002623.eugftm4nk2lvvks3@nol
Split up CHECK_FOR_INTERRUPTS() to provide an additional macro
INTERRUPTS_PENDING_CONDITION(), which just tests whether an
interrupt is pending without attempting to service it. This is
useful in situations where the caller knows that interrupts are
blocked, and would like to find out if it's worth the trouble
to unblock them.
Also add INTERRUPTS_CAN_BE_PROCESSED(), which indicates whether
CHECK_FOR_INTERRUPTS() can be relied on to clear the pending interrupt.
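A hedged sketch of the split (the details in miscadmin.h differ):

    /* test whether an interrupt is pending, without servicing it */
    #define INTERRUPTS_PENDING_CONDITION() \
        (unlikely(InterruptPending))

    #define CHECK_FOR_INTERRUPTS() \
        do { \
            if (INTERRUPTS_PENDING_CONDITION()) \
                ProcessInterrupts(); \
        } while (0)

    /* whether CHECK_FOR_INTERRUPTS() could service it right now */
    #define INTERRUPTS_CAN_BE_PROCESSED() \
        (INTERRUPTS_PENDING_CONDITION() && \
         InterruptHoldoffCount == 0 && CritSectionCount == 0)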
This commit doesn't actually add any uses of the new macros,
but a follow-on bug fix will do so. Back-patch to all supported
branches to provide infrastructure for that fix.
Alvaro Herrera and Tom Lane
Discussion: https://postgr.es/m/20210513155351.GA7848@alvherre.pgsql