The contents of the line whose parsing failed was not reported in the
error message produced by generate-wait_event_types.pl, making harder
than necessary the debugging of incorrectly-shaped entries in the file.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
The double quotes used for the wait event names are not required, as
the values quoted are made of single words. The files generated by
generate-wait_event_types.pl (pgstat_wait_event.c, wait_event_types.h
and wait_event_types.sgml) are exactly the same before and after this
commit, hence the wait event names and the enum elements have the same
names as before.
Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.
To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.
An invalid database cannot be connected to anymore, but can still be
dropped.
Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.
Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.
Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
The first check on the enum values was not necessary as the values set
in wait_event_names.txt for the classes LWLock and Lock were able to
satisfy the check. The second check when generating the C and header
files is now based on a match of the class names, making it simpler to
understand.
Author: Masahiro Ikeda, Michael Paquier
Discussion: https://postgr.es/m/eaf82a85c0ef1b55dc3b651d3f7b867a@oss.nttdata.com
This commit adds a new type of parallel message 'P' to allow a
parallel worker to poke at a leader to update the progress.
Currently it supports only incremental progress reporting but it's
possible to allow for supporting of other backend progress APIs in the
future.
There are no users of this new message type as of this commit. That
will follow in future commits.
Idea from Andres Freund.
Author: Sami Imseih
Reviewed by: Michael Paquier, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
Windows has similar functions with leading underscores. Previously, we
provided the rename via a macro in win32_port.h. In fact its functions
are not always good replacements for the Unix functions, since they
can't deal with UTF-8. They are only currently used by pg_locale.c,
which is careful to redirect to other Windows routines for UTF-8. Given
that portability hazard, it seem unlikely to be a good idea to encourage
any other code to think of these functions as being available outside
pg_locale.c. Any code that thinks it wants these functions probably
wants our wchar2char() or char2wchar() routines instead, or it won't
actually work on Windows in UTF-8 databases.
Furthermore, some major libc implementations including glibc don't have
them (they only have the standard variants without _l), so external code
is very unlikely to require them to exist.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bt_CHPzEoPnKyARJBJgE9-GxNajJo6ZuSfRK_KWFO%2B6w%40mail.gmail.com
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems. For Windows, win32_port.h redirects to a partial
implementation called _locale_t. We can now remove a lot of
compile-time tests for HAVE_LOCALE_T, and associated comments and dead
code branches that were needed for older computers.
Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed. With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
This is useful to show the allocation state of huge pages when setting
up a server with "huge_pages = try", where allocating huge pages would
be attempted but the server would continue its startup sequence even if
the allocation fails. The effective status of huge pages is not easily
visible without OS-level tools (or for instance, a lookup at
/proc/N/smaps), and the environments where Postgres runs may not
authorize that. Like the other GUCs related to huge pages, this works
for Linux and Windows.
This GUC can report as values:
- "on", if huge pages were allocated.
- "off", if huge pages were not allocated.
- "unknown", a special state that could only be seen when using for
example postgres -C because it is only possible to know if the shared
memory allocation worked after we can check for the GUC values, even if
checking a runtime-computed GUC. This value should never be seen when
querying for the GUC on a running server. An assertion is added to
check that.
The discussion has also turned around having a new function to grab this
status, but this would have required more tricks for -DEXEC_BACKEND,
something that GUCs already handle.
Noriyoshi Shinoda has initiated the thread that has led to the result of
this commit.
Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/TU4PR8401MB1152EBB0D271F827E2E37A01EECC9@TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM
The header file wait_event_types.h was generated without a newline at
its end, which was inconsistent with all the other things generated
automatically.
Per offline gripe from Nathan Bossart.
This commit comes as a continuation of the discussion that has led to
d522b05, as \v was handled inconsistently when parsing array values or
anything going through the parsers, and changing a parser behavior in
stable branches is a scary thing to do. The parsing of array values now
uses the more central scanner_isspace() and array_isspace() is removed.
As pointing out by Peter Eisentraut, fix a confusing reference to
horizontal space in the parsers with the term "horiz_space". \f was
included in this set since 3cfdd8f from 2000, but it is not horizontal.
"horiz_space" is renamed to "non_newline_space", to refer to all
whitespace characters except newlines.
The changes impact the parsers for the backend, psql, seg, cube, ecpg
and replication commands. Note that JSON should not escape \v, as per
RFC 7159, so these are not touched.
Reviewed-by: Peter Eisentraut, Tom Lane
Discussion: https://postgr.es/m/ZJKcjNwWHHvw9ksQ@paquier.xyz
BuildEventTriggerCache sets up a context "EventTriggerCache" which
house a hash named "Event Trigger Cache", which in turn creates a
context with the table name. This generated log output for memory
context dumps like below:
LOG: level: 2; EventTriggerCache: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
LOG: level: 3; Event Trigger Cache: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
This renames the hash to ensure that the hash context has a unique
name for easier log reading and debugging.
Discussion: https://postgr.es/m/5EDC969E-CAE3-4CBD-965E-3B8A1294CFA4@yesql.se
The documentation and the code is generated automatically from a new
file called wait_event_names.txt, formatted in sections dedicated to
each wait event class (Timeout, Lock, IO, etc.) with three tab-separated
fields:
- C symbol in enums
- Format in the system views
- Description in the docs
Using this approach has several advantages, as we have proved to be
rather bad in maintaining this area of the tree across the years:
- The order of each item in the documentation and the code, which should
be alphabetical, has become incorrect multiple times, and the script
generating the code and documentation has a few rules to enforce that,
making the maintenance a no-brainer.
- Some wait events were added to the code, but not documented, so this
cannot be missed now.
- The order of the tables for each wait event class is enforced in the
documentation (the input .txt file does so as well for clarity, though
this is not mandatory).
- Less code, shaving 1.2k lines from the tree, with 1/3 of the savings
coming from the code, the rest from the documentation.
The wait event types "Lock" and "LWLock" still have their own code path
for their code, hence only the documentation is created for them. These
classes are listed with a special marker called WAIT_EVENT_DOCONLY in
the input file.
Adding a new wait event now requires only an update of
wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions.
This commit has been tested with configure/Makefile, the CI and VPATH
build. clean, distclean and maintainer-clean were working fine.
Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
It's OK to be lazy about re-binning memory segments when allocating,
because that can only leave segments in a bin that's too high. We'll
search higher bins if necessary while allocating next time, and
also eventually re-bin, so no memory can become unreachable that way.
However, when freeing memory, the largest contiguous range of free pages
might go up, so we should re-bin eagerly to make sure we don't leave the
segment in a bin that is too low for get_best_segment() to find.
The re-binning code is moved into a function of its own, so it can be
called whenever free pages are returned to the segment's free page map.
Back-patch to all supported releases.
Author: Dongming Liu <ldming101@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version)
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAL1p7e8LzB2LSeAXo2pXCW4%2BRya9s0sJ3G_ReKOU%3DAjSUWjHWQ%40mail.gmail.com
The VacAttrStats structure contained the whole Form_pg_attribute for a
column, but it actually only needs attstattarget from there. So
remove the Form_pg_attribute field and make a separate field for
attstattarget. This simplifies some code for extended statistics that
doesn't deal with a column but an expression, which had to fake up
pg_attribute rows to satisfy internal APIs. Also, we can remove some
comments that essentially said "don't look at pg_attribute directly".
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
The following changes are done:
- Addition of WaitEventBufferPin and WaitEventExtension, that hold a
list of wait events related to each category.
- Addition of two functions that encapsulate the list of wait events for
each category.
- Rename BUFFER_PIN to BUFFERPIN (only this wait event class used an
underscore, requiring a specific rule in the automation script).
These changes make a bit easier the automatic generation of all the code
and documentation related to wait events, as all the wait event
categories are now controlled by consistent structures and functions.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/c6f35117-4b20-4c78-1df5-d3056010dcf5@gmail.com
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
exec_parse_message() wants to create a cached plan in all cases,
including for empty input. The empty-input path does not have
a test for being in an aborted transaction, making it possible
that plancache.c will fail due to trying to do database lookups
even though there's no real work to do.
One solution would be to throw an aborted-transaction error in
this path too, but it's not entirely clear whether the lack of
such an error was intentional or whether some clients might be
relying on non-error behavior. Instead, let's hack plancache.c
so that it treats empty statements with the same logic it
already had for transaction control commands, ensuring that it
can soldier through even in an already-aborted transaction.
Per bug #17983 from Alexander Lakhin. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/17983-da4569fcb878672e@postgresql.org
Run pgindent and pgperltidy. It seems we're still some ways
away from all committers doing this automatically. Now that
we have a buildfarm animal that will whine about poorly-indented
code, we'll try to keep the tree more tidy.
Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us
Previously stats in the startup process would only get reported during
shutdown of the startup process. It has been that way for a long time, but
became a lot more noticeable with the new pg_stat_io view, which separates out
IO done by different backend types...
While replaying after every XLOG_RUNNING_XACTS isn't the prettiest approach,
it has the advantage of being quite easy. Given that we're well past feature
freeze...
It's not a problem that we don't report stats more frequently with
wal_level=minimal, in that case stats can't be read before the stats process
has shut down.
Besides the above, this commit also changes pgstat_report_stat() to acquire
the timestamp with GetCurrentTimestamp() instead of
GetCurrentTransactionStopTimestamp().
Thanks to Melih Mutlu, Kyotaro Horiguchi for prototypes of other approaches to
solving this issue.
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/5315aedc-fbca-1556-c5de-dc2e00b23a14@oss.nttdata.com
Commit 927d9abb6 purported to make datetime() accept any string
that could be output for a datetime value by to_jsonb(). But it
overlooked the possibility of fractional seconds being present,
so that cases as simple as to_jsonb(now()) would defeat it.
Fix by adding formats that include ".US" to the list in
executeDateTimeMethod(). (Note that while this is nominally
microseconds, it'll do the right thing for fractions with
fewer than six digits.)
In passing, re-order the list to restore the datatype ordering
specified in its comment. The violation accidentally did not
break anything; but the next edit might be less lucky, so add
more comments.
Per report from Tim Field. Back-patch to v13 where datetime()
was added, like the previous patch.
Discussion: https://postgr.es/m/014A028B-5CE6-4FDF-AC24-426CA6FC9CEE@mohiohio.com
- Commit 3eb77eba5a, which moved the pending ops queue from md.c to
sync.c, introduced a duplicate, unused 'pendingOpsCxt'
variable. (I'm surprised none of the compilers or static analysis
tools have complained about that.)
- Commit c2fe139c20 moved the 'synchronize_seqscans' variable and
introduced an extern declaration in tableam.h, making the one in
guc_tables.c unnecessary.
- Commit 6f0cf87872 removed the 'pgstat_temp_directory' GUC, but
forgot to remove the corresponding global variable.
- Commit 1b4e729eaa removed the 'pg_krb_realm' GUC, and its global
variable, but forgot the declaration in auth.h.
Spotted all these by reading the code.
Split nbtree's _bt_getbuf function is two: code that read locks or write
locks existing pages remains in _bt_getbuf, while code that deals with
allocating new pages is moved to a new, dedicated function called
_bt_allocbuf. This simplifies most _bt_getbuf callers, since it is no
longer necessary for them to pass a heaprel argument. Many of the
changes to nbtree from commit 61b313e4 can be reverted. This minimizes
the divergence between HEAD/PostgreSQL 16 and earlier release branches.
_bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to
_bt_getbuf. There are only 3 affected call sites, all of which continue
to pass a heaprel for recovery conflict purposes. Note that nbtree's
use of P_NEW was superficial; nbtree never actually relied on the P_NEW
code paths in bufmgr.c, so this change is strictly mechanical.
GiST already took the same approach; it has a dedicated function for
allocating new pages called gistNewBuffer(). That factor allowed commit
61b313e4 to make much more targeted changes to GiST.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com
pg_base64_enc_len() and its clones overestimated the output
length by up to 2 bytes, as a result of sloppy thinking about
where to divide. No callers require a precise estimate, so
this has no consequences worse than palloc'ing a byte or two
more than necessary. We might as well get it right though.
This bug is very ancient, dating to commit 79d78bb26 which
added encode.c. (The other instances were presumably copied
from there.) Still, it doesn't quite seem worth back-patching.
Oleg Tselebrovskiy
Discussion: https://postgr.es/m/f94da55286a63022150bc266afdab754@postgrespro.ru
The GUC settings lc_collate and lc_ctype are from a time when those
locale settings were cluster-global. When those locale settings were
made per-database (PG 8.4), the settings were kept as read-only. As
of PG 15, you can use ICU as the per-database locale provider, so
examining these settings is already less meaningful and possibly
confusing, since you need to look into pg_database to find out what is
really happening, and they would likely become fully obsolete in the
future anyway.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/696054d1-bc88-b6ab-129a-18b8bce6a6f0@enterprisedb.com
Complete the task begun in 9c0a0e2ed: we don't want to use the
abbreviation "deleg" for GSS delegation in any user-visible places.
(For consistency, this also changes most internal uses too.)
Abhijit Menon-Sen and Tom Lane
Discussion: https://postgr.es/m/949048.1684639317@sss.pgh.pa.us
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
Should a hash join exceed memory limit, the hashtable is split up into
multiple batches. The number of batches is doubled each time a given
batch is determined not to fit in memory. Each batch file is allocated
with a block-sized buffer for buffering tuples and parallel hash join
has additional sharedtuplestore accessor buffers.
In some pathological cases requiring a lot of batches, often with skewed
data, bad stats, or very large datasets, users can run out-of-memory
solely from the memory overhead of all the batch files' buffers.
Batch files were allocated in the ExecutorState memory context, making
it very hard to identify when this batch explosion was the source of an
OOM. This commit allocates the batch files in a dedicated memory
context, making it easier to identify the cause of an OOM and work to
avoid it.
Based on initial draft by Tomas Vondra, with significant reworks and
improvements by Jehan-Guillaume de Rorthais.
Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Author: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20190421114618.z3mpgmimc3rmubi4@development
Discussion: https://postgr.es/m/20230504193006.1b5b9622%40karst#273020ff4061fc7a2fbb1ba96b281f17
Non-leaf pages of GiST indexes contain key attributes, leaf pages
contain both key and non-key attributes, and gist_page_items() ignored
the handling of non-key attributes. This caused a few problems when
using gist_page_items() on a GiST index with INCLUDE:
- On a non-leaf page, the function would crash.
- On a leaf page, the function would work, but miss to display all the
values for included attributes.
This commit fixes gist_page_items() to handle such cases in a more
appropriate way, and now displays the values of key and non-key
attributes for each item separately in a style consistent with what
ruleutils.c would generate for the attribute list, depending on the page
type dealt with. In a way similar to how a record is displayed, values
would be double-quoted for key or non-key attributes if required.
ruleutils.c did not provide a routine able to control if non-key
attributes should be displayed, so an extended() routine for index
definitions is added to work around the leaf and non-leaf page
differences.
While on it, this commit fixes a third problem related to the amount of
data reported for key attributes. The code originally relied on
BuildIndexValueDescription() (used for error reports on constraints)
that would not print all the data stored in the index but the index
opclass's input type, so this limited the amount of information
available. This switch makes gist_page_items() much cheaper as there is
no need to run ACL checks for each item printed, which is not an issue
anyway as superuser rights are required to execute the functions of
pageinspect. Opclasses whose data cannot be displayed can rely on
gist_page_items_bytea().
The documentation of this function was slightly incorrect for the
output results generated on HEAD and v15, so adjust it on these
branches.
Author: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/17884-cb8c326522977acb@postgresql.org
Backpatch-through: 14
Fixes memory error in cases where the length of the language name
returned by uloc_getLanguage() is exactly ULOC_LANG_CAPACITY, in which
case the status is set to U_STRING_NOT_TERMINATED_WARNING.
Also check in call sites for other ICU functions that are expected to
return a C string to be safe (no bug is known at these other call
sites).
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf135226b@gmail.com
This is equivalent to a revert of f193883 and fb32748, with the addition
that the declaration of the SQLValueFunction node needs to gain a couple
of node_attr for query jumbling. The performance impact of removing the
function call inlining is proving to be too huge for some workloads
where these are used. A worst-case test case of involving only simple
SELECT queries with a SQL keyword is proving to lead to a reduction of
10% in TPS via pgbench and prepared queries on a high-end machine.
None of the tests I ran back for this set of changes saw such a huge
gap, but Alexander Lakhin and Andres Freund have found that this can be
noticeable. Keeping the older performance would mean to do more
inlining in the executor when using COERCE_SQL_SYNTAX for a function
expression, similarly to what SQLValueFunction does. This requires more
redesign work and there is little time until 16beta1 is released, so for
now reverting the change is the best way forward, bringing back the
previous performance.
Bump catalog version.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com
Give the new GUC introduced by d4e71df6 a name that is clearly not
intended for mainstream use quite yet.
Future proposals would drop the prefix only after adding infrastructure
to make it efficient. Having the switch in the tree sooner is good
because it might lead to new discoveries about the hazards awaiting us
on a wide range of systems, but that name was too enticing and could
lead to cross-version confusion in future, per complaints from Noah and
Justin.
Suggested-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (the idea, not the patch)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (ditto)
Discussion: https://postgr.es/m/20230430041106.GA2268796%40rfd.leadboat.com
An update of the GUC stats_fetch_consistency in a transaction would be
able to trigger an assertion when doing cache->snapshot. In this case,
when retrieving a pgstat entry after the switch, a new snapshot would be
rebuilt, confusing pgstat_build_snapshot() because a snapshot is already
cached with an unexpected mode ("cache").
In order to fix this problem, this commit adds a flag to force a
snapshot clear each time this GUC is changed. Some tests are added to
check, while on it.
Some optimizations in avoiding the snapshot clear should be possible
depending on what is cached and the current GUC value, I guess, but this
solution is simple, and ensures that the state of the cache is updated
each time a new pgstat entry is fetched, hence being consistent with the
level wanted by the client that has set the GUC.
Note that cache->none and snapshot->none would not cause issues, as
fetching a pgstat entry would be retrieved from shared memory on the
second attempt, however a snapshot would still be cached. Similarly,
none->snapshot and none->cache would build a new snapshot on the second
fetch attempt. Finally, snapshot->cache would cache a new snapshot on
the second attempt.
Reported-by: Alexander Lakhin
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org
backpatch-through: 15