In the past, FileRead() and FileWrite() used types based on the Unix
read() and write() functions from before C and POSIX standardization,
though not exactly (we had int for amount instead of unsigned). In
commit 2d4f1ba6 we changed to the appropriate standard C types, just
like the modern POSIX functions they wrap, but again not exactly: the
return type stayed as int. In theory, a ssize_t value could be returned
by the underlying call that is too large for an int.
That wasn't really a live bug, because we don't expect PostgreSQL code
to perform reads or writes of gigabytes, and OSes probably apply
internal caps smaller than that anyway. This change is done on the
principle that the return might as well follow the standard interfaces
consistently.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1672202.1703441340%40sss.pgh.pa.us
This commit switches pg_enc2gettext_tbl[] in encnames.c to use a
C99-designated initializer syntax.
pg_bind_textdomain_codeset() is simplified so as it is possible to do
a direct lookup at the gettext() array with a value of the enum pg_enc
rather than doing a loop through all its elements, as long as the
encoding value provided by GetDatabaseEncoding() is in the correct range
of supported encoding values. Note that PG_MULE_INTERNAL gains a value
in the array, pointing to NULL.
Author: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
Commit bd5132db55 introduced new atomic read/write functions with
full barrier semantics, which are intended to simplify converting
non-performance-critical code to use atomic variables. This commit
demonstrates one such conversion.
Reviewed-by: Yong Li
Discussion: https://postgr.es/m/20231110205128.GB1315705%40nathanxps13
Writing correct code using atomic variables is often difficult due
to the memory barrier semantics (or lack thereof) of the underlying
operations. This commit introduces atomic read/write functions
with full barrier semantics to ease this cognitive load. For
example, some spinlocks protect a single value, and these new
functions make it easy to convert the value to an atomic variable
(thus eliminating the need for the spinlock) without modifying the
barrier semantics previously provided by the spinlock. Since these
functions may be less performant than the other atomic reads and
writes, they are not suitable for every use-case. However, using a
single atomic operation with full barrier semantics may be more
performant in cases where a separate explicit barrier would
otherwise be required.
The base implementations for these new functions are atomic
exchanges (for writes) and atomic fetch/adds with 0 (for reads).
These implementations can be overwritten with better architecture-
specific versions as they are discovered.
This commit leaves converting existing code to use these new
functions as a future exercise.
Reviewed-by: Andres Freund, Yong Li, Jeff Davis
Discussion: https://postgr.es/m/20231110205128.GB1315705%40nathanxps13
This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.
A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.
Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.
Dean Rasheed, reviewed by Jian He and Alvaro Herrera.
Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
This field has been redundant ever since it was added by commit
25e777cf8e, which split up ExecUpdate() and ExecDelete() into reusable
pieces. The only place that reads it is ExecMergeMatched(), if the
result from ExecUpdateAct() is TM_Ok. However, all paths through
ExecUpdateAct() that return TM_Ok also set this field to true, so the
return status by itself is sufficient to tell if the update happened.
Removing this field is a modest simplification, and it brings the
UPDATE path in ExecMergeMatched() more into line with ExecUpdate(),
ensuring that ExecUpdateEpilogue() is always called if ExecUpdateAct()
returns TM_Ok, reducing the chance of bugs.
Dean Rasheed, reviewed by Alvaro Herrera.
Discussion: https://postgr.es/m/CAEZATCWGGmigGBzLHkJm5Ccv2mMxXmwi3%2Buq0yhwDHm-tsvSLg%40mail.gmail.com
dsa_dump would print a large negative number instead of zero for
segment bin 0. Fix by explicitly checking for underflow and add
special case for bin 0. Backpatch to all supported versions.
Author: Ian Ilyasov <ianilyasov@outlook.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/GV1P251MB1004E0D09D117D3CECF9256ECD502@GV1P251MB1004.EURP251.PROD.OUTLOOK.COM
Backpatch-through: v12
This updates the following lookup arrays to use C99-designated
initializer syntax, indexed based on the enum pg_enc:
pg_enc2icu_tbl[]
pg_enc2name_tbl[]
pg_wchar_table[]
This is more readable, and removes problems with ordering mistakes as
this removes dependencies between the arrays and their lookup index in
the enum pg_enc. So, adding new encodings becomes easier, even if this
does not happen often.
Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Japin Li
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
Removing the get_columns_length() function from regress.so
means we have to drop it when testing upgrades from versions
that had it. Per buildfarm.
Discussion: https://postgr.es/m/2520881.1709159002@sss.pgh.pa.us
The config files which are used to generate the server and client
CAs claimed that these were self-signed, when they in reality are
signed by the root_ca (which however is self-signed). Reword the
comments to match.
Author: David Zhang <david.zhang@highgo.ca>
Discussion: https://postgr.es/m/12f4c425-45fe-480f-a692-b3ed82ebcb33@highgo.ca
If one of these constructs referenced a nonexistent object, we'd fall
through to feeding the whole construct to the core parser, which would
reject it with a "syntax error" message. That's pretty unhelpful and
misleading. There's no good reason for plpgsql_parse_wordtype and
friends not to throw a useful error for incorrect input, so make them
do that instead of returning NULL.
Discussion: https://postgr.es/m/1964516.1708977740@sss.pgh.pa.us
This is a first step toward modernizing our README file. Some
popular developer platforms support rendering README.md files, so
a direct conversion to Markdown seems like a good place to start.
The intent is to keep this file legible as plain text even as it
accumulates more content.
Suggested-by: Andrew Atkinson
Reviewed-by: Tom Lane, Daniel Gustafsson, Joe Conway
Discussion: https://postgr.es/m/CAG6XLEmGE95DdKqjk%2BDd9vC8mfN7BnV2WFgYk_9ovW6ikN0YSg%40mail.gmail.com
Commit 0b16bb877 removed the test query added by commit 79b716cfb,
but not the C-language support function used by that query. I don't
see any plausible reason why we'd need that function again, so throw
it overboard too.
In the case where the target timestamp is before the origin timestamp
and their difference is already an exact multiple of the stride, the
code incorrectly subtracted the stride anyway.
Also detect several integer-overflow cases that previously produced
bogus results. (The submitted patch tried to avoid overflow, but
I'm not convinced it's right, and problematic cases are so far out of
the plausibly-useful range that they don't seem worth sweating over.
Let's just use overflow-detecting arithmetic and throw errors.)
timestamp_bin() and timestamptz_bin() are basically identical and
so had identical bugs. Fix both.
Report and patch by Moaaz Assali, adjusted some by me. Back-patch
to v14 where date_bin() was introduced.
Discussion: https://postgr.es/m/CALkF+nvtuas-2kydG-WfofbRSJpyODAJWun==W-yO5j2R4meqA@mail.gmail.com
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches. In order for this to work with good
performance, two additional changes are made:
1. the cache is divided in "banks" (to borrow terminology from CPU
caches), and algorithms such as eviction buffer search only affect
one specific bank. This forestalls the problem that linear searching
for a specific buffer across the whole cache takes too long: we only
have to search the specific bank, whose size is small. This work is
authored by Andrey Borodin.
2. Change the locking regime for the SLRU banks, so that each bank uses
a separate LWLock. This allows for increased scalability. This work
is authored by Dilip Kumar. (A part of this was previously committed as
d172b717c6f4.)
Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next. This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this. I (Álvaro) also added
lots of comments to make sure the design is sound.
The new GUCs match the names introduced by bcdfa5f2e2f2 in the
pg_stat_slru view.
The default values for these parameters are similar to the previous
sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB. (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.) The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0. These values based on
shared_buffers may need to be revisited, but that's an easy change.
There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally). However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
There isn't a lot of user demand for AIX support, we have a bunch of
hacks to work around AIX-specific compiler bugs and idiosyncrasies,
and no one has stepped up to the plate to properly maintain it.
Remove support for AIX to get rid of that maintenance overhead. It's
still supported for stable versions.
The acute issue that triggered this decision was that after commit
8af2565248, the AIX buildfarm members have been hitting this
assertion:
TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 472, PID: 2949728
Apperently the "pg_attribute_aligned(a)" attribute doesn't work on AIX
for values larger than PG_IO_ALIGN_SIZE, for a static const variable.
That could be worked around, but we decided to just drop the AIX support
instead.
Discussion: https://www.postgresql.org/message-id/20240224172345.32@rfd.leadboat.com
Reviewed-by: Andres Freund, Noah Misch, Thomas Munro
The new names are intended to match those in an upcoming patch that adds
a few GUCs to configure the SLRU buffer sizes.
Backwards compatibility concern: this changes the accepted names for
function pg_stat_slru_rest(). Since this function recognizes "any other
string" as a request to reset the entry for "other", this means that
calling it with the old names would silently reset "other" instead of
doing nothing or throwing an error.
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql
Allocating from a free list or from a block which contains enough space
already, we deem to be common code paths and want to optimize for those.
Having to allocate a new block, either a normal block or a dedicated one
for a large allocation, we deem to be less common, therefore we class
that as "cold". Both cold paths require a malloc so are going to be
slower as a result of that regardless.
The main motivation here is to remove the calls to malloc() in the hot
path and because of this, the compiler is now free to not bother setting
up the stack frame in AllocSetAlloc(), thus making the hot path much
cheaper.
Author: Andres Freund
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
This is in the same spirit as ef5e2e90859a, updating this time some
arrays in parser.c, relpath.c, guc_tables.c and pg_dump_sort.c so as the
order of their elements has no need to match the enum structures they
are based on anymore.
Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Japin Li
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
A recent commit added a copy_function member to the
dshash_parameters struct, but it missed updating a couple of
comments that refer to the function pointer members of this struct.
One of those comments also refers to a tranche_name member and non-
arg variants of the function pointer members, all of which were
either removed during development or removed shortly after dshash
table support was committed.
Oversights in commits 8c0d7bafad, d7694fc148, and 42a1de3013.
Discussion: https://postgr.es/m/20240227045213.GA2329190%40nathanxps13
This is a followup to commit 66ea94e8e6.
Error mssages concerning incorrect formats for date-time types are
unified and parameterized, instead of using a fully separate error
message for each type.
Similarly, error messages regarding numeric and string arguments to
certain items are standardized, and instead of saying that the argument
is out of range simply say that it is invalid. The actual invalid
arguments to these itesm are now shown in the error message.
Error messages relating to numeric inputs of Nan or Infinity are
made more informative.
Jeevan Chalke and Kyotaro Horiguchi, with some input from Tom Lane.
Discussion: https://postgr.es/m/20240129.121200.235012930453045390.horikyota.ntt@gmail.com
object_classes[] provided unnecessary indirection between catalog OIDs
and the enum ObjectClass when calling add_object_address(). This array
has been originally introduced in 30ec31604d5 and was useful because not
all relation OIDs were compile-time constants back then, which has not
been the case for a long time now for all the elements of ObjectClass.
This commit removes object_classes[], switching to the catalog OIDs
when calling add_object_address(). This shaves some code while saving
in maintenance because it was necessary to maintain the enum ObjectClass
and the array in sync when adding new object types.
Reported-by: Jeff Davis
Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
Many modern compilers are able to optimize function calls to functions
where the parameters of the called function match a leading subset of
the calling function's parameters. If there are no instructions in the
calling function after the function is called, then the compiler is free
to avoid any stack frame setup and implement the function call as a
"jmp" rather than a "call". This is called sibling call optimization.
Here we adjust the memory allocation functions in mcxt.c to allow this
optimization. This requires moving some responsibility into the memory
context implementations themselves. It's now the responsibility of the
MemoryContext to check for malloc failures. This is good as it both
allows the sibling call optimization, but also because most small and
medium allocations won't call malloc and just allocate memory to an
existing block. That can't fail, so checking for NULLs in that case
isn't required.
Also, traditionally it's been the responsibility of palloc and the other
allocation functions in mcxt.c to check for invalid allocation size
requests. Here we also move the responsibility of checking that into the
MemoryContext. This isn't to allow the sibling call optimization, but
more because most of our allocators handle large allocations separately
and we can just add the size check when doing large allocations. We no
longer check this for non-large allocations at all.
To make checking the allocation request sizes and ERROR handling easier,
add some helper functions to mcxt.c for the allocators to use.
Author: Andres Freund
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
Presently, string keys are not well-supported for dshash tables.
The dshash code always copies key_size bytes into new entries'
keys, and dshash.h only provides compare and hash functions that
forward to memcmp() and tag_hash(), both of which do not stop at
the first NUL. This means that callers must pad string keys so
that the data beyond the first NUL does not adversely affect the
results of copying, comparing, and hashing the keys.
To better support string keys in dshash tables, this commit does
a couple things:
* A new copy_function field is added to the dshash_parameters
struct. This function pointer specifies how the key should be
copied into new table entries. For example, we only want to copy
up to the first NUL byte for string keys. A dshash_memcpy()
helper function is provided and used for all existing in-tree
dshash tables without string keys.
* A set of helper functions for string keys are provided. These
helper functions forward to strcmp(), strcpy(), and
string_hash(), all of which ignore data beyond the first NUL.
This commit also adjusts the DSM registry's dshash table to use the
new helper functions for string keys.
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
A couple of dshash_create() callers provide 0 for the 'void *arg'
argument, which might give readers the incorrect impression that
this is some sort of "flags" parameter.
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
Add a note about the additional privileges required after the fix in
4989ce72644b (wording per Tom Lane); also change marked-up mentions of
"target_table_name" to be simply "the target table" or the like. Also,
note that "join_condition" is scouted for requisite privileges.
Backpatch to 15.
Discussion: https://postgr.es/m/202402211653.zuh6objy3z72@alvherre.pgsql
Information of sequences is cached for each backend for currval() and
nextval(), and the update of some cached information was mixed in the
middle of computations based on the other properties of a sequence, for
the increment value in nextval() and the cached state when altering a
sequence.
Grouping them makes the code easier to follow and to refactor in the
future, when splitting the computation and the SeqTable change parts.
Note that the cached data is untouched between the areas where these
cache updates are moved.
Issue noticed while doing some refactoring of the sequence code.
Author: Michael Paquier
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
Similarly to tables and indexes, these functions are able to open
relations with a sequence relkind, which is useful to make a distinction
with the other relation kinds. Previously, commands/sequence.c used a
mix of table_{close,open}() and relation_{close,open}() routines when
manipulating sequence relations, so this clarifies the code.
A direct effect of this change is to align the error messages produced
when attempting DDLs for sequences on relations with an unexpected
relkind, like a table or an index with ALTER SEQUENCE, providing an
extra error detail about the relkind of the relation used in the DDL
query.
Author: Michael Paquier
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
When this assertion was installed (in commit d2f60a3ab), I thought
it was only for catching server logic errors that caused accesses to
catalogs that were undergoing index rebuilds. However, it will also
fire in case of a user-defined index expression that attempts to
access its own table. We occasionally see reports of people trying
to do that, and typically getting unintelligible low-level errors
as a result. We can provide a more on-point message by making this
a regular runtime check.
While at it, adjust the similar error check in
systable_beginscan_ordered to use the same message text. That one
is (probably) not reachable without a coding bug, but we might as
well use a translatable message if we have one.
Per bug #18363 from Alexander Lakhin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18363-e3598a5a572d0699@postgresql.org
51efe38cb92f introduced bunch of tests for idle_in_transaction_session_timeout,
transaction_timeout and statement_timeout. These tests were too flaky on some
slow buildfarm machines, so we plan to replace them with TAP tests using
injection points. This commit removes flaky tests.
Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin
This commit introduces a new field 'sublevels_up' in ReplaceVarnoContext,
and enhances replace_varno_walker() to:
1) recurse into subselects with sublevels_up increased, and
2) perform the replacement only when varlevelsup is equal to sublevels_up.
This commit also fixes some outdated comments. And besides adding relevant
test cases, it makes some unification over existing SJE test cases.
Discussion: https://postgr.es/m/CAMbWs4-%3DPO6Mm9gNnySbx0VHyXjgnnYYwbN9dth%3DTLQweZ-M%2Bg%40mail.gmail.com
Author: Richard Guo
Reviewed-by: Andrei Lepikhov, Alexander Korotkov
build_child_join_sjinfo creates a derived SpecialJoinInfo in
the short-lived GEQO context, but afterwards the semi_rhs_exprs
from that may be used in a UniquePath for a child base relation.
This breaks the expectation that all base-relation-level structures
are in the planning-lifespan context, leading to use of a dangling
pointer with probable ensuing crash later on in create_unique_plan.
To fix, copy the expression trees when making a UniquePath.
Per bug #18360 from Alexander Lakhin. This has been broken since
partitionwise joins were added, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/18360-a23caf3157f34e62@postgresql.org
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.
The new facility is faster for small relations: Instead of of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.
It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already, this
moves the buffering to the new facility.
The changes to pageinspect GiST test needs an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
943f7ae1c869 has changed GetSlotInvalidationCause() so as it would
return the last element of SlotInvalidationCauses[] when an incorrect
conflict reason name is given by a caller, but this should return
RS_INVAL_NONE in such cases, even if such a state should never be
reached in practice.
Per gripe from Peter Smith.
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/CAHut+PtsrSWxczpGkSaSVtJo+BXrvJ3Hwp5gES14bbL-G+HL7A@mail.gmail.com