Both BufFileSize() and BufFileAppend() contained Asserts to ensure the
given BufFile(s) had a valid fileset. A valid fileset isn't required in
either of these functions, so remove the Asserts and adjust the
comments accordingly.
This was noticed while work was being done on a new patch to call
BufFileSize() on a BufFile without a valid fileset. It seems there's
currently no code in the tree which could trigger these Asserts, so no
need to backpatch this, for now.
Reviewed-by: Peter Geoghegan, Matthias van de Meent, Tom Lane
Discussion: https://postgr.es/m/CAApHDvofgZT0VzydhyGH5MMb-XZzNDqqAbzf1eBZV5HDm3%2BosQ%40mail.gmail.com
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf6 and
8d9978a717, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
In the past, FileRead() and FileWrite() used types based on the Unix
read() and write() functions from before C and POSIX standardization,
though not exactly (we had int for amount instead of unsigned). In
commit 2d4f1ba6 we changed to the appropriate standard C types, just
like the modern POSIX functions they wrap, but again not exactly: the
return type stayed as int. In theory, a ssize_t value could be returned
by the underlying call that is too large for an int.
That wasn't really a live bug, because we don't expect PostgreSQL code
to perform reads or writes of gigabytes, and OSes probably apply
internal caps smaller than that anyway. This change is done on the
principle that the return might as well follow the standard interfaces
consistently.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1672202.1703441340%40sss.pgh.pa.us
jit.c and dfgr.c had a copy of the same code to check if a file exists
or not, with a twist: jit.c did not check for EACCES when failing the
stat() call for the path whose existence is tested. This refactored
routine will be used by an upcoming patch.
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for
fd.c's virtual file descriptors. The simple FileRead() and FileWrite()
functions are now implemented in terms of the vectored functions, to
avoid code duplication, and they are converted back to the corresponding
simple system calls further down (commit 15c9ac36). Later work will
make more interesting multi-iovec calls.
The traditional behavior of reporting a "fake" ENOSPC error is
simplified. It's now always set for non-failing writes, for the benefit
of callers that expect to log a meaningful "%m" if they determine that
the write was short. (Perhaps we should consider getting rid of that
expectation one day.)
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
Quotes are applied to GUCs in a very inconsistent way across the code
base, with a mix of double quotes or no quotes used. This commit
removes double quotes around all the GUC names that are obviously
referred to as parameters with non-English words (use of underscore,
mixed case, etc).
This is the result of a discussion with Álvaro Herrera, Nathan Bossart,
Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
As coded, the start block calculated by BufFileAppend() would overflow
once more than 16k files are used with a default block size. This issue
existed before b1e5c9fa9a, but there's no reason not to be clean about
it.
Per report from Coverity, with a fix suggested by Tom Lane.
The code previously relied on "long" as type to track block numbers,
which would be 4 bytes in all Windows builds or any 32-bit builds. This
limited the code to be able to handle up to 16TB of data with the
default block size of 8kB, like during a CLUSTER. This code now relies
on a more portable int64, which should be more than enough for at least
the next 20 years to come.
This issue has been reported back in 2017, but nothing was done about it
back then, so here we go now.
Reported-by: Peter Geoghegan
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com
This routine has been marked as NOT_USED since 20ad43b576 from 2000,
and a patch is planned to switch the logtape/tuplestore APIs to rely on
int64 rather than long for the block nunbers, which is more portable.
Keeping it is more confusing than anything at this stage, so let's get
rid of it entirely.
Thanks for Heikki Linnakangas for the poke on this one.
Discussion: https://postgr.es/m/5047be8c-7ee6-4dd5-af76-6c916c3103b4@iki.fi
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.
The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.
All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.
The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.
Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget. There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.
Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.
There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.
Add tests in src/test/modules/test_resowner.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
Instead of returning the number of characters in the RelFileNumber,
return the RelFileNumber itself. Continue to return the fork number,
as before, and additionally return the segment number.
parse_filename_for_nontemp_relation now rejects a RelFileNumber or
segment number that begins with a leading zero. Before, we accepted
such cases as relation filenames, but if we continued to do so after
this change, the function might return the same values for two
different files (e.g. 1234.5 and 001234.5 or 1234.005) which could be
annoying for callers. Since we don't actually ever generate filenames
with leading zeroes in the names, any such files that we find must
have been created by something other than PostgreSQL, and it is
therefore reasonable to treat them as non-relation files.
Along the way, change unlogged_relation_entry to store a RelFileNumber
rather than an OID. This update should have been made in
851f4cc75c, but it was overlooked.
It's trivial to make the update as part of this commit, perhaps more
trivial than it would have been without it, so do that.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
* sync_method is renamed to wal_sync_method.
* sync_method_options[] is renamed to wal_sync_method_options[].
* assign_xlog_sync_method() is renamed to assign_wal_sync_method().
* The names of the available synchronization methods are now
prefixed with "WAL_SYNC_METHOD_" and have been moved into a
WalSyncMethod enum.
* PLATFORM_DEFAULT_SYNC_METHOD is renamed to
PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is
renamed to DEFAULT_WAL_SYNC_METHOD.
These more descriptive names help distinguish the code for
wal_sync_method from the code for DataDirSyncMethod (e.g., the
recovery_init_sync_method configuration parameter and the
--sync-method option provided by several frontend utilities). This
change also prevents name collisions between the aforementioned
sets of code. Since this only improves the naming of internal
identifiers, there should be no behavior change.
Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
This commit renames RecoveryInitSyncMethod to DataDirSyncMethod and
moves it to common/file_utils.h. This is preparatory work for a
follow-up commit that will allow specifying the synchronization
method in frontend utilities such as pg_upgrade and pg_basebackup.
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZN2ZB4afQ2JbR9TA%40paquier.xyz
Presently, frontend code that needs to use these macros must either
include storage/fd.h, which declares several frontend-unsafe
functions, or duplicate the macros. This commit moves these macros
to common/file_utils.h, which is safe for both frontend and backend
code. Consequently, we can also remove the duplicated macros in
pg_checksums and stop including storage/fd.h in pg_rewind.
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZOP5qoUualu5xl2Z%40paquier.xyz
Run pgindent and pgperltidy. It seems we're still some ways
away from all committers doing this automatically. Now that
we have a buildfarm animal that will whine about poorly-indented
code, we'll try to keep the tree more tidy.
Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us
Starting with 4d330a61bb we can use posix_fallocate() to extend
files. Unfortunately in some situation, e.g. on tmpfs filesystems, EINTR may
be returned. See also 4518c798b2.
To fix, add a retry path to FileFallocate(). In contrast to 4518c798b2 the
amount we extend by is limited and the extending may happen at a high
frequency, so disabling signals does not appear to be the correct path here.
Also add retry paths to other file operations currently lacking them (around
fdatasync(), fsync(), ftruncate(), posix_fadvise(), sync_file_range(),
truncate()) - they are all documented or have been observed to return EINTR.
Even though most of these functions used in the back branches, it does not
seem worth the risk to backpatch - outside of the new-to-16 case of
posix_fallocate() I am not aware of problem reports due to the lack of
retries.
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/ZEZDj1H61ryrmY9o@msg.df7cb.de
Backpatch: -
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
Give the new GUC introduced by d4e71df6 a name that is clearly not
intended for mainstream use quite yet.
Future proposals would drop the prefix only after adding infrastructure
to make it efficient. Having the switch in the tree sooner is good
because it might lead to new discoveries about the hazards awaiting us
on a wide range of systems, but that name was too enticing and could
lead to cross-version confusion in future, per complaints from Noah and
Justin.
Suggested-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (the idea, not the patch)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (ditto)
Discussion: https://postgr.es/m/20230430041106.GA2268796%40rfd.leadboat.com
Provide a way to ask the kernel to use O_DIRECT (or local equivalent)
where available for data and WAL files, to avoid or minimize kernel
caching. This hurts performance currently and is not intended for end
users yet. Later proposed work would introduce our own I/O clustering,
read-ahead, etc to replace the facilities the kernel disables with this
option.
The only user-visible change, if the developer-only GUC is not used, is
that this commit also removes the obscure logic that would activate
O_DIRECT for the WAL when wal_sync_method=open_[data]sync and
wal_level=minimal (which also requires max_wal_senders=0). Those are
non-default and unlikely settings, and this behavior wasn't (correctly)
documented. The same effect can be achieved with io_direct=wal.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned. The exact requirements vary by OS and file system (typically
sectors and/or memory pages). The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size. There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.
Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.
Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer
pool is also aligned in shared memory.
WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.
BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).
If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0. This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead. That's an existing and tested class of system.
Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
smgrzeroextend() uses FileFallocate() to efficiently extend files by multiple
blocks. When extending by a small number of blocks, use FileZero() instead, as
using posix_fallocate() for small numbers of blocks is inefficient for some
file systems / operating systems. FileZero() is also used as the fallback for
FileFallocate() on platforms / filesystems that don't support fallocate.
A big advantage of using posix_fallocate() is that it typically won't cause
dirty buffers in the kernel pagecache. So far the most common pattern in our
code is that we smgrextend() a page full of zeroes and put the corresponding
page into shared buffers, from where we later write out the actual contents of
the page. If the kernel, e.g. due to memory pressure or elapsed time, already
wrote back the all-zeroes page, this can lead to doubling the amount of writes
reaching storage.
There are no users of smgrzeroextend() as of this commit. That will follow in
future commits.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
In instr_time.h it is stated that:
* When summing multiple measurements, it's recommended to leave the
* running sum in instr_time form (ie, use INSTR_TIME_ADD or
* INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end.
The reason for that is that converting to microseconds is not cheap, and can
loose precision. Therefore this commit changes 'PendingWalStats' to use
'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time'
and 'wal_sync_time'.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/1feedb83-7aa9-cb4b-5086-598349d3f555@gmail.com
In ancient times, these belonged to arguments or fields that were
actually of type long, but now they are not anymore, so this "L"
decoration is just confusing. (Some other 0L and other "L" constants
remain, where they are actually associated with a long type.)
Open long-lived data and WAL file descriptors with O_CLOEXEC. This flag
was introduced by SUSv4 (POSIX.1-2008), and by now all of our target
Unix systems have it. Our open() implementation for Windows already had
that behavior, so provide a dummy O_CLOEXEC flag on that platform.
For now, callers of open() and the "thin" wrappers in fd.c that deal in
raw descriptors need to pass in O_CLOEXEC explicitly if desired. This
commit does that for WAL files, and automatically for everything
accessed via VFDs including SMgrRelation and BufFile. (With more
discussion we might decide to turn it on automatically for the thin
open()-wrappers too to avoid risk of missing places that need it, but
these are typically used for short-lived descriptors where we don't
expect to fork/exec, and it's remotely possible that extensions could be
using these APIs and passing descriptors to subprograms deliberately, so
that hasn't been done here.)
Do the same for sockets and the postmaster pipe with FD_CLOEXEC. (Later
commits might use modern interfaces to remove these extra fcntl() calls
and more where possible, but we'll need them as a fallback for a couple
of systems, so do it that way in this initial commit.)
With this change, subprograms executed for archiving, copying etc will
no longer have access to the server's descriptors, other than the ones
that we decide to pass down.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
Commits 04cad8f7 and 0c088568 supported old macOS systems that didn't
define O_CLOEXEC or O_DSYNC yet, but those arrived in macOS releases
10.7 and 10.6 (respectively), which themselves reached EOL around a
decade ago. We've already made use of other POSIX features that early
macOS vintages can't compile (for example commits 623cc673, d2e15083).
A later commit will use O_CLOEXEC on POSIX systems so it would be
strange to pretend here that it's optional, and we might as well give
O_DSYNC the same treatment since the reference is also guarded by a test
for a macOS-specific macro, and we know that current Macs have it.
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
This makes sure that the internal logic of these functions does not
attempt to change the value of the arguments constified, and it removes
one unconstify() in basic_archive.c.
Author: Nathan Bossart
Reviewed-by: Andrew Dunstan, Peter Eisentraut
Discussion: https://postgr.es/m/20230114231126.GA2580330@nathanxps13
Most callers of BufFileRead() want to check whether they read the full
specified length. Checking this at every call site is very tedious.
This patch provides additional variants BufFileReadExact() and
BufFileReadMaybeEOF() that include the length checks.
I considered changing BufFileRead() itself, but this function is also
used in extensions, and so changing the behavior like this would
create a lot of problems there. The new names are analogous to the
existing LogicalTapeReadExact().
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f3501945-c591-8cc3-5ef0-b72a2e0eaa9c@enterprisedb.com
Commit 57faaf376 added pg_truncate(const char *path, off_t length), but
"length" was ignored under WIN32 and the file was unconditionally
truncated to 0.
There was no live bug, since the only caller passes 0.
Fix, and back-patch to 14 where the function arrived.
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20230106031652.GR3109%40telsasoft.com
Make the argument types of the File API match stdio better:
- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.
In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().
Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
This commit moves pg_pwritev_with_retry(), a convenience wrapper of
pg_writev() able to handle partial writes, to common/file_utils.c so
that the frontend code is able to use it. A first use-case targetted
for this routine is pg_basebackup and pg_receivewal, for the
zero-padding of a newly-initialized WAL segment. This is used currently
in the backend when the GUC wal_init_zero is enabled (default).
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Thomas Munro
Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of
the pg_ prefixes where we use pread(), pwrite() and vectored variants.
We dropped support for ancient Unixes where we needed to use lseek() to
implement replacements for those, but it turns out that Windows also
changes the current position even when you pass in an offset to
ReadFile() and WriteFile() if the file handle is synchronous, despite
its documentation saying otherwise.
Switching to asynchronous file handles would fix that, but have other
complications. For now let's just put back the pg_ prefix and add some
comments to highlight the non-standard side-effect, which we can now
describe as Windows-only.
Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.