If a data block to be skipped over is less than 4kB, just read the
data instead of using fseeko(). Experimentation shows that this
avoids useless kernel calls --- possibly quite a lot of them, at
least with current glibc --- while not incurring any extra I/O,
since libc will read 4kB at a time anyway. (There may be platforms
where the default buffer size is different from 4kB, but this
change seems unlikely to hurt in any case.)
We don't expect short data blocks to be common in the wake of
66ec01dc4 and related commits. But older pg_dump files may well
contain very short data blocks, and that will likely be a case
to be concerned with for a long time.
While here, do a little bit of other cleanup in _skipData.
Make "buflen" be size_t not int; it can't really exceed the
range of int, but comparing size_t and int variables is just
asking for trouble. Also, when we initially allocate a buffer
for reading skipped data into, make sure it's at least 4kB to
reduce the odds that we'll shortly have to realloc it bigger.
Author: Dimitrios Apostolou <jimis@gmx.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2edb7a57-b225-3b23-a680-62ba90658fec@gmx.net
Presently, "pg_dump --format=custom" calls WriteToc() twice. The
second call updates the data offset information, which allegedly
makes parallel pg_restore significantly faster. However, if we're
not dumping any data, there are no data offsets to update, so we
can skip this step.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
Commit c0d5be5d6 caused pg_dump to create a separate BLOB metadata TOC
entry for each large object (blob), but it did not touch the ancient
decision to put all the blobs' data into a single "BLOBS" TOC entry.
This is bad for a few reasons: for databases with millions of blobs,
the TOC becomes unreasonably large, causing performance issues;
selective restore of just some blobs is quite impossible; and we
cannot parallelize either dump or restore of the blob data, since our
architecture for that relies on farming out whole TOC entries to
worker processes.
To improve matters, let's group multiple blobs into each blob metadata
TOC entry, and then make corresponding per-group blob data TOC entries.
Selective restore using pg_restore's -l/-L switches is then possible,
though only at the group level. (Perhaps we should provide a switch
to allow forcing one-blob-per-group for users who need precise
selective restore and don't have huge numbers of blobs. This patch
doesn't do that, instead just hard-wiring the maximum number of blobs
per entry at 1000.)
The blobs in a group must all have the same owner, since the TOC entry
format only allows one owner to be named. In this implementation
we also require them to all share the same ACL (grants); the archive
format wouldn't require that, but pg_dump's representation of
DumpableObjects does. It seems unlikely that either restriction
will be problematic for databases with huge numbers of blobs.
The metadata TOC entries now have a "desc" string of "BLOB METADATA",
and their "defn" string is just a newline-separated list of blob OIDs.
The restore code has to generate creation commands, ALTER OWNER
commands, and drop commands (for --clean mode) from that. We would
need special-case code for ALTER OWNER and drop in any case, so the
alternative of keeping the "defn" as directly executable SQL code
for creation wouldn't buy much, and it seems like it'd bloat the
archive to little purpose.
Since we require the blobs of a metadata group to share the same ACL,
we can furthermore store only one copy of that ACL, and then make
pg_restore regenerate the appropriate commands for each blob. This
saves space in the dump file not only by removing duplicative SQL
command strings, but by not needing a separate TOC entry for each
blob's ACL. In turn, that reduces client-side memory requirements for
handling many blobs.
ACL TOC entries that need this special processing are labeled as
"ACL"/"LARGE OBJECTS nnn..nnn". If we have a blob with a unique ACL,
continue to label it as "ACL"/"LARGE OBJECT nnn". We don't actually
have to make such a distinction, but it saves a few cycles during
restore for the easy case, and it seems like a good idea to not change
the TOC contents unnecessarily.
The data TOC entries ("BLOBS") are exactly the same as before,
except that now there can be more than one, so we'd better give them
identifying tag strings.
Also, commit c0d5be5d6 put the new BLOB metadata TOC entries into
SECTION_PRE_DATA, which perhaps is defensible in some ways, but
it's a rather odd choice considering that we go out of our way to
treat blobs as data. Moreover, because parallel restore handles
the PRE_DATA section serially, this means we'd only get part of the
parallelism speedup we could hope for. Move these entries into
SECTION_DATA, letting us parallelize the lo_create calls not just the
data loading when there are many blobs. Add dependencies to ensure
that we won't try to load data for a blob we've not yet created.
As this stands, we still generate a separate TOC entry for any comment
or security label attached to a blob. I feel comfortable in believing
that comments and security labels on blobs are rare, so this patch
should be enough to get most of the useful TOC compression for blobs.
We have to bump the archive file format version number, since existing
versions of pg_restore wouldn't know they need to do something special
for BLOB METADATA, plus they aren't going to work correctly with
multiple BLOBS entries or multiple-large-object ACL entries.
The directory and tar-file format handlers need some work
for multiple BLOBS entries: they used to hard-wire the file name
as "blobs.toc", which is replaced here with "blobs_<dumpid>.toc".
The 002_pg_dump.pl test script also knows about that and requires
minor updates. (I had to drop the test for manually-compressed
blobs.toc files with LZ4, because lz4's obtuse command line
design requires explicit specification of the output file name
which seems impractical here. I don't think we're losing any
useful test coverage thereby; that test stanza seems completely
duplicative with the gzip and zstd cases anyway.)
In passing, centralize management of the lo_buf used to hold data
while restoring blobs. The code previously had each format handler
create lo_buf, which seems rather pointless given that the format
handlers all make it the same way. Moreover, the format handlers
never use lo_buf directly, making this setup a failure from a
separation-of-concerns standpoint. Let's move the responsibility into
pg_backup_archiver.c, which is the only module concerned with lo_buf.
The reason to do this in this patch is that it allows a centralized
fix for the now-false assumption that we never restore blobs in
parallel. Also, get rid of dead code in DropLOIfExists: it's been a
long time since we had any need to be able to restore to a pre-9.0
server.
Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
Switch pg_dump to use the Compression API, implemented by bf9aa490db.
The CompressFileHandle replaces the cfp* family of functions with a
struct of callbacks for accessing (compressed) files. This allows adding
new compression methods simply by introducing a new struct instance with
appropriate implementation of the callbacks.
Archives compressed using custom compression methods store an identifier
of the compression algorithm in their header instead of the compression
level. The header version is bumped.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, Tomas Vondra
Discussion: https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com
For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.
This patch renames internal functions, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead. There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry. Additional long command-line options
are added with the new naming.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com
Compression specifications are currently used by pg_basebackup and
pg_receivewal, and are able to let the user control in an extended way
the method and level of compression used. As an effect of this commit,
pg_dump's -Z/--compress is now able to use more than just an integer, as
of the grammar "method[:detail]".
The method can be either "none" or "gzip", and can optionally take a
detail string. If the detail string is only an integer, it defines the
compression level. A comma-separated list of keywords can also be used
method allows for more options, the only keyword supported now is
"level".
The change is backward-compatible, hence specifying only an integer
leads to no compression for a level of 0 and gzip compression when the
level is greater than 0.
Most of the code changes are straight-forward, as pg_dump was relying on
an integer tracking the compression level to check for gzip or no
compression. These are changed to use a compression specification and
the algorithm stored in it.
As of this change, note that the dump format is not bumped because there
is no need yet to track the compression algorithm in the TOC entries.
Hence, we still rely on the compression level to make the difference
when reading them. This will be mandatory once a new compression method
is added, though.
In order to keep the code simpler when parsing the compression
specification, the code is changed so as pg_dump now fails hard when
using gzip on -Z/--compress without its support compiled, rather than
enforcing no compression without the user knowing about it except
through a warning. Like before this commit, archive and custom formats
are compressed by default when the code is compiled with gzip, and left
uncompressed without gzip.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in pg_dump/pg_dumpall
related code.
Affected code happens to be inconsistent in how it applies conventions
around how Archive and Archive Handle variables are named. Significant
code churn is required to fully fix those inconsistencies, so take the
least invasive approach possible: treat function definition names as
authoritative, and mechanically adjust corresponding names from function
definitions to match.
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAH2-Wzmma+vzcO6gr5NYDZ+sx0G14aU-UrzFutT2FoRaisVCUQ@mail.gmail.com
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless. This mostly involves
s/pg_log_fatal/pg_log_error/g.
Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1). Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.
Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.
Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.
Patch by me. Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.
Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
pg_dump produces custom-format archive files that lack data offsets
when it is unable to seek its output. Up to now that's been a hazard
for pg_restore. But if pg_restore is able to seek in the archive
file, there is no reason to throw up our hands when asked to restore
data blocks out of order. Instead, whenever we are searching for a
data block, record the locations of the blocks we passed over (that
is, fill in the missing data-offset fields in our in-memory copy of
the TOC data). Then, when we hit a case that requires going
backwards, we can just seek back.
Also track the furthest point that we've searched to, and seek back
to there when beginning a search for a new data block. This avoids
possible O(N^2) time consumption, by ensuring that each data block
is examined at most twice. (On Unix systems, that's at most twice
per parallel-restore job; but since Windows uses threads here, the
threads can share block location knowledge, reducing the amount of
duplicated work.)
We can also improve the code a bit by using fseeko() to skip over
data blocks during the search.
This is all of some use even in simple restores, but it's really
significant for parallel pg_restore. In that case, we require
seekability of the input already, and we will very probably need
to do out-of-order restores.
Back-patch to v12, as this fixes a regression introduced by commit
548e50976. Before that, parallel restore avoided requesting
out-of-order restores, so it would work on a data-offset-less
archive. Now it will again.
Ideally this patch would include some test coverage, but there are
other open bugs that need to be fixed before we can extend our
coverage of parallel restore very much. Plan to revisit that later.
David Gilman and Tom Lane; reviewed by Justin Pryzby
Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com
We do not really need to track the file position by hand. We were
already relying on ftello() whenever the archive file is seekable,
while if it's not seekable we don't need the file position info
anyway because we're not going to be able to re-write the TOC.
Moreover, that tracking was buggy since it failed to account for
the effects of fseeko(). Somewhat remarkably, that seems not to
have made for any live bugs up to now. We could fix the oversights,
but it seems better to just get rid of the whole error-prone mess.
In itself this is merely code cleanup. However, it's necessary
infrastructure for an upcoming bug-fix patch (because that code
*does* need valid file position after fseeko). The bug fix
needs to go back as far as v12; hence, back-patch that far.
Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies. That makes it
pointless to have distinct libraries at all. The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.
We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)
Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511. There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.
Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
Previously, the way this worked was that a parallel pg_dump would
re-order the TABLE_DATA items in the dump's TOC into decreasing size
order, and separately re-order (some of) the INDEX items into decreasing
size order. Then pg_dump would dump the items in that order. Later,
parallel pg_restore just followed the TOC order. This method had lots
of deficiencies:
* TOC ordering randomly differed between parallel and non-parallel
dumps, and was hard to predict in the former case, causing problems
for building stable pg_dump test cases.
* Parallel restore only followed a well-chosen order if the dump had
been done in parallel; in particular, this never happened for restore
from custom-format dumps.
* The best order for restore isn't necessarily the same as for dump,
and it's not really static either because of locking considerations.
* TABLE_DATA and INDEX items aren't the only things that might take a lot
of work during restore. Scheduling was particularly stupid for the BLOBS
item, which might require lots of work during dump as well as restore,
but was left to the end in either case.
This patch removes the logic that changed the TOC order, fixing the
test instability problem. Instead, we sort the parallelizable items
just before processing them during a parallel dump. Independently
of that, parallel restore prioritizes the ready-to-execute tasks
based on the size of the underlying table. In the case of dependent
tasks such as index, constraint, or foreign key creation, the largest
relevant table is used as the metric for estimating the task length.
(This is pretty crude, but it should be enough to avoid the case we
want to avoid, which is ending the run with just a few large tasks
such that we can't make use of all N workers.)
Patch by me, responding to a complaint from Peter Eisentraut,
who also reviewed the patch.
Discussion: https://postgr.es/m/5137fe12-d0a2-4971-61b6-eb4e7e8875f8@2ndquadrant.com
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The existing APIs for creating and parsing command and status messages are
rather messy; for example, archive-format modules have to provide code
for constructing command messages, which is entirely pointless since
the code to read them is hard-wired in WaitForCommands() and hence
no format-specific variation is actually possible. But there's little
foreseeable reason to need format-specific variation anyway.
The situation for status messages is no better; at least those are both
constructed and parsed by format-specific code, but said code is quite
redundant since there's no actual need for format-specific variation.
To add insult to injury, the first API involves returning pointers to
static buffers, which is bad, while the second involves returning pointers
to malloc'd strings, which is safer but randomly inconsistent.
Hence, get rid of the MasterStartParallelItem and MasterEndParallelItem
APIs, and instead write centralized functions that construct and parse
command and status messages. If we ever do need more flexibility, these
functions can be the standard implementations of format-specific
callback methods, but that's a long way off if it ever happens.
Tom Lane, reviewed by Kevin Grittner
Discussion: <17340.1464465717@sss.pgh.pa.us>
The ListenToWorkers/ReapWorkerStatus APIs were messy and hard to use.
Instead, make DispatchJobForTocEntry register a callback function that
will take care of state cleanup, doing whatever had been done by the caller
of ReapWorkerStatus in the old design. (This callback is essentially just
the old mark_work_done function in the restore case, and a trivial test for
worker failure in the dump case.) Then we can have ListenToWorkers call
the callback immediately on receipt of a status message, and return the
worker to WRKR_IDLE state; so the WRKR_FINISHED state goes away.
This allows us to design a unified wait-for-worker-messages loop:
WaitForWorkers replaces EnsureIdleWorker and EnsureWorkersFinished as well
as the mess in restore_toc_entries_parallel. Also, we no longer need the
fragile API spec that the caller of DispatchJobForTocEntry is responsible
for ensuring there's an idle worker, since DispatchJobForTocEntry can just
wait until there is one.
In passing, I got rid of the ParallelArgs struct, which was a net negative
in terms of notational verboseness, and didn't seem to be providing any
noticeable amount of abstraction either.
Tom Lane, reviewed by Kevin Grittner
Discussion: <1188.1464544443@sss.pgh.pa.us>
It was using %u to read a string that was earlier produced by snprintf with %d
into a signed integer variable. This seems to work in practice but is
incorrect.
found by cppcheck
Rather than passing around DumpOptions and RestoreOptions as separate
arguments, add fields to struct Archive to carry pointers to these objects,
and access them through those fields when needed. There already was a
RestoreOptions pointer in Archive, though for no obvious reason it was part
of the "private" struct rather than out where pg_dump.c could see it.
Doing this allows reversion of quite a lot of parameter-addition changes
made in commit 0eea8047bf, which is a good thing IMO because this will
reduce the code delta between 9.4 and 9.5, probably easing a few future
back-patch efforts. Moreover, the previous commit only added a DumpOptions
argument to functions that had to have it at the time, which means we could
anticipate still more code churn (and more back-patch hazard) as the
requirement spread further. I'd hit exactly that problem in my upcoming
patch to fix extension membership marking, which is what motivated me to
do this.
Most pg_dump.c global variables, which were passed down individually to
dumping routines, are now grouped as members of the new DumpOptions
struct, which is used as a local variable and passed down into routines
that need it. This helps future development efforts; in particular it
is said to enable a mode in which a parallel pg_dump run can output
multiple streams, and have them restored in parallel.
Also take the opportunity to clean up the pg_dump header files somewhat,
to avoid circularity.
Author: Joachim Wieland, revised by Álvaro Herrera
Reviewed by Peter Eisentraut
This was accidentally broken in commits cfa1b4a711/5e8e794e3b.
It saves a line or so to call ftello unconditionally in _CloseArchive,
but we have to expect that it might fail if we're not in hasSeek mode.
Per report from Bernd Helmle.
In passing, improve _getFilePos to print an appropriate message if
ftello fails unexpectedly, rather than just a vague complaint about
"ftell mismatch".
Rather than reset errno (or just hope that its cleared already),
check just the result of the ftello for < 0 to determine if there
was an issue.
Oversight by me, pointed out by Tom.
Improve pg_dump by checking results on various fgetc() calls which
previously were unchecked, ditto for ftello. Also clean up a couple
of very minor memory leaks by waiting to allocate structures until
after the initial check(s).
Issues spotted by Coverity.
Move functions used only by pg_dump and pg_restore from dumputils.c to a new
file, pg_backup_utils.c. dumputils.c is linked into psql and some programs
in bin/scripts, so it seems good to keep it slim. The parallel functionality
is moved to parallel.c, as is exit_horribly, because the interesting code in
exit_horribly is parallel-related.
This refactoring gets rid of the on_exit_msg_func function pointer. It was
problematic, because a modern gcc version with -Wmissing-format-attribute
complained if it wasn't marked with PF_PRINTF_ATTRIBUTE, but the ancient gcc
version that Tom Lane's old HP-UX box has didn't accept that attribute on a
function pointer, and gave an error. We still use a similar function pointer
trick for getLocalPQBuffer() function, to use a thread-local version of that
in parallel mode on Windows, but that dodges the problem because it doesn't
take printf-like arguments.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines. We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.
The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.
At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly. To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc(). This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.
This lets us clean up some places that were already with
localized hacks.
Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that. libpgcommon infrastructure was authored by Álvaro.
We had a number of variants on the theme of "malloc or die", with the
majority named like "pg_malloc", but by no means all. Standardize on the
names pg_malloc, pg_malloc0, pg_realloc, pg_strdup. Get rid of pg_calloc
entirely in favor of using pg_malloc0.
This is an essentially cosmetic change, so no back-patch. (I did find
a couple of places where psql and pg_dump were using plain malloc or
strdup instead of the pg_ versions, but they don't look significant
enough to bother back-patching.)
The old code was using exit_horribly or die_horribly other depending on
whether it had an ArchiveHandle on which to close the connection or not;
but there were places that were passing a NULL ArchiveHandle to
die_horribly, and other places that used exit_horribly while having an
AH available. So there wasn't all that much consistency.
Improve the situation by keeping only one of the routines, and instead
of having to pass the AH down from the caller, arrange for it to be
present for an on_exit_nicely callback to operate on.
Author: Joachim Wieland
Some tweaks by me
Per a suggestion from Robert Haas, in the ongoing "parallel pg_dump"
saga.