In a similar effort to f736e188c and 110d81728, fixup various usages of
string functions where a more appropriate function is available and more
fit for purpose.
These changes include:
1. Use cstring_to_text_with_len() instead of cstring_to_text() when
working with a StringInfoData and the length can easily be obtained.
2. Use appendStringInfoString() instead of appendStringInfo() when no
formatting is required.
3. Use pstrdup(...) instead of psprintf("%s", ...)
4. Use pstrdup(...) instead of psprintf(...) (with no formatting)
5. Use appendPQExpBufferChar() instead of appendPQExpBufferStr() when the
length of the string being appended is 1.
6. appendStringInfoChar() instead of appendStringInfo() when no formatting
is required and string is 1 char long.
7. Use appendPQExpBufferStr(b, .) instead of appendPQExpBuffer(b, "%s", .)
8. Don't use pstrdup when it's fine to just point to the string constant.
I (David) did find other cases of #8 but opted to use #4 instead as I
wasn't certain enough that applying #8 was ok (e.g in hba.c)
Author: Ranier Vilela, David Rowley
Discussion: https://postgr.es/m/CAApHDvo2j2+RJBGhNtUz6BxabWWh2Jx16wMUMWKUjv70Ver1vg@mail.gmail.com
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).
Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls. While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order. Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.
Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
In a similar effort to f01592f91, here we're targetting fixing the
warnings where we've deemed the shadowing variable to serve a close enough
purpose to the shadowed variable just to reuse the shadowed version and
not declare the shadowing variable at all.
By my count, this takes the warning count from 106 down to 71.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
<sys/resource.h> is in SUSv2 and is on all targeted Unix systems. We
have a replacement for getrusage() on Windows, so let's just move its
declarations into src/include/port/win32/sys/resource.h so that we can
use a standard-looking #include. Also remove an obsolete reference to
CLK_TCK. Also rename src/port/getrusage.c to win32getrusage.c,
following the convention for Windows-only fallback code.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
getrlimit() is in SUSv2 and all targeted systems have it.
Windows doesn't have it. We could just use #ifndef WIN32, but for a
little more explanation about why we're making things conditional, let's
retain the HAVE_GETRLIMIT macro. It's defined in port.h for Unix systems.
On systems that have it, it's not necessary to test for RLIMIT_CORE,
RLIMIT_STACK or RLIMIT_NOFILE macros, since SUSv2 requires those and all
targeted systems have them. Also remove references to a pre-historic
alternative spelling of RLIMIT_NOFILE, and coding that seemed to believe
that Cygwin didn't have it.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
Remove meaningless "failures" column from the aggregate logging. It
was just a sum of "serialization failures" and "deadlock failures".
Pointed out by Tom Lane. Patch reviewed by Fabien COELHO.
Discussion: https://postgr.es/m/4183048.1649536705%40sss.pgh.pa.us
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless. This mostly involves
s/pg_log_fatal/pg_log_error/g.
Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1). Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.
Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.
Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.
Patch by me. Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.
Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
Commit 4a39f87acd changed the aggregated log format. Problem is, now
the explanatory paragraph for the log line in the document is too
long. Also the log format included more optional columns, and it's
harder to parse the log lines. This commit tries to solve the
problems.
- There's no optional log columns anymore. If a column is not
meaningful with provided pgbench option, it will be presented as 0.
- Reorder the log columns so that it's easier to parse them.
- Adjust explanatory paragraph for the log line in the doc.
Discussion: https://postgr.es/m/flat/202203280757.3tu4ovs3petm%40alvherre.pgsql
In the wake of commit 4a39f87ac, which noticeably increased the
size of struct StatsData and thereby ParsedScript, Coverity started
to complain that ParsedScript was unreasonably large to be passing
by value. The two places that do this are only used during setup,
so they're not really dragging down benchmark measurements --- but
gratuitous inefficiency is not a good look in a benchmarking program.
Convert to use pointers instead.
When serialization or deadlock errors are reported by backend, allow
to retry and continue the benchmarking. For this purpose new options
"--max-tries", "--failures-detailed" and "--verbose-errors" are added.
Transactions with serialization errors or deadlock errors will be
repeated after rollbacks until they complete successfully or reach the
maximum number of tries (specified by the --max-tries option), or the
maximum time of tries (specified by the --latency-limit option).
These options can be specified at the same time. It is not possible to
use an unlimited number of tries (--max-tries=0) without the
--latency-limit option or the --time option. By default the option
--max-tries is set to 1, which means transactions with
serialization/deadlock errors are not retried. If the last try fails,
this transaction will be reported as failed, and the client variables
will be set as they were before the first run of this transaction.
Statistics on retries and failures are printed in the progress,
transaction / aggregation logs and in the end with other results (all
and for each script). Also retries and failures are printed
per-command with average latency by using option
(--report-per-command, -r).
Option --failures-detailed prints group failures by basic types
(serialization failures / deadlock failures).
Option --verbose-errors prints distinct reports on errors and failures
(errors without retrying) by type with detailed information like which
limit for retries was violated and how far it was exceeded for the
serialization/deadlock failures.
Patch originally written by Marina Polyakova then Yugo Nagata
inherited the discussion and heavily modified the patch to make it
commitable.
Authors: Yugo Nagata, Marina Polyakova
Reviewed-by: Fabien Coelho, Tatsuo Ishii, Alvaro Herrera, Kevin Grittner, Andres Freund, Arthur Zakirov, Alexander Korotkov, Teodor Sigaev, Ildus Kurbangaliev
Discussion: https://postgr.es/m/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
Previously failures of initial connection and logfile open caused pgbench
to proceed the benchmarking, report the incomplete results and exit with
status 2. It didn't make sense to proceed the benchmarking even when
pgbench could not start as prescribed.
This commit improves pgbench so that early errors that occur when
starting benchmark such as those failures should make pgbench exit
immediately with status 1.
Author: Yugo Nagata
Reviewed-by: Fabien COELHO, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/TYCPR01MB5870057375ACA8A73099C649F5349@TYCPR01MB5870.jpnprd01.prod.outlook.com
Previously socket errors such as invalid socket or socket wait method failures
during benchmark caused pgbench to exit with status 0. Instead, errors during
the run should result in exit status 2.
Back-patch to v12 where pgbench started reporting exit status.
Original complaint and patch by Hayato Kuroda.
Author: Yugo Nagata, Fabien COELHO
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/TYCPR01MB5870057375ACA8A73099C649F5349@TYCPR01MB5870.jpnprd01.prod.outlook.com
The failure of socket wait method like "select()" doesn't terminate pgbench.
So the log level of error message when that failure happens should be ERROR.
But previously FATAL was used in that case.
Back-patch to v13 where pgbench started using common logging API.
Author: Yugo Nagata, Fabien COELHO
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/20210617005934.8bd37bf72efd5f1b38e6f482@sraoss.co.jp
When throttling is used, transactions that lag behind schedule by
more than the latency limit are counted and reported as skipped.
Previously, there was the case where pgbench counted all skipped
transactions even if the timer specified in -T option was exceeded.
This could take a very long time to do that especially when unrealistically
high rate setting in -R option caused quite a lot of transactions that
lagged behind schedule. This could prevent pgbench from ending
immediately, and so pgbench could look like it got stuck to users.
To fix the issue, this commit changes pgbench so that it stops counting
skipped transactions as soon as the timer is exceeded. The timer can
make pgbench end soon even when there are lots of skipped transactions
that have not been counted yet.
Note that there is no guarantee that all skipped transactions are
counted under -T though there is under -t. This is OK in practice
because it's very unlikely to happen with realistic setting. Also this is
not the issue that this commit newly introdues. There used to be
the case where pgbench ended without counting all skipped
transactions since before.
Back-patch to v14. Per discussion, we decided not to bother
back-patch to the stable branches because it's hard to imagine
the issue happens in practice (with realistic setting).
Author: Yugo Nagata, Fabien COELHO
Reviewed-by: Greg Sabino Mullane, Fujii Masao
Discussion: https://postgr.es/m/20210613040151.265ff59d832f835bbcf8b3ba@sraoss.co.jp
While populating the pgbench_accounts table, plain COPY was
unconditionally used. By changing it to COPY FREEZE, the time for
VACUUM is significantly reduced, thus the total time of "pgbench -i"
is also reduced. This only happens if pgbench runs against PostgreSQL
14 or later because COPY FREEZE in previous versions of PostgreSQL does
not bring the benefit. Also if partitioning is used, COPY FREEZE
cannot be used. In this case plain COPY will be used too.
Author: Tatsuo Ishii
Discussion: https://postgr.es/m/20210308.143907.2014279678657453983.t-ishii@gmail.com
Reviewed-by: Fabien COELHO, Laurenz Albe, Peter Geoghegan, Dean Rasheed
When -C/--connect option is specified, pgbench establishes and closes
the connection for each transaction. In this case pgbench needs to
measure the times taken for all those connections and disconnections,
to include the average connection time in the benchmark result.
But previously pgbench could not measure those disconnection delays.
To fix the bug, this commit makes pgbench measure the disconnection
delays whenever the connection is closed at the end of transaction,
if -C/--connect option is specified.
Back-patch to v14. Per discussion, we concluded not to back-patch to v13
or before because changing that behavior in stable branches would
surprise users rather than providing benefits.
Author: Yugo Nagata
Reviewed-by: Fabien COELHO, Tatsuo Ishii, Asif Rehman, Fujii Masao
Discussion: https://postgr.es/m/20210614151155.a393bc7d8fed183e38c9f52a@sraoss.co.jp
Commit 547f04e734 changed pgbench so that it used the measurement result
of connection delays in its benchmark report only when -C/--connect option
is specified. But previously those delays were unnecessarily measured
even when that option is not specified. Which was a waste of cycles.
This commit improves pgbench so that it avoids such unnecessary measurement.
Back-patch to v14 where commit 547f04e734 first appeared.
Author: Yugo Nagata
Reviewed-by: Fabien COELHO, Asif Rehman, Fujii Masao
Discussion: https://postgr.es/m/20210614151155.a393bc7d8fed183e38c9f52a@sraoss.co.jp
Up to now we did a PQconsumeInput() for each pipelined query, asking the OS
for more input - which it often won't have, as all results might already have
been sent. That turns out to have a noticeable performance impact.
Alvaro Herrera reviewed the idea to add the PQisBusy() check, but not this
concrete patch.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210720180039.23rivhdft3l4mayn@alap3.anarazel.de
Backpatch: 14, where libpq/pgbench pipelining was introduced.
The following changes are done:
- In pg_archivecleanup, the cleanup of older WAL segments would never
fail immediately.
- In pgbench, the initialization of a thread barrier would not fail
hard.
- In pg_recvlogical, a stat() failure never got the call.
- In pg_basebackup, two chmod() reported a failure without exit()'ing
when unpacking some tar data freshly received. It may be possible to
continue writing some data even after this failure, but that could be
confusing to the user at the end.
These are arguably bugs, but they would happen for code paths where a
failure is unlikely going to happen, so no backpatch is done.
Reviewed-by: Robert Haas, Fabien Coelho
Discussion: https://postgr.es/m/YQDMdB+B68yePFeT@paquier.xyz
Most of the integer options for command-line binaries now make use of a
single routine able to do the job, fixing issues with the detection of
sloppy values caused for example by the use of atoi(), that fails on
strings beginning with numerical characters with junk trailing
characters.
This commit cuts down the number of strings requiring translation by 26
per my count, switching the code to have two error types for invalid and
out-of-range values instead.
Much more could be done here, with float or even int64 options, but
int32 was the most appealing case as it is possible to rely on strtol()
to do the job reliably. Note that there are some exceptions for now,
like pg_ctl or pg_upgrade that use their own logging logic. A couple of
negative TAP tests required some adjustments for the new errors
generated.
pg_dump and pg_restore tracked the maximum number of parallel jobs
within the option parsing. The code is refactored a bit to track that
in the code dedicated to parallelism instead.
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CALj2ACXqdG9WhqVoJ9zYf-iZt7sgK7Szv5USs=he6NnWQ2ofTA@mail.gmail.com
Instead use the common/int.h functions to check for integer overflow
in a more C-standard-compliant fashion. This is motivated by recent
failures on buildfarm member moonjelly, where it appears that
development-tip gcc is optimizing without regard to the -fwrapv
switch. Presumably that's a gcc bug that will be fixed soon, but
we might as well install cleaner coding here rather than wait.
(This does not address the question of whether we'll ever be able
to get rid of using -fwrapv. Testing shows that this spot is the
only place where doing so creates visible regression test failures,
but unfortunately that proves very little.)
Back-patch to v12. The common/int.h functions exist in v11, but
that branch doesn't use them in any client-side code. I judge
that this case isn't interesting enough in the real world to take
even a small risk of issues from being the first such use.
Tom Lane and Fabien Coelho
Discussion: https://postgr.es/m/73927.1624815543@sss.pgh.pa.us
Commit 547f04e73 caused pgbench to start printing its version number,
which seems like a fine idea, but it needs a bit more work:
* Print the server version number too, when different.
* Print the PG_VERSION string, not some reconstructed approximation.
This patch copies psql's well-tested code for the same purpose.
Discussion: https://postgr.es/m/1226654.1624036821@sss.pgh.pa.us
We've accumulated quite a mix of instances of "an SQL" and "a SQL" in the
documents. It would be good to be a bit more consistent with these.
The most recent version of the SQL standard I looked at seems to prefer
"an SQL". That seems like a good lead to follow, so here we change all
instances of "a SQL" to become "an SQL". Most instances correctly use
"an SQL" already, so it also makes sense to use the dominant variation in
order to minimise churn.
Additionally, there were some other abbreviations that needed to be
adjusted. FSM, SSPI, SRF and a few others. Also fix some pronounceable,
abbreviations to use "a" instead of "an". For example, "a SASL" instead
of "an SASL".
Here I've only adjusted the documents and error messages. Many others
still exist in source code comments. Translator hint comments seem to be
the biggest culprit. It currently does not seem worth the churn to change
these.
Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com
Also "make reformat-dat-files".
The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
This adds a new function, permute(), that generates pseudorandom
permutations of arbitrary sizes. This can be used to randomly shuffle
a set of values to remove unwanted correlations. For example,
permuting the output from a non-uniform random distribution so that
all the most common values aren't collocated, allowing more realistic
tests to be performed.
Formerly, hash() was recommended for this purpose, but that suffers
from collisions that might alter the distribution, so recommend
permute() for this purpose instead.
Fabien Coelho and Hironobu Suzuki, with additional hacking be me.
Reviewed by Thomas Munro, Alvaro Herrera and Muhammad Usama.
Discussion: https://postgr.es/m/alpine.DEB.2.21.1807280944370.5142@lancre
This commit improves pgbench \sleep command so that it handles
the following three cases more properly.
(1) When only one argument was specified in \sleep command and
it's not a number, previously pgbench reported a confusing error
message like "unrecognized time unit, must be us, ms or s".
This commit fixes this so that more proper error message like
"invalid sleep time, must be an integer" is reported.
(2) When two arguments were specified in \sleep command and
the first argument was not a number, previously pgbench treated
that argument as the sleep time 0. No error was reported in this
case. This commit fixes this so that an error is thrown in this
case.
(3) When a variable was specified as the first argument in \sleep
command and the variable stored non-digit value, previously
pgbench treated that argument as the sleep time 0. No error
was reported in this case. This commit fixes this so that
an error is thrown in this case.
Author: Kota Miyake
Reviewed-by: Hayato Kuroda, Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/23b254daf20cec4332a2d9168505dbc9@oss.nttdata.com
Commit 547f04e7 produced errors on AIX/xlc while building plpython. The
new code appears to be incompatible with the hack installed by commit
a11cf433. Without access to an AIX system to check, my guess is that
_POSIX_C_SOURCE may be required for <time.h> to declare the things the
header needs to see, but plpython.h undefines it.
For now, to unbreak build farm animal hoverfly, just move the new
pg_time_usec_t support into pgbench.c. Perhaps later we could figure
out what to rearrange to put it back into a header for wider use.
Discussion: https://postgr.es/m/CA%2BhUKG%2BP%2BjcD%3Dx9%2BagyTdWtjpOT64MYiGic%2Bcbu_TD8CV%3D6A3w%40mail.gmail.com
1. pg_time_usec_t needs to be printed with INT64_FORMAT, not %ld, or 32
bit systems complain, per lapwing.
2. Some Windows compilers didn't like a thread function not marked with
__stdcall, per whelk; let's see if this fixes the problem.