Per buildfarm member culicidae, the query that checks the WAL read
statistics reported by the WAL summarizer has proven to be unstable.
This commit replaces the one-time query with a polling query that waits
for the WAL read stats to appear, making the test more reliable on
machines that are slow to report stats.
This test was introduced in f4694e0f35, so backpatch down to v18.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/f35ba3db-fca7-4693-bc35-6db64488e4b1@gmail.com
Backpatch-through: 18
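A minimal C sketch of the polling idea described above, under stated assumptions: the real test is a Perl TAP script that polls with poll_query_until, whereas poll_until(), FakeStatsProbe and fake_stats_ready() below are hypothetical names, and the fake probe only stands in for the SQL query on the WAL read stats. The point is that a check that fails once is retried until a timeout instead of being taken as the final answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* A check returns true once the condition it polls for holds. */
typedef bool (*poll_check_fn) (void *arg);

/*
 * Re-run check() every interval_ms milliseconds until it returns true or
 * roughly timeout_ms milliseconds have been spent sleeping.  Returns
 * whether the condition was eventually satisfied.
 */
static bool
poll_until(poll_check_fn check, void *arg, long timeout_ms, long interval_ms)
{
	long		waited_ms = 0;

	if (interval_ms <= 0)
		interval_ms = 1;		/* avoid spinning forever without progress */

	for (;;)
	{
		struct timespec ts;

		if (check(arg))
			return true;
		if (waited_ms >= timeout_ms)
			return false;		/* caller must treat this as a failure */

		ts.tv_sec = interval_ms / 1000;
		ts.tv_nsec = (interval_ms % 1000) * 1000000L;
		nanosleep(&ts, NULL);
		waited_ms += interval_ms;
	}
}

/*
 * Hypothetical stand-in for the stats query: it only "sees" the WAL read
 * stats after succeed_after attempts, like stats flushed late on a slow host.
 */
typedef struct
{
	int			attempts;
	int			succeed_after;
} FakeStatsProbe;

static bool
fake_stats_ready(void *arg)
{
	FakeStatsProbe *probe = arg;

	return ++probe->attempts >= probe->succeed_after;
}
```

The timeout still matters: a poll that gives up must be reported as a test failure, otherwise a genuine regression would go unnoticed.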
The WAL receiver and WAL summarizer processes each gain a call to
pgstat_report_wal(), so that they report their WAL statistics to
pgstats and the data shows up in pg_stat_io.
In the WAL receiver, stats reports are timed with the status updates sent
to the primary, which depend on wal_receiver_status_interval and
wal_receiver_timeout. This is a conservative choice; perhaps we could
report stats more often. An interesting historical fact is that,
although the WAL receiver writes and syncs WAL, it has never reported
these statistics to pgstats in pg_stat_wal.
In the WAL summarizer, stats are reported each time the process waits
for WAL.
While at it, pg_stat_io is adjusted so that these two processes do not
report any rows when the IOObject is not WAL, making the view easier to
use with fewer rows.
Two TAP tests are added, checking statistics for the WAL summarizer and
the WAL receiver. The WAL receiver check lives in the recovery test
001_stream_rep.pl, since that is currently where status updates in the
WAL receiver can be exercised.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z8UKZyVSHUUQJHNb@paquier.xyz
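The pg_stat_io adjustment above amounts to a per-backend-type filter on I/O objects. The following is a simplified, hypothetical sketch of such a filter; the enums and function names are made up for illustration, and the real view also breaks rows down by I/O context, which is not modeled here.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for backend types and I/O objects. */
typedef enum
{
	SK_CLIENT_BACKEND,
	SK_WAL_RECEIVER,
	SK_WAL_SUMMARIZER
} SketchBackendType;

typedef enum
{
	SK_IOOBJ_RELATION,
	SK_IOOBJ_TEMP_RELATION,
	SK_IOOBJ_WAL,
	SK_IOOBJ_COUNT				/* number of I/O objects, not a real object */
} SketchIOObject;

/* Should a pg_stat_io-style row exist for this (backend type, object) pair? */
static bool
sketch_tracks_io_object(SketchBackendType bktype, SketchIOObject object)
{
	switch (bktype)
	{
		case SK_WAL_RECEIVER:
		case SK_WAL_SUMMARIZER:
			/* For these two processes, keep only the WAL rows. */
			return object == SK_IOOBJ_WAL;
		default:
			return true;
	}
}

/* Count the rows the view would show for one backend type. */
static int
sketch_row_count(SketchBackendType bktype)
{
	int			rows = 0;

	for (int obj = 0; obj < SK_IOOBJ_COUNT; obj++)
	{
		if (sketch_tracks_io_object(bktype, (SketchIOObject) obj))
			rows++;
	}
	return rows;
}
```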
For some reason this listed "-f" and "-w" as valid switches, even though
the code doesn't implement any such switches and the docs don't mention
them. As a result, trying to use one of these switches produced an
unhelpful error message.
Yusuke Sugie
Discussion: https://postgr.es/m/68e72a2a70f4d84c1c7847b13bcdaef8@oss.nttdata.com
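The message does not name the program, but switches are typically "listed" in the getopt() option string. A hypothetical sketch, assuming a plain POSIX getopt() loop with made-up -q and -v switches, of keeping that string in sync with the switch statement that handles it:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * List only the switches the switch statement below implements.  A letter
 * that appears here but has no case label is accepted by getopt() and then
 * lands in whatever the default branch does, which is rarely helpful.
 */
static const char sketch_optstring[] = "qv";

/* Returns 0 on success, -1 if an unknown switch was given. */
static int
sketch_parse_switches(int argc, char *argv[], bool *verbose, bool *quiet)
{
	int			c;

	*verbose = false;
	*quiet = false;
	optind = 1;					/* rescan from argv[1] on every call */
	opterr = 0;					/* suppress getopt's own message */

	while ((c = getopt(argc, argv, sketch_optstring)) != -1)
	{
		switch (c)
		{
			case 'q':
				*quiet = true;
				break;
			case 'v':
				*verbose = true;
				break;
			default:
				/* getopt() returns '?' and sets optopt for unknown switches */
				fprintf(stderr, "invalid option -- '%c'\n", optopt);
				return -1;
		}
	}
	return 0;
}
```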
Files in common/ and fe_utils/ that contain translatable strings need
to be listed in the nls.mk files of the programs that use them. (Not
great, but that's the way it works for now.) This usually requires
some manual analysis, which is done about once per major release during
the beta period. This time, I wrote a hackish script that works out
some of this automatically, so this update is a bit larger: it also
includes some files that were missed in the past.
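The message does not describe how the script works. As a hypothetical sketch of one plausible step, the C function below flags source text that calls a gettext trigger, which is what makes a file in common/ or fe_utils/ a candidate for a program's GETTEXT_FILES list in nls.mk. The trigger list is illustrative only; each nls.mk declares its real triggers in GETTEXT_TRIGGERS.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, illustrative marker list; not any program's real triggers. */
static const char *const sketch_triggers[] = {
	"_(", "gettext_noop(", "pg_log_error(", NULL
};

/*
 * Crude check: does this source text call any trigger?  A plain substring
 * search, so it can report false positives (e.g. "foo_(") and does not
 * understand comments or string literals.
 */
static bool
sketch_has_translatable_strings(const char *source_text)
{
	for (int i = 0; sketch_triggers[i] != NULL; i++)
	{
		if (strstr(source_text, sketch_triggers[i]) != NULL)
			return true;
	}
	return false;
}
```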
Run pgindent, pgperltidy, and reformat-dat-files.
The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list. Perhaps we should formalize that, or
better yet, find a way to get those typedefs into the automatic list.
pgperltidy is as opinionated as always, and so is reformat-dat-files.
Commit 6b80394781 introduced integer comparison functions designed
to be as efficient as possible while avoiding overflow. This
commit makes use of these functions in many of the in-tree qsort()
comparators to help ensure transitivity. Many of these comparator
functions should also see a small performance boost.
Author: Mats Kindahl
Reviewed-by: Andres Freund, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
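Why plain subtraction is a problem in a qsort() comparator, and what an overflow-safe replacement looks like, in a minimal C sketch. The names are hypothetical and this is not the code added by 6b80394781; it shows the general branch-free technique of subtracting two comparison results.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Overflow-safe three-way comparison.  The tempting "return a - b;" is
 * undefined behavior for, e.g., a = INT32_MAX and b = -1, and in practice
 * can wrap to the wrong sign, breaking the transitivity qsort() relies on.
 * Each comparison below yields 0 or 1, so the result is always -1, 0, or 1.
 */
static inline int
sketch_cmp_s32(int32_t a, int32_t b)
{
	return (a > b) - (a < b);
}

/* qsort() adapter for an array of int32_t. */
static int
sketch_qsort_cmp_s32(const void *pa, const void *pb)
{
	return sketch_cmp_s32(*(const int32_t *) pa, *(const int32_t *) pb);
}
```

For types narrower than int, such as int16_t on platforms with a 32-bit int, plain subtraction is safe because both operands are promoted to int first; the trick above matters for 32- and 64-bit values.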
The latest buildfarm failures show that after the insert, we don't
actually wait long enough for WAL summarization to catch up. Apparently
the on-disk state gets updated before the in-memory state, so if we
check the on-disk state to see whether we're caught up and then the
in-memory state to see exactly how far we've progressed, we can, if
unlucky, derive an older value of summarized_lsn, which messes up the
rest of the test.
Attempt to fix this by using pg_available_wal_summaries() everywhere in
the test and pg_get_wal_summarizer_state() nowhere.
Per buildfarm.
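The fix above boils down to answering both "are we caught up?" and "how far have we got?" from a single source. A hypothetical C sketch, assuming each row returned by pg_available_wal_summaries() carries a timeline ID plus start and end LSNs, modeled here by the made-up SketchWalSummary struct and restricted to a single timeline:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SketchLSN;		/* an LSN as a plain 64-bit position */

/* One row of a hypothetical snapshot of the available WAL summaries. */
typedef struct
{
	uint32_t	tli;
	SketchLSN	start_lsn;
	SketchLSN	end_lsn;
} SketchWalSummary;

/*
 * Highest end LSN covered by any summary in the snapshot, or 0 if there
 * are none.  Assumes a single timeline for simplicity.
 */
static SketchLSN
sketch_summarized_lsn(const SketchWalSummary *rows, size_t nrows)
{
	SketchLSN	max_end = 0;

	for (size_t i = 0; i < nrows; i++)
	{
		if (rows[i].end_lsn > max_end)
			max_end = rows[i].end_lsn;
	}
	return max_end;
}

/*
 * "Caught up?" and "how far?" are answered from the same snapshot, so the
 * two answers cannot disagree the way on-disk and in-memory state can.
 */
static bool
sketch_caught_up(const SketchWalSummary *rows, size_t nrows, SketchLSN target)
{
	return sketch_summarized_lsn(rows, nrows) >= target;
}
```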
Make the new tuple larger than the old one so that, hopefully, it won't
manage to squeeze into leftover free space on the same page. The test
is trying to verify that the UPDATE touches 2 pages, but if a HOT
update happens, it touches only one.
Per buildfarm.
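A rough C sketch of why a larger new tuple helps. It assumes 8-byte maximum alignment and 4-byte line pointers, and ignores the other HOT conditions (notably that no indexed column may change), so it illustrates the space argument only; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for this sketch: 8-byte maximum alignment, 4-byte line pointers. */
#define SKETCH_MAXALIGN(len)	(((len) + (size_t) 7) & ~(size_t) 7)
#define SKETCH_LINE_POINTER_SIZE ((size_t) 4)

/*
 * Rough space test: could the new tuple version go on the same heap page?
 * If this returns false, the new version must go to another page, so the
 * UPDATE touches two pages.  Other HOT requirements are not modeled.
 */
static bool
sketch_fits_on_same_page(size_t page_free_space, size_t new_tuple_len)
{
	return SKETCH_MAXALIGN(new_tuple_len) + SKETCH_LINE_POINTER_SIZE
		<= page_free_space;
}
```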
Analysis of buildfarm results showed that the code that was intended
to wait for the inserts performed by this test to complete did not
actually do so. Try to make that logic more robust.
Improve error checking elsewhere in the script, too, so that we
don't miss things like poll_query_until failing.
Along the way, fix a bit of pgindent damage introduced by commit
5ddf997347, which aimed to help us debug the failures that this commit
is trying to fix. That damage is making the buildfarm sad.
Discussion: http://postgr.es/m/CA+TgmobWFb8NqyfC31YnKAbZiXf9tLuwmyuvx=iYMXMniPQ4nw@mail.gmail.com
The tests in 002_blocks.pl are failing in the buildfarm from time to
time, but we don't know how to reproduce the failure elsewhere. The
most obvious explanation seems to be the unexpected disappearance of a
WAL summary file, so bump up the logging level in
RemoveWalSummaryIfOlderThan to try to help us spot such problems, and
print the cutoff time in addition to the removed filename. Also
adjust 002_blocks.pl to dump out a directory listing of the relevant
directory at various points.
This patch should be reverted once we sort out what's happening here.
Patch by me, reviewed by Nathan Bossart, who also reported the issue.
Discussion: http://postgr.es/m/20240124170846.GA2643050@nathanxps13
It missed an entry for tmp_check/, generated by the tests. While at it,
prepend a slash to "pg_walsummary" to restrict its check to the current
directory, as done elsewhere.
Oversights in ee1bfd1683.
This can dump the contents of the WAL summary files found in
pg_wal/summaries. Normally, this shouldn't really be something anyone
needs to do, but it may be needed for debugging problems with
incremental backup, or could possibly be useful to external tools.
Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com