87776bc9 updated the RHEL documentation to PostgreSQL 13/14 but did not update recovery highlighting to be compatible with RHEL. This was not caught because the RHEL documentation was being built as PDF, which does not do highlighting.
Instead build the RHEL documentation as HTML in the first stage (and PDF in the second) so the error is caught.
Finally, fix the RHEL documentation to generate the highlight by concatenating the log.
The documentation was a bit misleading regarding how incremental backups are expired. Update the misleading part ("Differential and Incremental backups are count-based...") and move the explanation of how incremental expiration works out of the differential expiration section into the introductory paragraph.
Also add a note explaining that full backups are treated as differential backups for the purpose of expiration.
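For example (a hypothetical illustration of that rule, using repo1-retention-diff=2): given backups F1, D1, F2, D2 in age order, the two most recent differentials are F2 and D2 because the full backup F2 counts as a differential here, so D1 and any incrementals that depend on it are expired.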
37544da5 missed another place where 57ffd2df had accidentally hard-coded the integration test architecture to x86_64.
Fix the test code to use the correct image based on architecture.
Warn about calls to formatted input/output functions such as sprintf and vsprintf that might overflow the destination buffer.
This warning found a few short float buffers. In practice these are unlikely to ever be a problem based on our float usage, but it is best to be safe.
Note that this warning only requires the buffer size to be 317, but it must be 318 to prevent warnings from -Wformat-truncation=2. We are not ready to enable that warning yet, but it seems better to get the buffer size correct now.
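A standalone illustration of the required size (not project code):

    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        // -DBL_MAX formats as a sign, 309 integer digits, a decimal point,
        // and six fractional digits: 317 characters plus the terminating
        // NUL, so the buffer must be at least 318 bytes
        char buffer[318];
        const int length = snprintf(buffer, sizeof(buffer), "%f", -DBL_MAX);
        printf("%d\n", length); // prints 317
        return 0;
    }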
The Vagrantfile has not been maintained in years and the Dockerfile is only used by a single developer. There are instructions for building a development environment in CONTRIBUTING.md so these build files are no longer required.
Some options can contain multiple key/value pairs. However, if the key is specified again the value will be silently overwritten. In most cases one value per key is appropriate, but it makes sense to warn the user about the overwrite.
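For example (hypothetical configuration), the second line below wins; previously it silently overwrote the first, and now a warning is emitted as well:

    recovery-option=primary_conninfo=host=pg1
    recovery-option=primary_conninfo=host=pg2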
Currently the pg-socket-path option is required to be a valid absolute path but this precludes the use of abstract domain sockets.
Set the option type to string so abstract domain sockets are allowed. This removes some validation but libpq will report if the path is invalid and we don't use it for any other purpose.
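For example (assuming PostgreSQL 14 or later listening on an abstract domain socket, which PostgreSQL specifies with a leading '@'):

    pg1-socket-path=@postgres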
The content-md5 header was generated for all requests with content but it is only required for batch delete requests. It is not clear why this header is required when x-amz-content-sha256 is also provided, or why it is required only for this request, but the documentation is clear on the matter. However, the content for these requests is relatively small compared to uploading files, so omitting content-md5 where possible will save some CPU cycles.
Current AWS S3 and recent Minio do not complain if this header is missing, but since it is still required by older versions of Minio and it is specified in the documentation for batch delete it makes sense to keep it.
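For reference, a sketch of the batch delete request where the header is kept (host, key, and header values are illustrative):

    POST /?delete HTTP/1.1
    Host: demo-bucket.s3.amazonaws.com
    Content-MD5: <base64-encoded MD5 of the XML body>
    x-amz-content-sha256: <hex-encoded SHA-256 of the XML body>

    <Delete><Object><Key>archive/000000010000000000000001</Key></Object></Delete>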
This macro allows static List objects to be created, which simplifies usage when passing lists to functions. Also, since List objects are commonly used, this makes the code base a bit more consistent.
For now skip static lists that are only used locally within a function since the benefit is not as clear.
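A minimal standalone sketch of the idea (the struct, macro, and function names here are illustrative, not the actual pgBackRest List API):

    #include <stddef.h>
    #include <stdio.h>

    typedef struct StaticList
    {
        const char **items;
        size_t size;
    } StaticList;

    // Build a list from a compound literal so callers can pass one to a
    // function without declaring and populating it first
    #define STATIC_LIST(...)                                                  \
        (&(const StaticList){                                                 \
            .items = (const char *[]){__VA_ARGS__},                           \
            .size = sizeof((const char *[]){__VA_ARGS__}) / sizeof(const char *)})

    static void listPrint(const StaticList *const list)
    {
        for (size_t listIdx = 0; listIdx < list->size; listIdx++)
            printf("%s\n", list->items[listIdx]);
    }

    int main(void)
    {
        listPrint(STATIC_LIST("full", "diff", "incr"));
        return 0;
    }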
The prior documentation said that multiple stanzas should be specified by repeating the tls-server-auth option. This is incorrect -- in fact a comma-separated list of stanzas should be used.
Fix the documentation and add a test to confirm this behavior.
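For example, authorizing a client certificate CN for two stanzas (all names are illustrative):

    tls-server-auth=pgbackrest-client=stanza1,stanza2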
In passing add some const qualifiers that were missing in the relevant code.
57ffd2df accidentally hard-coded the integration test architecture to x86_64 and it was not noticed because in CI all the integration tests run on that architecture.
Fix the test code to retrieve the current architecture and use it for integration tests.
This method was introduced in cad595f but on further reflection it does not seem worth the added complexity just to make restore consistency faster without improving the speed of the overall backup. The most common recovery case is PITR and this method produces diminishing returns as the recovery time gets further from the backup end time.
A better solution (not implemented here) is to copy unmodified files from prior backups. This is much faster than recopying and compressing files from the cluster (especially on object stores with a copy command) and can even be done after the backup window to further reduce WAL replay required for consistency. It also reduces load on the host where the backup is made.
Add support for the verify command --set option. This (internal) option was already accepted without errors but was not implemented.
The default behavior for verify is to check all the backups present. With the --set option only the specified backup will be verified. If the specified backup label is not valid an error is added to the result and verification is skipped. In addition, only WAL required to make the specified backup consistent will be verified.
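For example, verifying a single backup (the stanza and label are illustrative):

    pgbackrest --stanza=demo --set=20250101-010101F verify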
The prior text implied that verify could only operate on a single repository. Make the summary more general to indicate that the command can work on any repository.
RHEL 7 is EOL so it needs to come out of the label. Rather than update the version range, remove versions from the label entirely since the user guide is generally applicable to RHEL.
PostgreSQL 12 is EOL and no longer available in the yum.postgresql.org repository.
Update the base and update versions of the RHEL and Debian documentation to better cover supported versions.
If the selected backup to restore was not in the default (lowest-numbered) repository and block incremental was used, then restore would erroneously try to load the file super block list from the default repository. Specifying --repo would fix this since it changed the default repository.
Fix by updating the super block read to the specified repository.
This macro is only ever called last in functions, so this is not an active issue, but it makes sense to fix it since it would pose a risk for future development.
Add a new output option to the version command that defaults to text (current behavior) but can be set to num to output the version in a numeric format, e.g. 2055000.
This makes it easier to compare and identify versions of pgBackRest.
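For example (assuming pgBackRest 2.55.0 is installed):

    $ pgbackrest version --output=num
    2055000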
The GitHub action we were using for multi-architecture testing stopped working. The project does not appear to be getting regular maintenance, so it seems better to roll multi-architecture testing into our existing container builds.
Introduce multi-architecture builds and testing into our test framework. For now this only works for unit tests -- integration tests will still only run on x86_64. That could be updated in the future but since emulation is so slow it is not clear if it would be useful.
Also fix an invalid 32-bit checksum. The d11 test had not been running as 32-bit since d8ff89a so the checksum was not updated when it should have been in 48f511d.
GCS user projects allow the bucket requester to be billed rather than the bucket owner. This is useful for hosted services that create backups and want to provide direct (probably read-only) access to the backups.
S3 requester pays allows the bucket requester to be billed rather than the bucket owner. This is useful for hosted services that create backups and want to provide direct (probably read-only) access to the backups.
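A hypothetical configuration sketch covering both features (the option names are assumed for illustration, not confirmed here):

    repo1-gcs-user-project=my-billing-project
    repo1-s3-requester-pays=y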
When archive expiration has a large number of files to remove it may look like the process has hung, even at higher log levels.
Add a log message at detail level to show that progress is being made.
The prior logging only output the last path to be removed because the start path was overwritten as each path was deleted. This had no effect on expire functionality but was confusing since many more files might be expired than the logs indicated.
Fix logging so the correct start path is logged.
While process-max is as useful for object stores as for any other storage type, file bundling is far more effective at reducing file creation time since fewer files are created.
Update the recommendation to reflect this.
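For example, enabling file bundling for a repository:

    repo1-bundle=y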
If the user picks an invalid timeline (or the default is invalid) they will not discover it until after the restore is complete and recovery starts. In that case they'll receive a message like this:
FATAL: requested timeline 2 is not a child of this server's history
DETAIL: Latest checkpoint is at 0/7000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/600AA20.
This message generally causes confusion unless one is familiar with it. In this case 1) a standby was promoted, creating a new timeline, 2) a new backup was made from the primary, and 3) the new backup was restored but could not follow the new timeline because the backup was made after the new timeline forked off. Since PostgreSQL 12, following the latest timeline has been the default, so this error has become common in split-brain situations.
Improve pgBackRest to read the history files and provide better error messages. Now this error is thrown before the restore starts:
ERROR: [058]: target timeline 2 forked from backup timeline 1 at 0/600aa20 which is before backup lsn of 0/7000028
HINT: was the target timeline created by accidentally promoting a standby?
HINT: was the target timeline created by testing a restore without --archive-mode=off?
HINT: was the backup made after the target timeline was created?
This saves time since it happens before the restore and gives more information about what has gone wrong.
If the backup timeline is not an ancestor of the target timeline the error message is:
ERROR: [058]: backup timeline 6, lsn 0/4ffffff is not in the history of target timeline B
HINT: was the target timeline created by promoting from a timeline < latest?
This situation should be rare but can happen during complex recovery scenarios where the user is explicitly setting the target time.
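For reference, the fork points in these messages come from PostgreSQL timeline history files, which are now read to perform the check. A minimal history file consistent with the first error above (the reason text at the end of the line varies):

    00000002.history:
    1   0/600AA20   before 2000-01-01 00:00:00+00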
Coverity complained about a possible overflow of result in the prior implementation.
It appears that Coverity was not able to follow the logic through the try block, so refactor and add an assertion to silence the complaint.
Coverity complained that decrementing targetIdx would result in it equaling UINT_MAX. While this is true, it had no impact overall (at least in the current code) since targetIdx was immediately incremented in the loop.
However, Coverity's suggestion is better and safer for future code updates so it makes sense to change it.
Coverity had this complaint:
assert_side_effect: Argument openData of ASSERT() has a side effect because the variable is volatile. The containing function might work differently in a non-debug build.
It appears this can also be fixed by assigning the volatile variable to an automatic but the cast seems to work just as well.
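A standalone sketch of the pattern (the variable name is taken from the message above; the reason for volatile is assumed to be the usual setjmp()/longjmp() requirement of try blocks):

    #include <assert.h>
    #include <stdbool.h>

    int main(void)
    {
        // Stand-in for a variable that must be volatile to survive the
        // longjmp() performed by the error handler in a try block
        volatile bool openData = true;

        // Reading a volatile inside assert() is flagged by Coverity as a
        // side effect because the read disappears in non-debug builds;
        // casting the qualifier away for the check silences the complaint
        assert((bool)openData);

        return 0;
    }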
If a query that expected no results returned an error, then it would incorrectly report that no results were expected because the error was interpreted as a result.
Switch the order of the checks so that the error is reported instead and add a test to prevent regression.
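A standalone sketch of the corrected ordering (the types and names are illustrative, not the actual code):

    #include <stdbool.h>
    #include <stdio.h>

    // Illustrative summary of a query result
    typedef struct QueryResult
    {
        bool error;         // the server returned an error
        bool rowReturned;   // the query produced at least one row
    } QueryResult;

    // Check the error first so a failed query that expected no results is
    // reported as an error rather than as an unexpected result
    static const char *queryCheck(const QueryResult *const result)
    {
        if (result->error)
            return "error reported by server";

        if (result->rowReturned)
            return "query expected no results";

        return NULL;
    }

    int main(void)
    {
        const QueryResult result = {.error = true, .rowReturned = true};
        puts(queryCheck(&result)); // prints "error reported by server"
        return 0;
    }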