From a494f10232645c3456bee7f6fbea5546f8a8166d Mon Sep 17 00:00:00 2001 From: Noah Misch Date: Fri, 27 Aug 2021 23:33:23 -0700 Subject: [PATCH] Fix data loss in wal_level=minimal crash recovery of CREATE TABLESPACE. If the system crashed between CREATE TABLESPACE and the next checkpoint, the result could be some files in the tablespace unexpectedly containing no rows. Affected files would be those for which the system did not write WAL; see the wal_skip_threshold documentation. Before v13, a different set of conditions governed the writing of WAL; see v12's . (The v12 conditions were broader in some ways and narrower in others.) Users may want to audit non-default tablespaces for unexpected short files. The bug could have truncated an index without affecting the associated table, and reindexing the index would fix that particular problem. This fixes the bug by making create_tablespace_directories() more like TablespaceCreateDbspace(). create_tablespace_directories() was recursively removing tablespace contents, reasoning that WAL redo would recreate everything removed that way. That assumption holds for other wal_level values. Under wal_level=minimal, the old approach could delete files for which no other copy existed. Back-patch to 9.6 (all supported versions). Reviewed by Robert Haas and Prabhat Sahu. Reported by Robert Haas. Discussion: https://postgr.es/m/CA+TgmoaLO9ncuwvr2nN-J4VEP5XyAcy=zKiHxQzBbFRxxGxm0w@mail.gmail.com --- src/backend/commands/tablespace.c | 42 ++++++++++++++----------------- 1 file changed, 19 insertions(+), 23 deletions(-) diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 93d948f2ab4..f060c245995 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -617,40 +617,36 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) location))); } - if (InRecovery) - { - /* - * Our theory for replaying a CREATE is to forcibly drop the target - * subdirectory if present, and then recreate it. This may be more - * work than needed, but it is simple to implement. - */ - if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode)) - { - if (!rmtree(location_with_version_dir, true)) - /* If this failed, MakePGDirectory() below is going to error. */ - ereport(WARNING, - (errmsg("some useless files may be left behind in old database directory \"%s\"", - location_with_version_dir))); - } - } - /* * The creation of the version directory prevents more than one tablespace - * in a single location. + * in a single location. This imitates TablespaceCreateDbspace(), but it + * ignores concurrency and missing parent directories. The chmod() would + * have failed in the absence of a parent. pg_tablespace_spcname_index + * prevents concurrency. */ - if (MakePGDirectory(location_with_version_dir) < 0) + if (stat(location_with_version_dir, &st) < 0) { - if (errno == EEXIST) + if (errno != ENOENT) ereport(ERROR, - (errcode(ERRCODE_OBJECT_IN_USE), - errmsg("directory \"%s\" already in use as a tablespace", + (errcode_for_file_access(), + errmsg("could not stat directory \"%s\": %m", location_with_version_dir))); - else + else if (MakePGDirectory(location_with_version_dir) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create directory \"%s\": %m", location_with_version_dir))); } + else if (!S_ISDIR(st.st_mode)) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" exists but is not a directory", + location_with_version_dir))); + else if (!InRecovery) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_IN_USE), + errmsg("directory \"%s\" already in use as a tablespace", + location_with_version_dir))); /* * In recovery, remove old symlink, in case it points to the wrong place.