From f11c1bb17099834a3640a750b263e08f5e6f8135 Mon Sep 17 00:00:00 2001
From: Noah Misch
Date: Fri, 27 Aug 2021 23:33:23 -0700
Subject: [PATCH] Fix data loss in wal_level=minimal crash recovery of CREATE TABLESPACE.

If the system crashed between CREATE TABLESPACE and the next checkpoint,
the result could be some files in the tablespace unexpectedly containing
no rows. Affected files would be those for which the system did not
write WAL; see the wal_skip_threshold documentation. Before v13, a
different set of conditions governed the writing of WAL; see v12's .
(The v12 conditions were broader in some ways and narrower in others.)
Users may want to audit non-default tablespaces for unexpected short
files. The bug could have truncated an index without affecting the
associated table, and reindexing the index would fix that particular
problem.

This fixes the bug by making create_tablespace_directories() more like
TablespaceCreateDbspace(). create_tablespace_directories() was
recursively removing tablespace contents, reasoning that WAL redo would
recreate everything removed that way. That assumption holds for other
wal_level values. Under wal_level=minimal, the old approach could
delete files for which no other copy existed. Back-patch to 9.6 (all
supported versions).

Reviewed by Robert Haas and Prabhat Sahu. Reported by Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoaLO9ncuwvr2nN-J4VEP5XyAcy=zKiHxQzBbFRxxGxm0w@mail.gmail.com
---
 src/backend/commands/tablespace.c | 42 ++++++++++++++-----------------
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 4a5d22da759..350c21912a0 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -589,40 +589,36 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 							location)));
 	}
 
-	if (InRecovery)
-	{
-		/*
-		 * Our theory for replaying a CREATE is to forcibly drop the target
-		 * subdirectory if present, and then recreate it. This may be more
-		 * work than needed, but it is simple to implement.
-		 */
-		if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode))
-		{
-			if (!rmtree(location_with_version_dir, true))
-				/* If this failed, mkdir() below is going to error. */
-				ereport(WARNING,
-						(errmsg("some useless files may be left behind in old database directory \"%s\"",
-								location_with_version_dir)));
-		}
-	}
-
 	/*
 	 * The creation of the version directory prevents more than one tablespace
-	 * in a single location.
+	 * in a single location. This imitates TablespaceCreateDbspace(), but it
+	 * ignores concurrency and missing parent directories. The chmod() would
+	 * have failed in the absence of a parent. pg_tablespace_spcname_index
+	 * prevents concurrency.
 	 */
-	if (mkdir(location_with_version_dir, S_IRWXU) < 0)
+	if (stat(location_with_version_dir, &st) < 0)
 	{
-		if (errno == EEXIST)
+		if (errno != ENOENT)
 			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_IN_USE),
-					 errmsg("directory \"%s\" already in use as a tablespace",
+					(errcode_for_file_access(),
+					 errmsg("could not stat directory \"%s\": %m",
 							location_with_version_dir)));
-		else
+		else if (mkdir(location_with_version_dir, S_IRWXU) < 0)
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not create directory \"%s\": %m",
 							location_with_version_dir)));
 	}
+	else if (!S_ISDIR(st.st_mode))
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" exists but is not a directory",
+						location_with_version_dir)));
+	else if (!InRecovery)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_IN_USE),
+				 errmsg("directory \"%s\" already in use as a tablespace",
+						location_with_version_dir)));
 
 	/*
 	 * In recovery, remove old symlink, in case it points to the wrong place.