1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-13 16:22:44 +03:00
Commit Graph

3191 Commits

Author SHA1 Message Date
Heikki Linnakangas
5c805d0a81 Fix timestamp in end-of-recovery WAL records.
We used time(null) to set a TimestampTz field, which gave bogus results.
Noticed while looking at pg_xlogdump output.

Backpatch to 9.3 and above, where the fast promotion was introduced.
2014-12-19 17:04:20 +02:00
Tom Lane
4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Heikki Linnakangas
ba94518aad Change how first WAL segment on new timeline after promotion is created.
Two changes:

1. When copying a WAL segment from old timeline to create the first segment
on the new timeline, only copy up to the point where the timeline switch
happens, and zero-fill the rest. This avoids corner cases where we might
think that the copied WAL from the previous timeline belong to the new
timeline.

2. If the timeline switch happens at a segment boundary, don't copy the
whole old segment to the new timeline. It's pointless, because it's 100%
identical to the old segment.
2014-12-18 20:23:03 +02:00
Andres Freund
c303e9e7e5 Fix (re-)starting from a basebackup taken off a standby after a failure.
When starting up from a basebackup taken off a standby extra logic has
to be applied to compute the point where the data directory is
consistent. Normal base backups use a WAL record for that purpose, but
that isn't possible on a standby.

That logic had a error check ensuring that the cluster's control file
indicates being in recovery. Unfortunately that check was too strict,
disregarding the fact that the control file could also indicate that
the cluster was shut down while in recovery.

That's possible when the a cluster starting from a basebackup is shut
down before the backup label has been removed. When everything goes
well that's a short window, but when either restore_command or
primary_conninfo isn't configured correctly the window can get much
wider. That's because inbetween reading and unlinking the label we
restore the last checkpoint from WAL which can need additional WAL.

To fix simply also allow starting when the control file indicates
"shutdown in recovery". There's nicer fixes imaginable, but they'd be
more invasive.

Backpatch to 9.2 where support for taking basebackups from standbys
was added.
2014-12-18 08:47:27 +01:00
Tom Lane
06d5803ffa Fix assorted confusion between Oid and int32.
In passing, also make some debugging elog's in pgstat.c a bit more
consistently worded.

Back-patch as far as applicable (9.3 or 9.4; none of these mistakes are
really old).

Mark Dilger identified and patched the type violations; the message
rewordings are mine.
2014-12-11 15:41:15 -05:00
Simon Riggs
c270754719 Remove duplicate code in heap_prune_chain()
No need to set tuple tableOid twice

Jim Nasby
2014-12-08 08:44:37 +09:00
Simon Riggs
b8e33a85d4 Tweaks for recovery_target_action
Rename parameter action_at_recovery_target to
recovery_target_action suggested by Christoph Berg.

Place into recovery.conf suggested by Fujii Masao,
replacing (deprecating) earlier parameters, per
Michael Paquier.
2014-12-07 21:55:29 +09:00
Heikki Linnakangas
326b6f009f Print new track_commit_timestamp in rm_desc of a parameter-change record.
Michael Paquier
2014-12-05 12:11:43 +02:00
Heikki Linnakangas
c846e67c46 Print wal_log_hints in the rm_desc routing of a parameter-change record.
It was an oversight in the original commit.

Also note in the sample config file that changing wal_log_hints requires a
restart.

Michael Paquier. Backpatch to 9.4, where wal_log_hints was added.
2014-12-05 12:00:48 +02:00
Alvaro Herrera
73c986adde Keep track of transaction commit timestamps
Transactions can now set their commit timestamp directly as they commit,
or an external transaction commit timestamp can be fed from an outside
system using the new function TransactionTreeSetCommitTsData().  This
data is crash-safe, and truncated at Xid freeze point, same as pg_clog.

This module is disabled by default because it causes a performance hit,
but can be enabled in postgresql.conf requiring only a server restart.

A new test in src/test/modules is included.

Catalog version bumped due to the new subdirectory within PGDATA and a
couple of new SQL functions.

Authors: Álvaro Herrera and Petr Jelínek

Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert
Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven
Singer, Peter Eisentraut
2014-12-03 11:53:02 -03:00
Alvaro Herrera
6597ec9be6 Fix typos 2014-12-03 11:52:15 -03:00
Tom Lane
1511521a36 Minor cleanup of function declarations for BRIN.
Get rid of PG_FUNCTION_INFO_V1() macros, which are quite inappropriate
for built-in functions (possibly leftovers from testing as a loadable
module?).  Also, fix gratuitous inconsistency between SQL-level and
C-level names of the minmax support functions.
2014-12-02 14:07:54 -05:00
Alvaro Herrera
ae04bf5027 Update transaction README for persistent multixacts
Multixacts are now maintained during recovery, but the README didn't get
the memo.  Backpatch to 9.3, where the divergence was introduced.
2014-11-28 18:06:18 -03:00
Heikki Linnakangas
afeacd2748 Fix assertion failure at end of PITR.
InitXLogInsert() cannot be called in a critical section, because it
allocates memory. But CreateCheckPoint() did that, when called for the
end-of-recovery checkpoint by the startup process.

In the passing, fix the scratch space allocation in InitXLogInsert to go to
the right memory context. Also update the comment at InitXLOGAccess, which
hasn't been totally accurate since hot standby was introduced (in a hot
standby backend, InitXLOGAccess isn't called at backend startup).

Reported by Michael Paquier
2014-11-28 09:31:53 +02:00
Simon Riggs
aedccb1f6f action_at_recovery_target recovery config option
action_at_recovery_target = pause | promote | shutdown

Petr Jelinek

Reviewed by Muhammad Asif Naeem, Fujji Masao and
Simon Riggs
2014-11-25 20:13:30 +00:00
Heikki Linnakangas
49b86fb1c9 Add a few paragraphs to B-tree README explaining L&Y algorithm.
This gives an overview of what Lehman & Yao's paper is all about, so that
you can understand the rest of the README without having to read the paper.

Per discussion with Peter Geoghegan and others.
2014-11-24 13:43:33 +02:00
Heikki Linnakangas
0bd624d63b Distinguish XLOG_FPI records generated for hint-bit updates.
Add a new XLOG_FPI_FOR_HINT record type, and use that for full-page images
generated for hint bit updates, when checksums are enabled. The new record
type is replayed exactly the same as XLOG_FPI, but allows them to be tallied
separately e.g. in pg_xlogdump.
2014-11-24 11:09:08 +02:00
Heikki Linnakangas
622983ea69 No need to call XLogEnsureRecordSpace when the relation is unlogged.
Amit Kapila
2014-11-21 15:13:15 +02:00
Heikki Linnakangas
8f5dcb56cb Fix bogus comments in XLogRecordAssemble
Pointed out by Michael Paquier
2014-11-21 12:15:27 +02:00
Tom Lane
adbfab119b Remove dead code supporting mark/restore in SeqScan, TidScan, ValuesScan.
There seems no prospect that any of this will ever be useful, and indeed
it's questionable whether some of it would work if it ever got called;
it's certainly not been exercised in a very long time, if ever. So let's
get rid of it, and make the comments about mark/restore in execAmi.c less
wishy-washy.

The mark/restore support for Result nodes is also currently dead code,
but that's due to planner limitations not because it's impossible that
it could be useful.  So I left it in.
2014-11-20 20:20:54 -05:00
Heikki Linnakangas
f464042161 Silence compiler warning about variable being used uninitialized.
It's a false positive - the variable is only used when 'onleft' is true,
and it is initialized in that case. But the compiler doesn't necessarily
see that.
2014-11-20 19:17:19 +02:00
Heikki Linnakangas
2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Simon Riggs
606c0123d6 Reduce btree scan overhead for < and > strategies
For <, <=, > and >= strategies, mark the first scan key
as already matched if scanning in an appropriate direction.
If index tuple contains no nulls we can skip the first
re-check for each tuple.

Author: Rajeev Rastogi
Reviewer: Haribabu Kommi
Rework of the code and comments by Simon Riggs
2014-11-18 10:24:55 +00:00
Heikki Linnakangas
c73669c0e0 Fix WAL-logging of B-tree "unlink halfdead page" operation.
There was some confusion on how to record the case that the operation
unlinks the last non-leaf page in the branch being deleted.
_bt_unlink_halfdead_page set the "topdead" field in the WAL record to
the leaf page, but the redo routine assumed that it would be an invalid
block number in that case. This commit fixes _bt_unlink_halfdead_page to
do what the redo routine expected.

This code is new in 9.4, so backpatch there.
2014-11-17 18:45:46 +02:00
Andres Freund
d3586fc8aa Ensure unlogged tables are reset even if crash recovery errors out.
Unlogged relations are reset at the end of crash recovery as they're
only synced to disk during a proper shutdown. Unfortunately that and
later steps can fail, e.g. due to running out of space. This reset
was, up to now performed after marking the database as having finished
crash recovery successfully. As out of space errors trigger a crash
restart that could lead to the situation that not all unlogged
relations are reset.

Once that happend usage of unlogged relations could yield errors like
"could not open file "...": No such file or directory". Luckily
clusters that show the problem can be fixed by performing a immediate
shutdown, and starting the database again.

To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT)
earlier, before marking the database as having successfully recovered.

Discussion: 20140912112246.GA4984@alap3.anarazel.de

Backpatch to 9.1 where unlogged tables were introduced.

Abhijit Menon-Sen and Andres Freund
2014-11-15 01:19:26 +01:00
Stephen Frost
80eacaa3cd Clean up includes from RLS patch
The initial patch for RLS mistakenly included headers associated with
the executor and planner bits in rewrite/rowsecurity.h.  Per policy and
general good sense, executor headers should not be included in planner
headers or vice versa.

The include of execnodes.h was a mistaken holdover from previous
versions, while the include of relation.h was used for Relation's
definition, which should have been coming from utils/relcache.h.  This
patch cleans these issues up, adds comments to the RowSecurityPolicy
struct and the RowSecurityConfigType enum, and changes Relation->rsdesc
to Relation->rd_rsdesc to follow Relation field naming convention.

Additionally, utils/rel.h was including rewrite/rowsecurity.h, which
wasn't a great idea since that was pulling in things not really needed
in utils/rel.h (which gets included in quite a few places).  Instead,
use 'struct RowSecurityDesc' for the rd_rsdesc field and add comments
explaining why.

Lastly, add an include into access/nbtree/nbtsort.c for
utils/sortsupport.h, which was evidently missed due to the above mess.

Pointed out by Tom in 16970.1415838651@sss.pgh.pa.us; note that the
concerns regarding a similar situation in the custom-path commit still
need to be addressed.
2014-11-14 17:05:17 -05:00
Alvaro Herrera
51f9ea25dc Allow interrupting GetMultiXactIdMembers
This function has a loop which can lead to uninterruptible process
"stalls" (actually infinite loops) when some bugs are triggered.  Avoid
that unpleasant situation by adding a check for interrupts in a place
that shouldn't degrade performance in the normal case.

Backpatch to 9.3.  Older branches have an identical loop here, but the
aforementioned bugs are only a problem starting in 9.3 so there doesn't
seem to be any point in backpatching any further.
2014-11-14 15:14:01 -03:00
Heikki Linnakangas
81c4508196 Fix race condition between hot standby and restoring a full-page image.
There was a window in RestoreBackupBlock where a page would be zeroed out,
but not yet locked. If a backend pinned and locked the page in that window,
it saw the zeroed page instead of the old page or new page contents, which
could lead to missing rows in a result set, or errors.

To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins,
zeroes, and locks the page, if it's not in the buffer cache already.

In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE,
to avoid breaking any 3rd party extensions that might use RBM_ZERO. More
importantly, this avoids renumbering the other enum values, which would
cause even bigger confusion in extensions that use ReadBufferExtended, but
haven't been recompiled.

Backpatch to all supported versions; this has been racy since hot standby
was introduced.
2014-11-13 20:02:37 +02:00
Heikki Linnakangas
34402ae351 Fix XLogReadBufferForRedoExtended to get cleanup lock when asked to do so. 2014-11-13 17:54:20 +02:00
Fujii Masao
c291503b1c Rename pending_list_cleanup_size to gin_pending_list_limit.
Since this parameter is only for GIN index, it's better to
add "gin" to the parameter name for easier understanding.
2014-11-13 12:14:48 +09:00
Fujii Masao
a1b395b6a2 Add GUC and storage parameter to set the maximum size of GIN pending list.
Previously the maximum size of GIN pending list was controlled only by
work_mem. But the reasonable value of work_mem and the reasonable size
of the list are basically not the same, so it was not appropriate to
control both of them by only one GUC, i.e., work_mem. This commit
separates new GUC, pending_list_cleanup_size, from work_mem to allow
users to control only the size of the list.

Also this commit adds pending_list_cleanup_size as new storage parameter
to allow users to specify the size of the list per index. This is useful,
for example, when users want to increase the size of the list only for
the GIN index which can be updated heavily, and decrease it otherwise.

Reviewed by Etsuro Fujita.
2014-11-11 21:08:21 +09:00
Alvaro Herrera
a590f266e4 BRIN: fix bug in xlog backup block counting
The code that generates the BRIN_XLOG_UPDATE removes the buffer
reference when the page that's target for the updated tuple is freshly
initialized.  This is a pretty usual optimization, but was breaking the
case where the revmap buffer, which is referenced in the same WAL
record, is getting a backup block: the replay code was using backup
block index 1, which is not valid when the update target buffer gets
pruned; the revmap buffer gets assigned 0 instead.  Make sure to use the
correct backup block index for revmap when replaying.

Bug reported by Fujii Masao.
2014-11-10 18:13:49 -03:00
Alvaro Herrera
1e0b4365c2 Further code and wording tweaks in BRIN
Besides a couple of typo fixes, per David Rowley, Thom Brown, and Amit
Langote, and mentions of BRIN in the general CREATE INDEX page again per
David, this includes silencing MSVC compiler warnings (thanks Microsoft)
and an additional variable initialization per Coverity scanner.
2014-11-10 15:56:08 -03:00
Kevin Grittner
96a73fcdac Fix compiler warning for non-assert builds.
Reported by Peter Geoghegan
David Rowley
2014-11-10 09:42:46 -06:00
Alvaro Herrera
b89ee54e20 Fix some coding issues in BRIN
Reported by David Rowley: variadic macros are a problem.  Get rid of
them using a trick suggested by Tom Lane: add extra parentheses where
needed.  In the future we might decide we don't need the calls at all
and remove them, but it seems appropriate to keep them while this code
is still new.

Also from David Rowley: brininsert() was trying to use a variable before
initializing it.  Fix by moving the brin_form_tuple call (which
initializes the variable) to within the locked section.

Reported by Peter Eisentraut: can't use "new" as a struct member name,
because C++ compilers will choke on it, as reported by cpluspluscheck.
2014-11-08 00:31:03 -03:00
Heikki Linnakangas
7250d8535b Fix building with WAL_DEBUG.
Now that the backup blocks are appended to the WAL record in xloginsert.c,
XLogInsert doesn't see them anymore and cannot remove them from the version
reconstructed for xlog_outdesc. This makes running with wal_debug=on more
expensive, as we now make (unnecessary) temporary copies of the backup
blocks, but it doesn't seem worth convoluting the code to keep that
optimization.

Reported by Alvaro Herrera.
2014-11-07 23:09:31 +02:00
Robert Haas
5ea86e6e65 Use the sortsupport infrastructure in more cases.
This removes some fmgr overhead from cases such as btree index builds.

Peter Geoghegan, reviewed by Andreas Karlsson and me.
2014-11-07 15:50:55 -05:00
Alvaro Herrera
7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Heikki Linnakangas
1961b1c131 Fix generation of SP-GiST vacuum WAL records.
I broke these in 8776faa81c. Backpatch to
9.4, where that was done.
2014-11-07 21:17:46 +02:00
Heikki Linnakangas
2effb72e68 Remove obsolete cases from GiST update redo code.
The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
existed since we got rid of old-style VACUUM FULL. I kept the code that sets
the flag, although it's not used for anything anymore, because it might
still be interesting information for debugging purposes that some tuples
have been deleted from a page.

Likewise, the code to turn the root page from non-leaf to leaf page was
removed when we got rid of old-style VACUUM FULL. Remove the code to replay
that action, too.
2014-11-07 15:13:02 +02:00
Fujii Masao
5332b8cec5 Prevent the unnecessary creation of .ready file for the timeline history file.
Previously .ready file was created for the timeline history file at the end
of an archive recovery even when WAL archiving was not enabled.
This creation is unnecessary and causes .ready file to remain infinitely.

This commit changes an archive recovery so that it creates .ready file for
the timeline history file only when WAL archiving is enabled.

Backpatch to all supported versions.
2014-11-06 21:24:40 +09:00
Heikki Linnakangas
2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Heikki Linnakangas
5028f22f6e Switch to CRC-32C in WAL and other places.
The old algorithm was found to not be the usual CRC-32 algorithm, used by
Ethernet et al. We were using a non-reflected lookup table with code meant
for a reflected lookup table. That's a strange combination that AFAICS does
not correspond to any bit-wise CRC calculation, which makes it difficult to
reason about its properties. Although it has worked well in practice, seems
safer to use a well-known algorithm.

Since we're changing the algorithm anyway, we might as well choose a
different polynomial. The Castagnoli polynomial has better error-correcting
properties than the traditional CRC-32 polynomial, even if we had
implemented it correctly. Another reason for picking that is that some new
CPUs have hardware support for calculating CRC-32C, but not CRC-32, let
alone our strange variant of it. This patch doesn't add any support for such
hardware, but a future patch could now do that.

The old algorithm is kept around for tsquery and pg_trgm, which use the
values in indexes that need to remain compatible so that pg_upgrade works.
While we're at it, share the old lookup table for CRC-32 calculation
between hstore, ltree and core. They all use the same table, so might as
well.
2014-11-04 11:39:48 +02:00
Robert Haas
6cb4afff33 Avoid setup work for invalidation messages at start-of-(sub)xact.
Instead of initializing a new TransInvalidationInfo for every
transaction or subtransaction, we can just do it for those
transactions or subtransactions that actually need to queue
invalidation messages.  That also avoids needing to free those
entries at the end of a transaction or subtransaction that does
not generate any invalidation messages, which is by far the
common case.

Patch by me.  Review by Simon Riggs and Andres Freund.
2014-10-29 12:35:19 -04:00
Alvaro Herrera
b01a4f6838 Update README.tuplock
This file was documenting an older version of patch 0ac5ad5134; update
it to match what was really committed

Author: Florian Pflug
2014-10-23 20:51:58 -03:00
Fujii Masao
c7371c4a60 Prevent the already-archived WAL file from being archived again.
Previously the archive recovery always created .ready file for
the last WAL file of the old timeline at the end of recovery even when
it's restored from the archive and has .done file. That is, there was
the case where the WAL file had both .ready and .done files.
This caused the already-archived WAL file to be archived again.

This commit prevents the archive recovery from creating .ready file
for the last WAL file if it has .done file, in order to prevent it from
being archived again.

This bug was added when cascading replication feature was introduced,
i.e., the commit 5286105800.
So, back-patch to 9.2, where cascading replication was added.

Reviewed by Michael Paquier
2014-10-23 16:21:27 +09:00
Andres Freund
5e5b65f359 Don't duplicate log_checkpoint messages for both of restart and checkpoints.
The duplication originated in cdd46c765, where restartpoints were
introduced.

In LogCheckpointStart's case the duplication actually lead to the
compiler's format string checking not to be effective because the
format string wasn't constant.

Arguably these messages shouldn't be elog(), but ereport() style
messages. That'd even allow to translate the messages... But as
there's more mistakes of that kind in surrounding code, it seems
better to change that separately.
2014-10-21 01:01:56 +02:00
Andres Freund
11abd6c90f Renumber CHECKPOINT_* flags.
Commit 7dbb606938 added a new CHECKPOINT_FLUSH_ALL flag. As that
commit needed to be backpatched I didn't change the numeric values of
the existing flags as that could lead to nastly problems if any
external code issued checkpoints. That's not a concern on master, so
renumber them there.

Also add a comment about CHECKPOINT_FLUSH_ALL above
CreateCheckPoint().
2014-10-21 00:20:08 +02:00
Andres Freund
7dbb606938 Flush unlogged table's buffers when copying or moving databases.
CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE copy the source
database directory on the filesystem level. To ensure the on disk
state is consistent they block out users of the affected database and
force a checkpoint to flush out all data to disk. Unfortunately, up to
now, that checkpoint didn't flush out dirty buffers from unlogged
relations.

That bug means there could be leftover dirty buffers in either the
template database, or the database in its old location. Leading to
problems when accessing relations in an inconsistent state; and to
possible problems during shutdown in the SET TABLESPACE case because
buffers belonging files that don't exist anymore are flushed.

This was reported in bug #10675 by Maxim Boguk.

Fix by Pavan Deolasee, modified somewhat by me. Reviewed by MauMau and
Fujii Masao.

Backpatch to 9.1 where unlogged tables were introduced.
2014-10-20 23:43:46 +02:00
Peter Eisentraut
b7a08c8028 Message improvements 2014-10-12 01:06:35 -04:00