postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-18 02:02:55 +03:00

Author	SHA1	Message	Date
Robert Haas	30c22eb8fc	Correct sundry errors in Hot Standby-related comments. Fujii Masao	2010-08-12 23:24:54 +00:00
Tom Lane	d4fe61b083	Fix an additional set of problems in GIN's handling of lossy page pointers. Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before. Per followup investigation of a gripe from Artur Dabrowski. An example of a query that failed given his data set is select count() from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee: \| dd:')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:')); Back-patch to 8.4 where the lossy pointer code was introduced.	2010-08-01 19:16:39 +00:00
Tom Lane	0454f13161	Rewrite the rbtree routines so that an RBNode is the first field of the struct representing a tree entry, rather than being a separately allocated piece of storage. This API is at least as clean as the old one (if not more so --- there were some bizarre choices in there) and it permits a very substantial memory savings, on the order of 2X in ginbulk.c's usage. Also, fix minor memory leaks in code called by ginEntryInsert, in particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert itself. These leaks resulted in the GIN index build context continuing to bloat even after we'd filled it to maintenance_work_mem and started to dump data out to the index. In combination these fixes restore the GIN index build code to honoring the maintenance_work_mem limit about as well as it did in 8.4. Speed seems on par with 8.4 too, maybe even a bit faster, for a non-pathological case in which HEAD was formerly slower. Back-patch to 9.0 so we don't have a performance regression from 8.4.	2010-08-01 02:12:42 +00:00
Tom Lane	2ab57e089b	Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem() routines to make them behave better in the presence of "lossy" index pointers. The previous coding was outright incorrect for some cases, as recently reported by Artur Dabrowski: scanGetItem would fail to return index entries in cases where one index key had multiple exact pointers on the same page as another key had a lossy pointer. Also, keyGetItem was extremely inefficient for cases where a single index key generates multiple "entry" streams, such as an @@ operator with a multiple-clause tsquery. The presence of a lossy page pointer in any one stream defeated its ability to use the opclass consistentFn, resulting in probing many heap pages that didn't really need to be visited. In Artur's example case, a query like WHERE tsvector @@ to_tsquery('a & b') was about 50X slower than the theoretically equivalent WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b') The way that I chose to fix this was to have GIN call the consistentFn twice with both TRUE and FALSE values for the in-doubt entry stream, returning a hit if either call produces TRUE, but not if they both return FALSE. The code handles this for the case of a single in-doubt entry stream, but punts (falling back to the stupid behavior) if there's more than one lossy reference to the same page. The idea could be scaled up to deal with multiple lossy references, but I think that would probably be wasted complexity. At least to judge by Artur's example, such cases don't occur often enough to be worth trying to optimize. Back-patch to 8.4. 8.3 did not have lossy GIN index pointers, so not subject to these problems.	2010-07-31 00:30:54 +00:00
Simon Riggs	5b8bd0529e	Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0. Transaction aborts now record their LSN to avoid corner case behaviour in SR/HS, hence change of name of variables and functions. As pointed out by Fujii Masao. Cosmetic changes only.	2010-07-29 22:27:27 +00:00
Robert Haas	1a078629ac	Fix possible page corruption by ALTER TABLE .. SET TABLESPACE. If a zeroed page is present in the heap, ALTER TABLE .. SET TABLESPACE will set the LSN and TLI while copying it, which is wrong, and heap_xlog_newpage() will do the same thing during replay, so the corruption propagates to any standby. Note, however, that the bug can't be demonstrated unless archiving is enabled, since in that case we skip WAL logging altogether, and the LSN/TLI are not set. Back-patch to 8.0; prior releases do not have tablespaces. Analysis and patch by Jeff Davis. Adjustments for back-branches and minor wordsmithing by me.	2010-07-29 16:14:36 +00:00
Robert Haas	7be8946c78	Avoid deep recursion when assigning XIDs to multiple levels of subxacts. Backpatch to 8.0. Andres Freund, with cleanup and adjustment for older branches by me.	2010-07-23 00:43:00 +00:00
Tom Lane	672efc0865	Update obsolete comment. Noted by Josh Tolley.	2010-07-08 16:08:30 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Tom Lane	8771634666	Don't set recoveryLastXTime when replaying a checkpoint --- that was a bogus idea from the start since the variable is only meant to track commit/abort events. This patch reverts the logic around the variable to what it was in 8.4, except that the value is now kept in shared memory rather than a static variable, so that it can be reported correctly by CreateRestartPoint (which is executed in the bgwriter).	2010-07-03 22:15:45 +00:00
Tom Lane	e76c1a0f4d	Replace max_standby_delay with two parameters, max_standby_archive_delay and max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.	2010-07-03 20:43:58 +00:00
Bruce Momjian	b57ddccf05	Add C comment about why synchronous_commit=off behavior can lose committed transactions in a postmaster crash.	2010-06-29 18:44:58 +00:00
Robert Haas	400916b6d7	emode_for_corrupt_record shouldn't reduce LOG messages to WARNING. In non-interactive sessions, WARNING sorts below LOG.	2010-06-28 19:46:19 +00:00
Tom Lane	09698bb5fb	Make RemoveOldXlogFiles's debug printout match style used elsewhere: log and seg aren't an XLogRecPtr and shouldn't be printed like one. Fujii Masao	2010-06-17 17:37:23 +00:00
Tom Lane	07e8b6aabc	Don't allow walsender to send WAL data until it's been safely fsync'd on the master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas. Fujii Masao	2010-06-17 16:41:25 +00:00
Heikki Linnakangas	6da07cd80d	If a corrupt WAL record is received by streaming replication, disconnect and retry. If the record is genuinely corrupt in the master database, there's little hope of recovering, but it's better than simply retrying to apply the corrupt WAL record in a tight loop without even trying to retransmit it, which is what we used to do.	2010-06-14 06:04:21 +00:00
Peter Eisentraut	c86efdde5f	Fix typo/bug, found by Clang compiler	2010-06-12 09:14:52 +00:00
Itagaki Takahiro	56834fc759	Rename restartpoint_command to archive_cleanup_command.	2010-06-10 08:13:50 +00:00
Heikki Linnakangas	0a7cb85531	Make TriggerFile variable static. It's not used outside xlog.c. Fujii Masao	2010-06-10 07:49:23 +00:00
Heikki Linnakangas	346d7cd7fa	Return NULL instead of 0/0 in pg_last_xlog_receive_location() and pg_last_xlog_replay_location(). Per Robert Haas's suggestion, after Itagaki Takahiro pointed out an issue in the docs. Also, some wording changes in the docs by me.	2010-06-10 07:00:27 +00:00
Heikki Linnakangas	71815306e9	In standby mode, respect checkpoint_segments in addition to checkpoint_timeout to trigger restartpoints. We used to deliberately only do time-based restartpoints, because if checkpoint_segments is small we would spend time doing restartpoints more often than really necessary. But now that restartpoints are done in bgwriter, they're not as disruptive as they used to be. Secondly, because streaming replication stores the streamed WAL files in pg_xlog, we want to clean it up more often to avoid running out of disk space when checkpoint_timeout is large and checkpoint_segments small. Patch by Fujii Masao, with some minor changes by me.	2010-06-09 15:04:07 +00:00
Magnus Hagander	8c873bbfa7	Make the walwriter close it's handle to an old xlog segment if it's no longer the current one. Not doing this would leave the walwriter with a handle to a deleted file if there was nothing for it to do for a long period of time, preventing the file from being completely removed. Reported by Tollef Fog Heen, and thanks to Heikki for some hand-holding with the patch.	2010-06-09 10:54:45 +00:00
Itagaki Takahiro	b5faba1284	Ensure default-only storage parameters for TOAST relations to be initialized with proper values. Affected parameters are fillfactor, analyze_threshold, and analyze_scale_factor. Especially uninitialized fillfactor caused inefficient page usage because we built a StdRdOptions struct in which fillfactor is zero if any reloption is set for the toast table. In addition, we disallow toast.autovacuum_analyze_threshold and toast.autovacuum_analyze_scale_factor because we didn't actually support them; they are always ignored. Report by Rumko on pgsql-bugs on 12 May 2010. Analysis by Tom Lane and Alvaro Herrera. Patch by me. Backpatch to 8.4.	2010-06-07 02:59:02 +00:00
Peter Eisentraut	cb6038c168	Fix some inconsistent quoting of wal_level values in messages When referring to postgresql.conf syntax, then it's without quotes (wal_level=archive); in narrative it's with double quotes. But never single quotes.	2010-06-03 21:02:12 +00:00
Robert Haas	d561430b66	On clean shutdown during recovery, don't warn about possible corruption. Fujii Masao. Review by Heikki Linnakangas and myself.	2010-06-03 03:20:00 +00:00
Heikki Linnakangas	6b24036365	Fix obsolete comments that I neglected to update in a previous patch. Fujii Masao	2010-06-02 09:28:44 +00:00
Heikki Linnakangas	c5bd8feac6	Adjust comment to reflect that we now have Hot Standby. Pointed out by Robert Haas.	2010-05-27 00:38:39 +00:00
Robert Haas	ea9968c331	Rename PM_RECOVERY_CONSISTENT and PMSIGNAL_RECOVERY_CONSISTENT. The new names PM_HOT_STANDBY and PMSIGNAL_BEGIN_HOT_STANDBY more accurately reflect their actual function.	2010-05-15 20:01:32 +00:00
Simon Riggs	4a24c9a063	Fix bug in processing of checkpoint time for max_standby_delay. Latest log time was incorrectly set, typically leading to dates in the past, which would cause more cancellations in Hot Standby on a quiet server.	2010-05-15 07:14:43 +00:00
Simon Riggs	fd34374b17	Add many new Asserts in code and fix simple bug that slipped through without them, related to previous commit. Report by Bruce Momjian.	2010-05-14 07:11:49 +00:00
Simon Riggs	463f151a23	Ensure that top level aborts call XLogSetAsyncCommit(). Not doing so simply leads to data waiting in wal_buffers which then causes later commits to potentially do emergency writes and for all forms of replication to be potentially delayed without need or benefit. Issue pointed out exactly by Fujii Masao, following bug report by Robert Haas on a separate though related topic.	2010-05-13 11:39:30 +00:00
Simon Riggs	8431e296ea	Cleanup initialization of Hot Standby. Clarify working with reanalysis of requirements and documentation on LogStandbySnapshot(). Fixes two minor bugs reported by Tom Lane that would lead to an incorrect snapshot after transaction wraparound. Also fix two other problems discovered that would give incorrect snapshots in certain cases. ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().	2010-05-13 11:15:38 +00:00
Heikki Linnakangas	ffe8c7c677	Need to hold ControlFileLock while updating control file. Update minRecoveryPoint in control file when replaying a parameter change record, to ensure that we don't allow hot standby on WAL generated without wal_level='hot_standby' after a standby restart.	2010-05-03 11:17:52 +00:00
Tom Lane	609a63fd85	Improve printing of XLOG_HEAP_NEWPAGE records to include the forknum.	2010-05-02 22:37:43 +00:00
Tom Lane	e55e6ecfe4	Fix replay of XLOG_HEAP_NEWPAGE WAL records to pay attention to the forknum field of the WAL record. The previous coding always wrote to the main fork, resulting in data corruption if the page was meant to go into a non-default fork. At present, the only operation that can produce such WAL records is ALTER TABLE/INDEX SET TABLESPACE when executed with archive_mode = on. Data corruption would be observed on standby slaves, and could occur on the master as well if a database crash and recovery occurred after committing the ALTER and before the next checkpoint. Per report from Gordon Shannon. Back-patch to 8.4; the problem doesn't exist in earlier branches because we didn't have a concept of multiple relation forks then.	2010-05-02 22:28:05 +00:00
Tom Lane	f9ed327f76	Clean up some awkward, inaccurate, and inefficient processing around MaxStandbyDelay. Use the GUC units mechanism for the value, and choose more appropriate timestamp functions for performing tests with it. Make the ps_activity manipulation in ResolveRecoveryConflictWithVirtualXIDs have behavior similar to ps_activity code elsewhere, notably not updating the display when update_process_title is off and not truncating the display contents at an arbitrarily-chosen length. Improve the docs to be explicit about what MaxStandbyDelay actually measures, viz the difference between primary and standby servers' clocks, and the possible hazards if their clocks aren't in sync.	2010-05-02 02:10:33 +00:00
Heikki Linnakangas	21992dd4f5	Fix handling of b-tree reuse WAL records when hot standby is disabled, and add missing code in btree_desc for them. This fixes the bug with "tree_redo: unknown op code 208" error reported by Jaime Casanova.	2010-04-30 06:34:29 +00:00
Tom Lane	69f7a4d8e3	Adjust error checks in pg_start_backup and pg_stop_backup to make it possible to perform a backup without archive_mode being enabled. This gives up some user-error protection in order to improve usefulness for streaming-replication scenarios. Per discussion.	2010-04-29 21:49:03 +00:00
Tom Lane	f0488bd57c	Rename the parameter recovery_connections to hot_standby, to reduce possible confusion with streaming-replication settings. Also, change its default value to "off", because of concern about executing new and poorly-tested code during ordinary non-replicating operation. Per discussion. In passing do some minor editing of related documentation.	2010-04-29 21:36:19 +00:00
Tom Lane	77acab75df	Modify ShmemInitStruct and ShmemInitHash to throw errors internally, rather than returning NULL for some-but-not-all failures as they used to. Remove now-redundant tests for NULL from call sites. We had to do something about this because many call sites were failing to check for NULL; and changing it like this seems a lot more useful and mistake-proof than adding checks to the call sites without them.	2010-04-28 16:54:16 +00:00
Heikki Linnakangas	9b8a73326e	Introduce wal_level GUC to explicitly control if information needed for archival or hot standby should be WAL-logged, instead of deducing that from other options like archive_mode. This replaces recovery_connections GUC in the primary, where it now has no effect, but it's still used in the standby to enable/disable hot standby. Remove the WAL-logging of "unlogged operations", like creating an index without WAL-logging and fsyncing it at the end. Instead, we keep a copy of the wal_mode setting and the settings that affect how much shared memory a hot standby server needs to track master transactions (max_connections, max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings change, at server restart, write a WAL record noting the new settings and update pg_control. This allows us to notice the change in those settings in the standby at the right moment, they used to be included in checkpoint records, but that meant that a changed value was not reflected in the standby until the first checkpoint after the change. Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to the sequence it used to follow, before hot standby and subsequent patches changed it to 0x9003.	2010-04-28 16:10:43 +00:00
Tom Lane	2871b4618a	Replace the KnownAssignedXids hash table with a sorted-array data structure, and be more tense about the locking requirements for it, to improve performance in Hot Standby mode. In passing fix a few bugs and improve a number of comments in the existing HS code. Simon Riggs, with some editorialization by Tom	2010-04-28 00:09:05 +00:00
Heikki Linnakangas	3efba16d56	If a base backup is cancelled by server shutdown or crash, throw an error in WAL recovery when it sees the shutdown checkpoint record. It's more user-friendly to find out about it at that point than at the end of recovery, and you're not left wondering why your hot standby server never opens up for read-only connections.	2010-04-27 09:25:18 +00:00
Robert Haas	33980a0640	Fix various instances of "the the". Two of these were pointed out by Erik Rijkers; the rest I found.	2010-04-23 23:21:44 +00:00
Simon Riggs	491d1ea5b3	Previous patch revoked following objections.	2010-04-23 20:21:31 +00:00
Simon Riggs	6ca23b1a29	Make CheckRequiredParameterValues() depend upon correct combination of parameters. Fix bug report by Robert Haas that error message and hint was incorrect if wrong mode parameters specified on master. Internal changes only. Proposals for parameter simplification on master/primary still under way.	2010-04-23 19:57:19 +00:00
Simon Riggs	a2555571fb	Optimise btree delete processing when no active backends. Clarify comments, downgrade a message to DEBUG and remove some debug counters. Direct from ideas by Heikki Linnakangas.	2010-04-22 08:04:25 +00:00
Simon Riggs	781ec6b75d	Further reductions in Hot Standby conflict processing. These come from the realistion that HEAP2_CLEAN records don't always remove user visible data, so conflict processing for them can be skipped. Confirm validity using Assert checks, clarify circumstances under which we log heap_cleanup_info records. Tuning arises from bug fixing of earlier safety check failures.	2010-04-22 02:15:45 +00:00
Simon Riggs	bc2b85d904	Fix oversight in collecting values for cleanup_info records. vacuum_log_cleanup_info() now generates log records with a valid latestRemovedXid set in all cases. Also be careful not to zero the value when we do a round of vacuuming part-way through lazy_scan_heap(). Incidentally, this reduces frequency of conflicts in Hot Standby.	2010-04-21 17:20:56 +00:00
Robert Haas	481cb5d9b5	Rename standby_keep_segments to wal_keep_segments. Also, make the name of the GUC and the name of the backing variable match. Alnong the way, clean up a couple of slight typographical errors in the related docs.	2010-04-20 11:15:06 +00:00

... 29 30 31 32 33 ...

3191 Commits