postgres

mirror of https://github.com/postgres/postgres.git synced 2025-06-04 12:42:24 +03:00

Author	SHA1	Message	Date
Greg Stark	c62736cc37	Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF) Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost	2013-07-29 16:38:01 +01:00
Peter Eisentraut	626092a2e1	Message style improvements	2013-07-28 07:01:13 -04:00
Robert Haas	765ad89be3	Use InvalidSnapshot, now SnapshotNow, as the default snapshot. As far as I can determine, there's no code in the core distribution that fails to explicitly set the snapshot of a scan or executor state. If there is any such code, this will probably cause it to seg fault; friendlier suggestions were discussed on pgsql-hackers, but there was no consensus that anything more than this was needed. This is another step towards the hoped-for complete removal of SnapshotNow.	2013-07-23 10:58:32 -04:00
Robert Haas	0518eceec3	Adjust HeapTupleSatisfies* routines to take a HeapTuple. Previously, these functions took a HeapTupleHeader, but upcoming patches for logical replication will introduce new a new snapshot type under which the tuple's TID will be used to lookup (CMIN, CMAX) for visibility determination purposes. This makes that information available. Code churn is minimal since HeapTupleSatisfiesVisibility took the HeapTuple anyway, and deferenced it before calling the satisfies function. Independently of logical replication, this allows t_tableOid and t_self to be cross-checked via assertions in tqual.c. This seems like a useful way to make sure that all callers are setting these values properly, which has been previously put forward as desirable. Andres Freund, reviewed by Álvaro Herrera	2013-07-22 13:38:44 -04:00
Stephen Frost	4cbe3ac3e8	WITH CHECK OPTION support for auto-updatable VIEWs For simple views which are automatically updatable, this patch allows the user to specify what level of checking should be done on records being inserted or updated. For 'LOCAL CHECK', new tuples are validated against the conditionals of the view they are being inserted into, while for 'CASCADED CHECK' the new tuples are validated against the conditionals for all views involved (from the top down). This option is part of the SQL specification. Dean Rasheed, reviewed by Pavel Stehule	2013-07-18 17:10:16 -04:00
Heikki Linnakangas	107cbc90a7	Fix variable names mentioned in comment to match the code. Also, in another comment, explain why holding an insertion slot is a critical section. Per review by Amit Kapila.	2013-07-17 23:32:32 +03:00
Heikki Linnakangas	59c02a36f0	Fix assert failure at end of recovery, broken by XLogInsert scaling patch. Initialization of the first XLOG buffer at end-of-recovery was broken for the case that the last read WAL record ended at a page boundary. Instead of trying to copy the last full xlog page to the buffer cache in that case, just set shared state so that the next page is initialized when the first WAL record after startup is inserted. (that's what we did in earlier version, too) To make the shared state required for that case less surprising, replace the XLogCtl->curridx variable, which was the index of the latest initialized buffer, with an XLogRecPtr of how far the buffers have been initialized. That also allows us to get rid of the XLogRecEndPtrToBufIdx macro. While we're at it, make a similar change for XLogCtl->Write.curridx, getting rid of that variable and calculating the next buffer to write from XLogCtl->LogwrtResult instead.	2013-07-17 23:12:22 +03:00
Noah Misch	ffcf654547	Fix systable_recheck_tuple() for MVCC scan snapshots. Since this function assumed non-MVCC snapshots, it broke when commit 568d4138c646cd7cd8a837ac244ef2caf27c6bb8 switched its one caller from SnapshotNow scans to MVCC-snapshot scans. Reviewed by Robert Haas, Tom Lane and Andres Freund.	2013-07-16 20:16:32 -04:00
Heikki Linnakangas	f489470f8a	Fix Windows build. Was broken by my xloginsert scaling patch. XLogCtl global variable needs to be initialized in each process, as it's not inherited by fork() on Windows.	2013-07-08 17:28:48 +03:00
Heikki Linnakangas	9a20a9b21b	Improve scalability of WAL insertions. This patch replaces WALInsertLock with a number of WAL insertion slots, allowing multiple backends to insert WAL records to the WAL buffers concurrently. This is particularly useful for parallel loading large amounts of data on a system with many CPUs. This has one user-visible change: switching to a new WAL segment with pg_switch_xlog() now fills the remaining unused portion of the segment with zeros. This potentially adds some overhead, but it has been a very common practice by DBA's to clear the "tail" of the segment with an external pg_clearxlogtail utility anyway, to make the WAL files compress better. With this patch, it's no longer necessary to do that. This patch adds a new GUC, xloginsert_slots, to tune the number of WAL insertion slots. Performance testing suggests that the default, 8, works pretty well for all kinds of worklods, but I left the GUC in place to allow others with different hardware to test that easily. We might want to remove that before release. Reviewed by Andres Freund.	2013-07-08 11:23:56 +03:00
Jeff Davis	5b571bb8c8	Handle posix_fallocate() errors. On some platforms, posix_fallocate() is available but may still return EINVAL if the underlying filesystem does not support it. So, in case of an error, fall through to the alternate implementation that just writes zeros. Per buildfarm failure and analysis by Tom Lane.	2013-07-06 13:46:04 -07:00
Noah Misch	02d2b694ee	Update messages, comments and documentation for materialized views. All instances of the verbiage lagging the code. Back-patch to 9.3, where materialized views were introduced.	2013-07-05 15:37:51 -04:00
Jeff Davis	269e780822	Use posix_fallocate() for new WAL files, where available. This function is more efficient than actually writing out zeroes to the new file, per microbenchmarks by Jon Nelson. Also, it may reduce the likelihood of WAL file fragmentation. Jon Nelson, with review by Andres Freund, Greg Smith and me.	2013-07-05 12:30:29 -07:00
Fujii Masao	7842d41df5	Fix typo in comment. Michael Paquier	2013-07-05 02:47:49 +09:00
Robert Haas	6bc8ef0b7f	Add new GUC, max_worker_processes, limiting number of bgworkers. In 9.3, there's no particular limit on the number of bgworkers; instead, we just count up the number that are actually registered, and use that to set MaxBackends. However, that approach causes problems for Hot Standby, which needs both MaxBackends and the size of the lock table to be the same on the standby as on the master, yet it may not be desirable to run the same bgworkers in both places. 9.3 handles that by failing to notice the problem, which will probably work fine in nearly all cases anyway, but is not theoretically sound. A further problem with simply counting the number of registered workers is that new workers can't be registered without a postmaster restart. This is inconvenient for administrators, since bouncing the postmaster causes an interruption of service. Moreover, there are a number of applications for background processes where, by necessity, the background process must be started on the fly (e.g. parallel query). While this patch doesn't actually make it possible to register new background workers after startup time, it's a necessary prerequisite. Patch by me. Review by Michael Paquier.	2013-07-04 11:24:24 -04:00
Fujii Masao	2ef085d0e6	Get rid of pg_class.reltoastidxid. Treat TOAST index just the same as normal one and get the OID of TOAST index from pg_index but not pg_class.reltoastidxid. This change allows us to handle multiple TOAST indexes, and which is required infrastructure for upcoming REINDEX CONCURRENTLY feature. Patch by Michael Paquier, reviewed by Andres Freund and me.	2013-07-04 03:24:09 +09:00
Robert Haas	3682025015	Add support for multiple kinds of external toast datums. To that end, support tags rather than lengths for external datums. As an example of how this can be used, add support or "indirect" tuples which point to some externally allocated memory containing a toast tuple. Similar infrastructure could be used for other purposes, including, perhaps, support for alternative compression algorithms. Andres Freund, reviewed by Hitoshi Harada and myself	2013-07-02 13:38:55 -04:00
Robert Haas	568d4138c6	Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.	2013-07-02 09:47:01 -04:00
Heikki Linnakangas	79ce29c734	Retry short writes when flushing WAL. We don't normally bother retrying when the number of bytes written by write() is short of what was requested. It is generally assumed that a write() to disk doesn't return short, unless you run out of disk space. While writing the WAL, however, it seems prudent to try a bit harder, because a failure leads to PANIC. The write() is also much larger than most write()s in the backend (up to wal_buffers), so there's more room for surprises. Also retry on EINTR. All signals used in the backend are flagged SA_RESTART nowadays, so it shouldn't happen, but better to be defensive.	2013-07-01 09:36:00 +03:00
Heikki Linnakangas	ee6556555b	Inline ginCompareItemPointers function for speed. ginCompareItemPointers function is called heavily in gin index scans - inlining it speeds up some kind of queries a lot.	2013-06-29 12:55:34 +03:00
Noah Misch	19085116ee	Cooperate with the Valgrind instrumentation framework. Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its Memcheck tool about the PostgreSQL allocator. This makes Valgrind roughly as sensitive to memory errors involving palloc chunks as it is to memory errors involving malloc chunks. Further client requests in PageAddItem() and printtup() verify that all bits being added to a buffer page or furnished to an output function are predictably-defined. Those tests catch failures of C-language functions to fully initialize the bits of a Datum, which in turn stymie optimizations that rely on _equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to enable these additions. An included "suppression file" silences nominal errors we don't plan to fix. Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.	2013-06-26 20:22:25 -04:00
Noah Misch	1d96bb9602	Initialize pad bytes in GinFormTuple(). Every other core buffer page consumer initializes the bytes it furnishes to PageAddItem(). For consistency, do the same here. No back-patch; regardless, we couldn't count on the fix so long as binary upgrade can carry forward affected index builds.	2013-06-26 19:55:15 -04:00
Alvaro Herrera	4ca50e0710	Avoid inconsistent type declaration Clang 3.3 correctly complains that a variable of type enum MultiXactStatus cannot hold a value of -1, which makes sense. Change the declared type of the variable to int instead, and apply casting as necessary to avoid the warning. Per notice from Andres Freund	2013-06-25 16:41:47 -04:00
Simon Riggs	1f09121b4e	Ensure no xid gaps during Hot Standby startup In some cases with higher numbers of subtransactions it was possible for us to incorrectly initialize subtrans leading to complaints of missing pages. Bug report by Sergey Konoplev Analysis and fix by Andres Freund	2013-06-23 11:05:02 +01:00
Peter Eisentraut	7dfd5cd21c	Clarify terminology standalone backend vs. single-user mode Most of the documentation uses "single-user mode", so use that in the code as well. Adjust the documentation to match the new error message wording. Also add a documentation index entry for "single-user mode". Based-on-patch-by: Jeff Janes <jeff.janes@gmail.com>	2013-06-20 23:03:18 -04:00
Jeff Davis	b8fd1a09f3	Add buffer_std flag to MarkBufferDirtyHint(). MarkBufferDirtyHint() writes WAL, and should know if it's got a standard buffer or not. Currently, the only callers where buffer_std is false are related to the FSM. In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive. Back-patch to 9.3.	2013-06-17 08:02:12 -07:00
Tom Lane	e472b92140	Avoid deadlocks during insertion into SP-GiST indexes. SP-GiST's original scheme for avoiding deadlocks during concurrent index insertions doesn't work, as per report from Hailong Li, and there isn't any evident way to make it work completely. We could possibly lock individual inner tuples instead of their whole pages, but preliminary experimentation suggests that the performance penalty would be huge. Instead, if we fail to get a buffer lock while descending the tree, just restart the tree descent altogether. We keep the old tuple positioning rules, though, in hopes of reducing the number of cases where this can happen. Teodor Sigaev, somewhat edited by Tom Lane	2013-06-14 14:26:43 -04:00
Tom Lane	c62866eeaf	Remove special-case treatment of LOG severity level in standalone mode. elog.c has historically treated LOG messages as low-priority during bootstrap and standalone operation. This has led to confusion and even masked a bug, because the normal expectation of code authors is that elog(LOG) will put something into the postmaster log, and that wasn't happening during initdb. So get rid of the special-case rule and make the priority order the same as it is in normal operation. To keep from cluttering initdb's output and the behavior of a standalone backend, tweak the severity level of three messages routinely issued by xlog.c during startup and shutdown so that they won't appear in these cases. Per my proposal back in December.	2013-06-13 23:15:15 -04:00
Noah Misch	fb435f40d5	Observe array length in HaveVirtualXIDsDelayingChkpt(). Since commit f21bb9cfb5646e1793dcc9c0ea697bab99afa523, this function ignores the caller-provided length and loops until it finds a terminator, which GetVirtualXIDsDelayingChkpt() never adds. Restore the previous loop control logic. In passing, revert the addition of an unused variable by the same commit, presumably a debugging relic.	2013-06-12 19:50:14 -04:00
Heikki Linnakangas	f73cb5567c	Fix typo in comment.	2013-06-06 18:27:01 +03:00
Stephen Frost	f129615fe7	Additional spelling corrections A few more minor spelling corrections, no functional changes. Thom Brown	2013-06-03 08:40:27 -04:00
Heikki Linnakangas	e1e2bb34f1	Code review of recycling WAL segments in a restartpoint. Seems cleaner to get the currently-replayed TLI in the same call to GetXLogReplayRecPtr that we get the WAL position. Make it more clear in the comment what the code does when recovery has already ended (RecoveryInProgress() will set ThisTimeLineID in that case). Finally, make resetting ThisTimeLineID afterwards more explicit.	2013-06-03 09:25:12 +03:00
Stephen Frost	c9fc28a7f1	Minor spelling fixes Fix a few spelling mistakes. Per bug report #8193 from Lajos Veres.	2013-06-01 10:18:59 -04:00
Stephen Frost	551938ae22	Post-pgindent cleanup Make slightly better decisions about indentation than what pgindent is capable of. Mostly breaking out long function calls into one line per argument, with a few other minor adjustments. No functional changes- all whitespace. pgindent ran cleanly (didn't change anything) after. Passes all regressions.	2013-06-01 09:38:15 -04:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Simon Riggs	22a27ef113	After fast promotion use CHECKPOINT_FORCE Not necessary for correctness, just to make log_checkpoints output look less singular. Requested by Fujii Masao	2013-05-21 21:27:12 +01:00
Simon Riggs	75a192638f	Maintain ThisTimeLineID correctly in checkpointer checkpointer needs to reset ThisTimeLineID after a restartpoint to allow installing/recycling new WAL files. If recovery has already ended this would leave ThisTimeLineID set incorrectly and so we must reset it otherwise later checkpoints do not have the correct timeline. Bug report by Heikki Linnakangas. Further investigation by Heikki and myself.	2013-05-21 21:17:04 +01:00
Simon Riggs	d4337a0dcb	Init crash recovery using the latest available TLI This simplifies the handling of crashes after fast promotion and various minor cases that can exist in short timing windows around that case. Broad fix to bug reported by Michael Paquier on -hackers, approach prompted by Heikki Linnakangas	2013-05-19 17:31:07 +01:00
Simon Riggs	1781744cfc	Emit msg correctly for timeline-crossing crash	2013-05-19 17:00:18 +01:00
Simon Riggs	c94dff4c3c	Remove single space on end of a line in xlog.c Michael Paquier	2013-05-19 15:38:47 +01:00
Tom Lane	e9c336c786	Fix handling of OID wraparound while in standalone mode. If OID wraparound should occur while in standalone mode (unlikely but possible), we want to advance the counter to FirstNormalObjectId not FirstBootstrapObjectId. Otherwise, user objects might be created with OIDs in the system-reserved range. That isn't immediately harmful but it poses a risk of conflicts during future pg_upgrade operations. Noted by Andres Freund. Back-patch to all supported branches, since all of them are supported sources for pg_upgrade operations.	2013-05-13 15:40:16 -04:00
Tom Lane	91715e8293	Fix management of fn_extra caching during repeated GiST index scans. Commit d22a09dc70f9830fa78c1cd1a3a453e4e473d354 introduced official support for GiST consistentFns that want to cache data using the FmgrInfo fn_extra pointer: the idea was to preserve the cached values across gistrescan(), whereas formerly they'd been leaked. However, there was an oversight in that, namely that multiple scan keys might reference the same column's consistentFn; the code would result in propagating the same cache value into multiple scan keys, resulting in crashes or wrong answers. Use a separate array instead to ensure that each scan key keeps its own state. Per bug #8143 from Joel Roller. Back-patch to 9.2 where the bug was introduced.	2013-05-09 23:09:04 -04:00
Heikki Linnakangas	2ffa66f497	Fix walsender failure at promotion. If a standby server has a cascading standby server connected to it, it's possible that WAL has already been sent up to the next WAL page boundary, splitting a WAL record in the middle, when the first standby server is promoted. Don't throw an assertion failure or error in walsender if that happens. Also, fix a variant of the same bug in pg_receivexlog: if it had already received WAL on previous timeline up to a segment boundary, when the upstream standby server is promoted so that the timeline switch record falls on the previous segment, pg_receivexlog would miss the segment containing the timeline switch. To fix that, have walsender send the position of the timeline switch at end-of-streaming, in addition to the next timeline's ID. It was previously assumed that the switch happened exactly where the streaming stopped. Note: this is an incompatible change in the streaming protocol. You might get an error if you try to stream over timeline switches, if the client is running 9.3beta1 and the server is more recent. It should be fine after a reconnect, however. Reported by Fujii Masao.	2013-05-08 20:30:17 +03:00
Heikki Linnakangas	cb953d8b1b	Use the term "radix tree" instead of "suffix tree" for SP-GiST text opclass. What we have implemented is a radix tree (or a radix trie or a patricia trie), but the docs and code comments incorrectly called it a "suffix tree". Alexander Korotkov	2013-05-08 14:34:26 +03:00
Simon Riggs	443951748c	Record data_checksum_version in control file. The value is not used anywhere in code, but will allow future changes to the checksum version should that become necessary in the future.	2013-04-30 12:27:12 +01:00
Simon Riggs	2317a63328	Make fast promotion the default promotion mode. Continue to allow a request for synchronous checkpoints as a mechanism in case of problems.	2013-04-24 12:21:18 +01:00
Heikki Linnakangas	87ae9e7265	Remove some unused and seldom used fields from RelationAmInfo. This saves some memory from each index relcache entry. At least on a 64-bit machine, it saves just enough to shrink a typical relcache entry's memory usage from 2k to 1k. That's nice if you have a lot of backends and a lot of indexes.	2013-04-16 15:07:58 +03:00
Robert Haas	4cff7b9dd6	Remove duplicate initialization in XLogReadRecord. Per a note from Dickson S. Guedes.	2013-04-09 23:59:31 -04:00
Heikki Linnakangas	594041311c	Fix calculation of how many segments to retain for wal_keep_segments. KeepLogSeg function was broken when we switched to use a 64-bit int for the segment number. Per report from Jeff Janes.	2013-04-08 16:29:56 +03:00
Simon Riggs	5787c6730e	Skip extraneous locking in XLogCheckBuffer(). Heikki reported comment was wrong, so fixed code to match the comment: we only need to take additional locking precautions when we have a shared lock on the buffer.	2013-04-08 09:11:49 +01:00

1 2 3 4 5 ...

2204 Commits