1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-06 07:49:08 +03:00
Commit Graph

1963 Commits

Author SHA1 Message Date
Magnus Hagander
fbec7459aa Fix spelling errors and typos in comments
Author: Daniel Gustafsson <daniel@yesql.se>
2018-11-02 13:56:52 +01:00
Andres Freund
62649bad83 Correct constness of a few variables.
This allows the compiler / linker to mark affected pages as read-only.

There's other cases, but they're a bit more invasive, and should go
through some review. These are easy.

They were found with
objdump -j .data -t src/backend/postgres|awk '{print $4, $5, $6}'|sort -r|less

Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de
2018-10-15 21:01:14 -07:00
Michael Paquier
1df21ddb19 Avoid duplicate XIDs at recovery when building initial snapshot
On a primary, sets of XLOG_RUNNING_XACTS records are generated on a
periodic basis to allow recovery to build the initial state of
transactions for a hot standby.  The set of transaction IDs is created
by scanning all the entries in ProcArray.  However it happens that its
logic never counted on the fact that two-phase transactions finishing to
prepare can put ProcArray in a state where there are two entries with
the same transaction ID, one for the initial transaction which gets
cleared when prepare finishes, and a second, dummy, entry to track that
the transaction is still running after prepare finishes.  This way
ensures a continuous presence of the transaction so as callers of for
example TransactionIdIsInProgress() are always able to see it as alive.

So, if a XLOG_RUNNING_XACTS takes a standby snapshot while a two-phase
transaction finishes to prepare, the record can finish with duplicated
XIDs, which is a state expected by design.  If this record gets applied
on a standby to initial its recovery state, then it would simply fail,
so the odds of facing this failure are very low in practice.  It would
be tempting to change the generation of XLOG_RUNNING_XACTS so as
duplicates are removed on the source, but this requires to hold on
ProcArrayLock for longer and this would impact all workloads,
particularly those using heavily two-phase transactions.

XLOG_RUNNING_XACTS is also actually used only to initialize the standby
state at recovery, so instead the solution is taken to discard
duplicates when applying the initial snapshot.

Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru
Backpatch-through: 9.3
2018-10-14 22:23:21 +09:00
Michael Paquier
09921f397b Refactor user-facing SQL functions signalling backends
This moves the system administration functions for signalling backends
from backend/utils/adt/misc.c into a separate file dedicated to backend
signalling.  No new functionality is introduced in this commit.

Author: Daniel Gustafsson
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/C2C7C3EC-CC5F-44B6-9C78-637C88BD7D14@yesql.se
2018-10-04 18:27:25 +09:00
Tom Lane
b04aeb0a05 Add assertions that we hold some relevant lock during relation open.
Opening a relation with no lock at all is unsafe; there's no guarantee
that we'll see a consistent state of the relevant catalog entries.
While use of MVCC scans to read the catalogs partially addresses that
complaint, it's still possible to switch to a new catalog snapshot
partway through loading the relcache entry.  Moreover, whether or not
you trust the reasoning behind sometimes using less than
AccessExclusiveLock for ALTER TABLE, that reasoning is certainly not
valid if concurrent users of the table don't hold a lock corresponding
to the operation they want to perform.

Hence, add some assertion-build-only checks that require any caller
of relation_open(x, NoLock) to hold at least AccessShareLock.  This
isn't a full solution, since we can't verify that the lock level is
semantically appropriate for the action --- but it's definitely of
some use, because it's already caught two bugs.

We can also assert that callers of addRangeTableEntryForRelation()
hold at least the lock level specified for the new RTE.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/16565.1538327894@sss.pgh.pa.us
2018-10-01 12:43:21 -04:00
Alexander Korotkov
2f39106a20 Replace CAS loop with single TAS in ProcArrayGroupClearXid()
Single pg_atomic_exchange_u32() is expected to be faster than loop of
pg_atomic_compare_exchange_u32().  Also, it would be consistent with
clog group update code.

Discussion: https://postgr.es/m/CAPpHfdtxLsC-bqfxFcHswZ91OxXcZVNDBBVfg9tAWU0jvn1tQA%40mail.gmail.com
Reviewed-by: Amit Kapila
2018-09-22 16:22:30 +03:00
Tom Lane
8f0de712c3 Don't ignore locktable-full failures in StandbyAcquireAccessExclusiveLock.
Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock
that checked the return value of LockAcquireExtended.  That created a
bug, because it's still passing reportMemoryError = false to
LockAcquireExtended, meaning that LOCKACQUIRE_NOT_AVAIL will be returned
if we're out of shared memory for the lock table.

In such a situation, the startup process would believe it had acquired an
exclusive lock even though it hadn't, with potentially dire consequences.

To fix, just drop the use of reportMemoryError = false, which allows us
to simplify the call into a plain LockAcquire().  It's unclear that the
locktable-full situation arises often enough that it's worth having a
better recovery method than crash-and-restart.  (I strongly suspect that
the only reason the code path existed at all was that it was relatively
simple to do in the pre-37c54863c implementation.  But now it's not.)

LockAcquireExtended's reportMemoryError parameter is now dead code and
could be removed.  I refrained from doing so, however, because there
was some interest in resurrecting the behavior if we do get reports of
locktable-full failures in the field.  Also, it seems unwise to remove
the parameter concurrently with shipping commit f868a8143, which added a
parameter; if there are any third-party callers of LockAcquireExtended,
we want them to get a wrong-number-of-parameters compile error rather
than a possibly-silent misinterpretation of its last parameter.

Back-patch to 9.6 where the bug was introduced.

Discussion: https://postgr.es/m/6202.1536359835@sss.pgh.pa.us
2018-09-19 12:43:51 -04:00
Thomas Munro
422952ee78 Allow DSM allocation to be interrupted.
Chris Travers reported that the startup process can repeatedly try to
cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
to loop forever.  Teach the retry loop to give up if an interrupt is
pending.  Don't actually check for interrupts in that loop though,
because a non-local exit would skip some clean-up code in the caller.

Back-patch to 9.4 where DSM was added (and posix_fallocate() was later
back-patched).

Author: Chris Travers
Reviewed-by: Ildar Musin, Murat Kabilov, Oleksii Kliukin
Tested-by: Oleksii Kliukin
Discussion: https://postgr.es/m/CAN-RpxB-oeZve_J3SM_6%3DHXPmvEG%3DHX%2B9V9pi8g2YR7YW0rBBg%40mail.gmail.com
2018-09-18 22:56:36 +12:00
Michael Paquier
9226a3b89b Remove duplicated words split across lines in comments
This has been detected using some interesting tricks with sed, and the
method used is mentioned in details in the discussion below.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com
2018-09-08 12:24:19 -07:00
Tom Lane
f868a8143a Fix longstanding recursion hazard in sinval message processing.
LockRelationOid and sibling routines supposed that, if our session already
holds the lock they were asked to acquire, they could skip calling
AcceptInvalidationMessages on the grounds that we must have already read
any remote sinval messages issued against the relation being locked.
This is normally true, but there's a critical special case where it's not:
processing inside AcceptInvalidationMessages might attempt to access system
relations, resulting in a recursive call to acquire a relation lock.

Hence, if the outer call had acquired that same system catalog lock, we'd
fall through, despite the possibility that there's an as-yet-unread sinval
message for that system catalog.  This could, for example, result in
failure to access a system catalog or index that had just been processed
by VACUUM FULL.  This is the explanation for buildfarm failures we've been
seeing intermittently for the past three months.  The bug is far older
than that, but commits a54e1f158 et al added a new recursion case within
AcceptInvalidationMessages that is apparently easier to hit than any
previous case.

To fix this, we must not skip calling AcceptInvalidationMessages until
we have *finished* a call to it since acquiring a relation lock, not
merely acquired the lock.  (There's already adequate logic inside
AcceptInvalidationMessages to deal with being called recursively.)
Fortunately, we can implement that at trivial cost, by adding a flag
to LOCALLOCK hashtable entries that tracks whether we know we have
completed such a call.

There is an API hazard added by this patch for external callers of
LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD,
it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR
into thinking the lock wasn't already held.  This should be a fail-soft
condition, though, unless something very bizarre is being done in
response to the test.

Also, I added an additional output argument to LockAcquireExtended,
assuming that that probably isn't called by any outside code given
the very limited usefulness of its additional functionality.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-07 18:04:54 -04:00
Tom Lane
44cac93464 Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd.  However, that policy's
been ignored in an increasing number of places.  We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway.  But this is not
something to rely on.  Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.

I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster.  I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-09-01 15:27:17 -04:00
Andres Freund
143290efd0 Introduce minimal C99 usage to verify compiler support.
This just converts a few for loops in postgres.c to declare variables
in the loop initializer, and uses designated initializers in smgr.c's
definition of smgr callbacks.

Author: Andres Freund
Discussion: https://postgr.es/m/97d4b165-192d-3605-749c-f614a0c4e783@2ndquadrant.com
2018-08-23 18:36:07 -07:00
Michael Paquier
246a6c8f7b Make autovacuum more aggressive to remove orphaned temp tables
Commit dafa084, added in 10, made the removal of temporary orphaned
tables more aggressive.  This commit makes an extra step into the
aggressiveness by adding a flag in each backend's MyProc which tracks
down any temporary namespace currently in use.  The flag is set when the
namespace gets created and can be reset if the temporary namespace has
been created in a transaction or sub-transaction which is aborted.  The
flag value assignment is assumed to be atomic, so this can be done in a
lock-less fashion like other flags already present in PGPROC like
databaseId or backendId, still the fact that the temporary namespace and
table created are still locked until the transaction creating those
commits acts as a barrier for other backends.

This new flag gets used by autovacuum to discard more aggressively
orphaned tables by additionally checking for the database a backend is
connected to as well as its temporary namespace in-use, removing
orphaned temporary relations even if a backend reuses the same slot as
one which created temporary relations in a past session.

The base idea of this patch comes from Robert Haas, has been written in
its first version by Tsunakawa Takayuki, then heavily reviewed by me.

Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI
breakages on already released versions.
2018-08-13 11:49:04 +02:00
Tom Lane
130beba36d Fix inadequate buffer locking in FSM and VM page re-initialization.
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking.  There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.

This does not add any locking overhead in the normal code path where the
page is OK.  It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.

Problem noted by R P Asim.  It's been like this for a long time, so
back-patch to all supported branches.

Discussion: https://postgr.es/m/CANXE4Te4G0TGq6cr0-TvwP0H4BNiK_-hB5gHe8mF+nz0mcYfMQ@mail.gmail.com
2018-07-13 11:53:10 -04:00
Peter Eisentraut
5e6e2c8773 Reset shmem_exit_inprogress after shmem_exit()
In ad9a274778, shmem_exit_inprogress was
introduced.  But we need to reset it after shmem_exit(), because unlike
the similar proc_exit(), shmem_exit() can also be called for cleanup
when the process will not exit.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
2018-07-12 20:22:17 +02:00
Alexander Korotkov
a01d0fa1d8 Fix wrong file path in header comment
Header comment of shm_mq.c was mistakenly specifying path to shm_mq.h.
It was introduced in ec9037df.  So, theoretically it could be
backpatched to 9.4, but it doesn't seem to worth it.
2018-07-11 13:16:46 +03:00
Thomas Munro
f98b8476cd Use signals for postmaster death on FreeBSD.
Use FreeBSD 11.2's new support for detecting parent process death to
make PostmasterIsAlive() very cheap, as was done for Linux in an
earlier commit.

Author: Thomas Munro
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
2018-07-11 13:14:07 +12:00
Thomas Munro
9f09529952 Use signals for postmaster death on Linux.
Linux provides a way to ask for a signal when your parent process dies.
Use that to make PostmasterIsAlive() very cheap.

Based on a suggestion from Andres Freund.

Author: Thomas Munro, Heikki Linnakangas
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi
Discussion: https://postgr.es/m/20180411002643.6buofht4ranhei7k%40alap3.anarazel.de
2018-07-11 12:47:06 +12:00
Peter Eisentraut
bcbd940806 Remove dynamic_shared_memory_type=none
PostgreSQL nowadays offers some kind of dynamic shared memory feature on
all supported platforms.  Having the choice of "none" prevents us from
relying on DSM in core features.  So this patch removes the choice of
"none".

Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
2018-07-10 18:35:24 +02:00
Fujii Masao
b41669118c Improve the performance of relation deletes during recovery.
When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
previously, during recovery, smgrdounlink() (not smgrdounlinkall()) was
called for each file to delete, which led to scan shared_buffers
multiple times. Obviously this could cause to increase the WAL replay
time very much especially when shared_buffers was huge.

To alleviate this situation, this commit changes the recovery so that
it also calls smgrdounlinkall() only one time to delete multiple
relation files.

This is just fix for oversight of commit 279628a0a7, not new feature.
So, per discussion on pgsql-hackers, we concluded to backpatch this
to all supported versions.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Andres Freund, Thomas Munro, Kyotaro Horiguchi, Takayuki Tsunakawa
Discussion: https://postgr.es/m/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com
2018-07-05 02:23:46 +09:00
Andrew Dunstan
1e9c858090 pgindent run prior to branching 2018-06-30 12:25:49 -04:00
Thomas Munro
a40cff8956 Move RecoveryLockList into a hash table.
Standbys frequently need to release all locks held by a given xid.
Instead of searching one big list linearly, let's create one list
per xid and put them in a hash table, so we can find what we need
in O(1) time.

Earlier analysis and a prototype were done by David Rowley, though
this isn't his patch.

Back-patch all the way.

Author: Thomas Munro
Diagnosed-by: David Rowley, Andres Freund
Reviewed-by: Andres Freund, Tom Lane, Robert Haas
Discussion: https://postgr.es/m/CAEepm%3D1mL0KiQ2KJ4yuPpLGX94a4Ns_W6TL4EGRouxWibu56pA%40mail.gmail.com
Discussion: https://postgr.es/m/CAKJS1f9vJ841HY%3DwonnLVbfkTWGYWdPN72VMxnArcGCjF3SywA%40mail.gmail.com
2018-06-26 18:45:45 +12:00
Simon Riggs
15378c1a15 Remove AELs from subxids correctly on standby
Issues relate only to subtransactions that hold AccessExclusiveLocks
when replayed on standby.

Prior to PG10, aborting subtransactions that held an
AccessExclusiveLock failed to release the lock until top level commit or
abort. 49bff5300d fixed that.

However, 49bff5300d also introduced a similar bug where subtransaction
commit would fail to release an AccessExclusiveLock, leaving the lock to
be removed sometimes early and sometimes late. This commit fixes
that bug also. Backpatch to PG10 needed.

Tested by observation. Note need for multi-node isolationtester to improve
test coverage for this and other HS cases.

Reported-by: Simon Riggs
Author: Simon Riggs
2018-06-16 14:03:29 +01:00
Tatsuo Ishii
1cfdb1cb0e Fix memory leak in BufFileCreateShared().
Also this commit unifies some duplicated code in makeBufFile() and
BufFileOpenShared() into new function makeBufFileCommon().

Author: Antonin Houska
Reviewed-By: Thomas Munro, Tatsuo Ishii
Discussion: https://postgr.es/m/16139.1529049566%40localhost
2018-06-16 14:21:08 +09:00
Tatsuo Ishii
969274d813 Fix memory leak.
Memory is allocated twice for "file" and "files" variables in
BufFileOpenShared().

Author: Antonin Houska
Discussion: https://postgr.es/m/11329.1529045692%40localhost
2018-06-15 16:32:59 +09:00
Simon Riggs
dc878ffedf Remove spurious code comments in standby related code
GetRunningTransactionData() suggested that subxids were not worth
optimizing away if overflowed, yet they have already been removed
for that case.

Changes to LogAccessExclusiveLock() API forgot to remove the
prior comment when it was copied to LockAcquire().
2018-06-14 12:17:51 +01:00
Simon Riggs
802bde87ba Remove cut-off bug from RunningTransactionData
32ac7a118f tried to fix a Hot Standby issue
reported by Greg Stark, but in doing so caused
a different bug to appear, noted by Andres Freund.

Revoke the core changes from 32ac7a118f,
leaving in its place a minor change in code
ordering and comments to explain for the future.
2018-06-14 12:02:41 +01:00
Simon Riggs
32ac7a118f Exclude VACUUMs from RunningXactData
GetRunningTransactionData() should ignore VACUUM procs because in some
cases they are assigned xids. This could lead to holding back xmin via
the route of passing the xid to standby and then having that hold back
xmin on master via feedback.

Backpatch to 9.1 needed, but will only do so on supported versions.
Backpatch once proven on the buildfarm.

Reported-by: Greg Stark
Author: Simon Riggs
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CANP8+jJBYt=4PpTfiPb0UrH1_iPhzsxKH5Op_Wec634F0ohnAw@mail.gmail.com
2018-06-07 20:38:12 +01:00
Tom Lane
1d96c1b91a Fix incorrect ordering of operations in pg_resetwal and pg_rewind.
Commit c37b3d08c dropped its added GetDataDirectoryCreatePerm call into
the wrong place in pg_resetwal.c, namely after the chdir to DataDir.
That broke invocations using a relative path, as reported by Tushar Ahuja.
We could have left it where it was and changed the argument to be ".",
but that'd result in a rather confusing error message in event of a
failure, so re-ordering seems like a better solution.

Similarly reorder operations in pg_rewind.c.  The issue there is that
it doesn't seem like a good idea to do any actual operations before the
not-root check (on Unix) or the restricted token acquisition (on Windows).
I don't know that this is an actual bug, but I'm definitely not convinced
that it isn't, either.

Assorted other code review for c37b3d08c and da9b580d8: fix some
misspelled or otherwise badly worded comments, put the #include for
<sys/stat.h> where it actually belongs, etc.

Discussion: https://postgr.es/m/aeb9c3a7-3c3f-a57f-1a18-c8d4fcdc2a1f@enterprisedb.com
2018-05-23 10:59:55 -04:00
Teodor Sigaev
0bef1c0678 Re-think predicate locking on GIN indexes.
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.

This fixes two bugs:

1. If fast update was turned on concurrently, subsequent inserts to the
   pending list would not conflict with predicate locks that were acquired
   earlier, on entry pages. The included 'predicate-gin-fastupdate' test
   demonstrates that. To fix, make all scans acquire a predicate lock on
   the metapage. That lock represents a scan of the pending list, whether
   or not there is a pending list at the moment. Forget about the
   optimization to skip locking/checking for locks, when fastupdate=off.
2. If a scan finds no match, it still needs to lock the entry page. The
   point of predicate locks is to lock the gabs between values, whether
   or not there is a match. The included 'predicate-gin-nomatch' test
   tests that case.

In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B", you could avoid unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.)

Also, some spelling  fixes.

Author: Heikki Linnakangas with some editorization by me
Review: Andrey Borodin, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
2018-05-04 11:27:50 +03:00
Tom Lane
fbb2e9a030 Fix assorted compiler warnings seen in the buildfarm.
Failure to use DatumGetFoo/FooGetDatum macros correctly, or at all,
causes some warnings about sign conversion.  This is just cosmetic
at the moment but in principle it's a type violation, so clean up
the instances I could find.

autoprewarm.c and sharedfileset.c contained code that unportably
assumed that pid_t is the same size as int.  We've variously dealt
with this by casting pid_t to int or to unsigned long for printing
purposes; I went with the latter.

Fix uninitialized-variable warning in RestoreGUCState.  This is
a live bug in some sense, but of no great significance given that
nobody is very likely to care what "line number" is associated with
a GUC that hasn't got a source file recorded.
2018-05-02 15:52:54 -04:00
Heikki Linnakangas
445e31bdc7 Fix some sloppiness in the new BufFileSize() and BufFileAppend() functions.
There were three related issues:

* BufFileAppend() incorrectly reset the seek position on the 'source' file.
  As a result, if you had called BufFileRead() on the file before calling
  BufFileAppend(), it got confused, and subsequent calls would read/write
  at wrong position.

* BufFileSize() did not work with files opened with BufFileOpenShared().

* FileGetSize() only worked on temporary files.

To fix, change the way BufFileSize() works so that it works on shared
files. Remove FileGetSize() altogether, as it's no longer needed. Remove
buffilesize from TapeShare struct, as the leader process can simply call
BufFileSize() to get the tape's size, there's no need to pass it through
shared memory anymore.

Discussion: https://www.postgresql.org/message-id/CAH2-WznEDYe_NZXxmnOfsoV54oFkTdMy7YLE2NPBLuttO96vTQ@mail.gmail.com
2018-05-02 17:23:13 +03:00
Tom Lane
9cb7db3f0c In AtEOXact_Files, complain if any files remain unclosed at commit.
This change makes this module act more like most of our other low-level
resource management modules.  It's a caller error if something is not
explicitly closed by the end of a successful transaction, so issue
a WARNING about it.  This would not actually have caught the file leak
bug fixed in commit 231bcd080, because that was in a transaction-abort
path; but it still seems like a good, and pretty cheap, cross-check.

Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org
2018-04-28 17:45:02 -04:00
Peter Eisentraut
d4f16d5071 perltidy: Add option --nooutdent-long-quotes 2018-04-27 11:37:43 -04:00
Tom Lane
bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Tom Lane
231bcd0803 Fix incorrect close() call in dsm_impl_mmap().
One improbable error-exit path in this function used close() where
it should have used CloseTransientFile().  This is unlikely to be
hit in the field, and I think the consequences wouldn't be awful
(just an elog(LOG) bleat later).  But a bug is a bug, so back-patch
to 9.4 where this code came in.

Pan Bian

Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org
2018-04-10 18:34:54 -04:00
Tom Lane
af1a949109 Further cleanup of client dependencies on src/include/catalog headers.
In commit 9c0a0de4c, I'd failed to notice that catalog/catalog.h
should also be considered a frontend-unsafe header, because it includes
(and needs) the full form of pg_class.h, not to mention relcache.h.
However, various frontend code was depending on it to get
TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for.

The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY,
as well as the OIDCHARS symbol, to common/relpath.h.  Do that, and mop up
inclusions as necessary.  (I found that quite a few current users of
catalog/catalog.h don't seem to need it at all anymore, apparently as a
result of the refactorings that created common/relpath.[hc].  And
initdb.c needed it only as a route to pg_class_d.h.)

Discussion: https://postgr.es/m/6629.1523294509@sss.pgh.pa.us
2018-04-09 14:39:58 -04:00
Magnus Hagander
a228cc13ae Revert "Allow on-line enabling and disabling of data checksums"
This reverts the backend sides of commit 1fde38beaa.
I have, at least for now, left the pg_verify_checksums tool in place, as
this tool can be very valuable without the rest of the patch as well,
and since it's a read-only tool that only runs when the cluster is down
it should be a lot safer.
2018-04-09 19:03:42 +02:00
Stephen Frost
da9b580d89 Refactor dir/file permissions
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).

Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).

Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.

Authors: David Steele <david@pgmasters.net>,
         Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Teodor Sigaev
b508a56f2f Predicate locking in hash indexes.
Hash index searches acquire predicate locks on the primary
page of a bucket. It acquires a lock on both the old and new buckets
for scans that happen concurrently with page splits. During a bucket
split, a predicate lock is copied from the primary page of an old
bucket to the primary page of a new bucket.

Author: Shubham Barai, Amit Kapila
Reviewed by: Amit Kapila, Alexander Korotkov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPvNsM2GTiXdRgaaZ1Pjd1bs+sxfFsf7Ytr+iq+5JJoYXA@mail.gmail.com
2018-04-07 16:59:14 +03:00
Magnus Hagander
1fde38beaa Allow on-line enabling and disabling of data checksums
This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).

Enabling checkusm starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
started.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-05 22:04:48 +02:00
Tom Lane
0b11a674fb Fix a boatload of typos in C comments.
Justin Pryzby

Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
2018-04-01 15:01:28 -04:00
Tom Lane
e5eb4fa873 Remove obsolete SLRU wrapping and warnings from predicate.c.
When SSI was developed, slru.c was limited to segment files with names in
the range 0000-FFFF.  This didn't allow enough space for predicate.c to
store every possible XID when spilling old transactions to disk, so it
would wrap around sooner and print warnings.  Since commits 638cf09e and
73c986ad increased the number of segment files slru.c could manage, that
behavior is unnecessary.  Therefore remove that code.

Also remove the macro OldSerXidSegment, which has been unused since
4cd3fb6e.

Thomas Munro, reviewed by Anastasia Lubennikova

Discussion: https://postgr.es/m/CAEepm=3XfsTSxgEbEOmxu0QDiXy0o18NUg2nC89JZcCGE+XFPA@mail.gmail.com
2018-03-30 15:11:39 -04:00
Teodor Sigaev
43d1ed60fd Predicate locking in GIN index
Predicate locks are used on per page basis only if fastupdate = off, in
opposite case predicate lock on pending list will effectively lock whole index,
to reduce locking overhead, just lock a relation. Entry and posting trees are
essentially B-tree, so locks are acquired on leaf pages only.

Author: Shubham Barai with some editorization by me and Dmitry Ivanov
Review by: Alexander Korotkov, Dmitry Ivanov, Fedor Sigaev
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com
2018-03-30 14:23:17 +03:00
Tom Lane
2b1759e267 Remove unnecessary BufferGetPage() calls in fsm_vacuum_page().
Just noticed that these were quite redundant, since we're holding the
page address in a local variable anyway, and we have pin on the buffer
throughout.

Also improve a comment.
2018-03-29 12:44:19 -04:00
Tom Lane
a063baaced Remove UpdateFreeSpaceMap(), use FreeSpaceMapVacuumRange() instead.
FreeSpaceMapVacuumRange has the same effect, is more efficient if many
pages are involved, and makes fewer assumptions about how it's used.
Notably, Claudio Freire pointed out that UpdateFreeSpaceMap could fail
if the specified freespace value isn't the maximum possible.  This isn't
a problem for the single existing user, but the function represents an
attractive nuisance IMO, because it's named as though it were a
general-purpose update function and its limitations are undocumented.
In any case we don't need multiple ways to get the same result.

In passing, do some code review and cleanup in RelationAddExtraBlocks.
In particular, I see no excuse for it to omit the PageIsNew safety check
that's done in the mainline extension path in RelationGetBufferForTuple.

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 12:22:44 -04:00
Bruce Momjian
bc0021ef09 C comment: fix wording about shared memory message queue
Reported-by: Tels

Discussion: https://postgr.es/m/e66e05bc55f5ce904e361ad17a3395ae.squirrel@sm.webmail.pair.com
2018-03-29 12:18:42 -04:00
Tom Lane
851a26e266 While vacuuming a large table, update upper-level FSM data every so often.
VACUUM updates leaf-level FSM entries immediately after cleaning the
corresponding heap blocks.  fsmpage.c updates the intra-page search trees
on the leaf-level FSM pages when this happens, but it does not touch the
upper-level FSM pages, so that the released space might not actually be
findable by searchers.  Previously, updating the upper-level pages happened
only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum()
call.  This is bad because the VACUUM might get canceled before ever
reaching that point, so that from the point of view of searchers no space
has been freed at all, leading to table bloat.

We can improve matters by updating the upper pages immediately after each
cycle of index-cleaning and heap-cleaning, processing just the FSM pages
corresponding to the range of heap blocks we have now fully cleaned.
This adds a small amount of extra work, since the FSM pages leading down
to each range boundary will be touched twice, but it's pretty negligible
compared to everything else going on in a large VACUUM.

If there are no indexes, VACUUM doesn't work in cycles but just cleans
each heap page on first visit.  In that case we just arbitrarily update
upper FSM pages after each 8GB of heap.  That maintains the goal of not
letting all this work slide until the very end, and it doesn't seem worth
expending extra complexity on a case that so seldom occurs in practice.

In either case, the FSM is fully up to date before any attempt is made
to truncate the relation, so that the most likely scenario for VACUUM
cancellation no longer results in out-of-date upper FSM pages.  When
we do successfully truncate, adjusting the FSM to reflect that is now
fully handled within FreeSpaceMapTruncateRel.

Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 11:29:54 -04:00
Teodor Sigaev
920a5e500a Skip temp tables from basebackup.
Do not store temp tables in basebackup, they will not be visible anyway, so,
there are not reasons to store them.

Author: David Steel
Reviewed by: me
Discussion: https://www.postgresql.org/message-id/flat/5ea4d26a-a453-c1b7-eff9-5a3ef8f8aceb@pgmasters.net
2018-03-27 16:14:40 +03:00
Teodor Sigaev
3ad55863e9 Add predicate locking for GiST
Add page-level predicate locking, due to gist's code organization, patch seems
close to trivial: add check before page changing, add predicate lock before page
scanning.  Although choosing right place to check is not simple: it should not
be called during index build, it should support insertion of new downlink and so
on.

Author: Shubham Barai with editorization by me and Alexander Korotkov
Reviewed by: Alexander Korotkov, Andrey Borodin, me
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPtdcANpw5ePU3LvnTP8HCENFw6wygupQAyNBgD-sG3h0g@mail.gmail.com
2018-03-27 15:43:19 +03:00