1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-21 00:42:43 +03:00
Commit Graph

1661 Commits

Author SHA1 Message Date
Robert Haas
7191ce8bea Make all built-in lwlock tranche IDs fixed.
This makes the values more stable, which seems like a good thing for
anybody who needs to look at at them.

Alexander Korotkov and Amit Kapila
2016-02-02 06:45:55 -05:00
Robert Haas
2251179e6a Migrate replication slot I/O locks into a separate tranche.
This is following in a long train of similar changes and for the same
reasons - see b319356f0e and
fe702a7b3f inter alia.

Author: Amit Kapila
Reviewed-by: Alexander Korotkov, Robert Haas
2016-01-29 09:45:38 -05:00
Robert Haas
b319356f0e Migrate PGPROC's backendLock into PGPROC itself, using a new tranche.
Previously, each PGPROC's backendLock was part of the main tranche,
and the PGPROC just contained a pointer.  Now, the actual LWLock is
part of the PGPROC.

As with previous, similar patches, this makes it significantly easier
to identify these lwlocks in LWLOCK_STATS or Trace_lwlocks output
and improves modularity.

Author: Ildus Kurbangaliev
Reviewed-by: Amit Kapila, Robert Haas
2016-01-29 08:14:28 -05:00
Simon Riggs
1129c2b0ad Correct comment in GetConflictingVirtualXIDs()
We use Share lock because it is safe to do so.
2016-01-24 10:22:11 -08:00
Bruce Momjian
ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Noah Misch
dfcd9cb302 Cover heap_page_prune_opt()'s cleanup lock tactic in README.
Jeff Janes, reviewed by Jim Nasby.
2016-01-01 21:52:22 -05:00
Robert Haas
049469e7e7 Teach mdnblocks() not to create zero-length files.
It's entirely surprising that mdnblocks() has the side effect of
creating new files on disk, so let's make it not do that.  One
consequence of the old behavior is that, if running on a damaged
cluster that is missing a file, mdnblocks() can recreate the file
and allow a subsequent _mdfd_getseg() for a higher segment to succeed.
This happens because, while mdnblocks() stops when it finds a segment
that is shorter than 1GB, _mdfd_getseg() has no such check, and thus
the empty file created by mdnblocks() can allow it to continue its
traversal and find higher-numbered segments which remain.

It might be a good idea for _mdfd_getseg() to actually verify that
each segment it finds is exactly 1GB before proceeding to the next
one, but that would involve some additional system calls, so for
now I'm just doing this much.

Patch by me, per off-list analysis by Kevin Grittner and Rahila Syed.
Review by Andres Freund.
2015-12-15 13:57:45 -05:00
Robert Haas
6150a1b08a Move buffer I/O and content LWLocks out of the main tranche.
Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two.  Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).

We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche.  This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.

Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.

Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.
2015-12-15 13:32:54 -05:00
Robert Haas
3fed417452 Provide a way to predefine LWLock tranche IDs.
It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID.  So, for built-in tranches,
use a hard-coded value instead.

This is motivated by an upcoming patch adding further built-in tranches.

Andres Freund and Robert Haas
2015-12-15 11:48:19 -05:00
Andres Freund
2a3544960e Correct statement to actually be the intended assert statement.
e3f4cfc7 introduced a LWLockHeldByMe() call, without the corresponding
Assert() surrounding it.

Spotted by Coverity.

Backpatch: 9.1+, like the previous commit
2015-12-14 11:23:24 +01:00
Andres Freund
e3f4cfc7aa Fix bug leading to restoring unlogged relations from empty files.
At the end of crash recovery, unlogged relations are reset to the empty
state, using their init fork as the template. The init fork is copied to
the main fork without going through shared buffers. Unfortunately WAL
replay so far has not necessarily flushed writes from shared buffers to
disk at that point. In normal crash recovery, and before the
introduction of 'fast promotions' in fd4ced523 / 9.3, the
END_OF_RECOVERY checkpoint flushes the buffers out in time. But with
fast promotions that's not the case anymore.

To fix, force WAL writes targeting the init fork to be flushed
immediately (using the new FlushOneBuffer() function). In 9.5+ that
flush can centrally be triggered from the code dealing with restoring
full page writes (XLogReadBufferForRedoExtended), in earlier releases
that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay
function.

Backpatch to 9.1, even if this currently is only known to trigger in
9.3+. Flushing earlier is more robust, and it is advantageous to keep
the branches similar.

Typical symptoms of this bug are errors like
'ERROR:  index "..." contains unexpected zero page at block 0'
shortly after promoting a node.

Reported-By: Thom Brown
Author: Andres Freund and Michael Paquier
Discussion: 20150326175024.GJ451@alap3.anarazel.de
Backpatch: 9.1-
2015-12-10 16:29:26 +01:00
Peter Eisentraut
5db837d3f2 Message improvements 2015-11-16 21:39:23 -05:00
Robert Haas
e93b62985f Remove volatile qualifiers from bufmgr.c and freelist.c
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.

Review by Andres Freund
2015-11-16 18:50:06 -05:00
Robert Haas
fe702a7b3f Move each SLRU's lwlocks to a separate tranche.
This makes it significantly easier to identify these lwlocks in
LWLOCK_STATS or Trace_lwlocks output.  It's also arguably better
from a modularity standpoint, since lwlock.c no longer needs to
know anything about the LWLock needs of the higher-level SLRU
facility.

Ildus Kurbangaliev, reviewd by Álvaro Herrera and by me.
2015-11-12 14:59:09 -05:00
Robert Haas
4efe26cbd3 shm_mq: Third attempt at fixing nowait behavior in shm_mq_receive.
Commit a1480ec1d3 purported to fix the
problems with commit b2ccb5f4e6, but it
didn't completely fix them.  The problem is that the checks were
performed in the wrong order, leading to a race condition.  If the
sender attached, sent a message, and detached after the receiver
called shm_mq_get_sender and before the receiver called
shm_mq_counterparty_gone, we'd incorrectly return SHM_MQ_DETACHED
before all messages were read.  Repair by reversing the order of
operations, and add a long comment explaining why this new logic is
(hopefully) correct.
2015-11-03 09:12:52 -05:00
Tom Lane
620ac88d6f Remove some more dead Alpha-specific code. 2015-11-02 19:37:51 -05:00
Kevin Grittner
585e2a3b1a Fix serialization anomalies due to race conditions on INSERT.
On insert the CheckForSerializableConflictIn() test was performed
before the page(s) which were going to be modified had been locked
(with an exclusive buffer content lock).  If another process
acquired a relation SIReadLock on the heap and scanned to a page on
which an insert was going to occur before the page was so locked,
a rw-conflict would be missed, which could allow a serialization
anomaly to be missed.  The window between the check and the page
lock was small, so the bug was generally not noticed unless there
was high concurrency with multiple processes inserting into the
same table.

This was reported by Peter Bailis as bug #11732, by Sean Chittenden
as bug #13667, and by others.

The race condition was eliminated in heap_insert() by moving the
check down below the acquisition of the buffer lock, which had been
the very next statement.  Because of the loop locking and unlocking
multiple buffers in heap_multi_insert() a check was added after all
inserts were completed.  The check before the start of the inserts
was left because it might avoid a large amount of work to detect a
serialization anomaly before performing the all of the inserts and
the related WAL logging.

While investigating this bug, other SSI bugs which were even harder
to hit in practice were noticed and fixed, an unnecessary check
(covered by another check, so redundant) was removed from
heap_update(), and comments were improved.

Back-patch to all supported branches.

Kevin Grittner and Thomas Munro
2015-10-31 14:43:34 -05:00
Robert Haas
a1480ec1d3 shm_mq: Repair breakage from previous commit.
If the counterparty writes some data into the queue and then detaches,
it's wrong to return SHM_MQ_DETACHED right away.  If we do that, we
fail to read whatever was written.
2015-10-22 22:01:11 -04:00
Robert Haas
b2ccb5f4e6 shm_mq: Fix failure to notice a dead counterparty when nowait is used.
The shm_mq mechanism was intended to optionally notice when the process
on the other end of the queue fails to attach to the queue.  It does
this by allowing the user to pass a BackgroundWorkerHandle; if the
background worker in question is launched and dies without attaching
to the queue, then we know it never will.  This logic works OK in
blocking mode, but when called with nowait = true we fail to notice
that this has happened due to an asymmetry in the logic.  Repair.

Reported off-list by Rushabh Lathia.  Patch by me.
2015-10-22 16:33:30 -04:00
Robert Haas
d53e3d5fe0 Remove volatile qualifiers from proc.c and procarray.c
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.

Michael Paquier
2015-10-16 14:20:36 -04:00
Robert Haas
430008b5a7 Remove volatile qualifiers from dynahash.c, shmem.c, and sinvaladt.c
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.

Thomas Munro
2015-10-16 14:14:15 -04:00
Robert Haas
db0f6cad48 Remove set_latch_on_sigusr1 flag.
This flag has proven to be a recipe for bugs, and it doesn't seem like
it can really buy anything in terms of performance.  So let's just
*always* set the process latch when we receive SIGUSR1 instead of
trying to do it only when needed.

Per my recent proposal on pgsql-hackers.
2015-10-09 14:31:04 -04:00
Robert Haas
f40792a93c Use LOCKBIT_ON() instead of a bit shift in a few places.
We do this mostly everywhere, so it seems just as well to do it here,
too.

Thomas Munro
2015-09-28 10:57:15 -04:00
Andres Freund
4f627f8973 Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound.  To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
Andres Freund
98d5b084d2 Correct value of LW_SHARED_MASK.
The previous wrong value lead to wrong LOCK_DEBUG output, never showing
any shared lock holders.

Reported-By: Alexander Korotkov
Discussion: CAPpHfdsPmWqz9FB0AnxJrwp1=KLF0n=-iB+QvR0Q8GSmpFVdUQ@mail.gmail.com
Backpatch: 9.5, where the bug was introduced.
2015-09-22 11:14:14 +02:00
Tom Lane
ba51774d87 Be more wary about partially-valid LOCALLOCK data in RemoveLocalLock().
RemoveLocalLock() must consider the possibility that LockAcquireExtended()
failed to palloc the initial space for a locallock's lockOwners array.
I had evidently meant to cope with this hazard when the code was originally
written (commit 1785acebf2), but missed that
the pfree needed to be protected with an if-test.  Just to make sure things
are left in a clean state, reset numLockOwners as well.

Per low-memory testing by Andreas Seltenreich.  Back-patch to all supported
branches.
2015-09-20 16:48:44 -04:00
Robert Haas
4a4e6893aa Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends.  However, shm_mq itself only deals in raw
bytes.  Since commit 2bd9e412f9, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them.  This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.

The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion.  Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.

This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.

Extracted from Amit Kapila's parallel sequential scan patch.  This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:56:58 -04:00
Robert Haas
2ccc4e972e Fix build problems in commit aa65de042f.
The previous way didn't work for vpath builds, and make distprep was
busted too.

Reported off-list by Andres Freund.
2015-09-11 14:56:17 -04:00
Robert Haas
aa65de042f When trace_lwlocks is used, identify individual lwlocks by name.
Naming the individual lwlocks seems like something that may be useful
for other types of debugging, monitoring, or instrumentation output,
but this commit just implements it for the specific case of
trace_lwlocks.

Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi
2015-09-11 14:01:39 -04:00
Alvaro Herrera
1aba62ec63 Allow per-tablespace effective_io_concurrency
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others.  Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.

Author: Julien Rouhaud
Reviewed by: Andres Freund
2015-09-08 12:51:42 -03:00
Robert Haas
4aec49899e Assorted code review for recent ProcArrayLock patch.
Post-commit review by Andres Freund discovered a couple of concurrency
bugs in the original patch: specifically, if the leader cleared a
follower's XID before it reached PGSemaphoreLock, the semaphore would be
left in the wrong state; and if another process did PGSemaphoreUnlock
for some unrelated reason, we might resume execution before the fact
that our XID was cleared was globally visible.

Also, improve the wording of some comments, rename nextClearXidElem
to firstClearXidElem in PROC_HDR for clarity, and drop some volatile
qualifiers that aren't necessary.

Amit Kapila, reviewed and slightly revised by me.
2015-09-03 13:19:15 -04:00
Andres Freund
e95126cf04 Don't use function definitions looking like old-style ones.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other functions definitions use
the (void) style it just seems to be consistent to do so as well in the
remaining few places.
2015-08-15 17:25:00 +02:00
Andres Freund
d25fbf9f3e Fix two off-by-one errors in bufmgr.c.
In 4b4b680c I passed a buffer index number (starting from 0) instead of
a proper Buffer id (which start from 1 for shared buffers) in two
places.

This wasn't noticed so far as one of those locations isn't compiled at
all (PrintPinnedBufs) and the other one (InvalidBuffer) requires a
unlikely, but possible, set of circumstances to trigger a symptom.

To reduce the likelihood of such incidents a bit also convert existing
open coded mappings from buffer descriptors to buffer ids with
BufferDescriptorGetBuffer().

Author: Qingqing Zhou
Reported-By: Qingqing Zhou
Discussion: CAJjS0u2ai9ooUisKtkV8cuVUtEkMTsbK8c7juNAjv8K11zeCQg@mail.gmail.com
Backpatch: 9.5 where the private ref count infrastructure was introduced
2015-08-12 17:35:50 +02:00
Robert Haas
846f8c9483 Fix attach-related race condition in shm_mq_send_bytes.
Spotted by Antonin Houska.
2015-08-07 10:04:07 -04:00
Robert Haas
df0a67f754 Fix incorrect calculation in shm_mq_receive.
If some, but not all, of the length word has already been read, and the
next attempt to read sees exactly the number of bytes needed to complete
the length word, or fewer, then we'll incorrectly read less than all of
the available data.

Antonin Houska
2015-08-06 13:25:45 -04:00
Robert Haas
0e141c0fbb Reduce ProcArrayLock contention by removing backends in batches.
When a write transaction commits, it must clear its XID advertised via
the ProcArray, which requires that we hold ProcArrayLock in exclusive
mode in order to prevent concurrent processes running GetSnapshotData
from seeing inconsistent results.  When many processes try to commit
at once, ProcArrayLock must change hands repeatedly, with each
concurrent process trying to commit waking up to acquire the lock in
turn.  To make things more efficient, when more than one backend is
trying to commit a write transaction at the same time, have just one
of them acquire ProcArrayLock in exclusive mode and clear the XIDs of
all processes in the group.  Benchmarking reveals that this is much
more efficient at very high client counts.

Amit Kapila, heavily revised by me, with some review also from Pavan
Deolasee.
2015-08-06 12:02:12 -04:00
Andres Freund
a4b09af3e9 Micro optimize LWLockAttemptLock() a bit.
LWLockAttemptLock pointlessly read the lock's state in every loop
iteration, even though pg_atomic_compare_exchange_u32() returns the old
value. Instead do that only once before the loop iteration.

Additionally there's no need to have the expected_state variable,
old_state mostly had the same value anyway.

Noticed-By: Heikki Linnakangas
Backpatch: 9.5, no reason to let the branches diverge at this point
2015-08-02 18:41:23 +02:00
Andres Freund
7039760114 Fix issues around the "variable" support in the lwlock infrastructure.
The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.

To fix first move resetting the variable's from LWLockAcquireWithVar to
WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly
re-check that the variable hasn't changed after enqueing, that prevents
the issue that the lock has been released and already re-acquired by the
time the woken up backend checks for the lock's state.

Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592DB35.2060401@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
2015-08-02 18:41:23 +02:00
Andres Freund
3bc9356ddd Remove outdated comment in LWLockDequeueSelf's header.
Noticed-By: Robert Haas
Backpatch: 9.5, where the function was added
2015-07-29 10:13:10 +02:00
Tom Lane
d8f15c95be Reduce chatter from signaling of autovacuum workers.
Don't print a WARNING if we get ESRCH from a kill() that's attempting
to cancel an autovacuum worker.  It's possible (and has been seen in the
buildfarm) that the worker is already gone by the time we are able to
execute the kill, in which case the failure is harmless.  About the only
plausible reason for reporting such cases would be to help debug corrupted
lock table contents, but this is hardly likely to be the most important
symptom if that happens.  Moreover issuing a WARNING might scare users
more than is warranted.

Also, since sending a signal to an autovacuum worker is now entirely a
routine thing, and the worker will log the query cancel on its end anyway,
reduce the message saying we're doing that from LOG to DEBUG1 level.

Very minor cosmetic cleanup as well.

Since the main practical reason for doing this is to avoid unnecessary
buildfarm failures, back-patch to all active branches.
2015-07-28 17:34:23 -04:00
Robert Haas
6f2871f12e Centralize decision-making about where to get a backend's PGPROC.
This code was originally written as part of parallel query effort, but
it seems to have independent value, because if we make one decision
about where to get a PGPROC when we allocate and then put it back on a
different list at backend-exit time, bad things happen.  This isn't
just a theoretical risk; we fixed an actual problem of this type in
commit e280c630a8.
2015-07-28 14:51:57 -04:00
Heikki Linnakangas
4b8e24b9ad Fix a couple of bugs with wal_log_hints.
1. Replay of the WAL record for setting a bit in the visibility map
contained an assertion that a full-page image of that record type can only
occur with checksums enabled. But it can also happen with wal_log_hints, so
remove the assertion. Unlike checksums, wal_log_hints can be changed on the
fly, so it would be complicated to figure out if it was enabled at the time
that the WAL record was generated.

2. wal_log_hints has the same effect on the locking needed to read the LSN
of a page as data checksums. BufferGetLSNAtomic() didn't get the memo.

Backpatch to 9.4, where wal_log_hints was added.
2015-06-26 12:38:24 +03:00
Tom Lane
57e1138bcc Remove special cases for ETXTBSY from new fsync'ing logic.
The argument that this is a sufficiently-expected case to be silently
ignored seems pretty thin.  Andres had brought it up back when we were
still considering that most fsync failures should be hard errors, and it
probably would be legit not to fail hard for ETXTBSY --- but the same is
true for EROFS and other cases, which is why we gave up on hard failures.
ETXTBSY is surely not a normal case, so logging the failure seems fine
from here.
2015-05-29 15:11:36 -04:00
Tom Lane
d8179b001a Fix fsync-at-startup code to not treat errors as fatal.
Commit 2ce439f337 introduced a rather serious
regression, namely that if its scan of the data directory came across any
un-fsync-able files, it would fail and thereby prevent database startup.
Worse yet, symlinks to such files also caused the problem, which meant that
crash restart was guaranteed to fail on certain common installations such
as older Debian.

After discussion, we agreed that (1) failure to start is worse than any
consequence of not fsync'ing is likely to be, therefore treat all errors
in this code as nonfatal; (2) we should not chase symlinks other than
those that are expected to exist, namely pg_xlog/ and tablespace links
under pg_tblspc/.  The latter restriction avoids possibly fsync'ing a
much larger part of the filesystem than intended, if the user has left
random symlinks hanging about in the data directory.

This commit takes care of that and also does some code beautification,
mainly moving the relevant code into fd.c, which seems a much better place
for it than xlog.c, and making sure that the conditional compilation for
the pre_sync_fname pass has something to do with whether pg_flush_data
works.

I also relocated the call site in xlog.c down a few lines; it seems a
bit silly to be doing this before ValidateXLOGDirectoryStructure().

The similar logic in initdb.c ought to be made to match this, but that
change is noncritical and will be dealt with separately.

Back-patch to all active branches, like the prior commit.

Abhijit Menon-Sen and Tom Lane
2015-05-28 17:33:03 -04:00
Tom Lane
32f628be74 Fix assorted inconsistencies in our calls of readlink().
Ensure that we null-terminate the result string (one place in pg_rewind).
Be paranoid about out-of-range results from readlink() (should not happen,
but there is no good reason for some call sites to be careful about it and
others not).  Consistently use the whole buffer, not sometimes one byte
less.  Ensure we emit an appropriate errcode() in all cases.  Spell the
error messages the same way.

The only serious bug here is the missing null-termination in pg_rewind,
which is new code, so no need for a back-patch.

Abhijit Menon-Sen and Tom Lane
2015-05-28 12:17:22 -04:00
Bruce Momjian
807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Tom Lane
d4b538ea36 Improve packing/alignment annotation for ItemPointerData.
We want this struct to be exactly a series of 3 int16 words, no more
and no less.  Historically, at least, some ARM compilers preferred to
pad it to 8 bytes unless coerced.  Our old way of doing that was just
to use __attribute__((packed)), but as pointed out by Piotr Stefaniak,
that does too much: it also licenses the compiler to give the struct
only byte-alignment.  We don't want that because it adds access overhead,
possibly quite significant overhead.  According to the GCC manual, what
we want requires also specifying __attribute__((align(2))).  It's not
entirely clear if all the relevant compilers accept this pragma as well,
but we can hope the buildfarm will tell us if not.  We can also add a
static assertion that should fire if the compiler padded the struct.

Since the combination of these pragmas should define exactly what we
want on any compiler that accepts them, let's try using them wherever
we think they exist, not only for __arm__.  (This is likely to expose
that the conditional definitions in c.h are inadequate, but finding
that out would be a good thing.)

The immediate motivation for this is that the current definition of
ExecRowMark allows its curCtid field to be misaligned.  It is not clear
whether there are any other uses of ItemPointerData with a similar hazard.
We could change the definition of ExecRowMark if this doesn't work, but
it would be far better to have a future-proof fix.

Piotr Stefaniak, some further hacking by me
2015-05-21 17:21:46 -04:00
Heikki Linnakangas
fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Heikki Linnakangas
4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Heikki Linnakangas
b48437d11b Fix off-by-one error in Assertion.
The point of the assertion is to ensure that the arrays allocated in stack
are large enough, but the check was one item short.

This won't matter in practice because MaxIndexTuplesPerPage is an
overestimate, so you can't have that many items on a page in reality.
But let's be tidy.

Spotted by Anastasia Lubennikova. Backpatch to all supported versions, like
the patch that added the assertion.
2015-05-19 19:25:01 +03:00