1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-25 01:02:05 +03:00
Commit Graph

15930 Commits

Author SHA1 Message Date
17d5db352c Remove warning about num_sync being too large in synchronous_standby_names.
If we're not going to reject such setups entirely, throwing a WARNING in
check_synchronous_standby_names() is unhelpful, because it will cause the
warning to be logged again every time the postmaster receives SIGHUP.
Per discussion, just remove the warning.

In passing, improve the documentation for synchronous_commit, which had not
gotten the word that now there can be more than one synchronous standby.
2016-04-30 10:54:45 -04:00
207d5a656e Fix mishandling of equivalence-class tests in parameterized plans.
Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like

	Nested Loop
	  ->  Scan X
	  ->  Nested Loop
	    ->  Scan Y
	    ->  Scan Z
	          Filter: Z.Z = X.X

The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node.  On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members.  So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there.  If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.

This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape.  (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.)  Many
thanks to Alexander Kirkouski for submitting a reproducible test case.

The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
2016-04-29 20:19:38 -04:00
7c3e8039f4 Add a few entries to the tail of time mapping, to see old values.
Without a few entries beyond old_snapshot_threshold, the lookup
would often fail, resulting in the more aggressive pruning or
vacuum being skipped often enough to matter.  This was very clearly
shown by a python test script posted by Ants Aasma, and was likely
a factor in an earlier but somewhat less clear-cut test case posted
by Jeff Janes.

This patch makes no change to the logic, per se -- it just makes
the array of mapping entries big enough to make lookup misses based
on timing much less likely.  An occasional miss is still possible
if a thread stalls for more than 10 minutes, but that does not
create any problem with correctness of behavior.  Besides, if
things are so busy that a thread is stalling for more than 10
minutes, it is probably OK to skip the more aggressive cleanup at
that particular point in time.
2016-04-29 16:46:08 -05:00
0fb54de9aa Support building with Visual Studio 2015
Adjust the way we detect the locale. As a result the minumum Windows
version supported by VS2015 and later is Windows Vista. Add some tweaks
to remove new compiler warnings. Remove documentation references to the
now obsolete msysGit.

Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich.

Backpatch to 9.5
2016-04-29 08:09:07 -04:00
59455018a8 Remember asking for feedback during walsender shutdown.
Since 5a991ef8 we're explicitly asking for feedback from the receiving
side when shutting down walsender, if there's not yet replicated
data.

Unfortunately we didn't remember (i.e. set waiting_for_ping_response to
true) having asked for feedback, leading to scenarios in which replies
were requested at a high frequency.

I can't reproduce this problem on my laptop, I think that's because the
problem requires a significant TCP window to manifest due to the
!pq_is_send_pending() condition. But since this clearly is a bug, let's
fix it.  There's quite possibly more wrong than just this though.

While fiddling with WalSndDone(), I rewrote a hard to understand comment
about looking at the flush vs. the write position.

Reported-By: Nick Cleaton, Magnus Hagander
Author: Nick Cleaton
Discussion: CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com
    CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com
Backpatch: 9.4, were 5a991ef8 introduced the use of feedback messages
    during shutdown.
2016-04-28 22:11:18 -07:00
f8467f7da8 Prevent to use magic constants
Use macroses for definition amstrategies/amsupport fields instead of
hardcoded values.

Author: Nikolay Shaplov with addition for contrib/bloom
2016-04-28 16:39:25 +03:00
e2c79e14d9 Prevent multiple cleanup process for pending list in GIN.
Previously, ginInsertCleanup could exit early if it detects that someone else
is cleaning up the pending list, without waiting for that someone else to
finish the job. But in this case vacuum could miss tuples to be deleted.

Cleanup process now locks metapage with a help of heavyweight
LockPage(ExclusiveLock), and it guarantees that there is no another cleanup
process at the same time. Lock is taken differently depending on caller of
cleanup process: any vacuums and gin_clean_pending_list() will be blocked
until lock becomes available, ordinary insert uses conditional lock to
prevent indefinite waiting on lock.

Insert into pending list doesn't use this lock, so insertion isn't blocked.

Also, patch adds stopping of cleanup process when at-start-cleanup-tail is
reached in order to prevent infinite cleanup in case of massive insertion. But
it will stop only for automatic maintenance tasks like autovacuum.

Patch introduces choice of limit of memory to use: autovacuum_work_mem,
maintenance_work_mem or work_mem depending on call path.

Patch for previous releases should be reworked due to changes between 9.6 and
previous ones in this area.

Discover and diagnostics by Jeff Janes and Tomas Vondra

Patch by me with some ideas of Jeff Janes
2016-04-28 16:21:42 +03:00
4c804fbdfb Clean up parsing of synchronous_standby_names GUC variable.
Commit 989be0810d added a flex/bison lexer/parser to interpret
synchronous_standby_names.  It was done in a pretty crufty way, though,
making assorted end-use sites responsible for calling the parser at the
right times.  That was not only vulnerable to errors of omission, but made
it possible that lexer/parser errors occur at very undesirable times,
and created memory leakages even if there was no error.

Instead, perform the parsing once during check_synchronous_standby_names
and let guc.c manage the resulting data.  To do that, we have to flatten
the parsed representation into a single hunk of malloc'd memory, but that
is not very hard.

While at it, work a little harder on making useful error reports for
parsing problems; the previous code felt that "synchronous_standby_names
parser returned 1" was an appropriate user-facing error message.  (To
be fair, it did also log a syntax error message, but separately from the
GUC problem report, which is at best confusing.)  It had some outright
bugs in the face of invalid input, too.

I (tgl) also concluded that we need to restrict unquoted names in
synchronous_standby_names to be just SQL identifiers.  The previous coding
would accept darn near anything, which (1) makes the quoting convention
both nearly-unnecessary and formally ambiguous, (2) makes it very hard to
understand what is a syntax error and what is a creative interpretation of
the input as a standby name, and (3) makes it impossible to further extend
the syntax in future without a compatibility break.  I presume that we're
intending future extensions of the syntax, else this parsing infrastructure
is massive overkill, so (3) is an important objection.  Since we've taken
a compatibility hit for non-identifier names with this change anyway, we
might as well lock things down now and insist that users use double quotes
for standby names that aren't identifiers.

Kyotaro Horiguchi and Tom Lane
2016-04-27 17:55:25 -04:00
372ff7cae2 Fix wrong word.
Commit a31212b429 was a little too hasty.

Per report from Tom Lane.
2016-04-27 14:23:56 -04:00
a31212b429 Change postgresql.conf.sample to say that fsync=off will corrupt data.
Discussion: 24748.1461764666@sss.pgh.pa.us

Per a suggestion from Craig Ringer.  This wording from Tom Lane,
following discussion.
2016-04-27 13:47:07 -04:00
cf402ba734 Tighten up sanity checks for parallel aggregate in execQual.c.
David Rowley
2016-04-27 12:05:35 -04:00
b33dc77665 Remove inadvertently commited vim swapfile.
If you were wondering what editor I use, now you know.
2016-04-27 11:53:01 -04:00
8126eaee2f Clean up a few parallelism-related things that pgindent wants to mangle.
In nodeFuncs.c, pgindent wants to introduce spurious indentation into
the definitions of planstate_tree_walker and planstate_walk_subplans.
Fix that by spreading the definition out across several lines, similar
to what is already done for other walker functions in that file.

In execParallel.c, in the definition of SharedExecutorInstrumentation,
pgindent wants to insert more whitespace between the type name and the
member name.  That causes it to mangle comments later on the line.  Fix
by moving the comments out of line.  Now that we have a bit more room,
add some more details that may be useful to the next person reading
this code.
2016-04-27 11:29:45 -04:00
360ca27a9b Remove mergeHyperLogLog.
It's buggy.  If somebody needs this later, they'll need to put back
a non-buggy vesion of it.

Discussion: CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com
Discussion: CAM3SWZRUOLsYoTT83QgdUy9D8ehYWm_nvbrrfcOOzikiRfFY7g@mail.gmail.com

Peter Geoghegan
2016-04-27 10:55:32 -04:00
59eb551279 Fix EXPLAIN VERBOSE output for parallel aggregate.
The way that PartialAggregate and FinalizeAggregate plan nodes were
displaying output columns before was bogus.  Now, FinalizeAggregate
produces the same outputs as an Aggregate would have produced, while
PartialAggregate produces each of those outputs prefixed by the word
PARTIAL.

Discussion: 12585.1460737650@sss.pgh.pa.us

Patch by me, reviewed by David Rowley.
2016-04-27 07:37:40 -04:00
72a98a6395 Don't open formally non-existent segments in _mdfd_getseg().
Before this commit _mdfd_getseg(), in contrast to mdnblocks(), did not
verify whether all segments leading up to the to-be-opened one, were
RELSEG_SIZE sized. That is e.g. not the case after truncating a
relation, because later segments just get truncated to zero length, not
removed.

Once a "non-existent" segment has been opened in a session, mdnblocks()
will return wrong results, causing errors like "could not read block %u
in file" when accessing blocks. Closing the session, or the later
arrival of relevant invalidation messages, would "fix" the problem.

That, so far, was mostly harmless, because most segment accesses are
only done after an mdnblocks() call. But since 428b1d6b29 we try to
open segments that might have been deleted, to trigger kernel writeback
from a backend's queue of recent writes.

To fix check segment sizes in _mdfd_getseg() when opening previously
unopened segments. In practice this shouldn't imply a lot of additional
lseek() calls, because mdnblocks() will most of the time already have
opened all relevant segments.

This commit also fixes a second problem, namely that _mdfd_getseg(
EXTENSION_RETURN_NULL) extends files during recovery, which is not
desirable for the mdwriteback() case.  Add EXTENSION_REALLY_RETURN_NULL,
which does not behave that way, and use it.

Reported-By: Thom Brown
Author: Andres Freund, Abhijit Menon-Sen
Reviewd-By: Robert Haas, Fabien Coehlo
Discussion: CAA-aLv6Dp_ZsV-44QA-2zgkqWKQq=GedBX2dRSrWpxqovXK=Pg@mail.gmail.com
Fixes: 428b1d6b29
2016-04-26 20:32:51 -07:00
c6ff84b06a Emit invalidations to standby for transactions without xid.
So far, when a transaction with pending invalidations, but without an
assigned xid, committed, we simply ignored those invalidation
messages. That's problematic, because those are actually sent for a
reason.

Known symptoms of this include that existing sessions on a hot-standby
replica sometimes fail to notice new concurrently built indexes and
visibility map updates.

The solution is to WAL log such invalidations in transactions without an
xid. We considered to alternatively force-assign an xid, but that'd be
problematic for vacuum, which might be run in systems with few xids.

Important: This adds a new WAL record, but as the patch has to be
back-patched, we can't bump the WAL page magic. This means that standbys
have to be updated before primaries; otherwise
"PANIC: standby_redo: unknown op code 32" errors can be encountered.

XXX:

Reported-By: Васильев Дмитрий, Masahiko Sawada
Discussion:
    CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com
    CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
2016-04-26 20:21:54 -07:00
2ac3be2e76 Fix pg_get_functiondef to dump parallel-safety markings.
Ashutosh Sharma
2016-04-26 22:56:27 -04:00
82311bcdd7 Yet more portability hacking for degree-based trig functions.
The true explanation for Peter Eisentraut's report of inexact asind results
seems to be that (a) he's compiling into x87 instruction set, which uses
wider-than-double float registers, plus (b) the library function asin() on
his platform returns a result that is wider than double and is not rounded
to double width.  To fix, we have to force the function's result to be
rounded comparably to what happened to the scaling constant asin_0_5.
Experimentation suggests that storing it into a volatile local variable is
the least ugly way of making that happen.  Although only asin() is known to
exhibit an observable inexact result, we'd better do this in all the places
where we're hoping to get an exact result by scaling.
2016-04-26 11:24:15 -04:00
77cd477c4b Enable parallel query by default.
Change max_parallel_degree default from 0 to 2.  It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it.  Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.

Discussion: 20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de
2016-04-26 08:35:58 -04:00
b7351ced42 Fix typo in comment
Author: Daniel Gustafsson
2016-04-26 10:38:32 +02:00
e65953be4f Fix C comment typo and redundant test 2016-04-25 15:42:29 -05:00
6b1a213bbd New method for preventing compile-time calculation of degree constants.
Commit 65abaab547 tried to prevent the scaling constants used in
the degree-based trig functions from being precomputed at compile time,
because some compilers do that with functions that don't yield results
identical-to-the-last-bit to what you get at runtime.  A report from
Peter Eisentraut suggests that some recent compilers are smart enough
to see through that trick, though.  Instead, let's put the inputs to
these calculations into non-const global variables, which should be a
more reliable way of convincing the compiler that it can't assume that
they are compile-time constants.  (If we really get desperate, we could
mark these variables "volatile", but I do not believe we should have to.)
2016-04-25 15:21:04 -04:00
8f91d87d43 Fix documentation & config inconsistencies around 428b1d6b2.
Several issues:
1) checkpoint_flush_after doc and code disagreed about the default
2) new GUCs were missing from postgresql.conf.sample
3) Outdated source-code comment about bgwriter_flush_after's default
4) Sub-optimal categories assigned to new GUCs
5) Docs suggested backend_flush_after is PGC_SIGHUP, but it's PGC_USERSET.
6) Spell out int as integer in the docs, as done elsewhere

Reported-By: Magnus Hagander, Fujii Masao
Discussion: CAHGQGwETyTG5VYQQ5C_srwxWX7RXvFcD3dKROhvAWWhoSBdmZw@mail.gmail.com
2016-04-24 12:26:55 -07:00
0ab3595e5b Rename strtoi() to strtoint().
NetBSD has seen fit to invent a libc function named strtoi(), which
conflicts with the long-established static functions of the same name in
datetime.c and ecpg's interval.c.  While muttering darkly about intrusions
on application namespace, we'll rename our functions to avoid the conflict.

Back-patch to all supported branches, since this would affect attempts
to build any of them on recent NetBSD.

Thomas Munro
2016-04-23 16:53:15 -04:00
915cee4595 Properly mark initRectBox() as taking 'void' args
Was part of box type in SP-GiST index patch.

Reported-by: Emre Hasegeli
2016-04-23 10:41:11 -04:00
abb164655c Fix unexpected side-effects of operator_precedence_warning.
The implementation of that feature involves injecting nodes into the
raw parsetree where explicit parentheses appear.  Various places in
parse_expr.c that test to see "is this child node of type Foo" need to
look through such nodes, else we'll get different behavior when
operator_precedence_warning is on than when it is off.  Note that we only
need to handle this when testing untransformed child nodes, since the
AEXPR_PAREN nodes will be gone anyway after transformExprRecurse.

Per report from Scott Ribe and additional code-reading.  Back-patch
to 9.5 where this feature was added.

Report: <ED37E303-1B0A-4CD8-8E1E-B9C4C2DD9A17@elevated-dev.com>
2016-04-21 23:17:36 -04:00
80f66a9ad0 Fix planner failure with full join in RHS of left join.
Given a left join containing a full join in its righthand side, with
the left join's joinclause referencing only one side of the full join
(in a non-strict fashion, so that the full join doesn't get simplified),
the planner could fail with "failed to build any N-way joins" or related
errors.  This happened because the full join was seen as overlapping the
left join's RHS, and then recent changes within join_is_legal() caused
that function to conclude that the full join couldn't validly be formed.
Rather than try to rejigger join_is_legal() yet more to allow this,
I think it's better to fix initsplan.c so that the required join order
is explicit in the SpecialJoinInfo data structure.  The previous coding
there essentially ignored full joins, relying on the fact that we don't
flatten them in the joinlist data structure to preserve their ordering.
That's sufficient to prevent a wrong plan from being formed, but as this
example shows, it's not sufficient to ensure that the right plan will
be formed.  We need to work a bit harder to ensure that the right plan
looks sane according to the SpecialJoinInfos.

Per bug #14105 from Vojtech Rylko.  This was apparently induced by
commit 8703059c6 (though now that I've seen it, I wonder whether there
are related cases that could have failed before that); so back-patch
to all active branches.  Unfortunately, that patch also went into 9.0,
so this bug is a regression that won't be fixed in that branch.
2016-04-21 20:05:58 -04:00
125ad539a2 Improve TranslateSocketError() to handle more Windows error codes.
The coverage was rather lean for cases that bind() or listen() might
return.  Add entries for everything that there's a direct equivalent
for in the set of Unix errnos that elog.c has heard of.
2016-04-21 16:58:47 -04:00
1f7c85b820 Fix ruleutils.c's dumping of ScalarArrayOpExpr containing an EXPR_SUBLINK.
When we shoehorned "x op ANY (array)" into the SQL syntax, we created a
fundamental ambiguity as to the proper treatment of a sub-SELECT on the
righthand side: perhaps what's meant is to compare x against each row of
the sub-SELECT's result, or perhaps the sub-SELECT is meant as a scalar
sub-SELECT that delivers a single array value whose members should be
compared against x.  The grammar resolves it as the former case whenever
the RHS is a select_with_parens, making the latter case hard to reach ---
but you can get at it, with tricks such as attaching a no-op cast to the
sub-SELECT.  Parse analysis would throw away the no-op cast, leaving a
parsetree with an EXPR_SUBLINK SubLink directly under a ScalarArrayOpExpr.
ruleutils.c was not clued in on this fine point, and would naively emit
"x op ANY ((SELECT ...))", which would be parsed as the first alternative,
typically leading to errors like "operator does not exist: text = text[]"
during dump/reload of a view or rule containing such a construct.  To fix,
emit a no-op cast when dumping such a parsetree.  This might well be
exactly what the user wrote to get the construct accepted in the first
place; and even if she got there with some other dodge, it is a valid
representation of the parsetree.

Per report from Karl Czajkowski.  He mentioned only a case involving
RLS policies, but actually the problem is very old, so back-patch to
all supported branches.

Report: <20160421001832.GB7976@moraine.isi.edu>
2016-04-21 14:20:30 -04:00
c4a586c486 Prevent possible crash reading pg_stat_activity.
Also, avoid reading PGPROC's wait_event field twice, once for the wait
event and again for the wait_event_type, because the value might change
in the middle.

Petr Jelinek and Robert Haas
2016-04-21 14:02:15 -04:00
36f69faeff Comment improvements for ForeignPath.
It's not necessarily just scanning a base relation any more.

Amit Langote and Etsuro Fujita
2016-04-21 13:30:48 -04:00
9f84280ae9 Fix assorted defects in 09adc9a8c0.
That commit increased all shared memory allocations to the next higher
multiple of PG_CACHE_LINE_SIZE, but it didn't ensure that allocation
started on a cache line boundary.  It also failed to remove a couple
other pieces of now-useless code.

BUFFERALIGN() is perhaps obsolete at this point, and likely should be
removed at some point, too, but that seems like it can be left to a
future cleanup.

Mistakes all pointed out by Andres Freund.  The patch is mine, with
a few extra assertions which I adopted from his version of this fix.
2016-04-21 13:27:41 -04:00
11e178d0dc Inline initial comparisons in TestForOldSnapshot()
Even with old_snapshot_threshold = -1 (which disables the "snapshot
too old" feature), performance regressions were seen at moderate to
high concurrency.  For example, a one-socket, four-core system
running 200 connections at saturation could see up to a 2.3%
regression, with larger regressions possible on NUMA machines.
By inlining the early (smaller, faster) tests in the
TestForOldSnapshot() function, the i7 case dropped to a 0.2%
regression, which could easily just be noise, and is clearly an
improvement.  Further testing will show whether more is needed.
2016-04-21 08:40:08 -05:00
9c75e1a36b Forbid parallel Hash Right Join or Hash Full Join.
That won't work.  You'll get bogus null-extended rows.

Mithun Cy
2016-04-20 17:48:55 -04:00
bde361fef5 Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
Commit 36a35c550a turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not.  Subsequent patches
band-aided over some of the problems with this design by making things
even messier.

One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud).  This would not typically
be noticeable during retail index updates.  It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.

Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state.  There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.

To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts.  The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page.  The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path.  The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage().  Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)

In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.

Report: <571276DD.5050303@dalibo.com>
2016-04-20 14:25:15 -04:00
a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
a0382e2d7e Make partition-lock-release coding more transparent in BufferAlloc().
Coverity complained that oldPartitionLock was possibly dereferenced after
having been set to NULL.  That actually can't happen, because we'd only use
it if (oldFlags & BM_TAG_VALID) is true.  But nonetheless Coverity is
justified in complaining, because at line 1275 we actually overwrite
oldFlags, and then still expect its BM_TAG_VALID bit to be a safe guide to
whether to release the oldPartitionLock.  Thus, the code would be incorrect
if someone else had changed the buffer's BM_TAG_VALID flag meanwhile.
That should not happen, since we hold pin on the buffer throughout this
sequence, but it's starting to look like a rather shaky chain of logic.
And there's no need for such assumptions, because we can simply replace
the (oldFlags & BM_TAG_VALID) tests with (oldPartitionLock != NULL),
which has identical results and makes it plain to all comers that we don't
dereference a null pointer.  A small side benefit is that the range of
liveness of oldFlags is greatly reduced, possibly allowing the compiler
to save a register.

This is just cleanup, not an actual bug fix, so there seems no need
for a back-patch.
2016-04-18 18:05:56 -04:00
4039c736eb Adjust spin.c's spinlock emulation so that 0 is not a valid spinlock value.
We've had repeated troubles over the years with failures to initialize
spinlocks correctly; see 6b93fcd14 for a recent example.  Most of the time,
on most platforms, such oversights can escape notice because all-zeroes is
the expected initial content of an slock_t variable.  The only platform
we have where the initialized state of an slock_t isn't zeroes is HPPA,
and that's practically gone in the wild.  To make it easier to catch such
errors without needing one of those, adjust the --disable-spinlocks code
so that zero is not a valid value for an slock_t for it.

In passing, remove a bunch of unnecessary #include's from spin.c;
commit daa7527afc removed all the intermodule coupling that
made them necessary.
2016-04-16 19:53:38 -04:00
c34df8a003 Disallow creation of indexes on system columns (except for OID).
Although OID acts pretty much like user data, the other system columns do
not, so an index on one would likely misbehave.  And it's pretty hard to
see a use-case for one, anyway.  Let's just forbid the case rather than
worry about whether it should be supported.

David Rowley
2016-04-16 12:11:41 -04:00
99f2f3c19a In recordExtensionInitPriv(), keep the scan til we're done with it
For reasons of sheer brain fade, we (I) was calling systable_endscan()
immediately after systable_getnext() and expecting the tuple returned
by systable_getnext() to still be valid.

That's clearly wrong.  Move the systable_endscan() down below the tuple
usage.

Discovered initially by Pavel Stehule and then also by Alvaro.

Add a regression test based on Alvaro's testing.
2016-04-15 21:57:15 -04:00
8f1911d5e6 Fix possible crash in ALTER TABLE ... REPLICA IDENTITY USING INDEX.
Careless coding added by commit 07cacba983 could result in a crash
or a bizarre error message if someone tried to select an index on the
OID column as the replica identity index for a table.  Back-patch to 9.4
where the feature was introduced.

Discussion: CAKJS1f8TQYgTRDyF1_u9PVCKWRWz+DkieH=U7954HeHVPJKaKg@mail.gmail.com

David Rowley
2016-04-15 12:11:40 -04:00
5702277ca9 Tweak EXPLAIN for parallel query to show workers launched.
The previous display was sort of confusing, because it didn't
distinguish between the number of workers that we planned to launch
and the number that actually got launched.  This has already confused
several people, so display both numbers and label them clearly.

Julien Rouhaud, reviewed by me.
2016-04-15 11:52:18 -04:00
6b85d4ba9b Fix portability problem induced by commit a6f6b7819.
pg_xlogdump includes bufmgr.h.  With a compiler that emits code for
static inline functions even when they're unreferenced, that leads
to unresolved external references in the new static-inline version
of BufferGetPage().  So hide it with #ifndef FRONTEND, as we've done
for similar issues elsewhere.  Per buildfarm member pademelon.
2016-04-15 10:44:28 -04:00
ba8fe38f58 Fix typo in comment 2016-04-15 13:32:54 +02:00
f0e766bd7f Fix memory leak in GIN index scans.
The code had a query-lifespan memory leak when encountering GIN entries
that have posting lists (rather than posting trees, ie, there are a
relatively small number of heap tuples containing this index key value).
With a suitable data distribution this could add up to a lot of leakage.
Problem seems to have been introduced by commit 36a35c550, so back-patch
to 9.4.

Julien Rouhaud
2016-04-15 00:02:26 -04:00
4b74c6a40e Make init_spin_delay() C89 compliant #2.
My previous attempt at doing so, in 80abbeba23, was not sufficient. While that
fixed the problem for bufmgr.c and lwlock.c , s_lock.c still has non-constant
expressions in the struct initializer, because the file/line/function
information comes from the caller of s_lock().

Give up on using a macro, and use a static inline instead.

Discussion: 4369.1460435533@sss.pgh.pa.us
2016-04-14 19:26:13 -07:00
533cd2303a Remove trailing commas in enums.
These aren't valid C89. Found thanks to gcc's -Wc90-c99-compat. These
exist in differing places in most supported branches.
2016-04-14 19:25:16 -07:00
7b16781228 Fix trivial typo. 2016-04-14 19:25:16 -07:00
6a3d3965d6 Fix core dump in ReorderBufferRestoreChange on alignment-picky platforms.
When re-reading an update involving both an old tuple and a new tuple from
disk, reorderbuffer.c was careless about whether the new tuple is suitably
aligned for direct access --- in general, it isn't.  We'd missed seeing
this in the buildfarm because the contrib/test_decoding tests exercise this
code path only a few times, and by chance all of those cases have old
tuples with length a multiple of 4, which is usually enough to make the
access to the new tuple's t_len safe.  For some still-not-entirely-clear
reason, however, Debian's sparc build gets a bus error, as reported by
Christoph Berg; perhaps it's assuming 8-byte alignment of the pointer?

The lack of previous field reports is probably because you need all of
these conditions to trigger a crash: an alignment-picky platform (not
Intel), a transaction large enough to spill to disk, an update within
that xact that changes a primary-key field and has an odd-length old tuple,
and of course logical decoding tracing the transaction.

Avoid the alignment assumption by using memcpy instead of fetching t_len
directly, and add a test case that exposes the crash on picky platforms.
Back-patch to 9.4 where the bug was introduced.

Discussion: <20160413094117.GC21485@msg.credativ.de>
2016-04-14 19:42:21 -04:00