This is epecially useful in the case or "VARIADIC ANY" functions. The
caller can get the artguments and types regardless of whether or not and
explicit VARIADIC array argument has been used. The function also
provides an option to convert arguments on type "unknown" to to "text".
Michael Paquier and me, reviewed by Tom Lane.
Backpatch to 9.4 in order to support the following json bug fix.
Although joinaliasvars lists coming out of the parser are quite simple,
those lists can contain arbitrarily complex expressions after subquery
pullup. We do not perform expression preprocessing on them, meaning that
expressions in those lists will not meet the expectations of later phases
of the planner (for example, that they do not contain SubLinks). This had
been thought pretty harmless, since we don't intentionally touch those
lists in later phases --- but Andreas Seltenreich found a case in which
adjust_appendrel_attrs() could recurse into a joinaliasvars list and then
die on its assertion that it never sees a SubLink. We considered a couple
of localized fixes to prevent that specific case from looking at the
joinaliasvars lists, but really this seems like a generic hazard for all
expression processing in the planner. Therefore, probably the best answer
is to delete the joinaliasvars lists from the parsetree at the end of
expression preprocessing, so that there are no reachable expressions that
haven't been through preprocessing.
The case Andreas found seems to be harmless in non-Assert builds, and so
far there are no field reports suggesting that there are user-visible
effects in other cases. I considered back-patching this anyway, but
it turns out that Andreas' test doesn't fail at all in 9.4-9.6, because
in those versions adjust_appendrel_attrs contains code (added in commit
842faa714 and removed again in commit 215b43cdc) to process SubLinks
rather than complain about them. Barring discovery of another path by
which unprocessed joinaliasvars lists can cause trouble, the most
prudent compromise seems to be to patch this into v10 but not further.
Patch by me, with thanks to Amit Langote for initial investigation
and review.
Discussion: https://postgr.es/m/87r2tvt9f1.fsf@ansel.ydns.eu
DST law changes in Fiji, Namibia, Northern Cyprus, Sudan, Tonga,
and Turks & Caicos Islands. Historical corrections for Alaska, Apia,
Burma, Calcutta, Detroit, Ireland, Namibia, and Pago Pago.
This is a trivial update containing only cosmetic changes. The point
is just to get back to being synced with an official release of tzcode,
rather than some ad-hoc point in their commit history, which is where
commit 47f849a3c left it.
find_expr_references() neglected to record a dependency on the result type
of a FieldSelect node, allowing a DROP TYPE to break a view or rule that
contains such an expression. I think we'd omitted this case intentionally,
reasoning that there would always be a related dependency ensuring that the
DROP would cascade to the view. But at least with nested field selection
expressions, that's not true, as shown in bug #14867 from Mansur Galiev.
Add the dependency, and for good measure a dependency on the node's exposed
collation.
Likewise add a dependency on the result type of a FieldStore. I think here
the reasoning was that it'd only appear within an assignment to a field,
and the dependency on the field's column would be enough ... but having
seen this example, I think that's wrong for nested-composites cases.
Looking at nearby code, I notice we're not recording a dependency on the
exposed collation of CoerceViaIO, which seems inconsistent with our choices
for related node types. Maybe that's OK but I'm feeling suspicious of this
code today, so let's add that; it certainly can't hurt.
This patch does not do anything to protect already-existing views, only
views created after it's installed. But seeing that the issue has been
there a very long time and nobody noticed till now, that's probably good
enough.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/20171023150118.1477.19174@wrigleys.postgresql.org
It seems that the parray_gin extension has seen fit to introduce a
"text[] @> text[]" operator, which conflicts with the core
"anyarray @> anyarray" operator, causing ambiguous-operator failures
if the input arguments are coercible to text[] without being exactly
that type. This strikes me as a bad idea, but it's out there and
people use it. As of v10, that breaks psql's query that tries to
test "pg_statistic_ext.stxkind @> '{d}'", since stxkind is char[].
The best workaround seems to be to avoid use of that operator.
We can use a scalar-vs-array test "'d' = any(stxkind)" instead;
that's arguably more readable anyway.
Per report from Justin Pryzby. Backpatch to v10 where this
query was added.
Discussion: https://postgr.es/m/20171022181525.GA21884@telsasoft.com
Like the similar logic for arrays and records, it's necessary to examine
the range's subtype to decide whether the range type can support hashing.
We can omit checking the subtype for btree-defined operations, though,
since range subtypes are required to have those operations. (Possibly
that simplification for btree cases led us to overlook that it does
not apply for hash cases.)
This is only an issue if the subtype lacks hash support, which is not
true of any built-in range type, but it's easy to demonstrate a problem
with a range type over, eg, money: you can get a "could not identify
a hash function" failure when the planner is misled into thinking that
hash join or aggregation would work.
This was born broken, so back-patch to all supported branches.
setTargetTable threw an error if the proposed target RangeVar's relname
matched any visible CTE or ENR. This breaks backwards compatibility in
the CTE case, since pre-v10 we never looked for a CTE here at all, so that
CTE names did not mask regular tables. It does seem like a good idea to
throw an error for the ENR case, though, thus causing ENRs to mask tables
for this purpose; ENRs are new in v10 so we're not breaking existing code,
and we may someday want to allow them to be the targets of DML.
To fix that, replace use of getRTEForSpecialRelationTypes, which was
overkill anyway, with use of scanNameSpaceForENR.
A second problem was that the check neglected to verify null schemaname,
so that a CTE or ENR could incorrectly be thought to match a qualified
RangeVar. That happened because getRTEForSpecialRelationTypes relied
on its caller to have checked for null schemaname. Even though the one
remaining caller got it right, this is obviously bug-prone, so move
the check inside getRTEForSpecialRelationTypes.
Also, revert commit 18ce3a4ab's extremely poorly thought out decision to
add a NULL return case to parserOpenTable --- without either documenting
that or adjusting any of the callers to check for it. The current bug
seems to have arisen in part due to working around that bad idea.
In passing, remove the one-line shim functions transformCTEReference and
transformENRReference --- they don't seem to be adding any clarity or
functionality.
Per report from Hugo Mercier (via Julien Rouhaud). Back-patch to v10
where the bug was introduced.
Thomas Munro, with minor editing by me
Discussion: https://postgr.es/m/CAOBaU_YdPVH+PTtiKSSLOiiW3mVDYsnNUekK+XPbHXiP=wrFLA@mail.gmail.com
An aggregate's input expression(s) are not supposed to be evaluated
at all for a row where its FILTER test fails ... but commit 8ed3f11bb
overlooked that requirement. Reshuffle so that aggregates having a
filter clause evaluate their arguments separately from those without.
This still gets the benefit of doing only one ExecProject in the
common case of multiple Aggrefs, none of which have filters.
While at it, arrange for filter clauses to be included in the common
ExecProject evaluation, thus perhaps buying a little bit even when
there are filters.
Back-patch to v10 where the bug was introduced.
Discussion: https://postgr.es/m/30065.1508161354@sss.pgh.pa.us
While poking around in the aggregate logic, I noticed that commit
8ed3f11bb broke the logic in nodeAgg.c that purports to detect nested
aggregates, by moving initialization of regular aggregate argument
expressions out of the code segment that checks for that.
You could argue that this check is unnecessary, but it's not much code
so I'm inclined to keep it as a backstop against parser and planner
bugs. However, there's certainly zero value in checking only some of
the subexpressions.
We can make the check complete again, and as a bonus make it a good
deal more bulletproof against future mistakes of the same ilk, by
moving it out to the outermost level of ExecInitAgg. This means we
need to check only once per Agg node not once per aggregate, which
also seems like a good thing --- if the check does find something
wrong, it's not urgent that we report it before the plan node
initialization finishes.
Since this requires remembering the original length of the aggs list,
I deleted a long-obsolete stanza that changed numaggs from 0 to 1.
That's so old it predates our decision that palloc(0) is a valid
operation, in (digs...) 2004, see commit 24a1e20f1.
In passing improve a few comments.
Back-patch to v10, just in case.
If a Parallel Bitmap Heap scan's chain of leftmost descendents
includes a BitmapOr whose first child is a BitmapAnd, the prior coding
would mistakenly create a non-shared TIDBitmap and then try to perform
shared iteration.
Report by Tomas Vondra. Patch by Dilip Kumar.
Discussion: http://postgr.es/m/50e89684-8ad9-dead-8767-c9545bafd3b6@2ndquadrant.com
If we merge the transition calculations for two different aggregates,
it's reasonable to assume that the transition function should not care
which of those Aggref structs it gets from AggGetAggref(). It is not
reasonable to make the same assumption about an aggregate final function,
however. Commit 804163bc2 broke this, as it will pass whichever Aggref
was first associated with the transition state in both cases.
This doesn't create an observable bug so far as the core system is
concerned, because the only existing uses of AggGetAggref() are in
ordered-set aggregates that happen to not pay attention to anything
but the input properties of the Aggref; and besides that, we disabled
sharing of transition calculations for OSAs yesterday. Nonetheless,
if some third-party code were using AggGetAggref() in a normal aggregate,
they would be entitled to call this a bug. Hence, back-patch the fix
to 9.6 where the problem was introduced.
In passing, improve some of the comments about transition state sharing.
Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com
This ought to work, but the built-in OSAs are not capable of coping,
because their final-functions destructively modify their transition
state (specifically, the contained tuplesort object). That was fine
when those functions were written, but commit 804163bc2 moved the
goalposts without telling orderedsetaggs.c.
We should fix the built-in OSAs to support this, but it will take
a little work, especially if we don't want to sacrifice performance
in the normal non-shared-state case. Given that it took a year after
9.6 release for anyone to notice this bug, we should not prioritize
sharable-state over nonsharable-state performance. And a proper fix
is likely to be more complicated than we'd want to back-patch, too.
Therefore, let's just put in this stop-gap patch to prevent nodeAgg.c
from choosing to use shared state for OSAs. We can revert it in HEAD
when we get a better fix.
Report from Lukas Eder, diagnosis by me, patch by David Rowley.
Back-patch to 9.6 where the problem was introduced.
Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com
The previous coding in ProcessInterrupts() could lead to
idle_in_transaction_session_timeout being ignored, when
statement_timeout occurred earlier.
The problem was that ProcessInterrupts() would return before
processing the transaction timeout if QueryCancelPending was set while
QueryCancelHoldoffCount != 0 - which is the case when reading new
commands from the client. Ergo when the idle transaction timeout would
hit.
Fix that by removing the early return. Alternatively the transaction
timeout code could have been moved up, but that early return seems
like an issue that could hit other cases too.
Author: Lukas Fittl
Bug: #14821
Discussion:
https://www.postgresql.org/message-id/20170921010956.17345.61461%40wrigleys.postgresql.orghttps://www.postgresql.org/message-id/CAP53PkxQnv3OWJpyNPGJYT62uY=n1=2CF_Lpc6gVOFnc0-gazw@mail.gmail.com
Backpatch: 9.6-, where idle_in_transaction_session_timeout was introduced.
The GRANT reference page, which lists the default privileges for new
objects, failed to mention that USAGE is granted by default for data
types and domains. As a lesser sin, it also did not specify anything
about the initial privileges for sequences, FDWs, foreign servers,
or large objects. Fix that, and add a comment to acldefault() in the
probably vain hope of getting people to maintain this list in future.
Noted by Laurenz Albe, though I editorialized on the wording a bit.
Back-patch to all supported branches, since they all have this behavior.
Discussion: https://postgr.es/m/1507620895.4152.1.camel@cybertec.at
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
I noticed the tmp_check subdirectory wasn't getting cleaned up
after a check-world run. Apparently pgxs.mk will only do this
for you if you've defined REGRESS. The only other src/test/modules
Makefile that does not set that is snapshot_too_old, and it
does it like this.
copy_file() reads and writes data 64KB at a time (with default BLCKSZ),
and historically has issued a pg_flush_data request after each write.
This turns out to interact really badly with macOS's new APFS file
system: a large file copy takes over 100X longer than it ought to on
APFS, as reported by Brent Dearth. While that's arguably a macOS bug,
it's not clear whether Apple will do anything about it in the near
future, and in any case experimentation suggests that issuing flushes
a bit less often can be helpful on other platforms too.
Hence, rearrange the logic in copy_file() so that flush requests are
issued once per N writes rather than every time through the loop.
I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still
results in a noticeable speed degradation on APFS), but 1MB elsewhere.
In limited testing on Linux and FreeBSD, this seems slightly faster
than the previous code, and certainly no worse. It helps noticeably
on macOS even with the older HFS filesystem.
A simpler change would have been to just increase the size of the
copy buffer without changing the loop logic, but that seems likely
to trash the processor cache without really helping much.
Back-patch to 9.6 where we introduced msync() as an implementation
option for pg_flush_data(). The problem seems specific to APFS's
mmap/msync support, so I don't think we need to go further back.
Discussion: https://postgr.es/m/CADkxhTNv-j2jw2g8H57deMeAbfRgYBoLmVuXkC=YCFBXRuCOww@mail.gmail.com
The logical decoding functions do BeginInternalSubTransaction and
RollbackAndReleaseCurrentSubTransaction to clean up after themselves.
It turns out that AtEOSubXact_SPI has an unrecognized assumption that
we always need to cancel the active SPI operation in the SPI context
that surrounds the subtransaction (if there is one). That's true
when the RollbackAndReleaseCurrentSubTransaction call is coming from
the SPI-using function itself, but not when it's happening inside
some unrelated function invoked by a SPI query. In practice the
affected callers are the various PLs.
To fix, record the current subtransaction ID when we begin a SPI
operation, and clean up only if that ID is the subtransaction being
canceled.
Also, remove AtEOSubXact_SPI's assertion that it must have cleaned
up the surrounding SPI context's active tuptable. That's proven
wrong by the same test case.
Also clarify (or, if you prefer, reinterpret) the calling conventions
for _SPI_begin_call and _SPI_end_call. The memory context cleanup
in the latter means that these have always had the flavor of a matched
resource-management pair, but they weren't documented that way before.
Per report from Ben Chobot.
Back-patch to 9.4 where logical decoding came in. In principle,
the SPI changes should go all the way back, since the problem dates
back to commit 7ec1c5a86. But given the lack of field complaints
it seems few people are using internal subtransactions in this way.
So I don't feel a need to take any risks in 9.2/9.3.
Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com
Both ExecMakeFunctionResultSet() and evaluation of simple expressions
need to be done in the per-tuple memory context, not per-query, else
we leak data until end of query. This is a consideration that was
missed while refactoring code in the ProjectSet patch (note that in
pre-v10, ExecMakeFunctionResult is called in the per-tuple context).
Per bug #14843 from Ben M. Diagnosed independently by Andres and myself.
Discussion: https://postgr.es/m/20171005230321.28561.15927@wrigleys.postgresql.org
Sloppy loop coding in set_status_by_pages() resulted in fetching one array
element more than it should from the subxids[] array. The odds of this
resulting in SIGSEGV are pretty small, but we've certainly seen that happen
with similar mistakes elsewhere. While at it, we can get rid of an extra
TransactionIdToPage() calculation per loop.
Per report from David Binderman. Back-patch to all supported branches,
since this code is quite old.
Discussion: https://postgr.es/m/HE1PR0802MB2331CBA919CBFFF0C465EB429C710@HE1PR0802MB2331.eurprd08.prod.outlook.com
When some tuple versions in an update chain are frozen due to them being
older than freeze_min_age, the xmax/xmin trail can become broken. This
breaks HOT (and probably other things). A subsequent VACUUM can break
things in more serious ways, such as leaving orphan heap-only tuples
whose root HOT redirect items were removed. This can be seen because
index creation (or REINDEX) complain like
ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t"
Because of relfrozenxid contraints, we cannot avoid the freezing of the
early tuples, so we must cope with the results: whenever we see an Xmin
of FrozenTransactionId, consider it a match for whatever the previous
Xmax value was.
This problem seems to have appeared in 9.3 with multixact changes,
though strictly speaking it seems unrelated.
Since 9.4 we have commit 37484ad2a "Change the way we mark tuples as
frozen", so the fix is simple: just compare the raw Xmin (still stored
in the tuple header, since freezing merely set an infomask bit) to the
Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so
the original value is lost and we have nothing to compare the Xmax with.
To cope with that case we need to compare the Xmin with FrozenXid,
assume it's a match, and hope for the best. Sadly, since you can
pg_upgrade a 9.3 instance containing half-frozen pages to newer
releases, we need to keep the old check in newer versions too, which
seems a bit brittle; I hope we can somehow get rid of that.
I didn't optimize the new function for performance. The new coding is
probably a bit slower than before, since there is a function call rather
than a straight comparison, but I'd rather have it work correctly than
be fast but wrong.
This is a followup after 20b655224249 fixed a few related problems.
Apparently, in 9.6 and up there are more ways to get into trouble, but
in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so
there must be a separate bug.
Reported-by: Peter Geoghegan
Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood,
Yi Wen Wong, Álvaro
Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
Commit 597a87ccc introduced a latch pointer variable to replace use
of a long-lived shared latch in the shared WalRcvData structure.
This was not well thought out, because there are now hazards of the
pointer variable changing while it's being inspected by another
process. This could obviously lead to a core dump in code like
if (WalRcv->latch)
SetLatch(WalRcv->latch);
and there's a more remote risk of a torn read, if we have any
platforms where reading/writing a pointer is not atomic.
An actual problem would occur only if the walreceiver process
exits (gracefully) while the startup process is trying to
signal it, but that seems well within the realm of possibility.
To fix, treat the pointer variable (not the referenced latch)
as being protected by the WalRcv->mutex spinlock. There
remains a race condition that we could apply SetLatch to a
process latch that no longer belongs to the walreceiver, but
I believe that's harmless: at worst it'd cause an extra wakeup
of the next process to use that PGPROC structure.
Back-patch to v10 where the faulty code was added.
Discussion: https://postgr.es/m/22735.1507048202@sss.pgh.pa.us
1. Since commit b1a9bad9e744 we had pstrdup() inside a
spinlock-protected critical section; reported by Andreas Seltenreich.
Turn those into strlcpy() to stack-allocated variables instead.
Backpatch to 9.6.
2. Since commit 9ed551e0a4fd we had a pfree() uselessly inside a
spinlock-protected critical section. Tom Lane noticed in code review.
Move down. Backpatch to 9.6.
3. Since commit 64233902d22b we had GetCurrentTimestamp() (a kernel
call) inside a spinlock-protected critical section. Tom Lane noticed in
code review. Move it up. Backpatch to 9.2.
4. Since commit 1bb2558046cc we did elog(PANIC) while holding spinlock.
Tom Lane noticed in code review. Release spinlock before dying.
Backpatch to 9.2.
Discussion: https://postgr.es/m/87h8vhtgj2.fsf@ansel.ydns.eu
Document better how to create custom collations and what locale strings
ICU accepts. Explain the ICU examples in more detail. Also update the
text on the CREATE COLLATION reference page a bit to take ICU more into
account.
Buildfarm members skink and sungazer have both recently failed this
test, with symptoms indicating that the default 3-second timeout
isn't quite enough for those very slow systems. There's no reason
to be miserly with this timeout, so boost it to 60 seconds.
Back-patch to all versions containing this test. That may be overkill,
because the failure has only been observed in the v10 branch, but
I don't feel like having to revisit this later.
If --rate was used to throttle pgbench, it failed to sleep when it had
nothing to do, leading to a busy-wait with 100% CPU usage. This bug was
introduced in the refactoring in v10. Before that, sleep() was called with
a timeout, even when there were no file descriptors to wait for.
Reported by Jeff Janes, patch by Fabien COELHO. Backpatch to v10.
Discussion: https://www.postgresql.org/message-id/CAMkU%3D1x5hoX0pLLKPRnXCy0T8uHoDvXdq%2B7kAM9eoC9_z72ucw%40mail.gmail.com
get_rel_oids used to not take any relation locks at all, but that stopped
being a good idea with commit 3c3bb9933, which inserted a syscache lookup
into the function. A concurrent DROP TABLE could now produce "cache lookup
failed", which we don't want to have happen in normal operation. The best
solution seems to be to transiently take a lock on the relation named by
the RangeVar (which also makes the result of RangeVarGetRelid a lot less
spongy). But we shouldn't hold the lock beyond this function, because we
don't want VACUUM to lock more than one table at a time. (That would not
be a big problem right now, but it will become one after the pending
feature patch to allow multiple tables to be named in VACUUM.)
In passing, adjust vacuum_rel and analyze_rel to document that we don't
trust the passed RangeVar to be accurate, and allow the RangeVar to
possibly be NULL --- which it is anyway for a whole-database VACUUM,
though we accidentally didn't crash for that case.
The passed RangeVar is in fact inaccurate when dealing with a child
partition, as of v10, and it has been wrong for a whole long time in the
case of vacuum_rel() recursing to a TOAST table. None of these things
present visible bugs up to now, because the passed RangeVar is in fact
only consulted for autovacuum logging, and in that particular context it's
always accurate because autovacuum doesn't let vacuum.c expand partitions
nor recurse to toast tables. Still, this seems like trouble waiting to
happen, so let's nail the door at least partly shut. (Further cleanup
is planned, in HEAD only, as part of the pending feature patch.)
Fix some sadly inaccurate/obsolete comments too. Back-patch to v10.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us
For \d sequencename, the psql code just did SELECT * FROM sequencename
to get the information to display, but this does not contain much
interesting information anymore in PostgreSQL 10, because the metadata
has been moved to a separate system catalog.
This patch creates a newly designed sequence display that is not merely
an extension of the general relation/table display as it was previously.
Example:
PostgreSQL 9.6:
=> \d foobar
Sequence "public.foobar"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | foobar
last_value | bigint | 1
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 0
is_cycled | boolean | f
is_called | boolean | f
PostgreSQL 10 before this change:
=> \d foobar
Sequence "public.foobar"
Column | Type | Value
------------+---------+-------
last_value | bigint | 1
log_cnt | bigint | 0
is_called | boolean | f
New:
=> \d foobar
Sequence "public.foobar"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
bigint | 1 | 1 | 9223372036854775807 | 1 | no | 1
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.
(This is especially bad if there's any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)
One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it. But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.
Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead). In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.
An isolation test is included, authored by Michael Paquier, loosely
based on Daniel Wood's original reproducer. It only tests one
particular scenario, though, not all the possible ways for this problem
to surface; it be good to have a more reliable way to test this more
fully, but it'd require more work.
In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql
I outlined another test case (more closely matching Dan Wood's) that
exposed a few more ways for the problem to occur.
Backpatch all the way back to 9.3, where this problem was introduced by
multixact juggling. In branches 9.3 and 9.4, this includes a backpatch
of commit e5ff9fefcd50 (of 9.5 era), since the original is not
correctable without matching the coding pattern in 9.5 up.
Reported-by: Daniel Wood
Diagnosed-by: Daniel Wood
Reviewed-by: Yi Wen Wong, Michaël Paquier
Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
float8_numeric() and float4_numeric() failed to consider the possibility
that the input is an IEEE infinity. The results depended on the
platform-specific behavior of sprintf(): on most platforms you'd get
something like
ERROR: invalid input syntax for type numeric: "inf"
but at least on Windows it's possible for the conversion to succeed and
deliver a finite value (typically 1), due to a nonstandard output format
from sprintf and lack of syntax error checking in these functions.
Since our numeric type lacks the concept of infinity, a suitable conversion
is impossible; the best thing to do is throw an explicit error before
letting sprintf do its thing.
While at it, let's use snprintf not sprintf. Overrunning the buffer
should be impossible if sprintf does what it's supposed to, but this
is cheap insurance against a stack smash if it doesn't.
Problem reported by Taiki Kondo. Patch by me based on fix suggestion
from KaiGai Kohei. Back-patch to all supported branches.
Discussion: https://postgr.es/m/12A9442FBAE80D4E8953883E0B84E088C8C7A2@BPXM01GP.gisp.nec.co.jp
This reverts commit 15bc038f9, along with the followon commits 1635e80d3
and 984c92074 that tried to clean up the problems exposed by bug #14825.
The result was incomplete because it failed to address parallel-query
requirements. With 10.0 release so close upon us, now does not seem like
the time to be adding more code to fix that. I hope we can un-revert this
code and add the missing parallel query support during the v11 cycle.
Back-patch to v10.
Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
It drops objects outside information_schema that depend on objects
inside information_schema. For example, it will drop a user-defined
view if the view query refers to information_schema.
Discussion: https://postgr.es/m/20170831025345.GE3963697@rfd.leadboat.com
posix_fallocate() is not quite a drop-in replacement for fallocate(),
because it is defined to return the error code as its function result,
not in "errno". I (tgl) missed this because RHEL6's version seems
to set errno as well. That is not the case on more modern Linuxen,
though, as per buildfarm results.
Aside from fixing the return-convention confusion, remove the test
for ENOSYS; we expect that glibc will mask that for posix_fallocate,
though it does not for fallocate. Keep the test for EINTR, because
POSIX specifies that as a possible result, and buildfarm results
suggest that it can happen in practice.
Back-patch to 9.4, like the previous commit.
Thomas Munro
Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com