from the source table. This could never happen anyway before 8.4 because
the executor invariably applied a "junk filter" to rows due to be inserted;
but now that we skip doing that when it's not necessary, the case can occur.
Problem noted 2008-11-27 by KaiGai Kohei, though I misunderstood what he
was on about at the time (the opacity of the patch he proposed didn't help).
qualifier, and add support for this in pg_dump.
This allows TOAST tables to have user-defined fillfactor, and will also
enable us to move the autovacuum parameters to reloptions without taking
away the possibility of setting values for TOAST tables.
case that the command is rewritten into another type of command. The old
behavior to return the command tag of the last executed command was
pretty surprising. In PL/pgSQL, for example, it meant that if a command
was rewritten to a utility statement, FOUND wasn't set at all.
the same page we are nanoseconds away from reading for real. There should be
something left to do on the current page before we consider issuing a prefetch.
GUC variable effective_io_concurrency controls how many concurrent block
prefetch requests will be issued.
(The best way to handle this for plain index scans is still under debate,
so that part is not applied yet --- tgl)
Greg Stark
bitmap. This is extracted from Greg Stark's posix_fadvise patch; it seems
worth committing separately, since it's potentially useful independently of
posix_fadvise.
that are set up for execution with ExecPrepareExpr rather than going through
the full planner process. By introducing an explicit notion of "expression
planning", this patch also lays a bit of groundwork for maybe someday
allowing sub-selects in standalone expressions.
OutputFunctionCall, and friends. This allows SPI-using functions to invoke
datatype I/O without concern for the possibility that a SPI-using function
will be called (which could be either the I/O function itself, or a function
used in a domain check constraint). It's a tad ugly, but not nearly as ugly
as what'd be needed to make this work via retail insertion of push/pop
operations in all the PLs.
This reverts my patch of 2007-01-30 that inserted some retail SPI_push/pop
calls into plpgsql; that approach only fixed plpgsql, and not any other PLs.
But the other PLs have the issue too, as illustrated by a recent gripe from
Christian Schröder.
Back-patch to 8.2, which is as far back as this solution will work. It's
also as far back as we need to worry about the domain-constraint case, since
earlier versions did not attempt to check domain constraints within datatype
input. I'm not aware of any old I/O functions that use SPI themselves, so
this should be sufficient for a back-patch.
not include postgres.h nor anything else it doesn't directly need. Add
#includes to calling files as needed to compensate. Per my proposal of
yesterday.
This should be noted as a source code change in the 8.4 release notes,
since it's likely to require changes in add-on modules.
practically free given prior 8.4 changes in plancache and portal management,
and it makes it a lot easier for ExecutorStart/Run/End hooks to get at the
query text. Extracted from Itagaki Takahiro's pg_stat_statements patch,
with minor editorialization.
patch. This includes the ability to force the frame to cover the whole
partition, and the ability to make the frame end exactly on the current row
rather than its last ORDER BY peer. Supporting any more of the full SQL
frame-clause syntax will require nontrivial hacking on the window aggregate
code, so it'll have to wait for 8.5 or beyond.
upcoming window-functions patch. First, tuplestore_trim is now an
exported function that must be explicitly invoked by callers at
appropriate times, rather than something that tuplestore tries to do
behind the scenes. Second, a read pointer that is marked as allowing
backward scan no longer prevents truncation. This means that a read pointer
marked as having BACKWARD but not REWIND capability can only safely read
backwards as far as the oldest other read pointer. (The expected use pattern
for this involves having another read pointer that serves as the truncation
fencepost.)
materialize-mode set results. Since it now uses the ReturnSetInfo node
to hold internal state, we need to be sure to set up the node even when
the immediately called function doesn't return set (but does have a set-valued
argument). Per report from Anupama Aherrao.
toasted values, since those could get dropped once the cursor's transaction
is over. Per bug #4553 from Andrew Gierth.
Back-patch as far as 8.1. The bug actually exists back to 7.4 when holdable
cursors were introduced, but this patch won't work before 8.1 without
significant adjustments. Given the lack of field complaints, it doesn't seem
worth the work (and risk of introducing new bugs) to try to make a patch for
the older branches.
that a Portal is a useful and sufficient additional argument for
CreateDestReceiver --- it just isn't, in most cases. Instead formalize
the approach of passing any needed parameters to the receiver separately.
One unexpected benefit of this change is that we can declare typedef Portal
in a less surprising location.
This patch is just code rearrangement and doesn't change any functionality.
I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
DestReceiver created during postquel_start needs to be destroyed during
postquel_end. In a moment of brain fade I had assumed this would be taken
care of by FreeQueryDesc, but it's not (and shouldn't be).
* Refactor explain.c slightly to export a convenient-to-use subroutine
for printing EXPLAIN results.
* Provide hooks for plugins to get control at ExecutorStart and ExecutorEnd
as well as ExecutorRun.
* Add some minimal support for tracking the total runtime of ExecutorRun.
This code won't actually do anything unless a plugin prods it to.
* Change the API of the DefineCustomXXXVariable functions to allow nonzero
"flags" to be specified for a custom GUC variable. While at it, also make
the "bootstrap" default value for custom GUCs be explicitly specified as a
parameter to these functions. This is to eliminate confusion over where the
default comes from, as has been expressed in the past by some users of the
custom-variable facility.
* Refactor GUC code a bit to ensure that a custom variable gets initialized to
something valid (like its default value) even if the placeholder value was
invalid.
locate the target row, if the cursor was declared with FOR UPDATE or FOR
SHARE. This approach is more flexible and reliable than digging through the
plan tree; for instance it can cope with join cursors. But we still provide
the old code for use with non-FOR-UPDATE cursors. Per gripe from Robert Haas.
return the tableoid as well as the ctid for any FOR UPDATE targets that
have child tables. All child tables are listed in the ExecRowMark list,
but the executor just skips the ones that didn't produce the current row.
Curiously, this longstanding restriction doesn't seem to have been documented
anywhere; so no doc changes.
(but not locked, as that would risk deadlocks). Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.
Robert Haas, after an idea of Simon Riggs'.
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values). Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.
Kris Jurka
it just return void instead of sometimes returning a TupleTableSlot. SQL
functions don't need that anymore, and noplace else does either. Eliminating
the return value also means one less hassle for the ExecutorRun hook functions
that will be supported beginning in 8.4.
RETURNING clause, not just a SELECT as formerly.
A side effect of this patch is that when a set-returning SQL function is used
in a FROM clause, performance is improved because the output is collected into
a tuplestore within the function, rather than using the less efficient
value-per-call mechanism.
backwards scan could actually happen. In particular, pass a flag to
materialize-mode SRFs that tells them whether they need to require random
access. In passing, also suppress unneeded backward-scan overhead for a
Portal's holdStore tuplestore. Per my proposal about reducing I/O costs for
tuplestores.
via a tuplestore instead of value-per-call. Refactor a few things to reduce
ensuing code duplication with nodeFunctionscan.c. This represents the
reasonably noncontroversial part of my proposed patch to switch SQL functions
over to returning tuplestores. For the moment, SQL functions still do things
the old way. However, this change enables PL SRFs to be called in targetlists
(observe changes in plperl regression results).
didn't actually work, because nodeRecursiveunion.c creates the underlying
tuplestore with backward scan disabled; which is a decision that we shouldn't
reverse because of performance cost. We could imagine adding signaling from
WorkTableScan to RecursiveUnion about whether backward scan is needed ...
but in practice it'd be a waste of effort, because there simply isn't any
current or plausible future scenario where WorkTableScan would be called on
to scan backward. So just dike out the code that claims to support it.
in their targetlists had better reset ps_TupFromTlist during ReScan calls.
There's no need to back-patch here since nodeAgg and nodeGroup didn't
even pretend to support SRFs in prior releases.
scanning; GiST and GIN do not, and it seems like too much trouble to make
them do so. By teaching ExecSupportsBackwardScan() about this restriction,
we ensure that the planner will protect a scroll cursor from the problem
by adding a Materialize node.
In passing, fix another longstanding bug in the same area: backwards scan of
a plan with set-returning functions in the targetlist did not work either,
since the TupFromTlist expansion code pays no attention to direction (and
has no way to run a SRF backwards anyway). Again the fix is to make
ExecSupportsBackwardScan check this restriction.
Also adjust the index AM API specification to note that mark/restore support
is unnecessary if the AM can't produce ordered output.
In the previous coding, the list of columns that needed to be hashed on
was allocated in the per-query context, but we reallocated every time
the Agg node was rescanned. Since this information doesn't change over
a rescan, just construct the list of columns once during ExecInitAgg().
according to the TupleDesc's natts, not the number of physical columns in the
tuple. The previous coding would do the wrong thing in cases where natts is
different from the tuple's column count: either incorrectly report error when
it should just treat the column as null, or actually crash due to indexing off
the end of the TupleDesc's attribute array. (The second case is probably not
possible in modern PG versions, due to more careful handling of inheritance
cases than we once had. But it's still a clear lack of robustness here.)
The incorrect error indication is ignored by all callers within the core PG
distribution, so this bug has no symptoms visible within the core code, but
it might well be an issue for add-on packages. So patch all the way back.
RecursiveUnion to which it refers. It turns out that we can just postpone the
relevant initialization steps until the first exec call for the node, by which
time the ancestor node must surely be initialized. Per report from Greg Stark.
implementation uses an in-memory hash table, so it will poop out for very
large recursive results ... but the performance characteristics of a
sort-based implementation would be pretty unpleasant too.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.
There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly. But let's land
the patch now so we can get on with other development.
Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
This facility replaces the former mark/restore support but is otherwise
upward-compatible with previous uses. It's expected to be needed for
single evaluation of CTEs and also for window functions, so I'm committing
it separately instead of waiting for either one of those patches to be
finished. Per discussion with Greg Stark and Hitoshi Harada.
Note: I removed nodeFunctionscan's mark/restore support, instead of bothering
to update it for this change, because it was dead code anyway.
we regenerate the SQL query text not merely the plan derived from it. This
is needed to handle contingencies such as renaming of a table or column
used in an FK. Pre-8.3, such cases worked despite the lack of replanning
(because the cached plan needn't actually change), so this is a regression.
Per bug #4417 from Benjamin Bihler.
GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not
depend on the latter being correctly set elsewhere, and while it is more
expensive, this code path is not performance-critical. This is a real
risk for autovacuum, because it can execute whole cycles without doing
a single vacuum, which would mean that RecentGlobalXmin would stay at its
initialization value, FirstNormalTransactionId, causing a bogus value to be
inserted in pg_database. This bug could explain some recent reports of
failure to truncate pg_clog.
At the same time, change the initialization of RecentGlobalXmin to
InvalidTransactionId, and ensure that it's set to something else whenever
it's going to be used. Using it as FirstNormalTransactionId in HOT page
pruning could incur in data loss. InitPostgres takes care of setting it
to a valid value, but the extra checks are there to prevent "special"
backends from behaving in unusual ways.
Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend. There's probably more that should be done along this line,
but this is a start anyway.
checks in ExecIndexBuildScanKeys() that were inadequate anyway: it's better
to verify the correct varno on an expected index key, not just reject OUTER
and INNER.
This makes the entire current contents of nodeFuncs.c dead code. I'll be
replacing it with some other stuff later, as per recent proposal.
subqueries into the same thing you'd have gotten from IN (except always with
unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
I believe this fixes the last case within CVS HEAD in which an EXISTS could
give worse performance than an equivalent IN subquery.
The tricky part of this is that if the upper query probes the EXISTS for only
a few rows, the hashing implementation can actually be worse than the default,
and therefore we need to make a cost-based decision about which way to use.
But at the time when the planner generates plans for subqueries, it doesn't
really know how many times the subquery will be executed. The least invasive
solution seems to be to generate both plans and postpone the choice until
execution. Therefore, in a query that has been optimized this way, EXPLAIN
will show two subplans for the EXISTS, of which only one will actually get
executed.
There is a lot more that could be done based on this infrastructure: in
particular it's interesting to consider switching to the hash plan if we start
out using the non-hashed plan but find a lot more upper rows going by than we
expected. I have therefore left some minor inefficiencies in place, such as
initializing both subplans even though we will currently only use one.
match in antijoin mode, we should advance to next outer tuple not next inner.
We know we don't want to return this outer tuple, and there is no point in
advancing over matching inner tuples now, because we'd just have to do it
again if the next outer tuple has the same merge key. This makes a noticeable
difference if there are lots of duplicate keys in both inputs.
Similarly, after finding a match in semijoin mode, arrange to advance to
the next outer tuple after returning the current match; or immediately,
if it fails the extra quals. The rationale is the same. (This is a
performance bug in existing releases; perhaps worth back-patching? The
planner tries to avoid using mergejoin with lots of duplicates, so it may
not be a big issue in practice.)
Nestloop and hash got this right to start with, but I made some cosmetic
adjustments there to make the corresponding bits of logic look more similar.
the old JOIN_IN code, but antijoins are new functionality.) Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins. Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch. So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
INSERT or UPDATE will match the target table's current rowtype. In pre-8.3
releases inconsistency can arise with stale cached plans, as reported by
Merlin Moncure. (We patched the equivalent hazard on the SELECT side in Feb
2007; I'm not sure why we thought there was no risk on the insertion side.)
In 8.3 and HEAD this problem should be impossible due to plan cache
invalidation management, but it seems prudent to make the check anyway.
Back-patch as far as 8.0. 7.x versions lack ALTER COLUMN TYPE, so there
seems no way to abuse a stale plan comparably.