The general design of memory management in Postgres is that intermediate
results computed by an expression are not freed until the end of the tuple
cycle. For expression indexes, ANALYZE has to re-evaluate each expression
for each of its sample rows, and it wasn't bothering to free intermediate
results until the end of processing of that index. This could lead to very
substantial leakage if the intermediate results were large, as in a recent
example from Jakub Ouhrabka. Fix by doing ResetExprContext for each sample
row. This necessitates adding a datumCopy step to ensure that the final
expression value isn't recycled too. Some quick testing suggests that this
change adds at worst about 10% to the time needed to analyze a table with
an expression index; which is annoying, but seems a tolerable price to pay
to avoid unexpected out-of-memory problems.
Back-patch to all supported branches.
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.
In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting. This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause. The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow. Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.
The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
Split the old typenameTypeId() into two functions: A new typenameTypeId() that
returns only a type OID, and typenameTypeIdAndMod() that returns type OID and
typmod. This isolates call sites better that actually care about the typmod.
After much expenditure of effort, we've got this to the point where the
performance penalty is pretty minimal in typical cases.
Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane
This patch eliminates various bizarre behaviors caused by sloppy thinking
about the difference between a domain type and its underlying array type.
In particular, the operation of updating one element of such an array
has to be considered as yielding a value of the underlying array type,
*not* a value of the domain, because there's no assurance that the
domain's CHECK constraints are still satisfied. If we're intending to
store the result back into a domain column, we have to re-cast to the
domain type so that constraints are re-checked.
For similar reasons, such a domain can't be blindly matched to an ANYARRAY
polymorphic parameter, because the polymorphic function is likely to apply
array-ish operations that could invalidate the domain constraints. For the
moment, we just forbid such matching. We might later wish to insert an
automatic downcast to the underlying array type, but such a change should
also change matching of domains to ANYELEMENT for consistency.
To ensure that all such logic is rechecked, this patch removes the original
hack of setting a domain's pg_type.typelem field to match its base type;
the typelem will always be zero instead. In those places where it's really
okay to look through the domain type with no other logic changes, use the
newly added get_base_element_type function in place of get_element_type.
catversion bumped due to change in pg_type contents.
Per bug #5717 from Richard Huxton and subsequent discussion.
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted. This should be
particularly useful for fast-start cases such as queries with LIMIT.
Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.
This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
is fired instead of performing a physical insert/update/delete. The
trigger function is passed the entire old and/or new rows of the view,
and must figure out what to do to the underlying tables to implement
the update. So this feature can be used to implement updatable views
using trigger programming style rather than rule hacking.
In passing, this patch corrects the names of some columns in the
information_schema.triggers view. It seems the SQL committee renamed
them somewhere between SQL:99 and SQL:2003.
Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
This patch resurrects some of the information that could be logged by the
old, now-dead implementation of VACUUM FULL, in particular counts of live
and dead tuples and the time taken for the table rebuild proper. There's
still no logging about the ensuing index rebuilds, though.
Itagaki Takahiro
Actually making this case work, if the column is used in the trigger's
WHEN condition, will take some new code that probably isn't appropriate
to back-patch. For now, just throw a FEATURE_NOT_SUPPORTED error rather
than allowing control to reach the "unexpected object" case. Per bug #5688
from Daniel Grace. Back-patch to 9.0 where the possibility of such a
dependency was introduced.
This is intended as infrastructure to support integration with label-based
mandatory access control systems such as SE-Linux. Further changes (mostly
hooks) will be needed, but this is a big chunk of it.
KaiGai Kohei and Robert Haas
hasn't been set.
The only known case where this can happen is when show_session_authorization
is invoked in an autovacuum process, which is possible if an index function
calls it, as for example in bug #5669 from Andrew Geery. We could perhaps
try to return a sensible value, such as the name of the cluster-owning
superuser; but that seems like much more trouble than the case is worth,
and in any case it could create new possible failure modes. Simply
returning an empty string seems like the most appropriate fix.
Back-patch to all supported versions, even those before autovacuum, just
in case there's another way to provoke this crash.
The previous coding just terminated the COPY immediately after seeing
the EOF marker (-1 where a row field count is expected). The expected
CopyDone or CopyFail message just got thrown away later, since we weren't
in COPY mode anymore. This behavior complicated matters for the JDBC
driver, and arguably was the wrong thing in any case since a CopyFail
message after the marker wouldn't be honored.
Note that there is a behavioral change here: extra data after the EOF
marker was silently ignored before, but now it will cause an error.
Hence not back-patching, although this is arguably a bug.
Per report and patch by Kris Jurka.
transaction snapshots, i.e. a snapshot registered at the beginning of
a transaction. Change variable naming and comments to reflect this reality
in preparation for a future, truly serializable mode, e.g.
Serializable Snapshot Isolation (SSI).
For the moment transaction snapshots are still used to implement
SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
Grittner and Dan Ports with review and some minor wording changes by me.
afterTriggerInvokeEvents failed to adjust events->tailfree when truncating
the last chunk of an event list. This could result in the data being
"de-truncated" by afterTriggerRestoreEventList during a subsequent
subtransaction abort. Even that wouldn't kill us, because the re-added data
would just be events marked DONE --- unless the data had been partially
overwritten by new events. Then we might crash, or in any case misbehave
(perhaps fire triggers twice, or fire triggers with the wrong event data).
Per bug #5622 from Thue Janus Kristensen.
Back-patch to 8.4 where the current trigger list representation was introduced.
The implicitly created sequence was created as owned by the current user,
who could be different from the table owner, eg if current user is a
superuser or some member of the table's owning role. This caused sanity
checks in the SEQUENCE OWNED BY code to spit up. Although possibly we
don't need those sanity checks, the safest fix seems to be to make sure
the implicit sequence is assigned the same owner role as the table has.
(We still do all permissions checks as the current user, however.)
Per report from Josh Berkus.
Back-patch to 9.0. The bug goes back to the invention of SEQUENCE OWNED BY
in 8.2, but the fix requires an API change for DefineRelation(), which seems
to have potential for breaking third-party code if done in a minor release.
Given the lack of prior complaints, it's probably not worth fixing in the
stable branches.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.
Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1. It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.
Review by Jaime Casanova and Tom Lane.
- Rename TSParserGetPrsid to get_ts_parser_oid.
- Rename TSDictionaryGetDictid to get_ts_dict_oid.
- Rename TSTemplateGetTmplid to get_ts_template_oid.
- Rename TSConfigGetCfgid to get_ts_config_oid.
- Rename FindConversionByName to get_conversion_oid.
- Rename GetConstraintName to get_constraint_oid.
- Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid,
get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid.
The name of each function matches the corresponding catalog.
Thanks to KaiGai Kohei for the review.
unqualified names.
- Add a missing_ok parameter to get_tablespace_oid.
- Avoid duplicating get_tablespace_od guts in objectNamesToOids.
- Add a missing_ok parameter to get_database_oid.
- Replace get_roleid and get_role_checked with get_role_oid.
- Add get_namespace_oid, get_language_oid, get_am_oid.
- Refactor existing code to use new interfaces.
Thanks to KaiGai Kohei for the review.
Without this patch, constraints inherited by children of a parent
table which itself has multiple inheritance parents can end up with
the wrong coninhcount. After dropping the constraint, the children
end up with a leftover copy of the constraint that is not dumped
and cannot be dropped. There is a similar problem with ALTER TABLE
.. ADD COLUMN, but that looks significantly more difficult to
resolve, so I'm committing this fix separately.
Back-patch to 8.4, which is the first release that has coninhcount.
Report by Hank Enting.
indexes when the index column type (the opclass opckeytype) is different from
the expression's datatype. When coded, this limitation wasn't worth worrying
about because we had no intelligence to speak of in stats collection for the
datatypes used by such opclasses. However, now that there's non-toy
estimation capability for tsvector queries, it amounts to a bug that ANALYZE
fails to do this.
The fix changes struct VacAttrStats, and therefore constitutes an API break
for custom typanalyze functions. Therefore we can't back-patch it into
released branches, but it was agreed that 9.0 isn't yet frozen hard enough
to make such a change unacceptable. Ergo, back-patch to 9.0 but no further.
The API break had better be mentioned in 9.0 release notes.
assuming that a local char[] array would be aligned on at least a word
boundary. There are architectures on which that is pretty much guaranteed to
NOT be the case ... and those arches also don't like non-aligned memory
accesses, meaning that log_newpage() would crash if it ever got invoked.
Even on Intel-ish machines there's a potential for a large performance penalty
from doing I/O to an inadequately aligned buffer. So palloc it instead.
Backpatch to 8.0 --- 7.4 doesn't have this code.
Avoid hard-coding lockmode used for many altering DDL commands, allowing easier
future changes of lock levels. Implementation of initial analysis on DDL
sub-commands, so that many lock levels are now at ShareUpdateExclusiveLock or
ShareRowExclusiveLock, allowing certain DDL not to block reads/writes.
First of number of planned changes in this area; additional docs required
when full project complete.
Remove bespoke code in DoCopy and RI_Initial_Check, which now instead
fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry.
This is intended to make it feasible for an enhanced security provider
to actually make use of ExecutorCheckPerms_hook, but also has the
advantage that RI_Initial_Check can allow use of the fast-path when
column-level but not table-level permissions are present.
KaiGai Kohei. Reviewed (in an earlier version) by Stephen Frost, and by me.
Some further changes to the comments by me.
rather than just $N. This brings the display of nestloop-inner-indexscan
plans back to where it's been, and incidentally improves the display of
SubPlan parameters as well. In passing, simplify the EXPLAIN code by
having it deal primarily in the PlanState tree rather than separately
searching Plan and PlanState trees. This is noticeably cleaner for
subplans, and about a wash elsewhere.
One small difference from previous behavior is that EXPLAIN will no longer
qualify local variable references in inner-indexscan plan nodes, since it
no longer sees such nodes as possibly referencing multiple tables. Vars
referenced through PARAM_EXEC Params are still forcibly qualified, though,
so I don't think the display is any more confusing than before. Adjust a
couple of examples in the documentation to match this behavior.
Backpatch to 8.3, which is as far back as we have opfamilies.
The opclass portion could probably be backpatched to 8.2, when
REASSIGN OWNED was added, but for now I have not done that.
Asko Tiidumaa, with minor adjustments by me.
In HEAD, emit a warning when an operator named => is defined.
In both HEAD and the backbranches (except in 8.2, where contrib
modules do not have documentation), document that hstore's text =>
text operator may be removed in a future release, and encourage the
use of the hstore(text, text) function instead. This function only
exists in HEAD (previously, it was called tconvert), so backpatch
it back to 8.2, when hstore was added. Per discussion.
While my previous attempt seems to always produce valid YAML, it
doesn't always produce YAML that means what it appears to mean,
because of tokens like "0xa" and "true", which without quotes will
be interpreted as integer or Boolean literals. So, instead, just
quote everything that's not known to be a number, as we do for
JSON.
Dean Rasheed, with some changes to the comments by me.
The previous code failed to quote in many cases where quoting was necessary -
YAML has loads of special characters, including -:[]{},"'|*& - so quote much
more aggressively, and only refrain from quoting things where it seems fairly
clear that it isn't necessary.
Per report from Dean Rasheed.
before it checks whether the expression is immutable. This covers two cases
that were previously handled poorly:
1. SQL function inlining could reduce the apparent volatility of the
expression, allowing an expression to be accepted where it previously would
not have been. As an example, polymorphic functions must be marked with the
worst-case volatility they have for any argument type, but for specific
argument types they might not be so volatile, so indexing could be allowed.
(Since the planner will refuse to inline functions in cases where the
apparent volatility of the expression would increase, this won't break
any cases that were accepted before.)
2. A nominally immutable function could have default arguments that are
volatile expressions. In such a case insertion of the defaults will increase
both the apparent and actual volatility of the expression, so it is
*necessary* to check this before allowing the expression to be indexed.
Back-patch to 8.4, where default arguments were introduced.
rather than returning NULL for some-but-not-all failures as they used to.
Remove now-redundant tests for NULL from call sites.
We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.