1
0
mirror of https://github.com/postgres/postgres.git synced 2025-12-06 00:02:13 +03:00
Commit Graph

3384 Commits

Author SHA1 Message Date
Alvaro Herrera
21c9e4973c Revise attribute handling code on partition creation
The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copy-pasted from
typed-tables creation, but not being a great match it contained some
duplicity, inefficiency and bugs.

This commit fixes the bug that NOT NULL constraints declared in the
parent table would not be honored in the partition.  One reported issue
that is not fixed is that a DEFAULT declared in the child is not used
when inserting through the parent.  That would amount to a behavioral
change that's better not back-patched.

This rewrite makes the code simpler:

1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;

2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly.  This works
because we know the list in the parent is already correct and there can
only be one parent.

This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.

This commit also adds a test case for creating partitions with
collations mismatching that on the parent table, something that is
closely related to the code being patched.  No code change is introduced
though, since that'd be a behavior change that could break some (broken)
working applications.

Amit Langote wrote a less invasive fix for the original
NOT NULL/defaults bug, but while I kept the tests he added, I ended up
not using his original code.  Ashutosh Bapat reviewed Amit's fix.  Amit
reviewed mine.

Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/152746742177.1291.9847032632907407358@wrigleys.postgresql.org
2018-11-08 16:22:09 -03:00
Michael Paquier
8aad248f7c Block creation of partitions with open references to its parent
When a partition is created as part of a trigger processing, it is
possible that the partition which just gets created changes the
properties of the table the executor of the ongoing command relies on,
causing a subsequent crash.  This has been found possible when for
example using a BEFORE INSERT which creates a new partition for a
partitioned table being inserted to.

Any attempt to do so is blocked when working on a partition, with
regression tests added for both CREATE TABLE PARTITION OF and ALTER
TABLE ATTACH PARTITION.

Reported-by: Dmitry Shalashov
Author: Amit Langote
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/15437-3fe01ee66bd1bae1@postgresql.org
Backpatch-through: 10
2018-11-05 11:04:20 +09:00
Alvaro Herrera
6b6b59b38e Silence compiler warning in Assert()
gcc 6.3 does not whine about this mistake I made in 39808e8868 but
evidently lots of other compilers do, according to Michael Paquier,
Peter Eisentraut, Arthur Zakirov, Tomas Vondra.

Discussion: too many to list
2018-10-08 10:37:21 -03:00
Alvaro Herrera
101b21ead3 Fix event triggers for partitioned tables
Index DDL cascading on partitioned tables introduced a way for ALTER
TABLE to be called reentrantly.  This caused an an important deficiency
in event trigger support to be exposed: on exiting the reentrant call,
the alter table state object was clobbered, causing a crash when the
outer alter table tries to finalize its processing.  Fix the crash by
creating a stack of event trigger state objects.  There are still ways
to cause things to misbehave (and probably other crashers) with more
elaborate tricks, but at least it now doesn't crash in the obvious
scenario.

Backpatch to 9.5, where DDL deparsing of event triggers was introduced.

Reported-by: Marco Slot
Authors: Michaël Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CANNhMLCpi+HQ7M36uPfGbJZEQLyTy7XvX=5EFkpR-b1bo0uJew@mail.gmail.com
2018-10-06 19:17:46 -03:00
Tom Lane
db01fc97ad Fix ALTER COLUMN TYPE to not open a relation without any lock.
If the column being modified is referenced by a foreign key constraint
of another table, ALTER TABLE would open the other table (to re-parse
the constraint's definition) without having first obtained a lock on it.
This was evidently intentional, but that doesn't mean it's really safe.
It's especially not safe in 9.3, which pre-dates use of MVCC scans for
catalog reads, but even in current releases it doesn't seem like a good
idea.

We know we'll need AccessExclusiveLock shortly to drop the obsoleted
constraint, so just get that a little sooner to close the hole.

Per testing with a patch that complains if we open a relation without
holding any lock on it.  I don't plan to back-patch that patch, but we
should close the holes it identifies in all supported branches.

Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us
2018-10-01 11:39:14 -04:00
Peter Eisentraut
5f6b0e6d69 Recurse to sequences on ownership change for all relkinds
When a table ownership is changed, we must apply that also to any owned
sequences.  (Otherwise, it would result in a situation that cannot be
restored, because linked sequences must have the same owner as the
table.)  But this was previously only applied to regular tables and
materialized views.  But it should also apply to at least foreign
tables.  This patch removes the relkind check altogether, because it
doesn't save very much and just introduces the possibility of similar
omissions.

Bug: #15238
Reported-by: Christoph Berg <christoph.berg@credativ.de>
2018-09-26 20:19:44 +02:00
Tom Lane
10b9af3ebb Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd.  However, that policy's
been ignored in an increasing number of places.  We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway.  But this is not
something to rely on.  Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.

I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster.  I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-09-01 15:27:13 -04:00
Michael Paquier
ecf56dc5e5 Fix set of NLS translation issues
While monitoring the code, a couple of issues related to string
translation has showed up:
- Some routines for auto-updatable views return an error string, which
sometimes missed the shot.  A comment regarding string translation is
added for each routine to help with future features.
- GSSAPI authentication missed two translations.
- vacuumdb handles non-translated strings.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 9.3
2018-08-21 15:17:38 +09:00
Peter Eisentraut
6c206de559 Remove obsolete comment
The sequence name is no longer stored in the sequence relation, since
1753b1b027.
2018-08-13 21:08:15 +02:00
Tom Lane
9446d71577 Don't record FDW user mappings as members of extensions.
CreateUserMapping has a recordDependencyOnCurrentExtension call that's
been there since extensions were introduced (very possibly my fault).
However, there's no support anywhere else for user mappings as members
of extensions, nor are they listed as a possible member object type in
the documentation.  Nor does it really seem like a good idea for user
mappings to belong to extensions when roles don't.  Hence, remove the
bogus call.

(As we saw in bug #15310, the lack of any pg_dump support for this case
ensures that any such membership record would silently disappear during
pg_upgrade.  So there's probably no need for us to do anything else
about cleaning up after this mistake.)

Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us
2018-08-07 16:33:00 -04:00
Tom Lane
2131d4501f Remove undocumented restriction against duplicate partition key columns.
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior.  Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.

There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.

Back-patch to v10 where this code was added.

Report and patch by Yugo Nagata.

Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp
2018-07-19 15:41:46 -04:00
Heikki Linnakangas
ed529faf7e Fix misc typos, mostly in comments.
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.

Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
2018-07-18 16:18:27 +03:00
Michael Paquier
5862174ec7 Clarify use of temporary tables within partition trees
Since their introduction, partition trees have been a bit lossy
regarding temporary relations.  Inheritance trees respect the following
patterns:
1) a child relation can be temporary if the parent is permanent.
2) a child relation can be temporary if the parent is temporary.
3) a child relation cannot be permanent if the parent is temporary.
4) The use of temporary relations also imply that when both parent and
child need to be from the same sessions.

Partitions share many similar patterns with inheritance, however the
handling of the partition bounds make the situation a bit tricky for
case 1) as the partition code bases a lot of its lookup code upon
PartitionDesc which does not really look after relpersistence.  This
causes for example a temporary partition created by session A to be
visible by another session B, preventing this session B to create an
extra partition which overlaps with the temporary one created by A with
a non-intuitive error message.  There could be use-cases where mixing
permanent partitioned tables with temporary partitions make sense, but
that would be a new feature.  Partitions respect 2), 3) and 4) already.

It is a bit depressing to see those error checks happening in
MergeAttributes() whose purpose is different, but that's left as future
refactoring work.

Back-patch down to 10, which is where partitioning has been introduced,
except that default partitions do not apply there.  Documentation also
includes limitations related to the use of temporary tables with
partition trees.

Reported-by: David Rowley
Author: Amit Langote, Michael Paquier
Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com
2018-06-20 10:48:28 +09:00
Tom Lane
b10edaf4bb Fix access to just-closed relcache entry.
It might be impossible for this to cause a problem in non-debug builds,
since there'd be no opportunity for the relcache entry to get recycled
before the fetch.  It blows up nicely with -DRELCACHE_FORCE_RELEASE plus
valgrind, though.

Evidently introduced by careless refactoring in commit f0e44751d.
Back-patch accordingly.

Discussion: https://postgr.es/m/27543.1528758304@sss.pgh.pa.us
2018-06-11 19:17:50 -04:00
Tom Lane
c92d1461e1 Widen COPY FROM's current-line-number counter from 32 to 64 bits.
Because the code for the HEADER option skips a line when this counter
is zero, a very long COPY FROM WITH HEADER operation would drop a line
every 2^32 lines.  A lesser but still unfortunate problem is that errors
would show a wrong input line number for errors occurring beyond the
2^31'st input line.  While such large input streams seemed impractical
when this code was first written, they're not any more.  Widening the
counter (and some associated variables) to uint64 should be enough to
prevent problems for the foreseeable future.

David Rowley

Discussion: https://postgr.es/m/CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com
2018-05-22 13:32:52 -04:00
Tom Lane
fab4ecacc4 Fix race conditions when an event trigger is added concurrently with DDL.
EventTriggerTableRewrite crashed if there were table_rewrite triggers
present, but there had not been when the calling command started.

EventTriggerDDLCommandEnd called ddl_command_end triggers if present,
even if there had been no such triggers when the calling command started,
which would lead to a failure in pg_event_trigger_ddl_commands.

In both cases, fix by doing nothing; it's better to wait till the next
command when things will be properly initialized.

In passing, remove an elog(DEBUG1) call that might have seemed interesting
four years ago but surely isn't today.

We found this because of intermittent failures in the buildfarm.  Thanks
to Alvaro Herrera and Andrew Gierth for analysis.

Back-patch to 9.5; some of this code exists before that, but the specific
hazards we need to guard against don't.

Discussion: https://postgr.es/m/5767.1523995174@sss.pgh.pa.us
2018-04-20 17:15:31 -04:00
Tom Lane
94a898f69c Better fix for deadlock hazard in CREATE INDEX CONCURRENTLY.
Commit 54eff5311 did not account for the possibility that we'd have
a transaction snapshot due to default_transaction_isolation being
set high enough to require one.  The transaction snapshot is enough
to hold back our advertised xmin and thus risk deadlock anyway.
The only way to get rid of that snap is to start a new transaction,
so let's do that instead.  Also throw in an assert checking that we
really have gotten to a state where no xmin is being advertised.

Back-patch to 9.4, like the previous commit.

Discussion: https://postgr.es/m/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWLGMeVBDx71GHNwZQ@mail.gmail.com
2018-04-18 12:07:37 -04:00
Robert Haas
29ab1e24a6 Enforce child constraints during COPY TO a partitioned table.
The previous coding inadvertently checked the constraints for the
partitioned table rather than the target partition, which could
lead to data in a partition that fails to satisfy some constraint
on that partition.  This problem seems to date back to when
table partitioning was introduced; prior to that, there was only
one target table for a COPY, so the problem didn't occur, and the
code just didn't get updated.

Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat

Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp
2018-04-06 11:52:38 -04:00
Tom Lane
e17e9055f5 Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query.  (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause.  I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching.  That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us
2018-03-19 18:49:53 -04:00
Tom Lane
1568156d8f Fix performance hazard in REFRESH MATERIALIZED VIEW CONCURRENTLY.
Jeff Janes discovered that commit 7ca25b7de made one of the queries run by
REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly.  The root cause is
bad cardinality estimation for correlated quals, but a principled solution
to that problem is some way off, especially since the planner lacks any
statistics about whole-row variables.  Moreover, in non-error cases this
query produces no rows, meaning it must be run to completion; but use of
LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan,
exactly not what we want.  Remove the LIMIT clause, and instead rely on
the count parameter we pass to SPI_execute() to prevent excess work if the
query does return some rows.

While we've heard no field reports of planner misbehavior with this query,
it could be that people are having performance issues that haven't reached
the level of pain needed to cause a bug report.  In any case, that LIMIT
clause can't possibly do anything helpful with any existing version of the
planner, and it demonstrably can cause bad choices in some cases, so
back-patch to 9.4 where the code was introduced.

Thomas Munro

Discussion: https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com
2018-03-19 17:23:23 -04:00
Alvaro Herrera
e3faddf537 Fix state reversal after partition tuple routing
We make some changes to ModifyTableState and the EState it uses whenever
we route tuples to partitions; but we weren't restoring properly in all
cases, possibly causing crashes when partitions with different tuple
descriptors are targeted by tuples inserted in the same command.
Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the
needed state changing logic, and have it invoked one level above its
current place (ie. put it in ExecModifyTable instead of ExecInsert);
this makes it all more readable.

Add a test case to exercise this.

We don't support having views as partitions; and since only views can
have INSTEAD OF triggers, there is no point in testing for INSTEAD OF
when processing insertions into a partitioned table.  Remove code that
appears to support this (but which is actually never relevant.)

In passing, fix location of some very confusing comments in
ModifyTableState.

Reported-by: Amit Langote
Author: Etsuro Fujita, Amit Langote
Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp
2018-03-19 17:43:55 -03:00
Tom Lane
1bfb567230 When updating reltuples after ANALYZE, just extrapolate from our sample.
The existing logic for updating pg_class.reltuples trusted the sampling
results only for the pages ANALYZE actually visited, preferring to
believe the previous tuple density estimate for all the unvisited pages.
While there's some rationale for doing that for VACUUM (first that
VACUUM is likely to visit a very nonrandom subset of pages, and second
that we know for sure that the unvisited pages did not change), there's
no such rationale for ANALYZE: by assumption, it's looked at an unbiased
random sample of the table's pages.  Furthermore, in a very large table
ANALYZE will have examined only a tiny fraction of the table's pages,
meaning it cannot slew the overall density estimate very far at all.
In a table that is physically growing, this causes reltuples to increase
nearly proportionally to the change in relpages, regardless of what is
actually happening in the table.  This has been observed to cause reltuples
to become so much larger than reality that it effectively shuts off
autovacuum, whose threshold for doing anything is a fraction of reltuples.
(Getting to the point where that would happen seems to require some
additional, not well understood, conditions.  But it's undeniable that if
reltuples is seriously off in a large table, ANALYZE alone will not fix it
in any reasonable number of iterations, especially not if the table is
continuing to grow.)

Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone,
and in ANALYZE, just extrapolate from the sample pages on the assumption
that they provide an accurate model of the whole table.  If, by very bad
luck, they don't, at least another ANALYZE will fix it; in the old logic
a single bad estimate could cause problems indefinitely.

In HEAD, let's remove vac_estimate_reltuples' is_analyze argument
altogether; it was never used for anything and now it's totally pointless.
But keep it in the back branches, in case any third-party code is calling
this function.

Per bug #15005.  Back-patch to all supported branches.

David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me

Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels
2018-03-13 13:24:27 -04:00
Peter Eisentraut
c32f44c4a5 Fix CREATE TABLE / LIKE with bigint identity column
CREATE TABLE / LIKE with a bigint identity column would fail on
platforms where long is 32 bits.  Copying the sequence values used
makeInteger(), which would truncate the 64-bit sequence data to 32 bits.
To fix, use makeFloat() instead, like the parser.  (This does not
actually make use of floats, but stores the values as strings.)

Bug: #15096
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-13 09:41:36 -04:00
Tom Lane
e2ed3c4a30 Fix improper uses of canonicalize_qual().
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses.  It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE.  Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.

Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints.  That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not.  (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail.  There may be related cases
that do fail before that.)

More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean.  If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10.  I haven't attempted to prove that though.

To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly.  Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics.  I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects.  In HEAD, add an Assert to catch
that type of mistake in future.

This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches.  Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.

Patch by me with suggestions from Dean Rasheed.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
2018-03-11 18:10:42 -04:00
Alvaro Herrera
e20dd6a13d Fix bogus Name assignment in CreateStatistics
Apparently, it doesn't work to use a plain cstring as a Name datum: you
may end up having random bytes because of failing to zero the bytes
after the terminating \0, as indicated by valgrind.  I introduced this
bug in 5564c11815, so backpatch this fix to REL_10_STABLE, like that
commit.

While at it, fix a slightly misleading comment, pointed out by David
Rowley.
2018-03-06 13:21:04 -03:00
Alvaro Herrera
911e6236ba Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so.  Patch it up so that it does.  Also include an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested
individually, or excluded individually.

While at it, reorder the INCLUDING options, both in code and in docs, in
alphabetical order which makes more sense than feature-implementation
order that was previously used.

Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.

In pg11, comments on statistics objects are cloned too.  In pg10 they
are not, because I (Álvaro) was too coward to change the parse node as
required to support it.  Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason.  Any corresponding user-visible changes (docs) are backpatched,
though.

Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com
2018-03-05 19:37:19 -03:00
Tom Lane
b45f821e22 Prevent dangling-pointer access when update trigger returns old tuple.
A before-update row trigger may choose to return the "new" or "old" tuple
unmodified.  ExecBRUpdateTriggers failed to consider the second
possibility, and would proceed to free the "old" tuple even if it was the
one returned, leading to subsequent access to already-deallocated memory.
In debug builds this reliably leads to an "invalid memory alloc request
size" failure; in production builds it might accidentally work, but data
corruption is also possible.

This is a very old bug.  There are probably a couple of reasons it hasn't
been noticed up to now.  It would be more usual to return NULL if one
wanted to suppress the update action; returning "old" is significantly less
efficient since the update will occur anyway.  Also, none of the standard
PLs would ever cause this because they all returned freshly-manufactured
tuples even if they were just copying "old".  But commit 4b93f5799 changed
that for plpgsql, making it possible to see the bug with a plpgsql trigger.
Still, this is certainly legal behavior for a trigger function, so it's
ExecBRUpdateTriggers's fault not plpgsql's.

It seems worth creating a test case that exercises returning "old" directly
with a C-language trigger; testing this through plpgsql seems unreliable
because its behavior might change again.

Report and fix by Rushabh Lathia; regression test case by me.
Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com
2018-02-27 13:27:38 -05:00
Peter Eisentraut
1597948c96 Fix application of identity values in some cases
Investigation of 2d2d06b7e2 revealed that
identity values were not applied in some further cases, including
logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD
COLUMN.  To fix all that, apply the identity column expression in
build_column_default() instead of repeating the same logic at each call
site.

For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding
completely ignored that existing rows for the new column should have
values filled in from the identity sequence.  The coding using
build_column_default() fails for this because the sequence ownership
isn't registered until after ALTER TABLE, and we can't do it before
because we don't have the column in the catalog yet.  So we specially
remember in ColumnDef the sequence name that we decided on and build a
custom NextValueExpr using that.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-02-02 15:06:52 -05:00
Alvaro Herrera
61f08c0163 Fix StoreCatalogInheritance1 to use 32bit inhseqno
For no apparent reason, this function was using a 16bit-wide inhseqno
value, rather than the correct 32 bit width which is what is stored in
the pg_inherits catalog.  This becomes evident if you try to create a
table with more than 65535 parents, because this error appears:

ERROR:  duplicate key value violates unique constraint «pg_inherits_relid_seqno_index»
DETAIL:  Key (inhrelid, inhseqno)=(329371, 0) already exists.

Needless to say, having so many parents is an uncommon situations, which
explains why this error has never been reported despite being having
been introduced with the Postgres95 1.01 sources in commit d31084e9d1:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349

Backpatch all the way back.

David Rowley noticed this while reviewing a patch of mine.
Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com
2018-01-19 10:15:08 -03:00
Alvaro Herrera
6d2a9ae0ed Fix deadlock hazard in CREATE INDEX CONCURRENTLY
Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
supposed to be able to work in parallel, as evidenced by fixes in commit
c3d09b3bd2 specifically to support this case.  In reality, one of the
sessions would be aborted by a misterious "deadlock detected" error.

Jeff Janes diagnosed that this is because of leftover snapshots used for
system catalog scans -- this was broken by 8aa3e47510 keeping track of
(registering) the catalog snapshot.  To fix the deadlocks, it's enough
to de-register that snapshot prior to waiting.

Backpatch to 9.4, which introduced MVCC catalog scans.

Include an isolationtester spec that 8 out of 10 times reproduces the
deadlock with the unpatched code for me (Álvaro).

Author: Jeff Janes
Diagnosed-by: Jeff Janes
Reported-by: Jeremy Finzel
Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
2018-01-02 19:16:16 -03:00
Teodor Sigaev
bdbf29aaef Update relation's stats in pg_class during vacuum full.
Hash index depends on estimation of numbers of tuples and pages of relations,
incorrect value could be a reason of significantly growing of index. Vacuum
full recreates heap and reindex all indexes before renewal stats. The patch
fixes that, so indexes will see correct values.

Backpatch to v10 only because earlier versions haven't usable hash index and
growing of hash index is a single user-visible symptom.

Author: Amit Kapila
Reviewed-by: Ashutosh Sharma, me
Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net
2017-12-27 18:26:58 +03:00
Andres Freund
d3044f8b07 Perform a lot more sanity checks when freezing tuples.
The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existance of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.

The errors are emitted with ereport rather than elog, despite being
"should never happen" messages, so a proper error code is emitted. To
avoid superflous translations, mark messages as internal.

Author: Andres Freund and Alvaro Herrera
Reviewed-By: Alvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:48 -08:00
Peter Eisentraut
ee5b595493 Apply identity sequence values on COPY
A COPY into a table should apply identity sequence values just like it
does for ordinary defaults.  This was previously forgotten, leading to
null values being inserted, which in turn would fail because identity
columns have not-null constraints.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Steven Winfield <steven.winfield@cantabcapital.com>
Bug: #14952
2017-12-08 09:39:55 -05:00
Tom Lane
6448704496 Fix assorted syscache lookup sloppiness in partition-related code.
heap_drop_with_catalog and ATExecDetachPartition neglected to check for
SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian.
Such failures are pretty unlikely, since we should already have some sort
of lock on the rel at these points, but it's neither a good idea nor
per project style to omit a check for failure.

Also, StorePartitionKey contained a syscache lookup that it never did
anything with, including never releasing the result.  Presumably the
reason why we don't see refcount-leak complaints is that the lookup
always fails; but in any case it's pretty useless, so remove it.

All of these errors were evidently introduced by the relation
partitioning feature.  Back-patch to v10 where that came in.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/20171127090105.1463.3962@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20171127091341.1468.72696@wrigleys.postgresql.org
2017-11-27 19:22:08 -05:00
Noah Misch
2168f37c4d Ignore CatalogSnapshot when checking COPY FREEZE prerequisites.
This restores the ability, essentially lost in commit
ffaa44cb55, to use COPY FREEZE under
REPEATABLE READ isolation.  Back-patch to 9.4, like that commit.

Reviewed by Tom Lane.

Discussion: https://postgr.es/m/CA+TgmoahWDm-7fperBxzU9uZ99LPMUmEpSXLTw9TmrOgzwnORw@mail.gmail.com
2017-11-05 09:25:59 -08:00
Alvaro Herrera
7a95966bc0 Revert bogus fixes of HOT-freezing bug
It turns out we misdiagnosed what the real problem was.  Revert the
previous changes, because they may have worse consequences going
forward.  A better fix is forthcoming.

The simplistic test case is kept, though disabled.

Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
2017-11-02 15:51:05 +01:00
Tom Lane
f4cdf781a1 Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running.  However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running.  If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin.  Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages.  The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.

The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages.  But that would add cycles,
as well as contention for the ProcArrayLock.  We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all.  In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load.  Light testing
says that this change is a wash performance-wise for normal loads.

I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.

Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.

Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 14:28:33 -04:00
Tom Lane
2aab70205b Fix inadequate locking during get_rel_oids().
get_rel_oids used to not take any relation locks at all, but that stopped
being a good idea with commit 3c3bb9933, which inserted a syscache lookup
into the function.  A concurrent DROP TABLE could now produce "cache lookup
failed", which we don't want to have happen in normal operation.  The best
solution seems to be to transiently take a lock on the relation named by
the RangeVar (which also makes the result of RangeVarGetRelid a lot less
spongy).  But we shouldn't hold the lock beyond this function, because we
don't want VACUUM to lock more than one table at a time.  (That would not
be a big problem right now, but it will become one after the pending
feature patch to allow multiple tables to be named in VACUUM.)

In passing, adjust vacuum_rel and analyze_rel to document that we don't
trust the passed RangeVar to be accurate, and allow the RangeVar to
possibly be NULL --- which it is anyway for a whole-database VACUUM,
though we accidentally didn't crash for that case.

The passed RangeVar is in fact inaccurate when dealing with a child
partition, as of v10, and it has been wrong for a whole long time in the
case of vacuum_rel() recursing to a TOAST table.  None of these things
present visible bugs up to now, because the passed RangeVar is in fact
only consulted for autovacuum logging, and in that particular context it's
always accurate because autovacuum doesn't let vacuum.c expand partitions
nor recurse to toast tables.  Still, this seems like trouble waiting to
happen, so let's nail the door at least partly shut.  (Further cleanup
is planned, in HEAD only, as part of the pending feature patch.)

Fix some sadly inaccurate/obsolete comments too.  Back-patch to v10.

Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us
2017-09-29 16:26:21 -04:00
Alvaro Herrera
46c35116ae Fix freezing of a dead HOT-updated tuple
Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples.  But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.

(This is especially bad if there's any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)

One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it.  But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.

Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead).  In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.

An isolation test is included, authored by Michael Paquier, loosely
based on Daniel Wood's original reproducer.  It only tests one
particular scenario, though, not all the possible ways for this problem
to surface; it be good to have a more reliable way to test this more
fully, but it'd require more work.
In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql
I outlined another test case (more closely matching Dan Wood's) that
exposed a few more ways for the problem to occur.

Backpatch all the way back to 9.3, where this problem was introduced by
multixact juggling.  In branches 9.3 and 9.4, this includes a backpatch
of commit e5ff9fefcd50 (of 9.5 era), since the original is not
correctable without matching the coding pattern in 9.5 up.

Reported-by: Daniel Wood
Diagnosed-by: Daniel Wood
Reviewed-by: Yi Wen Wong, Michaël Paquier
Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
2017-09-28 16:44:01 +02:00
Tom Lane
93a1af0b3f Revert to 9.6 treatment of ALTER TYPE enumtype ADD VALUE.
This reverts commit 15bc038f9, along with the followon commits 1635e80d3
and 984c92074 that tried to clean up the problems exposed by bug #14825.
The result was incomplete because it failed to address parallel-query
requirements.  With 10.0 release so close upon us, now does not seem like
the time to be adding more code to fix that.  I hope we can un-revert this
code and add the missing parallel query support during the v11 cycle.

Back-patch to v10.

Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
2017-09-27 16:14:37 -04:00
Tom Lane
9ebc778144 Improve wording of error message added in commit 714805010.
Per suggestions from Peter Eisentraut and David Johnston.
Back-patch, like the previous commit.

Discussion: https://postgr.es/m/E1dv9jI-0006oT-Fn@gemulon.postgresql.org
2017-09-26 15:25:56 -04:00
Peter Eisentraut
3d7f11a0fa Fix saving and restoring umask
In two cases, we set a different umask for some piece of code and
restore it afterwards.  But if the contained code errors out, the umask
is not restored.  So add TRY/CATCH blocks to fix that.
2017-09-23 10:03:05 -04:00
Robert Haas
1a44df007c For wal_consistency_checking, mask page checksum as well as page LSN.
If the LSN is different, the checksum will be different, too.

Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh

Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com
2017-09-22 14:33:18 -04:00
Tom Lane
a2b1eb2349 Give a better error for duplicate entries in VACUUM/ANALYZE column list.
Previously, the code didn't think about this case and would just try to
analyze such a column twice.  That would fail at the point of inserting
the second version of the pg_statistic row, with obscure error messsages
like "duplicate key value violates unique constraint" or "tuple already
updated by self", depending on context and PG version.  We could allow
the case by ignoring duplicate column specifications, but it seems better
to reject it explicitly.

The bogus error messages seem like arguably a bug, so back-patch to
all supported versions.

Nathan Bossart, per a report from Michael Paquier, and whacked
around a bit by me.

Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
2017-09-21 18:13:11 -04:00
Peter Eisentraut
522b028a00 Fix DROP SUBSCRIPTION hang
When ALTER SUBSCRIPTION DISABLE is run in the same transaction before
DROP SUBSCRIPTION, the latter will hang because workers will still be
running, not having seen the DISABLE committed, and DROP SUBSCRIPTION
will wait until the workers have vacated the replication origin slots.

Previously, DROP SUBSCRIPTION killed the logical replication workers
immediately only if it was going to drop the replication slot, otherwise
it scheduled the worker killing for the end of the transaction, as a
result of 7e174fa793.  This, however,
causes the present problem.  To fix, kill the workers immediately in all
cases.  This covers all cases: A subscription that doesn't have a
replication slot must be disabled.  It was either disabled in the same
transaction, or it was already disabled before the current transaction,
but then there shouldn't be any workers left and this won't make a
difference.

Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/87mv6av84w.fsf%40ars-thinkpad
2017-09-17 22:00:34 -04:00
Tom Lane
66fe509be0 Fix possible dangling pointer dereference in trigger.c.
AfterTriggerEndQuery correctly notes that the query_stack could get
repalloc'd during a trigger firing, but it nonetheless passes the address
of a query_stack entry to afterTriggerInvokeEvents, so that if such a
repalloc occurs, afterTriggerInvokeEvents is already working with an
obsolete dangling pointer while it scans the rest of the events.  Oops.
The only code at risk is its "delete_ok" cleanup code, so we can
prevent unsafe behavior by passing delete_ok = false instead of true.

However, that could have a significant performance penalty, because the
point of passing delete_ok = true is to not have to re-scan possibly
a large number of dead trigger events on the next time through the loop.
There's more than one way to skin that cat, though.  What we can do is
delete all the "chunks" in the event list except the last one, since
we know all events in them must be dead.  Deleting the chunks is work
we'd have had to do later in AfterTriggerEndQuery anyway, and it ends
up saving rescanning of just about the same events we'd have gotten
rid of with delete_ok = true.

In v10 and HEAD, we also have to be careful to mop up any per-table
after_trig_events pointers that would become dangling.  This is slightly
annoying, but I don't think that normal use-cases will traverse this code
path often enough for it to be a performance problem.

It's pretty hard to hit this in practice because of the unlikelihood
of the query_stack getting resized at just the wrong time.  Nonetheless,
it's definitely a live bug of ancient standing, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us
2017-09-17 14:50:01 -04:00
Tom Lane
5cc2349319 Ensure that BEFORE STATEMENT triggers fire the right number of times.
Commit 0f79440fb introduced mechanism to keep AFTER STATEMENT triggers
from firing more than once per statement, which was formerly possible
if more than one FK enforcement action had to be applied to a given
table.  Add a similar mechanism for BEFORE STATEMENT triggers, so that
we don't have the unexpected situation of firing BEFORE STATEMENT
triggers more often than AFTER STATEMENT.

As with the previous patch, back-patch to v10.

Discussion: https://postgr.es/m/22315.1505584992@sss.pgh.pa.us
2017-09-17 12:16:38 -04:00
Tom Lane
54d4d0ff6c Fix SQL-spec incompatibilities in new transition table feature.
The standard says that all changes of the same kind (insert, update, or
delete) caused in one table by a single SQL statement should be reported
in a single transition table; and by that, they mean to include foreign key
enforcement actions cascading from the statement's direct effects.  It's
also reasonable to conclude that if the standard had wCTEs, they would say
that effects of wCTEs applying to the same table as each other or the outer
statement should be merged into one transition table.  We weren't doing it
like that.

Hence, arrange to merge tuples from multiple update actions into a single
transition table as much as we can.  There is a problem, which is that if
the firing of FK enforcement triggers and after-row triggers with
transition tables is interspersed, we might need to report more tuples
after some triggers have already seen the transition table.  It seems like
a bad idea for the transition table to be mutable between trigger calls.
There's no good way around this without a major redesign of the FK logic,
so for now, resolve it by opening a new transition table each time this
happens.

Also, ensure that AFTER STATEMENT triggers fire just once per statement,
or once per transition table when we're forced to make more than one.
Previous versions of Postgres have allowed each FK enforcement query
to cause an additional firing of the AFTER STATEMENT triggers for the
referencing table, but that's certainly not per spec.  (We're still
doing multiple firings of BEFORE STATEMENT triggers, though; is that
something worth changing?)

Also, forbid using transition tables with column-specific UPDATE triggers.
The spec requires such transition tables to show only the tuples for which
the UPDATE trigger would have fired, which means maintaining multiple
transition tables or else somehow filtering the contents at readout.
Maybe someday we'll bother to support that option, but it looks like a
lot of trouble for a marginal feature.

The transition tables are now managed by the AfterTriggers data structures,
rather than being directly the responsibility of ModifyTable nodes.  This
removes a subtransaction-lifespan memory leak introduced by my previous
band-aid patch 3c4359521.

In passing, refactor the AfterTriggers data structures to reduce the
management overhead for them, by using arrays of structs rather than
several parallel arrays for per-query-level and per-subtransaction state.

I failed to resist the temptation to do some copy-editing on the SGML
docs about triggers, above and beyond merely documenting the effects
of this patch.

Back-patch to v10, because we don't want the semantics of transition
tables to change post-release.

Patch by me, with help and review from Thomas Munro.

Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
2017-09-16 13:20:37 -04:00
Robert Haas
29f021160e Fix inconsistent capitalization.
Amit Langote

Discussion: http://postgr.es/m/a83a0899-19f5-594c-9aac-3ba0f16989a1@lab.ntt.co.jp
2017-09-14 11:11:56 -04:00
Peter Eisentraut
f552d18f3e Message style fixes 2017-09-11 11:20:47 -04:00