This file isn't entirely consistent about whether "on" and "off"
should be marked up with <literal>, but it doesn't make much sense
to be inconsistent within a single sentence.
Etsuro Fujita
When this code was written, catalog scans were normally performed using
SnapshotNow, making special handling necessary here. Now, however, all
catalog scans use MVCC snapshots, so we can change these cases to look
more like what we do for catalog scans elsewhere in the code.
Per discussion with Tom Lane and a reminder from Bruce Momjian.
Currently regression tests for python 3 are disabled on MSVC, and these
tests fail with python 3, too, so we have some work to do to enable
both. Meanwhile, all the buildfarm hosts seem to be building with python
2 anyway, so this at least gets us some coverage.
Original patch from Michael Paquier, significantly modified by me.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
e.g. to avoid loops in bi-directional replication setups
The solution to these problems, as implemented here, consist out of
three parts:
1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
replication origin, how far replay has progressed in a efficient and
crash safe manner.
3) The ability to filter out changes performed on the behest of a
replication origin during logical decoding; this allows complex
replication topologies. E.g. by filtering all replayed changes out.
Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated. We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.
This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL. Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.
For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.
Bumps both catversion and wal page magic.
Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
20140923182422.GA15776@alap3.anarazel.de,
20131114172632.GE7522@alap2.anarazel.de
I thought I'd gone through all of these before, but a fresh review found
this one too. (Perhaps it would be better to just delete this test and
let the failure occur later, but for the moment I'll preserve the logic.)
The case that this was rejecting is like
CREATE FOREIGN TABLE ft (f1 int ...) ...;
CREATE TABLE c1 (UNIQUE(f1)) INHERITS(ft);
This is necessary in view of the changes to allow foreign tables to be
full members of inheritance hierarchies, but I (tgl) unaccountably missed
it in commit cb1ca4d800621dcae67ca6c799006de99fa4f0a5.
Noted by Amit Langote, patch by Etsuro Fujita
With this patch the MSVC build and installation will work correctly with
the transforms. However the python transform tests for hstore and ltree
are still disabled pending some further adjustments.
Michael Paquier with some tweaks from me.
Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below) the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore. This leads to loss of files that
are critical to interpret existing tuples's Xmax values.
To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten. This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough. We start emitting
warnings sometime earlier, so that the eventual multixact-shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments are remaining before shutdown.)
On troublesome average multixact size: The threshold size depends on the
multixact freeze parameters. The oldest age is related to the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum. If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact. Having an average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts. (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeeze_max_age instead, at which point multixact generation
is stopped completely. The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).
Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.
Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
Some operating systems, including the reporter's windows, return EBADFD
or similar when fsync() is invoked on a O_RDONLY file descriptor.
Unfortunately RestoreSlotFromDisk() does exactly that; which causes
failures after restarts in at least some scenarios.
If you hit the bug the error message will be something like
ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor
Simply use O_RDWR instead of O_RDONLY when opening the relevant file
descriptor to fix the bug. Unfortunately I have no way of verifying the
fix, but we've seen similar problems in the past.
This bug goes back to 9.4 where slots were introduced. Backpatch
accordingly.
Reported-By: Patrice Drolet
Bug: #13143:
Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
The original security barrier view implementation, on which RLS is
built, prevented all non-leakproof functions from being pushed down to
below the view, even when the function was not receiving any data from
the view. This optimization improves on that situation by, instead of
checking strictly for non-leakproof functions, it checks for Vars being
passed to non-leakproof functions and allows functions which do not
accept arguments or whose arguments are not from the current query level
(eg: constants can be particularly useful) to be pushed down.
As discussed, this does mean that a function which is pushed down might
gain some idea that there are rows meeting a certain criteria based on
the number of times the function is called, but this isn't a
particularly new issue and the documentation in rules.sgml already
addressed similar covert-channel risks. That documentation is updated
to reflect that non-leakproof functions may be pushed down now, if
they meet the above-described criteria.
Author: Dean Rasheed, with a bit of rework to make things clearer,
along with comment and documentation updates from me.
Switching the Windows build scripts to use forward slashes instead of
backslashes has caused a couple of issues in VC builds:
- The file tree list was not correctly generated, build script
generating vcproj file missing tree dependencies when listing items in
Filter.
- VC builds do not accept file paths with forward slashes, perhaps it
could be possible to use a Condition but it seems safer to simply
enforce the file paths to use backslashes in the vcproj files.
- chkpass had an unneeded dependency with libpgport and libpgcommon to
make build succeed but actually it is not necessary as crypt.c is
already listed for this project and should be replaced with a fake name
as it is a unique file.
Michael Paquier
Since both forms are arguably legal I wasn't sure about changing
this. But then Tom argued for 'therefore'...
Author: Dmitriy Olshevskiy
Discussion: 34789.1430067832@sss.pgh.pa.us
When displaying stats it was possible that a floating point division by
zero occured when no FPIs were issued for a type of record.
Author: Abhijit Menon-Sen
Discussion: 20150417091811.GA14008@toroid.org
This provides a mechanism for specifying conversions between SQL data
types and procedural languages. As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.
reviews by Pavel Stěhule and Andres Freund
pg_dump has historically assumed that default_with_oids affects only plain
tables and not other relkinds. Conceivably we could make it apply to some
newly invented relkind if we did so from the get-go, but changing the
behavior for existing object types will break existing dump scripts.
Add code comments warning about this interaction.
Also, make sure that default_with_oids doesn't cause parse_utilcmd.c to
think that CREATE FOREIGN TABLE will create an OID column. I think this is
only a latent bug right now, since we don't allow UNIQUE/PKEY constraints
in CREATE FOREIGN TABLE, but it's better to be consistent and future-proof.
The temp-install target sets EXTRA_INSTALL to install the current
directory. But when doing so, it should append instead of overwrite,
otherwise settings of EXTRA_INSTALL from a makefile won't take effect.
This would cause the earthdistance test to fail when called directly,
because it would miss installing the cube module.
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that. In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks. It's broken as far back as we had antijoins, so back-patch to all
supported branches.
This makes it possible to run some stages of these build scripts on
non-Windows systems. That way, we can more easily test whether file
moves or makefile changes might break the MSVC build.
Peter Eisentraut and Michael Paquier
The RLS capability is built on top of the WITH CHECK OPTION
system which was added for auto-updatable views, however, unlike
WCOs on views (which are mandated by the SQL spec to not fire until
after all other constraints and checks are done), it makes much more
sense for RLS checks to happen earlier than constraint and uniqueness
checks.
This patch reworks the structure which holds the WCOs a bit to be
explicitly either VIEW or RLS checks and the RLS-related checks are
done prior to the constraint and uniqueness checks. This also allows
better error reporting as we are now reporting when a violation is due
to a WITH CHECK OPTION and when it's due to an RLS policy violation,
which was independently noted by Craig Ringer as being confusing.
The documentation is also updated to include a paragraph about when RLS
WITH CHECK handling is performed, as there have been a number of
questions regarding that and the documentation was previously silent on
the matter.
Author: Dean Rasheed, with some kabitzing and comment changes by me.
The majority practice is to add -DFRONTEND in directories building files
that are, at other times, built for the backend. Some directories
lacking that property added a noise -DFRONTEND in one build system.
Remove the excess flags, for consistency.
Each of the libraries incorporates src/port files, which often check
FRONTEND. Build systems disagreed on whether to build libpgtypes this
way. Only libecpg incorporates files that rely on it today. Back-patch
to 9.0 (all supported versions) to forestall surprises.
examples/, locale/, and thread/ lacked .gitignore files and were also
not connected up to top-level "make clean" etc. This had escaped notice
because none of those directories are built in normal scenarios. Still,
they have working Makefiles, so if someone does a "make" in one of these
directories it would be good if (a) git doesn't bleat about the product
files and (b) cleaning up removes them.
This is a longstanding oversight, but since this behavior is probably
only of interest to developers, there seems no need for back-patching.
Michael Paquier and Tom Lane
The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80ba571d4de8992de6386536867250474, which split what
had been set_rel_pathlist() and child routines into two sets of
functions. But I (tgl) evidently missed updating this comment.
Back-patch to 9.2 to avoid unnecessary divergence among branches.
Amit Langote
It was previously mixed in with the description of ALTER TABLE
subcommands. Move it to the Parameters section, which is where it is on
other reference pages.
pointed out by Amit Langote
In get_row_security_policies(), we need to make a copy of the relation
name when building the WithCheckOptions structure, since
RelationGetRelationName just returns a pointer into the local Relation
structure. The relation name in the WCO structure is only used for
error reporting.
Pointed out by Robert and Christian Ullrich, who noted that the
buildfarm members with -DCLOBBER_CACHE_ALWAYS were failing.
When the startup process recovers transactions by scanning pg_twophase
directory, it should clear MyLockedGxact after it's done processing each
transaction. Like we do during normal operation, at PREPARE TRANSACTION.
Otherwise, if the startup process exits due to an error, it will try to
clear the locking_backend field of the last recovered transaction. That's
usually harmless, but if the error happens in MarkAsPreparing, while
holding TwoPhaseStateLock, the shmem-exit hook will try to acquire
TwoPhaseStateLock again, and deadlock with itself.
This fixes bug #13128 reported by Grant McAlister. The bug was introduced
by commit bb38fb0d, so backpatch to all supported versions like that
commit.
Before, make check-world would create a new temporary installation for
each test suite, which is slow and wasteful. Instead, we now create one
test installation that is used by all test suites that are part of a
make run.
The management of the temporary installation is removed from pg_regress
and handled in the makefiles. This allows for better control, and
unifies the code with that of test suites not run through pg_regress.
review and msvc support by Michael Paquier <michael.paquier@gmail.com>
more review by Fabien Coelho <coelho@cri.ensmp.fr>
Commit a2e35b53c39b2a neglected to update the type OID to use further
down in DefineType when TypeShellMake was changed to return
ObjectAddress instead of OID (it got it right in DefineRange, however.)
This resulted in an internal error message being issued when looking up
I/O functions.
Author: Michael Paquier
Also add Asserts() to a couple of other places to ensure that the type
OID being used is as expected.
As pointed out by the buildfarm, test_rls_hooks wasn't functioning
properly with a clean installcheck. test_rls_hooks needs to explicitly
load the library with the hooks in it, to allow installcheck to work;
using the --temp-config doesn't help since that isn't used when running
installcheck and it isn't exactly fair to the buildfarm to modify the
installed config prior to calling installcheck.
Also, have test_rls_hooks clean up after itself.
In prepend_row_security_policies(), defaultDeny was always true, so if
there were any hook policies, the RLS policies on the table would just
get discarded. Fixed to start off with defaultDeny as false and then
properly set later if we detect that only the default deny policy exists
for the internal policies.
The infinite recursion detection in fireRIRrules() didn't properly
manage the activeRIRs list in the case of WCOs, so it would incorrectly
report infinite recusion if the same relation with RLS appeared more
than once in the rtable, for example "UPDATE t ... FROM t ...".
Further, the RLS expansion code in fireRIRrules() was handling RLS in
the main loop through the rtable, which lead to RTEs being visited twice
if they contained sublink subqueries, which
prepend_row_security_policies() attempted to handle by exiting early if
the RTE already had securityQuals. That doesn't work, however, since
if the query involved a security barrier view on top of a table with
RLS, the RTE would already have securityQuals (from the view) by the
time fireRIRrules() was invoked, and so the table's RLS policies would
be ignored. This is fixed in fireRIRrules() by handling RLS in a
separate loop at the end, after dealing with any other sublink
subqueries, thus ensuring that each RTE is only visited once for RLS
expansion.
The inheritance planner code didn't correctly handle non-target
relations with RLS, which would get turned into subqueries during
planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where
t1 has inheritance and t2 has RLS quals would fail. Fix by making sure
to copy in and update the securityQuals when they exist for non-target
relations.
process_policies() was adding WCOs to non-target relations, which is
unnecessary, and could lead to a lot of wasted time in the rewriter and
the planner. Fix by only adding WCO policies when working on the result
relation. Also in process_policies, we should be copying the USING
policies to the WITH CHECK policies on a per-policy basis, fix by moving
the copying up into the per-policy loop.
Lastly, as noted by Dean, we were simply adding policies returned by the
hook provided to the list of quals being AND'd, meaning that they would
actually restrict records returned and there was no option to have
internal policies and hook-based policies work together permissively (as
all internal policies currently work). Instead, explicitly add support
for both permissive and restrictive policies by having a hook for each
and combining the results appropriately. To ensure this is all done
correctly, add a new test module (test_rls_hooks) to test the various
combinations of internal, permissive, and restrictive hook policies.
Largely from Dean Rasheed (thanks!):
CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com
Author: Dean Rasheed, though I added the new hooks and test module.
As noted by Etsuro Fujita [1] and Dean Rasheed[2],
cb1ca4d800621dcae67ca6c799006de99fa4f0a5 changed ExecBuildAuxRowMark()
to always look for the tableoid in the target list, but didn't also
change preprocess_targetlist() to always include the tableoid. This
resulted in errors with soon-to-be-added RLS with inheritance tests,
and errors when using inheritance with foreign tables.
Authors: Etsuro Fujita and Dean Rasheed (independently)
Minor word-smithing on the comments by me.
[1] 552CF0B6.8010006@lab.ntt.co.jp
[2] CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com