To make sure that pg_dump/pg_restore function properly with RLS
policies, arrange to have a few of them left around at the end of the
regression tests.
Back-patch to 9.5 where RLS was added.
When taking the UPDATE path in an INSERT .. ON CONFLICT .. UPDATE tables
with oids were not supported. The tuple generated by the update target
list was projected without space for an oid - a simple oversight.
Reported-By: Peter Geoghegan
Author: Andres Freund
Backpatch: 9.5, where ON CONFLICT was introduced
Otherwise, we effectively act as if abs_top_builddir were the root
directory, which is quite dangerous if the user happens to have
permissions to do things there. This can crop up in PGXS builds,
for example.
Report by Sandro Santilli, patch by me, review by Noah Misch.
With the arrival of the CUBE key word/feature, the index entries for the
cube extension and the CUBE feature were collapsed into one. Tweak the
entry for the cube extension so they are separate entries.
In 9.5 and master there is no need to support legacy truncation. This is
just committed separately to make it easier to backpatch the WAL logged
multixact truncation to 9.3 and 9.4 if we later decide to do so.
I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
0xD087 to avoid 9.5 reusing a value that has been in use on master while
keeping the numbers increasing between major versions.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Backpatch: 9.5
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.
We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.
This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
this has gotten much easier
During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
segments. This happened to be hard or impossible to hit due to the
interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
that doesn't work for offsets that haven't yet been flushed to
disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
of emergency vacuuming. The problem remains in
pg_get_multixact_members(), but that's quite harmless.
For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.
For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound. To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.
A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
This replaces ill-fated commit 5ddc72887a,
which was reverted because it broke active uses of FK cache entries. In
this patch, we still do nothing more to invalidatable cache entries than
mark them as needing revalidation, so we won't break active uses. To keep
down the overhead of InvalidateConstraintCacheCallBack(), keep a list of
just the currently-valid cache entries. (The entries are large enough that
some added space for list links doesn't seem like a big problem.) This
would still be O(N^2) when there are many valid entries, though, so when
the list gets too long, just force the "sinval reset" behavior to remove
everything from the list. I set the threshold at 1000 entries, somewhat
arbitrarily. Possibly that could be fine-tuned later. Another item for
future study is whether it's worth adding reference counting so that we
could safely remove invalidated entries. As-is, problem cases are likely
to end up with large and mostly invalid FK caches.
Like the previous attempt, backpatch to 9.3.
Jan Wieck and Tom Lane
(Third time's the charm, I hope.)
Additional testing disclosed that this code could mangle already-localized
output from the "money" datatype. We can't very easily skip applying it
to "money" values, because the logic is tied to column right-justification
and people expect "money" output to be right-justified. Short of
decoupling that, we can fix it in what should be a safe enough way by
testing to make sure the string doesn't contain any characters that would
not be expected in plain numeric output.
On closer inspection, those seemingly redundant atoi() calls were not so
much inefficient as just plain wrong: the author of this code either had
not read, or had not understood, the POSIX specification for localeconv().
The grouping field is *not* a textual digit string but separate integers
encoded as chars.
We'll follow the existing code as well as the backend's cash.c in only
honoring the first group width, but let's at least honor it correctly.
This doesn't actually result in any behavioral change in any of the
locales I have installed on my Linux box, which may explain why nobody's
complained; grouping width 3 is close enough to universal that it's barely
worth considering other cases. Still, wrong is wrong, so back-patch.
This code did the wrong thing entirely for numbers with an exponent
but no decimal point (e.g., '1e6'), as reported by Jeff Janes in
bug #13636. More generally, it made lots of unverified assumptions
about what the input string could possibly look like. Rearrange so
that it only fools with leading digits that it's directly verified
are there, and an immediately adjacent decimal point. While at it,
get rid of some useless inefficiencies, like converting the grouping
count string to integer over and over (and over).
This has been broken for a long time, so back-patch to all supported
branches.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992. The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future. We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity. But the code in clause_selectivity()
failed to make use of such data even if it exists.
Refactor so that that will happen. I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions. To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.
I preserved the behavior that the default estimate for a function call
is 0.3333333. (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.
Per a complaint from Jehan-Guillaume de Rorthais. Back-patch into 9.5,
but not further, at least for the moment.
If we have a local Var of say varchar type with default collation, and
we apply a RelabelType to convert that to text with default collation, we
don't want to consider that as creating an FDW_COLLATE_UNSAFE situation.
It should be okay to compare that to a remote Var, so long as the remote
Var determines the comparison collation. (When we actually ship such an
expression to the remote side, the local Var would become a Param with
default collation, meaning the remote Var would in fact control the
comparison collation, because non-default implicit collation overrides
default implicit collation in parse_collate.c.) To fix, be more precise
about what FDW_COLLATE_NONE means: it applies either to a noncollatable
data type or to a collatable type with default collation, if that collation
can't be traced to a remote Var. (When it can, FDW_COLLATE_SAFE is
appropriate.) We were essentially using that interpretation already at
the Var/Const/Param level, but we weren't bubbling it up properly.
An alternative fix would be to introduce a separate FDW_COLLATE_DEFAULT
value to describe the second situation, but that would add more code
without changing the actual behavior, so it didn't seem worthwhile.
Also, since we're clarifying the rule to be that we care about whether
operator/function input collations match, there seems no need to fail
immediately upon seeing a Const/Param/non-foreign-Var with nondefault
collation. We only have to reject if it appears in a collation-sensitive
context (for example, "var IS NOT NULL" is perfectly safe from a collation
standpoint, whatever collation the var has). So just set the state to
UNSAFE rather than failing immediately.
Per report from Jeevan Chalke. This essentially corrects some sloppy
thinking in commit ed3ddf918b, so back-patch
to 9.3 where that logic appeared.
Previously pg_controldata didn't report newestCommitTs and this was
an oversight in commit 73c986a.
Also this patch changes pg_resetxlog so that it uses the same sentences
as pg_controldata does, regarding oldestCommitTs and newestCommitTs,
for the sake of consistency.
Back-patch to 9.5 where track_commit_timestamp was added.
Euler Taveira
The old minimum values are rather large, making it time consuming to
test related behaviour. Additionally the current limits, especially for
multixacts, can be problematic in space-constrained systems. 10000000
multixacts can contain a lot of members.
Since there's no good reason for the current limits, lower them a good
bit. Setting them to 0 would be a bad idea, triggering endless vacuums,
so still retain a limit.
While at it fix autovacuum_multixact_freeze_max_age to refer to
multixact.c instead of varsup.c.
Reviewed-By: Robert Haas
Discussion: CA+TgmoYmQPHcrc3GSs7vwvrbTkbcGD9Gik=OztbDGGrovkkEzQ@mail.gmail.com
Backpatch: back to 9.0 (in parts)
Previously, ANALYZE simply ignored columns of datatypes that have neither
a btree nor hash opclass (which means they have no recognized equality
operator). Without a notion of equality, we can't identify most-common
values nor estimate the number of distinct values. But we can still
count nulls and compute the average physical column width, and those
stats might be of value. Moreover there are some tools out there that
don't work so well if rows are missing from pg_statistic. So let's
add suitable logic for this case.
While this is arguably a bug fix, it also has the potential to change
query plans, and the gain seems not worth taking a risk of that in
stable branches. So back-patch into 9.5 but not further.
Oleksandr Shulgin, rewritten a bit by me.
A bunch of tests missed specifying that empty transactions shouldn't be
displayed. That causes problems when e.g. autovacuum runs in an
unfortunate moment. The tests in question only run for a very short
time, making this quite unlikely.
Reported-By: Buildfarm member axolotl
Backpatch: 9.4, where logical decoding was introduced
mul_var() postpones propagating carries until it risks overflow in its
internal digit array. However, the logic failed to account for the
possibility of overflow in the carry propagation step, allowing wrong
results to be generated in corner cases. We must slightly reduce the
when-to-propagate-carries threshold to avoid that.
Discovered and fixed by Dean Rasheed, with small adjustments by me.
This has been wrong since commit d72f6c7503,
so back-patch to all supported branches.
This commit's parent made superfluous the bit's sole usage. Referential
integrity checks have long run as the subject table's owner, and that
now implies RLS bypass. Safe use of the bit was tricky, requiring
strict control over the SQL expressions evaluating therein. Back-patch
to 9.5, where the bit was introduced.
Based on a patch by Stephen Frost.
Every query of a single ENABLE ROW SECURITY table has two meanings, with
the row_security GUC selecting between them. With row_security=force
available, every function author would have been advised to either set
the GUC locally or test both meanings. Non-compliance would have
threatened reliability and, for SECURITY DEFINER functions, security.
Authors already face an obligation to account for search_path, and we
should not mimic that example. With this change, only BYPASSRLS roles
need exercise the aforementioned care. Back-patch to 9.5, where the
row_security GUC was introduced.
Since this narrows the domain of pg_db_role_setting.setconfig and
pg_proc.proconfig, one might bump catversion. A row_security=force
setting in one of those columns will elicit a clear message, so don't.
RemoveLocalLock() must consider the possibility that LockAcquireExtended()
failed to palloc the initial space for a locallock's lockOwners array.
I had evidently meant to cope with this hazard when the code was originally
written (commit 1785acebf2), but missed that
the pfree needed to be protected with an if-test. Just to make sure things
are left in a clean state, reset numLockOwners as well.
Per low-memory testing by Andreas Seltenreich. Back-patch to all supported
branches.
These functions have been looking up type info for every row they
process. Instead of doing that we only look them up the first time
through and stash the information in the aggregate state object.
Affects json_agg, json_object_agg, jsonb_agg and jsonb_object_agg.
There is plenty more work to do in making these more efficient,
especially the jsonb functions, but this is a virtually cost free
improvement that can be done right away.
Backpatch to 9.5 where the jsonb variants were introduced.
After an internal failure in shortest() or longest() while pinning down the
exact location of a match, find() forgot to free the DFA structure before
returning. This is pretty unlikely to occur, since we just successfully
ran the "search" variant of the DFA; but it could happen, and it would
result in a session-lifespan memory leak since this code uses malloc()
directly. Problem seems to have been aboriginal in Spencer's library,
so back-patch all the way.
In passing, correct a thinko in a comment I added awhile back about the
meaning of the "ntree" field.
I happened across these issues while comparing our code to Tcl's version
of the library.
This setting contains extra configuration for the temp instance, as used
in pg_regress' --temp-config flag.
Backpatch to 9.2 where test.sh was introduced.
The docs claimed that \uhhhh would be interpreted as a Unicode value
regardless of the database encoding, but it's never been implemented
that way: \uhhhh and \xhhhh actually mean exactly the same thing, namely
the character that pg_mb2wchar translates to 0xhhhh. Moreover we were
falsely dismissive of the usefulness of Unicode code points above FFFF.
Fix that.
It's been like this for ages, so back-patch to all supported branches.
For the UPDATE/DELETE RETURNING case, filter the records which are not
visible to the user through ALL or SELECT policies from those considered
for the UPDATE or DELETE. This is similar to how the GRANT system
works, which prevents RETURNING unless the caller has SELECT rights on
the relation.
Per discussion with Robert, Dean, Tom, and Kevin.
Back-patch to 9.5 where RLS was introduced.
This refactors rewrite/rowsecurity.c to simplify the handling of the
default deny case (reducing the number of places where we check for and
add the default deny policy from three to one) by splitting up the
retrival of the policies from the application of them.
This also allowed us to do away with the policy_id field. A policy_name
field was added for WithCheckOption policies and is used in error
reporting, when available.
Patch by Dean Rasheed, with various mostly cosmetic changes by me.
Back-patch to 9.5 where RLS was introduced to avoid unnecessary
differences, since we're still in alpha, per discussion with Robert.
Commit 5ddc72887a does not actually work
because it will happily blow away ri_constraint_cache entries that are
in active use in outer call levels. In any case, it's a very ugly,
brute-force solution to the problem of limiting the cache size.
Revert until it can be redesigned.
COMMENT supports POLICY but the documentation hadn't caught up with
that fact.
Patch by Charles Clavadetscher
Back-patch to 9.5 where POLICY was added.
This patch changes the log message which is logged when the server
successfully renames backup_label file to *.old but fails to rename
tablespace_map file during the shutdown. Previously the WARNING
message "online backup mode was not canceled" was logged in that case.
However this message is confusing because the backup mode is treated
as canceled whenever backup_label is successfully renamed. So this
commit makes the server log the message "online backup mode canceled"
in that case.
Also this commit changes errdetail messages so that they follow the
error message style guide.
Back-patch to 9.5 where tablespace_map file is introduced.
Original patch by Amit Kapila, heavily modified by me.
To prevent perverse results, we now only return the other operand if
it's not scalar, and if both operands are of the same kind (array or
object).
Original bug complaint and patch from Oskari Saarenmaa, extended by me
to cover the cases of different kinds of jsonb.
Backpatch to 9.5 where jsonb_concat was introduced.
Modify pg_dump to restore postgres/template1 databases to non-default
tablespaces by switching out of the database to be moved, then switching
back.
Also, to fix potentially cases where the old/new tablespaces might not
match, fix pg_upgrade to process new/old tablespaces separately in all
cases.
Report by Marti Raudsepp
Patch by Marti Raudsepp, me
Backpatch through 9.0
I think this particular branch is actually dead, but the analysis to
prove that is not trivial, so instead take the weasel way.
Reported by Jinyu Zhang
Backpatch to 9.5, where BRIN was introduced.