1
0
mirror of https://github.com/postgres/postgres.git synced 2025-05-26 18:17:33 +03:00

2709 Commits

Author SHA1 Message Date
Tom Lane
87f830e0ce Adjust cutoff points in newly-added sanity tests.
Per recommendation from Andres.
2014-07-21 12:58:41 -04:00
Tom Lane
78db307bb2 Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmxid values.
In commit a61daa14d56867e90dc011bbba52ef771cea6770, we fixed pg_upgrade so
that it would install sane relminmxid and datminmxid values, but that does
not cure the problem for installations that were already pg_upgraded to
9.3; they'll initially have "1" in those fields.  This is not a big problem
so long as 1 is "in the past" compared to the current nextMultiXact
counter.  But if an installation were more than halfway to the MXID wrap
point at the time of upgrade, 1 would appear to be "in the future" and
that would effectively disable tracking of oldest MXIDs in those
tables/databases, until such time as the counter wrapped around.

While in itself this isn't worse than the situation pre-9.3, where we did
not manage MXID wraparound risk at all, the consequences of premature
truncation of pg_multixact are worse now; so we ought to make some effort
to cope with this.  We discussed advising users to fix the tracking values
manually, but that seems both very tedious and very error-prone.

Instead, this patch adopts two amelioration rules.  First, a relminmxid
value that is "in the future" is allowed to be overwritten with a
full-table VACUUM's actual freeze cutoff, ignoring the normal rule that
relminmxid should never go backwards.  (This essentially assumes that we
have enough defenses in place that wraparound can never occur anymore,
and thus that a value "in the future" must be corrupt.)  Second, if we see
any "in the future" values then we refrain from truncating pg_clog and
pg_multixact.  This prevents loss of clog data until we have cleaned up
all the broken tracking data.  In the worst case that could result in
considerable clog bloat, but in practice we expect that relfrozenxid-driven
freezing will happen soon enough to fix the problem before clog bloat
becomes intolerable.  (Users could do manual VACUUM FREEZEs if not.)

Note that this mechanism cannot save us if there are already-wrapped or
already-truncated-away MXIDs in the table; it's only capable of dealing
with corrupt tracking values.  But that's the situation we have with the
pg_upgrade bug.

For consistency, apply the same rules to relfrozenxid/datfrozenxid.  There
are not known mechanisms for these to get messed up, but if they were, the
same tactics seem appropriate for fixing them.
2014-07-21 11:41:53 -04:00
Alvaro Herrera
346d7be184 Move view reloptions into their own varlena struct
Per discussion after a gripe from me in
http://www.postgresql.org/message-id/20140611194633.GH18688@eldon.alvh.no-ip.org

Jaime Casanova
2014-07-14 17:24:40 -04:00
Fujii Masao
d4635b16fe Prevent bitmap heap scans from showing unnecessary block info in EXPLAIN ANALYZE.
EXPLAIN ANALYZE shows the information of the numbers of exact/lossy blocks which
bitmap heap scan processes. But, previously, when those numbers were both zero,
it displayed only the prefix "Heap Blocks:" in TEXT output format. This is strange
and would confuse the users. So this commit suppresses such unnecessary information.

Backpatch to 9.4 where EXPLAIN ANALYZE was changed so that such information was
displayed.

Etsuro Fujita
2014-07-14 20:40:14 +09:00
Tom Lane
59efda3e50 Implement IMPORT FOREIGN SCHEMA.
This command provides an automated way to create foreign table definitions
that match remote tables, thereby reducing tedium and chances for error.
In this patch, we provide the necessary core-server infrastructure and
implement the feature fully in the postgres_fdw foreign-data wrapper.
Other wrappers will throw a "feature not supported" error until/unless
they are updated.

Ronan Dunklau and Michael Paquier, additional work by me
2014-07-10 15:01:43 -04:00
Tom Lane
fbb1d7d73f Allow CREATE/ALTER DATABASE to manipulate datistemplate and datallowconn.
Historically these database properties could be manipulated only by
manually updating pg_database, which is error-prone and only possible for
superusers.  But there seems no good reason not to allow database owners to
set them for their databases, so invent CREATE/ALTER DATABASE options to do
that.  Adjust a couple of places that were doing it the hard way to use the
commands instead.

Vik Fearing, reviewed by Pavel Stehule
2014-07-01 20:10:38 -04:00
Tom Lane
15c82efd69 Refactor CREATE/ALTER DATABASE syntax so options need not be keywords.
Most of the existing option names are keywords anyway, but we can get rid
of LC_COLLATE and LC_CTYPE as keywords known to the lexer/grammar.  This
immediately reduces the size of the grammar tables by about 8KB, and will
save more when we add additional CREATE/ALTER DATABASE options in future.

A side effect of the implementation is that the CONNECTION LIMIT option
can now also be spelled CONNECTION_LIMIT.  We choose not to document this,
however.

Vik Fearing, based on a suggestion by me; reviewed by Pavel Stehule
2014-07-01 19:02:21 -04:00
Alvaro Herrera
f741300c90 Have multixact be truncated by checkpoint, not vacuum
Instead of truncating pg_multixact at vacuum time, do it only at
checkpoint time.  The reason for doing it this way is twofold: first, we
want it to delete only segments that we're certain will not be required
if there's a crash immediately after the removal; and second, we want to
do it relatively often so that older files are not left behind if
there's an untimely crash.

Per my proposal in
http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org
we now execute the truncation in the checkpointer process rather than as
part of vacuum.  Vacuum is in only charge of maintaining in shared
memory the value to which it's possible to truncate the files; that
value is stored as part of checkpoints also, and so upon recovery we can
reuse the same value to re-execute truncate and reset the
oldest-value-still-safe-to-use to one known to remain after truncation.

Per bug reported by Jeff Janes in the course of his tests involving
bug #8673.

While at it, update some comments that hadn't been updated since
multixacts were changed.

Backpatch to 9.3, where persistency of pg_multixact files was
introduced by commit 0ac5ad5134f2.
2014-06-27 14:43:53 -04:00
Alvaro Herrera
b7e51d9c06 Don't allow relminmxid to go backwards during VACUUM FULL
We were allowing a table's pg_class.relminmxid value to move backwards
when heaps were swapped by VACUUM FULL or CLUSTER.  There is a
similar protection against relfrozenxid going backwards, which we
neglected to clone when the multixact stuff was rejiggered by commit
0ac5ad5134f276.

Backpatch to 9.3, where relminmxid was introduced.

As reported by Heikki in
http://www.postgresql.org/message-id/52401AEA.9000608@vmware.com
2014-06-27 14:43:46 -04:00
Heikki Linnakangas
a87a7dc8b6 Don't allow foreign tables with OIDs.
The syntax doesn't let you specify "WITH OIDS" for foreign tables, but it
was still possible with default_with_oids=true. But the rest of the system,
including pg_dump, isn't prepared to handle foreign tables with OIDs
properly.

Backpatch down to 9.1, where foreign tables were introduced. It's possible
that there are databases out there that already have foreign tables with
OIDs. There isn't much we can do about that, but at least we can prevent
them from being created in the future.

Patch by Etsuro Fujita, reviewed by Hadi Moshayedi.
2014-06-24 13:27:18 +03:00
Andres Freund
ecac0e2b9e Do all-visible handling in lazy_vacuum_page() outside its critical section.
Since fdf9e21196a lazy_vacuum_page() rechecks the all-visible status
of pages in the second pass over the heap. It does so inside a
critical section, but both visibilitymap_test() and
heap_page_is_all_visible() perform operations that should not happen
inside one. The former potentially performs IO and both potentially do
memory allocations.

To fix, simply move all the all-visible handling outside the critical
section. Doing so means that the PD_ALL_VISIBLE on the page won't be
included in the full page image of the HEAP2_CLEAN record anymore. But
that's fine, the flag will be set by the HEAP2_VISIBLE logged later.

Backpatch to 9.3 where the problem was introduced. The bug only came
to light due to the assertion added in 4a170ee9 and isn't likely to
cause problems in production scenarios. The worst outcome is a
avoidable PANIC restart.

This also gets rid of the difference in the order of operations
between master and standby mentioned in 2a8e1ac5.

Per reports from David Leverton and Keith Fiske in bug #10533.
2014-06-20 11:09:17 +02:00
Andres Freund
3bdcf6a5a7 Don't allow to disable backend assertions via the debug_assertions GUC.
The existance of the assert_enabled variable (backing the
debug_assertions GUC) reduced the amount of knowledge some static code
checkers (like coverity and various compilers) could infer from the
existance of the assertion. That could have been solved by optionally
removing the assertion_enabled variable from the Assert() et al macros
at compile time when some special macro is defined, but the resulting
complication doesn't seem to be worth the gain from having
debug_assertions. Recompiling is fast enough.

The debug_assertions GUC is still available, but readonly, as it's
useful when diagnosing problems. The commandline/client startup option
-A, which previously also allowed to enable/disable assertions, has
been removed as it doesn't serve a purpose anymore.

While at it, reduce code duplication in bufmgr.c and localbuf.c
assertions checking for spurious buffer pins. That code had to be
reindented anyway to cope with the assert_enabled removal.
2014-06-20 11:09:17 +02:00
Alvaro Herrera
7937910781 Fix typos 2014-06-12 14:01:01 -04:00
Tom Lane
e416830a29 Prevent auto_explain from changing the output of a user's EXPLAIN.
Commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which introduced the
EXPLAIN (TIMING) option, for some reason coded explain.c to look at
planstate->instrument->need_timer rather than es->timing to decide
whether to print timing info.  However, the former flag might get set
as a result of contrib/auto_explain wanting timing information.  We
certainly don't want activation of auto_explain to change user-visible
statement behavior, so fix that.

Also fix an independent bug introduced in the same patch: in the code
path for a never-executed node with a machine-friendly output format,
if timing was selected, it would fail to print the Actual Rows and Actual
Loops items.

Per bug #10404 from Tomonari Katsumata.  Back-patch to 9.2 where the
faulty code was introduced.
2014-05-20 12:20:47 -04:00
Bruce Momjian
0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane
2d00190495 Rationalize common/relpath.[hc].
Commit a73018392636ce832b09b5c31f6ad1f18a4643ea created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that.  To clean it up:

* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h.  We won't consider this symbol part of the FE/BE API.

* Push enum ForkNumber from relfilenode.h into relpath.h.  We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.

* So, relfilenode.h now includes relpath.h rather than vice-versa.  This
direction of dependency is fine.  (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)

* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.

* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().

* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers.  (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)

* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().

* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames.  This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.

The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.

Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
2014-04-30 17:30:50 -04:00
Tom Lane
f0fedfe82c Allow polymorphic aggregates to have non-polymorphic state data types.
Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject.  The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments.  However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature.  So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations.  We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.

The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column.  I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.

While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency.  I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
2014-04-23 19:17:41 -04:00
Heikki Linnakangas
8d34f68628 Avoid transient bogus page contents when creating a sequence.
Don't use simple_heap_insert to insert the tuple to a sequence relation.
simple_heap_insert creates a heap insertion WAL record, and replaying that
will create a regular heap page without the special area containing the
sequence magic constant, which is wrong for a sequence. That was not a bug
because we always created a sequence WAL record after that, and replaying
that overwrote the bogus heap page, and the transient state could never be
seen by another backend because it was only done when creating a new
sequence relation. But it's simpler and cleaner to avoid that in the first
place.
2014-04-22 10:40:23 +03:00
Heikki Linnakangas
2a8e1ac598 Set the all-visible flag on heap page before writing WAL record, not after.
If we set the all-visible flag after writing WAL record, and XLogInsert
takes a full-page image of the page, the image would not include the flag.
We will then proceed to set the VM bit, which would then be set without the
corresponding all-visible flag on the heap page.

Found by comparing page images on master and standby, after writing/replaying
each WAL record. (There is still a discrepancy: the all-visible flag won't
be set after replaying the HEAP_CLEAN record, even though it is set in the
master. However, it will be set when replaying the HEAP2_VISIBLE record and
setting the VM bit, so the all-visible flag and VM bit are always consistent
on the standby, even though they are momentarily out-of-sync with master)

Backpatch to 9.3 where this code was introduced.
2014-04-17 17:47:50 +03:00
Tom Lane
5f86cbd714 Rename EXPLAIN ANALYZE's "total runtime" output to "execution time".
Now that EXPLAIN also outputs a "planning time" measurement, the use of
"total" here seems rather confusing: it sounds like it might include the
planning time which of course it doesn't.  Majority opinion was that
"execution time" is a better label, so we'll call it that.

This should be noted as a backwards incompatibility for tools that examine
EXPLAIN ANALYZE output.

In passing, I failed to resist the temptation to do a little editing on the
materialized-view example affected by this change.
2014-04-16 20:48:59 -04:00
Tom Lane
e0c91a7ff0 Improve some O(N^2) behavior in window function evaluation.
Repositioning the tuplestore seek pointer in window_gettupleslot() turns
out to be a very significant expense when the window frame is sizable and
the frame end can move.  To fix, introduce a tuplestore function for
skipping an arbitrary number of tuples in one call, parallel to the one we
introduced for tuplesort objects in commit 8d65da1f.  This reduces the cost
of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk.
As in the previous commit, I didn't try to do any real optimization of
tuplestore_skiptuples for the case where the tuplestore has spilled to
disk.  There is probably no practical way to get the cost to less than O(N)
anyway, but perhaps someone can think of something later.

Also fix PersistHoldablePortal() to make use of this API now that we have
it.

Based on a suggestion by Dean Rasheed, though this turns out not to look
much like his patch.
2014-04-13 13:59:17 -04:00
Stephen Frost
842faa714c Make security barrier views automatically updatable
Views which are marked as security_barrier must have their quals
applied before any user-defined quals are called, to prevent
user-defined functions from being able to see rows which the
security barrier view is intended to prevent them from seeing.

Remove the restriction on security barrier views being automatically
updatable by adding a new securityQuals list to the RTE structure
which keeps track of the quals from security barrier views at each
level, independently of the user-supplied quals.  When RTEs are
later discovered which have securityQuals populated, they are turned
into subquery RTEs which are marked as security_barrier to prevent
any user-supplied quals being pushed down (modulo LEAKPROOF quals).

Dean Rasheed, reviewed by Craig Ringer, Simon Riggs, KaiGai Kohei
2014-04-12 21:04:58 -04:00
Tom Lane
a9d9acbf21 Create infrastructure for moving-aggregate optimization.
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved.  This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state.  As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.

This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates.  Follow-on commits will update some
of the built-in aggregates to use this feature.

David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
2014-04-12 12:03:30 -04:00
Simon Riggs
e5550d5fec Reduce lock levels of some ALTER TABLE cmds
VALIDATE CONSTRAINT

CLUSTER ON
SET WITHOUT CLUSTER

ALTER COLUMN SET STATISTICS
ALTER COLUMN SET ()
ALTER COLUMN RESET ()

All other sub-commands use AccessExclusiveLock

Simon Riggs and Noah Misch

Reviews by Robert Haas and Andres Freund
2014-04-06 11:13:43 -04:00
Tom Lane
abe075dfff Fix tablespace creation WAL replay to work on Windows.
The code segment that removes the old symlink (if present) wasn't clued
into the fact that on Windows, symlinks are junction points which have
to be removed with rmdir().

Backpatch to 9.0, where the failing code was introduced.

MauMau, reviewed by Muhammad Asif Naeem and Amit Kapila
2014-04-04 23:09:35 -04:00
Noah Misch
7cbe57c34d Offer triggers on foreign tables.
This covers all the SQL-standard trigger types supported for regular
tables; it does not cover constraint triggers.  The approach for
acquiring the old row mirrors that for view INSTEAD OF triggers.  For
AFTER ROW triggers, we spool the foreign tuples to a tuplestore.

This changes the FDW API contract; when deciding which columns to
populate in the slot returned from data modification callbacks, writable
FDWs will need to check for AFTER ROW triggers in addition to checking
for a RETURNING clause.

In support of the feature addition, refactor the TriggerFlags bits and
the assembly of old tuples in ModifyTable.

Ronan Dunklau, reviewed by KaiGai Kohei; some additional hacking by me.
2014-03-23 02:16:34 -04:00
Noah Misch
6115480c54 Improve comments about AfterTriggerBeginQuery() query level usage. 2014-03-23 02:15:52 -04:00
Tom Lane
f7271c4427 Fix relcache reference leak in refresh_by_match_merge().
One path through the loop over indexes forgot to do index_close().  Rather
than adding a fourth call, restructure slightly so that there's only one.

In passing, get rid of an unnecessary syscache lookup: the pg_index struct
for the index is already available from its relcache entry.

Per report from YAMAMOTO Takashi, though this is a bit different from his
suggested patch.  This is new code in HEAD, so no need for back-patch.
2014-03-18 11:36:53 -04:00
Tom Lane
7bae0284ee Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted.  The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction.  Which
of course would usually be the point of receiving the message.  We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.

The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing.  But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.

Per bug #9557 from Marko Tiikkaja.  This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.

Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 12:02:54 -04:00
Bruce Momjian
5024044a20 C comments: improve description of relfilenode uniqueness
Report by Antonin Houska
2014-03-08 12:20:30 -05:00
Tom Lane
7c31874945 Avoid getting more than AccessShareLock when deparsing a query.
In make_ruledef and get_query_def, we have long used AcquireRewriteLocks
to ensure that the querytree we are about to deparse is up-to-date and
the schemas of the underlying relations aren't changing.  Howwever, that
function thinks the query is about to be executed, so it acquires locks
that are stronger than necessary for the purpose of deparsing.  Thus for
example, if pg_dump asks to deparse a rule that includes "INSERT INTO t",
we'd acquire RowExclusiveLock on t.  That results in interference with
concurrent transactions that might for example ask for ShareLock on t.
Since pg_dump is documented as being purely read-only, this is unexpected.
(Worse, it used to actually be read-only; this behavior dates back only
to 8.1, cf commit ba4200246.)

Fix this by adding a parameter to AcquireRewriteLocks to tell it whether
we want the "real" execution locks or only AccessShareLock.

Report, diagnosis, and patch by Dean Rasheed.  Back-patch to all supported
branches.
2014-03-06 19:31:05 -05:00
Andrew Dunstan
3b5e03dca2 Provide a FORCE NULL option to COPY in CSV mode.
This forces an input field containing the quoted null string to be
returned as a NULL. Without this option, only unquoted null strings
behave this way. This helps where some CSV producers insist on quoting
every field, whether or not it is needed. The option takes a list of
fields, and only applies to those columns. There is an equivalent
column-level option added to file_fdw.

Ian Barwick, with some tweaking by Andrew Dunstan, reviewed by Payal
Singh.
2014-03-04 17:31:59 -05:00
Robert Haas
af2543e884 Allow VACUUM FULL/CLUSTER to bump freeze horizons even for pg_class.
pg_class is a special case for CLUSTER and VACUUM FULL, so although
commit 3cff1879f8d03cb729368722ca823a4bf74c0cac caused these
operations to advance relfrozenxid and relminmxid for all other
tables, it did not provide the same benefit for pg_class.  This
plugs that gap.

Andres Freund
2014-03-04 11:08:18 -05:00
Robert Haas
b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Robert Haas
cf6aa68bbd Update a few comments to mention materialized views.
Etsuro Fujita
2014-02-25 13:40:12 -05:00
Tom Lane
769065c1b2 Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.
A large majority of the callers of pg_do_encoding_conversion were
specifying the database encoding as either source or target of the
conversion, meaning that we can use the less general functions
pg_any_to_server/pg_server_to_any instead.

The main advantage of using the latter functions is that they can make use
of a cached conversion-function lookup in the common case that the other
encoding is the current client_encoding.  It's notationally cleaner too in
most cases, not least because of the historical artifact that the latter
functions use "char *" rather than "unsigned char *" in their APIs.

Note that pg_any_to_server will apply an encoding verification step in
some cases where pg_do_encoding_conversion would have just done nothing.
This seems to me to be a good idea at most of these call sites, though
it partially negates the performance benefit.

Per discussion of bug #9210.
2014-02-23 16:59:05 -05:00
Robert Haas
5f173040e3 Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts.  At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.

This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older).  In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches.  A field has also been added
to the Constraint node (FkConstraint in 8.4).  Third-party code calling
these functions or using the Constraint node will require updating.

Report by Andres Freund.  Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.

Security: CVE-2014-0062
2014-02-17 09:33:31 -05:00
Noah Misch
537cbd35c8 Prevent privilege escalation in explicit calls to PL validators.
The primary role of PL validators is to be called implicitly during
CREATE FUNCTION, but they are also normal functions that a user can call
explicitly.  Add a permissions check to each validator to ensure that a
user cannot use explicit validator calls to achieve things he could not
otherwise achieve.  Back-patch to 8.4 (all supported versions).
Non-core procedural language extensions ought to make the same two-line
change to their own validators.

Andres Freund, reviewed by Tom Lane and Noah Misch.

Security: CVE-2014-0061
2014-02-17 09:33:31 -05:00
Noah Misch
fea164a72a Shore up ADMIN OPTION restrictions.
Granting a role without ADMIN OPTION is supposed to prevent the grantee
from adding or removing members from the granted role.  Issuing SET ROLE
before the GRANT bypassed that, because the role itself had an implicit
right to add or remove members.  Plug that hole by recognizing that
implicit right only when the session user matches the current role.
Additionally, do not recognize it during a security-restricted operation
or during execution of a SECURITY DEFINER function.  The restriction on
SECURITY DEFINER is not security-critical.  However, it seems best for a
user testing his own SECURITY DEFINER function to see the same behavior
others will see.  Back-patch to 8.4 (all supported versions).

The SQL standards do not conflate roles and users as PostgreSQL does;
only SQL roles have members, and only SQL users initiate sessions.  An
application using PostgreSQL users and roles as SQL users and roles will
never attempt to grant membership in the role that is the session user,
so the implicit right to add or remove members will never arise.

The security impact was mostly that a role member could revoke access
from others, contrary to the wishes of his own grantor.  Unapproved role
member additions are less notable, because the member can still largely
achieve that by creating a view or a SECURITY DEFINER function.

Reviewed by Andres Freund and Tom Lane.  Reported, independently, by
Jonas Sundman and Noah Misch.

Security: CVE-2014-0060
2014-02-17 09:33:31 -05:00
Alvaro Herrera
801c2dc72c Separate multixact freezing parameters from xid's
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.

Therefore, we now have multixact-specific freezing parameters:

vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)

vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids).  Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.

autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids).  This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.

Backpatch to 9.3 where multixacts were made to persist enough to require
freezing.  To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.

Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
2014-02-13 19:36:31 -03:00
Peter Eisentraut
66c04c981d Mark some more variables as static or include the appropriate header
Detected by clang's -Wmissing-variable-declarations.

From: Andres Freund <andres@anarazel.de>
2014-02-08 21:21:46 -05:00
Tom Lane
571addd729 Fix unsafe references to errno within error messaging logic.
Various places were supposing that errno could be expected to hold still
within an ereport() nest or similar contexts.  This isn't true necessarily,
though in some cases it accidentally failed to fail depending on how the
compiler chanced to order the subexpressions.  This class of thinko
explains recent reports of odd failures on clang-built versions, typically
missing or inappropriate HINT fields in messages.

Problem identified by Christian Kruse, who also submitted the patch this
commit is based on.  (I fixed a few issues in his patch and found a couple
of additional places with the same disease.)

Back-patch as appropriate to all supported branches.
2014-01-29 20:04:43 -05:00
Robert Haas
9347baa5bb Include planning time in EXPLAIN ANALYZE output.
This doesn't work for prepared queries, but it's not too easy to get
the information in that case and there's some debate as to exactly
what the right thing to measure is, so just do this for now.

Andreas Karlsson, with slight doc changes by me.
2014-01-29 16:09:15 -05:00
Stephen Frost
fbe19ee3b8 ALTER TABLESPACE ... MOVE ... OWNED BY
Add the ability to specify the objects to move by who those objects are
owned by (as relowner) and change ALL to mean ALL objects.  This
makes the command always operate against a well-defined set of objects
and not have the objects-to-be-moved based on the role of the user
running the command.

Per discussion with Simon and Tom.
2014-01-23 23:52:40 -05:00
Alvaro Herrera
b152c6cd0d Make DROP IF EXISTS more consistently not fail
Some cases were still reporting errors and aborting, instead of a NOTICE
that the object was being skipped.  This makes it more difficult to
cleanly handle pg_dump --clean, so change that to instead skip missing
objects properly.

Per bug #7873 reported by Dave Rolsky; apparently this affects a large
number of users.

Authors: Pavel Stehule and Dean Rasheed.  Some tweaks by Álvaro Herrera
2014-01-23 14:40:29 -03:00
Alvaro Herrera
d2458e3b20 Expose a routine to print triggers during EXPLAIN ANALYZE
This is so that auto_explain can use it.

Kyotaro HORIGUCHI
2014-01-20 17:13:47 -03:00
Fujii Masao
5363c7f2bc Fix typo in comment.
Sawada Masahiko
2014-01-21 02:24:17 +09:00
Simon Riggs
4d1e2aeb1a Speed up COPY into tables with DEFAULT nextval()
Previously the presence of a nextval() prevented the
use of batch-mode COPY.  This patch introduces a
special case just for nextval() functions. In future
we will introduce a general case solution for
labelling volatile functions as safe for use.
2014-01-20 17:22:38 +00:00
Stephen Frost
5254958e92 Add CREATE TABLESPACE ... WITH ... Options
Tablespaces have a few options which can be set on them to give PG hints
as to how the tablespace behaves (perhaps it's faster for sequential
scans, or better able to handle random access, etc).  These options were
only available through the ALTER TABLESPACE command.

This adds the ability to set these options at CREATE TABLESPACE time,
removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to
get the correct options set on the tablespace.

Vik Fearing, reviewed by Michael Paquier.
2014-01-18 20:59:31 -05:00
Tom Lane
115f414124 Fix VACUUM's reporting of dead-tuple counts to the stats collector.
Historically, VACUUM has just reported its new_rel_tuples estimate
(the same thing it puts into pg_class.reltuples) to the stats collector.
That number counts both live and dead-but-not-yet-reclaimable tuples.
This behavior may once have been right, but modern versions of the
pgstats code track live and dead tuple counts separately, so putting
the total into n_live_tuples and zero into n_dead_tuples is surely
pretty bogus.  Fix it to report live and dead tuple counts separately.

This doesn't really do much for situations where updating transactions
commit concurrently with a VACUUM scan (possibly causing double-counting or
omission of the tuples they add or delete); but it's clearly an improvement
over what we were doing before.

Hari Babu, reviewed by Amit Kapila
2014-01-18 19:24:33 -05:00