The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results. Per report from Marti Raudsepp,
though this is not his proposed patch.
Back-patch to 9.0, where both these features were introduced. In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
CREATE EXTENSION needs to transiently set search_path, as well as
client_min_messages and log_min_messages. We were doing this by the
expedient of saving the current string value of each variable, doing a
SET LOCAL, and then doing another SET LOCAL with the previous value at
the end of the command. This is a bit expensive though, and it also fails
badly if there is anything funny about the existing search_path value,
as seen in a recent report from Roger Niederland. Fortunately, there's a
much better way, which is to piggyback on the GUC infrastructure previously
developed for functions with SET options. We just open a new GUC nesting
level, do our assignments with GUC_ACTION_SAVE, and then close the nesting
level when done. This automatically restores the prior settings without a
re-parsing pass, so (in principle anyway) there can't be an error. And
guc.c still takes care of cleanup in event of an error abort.
The CREATE EXTENSION code for this was modeled on some much older code in
ri_triggers.c, which I also changed to use the better method, even though
there wasn't really much risk of failure there. Also improve the comments
in guc.c to reflect this additional usage.
The standardized errno code for "no such locale" failures is ENOENT, which
we were just reporting at face value, viz "No such file or directory".
Per gripe from Thom Brown, this might confuse users, so add an errdetail
message to clarify what it means. Also, report newlocale() failures as
ERRCODE_INVALID_PARAMETER_VALUE rather than using
errcode_for_file_access(), since newlocale()'s errno values aren't
necessarily tied directly to file access failures.
Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").
Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.
With 9.1's use of Params to pass down values from NestLoop join nodes
to their inner plans, it is possible for a Param to have type RECORD, in
which case the set of fields comprising the value isn't determinable by
inspection of the Param alone. However, just as with a Var of type RECORD,
we can find out what we need to know if we can locate the expression that
the Param represents. We already knew how to do this in get_parameter(),
but I'd overlooked the need to be able to cope in get_name_for_var_field(),
which led to EXPLAIN failing with "record type has not been registered".
To fix, refactor the search code in get_parameter() so it can be used by
both functions.
Per report from Marti Raudsepp.
dots. I previously worked around this in initdb, mapping the known
problematic locale names to aliases that work, but Hiroshi Inoue pointed
out that that's not enough because even if you use one of the aliases, like
"Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie.
"Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by
passing that value back to setlocale(), it fails. Note that you are affected
by this bug also if you use one of those short-form names manually, so just
reverting the hack in initdb won't fix it.
To work around that, move the locale name mapping from initdb to a wrapper
around setlocale(), so that the mapping is invoked on every setlocale() call.
Also, add a few checks for failed setlocale() calls in the backend. These
calls shouldn't fail, and if they do there isn't much we can do about it,
but at least you'll get a warning.
Backpatch to 9.1, where the initdb hack was introduced. The Windows bug
affects older versions too if you set locale manually to one of the aliases,
but given the lack of complaints from the field, I'm hesitent to backpatch.
Examination of examples provided by Mark Kirkwood and others has convinced
me that actually commit 7f3eba30c9 was quite
a few bricks shy of a load. The useful part of that patch was clamping
ndistinct for the inner side of a semi or anti join, and the reason why
that's needed is that it's the only way that restriction clauses
eliminating rows from the inner relation can affect the estimated size of
the join result. I had not clearly understood why the clamping was
appropriate, and so mis-extrapolated to conclude that we should clamp
ndistinct for the outer side too, as well as for both sides of regular
joins. These latter actions were all wrong, and are reverted with this
patch. In addition, the clamping logic is now made to affect the behavior
of both paths in eqjoinsel_semi, with or without MCV lists to compare.
When we have MCVs, we suppose that the most common values are the ones
that are most likely to survive the decimation resulting from a lower
restriction clause, so we think of the clamping as eliminating non-MCV
values, or potentially even the least-common MCVs for the inner relation.
Back-patch to 8.4, same as previous fixes in this area.
This patch fixes an oversight in my commit
7f3eba30c9 of 2008-10-23. That patch
accounted for baserel restriction clauses that reduced the number of rows
coming out of a table (and hence the number of possibly-distinct values of
a join variable), but not for join restriction clauses that might have been
applied at a lower level of join. To account for the latter, look up the
sizes of the min_lefthand and min_righthand inputs of the current join,
and clamp with those in the same way as for the base relations.
Noted while investigating a complaint from Ben Chobot, although this in
itself doesn't seem to explain his report.
Back-patch to 8.4; previous versions used different estimation methods
for which this heuristic isn't relevant.
tsvector_concat() allocated its result workspace using the "conservative"
estimate of the sum of the two input tsvectors' sizes. Unfortunately that
wasn't so conservative as all that, because it supposed that the number of
pad bytes required could not grow. Which it can, as per test case from
Jesper Krogh, if there's a mix of lexemes with positions and lexemes
without them in the input data. The fix is to assume that we might add
a not-previously-present pad byte for each and every lexeme in the two
inputs; which really is conservative, but it doesn't seem worthwhile to
try to be more precise.
This is an aboriginal bug in tsvector_concat, so back-patch to all
versions containing it.
There may be some other places where we should use errdetail_internal,
but they'll have to be evaluated case-by-case. This commit just hits
a bunch of places where invoking gettext is obviously a waste of cycles.
Regular aggregate functions in combination with, or within the arguments
of, window functions are OK per spec; they have the semantics that the
aggregate output rows are computed and then we run the window functions
over that row set. (Thus, this combination is not really useful unless
there's a GROUP BY so that more than one aggregate output row is possible.)
The case without GROUP BY could fail, as recently reported by Jeff Davis,
because sloppy construction of the Agg node's targetlist resulted in extra
references to possibly-ungrouped Vars appearing outside the aggregate
function calls themselves. See the added regression test case for an
example.
Fixing this requires modifying the API of flatten_tlist and its underlying
function pull_var_clause. I chose to make pull_var_clause's API for
aggregates identical to what it was already doing for placeholders, since
the useful behaviors turn out to be the same (error, report node as-is, or
recurse into it). I also tightened the error checking in this area a bit:
if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
that was a long time ago, so complain instead of ignoring them.
Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing
that it only occurs in a basically-useless corner case, it doesn't seem
worth the risks of changing a function API in a minor release. There might
be third-party code using pull_var_clause.
The previous code went into an infinite loop after overflow. In fact,
an overflow is not really an error; it just means that the current
value is the last one we need to return. So, just arrange to stop
immediately when overflow is detected.
Back-patch all the way.
The initial commit of the ALTER TABLE ADD FOREIGN KEY NOT VALID feature
failed to support labeling such constraints as deferrable. The best fix
for this seems to be to fold NOT VALID into ConstraintAttributeSpec.
That's a bit more general than the documented syntax, but it allows
better-targeted syntax error messages.
In addition, do some mostly-but-not-entirely-cosmetic code review for
the whole NOT VALID patch.
This case was missed when NOT VALID constraints were first introduced in
commit 722bf7017b by Simon Riggs on
2011-02-08. Among other things, it causes pg_dump to omit the NOT VALID
flag when dumping such constraints, which may cause them to fail to
load afterwards, if they contained values failing the constraint.
Per report from Thom Brown.
parse_xml_decl's header comment says you can pass NULL for any unwanted
output parameter, but it failed to honor this contract for the "standalone"
flag. The only currently-affected caller is xml_recv, so the net effect is
that sending a binary XML value containing a standalone parameter in its
xml declaration would crash the backend. Per bug #6044 from Christopher
Dillard.
In passing, remove useless initializations of parse_xml_decl's output
parameters in xml_parse.
Back-patch to 8.3, where this code was introduced.
We had some hacks in ruleutils.c to cope with various odd transformations
that the optimizer could do on a CASE foo WHEN "CaseTestExpr = RHS" clause.
However, the fundamental impossibility of covering all cases was exposed
by Heikki, who pointed out that the "=" operator could get replaced by an
inlined SQL function, which could contain nearly anything at all. So give
up on the hacks and just print the expression as-is if we fail to recognize
it as "CaseTestExpr = RHS". (We must cover that case so that decompiled
rules print correctly; but we are not under any obligation to make EXPLAIN
output be 100% valid SQL in all cases, and already could not do so in some
other cases.) This approach requires that we have some printable
representation of the CaseTestExpr node type; I used "CASE_TEST_EXPR".
Back-patch to all supported branches, since the problem case fails in all.
The new logic is less vulnerable to transpositions.
This invalidates the contents of hash indexes built with the old
functions; hence, bump catversion.
Dean Rasheed
Normally nel == 0 works okay because the initial value of "last" will be
less than "base"; but if "base" is zero then the calculation wraps around
and we have a very large (unsigned) value for "last", so that the loop can
be entered and we get a SIGSEGV on a bogus pointer.
This is certainly the proximate cause of the recent reports of Windows
builds crashing on 'infinity'::timestamp --- evidently, they're either not
setting an active timezonetktbl, or setting an empty one. It's not yet
clear to me why it's only happening on Windows and not happening on any
buildfarm member. But even if that's due to some bug elsewhere, it seems
wise for this function to not choke on the powerup values of
timezonetktbl/sztimezonetktbl.
I also changed the copy of this code in ecpglib, although I am not sure
whether it's exposed to a similar hazard.
Per report and stack trace from Richard Broersma.
Convert it to use successive shifts right instead of increasing a divisor.
This is probably a tad more efficient than the original coding, and it's
nicer-looking than the previous patch because we don't need a special case
to avoid overflow in the last branch. But the real reason to do it is to
avoid a Solaris compiler bug, as per results from buildfarm member moa.
The style is set to "printf" for backwards compatibility everywhere except
on Windows, where it is set to "gnu_printf", which eliminates hundreds of
false error messages from modern versions of gcc arising from %m and %ll{d,u}
formats.
Per recent discussion, it's important for all computed datums (not only the
results of input functions) to not contain any ill-defined (uninitialized)
bits. Failing to ensure that can result in equal() reporting that
semantically indistinguishable Consts are not equal, which in turn leads to
bizarre and undesirable planner behavior, such as in a recent example from
David Johnston. We might eventually try to fix this in a general manner by
allowing datatypes to define identity-testing functions, but for now the
path of least resistance is to expect datatypes to force all unused bits
into consistent states.
Per some testing by Noah Misch, array and path functions seem to be the
only ones presenting risks at the moment, so I looked through all the
functions in adt/array*.c and geo_ops.c and fixed them as necessary. In
the array functions, the easiest/safest fix is to allocate result arrays
with palloc0 instead of palloc. Possibly in future someone will want to
look into whether we can just zero the padding bytes, but that looks too
complex for a back-patchable fix. In the path functions, we already had a
precedent in path_in for just zeroing the one known pad field, so duplicate
that code as needed.
Back-patch to all supported branches.
Instead, foreign tables are treated just like views: permissions can
be granted using GRANT privilege ON [TABLE] foreign_table_name TO role,
and revoked similarly. GRANT/REVOKE .. FOREIGN TABLE is no longer
supported, just as we don't support GRANT/REVOKE .. VIEW. The set of
accepted permissions for foreign tables is now identical to the set for
regular tables, and views.
Per report from Thom Brown, and subsequent discussion.
The expression that tried to round the value to the nearest TB could
overflow, leading to bogus output as reported in bug #5993 from Nicola
Cossu. This isn't likely to ever happen in the intended usage of the
function (if it could, we'd be needing to use a wider datatype instead);
but it's not hard to give the expected output, so let's do so.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available. Where those functions are not
available, temporarily select the correct locale with uselocale().
This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.
I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions. Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
This patch is almost entirely cosmetic --- mostly cleaning up a lot of
neglected comments, and fixing code layout problems in places where the
patch made lines too long and then pgindent did weird things with that.
I did find a bug-of-omission in equalTupleDescs().
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings. Fix by passing it in FunctionCallInfoData
instead. In particular this allows a clean fix for bug #5970 (record_cmp
not working). This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
In particular, if we don't have real ndistinct estimates for both sides,
fall back to assuming that half of the left-hand rows have join partners.
This is what was done in 8.2 and 8.3 (cf nulltestsel() in those versions).
It's pretty stupid but it won't lead us to think that an antijoin produces
no rows out, as seen in recent example from Uwe Schroeder.
If the referencing and referenced columns have different collations,
the parser will be unable to resolve which collation to use unless it's
helped out in this way. The effects are sometimes masked, if we end up
using a non-collation-sensitive plan; but if we do use a mergejoin
we'll see a failure, as recently noted by Robert Haas.
The SQL spec states that the referenced column's collation should be used
to resolve RI checks, so that's what we do. Note however that we currently
don't append a COLLATE clause when writing a query that examines only the
referencing column. If we ever support collations that have varying
notions of equality, that will have to be changed. For the moment, though,
it's preferable to leave it off so that we can use a normal index on the
referencing column.
This warning is new in gcc 4.6 and part of -Wall. This patch cleans
up most of the noise, but there are some still warnings that are
trickier to remove.
This is necessary, not optional, now that ILIKE and regexes are collation
aware --- else we might derive a wrong comparison constant for index
optimized pattern matches.
This involves getting the character classification and case-folding
functions in the regex library to use the collations infrastructure.
Most of this work had been done already in connection with the upper/lower
and LIKE logic, so it was a simple matter of transposition.
While at it, split out these functions into a separate source file
regc_pg_locale.c, so that they can be correctly labeled with the Postgres
project's license rather than the Scriptics license. These functions are
100% Postgres-written code whereas what remains in regc_locale.c is still
mostly not ours, so lumping them both under the same copyright notice was
getting more and more misleading.
The original collation patch only fixed the multi-byte code path.
This change also ensures that ILIKE's idea of the case-folding rules
is exactly the same as str_tolower's.
The previous coding worked only if ltproc->fn_collation was always either
DEFAULT_COLLATION_OID or a C-compatible locale. While that's true at the
moment, it wasn't documented (and in fact wasn't true when this code was
committed...). But it only takes a couple more lines to make its internal
caching behavior locale-aware, so let's do that.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment. And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server". There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead). This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".