This is similar to 7d25958, and this commit takes care of all the
remaining inconsistencies between the initial value used in the C
variable associated to a GUC and its default value stored in the GUC
tables (as of pg_settings.boot_val).
Some of the initial values of the GUCs updated rely on a compile-time
default. These are refactored so as the GUC table and its C declaration
use the same values. This makes everything consistent with other
places, backend_flush_after, bgwriter_flush_after, port,
checkpoint_flush_after doing so already, for example.
Extracted from a larger patch by Peter Smith. The spots updated in the
modules are from me.
Author: Peter Smith, Michael Paquier
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in optimizer, parser,
utility, libpq, and "commands" code, as well as in remaining library
code. Do the same for all code related to frontend programs (with the
exception of pg_dump/pg_dumpall related code).
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will handle
ecpg and pg_dump/pg_dumpall.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
There were many calls to construct_array() and deconstruct_array() for
built-in types, for example, when dealing with system catalog columns.
These all hardcoded the type attributes necessary to pass to these
functions.
To simplify this a bit, add construct_array_builtin(),
deconstruct_array_builtin() as wrappers that centralize this hardcoded
knowledge. This simplifies many call sites and reduces the amount of
hardcoded stuff that is spread around.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2914356f-9e5f-8c59-2995-5997fc48bcba%40enterprisedb.com
The output of table_to_xmlschema() and allied functions includes
a regex describing valid values for these types ... but the regex
was itself invalid, as it failed to escape a literal "+" sign.
Report and fix by Renan Soares Lopes. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/7f6fabaa-3f8f-49ab-89ca-59fbfe633105@me.com
SQL includes provisions for numeric Unicode escapes in string
literals and identifiers. Previously we only accepted those
if they represented ASCII characters or the server encoding
was UTF-8, making the conversion to internal form trivial.
This patch adjusts things so that we'll call the appropriate
encoding conversion function in less-trivial cases, allowing
the escape sequence to be accepted so long as it corresponds
to some character available in the server encoding.
This also applies to processing of Unicode escapes in JSONB.
However, the old restriction still applies to client-side
JSON processing, since that hasn't got access to the server's
encoding conversion infrastructure.
This patch includes some lexer infrastructure that simplifies
throwing errors with error cursors pointing into the middle of
a string (or other complex token). For the moment I only used
it for errors relating to Unicode escapes, but we might later
expand the usage to some other cases.
Patch by me, reviewed by John Naylor.
Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code. But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage. It's never
too late to make it better though, so let's do that.
The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.
I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references. But that's not fatal --- there's no expectation that
we'd actually change any of these values. We can clean up stragglers
over time.
Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
SPI gets used to build a list of relation OIDs for XML object
generation, and one code path building a list uses SPI_execute() without
looking at errors it produces. So fix that.
Author: Mark Dilger
Reviewed-by: Michael Paquier, Pavel Stehule
Discussion: https://postgr.es/m/17d30445-4862-7917-170f-84328dcd292d@gmail.com
This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases. So instead of
PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_CATCH();
{
cleanup();
PG_RE_THROW();
}
PG_END_TRY();
cleanup();
one can write
PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_FINALLY();
{
cleanup();
}
PG_END_TRY();
Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
When we already know the length that we're going to append, then it
makes sense to use appendBinaryStringInfo instead of
appendStringInfoString so that the append can be performed with a simple
memcpy() using a known length rather than having to first perform a
strlen() call to obtain the length.
Discussion: https://postgr.es/m/CAKJS1f8+FRAM1s5+mAa3isajeEoAaicJ=4e0WzrH3tAusbbiMQ@mail.gmail.com
xml.c passed format = 1 to xmlNodeDump(), resulting in sometimes getting
extra whitespace (newlines + spaces) in the output. We don't really want
that, first because whitespace might be semantically significant in some
XML uses, and second because it happens only very inconsistently. Only
one case in our regression tests is affected.
This potentially affects the results of xpath() and the XMLTABLE construct,
when emitting nodeset values.
Note that the older code in contrib/xml2 doesn't do this; it seems
to have been an aboriginal bad decision in commit ea3b212fe.
While this definitely seems like a bug to me, the small number of
complaints to date argues against back-patching a behavioral change.
Hence, fix in HEAD only, at least for now.
Per report from Jean-Marc Voillequin.
Discussion: https://postgr.es/m/1EC8157EB499BF459A516ADCF135ADCE3A23A9CA@LON-WGMSX712.ad.moodys.net
Previously we were using the SQL:2003 definition, which doesn't allow
this, but that creates a serious dump/restore gotcha: there is no
setting of xmloption that will allow all valid XML data. Hence,
switch to the 2006 definition.
Since libxml doesn't accept <!DOCTYPE> directives in the mode we
use for CONTENT parsing, the implementation is to detect <!DOCTYPE>
in the input and switch to DOCUMENT parsing mode. This should not
cost much, because <!DOCTYPE> should be close to the front of the
input if it's there at all. It's possible that this causes the
error messages for malformed input to be slightly different than
they were before, if said input includes <!DOCTYPE>; but that does
not seem like a big problem.
In passing, buy back a few cycles in parsing of large XML documents
by not doing strlen() of the whole input in parse_xml_decl().
Back-patch because dump/restore failures are not nice. This change
shouldn't break any cases that worked before, so it seems safe to
back-patch.
Chapman Flack (revised a bit by me)
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
Certain libxml2 versions (such as the 2.7.6 commonly seen in older
distributions, but apparently only on x86_64) contain a bug that causes
xmlCopyNode, when called on a XML_DOCUMENT_NODE, to return a node that
xmlFreeNode crashes on. Arrange to call xmlFreeDoc instead of
xmlFreeNode for those nodes.
Per buildfarm members lapwing and grison.
Author: Pavel Stehule, light editing by Álvaro.
Discussion: https://postgr.es/m/20190308024436.GA2374@alvherre.pgsql
Correctly process nodes of more types than previously. In some cases,
nodes were being ignored (nothing was output); in other cases, trying to
return them resulted in errors about unrecognized nodes. In yet other
cases, necessary escaping (of XML special characters) was not being
done. Fix all those (as far as the authors could find) and add
regression tests cases verifying the new behavior.
I (Álvaro) was of two minds about backpatching these changes. They do
seem bugfixes that would benefit most users of the affected functions;
but on the other hand it would change established behavior in minor
releases, so it seems prudent not to.
Authors: Pavel Stehule, Markus Winand, Chapman Flack
Discussion:
https://postgr.es/m/CAFj8pRA6J25CtAZ2TuRvxK3gat7-bBUYh0rfE2yM7Hj9GD14Dg@mail.gmail.comhttps://postgr.es/m/8BDB0627-2105-4564-AA76-7849F028B96E@winand.at
The elephant in the room as pointed out by Chapman Flack, not fixed in
this commit, is that we still have XMLTABLE operating on XPath 1.0
instead of the standard-mandated XQuery (or even its subset XPath 2.0).
Fixing that is a major undertaking, however.
While this isn't really supposed to happen, it can occur in OOM
situations and perhaps others. Instead of crashing, substitute
"(no message provided)".
I didn't worry about localizing this text, since we aren't
localizing anything else here; besides, if we're on the edge of
OOM, it's unlikely gettext() would work.
Report and fix by Sergio Conde Gómez in bug #15624.
Discussion: https://postgr.es/m/15624-4dea54091a2864e6@postgresql.org
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.
heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.
Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.
As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.
Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
According to the SQL standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML input document, not the
root node. This becomes visible when a relative path rather than
absolute is used as row expression. Absolute paths is what was used in
original tests and docs (and the most common form used in examples
throughout the interwebs), which explains why this wasn't noticed
before.
Other functions such as xpath() and xpath_exists() also have this
problem. While not specified by the SQL standard, it would be pretty
odd to leave those functions to behave differently than XMLTABLE, so
change them too. However, this is a backwards-incompatible change.
No backpatch, out of fear of breaking code depending on the original
broken behavior.
Author: Markus Winand
Reported-By: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
Column expressions that match TEXT or CDATA nodes must return the
contents of the nodes themselves, not the content of non-existing
children (i.e. the empty string).
Author: Markus Winand
Reported-by: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
When a value contained an XML declaration naming some other encoding,
this function interpreted UTF8 bytes as the named encoding, yielding
mojibake. xml_parse() already has similar logic. This would be
necessary but not sufficient for non-UTF8 databases, so preserve
behavior there until the xpath facility can support such databases
comprehensively. Back-patch to 9.3 (all supported versions).
Pavel Stehule and Noah Misch
Discussion: https://postgr.es/m/CAFj8pRC-dM=tT=QkGi+Achkm+gwPmjyOayGuUfXVumCxkDgYWg@mail.gmail.com
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources. The upper case spellings are only used
in some files/modules. So standardize on the standard spellings.
The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.
In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc. Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.
Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
It only produced <row> elements but no wrapping <table> element.
By contrast, cursor_to_xmlschema produced a schema that is now correct
but did not previously match the XML data produced by cursor_to_xml.
In passing, also fix a minor misunderstanding about moving cursors in
the tests related to this.
Reported-by: filip@jirsak.org
Based-on-patch-by: Thomas Munro <thomas.munro@enterprisedb.com>
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which caches
ExprStates, the old set could stick around longer. The behavior
around might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
I think I've now gotten all the hard-coded references in C code.
Unfortunately there's no equally convenient way to parameterize
SQL files ...
Discussion: https://postgr.es/m/11145.1488931324@sss.pgh.pa.us
Compilers that don't realize that ereport(ERROR) doesn't return
complained that XmlTableGetValue() failed to return a value.
Also, make XmlTableFetchRow's non-USE_LIBXML case look more like
the other ones. As coded, it could lead to "unreachable code"
warnings with USE_LIBXML enabled.
Oversights in commit fcec6caaf. Per buildfarm.