One of the examples on the SELECT page was missing a semicolon from
a listing which has the look and feel of being a psql session. This
adds the missing semicolon and also removes the newline between the
query and results to match the other examples nearby.
Backpatch to all supported branches to avoid backpatching issues on
this page.
Reported-by: tim.needham2@gmail.com
Discussion: https://postgr.es/m/169965004097.225187.12941375915673151540@wrigleys.postgresql.org
Backpatch-through: v12
This allows aliases for sub-SELECTs and VALUES clauses in the FROM
clause to be omitted.
This is an extension of the SQL standard, supported by some other
database systems, and so eases the transition from such systems, as
well as removing the minor inconvenience caused by requiring these
aliases.
Patch by me, reviewed by Tom Lane.
Discussion: https://postgr.es/m/CAEZATCUCGCf82=hxd9N5n6xGHPyYpQnxW8HneeH+uP7yNALkWA@mail.gmail.com
Historically we've been lax about this, but seeing that we're not
lax in C files, there doesn't seem to be a good reason to be so
in the documentation. Remove the existing occurrences (mostly
though not entirely in copied-n-pasted psql output), and modify
.gitattributes so that "git diff --check" will warn about future
cases.
While at it, add *.pm to the set of extensions .gitattributes
knows about, and remove some obsolete entries for files that
we don't have in the tree anymore.
Per followup discussion of commit 5a892c9b1.
Discussion: https://postgr.es/m/E1nfcV1-000kOR-E5@gemulon.postgresql.org
Similarly to what was done in 04539e73f, we standardized on SQL being
pronounced "es-que-ell" rather than "sequel" in our documentation.
Two inconsistencies have crept in during the v15 cycle. The others
existed before but were missed in 04539e73f due to none of the searches
accounting for "SQL" being wrapped in tags.
As with 04539e73f, we don't touch code comments here in order to not
create unnecessary back-patching pain.
Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com
With grouping sets, it's possible that some of the grouping sets are
duplicate. This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to
GROUP BY GROUPING SETS (
(a, b, c),
(a, b, c),
(a, b, c),
(a, b),
(a, b),
(a, b),
(a),
(a),
(a),
(c, a),
(c, a),
(c, a),
(c),
(b, c),
(b),
()
)
Some of the grouping sets are calculated multiple times, which is mostly
unnecessary. This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.
Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
Per a user question, spell out that UNNEST() returns array elements
in storage order; also provide an example to clarify the behavior for
multi-dimensional arrays.
While here, also clarify the SELECT reference page's description of
WITH ORDINALITY. These details were already given in 7.2.1.4, but
a reference page should not omit details.
Back-patch to v13; there's not room in the table in older versions.
Discussion: https://postgr.es/m/FF1FB31F-0507-4F18-9559-2DE6E07E3B43@gmail.com
- Misc grammar and punctuation fixes.
- Stylistic cleanup: use spaces between function arguments and JSON fields
in examples. For example "foo(a,b)" -> "foo(a, b)". Add semicolon after
last END in a few PL/pgSQL examples that were missing them.
- Make sentence that talked about "..." and ".." operators more clear,
by avoiding to end the sentence with "..". That makes it look the same
as "..."
- Fix syntax description for HAVING: HAVING conditions cannot be repeated
Patch by Justin Pryzby, per Yaroslav Schekin's report. Backpatch to all
supported versions, to the extent that the patch applies easily.
Discussion: https://www.postgresql.org/message-id/20201005191922.GE17626%40telsasoft.com
SQL commands are generally marked up as <command>, except when a link
to a reference page is used using <xref>. But the latter doesn't
create monospace markup, so this looks strange especially when a
paragraph contains a mix of links and non-links.
We considered putting <command> in the <refentrytitle> on the target
side, but that creates some formatting side effects elsewhere.
Generally, it seems safer to solve this on the link source side.
We can't put the <xref> inside the <command>; the DTD doesn't allow
this. DocBook 5 would allow the <command> to have the linkend
attribute itself, but we are not there yet.
So to solve this for now, convert the <xref>s to <link> plus
<command>. This gives the correct look and also gives some more
flexibility what we can put into the link text (e.g., subcommands or
other clauses). In the future, these could then be converted to
DocBook 5 style.
I haven't converted absolutely all xrefs to SQL command reference
pages, only those where we care about the appearance of the link text
or where it was otherwise appropriate to make the appearance match a
bit better. Also in some cases, the links where repetitive, so in
those cases the links where just removed and replaced by a plain
<command>. In cases where we just want the link and don't
specifically care about the generated link text (typically phrased
"for further information see <xref ...>") the xref is kept.
Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/87o8pco34z.fsf@wibble.ilmari.org
Whitespace between tags is significant, and in some cases it creates
extra vertical space in man pages. The fix is either to remove some
newlines or in some cases to reword slightly to avoid the awkward
markup layout.
Use xreflabel attributes instead of endterm attributes to control the
appearance of links to subsections of SQL command reference pages.
This is simpler, it matches what we do elsewhere (e.g. for GUC variables),
and it doesn't draw "Unresolved ID reference" warnings from the PDF
toolchain.
Fix some places where the text was absolutely dependent on an <xref>
rendering exactly so, by using a <link> around the required text
instead. At least one of those spots had already been turned into
bad grammar by subsequent changes, and the whole idea is just too
fragile for my taste. <xref> does NOT have fixed output, don't write
as if it does.
Consistently include a page-level link in cross-man-page references,
because otherwise they are useless/nonsensical in man-page output.
Likewise, be consistent about mentioning "below" or "above" in same-page
references; we were doing that in about 90% of the cases, but now it's
100%.
Also get rid of another nonfunctional-in-PDF idea, of making
cross-references to functions by sticking ID tags on <row> constructs.
We can put the IDs on <indexterm>s instead --- which is probably not any
more sensible in abstract terms, but it works where the other doesn't.
(There is talk of attaching cross-reference IDs to most or all of
the docs' function descriptions, but for now I just fixed the two
that exist.)
Discussion: https://postgr.es/m/14480.1589154358@sss.pgh.pa.us
Historically we've always materialized the full output of a CTE query,
treating WITH as an optimization fence (so that, for example, restrictions
from the outer query cannot be pushed into it). This is appropriate when
the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE
query is non-recursive and side-effect-free, there's no hazard of changing
the query results by pushing restrictions down.
Another argument for materialization is that it can avoid duplicate
computation of an expensive WITH query --- but that only applies if
the WITH query is called more than once in the outer query. Even then
it could still be a net loss, if each call has restrictions that
would allow just a small part of the WITH query to be computed.
Hence, let's change the behavior for WITH queries that are non-recursive
and side-effect-free. By default, we will inline them into the outer
query (removing the optimization fence) if they are called just once.
If they are called more than once, we will keep the old behavior by
default, but the user can override this and force inlining by specifying
NOT MATERIALIZED. Lastly, the user can force the old behavior by
specifying MATERIALIZED; this would mainly be useful when the query had
deliberately been employing WITH as an optimization fence to prevent a
poor choice of plan.
Andreas Karlsson, Andrew Gierth, David Fetter
Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk
Commit ff4f88916 adjusted the code to enforce the SQL spec's requirement
that a window using GROUPS mode must have an ORDER BY clause. But I missed
that the documentation explicitly said you didn't have to have one.
Also minor wordsmithing in the window-function section of select.sgml.
Per Masahiko Sawada, though I didn't use his patch.
OFFSET <x> ROWS FETCH FIRST <y> ROWS ONLY syntax is supposed to accept
<simple value specification>, which includes parameters as well as
literals. When this syntax was added all those years ago, it was done
inconsistently, with <x> and <y> being different subsets of the
standard syntax.
Rectify that by making <x> and <y> accept the same thing, and allowing
either a (signed) numeric literal or a c_expr there, which allows for
parameters, variables, and parenthesized arbitrary expressions.
Per bug #15200 from Lukas Eder.
Backpatch all the way, since this has been broken from the start.
Discussion: https://postgr.es/m/877enz476l.fsf@news-spur.riddles.org.uk
Discussion: http://postgr.es/m/152647780335.27204.16895288237122418685@wrigleys.postgresql.org
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions. We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion. That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value. Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that. (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)
The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types. Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.
In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011. As far as I can see, we are now fully
up to spec on window framing options.
Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.
Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.
Oliver Ford, reviewed and whacked around some by me
Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
Since some preparation work had already been done, the only source
changes left were changing empty-element tags like <xref linkend="foo">
to <xref linkend="foo"/>, and changing the DOCTYPE.
The source files are still named *.sgml, but they are actually XML files
now. Renaming could be considered later.
In the build system, the intermediate step to convert from SGML to XML
is removed. Everything is build straight from the source files again.
The OpenSP (or the old SP) package is no longer needed.
The documentation toolchain instructions are updated and are much
simpler now.
Peter Eisentraut, Alexander Lakhin, Jürgen Purtz
IDs in SGML are case insensitive, and we have accumulated a mix of upper
and lower case IDs, including different variants of the same ID. In
XML, these will be case sensitive, so we need to fix up those
differences. Going to all lower case seems most straightforward, and
the current build process already makes all anchors and lower case
anyway during the SGML->XML conversion, so this doesn't create any
difference in the output right now. A future XML-only build process
would, however, maintain any mixed case ID spellings in the output, so
that is another reason to clean this up beforehand.
Author: Alexander Lakhin <exclusion@gmail.com>
For DocBook XML compatibility, don't use SGML empty tags (</>) anymore,
replace by the full tag name. Add a warning option to catch future
occurrences.
Alexander Lakhin, Jürgen Purtz
Claiming that NATURAL JOIN is equivalent to CROSS JOIN when there are
no common column names is only strictly correct if it's an inner join;
you can't say e.g. CROSS LEFT JOIN. Better to explain it as meaning
JOIN ON TRUE, instead. Per a suggestion from David Johnston.
Discussion: https://postgr.es/m/CAKFQuwb+mYszQhDS9f_dqRrk1=Pe-S6D=XMkAXcDf4ykKPmgKQ@mail.gmail.com
It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done. Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order. For example, in
SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.
In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step. Getting the desired
behavior reliably required contortions like
SELECT x, nextval('seq')
FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;
This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary. Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.
We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse. Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement. We might tweak the decision rules in the
future, though.
Konstantin Knizhnik, heavily rewritten by me
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
Petr Jelinek
Reviewed by Michael Paquier and Simon Riggs
You're required to write either RANGE or ROWS to start a frame clause,
but the documentation incorrectly implied this is optional. Noted by
David Johnston.
The SELECT reference page didn't really address the question of when
aggregate function evaluation occurs, nor did the "expression evaluation
rules" documentation mention that CASE can't be used to control whether
an aggregate gets evaluated or not. Improve that.
Per discussion of bug #11661. Original text by Marti Raudsepp and Michael
Paquier, rewritten significantly by me.
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows. While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.
Catalog version bumped because this patch changes the representation of
stored rules.
Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.
Author: Thomas Munro
Errors detected using Topy (https://github.com/intgr/topy), all
changes verified by hand and some manual tweaks added.
Marti Raudsepp
Individual changes backpatched, where applicable, as far as 9.0.
DocBook XML is superficially compatible with DocBook SGML but has a
slightly stricter DTD that we have been violating in a few cases.
Although XSLT doesn't care whether the document is valid, the style
sheets don't necessarily process invalid documents correctly, so we need
to work toward fixing this.
This first commit moves the indexterms in refentry elements to an
allowed position. It has no impact on the output.
This fixes a problem noted as a followup to bug #8648: if a query has a
semantically-empty target list, e.g. SELECT * FROM zero_column_table,
ruleutils.c will dump it as a syntactically-empty target list, which was
not allowed. There doesn't seem to be any reliable way to fix this by
hacking ruleutils (note in particular that the originally zero-column table
might since have had columns added to it); and even if we had such a fix,
it would do nothing for existing dump files that might contain bad syntax.
The best bet seems to be to relax the syntactic restriction.
Also, add parse-analysis errors for SELECT DISTINCT with no columns (after
*-expansion) and RETURNING with no columns. These cases previously
produced unexpected behavior because the parsed Query looked like it had
no DISTINCT or RETURNING clause, respectively. If anyone ever offers
a plausible use-case for this, we could work a bit harder on making the
situation distinguishable.
Arguably this is a bug fix that should be back-patched, but I'm worried
that there may be client apps or PLs that expect "SELECT ;" to throw a
syntax error. The issue doesn't seem important enough to risk changing
behavior in minor releases.
SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and
other collection types. This feature, however, deals with set-returning
functions. Use a different syntax for this feature to keep open the
possibility of implementing the standard TABLE().
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me