mirror of
https://github.com/postgres/postgres.git
synced 2025-05-02 11:44:50 +03:00
Still needs to be filled with more information, but it gives us a framework to have a User's Guide with complete coverage of the basic SQL operations. Move arrays into data type chapter, inheritance into DDL chapter (for now). Make <comment>s show up in the output while the version number ends in "devel". Allow cross-book references with entities &cite-user; etc.
965 lines
36 KiB
Plaintext
965 lines
36 KiB
Plaintext
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.16 2002/08/05 19:43:31 petere Exp $ -->
|
|
|
|
<chapter id="queries">
|
|
<title>Queries</title>
|
|
|
|
<para>
|
|
The previous chapters explained how to create tables, how to fill
|
|
them with data, and how to manipulate that data. Now we finally
|
|
discuss how to retrieve the data out of the database.
|
|
</para>
|
|
|
|
<sect1 id="queries-overview">
|
|
<title>Overview</title>
|
|
|
|
<para>
|
|
The process of retrieving or the command to retrieve data from a
|
|
database is called a <firstterm>query</firstterm>. In SQL the
|
|
<command>SELECT</command> command is used to specify queries. The
|
|
general syntax of the <command>SELECT</command> command is
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
|
|
</synopsis>
|
|
The following sections describe the details of the select list, the
|
|
table expression, and the sort specification. The simplest kind of
|
|
query has the form
|
|
<programlisting>
|
|
SELECT * FROM table1;
|
|
</programlisting>
|
|
Assuming that there is a table called table1, this command would
|
|
retrieve all rows and all columns from table1. (The method of
|
|
retrieval depends on the client application. For example, the
|
|
<application>psql</application> program will display an ASCII-art
|
|
table on the screen, client libraries will offer functions to
|
|
retrieve individual rows and columns.) The select list
|
|
specification <literal>*</literal> means all columns that the table
|
|
expression happens to provide. A select list can also select a
|
|
subset of the available columns or even make calculations on the
|
|
columns before retrieving them; see <xref
|
|
linkend="queries-select-lists">. For example, if table1 has columns
|
|
named a, b, and c (and perhaps others) you can make the following
|
|
query:
|
|
<programlisting>
|
|
SELECT a, b + c FROM table1;
|
|
</programlisting>
|
|
(assuming that b and c are of a numeric data type).
|
|
</para>
|
|
|
|
<para>
|
|
<literal>FROM table1</literal> is a particularly simple kind of
|
|
table expression. In general, table expressions can be complex
|
|
constructs of base tables, joins, and subqueries. But you can also
|
|
omit the table expression entirely and use the SELECT command as a
|
|
calculator:
|
|
<programlisting>
|
|
SELECT 3 * 4;
|
|
</programlisting>
|
|
This is more useful if the expressions in the select list return
|
|
varying results. For example, you could call a function this way.
|
|
<programlisting>
|
|
SELECT random();
|
|
</programlisting>
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="queries-table-expressions">
|
|
<title>Table Expressions</title>
|
|
|
|
<para>
|
|
A <firstterm>table expression</firstterm> specifies a table. The
|
|
table expression contains a FROM clause that is optionally followed
|
|
by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions
|
|
simply refer to a table on disk, a so-called base table, but more
|
|
complex expressions can be used to modify or combine base tables in
|
|
various ways.
|
|
</para>
|
|
|
|
<para>
|
|
The optional WHERE, GROUP BY, and HAVING clauses in the table expression
|
|
specify a pipeline of successive transformations performed on the
|
|
table derived in the FROM clause. The derived table that is produced by
|
|
all these transformations provides the input rows used to compute output
|
|
rows as specified by the select list of column value expressions.
|
|
</para>
|
|
|
|
<sect2 id="queries-from">
|
|
<title>FROM clause</title>
|
|
|
|
<para>
|
|
The FROM clause derives a table from one or more other tables
|
|
given in a comma-separated table reference list.
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
|
|
</synopsis>
|
|
|
|
A table reference may be a table name (possibly schema-qualified),
|
|
or a derived table such as a
|
|
subquery, a table join, or complex combinations of these. If more
|
|
than one table reference is listed in the FROM clause they are
|
|
cross-joined (see below) to form the derived table that may then
|
|
be subject to transformations by the WHERE, GROUP BY, and HAVING
|
|
clauses and is finally the result of the overall table expression.
|
|
</para>
|
|
|
|
<para>
|
|
When a table reference names a table that is the
|
|
supertable of a table inheritance hierarchy, the table reference
|
|
produces rows of not only that table but all of its subtable successors,
|
|
unless the keyword ONLY precedes the table name. However, the reference
|
|
produces only the columns that appear in the named table --- any columns
|
|
added in subtables are ignored.
|
|
</para>
|
|
|
|
<sect3 id="queries-join">
|
|
<title>Joined Tables</title>
|
|
|
|
<indexterm zone="queries-join">
|
|
<primary>joins</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A joined table is a table derived from two other (real or
|
|
derived) tables according to the rules of the particular join
|
|
type. Inner, outer, and cross-joins are available.
|
|
</para>
|
|
|
|
<variablelist>
|
|
<title>Join Types</title>
|
|
|
|
<varlistentry>
|
|
<term>Cross-join</term>
|
|
|
|
<indexterm>
|
|
<primary>joins</primary>
|
|
<secondary>cross</secondary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<synopsis>
|
|
<replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
|
|
</synopsis>
|
|
|
|
<para>
|
|
For each combination of rows from
|
|
<replaceable>T1</replaceable> and
|
|
<replaceable>T2</replaceable>, the derived table will contain a
|
|
row consisting of all columns in <replaceable>T1</replaceable>
|
|
followed by all columns in <replaceable>T2</replaceable>. If
|
|
the tables have N and M rows respectively, the joined
|
|
table will have N * M rows. A cross join is equivalent to an
|
|
<literal>INNER JOIN ON TRUE</literal>.
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
|
|
<replaceable>T2</replaceable></literal> is equivalent to
|
|
<literal>FROM <replaceable>T1</replaceable>,
|
|
<replaceable>T2</replaceable></literal>.
|
|
</para>
|
|
</tip>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Qualified joins</term>
|
|
|
|
<indexterm>
|
|
<primary>joins</primary>
|
|
<secondary>outer</secondary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<synopsis>
|
|
<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
|
|
<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
|
|
<replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
|
|
</synopsis>
|
|
|
|
<para>
|
|
The words <token>INNER</token> and <token>OUTER</token> are
|
|
optional for all joins. <token>INNER</token> is the default;
|
|
<token>LEFT</token>, <token>RIGHT</token>, and
|
|
<token>FULL</token> imply an OUTER JOIN.
|
|
</para>
|
|
|
|
<para>
|
|
The <firstterm>join condition</firstterm> is specified in the
|
|
ON or USING clause, or implicitly by the word NATURAL. The join
|
|
condition determines which rows from the two source tables are
|
|
considered to <quote>match</quote>, as explained in detail below.
|
|
</para>
|
|
|
|
<para>
|
|
The ON clause is the most general kind of join condition: it takes a
|
|
Boolean value expression of the same kind as is used in a WHERE
|
|
clause. A pair of rows from T1 and T2 match if the ON expression
|
|
evaluates to TRUE for them.
|
|
</para>
|
|
|
|
<para>
|
|
USING is a shorthand notation: it takes a
|
|
comma-separated list of column names, which the joined tables
|
|
must have in common, and forms a join condition specifying equality
|
|
of each of these pairs of columns. Furthermore, the output of
|
|
a JOIN USING has one column for each of the equated pairs of
|
|
input columns, followed by all of the other columns from each table.
|
|
Thus, <literal>USING (a, b, c)</literal> is equivalent to
|
|
<literal>ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)</literal>
|
|
with the exception that
|
|
if ON is used there will be two columns a, b, and c in the
|
|
result, whereas with USING there will be only one of each.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>joins</primary>
|
|
<secondary>natural</secondary>
|
|
</indexterm>
|
|
Finally, NATURAL is a shorthand form of USING: it forms a USING
|
|
list consisting of exactly those column names that appear in both
|
|
input tables. As with USING, these columns appear only once in
|
|
the output table.
|
|
</para>
|
|
|
|
<para>
|
|
The possible types of qualified JOIN are:
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>INNER JOIN</term>
|
|
|
|
<listitem>
|
|
<para>
|
|
For each row R1 of T1, the joined table has a row for each
|
|
row in T2 that satisfies the join condition with R1.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>LEFT OUTER JOIN</term>
|
|
|
|
<indexterm>
|
|
<primary>joins</primary>
|
|
<secondary>left</secondary>
|
|
</indexterm>
|
|
|
|
<listitem>
|
|
<para>
|
|
First, an INNER JOIN is performed. Then, for each row in T1
|
|
that does not satisfy the join condition with any row in
|
|
T2, a joined row is returned with null values in columns of
|
|
T2. Thus, the joined table unconditionally has at least one
|
|
row for each row in T1.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>RIGHT OUTER JOIN</term>
|
|
|
|
<listitem>
|
|
<para>
|
|
First, an INNER JOIN is performed. Then, for each row in T2
|
|
that does not satisfy the join condition with any row in
|
|
T1, a joined row is returned with null values in columns of
|
|
T1. This is the converse of a left join: the result table will
|
|
unconditionally have a row for each row in T2.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>FULL OUTER JOIN</term>
|
|
|
|
<listitem>
|
|
<para>
|
|
First, an INNER JOIN is performed. Then, for each row in
|
|
T1 that does not satisfy the join condition with any row in
|
|
T2, a joined row is returned with null values in columns of
|
|
T2. Also, for each row of T2 that does not satisfy the
|
|
join condition with any row in T1, a joined row with null
|
|
values in the columns of T1 is returned.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>
|
|
Joins of all types can be chained together or nested: either
|
|
or both of <replaceable>T1</replaceable> and
|
|
<replaceable>T2</replaceable> may be joined tables. Parentheses
|
|
may be used around JOIN clauses to control the join order. In the
|
|
absence of parentheses, JOIN clauses nest left-to-right.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="queries-subqueries">
|
|
<title>Subqueries</title>
|
|
|
|
<indexterm zone="queries-subqueries">
|
|
<primary>subqueries</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Subqueries specifying a derived table must be enclosed in
|
|
parentheses and <emphasis>must</emphasis> be named using an AS
|
|
clause. (See <xref linkend="queries-table-aliases">.)
|
|
</para>
|
|
|
|
<programlisting>
|
|
FROM (SELECT * FROM table1) AS alias_name
|
|
</programlisting>
|
|
|
|
<para>
|
|
This example is equivalent to <literal>FROM table1 AS
|
|
alias_name</literal>. More interesting cases, which can't be
|
|
reduced to a plain join, arise when the subquery involves grouping
|
|
or aggregation.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="queries-table-aliases">
|
|
<title>Table and Column Aliases</title>
|
|
|
|
<indexterm zone="queries-table-aliases">
|
|
<primary>label</primary>
|
|
<secondary>table</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>alias</primary>
|
|
<see>label</see>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A temporary name can be given to tables and complex table
|
|
references to be used for references to the derived table in
|
|
further processing. This is called a <firstterm>table
|
|
alias</firstterm>.
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
|
|
</synopsis>
|
|
Here, <replaceable>alias</replaceable> can be any regular
|
|
identifier. The alias becomes the new name of the table
|
|
reference for the current query -- it is no longer possible to
|
|
refer to the table by the original name. Thus
|
|
<programlisting>
|
|
SELECT * FROM my_table AS m WHERE my_table.a > 5;
|
|
</programlisting>
|
|
is not valid SQL syntax. What will actually happen (this is a
|
|
<productname>PostgreSQL</productname> extension to the standard)
|
|
is that an implicit
|
|
table reference is added to the FROM clause, so the query is
|
|
processed as if it were written as
|
|
<programlisting>
|
|
SELECT * FROM my_table AS m, my_table AS my_table WHERE my_table.a > 5;
|
|
</programlisting>
|
|
Table aliases are mainly for notational convenience, but it is
|
|
necessary to use them when joining a table to itself, e.g.,
|
|
<programlisting>
|
|
SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
|
|
</programlisting>
|
|
Additionally, an alias is required if the table reference is a
|
|
subquery.
|
|
</para>
|
|
|
|
<para>
|
|
Parentheses are used to resolve ambiguities. The following
|
|
statement will assign the alias <literal>b</literal> to the
|
|
result of the join, unlike the previous example:
|
|
<programlisting>
|
|
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
|
|
</synopsis>
|
|
This form is equivalent to the previously treated one; the
|
|
<token>AS</token> key word is noise.
|
|
</para>
|
|
|
|
<para>
|
|
<synopsis>
|
|
FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
|
|
</synopsis>
|
|
In this form,
|
|
in addition to renaming the table as described above, the columns
|
|
of the table are also given temporary names for use by the surrounding
|
|
query. If fewer column
|
|
aliases are specified than the actual table has columns, the remaining
|
|
columns are not renamed. This syntax is especially useful for
|
|
self-joins or subqueries.
|
|
</para>
|
|
|
|
<para>
|
|
When an alias is applied to the output of a JOIN clause, using any of
|
|
these forms, the alias hides the original names within the JOIN.
|
|
For example,
|
|
<programlisting>
|
|
SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
|
|
</programlisting>
|
|
is valid SQL, but
|
|
<programlisting>
|
|
SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
|
|
</programlisting>
|
|
is not valid: the table alias A is not visible outside the alias C.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="queries-table-expression-examples">
|
|
<title>Examples</title>
|
|
|
|
<para>
|
|
<programlisting>
|
|
FROM T1 INNER JOIN T2 USING (C)
|
|
FROM T1 LEFT OUTER JOIN T2 USING (C)
|
|
FROM (T1 RIGHT OUTER JOIN T2 ON (T1.C1=T2.C1)) AS DT1
|
|
FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2)
|
|
|
|
FROM T1 NATURAL INNER JOIN T2
|
|
FROM T1 NATURAL LEFT OUTER JOIN T2
|
|
FROM T1 NATURAL RIGHT OUTER JOIN T2
|
|
FROM T1 NATURAL FULL OUTER JOIN T2
|
|
|
|
FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3
|
|
FROM (SELECT * FROM T1) DT1, T2, T3
|
|
</programlisting>
|
|
|
|
Above are some examples of joined tables and complex derived
|
|
tables. Notice how the AS clause renames or names a derived
|
|
table and how the optional comma-separated list of column names
|
|
that follows renames the columns. The last two
|
|
FROM clauses produce the same derived table from T1, T2, and T3.
|
|
The AS keyword was omitted in naming the subquery as DT1. The
|
|
keywords OUTER and INNER are noise that can be omitted also.
|
|
</para>
|
|
</sect3>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="queries-where">
|
|
<title>WHERE clause</title>
|
|
|
|
<indexterm zone="queries-where">
|
|
<primary>where</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The syntax of the WHERE clause is
|
|
<synopsis>
|
|
WHERE <replaceable>search_condition</replaceable>
|
|
</synopsis>
|
|
where <replaceable>search_condition</replaceable> is any value
|
|
expression as defined in <xref linkend="sql-expressions"> that
|
|
returns a value of type <type>boolean</type>.
|
|
</para>
|
|
|
|
<para>
|
|
After the processing of the FROM clause is done, each row of the
|
|
derived table is checked against the search condition. If the
|
|
result of the condition is true, the row is kept in the output
|
|
table, otherwise (that is, if the result is false or null) it is
|
|
discarded. The search condition typically references at least some
|
|
column in the table generated in the FROM clause; this is not
|
|
required, but otherwise the WHERE clause will be fairly useless.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Before the implementation of the JOIN syntax, it was necessary to
|
|
put the join condition of an inner join in the WHERE clause. For
|
|
example, these table expressions are equivalent:
|
|
<programlisting>
|
|
FROM a, b WHERE a.id = b.id AND b.val > 5
|
|
</programlisting>
|
|
and
|
|
<programlisting>
|
|
FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5
|
|
</programlisting>
|
|
or perhaps even
|
|
<programlisting>
|
|
FROM a NATURAL JOIN b WHERE b.val > 5
|
|
</programlisting>
|
|
Which one of these you use is mainly a matter of style. The JOIN
|
|
syntax in the FROM clause is probably not as portable to other
|
|
products. For outer joins there is no choice in any case: they
|
|
must be done in the FROM clause. A ON/USING clause of an outer join
|
|
is <emphasis>not</> equivalent to a WHERE condition, because it
|
|
determines the addition of rows (for unmatched input rows) as well
|
|
as the removal of rows from the final result.
|
|
</para>
|
|
</note>
|
|
|
|
<programlisting>
|
|
FROM FDT WHERE
|
|
C1 > 5
|
|
|
|
FROM FDT WHERE
|
|
C1 IN (1, 2, 3)
|
|
FROM FDT WHERE
|
|
C1 IN (SELECT C1 FROM T2)
|
|
FROM FDT WHERE
|
|
C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10)
|
|
|
|
FROM FDT WHERE
|
|
C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100
|
|
|
|
FROM FDT WHERE
|
|
EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1)
|
|
</programlisting>
|
|
|
|
<para>
|
|
In the examples above, <literal>FDT</literal> is the table derived
|
|
in the FROM clause. Rows that do not meet the search condition of
|
|
the where clause are eliminated from
|
|
<literal>FDT</literal>. Notice the use of scalar subqueries as
|
|
value expressions. Just like any other query, the subqueries can
|
|
employ complex table expressions. Notice how
|
|
<literal>FDT</literal> is referenced in the subqueries.
|
|
Qualifying <literal>C1</> as <literal>FDT.C1</> is only necessary
|
|
if <literal>C1</> is also the name of a column in the derived
|
|
input table of the subquery. Qualifying the column name adds
|
|
clarity even when it is not needed. This shows how the column
|
|
naming scope of an outer query extends into its inner queries.
|
|
</para>
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="queries-group">
|
|
<title>GROUP BY and HAVING clauses</title>
|
|
|
|
<indexterm zone="queries-group">
|
|
<primary>group</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After passing the WHERE filter, the derived input table may be
|
|
subject to grouping, using the GROUP BY clause, and elimination of
|
|
group rows using the HAVING clause.
|
|
</para>
|
|
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable>
|
|
FROM ...
|
|
<optional>WHERE ...</optional>
|
|
GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
|
|
</synopsis>
|
|
|
|
<para>
|
|
The GROUP BY clause is used to group together rows in a table that
|
|
share the same values in all the columns listed. The order in
|
|
which the columns are listed does not matter (as opposed to an
|
|
ORDER BY clause). The purpose is to reduce each group of rows
|
|
sharing common values into one group row that is representative of
|
|
all rows in the group. This is done to eliminate redundancy in
|
|
the output and/or obtain aggregates that apply to these groups.
|
|
</para>
|
|
|
|
<para>
|
|
Once a table is grouped, columns that are not used in the
|
|
grouping cannot be referenced except in aggregate expressions,
|
|
since a specific value in those columns is ambiguous - which row
|
|
in the group should it come from? The grouped-by columns can be
|
|
referenced in select list column expressions since they have a
|
|
known constant value per group. Aggregate functions on the
|
|
ungrouped columns provide values that span the rows of a group,
|
|
not of the whole table. For instance, a
|
|
<function>sum(sales)</function> on a table grouped by product code
|
|
gives the total sales for each product, not the total sales on all
|
|
products. Aggregates computed on the ungrouped columns are
|
|
representative of the group, whereas individual values of an ungrouped
|
|
column are not.
|
|
</para>
|
|
|
|
<para>
|
|
Example:
|
|
<programlisting>
|
|
SELECT pid, p.name, (sum(s.units) * p.price) AS sales
|
|
FROM products p LEFT JOIN sales s USING ( pid )
|
|
GROUP BY pid, p.name, p.price;
|
|
</programlisting>
|
|
In this example, the columns <literal>pid</literal>, <literal>p.name</literal>, and <literal>p.price</literal> must be in
|
|
the GROUP BY clause since they are referenced in the query select
|
|
list. The column s.units does not have to be in the GROUP BY list
|
|
since it is only used in an aggregate expression
|
|
(<function>sum()</function>), which represents the group of sales
|
|
of a product. For each product, a summary row is returned about
|
|
all sales of the product.
|
|
</para>
|
|
|
|
<para>
|
|
In strict SQL, GROUP BY can only group by columns of the source
|
|
table but <productname>PostgreSQL</productname> extends this to also allow GROUP BY to group by
|
|
select columns in the query select list. Grouping by value
|
|
expressions instead of simple column names is also allowed.
|
|
</para>
|
|
|
|
<para>
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
|
|
</synopsis>
|
|
If a table has been grouped using a GROUP BY clause, but then only
|
|
certain groups are of interest, the HAVING clause can be used,
|
|
much like a WHERE clause, to eliminate groups from a grouped
|
|
table. <productname>PostgreSQL</productname> allows a HAVING clause to be
|
|
used without a GROUP BY, in which case it acts like another WHERE
|
|
clause, but the point in using HAVING that way is not clear. A good
|
|
rule of thumb is that a HAVING condition should refer to the results
|
|
of aggregate functions. A restriction that does not involve an
|
|
aggregate is more efficiently expressed in the WHERE clause.
|
|
</para>
|
|
|
|
<para>
|
|
Example:
|
|
<programlisting>
|
|
SELECT pid AS "Products",
|
|
p.name AS "Over 5000",
|
|
(sum(s.units) * (p.price - p.cost)) AS "Past Month Profit"
|
|
FROM products p LEFT JOIN sales s USING ( pid )
|
|
WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
|
|
GROUP BY pid, p.name, p.price, p.cost
|
|
HAVING sum(p.price * s.units) > 5000;
|
|
</programlisting>
|
|
In the example above, the WHERE clause is selecting rows by a
|
|
column that is not grouped, while the HAVING clause
|
|
restricts the output to groups with total gross sales over 5000.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-select-lists">
|
|
<title>Select Lists</title>
|
|
|
|
<indexterm>
|
|
<primary>select</primary>
|
|
<secondary>select list</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
As shown in the previous section,
|
|
the table expression in the <command>SELECT</command> command
|
|
constructs an intermediate virtual table by possibly combining
|
|
tables, views, eliminating rows, grouping, etc. This table is
|
|
finally passed on to processing by the <firstterm>select list</firstterm>. The select
|
|
list determines which <emphasis>columns</emphasis> of the
|
|
intermediate table are actually output. The simplest kind of select list
|
|
is <literal>*</literal> which emits all columns that the table
|
|
expression produces. Otherwise, a select list is a comma-separated
|
|
list of value expressions (as defined in <xref
|
|
linkend="sql-expressions">). For instance, it could be a list of
|
|
column names:
|
|
<programlisting>
|
|
SELECT a, b, c FROM ...
|
|
</programlisting>
|
|
The columns names a, b, and c are either the actual names of the
|
|
columns of tables referenced in the FROM clause, or the aliases
|
|
given to them as explained in <xref linkend="queries-table-aliases">.
|
|
The name space available in the select list is the same as in the
|
|
WHERE clause (unless grouping is used, in which case it is the same
|
|
as in the HAVING clause). If more than one table has a column of
|
|
the same name, the table name must also be given, as in
|
|
<programlisting>
|
|
SELECT tbl1.a, tbl2.b, tbl1.c FROM ...
|
|
</programlisting>
|
|
(see also <xref linkend="queries-where">).
|
|
</para>
|
|
|
|
<para>
|
|
If an arbitrary value expression is used in the select list, it
|
|
conceptually adds a new virtual column to the returned table. The
|
|
value expression is evaluated once for each retrieved
|
|
row, with the row's values substituted for any column references. But
|
|
the expressions in the select list do not have to reference any
|
|
columns in the table expression of the FROM clause; they could be
|
|
constant arithmetic expressions as well, for instance.
|
|
</para>
|
|
|
|
<sect2 id="queries-column-labels">
|
|
<title>Column Labels</title>
|
|
|
|
<indexterm zone="queries-column-labels">
|
|
<primary>label</primary>
|
|
<secondary>column</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The entries in the select list can be assigned names for further
|
|
processing. The <quote>further processing</quote> in this case is
|
|
an optional sort specification and the client application (e.g.,
|
|
column headers for display). For example:
|
|
<programlisting>
|
|
SELECT a AS value, b + c AS sum FROM ...
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
If no output column name is specified via AS, the system assigns a
|
|
default name. For simple column references, this is the name of the
|
|
referenced column. For function
|
|
calls, this is the name of the function. For complex expressions,
|
|
the system will generate a generic name.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The naming of output columns here is different from that done in
|
|
the FROM clause (see <xref linkend="queries-table-aliases">). This
|
|
pipeline will in fact allow you to rename the same column twice,
|
|
but the name chosen in the select list is the one that will be
|
|
passed on.
|
|
</para>
|
|
</note>
|
|
</sect2>
|
|
|
|
<sect2 id="queries-distinct">
|
|
<title>DISTINCT</title>
|
|
|
|
<indexterm zone="queries-distinct">
|
|
<primary>distinct</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After the select list has been processed, the result table may
|
|
optionally be subject to the elimination of duplicates. The
|
|
<token>DISTINCT</token> key word is written directly after the
|
|
<token>SELECT</token> to enable this:
|
|
<synopsis>
|
|
SELECT DISTINCT <replaceable>select_list</replaceable> ...
|
|
</synopsis>
|
|
(Instead of <token>DISTINCT</token> the word <token>ALL</token>
|
|
can be used to select the default behavior of retaining all rows.)
|
|
</para>
|
|
|
|
<para>
|
|
Obviously, two rows are considered distinct if they differ in at
|
|
least one column value. Null values are considered equal in this
|
|
comparison.
|
|
</para>
|
|
|
|
<para>
|
|
Alternatively, an arbitrary expression can determine what rows are
|
|
to be considered distinct:
|
|
<synopsis>
|
|
SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
|
|
</synopsis>
|
|
Here <replaceable>expression</replaceable> is an arbitrary value
|
|
expression that is evaluated for all rows. A set of rows for
|
|
which all the expressions are equal are considered duplicates, and
|
|
only the first row of the set is kept in the output. Note that the
|
|
<quote>first row</quote> of a set is unpredictable unless the
|
|
query is sorted on enough columns to guarantee a unique ordering
|
|
of the rows arriving at the DISTINCT filter. (DISTINCT ON processing
|
|
occurs after ORDER BY sorting.)
|
|
</para>
|
|
|
|
<para>
|
|
The DISTINCT ON clause is not part of the SQL standard and is
|
|
sometimes considered bad style because of the potentially indeterminate
|
|
nature
|
|
of its results. With judicious use of GROUP BY and subselects in
|
|
FROM the construct can be avoided, but it is very often the most
|
|
convenient alternative.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="queries-union">
|
|
<title>Combining Queries</title>
|
|
|
|
<indexterm zone="queries-union">
|
|
<primary>union</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>intersection</primary>
|
|
</indexterm>
|
|
<indexterm zone="queries-union">
|
|
<primary>except</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The results of two queries can be combined using the set operations
|
|
union, intersection, and difference. The syntax is
|
|
<synopsis>
|
|
<replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
|
|
<replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
|
|
<replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
|
|
</synopsis>
|
|
<replaceable>query1</replaceable> and
|
|
<replaceable>query2</replaceable> are queries that can use any of
|
|
the features discussed up to this point. Set operations can also
|
|
be nested and chained, for example
|
|
<synopsis>
|
|
<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
|
|
</synopsis>
|
|
which really says
|
|
<synopsis>
|
|
(<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
|
|
</synopsis>
|
|
</para>
|
|
|
|
<para>
|
|
<command>UNION</command> effectively appends the result of
|
|
<replaceable>query2</replaceable> to the result of
|
|
<replaceable>query1</replaceable> (although there is no guarantee
|
|
that this is the order in which the rows are actually returned).
|
|
Furthermore, it eliminates all duplicate rows, in the sense of DISTINCT,
|
|
unless ALL is specified.
|
|
</para>
|
|
|
|
<para>
|
|
<command>INTERSECT</command> returns all rows that are both in the
|
|
result of <replaceable>query1</replaceable> and in the result of
|
|
<replaceable>query2</replaceable>. Duplicate rows are eliminated
|
|
unless ALL is specified.
|
|
</para>
|
|
|
|
<para>
|
|
<command>EXCEPT</command> returns all rows that are in the result
|
|
of <replaceable>query1</replaceable> but not in the result of
|
|
<replaceable>query2</replaceable>. Again, duplicates are
|
|
eliminated unless ALL is specified.
|
|
</para>
|
|
|
|
<para>
|
|
In order to calculate the union, intersection, or difference of two
|
|
queries, the two queries must be <quote>union compatible</quote>,
|
|
which means that they both return the same number of columns, and
|
|
that the corresponding columns have compatible data types, as
|
|
described in <xref linkend="typeconv-union-case">.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="queries-order">
|
|
<title>Sorting Rows</title>
|
|
|
|
<indexterm zone="queries-order">
|
|
<primary>sorting</primary>
|
|
<secondary>query results</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After a query has produced an output table (after the select list
|
|
has been processed) it can optionally be sorted. If sorting is not
|
|
chosen, the rows will be returned in random order. The actual
|
|
order in that case will depend on the scan and join plan types and
|
|
the order on disk, but it must not be relied on. A particular
|
|
output ordering can only be guaranteed if the sort step is explicitly
|
|
chosen.
|
|
</para>
|
|
|
|
<para>
|
|
The ORDER BY clause specifies the sort order:
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable>
|
|
FROM <replaceable>table_expression</replaceable>
|
|
ORDER BY <replaceable>column1</replaceable> <optional>ASC | DESC</optional> <optional>, <replaceable>column2</replaceable> <optional>ASC | DESC</optional> ...</optional>
|
|
</synopsis>
|
|
<replaceable>column1</replaceable>, etc., refer to select list
|
|
columns. These can be either the output name of a column (see
|
|
<xref linkend="queries-column-labels">) or the number of a column. Some
|
|
examples:
|
|
<programlisting>
|
|
SELECT a, b FROM table1 ORDER BY a;
|
|
SELECT a + b AS sum, c FROM table1 ORDER BY sum;
|
|
SELECT a, sum(b) FROM table1 GROUP BY a ORDER BY 1;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
As an extension to the SQL standard, <productname>PostgreSQL</productname> also allows ordering
|
|
by arbitrary expressions:
|
|
<programlisting>
|
|
SELECT a, b FROM table1 ORDER BY a + b;
|
|
</programlisting>
|
|
References to column names in the FROM clause that are renamed in
|
|
the select list are also allowed:
|
|
<programlisting>
|
|
SELECT a AS b FROM table1 ORDER BY a;
|
|
</programlisting>
|
|
But these extensions do not work in queries involving UNION, INTERSECT,
|
|
or EXCEPT, and are not portable to other <acronym>DBMS</acronym>.
|
|
</para>
|
|
|
|
<para>
|
|
Each column specification may be followed by an optional <token>ASC</token> or
|
|
<token>DESC</token> to set the sort direction. <token>ASC</token> is default. Ascending order
|
|
puts smaller values first, where <quote>smaller</quote> is defined
|
|
in terms of the <literal><</literal> operator. Similarly,
|
|
descending order is determined with the <literal>></literal>
|
|
operator.
|
|
</para>
|
|
|
|
<para>
|
|
If more than one sort column is specified, the later entries are
|
|
used to sort rows that are equal under the order imposed by the
|
|
earlier sort specifications.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="queries-limit">
|
|
<title>LIMIT and OFFSET</title>
|
|
|
|
<indexterm zone="queries-limit">
|
|
<primary>limit</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="queries-limit">
|
|
<primary>offset</primary>
|
|
<secondary>with query results</secondary>
|
|
</indexterm>
|
|
|
|
<synopsis>
|
|
SELECT <replaceable>select_list</replaceable>
|
|
FROM <replaceable>table_expression</replaceable>
|
|
<optional>LIMIT { <replaceable>number</replaceable> | ALL }</optional> <optional>OFFSET <replaceable>number</replaceable></optional>
|
|
</synopsis>
|
|
|
|
<para>
|
|
LIMIT allows you to retrieve just a portion of the rows that are
|
|
generated by the rest of the query. If a limit count is given, no
|
|
more than that many rows will be returned.
|
|
LIMIT ALL is the same as omitting a LIMIT clause.
|
|
</para>
|
|
|
|
<para>
|
|
OFFSET says to skip that many rows before beginning to return rows
|
|
to the client. OFFSET 0 is the same as omitting an OFFSET clause.
|
|
If both OFFSET and LIMIT appear, then OFFSET rows are skipped before
|
|
starting to count the LIMIT rows that are returned.
|
|
</para>
|
|
|
|
<para>
|
|
When using LIMIT, it is a good idea to use an ORDER BY clause that
|
|
constrains the result rows into a unique order. Otherwise you will
|
|
get an unpredictable subset of the query's rows---you may be asking
|
|
for the tenth through twentieth rows, but tenth through twentieth
|
|
in what ordering? The ordering is unknown, unless you specified
|
|
ORDER BY.
|
|
</para>
|
|
|
|
<para>
|
|
The query optimizer takes LIMIT into account when generating a
|
|
query plan, so you are very likely to get different plans (yielding
|
|
different row orders) depending on what you give for LIMIT and
|
|
OFFSET. Thus, using different LIMIT/OFFSET values to select
|
|
different subsets of a query result <emphasis>will give
|
|
inconsistent results</emphasis> unless you enforce a predictable
|
|
result ordering with ORDER BY. This is not a bug; it is an
|
|
inherent consequence of the fact that SQL does not promise to
|
|
deliver the results of a query in any particular order unless ORDER
|
|
BY is used to constrain the order.
|
|
</para>
|
|
</sect1>
|
|
|
|
</chapter>
|