mirror of
https://github.com/postgres/postgres.git
synced 2025-05-03 22:24:49 +03:00
569 lines
18 KiB
Plaintext
569 lines
18 KiB
Plaintext
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/array.sgml,v 1.31 2003/08/31 17:32:18 petere Exp $ -->
|
|
|
|
<sect1 id="arrays">
|
|
<title>Arrays</title>
|
|
|
|
<indexterm>
|
|
<primary>array</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> allows columns of a table to be
|
|
defined as variable-length multidimensional arrays. Arrays of any
|
|
built-in type or user-defined type can be created.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>Declaration of Array Types</title>
|
|
|
|
<para>
|
|
To illustrate the use of array types, we create this table:
|
|
<programlisting>
|
|
CREATE TABLE sal_emp (
|
|
name text,
|
|
pay_by_quarter integer[],
|
|
schedule text[][]
|
|
);
|
|
</programlisting>
|
|
As shown, an array data type is named by appending square brackets
|
|
(<literal>[]</>) to the data type name of the array elements. The
|
|
above command will create a table named
|
|
<structname>sal_emp</structname> with a column of type
|
|
<type>text</type> (<structfield>name</structfield>), a
|
|
one-dimensional array of type <type>integer</type>
|
|
(<structfield>pay_by_quarter</structfield>), which represents the
|
|
employee's salary by quarter, and a two-dimensional array of
|
|
<type>text</type> (<structfield>schedule</structfield>), which
|
|
represents the employee's weekly schedule.
|
|
</para>
|
|
|
|
<para>
|
|
The syntax for <command>CREATE TABLE</command> allows the exact size of
|
|
arrays to be specified, for example:
|
|
|
|
<programlisting>
|
|
CREATE TABLE tictactoe (
|
|
squares integer[3][3]
|
|
);
|
|
</programlisting>
|
|
|
|
However, the current implementation does not enforce the array size
|
|
limits --- the behavior is the same as for arrays of unspecified
|
|
length.
|
|
</para>
|
|
|
|
<para>
|
|
Actually, the current implementation does not enforce the declared
|
|
number of dimensions either. Arrays of a particular element type are
|
|
all considered to be of the same type, regardless of size or number
|
|
of dimensions. So, declaring number of dimensions or sizes in
|
|
<command>CREATE TABLE</command> is simply documentation, it does not
|
|
affect runtime behavior.
|
|
</para>
|
|
|
|
<para>
|
|
An alternative, SQL99-standard syntax may be used for one-dimensional arrays.
|
|
<structfield>pay_by_quarter</structfield> could have been defined as:
|
|
<programlisting>
|
|
pay_by_quarter integer ARRAY[4],
|
|
</programlisting>
|
|
This syntax requires an integer constant to denote the array size.
|
|
As before, however, <productname>PostgreSQL</> does not enforce the
|
|
size restriction.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Array Value Input</title>
|
|
|
|
<para>
|
|
Now we can show some <command>INSERT</command> statements. To write an array
|
|
value as a literal constant, we enclose the element values within curly
|
|
braces and separate them by commas. (If you know C, this is not unlike the
|
|
C syntax for initializing structures.) We may put double quotes around any
|
|
element value, and must do so if it contains commas or curly braces.
|
|
(More details appear below.)
|
|
|
|
<programlisting>
|
|
INSERT INTO sal_emp
|
|
VALUES ('Bill',
|
|
'{10000, 10000, 10000, 10000}',
|
|
'{{"meeting", "lunch"}, {}}');
|
|
|
|
INSERT INTO sal_emp
|
|
VALUES ('Carol',
|
|
'{20000, 25000, 25000, 25000}',
|
|
'{{"talk", "consult"}, {"meeting"}}');
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
A limitation of the present array implementation is that individual
|
|
elements of an array cannot be SQL null values. The entire array can be set
|
|
to null, but you can't have an array with some elements null and some
|
|
not.
|
|
</para>
|
|
<para>
|
|
This can lead to surprising results. For example, the result of the
|
|
previous two inserts looks like this:
|
|
<programlisting>
|
|
SELECT * FROM sal_emp;
|
|
name | pay_by_quarter | schedule
|
|
-------+---------------------------+--------------------
|
|
Bill | {10000,10000,10000,10000} | {{meeting},{""}}
|
|
Carol | {20000,25000,25000,25000} | {{talk},{meeting}}
|
|
(2 rows)
|
|
</programlisting>
|
|
Because the <literal>[2][2]</literal> element of
|
|
<structfield>schedule</structfield> is missing in each of the
|
|
<command>INSERT</command> statements, the <literal>[1][2]</literal>
|
|
element is discarded.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Fixing this is on the to-do list.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
The <literal>ARRAY</literal> expression syntax may also be used:
|
|
<programlisting>
|
|
INSERT INTO sal_emp
|
|
VALUES ('Bill',
|
|
ARRAY[10000, 10000, 10000, 10000],
|
|
ARRAY[['meeting', 'lunch'], ['','']]);
|
|
|
|
INSERT INTO sal_emp
|
|
VALUES ('Carol',
|
|
ARRAY[20000, 25000, 25000, 25000],
|
|
ARRAY[['talk', 'consult'], ['meeting', '']]);
|
|
SELECT * FROM sal_emp;
|
|
name | pay_by_quarter | schedule
|
|
-------+---------------------------+-------------------------------
|
|
Bill | {10000,10000,10000,10000} | {{meeting,lunch},{"",""}}
|
|
Carol | {20000,25000,25000,25000} | {{talk,consult},{meeting,""}}
|
|
(2 rows)
|
|
</programlisting>
|
|
Note that with this syntax, multidimensional arrays must have matching
|
|
extents for each dimension. A mismatch causes an error report, rather than
|
|
silently discarding values as in the previous case.
|
|
For example:
|
|
<programlisting>
|
|
INSERT INTO sal_emp
|
|
VALUES ('Carol',
|
|
ARRAY[20000, 25000, 25000, 25000],
|
|
ARRAY[['talk', 'consult'], ['meeting']]);
|
|
ERROR: multidimensional arrays must have array expressions with matching dimensions
|
|
</programlisting>
|
|
Also notice that the array elements are ordinary SQL constants or
|
|
expressions; for instance, string literals are single quoted, instead of
|
|
double quoted as they would be in an array literal. The <literal>ARRAY</>
|
|
expression syntax is discussed in more detail in <xref
|
|
linkend="sql-syntax-array-constructors">.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Accessing Arrays</title>
|
|
|
|
<para>
|
|
Now, we can run some queries on the table.
|
|
First, we show how to access a single element of an array at a time.
|
|
This query retrieves the names of the employees whose pay changed in
|
|
the second quarter:
|
|
|
|
<programlisting>
|
|
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];
|
|
|
|
name
|
|
-------
|
|
Carol
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
The array subscript numbers are written within square brackets.
|
|
By default <productname>PostgreSQL</productname> uses the
|
|
one-based numbering convention for arrays, that is,
|
|
an array of <replaceable>n</> elements starts with <literal>array[1]</literal> and
|
|
ends with <literal>array[<replaceable>n</>]</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
This query retrieves the third quarter pay of all employees:
|
|
|
|
<programlisting>
|
|
SELECT pay_by_quarter[3] FROM sal_emp;
|
|
|
|
pay_by_quarter
|
|
----------------
|
|
10000
|
|
25000
|
|
(2 rows)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
We can also access arbitrary rectangular slices of an array, or
|
|
subarrays. An array slice is denoted by writing
|
|
<literal><replaceable>lower-bound</replaceable>:<replaceable>upper-bound</replaceable></literal>
|
|
for one or more array dimensions. For example, this query retrieves the first
|
|
item on Bill's schedule for the first two days of the week:
|
|
|
|
<programlisting>
|
|
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
|
|
|
|
schedule
|
|
--------------------
|
|
{{meeting},{""}}
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
We could also have written
|
|
|
|
<programlisting>
|
|
SELECT schedule[1:2][1] FROM sal_emp WHERE name = 'Bill';
|
|
</programlisting>
|
|
|
|
with the same result. An array subscripting operation is always taken to
|
|
represent an array slice if any of the subscripts are written in the form
|
|
<literal><replaceable>lower</replaceable>:<replaceable>upper</replaceable></literal>.
|
|
A lower bound of 1 is assumed for any subscript where only one value
|
|
is specified, as in this example:
|
|
<programlisting>
|
|
SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';
|
|
schedule
|
|
---------------------------
|
|
{{meeting,lunch},{"",""}}
|
|
(1 row)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
The current dimensions of any array value can be retrieved with the
|
|
<function>array_dims</function> function:
|
|
|
|
<programlisting>
|
|
SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';
|
|
|
|
array_dims
|
|
------------
|
|
[1:2][1:1]
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
<function>array_dims</function> produces a <type>text</type> result,
|
|
which is convenient for people to read but perhaps not so convenient
|
|
for programs. Dimensions can also be retrieved with
|
|
<function>array_upper</function> and <function>array_lower</function>,
|
|
which return the upper and lower bound of a
|
|
specified array dimension, respectively.
|
|
|
|
<programlisting>
|
|
SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';
|
|
|
|
array_upper
|
|
-------------
|
|
2
|
|
(1 row)
|
|
</programlisting>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Modifying Arrays</title>
|
|
|
|
<para>
|
|
An array value can be replaced completely:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
|
|
WHERE name = 'Carol';
|
|
</programlisting>
|
|
|
|
or using the <literal>ARRAY</literal> expression syntax:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
|
|
WHERE name = 'Carol';
|
|
</programlisting>
|
|
|
|
An array may also be updated at a single element:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter[4] = 15000
|
|
WHERE name = 'Bill';
|
|
</programListing>
|
|
|
|
or updated in a slice:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
|
|
WHERE name = 'Carol';
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
<para>
|
|
A stored array value can be enlarged by assigning to an element adjacent to
|
|
those already present, or by assigning to a slice that is adjacent
|
|
to or overlaps the data already present. For example, if array
|
|
<literal>myarray</> currently has 4 elements, it will have five
|
|
elements after an update that assigns to <literal>myarray[5]</>.
|
|
Currently, enlargement in this fashion is only allowed for one-dimensional
|
|
arrays, not multidimensional arrays.
|
|
</para>
|
|
|
|
<para>
|
|
Array slice assignment allows creation of arrays that do not use one-based
|
|
subscripts. For example one might assign to <literal>myarray[-2:7]</> to
|
|
create an array with subscript values running from -2 to 7.
|
|
</para>
|
|
|
|
<para>
|
|
New array values can also be constructed by using the concatenation operator,
|
|
<literal>||</literal>.
|
|
<programlisting>
|
|
SELECT ARRAY[1,2] || ARRAY[3,4];
|
|
?column?
|
|
-----------
|
|
{1,2,3,4}
|
|
(1 row)
|
|
|
|
SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
|
|
?column?
|
|
---------------------
|
|
{{5,6},{1,2},{3,4}}
|
|
(1 row)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
The concatenation operator allows a single element to be pushed on to the
|
|
beginning or end of a one-dimensional array. It also accepts two
|
|
<replaceable>N</>-dimensional arrays, or an <replaceable>N</>-dimensional
|
|
and an <replaceable>N+1</>-dimensional array.
|
|
</para>
|
|
|
|
<para>
|
|
When a single element is pushed on to the beginning of a one-dimensional
|
|
array, the result is an array with a lower bound subscript equal to
|
|
the righthand operand's lower bound subscript, minus one. When a single
|
|
element is pushed on to the end of a one-dimensional array, the result is
|
|
an array retaining the lower bound of the lefthand operand. For example:
|
|
<programlisting>
|
|
SELECT array_dims(1 || ARRAY[2,3]);
|
|
array_dims
|
|
------------
|
|
[0:2]
|
|
(1 row)
|
|
|
|
SELECT array_dims(ARRAY[1,2] || 3);
|
|
array_dims
|
|
------------
|
|
[1:3]
|
|
(1 row)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
When two arrays with an equal number of dimensions are concatenated, the
|
|
result retains the lower bound subscript of the lefthand operand's outer
|
|
dimension. The result is an array comprising every element of the lefthand
|
|
operand followed by every element of the righthand operand. For example:
|
|
<programlisting>
|
|
SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);
|
|
array_dims
|
|
------------
|
|
[1:5]
|
|
(1 row)
|
|
|
|
SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);
|
|
array_dims
|
|
------------
|
|
[1:5][1:2]
|
|
(1 row)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
When an <replaceable>N</>-dimensional array is pushed on to the beginning
|
|
or end of an <replaceable>N+1</>-dimensional array, the result is
|
|
analogous to the element-array case above. Each <replaceable>N</>-dimensional
|
|
sub-array is essentially an element of the <replaceable>N+1</>-dimensional
|
|
array's outer dimension. For example:
|
|
<programlisting>
|
|
SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);
|
|
array_dims
|
|
------------
|
|
[0:2][1:2]
|
|
(1 row)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
An array can also be constructed by using the functions
|
|
<function>array_prepend</function>, <function>array_append</function>,
|
|
or <function>array_cat</function>. The first two only support one-dimensional
|
|
arrays, but <function>array_cat</function> supports multidimensional arrays.
|
|
|
|
Note that the concatenation operator discussed above is preferred over
|
|
direct use of these functions. In fact, the functions are primarily for use
|
|
in implementing the concatenation operator. However, they may be directly
|
|
useful in the creation of user-defined aggregates. Some examples:
|
|
|
|
<programlisting>
|
|
SELECT array_prepend(1, ARRAY[2,3]);
|
|
array_prepend
|
|
---------------
|
|
{1,2,3}
|
|
(1 row)
|
|
|
|
SELECT array_append(ARRAY[1,2], 3);
|
|
array_append
|
|
--------------
|
|
{1,2,3}
|
|
(1 row)
|
|
|
|
SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
|
|
array_cat
|
|
-----------
|
|
{1,2,3,4}
|
|
(1 row)
|
|
|
|
SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
|
|
array_cat
|
|
---------------------
|
|
{{1,2},{3,4},{5,6}}
|
|
(1 row)
|
|
|
|
SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
|
|
array_cat
|
|
---------------------
|
|
{{5,6},{1,2},{3,4}}
|
|
</programlisting>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Searching in Arrays</title>
|
|
|
|
<para>
|
|
To search for a value in an array, you must check each value of the
|
|
array. This can be done by hand, if you know the size of the array.
|
|
For example:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
|
|
pay_by_quarter[2] = 10000 OR
|
|
pay_by_quarter[3] = 10000 OR
|
|
pay_by_quarter[4] = 10000;
|
|
</programlisting>
|
|
|
|
However, this quickly becomes tedious for large arrays, and is not
|
|
helpful if the size of the array is uncertain. An alternative method is
|
|
described in <xref linkend="functions-comparisons">. The above
|
|
query could be replaced by:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
|
|
</programlisting>
|
|
|
|
In addition, you could find rows where the array had all values
|
|
equal to 10000 with:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
Arrays are not sets; searching for specific array elements
|
|
may be a sign of database misdesign. Consider
|
|
using a separate table with a row for each item that would be an
|
|
array element. This will be easier to search, and is likely to
|
|
scale up better to large numbers of elements.
|
|
</para>
|
|
</tip>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Array Input and Output Syntax</title>
|
|
|
|
<para>
|
|
The external text representation of an array value consists of items that
|
|
are interpreted according to the I/O conversion rules for the array's
|
|
element type, plus decoration that indicates the array structure.
|
|
The decoration consists of curly braces (<literal>{</> and <literal>}</>)
|
|
around the array value plus delimiter characters between adjacent items.
|
|
The delimiter character is usually a comma (<literal>,</>) but can be
|
|
something else: it is determined by the <literal>typdelim</> setting
|
|
for the array's element type. (Among the standard data types provided
|
|
in the <productname>PostgreSQL</productname> distribution, type
|
|
<literal>box</> uses a semicolon (<literal>;</>) but all the others
|
|
use comma.) In a multidimensional array, each dimension (row, plane,
|
|
cube, etc.) gets its own level of curly braces, and delimiters
|
|
must be written between adjacent curly-braced entities of the same level.
|
|
You may write whitespace before a left brace, after a right
|
|
brace, or before any individual item string. Whitespace after an item
|
|
is not ignored, however: after skipping leading whitespace, everything
|
|
up to the next right brace or delimiter is taken as the item value.
|
|
</para>
|
|
|
|
<para>
|
|
As shown previously, when writing an array value you may write double
|
|
quotes around any individual array
|
|
element. You <emphasis>must</> do so if the element value would otherwise
|
|
confuse the array-value parser. For example, elements containing curly
|
|
braces, commas (or whatever the delimiter character is), double quotes,
|
|
backslashes, or leading white space must be double-quoted. To put a double
|
|
quote or backslash in a quoted array element value, precede it with a
|
|
backslash.
|
|
Alternatively, you can use backslash-escaping to protect all data characters
|
|
that would otherwise be taken as array syntax or ignorable white space.
|
|
</para>
|
|
|
|
<para>
|
|
The array output routine will put double quotes around element values
|
|
if they are empty strings or contain curly braces, delimiter characters,
|
|
double quotes, backslashes, or white space. Double quotes and backslashes
|
|
embedded in element values will be backslash-escaped. For numeric
|
|
data types it is safe to assume that double quotes will never appear, but
|
|
for textual data types one should be prepared to cope with either presence
|
|
or absence of quotes. (This is a change in behavior from pre-7.2
|
|
<productname>PostgreSQL</productname> releases.)
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Remember that what you write in an SQL command will first be interpreted
|
|
as a string literal, and then as an array. This doubles the number of
|
|
backslashes you need. For example, to insert a <type>text</> array
|
|
value containing a backslash and a double quote, you'd need to write
|
|
<programlisting>
|
|
INSERT ... VALUES ('{"\\\\","\\""}');
|
|
</programlisting>
|
|
The string-literal processor removes one level of backslashes, so that
|
|
what arrives at the array-value parser looks like <literal>{"\\","\""}</>.
|
|
In turn, the strings fed to the <type>text</> data type's input routine
|
|
become <literal>\</> and <literal>"</> respectively. (If we were working
|
|
with a data type whose input routine also treated backslashes specially,
|
|
<type>bytea</> for example, we might need as many as eight backslashes
|
|
in the command to get one backslash into the stored array element.)
|
|
</para>
|
|
</note>
|
|
|
|
<tip>
|
|
<para>
|
|
The <literal>ARRAY</> constructor syntax is often easier to work with
|
|
than the array-literal syntax when writing array values in SQL commands.
|
|
In <literal>ARRAY</>, individual element values are written the same way
|
|
they would be written when not members of an array.
|
|
</para>
|
|
</tip>
|
|
</sect2>
|
|
|
|
</sect1>
|