1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Support XMLTABLE query expression

XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
This commit is contained in:
Alvaro Herrera
2017-03-08 12:39:37 -03:00
parent 270d7dd8a5
commit fcec6caafa
52 changed files with 4606 additions and 50 deletions

View File

@ -10332,7 +10332,8 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
To process values of data type <type>xml</type>, PostgreSQL offers
the functions <function>xpath</function> and
<function>xpath_exists</function>, which evaluate XPath 1.0
expressions.
expressions, and the <function>XMLTABLE</function>
table function.
</para>
<sect3 id="functions-xml-processing-xpath">
@ -10430,6 +10431,206 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
--------------
t
(1 row)
]]></screen>
</para>
</sect3>
<sect3 id="functions-xml-processing-xmltable">
<title><literal>xmltable</literal></title>
<indexterm>
<primary>xmltable</primary>
</indexterm>
<indexterm zone="functions-xml-processing-xmltable">
<primary>table function</primary>
<secondary>XMLTABLE</secondary>
</indexterm>
<synopsis>
<function>xmltable</function>( <optional>XMLNAMESPACES(<replaceable>namespace uri</replaceable> AS <replaceable>namespace name</replaceable><optional>, ...</optional>)</optional>
<replaceable>row_expression</replaceable> PASSING <optional>BY REF</optional> <replaceable>document_expression</replaceable> <optional>BY REF</optional>
COLUMNS <replaceable>name</replaceable> { <replaceable>type</replaceable> <optional>PATH <replaceable>column_expression</replaceable></optional> <optional>DEFAULT <replaceable>default_expression</replaceable></optional> <optional>NOT NULL | NULL</optional>
| FOR ORDINALITY }
<optional>, ...</optional>
)
</synopsis>
<para>
The <function>xmltable</function> function produces a table based
on the given XML value, an XPath filter to extract rows, and an
optional set of column definitions.
</para>
<para>
The optional <literal>XMLNAMESPACES</> clause is a comma-separated
list of namespaces. It specifies the XML namespaces used in
the document and their aliases. A default namespace specification
is not currently supported.
</para>
<para>
The required <replaceable>row_expression</> argument is an XPath
expression that is evaluated against the supplied XML document to
obtain an ordered sequence of XML nodes. This sequence is what
<function>xmltable</> transforms into output rows.
</para>
<para>
<replaceable>document_expression</> provides the XML document to
operate on.
The <literal>BY REF</literal> clauses have no effect in PostgreSQL,
but are allowed for SQL conformance and compatibility with other
implementations.
The argument must be a well-formed XML document; fragments/forests
are not accepted.
</para>
<para>
The mandatory <literal>COLUMNS</literal> clause specifies the list
of columns in the output table.
If the <literal>COLUMNS</> clause is omitted, the rows in the result
set contain a single column of type <literal>xml</> containing the
data matched by <replaceable>row_expression</>.
If <literal>COLUMNS</literal> is specified, each entry describes a
single column.
See the syntax summary above for the format.
The column name and type are required; the path, default and
nullability clauses are optional.
</para>
<para>
A column marked <literal>FOR ORDINALITY</literal> will be populated
with row numbers matching the order in which the
output rows appeared in the original input XML document.
At most one column may be marked <literal>FOR ORDINALITY</literal>.
</para>
<para>
The <literal>column_expression</> for a column is an XPath expression
that is evaluated for each row, relative to the result of the
<replaceable>row_expression</>, to find the value of the column.
If no <literal>column_expression</> is given, then the column name
is used as an implicit path.
</para>
<para>
If a column's XPath expression returns multiple elements, an error
is raised.
If the expression matches an empty tag, the result is an
empty string (not <literal>NULL</>).
Any <literal>xsi:nil</> attributes are ignored.
</para>
<para>
The text body of the XML matched by the <replaceable>column_expression</>
is used as the column value. Multiple <literal>text()</literal> nodes
within an element are concatenated in order. Any child elements,
processing instructions, and comments are ignored, but the text contents
of child elements are concatenated to the result.
Note that the whitespace-only <literal>text()</> node between two non-text
elements is preserved, and that leading whitespace on a <literal>text()</>
node is not flattened.
</para>
<para>
If the path expression does not match for a given row but
<replaceable>default_expression</> is specified, the value resulting
from evaluating that expression is used.
If no <literal>DEFAULT</> clause is given for the column,
the field will be set to <literal>NULL</>.
It is possible for a <replaceable>default_expression</> to reference
the value of output columns that appear prior to it in the column list,
so the default of one column may be based on the value of another
column.
</para>
<para>
Columns may be marked <literal>NOT NULL</>. If the
<replaceable>column_expression</> for a <literal>NOT NULL</> column
does not match anything and there is no <literal>DEFAULT</> or the
<replaceable>default_expression</> also evaluates to null, an error
is reported.
</para>
<para>
Unlike regular PostgreSQL functions, <replaceable>column_expression</>
and <replaceable>default_expression</> are not evaluated to a simple
value before calling the function.
<replaceable>column_expression</> is normally evaluated
exactly once per input row, and <replaceable>default_expression</>
is evaluated each time a default is needed for a field.
If the expression qualifies as stable or immutable the repeat
evaluation may be skipped.
Effectively <function>xmltable</> behaves more like a subquery than a
function call.
This means that you can usefully use volatile functions like
<function>nextval</> in <replaceable>default_expression</>, and
<replaceable>column_expression</> may depend on other parts of the
XML document.
</para>
<para>
Examples:
<screen><![CDATA[
CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>
<ROW id="1">
<COUNTRY_ID>AU</COUNTRY_ID>
<COUNTRY_NAME>Australia</COUNTRY_NAME>
</ROW>
<ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>
<COUNTRY_NAME>Japan</COUNTRY_NAME>
<PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
<SIZE unit="sq_mi">145935</SIZE>
</ROW>
<ROW id="6">
<COUNTRY_ID>SG</COUNTRY_ID>
<COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZE unit="sq_km">697</SIZE>
</ROW>
</ROWS>
$$ AS data;
SELECT xmltable.*
FROM xmldata,
XMLTABLE('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
"COUNTRY_NAME" text,
country_id text PATH 'COUNTRY_ID',
size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
size_other text PATH
'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified') ;
id | ordinality | COUNTRY_NAME | country_id | size_sq_km | size_other | premier_name
----+------------+--------------+------------+------------+--------------+---------------
1 | 1 | Australia | AU | | | not specified
5 | 2 | Japan | JP | | 145935 sq_mi | Shinzo Abe
6 | 3 | Singapore | SG | 697 | | not specified
]]></screen>
The following example shows concatenation of multiple text() nodes,
usage of the column name as XPath filter, and the treatment of whitespace,
XML comments and processing instructions:
<screen><![CDATA[
CREATE TABLE xmlelements AS SELECT
xml $$
<root>
<element> Hello<!-- xyxxz -->2a2<?aaaaa?> <!--x--> bbb<x>xxx</x>CC </element>
</root>
$$ AS data;
SELECT xmltable.*
FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
element
----------------------
Hello2a2 bbbCC
]]></screen>
</para>
</sect3>