mirror of https://github.com/postgres/postgres.git synced 2025-06-14 18:42:34 +03:00

Accept XML documents when xmloption = content, as required by SQL:2006+.

Previously we were using the SQL:2003 definition, which doesn't allow
this, but that creates a serious dump/restore gotcha: there is no
setting of xmloption that will allow all valid XML data.  Hence,
switch to the 2006 definition.

Since libxml doesn't accept <!DOCTYPE> directives in the mode we
use for CONTENT parsing, the implementation is to detect <!DOCTYPE>
in the input and switch to DOCUMENT parsing mode.  This should not
cost much, because <!DOCTYPE> should be close to the front of the
input if it's there at all.  It's possible that this causes the
error messages for malformed input to be slightly different than
they were before, if said input includes <!DOCTYPE>; but that does
not seem like a big problem.
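
The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name starts_with_doctype is hypothetical, and the sketch assumes that only whitespace, comments, and processing instructions (including an XML declaration) may precede <!DOCTYPE>. Because it stops at the first construct that is none of those, it only examines the front of the input.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not the function added by the commit): report whether
 * a NUL-terminated input has a <!DOCTYPE> directive in its prolog.
 * Assumption: only whitespace, comments, and processing instructions may
 * appear before the directive.
 */
bool
starts_with_doctype(const char *input)
{
    const char *p = input;

    assert(input != NULL);

    for (;;)
    {
        const char *end;

        /* Skip whitespace between prolog items. */
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
            p++;

        if (strncmp(p, "<!DOCTYPE", 9) == 0)
            return true;        /* caller would switch to DOCUMENT parsing */

        if (strncmp(p, "<!--", 4) == 0)
        {
            /* Skip a comment; an unterminated one ends the scan. */
            end = strstr(p + 4, "-->");
            if (end == NULL)
                return false;
            p = end + 3;
        }
        else if (strncmp(p, "<?", 2) == 0)
        {
            /* Skip a processing instruction or XML declaration. */
            end = strstr(p + 2, "?>");
            if (end == NULL)
                return false;
            p = end + 2;
        }
        else
            return false;       /* element, text, or end of input */
    }
}
```

A caller parsing in CONTENT mode could consult such a check first and fall back to DOCUMENT parsing only when it returns true, leaving all other input on the existing path.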

In passing, buy back a few cycles in parsing of large XML documents
by not doing strlen() of the whole input in parse_xml_decl().
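
As a rough illustration of why skipping the full-length scan helps, a prefix test can be written so that its cost is bounded by the length of the prefix rather than of the input. The sketch below does not reproduce parse_xml_decl(); the function name has_xml_decl_prefix is hypothetical, and a real check would also need to verify that the character following the prefix cannot continue a name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from parse_xml_decl()): test whether a
 * NUL-terminated input begins with "<?xml".  strncmp() stops at the first
 * mismatch or NUL byte, so at most five characters are examined no matter
 * how long the input is, whereas strlen() would walk the entire string.
 */
bool
has_xml_decl_prefix(const char *input)
{
    assert(input != NULL);

    return strncmp(input, "<?xml", 5) == 0;
}
```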

Back-patch because dump/restore failures are not nice.  This change
shouldn't break any cases that worked before, so it seems safe to
back-patch.

Chapman Flack (revised a bit by me)

Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
Tom Lane
2019-03-23 16:24:30 -04:00
parent 05f110cc0b
commit 8d1dadb25b
6 changed files with 271 additions and 29 deletions

@@ -4208,9 +4208,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
 <para>
 The <type>xml</type> type can store well-formed
 <quote>documents</quote>, as defined by the XML standard, as well
-as <quote>content</quote> fragments, which are defined by the
-production <literal>XMLDecl? content</literal> in the XML
-standard. Roughly, this means that content fragments can have
+as <quote>content</quote> fragments, which are defined by reference
+to the more permissive
+<ulink url="https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode"><quote>document node</quote></ulink>
+of the XQuery and XPath data model.
+Roughly, this means that content fragments can have
 more than one top-level element or character node. The expression
 <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
 can be used to evaluate whether a particular <type>xml</type>
@@ -4285,16 +4287,6 @@ SET xmloption TO { DOCUMENT | CONTENT };
 data are allowed.
 </para>
-<note>
-<para>
-With the default XML option setting, you cannot directly cast
-character strings to type <type>xml</type> if they contain a
-document type declaration, because the definition of XML content
-fragment does not accept them. If you need to do that, either
-use <literal>XMLPARSE</literal> or change the XML option.
-</para>
-</note>
 </sect2>
 <sect2>