key/unique/keyref schema attributes currently use qudratic loops
to check their various constraints (that keys are unique and that
keyrefs refer to existing keys). That becomes extremely slow if
there are many elements with keys. This happens in the wild with
e.g. the OVAL XML descriptions of security patches. You need the
openscap schemata, and then an example xml file:
% zypper in openscap-utils
% wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml
% time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 16m59,857s
user 16m55,787s
sys 0m1,060s
This patch makes libxml use a hash table to avoid the quadratic
behaviour. The existing hash table only accepts strings as keys, so
we're mostly reusing the canonical representation of key values to derive
such strings (with the caveat given in a comment). The alternative
would be to rework the hash table code to accept either numbers or free
functions as hash workers, but the code is fast enough as is.
With the patch we have this then:
% time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 0m3,531s
user 0m3,427s
sys 0m0,103s
So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
When ctxt->schema is NULL, xmlSchemaSAXPlug->xmlSchemaPreRun
alloc a new schema for ctxt->schema and set vctxt->xsiAssemble
to 1. Then xmlSchemaVStart->xmlSchemaPreRun initialize
vctxt->xsiAssemble to 0 again which cause the alloced schema
can not be freed anymore.
Found with libFuzzer.
Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com>
When reusing an xmlSchemaValidCtxtPtr to validate multiple xml documents
against the same schema, there is a memory leak in xmlschemas.c in
xmlSchemaClearValidCtxt(). The vctxt->idcKeys and associated counters
are not cleaned up in xmlSchemaClearValidCtxt() as they are in
xmlSchemaFreeValidCtxt(). As a result, vctxt->idcKeys grows with each
xmlValidateDoc() call that uses the same context and that memory is
never freed. Similarly, vctxt->nbIdcKeys and vctxt->sizeIdcKeys
increment and are never reset.
Closes: #23
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.
Fixes bug 784861.
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
ask conversion of code to ISO Latin 1 to avoid having the compiler assume
EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
Unix/Windows
this is used in a callback which will pass a name, the name is ignored
but it's best to have the signature of the function match, pointed out
by Claude Petit
* xmlschemas.c: fix xmlSchemaAugmentImportedIDC() signature no functional
change
- Suppress warnings in xmlmemory.c by casting to 'void *'.
- Remove unneeded cast in xmlschemas.c that caused a macro precedence
error.
- Add dummy fields to short structs in xmlschemas.c. This increases the
size of the structs, but I can't see a better solution without using
C11's _Alignof operator.
There are still a couple of cast-align warnings in encoding.c. These
are legitimate portability issues that can't be fixed without reworking
the conversion functions.
For https://bugzilla.gnome.org/show_bug.cgi?id=766834
vctxt->parserCtxt is always NULL in xmlSchemaSAXHandleStartElementNs,
so this function can't call xmlStringLenDecodeEntities to decode the
entities.
For https://bugzilla.gnome.org/show_bug.cgi?id=709171
This makes xmlSchemaSAXHandleStartElementNs pass attributes through
xmlStringDecodeEntities, similar to how xmlSchemaVDocWalk passes them
through xmlNodeListGetString.
For https://bugzilla.gnome.org/show_bug.cgi?id=734363
When using xml schema validation, structured error callbacks do not get
passed a valid column number in xmlError field "int2".
$ ./xmlsaxparse colbug5.xml colbug5.xsd
colbug5.xml:3:0: Element '{urn:colbug5}bx': This element is not
expected.
Expected is ( {urn:colbug5}b ).
The schema error is reported for line 3, column 0 (= N/A).
I'd like to have the column number of the error passed in the xmlError
structure. With this test case: line 3, column 9.
Recently I have run into the very same problem Tiberius Duluman did back in
Wed, 13 May 2009 15:56:55 +0300 ([xml] Bug in xmlSchemaValidateOneElement
function). Now I can proof now that his problem is a valid problem. I checked
the latest available version of xmlschemas.c (2.9.0.) and the problem is still
there!
I think I have found a solution to the problem which I'd like proof with you:
My quick solution to the problem is to replace line 27849 in
xmlschemas.c
(v2.9.0.) in function xmlSchemaVDocWalk
valRoot = xmlDocGetRootElement(vctxt->doc);
with this one:
valRoot = vctxt->validationRoot ? vctxt->validationRoot : xmlDocGetRootElement(vctxt->doc);
Currently I'm using version 2.7.8. in Windows and this change seems to solve
the problem.
Based on Thomas Gamper <icicle@cg.tuwien.ac.at> findings and
initial patch
There is no point doing a regexp validation of further
content if there actually is no further content because the
element is nilled.
When generating a sequence add an extra epsilon transition
to avoid further constructs from entering via the last state
Bug reported by Johan Corveleyn <jcorvel@gmail.com>
Things now work correctly at the xmllint level:
thinkpad:~/XML -> xmllint --sax --noout --schema test_schema.xsd
test_xml.xml
test_xml.xml:72721: Schemas validity error : Element 'level1': Missing
child element(s). Expected is ( level2 ).
test_xml.xml fails to validate
thinkpad:~/XML -> xmllint --stream --schema test_schema.xsd test_xml.xml
test_xml.xml:72721: Schemas validity error : Element 'level1': Missing
child element(s). Expected is ( level2 ).
test_xml.xml fails to validate
thinkpad:~/XML ->
* error.c: fix a corner case of not reporting lines when we should
* include/libxml/xmlschemas.h doc/symbols.xml: had to add new entry
points to set the filename on a validation context and a locator
callback used to fetch the line and file from the context
* xmlschemas.c: add the new entry points xmlSchemaValidateSetFilename()
and xmlSchemaValidateSetLocator(), plus make sure the error reporting
routine gets the information if available. Add a locator for SAX.
* xmlreader.c: add and plug a locator for readers.
For https://bugzilla.gnome.org/show_bug.cgi?id=609796
Libxml2 fails to validate an instance document against a schema if an element
whose type is a complex extension of some base type with an optional child
element and that child element is not specified in the instance document. For
example, suppose I have some complex type BaseType that is defined to have one
child element in a sequence group that has minOccurs set to 0
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
xmlschemas.c xpath.c xpointer.c: mostly removing unneded affectations,
but this led to a few real bugs and some part not yet understood
(relaxng/interleave)
* xmlschemas.c: when a particle need to be processed via counted
transition, if the group is nillable, the counting won't work, so
keep track of nillable subset as they are built and generate the
appropriate epsilon transitions as needed
* test/schemas/579746* result/schemas/579746*: add related test cases
based on the bug report
* xmlschemas.c: apparently we though it allowed any of the sub elements
to be missing, and probably not what's expected from the spec, though
it used to forbid it c.f.:
http://lists.xml.org/archives/xml-dev/200109/msg00512.html
asking HT for confirmation but it's likely that we were wrong on the
semantic
* result/schemas/all_1_[367]*: this changes the output of soem of our
internal regression tests
* xmlschemas.c: When validating a schema that includes the same file
that has no targetNamespace defined an internal erro was thrown,
depending on the orig namespace that should be allowed though
* test/schemas/582906-* result/schemas/582906-*: 2 tests case, one
where this is allowed, and one where this is forbidden
* xmlschemas.c: fixes the problem faced when importing the same schemas
multiple times but from different places which is allowed
* test/schemas/582887* result/schemas/582887*: adding the specific test
to the regressions