This was scattered in a number of modules, xmlParserInputPtr
have usually their base, cur and end pointer set from an
xmlBuf used as input.
* buf.c buf.h: add a new function implementing this setup
* parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c
use the new function instead of digging into the buffer in
all those modules
The main changes are when the internal of the buffers structure
were adressed directly, we now use routines coming from buf.h
The routine xmlParserInputRead() which wasn't used anywhere is
deprecated too.
Unless explicietely asked for when validating or replacing entities
with their value. Problem pointed out by Tom Lane <tgl@redhat.com>
* parser.c: do not load external parsed entities unless needed
* test/errors/extparsedent.xml result/errors/extparsedent.xml*:
add a regression test to avoid change of the behaviour in the future
tsan reported that rand() is not thread safe, so create
a thread safe wrapper, use rand_r() if available.
Consolidate the function, initialization and cleanup in
dict.c and make sure it is initialized in xmlInitParser()
For https://bugzilla.gnome.org/show_bug.cgi?id=643949
In case of error on an IO creation input the given context
is terminated with the given close function, except if the
error happened in xmlParserInputBufferCreateIO. This can
lead to a resource leak which is fixed by this patch.
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advace for example if the content had been converted
before being passed to the parser.
* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
Mostly except we keep support for some older constructs and
don't implement extension or privateuse. It's messy because
it's used mostly by XSD datatype which itself reference RFC 3066
and suggests a lexical space completely different from what
5646 defines.
In xmlInitParser, both __xmlGlobalInitMutexLock and xmlInitGlobals are
called before xmlInitThreads, and both use pthread symbols.
__xmlGlobalInitMutexLock does so directly, without checking if the symbol
exists, and xmlInitGlobals calls xmlNewMutex, which correctly depends on
libxml_is_threaded... except libxml_is_threaded is still -1 by then...
And again, when releasing the global mutex in __xmlGlobalInitMutexUnlock,
the pthread function is called directly.
The patch changes the initialization order and make sure the functions
are available before calling them
if encoding was autodetected, in xmlParseChunk, if initial size is 86 (a
chunk in UTF-16 encoding), the code that tries to read only the first line
will set the size to 90, which eventually leads to a memmove of 90 bytes
(in xmlBufferAdd) which will copy extra random memory bytes, which will
make the parser to fail because of these extra bytes.
xmlParseInNodeContext notices that the enclosing document is
an HTML document, so invoke the HTML parser for that fragment, and
the HTML parser finding a "<p>hello world!</p>" document automatically
augment it with defaulted <html> and <body>. This defaulting should
be turned off in the HTML parser for this to work, but there is no
such HTML parser option. There is an htmlOmittedDefaultValue global
variable that you could use, but really we should not rely on global
variable for processing options anymore, best is to add an
HTML_PARSE_NOIMPLIED.
* include/libxml/HTMLparser.h: add the HTML_PARSE_NOIMPLIED parser flag
* HTMLparser.c: do add implied element if HTML_PARSE_NOIMPLIED is set
* parser.c: add HTML_PARSE_NOIMPLIED to options for xmlParseInNodeContext
on HTML documents
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
xmlschemas.c xpath.c xpointer.c: mostly removing unneded affectations,
but this led to a few real bugs and some part not yet understood
(relaxng/interleave)
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
fix unused variables, or unneeded increments as well as a couple
of space issues
* runtest.c: check for NULL before calling unlink()
* test/utf16bebom.xml: regression test showed that this test case was
broken but previous behaviour would not detect it !
* parser.c: fix 566012 for the push mode of the parser, tricky !
* test/ebcdic_566012.xml result//ebcdic_566012.xml*: add the test to the
regression suite
* encoding.c parser.c parserInternals.c: when we autodetect an encoding
but it's actually not completely compatible with the one declared
great care must be taken to not convert more than just the first line.
Led to some refactoring, more private functions and a bit of cleanup.
* parser.c: when replacing entities and that the entity is CDATA and
reference entities then white space character in replacement text
need to be replaced by 0x20
* result/noent/att10: correct the output of the associated regression
test
* libxml2.syms: the symbols with history, going back to 2.4.30
* Makefile.am configure.in: linking flags detection and use
* parser.c tree.c valid.c xpointer.c: various cleanup of functions
which could be made static or simply discarded, not that many