Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
Another regression related to reading from stdin.
Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.
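
The opt-in idea can be sketched roughly as follows. This is not the
libxml2 implementation: input_kind, classify_input and the allow_stdin
parameter are hypothetical names invented for illustration. The sketch
only shows that "-" is treated as stdin when the caller asks for it
explicitly, and as an ordinary filename otherwise.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical input kinds, for illustration only; not libxml2 types. */
typedef enum {
    INPUT_FROM_FILE,
    INPUT_FROM_STDIN
} input_kind;

/*
 * Decide how to treat a filename. "-" maps to stdin only if the caller
 * opted in explicitly; in every other case it is an ordinary file name.
 */
static input_kind
classify_input(const char *filename, int allow_stdin) {
    assert(filename != NULL);

    if (allow_stdin && strcmp(filename, "-") == 0)
        return INPUT_FROM_STDIN;

    return INPUT_FROM_FILE;
}
```

With this shape, only the few entry points that must stay compatible
with command-line tools would pass a non-zero allow_stdin.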
This now enables compressed input when using the "Fd" API functions,
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.
Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
This removes an if-statement that effectively does nothing. For
example, the binary generated by GCC with -O2 optimization does not
contain that if-statement; the code just uses the hprefix->name field
directly.
No functional changes intended.
Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
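
The removed code is not quoted in this message, so the following is only
a guess at the general shape of such a no-op branch. prefix_entry and
both function names are hypothetical; only the hprefix->name field name
comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure behind hprefix. */
typedef struct {
    const char *name;
} prefix_entry;

/* Before: the branch only reproduces the value the field already has. */
static const char *
prefix_name_with_branch(const prefix_entry *hprefix) {
    assert(hprefix != NULL);

    if (hprefix->name != NULL)
        return hprefix->name;
    return NULL;
}

/* After: read the field directly, with identical results. */
static const char *
prefix_name_direct(const prefix_entry *hprefix) {
    assert(hprefix != NULL);

    return hprefix->name;
}
```

Both functions return the same pointer for every input, which is why an
optimizing compiler is free to drop the branch.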
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.
Regressed with commit d025cfbb.
Fixes #815.
Remove special handling of CDATA sections in push parser. This makes
sure that only a single callback is generated for large sections.
Fixes #22 and needed for #412.
Some users set an entity's children manually in the getEntity SAX
callback to restrict entity expansion. This stopped working after
renaming the "checked" member of xmlEntity, making at least one
downstream project and its dependants susceptible to XXE attacks.
See #761.
This limit is somewhat arbitrary and can be reached when fuzzing
documents up to 1 MB.
Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
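
A minimal sketch of the new check, under stated assumptions: the message
does not say what the limit measures, so the sketch treats it as a byte
count and takes 100 MB as 100 * 1000 * 1000 bytes. sketch_ctxt,
SKETCH_PARSE_HUGE, SKETCH_MAX_SIZE and size_allowed are hypothetical
names, not libxml2 identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the XML_PARSE_HUGE option bit. */
#define SKETCH_PARSE_HUGE (1 << 0)

/* Assumed limit: 100 MB, counted in bytes. */
#define SKETCH_MAX_SIZE ((size_t) 100 * 1000 * 1000)

/* Hypothetical parser context that carries only the option bits. */
typedef struct {
    int options;
} sketch_ctxt;

/* Return 1 if size is acceptable, 0 if it exceeds the limit. */
static int
size_allowed(const sketch_ctxt *ctxt, size_t size) {
    assert(ctxt != NULL);

    /* The huge option disables the limit entirely. */
    if (ctxt->options & SKETCH_PARSE_HUGE)
        return 1;

    return size <= SKETCH_MAX_SIZE;
}
```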
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
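
The two policies can be sketched as one decision on how many raw bytes
to convert per step. This is an illustration only: sketch_mode,
SKETCH_CHUNK_SIZE (set to an arbitrary 4000 bytes) and bytes_to_convert
are hypothetical and do not reflect the actual libxml2 code or its
chunk size.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary chunk size chosen for the sketch. */
#define SKETCH_CHUNK_SIZE ((size_t) 4000)

/* Hypothetical parsing modes. */
typedef enum {
    SKETCH_PUSH,
    SKETCH_PULL_MEMORY
} sketch_mode;

/* How many of the avail raw bytes should be converted in this step. */
static size_t
bytes_to_convert(size_t avail, sketch_mode mode) {
    assert(mode == SKETCH_PUSH || mode == SKETCH_PULL_MEMORY);

    /* Push parsing: convert everything that has arrived so far. */
    if (mode == SKETCH_PUSH)
        return avail;

    /* Pull parsing from memory: convert at most one chunk at a time. */
    return avail < SKETCH_CHUNK_SIZE ? avail : SKETCH_CHUNK_SIZE;
}
```

Capping the pull case keeps the converted buffer small even when the
whole document is already in memory.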
This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.
xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.
Fixes #727.