xmlParserInputBufferCreateMem must make a copy of the buffer.
This fixes a regression from 2.11 which could cause reads from freed
memory depending on the use case.
Undeprecate xmlParserInputBufferCreateStatic which can avoid copying
the whole buffer.
Fix many places where malloc failures aren't reported.
Make xmlErrMemory public. This is useful for custom external entity
loaders.
Introduce new API function xmlSwitchEncodingName.
Change the way how we store whether the the parser is stopped. This used
to be signaled by setting ctxt->instate to XML_PARSER_EOF which was
misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and
introduce a macro PARSER_STOPPED. Also stop to remove parser inputs in
xmlHaltParser. This allows to remove many checks of ctxt->instate.
Introduce xmlErrParser to handle errors if a parser context is
available.
This regressed in commit e0dd330b.
Also fixes a long-standing issue where namespaces from default
attributes weren't added if they match an existing namespace.
Fixes#643.
Setting these deprecated globals hasn't had an effect for a long time.
Make them constants. This reduces the size of per-thread storage from
~700 to ~250 bytes.
Set the dictionary for newDoc in xmlParseBalancedChunkMemoryRecover.
This is a long-standing bug which was masked by
- xmlParseBalancedChunkMemoryRecover changing the document of the root
node. This is a really bad idea, resulting in a mismatch between
ctxt->myDoc and ctxt->node->doc.
- SAX2.c preferring ctxt->node->doc over ctxt->myDoc until commit
a31e1b06.
Fixes#641.
Partial revert of cb927e85 fixing CRLFs not incrementing the line
number.
This requires to rework xmlParseQNameHashed. The original implementation
prompted the change to xmlCurrentChar which really shouldn't modify the
'cur' pointer as side effect. But the NEXTL macro relies on this
behavior.
Ultimately, we should reintroduce the change to xmlCurrentChar and fix
the NEXTL macro. This will lead to single CRs incrementing the line
number as well which seems more consistent.
Fixes#628.
Entities which reference out-of-scope namespace have always been broken.
xmlParseBalancedChunkMemoryInternal tried to reuse the namespaces
currently in scope but these namespaces were ignored by the SAX handler.
Besides, there could be different namespaces in scope when expanding the
entity again. For example:
<!DOCTYPE doc [
<!ENTITY ent "<ns:elem/>">
]>
<doc>
<decl1 xmlns:ns="urn:ns1">
&ent;
</decl1>
<decl2 xmlns:ns="urn:ns2">
&ent;
</decl2>
</doc>
Add some comments outlining possible solutions to this problem.
For now, we stop copying namespaces to the temporary parser context
in xmlParseBalancedChunkMemoryInternal. This has never really worked
and the recent changes contained a partial fix which uncovered other
problems like a use-after-free with the XML Reader interface, found
by OSS-Fuzz.
Use a hash table to lookup namespaces by prefix. The hash table stores
an index into the namespace table. Auxiliary data for namespaces is
stored in a separate array along the main namespace table.
Use a hash table to verify attribute uniqueness. The hash table stores
an index into the attribute table.
Reuse hash value from the dictionary to avoid computing them twice.
See #346.
When decoding input data, check whether the "raw" buffer is empty after
parsing the document. Otherwise, the input ends with a truncated
multi-byte sequence which shouldn't be silently ignored.
Even with flush set to true, xmlCharEncInput didn't guarantee to decode
all data. This complicated the push parser.
Remove the flush flag and always decode all available data.
Also fix ICU code where the flush flag has a different meaning. Always
set flush to false and retry even with empty input buffers.