Follow-up to commit d0c3f01e. A parser context will be initialized to
SAX version 2, but this can be overridden with XML_PARSE_SAX1 later,
so we must initialize the SAX1 element handlers as well.
Change the check in xmlDetectSAX2 to only look for XML_SAX2_MAGIC, so
we don't switch to SAX1 if the SAX2 element handlers are NULL.
For some reason, xmlCtxtUseOptionsInternal set the start and end element
SAX handlers to the internal DOM builder functions when XML_PARSE_SAX1
was specified. This means that custom SAX handlers could never work with
that flag because these functions would receive the wrong user data
argument and crash immediately.
Fixes#535.
Make sure that xmlCharEncInput, xmlParserInputBufferPush and
xmlParserInputBufferGrow set the correct error code in the
xmlParserInputBuffer. Handle errors when calling these functions.
Revert another change from commit 98840d40.
Decode the whole buffer when reading from memory and switching to the
initial encoding. Add some comments about potential improvements.
* parser.c:
(xmlParseCharData):
- Check if the parser has stopped before advancing
`ctxt->input->cur`. This only occurs if a custom SAX error
handler calls xmlStopParser() on fatal errors.
Fixes#518.
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.
Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
Don't try to grow the input buffer in xmlParserShrink. This makes sure
that no memory allocations are made and the function always succeeds.
Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of
DTD parsing loops.
Shrink before growing.
The input buffer is now grown reliably when calling CUR_CHAR
(xmlCurrentChar) or NEXT (xmlNextChar). This allows to remove many
other invocations of GROW.
- Lower the amount of expansion which is always allowed from
10MB to 1MB.
- Lower the maximum amplification factor from 10 to 5.
- Lower the "fixed cost" from 50 to 20.
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.
Found with libFuzzer.
Don't set the "checked" flag when checking entities in default attribute
values. These entities could reference other entities which weren't
defined yet, so the check isn't reliable.
This fixes a short-lived regression which could lead to a call stack
overflow later in xmlStringGetNodeList.
Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
Only add consumed bytes if
- we're not parsing an entity
- we're parsing external parameter entities for the first time.
Always ignore internal parameter entities.