Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.
Fixes#909.
It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.
Handling HTML5 parser errors is rather involved and not essential for
parsers.
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
Use in ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.
Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
This allows to report the reason why opening a file failed to the parser
context and improve error messages. Now we can also remove the stat call
before opening a file.
Introduce xmlStrVASPrintf, trying to handle buggy snprintf
implementations.
Introduce xmlSetError to set errors atomically.
Introduce xmlUpdateError to set an error, fixing up node, file and line.
Introduce helper function xmlRaiseMemoryError.
Make legacy error handlers call xmlReportError, avoiding checks in
xmlVRaiseError.
Remove fragile support for getting file and line info from XInclude
nodes.
Fix many places where malloc failures aren't reported.
Make some API function return an error code. Changing the return type
from void to int is technically an ABI break but should be safe on most
platforms.
- xmlNodeSetContent
- xmlNodeSetContentLen
- xmlNodeAddContent
- xmlNodeAddContentLen
- xmlNodeSetBase
Introduce new API functions that return a separate error code if a
memory allocation fails.
- xmlNodeGetAttrValue
- xmlNodeGetBaseSafe
- xmlGetNsListSafe
Introduce private functions xmlTreeEnsureXMLDecl and xmlSplitQName4.
Don't report low-level errors to the global error handler.
Fix tree
Introduce xmlGetNsListSafe
Fix tree
On Windows, malloc hooks can be called after the final call to
xmlCleanupParser in various tests. This means that xmlMemMutex can still
be accessed if memory debugging is enabled, so the mutex should not be
cleaned.
This also means that tests may report spurious memory leaks on Windows.
The old implementation avoided the issue by keeping track of all
global state objects in a doubly linked list, so they could be cleaned
during xmlCleanupParser.
But as far as I can tell all memory will be freed eventually, so this is
mostly an issue with our test suite.
Add the directory containing libxml2.dll with os.add_dll_directory to
make tests work on MinGW.
This has changed in Python 3.8 but for some reason, the issue only
turned up with Python 3.11 on MinGW. Contrary to documentation, copying
libxml2.dll into the directory containing the .pyd file doesn't work.
Covered by: test/VC/ElementValid5
This only affects XML Reader API with LIBXML_REGEXP_ENABLED and
LIBXML_VALID_ENABLED turned on.
* result/VC/ElementValid5.rdr:
- Update result to add missing error message.
* python/tests/reader2.py:
* result/VC/ElementValid6.rdr:
* result/VC/ElementValid7.rdr:
* result/valid/781333.xml.err.rdr:
- Update result to fix grammar issue.
* valid.c:
(xmlValidatePopElement):
- Check return value of xmlRegExecPushString() to handle -1, and
assign 'ret = 0;' to return 0 from xmlValidatePopElement().
This change affects xmlTextReaderValidatePop() from
xmlreader.c.
- Fix grammar of error message by changing 'child' to
'children'.
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.
Found with libFuzzer.
As per https://peps.python.org/pep-0394/, the python binary can be one
of the following options:
- Python 2
- Python 3
- Not exist
All of the scripts in libxml2 use 'python', which may not exist.
As Python 2 reached EOL on the 1st January 2020, it's safe to move the
scripts to use python3 explicitly.
The expected errors contain an relative path, but the messages from the
parser contain absolute paths. However, due to the tests not actually
failing if there was an error this wasn't noticed.
Instead of putting relative paths in the expected messages use format()
to embed the correct absolute path.
Also use os.path.join() consistently when constructing paths to ensure
uniformly formatted paths.
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.
This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.
This also fixes line numbers in the SAX push parser which were never
handled correctly.
Added all test cases that have a non-empty error in result/valid/*.xml.err
Restructured to make it easier extensible with new test cases
Added coding cookie because there is non-ASCII in the error messages