We currently only handle "Validity constraint: Proper Declaration/PE
Nesting", but we must detect "Well-formedness constraint: PE Between
Declarations" separately:
> The replacement text of a parameter entity reference in a DeclSep must
> match the production extSubsetDecl.
PEs in DeclSeps are PEs that start with a full markup declaration (or
another PE). These are handled in xmlParse{Internal|External}Subset. We
set a flag on these PEs and don't close them implicitly in
xmlSkipBlankCharsPE. This will make unterminated declarations in such
PEs cause a parser error. The PEs are closed explicitly in
xmlParse{Internal|External}Subset, the only location where they are
allowed to end.
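The flag mechanism described above can be illustrated with a toy model.
This is only a sketch and not the libxml2 code: ToyInput, ToyParser and
all toy* functions are hypothetical, a "declaration" is simply
everything up to the next '>', and PE references are not expanded but
pushed by the caller.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a parser input, not xmlParserInput. */
typedef struct {
    const char *cur;  /* next unread character */
    int declSepPE;    /* 1 if the PE was referenced between declarations */
} ToyInput;

typedef struct {
    ToyInput stack[8];
    int depth;        /* number of open inputs */
} ToyParser;

static void toyPush(ToyParser *p, const char *text, int declSepPE) {
    p->stack[p->depth].cur = text;
    p->stack[p->depth].declSepPE = declSepPE;
    p->depth++;
}

/* Skip blanks. Exhausted PEs are closed implicitly unless flagged. */
static void toySkipBlanksPE(ToyParser *p) {
    while (p->depth > 0) {
        ToyInput *in = &p->stack[p->depth - 1];
        while (isspace((unsigned char) *in->cur))
            in->cur++;
        if (*in->cur != '\0')
            return;
        /* Flagged PEs and the outermost input stay open. */
        if (in->declSepPE || p->depth == 1)
            return;
        p->depth--;
    }
}

/* Consume one toy declaration up to '>'. Returns 0, or -1 on error. */
static int toyParseDecl(ToyParser *p) {
    for (;;) {
        toySkipBlanksPE(p);
        ToyInput *in = &p->stack[p->depth - 1];
        if (*in->cur == '\0')
            return -1;  /* input ended inside a declaration */
        if (*in->cur++ == '>')
            return 0;
    }
}

/* The only place where an exhausted input is closed explicitly. */
static int toyParseSubset(ToyParser *p) {
    while (p->depth > 0) {
        toySkipBlanksPE(p);
        ToyInput *in = &p->stack[p->depth - 1];
        if (*in->cur == '\0') {
            p->depth--;
            continue;
        }
        if (toyParseDecl(p) != 0)
            return -1;
    }
    return 0;
}

/* Parse `pe` as a PE referenced from `outer`, flagged or not. */
static int toyRun(const char *outer, const char *pe, int declSepPE) {
    ToyParser p;
    p.depth = 0;
    toyPush(&p, outer, 0);
    toyPush(&p, pe, declSepPE);
    return toyParseSubset(&p);
}
```

In this model, toyRun(" >", "<!ELEMENT a EMPTY", 0) succeeds because
the unflagged PE is closed implicitly and the declaration is completed
by the outer input, while the same call with the flag set fails. A
flagged PE holding a complete declaration still parses.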
* Casting a string literal to `char *` and then immediately passing or
assigning the result to a `const char *` makes no sense.
* There is no need to cast `int` to `Py_ssize_t` as they have the same
sign and the latter is at least as wide as the former.
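Both points can be shown in a few lines of plain C. This is a minimal
sketch: toy_ssize_t is a hypothetical stand-in for `Py_ssize_t` so that
the example builds without Python headers, and it assumes a platform
where `ptrdiff_t` is at least as wide as `int`.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Py_ssize_t (signed, pointer-sized). */
typedef ptrdiff_t toy_ssize_t;

/* A string literal converts to `const char *` without any cast. */
static const char *toyName(void) {
    return "libxml2";
}

/* Passing it on to a `const char *` parameter needs no cast either. */
static size_t toyNameLen(void) {
    return strlen(toyName());
}

/* int to a wider signed type is an implicit, value-preserving conversion. */
static toy_ssize_t toyWiden(int n) {
    return n;
}
```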
Align libxml2-config.cmake generated by Autotools and Meson with the
CMake version and only add dependencies to libraries when linking
statically. Also set LIBXML_STATIC for static builds.
Fixes #918.
I think this was required when some struct members like
xmlParserInputBuffer::buffer were changed from xmlBuffer to xmlBuf (20+
years ago).
Unfortunately, I missed the opportunity to align xmlBuffer with xmlBuf
before the ABI break.
Move tools, source files and output tables into codegen directory.
Rename some files.
Adjust tools to match modified files. Remove generation date and source
files from output.
Distribute all tools and sources.
It seems that the specification of the HTML output method in XSLT 1.0
had a lot of influence on how the HTML serializer in libxml2 ended up:
https://www.w3.org/TR/xslt-10/#section-HTML-Output-Method
There are two remaining behaviors suggested by XSLT 1.0 that don't match
the HTML5 fragment serialization algorithm:
We escape non-ASCII characters in URI attributes (the list of which is
probably outdated). This was originally recommended in appendix B of the
HTML 4.01 spec, but only for user agents:
https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.1
In my experience, any tool that processes HTML should escape as little
as possible. For example, we used to escape many more characters which
are invalid in URIs, but often used in template languages. (Note that we
still escape whitespace and control chars.) Nevertheless, I guess that
some libxslt users continue to expect this behavior from libxml2.
Then we collapse Boolean attributes using an outdated list. This is
mostly a cosmetic issue, but a somewhat important one for libxslt users.
We probably need a serialization option for the xmlsave module that
enables fully HTML5-conformant output.
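To make the URI attribute escaping discussed above concrete, here is a
toy sketch. It is not the libxml2 serializer: toyEscapeUriAttr is a
hypothetical helper, and the exact set of escaped bytes (everything up
to and including space, DEL, and all non-ASCII bytes) is an assumption
made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: percent-encode whitespace, control characters and
 * non-ASCII bytes, and copy every other byte unchanged.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t) -1 if `out` is too small.
 */
static size_t toyEscapeUriAttr(const char *in, char *out, size_t outSize) {
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;
        int escape = (c <= 0x20) || (c >= 0x7F);
        size_t need = escape ? 3 : 1;

        /* Always keep one byte free for the terminating NUL. */
        if (n + need >= outSize)
            return (size_t) -1;

        if (escape) {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0xF];
        } else {
            out[n++] = (char) c;
        }
    }

    if (n >= outSize)
        return (size_t) -1;
    out[n] = '\0';
    return n;
}
```

With these rules, a value like "{{x}}" passes through untouched, which
is the kind of template syntax that more aggressive escaping used to
break, while a UTF-8 encoded "é" becomes "%C3%A9".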
When using built-in encodings, the label would be normalized, which
caused various issues. We now create a copy of the handler with the
original name.
This is somewhat dangerous as it will require users to free built-in
encodings with xmlCharEncCloseFunc. But to handle the general case, this
was already required.
Fixes #916 in another way than originally proposed.