Nick Wellnhofer
2963a0f13a
tree: Undeprecate some members used by libxslt
2025-06-20 21:41:24 +02:00
Nick Wellnhofer
7e08d93c94
doc: Improve documentation of tree data types
2025-06-08 14:22:32 +02:00
Nick Wellnhofer
2b6b3945f2
Revert "SAX1: Align handling of default attributes with SAX2"
This reverts commit db65b2fc51.
This didn't check for duplicate default attributes.
2025-06-03 16:21:56 +02:00
Nick Wellnhofer
5e7c72cd5c
doc: Misc fixes
2025-06-03 01:27:12 +02:00
Nick Wellnhofer
5f8e537d0a
doc: Misc fixes to xpointer
2025-06-03 00:59:00 +02:00
Nick Wellnhofer
0ab5d7c557
entities: Deprecate internal DTD-related functions
2025-06-03 00:13:26 +02:00
Nick Wellnhofer
347c2b2ec7
valid: Deprecate a few functions and xmllint --insert
2025-06-02 23:54:28 +02:00
Nick Wellnhofer
7bd8d1d9cc
doc: Prefix autolinks with '#'
Use `#func` instead of `func()` to ignore parameters and make all
autolinks work.
2025-05-28 16:01:52 +02:00
Nick Wellnhofer
6e33d136e1
error: Fix initGenericErrorDefaultFunc compatibility macro again
Now it really should work as before.
2025-05-28 14:57:37 +02:00
Nick Wellnhofer
30cf6d0980
parser: Add XML_INPUT_USE_SYS_CATALOG
Also clean up catalog resolution and add error handling using the
global error.
Don't try to look up the resolved URI a second time.
Add some comments. Fix documentation.
2025-05-26 16:51:59 +02:00
Nick Wellnhofer
4dc44c83ab
parser: Rework entity boundary check for element content
Only use depth of input stack. This makes the input ID unused
internally.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
db65b2fc51
SAX1: Align handling of default attributes with SAX2
The SAX1 parser is legacy code, but it seems more maintainable to align
it with SAX2.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
2f3655c9c3
parser: Pop PEs that start markup declarations explicitly
We currently only handle "Validity constraint: Proper Declaration/PE
Nesting", but we must detect "Well-formedness constraint: PE Between
Declarations" separately:
> The replacement text of a parameter entity reference in a DeclSep must
> match the production extSubsetDecl.
PEs in DeclSeps are PEs that start with a full markup declaration (or
another PE). These are handled in xmlParse{Internal|External}Subset. We
set a flag on these PEs and don't close them implicitly in
xmlSkipBlankCharsPE. This will make unterminated declarations in such
PEs cause a parser error. The PEs are closed explicitly in
xmlParse{Internal|External}Subset, the only location where they are
allowed to end.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
dd1961e0d8
valid: Skip more validity checks if not validating
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
fca0860d6c
tree: Deprecate public struct members related to DTDs
Let's deprecate these members for now. If these are really used, they
can be undeprecated later.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
7c9b55356d
doc: Document unused error domains
2025-05-19 20:07:54 +02:00
Nick Wellnhofer
7008740a96
parser: Consolidate scanning of XML Names
Use new productions by default.
Fixes #194.
Fixes #364.
See #707.
2025-05-19 19:58:33 +02:00
Nick Wellnhofer
210f5a3746
chvalid: Mark functions as deprecated
2025-05-16 23:27:51 +02:00
Nick Wellnhofer
954aae907d
doc: Improve regexp documentation
2025-05-16 21:13:17 +02:00
Nick Wellnhofer
c5b45fbc07
doc: Misc fixes
2025-05-16 19:04:20 +02:00
Nick Wellnhofer
c4926b19d3
codegen: Merge xmlunicode.c into xmlregexp.c
Include generated parts.
Generate xmlChRangeGroups instead of functions for Unicode blocks.
2025-05-16 19:04:20 +02:00
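A note on c4926b19d3: replacing one generated function per Unicode block with data tables usually means membership is tested by a single generic lookup over sorted ranges. The following is only a sketch of that general idea, not the libxml2 code; `Range`, `exampleBlocks` and `rangeGroupContains` are hypothetical names, and the table holds just three well-known blocks for illustration.

```c
#include <stddef.h>

/* Hypothetical range type; libxml2's own xmlChRangeGroup differs. */
typedef struct {
    unsigned low;   /* first code point in the range (inclusive) */
    unsigned high;  /* last code point in the range (inclusive) */
} Range;

/* Sorted, non-overlapping example table (illustration only). */
const Range exampleBlocks[] = {
    { 0x0000, 0x007F },  /* Basic Latin */
    { 0x0370, 0x03FF },  /* Greek and Coptic */
    { 0x0400, 0x04FF },  /* Cyrillic */
};

/* Binary search: returns 1 if cp falls inside any range, else 0. */
int rangeGroupContains(const Range *ranges, size_t n, unsigned cp) {
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < ranges[mid].low)
            hi = mid;
        else if (cp > ranges[mid].high)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

With a table like this, adding a block is a data change only; the lookup function stays the same.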
Nick Wellnhofer
4cb767e96e
codegen: Only generate tables for character ranges
The rest can be easily maintained manually.
2025-05-16 19:04:20 +02:00
Nick Wellnhofer
6f4b452742
parser: Stop using ctxt->linenumbers
I think this was used to avoid setting the `line` member before it was
added (20+ years ago).
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a05fa9a905
codegen: Rerun codegen scripts
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2
include: Stop using *Ptr typedefs in public headers
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
2d83a84ca6
doc: Misc improvements
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
f0983199e8
html: Map some encodings according to HTML5
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.
Also map UCS-2 and UTF-16 to UTF-16LE.
2025-05-12 14:04:30 +02:00
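A note on f0983199e8: the mapping described in this commit can be pictured as a small lookup applied to the declared encoding label before a decoder is chosen. The sketch below is an illustration under assumptions, not the libxml2 implementation; `mapHtml5Encoding` is a hypothetical name, and treating "US-ASCII" as an alias for ASCII is an assumption not stated in the commit message.

```c
#include <stddef.h>

/* ASCII case-insensitive string comparison; returns 1 on equality. */
static int equalsIgnoreCase(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        char ca = *a, cb = *b;

        if (ca >= 'A' && ca <= 'Z') ca = (char)(ca - 'A' + 'a');
        if (cb >= 'A' && cb <= 'Z') cb = (char)(cb - 'A' + 'a');
        if (ca != cb)
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/*
 * Hypothetical helper: returns the label to use instead of the declared
 * one, or the declared label unchanged if no mapping applies.
 */
const char *mapHtml5Encoding(const char *label) {
    static const struct { const char *from; const char *to; } map[] = {
        { "ISO-8859-1", "windows-1252" },
        { "ASCII",      "windows-1252" },
        { "US-ASCII",   "windows-1252" },  /* assumed alias */
        { "UCS-2",      "UTF-16LE" },
        { "UTF-16",     "UTF-16LE" },
    };
    size_t i;

    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (equalsIgnoreCase(label, map[i].from))
            return map[i].to;
    }
    return label;
}
```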
Nick Wellnhofer
628006f457
encoding: Add windows-1252
Fixes #915.
2025-05-12 13:27:22 +02:00
Nick Wellnhofer
f602c0c186
html: Rework serialization of meta encoding attributes
Don't allocate memory.
2025-05-12 00:05:02 +02:00
Nick Wellnhofer
0674ccb7cb
html: Stop omitting end tags when serializing
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
05b8fe0a06
html: Don't escape RAWTEXT and PLAINTEXT
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
777e2adf77
io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.
Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
dad1163078
entities: Always replace invalid chars when escaping
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes
Optimize IS_CHAR check for non-ASCII chars.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
971038e59f
html: Call lower-level escaping functions
Removes the need to pass a document around.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
63535d3922
tree: Make xmlNodeListGetStringInternal work with escape flags
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
442c1903af
doc: Fix some damage from automated conversions
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
98a61c9dff
doc: Fix briefs in tree docs
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
46f05ea4d5
html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat so that, with a NULL encoding, it no longer
declares a misleading UTF-8 charset.
Fixes #909.
2025-05-11 20:29:25 +02:00
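A note on 46f05ea4d5: the "algorithm for extracting a character encoding from a meta element" named in this commit comes from the HTML standard. The sketch below follows its steps as a reading aid; it is not the libxml2 implementation, `extractMetaCharset` is a hypothetical name, and it returns the raw label substring without resolving it to an actual encoding.

```c
#include <stddef.h>

/* ASCII whitespace as defined by the HTML standard. */
static int isHtmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

static char asciiLower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* ASCII case-insensitive check that s starts with "charset". */
static int startsWithCharset(const char *s) {
    static const char kw[] = "charset";
    size_t i;

    for (i = 0; kw[i] != '\0'; i++) {
        if (asciiLower(s[i]) != kw[i])
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: returns a pointer to the encoding label inside
 * content and stores its length in *len, or returns NULL if none is found.
 */
const char *extractMetaCharset(const char *content, size_t *len) {
    const char *p = content;

    while (*p != '\0') {
        const char *q, *end;

        if (!startsWithCharset(p)) {
            p++;
            continue;
        }
        q = p + 7;                      /* skip "charset" */
        while (isHtmlSpace(*q))
            q++;
        if (*q != '=') {
            p = q;                      /* not an assignment, search on */
            continue;
        }
        q++;
        while (isHtmlSpace(*q))
            q++;
        if (*q == '"' || *q == '\'') {
            char quote = *q++;

            end = q;
            while (*end != '\0' && *end != quote)
                end++;
            if (*end == '\0')
                return NULL;            /* unmatched quote */
        } else if (*q == '\0') {
            return NULL;                /* nothing after '=' */
        } else {
            end = q;                    /* unquoted: stop at space or ';' */
            while (*end != '\0' && !isHtmlSpace(*end) && *end != ';')
                end++;
        }
        *len = (size_t)(end - q);
        return q;
    }
    return NULL;
}
```

Returning a pointer and a length into the original string fits the commit's note that only the encoding substring of a Content-Type meta tag is modified: a caller would know exactly which bytes to replace.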
Nick Wellnhofer
38ea8fa9de
doc: Fix varargs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568
doc: Move brief to top, params to bottom of doc comments
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68
doc: Misc fixes to error docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b1685459a3
doc: Misc fixes to xmlsave docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7
doc: Misc fixes to HTML tree docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3
doc: Misc fixes to encoding docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
81ac2e27fd
doc: Misc fixes to valid docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
714decd6d6
doc: Misc fixes to entities docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
f38f3e7b25
doc: Misc fixes to IO documentation
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
e6cfd04994
doc: Misc fixes to tree docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
1bf44f09ba
doc: Misc fixes to parser docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b7274fb02f
doc: Misc fixes to HTML parser docs
2025-05-06 19:51:38 +02:00