Nick Wellnhofer
1e5375c1b4
SAX2: Check return value of xmlPushInput
...
Fix null deref in case of malloc failure.
2024-07-06 15:33:06 +02:00
Nick Wellnhofer
38195cf596
parser: Don't produce names with invalid UTF-8 in recovery mode
2024-07-06 15:33:06 +02:00
Nick Wellnhofer
fdfeecfe5e
parser: Reenable ctxt->directory
...
Unused internally, but used in downstream code.
Should fix #753.
2024-07-02 22:06:53 +02:00
Nick Wellnhofer
606f410891
parser: Allow to disable catalogs with parser options
...
Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI.
Fixes #735.
2024-07-02 22:06:53 +02:00
Nick Wellnhofer
866be54e22
parser: Don't use deprecated xmlSplitQName
2024-07-02 13:34:11 +02:00
Nick Wellnhofer
bc793390d5
parser: Update documentation
2024-06-27 16:23:14 +02:00
Nick Wellnhofer
eca972e682
parser: Add getters for XML declaration to parser context
...
Access to struct members will be deprecated.
2024-06-27 14:44:49 +02:00
Mike Dalessio
bbbbbb4649
parser: implement xmlCtxtGetOptions
...
In 712a31ab, the `options` struct member was deprecated. To allow
callers to check the status of options bits, introduce
xmlCtxtGetOptions.
2024-06-20 20:39:54 +00:00
Rosen Penev
217e9b7af2
clang-tidy: don't return in void functions
...
Found with readability-redundant-control-flow
Signed-off-by: Rosen Penev <rosenp@gmail.com>
2024-06-20 20:37:34 +00:00
Nick Wellnhofer
32cac377c8
parser: Selectively reenable reading from "-"
...
Make filename "-" mean stdin for legacy SAX1 functions and xmlReadFile.
This should hopefully fix most command line utilities.
See #737.
2024-06-17 18:08:31 +02:00
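The "-" convention described in this commit can be illustrated with a minimal sketch. `open_input` and its `dash_is_stdin` flag are hypothetical names, not libxml2 functions; the flag only models the "selective" part, where some entry points honor "-" and others treat it as an ordinary filename.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): open a file for reading,
 * optionally treating the filename "-" as standard input.
 * Returns NULL if the file cannot be opened.
 */
FILE *open_input(const char *filename, int dash_is_stdin) {
    assert(filename != NULL);

    /* Only callers that opt in get the stdin shortcut. */
    if (dash_is_stdin && strcmp(filename, "-") == 0)
        return stdin;

    /* Everything else, including "-" without opt-in, is a regular path. */
    return fopen(filename, "rb");
}
```

A caller using such a helper would need to avoid calling `fclose` on the returned stream when it is `stdin`.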
Nick Wellnhofer
33a1f8978d
legacy: Merge SAX.c into legacy.c
2024-06-16 19:17:41 +02:00
Nick Wellnhofer
10d60d15d6
regexp: Stop using LIBXML_AUTOMATA_ENABLED
...
This macro always equals LIBXML_REGEXP_ENABLED.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
b0fc67aa22
build: Remove --with-tree configuration option
...
This option would allow for a smaller, but mostly useless minimal build.
But it complicates the symbol availability logic in an insane way and
requires specialized tools like our custom C parser in doc/apibuild.py.
See #717.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
039ce1e821
parser: Pass global object to sax->setDocumentLocator
...
Revert part of commit c011e760.
Fixes #732.
2024-06-14 16:41:43 +02:00
Nick Wellnhofer
dba1ed85a3
ftp: Remove FTP support
...
Remove the built-in FTP client. If you configure --with-legacy, old
symbols are retained for ABI compatibility.
2024-06-12 18:19:55 +02:00
Nick Wellnhofer
5238404325
parser: Pass resource type to resource loader
2024-06-12 16:36:12 +02:00
Nick Wellnhofer
89fcae4dfd
parser: Don't report malloc failures when creating context
...
We don't want messages to stderr before an error handler could be set on
a parser context.
2024-06-12 16:36:12 +02:00
Nick Wellnhofer
410931e385
parser: Only set input ID for PE refs
...
Other input streams don't require IDs.
2024-06-12 16:22:52 +02:00
Nick Wellnhofer
ff3b091910
parser: Implement XML_PARSE_NO_UNZIP option
2024-06-12 16:14:15 +02:00
Nick Wellnhofer
47cbb6bb3c
doc: Don't mention xmlNewInputURL
2024-06-12 16:04:45 +02:00
Nick Wellnhofer
8318b5a634
parser: Fix NULL checks for output arguments
2024-06-09 15:08:43 +02:00
Nick Wellnhofer
0cde1b78d6
parser: Fix "Truncated multi-byte sequence" error
...
Don't raise the error if decoding failed.
2024-06-07 00:02:31 +02:00
Nick Wellnhofer
122b61309f
parser: Fix performance regression when parsing namespaces
...
The namespace hash table didn't reuse deleted buckets, leading to
quadratic behavior.
Also ignore deleted buckets when resizing.
Fixes #726.
2024-06-06 15:52:09 +02:00
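To illustrate why reusing deleted buckets matters, here is a minimal sketch of an open-addressing table with tombstones. It is not libxml2's namespace hash table: the fixed size, the `table_*` names, and the hash function are all assumptions made for the example. Insertion remembers the first deleted bucket on the probe path and reuses it, so repeated insert/remove cycles of the same key do not fill the table with tombstones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size table; size is a power of two so we can mask. */
#define TABLE_SIZE 8u

enum slot_state { SLOT_EMPTY = 0, SLOT_USED, SLOT_DELETED };

struct slot { enum slot_state state; const char *key; };

struct table {
    struct slot slots[TABLE_SIZE];
    size_t used;     /* live entries */
    size_t deleted;  /* tombstones left behind by removals */
};

static unsigned hash_str(const char *s) {
    unsigned h = 5381u;
    while (*s != '\0')
        h = h * 33u + (unsigned char) *s++;
    return h;
}

/* Returns the slot index of key, or -1 if it is absent. */
int table_find(const struct table *t, const char *key) {
    unsigned i = hash_str(key) & (TABLE_SIZE - 1);
    for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        const struct slot *s = &t->slots[i];
        if (s->state == SLOT_EMPTY)
            return -1;                       /* end of probe chain */
        if (s->state == SLOT_USED && strcmp(s->key, key) == 0)
            return (int) i;
    }
    return -1;
}

/* Inserts key and returns its slot index, or -1 if the table is full. */
int table_insert(struct table *t, const char *key) {
    assert(key != NULL);
    unsigned i = hash_str(key) & (TABLE_SIZE - 1);
    int target = -1;
    for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        struct slot *s = &t->slots[i];
        if (s->state == SLOT_USED) {
            if (strcmp(s->key, key) == 0)
                return (int) i;              /* already present */
        } else if (s->state == SLOT_DELETED) {
            if (target < 0)
                target = (int) i;            /* remember first tombstone */
        } else {
            if (target < 0)
                target = (int) i;            /* no tombstone seen: use empty slot */
            break;                           /* key cannot be further along */
        }
    }
    if (target < 0)
        return -1;
    if (t->slots[target].state == SLOT_DELETED)
        t->deleted--;                        /* tombstone is reused */
    t->slots[target].state = SLOT_USED;
    t->slots[target].key = key;
    t->used++;
    return target;
}

/* Removes key, leaving a tombstone. Returns 1 if removed, 0 if absent. */
int table_remove(struct table *t, const char *key) {
    int i = table_find(t, key);
    if (i < 0)
        return 0;
    t->slots[i].state = SLOT_DELETED;
    t->slots[i].key = NULL;
    t->used--;
    t->deleted++;
    return 1;
}
```

Without the tombstone reuse in `table_insert`, every insert/remove cycle would consume a fresh bucket and lengthen later probe sequences. A growable table would additionally skip `SLOT_DELETED` buckets when copying entries during a resize; that part is not shown here.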
Nick Wellnhofer
a7e26707be
parser: Don't overwrite OOM errors in xmlSBuf
2024-06-03 14:04:44 +02:00
Nick Wellnhofer
e75e878e02
doc: Update and fix documentation
2024-05-20 14:23:39 +02:00
Nick Wellnhofer
4fefba4cf6
parser: Rework handling of undeclared entities
...
Throw an error if entity substitution was requested.
Now we only downgrade to a warning if
- XML_PARSE_DTDLOAD wasn't specified, and
- entities aren't substituted or XML_PARSE_NO_XXE was specified.
Should fix #724.
2024-05-15 17:58:48 +02:00
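The downgrade rule from this commit message can be written out as a small predicate. This is a sketch of the stated conditions only, not the parser's code: the `OPT_*` constants are hypothetical stand-ins with arbitrary values, and the function name is invented for the example.

```c
#include <assert.h>

/* Hypothetical stand-ins for parser option bits; values are arbitrary. */
enum {
    OPT_SUBSTITUTE = 1 << 0,  /* entity substitution was requested */
    OPT_DTDLOAD    = 1 << 1,  /* stands in for XML_PARSE_DTDLOAD */
    OPT_NO_XXE     = 1 << 2   /* stands in for XML_PARSE_NO_XXE */
};

/*
 * Returns 1 if an undeclared entity may be downgraded to a warning
 * under the rule quoted above, 0 if it must be reported as an error.
 */
int undeclared_entity_is_warning(int options) {
    assert(options >= 0);

    /* Condition 1: the external DTD must not have been requested. */
    if (options & OPT_DTDLOAD)
        return 0;

    /* Condition 2: no substitution, unless NO_XXE was also specified. */
    if ((options & OPT_SUBSTITUTE) && !(options & OPT_NO_XXE))
        return 0;

    return 1;
}
```

The predicate covers only the option bits named in the message. Other conditions that decide whether a downgrade is possible at all are outside this sketch.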
Nick Wellnhofer
4ff2dccf9f
SAX2: Warn if URI resolution failed
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
4fe116ebd3
parser: Don't report error on invalid URI
...
Only fragment identifiers are an error.
This removes the last user of xmlErrMsg*. Now every error reported by
the parser should result in one of ctxt->wellFormed, ctxt->nsWellFormed
or ctxt->valid being set to zero.
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
a4c2b7233f
io: Don't set close callback in xmlParserInputBufferCreateFd
2024-05-05 17:27:12 +02:00
Nick Wellnhofer
fdc5ff3657
parser: Always throw entity errors if external DTD is loaded
...
When parsing with XML_PARSE_DTDLOAD, missing entities are always an
error.
Also consolidate behavior when validating. See b717abdd.
2024-05-03 11:52:54 +02:00
Nick Wellnhofer
39e5b35bd0
parser: Don't create undeclared entity refs in substitution mode
...
We never want to create entity reference nodes if entity substitution
is enabled. This also applies to undeclared entities.
2024-05-03 11:46:01 +02:00
Nick Wellnhofer
1cdfece12b
memory: Remove memory debugging
...
This is useless compared to sanitizers or valgrind and has a
considerable performance impact if enabled accidentally.
2024-04-28 20:42:55 +02:00
Nick Wellnhofer
45fe9924f0
parser: Don't create reference in xmlLookupGeneralEntity
...
This should only be done in xmlParseReference.
The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
b717abdd09
parser: Consolidate error handling for undeclared entities
...
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
f506ec6654
parser: Always decode entities in namespace URIs
...
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:
> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.
Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.
Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
2024-04-15 12:34:26 +02:00
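As a rough illustration of escaping a namespace URI the way an attribute value would be escaped, the sketch below replaces a few markup-significant characters with entity references. `escape_attr_value` is a hypothetical helper, and the set of escaped characters (`&`, `<`, `>`, `"`) is an assumption; the commit message does not list which characters the serializer handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxml2 function): copy `in` to `out`,
 * escaping characters that are special inside a double-quoted attribute.
 * Returns 0 on success, -1 if `out` is too small.
 */
int escape_attr_value(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    size_t pos = 0;

    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep;

        switch (*in) {
            case '&': rep = "&amp;";  break;
            case '<': rep = "&lt;";   break;
            case '>': rep = "&gt;";   break;
            case '"': rep = "&quot;"; break;
            default:  rep = single;   break;  /* copy unchanged */
        }

        size_t len = strlen(rep);
        if (pos + len + 1 > out_size)         /* keep room for the NUL */
            return -1;
        memcpy(out + pos, rep, len);
        pos += len;
    }

    if (pos + 1 > out_size)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

With this helper, a URI such as `http://example.com/?a=1&b=2` would be written as `http://example.com/?a=1&amp;b=2`, so that reparsing the serialized declaration yields the original URI.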
Nick Wellnhofer
2840e33c5e
tree: Allocate XML namespace statically
2024-03-15 19:47:07 +01:00
Nick Wellnhofer
186562a182
parser: Fix detection of duplicate attributes in XML namespace
...
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.
Fixes #704.
2024-03-12 20:02:52 +01:00
Nick Wellnhofer
4d774612f3
parser: Fix column number in attribute values
...
Short-lived regression from 37c6618b.
2024-02-13 12:00:02 +01:00
Nick Wellnhofer
95f2a17440
parser: Fix crash in xmlParseInNodeContext with HTML documents
...
Ignore namespaces if we have an HTML document with namespaces added
manually.
Fixes #672.
2024-01-30 13:35:41 +01:00
Nick Wellnhofer
6dc2fdb2bd
parser: Account for full size of non-well-formed entities
...
Account for the full size of the entity if parsing stops because of
errors. In our cost model, we have to assume that the entity loader
processes the whole entity regardless of its content.
2024-01-10 15:58:23 +01:00
Nick Wellnhofer
29beef653c
parser: Pop inputs if parsing DTD failed
...
This should provide some statistics in ctxt->sizeentcopy even in the
error or recovery case.
2024-01-10 15:58:23 +01:00
Nick Wellnhofer
02a2038de4
parser: Handle NOCDATA properly when expanding entities
...
Short-lived regression from e1153832.
2024-01-10 14:17:49 +01:00
Nick Wellnhofer
e1153832b0
parser: Fix quadratic behavior when copying entities
...
Process the first and last text node with the SAX handler to make the
text merging optimization kick in.
Fixes #657.
2024-01-07 15:42:39 +01:00
Nick Wellnhofer
f237e5b934
parser: Avoid duplicate namespace errors
...
Don't report an extra attribute uniqueness error if a namespace is
undeclared. This matches old behavior.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
02cc5c3609
parser: Add XML_PARSE_NO_XXE parser option
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
12f0bb9478
parser: Synchronize more options
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
3efbe916a1
parser: Mark 'token' member as unused in xmlParserCtxt
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
b82fd81d06
parser: Rework xmlCtxtParseDocument
...
Make xmlCtxtParseDocument take a parser input which can be popped after
parsing.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
d7d300ba04
parser: Remove remnants of runtime debugging feature
...
Apparently, this feature was removed long ago.
Fixes #651.
2024-01-04 17:50:11 +01:00
Nick Wellnhofer
8c5848bdd5
parser: Make xmlParseContent more useful
...
This is an internal function which isn't really usable without some
hacks. See WebKit/Chromium trying to recreate the effects of
xmlDetectSAX2 manually, for example.
Make xmlParseContent perform late initialization and check whether the
content was fully parsed.
Also rename xmlDetectSAX2 and document why it's needed.
2024-01-04 17:45:03 +01:00