mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00

encoding: Detect truncated multi-byte sequences with ICU

Unlike iconv or the internal converters, ICU consumes truncated multi-byte
sequences at the end of an input buffer and keeps them in its internal
converter state. We currently detect truncated sequences by checking
whether the raw input buffer is still non-empty, so this check never
fires with ICU.
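For contrast, here is how iconv behaves: it leaves an incomplete trailing sequence in the caller's input buffer and reports `EINVAL`, which is exactly what a "non-empty raw input" check relies on. The sketch below is not libxml2 code; the helper `utf8_leftover_bytes` is hypothetical, and it assumes the converted output fits in a small fixed buffer (longer input fails with `E2BIG` and is reported as an error).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert UTF-8 to UTF-16LE with iconv and return the
 * number of input bytes iconv left unconsumed. Returns (size_t)-1 on any
 * error other than an incomplete trailing sequence (EINVAL).
 */
static size_t utf8_leftover_bytes(const char *in, size_t inlen)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    char out[64];                  /* assumption: short inputs only */
    char *inptr = (char *) in;     /* iconv does not modify the input */
    char *outptr = out;
    size_t inleft = inlen;
    size_t outleft = sizeof(out);

    size_t res = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    int err = errno;               /* save before iconv_close */
    iconv_close(cd);

    if (res == (size_t) -1 && err != EINVAL)
        return (size_t) -1;
    /* On EINVAL, the truncated sequence is still in the input buffer. */
    return inleft;
}
```

For example, `"ab\xC3"` leaves one byte behind, while a complete string leaves none. ICU, by contrast, would absorb the `\xC3` into the converter, so no such leftover is visible to the caller.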

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to add a `flush` flag to some of the encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with the other converters.
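The idea behind the `flush` flag can be illustrated with a toy decoder that, like ICU, buffers a partial sequence internally. Without flushing, a partial sequence at the end of a chunk is normal (more input may follow); on the final, flushing call, a still-pending sequence is reported as truncated, analogous to ICU's U_TRUNCATED_CHAR_FOUND. This is a minimal sketch under stated assumptions, not libxml2's or ICU's API: `Utf8State`, `ConvResult`, and `utf8_convert` are hypothetical names, and UTF-8 validation is simplified (lead and continuation bytes only, no overlong or surrogate checks).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical converter state: a partial UTF-8 sequence is kept here
 * instead of being left in the caller's input, as ICU does. */
typedef struct {
    unsigned char pending[4];
    size_t npending;   /* bytes of an incomplete sequence held so far */
    size_t need;       /* total length of that sequence */
} Utf8State;

typedef enum { CONV_OK, CONV_TRUNCATED, CONV_INVALID } ConvResult;

/* Sequence length implied by a lead byte, or 0 if it is not a lead byte. */
static size_t seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* Consumes all of in[0..len) and adds completed code points to *nchars.
 * With flush != 0, a sequence still pending at the end is CONV_TRUNCATED. */
static ConvResult utf8_convert(Utf8State *st, const unsigned char *in,
                               size_t len, int flush, size_t *nchars)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (st->npending == 0) {
            size_t n = seq_len(c);
            if (n == 0)
                return CONV_INVALID;
            if (n == 1) {
                (*nchars)++;
                continue;
            }
            st->pending[0] = c;      /* start buffering a sequence */
            st->npending = 1;
            st->need = n;
        } else {
            if ((c & 0xC0) != 0x80)  /* expected a continuation byte */
                return CONV_INVALID;
            st->pending[st->npending++] = c;
            if (st->npending == st->need) {
                st->npending = 0;    /* sequence complete */
                (*nchars)++;
            }
        }
    }
    if (flush && st->npending > 0)
        return CONV_TRUNCATED;
    return CONV_OK;
}
```

Splitting `"a\xE2\x82\xAC"` across two calls succeeds when only the last call flushes, whereas flushing after `"x\xC3"` reports truncation even though the caller's input buffer was fully consumed.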

Also fix the detection of truncated sequences in HTML documents, XML
content, and DTDs when using iconv.
Nick Wellnhofer
2025-03-10 02:18:51 +01:00
parent 76c6ddfef9
commit 69b83bb68e
14 changed files with 287 additions and 133 deletions


@@ -140,4 +140,7 @@ XML_HIDDEN xmlChar *
 xmlExpandEntitiesInAttValue(xmlParserCtxtPtr ctxt, const xmlChar *str,
                             int normalize);
+
+XML_HIDDEN void
+xmlParserCheckEOF(xmlParserCtxtPtr ctxt, xmlParserErrors code);
 
 #endif /* XML_PARSER_H_PRIVATE__ */