libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	ab06bfa1f6	parser: Fix error return in xmlParseElementContentDecl Avoid internal error later in xmlValidBuildAContentModel after `2a60ca06c`. Also avoids some unnecessary error messages.	2025-05-26 16:51:59 +02:00
Nick Wellnhofer	5ec83f7741	valid: Remove duplicate #FIXED check for namespaces Unlike the comment indicates, this is already checked.	2025-05-25 14:26:30 +02:00
Nick Wellnhofer	7c10fff265	valid: Don't validate twice in xmlAddAttributeDecl This should only be done in xmlValidateAttributeDecl.	2025-05-25 14:26:30 +02:00
Nick Wellnhofer	2f3655c9c3	parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: "The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl." PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.	2025-05-25 14:26:30 +02:00
Nick Wellnhofer	dd1961e0d8	valid: Skip more validity checks if not validating	2025-05-25 14:26:30 +02:00
Nick Wellnhofer	3a68d0b7a8	SAX2: Handle xml:id errors separately	2025-05-19 20:07:54 +02:00
Nick Wellnhofer	87087def4e	tests: Remove result files committed by accident	2025-05-13 23:00:51 +02:00
Nick Wellnhofer	f0983199e8	html: Map some encodings according to HTML5 Windows-1252 is a superset of ISO-8859-1 and should be used instead. Same for ASCII. Also map UCS-2 and UTF-16 to UTF-16LE.	2025-05-12 14:04:30 +02:00
Nick Wellnhofer	825f3a9d0c	html: Always serialize attributes with double quotes Align with HTML5.	2025-05-11 21:42:51 +02:00
Nick Wellnhofer	cdaf657ffb	html: Don't escape < and > when serializing attribute values Align with HTML5. This will break some test suites.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	c8cea39d8a	save: Fix serialization of attribute defaults containing < Long-standing bug that produced invalid XML.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	46f05ea4d5	html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	f3a080bc48	html: Ignore U+0000 in body text Align with HTML5. Fixes #908.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	6896f478d4	Revert "valid: Remove duplicate error messages when streaming" This reverts commit `cd220b93d8`. This commit broke the xmlstarlet tests.	2025-04-18 17:24:45 +02:00
Nick Wellnhofer	69b83bb68e	encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.	2025-03-13 22:15:10 +01:00
Nick Wellnhofer	05bd1720ce	parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.	2025-03-01 15:18:20 +01:00
Nick Wellnhofer	9f86dae989	test: Add test case for UAF in xmlSchemaIDCFillNodeTables	2025-02-20 11:35:47 +01:00
Nick Wellnhofer	8cf6129bbd	html: Stop implying <p> start tags Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.	2025-02-13 20:20:17 +01:00
Nick Wellnhofer	71122421a1	html: Make implied <p> tags more deterministic libxml2's HTML parser adds <p> start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a <p> tag, but the whitespace check cannot work reliably if we're parsing partial text data, which can happen with both the pull and the push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body><p> x</p></body> </html> And now results in: <html> <head> </head> <body><p>x</p></body> </html> Except for the implied <p> tag, this matches HTML5.	2025-02-13 14:31:44 +01:00
Nick Wellnhofer	b4d3d87ed2	parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.	2025-02-02 11:15:45 +01:00
Nick Wellnhofer	080285724b	html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.	2025-02-02 11:15:45 +01:00
Nick Wellnhofer	cd220b93d8	valid: Remove duplicate error messages when streaming	2024-12-28 11:55:24 +01:00
Nick Wellnhofer	459146140a	xpath: Fix parsing of non-ASCII names Fix a long-standing issue where QNames starting with a non-ASCII character would be rejected. This became more visible after "streaming" XPath evaluation was disabled since the latter handled non-ASCII names correctly. Fixes #818.	2024-11-05 12:30:44 +01:00
Nick Wellnhofer	ffb058f484	parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.	2024-10-28 20:26:55 +01:00
Nick Wellnhofer	f77ec16db0	html: Optimize htmlParseCharData	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	575be6c1f1	html: Fix line numbers with CRs	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	e179f3ec0e	html: Stop reporting syntax errors It doesn't make much sense to keep the old syntax error handling which doesn't conform to HTML5. Handling HTML5 parser errors is rather involved and not essential for parsers.	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	c6af101728	html: Test tokenizer against html5lib test suite	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	9678163f54	html: Don't check for valid XML characters	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	4eeac30944	html: Start to fix EOF and U+0000 handling	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	17da54c522	html: Normalize newlines	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	3adb396d87	html: Parse bogus comments instead of ignoring them Also treat XML processing instructions as bogus comments.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	e1834745e0	html: Add character data tests	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	f9ed30e972	html: HTML5 character data states	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	5951179239	html: Parse named character references according to HTML5	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	a80f8b64a9	html: Allow attributes in end tags Attributes are syntactically allowed in HTML5 end tags but otherwise ignored.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	dcb2abb2fe	html: Parse tag and attribute names according to HTML5 HTML5 allows basically all characters in tag and attribute names.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	bd9eed4694	parser: Make unsupported encodings an error in declarations This was changed in `45157261`, but in encoding declarations, unsupported encodings should raise a fatal error. Fixes #794.	2024-09-02 19:29:39 +02:00
Nick Wellnhofer	8ae06d5223	SAX2: Don't merge CDATA sections The Document Object Model (DOM) Level 3 Core Specification says: "Adjacent CDATASection nodes are not merged by use of the normalize method of the Node interface." Fixes #412.	2024-08-29 01:31:19 +02:00
Nick Wellnhofer	322e733b84	xinclude: Fix fallback for text includes Fixes #772.	2024-07-18 19:32:23 +02:00
Nick Wellnhofer	842a044831	valid: Restore ID lookup Revert a change from `d025cfbb` and don't overwrite ID table entries, so that the first attribute will be returned if there are duplicate IDs. This requires two other changes: (1) Attributes in entity content are never added to the ID table. This seems reasonable. (2) Remove the optimization to skip ID lookup when copying and the target document has an empty ID table. This also seems more correct since the document could have ID declarations nevertheless or we could be copying xml:ids into the document for the first time. Fixes #757.	2024-07-03 11:46:06 +02:00
Nick Wellnhofer	30be984a0f	encoding: Rework ISO-8859-X conversion Optimize code. Pass tables as context parameter. Check for XML_ENC_ERR_SPACE.	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	7c11da2d98	tests: Clarify licence of test/intsubset2.xml	2024-06-27 12:49:06 +02:00
Nick Wellnhofer	b8903b9e0d	runtest: Remove result handling from schemasOneTest We only care about errors.	2024-06-22 21:59:03 +02:00
Nick Wellnhofer	e68ccfa988	tests: Port Schematron tests to C	2024-06-22 21:59:03 +02:00
Nick Wellnhofer	1dd5e76a69	xinclude: Don't remove root element Don't replace include element at root with empty nodeset.	2024-06-18 20:12:03 +02:00
Nick Wellnhofer	52ce0d70f9	tests: Add XInclude test for issue #733	2024-06-17 17:35:12 +02:00
Nick Wellnhofer	2608baaf92	parser: Make failure to load main document a warning Revert the change that made failures to load the main document an error. This fixes the --path option of xmllint and xsltproc. Should fix #733.	2024-06-14 20:06:07 +02:00
Nick Wellnhofer	669bd34993	xpointer: Remove support for XPointer locations The latest spec for what is essentially an XPath extension seems to be this working draft from 2002: https://www.w3.org/TR/xptr-xpointer/ The xpointer() scheme is listed as "being reviewed" in the XPointer registry since at least 2006. libxml2 seems to be the only modern software that tries to implement this spec, but the code has many bugs and quality issues. If you configure --with-legacy, old symbols are retained for ABI compatibility.	2024-06-12 18:20:01 +02:00
Nick Wellnhofer	4fefba4cf6	parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if (1) XML_PARSE_DTDLOAD wasn't specified, and (2) entities aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.	2024-05-15 17:58:48 +02:00

659 Commits