libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	f19a95108a	parser: Report malloc failures Fix many places where malloc failures aren't reported. Make xmlErrMemory public. This is useful for custom external entity loaders. Introduce new API function xmlSwitchEncodingName. Change how we store whether the parser is stopped. This used to be signaled by setting ctxt->instate to XML_PARSER_EOF, which was misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and introduce a macro PARSER_STOPPED. Also stop removing parser inputs in xmlHaltParser. This allows removing many checks of ctxt->instate. Introduce xmlErrParser to handle errors if a parser context is available.	2023-12-11 22:13:05 +01:00
Nick Wellnhofer	7d446e9736	parser: Fix namespaces redefined from default attributes This regressed in commit `e0dd330b`. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.	2023-12-08 12:19:16 +01:00
Nick Wellnhofer	c011e7605d	globals: Remove unused globals from thread storage Setting these deprecated globals hasn't had an effect for a long time. Make them constants. This reduces the size of per-thread storage from ~700 to ~250 bytes.	2023-12-06 20:07:54 +01:00
Nick Wellnhofer	7f00273cf0	parser: Fix invalid free in xmlParseBalancedChunkMemoryRecover Set the dictionary for newDoc in xmlParseBalancedChunkMemoryRecover. This is a long-standing bug which was masked by - xmlParseBalancedChunkMemoryRecover changing the document of the root node. This is a really bad idea, resulting in a mismatch between ctxt->myDoc and ctxt->node->doc. - SAX2.c preferring ctxt->node->doc over ctxt->myDoc until commit `a31e1b06`. Fixes #641.	2023-12-01 19:44:37 +01:00
Nick Wellnhofer	c7629c9eb1	parser: Clarify documentation regarding xmlReadMemory buffer size Fixes #638.	2023-11-30 16:52:34 +01:00
Nick Wellnhofer	43b511fa71	parser: Make CRLF increment line number Partial revert of `cb927e85` fixing CRLFs not incrementing the line number. This requires reworking xmlParseQNameHashed. The original implementation prompted the change to xmlCurrentChar, which really shouldn't modify the 'cur' pointer as a side effect. But the NEXTL macro relies on this behavior. Ultimately, we should reintroduce the change to xmlCurrentChar and fix the NEXTL macro. This will lead to single CRs incrementing the line number as well, which seems more consistent. Fixes #628.	2023-11-26 15:18:09 +01:00
Nick Wellnhofer	aca37d8c77	parser: Only enable SAX2 if there are SAX2 element handlers This reverts part of commit `235b15a5` for backward compatibility and adds some comments trying to clarify the whole mess. Fixes #623.	2023-11-20 15:20:37 +01:00
Nick Wellnhofer	529df19619	parser: Don't overwrite error state in xmlParseTextDecl Fixes a null deref in xmlLoadEntityContent found by OSS-Fuzz.	2023-11-15 12:11:33 +01:00
Nick Wellnhofer	70cc45b81f	parser: Improve attribute hash table There's no need to grow the hash table dynamically. The size is known which simplifies the implementation.	2023-11-05 00:49:40 +01:00
Nick Wellnhofer	5859849454	parser: Fix combination of hash values This bug resulted in a stuck bit in hash values which can have a severe performance impact.	2023-11-04 23:50:02 +01:00
Nick Wellnhofer	7a2d412f68	parser: Copy default namespace in xmlParseBalancedChunkMemory	2023-10-31 20:19:27 +01:00
Nick Wellnhofer	e0c2f14d83	parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.	2023-10-31 14:04:57 +01:00
Nick Wellnhofer	028566745c	parser: Remove redundant IS_CHAR check in xmlCurrentChar	2023-10-22 16:32:54 +02:00
Nick Wellnhofer	c082ef4644	parser: Stop switching to ISO-8859-1 on encoding errors Use U+FFFD Replacement Character if invalid UTF-8 is encountered in recovery mode. Also rewrite xmlNextChar and xmlCurrentChar. Fixes #598.	2023-10-22 16:32:54 +02:00
Nick Wellnhofer	572ecc1719	parser: Fix buffer shrinking when push parsing Short-lived regression from `b76d81da`.	2023-10-22 14:01:50 +02:00
Nick Wellnhofer	86ef190e53	parser: Fix stack handling in xmlParseTryOrFinish After commit `e0dd330b`, this latent bug could cause use-after-free errors in rare circumstances like using the reader API with recovery and XIncludes.	2023-10-14 22:57:58 +02:00
Nick Wellnhofer	514ab39955	parser: Don't overwrite error state in xmlParseTextDecl If a memory allocation fails, this could cause a null deref after recent changes. Found by OSS-Fuzz.	2023-10-11 13:27:44 +02:00
Nick Wellnhofer	821a037038	parser: Fix memory leak in xmlLoadEntityContent Found by OSS-Fuzz.	2023-10-09 15:20:00 +02:00
Nick Wellnhofer	4fc5340ec5	parser: Also grow comment buffer if SAX is disabled Fix short-lived regression from `8afd321a`, found by OSS-Fuzz.	2023-10-08 14:26:35 +02:00
Nick Wellnhofer	36374bc9fc	parser: Fix error handling in xmlLoadEntityContent Backup more members of context struct. Fix small accounting error.	2023-10-08 14:08:44 +02:00
Nick Wellnhofer	b76d81dab3	parser: Fix regression when push parsing parameter entities Short-lived regression from `834b8123`. Also shrink parameter entity buffers when push parsing.	2023-10-06 13:11:19 +02:00
Nick Wellnhofer	134d2ad890	parser: Protect against quadratic default attribute expansion	2023-10-06 12:47:24 +02:00
Nick Wellnhofer	7615fae62e	parser: Make XML_PARSE_NSCLEAN option work again	2023-10-06 12:28:59 +02:00
Nick Wellnhofer	0ba22c0513	parser: Support encoded external PEs in entity values Corner case which was never supported.	2023-10-06 12:28:59 +02:00
Nick Wellnhofer	8afd321abd	parser: Missing checks for disableSAX	2023-10-06 12:28:59 +02:00
Nick Wellnhofer	97e99f4112	parser: Acknowledge that entities with namespaces are broken Entities which reference out-of-scope namespaces have always been broken. xmlParseBalancedChunkMemoryInternal tried to reuse the namespaces currently in scope but these namespaces were ignored by the SAX handler. Besides, there could be different namespaces in scope when expanding the entity again. For example: `<!DOCTYPE doc [ <!ENTITY ent "<ns:elem/>"> ]> <doc> <decl1 xmlns:ns="urn:ns1"> &ent; </decl1> <decl2 xmlns:ns="urn:ns2"> &ent; </decl2> </doc>` Add some comments outlining possible solutions to this problem. For now, we stop copying namespaces to the temporary parser context in xmlParseBalancedChunkMemoryInternal. This has never really worked and the recent changes contained a partial fix which uncovered other problems like a use-after-free with the XML Reader interface, found by OSS-Fuzz.	2023-10-05 17:41:46 +02:00
Nick Wellnhofer	eb69c1d39d	parser: Fix initialization of namespace data Move initialization to xmlInitSAXParserCtxt. Also add missing XML_HIDDEN to xmlParserNsFree. Fixes #597.	2023-10-02 12:33:29 +02:00
Nick Wellnhofer	fc49679316	parser: Fix error handling in xmlParseQNameHashed Short-lived regression found by OSS-Fuzz.	2023-10-02 12:05:36 +02:00
Nick Wellnhofer	6dd87f5eef	malloc-fail: Fix memory leak in xmlParseBalancedChunkMemoryInternal Short-lived regression found by OSS-Fuzz.	2023-09-30 17:11:25 +02:00
Nick Wellnhofer	e0dd330b8f	parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.	2023-09-29 12:43:22 +02:00
Nick Wellnhofer	a873191cd2	parser: Introduce xmlParseQNameHashed	2023-09-29 12:43:08 +02:00
Nick Wellnhofer	8c084ebdc7	doc: Make apibuild.py happy	2023-09-21 22:57:33 +02:00
Nick Wellnhofer	11a1839ddd	globals: Move remaining globals back to correct header files This undoes a lot of damage.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	a77f9ab84c	globals: Don't include SAX2.h from globals.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	2e6c49a74d	globals: Don't store xmlParserVersion in global state This is a constant.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	a07ec7c1a7	threads: Move library initialization code to threads.c This allows consolidating the initialization code since the global init lock was already implemented in threads.c.	2023-09-19 17:35:12 +02:00
Nick Wellnhofer	4e1c13ebfd	debug: Remove debugging code This is barely useful these days and only clutters the code base.	2023-09-19 17:35:09 +02:00
Nick Wellnhofer	c19771c1f1	globals: Move code from threads.c to globals.c Move all code that handles globals to the place where it belongs.	2023-09-19 17:34:38 +02:00
Nick Wellnhofer	d7cfe35650	parser: Avoid undefined behavior in xmlParseStartTag2 Instead of using arithmetic on dangling pointers, store ptrdiff_t values in void pointers which is at least implementation-defined.	2023-09-14 20:52:24 +02:00
Nick Wellnhofer	57cfd221a6	dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.	2023-09-01 14:52:04 +02:00
Nick Wellnhofer	53050b1dd8	parser: More fixes to push parser error handling	2023-08-29 20:06:43 +02:00
Nick Wellnhofer	bbd918b2e7	parser: Fix detection of null bytes Also suppress misleading extra errors. Fixes #122.	2023-08-29 18:43:10 +02:00
Nick Wellnhofer	c6083a32d6	parser: Improve error handling in push parser - Report errors earlier - Align error messages with pull parser	2023-08-29 18:41:05 +02:00
Nick Wellnhofer	1edae30f82	parser: Don't check inputNr in xmlParseTryOrFinish There's no apparent reason for this check. inputNr should always be 1 here.	2023-08-29 18:17:14 +02:00
Nick Wellnhofer	e48f2695fe	parser: Remove push parser debugging code	2023-08-29 18:17:09 +02:00
Nick Wellnhofer	ed3bd05284	parser: Allow to set maximum amplification factor	2023-08-20 20:49:16 +02:00
Nick Wellnhofer	855818bd2b	parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.	2023-08-08 15:21:37 +02:00
Nick Wellnhofer	95e81a360c	parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.	2023-08-08 15:21:31 +02:00
Nick Wellnhofer	834b8123ef	parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying (possibly gigabytes of) memory, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and only mandated since POSIX 2008.	2023-08-08 15:21:28 +02:00
Nick Wellnhofer	5aff27ae78	parser: Optimize xmlLoadEntityContent Load entity content via xmlParserInputBufferGrow, avoiding a copy. This also fixes an entity size accounting error.	2023-08-08 15:21:25 +02:00
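
Commit `57cfd221a6` above replaces rand_r with xoroshiro64** for hash randomization. For readers unfamiliar with that generator, here is a minimal sketch of the algorithm as published by Blackman and Vigna. It is not taken from the libxml2 sources: the type name `Xoroshiro64`, the function names, and the seeding helper are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical state holder: two 32-bit words, as xoroshiro64** requires.
 * The names here are illustrative, not libxml2's. */
typedef struct {
    uint32_t s[2];
} Xoroshiro64;

/* Rotate a 32-bit value left by k bits (0 < k < 32). */
static uint32_t rotl32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

/* Seed the generator. The state must never be all zeros, so an
 * all-zero seed is replaced by a fixed nonzero one (an assumption
 * made for this sketch). */
static void xoroshiro64_seed(Xoroshiro64 *rng, uint32_t a, uint32_t b) {
    rng->s[0] = a;
    rng->s[1] = b;
    if (a == 0 && b == 0)
        rng->s[0] = 1;
}

/* Return the next 32-bit pseudo-random value and advance the state. */
static uint32_t xoroshiro64_next(Xoroshiro64 *rng) {
    uint32_t s0 = rng->s[0];
    uint32_t s1 = rng->s[1];
    /* The "**" scrambler: multiply, rotate, multiply. */
    uint32_t result = rotl32(s0 * 0x9E3779BBu, 5) * 5u;

    /* Linear state transition (xor, rotate, shift). */
    s1 ^= s0;
    rng->s[0] = rotl32(s0, 26) ^ s1 ^ (s1 << 9);
    rng->s[1] = rotl32(s1, 13);

    return result;
}
```

The generator needs only 64 bits of state and portable integer operations, which is consistent with the commit's note that hash randomization becomes available on all platforms. How libxml2 actually seeds and stores the state is not shown in this log.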

910 Commits