libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	868b94b80e	globals: Reformat libxml/globals.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	bbf08608fc	globals: Move buffer callback declarations to xmlIO.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	dc3382ef97	globals: Move xmlRegisterNodeDefault to tree.c Code in globals.c must not try to access globals itself since the accessor macros aren't defined and we would only see the main variable.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	e7b6ca156f	globals: Rework global state destruction on Windows If DllMain is used, rely on it working as expected. The old code seemed to attempt to free global state of other threads if, for some reason, the DllMain mechanism didn't work. In a static build, register a destructor with RegisterWaitForSingleObject. Make public functions xmlGetGlobalState and xmlInitializeGlobalState no-ops. Move initialization and registration of global state objects to xmlInitGlobalState. Lookup global state with xmlGetThreadLocalStorage which can be inlined nicely. Also cleanup global state when using TLS. xmlLastError must be reset.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	39a275a541	globals: Define globals using macros Declare and define globals and helper functions by (ab)using the preprocessor.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	bf6bd16154	globals: Introduce xmlCheckThreadLocalStorage Checks whether (emulated) thread-local storage could be allocated.	2023-09-20 22:06:43 +02:00
Nick Wellnhofer	89f4976728	globals: Make xmlGlobalState private This removes a public struct but it seems impossible to use its members in a sensible way from external code.	2023-09-19 17:36:29 +02:00
Nick Wellnhofer	a07ec7c1a7	threads: Move library initialization code to threads.c This allows to consolidate the initialization code since the global init lock was already implemented in threads.c.	2023-09-19 17:35:12 +02:00
Nick Wellnhofer	4e1c13ebfd	debug: Remove debugging code This is barely useful these days and only clutters the code base.	2023-09-19 17:35:09 +02:00
Nick Wellnhofer	c19771c1f1	globals: Move code from threads.c to globals.c Move all code that handles globals to the place where it belongs.	2023-09-19 17:34:38 +02:00
Nick Wellnhofer	2a4b811424	globals: Rename members of xmlGlobalState This is a deliberate first step to remove some internals from the public API and to avoid issues when redefining tokens.	2023-09-19 17:34:30 +02:00
Nick Wellnhofer	edc2dd48cb	dict: Update hash function Update hash function from classic Jenkins OAAT (dict.c) and a variant of DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash function passes all SMHasher tests.	2023-09-04 16:07:23 +02:00
Nick Wellnhofer	57cfd221a6	dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.	2023-09-01 14:52:04 +02:00
Nick Wellnhofer	778cca386d	legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.	2023-08-20 23:16:12 +02:00
Nick Wellnhofer	ed3bd05284	parser: Allow to set maximum amplification factor	2023-08-20 20:49:16 +02:00
Nick Wellnhofer	f1c1f5c6b4	parser: Revert change to doc->encoding Fixes #579.	2023-08-17 12:47:14 +02:00
Nick Wellnhofer	95e81a360c	parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.	2023-08-08 15:21:31 +02:00
Nick Wellnhofer	834b8123ef	parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid copying (possibly gigabytes) of memory temporarily, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and only mandated since POSIX 2008.	2023-08-08 15:21:28 +02:00
Nick Wellnhofer	59fa0bb383	parser: Simplify input pointer updates The base member always points to the beginning of the buffer.	2023-08-08 15:21:14 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
Nick Wellnhofer	b8961df65d	SAX: Always validate xml:ids The behavior shouldn't depend on mostly random configuration options.	2023-05-09 03:25:24 +02:00
Nick Wellnhofer	8d5e33ef3e	Fix compiler warning on GCC < 8 -Wcast-function-type is only available since GCC 8.	2023-05-03 20:42:10 +02:00
Nick Wellnhofer	fc69cf568b	parser: Move xmlFatalErr to parserInternals.c	2023-04-30 17:51:29 +02:00
Nick Wellnhofer	3ff6abbf58	encoding: Rework error codes Use an enum instead of magic numbers. Fix a few error codes. Simplify handling of "space" and "partial" errors. See #506.	2023-04-30 16:43:29 +02:00
Nick Wellnhofer	fa993130f9	xpath: Remove remaining references to valueFrame Fixes #529.	2023-04-30 13:18:17 +02:00
Nick Wellnhofer	3ffcc03b16	parser: Deprecate more internal functions	2023-04-26 20:23:23 +02:00
Nick Wellnhofer	98840d40da	parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.	2023-03-21 21:35:15 +01:00
Nick Wellnhofer	04d1bedd8c	parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.	2023-03-21 13:19:18 +01:00
Nick Wellnhofer	b167c73144	parser: Fix short-lived regression causing infinite loops Fix `3eb6bf03`. We really have to halt the parser, so the input buffer gets reset.	2023-03-14 15:16:04 +01:00
Nick Wellnhofer	f8efa589e8	malloc-fail: Handle malloc failures in xmlSchemaInitTypes Note that this changes the return value of public function xmlSchemaInitTypes from void to int. This shouldn't break the ABI on most platforms. Found when investigating #500.	2023-03-14 15:14:38 +01:00
Nick Wellnhofer	d7daf9fd96	xmllint: Fix use-after-free with --maxmem Fixes #498.	2023-03-14 14:55:34 +01:00
Nick Wellnhofer	e7c3a4ca1b	parser: Deprecate some parser input functions	2023-03-13 19:19:46 +01:00
Nick Wellnhofer	2099441f32	parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.	2023-03-13 17:51:13 +01:00
Nick Wellnhofer	483793940c	malloc-fail: Stop using XPath stack frames There's too much code which assumes that if ctxt->value is non-null, a value can be successfully popped off the stack. This assumption can break with stack frames when malloc fails. Instead of trying to fix all call sites, remove the stack frame logic. It only offered very little protection against misbehaving extension functions. We already check the stack size after a function call which should be enough. Found by OSS-Fuzz.	2023-03-13 17:11:27 +01:00
Nick Wellnhofer	bd63d730b8	html: Impose some length limits Impose length limits on names, attribute values, PIs and comments, similar to the XML parser.	2023-03-12 17:40:55 +01:00
Nick Wellnhofer	3eb6bf0386	parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.	2023-03-12 17:05:51 +01:00
Nick Wellnhofer	b51478dc95	Revert "malloc-fail: Avoid use-after-free after unsuccessful valuePush" This reverts commit `6a12be77c6`. There's too much code reading ctxt->value directly and making the wrong assumptions.	2023-02-26 13:23:47 +01:00
Nick Wellnhofer	4f0a0fb7a2	xinclude: Fix include guard	2023-02-22 14:24:24 +01:00
Nick Wellnhofer	905386ec35	autotools: Fix make distcheck - Add private/xinclude.h to EXTRA_DIST - Add runsuite.log to CLEANFILES Fixes #485.	2023-02-13 11:14:34 +01:00
Nick Wellnhofer	6a12be77c6	malloc-fail: Avoid use-after-free after unsuccessful valuePush In xpath.c there's a lot of code like: valuePush(ctxt, xmlCacheNewX()); ... valuePop(ctxt); If xmlCacheNewX fails, no value will be pushed on the stack. If there's no error check in between, valuePop will pop an unrelated value which can lead to use-after-free errors. Instead of trying to fix all call sites, we simply stop popping values if an error was signaled. This requires to change the CHECK_TYPE macro which is often used to determine whether a value can be safely popped. Found with libFuzzer, see #344.	2023-02-03 12:40:15 +01:00
Nick Wellnhofer	59b3366178	error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.	2022-12-27 14:41:19 +01:00
Nick Wellnhofer	a41b09c739	parser: Improve detection of entity loops Set a flag to detect entity loops at once instead of processing until the depth limit is exceeded.	2022-12-23 22:11:18 +01:00
Nick Wellnhofer	b47ebf047e	parser: Deprecate xmlString*DecodeEntities These are internal functions.	2022-12-21 21:06:03 +01:00
Nick Wellnhofer	ce76ebfd13	entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	a3c8b1805e	entities: Add entity flag for loop check	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	463bbeeca1	entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	7e3f469be9	entities: Use flags to store '<' check results Instead of abusing the LSB of the "checked" member, store the result of testing for occurrence of '<' character in "flags". Also use the flags in xmlParseStringEntityRef instead of rescanning every time.	2022-12-19 15:59:49 +01:00
Nick Wellnhofer	481d79d44c	entities: Add XML_ENT_PARSED flag To check whether an entity was already parsed, the code previously tested whether "checked" was non-zero or "children" was non-null. The "children" check could be unreliable because an empty entity also results in an empty (NULL) node list. Use a separate flag to make this check more reliable.	2022-12-19 15:26:46 +01:00
Nick Wellnhofer	f34f184f8e	entities: Add "flags" member to struct xmlEntity This will hold various flags and eventually replace the "checked" member.	2022-12-19 15:24:53 +01:00
Nick Wellnhofer	93a01c46f1	libxml.h: Add comments and indentation	2022-12-08 04:39:03 +01:00
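
Commit 463bbeeca1 above describes limiting the document size after entity substitution to 10 times the size before expansion, always allowing 10 MB of substitutions, charging a fixed cost per entity reference, and using saturation arithmetic on "unsigned long". The following is a minimal sketch of that accounting, not the code from the commit: all function and macro names are hypothetical, and the fixed cost of 20 and the reading of "10 MB" as 10,000,000 bytes are assumptions made for illustration.

```c
#include <limits.h>

/* Hypothetical constants; only the factor 10 and "10 MB" come from the
 * commit message. The per-reference cost is an assumed value. */
#define MAX_AMPLIFICATION 10UL        /* expanded size <= 10x consumed input */
#define ALLOWED_EXPANSION 10000000UL  /* 10 MB of substitutions always allowed */
#define ENTITY_FIXED_COST 20UL        /* assumed fixed cost per reference */

/* Addition that clamps at ULONG_MAX instead of wrapping around. */
static unsigned long sat_add(unsigned long a, unsigned long b) {
    return (a > ULONG_MAX - b) ? ULONG_MAX : a + b;
}

/* Multiplication that clamps at ULONG_MAX instead of wrapping around. */
static unsigned long sat_mul(unsigned long a, unsigned long b) {
    if (a != 0 && b > ULONG_MAX / a)
        return ULONG_MAX;
    return a * b;
}

/* Charge one entity reference: its content size plus the fixed cost. */
static unsigned long account_entity_ref(unsigned long expanded,
                                        unsigned long entity_size) {
    return sat_add(expanded, sat_add(entity_size, ENTITY_FIXED_COST));
}

/* Return 1 if the expansion is acceptable, 0 if it exceeds both the
 * always-allowed budget and the amplification factor. */
static int expansion_ok(unsigned long consumed, unsigned long expanded) {
    if (expanded <= ALLOWED_EXPANSION)
        return 1;
    return expanded <= sat_mul(consumed, MAX_AMPLIFICATION);
}
```

Because the allowed output is bounded by a constant multiple of the input plus a constant budget, total work stays linear in the input size, which is the property the commit message refers to.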

1036 Commits
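
Commit 57cfd221a6 in the log switches the dictionary code from rand_r to xoroshiro64** as its PRNG. For readers unfamiliar with that generator, here is a sketch of the publicly documented xoroshiro64** step function. The struct and function names are hypothetical and this is not the code from dict.c; how libxml2 seeds or locks the state is not shown in the log and is not modeled here.

```c
#include <stdint.h>

/* Hypothetical state holder: two 32-bit words, which must not both be zero. */
typedef struct {
    uint32_t s[2];
} xoro64_state;

/* Rotate x left by k bits, for 0 < k < 32. */
static uint32_t rotl32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

/* One step of xoroshiro64**: derive the output from s[0], then advance
 * the state with the xor/rotate/shift constants 26, 9 and 13. */
static uint32_t xoro64_next(xoro64_state *st) {
    uint32_t s0 = st->s[0];
    uint32_t s1 = st->s[1];
    uint32_t result = rotl32(s0 * 0x9E3779BBu, 5) * 5u;

    s1 ^= s0;
    st->s[0] = rotl32(s0, 26) ^ s1 ^ (s1 << 9);
    st->s[1] = rotl32(s1, 13);

    return result;
}
```

The generator needs only 64 bits of state and a handful of integer operations, so it can be carried in portable C without relying on a platform-specific rand_r.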