libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	5859849454	parser: Fix combination of hash values This bug resulted in a stuck bit in hash values which can have a severe performance impact.	2023-11-04 23:50:02 +01:00
Nick Wellnhofer	61034116d0	error: Make more xmlError structs constant Prepare for future changes, see `45470611`.	2023-10-24 15:02:36 +02:00
Nick Wellnhofer	c082ef4644	parser: Stop switching to ISO-8859-1 on encoding errors Use U+FFFD Replacement Character if invalid UTF-8 is encountered in recovery mode. Also rewrite xmlNextChar and xmlCurrentChar. Fixes #598.	2023-10-22 16:32:54 +02:00
Nick Wellnhofer	253f260bb1	threads: Fix --with-thread-alloc Fixes #606.	2023-10-18 20:07:04 +02:00
Nick Wellnhofer	713ded60ad	entities: Make xmlFreeEntity public	2023-10-06 10:47:07 +02:00
Nick Wellnhofer	eb69c1d39d	parser: Fix initialization of namespace data Move initialization to xmlInitSAXParserCtxt. Also add missing XML_HIDDEN to xmlParserNsFree. Fixes #597.	2023-10-02 12:33:29 +02:00
Nick Wellnhofer	e0dd330b8f	parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.	2023-09-29 12:43:22 +02:00
Nick Wellnhofer	19161bab15	dict: Internal API to look up hash values	2023-09-29 12:43:08 +02:00
Nick Wellnhofer	1425d8f67b	dict: Separate RNG code	2023-09-29 00:15:40 +02:00
Nick Wellnhofer	b31813e60c	include: Add more missing stdio.h includes	2023-09-28 15:34:08 +02:00
Nick Wellnhofer	84e1ffc813	doc: Don't document internal macros in xmlversion.h	2023-09-22 19:01:11 +02:00
Nick Wellnhofer	b94283fbda	regexp: Add missing include	2023-09-22 14:23:27 +02:00
Nick Wellnhofer	45470611b0	error: Make xmlGetLastError return a const error This is a slight break of the API, but users really shouldn't modify the global error struct. The goal is to make xmlLastError use static buffers for its strings eventually. This should warn people if they're abusing the struct.	2023-09-22 13:29:07 +02:00
Nick Wellnhofer	8c084ebdc7	doc: Make apibuild.py happy	2023-09-21 22:57:33 +02:00
Nick Wellnhofer	72262030a6	parser: Readd some includes to parser.h and xmlreader.h Fix backward compatibility.	2023-09-21 15:06:05 +02:00
Nick Wellnhofer	9fc5090c05	hash: Clean up libxml/hash.h Rename variables, fix subincludes, whitespace.	2023-09-21 14:47:25 +02:00
Nick Wellnhofer	da274bfa55	build: Fix build when certain modules are disabled	2023-09-21 02:26:43 +02:00
Nick Wellnhofer	9b5cce7a71	include: Remove more unnecessary includes	2023-09-21 01:50:53 +02:00
Nick Wellnhofer	d6ba403368	globals: Move remaining declarations to correct places globals.h is now deprecated. Sanity is restored.	2023-09-20 22:22:51 +02:00
Nick Wellnhofer	1117fae040	include: Remove unneeded includes	2023-09-20 22:07:41 +02:00
Nick Wellnhofer	736327df6b	include: Break inclusion cycle between tree.h and xmlregexp.h	2023-09-20 22:07:41 +02:00
Nick Wellnhofer	699299cae3	globals: Stop including globals.h	2023-09-20 22:07:40 +02:00
Nick Wellnhofer	11a1839ddd	globals: Move remaining globals back to correct header files This undoes a lot of damage.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	7909ff08e2	include: Remove unnecessary includes - Don't include tree.h from encoding.h - Don't include parser.h from xmlIO.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	eb985d6f8e	globals: Move error globals back to xmlerror.c	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	d1336fd393	globals: Move malloc hooks back to xmlmemory.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	a77f9ab84c	globals: Don't include SAX2.h from globals.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	2e6c49a74d	globals: Don't store xmlParserVersion in global state This is a constant.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	0830fcfa90	globals: Deprecate xmlLastError The last error should be accessed with xmlGetLastError.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	db8b9722cb	parser: Deprecate global parser options Note that setting global options has no effect anyway when using any of the modern parser API functions which take an option argument like xmlReadMemory or when using xmlCtxtUseOptions. Global options only have an effect when using old API functions xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling xmlCtxtUseOptions. Unfortunately, many downstream projects still modify global parser options often without realizing that it has no effect. If necessary, switch to the modern API. Then you can safely remove all code that changes global options. Here's a list of deprecated functions and global variables together with the corresponding parser options. - xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue Parser option XML_PARSE_NOENT - xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue Inverse of parser option XML_PARSE_NOBLANKS - xmlPedanticParserDefault, xmlPedanticParserDefaultValue Parser option XML_PARSE_PEDANTIC - xmlLineNumbersDefault, xmlLineNumbersDefaultValue Always enabled by new API - xmlDoValidityCheckingDefaultValue Parser option XML_PARSE_DTDVALID - xmlGetWarningsDefaultValue Inverse of parser option XML_PARSE_NOWARNING - xmlLoadExtDtdDefaultValue Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	868b94b80e	globals: Reformat libxml/globals.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	bbf08608fc	globals: Move buffer callback declarations to xmlIO.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	dc3382ef97	globals: Move xmlRegisterNodeDefault to tree.c Code in globals.c must not try to access globals itself since the accessor macros aren't defined and we would only see the main variable.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	e7b6ca156f	globals: Rework global state destruction on Windows If DllMain is used, rely on it working as expected. The old code seemed to attempt to free global state of other threads if, for some reason, the DllMain mechanism didn't work. In a static build, register a destructor with RegisterWaitForSingleObject. Make public functions xmlGetGlobalState and xmlInitializeGlobalState no-ops. Move initialization and registration of global state objects to xmlInitGlobalState. Lookup global state with xmlGetThreadLocalStorage which can be inlined nicely. Also cleanup global state when using TLS. xmlLastError must be reset.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	39a275a541	globals: Define globals using macros Declare and define globals and helper functions by (ab)using the preprocessor.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	bf6bd16154	globals: Introduce xmlCheckThreadLocalStorage Checks whether (emulated) thread-local storage could be allocated.	2023-09-20 22:06:43 +02:00
Nick Wellnhofer	89f4976728	globals: Make xmlGlobalState private This removes a public struct but it seems impossible to use its members in a sensible way from external code.	2023-09-19 17:36:29 +02:00
Nick Wellnhofer	a07ec7c1a7	threads: Move library initialization code to threads.c This allows consolidating the initialization code since the global init lock was already implemented in threads.c.	2023-09-19 17:35:12 +02:00
Nick Wellnhofer	4e1c13ebfd	debug: Remove debugging code This is barely useful these days and only clutters the code base.	2023-09-19 17:35:09 +02:00
Nick Wellnhofer	c19771c1f1	globals: Move code from threads.c to globals.c Move all code that handles globals to the place where it belongs.	2023-09-19 17:34:38 +02:00
Nick Wellnhofer	2a4b811424	globals: Rename members of xmlGlobalState This is a deliberate first step to remove some internals from the public API and to avoid issues when redefining tokens.	2023-09-19 17:34:30 +02:00
Nick Wellnhofer	edc2dd48cb	dict: Update hash function Update hash function from classic Jenkins OAAT (dict.c) and a variant of DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash function passes all SMHasher tests.	2023-09-04 16:07:23 +02:00
Nick Wellnhofer	57cfd221a6	dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.	2023-09-01 14:52:04 +02:00
Nick Wellnhofer	778cca386d	legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.	2023-08-20 23:16:12 +02:00
Nick Wellnhofer	ed3bd05284	parser: Allow to set maximum amplification factor	2023-08-20 20:49:16 +02:00
Nick Wellnhofer	f1c1f5c6b4	parser: Revert change to doc->encoding Fixes #579.	2023-08-17 12:47:14 +02:00
Nick Wellnhofer	95e81a360c	parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.	2023-08-08 15:21:31 +02:00
Nick Wellnhofer	834b8123ef	parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying possibly gigabytes of memory, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and only mandated since POSIX 2008.	2023-08-08 15:21:28 +02:00
Nick Wellnhofer	59fa0bb383	parser: Simplify input pointer updates The base member always points to the beginning of the buffer.	2023-08-08 15:21:14 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
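
Commit e0dd330b8f describes verifying attribute uniqueness with a hash table that stores indices into the attribute table, reusing hash values that were already computed. The following is a minimal sketch of that idea, not libxml2's actual code: the `Attr` struct, `TABLE_SIZE`, `hash_name` (an FNV-1a stand-in for the dictionary hash) and `find_duplicate_attr` are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64    /* power of two; must exceed the attribute count */
#define EMPTY_SLOT (-1)

/* Hypothetical attribute record: a name plus its cached hash value. */
typedef struct {
    const char *name;
    uint32_t hash;
} Attr;

/* FNV-1a, used here only as a stand-in for the dictionary hash. */
static uint32_t hash_name(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/*
 * Returns the index of the first attribute whose name repeats an earlier
 * one, or -1 if all names are unique. The table holds indices into
 * `attrs`, so each insertion costs O(1) on average instead of comparing
 * against every earlier attribute.
 */
static int find_duplicate_attr(Attr *attrs, int n) {
    int table[TABLE_SIZE];
    int i;

    assert(n >= 0 && n < TABLE_SIZE);
    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY_SLOT;

    for (i = 0; i < n; i++) {
        uint32_t slot;

        attrs[i].hash = hash_name(attrs[i].name);
        slot = attrs[i].hash & (TABLE_SIZE - 1);

        /* Linear probing: compare cached hashes first, then names. */
        while (table[slot] != EMPTY_SLOT) {
            const Attr *other = &attrs[table[slot]];
            if (other->hash == attrs[i].hash &&
                strcmp(other->name, attrs[i].name) == 0)
                return i;
            slot = (slot + 1) & (TABLE_SIZE - 1);
        }
        table[slot] = i;
    }
    return -1;
}
```

Comparing the cached hash before calling `strcmp` keeps most probes to a single integer comparison, which is the benefit of reusing hash values that the commit message mentions.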


1066 Commits
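
Commit 57cfd221a6 replaces rand_r with xoroshiro64** as the PRNG used for hash randomization. For reference, here is a sketch of the publicly documented xoroshiro64** generator (32-bit output, 64 bits of state). The `Xoroshiro64` struct and the function names are assumptions made for this example and do not reflect how libxml2 stores or seeds its state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state container; the two words must not both be zero. */
typedef struct {
    uint32_t s[2];
} Xoroshiro64;

/* Rotate a 32-bit value left by k bits (0 < k < 32). */
static uint32_t rotl32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

/* Initialize the generator from two seed words. */
static void xoroshiro64_seed(Xoroshiro64 *rng, uint32_t a, uint32_t b) {
    assert(a != 0 || b != 0);    /* the all-zero state never changes */
    rng->s[0] = a;
    rng->s[1] = b;
}

/* Produce the next 32-bit value and advance the state. */
static uint32_t xoroshiro64_next(Xoroshiro64 *rng) {
    uint32_t s0 = rng->s[0];
    uint32_t s1 = rng->s[1];
    /* The "**" scrambler: multiply, rotate, multiply. */
    uint32_t result = rotl32(s0 * 0x9E3779BBu, 5) * 5u;

    /* Xor-rotate-shift-rotate state transition. */
    s1 ^= s0;
    rng->s[0] = rotl32(s0, 26) ^ s1 ^ (s1 << 9);
    rng->s[1] = rotl32(s1, 13);

    return result;
}
```

The generator uses only 32-bit integer arithmetic and has no platform dependencies, which fits the commit's stated aim of enabling hash randomization on all platforms.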