libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	2f3655c9c3	parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: "The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl." PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.	2025-05-25 14:26:30 +02:00
Nick Wellnhofer	dd1961e0d8	valid: Skip more validity checks if not validating	2025-05-25 14:26:30 +02:00
Nick Wellnhofer	7008740a96	parser: Consolidate scanning of XML Names Use new productions by default. Fixes #194. Fixes #364. See #707.	2025-05-19 19:58:33 +02:00
Nick Wellnhofer	c4926b19d3	codegen: Merge xmlunicode.c into xmlregexp.c Include generated parts. Generate xmlChRangeGroups instead of functions for Unicode blocks.	2025-05-16 19:04:20 +02:00
Nick Wellnhofer	a40f36e7f2	include: Stop using *Ptr typedefs in public headers	2025-05-16 18:03:12 +02:00
Nick Wellnhofer	f602c0c186	html: Rework serialization of meta encoding attributes Don't allocate memory.	2025-05-12 00:05:02 +02:00
Nick Wellnhofer	05b8fe0a06	html: Don't escape RAWTEXT and PLAINTEXT Align with HTML5.	2025-05-11 20:57:07 +02:00
Nick Wellnhofer	777e2adf77	io: Consolidate escaping code Use generated table approach of xmlSerializeText for xmlEscapeText. Move most code to xmlIO.c.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	dad1163078	entities: Always replace invalid chars when escaping The previous refactor painstakingly recreated the different behavior of separate functions that were merged. It makes more sense to always replace invalid chars. Optimize IS_CHAR check for non-ASCII chars.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	971038e59f	html: Call lower-level escaping functions Removes the need to pass a document around.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	63535d3922	tree: Make xmlNodeListGetStringInternal work with escape flags	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	46f05ea4d5	html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	f7c412874b	doc: Remove more comment block headers	2025-05-02 17:41:26 +02:00
Nick Wellnhofer	69879da88f	doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.	2025-05-01 23:23:42 +02:00
Nick Wellnhofer	b349225952	include: Change some return types from int to enum This also affects some new functions from 2.13.	2025-03-14 02:31:01 +01:00
Nick Wellnhofer	fd1b939168	include: Convert some macros to enums	2025-03-14 00:35:40 +01:00
Nick Wellnhofer	69b83bb68e	encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.	2025-03-13 22:15:10 +01:00
Nick Wellnhofer	03a8d5f93d	unicode: Make Unicode functions private	2025-03-04 17:31:11 +01:00
Nick Wellnhofer	3d37ff84c3	globals: Also use global state struct if threads are disabled	2025-03-04 16:54:41 +01:00
Nick Wellnhofer	361f7bff92	parser: Make nodePush, nodePop, namePush, namePop private	2025-03-04 16:47:14 +01:00
Nick Wellnhofer	9c16a153d8	Revert "include: Make most IS_* macros private" This reverts commit `84a6c82ff8`.	2025-02-13 20:20:17 +01:00
Nick Wellnhofer	a78843be5e	xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.	2025-01-28 23:20:37 +01:00
Nick Wellnhofer	bfe6af2eed	fuzz: Remove hacks to build lint fuzzer Don't include source file directly.	2025-01-17 20:06:45 +01:00
Nick Wellnhofer	c134e8b4dc	include: Make INPUT_CHUNK macro private	2024-12-21 20:02:34 +01:00
Nick Wellnhofer	84a6c82ff8	include: Make most IS_* macros private Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.	2024-12-21 20:01:30 +01:00
Nick Wellnhofer	2e18e5dc6d	memory: Grow dynamic arrays by 50% Growing by a factor lower than the golden ratio increases the chances of reusing memory freed from earlier allocations. Set growth rate to 1.5 which also reduces internal fragmentation.	2024-12-21 19:37:38 +01:00
Nick Wellnhofer	5320a4aa38	memory: Implement xmlGrowCapacity to safely grow arrays xmlGrowCapacity makes sure that dynamic arrays don't grow beyond an explicit maximum size. size_t considerations are also taken into account. A macro XML_MAX_ITEMS is provided as default maximum with value 1 billion. When fuzzing, the initial size is set to 1 to cause more reallocations. This can require adjustments if callers really need larger arrays.	2024-12-21 19:37:37 +01:00
Nick Wellnhofer	0dd910e82b	save: Fix handling of catastrophic errors Don't overwrite catastrophic errors in xmlSaveErr. Overwrite non-catastrophic errors in xmlOutputBufferClose.	2024-12-19 02:30:36 +01:00
Nick Wellnhofer	57087e5fc7	parser: Don't overwrite catastrophic errors Stop reporting errors after a catastrophic error. Also make sure that ctxt->errNo matches ctxt->lastError.code.	2024-11-26 00:47:48 +01:00
Nick Wellnhofer	0bc4608c50	html: Use hash table to check for duplicate attributes	2024-10-06 20:04:00 +02:00
makise-homura	a3043b478f	threads: define _WIN32_WINNT as 0x0600 to use InitOnceExecuteOnce()	2024-08-16 22:26:07 +03:00
Nick Wellnhofer	a530ff125d	io: Always consume encoding handler when creating output buffers Also free encoding handler in error case. Remove xmlAllocOutputBufferInternal which was identical to xmlAllocOutputBuffer.	2024-07-29 14:25:39 +02:00
Nick Wellnhofer	4e93425a7f	threads: Prefer Win32 over pthreads	2024-07-16 20:03:01 +02:00
Nick Wellnhofer	769e5a4a42	threads: Allocate global RMutexes statically Avoid memory allocations during initialization.	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	79e119954c	error: Make xmlLastError const	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	a6f54f055b	io: Fine-tune initial IO buffer size	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	34c9108f15	encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	a221cd7849	buf: Rework xmlBuf code Always use what the old implementation called the "IO" allocation scheme, allowing to move the content pointer past the initial allocation. This is inexpensive and allows efficient shrinking. Optimize xmlBufGrow, reusing shrunken memory as much as possible. Simplify xmlBufAdd. Make xmlBufBackToBuffer return an error on overflow. Make "size" exclude the terminating NULL byte. Always provide an initial size. Reintroduce static buffers. Remove xmlBufResize and several other functions.	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	1cfc5b8089	entities: Rework serialization of numeric character references	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	8d1606265d	entities: Rework text escaping	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	728869809e	error: Add helper functions to print errors and abort	2024-07-15 16:33:38 +02:00
Nick Wellnhofer	8af55c8d20	parser: Rename new input API functions These weren't made public yet.	2024-07-11 01:33:29 +02:00
Nick Wellnhofer	d74ca59491	parser: Rename internal xmlNewInput functions	2024-07-11 01:31:50 +02:00
Nick Wellnhofer	4f329dc524	parser: Implement xmlCtxtParseContent This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.	2024-07-11 01:26:32 +02:00
Nick Wellnhofer	38195cf596	parser: Don't produce names with invalid UTF-8 in recovery mode	2024-07-06 15:33:06 +02:00
Nick Wellnhofer	16e7ecd478	xinclude: Check URI length Don't report long URIs as OOM errors.	2024-07-01 18:03:06 +02:00
Nick Wellnhofer	f505dcaea0	tree: Remove underscores from xmlRegisterCallbacks	2024-06-27 14:45:35 +02:00
Nick Wellnhofer	598ee0d2c6	error: Remove underscores from xmlRaiseError	2024-06-27 14:43:10 +02:00
Nick Wellnhofer	1341deac13	xmllint: Move shell to xmllint Move source code for xmllint shell to shell.c and move it from the libxml2 library to the xmllint executable. Also allow shell to run without XPath and debug modules. Add stubs for old shell API functions in legacy build mode.	2024-06-16 18:47:12 +02:00
Nick Wellnhofer	84666581c2	catalog: Fix initialization Initialize mutex via xmlInitParser. Fix some other initialization calls.	2024-06-15 21:15:26 +02:00

128 Commits
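
The growth policy described in commits 2e18e5dc6d and 5320a4aa38 (grow dynamic arrays by 50%, never beyond an explicit maximum, with size_t limits taken into account) can be sketched roughly as follows. This is an illustrative sketch only, not the libxml2 implementation: the function name sketch_grow_capacity, its parameters, and the -1 failure convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are assumptions for this sketch).
 *
 * capacity:  current number of items the array can hold (0 if unallocated)
 * item_size: size of one item in bytes
 * initial:   capacity to use for the first allocation
 * max_items: hard upper bound on the number of items
 *
 * Returns the new capacity, or -1 if the array cannot grow any further.
 */
int
sketch_grow_capacity(int capacity, size_t item_size, int initial, int max_items)
{
    int extra;

    assert(item_size > 0 && initial > 0 && max_items > 0);

    /* First allocation: use the initial size, clamped to the maximum. */
    if (capacity <= 0)
        return (initial < max_items) ? initial : max_items;

    /*
     * Refuse to grow past the maximum, or to a byte size that would not
     * fit in size_t. The new capacity is at most 2 * capacity, so checking
     * against SIZE_MAX / 2 / item_size is sufficient.
     */
    if (capacity >= max_items ||
        (size_t) capacity > SIZE_MAX / 2 / item_size)
        return -1;

    /* Grow by 50%, rounding up so that a capacity of 1 still grows. */
    extra = (capacity + 1) / 2;

    /* Clamp to the maximum without overflowing int. */
    if (capacity > max_items - extra)
        return max_items;

    return capacity + extra;
}
```

Under these assumptions, a capacity of 10 grows to 15 and then to 23, while a capacity of 900 with a maximum of 1000 is clamped to 1000. A caller would multiply the returned capacity by the item size and pass the result to its reallocation function, treating -1 as an allocation failure.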