libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	3ffcc03b16	parser: Deprecate more internal functions	2023-04-26 20:23:23 +02:00
Nick Wellnhofer	9282b08431	parser: Fix regression in memory pull parser with encoding Revert another change from commit `98840d40`. Decode the whole buffer when reading from memory and switching to the initial encoding. Add some comments about potential improvements.	2023-04-19 22:32:19 +02:00
Nick Wellnhofer	a19fa11e1d	parser: Fix regression when switching input encodings Revert some changes from commit `98840d40`. WebKit/Chromium can actually switch from ISO-8859-1 to UTF-16 in the middle of parsing. This is a bad idea, but we have to keep supporting this use case.	2023-04-13 15:20:56 +02:00
Nick Wellnhofer	921796b06b	parser: Don't grow push parser buffers This should fix a short-lived regression when push parsing with encodings.	2023-04-12 13:56:33 +02:00
Nick Wellnhofer	0e42adce77	parser: Halt parser if switching encodings fails Avoids buffer overread in htmlParseHTMLAttribute. Found by OSS-Fuzz.	2023-03-30 14:09:15 +02:00
Nick Wellnhofer	3660229219	parser: Fix buffer overread in xmlDetectEBCDIC Short-lived regression found by OSS-Fuzz.	2023-03-26 14:11:31 +02:00
Nick Wellnhofer	7fbd454d9f	parser: Grow input buffer earlier when reading characters Make more bytes available after invoking CUR_CHAR or NEXT.	2023-03-21 21:35:53 +01:00
Nick Wellnhofer	98840d40da	parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.	2023-03-21 21:35:15 +01:00
Nick Wellnhofer	04d1bedd8c	parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.	2023-03-21 13:19:18 +01:00
Nick Wellnhofer	1a91392c62	parser: More fixes to xmlParserGrow xmlHaltParser must be called after reporting an error. Switch to xmlBufSetInputBaseCur.	2023-03-16 17:48:57 +01:00
Nick Wellnhofer	ca2bfecea9	malloc-fail: Fix buffer overread when reading from input Found by OSS-Fuzz, see #344.	2023-03-15 17:34:32 +01:00
Nick Wellnhofer	b167c73144	parser: Fix short-lived regression causing infinite loops Fix `3eb6bf03`. We really have to halt the parser, so the input buffer gets reset.	2023-03-14 15:16:04 +01:00
Nick Wellnhofer	e7c3a4ca1b	parser: Deprecate some parser input functions	2023-03-13 19:19:46 +01:00
Nick Wellnhofer	2099441f32	parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.	2023-03-13 17:51:13 +01:00
Nick Wellnhofer	457fc622d5	malloc-fail: Fix null deref in xmlParserInputShrink Found by OSS-Fuzz.	2023-03-13 16:54:16 +01:00
Nick Wellnhofer	3eb6bf0386	parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.	2023-03-12 17:05:51 +01:00
Nick Wellnhofer	2355eac59e	malloc-fail: Fix null deref if growing input buffer fails Also add some error checks. Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	077df27eb1	parser: Fix integer overflow of input ID Applies a patch from Chromium. Also stop incrementing input ID of subcontexts. This isn't necessary. Fixes #465.	2022-12-22 15:22:01 +01:00
Nick Wellnhofer	ce76ebfd13	entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	463bbeeca1	entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	a8b31e68c2	parser: Fix progress check when parsing character data Skip over zero bytes to guarantee progress. Short-lived regression.	2022-11-21 21:39:10 +01:00
Nick Wellnhofer	691a771956	parser: Fix 'consumed' accounting when switching encodings	2022-11-20 21:27:59 +01:00
Nick Wellnhofer	249cee4b2a	io: Fix a few integer overflows in I/O statistics There are still many places where arithmetic on "consumed" stats isn't checked for overflow, affecting platforms with a 32-bit long type.	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	6b57061909	io: Rearrange code in xmlSwitchInputEncodingInt No functional change.	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	46cd7d224e	io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.	2022-11-20 21:12:18 +01:00
Nick Wellnhofer	9feafbc5c5	io: Check for memory buffer early in xmlParserInputGrow	2022-11-13 18:08:34 +01:00
Nick Wellnhofer	6843fc726f	Remove or annotate char casts	2022-09-01 04:31:30 +02:00
Nick Wellnhofer	ad338ca737	Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.	2022-09-01 02:33:57 +02:00
Nick Wellnhofer	65dc8a63ac	Make xmlNewSAXParserCtx take a const sax handler Also improve documentation.	2022-09-01 00:17:45 +02:00
Nick Wellnhofer	0f568c0b73	Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.	2022-08-26 02:11:56 +02:00
Nick Wellnhofer	ca3807d946	Mark more functions setting globals as deprecated	2022-08-24 16:16:09 +02:00
Nick Wellnhofer	fd85b566f7	Mark more parser functions as deprecated No compiler warnings generated yet.	2022-08-24 15:12:24 +02:00
Nick Wellnhofer	9a82b94a94	Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.	2022-08-24 14:07:55 +02:00
Nick Wellnhofer	c21e9cd5d9	Use xmlStrlen in xmlNewStringInputStream xmlStrlen handles buffers larger than INT_MAX more gracefully.	2022-08-20 17:03:10 +02:00
Nick Wellnhofer	b1b654171e	Create stream with buffer in xmlNewStringInputStream Create an input stream with a buffer in xmlNewStringInputStream. Otherwise, switching encodings won't work. See #34.	2022-08-20 16:34:08 +02:00
Nick Wellnhofer	aab584dc31	Clean up encoding switching code - Remove xmlSwitchToEncodingInt which was basically just a wrapper around xmlSwitchInputEncodingInt. - Simplify xmlSwitchEncoding. - Improve error handling in xmlSwitchInputEncodingInt. - Deprecate xmlSwitchInputEncoding.	2022-04-02 19:09:12 +02:00
Nick Wellnhofer	92bff86614	Fix calls to deprecated init/cleanup functions Only use xmlInitParser/xmlCleanupParser.	2022-03-29 14:18:31 +02:00
Nick Wellnhofer	4951c462ea	Avoid arithmetic on freed pointers	2022-03-06 02:29:00 +01:00
Nick Wellnhofer	ebb1797030	Remove unneeded #includes	2022-03-04 22:11:49 +01:00
Nick Wellnhofer	776d15d383	Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h	2022-03-02 00:43:54 +01:00
Nick Wellnhofer	2489c1d024	Remove useless __CYGWIN__ checks From what I can tell, some really early Cygwin versions from around 1998-2000 used to erroneously define _WIN32. This was eventually fixed, but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is unnecessary. Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether to use __declspec.	2022-02-28 22:58:35 +01:00
Nick Wellnhofer	346c3a930c	Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	d7cb33cf44	Rework validation context flags Use a bitmask instead of magic values to - keep track whether the validation context is part of a parser context - keep track whether xmlValidateDtdFinal was called This allows adding additional flags later. Note that this deliberately changes the name of a public struct member, assuming that this was always private data never to be used by client code.	2022-02-20 21:49:04 +01:00
David King	328456bf29	Fix memory leak in xmlNewInputFromFile Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806	2022-01-16 14:15:09 +01:00
Nick Wellnhofer	dcb80b92da	Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.	2021-02-20 21:28:56 +01:00
Nick Wellnhofer	438e595a8c	Stop counting nbChars in parser context The value was inaccurate and never used.	2020-08-09 15:01:45 +02:00
Nick Wellnhofer	20c60886e4	Fix typos Resolves #133.	2020-03-08 17:41:53 +01:00
Jared Yanovich	2a350ee9b4	Large batch of typo fixes Closes #109.	2019-09-30 18:04:38 +02:00
Nick Wellnhofer	3776cb4745	Fix memory leak in xmlSwitchInputEncodingInt error path Found by OSS-Fuzz.	2018-11-22 16:28:46 +01:00
Nick Wellnhofer	7a1bd7f649	Revert "Change calls to xmlCharEncInput to set flush false" This reverts commit `6e6ae5daa6` which broke decoding of larger documents with ICU. See https://bugs.chromium.org/p/chromium/issues/detail?id=820163	2018-03-17 00:03:24 +01:00

258 Commits
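
The message of commit `463bbeeca1` in the list above describes the reworked entity amplification check in prose: expanded size is capped at 10 times the input size, 10 MB of substitutions is always allowed, every entity reference carries a fixed cost, and the counters use saturation arithmetic because `unsigned long` may be 32-bit. The following is a minimal sketch of that accounting, not the libxml2 implementation. All names (`amp_state`, `amp_account`, `sat_add`, `sat_mul`), the fixed cost of 20, and the reading of "10 MB" as 10,000,000 bytes are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Only the 10x ratio and the 10 MB allowance come from the commit message.
 * The exact byte value and the fixed cost are assumptions. */
#define MAX_AMPLIFICATION 10UL
#define ALLOWED_EXPANSION (10UL * 1000UL * 1000UL)
#define ENTITY_FIXED_COST 20UL

/* Saturating add: clamps to ULONG_MAX instead of wrapping around. */
static unsigned long sat_add(unsigned long a, unsigned long b) {
    return (a > ULONG_MAX - b) ? ULONG_MAX : a + b;
}

/* Saturating multiply: clamps to ULONG_MAX instead of wrapping around. */
static unsigned long sat_mul(unsigned long a, unsigned long b) {
    if (a != 0 && b > ULONG_MAX / a)
        return ULONG_MAX;
    return a * b;
}

/* Hypothetical bookkeeping for one document. */
typedef struct {
    unsigned long input_size;    /* bytes read before expansion */
    unsigned long expanded_size; /* bytes produced by entity substitution */
} amp_state;

/* Account for one entity reference that expands to entity_len bytes.
 * Returns 0 if parsing may continue, -1 if the amplification limit is hit. */
static int amp_account(amp_state *st, unsigned long entity_len) {
    assert(st != NULL);

    /* Each reference costs its content length plus a fixed overhead. */
    st->expanded_size = sat_add(st->expanded_size,
                                sat_add(entity_len, ENTITY_FIXED_COST));

    /* A fixed amount of substitution is always allowed. */
    if (st->expanded_size <= ALLOWED_EXPANSION)
        return 0;

    /* Beyond that, expansion must stay within 10x the input size. */
    if (st->expanded_size > sat_mul(st->input_size, MAX_AMPLIFICATION))
        return -1;

    return 0;
}
```

In this sketch a caller would invoke `amp_account` each time an entity reference is substituted and stop parsing on `-1`. Because the expanded size is bounded by a constant multiple of the input size plus a constant, the total work stays linear in the input, which is the property the commit message refers to.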