libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
Nick Wellnhofer	d38e73f91e	parser: Always create UTF-8 in xmlParseReference It seems that this code path could only be triggered after an encoding error in recovery mode. Creating char-ref nodes is unnecessary and typically unexpected.	2023-08-08 15:19:44 +02:00
Nick Wellnhofer	131d0dc0a7	parser: Don't use 'standalone' member of xmlParserInput The standalone declaration is only parsed in the main input stream.	2023-08-08 15:19:39 +02:00
Nick Wellnhofer	d9ec182b65	parser: Don't detect encoding in xmlCtxtResetPush The encoding will be detected in xmlParseTryOrFinish.	2023-08-08 15:19:36 +02:00
Nick Wellnhofer	90bcbcfcc7	parser: Fix potential use-after-free in xmlParseCharDataInternal Return immediately if a SAX handler stops the parser. Fixes #569.	2023-07-20 21:40:57 +02:00
Nick Wellnhofer	e0f3016f71	parser: Fix regression when push parsing UTF-8 sequences Partial UTF-8 sequences are allowed when push parsing. Fixes #542.	2023-05-18 18:21:20 +02:00
Nick Wellnhofer	235b15a590	SAX: Always initialize SAX1 element handlers Follow-up to commit `d0c3f01e`. A parser context will be initialized to SAX version 2, but this can be overridden with XML_PARSE_SAX1 later, so we must initialize the SAX1 element handlers as well. Change the check in xmlDetectSAX2 to only look for XML_SAX2_MAGIC, so we don't switch to SAX1 if the SAX2 element handlers are NULL.	2023-05-08 19:15:44 +02:00
Nick Wellnhofer	d0c3f01e11	parser: Fix old SAX1 parser with custom callbacks For some reason, xmlCtxtUseOptionsInternal set the start and end element SAX handlers to the internal DOM builder functions when XML_PARSE_SAX1 was specified. This means that custom SAX handlers could never work with that flag because these functions would receive the wrong user data argument and crash immediately. Fixes #535.	2023-05-06 17:47:37 +02:00
Nick Wellnhofer	320f5084cd	parser: Improve handling of encoding and IO errors Make sure that xmlCharEncInput, xmlParserInputBufferPush and xmlParserInputBufferGrow set the correct error code in the xmlParserInputBuffer. Handle errors when calling these functions.	2023-04-30 21:31:54 +02:00
Nick Wellnhofer	fc69cf568b	parser: Move xmlFatalErr to parserInternals.c	2023-04-30 17:51:29 +02:00
Nick Wellnhofer	3ffcc03b16	parser: Deprecate more internal functions	2023-04-26 20:23:23 +02:00
Nick Wellnhofer	250faf3c83	parser: Fix regression in xmlParserNodeInfo accounting Commit `62150ed2` broke begin_pos and begin_line when extra node info was recorded. Fixes #523.	2023-04-20 15:38:00 +02:00
Nick Wellnhofer	9282b08431	parser: Fix regression in memory pull parser with encoding Revert another change from commit `98840d40`. Decode the whole buffer when reading from memory and switching to the initial encoding. Add some comments about potential improvements.	2023-04-19 22:32:19 +02:00
David Kilzer	86105c0493	Fix use-after-free in xmlParseContentInternal() * parser.c: (xmlParseCharData): - Check if the parser has stopped before advancing `ctxt->input->cur`. This only occurs if a custom SAX error handler calls xmlStopParser() on fatal errors. Fixes #518.	2023-04-16 12:01:05 -07:00
Nick Wellnhofer	b4d46cee80	parser: Remove first line handling in xmlParseChunk After reworking EBCDIC detection, this isn't necessary.	2023-04-12 15:10:01 +02:00
Nick Wellnhofer	98840d40da	parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.	2023-03-21 21:35:15 +01:00
Nick Wellnhofer	3eb9f5ca4e	parser: Limit name length in xmlParseEncName	2023-03-21 13:19:31 +01:00
Nick Wellnhofer	04d1bedd8c	parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.	2023-03-21 13:19:18 +01:00
Nick Wellnhofer	067986fa67	parser: Fix regressions from previous commits - Fix memory leak in xmlParseNmtoken. - Fix buffer overread after htmlParseCharDataInternal.	2023-03-18 16:51:40 +01:00
Nick Wellnhofer	3e85d7b7ab	parser: Rely on CUR_CHAR/NEXT to grow the input buffer The input buffer is now grown reliably when calling CUR_CHAR (xmlCurrentChar) or NEXT (xmlNextChar). This allows to remove many other invocations of GROW.	2023-03-17 14:02:23 +01:00
Nick Wellnhofer	c81d0d04bf	malloc-fail: Add more error checks when parsing names xmlParseName and similar functions must return NULL if an error occurs. Found by OSS-Fuzz, see #344.	2023-03-17 12:39:35 +01:00
Nick Wellnhofer	b167c73144	parser: Fix short-lived regression causing infinite loops Fix `3eb6bf03`. We really have to halt the parser, so the input buffer gets reset.	2023-03-14 15:16:04 +01:00
Nick Wellnhofer	2099441f32	parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.	2023-03-13 17:51:13 +01:00
Nick Wellnhofer	cabde70f8b	parser: Simplify calculation of available buffer space	2023-03-12 19:07:23 +01:00
Nick Wellnhofer	b75976e029	parser: Use size_t when subtracting input buffer pointers Avoid integer overflows.	2023-03-12 19:06:19 +01:00
Nick Wellnhofer	9a6ca81612	parser: Check for integer overflow when updating checkIndex Unfortunately, checkIndex is a long, not a size_t. Check for integer overflow before updating the value.	2023-03-12 19:03:11 +01:00
Nick Wellnhofer	bd63d730b8	html: Impose some length limits Impose length limits on names, attribute values, PIs and comments, similar to the XML parser.	2023-03-12 17:40:55 +01:00
Nick Wellnhofer	3eb6bf0386	parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.	2023-03-12 17:05:51 +01:00
Nick Wellnhofer	207ebdfd2a	malloc-fail: Fix out-of-bounds read in xmlGROW Short-lived regression from `56cc2211`.	2023-03-12 14:43:01 +01:00
Nick Wellnhofer	56cc2211bc	parser: Merge xmlParserInputGrow into xmlGROW Simplifies the code and makes error handling easier.	2023-03-09 22:27:58 +01:00
Nick Wellnhofer	14604a446e	malloc-fail: Fix out-of-bounds read in xmlCurrentChar Found by OSS-Fuzz.	2023-03-09 22:10:44 +01:00
Nick Wellnhofer	3f69fc805c	parser: Tighten expansion limits - Lower the amount of expansion which is always allowed from 10MB to 1MB. - Lower the maximum amplification factor from 10 to 5. - Lower the "fixed cost" from 50 to 20.	2023-03-08 13:58:49 +01:00
Nick Wellnhofer	5d55315e32	parser: Fix OOB read when formatting error message Don't try to print characters beyond the end of the buffer. Found by OSS-Fuzz.	2023-02-18 17:29:07 +01:00
Nick Wellnhofer	f8852184a1	malloc-fail: Fix memory leak in xmlParseEntityDecl Found with libFuzzer, see #344.	2023-02-17 17:16:50 +01:00
Nick Wellnhofer	e6d22f925a	malloc-fail: Fix reallocation in inputPush Store xmlRealloc result in temporary variable to avoid null deref in error handler. Found with libFuzzer, see #344.	2023-01-24 11:47:33 +01:00
Nick Wellnhofer	6fd8904108	malloc-fail: Fix use-after-free in xmlParseStartTag2 Fix error handling in xmlCtxtGrowAttrs. Found with libFuzzer, see #344.	2023-01-24 11:47:33 +01:00
Nick Wellnhofer	d1b8785693	malloc-fail: Fix infinite loop in xmlParseTextDecl Memory errors can set `instate` to `XML_PARSER_EOF` which results in `NEXT` making no progress. Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	bd9de3a31f	malloc-fail: Fix null deref in xmlAddDefAttrs Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	33d4a0fe40	parser: Fix progress check in xmlParseExternalSubset Avoid infinite loop. Short-lived regression from `f61b8a62`. Found with libFuzzer.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	74aa61e0bd	parser: Halt parser on DTD errors If we try to continue parsing after an error in the internal or external subset, entity expansion accounting gets more complicated. Simply halt the parser. Found with libFuzzer.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	d320a683d1	parser: Fix entity check in attributes Don't set the "checked" flag when checking entities in default attribute values. These entities could reference other entities which weren't defined yet, so the check isn't reliable. This fixes a short-lived regression which could lead to a call stack overflow later in xmlStringGetNodeList.	2023-01-17 13:59:24 +01:00
Nick Wellnhofer	59b3366178	error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.	2022-12-27 14:41:19 +01:00
Nick Wellnhofer	66e9fd66e8	parser: Fix infinite loop with push parser in recovery mode Short-lived regression from commit `b1f9c193`. Found by OSS-Fuzz.	2022-12-25 21:30:32 +01:00
Nick Wellnhofer	49b54d7e2b	parser: Fix null deref in xmlStringDecodeEntitiesInt Short-lived regression.	2022-12-25 15:06:51 +01:00
Nick Wellnhofer	1865668b61	parser: Fix accounting of consumed input bytes Only add consumed bytes if - we're not parsing an entity - we're parsing external parameter entities for the first time. Always ignore internal parameter entities.	2022-12-23 23:11:11 +01:00
Nick Wellnhofer	bc18f4a67c	parser: Lower entity nesting limit with XML_PARSE_HUGE The old limit of 1024 could lead to excessively deep call stacks. This could probably be set much lower without causing issues.	2022-12-23 22:11:18 +01:00
Nick Wellnhofer	dd62e541ec	parser: Don't increase depth twice when parsing internal entities Fix xmlParseBalancedChunkMemoryInternal.	2022-12-23 22:11:18 +01:00
Nick Wellnhofer	a41b09c739	parser: Improve detection of entity loops Set a flag to detect entity loops at once instead of processing until the depth limit is exceeded.	2022-12-23 22:11:18 +01:00
Nick Wellnhofer	d972393f30	parser: Only report a single entity error Don't report errors multiple times for nested entity references.	2022-12-23 22:10:39 +01:00
Nick Wellnhofer	077df27eb1	parser: Fix integer overflow of input ID Applies a patch from Chromium. Also stop incrementing input ID of subcontexts. This isn't necessary. Fixes #465.	2022-12-22 15:22:01 +01:00
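
Commit 9a6ca81612 above notes that checkIndex is a long rather than a size_t, so the value is checked for integer overflow before it is updated. The following is a minimal sketch of that kind of check, not the library's actual code: the helper name store_offset_checked and its return convention are hypothetical and exist only for illustration.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): store an unsigned buffer
 * offset in a signed long field only if it fits.
 *
 * Returns 0 on success and -1 if the offset cannot be represented as a
 * long. On failure, *index is left unchanged.
 */
int store_offset_checked(long *index, size_t offset)
{
    /* Compare in the unsigned domain so the check itself cannot overflow. */
    if (offset > (size_t)LONG_MAX)
        return -1;

    *index = (long)offset;
    return 0;
}
```

A caller would compute the offset as a size_t first and decide how to react to a -1 result, for example by resetting the stored index; that policy is left open in this sketch.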

857 Commits