Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						4a776c78ec 
					 
					
						
						
							
							html: Use htmlParseElementInternal in push parser  
						
						
						
						
					 
					
						2025-02-02 11:15:45 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						ba1537374b 
					 
					
						
						
							
							html: Fix corner case when push-parsing HTML5 comments  
						
						
						
						
					 
					
						2025-02-02 11:15:45 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e48fb5e4f2 
					 
					
						
						
							
							html: Handle incomplete UTF-8 when push-parsing  
						
						
						
						
						For now, incomplete UTF-8 is always an error in push mode.
Eventually, we could pass chunked data to the character handler when
push-parsing. Then we'd have to handle incomplete sequences. 
						
						
					 
					
						2025-02-02 11:15:45 +01:00 
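
The chunked-data idea in this commit message would need a way to notice a multi-byte sequence cut off at a chunk boundary. A minimal sketch of such a check follows, assuming a hypothetical helper named `utf8IncompleteTail` that is not part of libxml2. It only inspects the last few bytes and does not validate the rest of the buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not libxml2 API): returns how many bytes at the
 * end of buf belong to a UTF-8 sequence that is cut off by the chunk
 * boundary, or 0 if the chunk does not end in a truncated sequence.
 */
static size_t
utf8IncompleteTail(const unsigned char *buf, size_t len) {
    size_t i, need;

    /* A lead byte can be at most 3 bytes before the end of the chunk. */
    for (i = 1; i <= 3 && i <= len; i++) {
        unsigned char c = buf[len - i];

        if ((c & 0xC0) == 0x80)
            continue;               /* continuation byte, keep looking */
        if (c < 0x80)
            return 0;               /* ASCII, nothing pending */

        if ((c & 0xE0) == 0xC0)
            need = 2;               /* 110xxxxx: 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            need = 3;               /* 1110xxxx: 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            need = 4;               /* 11110xxx: 4-byte sequence */
        else
            return 0;               /* invalid lead byte, left to the decoder */

        /* Truncated only if fewer bytes are present than required. */
        return (i < need) ? i : 0;
    }

    return 0;
}
```

A caller could hold back the returned number of bytes and prepend them to the next chunk before decoding.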
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						6bb2ea8e70 
					 
					
						
						
							
							html: Adjust xmlDetectEncoding for HTML  
						
						
						
						
						Don't check for UTF-32 or EBCDIC.
We now perform BOM sniffing and the first step of the HTML5 prescan
algorithm (detect UTF-16 XML declarations). The rest of the algorithm
still has to be implemented. 
						
						
					 
					
						2025-02-02 11:15:44 +01:00 
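
The two detection steps named in this commit message (BOM sniffing, then the UTF-16 XML declaration check from the HTML5 prescan) can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: `sniffHtmlEncoding` is a hypothetical name, and the byte patterns are the ones given in the HTML Standard's encoding sniffing algorithm.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not libxml2 API): returns an encoding label for
 * the start of an HTML byte stream, or NULL if neither a BOM nor a
 * UTF-16 encoded "<?x" prefix is found.
 */
static const char *
sniffHtmlEncoding(const unsigned char *buf, size_t len) {
    static const unsigned char bomUtf8[]  = { 0xEF, 0xBB, 0xBF };
    static const unsigned char bomBE[]    = { 0xFE, 0xFF };
    static const unsigned char bomLE[]    = { 0xFF, 0xFE };
    /* "<?x" encoded as UTF-16LE and UTF-16BE. */
    static const unsigned char declLE[]   = { 0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00 };
    static const unsigned char declBE[]   = { 0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78 };

    /* Step 1: byte order marks. */
    if (len >= 3 && memcmp(buf, bomUtf8, 3) == 0)
        return "UTF-8";
    if (len >= 2 && memcmp(buf, bomBE, 2) == 0)
        return "UTF-16BE";
    if (len >= 2 && memcmp(buf, bomLE, 2) == 0)
        return "UTF-16LE";

    /* Step 2: XML declaration in UTF-16 without a BOM. */
    if (len >= 6 && memcmp(buf, declLE, 6) == 0)
        return "UTF-16LE";
    if (len >= 6 && memcmp(buf, declBE, 6) == 0)
        return "UTF-16BE";

    /* UTF-32 and EBCDIC are deliberately not checked. */
    return NULL;
}
```

A NULL result here stands for the remaining prescan steps, which the commit message says are still to be implemented.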
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						227d8f739b 
					 
					
						
						
							
							html: Support encoding auto-detection in push parser  
						
						
						
						
						Align with pull parser. 
						
						
					 
					
						2025-02-02 11:15:44 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						641fb1acf5 
					 
					
						
						
							
							html: Fix state update in push parser  
						
						
						
						
					 
					
						2025-02-02 11:15:44 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						a86a8ae922 
					 
					
						
						
							
							html: Fix push-parsing of empty documents  
						
						
						
						
						Also simplify end-of-document handling in push parser.
Align with pull parser. 
						
						
					 
					
						2025-02-02 11:15:44 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						ca81916023 
					 
					
						
						
							
							include: Use intptr_t to cast between pointers and ints  
						
						
						
						
					 
					
						2025-01-03 20:59:10 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						53c131f667 
					 
					
						
						
							
							doc: Make apibuild.py work again  
						
						
						
						
					 
					
						2024-12-26 20:29:58 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						0447275ef8 
					 
					
						
						
							
							html: Check reallocations for overflow  
						
						
						
						
					 
					
						2024-12-21 19:37:37 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						6548ba11b8 
					 
					
						
						
							
							parser: Fix argument checks in xmlCtxtParse*  
						
						
						
						
						- Raise invalid argument error.
- Free input stream if ctxt is NULL. 
						
						
					 
					
						2024-12-13 17:57:11 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						497081baab 
					 
					
						
						
							
							parser: Remove remaining calls to xml{Push|Pop}Input  
						
						
						
						
					 
					
						2024-11-19 00:25:23 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						0f4f89005d 
					 
					
						
						
							
							parser: Rename inputPush to xmlCtxtPushInput  
						
						
						
						
					 
					
						2024-11-19 00:25:23 +01:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						225ed70737 
					 
					
						
						
							
							html: Accelerate htmlParseCharData  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						207999793f 
					 
					
						
						
							
							html: Handle numeric character references directly  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						0bc4608c50 
					 
					
						
						
							
							html: Use hash table to check for duplicate attributes  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						24a6149fc4 
					 
					
						
						
							
							html: Make sure that character data mode is reset  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						c32397d51f 
					 
					
						
						
							
							html: Improve character class macros  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e840655414 
					 
					
						
						
							
							html: Rewrite parsing of most data  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						f77ec16db0 
					 
					
						
						
							
							html: Optimize htmlParseCharData  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						440bd64c69 
					 
					
						
						
							
							html: Optimize htmlParseHTMLName  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						6040785ac4 
					 
					
						
						
							
							html: Deprecate AutoClose API  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						188cad68a4 
					 
					
						
						
							
							html: Remove obsolete content model  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						0144f662d7 
					 
					
						
						
							
							html: Remove obsolete code  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						575be6c1f1 
					 
					
						
						
							
							html: Fix line numbers with CRs  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						be874d7831 
					 
					
						
						
							
							html: Ignore unexpected DOCTYPE declarations  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						462bf0b7a5 
					 
					
						
						
							
							html: Rework options  
						
						
						
						
						Introduce htmlCtxtSetOptions, see similar changes made to XML parser.
Add HTML_PARSE_HUGE alias. Support HTML_PARSE_BIG_LINES. 
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						42c3823df0 
					 
					
						
						
							
							html: Update comment  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						9f04cce695 
					 
					
						
						
							
							html: Remove unused or useless return codes  
						
						
						
						
						htmlParseStartTag should always succeed (except for malloc failures). 
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e179f3ec0e 
					 
					
						
						
							
							html: Stop reporting syntax errors  
						
						
						
						
						It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.
Handling HTML5 parser errors is rather involved and not essential for
parsers. 
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						27752f75ca 
					 
					
						
						
							
							html: Fix EOF handling in start tags  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						b19d353970 
					 
					
						
						
							
							html: Fix EOF handling in comments  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						17e56ac54a 
					 
					
						
						
							
							html: Fix parsing of end tags  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						24a09033c9 
					 
					
						
						
							
							html: Fix bogus end tags  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						bca6485476 
					 
					
						
						
							
							html: Allow U+000C FORM FEED as whitespace  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						6edf1a645e 
					 
					
						
						
							
							html: Fix DOCTYPE parsing  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						9678163f54 
					 
					
						
						
							
							html: Don't check for valid XML characters  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						a6955c13c7 
					 
					
						
						
							
							html: Parse numeric character references according to HTML5  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						4eeac30944 
					 
					
						
						
							
							html: Start to fix EOF and U+0000 handling  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e062a4a9b3 
					 
					
						
						
							
							html: Add HTML5 parser option  
						
						
						
						
						This option passes tokenizer output directly to the SAX callbacks,
making it possible to test the tokenizer against the html5lib test
suite.
This will produce unbalanced calls to the startElement and endElement
callbacks, but it's the only way to support a SAX-like interface for
HTML5. It can be used for filtering or rewriting HTML5, for example.
An HTML5 tree builder could then be implemented on top of the SAX
callbacks. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						17da54c522 
					 
					
						
						
							
							html: Normalize newlines  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						341dc78f24 
					 
					
						
						
							
							html: Deduplicate code in htmlCurrentChar  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						3adb396d87 
					 
					
						
						
							
							html: Parse bogus comments instead of ignoring them  
						
						
						
						
						Also treat XML processing instructions as bogus comments. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						8444017578 
					 
					
						
						
							
							html: Add missing calls to htmlCheckParagraph()  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						86d6b9b051 
					 
					
						
						
							
							html: Deduplicate some code  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						0d324bde36 
					 
					
						
						
							
							html: Simplify node info accounting  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						ccb61f599e 
					 
					
						
						
							
							html: Remove duplicate calls to htmlAutoClose  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						f9ed30e972 
					 
					
						
						
							
							html: HTML5 character data states  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						5951179239 
					 
					
						
						
							
							html: Parse named character references according to HTML5  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						d5cd0f07f8 
					 
					
						
						
							
							html: Prefer SKIP(1) over NEXT in HTML parser  
						
						
						
						
						Use SKIP(1) where it's safe to avoid a function call. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00