Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						575be6c1f1 
					 
					
						
						
							
							html: Fix line numbers with CRs  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						be874d7831 
					 
					
						
						
							
							html: Ignore unexpected DOCTYPE declarations  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						462bf0b7a5 
					 
					
						
						
							
							html: Rework options  
						
						
						
						
						Introduce htmlCtxtSetOptions, see similar changes made to XML parser.
Add HTML_PARSE_HUGE alias. Support HTML_PARSE_BIG_LINES. 
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						42c3823df0 
					 
					
						
						
							
							html: Update comment  
						
						
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						9f04cce695 
					 
					
						
						
							
							html: Remove unused or useless return codes  
						
						
						
						
						htmlParseStartTag should always succeed (except for malloc failures). 
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e179f3ec0e 
					 
					
						
						
							
							html: Stop reporting syntax errors  
						
						
						
						
						It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.
Handling HTML5 parser errors is rather involved and not essential for
parsers. 
						
						
					 
					
						2024-10-06 20:04:00 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						27752f75ca 
					 
					
						
						
							
							html: Fix EOF handling in start tags  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						b19d353970 
					 
					
						
						
							
							html: Fix EOF handling in comments  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						17e56ac54a 
					 
					
						
						
							
							html: Fix parsing of end tags  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						24a09033c9 
					 
					
						
						
							
							html: Fix bogus end tags  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						bca6485476 
					 
					
						
						
							
							html: Allow U+000C FORM FEED as whitespace  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						6edf1a645e 
					 
					
						
						
							
							html: Fix DOCTYPE parsing  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						9678163f54 
					 
					
						
						
							
							html: Don't check for valid XML characters  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						a6955c13c7 
					 
					
						
						
							
							html: Parse numeric character references according to HTML5  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						4eeac30944 
					 
					
						
						
							
							html: Start to fix EOF and U+0000 handling  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e062a4a9b3 
					 
					
						
						
							
							html: Add HTML5 parser option  
						
						
						
						
						This option passes tokenizer output directly to the SAX callbacks,
making it possible to test the tokenizer against the html5lib test
suite.
This will produce unbalanced calls to the startElement and endElement
callbacks, but it's the only way to support a SAX-like interface for
HTML5. It can be used for filtering or rewriting HTML5, for example.
An HTML5 tree builder could then be implemented on top of the SAX
callbacks. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						17da54c522 
					 
					
						
						
							
							html: Normalize newlines  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						341dc78f24 
					 
					
						
						
							
							html: Deduplicate code in htmlCurrentChar  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						3adb396d87 
					 
					
						
						
							
							html: Parse bogus comments instead of ignoring them  
						
						
						
						
						Also treat XML processing instructions as bogus comments. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						8444017578 
					 
					
						
						
							
							html: Add missing calls to htmlCheckParagraph()  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						86d6b9b051 
					 
					
						
						
							
							html: Deduplicate some code  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						0d324bde36 
					 
					
						
						
							
							html: Simplify node info accounting  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						ccb61f599e 
					 
					
						
						
							
							html: Remove duplicate calls to htmlAutoClose  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						f9ed30e972 
					 
					
						
						
							
							html: HTML5 character data states  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						5951179239 
					 
					
						
						
							
							html: Parse named character references according to HTML5  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						d5cd0f07f8 
					 
					
						
						
							
							html: Prefer SKIP(1) over NEXT in HTML parser  
						
						
						
						
						Use SKIP(1) where it's safe to avoid a function call. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						dc2d498318 
					 
					
						
						
							
							html: Rework htmlLookupSequence  
						
						
						
						
						Rename to htmlLookupString and use strstr for increased performance. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
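The htmlLookupString commit above mentions switching to strstr for the terminator search. A minimal sketch of that idea follows, assuming a NUL-terminated buffer that only grows between calls. The name `lookup_string` and the `check_index` bookkeeping are hypothetical illustrations, not libxml2's actual code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: search buf (NUL-terminated, buf_len bytes long) for
 * the terminator str, starting at *check_index. On a miss, *check_index is
 * advanced so that the next call skips bytes which cannot start a match.
 * Returns the offset of the match, or -1 if the terminator is not there yet.
 * Limitation of this sketch: strstr stops at the first NUL byte.
 */
static long
lookup_string(const char *buf, size_t buf_len, const char *str,
              size_t *check_index)
{
    size_t str_len = strlen(str);
    size_t resume;
    const char *hit;

    if (*check_index > buf_len)
        *check_index = buf_len;

    hit = strstr(buf + *check_index, str);
    if (hit != NULL) {
        *check_index = 0;               /* reset for the next lookup */
        return (long) (hit - buf);
    }

    /*
     * A match may straddle the end of the current data, so keep the last
     * str_len - 1 bytes in range for the next call.
     */
    resume = (buf_len >= str_len - 1) ? buf_len - (str_len - 1) : 0;
    if (resume > *check_index)
        *check_index = resume;

    return -1;
}
```

Calling it first on `"<!-- abc -"` and later on the grown buffer `"<!-- abc --> x"` with the same `check_index` variable rescans only the tail of the first chunk instead of starting over.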
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						637215a4de 
					 
					
						
						
							
							html: Always terminate doctype declarations on '>'  
						
						
						
						
						Align with the HTML5 spec. This allows removing the old quote handling in
htmlLookupSequence. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						72e29f9a3d 
					 
					
						
						
							
							html: Fix quadratic behavior in push parser  
						
						
						
						
						Fix quadratic behavior related to unquoted attribute values. We really
have to replicate parts of the HTML5 state machine to find the end of
tags reliably.
Fixes #533. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
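The quadratic-behavior fix above notes that finding the end of a tag reliably means replicating parts of the HTML5 state machine. The sketch below illustrates the general technique under simplifying assumptions: a resumable scanner that remembers its state and position, so every byte is examined once even when data arrives in chunks. The type and function names (`tag_scanner`, `find_tag_end`) are hypothetical and the state set is a reduced approximation of the HTML5 tokenizer, not the code from this commit.

```c
#include <stddef.h>

typedef enum {
    ST_TAG_NAME, ST_BEFORE_ATTR_NAME, ST_ATTR_NAME, ST_AFTER_ATTR_NAME,
    ST_BEFORE_VALUE, ST_VALUE_DQ, ST_VALUE_SQ, ST_VALUE_UNQUOTED
} scan_state;

typedef struct {
    scan_state state;   /* tokenizer state reached so far */
    size_t pos;         /* number of bytes already examined */
} tag_scanner;

static int
is_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/*
 * Scan buf (the bytes following '<') for the '>' that ends the tag.
 * Returns its offset, or -1 if more data is needed. The scanner keeps its
 * state, so a later call with a longer buffer resumes where this one stopped.
 */
static long
find_tag_end(tag_scanner *sc, const char *buf, size_t len)
{
    while (sc->pos < len) {
        char c = buf[sc->pos++];

        /* Inside a quoted value only the matching quote matters. */
        if (sc->state == ST_VALUE_DQ || sc->state == ST_VALUE_SQ) {
            char quote = (sc->state == ST_VALUE_DQ) ? '"' : '\'';
            if (c == quote)
                sc->state = ST_BEFORE_ATTR_NAME;
            continue;
        }

        /* In every other state '>' terminates the tag. */
        if (c == '>')
            return (long) (sc->pos - 1);

        switch (sc->state) {
        case ST_TAG_NAME:
            if (is_ws(c) || c == '/') sc->state = ST_BEFORE_ATTR_NAME;
            break;
        case ST_BEFORE_ATTR_NAME:
            if (!is_ws(c) && c != '/') sc->state = ST_ATTR_NAME;
            break;
        case ST_ATTR_NAME:
            if (c == '=') sc->state = ST_BEFORE_VALUE;
            else if (is_ws(c)) sc->state = ST_AFTER_ATTR_NAME;
            else if (c == '/') sc->state = ST_BEFORE_ATTR_NAME;
            break;
        case ST_AFTER_ATTR_NAME:
            if (c == '=') sc->state = ST_BEFORE_VALUE;
            else if (c == '/') sc->state = ST_BEFORE_ATTR_NAME;
            else if (!is_ws(c)) sc->state = ST_ATTR_NAME;
            break;
        case ST_BEFORE_VALUE:
            if (c == '"') sc->state = ST_VALUE_DQ;
            else if (c == '\'') sc->state = ST_VALUE_SQ;
            else if (!is_ws(c)) sc->state = ST_VALUE_UNQUOTED;
            break;
        case ST_VALUE_UNQUOTED:
            /* Quotes are ordinary characters in an unquoted value. */
            if (is_ws(c)) sc->state = ST_BEFORE_ATTR_NAME;
            break;
        default:
            break;
        }
    }
    return -1;
}
```

Because `pos` and `state` persist between calls, feeding the same growing buffer repeatedly costs time linear in the total input rather than quadratic.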
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						a80f8b64a9 
					 
					
						
						
							
							html: Allow attributes in end tags  
						
						
						
						
						Attributes are syntactically allowed in HTML5 end tags but otherwise
ignored. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						f2272c231b 
					 
					
						
						
							
							html: Handle unexpected-solidus-in-tag according to HTML5  
						
						
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						939b53ee12 
					 
					
						
						
							
							html: Stop skipping tag content  
						
						
						
						
						Tag and attribute names should always be parsed successfully now. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						dcb2abb2fe 
					 
					
						
						
							
							html: Parse tag and attribute names according to HTML5  
						
						
						
						
						HTML5 allows basically all characters in tag and attribute names. 
						
						
					 
					
						2024-10-06 18:13:05 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						5d36664fc9 
					 
					
						
						
							
							memory: Deprecate xmlGcMemSetup  
						
						
						
						
					 
					
						2024-07-16 17:42:10 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						8af55c8d20 
					 
					
						
						
							
							parser: Rename new input API functions  
						
						
						
						
						These weren't made public yet. 
						
						
					 
					
						2024-07-11 01:33:29 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						d74ca59491 
					 
					
						
						
							
							parser: Rename internal xmlNewInput functions  
						
						
						
						
					 
					
						2024-07-11 01:31:50 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						4f329dc524 
					 
					
						
						
							
							parser: Implement xmlCtxtParseContent  
						
						
						
						
						This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.
xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.
Fixes #727. 
						
						
					 
					
						2024-07-11 01:26:32 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						2e63656ec6 
					 
					
						
						
							
							parser: Check return value of inputPush  
						
						
						
						
						inputPush typically doesn't fail because we pre-allocate the input
table. The return value should be checked nevertheless. 
						
						
					 
					
						2024-07-08 11:27:52 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						fdfeecfe5e 
					 
					
						
						
							
							parser: Reenable ctxt->directory  
						
						
						
						
						Unused internally, but used in downstream code.
Should fix #753. 
						
						
					 
					
						2024-07-02 22:06:53 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						30ef77554b 
					 
					
						
						
							
							parser: Don't use deprecated xmlCopyChar  
						
						
						
						
					 
					
						2024-07-02 13:34:11 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						dd8e378513 
					 
					
						
						
							
							HTML: Rework UTF8ToHtml  
						
						
						
						
						Optimize code. Check for XML_ENC_ERR_SPACE. Use error macros. 
						
						
					 
					
						2024-07-01 18:05:40 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						f505dcaea0 
					 
					
						
						
							
							tree: Remove underscores from xmlRegisterCallbacks  
						
						
						
						
					 
					
						2024-06-27 14:45:35 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						1112699cfa 
					 
					
						
						
							
							legacy: Remove most legacy functions from public headers  
						
						
						
						
						Also remove warning messages. 
						
						
					 
					
						2024-06-17 15:47:42 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						039ce1e821 
					 
					
						
						
							
							parser: Pass global object to sax->setDocumentLocator  
						
						
						
						
						Revert part of commit c011e760.
Fixes #732. 
						
						
					 
					
						2024-06-14 16:41:43 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						89fcae4dfd 
					 
					
						
						
							
							parser: Don't report malloc failures when creating context  
						
						
						
						
						We don't want messages to stderr before an error handler could be set on
a parser context. 
						
						
					 
					
						2024-06-12 16:36:12 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						e75e878e02 
					 
					
						
						
							
							doc: Update and fix documentation  
						
						
						
						
					 
					
						2024-05-20 14:23:39 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						a4c2b7233f 
					 
					
						
						
							
							io: Don't set close callback in xmlParserInputBufferCreateFd  
						
						
						
						
					 
					
						2024-05-05 17:27:12 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						05654cfe00 
					 
					
						
						
							
							html: Deprecate htmlHandleOmittedElem  
						
						
						
						
					 
					
						2024-04-28 18:58:27 +02:00 
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						aa04838eab 
					 
					
						
						
							
							html: Use binary search in htmlEntityValueLookup  
						
						
						
						
					 
					
						2024-03-26 14:21:11 +01:00 
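The commit above switches htmlEntityValueLookup to binary search. A minimal sketch of looking up an entity by code point in a table sorted by value follows. The table contents, the `entity_desc` type and the function name `entity_value_lookup` are hypothetical stand-ins for illustration, not libxml2's real data structures.

```c
#include <stddef.h>

typedef struct {
    unsigned int value;   /* Unicode code point */
    const char *name;     /* entity name without '&' and ';' */
} entity_desc;

/* Tiny illustrative table; it must stay sorted by value. */
static const entity_desc entity_table[] = {
    { 34, "quot" }, { 38, "amp" }, { 60, "lt" }, { 62, "gt" },
    { 160, "nbsp" }, { 169, "copy" }, { 8364, "euro" }
};

/*
 * Binary search over the half-open range [lo, hi).
 * Returns the matching entry, or NULL if value has no named entity here.
 */
static const entity_desc *
entity_value_lookup(unsigned int value)
{
    size_t lo = 0;
    size_t hi = sizeof(entity_table) / sizeof(entity_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (entity_table[mid].value < value)
            lo = mid + 1;
        else if (entity_table[mid].value > value)
            hi = mid;
        else
            return &entity_table[mid];
    }
    return NULL;
}
```

With a sorted table, each lookup takes a logarithmic number of comparisons instead of a linear scan, which matters when serializing documents with many non-ASCII characters.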
						 
				 
			
				
					
						
							
							
								Nick Wellnhofer 
							
						 
					 
					
						
						
							
						
						3efbe916a1 
					 
					
						
						
							
							parser: Mark 'token' member as unused in xmlParserCtxt  
						
						
						
						
					 
					
						2024-01-05 20:39:40 +01:00