1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00
Commit Graph

48 Commits

Author SHA1 Message Date
Nick Wellnhofer
58fc89e8a9 Deprecate internal parser functions 2022-08-25 21:04:57 +02:00
Nick Wellnhofer
a308c0cdf7 Deprecate old HTML SAX API 2022-08-25 21:04:57 +02:00
Nick Wellnhofer
34a050cdee Move some HTML functions to correct header file 2022-08-24 16:44:39 +02:00
Nick Wellnhofer
9a82b94a94 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt
Add API functions to create a parser context with a custom SAX handler
without having to mess with ctxt->sax manually.
2022-08-24 14:07:55 +02:00
Nick Wellnhofer
576912fa04 Make HTML parser functions take const pointers
The 'cur' parameter of htmlParseDoc and htmlSAXParseDoc should be
'const xmlChar *'.

Fixes bug 770650.
2017-06-17 15:59:13 +02:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
c62efc847c Add options to ignore the internal encoding
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advace for example if the content had been converted
before being passed to the parser.

* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
  for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
  HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
  defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
2011-05-26 11:47:37 +08:00
Daniel Veillard
f1121c48af Add an HTML parser option to avoid a default doctype
- include/libxml/HTMLparser.h: defines the new HTML parser option
  HTML_PARSE_NODEFDTD
- HTMLparser.c: if option is set don't add a default DTD
- xmllint.c: add the corresponding --nodefdtd option in xmllint
2010-07-26 14:02:42 +02:00
Daniel Veillard
e20fb5a72c Fix xmlParseInNodeContext for HTML content
xmlParseInNodeContext notices that the enclosing document is
an HTML document, so invoke the HTML parser for that fragment, and
the HTML parser finding a "<p>hello world!</p>" document automatically
augment it with defaulted <html> and <body>. This defaulting should
be turned off in the HTML parser for this to work, but there is no
such HTML parser option. There is an htmlOmittedDefaultValue global
variable that you could use, but really we should not rely on global
variable for processing options anymore, best is to add an
HTML_PARSE_NOIMPLIED.
* include/libxml/HTMLparser.h: add the HTML_PARSE_NOIMPLIED parser flag
* HTMLparser.c: do add implied element if HTML_PARSE_NOIMPLIED is set
* parser.c: add HTML_PARSE_NOIMPLIED to options for xmlParseInNodeContext
  on HTML documents
2010-01-29 20:47:08 +01:00
Daniel Veillard
34c647cfae exports htmlNewParserCtxt() as Michael Day pointed out this is needed to
* HTMLparser.c include/libxml/HTMLparser.h: exports htmlNewParserCtxt()
  as Michael Day pointed out this is needed to use htmlCtxtRead*()
Daniel
2006-09-21 06:53:59 +00:00
Daniel Veillard
8874b94cd2 added a parser XML_PARSE_COMPACT option to allocate small text nodes (less
* HTMLparser.c parser.c SAX2.c debugXML.c tree.c valid.c xmlreader.c
  xmllint.c include/libxml/HTMLparser.h include/libxml/parser.h:
  added a parser XML_PARSE_COMPACT option to allocate small
  text nodes (less than 8 bytes on 32bits, less than 16bytes on 64bits)
  directly within the node, various changes to cope with this.
* result/XPath/tests/* result/XPath/xptr/* result/xmlid/*: this
  slightly change the output
Daniel
2005-08-25 13:19:21 +00:00
Daniel Veillard
ea4b0baef2 added a recovery mode for the HTML parser based on the suggestions of bug
* HTMLparser.c include/libxml/HTMLparser.h: added a recovery mode
  for the HTML parser based on the suggestions of bug #169834 by
  Paul Loberg
Daniel
2005-08-23 16:06:08 +00:00
Daniel Veillard
a2351322c8 hack based on Arjan van de Ven suggestion to reduce ELF footprint and
* elfgcchack.h doc/elfgcchack.xsl libxml.h: hack based on Arjan van de
  Ven suggestion to reduce ELF footprint and generated code. Based on
  aliasing of libraries function to generate direct call instead of
  indirect ones
* doc/libxml2-api.xml doc/Makefile.am doc/apibuild.py: added automatic
  generation of elfgcchack.h based on the API description, extended
  the API description to show the conditionals configuration flags
  required for symbols.
* nanohttp.c parser.c xmlsave.c include/libxml/*.h: lot of cleanup
* doc/*: regenerated the docs.
Daniel
2004-06-27 12:08:10 +00:00
Daniel Veillard
be5869729a modified the file header to add more informations, painful... updated to
* include/libxml/*.h include/libxml/*.h.in: modified the file
  header to add more informations, painful...
* genChRanges.py genUnicode.py: updated to generate said changes
  in headers
* doc/apibuild.py: extract headers, add them to libxml2-api.xml
* *.html *.xsl *.xml: updated the stylesheets to flag geprecated
  APIs modules. Updated the stylesheets, some cleanups, regenerated
* doc/html/*.html: regenerated added back book1 and libxml-lib.html
Daniel
2003-11-18 20:56:51 +00:00
Daniel Veillard
73b013fc17 added a new configure option --with-push, some cleanups, chased code size
* HTMLparser.c Makefile.am configure.in legacy.c parser.c
  parserInternals.c testHTML.c xmllint.c include/libxml/HTMLparser.h
  include/libxml/parser.h include/libxml/parserInternals.h
  include/libxml/xmlversion.h.in: added a new configure
  option --with-push, some cleanups, chased code size anomalies.
  Now a library configured --with-minimum is around 150KB,
  sounds good enough.
Daniel
2003-09-30 12:36:01 +00:00
Daniel Veillard
9475a352bd added the same htmlRead APIs than their XML counterparts new parser
* HTMLparser.c testHTML.c xmllint.c include/libxml/HTMLparser.h:
  added the same htmlRead APIs than their XML counterparts
* include/libxml/parser.h: new parser options, not yet implemented,
  added an options field to the context.
* tree.c: patch from Shaun McCance to fix bug #123238 when ]]>
  is found within a cdata section.
* result/noent/cdata2 result/cdata2 result/cdata2.rdr
  result/cdata2.sax test/cdata2: add one more cdata test
Daniel
2003-09-26 12:47:50 +00:00
Igor Zlatkovic
76874e4516 Exportability taint of the headers 2003-08-25 09:05:12 +00:00
Daniel Veillard
2fdbd32d51 new dictionary module to keep a single instance of the names used by the
* dict.c include/libxml/dict.h Makefile.am include/libxml/Makefile.am:
  new dictionary module to keep a single instance of the names used
  by the parser
* DOCBparser.c HTMLparser.c parser.c parserInternals.c valid.c:
  switched all parsers to use the dictionary internally
* include/libxml/HTMLparser.h include/libxml/parser.h
  include/libxml/parserInternals.h include/libxml/valid.h:
  Some of the interfaces changed as a result to receive or return
  "const xmlChar *" instead of "xmlChar *", this is either
  insignificant from an user point of view or when the returning
  value changed, those function are really parser internal methods
  that no user code should really change
* doc/libxml2-api.xml doc/html/*: the API interface changed and
  the docs were regenerated
Daniel
2003-08-18 12:15:38 +00:00
William M. Brack
7a82165d2e Minor changes to comments, etc. for improving documentation generation
* encoding.c, threads.c, include/libxml/HTMLparser.h,
  doc/libxml2-api.xml: Minor changes to comments, etc. for
  improving documentation generation
* doc/Makefile.am: further adjustment to auto-generation of
  win32/libxml2.def.src
2003-08-15 07:27:40 +00:00
Daniel Veillard
02ea141495 exported htmlCreateMemoryParserCtxt() it was static Daniel
* HTMLparser.c include/libxml/HTMLparser.h:  exported
  htmlCreateMemoryParserCtxt() it was static
Daniel
2003-04-09 12:08:47 +00:00
Daniel Veillard
930dfb6324 applied HTML improvements from Nick Kew, allowing to do more checking to
* HTMLparser.c include/libxml/HTMLparser.h: applied HTML
  improvements from Nick Kew, allowing to do more checking
  to HTML elements and attributes.
Daniel
2003-02-05 10:17:38 +00:00
Daniel Veillard
1b31e4a0b2 fixing #79334 making htmlParseDocument a public entry point. rebuilt the
* HTMLparser.c win32/libxml2.def.src win32/dsp/libxml2.def.src
  include/libxml/HTMLparser.h: fixing #79334 making htmlParseDocument
  a public entry point.
* doc/*: rebuilt the API and docs
Daniel
2002-05-27 14:44:50 +00:00
Daniel Veillard
61f261749f Heiko W. Rupp fixed a lot of comments to generate better API descriptions
* include/libxml/*.h: Heiko W. Rupp fixed a lot of comments
  to generate better API descriptions etc...
Daniel
2002-03-12 18:46:39 +00:00
Daniel Veillard
963d2ae415 cleanup patch from Anthony Jones fix the headers to avoid in make scan
* SAX.c: cleanup patch from Anthony Jones
* doc/Makefile.am: fix the headers to avoid in make scan
* parserInternals.c xpath.c include/libxml/*.h: cleanup of the
  includes, * vs Ptr and general cleanup
* parsedecl.py: first version of a script to extract the
  module interfaces, the goal will be to provide .decl or XML
  specification of the interfaces to build wrappers.
Daniel
2002-01-20 22:08:18 +00:00
Daniel Veillard
cbaf399537 applied 42 documentation patches from Charlie Bozeman. Regenerated the
* *.c include/libxml/*.h doc/html/*: applied 42 documentation
  patches from Charlie Bozeman. Regenerated the HTML docs.
Daniel
2001-12-31 16:16:02 +00:00
Daniel Veillard
bb3712974b trying to fix some troubles w.r.t. function returning const xxxPtr. Daniel
* HTMLparser.c HTMLtree.c include/libxml/HTMLparser.h:
  trying to fix some troubles w.r.t. function returning
  const xxxPtr.
Daniel
2001-08-16 23:26:59 +00:00
Daniel Veillard
220907319a cleanup of global variables, marking some const or private. Daniel
* include/libxml/parserInternals.h include/libxml/HTMLparser.h
  xmlIO.c tree.c parserInternals.c entities.c encoding.c
  HTMLparser.c: cleanup of global variables, marking some
  const or private.
Daniel
2001-07-16 00:06:07 +00:00
Daniel Veillard
c5d64345cf Summer's cleanup, a really big one:
* AUTHORS: added William and Bjorn
* include/libxml/*.h *.c README doc/*.html etc.: changed old email to
  daniel@veillard.com hopefully I won't have to do this again
* doc/Makefile.am doc/html/*.html: cleanup makefile, checked that
  docs can be rebuilt cleanly now
* include/libxml/xml*version.h*: removed include/libxml/xmlversion.h
  from CVs it's generated, added include/libxml/xmlwin32version.h
  also generated but which should change far less frequently.
* catalog.c nanoftp.c: made sure to include libxml.h not
  libxml/xmlversion.h directly
* include/libxml/*.h: include xmlwin32version.h instead of xmlversion.h
  when compiling on WIN32 and MSC
Daniel
2001-06-24 12:13:24 +00:00
Daniel Veillard
02bb170a8b - HTMLparser.[ch] HTMLtree.c: stored the inline/block property
of element and use it to avoid outputting formatting spaces at
  the wrong place. Implemented the format parameter for HTML save.
- result/HTML/doc2.htm result/HTML/doc3.htm result/HTML/fp40.htm
  result/HTML/script.html result/HTML/test2.html result/HTML/test3.html
  result/HTML/wired.html: of course this impact the result of a
  number of HTML tests
Daniel
2001-06-13 21:11:59 +00:00
Daniel Veillard
56a4cb8c4d Huge cleanup, I switched to compile with
-Wall -g -O -ansi -pedantic -W -Wunused -Wimplicit
-Wreturn-type -Wswitch -Wcomment -Wtrigraphs -Wformat
-Wchar-subscripts -Wuninitialized -Wparentheses -Wshadow
-Wpointer-arith -Wcast-align -Wwrite-strings -Waggregate-return
-Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline
- HTMLparser.[ch] HTMLtree.c SAX.c debugXML.c encoding.[ch]
  encoding.h entities.c error.c list.[ch] nanoftp.c
  nanohttp.c parser.[ch] parserInternals.[ch] testHTML.c
  testSAX.c testURI.c testXPath.c tree.[ch] uri.c
  valid.[ch] xinclude.c xmlIO.[ch] xmllint.c xmlmemory.c
  xpath.c xpathInternals.h xpointer.[ch] example/gjobread.c:
  Cleanup, staticfied a number of non-exported functions,
  detected and cleaned up a dozen of problem found this way,
  avoided a lot of public function name/typedef/system names clashes
- doc/xml.html: updated
- configure.in: switched private flags to the really pedantic ones.
Daniel
2001-03-24 17:00:36 +00:00
Owen Taylor
3473f88a7a Revert directory structure changes 2001-02-23 17:55:21 +00:00
CET 2001 Tomasz Koczko
64636e7f6e moved to libxml directory - this allow simplify automake/autoconf. Now
Thu Feb 23 02:03:56 CET 2001 Tomasz Koczko <kloczek@pld.org.pl>

        * *.c *.h libxml files: moved to libxml directory - this allow
	  simplify automake/autoconf. Now isn't neccessary hack on
	  am/ac level for make and remove libxml symlink (modified for this
	  also configure.in and main Makefile.am). Now automake abilities
	  are used in best way (like in many other projects with libraries).
	* include/win32config.h: moved to libxml directory (now include
	  directory isn't neccessary).
	* Makefile.am, examples/Makefile.am, libxml/Makefile.am:
	  added empty DEFS and in INCLUDES rest only -I$(top_builddir) -
	  this allow minimize parameters count passed to libtool script
	  (now compilation is also slyghtly more quiet).
	* configure.in: simplifies libzdetestion - prepare separated
	  variables for keep libz name and path to libz header files isn't
	  realy neccessary (if someone have libz installed in non standard
	  prefix path to header files ald library can be passed as:
	  $ CFALGS="-I</libz.h/path>" LDFLAGS="-L</libz/path>" ./configure
	* autogen.sh: check now for libxml/entities.h.

	After above building libxml pass correctly and also pass
	"make install DESTDIR=</install/prefix>" from tar ball generated by
	"make dist". Seems ac/am reorganization is finished. This changes
	not touches any other things on *.{c,h} files level.
2001-02-23 01:37:32 +00:00
Daniel Veillard
f41fbbf6a9 testing and bug fixing related to XSLT:
- xpath.c result/XPath/tests/chaptersprefol: bugfixes on order and
  on predicate
- HTMLparser.[ch] HTMLtree.c result/HTML/doc3.htm.err
  result/HTML/doc3.htm.sax result/HTML/wired.html: sometimes one
  really want to have tags closed on output even if we accept
  unclosed ones on input
Daniel
2001-02-13 17:05:35 +00:00
Daniel Veillard
a6d8eb6256 Finally had a bit of time to resynch both trees:
- HTMLparser.[ch]: added a way to avoid adding automatically
  omitted tags. htmlHandleOmittedElem() allows to change the
  default handling.
- tree.[ch] xmllint.c: added xmlDocDumpFormatMemory() and
  xmlDocDumpFormatMemoryEnc(), uses memory functions for output
  of xmllint too when using --memory flag, added a memory test
  suite at the Makefile level.
- xpathInternals.h xpath.[ch] xpointer.c: fixed problems
  with namespace use when encountering QNames in XPath evalation,
  added xmlns() scheme in XPointer.
- nanoftp.c : incorporated a fix
- parser.c xmlIO.c: fixed problems raised with encoding when using
  the memory I/O
- parserInternals.c: closed bug 25934 reported by
  torsten.landschoff@innominate.de
- TODO: updated
Daniel
2000-12-27 10:46:47 +00:00
Daniel Veillard
47e12f2318 HTML attributes handling:
- SAX.c: HTML attributes need normalization too (Bjorn Reese)
- HTMLparser.[ch]: addded htmlIsScriptAttribute()
Daniel
2000-10-15 14:24:25 +00:00
Daniel Veillard
e010c17d78 Mostly HTML generation and parsing enhancements:
- HTMLparser.[ch] testHTML.c: applied the second set of
  patches from Wayne Davison <wayned@blorf.net>, adding
  htmlEncodeEntities()
- HTMLparser.c: fixed an ignorable white space detection bug
  occuring when parsing with SAX only
- result/HTML/*.sax: updated since the output is now HTML
  encoded...
Daniel.
2000-08-28 10:04:51 +00:00
Daniel Veillard
47f3f31f83 - HTMLparser.[ch]: applied some of Wayne Davison <wayned@blorf.net> patches
Daniel
2000-08-27 22:40:15 +00:00
Daniel Veillard
32bc74ef98 - doc/encoding.html doc/xml.html: added I18N doc
- encoding.[ch] HTMLtree.[ch] parser.c HTMLparser.c: I18N encoding
  improvements, both parser and filters, added ASCII & HTML,
  fixed the ISO-Latin-1 one
- xmllint.c testHTML.c: added/made visible --encode
- debugXML.c : cleanup
- most .c files: applied patches due to warning on Windows and
  when using Sun Pro cc compiler
- xpath.c : cleanup memleaks
- nanoftp.c : added a TESTING preprocessor flag for standalong
  compile so that people can report bugs more easilly
- nanohttp.c : ditched socklen_t which was a portability mess
  and replaced it with unsigned int.
- tree.[ch]: added xmlHasProp()
- TODO: updated
- test/ : added more test for entities, NS, encoding, HTML, wap
- configure.in: preparing for 2.2.0 release
Daniel
2000-07-14 14:49:25 +00:00
Daniel Veillard
361d845de0 Work done on the plane, ready to release libxml2-2.0.0, Daniel 2000-04-03 19:48:13 +00:00
Daniel Veillard
71b656e067 - added xmlRemoveID() and xmlRemoveRef()
- added check and handling when possibly removing an ID
- fixed some entities problems
- added xmlParseTryOrFinish()
- changed the way struct aredeclared to allow gtk-doc to expose those
- closed #4960
- fixes to libs detection from Albert Chin-A-Young
- preparing 1.8.3 release
Daniel
2000-01-05 14:46:17 +00:00
Daniel Veillard
5e5c62351f - Push mode for the HTML parser (new calls)
- Improved the memory debugger to provide content informations
- cleanups, last known mem leak killed
Daniel
1999-12-29 12:49:06 +00:00
Daniel Veillard
5cb5ab8d94 - release 1.8.2 - HTML handling improvement - new tree handling functions
- release 1.8.2
- HTML handling improvement
- new tree handling functions
- default namespace on attribute bug fixed
- libxml use for C++ fixed (for good this time !)
Daniel
1999-12-21 15:35:29 +00:00
Daniel Veillard
f600e2537f - Fixed bug #4344 - Fixed C++ problems in headers - Released 1.8.1 Daniel
- Fixed bug #4344
- Fixed C++ problems in headers
- Released 1.8.1
Daniel
1999-12-18 15:32:46 +00:00
Daniel Veillard
dd6b36766f Fixed CHAR, errno, alpha RPM compile, updated doc, Daniel 1999-09-23 22:19:22 +00:00
Daniel Veillard
b96e643849 Release 1.6, lot of fixes, more validation, code cleanup, added namespace
on attributes, Daniel.
1999-08-29 21:02:19 +00:00
Daniel Veillard
82150d8a99 HTML parsing, output is now correct, added HTMLtests target and testcases, Daniel 1999-07-07 07:32:15 +00:00
Daniel Veillard
5233ffc8d3 Restore binary compat, more HTML stuff, allow stdin input, Daniel. 1999-07-06 22:25:25 +00:00
Daniel Veillard
be70ff7162 Closing reported bugs: 617 1591 1592, adding an HTML parser, Daniel 1999-07-05 16:50:46 +00:00