1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00
Commit Graph

202 Commits

Author SHA1 Message Date
Nick Wellnhofer
1112699cfa legacy: Remove most legacy functions from public headers
Also remove warning messages.
2024-06-17 15:47:42 +02:00
Nick Wellnhofer
5fca9498fd doc: Hide internal macro 2024-06-16 19:56:08 +02:00
Nick Wellnhofer
387f0c784f include: Readd circular dependency between tree.h and parser.h
There are dozens of downstream projects that only include tree.h but use
declarations from parser.h. This broke after the recent cleanup of
circular dependencies.

Make tree.h include parser.h again. This is a hack but doesn't change
the include directory struture.

This commit only made it into the 2.12 branch but wasn't applied to
master, so the issue turned up in 2.13.0 again.

Should fix #734.
2024-06-15 16:27:54 +02:00
Nick Wellnhofer
712a31abe4 parser: Deprecate most public struct members
This will probably cause many warnings in downstream code abusing
libxml2 internals, but we can always undeprecate some members later.
2024-06-13 18:04:34 +02:00
Nick Wellnhofer
5238404325 parser: Pass resource type to resource loader 2024-06-12 16:36:12 +02:00
Nick Wellnhofer
64ad272525 parser: Introduce per-context resource loader 2024-06-12 16:22:52 +02:00
Nick Wellnhofer
ff3b091910 parser: Implement XML_PARSE_NO_UNZIP option 2024-06-12 16:14:15 +02:00
Nick Wellnhofer
5b1d7ff0b2 parser: Remove redefinitions for legacy globals 2024-05-20 23:59:55 +02:00
Nick Wellnhofer
8961056f9b parser: Make experimental input API private
This needs to be reworked.
2024-01-23 00:47:44 +01:00
Nick Wellnhofer
02cc5c3609 parser: Add XML_PARSE_NO_XXE parser option 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
12f0bb9478 parser: Synchronize more options 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
3efbe916a1 parser: Mark 'token' member as unused in xmlParserCtxt 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
b82fd81d06 parser: Rework xmlCtxtParseDocument
Make xmlCtxtParseDocument take a parser input which can be popped after
parsing.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
d7d300ba04 parser: Remove remnants of runtime debugging feature
Apparently, this feature was remove long ago.

Fixes #651.
2024-01-04 17:50:11 +01:00
Nick Wellnhofer
875bb08489 parser: Implement xmlCtxtSetOptions
Surprisingly, some options can only be enabled with xmlCtxtUseOptions
and it's impossible to unset them. Add a new API function
xmlCtxtSetOptions which sets or clears all options.

Finally document all parser options.

Make sure to synchronize option bits and struct members.
2024-01-02 19:42:06 +01:00
Nick Wellnhofer
2b79f106ff parser: Simplify entity size accounting 2024-01-02 14:17:27 +01:00
Nick Wellnhofer
7e0bbbc143 parser: New input API
Provide a new set of functions to create xmlParserInputs. These can be
used for the document entity or from external entity loaders.

- Don't require xmlParserInputBuffer.
- All functions take a base URI.
- All functions take an encoding as string.
- xmlNewInputURL also takes a public ID.
- xmlNewInputMemory takes a size_t.
- Optimization hints for memory buffers.

Improve documentation.

Only call xmlInitParser before allocating a new parser context.

Call xmlCtxtUseOptions as early as possible.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
a5dcf0f422 parser: Mark more parser context members as unused 2023-12-29 01:20:08 +01:00
Nick Wellnhofer
6a9a88a17f parser: Move progressive flag into input struct 2023-12-29 01:20:08 +01:00
Nick Wellnhofer
d944a41515 parser: Fix in-parameter-entity and in-external-dtd checks
Use in ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.

Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
2023-12-29 01:19:56 +01:00
Nick Wellnhofer
c1bddd4c26 parser: Mark 'length' member of xmlParserInput as unused 2023-12-25 23:38:40 +01:00
Nick Wellnhofer
955c177f69 parser: Stop using 'directory' struct member
This was only used as a pointless fallback for URI resolution.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
54c70ed57f parser: Improve error handling
Introduce xmlCtxtSetErrorHandler allowing to set a structured error for
a parser context. There already was the "serror" SAX handler but this
always receives the parser context as argument.

Start to use xmlRaiseMemoryError.

Remove useless arguments from memory error functions. Rename
xmlErrMemory to xmlCtxtErrMemory.

Remove a few calls to xmlGenericError.

Remove support for runtime entity debugging.
2023-12-21 02:46:27 +01:00
Nick Wellnhofer
5d2dbe79fa parser: Fix build --without-output
Fixes #647
2023-12-14 13:48:41 +01:00
Nick Wellnhofer
df0b540b3e include: Rename XML_EMPTY helper macro
Avoid name clash with downstream projects.
2023-12-07 14:59:47 +01:00
Nick Wellnhofer
a9738e311c include: Move declaration of xmlInitGlobals
Fix downstream build issues after reworking globals.h.
2023-12-07 14:59:40 +01:00
Nick Wellnhofer
9122ad0ce6 include: Move globals from xmlsave.h to parser.h
Fix downstream build issues after reworking globals.h.
2023-12-07 12:31:06 +01:00
Nick Wellnhofer
c011e7605d globals: Remove unused globals from thread storage
Setting these deprecated globals hasn't had an effect for a long time.
Make them constants. This reduces the size of per-thread storage from
~700 to ~250 bytes.
2023-12-06 20:07:54 +01:00
Nick Wellnhofer
ff6c318862 include: Remove useless 'const' from function arguments 2023-11-23 15:27:00 +01:00
Nick Wellnhofer
aca37d8c77 parser: Only enable SAX2 if there are SAX2 element handlers
This reverts part of commit 235b15a5 for backward compatibility and
adds some comments trying to clarify the whole mess.

Fixes #623.
2023-11-20 15:20:37 +01:00
Nick Wellnhofer
e0dd330b8f parser: Use hash tables to avoid quadratic behavior
Use a hash table to lookup namespaces by prefix. The hash table stores
an index into the namespace table. Auxiliary data for namespaces is
stored in a separate array along the main namespace table.

Use a hash table to verify attribute uniqueness. The hash table stores
an index into the attribute table.

Reuse hash value from the dictionary to avoid computing them twice.

See #346.
2023-09-29 12:43:22 +02:00
Nick Wellnhofer
8c084ebdc7 doc: Make apibuild.py happy 2023-09-21 22:57:33 +02:00
Nick Wellnhofer
72262030a6 parser: Readd some includes to parser.h and xmlreader.h
Fix backward compatibility.
2023-09-21 15:06:05 +02:00
Nick Wellnhofer
da274bfa55 build: Fix build when certain modules are disabled 2023-09-21 02:26:43 +02:00
Nick Wellnhofer
d6ba403368 globals: Move remaining declarations to correct places
globals.h is now deprecated. Sanity is restored.
2023-09-20 22:22:51 +02:00
Nick Wellnhofer
11a1839ddd globals: Move remaining globals back to correct header files
This undoes a lot of damage.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
d1336fd393 globals: Move malloc hooks back to xmlmemory.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
2e6c49a74d globals: Don't store xmlParserVersion in global state
This is a constant.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
db8b9722cb parser: Deprecate global parser options
Note that setting global options has no effect anyway when using any of
the modern parser API functions which take an option argument like
xmlReadMemory or when using xmlCtxtUseOptions.

Global options only have an effect when using old API functions
xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling
xmlCtxtUseOptions.

Unfortunately, many downstream projects still modify global parser
options often without realizing that it has no effect. If necessary,
switch to the modern API. Then you can safely remove all code that
changes global options.

Here's a list of deprecated functions and global variables together with
the corresponding parser options.

- xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue
  Parser option XML_PARSE_NOENT

- xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue
  Inverse of parser option XML_PARSE_NOBLANKS

- xmlPedanticParserDefault, xmlPedanticParserDefaultValue
  Parser option XML_PARSE_PEDANTIC

- xmlLineNumbersDefault, xmlLineNumbersDefaultValue
  Always enabled by new API

- xmlDoValidityCheckingDefaultValue
  Parser option XML_PARSE_DTDVALID

- xmlGetWarningsDefaultValue
  Inverse of parser option XML_PARSE_NOWARNING

- xmlLoadExtDtdDefaultValue
  Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
ed3bd05284 parser: Allow to set maximum amplification factor 2023-08-20 20:49:16 +02:00
Nick Wellnhofer
ec7be50662 parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.

Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.

Introduce private helper functions to switch encodings used by both the
XML and HTML parser:

- xmlDetectEncoding which skips over the BOM, allowing to remove the
  BOM checks from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
  about encoding mismatches.

If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.

Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)

The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.

The 'standalone' member of xmlParserInput is renamed to 'flags'.

A new parser state XML_PARSER_XML_DECL is added for the push parser.
2023-08-08 15:19:46 +02:00
Nick Wellnhofer
e7c3a4ca1b parser: Deprecate some parser input functions 2023-03-13 19:19:46 +01:00
Nick Wellnhofer
59b3366178 error: Limit number of parser errors
Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
2022-12-27 14:41:19 +01:00
Nick Wellnhofer
ce76ebfd13 entities: Stop counting entities
This was only used in the old version of xmlParserEntityCheck.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
ce9baf94d5 Remove XMLCALL and XMLCDECL macros from public headers 2022-12-08 02:48:27 +01:00
Nick Wellnhofer
68a6518c45 parser: Rewrite push parser boundary checks
Remove inaccurate xmlParseCheckTransition check.

Remove non-incremental xmlParseGetLasts check.

Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.

Fixes #439.
2022-11-20 21:27:08 +01:00
Nick Wellnhofer
65dc8a63ac Make xmlNewSAXParserCtx take a const sax handler
Also improve documentation.
2022-09-01 00:17:45 +02:00
Nick Wellnhofer
51035c539e Generate deprecation warnings for old SAX API 2022-08-25 20:17:03 +02:00
Nick Wellnhofer
9a82b94a94 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt
Add API functions to create a parser context with a custom SAX handler
without having to mess with ctxt->sax manually.
2022-08-24 14:07:55 +02:00