1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
Commit Graph

983 Commits

Author SHA1 Message Date
Nick Wellnhofer
61034116d0 error: Make more xmlError structs constant
Prepare for future changes, see 45470611.
2023-10-24 15:02:36 +02:00
Nick Wellnhofer
253f260bb1 threads: Fix --with-thread-alloc
Fixes #606.
2023-10-18 20:07:04 +02:00
Nick Wellnhofer
713ded60ad entities: Make xmlFreeEntity public 2023-10-06 10:47:07 +02:00
Nick Wellnhofer
e0dd330b8f parser: Use hash tables to avoid quadratic behavior
Use a hash table to lookup namespaces by prefix. The hash table stores
an index into the namespace table. Auxiliary data for namespaces is
stored in a separate array along the main namespace table.

Use a hash table to verify attribute uniqueness. The hash table stores
an index into the attribute table.

Reuse hash value from the dictionary to avoid computing them twice.

See #346.
2023-09-29 12:43:22 +02:00
Nick Wellnhofer
b31813e60c include: Add more missing stdio.h includes 2023-09-28 15:34:08 +02:00
Nick Wellnhofer
84e1ffc813 doc: Don't document internal macros in xmlversion.h 2023-09-22 19:01:11 +02:00
Nick Wellnhofer
b94283fbda regexp: Add missing include 2023-09-22 14:23:27 +02:00
Nick Wellnhofer
45470611b0 error: Make xmlGetLastError return a const error
This is a slight break of the API, but users really shouldn't modify the
global error struct. The goal is to make xmlLastError use static buffers
for its strings eventually. This should warn people if they're abusing
the struct.
2023-09-22 13:29:07 +02:00
Nick Wellnhofer
8c084ebdc7 doc: Make apibuild.py happy 2023-09-21 22:57:33 +02:00
Nick Wellnhofer
72262030a6 parser: Readd some includes to parser.h and xmlreader.h
Fix backward compatibility.
2023-09-21 15:06:05 +02:00
Nick Wellnhofer
9fc5090c05 hash: Clean up libxml/hash.h
Rename variables, fix subincludes, whitespace.
2023-09-21 14:47:25 +02:00
Nick Wellnhofer
da274bfa55 build: Fix build when certain modules are disabled 2023-09-21 02:26:43 +02:00
Nick Wellnhofer
9b5cce7a71 include: Remove more unnecessary includes 2023-09-21 01:50:53 +02:00
Nick Wellnhofer
d6ba403368 globals: Move remaining declarations to correct places
globals.h is now deprecated. Sanity is restored.
2023-09-20 22:22:51 +02:00
Nick Wellnhofer
1117fae040 include: Remove unneeded includes 2023-09-20 22:07:41 +02:00
Nick Wellnhofer
736327df6b include: Break inclusion cycle between tree.h and xmlregexp.h 2023-09-20 22:07:41 +02:00
Nick Wellnhofer
11a1839ddd globals: Move remaining globals back to correct header files
This undoes a lot of damage.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
7909ff08e2 include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
eb985d6f8e globals: Move error globals back to xmlerror.c 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
d1336fd393 globals: Move malloc hooks back to xmlmemory.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
a77f9ab84c globals: Don't include SAX2.h from globals.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
2e6c49a74d globals: Don't store xmlParserVersion in global state
This is a constant.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
0830fcfa90 globals: Deprecate xmlLastError
The last error should be accessed with xmlGetLastError.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
db8b9722cb parser: Deprecate global parser options
Note that setting global options has no effect anyway when using any of
the modern parser API functions which take an option argument like
xmlReadMemory or when using xmlCtxtUseOptions.

Global options only have an effect when using old API functions
xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling
xmlCtxtUseOptions.

Unfortunately, many downstream projects still modify global parser
options often without realizing that it has no effect. If necessary,
switch to the modern API. Then you can safely remove all code that
changes global options.

Here's a list of deprecated functions and global variables together with
the corresponding parser options.

- xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue
  Parser option XML_PARSE_NOENT

- xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue
  Inverse of parser option XML_PARSE_NOBLANKS

- xmlPedanticParserDefault, xmlPedanticParserDefaultValue
  Parser option XML_PARSE_PEDANTIC

- xmlLineNumbersDefault, xmlLineNumbersDefaultValue
  Always enabled by new API

- xmlDoValidityCheckingDefaultValue
  Parser option XML_PARSE_DTDVALID

- xmlGetWarningsDefaultValue
  Inverse of parser option XML_PARSE_NOWARNING

- xmlLoadExtDtdDefaultValue
  Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
868b94b80e globals: Reformat libxml/globals.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
bbf08608fc globals: Move buffer callback declarations to xmlIO.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
dc3382ef97 globals: Move xmlRegisterNodeDefault to tree.c
Code in globals.c must not try to access globals itself since the
accessor macros aren't defined and we would only see the main
variable.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
e7b6ca156f globals: Rework global state destruction on Windows
If DllMain is used, rely on it working as expected. The old code seemed
to attempt to free global state of other threads if, for some reason,
the DllMain mechanism didn't work.

In a static build, register a destructor with
RegisterWaitForSingleObject.

Make public functions xmlGetGlobalState and xmlInitializeGlobalState
no-ops.

Move initialization and registration of global state objects to
xmlInitGlobalState. Lookup global state with xmlGetThreadLocalStorage
which can be inlined nicely.

Also cleanup global state when using TLS. xmlLastError must be reset.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
39a275a541 globals: Define globals using macros
Declare and define globals and helper functions by (ab)using the
preprocessor.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
bf6bd16154 globals: Introduce xmlCheckThreadLocalStorage
Checks whether (emulated) thread-local storage could be allocated.
2023-09-20 22:06:43 +02:00
Nick Wellnhofer
89f4976728 globals: Make xmlGlobalState private
This removes a public struct but it seems impossible to use its members
in a sensible way from external code.
2023-09-19 17:36:29 +02:00
Nick Wellnhofer
4e1c13ebfd debug: Remove debugging code
This is barely useful these days and only clutters the code base.
2023-09-19 17:35:09 +02:00
Nick Wellnhofer
c19771c1f1 globals: Move code from threads.c to globals.c
Move all code that handles globals to the place where it belongs.
2023-09-19 17:34:38 +02:00
Nick Wellnhofer
2a4b811424 globals: Rename members of xmlGlobalState
This is a deliberate first step to remove some internals from the
public API and to avoid issues when redefining tokens.
2023-09-19 17:34:30 +02:00
Nick Wellnhofer
778cca386d legacy: Add stubs for disabled modules
When legacy support is requested, always enable stubs for FTP and
XPointer location modules which were removed from the standard
configuration. Going forward, the --with-legacy configuration option
should be used to provide maximum ABI compatibility.

Fixes #433.
2023-08-20 23:16:12 +02:00
Nick Wellnhofer
ed3bd05284 parser: Allow to set maximum amplification factor 2023-08-20 20:49:16 +02:00
Nick Wellnhofer
f1c1f5c6b4 parser: Revert change to doc->encoding
Fixes #579.
2023-08-17 12:47:14 +02:00
Nick Wellnhofer
ec7be50662 parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.

Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.

Introduce private helper functions to switch encodings used by both the
XML and HTML parser:

- xmlDetectEncoding which skips over the BOM, allowing to remove the
  BOM checks from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
  about encoding mismatches.

If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.

Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)

The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.

The 'standalone' member of xmlParserInput is renamed to 'flags'.

A new parser state XML_PARSER_XML_DECL is added for the push parser.
2023-08-08 15:19:46 +02:00
Nick Wellnhofer
b8961df65d SAX: Always validate xml:ids
The behavior shouldn't depend on mostly random configuration options.
2023-05-09 03:25:24 +02:00
Nick Wellnhofer
8d5e33ef3e Fix compiler warning on GCC < 8
-Wcast-function-type is only available since GCC 8.
2023-05-03 20:42:10 +02:00
Nick Wellnhofer
3ff6abbf58 encoding: Rework error codes
Use an enum instead of magic numbers. Fix a few error codes. Simplify
handling of "space" and "partial" errors.

See #506.
2023-04-30 16:43:29 +02:00
Nick Wellnhofer
fa993130f9 xpath: Remove remaining references to valueFrame
Fixes #529.
2023-04-30 13:18:17 +02:00
Nick Wellnhofer
3ffcc03b16 parser: Deprecate more internal functions 2023-04-26 20:23:23 +02:00
Nick Wellnhofer
98840d40da parser: Rework EBCDIC code page detection
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.

Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
2023-03-21 21:35:15 +01:00
Nick Wellnhofer
f8efa589e8 malloc-fail: Handle malloc failures in xmlSchemaInitTypes
Note that this changes the return value of public function
xmlSchemaInitTypes from void to int. This shouldn't break the ABI on
most platforms.

Found when investigating #500.
2023-03-14 15:14:38 +01:00
Nick Wellnhofer
d7daf9fd96 xmllint: Fix use-after-free with --maxmem
Fixes #498.
2023-03-14 14:55:34 +01:00
Nick Wellnhofer
e7c3a4ca1b parser: Deprecate some parser input functions 2023-03-13 19:19:46 +01:00
Nick Wellnhofer
483793940c malloc-fail: Stop using XPath stack frames
There's too much code which assumes that if ctxt->value is non-null,
a value can be successfully popped off the stack. This assumption can
break with stack frames when malloc fails.

Instead of trying to fix all call sites, remove the stack frame logic.
It only offered very little protection against misbehaving extension
functions. We already check the stack size after a function call which
should be enough.

Found by OSS-Fuzz.
2023-03-13 17:11:27 +01:00
Nick Wellnhofer
bd63d730b8 html: Impose some length limits
Impose length limits on names, attribute values, PIs and comments,
similar to the XML parser.
2023-03-12 17:40:55 +01:00
Nick Wellnhofer
b51478dc95 Revert "malloc-fail: Avoid use-after-free after unsuccessful valuePush"
This reverts commit 6a12be77c6.

There's too much code reading ctxt->value directly and making the wrong
assumptions.
2023-02-26 13:23:47 +01:00