mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
1049 Commits

Author SHA1 Message Date
Nick Wellnhofer
9b5cce7a71 include: Remove more unnecessary includes 2023-09-21 01:50:53 +02:00
Nick Wellnhofer
d6ba403368 globals: Move remaining declarations to correct places
globals.h is now deprecated. Sanity is restored.
2023-09-20 22:22:51 +02:00
Nick Wellnhofer
1117fae040 include: Remove unneeded includes 2023-09-20 22:07:41 +02:00
Nick Wellnhofer
736327df6b include: Break inclusion cycle between tree.h and xmlregexp.h 2023-09-20 22:07:41 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
11a1839ddd globals: Move remaining globals back to correct header files
This undoes a lot of damage.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
7909ff08e2 include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
eb985d6f8e globals: Move error globals back to xmlerror.c 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
d1336fd393 globals: Move malloc hooks back to xmlmemory.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
a77f9ab84c globals: Don't include SAX2.h from globals.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
2e6c49a74d globals: Don't store xmlParserVersion in global state
This is a constant.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
0830fcfa90 globals: Deprecate xmlLastError
The last error should be accessed with xmlGetLastError.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
db8b9722cb parser: Deprecate global parser options
Note that setting global options has no effect anyway when using any of
the modern parser API functions which take an option argument like
xmlReadMemory or when using xmlCtxtUseOptions.

Global options only have an effect when using old API functions
xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling
xmlCtxtUseOptions.

Unfortunately, many downstream projects still modify global parser
options often without realizing that it has no effect. If necessary,
switch to the modern API. Then you can safely remove all code that
changes global options.

Here's a list of deprecated functions and global variables together with
the corresponding parser options.

- xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue
  Parser option XML_PARSE_NOENT

- xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue
  Inverse of parser option XML_PARSE_NOBLANKS

- xmlPedanticParserDefault, xmlPedanticParserDefaultValue
  Parser option XML_PARSE_PEDANTIC

- xmlLineNumbersDefault, xmlLineNumbersDefaultValue
  Always enabled by new API

- xmlDoValidityCheckingDefaultValue
  Parser option XML_PARSE_DTDVALID

- xmlGetWarningsDefaultValue
  Inverse of parser option XML_PARSE_NOWARNING

- xmlLoadExtDtdDefaultValue
  Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
868b94b80e globals: Reformat libxml/globals.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
bbf08608fc globals: Move buffer callback declarations to xmlIO.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
dc3382ef97 globals: Move xmlRegisterNodeDefault to tree.c
Code in globals.c must not try to access globals itself since the
accessor macros aren't defined and we would only see the main
variable.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
e7b6ca156f globals: Rework global state destruction on Windows
If DllMain is used, rely on it working as expected. The old code seemed
to attempt to free global state of other threads if, for some reason,
the DllMain mechanism didn't work.

In a static build, register a destructor with
RegisterWaitForSingleObject.

Make public functions xmlGetGlobalState and xmlInitializeGlobalState
no-ops.

Move initialization and registration of global state objects to
xmlInitGlobalState. Lookup global state with xmlGetThreadLocalStorage
which can be inlined nicely.

Also clean up global state when using TLS. xmlLastError must be reset.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
39a275a541 globals: Define globals using macros
Declare and define globals and helper functions by (ab)using the
preprocessor.
2023-09-20 22:06:49 +02:00
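The "(ab)using the preprocessor" approach described in 39a275a541 is commonly done with the X-macro pattern: one list of globals is expanded several times to generate declarations, initializers and accessors. The following is only a minimal sketch of that pattern, not the code from the commit. All `Demo`/`demo` names, the list entries and their default values are hypothetical.

```c
#include <stddef.h>

/* Hypothetical list of globals: ENTRY(type, name, default value). */
#define DEMO_GLOBALS(ENTRY) \
    ENTRY(int, demoIndentOutput, 1) \
    ENTRY(int, demoSaveNoEmptyTags, 0) \
    ENTRY(const char *, demoIndentString, "  ")

/* Expansion 1: one struct member per global. */
#define DEMO_FIELD(type, name, init) type name;
typedef struct {
    DEMO_GLOBALS(DEMO_FIELD)
} DemoGlobalState;

/* Expansion 2: assign every member its default value. */
#define DEMO_INIT(type, name, init) gs->name = init;
static void demoInitGlobalState(DemoGlobalState *gs) {
    DEMO_GLOBALS(DEMO_INIT)
}

/* Expansion 3: one accessor per global, returning a pointer to the member. */
#define DEMO_ACCESSOR(type, name, init) \
    static type *demoGet_##name(DemoGlobalState *gs) { return &gs->name; }
DEMO_GLOBALS(DEMO_ACCESSOR)
```

Adding a global then means adding a single line to the list. In a thread-safe build the accessor would presumably look up a per-thread state object instead of taking it as a parameter.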
Nick Wellnhofer
bf6bd16154 globals: Introduce xmlCheckThreadLocalStorage
Checks whether (emulated) thread-local storage could be allocated.
2023-09-20 22:06:43 +02:00
Nick Wellnhofer
89f4976728 globals: Make xmlGlobalState private
This removes a public struct but it seems impossible to use its members
in a sensible way from external code.
2023-09-19 17:36:29 +02:00
Nick Wellnhofer
a07ec7c1a7 threads: Move library initialization code to threads.c
This makes it possible to consolidate the initialization code since the
global init lock was already implemented in threads.c.
2023-09-19 17:35:12 +02:00
Nick Wellnhofer
4e1c13ebfd debug: Remove debugging code
This is barely useful these days and only clutters the code base.
2023-09-19 17:35:09 +02:00
Nick Wellnhofer
c19771c1f1 globals: Move code from threads.c to globals.c
Move all code that handles globals to the place where it belongs.
2023-09-19 17:34:38 +02:00
Nick Wellnhofer
2a4b811424 globals: Rename members of xmlGlobalState
This is a deliberate first step to remove some internals from the
public API and to avoid issues when redefining tokens.
2023-09-19 17:34:30 +02:00
Nick Wellnhofer
edc2dd48cb dict: Update hash function
Update hash function from classic Jenkins OAAT (dict.c) and a variant of
DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash
function passes all SMHasher tests.
2023-09-04 16:07:23 +02:00
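For context on edc2dd48cb, this is the textbook form of the classic Jenkins one-at-a-time (OAAT) hash that the commit moves away from. It is a sketch of the general construction only: it is unseeded, the function name `demo_oaat_hash` is hypothetical, and it is neither the former dict.c code nor the new "GoodOAAT" function.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic Jenkins one-at-a-time hash (textbook, unseeded form). */
static uint32_t demo_oaat_hash(const unsigned char *key, size_t len) {
    uint32_t h = 0;

    /* Mix in one byte at a time. */
    for (size_t i = 0; i < len; i++) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }

    /* Final avalanche so that late bytes affect all output bits. */
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

For the one-byte key "a" this yields 0xca2e9442. A dictionary would normally also mix a random seed into the initial value so that bucket positions differ between runs.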
Nick Wellnhofer
57cfd221a6 dict: Use xoroshiro64** as PRNG
Stop using rand_r. This enables hash randomization on all platforms.
2023-09-01 14:52:04 +02:00
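The generator named in 57cfd221a6, xoroshiro64**, is a small public-domain PRNG with 64 bits of state. The sketch below follows the published reference algorithm. The `DemoRng` type and function names are hypothetical, and how libxml2 seeds and stores the state is not shown here.

```c
#include <stdint.h>

/* Two 32-bit words of state; must not both be zero. */
typedef struct {
    uint32_t s[2];
} DemoRng;

static uint32_t demo_rotl(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

/* Return the next 32-bit value and advance the state. */
static uint32_t demo_rng_next(DemoRng *rng) {
    const uint32_t s0 = rng->s[0];
    uint32_t s1 = rng->s[1];
    /* The "**" scrambler: multiply, rotate, multiply. */
    const uint32_t result = demo_rotl(s0 * 0x9E3779BBu, 5) * 5;

    /* Linear state update (xor, rotate, shift). */
    s1 ^= s0;
    rng->s[0] = demo_rotl(s0, 26) ^ s1 ^ (s1 << 9);
    rng->s[1] = demo_rotl(s1, 13);
    return result;
}
```

Unlike rand_r, this needs nothing from the platform's C library, which fits the commit's remark about enabling hash randomization on all platforms.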
Nick Wellnhofer
778cca386d legacy: Add stubs for disabled modules
When legacy support is requested, always enable stubs for FTP and
XPointer location modules which were removed from the standard
configuration. Going forward, the --with-legacy configuration option
should be used to provide maximum ABI compatibility.

Fixes #433.
2023-08-20 23:16:12 +02:00
Nick Wellnhofer
ed3bd05284 parser: Allow to set maximum amplification factor 2023-08-20 20:49:16 +02:00
Nick Wellnhofer
f1c1f5c6b4 parser: Revert change to doc->encoding
Fixes #579.
2023-08-17 12:47:14 +02:00
Nick Wellnhofer
95e81a360c parser: Decode all data in xmlCharEncInput
Even with flush set to true, xmlCharEncInput didn't guarantee to decode
all data. This complicated the push parser.

Remove the flush flag and always decode all available data.

Also fix ICU code where the flush flag has a different meaning. Always
set flush to false and retry even with empty input buffers.
2023-08-08 15:21:31 +02:00
Nick Wellnhofer
834b8123ef parser: Stream data when reading from memory
Don't create a copy of the whole input buffer. Read the data chunk by
chunk to save memory.

Historically, it was probably envisioned to read data from memory
without additional copying. This doesn't work reliably with the current
design of the XML parser which requires a terminating null byte at the
end of input buffers. This led to xmlReadMemory interfaces, which
expect pointer and size arguments, being changed to make a
zero-terminated copy of the input buffer. Interfaces based on
xmlReadDoc, which actually expect a zero-terminated string and
would make zero-copy operation work, were then simplified to rely on
xmlReadMemory, resulting in an unnecessary copy.

To avoid temporarily copying (possibly gigabytes of) memory, we now
stream in-memory input just like content read from files in a
chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of
250 bytes). As a side effect, we also avoid another copy of the whole
input when handling non-UTF-8 data which was made possible by some
earlier commits.

Interfaces expecting zero-terminated strings now make use of strnlen
which unfortunately isn't part of the standard C library and only
mandated since POSIX 2008.
2023-08-08 15:21:28 +02:00
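The idea in 834b8123ef can be sketched as a read callback over a pointer/size pair, consumed in fixed-size chunks, plus a bounded length helper for zero-terminated input. This is an illustration under stated assumptions, not the parser's code: the `Demo`/`demo_` names are hypothetical, the chunk size merely echoes the 250 bytes mentioned in the message, and `demo_strnlen` shows one portable way to get strnlen behavior with standard C `memchr`.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CHUNK 250

/* In-memory input described by pointer and remaining size; no copy is made. */
typedef struct {
    const char *cur;
    size_t remaining;
} DemoMemSource;

/* Read callback: copy up to len bytes into out, return the count (0 at end). */
static size_t demo_mem_read(DemoMemSource *src, char *out, size_t len) {
    size_t n = src->remaining < len ? src->remaining : len;

    memcpy(out, src->cur, n);
    src->cur += n;
    src->remaining -= n;
    return n;
}

/* Bounded string length, like POSIX strnlen, using only standard C. */
static size_t demo_strnlen(const char *s, size_t max) {
    const char *end = memchr(s, 0, max);

    return end ? (size_t)(end - s) : max;
}

/* Consume the whole source chunk by chunk; returns the number of chunks. */
static size_t demo_count_chunks(DemoMemSource *src) {
    char chunk[DEMO_CHUNK];
    size_t chunks = 0;

    while (demo_mem_read(src, chunk, sizeof chunk) > 0)
        chunks++;
    return chunks;
}
```

Only one chunk-sized buffer is live at a time, so peak extra memory stays constant regardless of the input size.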
Nick Wellnhofer
59fa0bb383 parser: Simplify input pointer updates
The base member always points to the beginning of the buffer.
2023-08-08 15:21:14 +02:00
Nick Wellnhofer
ec7be50662 parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.

Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.

Introduce private helper functions to switch encodings used by both the
XML and HTML parser:

- xmlDetectEncoding, which skips over the BOM, so that the BOM checks
  can be removed from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
  about encoding mismatches.

If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.

Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)

The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.

The 'standalone' member of xmlParserInput is renamed to 'flags'.

A new parser state XML_PARSER_XML_DECL is added for the push parser.
2023-08-08 15:19:46 +02:00
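The interplay described in ec7be50662 between BOM detection and a "has encoding" flag can be sketched as follows. This is a simplified illustration, not the private xmlDetectEncoding helper: every `DEMO_`/`demo_` name is hypothetical, only three byte order marks are recognized, and UTF-32 and BOM-less auto-detection are deliberately left out.

```c
#include <stddef.h>

typedef enum {
    DEMO_ENC_NONE,
    DEMO_ENC_UTF8,
    DEMO_ENC_UTF16LE,
    DEMO_ENC_UTF16BE
} DemoEnc;

/* Hypothetical input flag, set once an encoding has been chosen. */
#define DEMO_INPUT_HAS_ENCODING 1u

/* Detect a BOM at the start of the input; *bomLen is the number of bytes to skip. */
static DemoEnc demo_detect_bom(const unsigned char *in, size_t len,
                               size_t *bomLen) {
    *bomLen = 0;
    if (len >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF) {
        *bomLen = 3;
        return DEMO_ENC_UTF8;
    }
    if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE) {
        *bomLen = 2;
        return DEMO_ENC_UTF16LE;
    }
    if (len >= 2 && in[0] == 0xFE && in[1] == 0xFF) {
        *bomLen = 2;
        return DEMO_ENC_UTF16BE;
    }
    return DEMO_ENC_NONE;
}

/* The encoding declaration is only used if nothing chose an encoding earlier. */
static int demo_use_declared_encoding(unsigned flags) {
    return (flags & DEMO_INPUT_HAS_ENCODING) == 0;
}
```

A caller would set the flag after a user override or a successful BOM detection, so a later conflicting encoding declaration can be reported as a mismatch instead of triggering a second switch.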
Nick Wellnhofer
b8961df65d SAX: Always validate xml:ids
The behavior shouldn't depend on mostly random configuration options.
2023-05-09 03:25:24 +02:00
Nick Wellnhofer
8d5e33ef3e Fix compiler warning on GCC < 8
-Wcast-function-type is only available since GCC 8.
2023-05-03 20:42:10 +02:00
Nick Wellnhofer
fc69cf568b parser: Move xmlFatalErr to parserInternals.c 2023-04-30 17:51:29 +02:00
Nick Wellnhofer
3ff6abbf58 encoding: Rework error codes
Use an enum instead of magic numbers. Fix a few error codes. Simplify
handling of "space" and "partial" errors.

See #506.
2023-04-30 16:43:29 +02:00
Nick Wellnhofer
fa993130f9 xpath: Remove remaining references to valueFrame
Fixes #529.
2023-04-30 13:18:17 +02:00
Nick Wellnhofer
3ffcc03b16 parser: Deprecate more internal functions 2023-04-26 20:23:23 +02:00
Nick Wellnhofer
98840d40da parser: Rework EBCDIC code page detection
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.

Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
2023-03-21 21:35:15 +01:00
Nick Wellnhofer
04d1bedd8c parser: Rework shrinking of input buffers
Don't try to grow the input buffer in xmlParserShrink. This makes sure
that no memory allocations are made and the function always succeeds.

Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of
DTD parsing loops.

Shrink before growing.
2023-03-21 13:19:18 +01:00
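The property stressed in 04d1bedd8c, a shrink operation that never allocates and therefore cannot fail, can be illustrated with a buffer that discards consumed bytes in place. This is a hedged sketch rather than xmlParserShrink itself: the `DemoInput` layout, the `demo_shrink` name and the `keep` parameter (bytes of already-consumed context to retain) are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *base; /* start of the buffer */
    char *cur;  /* next unread byte */
    char *end;  /* one past the last valid byte */
} DemoInput;

/*
 * Drop consumed bytes from the front, keeping `keep` bytes before cur.
 * Uses memmove only, so there is no allocation and no failure path.
 * Returns the number of bytes dropped.
 */
static size_t demo_shrink(DemoInput *in, size_t keep) {
    size_t used = (size_t)(in->cur - in->base);
    size_t drop;

    if (used <= keep)
        return 0;
    drop = used - keep;
    memmove(in->base, in->base + drop, (size_t)(in->end - in->base) - drop);
    in->cur -= drop;
    in->end -= drop;
    return drop;
}
```

Because `base` does not move, pointers derived from it stay valid after a shrink, and a later grow step can append new data at `end`.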
Nick Wellnhofer
b167c73144 parser: Fix short-lived regression causing infinite loops
Fix 3eb6bf03. We really have to halt the parser, so the input buffer
gets reset.
2023-03-14 15:16:04 +01:00
Nick Wellnhofer
f8efa589e8 malloc-fail: Handle malloc failures in xmlSchemaInitTypes
Note that this changes the return value of public function
xmlSchemaInitTypes from void to int. This shouldn't break the ABI on
most platforms.

Found when investigating #500.
2023-03-14 15:14:38 +01:00
Nick Wellnhofer
d7daf9fd96 xmllint: Fix use-after-free with --maxmem
Fixes #498.
2023-03-14 14:55:34 +01:00
Nick Wellnhofer
e7c3a4ca1b parser: Deprecate some parser input functions 2023-03-13 19:19:46 +01:00
Nick Wellnhofer
2099441f32 parser: Stop calling xmlParserInputShrink
Introduce xmlParserShrink which takes a parser context to simplify error
handling.
2023-03-13 17:51:13 +01:00
Nick Wellnhofer
483793940c malloc-fail: Stop using XPath stack frames
There's too much code which assumes that if ctxt->value is non-null,
a value can be successfully popped off the stack. This assumption can
break with stack frames when malloc fails.

Instead of trying to fix all call sites, remove the stack frame logic.
It only offered very little protection against misbehaving extension
functions. We already check the stack size after a function call which
should be enough.

Found by OSS-Fuzz.
2023-03-13 17:11:27 +01:00
Nick Wellnhofer
bd63d730b8 html: Impose some length limits
Impose length limits on names, attribute values, PIs and comments,
similar to the XML parser.
2023-03-12 17:40:55 +01:00
Nick Wellnhofer
3eb6bf0386 parser: Stop calling xmlParserInputGrow
Introduce xmlParserGrow which takes a parser context to simplify error
handling.
2023-03-12 17:05:51 +01:00
Nick Wellnhofer
b51478dc95 Revert "malloc-fail: Avoid use-after-free after unsuccessful valuePush"
This reverts commit 6a12be77c6.

There's too much code reading ctxt->value directly and making the wrong
assumptions.
2023-02-26 13:23:47 +01:00