1
0
mirror of https://sourceware.org/git/glibc.git synced 2025-10-26 00:57:39 +03:00
Commit Graph

524 Commits

Author SHA1 Message Date
Samuel Thibault
8543577b04 malloc: Fix checking for small negative values of tcache_key
tcache_key is unsigned so we should turn it explicitly to signed before
taking its absolute value.
2025-08-10 23:45:35 +02:00
Samuel Thibault
2536c4f858 malloc: Make sure tcache_key is odd enough
We want tcache_key not to be a commonly-occurring value in memory, so ensure
a minimum amount of one and zero bits.

And we need it non-zero, otherwise even if tcache_double_free_verify sets
e->key to 0 before calling __libc_free, it gets called again by __libc_free,
thus looping indefinitely.

Fixes: c968fe5062 ("malloc: Use tailcalls in __libc_free")
2025-08-10 09:44:08 +02:00
Wilco Dijkstra
a5e9269f51 malloc: Fix MALLOC_DEBUG
MALLOC_DEBUG only works on locked arenas, so move the call to
check_inuse_chunk from __libc_free() to _int_free_chunk().
Regress now passes if MALLOC_DEBUG is enabled.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-08-08 14:00:43 +00:00
Wilco Dijkstra
94ebcfc4f2 malloc: Remove use of __curbrk
Remove an odd use of __curbrk and use MORECORE (0) instead.
This fixes Hurd build since it doesn't define this symbol.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-08-08 13:59:31 +00:00
Wilco Dijkstra
7ab623afb9 Revert "Remove use of __curbrk."
This reverts commit 1ee0b771a9.
2025-08-04 17:31:56 +00:00
Wilco Dijkstra
91a7726374 Revert "Improve MALLOC_DEBUG"
This reverts commit 4b3e65682d.
2025-08-04 17:31:54 +00:00
Wilco Dijkstra
3191dda282 Revert "Use _int_free_chunk in tcache_thread_shutdown"
This reverts commit 05ef6a4974.
2025-08-04 17:31:49 +00:00
Wilco Dijkstra
1bf4a379e8 Revert "malloc: Cleanup libc_realloc"
This reverts commit dea1e52af3.
2025-08-04 17:31:45 +00:00
Wilco Dijkstra
8c2b6e528d Revert "Change mmap representation"
This reverts commit 4b74591022.
2025-08-04 17:31:40 +00:00
Wilco Dijkstra
1ee0b771a9 Remove use of __curbrk. 2025-08-04 17:13:55 +00:00
Wilco Dijkstra
4b3e65682d Improve MALLOC_DEBUG 2025-08-04 17:13:55 +00:00
Wilco Dijkstra
05ef6a4974 Use _int_free_chunk in tcache_thread_shutdown 2025-08-04 17:13:55 +00:00
Wilco Dijkstra
dea1e52af3 malloc: Cleanup libc_realloc
Minor cleanup of libc_realloc: remove unnecessary special cases for mmap, move
ar_ptr initialization, first check for oldmem == NULL.
2025-08-04 17:13:55 +00:00
Wilco Dijkstra
4b74591022 Change mmap representation 2025-08-04 17:13:55 +00:00
Wilco Dijkstra
35a7a7ab99 malloc: Cleanup sysmalloc_mmap
Cleanup sysmalloc_mmap - simplify padding since it is always a constant.
Remove av parameter which is only used in do_check_chunk, but since it may be
NULL for mmap, it will cause a crash in checking mode.  Remove the odd check
on mmap in do_check_chunk.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-08-02 15:21:16 +00:00
Wilco Dijkstra
b68b125ad1 malloc: Improve checked_request2size
Change checked_request2size to return SIZE_MAX for huge inputs.  This
ensures large allocation requests stay large and can't be confused with a
small allocation.  As a result several existing checks against PTRDIFF_MAX
become redundant.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-08-02 14:38:35 +00:00
Wilco Dijkstra
21fda179c2 malloc: Cleanup madvise defines
Remove redundant ifdefs for madvise/THP.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-08-02 14:19:50 +00:00
Wilco Dijkstra
ad4caba414 malloc: Fix MAX_TCACHE_SMALL_SIZE
MAX_TCACHE_SMALL_SIZE should use chunk size since it is used after
checked_request2size.  Increase limit of tcache_max_bytes by 1 since all
comparisons use '<'.  As a result, the last tcache entry is now used as
expected.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-08-02 14:16:24 +00:00
William Hunt
9097cbf5d8 malloc: Enable THP always support on hugetlb tunable
Enable support for THP always when glibc.malloc.hugetlb=1, as the tunable
currently only gives explicit support in malloc for the THP madvise mode
by aligning to a huge page size. Add a thp_mode parameter to mp_ and check
in madvise_thp whether the system is using madvise mode, otherwise the
`__madvise` call is useless. Set the thp_mode to be unsupported by default,
but if the hugetlb tunable is set this updates thp_mode. Performance of
xalancbmk improves by 4.9% on Neoverse V2 when THP always mode is set on the
system and glibc.malloc.hugetlb=1.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-07-29 15:05:51 +00:00
Wilco Dijkstra
089b4fb90f malloc: Remove redundant NULL check
Remove a redundant NULL check from tcache_get_n.

Reviewed-by: Cupertino Miranda <cupertino.miranda@oracle.com>
2025-07-29 14:11:58 +00:00
Cupertino Miranda
0263528f8d malloc: fix definition for MAX_TCACHE_SMALL_SIZE
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2025-07-14 19:44:48 +02:00
Wilco Dijkstra
1061b75412 malloc: Cleanup tcache_init()
Cleanup tcache_init() by using the new __libc_malloc2 interface.

Reviewed-by: Cupertino Miranda <cupertino.miranda@oracle.com>
2025-06-26 15:08:17 +00:00
William Hunt
9a5a7613ac malloc: replace instances of __builtin_expect with __glibc_unlikely
Replaced all instances of __builtin_expect to __glibc_unlikely
within malloc.c and malloc-debug.c.  This improves the portability
of glibc by avoiding calls to GNU C built-in functions.  Since all
the expected results from calls to __builtin_expect were 0,
__glibc_likely was never used as a replacement.  Multiple
calls to __builtin_expect within a single if statement have
been replaced with one call to __glibc_unlikely, which wraps
every condition.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-26 15:07:53 +00:00
William Hunt
d1ad959b00 malloc: refactored aligned_OK and misaligned_chunk
Renamed aligned_OK to misaligned_mem as to be similar
to misaligned_chunk, and reversed any assertions using
the macro.  Made misaligned_chunk call misaligned_mem after
chunk2mem rather than bitmasking with the malloc alignment
itself, since misaligned_chunk is meant to test the data
chunk itself rather than the header, and the compiler
will optimise the addition so the ternary operator is not
needed.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-26 14:57:53 +00:00
Wilco Dijkstra
ba32fd7d04 malloc: Cleanup _mid_memalign
Remove unused 'address' parameter from _mid_memalign and callers.
Fix off-by-one alignment calculation in __libc_pvalloc.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-06-18 13:37:00 +00:00
Cupertino Miranda
cbfd798810 malloc: add tcache support for large chunk caching
Existing tcache implementation in glibc seems to focus in caching
smaller data size allocations, limiting the size of the allocation to
1KB.

This patch changes tcache implementation to allow to cache any chunk
size allocations.  The implementation adds extra bins (linked-lists)
which store chunks with different ranges of allocation sizes. Bin
selection is done in multiples in powers of 2 and chunks are inserted in
growing size ordering within the bin.  The last bin contains all other
sizes of allocations.

This patch although by default preserves the same implementation,
limitting caches to 1KB chunks, it now allows to increase the max size
for the cached chunks with the tunable glibc.malloc.tcache_max.

It also now verifies if chunk was mmapped, in which case __libc_free
will not add it to tcache.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-16 12:05:22 +00:00
Wilco Dijkstra
7e10e30e64 malloc: Count tcache entries downwards
Currently tcache requires 2 global variable accesses to determine
whether a block can be added to the tcache.  Change the counts array
to 'num_slots' to indicate the number of entries that could be added.
If 'num_slots' reaches zero, no more blocks can be added.  If the entries
pointer is not NULL, at least one block is available for allocation.

Now each tcache bin can support a different maximum number of entries,
and they can be individually switched on or off (a zero initialized
num_slots+entry means the tcache bin is not available for free or malloc).

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-06-03 17:16:39 +00:00
Wilco Dijkstra
36189c76fb malloc: Improve performance of __libc_calloc
Improve performance of __libc_calloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_calloc doesn't need
to setup a frame.

On Neoverse V2, bench-calloc-simple improves by 5.0% overall.
Bench-calloc-thread 1 improves by 24%.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-05-14 09:22:32 +00:00
Wilco Dijkstra
25d37948c9 malloc: Improve malloc initialization
Move malloc initialization to __libc_early_init.  Use a hidden __ptmalloc_init
for initialization and a weak call to avoid pulling in the system malloc in a
static binary.  All previous initialization checks can now be removed.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-05-12 16:10:28 +00:00
David Lau
eff1f680cf malloc: Improved double free detection in the tcache
The previous double free detection did not account for an attacker to
use a terminating null byte overflowing from the previous
chunk to change the size of a memory chunk is being sorted into.
So that the check in 'tcache_double_free_verify' would pass
even though it is a double free.

Solution:
Let 'tcache_double_free_verify' iterate over all tcache entries to
detect double frees.

This patch only protects from buffer overflows by one byte.
But I would argue that off by one errors are the most common
errors to be made.

Alternatives Considered:
  Store the size of a memory chunk in big endian and thus
  the chunk size would not get overwritten because entries in the
  tcache are not that big.

  Move the tcache_key before the actual memory chunk so that it
  does not have to be checked at all, this would work better in general
  but also it would increase the memory usage.

Signed-off-by: David Lau  <david.lau@fau.de>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-12 11:58:30 +00:00
Wilco Dijkstra
5d10174581 malloc: Inline tcache_try_malloc
Inline tcache_try_malloc into calloc since it is the only caller.  Also fix
usize2tidx and use it in __libc_malloc, __libc_calloc and _mid_memalign.
The result is simpler, cleaner code.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-05-01 20:01:53 +00:00
Cupertino Miranda
1c9ac027a5 malloc: move tcache_init out of hot tcache paths
This patch moves any calls of tcache_init away after tcache hot paths.
Since there is no reason to initialize tcaches in the hot path and since
we need to be able to check tcache != NULL in any case, because of
tcache_thread_shutdown function, moving tcache_init away from hot path
can only be beneficial.
The patch also removes the initialization of tcaches within the
__libc_free call. It only makes sense to initialize tcaches for the
thread after it calls one of the allocation functions. Also the patch
removes the save/restore of errno from tcache_init code, as it is no
longer needed.
2025-04-16 13:09:16 +00:00
Wilco Dijkstra
c968fe5062 malloc: Use tailcalls in __libc_free
Use tailcalls to avoid the overhead of a frame on the free fastpath.
Move tcache initialization to _int_free_chunk().  Add malloc_printerr_tail()
which can be tailcalled without forcing a frame like no-return functions.
Change tcache_double_free_verify() to retry via __libc_free() after clearing
the key.

Reviewed-by: Florian Weimer  <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15 11:14:58 +00:00
Wilco Dijkstra
393b1a6e50 malloc: Inline tcache_free
Inline tcache_free since it's only used by __libc_free.  Add __glibc_likely
for the tcache checks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15 11:14:58 +00:00
Wilco Dijkstra
9b0c8ced9c malloc: Improve free checks
The checks on size can be merged and use __builtin_add_overflow.  Since
tcache only handles small sizes (and rejects sizes < MINSIZE), delay this
check until after tcache.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15 11:14:57 +00:00
Wilco Dijkstra
0296654d61 malloc: Inline _int_free_check
Inline _int_free_check since it is only used by __libc_free.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15 11:14:47 +00:00
Wilco Dijkstra
69da24fbc5 malloc: Inline _int_free
Inline _int_free since it is a small function and only really used by
__libc_free.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-14 16:07:46 +00:00
Wilco Dijkstra
b0cb99bef5 malloc: Move mmap code out of __libc_free hotpath
Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it.  Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first.  Move saving of errno
to cases that actually need it.  Remove a safety check that fails on mmap
chunks and a check that mmap chunks cannot be added to tcache.

Performance of bench-malloc-thread improves by 9.2% for 1 thread and
6.9% for 32 threads on Neoverse V2.

Reviewed-by: DJ Delorie  <dj@redhat.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-14 15:08:24 +00:00
Wilco Dijkstra
b0897944cc malloc: Improve performance of __libc_malloc
Improve performance of __libc_malloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_malloc doesn't need
to setup a frame and we delay tcache initialization and setting of errno until
later.

On Neoverse V2, bench-malloc-simple improves by 6.7% overall (up to 8.5% for
ST case) and bench-malloc-thread improves by 20.3% for 1 thread and 14.4% for
32 threads.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-28 15:31:34 +00:00
Wilco Dijkstra
1233da4943 malloc: Use __always_inline for simple functions
Use __always_inline for small helper functions that are critical for
performance.  This ensures inlining always happens when expected.
Performance of bench-malloc-simple improves by 0.6% on average on
Neoverse V2.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-26 13:17:51 +00:00
Wilco Dijkstra
cd33535002 malloc: Use _int_free_chunk for remainders
When splitting a chunk, release the tail part by calling int_free_chunk.
This avoids inserting random blocks into tcache that were never requested
by the user.  Fragmentation will be worse if they are never used again.
Note if the tail is fairly small, we could avoid splitting it at all.
Also remove an oddly placed initialization of tcache in _libc_realloc.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-25 18:53:01 +00:00
Cupertino Miranda
855561a1fb malloc: missing initialization of tcache in _mid_memalign
_mid_memalign includes tcache code but does not attempt to initialize
tcaches.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-21 17:52:14 +00:00
Wilco Dijkstra
575de3d666 malloc: Improve csize2tidx
Remove the alignment rounding up from csize2tidx - this makes no sense
since the input should be a chunk size.  Removing it enables further
optimizations, for example chunksize_nomask can be safely used and
invalid sizes < MINSIZE are not mapped to a valid tidx.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-03-18 19:19:33 +00:00
Ben Kallus
4cf2d86936 malloc: Add integrity check to largebin nextsizes
If attacker overwrites the bk_nextsize link in the first chunk of a
largebin that later has a smaller chunk inserted into it, malloc will
write a heap pointer into an attacker-controlled address [0].

This patch adds an integrity check to mitigate this attack.

[0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/large_bin_attack.c

Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-03 18:31:27 -05:00
Ben Kallus
d10176c0ff malloc: Add size check when moving fastbin->tcache
By overwriting a forward link in a fastbin chunk that is subsequently
moved into the tcache, it's possible to get malloc to return an
arbitrary address [0].

When a chunk is fetched from a fastbin, its size is checked against the
expected chunk size for that fastbin (see malloc.c:3991). This patch
adds a similar check for chunks being moved from a fastbin to tcache,
which renders obsolete the exploitation technique described above.

Now updated to use __glibc_unlikely instead of __builtin_expect, as
requested.

[0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/fastbin_reverse_into_tcache.c

Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-13 16:31:28 -03:00
Paul Eggert
2642002380 Update copyright dates with scripts/update-copyrights 2025-01-01 11:22:09 -08:00
Wangyang Guo
226e3b0a41 malloc: Add tcache path for calloc
This commit add tcache support in calloc() which can largely improve
the performance of small size allocation, especially in multi-thread
scenario. tcache_available() and tcache_try_malloc() are split out as
a helper function for better reusing the code.

Also fix tst-safe-linking failure after enabling tcache. In previous,
calloc() is used as a way to by-pass tcache in memory allocation and
trigger safe-linking check in fastbins path. With tcache enabled, it
needs extra workarounds to bypass tcache.

Result of bench-calloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.656
4 threads  | 0.470
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-11 11:34:47 +08:00
H.J. Lu
1c4cebb84b malloc: Optimize small memory clearing for calloc
Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes
on 64-bit targets) for calloc.  Use repeated stores with 1 branch, instead
of up to 3 branches.  On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2024-12-04 04:28:15 +08:00
k4lizen
e2436d6f5a malloc: send freed small chunks to smallbin
Large chunks get added to the unsorted bin since
sorting them takes time, for small chunks the
benefit of adding them to the unsorted bin is
non-existant, actually hurting performance.

Splitting and malloc_consolidate still add small
chunks to unsorted, but we can hint the compiler
that that is a relatively rare occurance.
Benchmarking shows this to be consistently good.

Authored-by: k4lizen <k4lizen@proton.me>
Signed-off-by: Aleksa Siriški <sir@tmina.org>
2024-11-29 13:27:13 +00:00
Wangyang Guo
c69e8cccaf malloc: Avoid func call for tcache quick path in free()
Tcache is an important optimzation to accelerate memory free(), things
within this code path should be kept as simple as possible. This commit
try to remove the function call when free() invokes tcache code path by
inlining _int_free().

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.879
4 threads  | 0.874

The performance data shows it can improve bench-malloc-thread benchmark
by ~12% in both single thread and multi-thread scenario.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-27 08:24:09 +08:00