Clean up sysmalloc_mmap: simplify padding since it is always a constant.
Remove the av parameter, which is only used in do_check_chunk; since it may
be NULL for mmap, it causes a crash in checking mode. Remove the odd check
on mmap in do_check_chunk.
Reviewed-by: DJ Delorie <dj@redhat.com>
Change checked_request2size to return SIZE_MAX for huge inputs. This
ensures large allocation requests stay large and can't be confused with a
small allocation. As a result several existing checks against PTRDIFF_MAX
become redundant.
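For illustration, a minimal sketch of the saturating conversion described
above. It is not the malloc.c code: the sketch_ names and the constants
(16-byte alignment, 32-byte minimum chunk) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones are target-dependent.  */
#define SKETCH_SIZE_SZ     sizeof (size_t)
#define SKETCH_ALIGN_MASK  ((size_t) 15)
#define SKETCH_MINSIZE     ((size_t) 32)

/* Pad a user request to a chunk size.  Requests above PTRDIFF_MAX map to
   SIZE_MAX, so a huge request can never wrap around to a small size.  */
static size_t
sketch_checked_request2size (size_t req)
{
  if (req > (size_t) PTRDIFF_MAX)
    return SIZE_MAX;

  size_t padded = (req + SKETCH_SIZE_SZ + SKETCH_ALIGN_MASK)
                  & ~SKETCH_ALIGN_MASK;
  return padded < SKETCH_MINSIZE ? SKETCH_MINSIZE : padded;
}
```

A caller that compares the result against any size limit then rejects huge
requests without a separate PTRDIFF_MAX test.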
Reviewed-by: DJ Delorie <dj@redhat.com>
MAX_TCACHE_SMALL_SIZE should use chunk size since it is used after
checked_request2size. Increase limit of tcache_max_bytes by 1 since all
comparisons use '<'. As a result, the last tcache entry is now used as
expected.
Reviewed-by: DJ Delorie <dj@redhat.com>
Enable support for THP always mode when glibc.malloc.hugetlb=1, as the
tunable currently only gives explicit support in malloc for the THP madvise
mode by aligning to a huge page size. Add a thp_mode parameter to mp_ and
check in madvise_thp whether the system is using madvise mode, since
otherwise the `__madvise` call is useless. thp_mode is unsupported by
default and is updated when the hugetlb tunable is set. Performance of
xalancbmk improves by 4.9% on Neoverse V2 when THP always mode is set on the
system and glibc.malloc.hugetlb=1.
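A rough sketch of the mode check, assuming the mode is taken from the
contents of /sys/kernel/mm/transparent_hugepage/enabled, where the kernel
puts the active mode in brackets. The enum and function names are
hypothetical and do not mirror the actual glibc code.

```c
#include <string.h>

enum sketch_thp_mode
{
  sketch_thp_not_supported = 0,
  sketch_thp_always,
  sketch_thp_madvise,
  sketch_thp_never
};

/* Parse a string such as "always [madvise] never".  */
static enum sketch_thp_mode
sketch_parse_thp_mode (const char *s)
{
  if (s == NULL)
    return sketch_thp_not_supported;
  if (strstr (s, "[always]") != NULL)
    return sketch_thp_always;
  if (strstr (s, "[madvise]") != NULL)
    return sketch_thp_madvise;
  if (strstr (s, "[never]") != NULL)
    return sketch_thp_never;
  return sketch_thp_not_supported;
}

/* Only madvise mode needs an explicit madvise call; in always mode the
   kernel already considers every eligible mapping.  */
static int
sketch_should_madvise (enum sketch_thp_mode mode)
{
  return mode == sketch_thp_madvise;
}
```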
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Replaced all instances of __builtin_expect with __glibc_unlikely
within malloc.c and malloc-debug.c. This improves the portability
of glibc by avoiding calls to GNU C built-in functions. Since all
the expected results from calls to __builtin_expect were 0,
__glibc_likely was never used as a replacement. Multiple
calls to __builtin_expect within a single if statement have
been replaced with one call to __glibc_unlikely, which wraps
every condition.
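The merged form can be pictured as follows. sketch_unlikely is a
hypothetical stand-in that avoids clashing with the real macro, and the
condition is invented for illustration.

```c
/* On GCC-compatible compilers glibc defines __glibc_unlikely (cond) as
   __builtin_expect ((cond), 0); this stand-in follows the same shape.  */
#if defined __GNUC__
# define sketch_unlikely(cond) __builtin_expect ((cond), 0)
#else
# define sketch_unlikely(cond) (cond)
#endif

/* Before: if (__builtin_expect (size == 0, 0)
               || __builtin_expect (size > limit, 0))
   After: a single hint that wraps the whole condition.  */
static int
sketch_is_invalid (unsigned long size, unsigned long limit)
{
  if (sketch_unlikely (size == 0 || size > limit))
    return 1;
  return 0;
}
```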
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Renamed aligned_OK to misaligned_mem to match misaligned_chunk, and
inverted every assertion that uses the macro. Made misaligned_chunk call
misaligned_mem after chunk2mem rather than masking the chunk address
with the malloc alignment directly, since misaligned_chunk is meant to
test the user data rather than the header. The compiler will optimise
the addition, so the ternary operator is not needed.
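A small sketch of the relationship between the two checks. It uses plain
integer addresses and assumed values (8-byte header, 16-byte alignment) to
show a case where the header address and the user address differ in
alignment; the sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones are target-dependent.  */
#define SKETCH_ALIGN_MASK   ((uintptr_t) 15)
#define SKETCH_CHUNK_HDR_SZ ((uintptr_t) 8)

/* Nonzero if the user-visible address is not suitably aligned.  */
static int
sketch_misaligned_mem (uintptr_t mem)
{
  return (mem & SKETCH_ALIGN_MASK) != 0;
}

/* Convert a chunk header address to the address handed to the user.  */
static uintptr_t
sketch_chunk2mem (uintptr_t chunk)
{
  return chunk + SKETCH_CHUNK_HDR_SZ;
}

/* Test the user data of the chunk, not the header address.  */
static int
sketch_misaligned_chunk (uintptr_t chunk)
{
  return sketch_misaligned_mem (sketch_chunk2mem (chunk));
}
```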
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Introduce tests-link-with-libpthread to list tests that
require linking with libpthread, and use that to generate
dependencies on $(shared-thread-library) for all multi-threaded tests.
Fixes build failures of commit cde5caa4bb
("malloc: add testing for large tcache support") on Hurd.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Remove unused 'address' parameter from _mid_memalign and callers.
Fix off-by-one alignment calculation in __libc_pvalloc.
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch adds large tcache support tests by re-executing malloc tests
using the tunable: glibc.malloc.tcache_max=1048576
Test names are postfixed with "largetcache".
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The existing tcache implementation in glibc focuses on caching small
allocations, limiting the cached chunk size to 1KB.
This patch changes the tcache implementation so that chunks of any size
can be cached. It adds extra bins (linked lists) that store chunks
covering different ranges of allocation sizes. Bins are selected by powers
of 2, and chunks are inserted in increasing size order within a bin. The
last bin holds all remaining allocation sizes.
By default this patch preserves the existing behaviour, limiting caching
to 1KB chunks, but the maximum cached chunk size can now be raised with
the tunable glibc.malloc.tcache_max.
It also now checks whether a chunk was mmapped, in which case __libc_free
will not add it to tcache.
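One way to picture the bin selection and ordered insertion is the sketch
below. The bin count, the 1KB boundary handling, and all sketch_ names are
assumptions made for illustration, not the layout used by the patch.

```c
#include <stddef.h>

#define SKETCH_SMALL_MAX   ((size_t) 1024)  /* assumed small-bin limit */
#define SKETCH_LARGE_BINS  12               /* assumed number of extra bins */

struct sketch_entry
{
  struct sketch_entry *next;
  size_t size;
};

/* Map a size above SKETCH_SMALL_MAX to a bin: bin 0 holds (1024, 2048],
   bin 1 holds (2048, 4096], and so on.  The last bin collects every
   larger size.  */
static int
sketch_large_bin_index (size_t size)
{
  int idx = 0;
  size_t limit = SKETCH_SMALL_MAX * 2;
  while (idx < SKETCH_LARGE_BINS - 1 && size > limit)
    {
      limit *= 2;
      idx++;
    }
  return idx;
}

/* Insert E so that the list stays sorted by increasing size.  */
static void
sketch_bin_insert (struct sketch_entry **head, struct sketch_entry *e)
{
  while (*head != NULL && (*head)->size < e->size)
    head = &(*head)->next;
  e->next = *head;
  *head = e;
}
```

Keeping each bin sorted lets a lookup stop at the first entry that is large
enough for the request.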
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Currently tcache requires 2 global variable accesses to determine
whether a block can be added to the tcache. Change the counts array
to 'num_slots' to indicate the number of entries that could be added.
If 'num_slots' reaches zero, no more blocks can be added. If the entries
pointer is not NULL, at least one block is available for allocation.
Now each tcache bin can support a different maximum number of entries,
and they can be individually switched on or off (a zero initialized
num_slots+entry means the tcache bin is not available for free or malloc).
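A minimal sketch of the num_slots idea, with a hypothetical struct layout
and bin count. A zero-initialized bin accepts no frees and yields no
blocks, which is how a bin is switched off.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BINS 4   /* assumed bin count */

struct sketch_entry
{
  struct sketch_entry *next;
};

struct sketch_tcache
{
  uint16_t num_slots[SKETCH_BINS];            /* free slots left per bin */
  struct sketch_entry *entries[SKETCH_BINS];  /* cached blocks per bin */
};

/* Free path: succeeds only while the bin has slots left.  */
static int
sketch_put (struct sketch_tcache *tc, int idx, struct sketch_entry *e)
{
  if (tc->num_slots[idx] == 0)
    return 0;
  e->next = tc->entries[idx];
  tc->entries[idx] = e;
  tc->num_slots[idx]--;
  return 1;
}

/* Malloc path: a non-NULL entries pointer means a block is available.  */
static struct sketch_entry *
sketch_get (struct sketch_tcache *tc, int idx)
{
  struct sketch_entry *e = tc->entries[idx];
  if (e == NULL)
    return NULL;
  tc->entries[idx] = e->next;
  tc->num_slots[idx]++;
  return e;
}
```

Each path reads a single per-bin field of the tcache, with no separate
global limit to consult.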
Reviewed-by: DJ Delorie <dj@redhat.com>
Improve performance of __libc_calloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_calloc doesn't need
to set up a frame.
On Neoverse V2, bench-calloc-simple improves by 5.0% overall.
Bench-calloc-thread 1 improves by 24%.
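The shape of the split can be sketched as below. The one-entry cache is a
hypothetical stand-in for a tcache bin, and the slow path simply defers to
the system calloc; neither reflects the real implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry cache standing in for a tcache bin.  */
static void *sketch_cached_block;
static size_t sketch_cached_size;

/* Slow path, kept out of line so the fast path stays small.  */
static __attribute__ ((noinline)) void *
sketch_calloc_slow (size_t bytes)
{
  return calloc (1, bytes);
}

static void *
sketch_calloc (size_t nmemb, size_t size)
{
  size_t bytes;
  if (__builtin_mul_overflow (nmemb, size, &bytes))
    return NULL;

  /* Fast path: reuse the cached block and clear it.  */
  if (sketch_cached_block != NULL && bytes <= sketch_cached_size)
    {
      void *p = sketch_cached_block;
      sketch_cached_block = NULL;
      return memset (p, 0, bytes);
    }

  /* Everything else is a call in tail position.  */
  return sketch_calloc_slow (bytes);
}
```

Because the slow path is the last thing the function does, a compiler can
emit it as a jump rather than a call, which is what lets the fast path skip
frame setup.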
Reviewed-by: DJ Delorie <dj@redhat.com>
Move malloc initialization to __libc_early_init. Use a hidden __ptmalloc_init
for initialization and a weak call to avoid pulling in the system malloc in a
static binary. All previous initialization checks can now be removed.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The previous double free detection did not account for an attacker using
a terminating null byte that overflows from the previous chunk to change
the size of a memory chunk, and with it the tcache bin the chunk is
sorted into. The check in 'tcache_double_free_verify' would then pass
even though it is a double free.
Solution:
Let 'tcache_double_free_verify' iterate over all tcache entries to
detect double frees.
This patch only protects against buffer overflows of one byte.
But I would argue that off-by-one errors are the most common
errors to be made.
Alternatives Considered:
Store the size of a memory chunk in big endian, so that the byte reached
by the overflow is a high-order byte of the size. Entries in the tcache
are not that big, so that byte is already zero and the chunk size would
not get overwritten.
Move the tcache_key before the actual memory chunk so that it does not
have to be checked at all. This would work better in general, but it
would also increase the memory usage.
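The chosen solution, scanning every bin instead of only the bin implied by
the chunk size, can be sketched as follows. The bin array and the names are
hypothetical simplifications.

```c
#include <stddef.h>

#define SKETCH_BINS 4   /* assumed bin count */

struct sketch_entry
{
  struct sketch_entry *next;
};

/* Return 1 if E is already linked into any bin.  Scanning every bin
   means a corrupted size field cannot steer the check toward the
   wrong list.  */
static int
sketch_is_double_free (struct sketch_entry *const bins[SKETCH_BINS],
                       const struct sketch_entry *e)
{
  for (int i = 0; i < SKETCH_BINS; i++)
    for (const struct sketch_entry *p = bins[i]; p != NULL; p = p->next)
      if (p == e)
        return 1;
  return 0;
}
```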
Signed-off-by: David Lau <david.lau@fau.de>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Inline tcache_try_malloc into calloc since it is the only caller. Also fix
usize2tidx and use it in __libc_malloc, __libc_calloc and _mid_memalign.
The result is simpler, cleaner code.
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch moves all calls to tcache_init so that they happen after the
tcache hot paths. There is no reason to initialize tcaches in the hot path,
and tcache != NULL has to be checked in any case because of the
tcache_thread_shutdown function, so moving tcache_init off the hot path
can only be beneficial.
The patch also removes the initialization of tcaches within the
__libc_free call. It only makes sense to initialize tcaches for the
thread after it calls one of the allocation functions. Also the patch
removes the save/restore of errno from tcache_init code, as it is no
longer needed.
Use tailcalls to avoid the overhead of a frame on the free fastpath.
Move tcache initialization to _int_free_chunk(). Add malloc_printerr_tail()
which can be tailcalled without forcing a frame like no-return functions.
Change tcache_double_free_verify() to retry via __libc_free() after clearing
the key.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Inline tcache_free since it's only used by __libc_free. Add __glibc_likely
for the tcache checks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The checks on size can be merged and use __builtin_add_overflow. Since
tcache only handles small sizes (and rejects sizes < MINSIZE), delay this
check until after tcache.
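As an illustration of merging size checks with __builtin_add_overflow, the
sketch below folds the wraparound test and the upper-bound test into one
condition. The function name and the padding parameter are hypothetical;
the commit does not spell out which values are added.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 and store BYTES + PADDING in *TOTAL if the sum neither wraps
   around nor exceeds PTRDIFF_MAX; return 0 otherwise.  */
static int
sketch_padded_size (size_t bytes, size_t padding, size_t *total)
{
  if (__builtin_add_overflow (bytes, padding, total)
      || *total > (size_t) PTRDIFF_MAX)
    return 0;
  return 1;
}
```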
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it. Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first. Move saving of errno
to cases that actually need it. Remove a safety check that fails on mmap
chunks and a check that mmap chunks cannot be added to tcache.
Performance of bench-malloc-thread improves by 9.2% for 1 thread and
6.9% for 32 threads on Neoverse V2.
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Improve performance of __libc_malloc by splitting it into 2 parts: first handle
the tcache fastpath, then do the rest in a separate tailcalled function.
This results in significant performance gains since __libc_malloc doesn't need
to set up a frame and we delay tcache initialization and setting of errno until
later.
On Neoverse V2, bench-malloc-simple improves by 6.7% overall (up to 8.5% for
ST case) and bench-malloc-thread improves by 20.3% for 1 thread and 14.4% for
32 threads.
Reviewed-by: DJ Delorie <dj@redhat.com>
Use __always_inline for small helper functions that are critical for
performance. This ensures inlining always happens when expected.
Performance of bench-malloc-simple improves by 0.6% on average on
Neoverse V2.
Reviewed-by: DJ Delorie <dj@redhat.com>
When splitting a chunk, release the tail part by calling _int_free_chunk.
This avoids inserting random blocks into tcache that were never requested
by the user. Fragmentation will be worse if they are never used again.
Note if the tail is fairly small, we could avoid splitting it at all.
Also remove an oddly placed initialization of tcache in __libc_realloc.
Reviewed-by: DJ Delorie <dj@redhat.com>
Remove the alignment rounding up from csize2tidx - this makes no sense
since the input should be a chunk size. Removing it enables further
optimizations, for example chunksize_nomask can be safely used and
invalid sizes < MINSIZE are not mapped to a valid tidx.
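The effect on sizes below MINSIZE can be seen in a small sketch. The
constants assume a 64-bit layout, the "old" form is an assumption about the
pre-patch macro, and the sketch_ names are hypothetical.

```c
#include <stddef.h>

/* Assumed values for illustration; the real ones are target-dependent.  */
#define SKETCH_MINSIZE    ((size_t) 32)
#define SKETCH_ALIGNMENT  ((size_t) 16)
#define SKETCH_MAX_BINS   ((size_t) 64)

/* Old form: rounds up before dividing.  */
static size_t
sketch_csize2tidx_old (size_t csize)
{
  return (csize - SKETCH_MINSIZE + SKETCH_ALIGNMENT - 1) / SKETCH_ALIGNMENT;
}

/* New form: the input is already a chunk size, so no rounding.  */
static size_t
sketch_csize2tidx (size_t csize)
{
  return (csize - SKETCH_MINSIZE) / SKETCH_ALIGNMENT;
}

/* A single unsigned comparison decides whether the index is usable.  */
static int
sketch_tidx_valid (size_t tidx)
{
  return tidx < SKETCH_MAX_BINS;
}
```

With the rounding removed, a size just below MINSIZE underflows to a very
large unsigned index and fails the range check, instead of being rounded
into bin 0.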
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Change heap_max_size() to improve performance of arena_for_chunk().
Instead of a complex calculation, use a simple mask operation to get the
arena base pointer. HEAP_MAX_SIZE should be larger than the huge page size,
otherwise heaps will not use huge pages.
On AArch64 this removes 6 instructions from arena_for_chunk(), and
bench-malloc-thread improves by 1.1% - 1.8%.
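The mask operation can be sketched as follows, assuming heaps are aligned
to a power-of-two maximum size. The 64 MiB value and the function name are
assumptions for illustration.

```c
#include <stdint.h>

/* Assumed heap size; it must be a power of two for the mask to work.  */
#define SKETCH_HEAP_MAX_SIZE ((uintptr_t) 64 * 1024 * 1024)

/* If every heap starts at a multiple of its maximum size, clearing the
   low bits of any address inside the heap yields the heap's base.  */
static uintptr_t
sketch_heap_base (uintptr_t addr)
{
  return addr & ~(SKETCH_HEAP_MAX_SIZE - 1);
}
```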
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
By overwriting a forward link in a fastbin chunk that is subsequently
moved into the tcache, it's possible to get malloc to return an
arbitrary address [0].
When a chunk is fetched from a fastbin, its size is checked against the
expected chunk size for that fastbin (see malloc.c:3991). This patch
adds a similar check for chunks being moved from a fastbin to tcache,
which renders obsolete the exploitation technique described above.
Now updated to use __glibc_unlikely instead of __builtin_expect, as
requested.
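A simplified sketch of the added check, assuming a 64-bit layout with
16-byte size classes. The struct and the sketch_ helpers are hypothetical
stand-ins for the real chunk macros.

```c
#include <stddef.h>

/* Low bits of the size field carry flags (assumed mask).  */
#define SKETCH_SIZE_BITS ((size_t) 7)

struct sketch_chunk
{
  size_t prev_size;
  size_t size;
  struct sketch_chunk *fd;
};

static size_t
sketch_chunksize (const struct sketch_chunk *p)
{
  return p->size & ~SKETCH_SIZE_BITS;
}

/* Assumed mapping: 32-byte chunks go to bin 0, 48-byte chunks to bin 1.  */
static size_t
sketch_fastbin_index (size_t size)
{
  return (size >> 4) - 2;
}

/* Return 1 if P may be moved from fastbin IDX into the tcache.  A chunk
   reached through a forged fd pointer is unlikely to carry a size that
   matches the bin, so it is rejected.  */
static int
sketch_fastbin_chunk_ok (const struct sketch_chunk *p, size_t idx)
{
  return sketch_fastbin_index (sketch_chunksize (p)) == idx;
}
```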
[0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/fastbin_reverse_into_tcache.c
Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The functions serve very similar purposes. The advantage of
__rtld_libc_freeres is that it is located within ld.so, so it is
more natural to poke at link map internals there.
This slightly regresses cleanup capabilities for statically linked
binaries. If that becomes a problem, we should start calling
__rtld_libc_freeres from __libc_freeres (perhaps after renaming it).