Cleanup _int_memalign. Simplify the logic. Add a seperate check
for mmap. Only release the tail chunk if it is at least MINSIZE.
Use the new mmap abstractions.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Linux handles virtual memory in Virtual Memory Areas (VMAs). The
madvise(MADV_HUGEPAGE) call works on a VMA granularity, which sets the
VM_HUGEPAGE flag on the VMA. If this VMA or a portion of it is mremapped
to a different location, Linux will create a new VMA, which will have
the same flags as the old one. This implies that the VM_HUGEPAGE flag
will be retained. Therefore, if we can guarantee that the old VMA was
marked with VM_HUGEPAGE, then there is no need to call madvise_thp() in
mremap_chunk().
The old chunk comes from a heap or non-heap allocation, both of which
have already been enlightened for THP. This implies that, if THP is on,
and the size of the old chunk is greater than or equal to thp_pagesize,
the VMA to which this chunk belongs to, has the VM_HUGEPAGE flag set.
Hence in this case we can avoid invoking the madvise() syscall.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add mmap_set_chunk() to create a new chunk from an mmap block.
Remove set_mmap_is_hp() since it is done inside mmap_set_chunk().
Rename prev_size_mmap() to mmap_base_offset(). Cleanup comments.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the odd atomic_forced_read which is neither atomic nor forced.
Some uses are completely redundant, so simply remove them. In other cases
the intended use is to force a memory ordering, so use acquire load for those.
In yet other cases their purpose is unclear, for example __nscd_cache_search
appears to allow concurrent accesses to the cache while it is being garbage
collected by another thread! Use relaxed atomic loads here to block spills
from accidentally reloading memory that is being changed.
Passes regress on AArch64, OK for commit?
Refactor malloc.c to remove dead code, create macros to abstract duplicated
code, and cleanup sysmalloc_mmap_fallback to remove logic not related to the
mmap call.
Change the return type of mmap_base to uintptr_t since this allows using
operations on the return value, and avoids casting in both calls in
mremap_chunk and munmap_chunk.
Cleanup sysmalloc_mmap_fallback. Remove unused parameters nb, oldsize
and av. Remove redundant overflow check and instead use size_t for all
parameters except extra_flags to prevent overflows. Move logic not concerned
with the mmap call itself outside the function after both calls to
sysmalloc_mmap_fallback are made; this means move code for naming the VMA
and marking the arena being extended as non-contiguous to the calling code to
be handled in the case that the mmap is successful. Calculate the fallback
size from nb to avoid modifying size after it has been set for MORECORE.
Remove unused noncontiguous macro.
Remove redundant assert for checking unreachable option for global_max_fast.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Remove support for obsolete dumped heaps. Dumping heaps was discontinued
8 years ago, however loading a dumped heap is still supported. This blocks
changes and improvements of the malloc data structures - hence it is time
to remove this. Ancient binaries that still call malloc_set_state will now
get the -1 error code. Update tst-mallocstate.c to just check for this.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
We currently unlock the arena mutex in arena_get_retry() unconditionally.
Therefore, hoist out the unlock from the if-else control block.
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
Minor cleanup of libc_realloc: remove unnecessary special cases for mmap, move
ar_ptr initialization, first check for oldmem == NULL.
Reviewed-by: DJ Delorie <dj@redhat.com>
Remove all unused atomics. Replace uses of catomic_increment and
catomic_decrement with atomic_fetch_add_relaxed which maps to a standard
compiler builtin. Relaxed memory ordering is correct for simple counters
since they only need atomicity.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
I have not checked with all versions for all ABIs, but I saw failures
with gcc-14 on arm, alpha, hppa, i686, sparc, sh4, and microblaze.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Add mremap_chunk support for mmap()ed chunks using hugepages by accounting for
their alignment, to prevent the mremap call failing in most cases where the
size passed is not a hugepage size multiple. It also improves robustness for
reallocating hugepages since mremap is much less likely to fail, so running
out of memory when reallocating a larger size and having to copy the old
contents after mremap fails is also less likely.
To track whether an mmap()ed chunk uses hugepages, have a flag in the lowest
bit of the mchunk_prev_size field which is set after a call to sysmalloc_mmap,
and accessed later in mremap_chunk. Create macros for getting and setting this
bit, and for mapping the bit off when accessing the field for mmap()ed chunks.
Since the alignment cannot be lower than 8 bytes, this flag cannot affect the
alignment data.
Add malloc/tst-tcfree4-malloc-check to the tests-exclude-malloc-check list as
malloc-check prevents the tcache from being used to store chunks. This test
caused failures due to a bug in mem2chunk_check to be fixed in a later patch.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Change the mmap chunk layout to be identical to a normal chunk. This makes it
safe for tcache to hold mmap chunks and simplifies size calculations in
memsize and musable. Add mmap_base() and mmap_size() macros to simplify code.
Reviewed-by: Cupertino Miranda <cupertino.miranda@oracle.com>
When transparent hugepages (THP) are configured to 32MB on x86/loongarch
systems, the current big_size value may not be sufficiently large to
guarantee that free(ptr) [1] will call munmap(ptr_aligned, big_size).
Tested on x86_64 and loongarch64.
PS: Without this patch and using 32M THP, there is a about 50% chance
that malloc/tst-free-errno-malloc-hugetlb1 will fail on both x86_64 and
loongarch64.
[1] malloc/tst-free-errno.c:
...
errno = 1789;
/* This call to free() is supposed to call
munmap (ptr_aligned, big_size);
which increases the number of VMAs by 1, which is supposed
to fail. */
-> free (ptr);
TEST_VERIFY (get_errno () == 1789);
}
...
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
We want tcache_key not to be a commonly-occurring value in memory, so ensure
a minimum amount of one and zero bits.
And we need it non-zero, otherwise even if tcache_double_free_verify sets
e->key to 0 before calling __libc_free, it gets called again by __libc_free,
thus looping indefinitely.
Fixes: c968fe5062 ("malloc: Use tailcalls in __libc_free")
MALLOC_DEBUG only works on locked arenas, so move the call to
check_inuse_chunk from __libc_free() to _int_free_chunk().
Regress now passes if MALLOC_DEBUG is enabled.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arenas support huge pages but not transparent huge pages. Add this by
also checking mp_.thp_pagesize when creating a new arena, and use madvise.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove an odd use of __curbrk and use MORECORE (0) instead.
This fixes Hurd build since it doesn't define this symbol.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cleanup sysmalloc_mmap - simplify padding since it is always a constant.
Remove av parameter which is only used in do_check_chunk, but since it may be
NULL for mmap, it will cause a crash in checking mode. Remove the odd check
on mmap in do_check_chunk.
Reviewed-by: DJ Delorie <dj@redhat.com>
Change checked_request2size to return SIZE_MAX for huge inputs. This
ensures large allocation requests stay large and can't be confused with a
small allocation. As a result several existing checks against PTRDIFF_MAX
become redundant.
Reviewed-by: DJ Delorie <dj@redhat.com>
MAX_TCACHE_SMALL_SIZE should use chunk size since it is used after
checked_request2size. Increase limit of tcache_max_bytes by 1 since all
comparisons use '<'. As a result, the last tcache entry is now used as
expected.
Reviewed-by: DJ Delorie <dj@redhat.com>
Enable support for THP always when glibc.malloc.hugetlb=1, as the tunable
currently only gives explicit support in malloc for the THP madvise mode
by aligning to a huge page size. Add a thp_mode parameter to mp_ and check
in madvise_thp whether the system is using madvise mode, otherwise the
`__madvise` call is useless. Set the thp_mode to be unsupported by default,
but if the hugetlb tunable is set this updates thp_mode. Performance of
xalancbmk improves by 4.9% on Neoverse V2 when THP always mode is set on the
system and glibc.malloc.hugetlb=1.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Replaced all instances of __builtin_expect to __glibc_unlikely
within malloc.c and malloc-debug.c. This improves the portability
of glibc by avoiding calls to GNU C built-in functions. Since all
the expected results from calls to __builtin_expect were 0,
__glibc_likely was never used as a replacement. Multiple
calls to __builtin_expect within a single if statement have
been replaced with one call to __glibc_unlikely, which wraps
every condition.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Renamed aligned_OK to misaligned_mem as to be similar
to misaligned_chunk, and reversed any assertions using
the macro. Made misaligned_chunk call misaligned_mem after
chunk2mem rather than bitmasking with the malloc alignment
itself, since misaligned_chunk is meant to test the data
chunk itself rather than the header, and the compiler
will optimise the addition so the ternary operator is not
needed.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Introduce tests-link-with-libpthread to list tests that
require linking with libpthread, and use that to generate
dependencies on $(shared-thread-library) for all multi-threaded tests.
Fixes build failures of commit cde5caa4bb
("malloc: add testing for large tcache support") on Hurd.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Remove unused 'address' parameter from _mid_memalign and callers.
Fix off-by-one alignment calculation in __libc_pvalloc.
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch adds large tcache support tests by re-executing malloc tests
using the tunable: glibc.malloc.tcache_max=1048576
Test names are postfixed with "largetcache".
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Existing tcache implementation in glibc seems to focus in caching
smaller data size allocations, limiting the size of the allocation to
1KB.
This patch changes tcache implementation to allow to cache any chunk
size allocations. The implementation adds extra bins (linked-lists)
which store chunks with different ranges of allocation sizes. Bin
selection is done in multiples in powers of 2 and chunks are inserted in
growing size ordering within the bin. The last bin contains all other
sizes of allocations.
This patch although by default preserves the same implementation,
limitting caches to 1KB chunks, it now allows to increase the max size
for the cached chunks with the tunable glibc.malloc.tcache_max.
It also now verifies if chunk was mmapped, in which case __libc_free
will not add it to tcache.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Currently tcache requires 2 global variable accesses to determine
whether a block can be added to the tcache. Change the counts array
to 'num_slots' to indicate the number of entries that could be added.
If 'num_slots' reaches zero, no more blocks can be added. If the entries
pointer is not NULL, at least one block is available for allocation.
Now each tcache bin can support a different maximum number of entries,
and they can be individually switched on or off (a zero initialized
num_slots+entry means the tcache bin is not available for free or malloc).
Reviewed-by: DJ Delorie <dj@redhat.com>