- Add GCS marking to some of the tests when the target supports GCS
- Fix the tst-ro-dynamic-mod.map linker script to avoid removing
  GNU properties
- Add a header with macros for GNU properties
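As a rough illustration (not the actual header added here), marking an
object as BTI- and GCS-compatible amounts to emitting a GNU property
note like the following; the GNU_PROPERTY_* names are the standard ELF
constants, the struct layout is only a sketch:

    #include <elf.h>
    #include <stdint.h>

    #ifndef GNU_PROPERTY_AARCH64_FEATURE_1_GCS
    # define GNU_PROPERTY_AARCH64_FEATURE_1_GCS (1U << 2)
    #endif

    /* One NT_GNU_PROPERTY_TYPE_0 note carrying a single AArch64 feature
       property: BTI | GCS.  */
    static const struct
    {
      Elf64_Nhdr nhdr;
      char name[4];
      uint32_t pr_type;
      uint32_t pr_datasz;
      uint32_t pr_data;
      uint32_t pr_pad;
    } gnu_property __attribute__ ((used, section (".note.gnu.property"),
                                   aligned (8))) =
    {
      { 4, 16, NT_GNU_PROPERTY_TYPE_0 }, "GNU",
      GNU_PROPERTY_AARCH64_FEATURE_1_AND, 4,
      GNU_PROPERTY_AARCH64_FEATURE_1_BTI | GNU_PROPERTY_AARCH64_FEATURE_1_GCS,
      0
    };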
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Allocate the GCS based on the stack size; this can be used for coroutines
(makecontext) and for thread creation (if the kernel allows user-allocated
GCS).
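A rough sketch of the sizing idea (illustrative only; the exact policy
may differ): every 16-byte-aligned stack frame can push at most one
8-byte GCS entry, so a GCS of about half the stack size, rounded up to
whole pages, is enough.

    #include <stddef.h>

    /* Illustrative GCS sizing from the thread/coroutine stack size,
       assuming 8-byte GCS entries and at least 16-byte stack frames.  */
    static size_t
    gcs_size_from_stack (size_t stack_size, size_t page_size)
    {
      size_t size = stack_size / 2;
      return (size + page_size - 1) & ~(page_size - 1);
    }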
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Unlike for BTI, the kernel does not process GCS properties, so update
GL(dl_aarch64_gcs) before the GCS status is set.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
check_gcs is called for each dependency of a DSO, but the GNU property
of ld.so itself is not processed, so ldso->l_mach.gcs may not be correct.
Just assume ld.so is GCS compatible, independently of its ELF marking.
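A simplified sketch of the check (not the exact code in the tree): the
loader's own link map is special-cased instead of consulting its
l_mach.gcs field.

    /* Illustrative only: report whether all dependencies are GCS marked,
       treating ld.so as compatible unconditionally.  */
    static bool
    dependencies_gcs_marked (struct link_map *map)
    {
      for (unsigned int i = 0; i < map->l_searchlist.r_nlist; i++)
        {
          struct link_map *l = map->l_searchlist.r_list[i];
          if (l == &GL(dl_rtld_map))
            continue;   /* ld.so: its GNU property was never parsed.  */
          if (!l->l_mach.gcs)
            return false;
        }
      return true;
    }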
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
- Handle GCS marking
- Use l_searchlist.r_list for gcs (allows using the
same function for static exe)
Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use the dynamic linker start code to enable GCS in the dynamically linked
case after _dl_start returns and before _dl_start_user, which marks
the point after which user code may run.
As in the statically linked case, this ensures that GCS is enabled on a
top-level stack frame.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use the ARCH_SETUP_TLS hook to enable GCS in the statically linked case.
The system call must be inlined, and then GCS is enabled on a top-level
stack frame that does not return and has no exception handlers above it.
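A hedged sketch of what the enabling amounts to, assuming the Linux
PR_SET_SHADOW_STACK_STATUS / PR_SHADOW_STACK_ENABLE prctl interface (the
real code uses glibc's inline syscall macros; the constants below should
be taken from the kernel headers):

    #include <sys/syscall.h>

    #ifndef PR_SET_SHADOW_STACK_STATUS
    # define PR_SET_SHADOW_STACK_STATUS 75   /* check <linux/prctl.h> */
    #endif
    #ifndef PR_SHADOW_STACK_ENABLE
    # define PR_SHADOW_STACK_ENABLE (1UL << 0)
    #endif

    /* Must be inlined into a frame that never returns: after this syscall
       succeeds, returning through frames entered before it would fault.  */
    static inline __attribute__ ((always_inline)) void
    enable_gcs (void)
    {
      register long x0 asm ("x0") = PR_SET_SHADOW_STACK_STATUS;
      register long x1 asm ("x1") = PR_SHADOW_STACK_ENABLE;
      register long x2 asm ("x2") = 0;
      register long x3 asm ("x3") = 0;
      register long x4 asm ("x4") = 0;
      register long x8 asm ("x8") = __NR_prctl;
      asm volatile ("svc #0"
                    : "+r" (x0)
                    : "r" (x1), "r" (x2), "r" (x3), "r" (x4), "r" (x8)
                    : "memory");
    }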
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This tunable controls the Guarded Control Stack (GCS) for the process.
0 = disabled: do not enable GCS
1 = enforced: check markings and fail if any binary is not marked
2 = optional: check markings but keep GCS off if a binary is unmarked
3 = override: enable GCS, markings are ignored
The default is 0, so GCS is disabled; a value of 1 enables GCS.
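For example, assuming the tunable is exposed under the name
glibc.cpu.aarch64_gcs, it can be turned on for a single run with:

    GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1 ./app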
The status is stored in GL(dl_aarch64_gcs) early and only applied
later, since enabling GCS is tricky: it must happen on a top-level
stack frame. GL is used instead of GLRO because the value may need
updates, depending on the loaded libraries, after read-only protection
is applied; however, library-marking-based GCS setting is not yet
implemented.
Describe new tunable in the manual.
Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Changed the makecontext logic: previously the first setcontext jumped
straight to the user callback function and the return address was set
to __startcontext. This does not work when GCS is enabled, as the
integrity of the return address is protected, so instead the context
is set up such that setcontext jumps to __startcontext, which calls the
user callback (passed in x20).
The map_shadow_stack syscall is used to allocate a suitably sized GCS
(which includes some reserved area to account for altstack signal
handlers and otherwise supports the maximum number of 16-byte-aligned
stack frames possible on the given stack); however, the GCS is never
freed, as the lifetime of the ucontext and the related stack is user
managed.
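A minimal sketch of the allocation, assuming the Linux map_shadow_stack
syscall with a cap token requested at the top of the new GCS (the real
code also computes the size as described above and handles errors):

    #include <stddef.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __NR_map_shadow_stack
    # define __NR_map_shadow_stack 453
    #endif
    #ifndef SHADOW_STACK_SET_TOKEN
    # define SHADOW_STACK_SET_TOKEN (1UL << 0)   /* from <asm/mman.h> */
    #endif

    /* Illustrative only: let the kernel pick the address and place a cap
       token so the new GCS can be switched to later.  Returns (void *) -1
       on failure, like mmap.  */
    static void *
    alloc_gcs (size_t size)
    {
      return (void *) syscall (__NR_map_shadow_stack, 0UL, size,
                               SHADOW_STACK_SET_TOKEN);
    }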
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The userspace ucontext needs to store GCSPR; it does not have to be
compatible with the kernel ucontext. For now we use the Linux
struct gcs_context layout, but only use the gcspr field from it.
The implementation is similar to the longjmp code: it supports switching
GCS if the target GCS is capped, and unwinding a continuous GCS to a
previous state.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This implementation ensures that longjmp across different stacks
works: it scans for the GCS cap token and switches GCS if necessary,
then the target GCSPR is restored with a GCSPOPM loop once the
current GCSPR is on the same GCS.
This makes longjmp linear time in the number of jumped-over stack
frames when GCS is enabled.
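A simplified C rendering of that loop (illustrative; the real code is
AArch64 assembly, also performs the cross-GCS switch via the cap token,
and needs an assembler that accepts the GCS mnemonics):

    /* Pop GCS entries until GCSPR_EL0 reaches the target value saved at
       setjmp time; each GCSPOPM discards one 8-byte entry.  */
    static void
    gcs_unwind_to (unsigned long target_gcspr)
    {
      for (;;)
        {
          unsigned long gcspr;
          asm volatile ("mrs %0, gcspr_el0" : "=r" (gcspr));
          if (gcspr == target_gcspr)
            break;
          asm volatile ("gcspopm" ::: "memory");
        }
    }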
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The target specific internal __longjmp is called with a __jmp_buf
argument which has its size exposed in the ABI. On aarch64 this has
no space left, so GCSPR cannot be restored in longjmp in the usual
way, which is needed for the Guarded Control Stack (GCS) extension.
setjmp is implemented via __sigsetjmp, which has a jmp_buf argument;
however, it is also called with a __pthread_unwind_buf_t argument cast
to jmp_buf (in cancellation cleanup code built with -fno-exceptions).
The two types, jmp_buf and __pthread_unwind_buf_t, have common bits
beyond the __jmp_buf field and there is unused space there which we
can use for saving GCSPR.
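For reference, the two generic types look like this (from <setjmp.h> and
the nptl cancellation internals), which is where the shared, partly
unused space comes from:

    /* Element type of the public jmp_buf/sigjmp_buf arrays.  */
    struct __jmp_buf_tag
      {
        __jmp_buf __jmpbuf;        /* Target-specific register save area.  */
        int __mask_was_saved;      /* Whether the signal mask was saved.  */
        __sigset_t __saved_mask;   /* The saved signal mask.  */
      };

    /* Cancellation unwind buffer, cast to jmp_buf in the cleanup code.  */
    typedef struct
    {
      struct
      {
        __jmp_buf __cancel_jmp_buf;
        int __mask_was_saved;
      } __cancel_jmp_buf[1];
      void *__pad[4];
    } __pthread_unwind_buf_t __attribute__ ((__aligned__));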
For this to work some bits of those two generic types have to be
reserved for target specific use and the generic code in glibc has
to ensure that __longjmp is always called with a __jmp_buf that is
embedded into one of those two types. Morally __longjmp should be
changed to take jmp_buf as argument, but that is an intrusive change
across targets.
Note: longjmp is never called with __pthread_unwind_buf_t from user
code, only the internal __libc_longjmp is called with that type and
thus the two types could have separate longjmp implementations on a
target. We don't rely on this now (but might in the future given that
cancellation unwind does not need to restore GCSPR).
Given the above, this patch finds an unused slot for GCSPR. This
placement is not exposed in the ABI, so it may change in the future.
This is also very target ABI specific so the generic types cannot
be easily changed to clearly mark the reserved fields.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The Guarded Control Stack instructions can be present in the code even
if the hardware does not support the extension (support is checked at
run time), so the asm code should be backward compatible with old
assemblers.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This variable used to be needed to wait, during group switching, until all
sleepers had confirmed that they had woken. This is no longer needed:
nothing waits on this variable, so there is no need to track how many
threads are currently asleep in each group.
Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The new test elf/tst-rseq-tls-range-4096-static reliably detected
the extra TLS allocation problem (tcb_offset was dropped from
the allocation size) on aarch64. It also failed with a crash
in dlopen *before* the extra TLS changes, so TLS alignment with
static dlopen was already broken.
Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
Careful updates of grnd_alloc.len are required to ensure that
after fork, grnd_alloc.states does not contain entries that
are also encountered by __getrandom_reset_state in TCBs.
For the same reason, it is necessary to overwrite the TCB state
pointer with NULL before updating grnd_alloc.states in
__getrandom_vdso_release.
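An illustrative sketch of the required ordering in the release path
(names simplified; the real __getrandom_vdso_release also takes the
allocator lock):

    /* Detach the state from the TCB before it becomes visible in
       grnd_alloc.states again, so a fork in between cannot observe the
       same state both in a TCB and in the free list.  */
    static void
    release_state (struct pthread *curp)
    {
      void *state = curp->getrandom_buf;
      if (state == NULL)
        return;
      curp->getrandom_buf = NULL;    /* Step 1: clear the TCB pointer.  */
      /* Step 2: with grnd_alloc locked,
           grnd_alloc.states[grnd_alloc.len++] = state;  */
    }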
Before this change, different TCBs could share the same getrandom
state after multi-threaded fork. This would be a critical security
bug (predictable randomness) if not caught during development.
The additional check in stdlib/tst-arc4random-thread makes it more
likely that the test fails due to the bugs mentioned above.
Both __getrandom_reset_state and __getrandom_vdso_release could
put reserved NULL pointers into the states array. This is also
fixed with this commit. After these changes, no null pointers were
observed in the states array during testing.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Some kernels on S390 appear to return a CPU affinity mask based on
configured processors rather than the ones online. Overallocate the CPU
set to match that, but operate only on the ones online.
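The idea, sketched with the public affinity API (illustrative of the
approach, not the actual patch):

    #include <sched.h>
    #include <stddef.h>
    #include <unistd.h>

    /* Size the CPU set for all configured CPUs so the kernel's larger
       mask fits, but only inspect CPUs that are online.  */
    static void
    scan_online_cpus (void)
    {
      long conf = sysconf (_SC_NPROCESSORS_CONF);
      long onln = sysconf (_SC_NPROCESSORS_ONLN);
      cpu_set_t *set = CPU_ALLOC (conf);
      size_t setsize = CPU_ALLOC_SIZE (conf);
      if (set == NULL)
        return;
      CPU_ZERO_S (setsize, set);
      if (sched_getaffinity (0, setsize, set) == 0)
        for (long cpu = 0; cpu < onln; cpu++)
          if (CPU_ISSET_S (cpu, setsize, set))
            { /* operate on this online CPU */ }
      CPU_FREE (set);
    }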
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
If the PR bit is not 0, the FRCHG and FSCHG operations are
undefined and cause a trap; qemu now checks for this as
well, so we set the bit to 0 temporarily and restore the old
value in getcontext afterwards (setcontext/swapcontext
already do so).
From the discussion in the bug report, this can probably
be optimised in one place, but none of the people involved
are SH4 assembly experts, this patch is field-tested, and
it is not a code path that is run often. The other question,
what happens if a signal occurs while the bit is temporarily 0,
is also still unsolved, but fixing that most likely needs a
kernel change; this patch trades a certain trap on
many CPUs for a hard-to-hit trap in a signal handler, if a
signal is delivered during the few instructions the PR bit
is temporarily set to 0, so it is not a regression for most
users.
See BZ and https://bugs.launchpad.net/qemu/+bug/1796520 for
related discussion, references and review comments.
Signed-off-by: mirabilos <tg@debian.org>
Reviewed-by: Oleg Endo <olegendo@gcc.gnu.org>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
clang issues:

    error: value size does not match register size specified by the
    constraint and modifier [-Werror,-Wasm-operand-widths]

while trying to use 32-bit variables with 'mrs' to get/set the
fpsr, dczid_el0, and ctr registers.
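For example, reading one of these registers needs a 64-bit variable
(narrowed afterwards if only 32 bits are wanted); a minimal sketch:

    #include <stdint.h>

    /* clang requires the asm operand width to match the 64-bit system
       register, so read into a 64-bit variable.  */
    static inline uint32_t
    get_fpsr32 (void)
    {
      uint64_t fpsr;
      asm volatile ("mrs %0, fpsr" : "=r" (fpsr));
      return (uint32_t) fpsr;
    }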
The Mach RPC host_get_uptime64() is implemented; it returns the elapsed
time since bootup. See
https://git.savannah.gnu.org/cgit/hurd/gnumach.git/commit/?id=fc494bfe3fb6363e1077dc035eb119970d84a9d1
In this patch, the RPC is used to implement the monotonic clock for
Mach.
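A hedged sketch of the new case (the time_value64_t field names and the
exact error handling are assumptions here):

    #ifdef HAVE_HOST_GET_UPTIME64
        case CLOCK_MONOTONIC:
          {
            time_value64_t uptime;
            kern_return_t err = host_get_uptime64 (mach_host_self (), &uptime);
            if (err != KERN_SUCCESS)
              return __hurd_fail (err);
            ts->tv_sec = uptime.seconds;
            ts->tv_nsec = uptime.nanoseconds;
            return 0;
          }
    #endif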
* config.h.in: Add HAVE_HOST_GET_UPTIME64 config entry
* sysdeps/mach/clock_gettime.c: Add CLOCK_MONOTONIC case
* sysdeps/mach/configure: Check the existence of host_get_uptime64 RPC
* sysdeps/mach/configure.ac: Check the existence of host_get_uptime64 RPC
Message-ID: <20250106043907.1046-1-zhmingluo@163.com>
commit 494d65129e
Author: Michael Jeanson <mjeanson@efficios.com>
Date: Thu Aug 1 10:35:34 2024 -0400
nptl: Introduce <rseq-access.h> for RSEQ_* accessors
added things like
asm volatile ("movl %%fs:%P1(%q2),%0" \
: "=r" (__value) \
: "i" (offsetof (struct rseq_area, member)), \
"r" (__rseq_offset)); \
But this doesn't work for x32 when __rseq_offset is negative, since the
address is computed as

    FS + 32-bit to 64-bit zero extension of __rseq_offset
       + offsetof (struct rseq_area, member)

Cast __rseq_offset to long long int:

    "r" ((long long int) __rseq_offset)); \

to sign-extend the 32-bit __rseq_offset to 64 bits. This is a no-op for
x86-64 since x86-64 __rseq_offset is 64-bit. This fixes BZ #32543.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Move the rseq area to the newly added 'extra TLS' block; this is the
last step in adding support for the rseq extended ABI. The size of the
rseq area is now dynamic and depends on the rseq features reported by
the kernel through the ELF auxiliary vector. This will allow
applications to use rseq features beyond the 32 bytes of the original
rseq ABI as they become available in future kernels.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
In preparation for moving the rseq area to the 'extra TLS' block, we need
accessors based on the thread pointer and the rseq offset. The ONCE
variant of the accessors ensures single-copy atomicity for loads and
stores, which is required for all fields once the registration is active.
A separate header is required to allow including <atomic.h>, which would
result in an include loop if added to <tcb-access.h>.
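An illustrative model of what the ONCE accessors provide (the helper
below is hypothetical, not the real RSEQ_* macro): the rseq area is
located from the thread pointer plus __rseq_offset, and fields are
accessed with single-copy-atomic loads/stores.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <sys/rseq.h>            /* struct rseq, __rseq_offset.  */

    static inline uint32_t
    rseq_cpu_id_once (void)
    {
      struct rseq *r = (struct rseq *) ((char *) __builtin_thread_pointer ()
                                        + __rseq_offset);
      return atomic_load_explicit ((_Atomic uint32_t *) &r->cpu_id,
                                   memory_order_relaxed);
    }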
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This allows accessing the internal aliases of __rseq_size and
__rseq_offset from ld.so without ifdefs and avoids dynamic symbol
binding at run time for both variables.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Add the Linux implementation of 'extra TLS' which will allocate space
for the rseq area at the end of the TLS blocks in allocation order.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Add the logic to append an 'extra TLS' block in the TLS block allocator
with a generic stub implementation. The duplicated code in
'csu/libc-tls.c' and 'elf/dl-tls.c' is to handle both statically linked
applications and the ELF dynamic loader.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Get the rseq feature size and alignment requirement from the auxiliary
vector for use inside the dynamic loader. Use '__rseq_size' directly to
store the feature size. If the main thread registration fails or is
disabled by a tunable, reset the value to 0.
This will be used in the TLS block allocator to compute the size and
alignment of the rseq area block for the extended ABI support.
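For reference, these are the AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN
auxiliary vector entries (Linux 6.3+, declared in recent <elf.h>);
outside the loader they could be read like this:

    #include <sys/auxv.h>

    /* getauxval returns 0 for entries the kernel does not provide.  */
    static void
    query_rseq_auxv (unsigned long *feature_size, unsigned long *align)
    {
      *feature_size = getauxval (AT_RSEQ_FEATURE_SIZE);
      *align = getauxval (AT_RSEQ_ALIGN);
    }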
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>