Linux 6.11 provides getrandom() in the vDSO. It operates on a
thread-local opaque state allocated with mmap using flags specified by
the vDSO.
Multiple states are allocated at once, as many as fit into a page, and
these are held in an array of available states to be doled out to each
thread upon first use, and recycled when a thread terminates. As these
states run low, more are allocated.
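As a rough illustration of the allocation scheme described above (not
the actual glibc code), the sketch below maps one page, slices it into
fixed-size states, and keeps them on a free stack. STATE_SIZE, struct
state_pool, and the pool_* helpers are hypothetical names; the real
state size and mmap flags are reported by the vDSO, and realloc is used
here only for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-state size; the real value comes from the vDSO.  */
#define STATE_SIZE 256

struct state_pool
{
  void **free_states;   /* Stack of states not owned by any thread.  */
  size_t free_count;    /* Number of entries currently on the stack.  */
  size_t total;         /* States ever created; the stack holds them all.  */
};

/* Map one more page and push every state that fits onto the free stack.
   Returns 0 on success and -1 on failure.  */
int
pool_grow (struct state_pool *p)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  size_t per_page = page / STATE_SIZE;
  if (per_page == 0)
    return -1;

  /* Make room so that every state can be on the free stack at once.  */
  void **slots = realloc (p->free_states,
                          (p->total + per_page) * sizeof *slots);
  if (slots == NULL)
    return -1;
  p->free_states = slots;

  char *block = mmap (NULL, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (block == MAP_FAILED)
    return -1;

  for (size_t i = 0; i < per_page; i++)
    p->free_states[p->free_count++] = block + i * STATE_SIZE;
  p->total += per_page;
  return 0;
}

/* Hand a state to a thread on first use, refilling when the pool is empty.  */
void *
pool_get (struct state_pool *p)
{
  if (p->free_count == 0 && pool_grow (p) != 0)
    return NULL;
  return p->free_states[--p->free_count];
}

/* Recycle the state of a terminating thread.  */
void
pool_put (struct state_pool *p, void *state)
{
  assert (p->free_count < p->total);
  p->free_states[p->free_count++] = state;
}
```

A zero-initialized struct state_pool is a valid empty pool, so the
first pool_get call triggers the first page allocation.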
To make this procedure async-signal-safe, a simple guard is used in the
LSB of the opaque state address, falling back to the syscall if there's
reentrancy contention.
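The guard can be pictured roughly as follows. This is a minimal sketch
under the assumption that states are at least 2-byte aligned, so bit 0
of the stored address is free to act as an "in use" flag; the names
state_try_acquire, state_release, guarded_fill, and fill_fn are
hypothetical and do not appear in glibc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for both the fast path and the fallback.  */
typedef int (*fill_fn) (void *state, void *buf, size_t len);

/* Return the state if it is free and mark it busy by setting bit 0.
   Return NULL if there is no state yet or if it is already in use,
   for example because a signal handler interrupted an earlier call.  */
void *
state_try_acquire (uintptr_t *slot)
{
  uintptr_t v = *slot;
  if (v == 0 || (v & 1) != 0)
    return NULL;
  *slot = v | 1;
  /* Keep the compiler from moving state accesses above the store.  */
  atomic_signal_fence (memory_order_seq_cst);
  return (void *) v;
}

/* Clear bit 0 again once the fast path is done with the state.  */
void
state_release (uintptr_t *slot)
{
  assert ((*slot & 1) != 0);
  atomic_signal_fence (memory_order_seq_cst);
  *slot &= ~(uintptr_t) 1;
}

/* Use the fast path when the state is free, otherwise the fallback
   (the plain syscall in the scheme described above).  */
int
guarded_fill (uintptr_t *slot, fill_fn fast, fill_fn fallback,
              void *buf, size_t len)
{
  void *state = state_try_acquire (slot);
  if (state == NULL)
    return fallback (NULL, buf, len);
  int r = fast (state, buf, len);
  state_release (slot);
  return r;
}
```

A nested call that arrives while bit 0 is set never touches the state,
which is what keeps the fast path safe to interrupt.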
Also, _Fork() is handled by blocking signals on opaque state allocation
(so _Fork() always sees a consistent state even if it interrupts a
getrandom() call) and by iterating over the thread stack cache on
reclaim_stack. Each opaque state will be in the free states list
(grnd_alloc.states) or allocated to a running thread.
Cancellation is handled by always passing the GRND_NONBLOCK flag when
calling the vDSO, and falling back to the cancellable syscall if the
kernel returns EAGAIN (would block). Since getrandom is not defined by
POSIX and cancellation is supported as an extension, cancellation is
handled as 'may occur' instead of 'shall occur' [1], meaning that if
the vDSO does not block (the expected behavior) getrandom will not act
as a cancellation entrypoint. This avoids a pthread_testcancel call on
the fast path (unlike 'shall occur' functions, such as sem_wait()).
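The control flow can be sketched as below. This is an illustration, not
the glibc implementation: random_fn, getrandom_sketch, and the stub_*
functions are hypothetical stand-ins, and both callbacks are assumed to
return a byte count or a negative errno value in the kernel style.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>   /* GRND_NONBLOCK */
#include <sys/types.h>    /* ssize_t */

typedef ssize_t (*random_fn) (void *buf, size_t len, unsigned int flags);

/* Try the non-blocking fast path first; only if it reports that it
   would block, take the slow path that may act as a cancellation
   point, using the caller's original flags.  */
ssize_t
getrandom_sketch (random_fn fast_path, random_fn cancellable_syscall,
                  void *buf, size_t len, unsigned int flags)
{
  assert (fast_path != NULL && cancellable_syscall != NULL);
  ssize_t r = fast_path (buf, len, flags | GRND_NONBLOCK);
  if (r != -EAGAIN)
    return r;
  return cancellable_syscall (buf, len, flags);
}

/* Stand-ins used only to exercise the sketch.  */
ssize_t
stub_ready (void *buf, size_t len, unsigned int flags)
{
  (void) flags;
  memset (buf, 0xAB, len);
  return (ssize_t) len;
}

ssize_t
stub_would_block (void *buf, size_t len, unsigned int flags)
{
  (void) buf;
  (void) len;
  return (flags & GRND_NONBLOCK) != 0 ? -EAGAIN : 0;
}

ssize_t
stub_syscall (void *buf, size_t len, unsigned int flags)
{
  (void) flags;
  memset (buf, 0xCD, len);
  return (ssize_t) len;
}
```

When the fast path succeeds, the slow path is never entered, so no
cancellation check happens on that path.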
It is currently enabled for x86_64, which is available in Linux 6.11,
and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are
available in Linux 6.12.
Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1]
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64
Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64
Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64
Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
It follows the internal signature:
extern int clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
Checked on s390x-linux-gnu and s390-linux-gnu.
On s390x, syscalls are triggered by the svc instruction. The syscall
number can either be encoded in the instruction itself ("svc 123") or
stored in r1:
lghi r1,123
svc 0
If the syscall number is encoded in the instruction, this can
cause broken syscall restarts. Therefore this patch now always
passes the syscall number in r1.
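For illustration only, the "svc 0" form can be expressed from C roughly
as follows. This is a sketch, not the glibc macro: raw_getpid is a
hypothetical helper, the inline assembly is compiled only on s390, and
other targets simply go through syscall() so the example stays
buildable everywhere.

```c
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue getpid with the syscall number passed in r1 rather than
   encoded in the svc instruction.  */
long
raw_getpid (void)
{
#if defined (__s390__)
  register long nr __asm__ ("1") = SYS_getpid;  /* Syscall number in r1.  */
  register long ret __asm__ ("2");              /* Result comes back in r2.  */
  __asm__ volatile ("svc 0" : "=d" (ret) : "d" (nr) : "memory");
  return ret;
#else
  /* Not s390: use the generic libc wrapper instead.  */
  return syscall (SYS_getpid);
#endif
}
```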
See also kernel-commit:
"s390/signal: switch to using vdso for sigreturn and syscall restart"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/s390/[%e2%80%a6]call.c?h=v6.0-rc1&id=df29a7440c4b5c65765c8f60396b3b13063e24e9
For reference, the "svc 0" feature was introduced in kernel 2.5.62:
commit b5aad611393ef2e132e3648fa4c6e56a9cfa8708
This also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL:
single-thread.h is included in the wrong order, and the define needs to
come before including sysdeps/unix/sysdep.h. The macro is now moved to
a per-arch single-thread.h header.
SINGLE_THREAD_P is now also used in a few more places.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
With all Linux ABIs using the expected Linux kABI to indicate
syscall errors, INTERNAL_SYSCALL_DECL is an empty declaration
on all ports.
This patch removes the 'err' argument from the INTERNAL_SYSCALL* macros
and removes the INTERNAL_SYSCALL_DECL usage.
Checked with a build against all affected ABIs.
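To show why no separate error variable is needed, here is a minimal
sketch of the convention: the raw kernel result alone says whether the
call failed, since values in [-4095, -1] are negated errno codes. The
function names are hypothetical; only the range check mirrors the
usual Linux convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* True if the raw kernel return value encodes an error.  */
bool
raw_syscall_failed (long raw)
{
  return (unsigned long) raw > -4096UL;
}

/* Extract the errno code from a failing raw return value.  */
int
raw_syscall_errno (long raw)
{
  assert (raw_syscall_failed (raw));
  return (int) -raw;
}

/* Convert a raw result to the libc convention: -1 with errno set on
   failure, the value itself otherwise.  */
long
translate_syscall_result (long raw)
{
  if (raw_syscall_failed (raw))
    {
      errno = raw_syscall_errno (raw);
      return -1;
    }
  return raw;
}
```

Because the check depends only on the returned value, the caller does
not have to declare or pass any extra state around the syscall.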
With all Linux ABIs using the expected Linux kABI to indicate
syscall errors, there is no need to replicate INLINE_SYSCALL.
The generic Linux sysdep.h includes errno.h even for !__ASSEMBLER__,
which is fine now and allows cleaning up some archaic code that assumes
otherwise.
Checked with a build against all affected ABIs.
No architecture currently defines the vDSO symbol. On architectures
with 64-bit time_t, HAVE_CLOCK_GETRES_VSYSCALL is renamed to
HAVE_CLOCK_GETRES64_VSYSCALL, which simplifies the clock_gettime code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
No architecture currently defines the vDSO symbol. On architectures
with 64-bit time_t, HAVE_CLOCK_GETTIME_VSYSCALL is renamed to
HAVE_CLOCK_GETTIME64_VSYSCALL, which simplifies the clock_gettime code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The Linux vDSO initialization code for the internal function pointers
requires a lot of duplicated boilerplate across architectures. This
patch aims to simplify not only the code but also the definitions
required to enable a vDSO symbol.
The changes are:
1. Consolidate all init-first.c files into a single implementation and
enable each symbol based on whether HAVE_*_VSYSCALL is defined.
2. Set HAVE_*_VSYSCALL to the symbol name string the architecture expects.
3. Add a new internal function, get_vdso_mangle_symbol, which
returns a mangled function pointer.
Currently clock_gettime, clock_getres, gettimeofday, getcpu, and time
are handled in an arch-independent way; powerpc still uses some
arch-specific vDSO symbols, handled in a specific init-first
implementation.
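The scheme can be sketched as below. This is a simplified model rather
than the glibc code: the vDSO is replaced by a small in-memory table,
the mangling is a plain XOR with a made-up guard value, and
lookup_mangled, demangle, vdso_setup, fake_vdso, and pointer_guard are
hypothetical names. The macro value shown is only an example of a
kernel-exported name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An architecture opts in by defining HAVE_*_VSYSCALL to the exported
   symbol name.  HAVE_GETCPU_VSYSCALL is deliberately left undefined.  */
#define HAVE_CLOCK_GETTIME_VSYSCALL "__vdso_clock_gettime"

typedef void (*vdso_fn) (void);

/* Stand-in for a function living in the vDSO.  */
void
fake_clock_gettime (void)
{
}

/* Stand-in for the vDSO symbol table.  */
static const struct { const char *name; vdso_fn fn; } fake_vdso[] =
  {
    { "__vdso_clock_gettime", fake_clock_gettime },
  };

/* Made-up guard value; a real one would be a per-process secret.  */
uintptr_t pointer_guard = 0x5a5a5a5a;

/* Look NAME up and return the pointer in mangled form.  A missing
   symbol yields a mangled null pointer.  */
uintptr_t
lookup_mangled (const char *name)
{
  assert (name != NULL);
  uintptr_t found = 0;
  for (size_t i = 0; i < sizeof fake_vdso / sizeof fake_vdso[0]; i++)
    if (strcmp (fake_vdso[i].name, name) == 0)
      found = (uintptr_t) fake_vdso[i].fn;
  return found ^ pointer_guard;
}

/* Recover the original pointer value before calling through it.  */
uintptr_t
demangle (uintptr_t mangled)
{
  return mangled ^ pointer_guard;
}

#ifdef HAVE_CLOCK_GETTIME_VSYSCALL
uintptr_t vdso_clock_gettime;
#endif
#ifdef HAVE_GETCPU_VSYSCALL
uintptr_t vdso_getcpu;
#endif

/* Single setup routine: each pointer exists only if its macro does.  */
void
vdso_setup (void)
{
#ifdef HAVE_CLOCK_GETTIME_VSYSCALL
  vdso_clock_gettime = lookup_mangled (HAVE_CLOCK_GETTIME_VSYSCALL);
#endif
#ifdef HAVE_GETCPU_VSYSCALL
  vdso_getcpu = lookup_mangled (HAVE_GETCPU_VSYSCALL);
#endif
}
```

With this shape, enabling a symbol for an architecture reduces to one
macro definition, and the setup code needs no per-arch copy.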
Checked on aarch64-linux-gnu, arm-linux-gnueabihf, i386-linux-gnu,
mips64-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu,
sparc64-linux-gnu, and x86_64-linux-gnu.
* sysdeps/powerpc/powerpc32/backtrace.c (is_sigtramp_address,
is_sigtramp_address_rt): Use HAVE_SIGTRAMP_{RT}32 instead of SHARED.
* sysdeps/powerpc/powerpc64/backtrace.c (is_sigtramp_address):
Likewise.
* sysdeps/unix/sysv/linux/aarch64/init-first.c: Remove file.
* sysdeps/unix/sysv/linux/aarch64/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/arm/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/arm/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/mips/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/mips/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/i386/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/riscv/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/riscv/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/s390/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/s390/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/sparc/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/x86/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h
(HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Define value based on kernel exported
name.
* sysdeps/unix/sysv/linux/arm/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/mips/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/powerpc/sysdep.h
(HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETCPU_VSYSCALL, HAVE_TIME_VSYSCALL, HAVE_GET_TBFREQ,
HAVE_SIGTRAMP_RT64, HAVE_SIGTRAMP_32, HAVE_SIGTRAMP_RT32,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/riscv/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL,
HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
HAVE_GETCPU_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/s390/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL,
HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
HAVE_GETCPU_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/sparc/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h
(HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
HAVE_GETCPU_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/dl-vdso.h (VDSO_NAME, VDSO_HASH): Define to
invalid names if architecture does not define them.
(get_vdso_mangle_symbol): New symbol.
* sysdeps/unix/sysv/linux/init-first.c: New file.
* sysdeps/unix/sysv/linux/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/init-first.c (gettimeofday,
clock_gettime, clock_getres, getcpu, time): Remove declaration.
(__libc_vdso_platform_setup_arch): Likewise and use
get_vdso_mangle_symbol to setup vDSO symbols.
(sigtramp_rt64, sigtramp32, sigtramp_rt32, get_tbfreq): Add
attribute_hidden.
* sysdeps/unix/sysv/linux/powerpc/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/sysdep-vdso.h (VDSO_SYMBOL): Remove
definition.
2000-08-02 Andreas Jaeger <aj@suse.de>
* sysdeps/unix/sysv/linux/s390/Dist: New file.
* sysdeps/unix/sysv/linux/s390/sysdep.h: New file.
* sysdeps/unix/sysv/linux/s390/sysdep.S: New file.
* sysdeps/unix/sysv/linux/s390/syscall.S: New file.
* sysdeps/unix/sysv/linux/s390/sys/user.h: New file.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h: New file.
* sysdeps/unix/sysv/linux/s390/sys/ptrace.h: New file.
* sysdeps/unix/sysv/linux/s390/sys/elf.h: New file.
* sysdeps/unix/sysv/linux/s390/socket.S: New file.
* sysdeps/unix/sysv/linux/s390/sigcontextinfo.h: New file.
* sysdeps/unix/sysv/linux/s390/shmctl.c: New file.
* sysdeps/unix/sysv/linux/s390/setreuid.c: New file.
* sysdeps/unix/sysv/linux/s390/setresuid.c: New file.
* sysdeps/unix/sysv/linux/s390/setresgid.c: New file.
* sysdeps/unix/sysv/linux/s390/setregid.c: New file.
* sysdeps/unix/sysv/linux/s390/setgroups.c: New file.
* sysdeps/unix/sysv/linux/s390/setgid.c: New file.
* sysdeps/unix/sysv/linux/s390/setfsuid.c: New file.
* sysdeps/unix/sysv/linux/s390/setfsgid.c: New file.
* sysdeps/unix/sysv/linux/s390/seteuid.c: New file.
* sysdeps/unix/sysv/linux/s390/setegid.c: New file.
* sysdeps/unix/sysv/linux/s390/semctl.c: New file.
* sysdeps/unix/sysv/linux/s390/register-dump.h: New file.
* sysdeps/unix/sysv/linux/s390/putpmsg.c: New file.
* sysdeps/unix/sysv/linux/s390/putmsg.c: New file.
* sysdeps/unix/sysv/linux/s390/profil-counter.h: New file.
* sysdeps/unix/sysv/linux/s390/msgctl.c: New file.
* sysdeps/unix/sysv/linux/s390/mmap.S: New file.
* sysdeps/unix/sysv/linux/s390/getuid.c: New file.
* sysdeps/unix/sysv/linux/s390/getresuid.c: New file.
* sysdeps/unix/sysv/linux/s390/getresgid.c: New file.
* sysdeps/unix/sysv/linux/s390/getpmsg.c: New file.
* sysdeps/unix/sysv/linux/s390/getmsg.c: New file.
* sysdeps/unix/sysv/linux/s390/getgroups.c: New file.
* sysdeps/unix/sysv/linux/s390/getegid.c: New file.
* sysdeps/unix/sysv/linux/s390/geteuid.c: New file.
* sysdeps/unix/sysv/linux/s390/fchown.c: New file.
* sysdeps/unix/sysv/linux/s390/clone.S: New file.
* sysdeps/unix/sysv/linux/s390/brk.c: New file.
* sysdeps/unix/sysv/linux/s390/bits/time.h: New file.
* sysdeps/unix/sysv/linux/s390/bits/resource.h: New file.
* sysdeps/unix/sysv/linux/s390/bits/mman.h: New file.
* sysdeps/unix/sysv/linux/s390/bits/fcntl.h: New file.
* sysdeps/unix/sysv/linux/s390/Makefile: New file.
* sysdeps/s390/sysdep.h: New file.
* sysdeps/s390/sys/ucontext.h: New file.
* sysdeps/s390/sub_n.S: New file.
* sysdeps/s390/strncpy.S: New file.
* sysdeps/s390/strcpy.S: New file.
* sysdeps/s390/stackinfo.h: New file.
* sysdeps/s390/setjmp.S: New file.
* sysdeps/s390/s390-mcount.S: New file.
* sysdeps/s390/mul_1.S: New file.
* sysdeps/s390/memusage.h: New file.
* sysdeps/s390/memset.S: New file.
* sysdeps/s390/memcpy.S: New file.
* sysdeps/s390/memchr.S: New file.
* sysdeps/s390/machine-gmon.h: New file.
* sysdeps/s390/ldbl2mpn.c: New file.
* sysdeps/s390/gmp-mparam.h: New file.
* sysdeps/s390/fpu/fpu_control.h: New file.
* sysdeps/s390/fpu/fesetround.c: New file.
* sysdeps/s390/fpu/fegetround.c: New file.
* sysdeps/s390/fpu/fclrexcpt.c: New file.
* sysdeps/s390/fpu/bits/fenv.h: New file.
* sysdeps/s390/ffs.c: New file.
* sysdeps/s390/elf/start.S: New file.
* sysdeps/s390/elf/setjmp.S: New file.
* sysdeps/s390/elf/bsd-setjmp.S: New file.
* sysdeps/s390/elf/bsd-_setjmp.S: New file.
* sysdeps/s390/dl-machine.h: New file.
* sysdeps/s390/bzero.S: New file.
* sysdeps/s390/bsd-setjmp.S: New file.
* sysdeps/s390/bsd-_setjmp.S: New file.
* sysdeps/s390/bits/string.h: New file.
* sysdeps/s390/bits/setjmp.h: New file.
* sysdeps/s390/bits/huge_val.h: New file.
* sysdeps/s390/bits/endian.h: New file.
* sysdeps/s390/bits/byteswap.h: New file.
* sysdeps/s390/bcopy.S: New file.
* sysdeps/s390/backtrace.c: New file.
* sysdeps/s390/atomicity.h: New file.
* sysdeps/s390/asm-syntax.h: New file.
* sysdeps/s390/addmul_1.S: New file.
* sysdeps/s390/add_n.S: New file.
* sysdeps/s390/abort-instr.h: New file.
* sysdeps/s390/__longjmp.c: New file.
* sysdeps/s390/Makefile: New file.
* sysdeps/s390/Implies: New file.
* sysdeps/s390/Dist: New file.
Patches by Martin Schwidefsky <schwidefsky@de.ibm.com>.