mirror of https://sourceware.org/git/glibc.git synced 2025-11-03 20:53:13 +03:00
glibc/sysdeps/unix/sysv/linux/aarch64/sysdep.h
Yury Khrustalev 27effb3d50 aarch64: clear ZA state of SME before clone and clone3 syscalls
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).

The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().

When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.

In a nutshell, a C library that conforms to the AAPCS64 spec [1] (mainly
chapters 6.2 and 6.6 are pertinent to this change) should call the
__arm_za_disable() function in its clone() and clone3() wrappers. The
following explains in detail why this is the case.

When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we are talking about the C library subroutines
clone() and clone3() rather than the syscalls with similar names. In the
current version of Glibc, clone() is public and clone3() is private, but
its being private is not pertinent to this discussion.

We will begin by stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come from
the kernel. Nor is it needed to satisfy a contract between the kernel and
userspace, which is why it is not for the kernel documentation to describe
this requirement. It is instead needed to satisfy a pure userspace scheme
outlined in [1] and to make sure that software that uses Glibc (or any
other C library that handles SME states correctly (see below)) conforms
to [1] without having to unnecessarily become SME-aware, thus losing
portability.

To recap (see [1] (6.2)), the SME extension defines SME state, which is
part of the processor state. Part of this SME state is ZA state, which is
necessary to manage the ZA storage register in the context of the ZA lazy
saving scheme [1] (6.6). This scheme exists because it would be challenging
to handle the ZA storage of SME in either a callee-saved or a caller-saved
manner.

There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):

- "off":       PSTATE.ZA == 0
- "active":    PSTATE.ZA == 1 and TPIDR2_EL0 == null
- "dormant":   PSTATE.ZA == 1 and TPIDR2_EL0 != null
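
The three states above can be sketched as a small classifier in C. This is
an illustration only, under the assumption that the table above is complete;
the names (za_state, classify_za) are invented here and are not glibc or
AAPCS64 identifiers:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: the three ZA states of [1] (6.6.3), expressed as a
   function of the PSTATE.ZA bit and the TPIDR2_EL0 register value.  */
enum za_state { ZA_OFF, ZA_ACTIVE, ZA_DORMANT };

static enum za_state
classify_za (int pstate_za, const void *tpidr2_el0)
{
  if (pstate_za == 0)
    return ZA_OFF;                          /* "off" */
  return tpidr2_el0 == NULL ? ZA_ACTIVE     /* "active" */
                            : ZA_DORMANT;   /* "dormant" */
}
```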

As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.

In particular, clone() and clone3() (the C library functions) have the
ZA-private interface. This means that the permitted ZA-states on entry
are "off" and "dormant", and that the permitted states on return are "off",
or "dormant" if and only if the state was "dormant" on entry.

This means that both functions in question should correctly handle both
"off" and "dormant" ZA-states on entry. The conforming states on return
are "off" and "dormant" (if the inbound state was already "dormant").

This change ensures that the ZA-state on return is always "off". Note
that, in the context of clone() and clone3(), "on return" means the point
where execution resumes at a certain address after control is transferred
from clone() or clone3(). For the caller (we may refer to it as the
"parent") this is the return address in the link register, to which the
RET instruction jumps. For the "child", this is the branch target address.

So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory we could, but we shouldn't; here
is why.

Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine whose ZA-state on return is "dormant", because
this state has a special meaning. The "caller" (that is, the place in code
where execution is transferred to, which includes both the "parent" and the
"child") may check the ZA-state and use it as per the spec of the "dormant"
state outlined in [1] (6.6.6 and 6.6.7).

Conforming to this would require more code inside clone() and clone3(),
which is hardly desirable.

For the return to the "parent" this could be achieved in theory, but given
that neither clone() nor clone3() is supposed to be used in the middle of an
SME operation, it wouldn't be useful. For the "return" to the "child" this
would be particularly difficult to achieve given the complexity of these
functions and their interfaces. Most importantly, it would be illegal and
somewhat meaningless to allow a "child" to start execution in the "dormant"
ZA-state, because the very essence of the "dormant" state implies that there
is a place to return to and some outer context that we are allowed to
interact with.

To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when
execution resumes after a call to clone() or clone3() is correct and also
the simplest way to conform to [1].
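
As a toy model of the behaviour described above (this is NOT glibc code;
every name here — toy_pstate_za, toy_za_disable, toy_clone — is invented
for illustration), the wrapper shape this change produces can be sketched
as:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a global stands in for the PSTATE.ZA bit, toy_za_disable
   for __arm_za_disable(), and toy_clone for the clone() wrapper.  */
static int toy_pstate_za = 1;        /* pretend the caller left ZA "dormant" */

static void
toy_za_disable (void)
{
  toy_pstate_za = 0;                 /* the ZA state becomes "off" */
}

static long
toy_clone (long (*fn) (void *), void *arg)
{
  toy_za_disable ();                 /* the call this change adds */
  /* ... the SVC would be issued here; both the parent and the child
     then resume with the ZA state "off" ... */
  return fn (arg);                   /* stand-in for the child running */
}
```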

Can there be situations when we can avoid calling __arm_za_disable()?

Calling __arm_za_disable() incurs a certain (sufficiently small) overhead,
so one might rightly wonder whether the call can be avoided when we can
afford not to make it. The most trivial such cases (e.g. when the calling
thread doesn't have access to SME or to the TPIDR2_EL0 register) are already
handled by this function (see [1] (8.1.3 and 8.1.2)). Reasoning about other
possible use cases would require making the code inside clone() and clone3()
more complicated, which would defeat the point of an optimisation that
avoids calling __arm_za_disable().

Why can't the kernel do this instead?

The handling of SME state by the kernel is described in [4]. In short, the
kernel must not impose a specific ZA-interface onto a userspace function.
Interaction with the kernel happens (among other things) via system calls.
In Glibc many of the system calls (notably, including SYS_clone and
SYS_clone3) are used via wrappers, and the kernel has no control over them;
moreover, it cannot dictate how these wrappers should behave, because that
is simply outside of the kernel's remit.

However, in certain cases the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in the 6.16 kernel [5]. That alone is not enough to ensure that
code using the clone() and clone3() functions conforms to [1] when it runs
on a system that provides SME, hence this change.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-14 09:42:46 +01:00


/* Copyright (C) 2005-2025 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <https://www.gnu.org/licenses/>.  */
#ifndef _LINUX_AARCH64_SYSDEP_H
#define _LINUX_AARCH64_SYSDEP_H 1
#include <sysdeps/unix/sysdep.h>
#include <sysdeps/aarch64/sysdep.h>
#include <sysdeps/unix/sysv/linux/sysdep.h>
/* Defines RTLD_PRIVATE_ERRNO and USE_DL_SYSINFO. */
#include <dl-sysdep.h>
#include <tls.h>
/* In order to get __set_errno() definition in INLINE_SYSCALL. */
#ifndef __ASSEMBLER__
#include <errno.h>
#endif
/* For Linux we can use the system call table in the header file
	/usr/include/asm/unistd.h
   of the kernel.  But these symbols do not follow the SYS_* syntax
   so we have to redefine the `SYS_ify' macro here.  */
#undef SYS_ify
#define SYS_ify(syscall_name) (__NR_##syscall_name)
#ifdef __ASSEMBLER__
/* Linux uses a negative return value to indicate syscall errors,
   unlike most Unices, which use the condition codes' carry flag.

   Since version 2.1 the return value of a system call might be
   negative even if the call succeeded.  E.g., the `lseek' system call
   might return a large offset.  Therefore we must not anymore test
   for < 0, but test for a real error by making sure the value in R0
   is a real error number.  Linus said he will make sure that no syscall
   returns a value in -1 .. -4095 as a valid result so we can safely
   test with -4095.  */
# undef PSEUDO
# define PSEUDO(name, syscall_name, args) \
  .text; \
  ENTRY (name); \
  DO_CALL (syscall_name, args); \
  cmn x0, #4095; \
  b.cs .Lsyscall_error;

# undef PSEUDO_END
# define PSEUDO_END(name) \
  SYSCALL_ERROR_HANDLER \
  END (name)

# undef PSEUDO_NOERRNO
# define PSEUDO_NOERRNO(name, syscall_name, args) \
  .text; \
  ENTRY (name); \
  DO_CALL (syscall_name, args);

# undef PSEUDO_END_NOERRNO
# define PSEUDO_END_NOERRNO(name) \
  END (name)

# define ret_NOERRNO ret

/* The function has to return the error code.  */
# undef PSEUDO_ERRVAL
# define PSEUDO_ERRVAL(name, syscall_name, args) \
  .text; \
  ENTRY (name) \
  DO_CALL (syscall_name, args); \
  neg x0, x0

# undef PSEUDO_END_ERRVAL
# define PSEUDO_END_ERRVAL(name) \
  END (name)

# define ret_ERRVAL ret
# if !IS_IN (libc)
#  define SYSCALL_ERROR .Lsyscall_error
#  if RTLD_PRIVATE_ERRNO
#   define SYSCALL_ERROR_HANDLER \
.Lsyscall_error: \
  adrp x1, C_SYMBOL_NAME(rtld_errno); \
  neg w0, w0; \
  str w0, [x1, :lo12:C_SYMBOL_NAME(rtld_errno)]; \
  mov x0, -1; \
  RET;
#  else
#   define SYSCALL_ERROR_HANDLER \
.Lsyscall_error: \
  adrp x1, :gottprel:errno; \
  neg w2, w0; \
  ldr PTR_REG (1), [x1, :gottprel_lo12:errno]; \
  mrs x3, tpidr_el0; \
  mov x0, -1; \
  str w2, [x1, x3]; \
  RET;
#  endif
# else
#  define SYSCALL_ERROR __syscall_error
#  define SYSCALL_ERROR_HANDLER \
.Lsyscall_error: \
  b __syscall_error;
# endif
/* Linux takes system call args in registers:
	syscall number	x8
	arg 1		x0
	arg 2		x1
	arg 3		x2
	arg 4		x3
	arg 5		x4
	arg 6		x5
	arg 7		x6

   The compiler is going to form a call by coming here, through PSEUDO,
   with arguments
	syscall number	in the DO_CALL macro
	arg 1		x0
	arg 2		x1
	arg 3		x2
	arg 4		x3
	arg 5		x4
	arg 6		x5
	arg 7		x6.  */

# undef DO_CALL
# define DO_CALL(syscall_name, args) \
  mov x8, SYS_ify (syscall_name); \
  svc 0

/* Clear the ZA state of SME (assembler version).  The
   __libc_arm_za_disable function has a special calling convention that
   allows calling it without stack manipulation and with most registers
   preserved.  */
.macro CALL_LIBC_ARM_ZA_DISABLE
  mov x13, x30
  .cfi_register x30, x13
  bl __libc_arm_za_disable
  mov x30, x13
  .cfi_register x13, x30
.endm
#else /* not __ASSEMBLER__ */
# define VDSO_NAME "LINUX_2.6.39"
# define VDSO_HASH 123718537
/* List of system calls which are supported as vsyscalls. */
# define HAVE_CLOCK_GETRES64_VSYSCALL "__kernel_clock_getres"
# define HAVE_CLOCK_GETTIME64_VSYSCALL "__kernel_clock_gettime"
# define HAVE_GETTIMEOFDAY_VSYSCALL "__kernel_gettimeofday"
# define HAVE_GETRANDOM_VSYSCALL "__kernel_getrandom"
# define HAVE_CLONE3_WRAPPER 1
# undef INTERNAL_SYSCALL_RAW
# define INTERNAL_SYSCALL_RAW(name, nr, args...) \
  ({ long _sys_result; \
     { \
       LOAD_ARGS_##nr (args) \
       register long _x8 asm ("x8") = (name); \
       asm volatile ("svc 0 // syscall " # name \
                     : "=r" (_x0) : "r" (_x8) ASM_ARGS_##nr : "memory"); \
       _sys_result = _x0; \
     } \
     _sys_result; })

# undef INTERNAL_SYSCALL
# define INTERNAL_SYSCALL(name, nr, args...) \
  INTERNAL_SYSCALL_RAW(SYS_ify(name), nr, args)

# undef INTERNAL_SYSCALL_AARCH64
# define INTERNAL_SYSCALL_AARCH64(name, nr, args...) \
  INTERNAL_SYSCALL_RAW(__ARM_NR_##name, nr, args)
# define LOAD_ARGS_0() \
  register long _x0 asm ("x0");
# define LOAD_ARGS_1(x0) \
  long _x0tmp = (long) (x0); \
  LOAD_ARGS_0 () \
  _x0 = _x0tmp;
# define LOAD_ARGS_2(x0, x1) \
  long _x1tmp = (long) (x1); \
  LOAD_ARGS_1 (x0) \
  register long _x1 asm ("x1") = _x1tmp;
# define LOAD_ARGS_3(x0, x1, x2) \
  long _x2tmp = (long) (x2); \
  LOAD_ARGS_2 (x0, x1) \
  register long _x2 asm ("x2") = _x2tmp;
# define LOAD_ARGS_4(x0, x1, x2, x3) \
  long _x3tmp = (long) (x3); \
  LOAD_ARGS_3 (x0, x1, x2) \
  register long _x3 asm ("x3") = _x3tmp;
# define LOAD_ARGS_5(x0, x1, x2, x3, x4) \
  long _x4tmp = (long) (x4); \
  LOAD_ARGS_4 (x0, x1, x2, x3) \
  register long _x4 asm ("x4") = _x4tmp;
# define LOAD_ARGS_6(x0, x1, x2, x3, x4, x5) \
  long _x5tmp = (long) (x5); \
  LOAD_ARGS_5 (x0, x1, x2, x3, x4) \
  register long _x5 asm ("x5") = _x5tmp;
# define LOAD_ARGS_7(x0, x1, x2, x3, x4, x5, x6) \
  long _x6tmp = (long) (x6); \
  LOAD_ARGS_6 (x0, x1, x2, x3, x4, x5) \
  register long _x6 asm ("x6") = _x6tmp;
# define ASM_ARGS_0
# define ASM_ARGS_1 , "r" (_x0)
# define ASM_ARGS_2 ASM_ARGS_1, "r" (_x1)
# define ASM_ARGS_3 ASM_ARGS_2, "r" (_x2)
# define ASM_ARGS_4 ASM_ARGS_3, "r" (_x3)
# define ASM_ARGS_5 ASM_ARGS_4, "r" (_x4)
# define ASM_ARGS_6 ASM_ARGS_5, "r" (_x5)
# define ASM_ARGS_7 ASM_ARGS_6, "r" (_x6)
# undef INTERNAL_SYSCALL_NCS
# define INTERNAL_SYSCALL_NCS(number, nr, args...) \
INTERNAL_SYSCALL_RAW (number, nr, args)
#undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
#define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
/* Clear the ZA state of SME (C version).  The __libc_arm_za_disable
   function has a special calling convention that allows calling it
   without stack manipulation and with most registers preserved.  */
#define CALL_LIBC_ARM_ZA_DISABLE() \
  ({ \
    unsigned long int __tmp; \
    asm volatile ( \
      " mov %0, x30\n" \
      " .cfi_register x30, %0\n" \
      " bl __libc_arm_za_disable\n" \
      " mov x30, %0\n" \
      " .cfi_register %0, x30\n" \
      : "=r" (__tmp) \
      : \
      : "x14", "x15", "x16", "x17", "x18", "memory"); \
  })
/* Clear the ZA state of SME before making the normal clone syscall.  */
#define INLINE_CLONE_SYSCALL(a0, a1, a2, a3, a4) \
  ({ \
    CALL_LIBC_ARM_ZA_DISABLE (); \
    INLINE_SYSCALL_CALL (clone, a0, a1, a2, a3, a4); \
  })
#endif /* __ASSEMBLER__ */
#endif /* linux/aarch64/sysdep.h */