This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside the clone() and clone3() wrappers.  It
also adds a macro for the inline clone() used in fork() and adds the same
call to the vfork implementation.  This sets the ZA state of SME to "off"
on return from these functions (for both the child and the parent).  The
__arm_za_disable() function is described in [1] (8.1.3).  Note that the
internal Glibc name for this function is __libc_arm_za_disable().

When this change was originally proposed [2,3], it generated a long
discussion in which several questions and concerns were raised.  Here we
address those concerns and explain why this change is useful and, in
fact, necessary.  In a nutshell, a C library that conforms to the AAPCS64
specification [1] (for this change, mainly chapters 6.2 and 6.6) should
call __arm_za_disable() in its clone() and clone3() wrappers.  The
following explains in detail why this is the case.

When we consider using the __arm_za_disable() function inside the
clone() and clone3() libc wrappers, we mean the C library subroutines
clone() and clone3() rather than the syscalls with similar names.  In
the current version of Glibc, clone() is public and clone3() is private,
but its being private is not pertinent to this discussion.

We will begin by stating that this change is NOT a bug fix for something
in the kernel.  The requirement to call __arm_za_disable() does NOT come
from the kernel, nor is it needed to satisfy a contract between the
kernel and userspace.  This is why it is not for the kernel
documentation to describe this requirement.  The requirement instead
stems from a pure userspace scheme outlined in [1]; honouring it ensures
that software that uses Glibc (or any other C library with correct
handling of SME states (see below)) conforms to [1] without having to
become unnecessarily SME-aware and thus lose portability.

To recap (see [1] (6.2)), the SME extension defines SME state, which is
part of the processor state.  Part of this SME state is ZA state, which
is needed to manage the ZA storage register in the context of the ZA
lazy saving scheme [1] (6.6).  This scheme exists because it would be
challenging to handle the ZA storage of SME in either a callee-saved or
a caller-saved manner.  Three kinds of ZA state are defined in terms of
the PSTATE.ZA bit and the TPIDR2_EL0 register (see [1] (6.6.3)):

 - "off":     PSTATE.ZA == 0
 - "active":  PSTATE.ZA == 1 and TPIDR2_EL0 == null
 - "dormant": PSTATE.ZA == 1 and TPIDR2_EL0 != null

As [1] (6.7.2) outlines, every subroutine has exactly one SME interface,
defined by the ZA states permitted on entry and on normal return from a
call to that subroutine.  Callers of a subroutine must know and respect
the ZA interface of the subroutines they use; using a subroutine in a
way its ZA interface does not permit is undefined behaviour.

In particular, clone() and clone3() (the C library functions) have the
private-ZA interface.  This means that the permitted ZA states on entry
are "off" and "dormant", and the permitted states on return are "off"
and "dormant" (the latter if and only if the state was "dormant" on
entry).  Both functions must therefore correctly handle both the "off"
and "dormant" ZA states on entry.  This change ensures that the ZA state
on return is always "off".
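
For illustration only (not part of this change): the three states can be
distinguished at runtime by reading PSTATE.ZA (bit 1 of the SVCR system
register) and TPIDR2_EL0.  A minimal sketch, assuming FEAT_SME is
implemented and enabled (otherwise the MRS instructions are undefined)
and using raw system-register encodings so any assembler accepts them:

    enum za_state { ZA_OFF, ZA_ACTIVE, ZA_DORMANT };

    static enum za_state
    current_za_state (void)
    {
      unsigned long svcr, tpidr2;
      /* SVCR, encoded s3_3_c4_c2_2; bit 1 is PSTATE.ZA.  */
      asm volatile ("mrs %0, s3_3_c4_c2_2" : "=r" (svcr));
      /* TPIDR2_EL0, encoded s3_3_c13_c0_5; the lazy-save pointer.  */
      asm volatile ("mrs %0, s3_3_c13_c0_5" : "=r" (tpidr2));
      if ((svcr & 2) == 0)
        return ZA_OFF;
      return tpidr2 == 0 ? ZA_ACTIVE : ZA_DORMANT;
    }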
Note that, in the context of clone() and clone3(), "on return" means the
point at which execution resumes after transferring from clone() or
clone3().  For the caller (we may refer to it as the "parent") this is
the return address in the link register to which the RET instruction
jumps.  For the "child", it is the target branch address.  So the "off"
state on return is permitted and conformant.

Why can't we retain the "dormant" state?  In theory we could, but we
shouldn't; here is why.  Every subroutine with a private-ZA interface,
including clone() and clone3(), must comply with the lazy saving scheme
[1] (6.7.2).  This puts additional responsibility on a subroutine whose
ZA state on return is "dormant", because that state has a special
meaning: the "caller" (the place in code where execution is transferred
to, so this includes both "parent" and "child") may check the ZA state
and rely on the semantics of the "dormant" state outlined in [1] (6.6.6
and 6.6.7).  Conforming to this would require more code inside clone()
and clone3(), which is hardly desirable.  For the return to the "parent"
this could be achieved in theory, but given that neither clone() nor
clone3() is supposed to be used in the middle of an SME operation, it
wouldn't be useful.  For the "return" to the "child" it would be
particularly difficult to achieve given the complexity of these
functions and their interfaces.  Most importantly, it would be illegal
and somewhat meaningless to let a "child" start execution in the
"dormant" ZA state, because the very essence of the "dormant" state
implies that there is a place to return to and an outer context that we
are allowed to interact with.

To sum up, calling __arm_za_disable() to ensure the "off" ZA state when
execution resumes after a call to clone() or clone3() is correct and
also the simplest way to conform to [1].

Can there be situations when we can avoid calling __arm_za_disable()?
Calling __arm_za_disable() implies a certain (sufficiently small)
overhead, so one might reasonably want to avoid the call when possible.
The most trivial such cases (e.g. when the calling thread doesn't have
access to SME or to the TPIDR2_EL0 register) are already handled by the
function itself (see [1] (8.1.3 and 8.1.2)).  Reasoning about other
possible use cases would require making the code inside clone() and
clone3() more complicated, which would defeat the point of optimising
away the call.

Why can't the kernel do this instead?  The handling of SME state by the
kernel is described in [4].  In short, the kernel must not impose a
specific ZA interface onto a userspace function.  Interaction with the
kernel happens (among other things) via system calls.  In Glibc, many
system calls (notably including SYS_clone and SYS_clone3) are used via
wrappers; the kernel has no control over them and cannot dictate how
they behave, because that is simply outside the kernel's remit.
However, in certain cases the kernel can ensure that a "child" doesn't
start in an incorrect state, which is what the recent change included
in the 6.16 kernel does [5].  That alone is not enough to ensure that
code using the clone() and clone3() functions conforms to [1] when it
runs on a system that provides SME, hence this change.
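
Concretely, the change reduces to performing the disable call right
before the syscall.  For the inline-clone path used by fork(), the new
macro (quoted from the aarch64 sysdep.h shown below) captures the whole
pattern; the assembly wrappers do the equivalent immediately before
their SVC instruction:

    #define INLINE_CLONE_SYSCALL(a0, a1, a2, a3, a4)	\
    ({							\
      CALL_LIBC_ARM_ZA_DISABLE ();			\
      INLINE_SYSCALL_CALL (clone, a0, a1, a2, a3, a4);	\
    })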
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

/* Copyright (C) 2005-2025 Free Software Foundation, Inc.

   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <https://www.gnu.org/licenses/>.  */

#ifndef _LINUX_AARCH64_SYSDEP_H
#define _LINUX_AARCH64_SYSDEP_H 1

#include <sysdeps/unix/sysdep.h>
#include <sysdeps/aarch64/sysdep.h>
#include <sysdeps/unix/sysv/linux/sysdep.h>

/* Defines RTLD_PRIVATE_ERRNO and USE_DL_SYSINFO.  */
#include <dl-sysdep.h>

#include <tls.h>

/* In order to get the __set_errno() definition in INLINE_SYSCALL.  */
#ifndef __ASSEMBLER__
#include <errno.h>
#endif

/* For Linux we can use the system call table in the header file
	/usr/include/asm/unistd.h
   of the kernel.  But these symbols do not follow the SYS_* syntax
   so we have to redefine the `SYS_ify' macro here.  */
#undef SYS_ify
#define SYS_ify(syscall_name)	(__NR_##syscall_name)

#ifdef __ASSEMBLER__

/* Linux uses a negative return value to indicate syscall errors,
   unlike most Unices, which use the condition codes' carry flag.

   Since version 2.1 the return value of a system call might be
   negative even if the call succeeded.  E.g., the `lseek' system call
   might return a large offset.  Therefore we must no longer test
   for < 0, but test for a real error by making sure the value in R0
   is a real error number.  Linus said he will make sure that no syscall
   returns a value in -1 .. -4095 as a valid result so we can safely
   test with -4095.  */

# undef	PSEUDO
# define PSEUDO(name, syscall_name, args)				      \
  .text;								      \
  ENTRY (name);								      \
    DO_CALL (syscall_name, args);					      \
    cmn x0, #4095;							      \
    b.cs .Lsyscall_error;

# undef	PSEUDO_END
# define PSEUDO_END(name)						      \
  SYSCALL_ERROR_HANDLER							      \
  END (name)

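/* Illustration (hypothetical wrapper, not part of this header): for a
   two-argument syscall `foo', PSEUDO (__foo, foo, 2), a `ret', and
   PSEUDO_END (__foo) assemble roughly to

	.text
	ENTRY (__foo)
	  mov	x8, __NR_foo
	  svc	0
	  cmn	x0, #4095	// C is set iff x0 is in [-4095, -1]
	  b.cs	.Lsyscall_error
	  ret
	SYSCALL_ERROR_HANDLER	// defined below
	END (__foo)
*/
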
# undef	PSEUDO_NOERRNO
# define PSEUDO_NOERRNO(name, syscall_name, args)			      \
  .text;								      \
  ENTRY (name);								      \
    DO_CALL (syscall_name, args);

# undef	PSEUDO_END_NOERRNO
# define PSEUDO_END_NOERRNO(name)					      \
  END (name)

# define ret_NOERRNO ret

/* The function has to return the error code.  */
# undef	PSEUDO_ERRVAL
# define PSEUDO_ERRVAL(name, syscall_name, args)			      \
  .text;								      \
  ENTRY (name)								      \
    DO_CALL (syscall_name, args);					      \
    neg x0, x0

# undef	PSEUDO_END_ERRVAL
# define PSEUDO_END_ERRVAL(name)					      \
  END (name)

# define ret_ERRVAL ret

# if !IS_IN (libc)
#  define SYSCALL_ERROR  .Lsyscall_error
#  if RTLD_PRIVATE_ERRNO
#   define SYSCALL_ERROR_HANDLER				\
.Lsyscall_error:						\
	adrp	x1, C_SYMBOL_NAME(rtld_errno);			\
	neg	w0, w0;						\
	str	w0, [x1, :lo12:C_SYMBOL_NAME(rtld_errno)];	\
	mov	x0, -1;						\
	RET;
#  else

#   define SYSCALL_ERROR_HANDLER				\
.Lsyscall_error:						\
	adrp	x1, :gottprel:errno;				\
	neg	w2, w0;						\
	ldr	PTR_REG(1), [x1, :gottprel_lo12:errno];		\
	mrs	x3, tpidr_el0;					\
	mov	x0, -1;						\
	str	w2, [x1, x3];					\
	RET;
#  endif
# else
#  define SYSCALL_ERROR __syscall_error
#  define SYSCALL_ERROR_HANDLER					\
.Lsyscall_error:						\
	b	__syscall_error;
# endif

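/* In the !RTLD_PRIVATE_ERRNO case above, the error path negates the
   returned code into w2, computes the address of the thread-local
   `errno' from its initial-exec TLS offset (loaded from the GOT via
   :gottprel:) plus the thread pointer in tpidr_el0, stores the code
   there, and returns -1.  */
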
/* Linux takes system call args in registers:
	syscall number	x8
	arg 1		x0
	arg 2		x1
	arg 3		x2
	arg 4		x3
	arg 5		x4
	arg 6		x5
	arg 7		x6

   The compiler is going to form a call by coming here, through PSEUDO, with
   arguments
	syscall number	in the DO_CALL macro
	arg 1		x0
	arg 2		x1
	arg 3		x2
	arg 4		x3
	arg 5		x4
	arg 6		x5
	arg 7		x6

*/

# undef	DO_CALL
# define DO_CALL(syscall_name, args)		\
    mov x8, SYS_ify (syscall_name);		\
    svc 0

/* Clear the ZA state of SME (assembly version).  */
/* The __libc_arm_za_disable function has a special calling convention
   that allows it to be called without stack manipulation, and it
   preserves most registers.  */
	.macro CALL_LIBC_ARM_ZA_DISABLE
	mov		x13, x30
	.cfi_register	x30, x13
	bl		__libc_arm_za_disable
	mov		x30, x13
	.cfi_register	x13, x30
	.endm

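/* Usage sketch (hypothetical, not from this file): an assembly clone
   wrapper invokes the macro immediately before its SVC, e.g.

	CALL_LIBC_ARM_ZA_DISABLE
	mov	x8, SYS_ify (clone)
	svc	0

   The return address is parked in x13 because __libc_arm_za_disable
   only clobbers x14-x18 (see the C version's clobber list below), so
   no stack frame is required.  */
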
#else /* not __ASSEMBLER__ */

# define VDSO_NAME  "LINUX_2.6.39"
# define VDSO_HASH  123718537

/* List of system calls which are supported as vsyscalls.  */
# define HAVE_CLOCK_GETRES64_VSYSCALL	"__kernel_clock_getres"
# define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
# define HAVE_GETTIMEOFDAY_VSYSCALL	"__kernel_gettimeofday"
# define HAVE_GETRANDOM_VSYSCALL	"__kernel_getrandom"

# define HAVE_CLONE3_WRAPPER		1

# undef INTERNAL_SYSCALL_RAW
# define INTERNAL_SYSCALL_RAW(name, nr, args...)		\
  ({ long _sys_result;						\
     {								\
       LOAD_ARGS_##nr (args)					\
       register long _x8 asm ("x8") = (name);			\
       asm volatile ("svc	0	// syscall " # name	\
		     : "=r" (_x0) : "r"(_x8) ASM_ARGS_##nr : "memory");	\
       _sys_result = _x0;					\
     }								\
     _sys_result; })

# undef INTERNAL_SYSCALL
# define INTERNAL_SYSCALL(name, nr, args...)			\
	INTERNAL_SYSCALL_RAW(SYS_ify(name), nr, args)

# undef INTERNAL_SYSCALL_AARCH64
# define INTERNAL_SYSCALL_AARCH64(name, nr, args...)		\
	INTERNAL_SYSCALL_RAW(__ARM_NR_##name, nr, args)

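/* For example, INTERNAL_SYSCALL (getpid, 0) expands to
   INTERNAL_SYSCALL_RAW (__NR_getpid, 0): the syscall number is
   materialised in x8, any arguments in x0..x6, and the kernel's
   return value is read back from x0.  */
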
# define LOAD_ARGS_0()				\
  register long _x0 asm ("x0");
# define LOAD_ARGS_1(x0)			\
  long _x0tmp = (long) (x0);			\
  LOAD_ARGS_0 ()				\
  _x0 = _x0tmp;
# define LOAD_ARGS_2(x0, x1)			\
  long _x1tmp = (long) (x1);			\
  LOAD_ARGS_1 (x0)				\
  register long _x1 asm ("x1") = _x1tmp;
# define LOAD_ARGS_3(x0, x1, x2)		\
  long _x2tmp = (long) (x2);			\
  LOAD_ARGS_2 (x0, x1)				\
  register long _x2 asm ("x2") = _x2tmp;
# define LOAD_ARGS_4(x0, x1, x2, x3)		\
  long _x3tmp = (long) (x3);			\
  LOAD_ARGS_3 (x0, x1, x2)			\
  register long _x3 asm ("x3") = _x3tmp;
# define LOAD_ARGS_5(x0, x1, x2, x3, x4)	\
  long _x4tmp = (long) (x4);			\
  LOAD_ARGS_4 (x0, x1, x2, x3)			\
  register long _x4 asm ("x4") = _x4tmp;
# define LOAD_ARGS_6(x0, x1, x2, x3, x4, x5)	\
  long _x5tmp = (long) (x5);			\
  LOAD_ARGS_5 (x0, x1, x2, x3, x4)		\
  register long _x5 asm ("x5") = _x5tmp;
# define LOAD_ARGS_7(x0, x1, x2, x3, x4, x5, x6)\
  long _x6tmp = (long) (x6);			\
  LOAD_ARGS_6 (x0, x1, x2, x3, x4, x5)		\
  register long _x6 asm ("x6") = _x6tmp;

# define ASM_ARGS_0
# define ASM_ARGS_1	, "r" (_x0)
# define ASM_ARGS_2	ASM_ARGS_1, "r" (_x1)
# define ASM_ARGS_3	ASM_ARGS_2, "r" (_x2)
# define ASM_ARGS_4	ASM_ARGS_3, "r" (_x3)
# define ASM_ARGS_5	ASM_ARGS_4, "r" (_x4)
# define ASM_ARGS_6	ASM_ARGS_5, "r" (_x5)
# define ASM_ARGS_7	ASM_ARGS_6, "r" (_x6)

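/* Note on the LOAD_ARGS_* pattern above: each argument expression is
   first evaluated into a plain temporary before any of the x0..x6
   register variables is bound.  This way, evaluating one argument
   (which may itself contain function calls) cannot clobber a register
   that already holds an earlier argument.  */
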
# undef INTERNAL_SYSCALL_NCS
# define INTERNAL_SYSCALL_NCS(number, nr, args...)	\
	INTERNAL_SYSCALL_RAW (number, nr, args)

#undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
#define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1

/* Clear the ZA state of SME (C version).  */
/* The __libc_arm_za_disable function has a special calling convention
   that allows it to be called without stack manipulation, and it
   preserves most registers.  */
#define CALL_LIBC_ARM_ZA_DISABLE()			\
({							\
  unsigned long int __tmp;				\
  asm volatile (					\
  "	mov		%0, x30\n"			\
  "	.cfi_register	x30, %0\n"			\
  "	bl		__libc_arm_za_disable\n"	\
  "	mov		x30, %0\n"			\
  "	.cfi_register	%0, x30\n"			\
  : "=r" (__tmp)					\
  :							\
  : "x14", "x15", "x16", "x17", "x18", "memory" );	\
})

/* Clear the ZA state of SME before making the normal clone syscall.  */
#define INLINE_CLONE_SYSCALL(a0, a1, a2, a3, a4)	\
({							\
  CALL_LIBC_ARM_ZA_DISABLE ();				\
  INLINE_SYSCALL_CALL (clone, a0, a1, a2, a3, a4);	\
})

#endif	/* __ASSEMBLER__ */

#endif /* linux/aarch64/sysdep.h */