mirror of https://sourceware.org/git/glibc.git synced 2025-08-08 17:42:12 +03:00

16973 Commits

Author SHA1 Message Date
Claudiu Zissulescu
86fbb4cbb8 sframe: Add support for SFRAME_F_FDE_FUNC_START_PCREL flag
The SFrame V2 format has a new erratum that introduces the
SFRAME_F_FDE_FUNC_START_PCREL flag. This flag indicates how the
sfde_func_start_address field of an SFrame FDE is encoded:

- if set, sfde_func_start_address field contains the offset in bytes
to the start PC of the associated function from the field itself.

- if unset, sfde_func_start_address field contains the offset in bytes
to the start PC of the associated function from the start of the
SFrame section.
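The following is a minimal C sketch of how a consumer might resolve the function start address under both encodings. The helper name, the macro name and the flag value 0x4 are assumptions made for this example, so take the real flag from the SFrame format definition. The field is also assumed to hold a signed 32-bit offset.

```c
#include <stdint.h>

/* Assumed flag value for this sketch only; use the definition from the
   SFrame format header instead of hard-coding it.  */
#define SKETCH_F_FDE_FUNC_START_PCREL 0x4

/* Hypothetical helper: compute the runtime start PC of the function that
   an SFrame FDE describes.
   SECTION_ADDR    runtime address of the start of the .sframe section.
   FIELD_ADDR      runtime address of this FDE's sfde_func_start_address.
   FIELD_VALUE     signed 32-bit value stored in that field (assumed).
   PREAMBLE_FLAGS  flags byte from the SFrame preamble.  */
static uint64_t
sframe_func_start_pc (uint64_t section_addr, uint64_t field_addr,
                      int32_t field_value, uint8_t preamble_flags)
{
  /* Flag set: the offset is relative to the field itself.
     Flag unset: the offset is relative to the start of the section.  */
  uint64_t base = (preamble_flags & SKETCH_F_FDE_FUNC_START_PCREL)
                  ? field_addr : section_addr;
  /* Sign-extend, then rely on well-defined unsigned wrap-around so that
     negative offsets move the address backwards.  */
  return base + (uint64_t) (int64_t) field_value;
}
```

For example, a field at 0x1020 that holds -0x20 resolves to 0x1000 when the flag is set. When the flag is clear, it resolves to 0xfe0 for a section starting at 0x1000.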

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: Sam James <sam@gentoo.org>
2025-07-24 19:38:50 +02:00
Adhemerval Zanella
a12d72019e Disable SFrame support by default
Also add extra checks so that it is enabled only with binutils 2.45 and
only if the architecture explicitly enables it.  When SFrame is disabled,
none of the related code is enabled for backtrace() and _dl_find_object(),
so SFrame backtracing is not used even if the binary has an SFrame segment.

This patch also adds some other related fixes:

  * Fix an issue with AC_CHECK_PROG_VER, where the READELF_SFRAME
    usage prevented specifying a different readelf through the READELF
    environment variable at configure time.

  * Add an extra arch-specific internal definition,
    libc_cv_support_sframe, to disable --enable-sframe on architectures
    that have binutils support but not glibc support (s390x).

  * Rename the tests without the .sframe segment and move
    tst-backtrace1 from pthread to debug.

  * Use the strip that belongs to the build toolchain to remove the
    .sframe segment, instead of the system one (which might not support
    SFrame).

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Sam James <sam@gentoo.org>
2025-07-24 19:38:47 +02:00
Florian Weimer
0f93d54cde Revert "Linux: Keep termios ioctl constants strictly internal"
This reverts commit 3d3572f590.

Reason for revert: TCGETS etc. work to some extent on at least
a subset of architectures, so there is no pressing need to force
applications off them.  Removal of the macros breaks building
the sanitizers, impacting both GCC and LLVM.

Reviewed-by: Sam James <sam@gentoo.org>
2025-07-21 15:13:08 +02:00
H.J. Lu
aec8498873 x86-64: Properly compile ISA optimized modf and modff
There are 3 variants of modf and modff: SSE2, SSE4.1 and AVX.  s_modf.c
and s_modff.c include the generic implementation compiled with the minimum
x86 ISA level.  The IFUNC selector is used only if the minimum ISA level
is less than AVX.  The SSE4.1 variant is included only if the ISA level is
less than SSE4.1.  The AVX variant is included only if the ISA level is
less than AVX.
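The selection rule above can be modelled as a small pure function. This is a hypothetical sketch. The enum, the function name and the reduction of the x86-64 ISA levels to three steps are simplifications for illustration. It is not glibc's IFUNC selector, which queries CPU features at load time.

```c
/* Hypothetical ISA levels, ordered so that a larger value implies
   support for everything below it.  */
enum isa_level { ISA_SSE2 = 1, ISA_SSE4_1 = 2, ISA_AVX = 3 };

/* Return the ISA level of the modf variant that would run.  BASELINE is
   the minimum ISA level the library was built for, and the generic code
   is always compiled for it.  CPU is the level the running CPU supports.  */
static enum isa_level
select_modf_variant (enum isa_level baseline, enum isa_level cpu)
{
  int have_sse4_1_variant = baseline < ISA_SSE4_1;
  int have_avx_variant = baseline < ISA_AVX;

  /* Baseline is already AVX: no IFUNC, the generic code is used.  */
  if (!have_avx_variant)
    return baseline;
  if (cpu >= ISA_AVX)
    return ISA_AVX;
  if (have_sse4_1_variant && cpu >= ISA_SSE4_1)
    return ISA_SSE4_1;
  /* Fall back to the generic code built for the baseline.  */
  return baseline;
}
```

With an SSE4.1 baseline, no separate SSE4.1 variant exists, so an SSE4.1-only CPU runs the generic code built for that baseline.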

The AVX variant should be compiled with -mavx, not with -msse2avx -DSSE2AVX,
which are used to encode SSE assembly sources with EVEX encoding.

The routines that are shared between libc and libm should use different
rules to avoid using the same MODULE_NAME.  This avoids potential issues
like BZ #33165, where __stack_chk_fail was not routed to the internal
symbol.

Tested with -march=x86-64, -march=x86-64-v2, -march=x86-64-v3 and
-march=x86-64-v4.

This fixes BZ #33165 and BZ #33173.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-07-18 10:22:19 -07:00
H.J. Lu
13bf7812ef x86-64: Compile ISA versions of modf/modff with -fno-stack-protector
Since modf and modff are compiled into both libc and libm, when glibc is
configured with --enable-stack-protector=all, ISA versions of modf and
modff should be compiled with -fno-stack-protector to avoid calling
__stack_chk_fail via PLT in libc.so.

This fixes BZ #33165.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
2025-07-17 05:49:47 -07:00
Claudiu Zissulescu
3360913c37 elf: Add SFrame stack tracing
This patch adds the necessary bits to enable stack tracing using
SFrame.  If the new SFrame stack tracing procedure doesn't find
SFrame-related information, stack tracing falls back to the default
DWARF implementation.

The new SFrame stack tracing procedure is added to debug/backtrace.c;
the support functions are added in the sysdeps folder, namely
sframe.h, read-sframe.c and read-sframe.h.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-07-14 10:56:37 +01:00
Claudiu Zissulescu
b231c21fc6 aarch64: Add SFrame support for aarch64 architecture
SFrame is supported on the AArch64 architecture.
Enable the SFrame stack tracer for AArch64 too.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-07-14 10:56:36 +01:00
Claudiu Zissulescu
170206b641 x86: Add SFrame support for x86 architecture
SFrame is well supported on the x86 architecture since binutils 2.41.
Enable it to be used as the default frame tracer.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-07-14 10:56:36 +01:00
Adhemerval Zanella
c055c54e96 x86_64: Optimize modf/modff for x86_64-v2
SSE4.1 provides a direct instruction for trunc, which improves
modf/modff performance with a smaller text size.  On Ryzen 9 (zen3) with
gcc 14.2.1:

x86_64-v2
reciprocal-throughput        master        patch       difference
workload-0_1                 7.9610       7.7914            2.13%
workload-1_maxint            9.4323       7.8021           17.28%
workload-maxint_maxfloat     8.7379       7.8049           10.68%
workload-integral            7.9492       7.7991            1.89%

latency                      master        patch       difference
workload-0_1                 7.9511      10.8910          -36.97%
workload-1_maxint           15.8278      10.9048           31.10%
workload-maxint_maxfloat    11.3495      10.9139            3.84%
workload-integral           11.5938      10.9071            5.92%

x86_64-v3
reciprocal-throughput        master        patch       difference
workload-0_1                 8.7522       7.9781            8.84%
workload-1_maxint            9.6690       7.9872           17.39%
workload-maxint_maxfloat     8.7634       7.9857            8.87%
workload-integral            8.7397       7.9893            8.59%

latency                      master        patch       difference
workload-0_1                 8.7447       9.5589           -9.31%
workload-1_maxint           13.7480       9.5690           30.40%
workload-maxint_maxfloat    10.0092       9.5680            4.41%
workload-integral            9.7518       9.5743            1.82%

For x86_64-v1 the optimization is done through a new ifunc selector.
The AVX variant follows other SSE4.1 optimizations (like trunc) to avoid
the ifunc for x86_64-v3.

Checked on x86_64-linux-gnu.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-07-11 13:01:31 -03:00
Florian Weimer
3d3572f590 Linux: Keep termios ioctl constants strictly internal
Undefine TCGETS, TCGETS2, and related ioctl constants in the installed
headers.  Extract the correct constants (using the kernel type
definitions) automatically from the UAPI headers.  The kernel
constants are available under KERNEL_* names during the glibc build,
computed using the assembler constant extraction mechanism.

Alpha may have to use TCGETS instead of TCGETS2 because TCGETS2
became available only in Linux 4.20.  Introduce ARCH_TCGETS to make
this choice explicit.

To support emulation on powerpc, glibc versions of the termios
constants are added to the emulation code in internal-ioctl.h.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-07-11 16:04:07 +02:00
Andreas Schwab
d6c2760ef7 Remove termios2 ioctl definitions from public headers
The use of the termios2 ioctl interface is an implementation detail which
should not bleed into public headers.  Remove the PowerPC version of
<bits/ioctls.h> and define the termios2 ioctl numbers in <termios_arch.h>
instead.  Also remove the include check from there, which is unneeded in an
internal header.
2025-07-10 11:39:43 +02:00
H.J. Lu
7130c2ae97 x86: Avoid vector/r16-r31 registers and memcpy/memset in mcount_internal
Since mcount_internal is called from mcount/__fentry__, which preserve
only RAX, RCX, RDX, RSI, RDI, R8 and R9, compile mcount.c with

-fno-tree-loop-distribute-patterns -mgeneral-regs-only -mno-apxf

to avoid vector/r16-r31 registers and memcpy/memset in mcount_internal.
This fixes BZ #33134.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
2025-07-09 05:33:05 +08:00
Samuel Thibault
6afece738c htl: move __pthread_get_cleanup_stack to libc
This fixes the cleanup call from __qsort_r
2025-07-06 19:56:15 +00:00
Samuel Thibault
b80f108b55 htl: Drop ptr_pthread_once from pthread_functions
It has been unused since ccdb68e829 ("htl: move pthread_once into libc").
2025-07-06 10:02:17 +00:00
Florian Weimer
ea85e7d550 elf: Restore support for _r_debug interpositions and copy relocations
The changes in commit a93d9e03a3
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool.  It brings its
own definition of _r_debug (rather than a declaration).

Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC.  To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play.  Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.

Only perform the copy relocation initialization if libc has been
loaded.  Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol may fail.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-05 20:15:12 +02:00
Florian Weimer
8329939a37 elf: Introduce _dl_debug_change_state
It combines updating r_state with the debugger notification.

The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-05 20:15:12 +02:00
Florian Weimer
7278d11f3a elf: Introduce separate _r_debug_array variable
It replaces the ns_debug member of the namespaces.  Previously,
the base namespace had an unused ns_debug member.

This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed.  (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)
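The ordering fix follows the usual publish pattern: initialize the new object completely, then make it reachable. Below is a C11 sketch that uses a simplified, hypothetical struct in place of struct r_debug. It uses an explicit release store and acquire load. The commit itself relies on CPU load dependency ordering without explicit barriers.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for one namespace's debug record.  */
struct debug_ns
{
  int version;
  void *map;                          /* head of the link map list */
  _Atomic(struct debug_ns *) next;    /* next namespace, or NULL */
};

/* Writer: fully initialize NEW_NS, then link it from PREV with a release
   store, so any reader that sees the pointer also sees the fields.  */
static void
publish_namespace (struct debug_ns *prev, struct debug_ns *new_ns,
                   int version, void *map)
{
  new_ns->version = version;
  new_ns->map = map;
  atomic_init (&new_ns->next, NULL);
  atomic_store_explicit (&prev->next, new_ns, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above.  */
static struct debug_ns *
next_namespace (struct debug_ns *ns)
{
  return atomic_load_explicit (&ns->next, memory_order_acquire);
}
```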

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-05 20:15:12 +02:00
Samuel Thibault
21cbe4a588 hurd: Mark more xfails for missing RLIMIT_AS support
2025-07-05 11:13:46 +02:00
Florian Weimer
1c5f2ae4f9 Linux: Fix typo in comment in termios_internals.h
2025-07-04 13:17:31 +02:00
Adhemerval Zanella
eeb7b079d5 stdlib: Fix __libc_message_impl iovec size (BZ 32947)
The iovec size should account for all substrings between each conversion
specification.  For the format:

  "abc %s efg"

the list of substrings is:

  ["abc ", arg, " efg"]

which is twice the maximum number of arguments *plus* one.
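Here is a simplified model of that count that treats only "%s" as a conversion. The macro and function names are hypothetical, and this is not the __libc_message_impl parser. It only shows why N conversions need at most 2 * N + 1 iovec entries.

```c
#include <stddef.h>
#include <string.h>

/* Worst case for N conversions: a literal before each conversion, the
   argument itself, and one trailing literal.  */
#define IOVEC_MAX(n) (2 * (n) + 1)

/* Count the iovec entries needed for FMT, treating only "%s" as a
   conversion.  Empty literal pieces need no entry.  */
static size_t
count_iovecs (const char *fmt)
{
  size_t count = 0;
  const char *lit = fmt;
  const char *p;

  while ((p = strstr (lit, "%s")) != NULL)
    {
      if (p > lit)
        count++;          /* literal text before the conversion */
      count++;            /* the argument string */
      lit = p + 2;
    }
  if (*lit != '\0')
    count++;              /* trailing literal text */
  return count;
}
```

For "abc %s efg" this yields 3 entries, which matches IOVEC_MAX(1).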

This issue triggered 'out of bounds' errors in stdlib/tst-bz20544 when
glibc is built with experimental UBSAN support [1].

Besides adjusting the iovec size, a new runtime check is added to
avoid wrong __libc_message_impl usage.

Checked on x86_64-linux-gnu.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/ubsan-undef

Co-authored-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-06-30 13:51:41 -03:00
Wilco Dijkstra
681a24ae4d AArch64: Avoid memset ifunc in cpu-features.c [BZ #33112]
During early startup, memcpy and memset must not be called, since many targets
use ifuncs for them that won't have been initialized yet.  Security hardening may
use -ftrivial-auto-var-init=zero, which inserts calls to memset.  Redirect
memset to memset_generic by including dl-symbol-redir-ifunc.h in cpu-features.c.
This fixes BZ #33112.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-06-30 13:17:38 +00:00
Adhemerval Zanella
79bfbc93de powerpc: Remove modf optimization
The generic implementation is slightly more optimized than the powerpc
one: it has a more optimized inf/nan check (it does not use FP unit
checks, and it has branch prediction hints), and it removes one
branch by issuing trunc instead of a combination of floor/ceil (which
also generates less code).

On power10 with gcc 14.2.1:

reciprocal-throughput        master         patch        difference
workload-0_1                 1.1351        0.9067            20.12%
workload-1_maxint            1.4230        0.9040            36.47%
workload-maxint_maxfloat     1.5038        0.9076            39.65%
workload-integral            1.1280        0.9111            19.23%

latency                      master         patch        difference
workload-0_1                 1.1440        2.7117          -137.03%
workload-1_maxint            4.0556        2.7070            33.25%
workload-maxint_maxfloat     3.2122        2.7164            15.43%
workload-integral            3.2381        2.7281            15.75%

Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
2025-06-25 15:05:30 -03:00
Adhemerval Zanella
5c2b21c478 powerpc: Remove modff optimization
The generic implementation is slightly more optimized than the powerpc
one: it has a more optimized inf/nan check (it does not use FP unit
checks, and it has branch prediction hints), and it removes one
branch by issuing trunc instead of a combination of floor/ceil (which
also generates less code).

On power10 with gcc 14.2.1:

reciprocal-throughput        master        patch        difference
workload-0_1                 1.5210       1.3942             8.34%
workload-1_maxint            2.0926       1.3940            33.38%
workload-maxint_maxfloat     1.7851       1.3940            21.91%
workload-integral            1.5216       1.3941             8.37%

latency                      master        patch        difference
workload-0_1                 1.5928       2.6337           -65.35%
workload-1_maxint            3.2929       2.6337            20.02%
workload-maxint_maxfloat     1.9697       2.6341           -33.73%
workload-integral            2.0597       2.6337           -27.87%

Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
2025-06-25 15:05:30 -03:00
Maciej W. Rozycki
36bcbc6b5b Linux: Convert '__close_nocancel_nostatus' to a standalone handler
Make '__close_nocancel_nostatus' standalone.  This is a generic version
analogous to '__close_nocancel'.  Platforms may choose to implement an
inline variant instead where the syscall invocation code sequence is
short enough to be beneficial over a function call.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-24 21:17:25 +01:00
Maciej W. Rozycki
3b0d495ac4 Linux: Fix '__close_nocancel_nostatus' clobbering 'errno' [BZ #33035]
Fix fallout from commit c181840c93 ("Consolidate non cancellable close
call") that caused '__close_nocancel_nostatus' to clobber 'errno' on a
close(2) failure, a 2.27 regression.

The problem came from a rewrite from 'close_not_cancel_no_status' to
'__close_nocancel_nostatus' switching from an inline implementation that
used INTERNAL_SYSCALL macro (which stays away from 'errno') to a call to
'__close_nocancel' function that uses INLINE_SYSCALL_CALL macro (which
does poke at 'errno').

Implement '__close_nocancel_nostatus' in terms of INTERNAL_SYSCALL_CALL
then, which leaves 'errno' intact.
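INTERNAL_SYSCALL_CALL is not available outside glibc. The same observable behaviour can be sketched by saving and restoring errno around close(). The function name is hypothetical. Unlike the internal __close_nocancel_nostatus, close() here remains a cancellation point.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical "no status" close: the result is ignored, and errno is
   left exactly as the caller had it, even if close() fails.  */
static void
close_nostatus (int fd)
{
  int saved_errno = errno;
  (void) close (fd);
  errno = saved_errno;
}
```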

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-24 21:17:25 +01:00
Xi Ruoyao
fc6f074e04 riscv: linux: Add support for getrandom vDSO
Linux kernel >= 6.16 provides getrandom() in the vDSO for RISC-V.  Enable
its use in glibc so that programs using glibc's high-quality random number
functions benefit from it.
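From an application's point of view nothing changes. It keeps calling getrandom(), which is declared in <sys/random.h>, and glibc decides internally whether a vDSO path may be used. The sketch below shows a caller that also handles EINTR and short reads. The helper name is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/* Hypothetical helper: fill BUF with LEN random bytes, retrying on EINTR
   and on short reads.  Returns 0 on success, -1 on error (errno set).  */
static int
fill_random (void *buf, size_t len)
{
  unsigned char *p = buf;

  while (len > 0)
    {
      ssize_t n = getrandom (p, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      p += n;
      len -= (size_t) n;
    }
  return 0;
}
```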

Link: https://git.kernel.org/torvalds/c/ee0d03053e70
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-23 11:48:15 -03:00
Andreas Schwab
9b3730a54b powerpc: use .machine power10 in POWER10 assembler sources
They were misattributed as POWER9 sources.
2025-06-23 14:40:34 +02:00
Collin Funk
b3b0d0308c hurd: Remove a duplicate entry from 'tests-unsupported'.
When building on GNU/Hurd the following warnings repeat themselves:

    ../Rules:400: target '/home/collin/obj/glibc/io/test-lfs.out' given more than once in the same rule
    ../Rules:400: target '/home/collin/obj/glibc/io/test-lfs.out' given more than once in the same rule

This is because commit 73b854e955 (hurd: Mark more memory-hungry tests
as unsupported, 2025-01-12) added it to 'tests-unsupported' even though
it was already added by decf02d382 (hurd: Mark two tests as unsupported,
2023-04-13).
Message-ID: <54dc6bf7e0dbedb1b19356f41fec843c1c523b11.1750130025.git.collin.funk1@gmail.com>
2025-06-21 14:46:53 +02:00
Collin Funk
5071149e89 hurd: Fix redefinition of 'P2ALIGN'.
When building on GNU/Hurd warnings like the following occur:

    ../sysdeps/x86_64/multiarch/strnlen-evex-base.S:53:10: warning: "P2ALIGN" redefined
       53 | # define P2ALIGN(...)   .p2align 4,, 6
          |          ^~~~~~~
    In file included from /usr/include/x86_64-gnu/mach/x86_64/syscall_sw.h:30,
                     from ../sysdeps/mach/sysdep.h:21,
                     from ../sysdeps/mach/x86/sysdep.h:31,
                     from ../sysdeps/x86_64/multiarch/strnlen-evex-base.S:24:
    /usr/include/x86_64-gnu/mach/x86_64/asm.h:78:9: note: this is the location of the previous definition
       78 | #define P2ALIGN(p2)     .p2align p2     /* gas-specific */
          |         ^~~~~~~

The fix is to undefine the macro from system headers in sysdep.h so that
it can be properly defined in assembly files where its definition
depends on whether string functions are being compiled for
wide-characters or not.
Message-ID: <721cd3a1bae1a553857db1dd69761a175f611364.1750131904.git.collin.funk1@gmail.com>
2025-06-21 14:39:36 +02:00
H.J. Lu
0ef7965e5b x86: Update tst-gnu2-tls2 tests
Update the tst-gnu2-tls2 tests to set XMM0...XMM7 to all 1s in malloc, and
verify that XMM registers are preserved across _dl_tlsdesc_dynamic: vectors
are cleared with zeroed XMM registers before the call to _dl_tlsdesc_dynamic,
and the same XMM registers are used to clear vectors after it.  This
improves the BZ #31372 test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2025-06-19 05:46:31 +08:00
H.J. Lu
848f0e46f0 i386: Update ___tls_get_addr to preserve vector registers
Compiler generates the following instruction sequence for dynamic TLS
access:

	leal	tls_var@tlsgd(,%ebx,1), %eax
	call	___tls_get_addr@PLT

The CALL instruction is transparent to the compiler, which assumes all
registers except EFLAGS, AX, CX and DX are unchanged after CALL.  But
___tls_get_addr is a normal function that doesn't preserve any vector
registers.

1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:

_dl_tlsdesc_dynamic:
	/* Like all TLS resolvers, preserve call-clobbered registers.
	   We need two scratch regs anyway.  */
	subl	$32, %esp
	cfi_adjust_cfa_offset (32)

It is wrong to use

	movl	%ebx, -28(%esp)
	movl	%esp, %ebx
	cfi_def_cfa_register(%ebx)
	...
	mov	%ebx, %esp
	cfi_def_cfa_register(%esp)
	movl	-28(%esp), %ebx

to preserve EBX on stack.  Fix it with:

	movl	%ebx, 28(%esp)
	movl	%esp, %ebx
	cfi_def_cfa_register(%ebx)
	...
	mov	%ebx, %esp
	cfi_def_cfa_register(%esp)
	movl	28(%esp), %ebx

4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.

This fixes BZ #32996.

Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-19 04:30:31 +08:00
Adhemerval Zanella
f165e244e4 math: Simplify and optimize modf implementation
Refactor the generic implementation to use math_config.h definitions,
and add an alternative one if the ABI supports truncf instructions
(gated through math-use-builtins-trunc.h).

The generic implementation generates similar code on x86_64, while for
the optimized path on aarch64 (where truncf is supported as a builtin
through frintz) the improvements are:

reciprocal-throughput           master    patch    difference
workload-0_1                    3.0595   3.0698        -0.34%
workload-1_maxint               5.1747   3.0542        40.98%
workload-maxint_maxfloat        3.4391   3.0349        11.75%
workload-integral               3.2732   3.0293         7.45%

latency                         master    patch    difference
workload-0_1                    3.5267   4.7107       -33.57%
workload-1_maxint               6.9074   4.7282        31.55%
workload-maxint_maxfloat        3.7210   4.7506       -27.67%
workload-integral               3.8634   4.8137       -24.60%

Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 15:56:40 -03:00
Adhemerval Zanella
61cc9922f3 math: Simplify and optimize modff implementation
Refactor the generic implementation to use math_config.h definitions,
and add an alternative one if the ABI supports truncf instructions
(gated through math-use-builtins-trunc.h).

The generic implementation generates similar code for x86_64, while for
the optimized path on aarch64 (where truncf is supported as a builtin
through frintz) the improvements are:

reciprocal-throughput           master     patch    difference
workload-0_1                    3.0740    3.0326         1.35%
workload-1_maxint               5.2231    3.0436        41.73%
workload-maxint_maxfloat        4.0962    3.0551        25.42%
workload-integral               3.7093    3.0612        17.47%

latency                         master     patch    difference
workload-0_1                    3.5521    4.7313       -33.20%
workload-1_maxint               6.7148    4.7314        29.54%
workload-maxint_maxfloat        4.0458    4.7518       -17.45%
workload-integral               3.9719    4.7427       -19.40%

Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 15:56:00 -03:00
Luna Lamb
6849c5b791 AArch64: Improve codegen SVE log1p helper
Improve codegen by packing coefficients.
This gives a 4% and 2% improvement in throughput microbenchmarks on
Neoverse V1 for acosh and atanh respectively.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 17:28:51 +00:00
Dylan Fleming
dee22d2a81 AArch64: Optimise SVE FP64 Hyperbolics
Rework SVE FP64 hyperbolics to use the SVE FEXPA
instruction.

Also update the special-case handling for large
inputs to be entirely vectorised.

Performance improvements on Neoverse V1:

cosh_sve: 19% for |x| < 709, 5x otherwise
sinh_sve: 24% for |x| < 709, 5.9x otherwise
tanh_sve: 12% for |x| < 19,  9x otherwise

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 17:28:51 +00:00
Dylan Fleming
1e3d1ddf97 AArch64: Optimize SVE exp functions
Improve performance of SVE exps by making better use
of the SVE FEXPA instruction.

Performance improvement on Neoverse V1:
exp2_sve:   21%
exp2f_sve:  24%
exp10f_sve: 23%
expm1_sve:  25%

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 17:28:51 +00:00
Adhemerval Zanella
8788bd77d6 sparc: Fix sparc32 argument passing to __libc_start_main (BZ 32981)
Commit 404526ee2e changed _start to write
the last argument to __libc_start_main without taking into consideration
that the function did not create a full stack frame, which led to
overwriting argv[0].
2025-06-18 11:20:34 -03:00
Andreas Schwab
0dbbc44bfd Fix termios related targets
Move Linux-specific termios headers and tests from misc to termios subdir
and install newly added bits/termios-cbaud.h.
2025-06-18 16:12:43 +02:00
Yury Khrustalev
c0f0db2d59 aarch64: simplify calls to __libc_arm_za_disable in assembly
There is no functional change in this patch.

We remove stores to and loads from the stack, return address signing, and
redundant CFI directives before and after the call to __libc_arm_za_disable().

The __libc_arm_za_disable implementation follows a special calling convention
that makes it possible to avoid most of the operations that would be necessary
for a call to a normal function (see [1] for details).

First, we rely on __libc_arm_za_disable() not clobbering certain registers,
and we put the return address into one of these registers. Now we don't need
to store it on the stack, so we don't need to sign the return address using PAC.

Second, as a result of the above, we don't need to update the CFI offset.

This patch provides a small optimisation by avoiding an unnecessary store and
load on the stack, and it also simplifies the assembly code and CFI directives.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-18 09:42:33 +01:00
Yury Khrustalev
eeedfc2f74 aarch64: GCS: use internal struct in __alloc_gcs
No functional change here, just a small refactoring to simplify
using __alloc_gcs() for allocating shadow stacks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-18 09:37:13 +01:00
Andreas Schwab
eae5bb0f60 powerpc: Remove assembler workarounds
Now that we require at least binutils 2.39 the support for POWER9 and
POWER10 instructions can be assumed.
2025-06-18 09:29:10 +02:00
Jeremy Harris
9f680bfe9b Add TCPI_OPT_USEC_TS from Linux 6.14 and TCPI_OPT_TFO_CHILD from 6.15 to netinet/tcp.h.
This patch adds the TCPI_OPT_USEC_TS constant from Linux 6.14 and the
TCPI_OPT_TFO_CHILD constant from Linux 6.15 to sysdeps/gnu/netinet/tcp.h.

Signed-off-by: Jeremy Harris <jgh@exim.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-17 09:57:44 -03:00
H. Peter Anvin (Intel)
964cf50bef linux/termios: regression test for termios speed functions
Add a test that runs through a fairly large number of combinations of the
termios speed functions, for the new speed_t interface, for the old
speed_t interface (if enabled), and for the new baud_t interface.
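Here is a minimal round-trip check in the spirit of that test. It uses only the standard cfset*/cfget* calls on an in-memory struct termios, so no terminal is opened. The helper name is hypothetical. With glibc versions that redefine the Bxxx constants as their numeric values, B115200 also equals 115200, but the check does not depend on that.

```c
#include <termios.h>

/* Hypothetical helper: set both speeds of T to 115200 bit/s and read
   them back.  Returns 0 if the round trip succeeds, -1 otherwise.  */
static int
set_and_check_115200 (struct termios *t)
{
  if (cfsetispeed (t, B115200) != 0 || cfsetospeed (t, B115200) != 0)
    return -1;
  return (cfgetispeed (t) == B115200 && cfgetospeed (t) == B115200) ? 0 : -1;
}
```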

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-06-17 09:57:44 -03:00
H. Peter Anvin (Intel)
be413adedf termios: unify the naming of the termios speed fields
The generic code has __ispeed and __ospeed; Linux has c_ispeed and
c_ospeed. Use an anonymous union member to allow both sets of names on
all platforms.
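The following shows the anonymous-union technique in isolation (C11), where both names designate the same storage. The struct below is a hypothetical, cut-down stand-in, not glibc's struct termios.

```c
#include <stddef.h>

/* Hypothetical, cut-down struct: each anonymous union gives one storage
   location two member names.  */
struct sketch_termios
{
  unsigned int c_cflag;
  union
  {
    unsigned int c_ispeed;   /* Linux-style name */
    unsigned int __ispeed;   /* generic-style name */
  };
  union
  {
    unsigned int c_ospeed;
    unsigned int __ospeed;
  };
};
```

Writing through either name is visible through the other, and both names have the same offset within the struct.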

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-06-17 09:11:38 -03:00
H. Peter Anvin (Intel)
5f138519eb termios: add new baud_t interface, defined to be explicitly numeric
Add an explicitly numeric interface for baud rate setting. For glibc,
this only announces what is a fait accompli, but it is a plausible
way forward for standardization, and it may be possible to infill it on
non-compliant systems. The POSIX committee has stated:

[https://www.austingroupbugs.net/view.php?id=1916#c7135]

	A future version of this standard is expected to add at least
	the following symbolic constants for use as values of objects
	of type speed_t: B57600, B115200, B230400, B460800, and
	B921600.

	Implementations are encouraged to propose additional
	interfaces which will make it possible to set and query a
	wider range of speeds than just those enumerated by the
	constants beginning with B. If a set of common interfaces
	emerges between several implementations, a future version of
	this standard will likely add those interfaces.

This is exactly that interface.

The use of the term "baud" is due to the need to have a term
contrasting "speed", and it is already well established as a legacy
term -- including in the names of the legacy Bxxx
constants. Furthermore, it *is* valid from the point of view that the
termios interface fundamentally emulates an RS-232 serial port as far
as the application software is concerned.

The documentation states that for the current version of glibc,
speed_t == baud_t, but explicitly declares that this may not be the
case in the future.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-17 09:11:38 -03:00
H. Peter Anvin (Intel)
ad37ecd579 termios: merge the termios baud definitions
Now all platforms unconditionally use the "sane" definitions of the
termios baud constants. Unify them into a common file.

Note: I have made them explicitly unsigned to avoid problems with
compiler warnings for comparisons of unequal signedness or
similar. These constants were historically octal on most platforms,
and so unsigned by default.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-17 09:11:38 -03:00
H. Peter Anvin (Intel)
bff11c2fa9 hurd/termios: remove USE_OLD_TTY
Hurd with USE_OLD_TTY was the only remaining platform with speed_t not
containing a proper baud rate. From the looks of it, that code has
long since bitrotted.

Remove the vestiges of USE_OLD_TTY.

Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-06-17 09:11:38 -03:00
H. Peter Anvin (Intel)
5cf101a85a linux: implement arbitrary and split speeds in termios
Linux has supported arbitrary speeds and split speeds in the kernel
since 2008 on all platforms except Alpha (fixed in 2020), but glibc
was never updated to match. This is further complicated by POSIX's use
of macros for the cf[gs]et[io]speed interfaces, rather than plain
numbers, as it really ought to have used.

On most platforms, the glibc ABI includes the c_[io]speed fields in
struct termios, but they are incorrectly used. On MIPS and SPARC, they
are entirely missing.

For backwards compatibility, the kernel will still use the legacy
speed fields unless they are set to BOTHER, and will use the legacy
output speed as the input speed if the latter is 0 (== B0). However,
the specific encoding used is visible to user space applications,
including ones other than the one that configured the terminal.

- SPARC and MIPS get a new struct termios, and tc[gs]etattr() is
  versioned accordingly. However, the new struct termios is set to be
  a strict extension of the old one, which means that cf* interfaces
  other than the speed-related ones do not need versioning.
- The Bxxx constants are redefined as equivalent to their integer
  values and the legacy Bxxx constants are renamed __Bxxx.
- cf[gs]et[io]speed() and cfsetspeed() are versioned accordingly.
- tcgetattr() and cfset[io]speed() are adjusted to always keep the
  c_[io]speed fields correct (unlike earlier versions), but to
  canonicalize the representation to ALSO configure the legacy fields
  if a valid legacy representation exists.
- tcsetattr(), too, canonicalizes the representation in this way
  before passing it to the kernel, to maximize compatibility with
  older applications/tools.
- The old IBAUD0 hack is removed; it is no longer necessary since
  even the legacy c_cflag baud rate fields have had separate input
  values for a long time.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-17 09:11:38 -03:00
H. Peter Anvin (Intel)
5f54d8bc48 linux/termios/powerpc: deal with powerpc-unique ioctl emulation
The powerpc architecture, only, emulates the termios ioctls using the
glibc termios structure. Export the real kernel ones as the termios2
interface; although the kernel doesn't call it termios2, it is exactly
the termios2 interface, and it avoids the namespace clash between the
emulated ioctls and the real kernel ioctls.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-17 09:11:38 -03:00
H. Peter Anvin (Intel)
091256f0d1 linux/ioctls: use <linux/sockios.h> for sockios ioctls
In the kernel, these are defined in <linux/sockios.h>. The differences
between <linux/sockios.h> and the copied data in <bits/ioctls.h> are
minor, mainly some #ifdefs, so try to use <linux/sockios.h> directly;
it is hopefully clean enough these days.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-17 09:11:38 -03:00