As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.
As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work wtih the new one as well). Thus, the existing
implementations should become compat symbols. They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.
Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).
gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.
Tested for x86_64, and with build-many-glibcs.py.
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.
The PowerPC optimization for PPC 601/603 (30 years old) is removed.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The only usage was for pthread_spin_lock, introduced by 12d2dd7060,
as a way to optimize the code for certain architectures. Now that atomic
builtins are used by default, let the compiler use the best code sequence
for the atomic exchange.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Now that atomic builtins are used by default, we can rely on the
compiler to define when to use 64-bit atomic operations.
It allows the use of 64-bit atomic operations on some 32-bit ABIs where
they were not previously enabled due to missing pre-processor handling:
hppa, mips64n32, s390, and sparcv9.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
All ABIs, except alpha and sparc, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_RELEASE).
For alpha, it uses a 'wmb' which does not map to any of C11
barriers.
For sparc it uses a stronger 'member #LoadStore | #StoreStore',
where the release barrier maps to just 'membar #StoreLoad'. The
patch keeps the sparc definition.
For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).
Tested on aarch64-linux-gnu.
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
All ABIs, except alpha, powerpc, and x86_64, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_SEQ_CST) in most cases, with the
exception of aarch64 (where the acquire fence is generated as
'dmb ishld' instead of 'dmb ish').
For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline
synchronization).
For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).
Tested on aarch64-linux-gnu, where the acquire produces a different
instruction that the current code.
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
All ABIs save for sparcv9 and s390 defines it to __sync_synchronize,
which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST).
For Sparc, it uses a stricter #StoreStore|#LoadStore|#StoreLoad|#LoadLoad
instead of the #StoreLoad generated by __sync_synchronize.
For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline synchronization).
The barrier is used only in one place (pthread_mutex_setprioceiling),
and using a stricter barrier for s390 is ok performance-wise.
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The __HAVE_64B_ATOMICS can be define based on __WORDSIZE, and
the __ARCH_ACQ_INSTR, MUTEX_HINT_*, and barriers definition are
defined by the target cpu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This patch addresses the actual cause of CVE-2025-5745
The vector non-volatile registers are not used anymore for
32 byte load and comparison operation
Additionally, the assembler workaround used earlier for the
instruction lxvp is replaced with actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
This patch addresses the actual cause of CVE-2025-5702
The vector non-volatile registers are not used anymore for
32 byte load and comparison operation
Additionally, the assembler workaround used earlier for the
instruction lxvp is replaced with actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
To avoid linknamespace issues on old standards. It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
To avoid linknamespace issues on old standards. It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch replaces _dl_stack_flags global variable by
_dl_stack_prot_flags.
The advantage is that any convertion from p_flags to final used mprotect
flags occurs at loading of p_flags. It avoids repeated spurious
convertions of _dl_stack_flags, for example in allocate_thread_stack.
This modification was suggested in:
https://sourceware.org/pipermail/libc-alpha/2025-March/165537.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput master patch difference
workload-0_1 1.1351 0.9067 20.12%
workload-1_maxint 1.4230 0.9040 36.47%
workload-maxint_maxfloat 1.5038 0.9076 39.65%
workload-integral 1.1280 0.9111 19.23%
latency master patch difference
workload-0_1 1.1440 2.7117 -137.03%
workload-1_maxint 4.0556 2.7070 33.25%
workload-maxint_maxfloat 3.2122 2.7164 15.43%
workload-integral 3.2381 2.7281 15.75%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput master patch difference
workload-0_1 1.5210 1.3942 8.34%
workload-1_maxint 2.0926 1.3940 33.38%
workload-maxint_maxfloat 1.7851 1.3940 21.91%
workload-integral 1.5216 1.3941 8.37%
latency master patch difference
workload-0_1 1.5928 2.6337 -65.35%
workload-1_maxint 3.2929 2.6337 20.02%
workload-maxint_maxfloat 1.9697 2.6341 -33.73%
workload-integral 2.0597 2.6337 -27.87%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
Compiler generates the following instruction sequence for dynamic TLS
access:
leal tls_var@tlsgd(,%ebx,1), %eax
call ___tls_get_addr@PLT
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But
___tls_get_addr is a normal function which doesn't preserve any vector
registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
_dl_tlsdesc_dynamic:
/* Like all TLS resolvers, preserve call-clobbered registers.
We need two scratch regs anyway. */
subl $32, %esp
cfi_adjust_cfa_offset (32)
It is wrong to use
movl %ebx, -28(%esp)
movl %esp, %ebx
cfi_def_cfa_register(%ebx)
...
mov %ebx, %esp
cfi_def_cfa_register(%esp)
movl -28(%esp), %ebx
to preserve EBX on stack. Fix it with:
movl %ebx, 28(%esp)
movl %esp, %ebx
cfi_def_cfa_register(%ebx)
...
mov %ebx, %esp
cfi_def_cfa_register(%esp)
movl 28(%esp), %ebx
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996.
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This reverts commit 3367d8e180
Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)
Tested on ppc64le without regression.
This reverts commit b9182c793c
Reason for revert: Power10 memchr clobbers v20 vector register
(Bug 33059)
This is not a security issue, unlike CVE-2025-5745 and
CVE-2025-5702.
Tested on ppc64le without regression.
This reverts commit 90bcc8721e
This change is in the chain of the final revert that fixes the CVE
i.e. 3367d8e180
Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)
Tested on ppc64le with no regressions.
This reverts commit 23f0d81608
Reason for revert: Power10 strncmp clobbers non-volatile vector
registers (Bug 33060)
Tested on ppc64le with no regressions.
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.
The code now builds to with gcc-14 on aarch64:
0000000000000000 <__ilogbf>:
0: 1e260000 fmov w0, s0
4: d3577801 ubfx x1, x0, #23, #8
8: 340000e1 cbz w1, 24 <__ilogbf+0x24>
c: 5101fc20 sub w0, w1, #0x7f
10: 7103fc3f cmp w1, #0xff
14: 54000040 b.eq 1c <__ilogbf+0x1c> // b.none
18: d65f03c0 ret
1c: 12b00000 mov w0, #0x7fffffff // #2147483647
20: 14000000 b 0 <__math_invalidf_i>
24: 53175800 lsl w0, w0, #9
28: 340000a0 cbz w0, 3c <__ilogbf+0x3c>
2c: 5ac01000 clz w0, w0
30: 12800fc1 mov w1, #0xffffff81 // #-127
34: 4b000020 sub w0, w1, w0
38: d65f03c0 ret
3c: 320107e0 mov w0, #0x80000001 // #-2147483647
40: 14000000 b 0 <__math_invalidf_i>
Some ABI requires additional adjustments:
* i386 and m68k requires to use the template version, since
both provide __ieee754_ilogb implementatations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for float and float128 version. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.
The code now builds to with gcc-14 on aarch64:
0000000000000000 <__ilogb>:
0: 9e660000 fmov x0, d0
4: d374f801 ubfx x1, x0, #52, #11
8: 340000e1 cbz w1, 24 <__ilogb+0x24>
c: 510ffc20 sub w0, w1, #0x3ff
10: 711ffc3f cmp w1, #0x7ff
14: 54000040 b.eq 1c <__ilogb+0x1c> // b.none
18: d65f03c0 ret
1c: 12b00000 mov w0, #0x7fffffff // #2147483647
20: 14000000 b 0 <__math_invalid_i>
24: d374cc00 lsl x0, x0, #12
28: b40000a0 cbz x0, 3c <__ilogb+0x3c>
2c: dac01000 clz x0, x0
30: 12807fc1 mov w1, #0xfffffc01 // #-1023
34: 4b000020 sub w0, w1, w0
38: d65f03c0 ret
3c: 320107e0 mov w0, #0x80000001 // #-2147483647
40: 14000000 b 0 <__math_invalid_i>
Some ABI requires additional adjustments:
* i386 and m68k requires to use the template version, since
both provide __ieee754_ilogb implementatations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for float and float128 version. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Due to raising the minimum binutils version to >= 2.26, the configure
check for testing support of --update-section is not needed anymore.
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1). The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.
Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).
As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space). I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.
As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).
Tested for x86_64 and x86, and with build-many-glibcs.py.
These routines are not extensively used (gnulib documentation even
recommend use a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.
[1] https://www.gnu.org/software/gnulib/manual/gnulib.html#C-strings
Checked with a build for some powerpc variations.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This is only needed for -mno-secure-plt, and this linkage mode
is not supported with powerpc64 and powerp64le.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The current approach tracks math maximum supported errors by explicitly
setting them per function and architecture. On newer implementations or
new compiler versions, the file is updated with newer values if it
shows higher results. The idea is to track the maximum known error, to
update the manual with the obtained values.
The constant libm-test-ulps shows little value, where it is usually a
mechanical change done by the maintainer, for past releases it is
usually ignored whether the ulp change resulted from a compiler
regression, and the math tests already have a maximum ulp error that
triggers a regression.
It was shown by a recent update after the new acosf [1] implementation
that is correctly rounded, where the libm-test-ulps was indeed from a
compiler issue.
This patch removes all arch-specific libm-test-ulps, adds system generic
libm-test-ulps where applicable, and changes its semantics. The generic
files now track specific implementation constraints, like if it is
expected to be correctly rounded, or if the system-specific has
different error expectations.
Now multiple libm-test-ulps can be defined, and system-specific
overrides generic implementation. This is for the case where
arch-specific implementation might show worse precision than generic
implementation, for instance, the cbrtf on i686.
Regressions are only reported if the implementation shows larger errors
than 9 ulps (13 for IBM long double) unless it is overridden by
libm-test-ulps and the maximum error is not printed at the end of tests.
The regen-ulps rule is also removed since it does not make sense to
update the libm-test-ulps automatically.
The manual error table is also removed, Paul Zimmermann and others have
been tracking libm precision with a more comprehensive analysis for some
releases; so link to his work instead.
[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=9cc9f8e11e8fb8f54f1e84d9f024917634a78201
powerpc was the only architecture with arch-specific hooks for
LD_SHOW_AUXV, and with the information moved to ld diagnostics there
is no need to keep the _dl_procinfo hook.
Checked with a build for all affected ABIs.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The _dl_string_platform is moved to hwcapinfo.h, since it is only used
by hwcapinfo.c and test-get_hwcap internal test.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The ld.so diagnostics already prints AT_HWCAP values, but only in
hexadecimal. To avoid duplicating the strings, consolidate the
hwcap_names from cpu-features.h on a new file, dl-hwcap-info.h
(and it also improves the hwcap string description with more
values).
For future AT_HWCAP3/AT_HWCAP4 extensions, it is just a matter
to add them on dl-hwcap-info.c so both ld diagnostics and
tunable filtering will parse the new values.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
Code used during early static startup in elf/dl-tls.c uses
__mempcpy.
Fixes commit cbd9fd2369 ("Consolidate
TLS block allocation for static binaries with ld.so").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Linux bogsucker 6.1.55-gentoo-dist-hardened #1 SMP Sun Oct 1 18:03:02 UTC 2023 ppc64le POWER9 (architected), altivec supported CHRP IBM pSeries (emulated by qemu) GNU/Linux
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Linux timberdoodle 6.1.60-gentoo-dist-hardened #1 SMP Fri Dec 1 22:10:49 UTC 2023 ppc64 POWER9 (architected), altivec supported CHRP IBM pSeries (emulated by qemu) GNU/Linux
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>