glibc

lib/glibc

mirror of https://sourceware.org/git/glibc.git synced 2025-12-06 12:01:08 +03:00

Commit Graph

Author

SHA1

Message

Date

Adhemerval Zanella

8cd6efca5b

Add add_ssaaaa and sub_ssaaaa to gmp-arch.h

To enable “longlong.h” removal, add_ssaaaa and sub_ssaaaa are moved to
gmp-arch.h.  The generic implementation now uses a static inline.  This
provides better type checking than the GNU extension, which casts the
asm constraint; and it also works better with clang.

Most architectures use the generic implementation, with the exception of
arc, arm, hppa, x86, m68k, powerpc, and sparc.  On 32-bit architectures
the compiler generates good enough code using uint64_t types, whereas
for 64-bit architectures the patch leverages the math_u128.h definitions
that use 128-bit integers when available (all 64-bit architectures
on gcc 15).
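
As a rough illustration of the approach described above, the following is a
minimal sketch of a static inline double-limb add and subtract for 32-bit
limbs built on uint64_t.  It is not the glibc code: the names limb32_t,
add_ssaaaa_sketch, sub_ssaaaa_sketch, and ssaaaa_self_check are hypothetical,
and the real helpers operate on mp_limb_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type; glibc's real type is mp_limb_t.  */
typedef uint32_t limb32_t;

/* (*sh, *sl) = (ah, al) + (bh, bl), modulo 2^64.  */
static inline void
add_ssaaaa_sketch (limb32_t *sh, limb32_t *sl,
                   limb32_t ah, limb32_t al, limb32_t bh, limb32_t bl)
{
  uint64_t a = ((uint64_t) ah << 32) | al;
  uint64_t b = ((uint64_t) bh << 32) | bl;
  uint64_t r = a + b;           /* Unsigned overflow wraps, as intended.  */
  *sh = (limb32_t) (r >> 32);
  *sl = (limb32_t) r;
}

/* (*sh, *sl) = (ah, al) - (bh, bl), modulo 2^64.  */
static inline void
sub_ssaaaa_sketch (limb32_t *sh, limb32_t *sl,
                   limb32_t ah, limb32_t al, limb32_t bh, limb32_t bl)
{
  uint64_t a = ((uint64_t) ah << 32) | al;
  uint64_t b = ((uint64_t) bh << 32) | bl;
  uint64_t r = a - b;           /* Borrow propagates through the 64-bit type.  */
  *sh = (limb32_t) (r >> 32);
  *sl = (limb32_t) r;
}

/* Small self-check: a carry out of, and a borrow into, the low limb.  */
static inline void
ssaaaa_self_check (void)
{
  limb32_t hi, lo;
  add_ssaaaa_sketch (&hi, &lo, 0, UINT32_MAX, 0, 1);
  assert (hi == 1 && lo == 0);
  sub_ssaaaa_sketch (&hi, &lo, 1, 0, 0, 1);
  assert (hi == 0 && lo == UINT32_MAX);
}
```

Because the arguments are real function parameters rather than asm operands
with casts, passing a pointer or value of the wrong type gets diagnosed by
the compiler.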

The strongly typed implementation required some changes.  I adjusted
_FP_W_TYPE, _FP_WS_TYPE, and _FP_I_TYPE to use the same type as
mp_limb_t on aarch64, powerpc64le, x86_64, and riscv64.  This basically
means using “long” instead of “long long.”
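
One plausible reason the types had to match (an assumption, since the commit
does not spell it out) is that “long” and “long long” remain distinct types
in C even where both are 64 bits wide, so a strongly typed helper rejects
the mismatch.  The sketch below uses C11 _Generic to show the distinction;
IS_UNSIGNED_LONG and store_limb_sketch are hypothetical names.

```c
/* Evaluates to 1 only when x has exactly the type unsigned long.  */
#define IS_UNSIGNED_LONG(x) _Generic ((x), unsigned long: 1, default: 0)

/* Hypothetical stand-in for a strongly typed helper taking a limb pointer.
   Passing an unsigned long long * here draws an incompatible-pointer
   diagnostic, even on ABIs where both types have the same size.  */
static inline void
store_limb_sketch (unsigned long *dst, unsigned long v)
{
  *dst = v;
}
```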

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>

2025-11-26 10:10:02 -03:00

Paul Eggert

2642002380

Update copyright dates with scripts/update-copyrights

2025-01-01 11:22:09 -08:00

Adhemerval Zanella

bccb0648ea

math: Use tanf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%
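
The improvement column in both tables appears to be the relative reduction
against master, (master - patched) / master.  This is inferred from the
figures, not stated in the commit.  A minimal sketch, with the hypothetical
name improvement_percent:

```c
/* Relative reduction of PATCHED versus MASTER, in percent.
   Lower latency or reciprocal throughput is better, so a positive
   result means the patched build is faster.  */
static inline double
improvement_percent (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

For example, the x86_64 latency row (82.3961 to 54.8052) works out to about
33.49% under this formula, matching the table.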

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

2024-11-22 10:52:27 -03:00

3 Commits