The current implementation shows the following accuracy on two
ranges ([1,21) and [21,DBL_MAX)), each tested with 10e9 uniformly
distributed random inputs (the first column is the error in ULP,
with '0' meaning correctly rounded; the second is the number of
samples with that error):
* Range [1,21)
* FE_TONEAREST
0: 8931139411 89.31%
1: 1068697545 10.69%
2: 163044 0.00%
* FE_UPWARD
0: 7936620731 79.37%
1: 2062594522 20.63%
2: 783977 0.01%
3: 770 0.00%
* FE_DOWNWARD
0: 7936459794 79.36%
1: 2062734117 20.63%
2: 805312 0.01%
3: 777 0.00%
* FE_TOWARDZERO
0: 7910345595 79.10%
1: 2088584522 20.89%
2: 1069106 0.01%
3: 777 0.00%
* Range [21, DBL_MAX)
* FE_TONEAREST
0: 5163888431 51.64%
1: 4836111569 48.36%
* FE_UPWARD
0: 4835951885 48.36%
1: 5164048115 51.64%
* FE_DOWNWARD
0: 5164048432 51.64%
1: 4835951568 48.36%
* FE_TOWARDZERO
0: 5164058042 51.64%
1: 4835941958 48.36%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to the glibc style and to use the definitions
from math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:
reciprocal-throughput    master    patched  improvement
x86_64                  20.9131    47.2187     -125.79%
x86_64v2                20.8823    41.1042      -96.84%
x86_64v3                19.0282    25.8045      -35.61%
aarch64                 14.7419    18.1535      -23.14%
power10                 8.98341    11.0423      -22.92%

latency                  master    patched  improvement
x86_64                  75.5494    89.5979      -18.60%
x86_64v2                74.4443    87.6292      -17.71%
x86_64v3                71.8558    70.7086        1.60%
aarch64                 30.3361    29.2709        3.51%
power10                 20.9263    19.2482        8.02%
For x86_64/x86_64-v2, most of the performance hit comes from the fma
call going through the ifunc mechanism.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>