The current implementation shows the following accuracy on two
ranges ([1,21) and [21,DBL_MAX)), each tested with 10e9 uniformly
distributed random inputs (the first column is the error in ULP,
with '0' meaning correctly rounded; the second is the number of
samples with that error):
* Range [1,21)
* FE_TONEAREST
0: 8931139411 89.31%
1: 1068697545 10.69%
2: 163044 0.00%
* FE_UPWARD
0: 7936620731 79.37%
1: 2062594522 20.63%
2: 783977 0.01%
3: 770 0.00%
* FE_DOWNWARD
0: 7936459794 79.36%
1: 2062734117 20.63%
2: 805312 0.01%
3: 777 0.00%
* FE_TOWARDZERO
0: 7910345595 79.10%
1: 2088584522 20.89%
2: 1069106 0.01%
3: 777 0.00%
* Range [21, DBL_MAX)
* FE_TONEAREST
0: 5163888431 51.64%
1: 4836111569 48.36%
* FE_UPWARD
0: 4835951885 48.36%
1: 5164048115 51.64%
* FE_DOWNWARD
0: 5164048432 51.64%
1: 4835951568 48.36%
* FE_TOWARDZERO
0: 5164058042 51.64%
1: 4835941958 48.36%
The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to the glibc style and to use the definitions
from math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:
reciprocal-throughput    master    patched  improvement
x86_64                  20.9131    47.2187     -125.79%
x86_64v2                20.8823    41.1042      -96.84%
x86_64v3                19.0282    25.8045      -35.61%
aarch64                 14.7419    18.1535      -23.14%
power10                 8.98341    11.0423      -22.92%

latency                  master    patched  improvement
x86_64                  75.5494    89.5979      -18.60%
x86_64v2                74.4443    87.6292      -17.71%
x86_64v3                71.8558    70.7086        1.60%
aarch64                 30.3361    29.2709        3.51%
power10                 20.9263    19.2482        8.02%
For x86_64/x86_64-v2, most of the performance hit comes from the fma
call going through the ifunc mechanism.
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>