mirror of https://sourceware.org/git/glibc.git synced 2025-11-05 08:10:46 +03:00

math: Use erf from CORE-MATH

The current implementation shows the following accuracy on three
ranges ([-DBL_MIN, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]), with 10e9
uniformly distributed random inputs per range (the first column is the
error in ULP, with '0' meaning correctly rounded; the second is the
number of samples with that error):

* Range [-DBL_MIN, -4.2]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-4.2, 4.2]
 * FE_TONEAREST
     0:       9764404513  97.64%
     1:        235595487   2.36%
 * FE_UPWARD
     0:       9468013928  94.68%
     1:        531986072   5.32%
 * FE_DOWNWARD
     0:       9493787693  94.94%
     1:        506212307   5.06%
 * FE_TOWARDZERO
     0:       9585271351  95.85%
     1:        414728649   4.15%

* Range [4.2, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%
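
As an illustration of the first column above, here is a minimal sketch of
one way to count the ULP distance between a computed double and a
correctly rounded reference value.  The commit does not say which tool
or oracle produced the tables; the helper names ordered_bits and
ulp_distance are hypothetical, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a double onto a signed integer line on which adjacent
   representable values differ by exactly 1.  Both zeros map to 0.
   Assumption: IEEE 754 binary64 layout; NaN is not handled.  */
static int64_t
ordered_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  if (u >> 63)
    /* Negative values: mirror the magnitude below zero.  */
    return -(int64_t) (u & UINT64_C (0x7fffffffffffffff));
  return (int64_t) u;
}

/* Number of representable doubles separating A and B (0 when they are
   equal).  Comparing a result against a correctly rounded reference
   this way gives the '0' / '1' buckets used in the tables.  */
uint64_t
ulp_distance (double a, double b)
{
  int64_t ia = ordered_bits (a);
  int64_t ib = ordered_bits (b);
  return ia > ib ? (uint64_t) ia - (uint64_t) ib
		 : (uint64_t) ib - (uint64_t) ia;
}
```

For example, ulp_distance (1.0, 0x1.0000000000001p+0) is 1, since the
two arguments are neighbouring doubles.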

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).

Benchtest results on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64
(Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

reciprocal-throughput        master       patched   improvement
x86_64                      38.2754       78.0311      -103.87%
x86_64v2                    38.3325       75.7555       -97.63%
x86_64v3                    34.6604       28.3182        18.30%
aarch64                     23.1499       21.4307         7.43%
power10                     12.3051        9.3766        23.80%

Latency                      master       patched   improvement
x86_64                      84.3062      121.3580       -43.95%
x86_64v2                    84.1817      117.4250       -39.49%
x86_64v3                    81.0933       70.6458        12.88%
aarch64                      35.012       29.5012        15.74%
power10                     21.7205       18.4589        15.02%
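
The improvement column in both tables is consistent with the relative
change from master to patched, where lower timings are better and a
negative value is a slowdown.  A minimal sketch of that arithmetic (the
function name improvement_pct is hypothetical):

```c
/* Relative improvement in percent.  MASTER and PATCHED are timings
   where smaller is better, so a positive result means the patched
   build is faster and a negative result means it is slower.  */
double
improvement_pct (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

With the x86_64 reciprocal-throughput row, improvement_pct (38.2754,
78.0311) is about -103.87, i.e. the patched build takes roughly twice
as long on that configuration.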

For x86_64/x86_64-v2, most of the performance hit comes from the fma
call going through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella
2025-10-10 15:15:29 -03:00
parent 1cae0550e8
commit 72a48e45bd
6 changed files with 1842 additions and 284 deletions

@@ -249,6 +249,8 @@ core-math:
sysdeps/ieee754/dbl-64/e_lgamma_r.c
# src/binary64/asinh/asinh.c, revision fde815f8
sysdeps/ieee754/dbl-64/s_asinh.c
# src/binary64/erf/erf.c, revision 384ed01d
sysdeps/ieee754/dbl-64/s_erf.c
# src/binary32/acos/acosf.c, revision 56dd347
sysdeps/ieee754/flt-32/e_acosf.c
# src/binary32/acosh/acoshf.c, revision d0b9ddd

@@ -5620,6 +5620,9 @@ erf -0x1.c975cap+0
erf -0x1.e6a006p+0
erf -0x1.4d32f4p-12
erf 0x1.c5bf891b4ef6ap-1023
erf -0x1.c5bf891b4ef6ap-1023
erfc 0.0
erfc -0
erfc 0x1p-55

@@ -3348,3 +3348,141 @@ erf -0x1.4d32f4p-12
= erf tonearest ibm128 -0x1.4d32f4p-12 : -0x1.77f98ef609eb313046ceab3fa9p-12 : inexact-ok
= erf towardzero ibm128 -0x1.4d32f4p-12 : -0x1.77f98ef609eb313046ceab3fa88p-12 : inexact-ok
= erf upward ibm128 -0x1.4d32f4p-12 : -0x1.77f98ef609eb313046ceab3fa88p-12 : inexact-ok
erf 0x1.c5bf891b4ef6ap-1023
= erf downward binary32 0x8p-152 : 0x8p-152 : inexact-ok underflow errno-erange-ok
= erf tonearest binary32 0x8p-152 : 0x8p-152 : inexact-ok underflow errno-erange-ok
= erf towardzero binary32 0x8p-152 : 0x8p-152 : inexact-ok underflow errno-erange-ok
= erf upward binary32 0x8p-152 : 0x1p-148 : inexact-ok underflow errno-erange-ok
= erf downward binary64 0x8p-152 : 0x9.06eba8214db68p-152 : inexact-ok
= erf tonearest binary64 0x8p-152 : 0x9.06eba8214db68p-152 : inexact-ok
= erf towardzero binary64 0x8p-152 : 0x9.06eba8214db68p-152 : inexact-ok
= erf upward binary64 0x8p-152 : 0x9.06eba8214db7p-152 : inexact-ok
= erf downward intel96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
= erf tonearest intel96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
= erf towardzero intel96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
= erf upward intel96 0x8p-152 : 0x9.06eba8214db688ep-152 : inexact-ok
= erf downward m68k96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
= erf tonearest m68k96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
= erf towardzero m68k96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
= erf upward m68k96 0x8p-152 : 0x9.06eba8214db688ep-152 : inexact-ok
= erf downward binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
= erf tonearest binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
= erf towardzero binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
= erf upward binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
= erf downward ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
= erf tonearest ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
= erf towardzero ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
= erf upward ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
= erf downward binary32 0x0p+0 : 0x0p+0 : inexact-ok
= erf tonearest binary32 0x0p+0 : 0x0p+0 : inexact-ok
= erf towardzero binary32 0x0p+0 : 0x0p+0 : inexact-ok
= erf upward binary32 0x0p+0 : 0x0p+0 : inexact-ok
= erf downward binary64 0x0p+0 : 0x0p+0 : inexact-ok
= erf tonearest binary64 0x0p+0 : 0x0p+0 : inexact-ok
= erf towardzero binary64 0x0p+0 : 0x0p+0 : inexact-ok
= erf upward binary64 0x0p+0 : 0x0p+0 : inexact-ok
= erf downward intel96 0x0p+0 : 0x0p+0 : inexact-ok
= erf tonearest intel96 0x0p+0 : 0x0p+0 : inexact-ok
= erf towardzero intel96 0x0p+0 : 0x0p+0 : inexact-ok
= erf upward intel96 0x0p+0 : 0x0p+0 : inexact-ok
= erf downward m68k96 0x0p+0 : 0x0p+0 : inexact-ok
= erf tonearest m68k96 0x0p+0 : 0x0p+0 : inexact-ok
= erf towardzero m68k96 0x0p+0 : 0x0p+0 : inexact-ok
= erf upward m68k96 0x0p+0 : 0x0p+0 : inexact-ok
= erf downward binary128 0x0p+0 : 0x0p+0 : inexact-ok
= erf tonearest binary128 0x0p+0 : 0x0p+0 : inexact-ok
= erf towardzero binary128 0x0p+0 : 0x0p+0 : inexact-ok
= erf upward binary128 0x0p+0 : 0x0p+0 : inexact-ok
= erf downward ibm128 0x0p+0 : 0x0p+0 : inexact-ok
= erf tonearest ibm128 0x0p+0 : 0x0p+0 : inexact-ok
= erf towardzero ibm128 0x0p+0 : 0x0p+0 : inexact-ok
= erf upward ibm128 0x0p+0 : 0x0p+0 : inexact-ok
= erf downward binary64 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
= erf tonearest binary64 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
= erf towardzero binary64 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
= erf upward binary64 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
= erf downward intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
= erf tonearest intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf towardzero intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
= erf upward intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf downward m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
= erf tonearest m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf towardzero m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
= erf upward m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf downward binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
= erf tonearest binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
= erf towardzero binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
= erf upward binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029cp-1024 : inexact-ok
= erf downward ibm128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
= erf tonearest ibm128 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow errno-erange-ok
= erf towardzero ibm128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
= erf upward ibm128 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow errno-erange-ok
erf -0x1.c5bf891b4ef6ap-1023
= erf downward binary32 -0x0p+0 : -0x0p+0 : inexact-ok
= erf tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok
= erf towardzero binary32 -0x0p+0 : -0x0p+0 : inexact-ok
= erf upward binary32 -0x0p+0 : -0x0p+0 : inexact-ok
= erf downward binary64 -0x0p+0 : -0x0p+0 : inexact-ok
= erf tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok
= erf towardzero binary64 -0x0p+0 : -0x0p+0 : inexact-ok
= erf upward binary64 -0x0p+0 : -0x0p+0 : inexact-ok
= erf downward intel96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf tonearest intel96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf towardzero intel96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf upward intel96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf downward m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf tonearest m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf towardzero m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf upward m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
= erf downward binary128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf tonearest binary128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf towardzero binary128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf upward binary128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf downward ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf tonearest ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf towardzero ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf upward ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
= erf downward binary32 -0x8p-152 : -0x1p-148 : inexact-ok underflow errno-erange-ok
= erf tonearest binary32 -0x8p-152 : -0x8p-152 : inexact-ok underflow errno-erange-ok
= erf towardzero binary32 -0x8p-152 : -0x8p-152 : inexact-ok underflow errno-erange-ok
= erf upward binary32 -0x8p-152 : -0x8p-152 : inexact-ok underflow errno-erange-ok
= erf downward binary64 -0x8p-152 : -0x9.06eba8214db7p-152 : inexact-ok
= erf tonearest binary64 -0x8p-152 : -0x9.06eba8214db68p-152 : inexact-ok
= erf towardzero binary64 -0x8p-152 : -0x9.06eba8214db68p-152 : inexact-ok
= erf upward binary64 -0x8p-152 : -0x9.06eba8214db68p-152 : inexact-ok
= erf downward intel96 -0x8p-152 : -0x9.06eba8214db688ep-152 : inexact-ok
= erf tonearest intel96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
= erf towardzero intel96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
= erf upward intel96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
= erf downward m68k96 -0x8p-152 : -0x9.06eba8214db688ep-152 : inexact-ok
= erf tonearest m68k96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
= erf towardzero m68k96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
= erf upward m68k96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
= erf downward binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
= erf tonearest binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
= erf towardzero binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
= erf upward binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
= erf downward ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
= erf tonearest ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
= erf towardzero ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
= erf upward ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
= erf downward binary64 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
= erf tonearest binary64 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
= erf towardzero binary64 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
= erf upward binary64 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
= erf downward intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf tonearest intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf towardzero intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
= erf upward intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
= erf downward m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf tonearest m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
= erf towardzero m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
= erf upward m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
= erf downward binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029cp-1024 : inexact-ok
= erf tonearest binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
= erf towardzero binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
= erf upward binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
= erf downward ibm128 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow errno-erange-ok
= erf tonearest ibm128 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow errno-erange-ok
= erf towardzero ibm128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
= erf upward ibm128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok

@@ -10,6 +10,7 @@ ifeq ($(subdir),math)
# correctly rounded results.
CFLAGS-e_lgamma_r.c += -fexcess-precision=standard
CFLAGS-e_gamma_r.c += -fexcess-precision=standard
CFLAGS-s_erf.c += -fexcess-precision=standard
endif
ifeq ($(subdir),gmon)
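
For background on the flag added above: -fexcess-precision=standard
asks GCC to follow the ISO C rules for excess precision, under which a
cast or an assignment discards any extra range and precision carried
by an intermediate result.  The sketch below is illustrative only (the
helper names are hypothetical); it shows how to query the evaluation
method and how a cast forces a value back to double.

```c
#include <float.h>

/* FLT_EVAL_METHOD is 0 when expressions are evaluated in their
   declared type; any other value means intermediate results may
   carry excess precision (or that the method is indeterminable).  */
int
excess_precision_possible (void)
{
  return FLT_EVAL_METHOD != 0;
}

/* Under the standard excess-precision rules the cast below rounds the
   sum to double even if the addition was carried out in a wider
   format.  */
double
sum_rounded_to_double (double a, double b)
{
  return (double) (a + b);
}
```

On targets where FLT_EVAL_METHOD is 0 the cast is a no-op; it matters
on targets that evaluate double expressions in a wider register format.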

@@ -35,6 +35,18 @@ double: 0
Function: "atanh_upward":
double: 0
Function: "erf":
double: 0
Function: "erf_downward":
double: 0
Function: "erf_towardzero":
double: 0
Function: "erf_upward":
double: 0
Function: "lgamma":
double: 0

File diff suppressed because it is too large