math: Use erf from CORE-MATH

The current implementation shows the following accuracy on three
ranges ([-DBL_MIN, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with 10e9
uniformly distributed random numbers for each range (the first column
is the error in ULP, with '0' being correctly rounded; the second is
the number of samples with the corresponding precision):

* Range [-DBL_MIN, -4.2]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-4.2, 4.2]
 * FE_TONEAREST
     0:       9764404513  97.64%
     1:        235595487   2.36%
 * FE_UPWARD
     0:       9468013928  94.68%
     1:        531986072   5.32%
 * FE_DOWNWARD
     0:       9493787693  94.94%
     1:        506212307   5.06%
 * FE_TOWARDZERO
     0:       9585271351  95.85%
     1:        414728649   4.15%

* Range [4.2, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master       patched  improvement
x86_64                      38.2754       78.0311     -103.87%
x86_64v2                    38.3325       75.7555      -97.63%
x86_64v3                    34.6604       28.3182       18.30%
aarch64                     23.1499       21.4307        7.43%
power10                     12.3051        9.3766       23.80%

Latency                      master       patched  improvement
x86_64                      84.3062      121.3580      -43.95%
x86_64v2                    84.1817      117.4250      -39.49%
x86_64v3                    81.0933       70.6458       12.88%
aarch64                     35.012        29.5012       15.74%
power10                     21.7205       18.4589       15.02%

For x86_64/x86_64-v2, most of the performance hit comes from the fma
call through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-11-05 08:10:46 +03:00 · 2025-10-10 15:15:29 -03:00
parent 1cae0550e8
commit 72a48e45bd
6 changed files with 1842 additions and 284 deletions
--- a/2
+++ b/2
@@ -249,6 +249,8 @@ core-math:
  sysdeps/ieee754/dbl-64/e_lgamma_r.c
  # src/binary64/asinh/asinh.c, revision fde815f8
  sysdeps/ieee754/dbl-64/s_asinh.c
+  # src/binary64/erf/erf.c, revision 384ed01d
+  sysdeps/ieee754/dbl-64/s_erf.c
  # src/binary32/acos/acosf.c, revision 56dd347
  sysdeps/ieee754/flt-32/e_acosf.c
  # src/binary32/acosh/acoshf.c, revision d0b9ddd
--- a/math/auto-libm-test-in
+++ b/math/auto-libm-test-in
@@ -5620,6 +5620,9 @@ erf -0x1.c975cap+0
 erf -0x1.e6a006p+0
 erf -0x1.4d32f4p-12

+erf 0x1.c5bf891b4ef6ap-1023
+erf -0x1.c5bf891b4ef6ap-1023
+
 erfc 0.0
 erfc -0
 erfc 0x1p-55
--- a/math/auto-libm-test-out-erf
+++ b/math/auto-libm-test-out-erf
@@ -3348,3 +3348,141 @@ erf -0x1.4d32f4p-12
 = erf tonearest ibm128 -0x1.4d32f4p-12 : -0x1.77f98ef609eb313046ceab3fa9p-12 : inexact-ok
 = erf towardzero ibm128 -0x1.4d32f4p-12 : -0x1.77f98ef609eb313046ceab3fa88p-12 : inexact-ok
 = erf upward ibm128 -0x1.4d32f4p-12 : -0x1.77f98ef609eb313046ceab3fa88p-12 : inexact-ok
+erf 0x1.c5bf891b4ef6ap-1023
+= erf downward binary32 0x8p-152 : 0x8p-152 : inexact-ok underflow errno-erange-ok
+= erf tonearest binary32 0x8p-152 : 0x8p-152 : inexact-ok underflow errno-erange-ok
+= erf towardzero binary32 0x8p-152 : 0x8p-152 : inexact-ok underflow errno-erange-ok
+= erf upward binary32 0x8p-152 : 0x1p-148 : inexact-ok underflow errno-erange-ok
+= erf downward binary64 0x8p-152 : 0x9.06eba8214db68p-152 : inexact-ok
+= erf tonearest binary64 0x8p-152 : 0x9.06eba8214db68p-152 : inexact-ok
+= erf towardzero binary64 0x8p-152 : 0x9.06eba8214db68p-152 : inexact-ok
+= erf upward binary64 0x8p-152 : 0x9.06eba8214db7p-152 : inexact-ok
+= erf downward intel96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
+= erf tonearest intel96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
+= erf towardzero intel96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
+= erf upward intel96 0x8p-152 : 0x9.06eba8214db688ep-152 : inexact-ok
+= erf downward m68k96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
+= erf tonearest m68k96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
+= erf towardzero m68k96 0x8p-152 : 0x9.06eba8214db688dp-152 : inexact-ok
+= erf upward m68k96 0x8p-152 : 0x9.06eba8214db688ep-152 : inexact-ok
+= erf downward binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
+= erf tonearest binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
+= erf towardzero binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
+= erf upward binary128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
+= erf downward ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
+= erf tonearest ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
+= erf towardzero ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
+= erf upward ibm128 0x8p-152 : 0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
+= erf downward binary32 0x0p+0 : 0x0p+0 : inexact-ok
+= erf tonearest binary32 0x0p+0 : 0x0p+0 : inexact-ok
+= erf towardzero binary32 0x0p+0 : 0x0p+0 : inexact-ok
+= erf upward binary32 0x0p+0 : 0x0p+0 : inexact-ok
+= erf downward binary64 0x0p+0 : 0x0p+0 : inexact-ok
+= erf tonearest binary64 0x0p+0 : 0x0p+0 : inexact-ok
+= erf towardzero binary64 0x0p+0 : 0x0p+0 : inexact-ok
+= erf upward binary64 0x0p+0 : 0x0p+0 : inexact-ok
+= erf downward intel96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf tonearest intel96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf towardzero intel96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf upward intel96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf downward m68k96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf tonearest m68k96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf towardzero m68k96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf upward m68k96 0x0p+0 : 0x0p+0 : inexact-ok
+= erf downward binary128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf tonearest binary128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf towardzero binary128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf upward binary128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf downward ibm128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf tonearest ibm128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf towardzero ibm128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf upward ibm128 0x0p+0 : 0x0p+0 : inexact-ok
+= erf downward binary64 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf tonearest binary64 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf towardzero binary64 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf upward binary64 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf downward intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf tonearest intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf towardzero intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf upward intel96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf downward m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf tonearest m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf towardzero m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf upward m68k96 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf downward binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
+= erf tonearest binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
+= erf towardzero binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
+= erf upward binary128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffe85be7e3c4dd029cp-1024 : inexact-ok
+= erf downward ibm128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
+= erf tonearest ibm128 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow errno-erange-ok
+= erf towardzero ibm128 0x3.8b7f12369ded4p-1024 : 0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
+= erf upward ibm128 0x3.8b7f12369ded4p-1024 : 0x4p-1024 : inexact-ok underflow errno-erange-ok
+erf -0x1.c5bf891b4ef6ap-1023
+= erf downward binary32 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf towardzero binary32 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf upward binary32 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf downward binary64 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf towardzero binary64 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf upward binary64 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf downward intel96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf tonearest intel96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf towardzero intel96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf upward intel96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf downward m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf tonearest m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf towardzero m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf upward m68k96 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf downward binary128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf tonearest binary128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf towardzero binary128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf upward binary128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf downward ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf tonearest ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf towardzero ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf upward ibm128 -0x0p+0 : -0x0p+0 : inexact-ok
+= erf downward binary32 -0x8p-152 : -0x1p-148 : inexact-ok underflow errno-erange-ok
+= erf tonearest binary32 -0x8p-152 : -0x8p-152 : inexact-ok underflow errno-erange-ok
+= erf towardzero binary32 -0x8p-152 : -0x8p-152 : inexact-ok underflow errno-erange-ok
+= erf upward binary32 -0x8p-152 : -0x8p-152 : inexact-ok underflow errno-erange-ok
+= erf downward binary64 -0x8p-152 : -0x9.06eba8214db7p-152 : inexact-ok
+= erf tonearest binary64 -0x8p-152 : -0x9.06eba8214db68p-152 : inexact-ok
+= erf towardzero binary64 -0x8p-152 : -0x9.06eba8214db68p-152 : inexact-ok
+= erf upward binary64 -0x8p-152 : -0x9.06eba8214db68p-152 : inexact-ok
+= erf downward intel96 -0x8p-152 : -0x9.06eba8214db688ep-152 : inexact-ok
+= erf tonearest intel96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
+= erf towardzero intel96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
+= erf upward intel96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
+= erf downward m68k96 -0x8p-152 : -0x9.06eba8214db688ep-152 : inexact-ok
+= erf tonearest m68k96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
+= erf towardzero m68k96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
+= erf upward m68k96 -0x8p-152 : -0x9.06eba8214db688dp-152 : inexact-ok
+= erf downward binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
+= erf tonearest binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bffp-152 : inexact-ok
+= erf towardzero binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
+= erf upward binary128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bfe8p-152 : inexact-ok
+= erf downward ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
+= erf tonearest ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6cp-152 : inexact-ok
+= erf towardzero ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
+= erf upward ibm128 -0x8p-152 : -0x9.06eba8214db688d71d48a7f6bcp-152 : inexact-ok
+= erf downward binary64 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf tonearest binary64 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf towardzero binary64 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf upward binary64 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow-ok errno-erange-ok
+= erf downward intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf tonearest intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf towardzero intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf upward intel96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf downward m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf tonearest m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85cp-1024 : inexact-ok
+= erf towardzero m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf upward m68k96 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe858p-1024 : inexact-ok
+= erf downward binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029cp-1024 : inexact-ok
+= erf tonearest binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
+= erf towardzero binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
+= erf upward binary128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffe85be7e3c4dd029ap-1024 : inexact-ok
+= erf downward ibm128 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow errno-erange-ok
+= erf tonearest ibm128 -0x3.8b7f12369ded4p-1024 : -0x4p-1024 : inexact-ok underflow errno-erange-ok
+= erf towardzero ibm128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
+= erf upward ibm128 -0x3.8b7f12369ded4p-1024 : -0x3.ffffffffffffcp-1024 : inexact-ok underflow errno-erange-ok
--- a/sysdeps/i386/Makefile
+++ b/sysdeps/i386/Makefile
@@ -10,6 +10,7 @@ ifeq ($(subdir),math)
 # correctly rounded results.
 CFLAGS-e_lgamma_r.c += -fexcess-precision=standard
 CFLAGS-e_gamma_r.c += -fexcess-precision=standard
+CFLAGS-s_erf.c += -fexcess-precision=standard
 endif

 ifeq ($(subdir),gmon)
--- a/sysdeps/ieee754/dbl-64/libm-test-ulps
+++ b/sysdeps/ieee754/dbl-64/libm-test-ulps
@@ -35,6 +35,18 @@ double: 0
 Function: "atanh_upward":
 double: 0

+Function: "erf":
+double: 0
+
+Function: "erf_downward":
+double: 0
+
+Function: "erf_towardzero":
+double: 0
+
+Function: "erf_upward":
+double: 0
+
 Function: "lgamma":
 double: 0

--- a/sysdeps/ieee754/dbl-64/s_erf.c
+++ b/sysdeps/ieee754/dbl-64/s_erf.c