mirror of https://sourceware.org/git/glibc.git synced 2025-12-24 17:51:17 +03:00
Commit Graph

1613 Commits

Author SHA1 Message Date
Adhemerval Zanella
3078358ac6 math: Remove the SVID error handling from tgammaf
It improves latency by about 1.5% and throughput by about 2-4%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 10:19:37 -03:00
Adhemerval Zanella
de0e623434 math: Remove the SVID error handling from lgammaf/lgammaf_r
It improves latency and throughput by about 2%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 09:27:07 -03:00
Adhemerval Zanella
7ec8eb5676 math: Remove the SVID error handling from atan2f
It improves latency by about 3-6% and throughput by about 5-12%.

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-05 07:15:52 -03:00
Joseph Myers
26e4810210 Rename fromfp files in preparation for changing types for C23
As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.

As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work with the new one as well).  Thus, the existing
implementations should become compat symbols.  They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.

Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).

gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static-only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.

Tested for x86_64, and with build-many-glibcs.py.
2025-11-04 23:41:35 +00:00
Joseph Myers
26d11a0944 Add C23 long_double_t, _FloatN_t
C23 Annex H adds <math.h> typedefs long_double_t and _FloatN_t
(originally introduced in TS 18661-3), analogous to float_t and
double_t.  Add these typedefs to glibc.  (There are no _FloatNx_t
typedefs.)

C23 also slightly changes the rules for how such typedef names should
be defined, compared to the definition in TS 18661-3.  In both cases,
<TYPE>_t corresponds to the evaluation format for <TYPE>, as specified
by FLT_EVAL_METHOD (for which <math.h> uses glibc's internal
__GLIBC_FLT_EVAL_METHOD).  Specifically, each FLT_EVAL_METHOD value
corresponds to some type U (for example, 64 corresponds to U =
_Float64), and for types with exactly the same set of values as U, TS
18661-3 says expressions with those types are to be evaluated to the
range and precision of type U (so <TYPE>_t is defined to U), whereas
C23 only does that for types whose values are a strict subset of those
of type U (so <TYPE>_t is defined to <TYPE>).

As with other cases where semantics changed between TS 18661 and C23,
this patch only implements the newer version of the semantics
(including adjusting existing definitions of float_t and double_t as
needed).  The new semantics are contradictory between the main
standard and Annex H for the case of FLT_EVAL_METHOD == 2 and the
choice of double_t when double and long double have the same values
(the main standard says it's defined as long double in that case,
whereas Annex H would define it as double), which I've raised on the
WG14 reflector (but I think setting FLT_EVAL_METHOD == 2 when double
and long double have the same values is a fairly theoretical
combination of features); for now glibc follows the value in the main
standard in that case.

Note that I think all existing GCC targets supported by glibc only use
values -1, 0, 1, 2 or 16 for FLT_EVAL_METHOD (so most of the header
code is somewhat theoretical, though potentially relevant with other
compilers since the choice of FLT_EVAL_METHOD is only an API choice,
not an ABI one; it can vary with compiler options, and these typedefs
should not be used in ABIs).  The testcase (expanded to cover the new
typedefs) is really just repeating the same logic in a second place
(so all it really tests is that __GLIBC_FLT_EVAL_METHOD is consistent
with FLT_EVAL_METHOD).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-11-04 17:12:00 +00:00
Adhemerval Zanella
0dfc849eff math: Remove the SVID error handling wrapper from sqrt
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.

The PowerPC optimization for PPC 601/603 (30 years old) is removed.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella
f27a146409 math: Remove the SVID error handling from sinhf
It improves latency by about 3-10% and throughput by about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella
0e1a1178ee math: Remove the SVID error handling from remainder
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version.  The performance
on a Zen3 chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(unlike i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput           input    master   NO-SVID  improvement
x86_64                     subnormals   18.8522   16.2506       13.80%
x86_64                         normal  421.8260  403.9270        4.24%
x86_64                 close-exponent   21.0579   18.7642       10.89%
i686                       subnormals   21.3443   21.4229       -0.37%
i686                           normal  525.8380   538.807       -2.47%
i686                   close-exponent   21.6589   21.7983       -0.64%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Adhemerval Zanella
c4c6c79d70 math: Remove the SVID error handling from remainderf
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin.  This optimization enables us to
migrate the implementation to a C version.  The performance on a Zen3
chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (unlike
i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

reciprocal-throughput          input   master  NO-SVID  improvement
x86_64                    subnormals  17.5349  15.6125       10.96%
x86_64                        normal  53.8134  52.5754        2.30%
x86_64                close-exponent  20.0211  18.6656        6.77%
i686                      subnormals  21.8105  20.1856        7.45%
i686                          normal  73.1945  71.2199        2.70%
i686                  close-exponent  22.2141   20.331        8.48%

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-11-04 04:14:01 -03:00
Wilco Dijkstra
1136c036a3 math: Remove xfail from pow test [BZ #33563]
Remove xfail from pow testcase since pow and powf have been fixed.
Also check float128 maximum value.  See BZ #33563.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-31 19:13:53 +00:00
Adhemerval Zanella
ee946212fe math: Remove the SVID error handling wrapper from yn/jn
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:35 -03:00
Adhemerval Zanella
8d4815e6d7 math: Remove the SVID error handling wrapper from y1/j1
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:33 -03:00
Adhemerval Zanella
b050cb53b0 math: Remove the SVID error handling wrapper from y0/j0
Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:31 -03:00
Adhemerval Zanella
03eeeba705 math: Remove the SVID error handling from coshf
It improves latency by about 3-10% and throughput by about 5-15%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:28 -03:00
Adhemerval Zanella
555c39c0fc math: Remove the SVID error handling from atanhf
It improves latency by about 1-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:26 -03:00
Adhemerval Zanella
8facb464b4 math: Remove the SVID error handling from acoshf
It improves latency by about 3-7% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:24 -03:00
Adhemerval Zanella
f92aba68bc math: Remove the SVID error handling from asinf
It improves latency by about 2% and throughput by about 5%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:22 -03:00
Adhemerval Zanella
9f8dea5b5d math: Remove the SVID error handling from acosf
It improves latency by about 2-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:20 -03:00
Adhemerval Zanella
0b484d7b77 math: Remove the SVID error handling from log10f
It improves latency by about 3-10% and throughput by about 5-10%.

Tested on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-30 15:41:17 -03:00
Adhemerval Zanella
e4d812c980 math: Consolidate erf/erfc definitions
The common code definitions are consolidated in s_erf_common.h
and s_erf_common.c.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:46:01 -03:00
Adhemerval Zanella
fc419290f9 math: Consolidate internal erf/erfc tables
The shared internal data definitions are consolidated in
s_erf_data.c, and the erfc-only ones are moved to s_erfc_data.c.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella
acaad9ab06 math: Use erfc from CORE-MATH
The current implementation shows the following accuracy, on
three ranges ([-DBL_MAX, -5], [-5, 5], [5, DBL_MAX]) with 10e9 uniform
randomly generated numbers for each range (first column is the
accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -5]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-5, 5]
 * FE_TONEAREST
     0:       8069309665  80.69%
     1:       1882910247  18.83%
     2:         47485296   0.47%
     3:           293749   0.00%
     4:             1043   0.00%
 * FE_UPWARD
     0:       5540301026  55.40%
     1:       2026739127  20.27%
     2:       1774882486  17.75%
     3:        567324466   5.67%
     4:         86913847   0.87%
     5:          3820789   0.04%
     6:            18259   0.00%
 * FE_DOWNWARD
     0:       5520969586  55.21%
     1:       2057293099  20.57%
     2:       1778334818  17.78%
     3:        557521494   5.58%
     4:         82473927   0.82%
     5:          3393276   0.03%
     6:            13800   0.00%
 * FE_TOWARDZERO
     0:       6220287175  62.20%
     1:       2323846149  23.24%
     2:       1251999920  12.52%
     3:        190748245   1.91%
     4:         12996232   0.13%
     5:           122279   0.00%

* Range [5, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                      49.0980       267.0660      -443.94%
x86_64v2                    49.3220       257.6310      -422.34%
x86_64v3                    42.9539        84.9571       -97.79%
aarch64                     28.7266        52.9096       -84.18%
power10                     14.1673        25.1273       -77.36%

Latency                      master        patched   improvement
x86_64                      95.6640       269.7060      -181.93%
x86_64v2                    95.8296       260.4860      -171.82%
x86_64v3                    91.1658       112.7150       -23.64%
aarch64                     37.0745        58.6791       -58.27%
power10                     23.3197        31.5737       -35.39%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella
72a48e45bd math: Use erf from CORE-MATH
The current implementation shows the following accuracy, on
three ranges ([-DBL_MAX, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with
10e9 uniform randomly generated numbers for each range (first column
is the accuracy in ULP, with '0' being correctly rounded, second is the
number of samples with the corresponding precision):

* Range [-DBL_MAX, -4.2]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

* Range [-4.2, 4.2]
 * FE_TONEAREST
     0:       9764404513  97.64%
     1:        235595487   2.36%
 * FE_UPWARD
     0:       9468013928  94.68%
     1:        531986072   5.32%
 * FE_DOWNWARD
     0:       9493787693  94.94%
     1:        506212307   5.06%
 * FE_TOWARDZERO
     0:       9585271351  95.85%
     1:        414728649   4.15%

* Range [4.2, DBL_MAX]
 * FE_TONEAREST
     0:      10000000000 100.00%
 * FE_UPWARD
     0:      10000000000 100.00%
 * FE_DOWNWARD
     0:      10000000000 100.00%
 * FE_TOWARDZERO
     0:      10000000000 100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master       patched   improvement
x86_64                      38.2754       78.0311      -103.87%
x86_64v2                    38.3325       75.7555       -97.63%
x86_64v3                    34.6604       28.3182        18.30%
aarch64                     23.1499       21.4307         7.43%
power10                     12.3051       9.3766         23.80%

Latency                      master       patched   improvement
x86_64                      84.3062      121.3580       -43.95%
x86_64v2                    84.1817      117.4250       -39.49%
x86_64v3                    81.0933       70.6458        12.88%
aarch64                      35.012       29.5012        15.74%
power10                     21.7205       18.4589        15.02%

For x86_64/x86_64-v2, most of the performance hit comes from the fma
call going through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella
1cae0550e8 math: Use tgamma from CORE-MATH
The current implementation shows the following accuracy, on
one range ([-20,20]) with 10e9 uniform randomly generated numbers for
each range (first column is the accuracy in ULP, with '0' being
correctly rounded, second is the number of samples with the
corresponding precision):

* Range [-20,20]
 * FE_TONEAREST
     0:       4504877808  45.05%
     1:       4402224940  44.02%
     2:        947652295   9.48%
     3:        131076831   1.31%
     4:         13222216   0.13%
     5:           910045   0.01%
     6:            35253   0.00%
     7:              606   0.00%
     8:                6   0.00%
 * FE_UPWARD
     0:       3477307921  34.77%
     1:       4838637866  48.39%
     2:       1413942684  14.14%
     3:        240762564   2.41%
     4:         27113094   0.27%
     5:          2130934   0.02%
     6:           102599   0.00%
     7:             2324   0.00%
     8:               14   0.00%
 * FE_DOWNWARD
     0:       3923545410  39.24%
     1:       4745067290  47.45%
     2:       1137899814  11.38%
     3:        171596912   1.72%
     4:         20013805   0.20%
     5:          1773899   0.02%
     6:            99911   0.00%
     7:             2928   0.00%
     8:               31   0.00%
 * FE_TOWARDZERO
     0:       3697160741  36.97%
     1:       4731951491  47.32%
     2:       1303092738  13.03%
     3:        231969191   2.32%
     4:         32344517   0.32%
     5:          3283092   0.03%
     6:           193010   0.00%
     7:             5175   0.00%
     8:               45   0.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     237.7960       175.4090        26.24%
x86_64v2                   232.9320       163.4460        29.83%
x86_64v3                   193.0680        89.7721        53.50%
aarch64                    113.6340        56.7350        50.07%
power10                     92.0617        26.6137        71.09%

Latency                      master        patched   improvement
x86_64                     266.7190       208.0130        22.01%
x86_64v2                   263.6070       200.0280        24.12%
x86_64v3                   214.0260       146.5180        31.54%
aarch64                    114.4760        58.5235        48.88%
power10                     84.3718        35.7473        57.63%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella
d67d2f4688 math: Use lgamma from CORE-MATH
The current implementation shows the following accuracy, on two
ranges ([-20, 20] and [20, 0x5.d53649e2d4674p+1012]) with 10e9
uniform randomly generated numbers for
each range (first column is the accuracy in ULP, with '0' being
correctly rounded, second is the number of samples with the
corresponding precision):

* Range [-20, 20]
 * FE_TONEAREST
     0:       6701254075  67.01%
     1:       3230897408  32.31%
     2:         63986940   0.64%
     3:          3605417   0.04%
     4:           233189   0.00%
     5:            20973   0.00%
     6:             1869   0.00%
     7:              125   0.00%
     8:                4   0.00%
 * FE_UPWARD
     0:       4207428861  42.07%
     1:       5001137116  50.01%
     2:        740542213   7.41%
     3:         49116304   0.49%
     4:          1715617   0.02%
     5:            54464   0.00%
     6:             4956   0.00%
     7:              451   0.00%
     8:               16   0.00%
     9:                2   0.00%
 * FE_DOWNWARD
     0:       4155925193  41.56%
     1:       4989821364  49.90%
     2:        770312796   7.70%
     3:         72014726   0.72%
     4:         11040522   0.11%
     5:           872811   0.01%
     6:            12480   0.00%
     7:              106   0.00%
     8:                2   0.00%
 * FE_TOWARDZERO
     0:       4225861532  42.26%
     1:       5027051105  50.27%
     2:        706443411   7.06%
     3:         39877908   0.40%
     4:           713109   0.01%
     5:            47513   0.00%
     6:             4961   0.00%
     7:              438   0.00%
     8:               23   0.00%

* Range [20, 0x5.d53649e2d4674p+1012]
 * FE_TONEAREST
     0:       7262241995  72.62%
     1:       2737758005  27.38%
 * FE_UPWARD
     0:       4690392401  46.90%
     1:       5143728216  51.44%
     2:        165879383   1.66%
 * FE_DOWNWARD
     0:       4690333331  46.90%
     1:       5143794937  51.44%
     2:        165871732   1.66%
 * FE_TOWARDZERO
     0:       4690343071  46.90%
     1:       5143786761  51.44%
     2:        165870168   1.66%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     112.9740       135.8640       -20.26%
x86_64v2                   111.8910       131.7590       -17.76%
x86_64v3                   108.2800        68.0935        37.11%
aarch64                     61.3759        49.2403        19.77%
power10                     42.4483        24.1943        43.00%

Latency                      master        patched   improvement
x86_64                     144.0090       167.9750       -16.64%
x86_64v2                   139.2690       167.1900       -20.05%
x86_64v3                   130.1320        96.9347        25.51%
aarch64                     66.8538        53.2747        20.31%
power10                     49.5076        29.6917        40.03%

For x86_64/x86_64-v2, most of the performance hit comes from the fma
call going through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella
140e802cb3 math: Move atanh internal data to separate file
The internal data definitions are moved to s_atanh_data.c.
It helps on ABIs that build the implementation multiple times for
ifunc optimizations, like x86_64.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Adhemerval Zanella
cb8d1575b6 math: Consolidate acosh and asinh internal table
The shared internal data definitions are consolidated in
s_asincosh_data.c.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 09:34:04 -03:00
Paul Zimmermann
48fde7b026 various fixes detected with -Wdouble-promotion
Changes with respect to v1:
- added comment in e_j1f.c to explain the use of float is enough
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-22 12:35:40 +02:00
Siddhesh Poyarekar
1b657c53c2 Simplify powl computation for small integral y [BZ #33411]
The powl implementation for x86_64 ends up multiplying X once more than
necessary and then throwing away that result.  This results in an
overflow flag being set in cases where there is no overflow.

Simplify the relevant portion by special casing the -3 to 3 range and
simply multiplying repetitively.

Resolves: BZ #33411
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-21 14:00:10 -04:00
Adhemerval Zanella
0e4ca88bd2 math: Fix compare sort function on compoundn
To use the fabs function to the used type, instead of the double
variant.  it fixes a build issue with clang:

./s_compoundn_template.c:64:14: error: absolute value function 'fabs' given an argument of type 'const long double' but has parameter of type 'double' which may cause truncation of value [-Werror,-Wabsolute-value]
   64 |   FLOAT pd = fabs (*(const FLOAT *) p);
      |              ^
./s_compoundn_template.c:64:14: note: use function 'fabsl' instead
   64 |   FLOAT pd = fabs (*(const FLOAT *) p);
      |              ^~~~
      |              fabsl

Reviewed-by: Collin Funk <collin.funk1@gmail.com>
2025-10-21 09:27:05 -03:00
Adhemerval Zanella
b9b28ce35f math: Suppress more aliases builtin type conflicts
Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:26:02 -03:00
Adhemerval Zanella
39bf95c1ba math: Suppress clang -Wabsolute-value warning on math_check_force_underflow
clang warns:

  ../sysdeps/x86/fpu/powl_helper.c:233:3: error: absolute value function
  '__builtin_fabsf' given an argument of type 'typeof (res)' (aka 'long
  double') but has parameter of type 'float' which may cause truncation of
  value [-Werror,-Wabsolute-value]
    math_check_force_underflow (res);
    ^
  ./math-underflow.h:45:11: note: expanded from macro
  'math_check_force_underflow'
        if (fabs_tg (force_underflow_tmp)                         \
            ^
  ./math-underflow.h:27:20: note: expanded from macro 'fabs_tg'
  #define fabs_tg(x) __MATH_TG ((x), (__typeof (x)) __builtin_fabs, (x))
                     ^
  ../math/math.h:899:16: note: expanded from macro '__MATH_TG'
                 float: FUNC ## f ARGS,           \
                        ^
  <scratch space>:73:1: note: expanded from here
  __builtin_fabsf
  ^

This is due to the use of _Generic in __MATH_TG.

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:24:21 -03:00
Adhemerval Zanella
850d93f514 math: Use binary search on lgammaf slow path
And remove some unused entries of the fallback table.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:08 -03:00
Adhemerval Zanella
ae49afe74d math: Optimize fma call on log2pf1
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:00 -03:00
Adhemerval Zanella
82a4f50b4e math: Optimize fma call on asinpif
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:11:56 -03:00
Adhemerval Zanella
1c459af1ee math: Update auto-libm-test-out-log2p1
Commit 0797283910 did not update the log2p1 output with the newer values.
2025-10-14 08:46:06 -03:00
Luna Lamb
653e6c4fff AArch64: Implement AdvSIMD and SVE log10p1(f) routines
Vector variants of the new C23 log10p1 routines.

Note: Benchmark inputs for log10p1(f) are identical to log1p(f)

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:45:59 +00:00
Luna Lamb
db42732474 AArch64: Implement AdvSIMD and SVE log2p1(f) routines
Vector variants of the new C23 log2p1 routines.

Note: Benchmark inputs for log2p1(f) are identical to log1p(f).

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:44:09 +00:00
Adhemerval Zanella
63ba1a1509 math: Add fetestexcept internal alias
To avoid linknamespace issues with old standards.  It is required
when the fallback fma implementation is also used internally by
other implementations.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella
2eb8836de7 math: Add feclearexcept internal alias
To avoid linknamespace issues with old standards.  It is required
when the fallback fma implementation is also used internally by
other implementations.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Hasaan Khan
8ced7815fb AArch64: Implement exp2m1 and exp10m1 routines
Vector variants of the new C23 exp2m1 & exp10m1 routines.

Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10
respectively, this also includes the floating point variations.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-02 16:50:24 +00:00
Adhemerval Zanella
6ab36c4e6d math: Update auto-libm-tests-in with ldbl-128ibm compoundn/pown failures
It fixes ce488f7c16, which updated
the out files without following the gen-auto-libm-tests.c instructions.

Checked on powerpc64le-linux-gnu.

Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-07-28 13:58:54 -03:00
Sachin Monga
ce488f7c16 math: xfail some pown and compoundn tests for ibm128-libgcc
On powerpc math/test-ibm128-pown shows below failures:

testing long double (without inline functions)
infinity has wrong sign.
Failure: Test: pown_downward (-inf, 0x7fffffffffffffffLL)
Result:
 is:          inf   inf
 should be:  -inf  -inf
Failure: Test: pown_downward (-0, 9223372036854775807LL)
Result:
 is:          0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 ulp       :  0.0000
 max.ulp   :  16.0000
Failure: pown_downward (-0x1p+0, 9223372036854775807LL): Exception "Invalid operation" set
Failure: pown_downward (-0x1p+0, 9223372036854775807LL): errno set to 34, expected 0 (unchanged)
Failure: Test: pown_downward (-0x1p+0, 9223372036854775807LL)
Result:
 is:         qNaN
 should be:  -1.00000000000000000000000000000000e+00  -0x1.000000000000000000000000000p+0
infinity has wrong sign.
Failure: Test: pown_towardzero (-0, -0x7fffffffffffffffLL)
Result:
 is:          inf   inf
 should be:  -inf  -inf
infinity has wrong sign.
Failure: Test: pown_towardzero (-inf, 0x7fffffffffffffffLL)
Result:
 is:          inf   inf
 should be:  -inf  -inf
Failure: Test: pown_towardzero (-inf, -0x7fffffffffffffffLL)
Result:
 is:          0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 ulp       :  0.0000
 max.ulp   :  16.0000
Failure: Test: pown_towardzero (-0, 9223372036854775807LL)
Result:
 is:          0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 ulp       :  0.0000
 max.ulp   :  16.0000
Failure: pown_towardzero (-0x1p+0, -9223372036854775807LL): Exception "Invalid operation" set
Failure: pown_towardzero (-0x1p+0, -9223372036854775807LL): errno set to 34, expected 0 (unchanged)
Failure: Test: pown_towardzero (-0x1p+0, -9223372036854775807LL)
Result:
 is:         qNaN
 should be:  -1.00000000000000000000000000000000e+00  -0x1.000000000000000000000000000p+0
Failure: pown_towardzero (-0x1p+0, 9223372036854775807LL): Exception "Invalid operation" set
Failure: pown_towardzero (-0x1p+0, 9223372036854775807LL): errno set to 34, expected 0 (unchanged)
Failure: Test: pown_towardzero (-0x1p+0, 9223372036854775807LL)
Result:
 is:         qNaN
 should be:  -1.00000000000000000000000000000000e+00  -0x1.000000000000000000000000000p+0
infinity has wrong sign.
Failure: Test: pown_upward (-0, -0x7fffffffffffffffLL)
Result:
 is:          inf   inf
 should be:  -inf  -inf
Failure: Test: pown_upward (-inf, -0x7fffffffffffffffLL)
Result:
 is:          0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 ulp       :  0.0000
 max.ulp   :  16.0000
Failure: pown_upward (-0x1p+0, -9223372036854775807LL): Exception "Invalid operation" set
Failure: pown_upward (-0x1p+0, -9223372036854775807LL): errno set to 34, expected 0 (unchanged)
Failure: Test: pown_upward (-0x1p+0, -9223372036854775807LL)
Result:
 is:         qNaN
 should be:  -1.00000000000000000000000000000000e+00  -0x1.000000000000000000000000000p+0

Likewise, math/test-ibm128-compoundn shows below failure:

testing long double (without inline functions)
Failure: compoundn_upward (0xf.ffffffffffff8p+1020, 1LL): Exception "Overflow" set
Failure: compoundn_upward (0xf.ffffffffffff8p+1020, 1LL): errno set to 34, expected 0 (unchanged)
Failure: Test: compoundn_upward (0xf.ffffffffffff8p+1020, 1LL)
Result:
 is:          inf   inf
 should be:   1.79769313486231570814527423731707e+308   0x1.fffffffffffff00000000000008p+1023

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-07-24 19:36:21 +02:00
Carlos O'Donell
801d566dde gen-libm-test: Use 'original source' instead of 'master' in code.
Use more inclusive language in generated sources.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-05-21 12:48:00 -04:00
Dylan Fleming
96abd59bf2 AArch64: Implement AdvSIMD and SVE atan2pi/f
Implement double and single precision variants of the C23 routine atan2pi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:35:25 +00:00
Dylan Fleming
edf6202815 AArch64: Implement AdvSIMD and SVE atanpi/f
Implement double and single precision variants of the C23 routine atanpi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:34:40 +00:00
Dylan Fleming
0ef2cf44e7 AArch64: Implement AdvSIMD and SVE asinpi/f
Implement double and single precision variants of the C23 routine asinpi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:33:50 +00:00
Dylan Fleming
993997ca1b AArch64: Implement AdvSIMD and SVE acospi/f
Implement double and single precision variants of the C23 routine acospi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:31:59 +00:00
Joseph Myers
06caf53adf Implement C23 rootn.
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the rootn functions, which compute the Yth root of X for
integer Y (with a domain error if Y is 0, even if X is a NaN).  The
integer exponent has type long long int in C23; it was intmax_t in TS
18661-4, and as with other interfaces changed after their initial
appearance in the TS, I don't think we need to support the original
version of the interface.

As with pown and compoundn, I strongly encourage searching for worst
cases for ulps error for these implementations (necessarily
non-exhaustively, given the size of the input space).  I also expect a
custom implementation for a given format could be much faster as well
as more accurate, although the implementation is simpler than those
for pown and compoundn.

This completes adding to glibc those TS 18661-4 functions (ignoring
DFP) that are included in C23.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118592 regarding the C23
mathematical functions (not just the TS 18661-4 ones) missing built-in
functions in GCC, where such functions might usefully be added.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-05-14 10:51:46 +00:00
Joseph Myers
ae31254432 Implement C23 compoundn
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1).  The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.

Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).

As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space).  I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.

As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-05-09 15:17:27 +00:00