1
0
mirror of https://sourceware.org/git/glibc.git synced 2025-11-02 09:33:31 +03:00
Commit Graph

1339 Commits

Author SHA1 Message Date
Paul Zimmermann
48fde7b026 various fixes detected with -Wdouble-promotion
Changes with respect to v1:
- added comment in e_j1f.c to explain the use of float is enough
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-22 12:35:40 +02:00
Adhemerval Zanella
9d0b7ec87c math: Suppress clang -Wincompatible-library-redeclaration on s_llround
Clang issues:

  ../sysdeps/ieee754/dbl-64/s_llround.c:83:30: error: incompatible
  redeclaration of library function 'lround'
  [-Werror,-Wincompatible-library-redeclaration]
  libm_alias_double (__lround, lround)
                               ^
  ../sysdeps/ieee754/dbl-64/s_llround.c:83:30: note: 'lround' is a builtin
  with type 'long (double)'

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:24:27 -03:00
Adhemerval Zanella
407b2eea75 math: use fabs on __ieee754_lgamma_r
clang issues:

  ../sysdeps/ieee754/dbl-64/e_lgamma_r.c:234:29: error: absolute value function 'fabsf'
  given an argument of type 'double' but has parameter of type 'float' which may cause \
  truncation of value [-Werror,-Wabsolute-value]

It should not matter because the value is 0.0, but using fabs is
simpler than adding a warning suppresion.

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:24:24 -03:00
Adhemerval Zanella
76dfd91275 Suppress -Wmaybe-uninitialized only for gcc
The warning is not supported by clang.

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 09:24:05 -03:00
Wilco Dijkstra
35807cc5cd math: Add builtin support for (l)lround(f)
Add builtin support for (l)lround(f) via the math-use-builtins
header mechanism.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-17 17:03:54 +00:00
Adhemerval Zanella
850d93f514 math: Use binary search on lgammaf slow path
And remove some unused entries of the fallback table.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:08 -03:00
Adhemerval Zanella
6610a293b3 math: Use stdbit.h instead of builtin in math_config.h
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:04 -03:00
Adhemerval Zanella
ae49afe74d math: Optimize fma call on log2pf1
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:00 -03:00
Adhemerval Zanella
82a4f50b4e math: Optimize fma call on asinpif
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:11:56 -03:00
Adhemerval Zanella
fab32b6526 math: Remove erfcf fma usage
The fma is not required to provide correctly rounded and it helps
on !__FP_FAST_FMA ISAs.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Adhemerval Zanella
68cb78eccc math: Remove asinhf fma usage
The fma is not required to provide correctly rounded and it helps
on !__FP_FAST_FMA ISAs.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Adhemerval Zanella
c075ff00a6 math: Optimize fma call on acospif
The fma is required only for inputs less than 0x1.0fd288p-127.  Also
only add the extra check for !__FP_FAST_FMA targets.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Adhemerval Zanella
c9d9336f50 math: Remove acoshf fma usage
The fma is not strickly required to provide correctly rounded and
it helps on !__FP_FAST_FMA ABIs.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Paul Zimmermann
ea5b996be9 replace use of double by float [BZ#29326] 2025-10-14 09:46:00 +02:00
Adhemerval Zanella
61ac7c6a75 math: Optimize flt-32 remainder implementation
With same micro-optimization done for the double variant:

  * Combine the |y| zero check.
  * Rework the check to adjust result and call fmod.
  * Remove one check after fmod.
  * Remove float-int-float roundtrip on return.

Also use math_config.h macros and indent the code.  The resulting
strategy is different in many places that I think requires a
different Copyright.

I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):

Architecture     | Input           |   master |   patch  | Improvemnt
-----------------|-----------------|----------|-----------------------
x86_64           | subnormals      |  20.4176 |  19.6144 |      3.93%
x86_64           | normal          |  54.0939 |  52.2343 |      3.44%
x86_64           | close-exponent  |  23.9120 |  22.3768 |      6.42%
aarch64          | subnormals      |   9.2423 |   8.3825 |      9.30%
aarch64          | normal          |  30.5393 |   29.244 |      4.24%
aarch64          | close-exponent  |  15.5405 |  13.9256 |     10.39%

The aarch64 used as Neoverse-N1, gcc 15.1.1; while the x86_64 was
a AMD Ryzen 9 5900X, gcc 15.2.1.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-03 15:19:44 -03:00
Adhemerval Zanella
f0facb2d27 math: Optimize dbl-64 remainder implementation
The commit 34b9f8bc17 provides an optimized fmod implementation; use
the same strategy used for remainderf and implement the double variant
on top of fmod.

I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):

Architecture     | Input           |   master |   patch  | Improvemnt
-----------------|-----------------|----------|-----------------------
x86_64           | subnormals      |  76.1345 |  21.5334 |     71.72%
x86_64           | normal          | 553.2670 | 426.5670 |     22.90%
x86_64           | close-exponent  |  30.5111 |  22.6893 |     25.64%
aarch64          | subnormals      |  26.0734 |   8.4876 |     67.45%
aarch64          | normal          | 205.2590 |  200.082 |      2.52%
aarch64          | close-exponent  |  13.8481 |  13.6663 |      1.31%

The aarch64 used as Neoverse-N1, gcc 15.1.1; while the x86_64 was
a AMD Ryzen 9 5900X, gcc 15.2.1.

This implementation also fixes the math/test-double-remainder issues
on alpha.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-03 15:19:31 -03:00
Collin Funk
6d9e110577 math: fix Wshift-overflow warning.
When compiling on x86_64 with -Wshift-overflow=2 you can see the
following warning:

../sysdeps/ieee754/flt-32/math_config.h: In function ‘is_inf’:
../sysdeps/ieee754/flt-32/math_config.h:184:37: warning: result of ‘2139095040 << 1’ requires 33 bits to represent, but ‘int’ only has 32 bits [-Wshift-overflow=]
  184 |   return (x << 1) == (EXPONENT_MASK << 1);
      |                                     ^~

This patch adjusts the definitions to use UINT32_C. This matches the
definitions in sysdeps/ieee754/dbl-64/math_config.h which use UINT64_C
for these definitions.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-02 18:01:23 -07:00
Adhemerval Zanella
cde86de627 math: Remove clz_uint64/ctz_uint64 and use stdbit.h
Checked on aarch64-linux-gnu and x86_64-linux-gnu
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
2025-09-11 14:47:25 -03:00
Adhemerval Zanella
bd7b04ec7c math: Split erf and erfc
Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:59 -03:00
Adhemerval Zanella
f40cdb65f5 math: Use internal fesetround alias on fma
To avoid linknamespace issues on old standards.  It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella
adecb3bec1 math: Use internal fetestexcept alias on fma
To avoid linknamespace issues on old standards.  It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella
41c2f1d9a3 math: Use internal feholdexcept alias on fma
To avoid linknamespace issues on old standards.  It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella
08c68809d0 math: Use internal feupdateenv alias on fma
To avoid linknamespace issues on old standards.  It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Adhemerval Zanella
5624ee0482 math: Use internal feholdexcept alias on fma
To avoid linknamespace issues on old standards.  It is required
if the fallback fma implementation is used if/when it is also
used internally for other implementation.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 14:46:07 -03:00
Maciej W. Rozycki
aa4dbb2eeb stdio-common: Convert macros across scanf input specifier tests
Convert 'compare_real', 'read_real', and 'verify_input' macros to
functions so as to improve readability and avoid pitfalls.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-08-23 01:02:46 +01:00
Maciej W. Rozycki
a1e5ee13ab stdio-common: Adjust header inclusion in scanf input specifier tests
Move the inclusion of the data class header from the individual tests to
the data-type-specific skeleton, providing for the use of the data type
under test in the data class header and reducing duplication.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-08-23 01:02:46 +01:00
Maciej W. Rozycki
ca0f999a93 stdio-common: Fix NaN input data for scanf input specifier tests [BZ #32857]
Update NaN input data with 'n-char-sequence' in reference data matching
data under test, removing test failures with the M68K host.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-08-23 01:02:46 +01:00
Maciej W. Rozycki
1c1f5e8f6d stdio-common: Add 'f' conversion tests for . scanf input [BZ #12701]
Verify that . input is rejected by 'f' conversion (and its uppercase
counterpart).  Replace 0 input with .0 rather than adding new one,
because the integral part of 0 is already covered by 0.0 data, so
there's no need to keep this duplication.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-08-11 17:42:12 +01:00
Maciej W. Rozycki
291f4d4fe5 stdio-common: Add 'e' conversion tests for . scanf input [BZ #12701]
Verify that . input is rejected by 'e' conversion (and its uppercase
counterpart).  Replace 0e0 input with .0e0 rather than adding new one,
because 0 significand is already covered by 0e+0 data, so there's no
need to keep this duplication.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-08-11 17:42:12 +01:00
Maciej W. Rozycki
14957cb1c4 stdio-common: Add 'a', 'g' conversion tests for 0x. scanf input [BZ #12701]
Verify that 0x. input is rejected by 'a' and 'g' conversions (and their
uppercase counterparts).  Replace 0x0p0 input with 0x.0p0 rather than
adding new one, because 0x0 significand is already covered by 0x0p+0
data, so there's no need to keep this duplication.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-08-11 17:42:12 +01:00
Adhemerval Zanella
5c2b21c478 powerpc: Remove modff optimization
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).

On power10 with gcc 14.2.1:

reciprocal-throughput        master        patch        difference
workload-0_1                 1.5210       1.3942             8.34%
workload-1_maxint            2.0926       1.3940            33.38%
workload-maxint_maxfloat     1.7851       1.3940            21.91%
workload-integral            1.5216       1.3941             8.37%

latency                      master        patch        difference
workload-0_1                 1.5928       2.6337           -65.35%
workload-1_maxint            3.2929       2.6337            20.02%
workload-maxint_maxfloat     1.9697       2.6341           -33.73%
workload-integral            2.0597       2.6337           -27.87%

Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
2025-06-25 15:05:30 -03:00
Adhemerval Zanella
f165e244e4 math: Simplify and optimize modf implementation
Refactor the generic implementation to use math_config.h definitions,
and add an alternative one if the ABI supports truncf instructions
(gated through math-use-builtins-trunc.h).

The generic implementation generates similar code on x86_64, while
the optimization one for aarch64 (where truncf is supported as a
builtin by through frintz), the improvements are:

reciprocal-throughput           master    patch    difference
workload-0_1                    3.0595   3.0698        -0.34%
workload-1_maxint               5.1747   3.0542        40.98%
workload-maxint_maxfloat        3.4391   3.0349        11.75%
workload-integral               3.2732   3.0293         7.45%

latency                         master    patch    difference
workload-0_1                    3.5267   4.7107       -33.57%
workload-1_maxint               6.9074   4.7282        31.55%
workload-maxint_maxfloat        3.7210   4.7506       -27.67%
workload-integral               3.8634   4.8137       -24.60%

Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 15:56:40 -03:00
Adhemerval Zanella
61cc9922f3 math: Simplify and optimize modff implementation
Refactor the generic implementation to use math_config.h definitions,
and add an alternative one if the ABI supports truncf instructions
(gated through math-use-builtins-trunc.h).

The generic implementation generates similar code for x86_64, while
the optimization path aarch64 (where truncf is supported as a builtin)
through frintz), the improvements are:

reciprocal-throughput           master     patch    difference
workload-0_1                    3.0740    3.0326         1.35%
workload-1_maxint               5.2231    3.0436        41.73%
workload-maxint_maxfloat        4.0962    3.0551        25.42%
workload-integral               3.7093    3.0612        17.47%

latency                         master     patch    difference
workload-0_1                    3.5521    4.7313       -33.20%
workload-1_maxint               6.7148    4.7314        29.54%
workload-maxint_maxfloat        4.0458    4.7518       -17.45%
workload-integral               3.9719    4.7427       -19.40%

Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-18 15:56:00 -03:00
Adhemerval Zanella
39775f00b1 math: Optimize float ilogb/llogb
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.

The code now builds to with gcc-14 on aarch64:

0000000000000000 <__ilogbf>:
   0:   1e260000        fmov    w0, s0
   4:   d3577801        ubfx    x1, x0, #23, #8
   8:   340000e1        cbz     w1, 24 <__ilogbf+0x24>
   c:   5101fc20        sub     w0, w1, #0x7f
  10:   7103fc3f        cmp     w1, #0xff
  14:   54000040        b.eq    1c <__ilogbf+0x1c>  // b.none
  18:   d65f03c0        ret
  1c:   12b00000        mov     w0, #0x7fffffff                 // #2147483647
  20:   14000000        b       0 <__math_invalidf_i>
  24:   53175800        lsl     w0, w0, #9
  28:   340000a0        cbz     w0, 3c <__ilogbf+0x3c>
  2c:   5ac01000        clz     w0, w0
  30:   12800fc1        mov     w1, #0xffffff81                 // #-127
  34:   4b000020        sub     w0, w1, w0
  38:   d65f03c0        ret
  3c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  40:   14000000        b       0 <__math_invalidf_i>

Some ABI requires additional adjustments:

  * i386 and m68k requires to use the template version, since
    both provide __ieee754_ilogb implementatations.

  * loongarch uses a custom implementation as well.

  * powerpc64le also has a custom implementation for POWER9, which
    is also used for float and float128 version.  The generic
    e_ilogb.c implementation is moved on powerpc to keep the
    current code as-is.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella
afe09d44f3 math: Remove UB and optimize double ilogbf
The subnormal exponent calculation invokes UB by left shifting the
signed expoenent to find the first leading bit.

The patch reimplements ilogb using the math_config.h macros and
uses the new stdbit.h function to simplify the subnormal handling.

On aarch64 it generates better code:

* master:

0000000000000000 <__ieee754_ilogbf>:
   0:   1e260000        fmov    w0, s0
   4:   12007801        and     w1, w0, #0x7fffffff
   8:   72091c1f        tst     w0, #0x7f800000
   c:   54000141        b.ne    34 <__ieee754_ilogbf+0x34>  // b.any
  10:   34000201        cbz     w1, 50 <__ieee754_ilogbf+0x50>
  14:   53185c21        lsl     w1, w1, #8
  18:   12800fa0        mov     w0, #0xffffff82                 // #-126
  1c:   d503201f        nop
  20:   531f7821        lsl     w1, w1, #1
  24:   51000400        sub     w0, w0, #0x1
  28:   7100003f        cmp     w1, #0x0
  2c:   54ffffac        b.gt    20 <__ieee754_ilogbf+0x20>
  30:   d65f03c0        ret
  34:   13177c20        asr     w0, w1, #23
  38:   12b01002        mov     w2, #0x7f7fffff                 // #2139095039
  3c:   5101fc00        sub     w0, w0, #0x7f
  40:   6b02003f        cmp     w1, w2
  44:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  48:   1a819000        csel    w0, w0, w1, ls  // ls = plast
  4c:   d65f03c0        ret
  50:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  54:   d65f03c0        ret

* patch:

0000000000000000 <__ieee754_ilogbf>:
   0:   1e260001        fmov    w1, s0
   4:   d3577820        ubfx    x0, x1, #23, #8
   8:   350000e0        cbnz    w0, 24 <__ieee754_ilogbf+0x24>
   c:   53175821        lsl     w1, w1, #9
  10:   34000141        cbz     w1, 38 <__ieee754_ilogbf+0x38>
  14:   5ac01021        clz     w1, w1
  18:   12800fc0        mov     w0, #0xffffff81                 // #-127
  1c:   4b010000        sub     w0, w0, w1
  20:   d65f03c0        ret
  24:   7103fc1f        cmp     w0, #0xff
  28:   5101fc00        sub     w0, w0, #0x7f
  2c:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  30:   1a811000        csel    w0, w0, w1, ne  // ne = any
  34:   d65f03c0        ret
  38:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  3c:   d65f03c0        ret

Other architecture with support for stdc_leading_zeros and/or
__builtin_clzll should have similar improvements.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella
c4be334400 math: Optimize double ilogb/llogb
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.

The code now builds to with gcc-14 on aarch64:

0000000000000000 <__ilogb>:
   0:   9e660000        fmov    x0, d0
   4:   d374f801        ubfx    x1, x0, #52, #11
   8:   340000e1        cbz     w1, 24 <__ilogb+0x24>
   c:   510ffc20        sub     w0, w1, #0x3ff
  10:   711ffc3f        cmp     w1, #0x7ff
  14:   54000040        b.eq    1c <__ilogb+0x1c>  // b.none
  18:   d65f03c0        ret
  1c:   12b00000        mov     w0, #0x7fffffff                 // #2147483647
  20:   14000000        b       0 <__math_invalid_i>
  24:   d374cc00        lsl     x0, x0, #12
  28:   b40000a0        cbz     x0, 3c <__ilogb+0x3c>
  2c:   dac01000        clz     x0, x0
  30:   12807fc1        mov     w1, #0xfffffc01                 // #-1023
  34:   4b000020        sub     w0, w1, w0
  38:   d65f03c0        ret
  3c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  40:   14000000        b       0 <__math_invalid_i>

Some ABI requires additional adjustments:

  * i386 and m68k requires to use the template version, since
    both provide __ieee754_ilogb implementatations.

  * loongarch uses a custom implementation as well.

  * powerpc64le also has a custom implementation for POWER9, which
    is also used for float and float128 version.  The generic
    e_ilogb.c implementation is moved on powerpc to keep the
    current code as-is.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella
eb1e9194fa math: Remove UB and optimize double ilogb
The subnormal exponent calculation invokes UB by left shifting the
signed exponent to find the first leading bit.  The implementation
also uses 32 bits operations, which generates suboptimal code in
64 bits architectures.

The patch reimplements ilogb using the math_config.h macros and
uses the new stdbit function to simplify the subnormal handling.

On aarch64 it generates better code:

* master:

0000000000000000 <__ieee754_ilogb>:
   0:   9e660000        fmov    x0, d0
   4:   d360fc02        lsr     x2, x0, #32
   8:   d360f801        ubfx    x1, x0, #32, #31
   c:   f26c285f        tst     x2, #0x7ff00000
  10:   540001a1        b.ne    44 <__ieee754_ilogb+0x44>  // b.any
  14:   2a000022        orr     w2, w1, w0
  18:   34000322        cbz     w2, 7c <__ieee754_ilogb+0x7c>
  1c:   35000221        cbnz    w1, 60 <__ieee754_ilogb+0x60>
  20:   2a0003e1        mov     w1, w0
  24:   7100001f        cmp     w0, #0x0
  28:   12808240        mov     w0, #0xfffffbed                 // #-1043
  2c:   540000ad        b.le    40 <__ieee754_ilogb+0x40>
  30:   531f7821        lsl     w1, w1, #1
  34:   51000400        sub     w0, w0, #0x1
  38:   7100003f        cmp     w1, #0x0
  3c:   54ffffac        b.gt    30 <__ieee754_ilogb+0x30>
  40:   d65f03c0        ret
  44:   13147c20        asr     w0, w1, #20
  48:   12b00202        mov     w2, #0x7fefffff                 // #2146435071
  4c:   510ffc00        sub     w0, w0, #0x3ff
  50:   6b02003f        cmp     w1, w2
  54:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  58:   1a819000        csel    w0, w0, w1, ls  // ls = plast
  5c:   d65f03c0        ret
  60:   53155021        lsl     w1, w1, #11
  64:   12807fa0        mov     w0, #0xfffffc02                 // #-1022
  68:   531f7821        lsl     w1, w1, #1
  6c:   51000400        sub     w0, w0, #0x1
  70:   7100003f        cmp     w1, #0x0
  74:   54ffffac        b.gt    68 <__ieee754_ilogb+0x68>
  78:   d65f03c0        ret
  7c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  80:   d65f03c0        ret

* patch:

0000000000000000 <__ieee754_ilogb>:
   0:   9e660001        fmov    x1, d0
   4:   d374f820        ubfx    x0, x1, #52, #11
   8:   350000e0        cbnz    w0, 24 <__ieee754_ilogb+0x24>
   c:   d374cc21        lsl     x1, x1, #12
  10:   b4000141        cbz     x1, 38 <__ieee754_ilogb+0x38>
  14:   dac01021        clz     x1, x1
  18:   12807fc0        mov     w0, #0xfffffc01                 // #-1023
  1c:   4b010000        sub     w0, w0, w1
  20:   d65f03c0        ret
  24:   711ffc1f        cmp     w0, #0x7ff
  28:   510ffc00        sub     w0, w0, #0x3ff
  2c:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  30:   1a811000        csel    w0, w0, w1, ne  // ne = any
  34:   d65f03c0        ret
  38:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  3c:   d65f03c0        ret

Other architecture with support for stdc_leading_zeros and/or
__builtin_clzll should have similar improvements.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Andreas Schwab
d3e0f63fb9 ldbl-128: also disable lgammaf128_r builtin when building lgammal_r 2025-05-21 14:11:50 +02:00
Andreas Schwab
b1f33b2eeb Fix typos in ldbl-opt makefile
The -fno-builtin options need to disable the long double builtins.
2025-05-20 12:42:54 +02:00
Joseph Myers
06caf53adf Implement C23 rootn.
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the rootn functions, which compute the Yth root of X for
integer Y (with a domain error if Y is 0, even if X is a NaN).  The
integer exponent has type long long int in C23; it was intmax_t in TS
18661-4, and as with other interfaces changed after their initial
appearance in the TS, I don't think we need to support the original
version of the interface.

As with pown and compoundn, I strongly encourage searching for worst
cases for ulps error for these implementations (necessarily
non-exhaustively, given the size of the input space).  I also expect a
custom implementation for a given format could be much faster as well
as more accurate, although the implementation is simpler than those
for pown and compoundn.

This completes adding to glibc those TS 18661-4 functions (ignoring
DFP) that are included in C23.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118592 regarding the C23
mathematical functions (not just the TS 18661-4 ones) missing built-in
functions in GCC, where such functions might usefully be added.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-05-14 10:51:46 +00:00
Joseph Myers
ae31254432 Implement C23 compoundn
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1).  The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.

Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).

As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space).  I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.

As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-05-09 15:17:27 +00:00
Adhemerval Zanella
84977600da math: Fix UB on sinpif (BZ 32925)
The left shift overflows for 'int', use uint32_t instead.  It syncs
with CORE-MATH commit bbfabd99.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:20:28 -03:00
Adhemerval Zanella
7a0d7fb25c math: Fix UB on erfcf (BZ 32924)
The left shift overflows for 'int', use uint64_t instead.  It syncs
with CORE-MATH commit d0a2be200cbc1344d800d9ef0ebee9ad67dd3ad8.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:20:25 -03:00
Adhemerval Zanella
8eeb7de8a2 math: Fix UB on cospif (BZ 32923)
The left shift overflows for 'int', use uint32_t instead.  It syncs
with CORE-MATH commit bbfabd993a71b049c210b0febfd06d18369fadc1.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:20:16 -03:00
Adhemerval Zanella
7619c1b032 math: Fix UB on cbrtf (BZ 32922)
The left shift overflows for 'int64_t', use unsigned instead.  It syncs
with CORE-MATH commit f7c7408d1749ec2859ea249495af699359ae559b.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:20:10 -03:00
Adhemerval Zanella
c8775c0423 math: Fix UB on sinhf (BZ 32921)
The left shift overflows for 'int', use uint64_t instead.  It syncs
with CORE-MATH commit bbfabd99.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:20:04 -03:00
Adhemerval Zanella
de0c4adf94 math: Fix UB on logf (BZ 32920)
The left shift overflows for 'int', use a literal instead.  It syncs
with OPTIMIZED-ROUTINES commit 0f87f607b976820ef41fe64d004fe67dc7af8236.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:19:59 -03:00
Adhemerval Zanella
4a1b96bf52 math: Fix UB on coshf (BZ 32919)
The left shift overflows for 'int', use uint64_t instead.  It syncs
with CORE-MATH commit 4d6192d2.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:19:54 -03:00
Adhemerval Zanella
92f7b6061d math: Fix UB on atanhf (BZ 32918)
The left shift overflows for 'int', use unsigned instead.  It syncs
with CORE-MATH commit 4d6192d2.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-04-29 15:19:42 -03:00
Jakub Jelinek
63c99cd50b math: Fix up THREEp96 constant in expf128 [BZ #32411]
As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.

The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.

DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.

Reviewed-by: Joseph Myers <josmyers@redhat.com>
2025-04-09 18:24:11 +02:00