glibc

lib/glibc

mirror of https://sourceware.org/git/glibc.git synced 2025-11-02 09:33:31 +03:00

Author	SHA1	Message	Date
Paul Zimmermann	48fde7b026	various fixes detected with -Wdouble-promotion Changes with respect to v1: - added comment in e_j1f.c to explain the use of float is enough Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-22 12:35:40 +02:00
Adhemerval Zanella	9d0b7ec87c	math: Suppress clang -Wincompatible-library-redeclaration on s_llround Clang issues: ../sysdeps/ieee754/dbl-64/s_llround.c:83:30: error: incompatible redeclaration of library function 'lround' [-Werror,-Wincompatible-library-redeclaration] libm_alias_double (__lround, lround) ^ ../sysdeps/ieee754/dbl-64/s_llround.c:83:30: note: 'lround' is a builtin with type 'long (double)' Reviewed-by: Sam James <sam@gentoo.org>	2025-10-21 09:24:27 -03:00
Adhemerval Zanella	407b2eea75	math: use fabs on __ieee754_lgamma_r clang issues: ../sysdeps/ieee754/dbl-64/e_lgamma_r.c:234:29: error: absolute value function 'fabsf' given an argument of type 'double' but has parameter of type 'float' which may cause \ truncation of value [-Werror,-Wabsolute-value] It should not matter because the value is 0.0, but using fabs is simpler than adding a warning suppresion. Reviewed-by: Sam James <sam@gentoo.org>	2025-10-21 09:24:24 -03:00
Adhemerval Zanella	76dfd91275	Suppress -Wmaybe-uninitialized only for gcc The warning is not supported by clang. Reviewed-by: Sam James <sam@gentoo.org>	2025-10-21 09:24:05 -03:00
Wilco Dijkstra	35807cc5cd	math: Add builtin support for (l)lround(f) Add builtin support for (l)lround(f) via the math-use-builtins header mechanism. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-17 17:03:54 +00:00
Adhemerval Zanella	850d93f514	math: Use binary search on lgammaf slow path And remove some unused entries of the fallback table. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-14 11:12:08 -03:00
Adhemerval Zanella	6610a293b3	math: Use stdbit.h instead of builtin in math_config.h Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-14 11:12:04 -03:00
Adhemerval Zanella	ae49afe74d	math: Optimize fma call on log2pf1 The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST to provide correctly rounded results. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-14 11:12:00 -03:00
Adhemerval Zanella	82a4f50b4e	math: Optimize fma call on asinpif The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO to provide correctly rounded results. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-14 11:11:56 -03:00
Adhemerval Zanella	fab32b6526	math: Remove erfcf fma usage The fma is not required to provide correctly rounded and it helps on !__FP_FAST_FMA ISAs. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2025-10-14 08:46:06 -03:00
Adhemerval Zanella	68cb78eccc	math: Remove asinhf fma usage The fma is not required to provide correctly rounded and it helps on !__FP_FAST_FMA ISAs. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2025-10-14 08:46:06 -03:00
Adhemerval Zanella	c075ff00a6	math: Optimize fma call on acospif The fma is required only for inputs less than 0x1.0fd288p-127. Also only add the extra check for !__FP_FAST_FMA targets. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2025-10-14 08:46:06 -03:00
Adhemerval Zanella	c9d9336f50	math: Remove acoshf fma usage The fma is not strickly required to provide correctly rounded and it helps on !__FP_FAST_FMA ABIs. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2025-10-14 08:46:06 -03:00
Paul Zimmermann	ea5b996be9	replace use of double by float [BZ#29326]	2025-10-14 09:46:00 +02:00
Adhemerval Zanella	61ac7c6a75	math: Optimize flt-32 remainder implementation With same micro-optimization done for the double variant: * Combine the \|y\| zero check. * Rework the check to adjust result and call fmod. * Remove one check after fmod. * Remove float-int-float roundtrip on return. Also use math_config.h macros and indent the code. The resulting strategy is different in many places that I think requires a different Copyright. I see the following performance improvements using remainder benchtests (using reciprocal-throughput metric): Architecture \| Input \| master \| patch \| Improvemnt -----------------\|-----------------\|----------\|----------------------- x86_64 \| subnormals \| 20.4176 \| 19.6144 \| 3.93% x86_64 \| normal \| 54.0939 \| 52.2343 \| 3.44% x86_64 \| close-exponent \| 23.9120 \| 22.3768 \| 6.42% aarch64 \| subnormals \| 9.2423 \| 8.3825 \| 9.30% aarch64 \| normal \| 30.5393 \| 29.244 \| 4.24% aarch64 \| close-exponent \| 15.5405 \| 13.9256 \| 10.39% The aarch64 used as Neoverse-N1, gcc 15.1.1; while the x86_64 was a AMD Ryzen 9 5900X, gcc 15.2.1. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-03 15:19:44 -03:00
Adhemerval Zanella	f0facb2d27	math: Optimize dbl-64 remainder implementation The commit `34b9f8bc17` provides an optimized fmod implementation; use the same strategy used for remainderf and implement the double variant on top of fmod. I see the following performance improvements using remainder benchtests (using reciprocal-throughput metric): Architecture \| Input \| master \| patch \| Improvemnt -----------------\|-----------------\|----------\|----------------------- x86_64 \| subnormals \| 76.1345 \| 21.5334 \| 71.72% x86_64 \| normal \| 553.2670 \| 426.5670 \| 22.90% x86_64 \| close-exponent \| 30.5111 \| 22.6893 \| 25.64% aarch64 \| subnormals \| 26.0734 \| 8.4876 \| 67.45% aarch64 \| normal \| 205.2590 \| 200.082 \| 2.52% aarch64 \| close-exponent \| 13.8481 \| 13.6663 \| 1.31% The aarch64 used as Neoverse-N1, gcc 15.1.1; while the x86_64 was a AMD Ryzen 9 5900X, gcc 15.2.1. This implementation also fixes the math/test-double-remainder issues on alpha. Tested on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-03 15:19:31 -03:00
Collin Funk	6d9e110577	math: fix Wshift-overflow warning. When compiling on x86_64 with -Wshift-overflow=2 you can see the following warning: ../sysdeps/ieee754/flt-32/math_config.h: In function ‘is_inf’: ../sysdeps/ieee754/flt-32/math_config.h:184:37: warning: result of ‘2139095040 << 1’ requires 33 bits to represent, but ‘int’ only has 32 bits [-Wshift-overflow=] 184 \| return (x << 1) == (EXPONENT_MASK << 1); \| ^~ This patch adjusts the definitions to use UINT32_C. This matches the definitions in sysdeps/ieee754/dbl-64/math_config.h which use UINT64_C for these definitions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-02 18:01:23 -07:00
Adhemerval Zanella	cde86de627	math: Remove clz_uint64/ctz_uint64 and use stdbit.h Checked on aarch64-linux-gnu and x86_64-linux-gnu Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-09-11 14:47:25 -03:00
Adhemerval Zanella	bd7b04ec7c	math: Split erf and erfc Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:59 -03:00
Adhemerval Zanella	f40cdb65f5	math: Use internal fesetround alias on fma To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used if/when it is also used internally for other implementation. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
Adhemerval Zanella	adecb3bec1	math: Use internal fetestexcept alias on fma To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used if/when it is also used internally for other implementation. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
Adhemerval Zanella	41c2f1d9a3	math: Use internal feholdexcept alias on fma To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used if/when it is also used internally for other implementation. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
Adhemerval Zanella	08c68809d0	math: Use internal feupdateenv alias on fma To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used if/when it is also used internally for other implementation. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
Adhemerval Zanella	5624ee0482	math: Use internal feholdexcept alias on fma To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used if/when it is also used internally for other implementation. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
Maciej W. Rozycki	aa4dbb2eeb	stdio-common: Convert macros across scanf input specifier tests Convert 'compare_real', 'read_real', and 'verify_input' macros to functions so as to improve readability and avoid pitfalls. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-08-23 01:02:46 +01:00
Maciej W. Rozycki	a1e5ee13ab	stdio-common: Adjust header inclusion in scanf input specifier tests Move the inclusion of the data class header from the individual tests to the data-type-specific skeleton, providing for the use of the data type under test in the data class header and reducing duplication. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-08-23 01:02:46 +01:00
Maciej W. Rozycki	ca0f999a93	stdio-common: Fix NaN input data for scanf input specifier tests [BZ #32857 ] Update NaN input data with 'n-char-sequence' in reference data matching data under test, removing test failures with the M68K host. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-08-23 01:02:46 +01:00
Maciej W. Rozycki	1c1f5e8f6d	stdio-common: Add 'f' conversion tests for . scanf input [BZ #12701 ] Verify that . input is rejected by 'f' conversion (and its uppercase counterpart). Replace 0 input with .0 rather than adding new one, because the integral part of 0 is already covered by 0.0 data, so there's no need to keep this duplication. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-08-11 17:42:12 +01:00
Maciej W. Rozycki	291f4d4fe5	stdio-common: Add 'e' conversion tests for . scanf input [BZ #12701 ] Verify that . input is rejected by 'e' conversion (and its uppercase counterpart). Replace 0e0 input with .0e0 rather than adding new one, because 0 significand is already covered by 0e+0 data, so there's no need to keep this duplication. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-08-11 17:42:12 +01:00
Maciej W. Rozycki	14957cb1c4	stdio-common: Add 'a', 'g' conversion tests for 0x. scanf input [BZ #12701 ] Verify that 0x. input is rejected by 'a' and 'g' conversions (and their uppercase counterparts). Replace 0x0p0 input with 0x.0p0 rather than adding new one, because 0x0 significand is already covered by 0x0p+0 data, so there's no need to keep this duplication. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-08-11 17:42:12 +01:00
Adhemerval Zanella	5c2b21c478	powerpc: Remove modff optimization The generic implementation is slight more optimized than the powerpc one, where it has a more optimized inf/nan check (by not using FP unit checks, along with branch prediction hints), and removed one branch by issuing trunc instead of a combination of floor/ceil (which also generated less code). On power10 with gcc 14.2.1: reciprocal-throughput master patch difference workload-0_1 1.5210 1.3942 8.34% workload-1_maxint 2.0926 1.3940 33.38% workload-maxint_maxfloat 1.7851 1.3940 21.91% workload-integral 1.5216 1.3941 8.37% latency master patch difference workload-0_1 1.5928 2.6337 -65.35% workload-1_maxint 3.2929 2.6337 20.02% workload-maxint_maxfloat 1.9697 2.6341 -33.73% workload-integral 2.0597 2.6337 -27.87% Checked on powerpc64le-linux-gnu. Reviewed-by: Sachin Monga <smonga@linux.ibm.com>	2025-06-25 15:05:30 -03:00
Adhemerval Zanella	f165e244e4	math: Simplify and optimize modf implementation Refactor the generic implementation to use math_config.h definitions, and add an alternative one if the ABI supports truncf instructions (gated through math-use-builtins-trunc.h). The generic implementation generates similar code on x86_64, while the optimization one for aarch64 (where truncf is supported as a builtin by through frintz), the improvements are: reciprocal-throughput master patch difference workload-0_1 3.0595 3.0698 -0.34% workload-1_maxint 5.1747 3.0542 40.98% workload-maxint_maxfloat 3.4391 3.0349 11.75% workload-integral 3.2732 3.0293 7.45% latency master patch difference workload-0_1 3.5267 4.7107 -33.57% workload-1_maxint 6.9074 4.7282 31.55% workload-maxint_maxfloat 3.7210 4.7506 -27.67% workload-integral 3.8634 4.8137 -24.60% Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-18 15:56:40 -03:00
Adhemerval Zanella	61cc9922f3	math: Simplify and optimize modff implementation Refactor the generic implementation to use math_config.h definitions, and add an alternative one if the ABI supports truncf instructions (gated through math-use-builtins-trunc.h). The generic implementation generates similar code for x86_64, while the optimization path aarch64 (where truncf is supported as a builtin) through frintz), the improvements are: reciprocal-throughput master patch difference workload-0_1 3.0740 3.0326 1.35% workload-1_maxint 5.2231 3.0436 41.73% workload-maxint_maxfloat 4.0962 3.0551 25.42% workload-integral 3.7093 3.0612 17.47% latency master patch difference workload-0_1 3.5521 4.7313 -33.20% workload-1_maxint 6.7148 4.7314 29.54% workload-maxint_maxfloat 4.0458 4.7518 -17.45% workload-integral 3.9719 4.7427 -19.40% Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-18 15:56:00 -03:00
Adhemerval Zanella	39775f00b1	math: Optimize float ilogb/llogb It removes the wrapper by moving the error/EDOM handling to an out-of-line implementation (__math_invalidf_i/__math_invalidf_li). Also, __glibc_unlikely is used on errors case since it helps code generation on recent gcc. The code now builds to with gcc-14 on aarch64: 0000000000000000 <__ilogbf>: 0: 1e260000 fmov w0, s0 4: d3577801 ubfx x1, x0, #23, #8 8: 340000e1 cbz w1, 24 <__ilogbf+0x24> c: 5101fc20 sub w0, w1, #0x7f 10: 7103fc3f cmp w1, #0xff 14: 54000040 b.eq 1c <__ilogbf+0x1c> // b.none 18: d65f03c0 ret 1c: 12b00000 mov w0, #0x7fffffff // #2147483647 20: 14000000 b 0 <__math_invalidf_i> 24: 53175800 lsl w0, w0, #9 28: 340000a0 cbz w0, 3c <__ilogbf+0x3c> 2c: 5ac01000 clz w0, w0 30: 12800fc1 mov w1, #0xffffff81 // #-127 34: 4b000020 sub w0, w1, w0 38: d65f03c0 ret 3c: 320107e0 mov w0, #0x80000001 // #-2147483647 40: 14000000 b 0 <__math_invalidf_i> Some ABI requires additional adjustments: * i386 and m68k requires to use the template version, since both provide __ieee754_ilogb implementatations. * loongarch uses a custom implementation as well. * powerpc64le also has a custom implementation for POWER9, which is also used for float and float128 version. The generic e_ilogb.c implementation is moved on powerpc to keep the current code as-is. Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-02 13:32:19 -03:00
Adhemerval Zanella	afe09d44f3	math: Remove UB and optimize double ilogbf The subnormal exponent calculation invokes UB by left shifting the signed expoenent to find the first leading bit. The patch reimplements ilogb using the math_config.h macros and uses the new stdbit.h function to simplify the subnormal handling. On aarch64 it generates better code: * master: 0000000000000000 <__ieee754_ilogbf>: 0: 1e260000 fmov w0, s0 4: 12007801 and w1, w0, #0x7fffffff 8: 72091c1f tst w0, #0x7f800000 c: 54000141 b.ne 34 <__ieee754_ilogbf+0x34> // b.any 10: 34000201 cbz w1, 50 <__ieee754_ilogbf+0x50> 14: 53185c21 lsl w1, w1, #8 18: 12800fa0 mov w0, #0xffffff82 // #-126 1c: d503201f nop 20: 531f7821 lsl w1, w1, #1 24: 51000400 sub w0, w0, #0x1 28: 7100003f cmp w1, #0x0 2c: 54ffffac b.gt 20 <__ieee754_ilogbf+0x20> 30: d65f03c0 ret 34: 13177c20 asr w0, w1, #23 38: 12b01002 mov w2, #0x7f7fffff // #2139095039 3c: 5101fc00 sub w0, w0, #0x7f 40: 6b02003f cmp w1, w2 44: 12b00001 mov w1, #0x7fffffff // #2147483647 48: 1a819000 csel w0, w0, w1, ls // ls = plast 4c: d65f03c0 ret 50: 320107e0 mov w0, #0x80000001 // #-2147483647 54: d65f03c0 ret * patch: 0000000000000000 <__ieee754_ilogbf>: 0: 1e260001 fmov w1, s0 4: d3577820 ubfx x0, x1, #23, #8 8: 350000e0 cbnz w0, 24 <__ieee754_ilogbf+0x24> c: 53175821 lsl w1, w1, #9 10: 34000141 cbz w1, 38 <__ieee754_ilogbf+0x38> 14: 5ac01021 clz w1, w1 18: 12800fc0 mov w0, #0xffffff81 // #-127 1c: 4b010000 sub w0, w0, w1 20: d65f03c0 ret 24: 7103fc1f cmp w0, #0xff 28: 5101fc00 sub w0, w0, #0x7f 2c: 12b00001 mov w1, #0x7fffffff // #2147483647 30: 1a811000 csel w0, w0, w1, ne // ne = any 34: d65f03c0 ret 38: 320107e0 mov w0, #0x80000001 // #-2147483647 3c: d65f03c0 ret Other architecture with support for stdc_leading_zeros and/or __builtin_clzll should have similar improvements. Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-02 13:32:19 -03:00
Adhemerval Zanella	c4be334400	math: Optimize double ilogb/llogb It removes the wrapper by moving the error/EDOM handling to an out-of-line implementation (__math_invalid_i/__math_invalid_li). Also, __glibc_unlikely is used on errors case since it helps code generation on recent gcc. The code now builds to with gcc-14 on aarch64: 0000000000000000 <__ilogb>: 0: 9e660000 fmov x0, d0 4: d374f801 ubfx x1, x0, #52, #11 8: 340000e1 cbz w1, 24 <__ilogb+0x24> c: 510ffc20 sub w0, w1, #0x3ff 10: 711ffc3f cmp w1, #0x7ff 14: 54000040 b.eq 1c <__ilogb+0x1c> // b.none 18: d65f03c0 ret 1c: 12b00000 mov w0, #0x7fffffff // #2147483647 20: 14000000 b 0 <__math_invalid_i> 24: d374cc00 lsl x0, x0, #12 28: b40000a0 cbz x0, 3c <__ilogb+0x3c> 2c: dac01000 clz x0, x0 30: 12807fc1 mov w1, #0xfffffc01 // #-1023 34: 4b000020 sub w0, w1, w0 38: d65f03c0 ret 3c: 320107e0 mov w0, #0x80000001 // #-2147483647 40: 14000000 b 0 <__math_invalid_i> Some ABI requires additional adjustments: * i386 and m68k requires to use the template version, since both provide __ieee754_ilogb implementatations. * loongarch uses a custom implementation as well. * powerpc64le also has a custom implementation for POWER9, which is also used for float and float128 version. The generic e_ilogb.c implementation is moved on powerpc to keep the current code as-is. Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-02 13:32:19 -03:00
Adhemerval Zanella	eb1e9194fa	math: Remove UB and optimize double ilogb The subnormal exponent calculation invokes UB by left shifting the signed exponent to find the first leading bit. The implementation also uses 32 bits operations, which generates suboptimal code in 64 bits architectures. The patch reimplements ilogb using the math_config.h macros and uses the new stdbit function to simplify the subnormal handling. On aarch64 it generates better code: * master: 0000000000000000 <__ieee754_ilogb>: 0: 9e660000 fmov x0, d0 4: d360fc02 lsr x2, x0, #32 8: d360f801 ubfx x1, x0, #32, #31 c: f26c285f tst x2, #0x7ff00000 10: 540001a1 b.ne 44 <__ieee754_ilogb+0x44> // b.any 14: 2a000022 orr w2, w1, w0 18: 34000322 cbz w2, 7c <__ieee754_ilogb+0x7c> 1c: 35000221 cbnz w1, 60 <__ieee754_ilogb+0x60> 20: 2a0003e1 mov w1, w0 24: 7100001f cmp w0, #0x0 28: 12808240 mov w0, #0xfffffbed // #-1043 2c: 540000ad b.le 40 <__ieee754_ilogb+0x40> 30: 531f7821 lsl w1, w1, #1 34: 51000400 sub w0, w0, #0x1 38: 7100003f cmp w1, #0x0 3c: 54ffffac b.gt 30 <__ieee754_ilogb+0x30> 40: d65f03c0 ret 44: 13147c20 asr w0, w1, #20 48: 12b00202 mov w2, #0x7fefffff // #2146435071 4c: 510ffc00 sub w0, w0, #0x3ff 50: 6b02003f cmp w1, w2 54: 12b00001 mov w1, #0x7fffffff // #2147483647 58: 1a819000 csel w0, w0, w1, ls // ls = plast 5c: d65f03c0 ret 60: 53155021 lsl w1, w1, #11 64: 12807fa0 mov w0, #0xfffffc02 // #-1022 68: 531f7821 lsl w1, w1, #1 6c: 51000400 sub w0, w0, #0x1 70: 7100003f cmp w1, #0x0 74: 54ffffac b.gt 68 <__ieee754_ilogb+0x68> 78: d65f03c0 ret 7c: 320107e0 mov w0, #0x80000001 // #-2147483647 80: d65f03c0 ret * patch: 0000000000000000 <__ieee754_ilogb>: 0: 9e660001 fmov x1, d0 4: d374f820 ubfx x0, x1, #52, #11 8: 350000e0 cbnz w0, 24 <__ieee754_ilogb+0x24> c: d374cc21 lsl x1, x1, #12 10: b4000141 cbz x1, 38 <__ieee754_ilogb+0x38> 14: dac01021 clz x1, x1 18: 12807fc0 mov w0, #0xfffffc01 // #-1023 1c: 4b010000 sub w0, w0, w1 20: d65f03c0 ret 24: 711ffc1f cmp w0, #0x7ff 28: 510ffc00 sub w0, w0, #0x3ff 2c: 12b00001 mov w1, #0x7fffffff // #2147483647 30: 1a811000 csel w0, w0, w1, ne // ne = any 34: d65f03c0 ret 38: 320107e0 mov w0, #0x80000001 // #-2147483647 3c: d65f03c0 ret Other architecture with support for stdc_leading_zeros and/or __builtin_clzll should have similar improvements. Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-02 13:32:19 -03:00
Andreas Schwab	d3e0f63fb9	ldbl-128: also disable lgammaf128_r builtin when building lgammal_r	2025-05-21 14:11:50 +02:00
Andreas Schwab	b1f33b2eeb	Fix typos in ldbl-opt makefile The -fno-builtin options need to disable the long double builtins.	2025-05-20 12:42:54 +02:00
Joseph Myers	06caf53adf	Implement C23 rootn. C23 adds various <math.h> function families originally defined in TS 18661-4. Add the rootn functions, which compute the Yth root of X for integer Y (with a domain error if Y is 0, even if X is a NaN). The integer exponent has type long long int in C23; it was intmax_t in TS 18661-4, and as with other interfaces changed after their initial appearance in the TS, I don't think we need to support the original version of the interface. As with pown and compoundn, I strongly encourage searching for worst cases for ulps error for these implementations (necessarily non-exhaustively, given the size of the input space). I also expect a custom implementation for a given format could be much faster as well as more accurate, although the implementation is simpler than those for pown and compoundn. This completes adding to glibc those TS 18661-4 functions (ignoring DFP) that are included in C23. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118592 regarding the C23 mathematical functions (not just the TS 18661-4 ones) missing built-in functions in GCC, where such functions might usefully be added. Tested for x86_64 and x86, and with build-many-glibcs.py.	2025-05-14 10:51:46 +00:00
Joseph Myers	ae31254432	Implement C23 compoundn C23 adds various <math.h> function families originally defined in TS 18661-4. Add the compoundn functions, which compute (1+X) to the power Y for integer Y (and X at least -1). The integer exponent has type long long int in C23; it was intmax_t in TS 18661-4, and as with other interfaces changed after their initial appearance in the TS, I don't think we need to support the original version of the interface. Note that these functions are "compoundn" with a trailing "n", not "compound" (CORE-MATH has the wrong name, for example). As with pown, I strongly encourage searching for worst cases for ulps error for these implementations (necessarily non-exhaustively, given the size of the input space). I also expect a custom implementation for a given format could be much faster as well as more accurate (I haven't tested or benchmarked the CORE-MATH implementation for binary32); this is one of the more complicated and less efficient functions to implement in a type-generic way. As with exp2m1 and exp10m1, this showed up places where the powerpc64le IFUNC setup is not as self-contained as one might hope (in this case, without the changes specific to powerpc64le, there were undefined references to __GI___expf128). Tested for x86_64 and x86, and with build-many-glibcs.py.	2025-05-09 15:17:27 +00:00
Adhemerval Zanella	84977600da	math: Fix UB on sinpif (BZ 32925) The left shift overflows for 'int', use uint32_t instead. It syncs with CORE-MATH commit bbfabd99. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:20:28 -03:00
Adhemerval Zanella	7a0d7fb25c	math: Fix UB on erfcf (BZ 32924) The left shift overflows for 'int', use uint64_t instead. It syncs with CORE-MATH commit d0a2be200cbc1344d800d9ef0ebee9ad67dd3ad8. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:20:25 -03:00
Adhemerval Zanella	8eeb7de8a2	math: Fix UB on cospif (BZ 32923) The left shift overflows for 'int', use uint32_t instead. It syncs with CORE-MATH commit bbfabd993a71b049c210b0febfd06d18369fadc1. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:20:16 -03:00
Adhemerval Zanella	7619c1b032	math: Fix UB on cbrtf (BZ 32922) The left shift overflows for 'int64_t', use unsigned instead. It syncs with CORE-MATH commit f7c7408d1749ec2859ea249495af699359ae559b. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:20:10 -03:00
Adhemerval Zanella	c8775c0423	math: Fix UB on sinhf (BZ 32921) The left shift overflows for 'int', use uint64_t instead. It syncs with CORE-MATH commit bbfabd99. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:20:04 -03:00
Adhemerval Zanella	de0c4adf94	math: Fix UB on logf (BZ 32920) The left shift overflows for 'int', use a literal instead. It syncs with OPTIMIZED-ROUTINES commit 0f87f607b976820ef41fe64d004fe67dc7af8236. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:19:59 -03:00
Adhemerval Zanella	4a1b96bf52	math: Fix UB on coshf (BZ 32919) The left shift overflows for 'int', use uint64_t instead. It syncs with CORE-MATH commit 4d6192d2. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:19:54 -03:00
Adhemerval Zanella	92f7b6061d	math: Fix UB on atanhf (BZ 32918) The left shift overflows for 'int', use unsigned instead. It syncs with CORE-MATH commit 4d6192d2. Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2025-04-29 15:19:42 -03:00
Jakub Jelinek	63c99cd50b	math: Fix up THREEp96 constant in expf128 [BZ #32411 ] As mentioned by the reporter in a pull request against gcc-mirror, the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128 rather than 0x3.p+96f128. The algorithm uses that to compute the t2 integer (tval2), by whose delta it adjusts the x+xl pair and then in the result uses the precomputed exp value for that entry. Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes being one smaller, sometimes one larger than the desired value, thus can mean the x+xl pair after adjustment will be larger in absolute value than it should be. DesWursters created a test program for this https://github.com/DesWurstes/comparefloats and his results were total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648 I've modified this so with https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3 so that it actually tests pseudo-random _Float128 values with range (-16384.,16384) with strong bias on values larger than 0.0002 in absolute value (so that tval1/tval2 aren't zero most of the time) and that gave total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255 So, in both cases, in most cases the change doesn't result in any differences, and in those rare cases where does, about 85% have smaller ulp than without the patch. Additionally I've tried https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4 and in 2 billion iterations it didn't find any case where x+xl after the adjustments without this change would be smaller in absolute value compared to x+xl after the adjustments with this change. Reviewed-by: Joseph Myers <josmyers@redhat.com>	2025-04-09 18:24:11 +02:00

1 2 3 4 5 ...

1339 Commits