The current implementation relies on changing the rounding mode between
calculations (first to FE_TONEAREST and then to FE_TOWARDZERO) to obtain
correctly rounded results. On most CPUs this adds significant performance
overhead: it requires executing a typically slow instruction (to get/set
the floating-point status), forces a pipeline flush, and breaks some
compiler assumptions and optimizations.
This patch introduces a new implementation, originally written by Szabolcs
Nagy for musl, which uses mostly integer arithmetic. Floating-point
arithmetic is used only to raise the expected exceptions, without the need
for fenv.h operations.
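As a rough illustration of what "mostly integer arithmetic" involves, the
sketch below splits a double into a sign, an integer significand, and an
exponent using only bit operations. The struct and function names are
hypothetical and chosen for this example; this is not the code from the
patch or from musl, only a minimal sketch of the kind of first step such an
implementation needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for illustration; not taken from glibc or musl. */
struct num {
    uint64_t m;   /* significand with the implicit bit made explicit */
    int e;        /* exponent such that |x| == m * 2^e */
    int sign;     /* 1 if x is negative, 0 otherwise */
};

/* Reinterpret the bits of a double as a 64-bit integer. */
static uint64_t as_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Decompose a finite double so that |x| == m * 2^e.
   Normal inputs give m in [2^52, 2^53); subnormal inputs give m < 2^52. */
static struct num decompose(double x)
{
    uint64_t bits = as_bits(x);
    int biased = (int)((bits >> 52) & 0x7ff);
    struct num n;

    assert(biased != 0x7ff);            /* Inf and NaN are handled elsewhere */

    n.sign = (int)(bits >> 63);
    n.m = bits & ((UINT64_C(1) << 52) - 1);
    if (biased != 0)
        n.m |= UINT64_C(1) << 52;       /* normal: restore the implicit bit */
    else
        biased = 1;                     /* subnormal: same scale as smallest normal */
    n.e = biased - 1075;                /* 1075 = 1023 (bias) + 52 (fraction bits) */
    return n;
}
```

With the operands in this form, the product of two significands is an exact
integer of at most 106 bits, which is why a 64x64 -> 128-bit multiplication
is enough for the multiply step.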
I added some changes compared to the original code:
* Fixed some signaling NaN issues when the third argument is NaN.
* Use math_uint128.h for the 64-bit multiplication operation. It allows
the compiler to use 128-bit types where available, which enables some
optimizations on certain targets (for instance, MIPS64).
* Fixed an arm32 issue where the libgcc routine might not respect the
rounding mode [1]. This can also be used on other targets to optimize
the conversion from int64_t to double.
* Use -fexcess-precision=standard on i686.
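The 128-bit multiplication point can be sketched as follows. The helper
name mul64x64 is hypothetical and is not the interface of math_uint128.h;
the sketch only shows the general idea of using a native 128-bit type when
the compiler provides one (detected here via __SIZEOF_INT128__, a GCC/Clang
predefined macro) and falling back to 32-bit partial products otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration: full 64x64 -> 128-bit product,
   returned as two 64-bit halves. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    assert(hi != 0 && lo != 0);
#ifdef __SIZEOF_INT128__
    /* The compiler can map this to a single widening multiply (or a
       mul/mulhi pair) on 64-bit targets. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    /* Portable fallback: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;
    /* Sum of three values below 2^32, so it cannot overflow 64 bits. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;
    *lo = (mid << 32) | (uint32_t)ll;
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
#endif
}
```

Keeping both paths behind one helper lets the rest of the code stay the
same regardless of whether the target has a native 128-bit type.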
I tested this implementation on various targets (x86_64, i686, arm, aarch64,
powerpc), in some cases with the compiler-generated fma instructions
manually disabled so that this code path is exercised.
Performance-wise, it shows large improvements:
reciprocal-throughput      master     patched   improvement
x86_64 [2]               289.4640     22.4396        12.90x
i686 [2]                 636.8660    169.3640         3.76x
aarch64 [3]               46.0020     11.3281         4.06x
armhf [3]                 63.989      26.5056         2.41x
powerpc [4]               23.9332      6.40205        3.74x

latency                    master     patched   improvement
x86_64                   293.7360     38.1478         7.70x
i686                     658.4160    187.9940         3.50x
aarch64                   44.5166     14.7157         3.03x
armhf                     63.7678     28.4116         2.24x
power10                   23.8561     11.4250         2.09x
Checked on x86_64-linux-gnu and i686-linux-gnu with --disable-multi-arch,
and on arm-linux-gnueabihf.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970
[2] gcc 15.2.1, Zen3
[3] gcc 15.2.1, Neoverse N1
[4] gcc 15.2.1, POWER10
Signed-off-by: Szabolcs Nagy <nsz@gcc.gnu.org>
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>