Apply the same micro-optimizations already done for the double variant:

  * Combine the |y| zero check.
  * Rework the check to adjust result and call fmod.
  * Remove one check after fmod.
  * Remove float-int-float roundtrip on return.

Also use math_config.h macros and indent the code.

The resulting strategy differs in enough places that I think it
requires a different Copyright.

I see the following performance improvements using the remainder
benchtests (using the reciprocal-throughput metric):

Architecture | Input           | master   | patch    | Improvement
-------------|-----------------|----------|----------|------------
x86_64       | subnormals      | 20.4176  | 19.6144  |  3.93%
x86_64       | normal          | 54.0939  | 52.2343  |  3.44%
x86_64       | close-exponent  | 23.9120  | 22.3768  |  6.42%
aarch64      | subnormals      |  9.2423  |  8.3825  |  9.30%
aarch64      | normal          | 30.5393  | 29.244   |  4.24%
aarch64      | close-exponent  | 15.5405  | 13.9256  | 10.39%

The aarch64 machine was a Neoverse-N1 with gcc 15.1.1, and the x86_64
machine was an AMD Ryzen 9 5900X with gcc 15.2.1.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>