mirror of https://sourceware.org/git/glibc.git synced 2025-10-27 12:15:39 +03:00
glibc/sysdeps/ieee754
Adhemerval Zanella 61ac7c6a75 math: Optimize flt-32 remainder implementation
Apply the same micro-optimizations done for the double variant:

  * Combine the |y| zero check.
  * Rework the check to adjust result and call fmod.
  * Remove one check after fmod.
  * Remove float-int-float roundtrip on return.

Also use math_config.h macros and indent the code.  The resulting
strategy differs in enough places that I think it requires a
different Copyright.

I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):

Architecture     | Input           |   master |   patch  | Improvement
-----------------|-----------------|----------|----------|------------
x86_64           | subnormals      |  20.4176 |  19.6144 |      3.93%
x86_64           | normal          |  54.0939 |  52.2343 |      3.44%
x86_64           | close-exponent  |  23.9120 |  22.3768 |      6.42%
aarch64          | subnormals      |   9.2423 |   8.3825 |      9.30%
aarch64          | normal          |  30.5393 |   29.244 |      4.24%
aarch64          | close-exponent  |  15.5405 |  13.9256 |     10.39%
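
The percentages in the last column are consistent with the relative
reduction (master - patch) / master, where lower reciprocal throughput is
better.  A small sketch of that arithmetic, with improvement_percent as a
hypothetical helper name:

```c
/* Hypothetical helper: relative reduction from master to patch, in
   percent.  Positive values mean the patched build is faster.  */
static double
improvement_percent (double master, double patch)
{
  return (master - patch) / master * 100.0;
}
```

Applied to the x86_64 subnormals row, improvement_percent (20.4176,
19.6144) gives about 3.93.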

The aarch64 machine was a Neoverse-N1, gcc 15.1.1; while the x86_64 was
an AMD Ryzen 9 5900X, gcc 15.2.1.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-03 15:19:44 -03:00