Some CORE-MATH routines use roundeven, and most ISAs do not have
a specific instruction for the operation. In this case, the call
is routed to the generic implementation.
However, if the ISA does support round() and ctz(), there is a better
alternative (as used by CORE-MATH).
This patch adds this optimization and also enables it on powerpc.
On a power10 it shows the following improvement:
expm1f                  master    patched   improvement
latency                 9.8574    7.0139    28.85%
reciprocal-throughput   4.3742    2.6592    39.21%
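The improvement column is consistent with the relative reduction against master. For example, for the expm1f latency figures:

```latex
\text{improvement} = \frac{\text{master} - \text{patched}}{\text{master}}
  = \frac{9.8574 - 7.0139}{9.8574} \approx 0.2885 = 28.85\%
```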
Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanf.
The code was adapted to glibc style, to use the definitions from
math_config.h, to remove errno handling, and to use a generic
128-bit routine for ABIs that do not support 128-bit integers natively.
Benchtests on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):
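A generic 128-bit routine typically builds a 64 x 64 -> 128-bit product from 32-bit halves. The sketch below shows that schoolbook technique. The type and function names (u128_sketch, mul_64x64_sketch) are hypothetical, and this is not the routine the patch adds.

```c
#include <stdint.h>

/* Hypothetical portable 128-bit unsigned value.  */
typedef struct
{
  uint64_t hi, lo;
} u128_sketch;

/* Full 128-bit product of A and B using only 64-bit arithmetic.  */
static u128_sketch
mul_64x64_sketch (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  uint64_t p0 = a_lo * b_lo;  /* Contributes to bits 0..63.  */
  uint64_t p1 = a_lo * b_hi;  /* Contributes to bits 32..95.  */
  uint64_t p2 = a_hi * b_lo;  /* Contributes to bits 32..95.  */
  uint64_t p3 = a_hi * b_hi;  /* Contributes to bits 64..127.  */

  /* Middle column: three values below 2^32, so no overflow.  */
  uint64_t mid = (p0 >> 32) + (uint32_t) p1 + (uint32_t) p2;

  u128_sketch r;
  r.lo = (mid << 32) | (uint32_t) p0;
  r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
  return r;
}
```

Where the compiler provides unsigned __int128, a single multiplication gives the same result, which is why such a fallback is only needed on the remaining ABIs.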
latency                 master    patched   improvement
x86_64                  82.3961   54.8052   33.49%
x86_64v2                82.3415   54.8052   33.44%
x86_64v3                69.3661   50.4864   27.22%
i686                    219.271   45.5396   79.23%
aarch64                 29.2127   19.1951   34.29%
power10                 19.5060   16.2760   16.56%
reciprocal-throughput   master    patched   improvement
x86_64                  28.3976   19.7334   30.51%
x86_64v2                28.4568   19.7334   30.65%
x86_64v3                21.1815   16.1811   23.61%
i686                    105.016   15.1426   85.58%
aarch64                 18.1573   10.7681   40.70%
power10                 8.7207    8.7097    0.13%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across, since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in the future, given that this is a one-time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Speed up tanf range reduction by using the new sincosf range
reduction algorithm. Overall code quality is improved due to
inlining, so there is a speedup even if no range reduction is
required.
tanf throughput gains on Cortex-A72:
* |x| < M_PI_4 : 1.1x
* |x| < M_PI_2 : 1.2x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0 : 1.6x
* |x| < Inf : 12.1x
* sysdeps/ieee754/flt-32/s_tanf.c (__tanf): Use fast range reduction.
This patch makes flt-32 libm functions use libm_alias_float to define
public interfaces (in cases where _Float32 aliases of those interfaces
would be appropriate, so not for finitef / isinff / isnanf).
Tested for x86_64. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
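The idea behind libm_alias_float can be illustrated with GCC's alias attribute: one definition is exported under both the float name and the _Float32 (f32) name. This is a hypothetical re-creation (DEMO_WEAK_ALIAS, DEMO_ALIAS_FLOAT, internal_squaref and demo_squaref are invented names), not glibc's internal macro definitions, and it assumes GCC or Clang on an ELF target.

```c
/* Hypothetical stand-in for an internal weak_alias macro: declare
   ALIASNAME as a weak alias of NAME (defined in this file).  */
#define DEMO_WEAK_ALIAS(name, aliasname) \
  extern __typeof (name) aliasname __attribute__ ((weak, alias (#name)));

/* Export FROMf under the public names TOf and TOf32.  */
#define DEMO_ALIAS_FLOAT(from, to) \
  DEMO_WEAK_ALIAS (from##f, to##f) \
  DEMO_WEAK_ALIAS (from##f, to##f32)

float
internal_squaref (float x)
{
  return x * x;
}

/* Defines demo_squaref and demo_squaref32.  */
DEMO_ALIAS_FLOAT (internal_square, demo_square)
```

Keeping both names behind one macro is what lets functions such as finitef, isinff and isnanf simply not use it when an f32 alias would be inappropriate.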
* sysdeps/ieee754/flt-32/s_asinhf.c: Include <libm-alias-float.h>.
(asinhf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_atanf.c: Include <libm-alias-float.h>.
(atanf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_cbrtf.c: Include <libm-alias-float.h>.
(cbrtf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_ceilf.c: Include <libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_copysignf.c: Include
<libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_cosf.c: Include <libm-alias-float.h>.
(cosf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_erff.c: Include <libm-alias-float.h>.
(erff): Define using libm_alias_float.
(erfcf): Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c: Include <libm-alias-float.h>.
(expm1f): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_fabsf.c: Include <libm-alias-float.h>.
(fabsf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_floorf.c: Include <libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_frexpf.c: Include <libm-alias-float.h>.
(frexpf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_fromfpf.c (fromfpf): Define using
libm_alias_float.
* sysdeps/ieee754/flt-32/s_fromfpf_main.c: Include
<libm-alias-float.h>.
* sysdeps/ieee754/flt-32/s_fromfpxf.c (fromfpxf): Define using
libm_alias_float.
* sysdeps/ieee754/flt-32/s_getpayloadf.c: Include
<libm-alias-float.h>.
(getpayloadf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_llrintf.c: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_llroundf.c: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_logbf.c: Include <libm-alias-float.h>.
(logbf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_lrintf.c: Include <libm-alias-float.h>.
(lrintf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_lroundf.c: Include <libm-alias-float.h>.
(lroundf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_modff.c: Include <libm-alias-float.h>.
(modff): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Include
<libm-alias-float.h>.
(nearbyintf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_nextafterf.c: Include
<libm-alias-float.h>.
(nextafterf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_nextupf.c: Include
<libm-alias-float.h>.
(nextupf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_remquof.c: Include
<libm-alias-float.h>.
(remquof): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_rintf.c: Include <libm-alias-float.h>.
(rintf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_roundevenf.c: Include
<libm-alias-float.h>.
(roundevenf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_roundf.c: Include <libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_setpayloadf.c (setpayloadf): Define
using libm_alias_float.
* sysdeps/ieee754/flt-32/s_setpayloadf_main.c: Include
<libm-alias-float.h>.
* sysdeps/ieee754/flt-32/s_setpayloadsigf.c (setpayloadsigf):
Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_sincosf.c: Include
<libm-alias-float.h>.
(sincosf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_sinf.c: Include <libm-alias-float.h>.
(sinf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_tanf.c: Include <libm-alias-float.h>.
(tanf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_tanhf.c: Include <libm-alias-float.h>.
(tanhf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_totalorderf.c: Include
<libm-alias-float.h>.
(totalorderf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_totalordermagf.c: Include
<libm-alias-float.h>.
(totalordermagf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_truncf.c: Include <libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/ieee754/flt-32/s_ufromfpf.c (ufromfpf): Define using
libm_alias_float.
* sysdeps/ieee754/flt-32/s_ufromfpxf.c (ufromfpxf): Define using
libm_alias_float.