It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on the error cases, since it helps
code generation on recent gcc.
The code now builds to the following with gcc-14 on aarch64:
0000000000000000 <__ilogbf>:
0: 1e260000 fmov w0, s0
4: d3577801 ubfx x1, x0, #23, #8
8: 340000e1 cbz w1, 24 <__ilogbf+0x24>
c: 5101fc20 sub w0, w1, #0x7f
10: 7103fc3f cmp w1, #0xff
14: 54000040 b.eq 1c <__ilogbf+0x1c> // b.none
18: d65f03c0 ret
1c: 12b00000 mov w0, #0x7fffffff // #2147483647
20: 14000000 b 0 <__math_invalidf_i>
24: 53175800 lsl w0, w0, #9
28: 340000a0 cbz w0, 3c <__ilogbf+0x3c>
2c: 5ac01000 clz w0, w0
30: 12800fc1 mov w1, #0xffffff81 // #-127
34: 4b000020 sub w0, w1, w0
38: d65f03c0 ret
3c: 320107e0 mov w0, #0x80000001 // #-2147483647
40: 14000000 b 0 <__math_invalidf_i>
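For readers who prefer C to the disassembly, the listing above can be read
roughly as the sketch below. This is an illustration only, not the code from
the patch: sketch_ilogbf and sketch_invalid_i are hypothetical names, the
helper merely sets errno (the real __math_invalidf_i may do more), and the
return values for zero and for inf/NaN are the constants visible in the
aarch64 listing, which may differ on other ABIs.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the out-of-line error helper.  Keeping it
   noinline mirrors the tail branch seen in the listing.  */
static __attribute__ ((noinline)) int
sketch_invalid_i (int r)
{
  errno = EDOM;
  return r;
}

/* Hypothetical sketch of the logic in the listing, for binary32.  */
static int
sketch_ilogbf (float x)
{
  uint32_t ix;
  memcpy (&ix, &x, sizeof ix);        /* Bit pattern of x (fmov).  */
  int ex = (ix >> 23) & 0xff;         /* Biased exponent (ubfx).  */

  if (__builtin_expect (ex == 0, 0))
    {
      /* Zero or subnormal: shift out sign and exponent bits.  */
      ix <<= 9;
      if (ix == 0)
        return sketch_invalid_i (-INT_MAX);   /* 0x80000001 in the listing.  */
      /* Subnormal: leading zeros of the mantissa give the exponent.  */
      return -127 - __builtin_clz (ix);
    }
  if (__builtin_expect (ex == 0xff, 0))
    return sketch_invalid_i (INT_MAX);        /* inf or NaN.  */
  return ex - 127;                            /* Normal number.  */
}
```

With this sketch, the normal-number path is a shift, a mask and a subtract,
and errno is only touched on the two unlikely paths.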
Some ABIs require additional adjustments:
* i386 and m68k require the template version, since
both provide __ieee754_ilogb implementations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for the float and float128 versions. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on the error cases, since it helps
code generation on recent gcc.
The code now builds to the following with gcc-14 on aarch64:
0000000000000000 <__ilogb>:
0: 9e660000 fmov x0, d0
4: d374f801 ubfx x1, x0, #52, #11
8: 340000e1 cbz w1, 24 <__ilogb+0x24>
c: 510ffc20 sub w0, w1, #0x3ff
10: 711ffc3f cmp w1, #0x7ff
14: 54000040 b.eq 1c <__ilogb+0x1c> // b.none
18: d65f03c0 ret
1c: 12b00000 mov w0, #0x7fffffff // #2147483647
20: 14000000 b 0 <__math_invalid_i>
24: d374cc00 lsl x0, x0, #12
28: b40000a0 cbz x0, 3c <__ilogb+0x3c>
2c: dac01000 clz x0, x0
30: 12807fc1 mov w1, #0xfffffc01 // #-1023
34: 4b000020 sub w0, w1, w0
38: d65f03c0 ret
3c: 320107e0 mov w0, #0x80000001 // #-2147483647
40: 14000000 b 0 <__math_invalid_i>
Some ABIs require additional adjustments:
* i386 and m68k require the template version, since
both provide __ieee754_ilogb implementations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for the float and float128 versions. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The code now looks like:
fclass.s $fa2, $fa0
movfr2gr.s $t0, $fa2
slli.w $t0, $t0, 0x0
fclass.s $fa2, $fa1
movfr2gr.s $t1, $fa2
or $t0, $t0, $t1
andi $t0, $t0, 0x3
bnez $t0, 1f
fmin.s $fa0, $fa0, $fa1
ret
1:
fmul.s $fa0, $fa0, $fa1
ret
This looks really bad, with expensive movfr2gr instructions, redundant
sign-extensions and masking (arguably a compiler missed optimization),
and a branch. Rewrite it with inline assembly:
fcmp.cor.s $fcc0, $fa0, $fa0
fcmp.cor.s $fcc1, $fa1, $fa1
fsel $fa2, $fa0, $fa1, $fcc0
fsel $fa0, $fa1, $fa0, $fcc1
fmax.s $fa0, $fa2, $fa0
ret
Note that we cannot make it more readable with
"double a = __builtin_isnanf (x) ? y : x" because this C statement only
happens to produce what we want because of https://gcc.gnu.org/PR66462;
if this bug is fixed in the future, the generated code may change.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
This patch implements the LoongArch specific math barriers in order to omit
the store and load from stack if possible.
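As an illustration of the idea (not the code from this patch), the sketch
below contrasts a memory-constrained barrier with a register-constrained
one. The sketch_* names are hypothetical, and the use of the "f" constraint
and of the __loongarch__ / __loongarch_soft_float predefines is an
assumption; on other targets the sketch falls back to the "m" constraint so
that it still builds.

```c
/* Assumption: on hard-float LoongArch the "f" constraint keeps the
   value in a floating-point register; elsewhere fall back to "m",
   which forces the value through a stack slot.  */
#if defined __loongarch__ && !defined __loongarch_soft_float
# define SKETCH_FP_CONSTRAINT "f"
#else
# define SKETCH_FP_CONSTRAINT "m"
#endif

/* Hide the value of X from the optimizer and return it unchanged.  */
#define sketch_opt_barrier(x) \
  ({ __typeof__ (x) sketch_v = (x); \
     __asm__ ("" : "+" SKETCH_FP_CONSTRAINT (sketch_v)); \
     sketch_v; })

/* Force X to be evaluated even though its result is unused.  */
#define sketch_force_eval(x) \
  ({ __typeof__ (x) sketch_v = (x); \
     __asm__ __volatile__ ("" : : SKETCH_FP_CONSTRAINT (sketch_v)); })
```

The empty asm emits no instructions either way; only the constraint
changes, which is what lets the compiler skip the store and reload.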
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add inline assembler for the scalbn functions. Passes the glibc regression
tests.
On LoongArch, GCC 13 supports __builtin_scalbn{,f} with -fno-math-errno,
but only libm can use -fno-math-errno in glibc, and scalbn is in libc
instead of libm because __printf_fp calls it.
Use __builtin_{fma, fmaf} to implement function {fma, fmaf} instead of
the generic implementation.
* sysdeps/loongarch/fpu/math-use-builtins-fma.h: New file.
GCC 13 compiles these built-ins to the {fmax,fmin}.{s,d} instructions; use
them instead of the generic implementation.
Link: https://gcc.gnu.org/r13-2085
Signed-off-by: Xi Ruoyao <xry111@xry111.site>