The code now looks like:

	fclass.s $fa2, $fa0
	movfr2gr.s $t0, $fa2
	slli.w $t0, $t0, 0x0
	fclass.s $fa2, $fa1
	movfr2gr.s $t1, $fa2
	or $t0, $t0, $t1
	andi $t0, $t0, 0x3
	bnez $t0, 1f
	fmin.s $fa0, $fa0, $fa1
	ret
	1:
	fmul.s $fa0, $fa0, $fa1
	ret

This looks really bad, with expensive movfr2gr instructions, a redundant
sign-extension and masking (arguably a compiler missed-optimization),
and a branch. Rewrite it with inline assembly:

	fcmp.cor.s $fcc0, $fa0, $fa0
	fcmp.cor.s $fcc1, $fa1, $fa1
	fsel $fa2, $fa0, $fa1, $fcc0
	fsel $fa0, $fa1, $fa0, $fcc1
	fmax.s $fa0, $fa2, $fa0
	ret

Note that we cannot make it more readable with
"double a = __builtin_isnanf (x) ? y : x": this C statement only
happens to produce what we want because of https://gcc.gnu.org/PR66462,
and the generated code may change if that bug is fixed in the future.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>