The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin. It allows us to move the
implementation to a C one.
The performance on a Zen3 chip is slight better:
reciprocal-throughput input master no-SVID improvement
i686 subnormals 22.4741 20.1571 10.31%
i686 normal 74.1631 70.3606 5.13%
i686 close-exponent 22.5625 20.2435 10.28%
Tested on i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>