On Skylake, it improves tanh bench performance by: Before After Improvement max 110.89 95.826 14% min 20.966 20.157 4% mean 30.9601 29.8431 4% Reviewed-by: H.J. Lu <hjl.tools@gmail.com>