Fast GeLU approximation

I needed a faster version of GeLU for my application. The following approximation with p=0.544790 works quite well:

0.5*x*(1.0+x/sqrt(p+x*x))

Substituting x = 2*y (that is, y = 0.5 * x) and cancelling the common factor of 2 gives the equivalent form:

y+y*y/sqrt(0.25*p+y*y)

which can be implemented efficiently with AVX2 using _mm256_rsqrt_ps (optionally refined with one Newton–Raphson step for better accuracy).

Regards,
GW