SSE replacement for FPREM1

zero318
Posts: 1
Joined: 2021-09-04, 3:27:17


Post by zero318 » 2021-09-19, 22:30:25

I'm working on patching an old piece of code used to reduce the angles of a 3D float vector to the range ±Pi. The original code used loops to implement a horribly inaccurate version of IEEE remainder.

Replacing the loops with FPREM1 and a divisor of 2Pi has worked well so far, but I'd really like to use SSE instructions instead, since FPREM1 is slow and the angles can easily be loaded into an XMM register and processed as packed singles. The optimization guide recommends: "Multiply by the reciprocal divisor, get the fractional part by subtracting the truncated value, and then multiply by the divisor," but this frequently produces incorrect results when the input angles are multiples of Pi.
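
For reference, here is a minimal sketch of that recipe with one change: the quotient is rounded to nearest instead of truncated, which is how IEEE remainder (and FPREM1) chooses the quotient. Truncation gives an fmod-style result that keeps the sign of the input. The function name reduce_angles_ps is hypothetical, and the sketch assumes SSE2, the default MXCSR rounding mode (round to nearest even), and |x / 2Pi| below 2^31. It will not match FPREM1 bit for bit, because the quotient, the product, and the single-precision 2Pi constant are all rounded.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: reduce four packed single-precision angles
   to approximately the range -Pi..Pi. */
static __m128 reduce_angles_ps(__m128 x)
{
    const __m128 inv_two_pi = _mm_set1_ps(0.15915494309189535f); /* 1 / (2*Pi) */
    const __m128 two_pi     = _mm_set1_ps(6.283185307179586f);   /* 2*Pi rounded to float */

    /* Approximate quotient x / (2*Pi). */
    __m128 q = _mm_mul_ps(x, inv_two_pi);

    /* Round to the nearest integer. _mm_cvtps_epi32 follows the MXCSR
       rounding mode, which is round-to-nearest-even by default. */
    __m128 n = _mm_cvtepi32_ps(_mm_cvtps_epi32(q));

    /* Remainder: x - n * 2*Pi. */
    return _mm_sub_ps(x, _mm_mul_ps(n, two_pi));
}
```

Inputs very close to an odd multiple of Pi can still land on either +Pi or -Pi, because the approximate quotient sits near a rounding boundary.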

Is there a simple way to make the SSE version more accurately behave like true IEEE remainder?

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: SSE replacement for FPREM1

Post by agner » 2021-09-20, 5:14:17

It is complicated to calculate the remainder with reasonable accuracy for large x. The vector class library does this in the sin, cos, and tan functions. See the file vectormath_trig.h in https://github.com/vectorclass/version2
If you are using the reduced x as input to a trigonometric function anyway, then I would recommend using the vector class library.
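
As a rough illustration of why the reduction takes extra work, here is a sketch of a multi-part (Cody-Waite style) reduction by 2Pi. It is not the library's code. The function name reduce_angles_cw_ps is hypothetical, and the constants are an assumption: the three-part split of Pi/4 commonly used in Cephes-style single-precision code, scaled by 8 to give 2Pi. The leading part has few mantissa bits, so n times that part is exact for moderate n and the first subtraction loses nothing.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: reduce four packed singles modulo 2*Pi using a
   three-part split of 2*Pi. Assumes default MXCSR rounding and moderate |x|. */
static __m128 reduce_angles_cw_ps(__m128 x)
{
    const __m128 inv_two_pi = _mm_set1_ps(0.15915494309189535f);   /* 1 / (2*Pi) */
    /* 2*Pi = c1 + c2 + c3, with c1 = 6.28125 exactly representable
       in only a few mantissa bits. */
    const __m128 c1 = _mm_set1_ps(0.78515625f * 8.0f);
    const __m128 c2 = _mm_set1_ps(2.4187564849853515625e-4f * 8.0f);
    const __m128 c3 = _mm_set1_ps(3.77489497744594108e-8f * 8.0f);

    /* Quotient rounded to the nearest integer. */
    __m128 n = _mm_cvtepi32_ps(_mm_cvtps_epi32(_mm_mul_ps(x, inv_two_pi)));

    /* Subtract n * 2*Pi in three steps, largest part first. */
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(n, c1));
    r = _mm_sub_ps(r, _mm_mul_ps(n, c2));
    r = _mm_sub_ps(r, _mm_mul_ps(n, c3));
    return r;
}
```

This only helps while n is small enough for the partial products to stay exact. For very large x the quotient itself no longer has enough bits, which is where the reduction gets complicated.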
