Vector Class Discussion

 
thread FMA with the vectorclass - chad - 2013-04-10
last reply FMA with the vectorclass - Agner - 2013-04-12
 
FMA with the vectorclass
Author: chad Date: 2013-04-10 14:52
Your vectorclass is very useful to me and well documented. I have one question about FMA in the vectorclass.

You write in your text:
"The FMA3 and FMA4 instruction sets are not handled directly by any code in the
vector class library, but by the compiler. The compiler will automatically combine
a floating point multiplication and a subsequent addition or subtraction into a
single instruction."

But according to the following Stack Overflow discussion, the compiler won't use FMA unless you allow a relaxed floating point model, and even then it might not:
stackoverflow.com/questions/15933100/how-to-use-fused-multiply-add-fma-instructions-with-see-avx/15933677?noredirect=1#comment22702114_15933677

There are specific FMA intrinsics, e.g. _mm_fmadd_ps(), which could be used. When I search the vectorclass I don't find any, which matches what you say in your text. Can you explain why you don't support these instructions directly?

   
FMA with the vectorclass
Author: Agner Date: 2013-04-12 00:28
The programs are much more readable and easier to maintain if you rely on the compiler fusing * and + into FMA instructions than if you use FMA functions.
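To illustrate the readability point, here is a minimal sketch that evaluates the same polynomial in both styles. Vec4, broadcast and mul_add are hypothetical stand-ins written for this example, not part of the vector class library, and the explicit version uses std::fma per lane rather than a hardware intrinsic so that it compiles anywhere.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane float vector; a stand-in for a vector class type,
// NOT the actual vector class library implementation.
struct Vec4 {
    float lane[4];
};

// Element-wise multiplication.
inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Element-wise addition.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Hypothetical explicit FMA function: a*b+c with a single rounding per lane.
inline Vec4 mul_add(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i)
        r.lane[i] = std::fma(a.lane[i], b.lane[i], c.lane[i]);
    return r;
}

// Hypothetical helper: copy one scalar into all four lanes.
inline Vec4 broadcast(float x) {
    return Vec4{{x, x, x, x}};
}

// Operator style: 3x^2 + 2x + 1 in Horner form. Whether * and + are fused
// is left to the compiler and its floating point settings.
inline Vec4 poly_operators(const Vec4& x) {
    const Vec4 c0 = broadcast(1.0f), c1 = broadcast(2.0f), c2 = broadcast(3.0f);
    return (c2 * x + c1) * x + c0;
}

// Explicit style: the same polynomial with nested FMA function calls.
inline Vec4 poly_explicit(const Vec4& x) {
    const Vec4 c0 = broadcast(1.0f), c1 = broadcast(2.0f), c2 = broadcast(3.0f);
    return mul_add(mul_add(c2, x, c1), x, c0);
}
```

Both functions return 17 in every lane for x = 2. The operator version reads like the formula, while the nested calls have to be read inside out, and the difference grows with the length of the expression.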

My experience is that the compiler automatically fuses multiply and add instructions whenever it is possible and optimal. This requires that the FMA3 or FMA4 instruction set is enabled and that a relaxed floating point model is selected.
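The reason a relaxed floating point model is needed is that a fused multiply-add rounds once, while a separate multiply and add round twice, so fusing can change the result. A small sketch of this, assuming IEEE double precision with the default round-to-nearest mode; the function and struct names are made up for the example, and the volatile temporary is only there to stop the compiler from fusing the unfused version:

```cpp
#include <cmath>

// Unfused: the product is rounded to double before the addition.
// The volatile store forces that rounding and prevents contraction to FMA.
inline double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused: a*b+c is computed exactly and rounded only once.
inline double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Hypothetical result pair for the demonstration below.
struct FmaDemo {
    double unfused;
    double fused;
};

// a*b equals 1 - 2^-54 exactly, which is not representable as a double
// and rounds to 1.0. Adding c = -1 then cancels everything in the unfused
// case, while the fused case keeps the -2^-54 term.
inline FmaDemo run_fma_demo() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    const double c = -1.0;
    return FmaDemo{mul_then_add(a, b, c), fused_mul_add(a, b, c)};
}
```

Here the unfused result is 0 and the fused result is -2^-54. With GCC or Clang, options such as -mfma together with -ffp-contract=fast allow the compiler to make this substitution on its own; other compilers have their own settings, so check the documentation for yours.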

I haven't tested all compilers and all situations. These instructions are pretty new and the compilers may not have been fine-tuned yet. If it turns out that there are situations that the compilers can't handle, then we may discuss whether it is relevant to add explicit FMA functions to the vector class library.