Vector Class Discussion

FMA and non temporal stores
Author: Agner Date: 2014-09-24 01:10
chad wrote:
Would you consider adding a function which calls the non-temporal store instructions?
That would be possible, but I don't know how useful it would be. Nontemporal stores are rarely optimal and I wouldn't expect the average programmer to know when they are. The programmer would have to check the cache size and use nontemporal stores when writing memory blocks bigger than half the size of the last level cache. Writing directly to video RAM may be another application, but I don't think it is safe to use vector classes in a device driver. You are free to make your own extensions, of course, or use the intrinsic functions directly.
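
As a rough illustration of the size rule above, here is a minimal sketch that uses the SSE intrinsics directly. The helper name copy_floats and the llc_bytes parameter are hypothetical and not part of the VCL; the caller is assumed to supply the last-level cache size. The streaming path is taken only for blocks bigger than half that size and only when the destination is 16-byte aligned, which _mm_stream_ps requires. On non-x86 targets the sketch simply falls back to memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // _mm_stream_ps, _mm_loadu_ps, _mm_sfence
#define HAVE_SSE_STREAM 1
#endif

// Hypothetical helper (not a VCL function): copies n floats from src to dst.
// llc_bytes is the last-level cache size in bytes, supplied by the caller.
// Returns true if the nontemporal path was used, false otherwise.
inline bool copy_floats(float* dst, const float* src,
                        std::size_t n, std::size_t llc_bytes) {
    assert(dst != nullptr && src != nullptr);
    const std::size_t bytes = n * sizeof(float);
#ifdef HAVE_SSE_STREAM
    // _mm_stream_ps needs a 16-byte aligned destination address.
    const bool aligned = reinterpret_cast<std::uintptr_t>(dst) % 16 == 0;
    if (bytes > llc_bytes / 2 && aligned) {
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            // Unaligned load from src, cache-bypassing store to dst.
            _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));
        }
        _mm_sfence();                        // order the streaming stores
        for (; i < n; ++i) dst[i] = src[i];  // scalar tail, ordinary stores
        return true;
    }
#endif
    std::memcpy(dst, src, bytes);            // small block: keep it in cache
    return false;
}
```

Whether the streaming path is actually faster depends on the processor and on whether the data is read again soon afterwards, so the threshold should be treated as a starting point for measurement.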

The compiler will automatically combine a floating point multiplication and a subsequent addition or subtraction into a single instruction ... But more importantly, I have never observed this in GCC with the VCL.
You are right. GCC is not as good as I thought. The GCC developers are actually struggling with finding a solution to this problem, see gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
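
When the compiler cannot be relied on to fuse the operations, the fusion can be requested explicitly. The following is a scalar sketch only, using std::fma from the standard library; the function names mirror the ones mentioned in this thread, but these are hypothetical stand-ins and not the VCL's vector implementations. std::fma computes a*b + c with a single rounding, and compiles to one FMA instruction only when the target supports it (for example with -mfma on GCC); otherwise it may be a slower library call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar stand-ins: a*b + c and a*b - c, each with one rounding.
inline double mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
inline double mul_sub(double a, double b, double c) {
    return std::fma(a, b, -c);
}

// out[i] = a[i]*b[i] + c[i], with the fusion requested explicitly
// instead of being left to the optimizer.
inline void mul_add_array(const double* a, const double* b, const double* c,
                          double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = mul_add(a[i], b[i], c[i]);
    }
}
```

The single rounding is visible in the result: with a = 1 + 2^-30 and b = 1 - 2^-30, the exact product is 1 - 2^-60, so mul_add(a, b, -1.0) gives -2^-60, whereas a separately rounded product would be exactly 1.0 and the sum would be 0.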

It makes more sense to me to have these mul_add (and variants such as mul_sub and mul_add_x) functions as part of the core of the VCL instead of in a separate header.
Good point. It depends on how long we have to wait for GCC and other compilers to implement this optimization. I will think about it.