Vector Class Discussion

 
 
compiling code with usage of blend16f and permute
Author: kp Date: 2018-07-10 20:18
I am seeing different behavior between Clang and GCC when I use permute16f and blend16f in my code. GCC compiles the code without errors, but Clang fails with "index for __builtin_shufflevector must be a constant integer". It seems that Clang's avxintrin.h expects an immediate for the mask. In my setup AVX-512 is emulated with AVX2, so blend16f calls blend8f in vectorf256.h, which computes 'const int maskb' from the template parameters. For some reason Clang cannot work out that maskb is actually a compile-time constant, and it reports 'index must be constant integer'.

Have you encountered this, and if so, what is the workaround for Clang?

Really appreciate your help.

   
compiling code with usage of blend16f and permute
Author: kp Date: 2018-07-12 17:22
OK, so I dug further and found the line that Clang complains about.

In vectorf256.h:2995
// no zeroing, need to blend
const int maskb = ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8) |
((i4 << 1) & 0x10) | ((i5 << 2) & 0x20) | ((i6 << 3) & 0x40) | ((i7 << 4) & 0x80);
return _mm256_blend_ps(ta, tb, maskb); // blend

Working with a colleague, we figured that Clang complains because of the left shifts on signed int values, which is what all the template parameters are. Just to see whether it would compile, we removed the left-shift terms so that the code reads as follows:
const int maskb = ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8);
return _mm256_blend_ps(ta, tb, maskb); // blend
It compiles fine.

We then made the following change:
constexpr int maskb = ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8) |
((i4 & 8) << 1) | ((i5 & 8) << 2) | ((i6 & 8) << 3) | ((i7 & 8) << 4);
return _mm256_blend_ps(ta, tb, maskb); // blend
which, at first glance, keeps the same semantics. I did not go through all the surrounding logic to verify that the change is correct, but from a quick read it looks OK. All we did was move the bitwise AND inside, so that the value being shifted is non-negative and behaves like an unsigned int, rather than shifting first and applying the bitwise AND afterwards.

This code compiles with Clang.
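
For reference, the reordered expression can be checked in isolation. The following is a minimal sketch, not code from the vector class library: the function name blend_mask is hypothetical, and the mask expression is copied from the modified version above. It relies on the fact that (i & 8) is never negative, so the left shift that follows is well defined for any int index. Before C++20, left-shifting a negative signed value is undefined behavior and therefore not permitted in a constant expression, which may be what Clang objects to in the original form. That explanation is an assumption and has not been confirmed in this thread.

```cpp
// Hypothetical stand-alone helper; blend_mask is not a name from vectorf256.h.
// Builds the 8-bit blend mask: bit k of the result is bit 3 of index ik.
template <int i0, int i1, int i2, int i3, int i4, int i5, int i6, int i7>
constexpr int blend_mask() {
    // Mask first, then shift: (ik & 8) is 0 or 8, never negative,
    // so the left shifts are well defined for every int index.
    return ((i0 >> 3) & 1) | ((i1 >> 2) & 2) | ((i2 >> 1) & 4) | (i3 & 8) |
           ((i4 & 8) << 1) | ((i5 & 8) << 2) | ((i6 & 8) << 3) | ((i7 & 8) << 4);
}

// Odd lanes use indexes >= 8, so bits 1, 3, 5 and 7 of the mask are set.
static_assert(blend_mask<0, 9, 2, 11, 4, 13, 6, 15>() == 0xAA, "odd lanes");

// Negative indexes in the upper four lanes still yield a constant expression.
static_assert(blend_mask<8, 8, 8, 8, -256, -256, -256, -256>() == 0x0F,
              "negative indexes in upper lanes");
```

Because the result is a constant expression, both static_assert checks are evaluated at compile time, and the same value could be stored in a constexpr int and passed as the immediate operand of _mm256_blend_ps.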

Do you think this would be the right change to make?

Thanks.

   
compiling code with usage of blend16f and permute
Author: Agner Date: 2018-07-13 02:14
Does it work if you just change
const int maskb
to
const unsigned int maskb
?

   
compiling code with usage of blend16f and permute
Author: kp Date: 2018-07-13 12:35
That does not work.
   
compiling code with usage of blend16f and permute
Author: Agner Date: 2018-07-13 23:45
What is the error message? Does it work if you make i0 - i7 unsigned?
   
compiling code with usage of blend16f and permute
Author: kp Date: 2018-07-16 09:12
Apologies for the late response. The error I got was:

In file included from vectorclass/vectorclass.h:51:
vectorclass/vectorf256.h:2995:12: error: index for __builtin_shufflevector must be a constant integer
return _mm256_blend_ps(ta, tb, maskb); // blend
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Changing the template parameters from int to unsigned does not work either.