I did consider the lookup<n> function, but from the documentation and source code it looks like that function is mainly intended for small, fixed values of n. For arbitrarily large n, the implementation in vectorf256.h stores the indices to a temporary memory location, looks up each element in a scalar loop, and reloads the result:
uint32_t ii[8]; index1.store(ii);
float rr[8];
for (int j = 0; j < 8; j++) {
rr[j] = table[ii[j]];
}
return Vec8f().load(rr);

In my experience, this is generally less efficient than working directly with shuffles etc. in the registers themselves. So I thought it would be nice to have a reasonably fast, general-purpose gather() for arbitrary index vectors, analogous to the AVX2 VGATHER instructions ...
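To make the idea concrete, here is a minimal sketch of such a gather for eight floats. The name gather8f and its signature are hypothetical and not part of the vector class library. It assumes that all indices are within the bounds of the table, uses the AVX2 intrinsic _mm256_i32gather_ps when the compiler targets AVX2, and otherwise falls back to the same scalar loop as above. Whether the hardware gather actually beats the store-and-reload loop depends on the CPU, so it should be benchmarked rather than assumed.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper (not part of the vector class library):
// computes out[j] = table[idx[j]] for j = 0..7.
// Assumption: the caller guarantees every index is within the table bounds.
inline void gather8f(const float* table, const int32_t idx[8], float out[8]) {
    assert(table != nullptr);
#if defined(__AVX2__)
    // AVX2 path: a single gather instruction (VGATHERDPS).
    // The scale of 4 is sizeof(float), so indices count elements, not bytes.
    __m256i vindex = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx));
    __m256 r = _mm256_i32gather_ps(table, vindex, 4);
    _mm256_storeu_ps(out, r);
#else
    // Fallback without AVX2: plain scalar lookups.
    for (int j = 0; j < 8; j++) {
        out[j] = table[idx[j]];
    }
#endif
}
```

With the vector class library, the index array could come from index1.store(ii) and the output could be loaded back with Vec8f().load(out), as in the snippet above.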