Hello, great library. After implementing it in my program, I got a speedup by a factor of about 30. But I wonder if I overlooked something. My code does not look very good where I try to convert a Vec8s to a Vec8f. Just take a look at this code: ....
```
Vec8f weight_cap_f, weights, cap_weights;
short val[8];
short current_hit;
// .....
inline float stats::weight() {
    // get total float vector
    Vec8f total_f = to_float( Vec8i( current_hit, val[1], val[2], val[3], val[4], val[5], val[6], val[7] ) );
    // find overcap values
    Vec8f over_cap = ( total_f - weight_cap_f ) * to_float( reinterpret_i( total_f > weight_cap_f ) );
    // weight it
    Vec8f sum = ( total_f + over_cap ) * weights - over_cap * cap_weights;
    // sum it all up
    return horizontal_add( sum );
}
```
What I am trying to do is basically this: I have a (short) number (e.g. 750), a (short) cap (e.g. 500), and two (float) weights (e.g. 4.0 and 2.0). I weight everything up to the cap with 4.0 and everything above it with 2.0 (= 500 * 4.0 + 250 * 2.0 = 2500.0).
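To make the intended arithmetic concrete, here is a minimal scalar sketch for a single element. The function name `capped_weight` and its parameter names are only illustrative; they are not part of my program or of the library.

```cpp
#include <algorithm>

// Hypothetical scalar reference for one element (illustration only).
// Everything up to `cap` is weighted with `weight`,
// everything above `cap` is weighted with `cap_weight`.
float capped_weight(short value, short cap, float weight, float cap_weight) {
    float total  = static_cast<float>(value);
    float limit  = static_cast<float>(cap);
    float below  = std::min(total, limit);         // part up to the cap
    float excess = std::max(total - limit, 0.0f);  // part above the cap, 0 if under
    return below * weight + excess * cap_weight;
}
```

With the numbers from the example, `capped_weight(750, 500, 4.0f, 2.0f)` gives 500 * 4.0 + 250 * 2.0 = 2500.0. I assume the same min/max form could be written with the library's `min` and `max` on Vec8f instead of the comparison mask, but I have not measured whether that is any faster.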
All of this is done across 8 vector elements. Can I optimize this code using your library? My program spends 70% of its time in this method. Maybe there is a way to directly multiply Vec8f and Vec8s? Best regards