Vector Class Discussion

 
 
Converting between the Vectors
Author: Piece Date: 2012-12-30 06:13
Hello,

Great library. After using it in my program, I got a speedup of about a factor of 30. But I wonder whether I have overlooked something: my code does not look very good where I try to convert a Vec8s to a Vec8f. Take a look at this code:

....
Vec8f weight_cap_f, weights, cap_weights;
short val[8];
short current_hit;
.....

inline float stats::weight() {
   // get total float vector
   Vec8f total_f = to_float( Vec8i( current_hit, val[1], val[2], val[3], val[4], val[5], val[6], val[7] ) );
   // find overcap values (the mask converts to -1.0 where true, so over_cap is zero or negative)
   Vec8f over_cap = ( total_f - weight_cap_f ) * to_float( reinterpret_i( total_f > weight_cap_f ) );
   // weight it
   Vec8f sum = ( total_f + over_cap ) * weights - over_cap * cap_weights;
   // sum it all up
   return horizontal_add( sum );
}

What I am trying to do is basically this: I have a (short) number (e.g. 750), a (short) cap (e.g. 500), and two (float) weights (e.g. 4.0 and 2.0). I weight everything up to the cap with 4.0 and everything above it with 2.0 (500 * 4.0 + 250 * 2.0 = 2500.0), and I do all of that over 8 vector elements.
Can I optimize this code using your library? My program spends 70% of its time in this method. Maybe there is a way to multiply a Vec8f and a Vec8s directly?
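
As a reference point, the weighting rule described above can be written in scalar form for a single element. This is only a minimal sketch in plain C++ without the vector library; the helper name weight_one and its parameters are made up for illustration and are not part of the program in question.

```cpp
#include <algorithm>

// Hypothetical helper (illustrative name): weight one value against a cap.
// The part of `value` up to `cap` is multiplied by `weight`,
// the part above `cap` is multiplied by `cap_weight`.
inline float weight_one(short value, short cap, float weight, float cap_weight) {
    // Portion at or below the cap.
    const float below = static_cast<float>(std::min(value, cap));
    // Portion above the cap; zero if the value does not exceed the cap.
    const float above = static_cast<float>(std::max(value - cap, 0));
    return below * weight + above * cap_weight;
}
```

With the numbers from the example, weight_one(750, 500, 4.0f, 2.0f) evaluates 500 * 4.0 + 250 * 2.0. A scalar version like this may be useful for checking the results of a vectorized version lane by lane.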

Best regards

   
Converting between the Vectors
Author: Agner Date: 2012-12-30 12:57
Vec8f weight_cap_f, weights, cap_weights;
short val[8];
short current_hit;
.....

inline float stats::weight() {
   // load val into vector
   Vec8s val_s = Vec8s().load(val);
   // replace first element by current_hit
   Vec8s total_s = blend<0,9,10,11,12,13,14,15>(Vec8s(current_hit), val_s);
   // convert short to int
   Vec8i total_i = Vec8i(extend_low(total_s),extend_high(total_s));
   // convert to float
   Vec8f total_f = to_float(total_i);
   // find overcap values
   Vec8f over_cap = select(total_f > weight_cap_f, weight_cap_f - total_f, 0.0f);
   // weight it
   Vec8f sum = ( total_f + over_cap ) * weights - over_cap * cap_weights;
   // sum it all up
   return horizontal_add( sum );
}

You may consider using the same type throughout to avoid the many type conversions; they are expensive. I haven't tested this code, but I think you get the idea.
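
One way to read the "same type throughout" suggestion is to store the values as float from the start, so that no short-to-int-to-float conversion is needed in the hot function. The following is a rough sketch in plain C++ under that assumption; the struct WeightedStats, its members, and the function weight_sum are hypothetical names, and the loop only mirrors the select-based formula lane by lane rather than using the vector classes.

```cpp
#include <cstddef>

// Hypothetical layout (illustrative names): everything is kept as float,
// so the hot path contains no integer-to-float conversions.
struct WeightedStats {
    float val[8];         // current values, one per lane
    float cap[8];         // caps, one per lane
    float weight[8];      // weight applied up to the cap
    float cap_weight[8];  // weight applied above the cap
};

inline float weight_sum(const WeightedStats& s) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < 8; ++i) {
        // Same role as select(total > cap, cap - total, 0):
        // zero when under the cap, negative by the excess otherwise.
        const float over_cap = s.val[i] > s.cap[i] ? s.cap[i] - s.val[i] : 0.0f;
        // (val + over_cap) is the capped part; -over_cap is the excess.
        sum += (s.val[i] + over_cap) * s.weight[i] - over_cap * s.cap_weight[i];
    }
    return sum;
}
```

With the vector library, each of the four arrays would presumably be loaded into a Vec8f and the loop body would become the three vector statements from the code above. Whether float storage is acceptable depends on how the values are produced elsewhere in the program.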

And don't expect me to solve all your programming problems...