Vector Class Discussion

 
Thread: unpack four bytes to four ints - chad - 2013-06-04
Last reply: unpack four bytes to four ints - chad - 2013-06-05
 
unpack four bytes to four ints
Author: chad  Date: 2013-06-04 08:00
I'm reading in pixel data where each pixel is a 32-bit integer in RGBA format. I first unpack the four bytes to four ints and then convert them to floats. One way I could do this is to use extend_low/extend_high, but I would have to do this four times to get four integers. Instead, I think it's more efficient to use the _mm_cvtepu8_epi32 intrinsic (SSE4.1), which zero-extends four bytes directly to four 32-bit ints. Is there a reason this intrinsic is not used by the vectorclass?
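For reference, _mm_cvtepu8_epi32 reads only the lowest four bytes of its 128-bit argument and zero-extends each one to a 32-bit lane. The plain C++ model below spells out those semantics; the name cvtepu8_epi32_model is hypothetical and not part of any library, and this is a sketch of what the instruction computes, not a replacement for it.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of _mm_cvtepu8_epi32 (SSE4.1, PMOVZXBD):
// only bytes 0..3 of the 16-byte input are used, and each one is
// zero-extended (not sign-extended) into its own 32-bit lane.
std::array<std::uint32_t, 4> cvtepu8_epi32_model(const std::array<std::uint8_t, 16>& v) {
    std::array<std::uint32_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint32_t>(v[i]);  // value 0..255, upper 24 bits are zero
    }
    return out;  // bytes 4..15 of the input are ignored
}
```

The "u" in epu8 matters for pixel data: a byte such as 200 stays 200, whereas the signed variant _mm_cvtepi8_epi32 would turn it into -56.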

Here is the code I use now, which unpacks four pixels into 12 floats (R, G and B; alpha is skipped).

#include "vectorclass.h"  // Agner Fog's vector class library
#include <smmintrin.h>    // _mm_cvtepu8_epi32 requires SSE4.1

// Load 4 RGBA pixels, convert them from AoS to SoA and expand R, G, B to 12 floats.
// store_a requires &y[k*offset] to be 16-byte aligned.
void int4_to_float12(const int *x, float *y, const int offset) {
    Vec16uc c16 = Vec16uc().load(x);
    // RGBARGBARGBARGBA -> RRRRGGGGBBBBAAAA
    Vec4ui i4 = (Vec4ui)permute16uc<
        0, 4, 8, 12,
        1, 5, 9, 13,
        2, 6, 10, 14,
        3, 7, 11, 15>(c16);

    // Move one channel's 4 bytes into the low lane, then zero-extend them to 4 x 32-bit
    Vec4ui row0 = _mm_cvtepu8_epi32(permute4ui<0,-1,-1,-1>(i4)); // RRRR
    Vec4ui row1 = _mm_cvtepu8_epi32(permute4ui<1,-1,-1,-1>(i4)); // GGGG
    Vec4ui row2 = _mm_cvtepu8_epi32(permute4ui<2,-1,-1,-1>(i4)); // BBBB
    //Vec4ui row3 = _mm_cvtepu8_epi32(permute4ui<3,-1,-1,-1>(i4)); // AAAA

    to_float(row0).store_a(&y[0*offset]);
    to_float(row1).store_a(&y[1*offset]);
    to_float(row2).store_a(&y[2*offset]);
    //to_float(row3).store_a(&y[3*offset]);
}
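A scalar reference is handy for checking either SIMD version. The sketch below is a hypothetical int4_to_float12_ref, assuming the first byte of each pixel in memory is R (the same assumption the permute16uc<0, 4, 8, 12, ...> pattern makes) and that the three output rows do not overlap (offset >= 4).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical scalar reference for int4_to_float12: reads 4 packed RGBA
// pixels (16 bytes), de-interleaves them (AoS -> SoA) and writes
// R to y[0..3], G to y[offset..offset+3], B to y[2*offset..2*offset+3].
// Alpha is skipped, as in the SIMD version.
void int4_to_float12_ref(const int* x, float* y, int offset) {
    assert(x != nullptr && y != nullptr && offset >= 4);  // assumed preconditions
    unsigned char bytes[16];
    std::memcpy(bytes, x, sizeof bytes);  // view the 4 ints as 16 bytes in memory order
    for (int channel = 0; channel < 3; ++channel) {   // 0 = R, 1 = G, 2 = B
        for (int pixel = 0; pixel < 4; ++pixel) {
            // byte index 4*pixel + channel is the mapping encoded by the
            // permute indices {0,4,8,12}, {1,5,9,13}, {2,6,10,14}
            y[channel * offset + pixel] = static_cast<float>(bytes[4 * pixel + channel]);
        }
    }
}
```

Comparing its output against int4_to_float12 on random pixel data is a quick way to catch a wrong permute index.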

   
unpack four bytes to four ints
Author: chad  Date: 2013-06-05 04:51
I thought about this function more carefully. Since I'm unpacking four pixels at once, each extend_low/extend_high call widens several values at once. The new version of the function uses only the vectorclass, and it's even slightly faster than the previous version, which used _mm_cvtepu8_epi32.
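Because the bytes are already grouped as RRRRGGGGBBBBAAAA, the first extend_low/extend_high pair widens eight bytes per call, and each call at the second level yields one whole 32-bit channel. The SSE2 sketch below reproduces that two-level tree by unpacking against a zero vector. It is an assumption about one way to lower such widening, not necessarily how the vectorclass implements extend_low/extend_high, and the names Rows, widen_u8_to_u32 and lanes are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <emmintrin.h>  // SSE2, baseline on x86-64

// Hypothetical container for the four widened channels.
struct Rows { __m128i r, g, b, a; };

// Widen 16 de-interleaved bytes (RRRRGGGGBBBBAAAA) into four vectors of
// 32-bit lanes. Interleaving with zero is a zero-extension on little-endian x86.
Rows widen_u8_to_u32(__m128i c16) {
    const __m128i zero = _mm_setzero_si128();
    // level 1: bytes -> 16-bit lanes (analogous to extend_low/extend_high on Vec16uc)
    __m128i low  = _mm_unpacklo_epi8(c16, zero);  // R R R R G G G G
    __m128i high = _mm_unpackhi_epi8(c16, zero);  // B B B B A A A A
    // level 2: 16-bit -> 32-bit lanes (analogous to extend_low/extend_high on Vec8us)
    Rows out;
    out.r = _mm_unpacklo_epi16(low, zero);
    out.g = _mm_unpackhi_epi16(low, zero);
    out.b = _mm_unpacklo_epi16(high, zero);
    out.a = _mm_unpackhi_epi16(high, zero);
    return out;
}

// Copy the four 32-bit lanes out so they can be inspected.
std::array<std::uint32_t, 4> lanes(__m128i v) {
    std::array<std::uint32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

Dropping out.a (as the post does with row3) saves one unpack; the other five are shared work that yields a full channel per instruction.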

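#include "vectorclass.h"  // Agner Fog's vector class library

// Same output as int4_to_float12, using only vectorclass functions.
// load_a requires x to be 16-byte aligned; store_a requires &y[k*offset] to be 16-byte aligned.
void int4_to_float12_v2(const int *x, float *y, const int offset) {
    // RGBARGBARGBARGBA -> RRRRGGGGBBBBAAAA
    Vec16uc c16 = permute16uc<
        0, 4, 8, 12,
        1, 5, 9, 13,
        2, 6, 10, 14,
        3, 7, 11, 15>(Vec16uc().load_a(x));

    Vec8us low  = extend_low(c16);   // RRRRGGGG as 16-bit
    Vec8us high = extend_high(c16);  // BBBBAAAA as 16-bit
    Vec4ui row0 = extend_low(low);   // RRRR
    Vec4ui row1 = extend_high(low);  // GGGG
    Vec4ui row2 = extend_low(high);  // BBBB
    //Vec4ui row3 = extend_high(high); // AAAA

    to_float(row0).store_a(&y[0*offset]);
    to_float(row1).store_a(&y[1*offset]);
    to_float(row2).store_a(&y[2*offset]);
    //to_float(row3).store_a(&y[3*offset]);
}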
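int4_to_float12_v2 switches from load to load_a, and both versions write with store_a, so the pointers must meet alignment requirements: with 16-byte vectors, x and y must start on a 16-byte boundary, and because each row starts at y + k*offset, offset must be a multiple of 4 floats (4 x 4 bytes = 16). A small check under those assumptions could look like this; aligned16 and v2_preconditions_hold are hypothetical names, not vectorclass functions.

```cpp
#include <cstdint>

// Hypothetical helper: true if p lies on a 16-byte boundary.
bool aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical check of the preconditions behind load_a/store_a in
// int4_to_float12_v2: aligned input, aligned output, and row spacing
// that keeps every &y[k*offset] on a 16-byte boundary.
bool v2_preconditions_hold(const int* x, const float* y, int offset) {
    return aligned16(x) && aligned16(y) && offset % 4 == 0;
}
```

In a debug build, asserting v2_preconditions_hold(x, y, offset) before the call turns a misaligned buffer into a clear failure instead of a crash inside the aligned load or store.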