Vector Class Discussion

unpack four bytes to four ints
Author:  Date: 2013-06-04 08:00
I'm reading in pixel data where each pixel is an integer in RGBA format. I first unpack the four bytes to four ints and then convert them to floats. One way I could do this is to use extend_low/high, but I would have to do this four times to get four integers. Instead, I think it's more efficient to use the _mm_cvtepu8_epi32 intrinsic, which unpacks four bytes directly to four ints. Is there a reason this intrinsic is not used by the vectorclass?
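To make the comparison concrete, here is a minimal sketch in plain intrinsics (not vectorclass code) of the two ways to widen four bytes to four ints. The helper names widen4_unpack and widen4_cvt and the TARGET_SSE41 macro are made up for illustration. The first helper uses two SSE2 unpack steps against zero, the second uses the single SSE4.1 instruction. Which one is actually faster will depend on the CPU and the surrounding code.

```cpp
#include <cstdint>
#include <cstring>
#include <immintrin.h>  // SSE2 unpack intrinsics and SSE4.1 _mm_cvtepu8_epi32

// GCC/Clang: enable SSE4.1 for one function without needing -msse4.1.
// Other compilers: this expands to nothing (assumption: SSE4.1 is enabled elsewhere).
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Hypothetical helper: widen 4 unsigned bytes to 4 ints with SSE2 only.
static void widen4_unpack(const uint8_t* src, int32_t* dst) {
    int32_t packed;
    std::memcpy(&packed, src, 4);                  // read 4 bytes without aliasing issues
    __m128i v    = _mm_cvtsi32_si128(packed);      // bytes sit in the low 32 bits
    __m128i zero = _mm_setzero_si128();
    __m128i w16  = _mm_unpacklo_epi8(v, zero);     // 8-bit  -> 16-bit, zero-extended
    __m128i w32  = _mm_unpacklo_epi16(w16, zero);  // 16-bit -> 32-bit, zero-extended
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), w32);
}

// Hypothetical helper: the same widening in one SSE4.1 instruction.
TARGET_SSE41
static void widen4_cvt(const uint8_t* src, int32_t* dst) {
    int32_t packed;
    std::memcpy(&packed, src, 4);
    __m128i v   = _mm_cvtsi32_si128(packed);
    __m128i w32 = _mm_cvtepu8_epi32(v);            // 4 low bytes -> 4 ints, zero-extended
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), w32);
}
```

Both helpers should produce the same four ints for the same four input bytes; the SSE4.1 version just needs one instruction for the widening instead of two.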

Here is the code I use now which unpacks four pixels into 12 floats.

void int4_to_float12(int *x, float *y, const int offset) {
    // load 4 pixels, convert them from AoS to SoA, expand them to 12 floats
    Vec16uc c16 = Vec16uc().load(x);
    Vec4ui i4 = (Vec4ui)permute16uc<
        0, 4,  8, 12,
        1, 5,  9, 13,
        2, 6, 10, 14,
        3, 7, 11, 15>(c16);

    Vec4ui row0 = _mm_cvtepu8_epi32(permute4ui<0,-1,-1,-1>(i4)); // RRRR
    Vec4ui row1 = _mm_cvtepu8_epi32(permute4ui<1,-1,-1,-1>(i4)); // GGGG
    Vec4ui row2 = _mm_cvtepu8_epi32(permute4ui<2,-1,-1,-1>(i4)); // BBBB
    //Vec4ui row3 = _mm_cvtepu8_epi32(permute4ui<3,-1,-1,-1>(i4)); // AAAA
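
For readers without the vectorclass headers, the same idea can be sketched with plain intrinsics. This is not the code from the post: the function name rgba4_to_float12, the TARGET_SSE41 macro, and the output layout (12 contiguous floats as RRRR GGGG BBBB) are assumptions made for illustration, and byte 0 of each pixel is taken to be R, as in the comments above.

```cpp
#include <cstdint>
#include <immintrin.h>  // SSSE3 _mm_shuffle_epi8, SSE4.1 _mm_cvtepu8_epi32

// GCC/Clang: enable SSE4.1 (which implies SSSE3) for one function.
// Other compilers: this expands to nothing (assumption: SSE4.1 is enabled elsewhere).
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Hypothetical sketch: 4 RGBA pixels (16 bytes) -> 12 floats, RRRR GGGG BBBB.
TARGET_SSE41
static void rgba4_to_float12(const uint8_t* pixels, float* out) {
    __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pixels));

    // AoS -> SoA: gather byte 0 of every pixel, then byte 1, and so on.
    const __m128i gather = _mm_setr_epi8(0, 4,  8, 12,
                                         1, 5,  9, 13,
                                         2, 6, 10, 14,
                                         3, 7, 11, 15);
    __m128i soa = _mm_shuffle_epi8(px, gather);  // RRRR GGGG BBBB AAAA

    // Zero-extend the low four bytes; shift right by 4 bytes to reach the next channel.
    __m128i r = _mm_cvtepu8_epi32(soa);
    __m128i g = _mm_cvtepu8_epi32(_mm_srli_si128(soa, 4));
    __m128i b = _mm_cvtepu8_epi32(_mm_srli_si128(soa, 8));

    // Convert each channel to float and store; alpha is skipped.
    _mm_storeu_ps(out + 0, _mm_cvtepi32_ps(r));
    _mm_storeu_ps(out + 4, _mm_cvtepi32_ps(g));
    _mm_storeu_ps(out + 8, _mm_cvtepi32_ps(b));
}
```

The byte shift plays the role of the permute4ui<n,-1,-1,-1> calls: it only has to bring the wanted channel into the low 32 bits, because _mm_cvtepu8_epi32 ignores everything above them.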


thread unpack four bytes to four ints - chad - 2013-06-04
last reply unpack four bytes to four ints new - chad - 2013-06-05