Vector Class Discussion

Test suite for VCL? And how to submit patches

Author:

Date: 2016-06-02 16:23

I'm mostly finished tuning the horizontal sums (hsums) for __m128i vectors. For horizontal_add_x(Vec16c), we can range-shift the signed bytes to unsigned, use psadbw against zero, and then subtract the bias from the total, so that's a huge improvement.
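A minimal sketch of the psadbw idea with plain SSE2 intrinsics, for readers who want to see the range-shift spelled out. The function name hsum_epi8_to_i32 is hypothetical and this is not the code from the patch; it only illustrates the technique on a raw __m128i rather than on a Vec16c.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: sum 16 signed bytes into a 32-bit result.
static inline int32_t hsum_epi8_to_i32(__m128i v) {
    // Flipping the sign bit adds 128 to every byte, mapping
    // signed [-128, 127] onto unsigned [0, 255].
    __m128i u = _mm_xor_si128(v, _mm_set1_epi8(-128));
    // psadbw against zero sums the 8 unsigned bytes of each 64-bit half.
    __m128i sad = _mm_sad_epu8(u, _mm_setzero_si128());
    // Add the high half's sum to the low half's sum.
    __m128i hi = _mm_unpackhi_epi64(sad, sad);
    __m128i sum = _mm_add_epi32(sad, hi);
    // Undo the bias: 16 elements, each shifted up by 128.
    return _mm_cvtsi128_si32(sum) - 16 * 128;
}
```

The biased total is at most 16 * 255 = 4080, so nothing can overflow before the bias is subtracted.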

Many of the _x functions do one step of extend/add and then just call the normal horizontal_add function for the next wider width.
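As an illustration of that pattern, here is a sketch for eight 16-bit elements: one widening step, then a hand-off to the 32-bit sum. It assumes pmaddwd with a vector of ones as the extend/add step, which may or may not be what the patch uses, and both function names are hypothetical stand-ins for the library's own functions.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical stand-in for the normal 32-bit horizontal add.
static inline int32_t hsum_epi32(__m128i v) {
    __m128i hi64  = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum64 = _mm_add_epi32(v, hi64);       // elements 0+2 and 1+3
    __m128i hi32  = _mm_shuffle_epi32(sum64, _MM_SHUFFLE(2, 3, 0, 1));
    __m128i sum32 = _mm_add_epi32(sum64, hi32);   // total in element 0
    return _mm_cvtsi128_si32(sum32);
}

// Hypothetical widening sum of 8 x int16 into int32.
static inline int32_t hsum_epi16_to_i32(__m128i v) {
    // pmaddwd multiplies each signed 16-bit element by 1 and adds adjacent
    // pairs into 32-bit lanes: sign extension and the first add in one step.
    __m128i pairs = _mm_madd_epi16(v, _mm_set1_epi16(1));
    // The rest is just the normal 32-bit horizontal add.
    return hsum_epi32(pairs);
}
```

Because the pair sums are already 32 bits wide, inputs such as eight copies of 32767 add up correctly instead of wrapping at 16 bits.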

I removed all the slow phadd code. In some cases, I changed things to avoid movdqa in the SSE2 / SSE4 versions without AVX. With AVX, it mostly just saves code-size, and maybe increases ILP.

For CPUs with slow shuffles (like Merom), there should be nice improvements from using pshuflw instead of pshufd when possible.
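A sketch of where pshuflw can stand in for pshufd, using the last step of a 32-bit horizontal sum as the example. The function name is hypothetical, and this is an illustration of the substitution rather than the code in the repository.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical 32-bit horizontal sum that finishes with pshuflw.
static inline int32_t hsum_epi32_pshuflw(__m128i v) {
    // First step: bring the high 64 bits down. pshufd is a copy-and-shuffle,
    // so no separate movdqa is needed without AVX.
    __m128i hi64  = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum64 = _mm_add_epi32(v, hi64);
    // Second step: only element 1 has to reach element 0, and both live in
    // the low 64 bits, so pshuflw (swapping the two low dwords as pairs of
    // 16-bit words) is enough.
    __m128i hi32  = _mm_shufflelo_epi16(sum64, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum32 = _mm_add_epi32(sum64, hi32);
    return _mm_cvtsi128_si32(sum32);
}
```

The result is identical to the pshufd version; the only difference is which shuffle instruction the compiler emits for the final step.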

Anyway, I pushed my work up to GitHub. I have *not* turned my changes into a nice patch series, so all the mess of development is there. I can refactor the history into a series of clean commits if that's useful, but you don't use public version control for the library, so IDK if it would benefit anything long-term.

I still haven't really looked at float or 256b vectors yet, but I'd like your comments on coding-style and how much detail to put in comments before I start on those.
