AVX or SSE code performance question

News and research about CPU microarchitecture and software optimization
enuinc
Posts: 1
Joined: 2022-04-30, 7:53:08

AVX or SSE code performance question

Post by enuinc » 2022-04-30, 8:18:00

Hi

My work involves a lot of numerical computation, so I regularly use SSE or AVX to speed up numerically intensive code. So far, the results have been quite satisfactory, until today.

Basically, I used SSE to rewrite a simple vector-vector multiplication. When I measured the speedup, I got an approximately 3X reduction in run time, as expected.
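
For reference, an element-wise multiplication of this kind might look like the minimal sketch below. The actual code is not shown in this thread, so the function name `mul_sse`, the `float` element type, and the use of unaligned loads are all assumptions made for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical kernel (not the poster's code): out[i] = a[i] * b[i] for i in [0, n). */
void mul_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    /* Process four floats per iteration; unaligned loads accept any pointer. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        out[i] = a[i] * b[i];
}
```

A loop like this moves at least 12 bytes of data (two 4-byte loads and one 4-byte store) for every multiplication, so once the arrays no longer fit in cache it is likely to be limited by memory bandwidth rather than by arithmetic.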

However, once I checked in the code, my colleague ran an experiment: running 56 instances of the compiled binary at the same time (with identical input) on a Dell machine with dual Xeon Gold 6238R CPUs (so the machine contains 56 physical cores). I was surprised to see that the code now runs about 5-6 times slower than when I run the binary just once.

I suspect this is due to cache misses, but I don't know how to confirm it. And probably more importantly, is there a way to mitigate the issue if that is indeed the root cause?

Thanks
en

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: AVX or SSE code performance question

Post by agner » 2022-04-30, 12:02:34

I suggest that you post your question on https://stackoverflow.com with details about your code. Your results may in fact reflect cache effects, compiler optimization, or thread effects.
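
One possible way to separate cache effects from the other causes, sketched here under assumptions: time the same multiplication on a working set small enough to stay in cache and on one far larger than the last-level cache, first with a single process and then with many copies running at once. If only the large case slows down when many copies run, shared cache or memory bandwidth is the likely bottleneck. The helper name `ns_per_element`, the sizes, and the plain scalar loop standing in for the real kernel are illustrative assumptions, not the code from the question.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: average CPU nanoseconds per element for
   out[i] = a[i] * b[i], repeated `reps` times over arrays of n floats.
   Requires n >= 1 and reps >= 1. Returns -1.0 if allocation fails. */
double ns_per_element(size_t n, int reps)
{
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *out = malloc(n * sizeof *out);
    if (!a || !b || !out) {
        free(a); free(b); free(out);
        return -1.0;
    }

    for (size_t i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    /* Reading a result into a volatile discourages the compiler from
       removing the loop; check the generated code if results look too fast. */
    volatile float sink = 0.0f;

    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        for (size_t i = 0; i < n; ++i)
            out[i] = a[i] * b[i];
        sink += out[n - 1];
    }
    clock_t stop = clock();
    (void)sink;

    free(a); free(b); free(out);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    return seconds * 1e9 / ((double)n * (double)reps);
}
```

For example, comparing `ns_per_element(2048, 100000)` with `ns_per_element(64u * 1024 * 1024, 4)`, alone and then with 56 copies running, should show whether the slowdown appears only when the data has to come from main memory.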
