Vector Class Discussion

check if all elements of a vector are zero - chad - 2013-03-21

check if all elements of a vector are zero - Agner - 2013-03-21

mathematical library functions - chad - 2013-03-27

mathematical library functions - chad - 2013-03-30

mathematical library functions - Agner - 2013-04-01

check if all elements of a vector are zero

Author: chad	Date: 2013-03-21 11:33

I have implemented code to draw the Mandelbrot set using your vectorclass. I use Vec8f to operate on 8 pixels at once, and I get significant speedups doing this. To do this I create a mask in each iteration and leave the loop when every element of the mask is zero. The way I did this first was using horizontal_add(mask)==0. However, I get better results using the intrinsic _mm_movemask_epi8(mask). Do you have a better recommendation than this?

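Vec8i MandelbrotCalculate_vec8(const Vec8f x0, const Vec8f y0, const int maxiter, const float radius) {
	// iterates z = z*z + c while x*x + y*y <= radius and n < maxiter,
	// returns the number of iterations for each of the 8 pixels.
	// radius is compared against x*x + y*y, so pass the squared
	// escape radius (e.g. 4 for |z| <= 2).
	Vec8f x = x0;
	Vec8f y = y0;
	Vec8i vn = 0;
	for(int n = 0; n<maxiter; n++) {
		// each element is -1 (all bits set) where the pixel is still inside
		Vec8i mask = (x*x + y*y) <= radius;
		//if(horizontal_add(mask)==0) break;
		int test = _mm_movemask_epi8(mask.get_low()) + _mm_movemask_epi8(mask.get_high());
		if(test==0) break;	// all 8 pixels have escaped
		vn += mask;		// adds -1 for every pixel still inside
		Vec8f xtemp = x*x - y*y + x0;
		y = 2*x*y + y0;
		x = xtemp;
	}
	return -1*vn;	// flip the sign to get positive counts
}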


check if all elements of a vector are zero

Author: Agner	Date: 2013-03-21 13:09
Good idea. Thank you. Your method for checking if a vector is zero will not work if there is a negative zero.
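
To illustrate the negative-zero caveat, here is a minimal sketch using plain SSE2 intrinsics instead of the vector class, so that it compiles on its own. The helper names (mask_is_empty, all_zero_int, all_zero_float) are made up for this example and are not part of vectorclass. _mm_movemask_epi8 only collects the top bit of each byte, so it is a reliable test for comparison masks (all ones or all zeros per element), but it reports -0.0f as non-zero and misses small positive integers. Comparing against zero first avoids both problems.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE float ones)

// The test from the post, applied to one 128-bit vector.
// Reliable only when every element is all ones or all zeros.
inline bool mask_is_empty(__m128i mask) {
    return _mm_movemask_epi8(mask) == 0;
}

// True if all four 32-bit integers are zero, whatever their bit pattern.
inline bool all_zero_int(__m128i v) {
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

// True if all four floats compare equal to zero.
// -0.0f == 0.0f under IEEE 754, so negative zero counts as zero here.
inline bool all_zero_float(__m128 v) {
    __m128 eq = _mm_cmpeq_ps(v, _mm_setzero_ps());
    return _mm_movemask_ps(eq) == 0xF;
}
```

For example, mask_is_empty(_mm_castps_si128(_mm_set1_ps(-0.0f))) is false because the sign bits are set, while all_zero_float(_mm_set1_ps(-0.0f)) is true. On CPUs with SSE4.1, _mm_testz_si128(v, v) is another way to test whether every bit of an integer vector is zero.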


mathematical library functions

Author: chad	Date: 2013-03-27 09:51

I have generalized my Mandelbrot set code using templates so that it now works on float, double, Vec4f, Vec8f, Vec4d, and Vec2d. I create the colors using the log, sin, and cos functions from "vectorclass_extra.h". With the standard math libraries everything works fine. I'm using Visual Studio 2012 and compiling in 64-bit mode on an Intel Sandy Bridge 2600K.
I test SSE and AVX. My code runs over 20x faster using Vec8f and four cores (OpenMP) than single float on one core.
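
When images come out wrong for some vector types, a scalar reference can help narrow down where the results diverge. The following is a minimal sketch, not the code from this post: mandelbrot_scalar is a hypothetical name, and it only covers the float and double cases. It follows the same loop condition as the Vec8f version earlier in the thread, so its result for a pixel should match the corresponding element of the vector result.

```cpp
#include <cassert>

// Scalar reference: iterates z = z*z + c while x*x + y*y <= radius
// and fewer than maxiter iterations have run. Returns the iteration count.
// radius is the squared escape radius, as in the vector version.
template <typename T>
int mandelbrot_scalar(T x0, T y0, int maxiter, T radius) {
    assert(maxiter >= 0);
    T x = x0;
    T y = y0;
    int n = 0;
    while (n < maxiter && x * x + y * y <= radius) {
        T xtemp = x * x - y * y + x0;
        y = 2 * x * y + y0;
        x = xtemp;
        ++n;
    }
    return n;
}
```

For example, mandelbrot_scalar<float>(1.0f, 0.0f, 100, 4.0f) returns 2: z goes 1, 2, 5, and the third value is the first with x*x + y*y above 4.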

I downloaded AMD LibM (developer.amd.com/tools/cpu-development/libm/). This is a bit faster on my Intel CPU, but the images appear wrong for Vec4d and Vec2d. It does not matter whether I use SSE or AVX. There appears to be a problem with double precision.

I downloaded an evaluation version of Intel Parallel Studio and linked in svml_dispmt.lib and libircmt.lib. For SSE this draws stripes down the screen. For AVX it shows the image, but it's not entirely correct. I have not tested using the GCC compiler on Linux yet. SVML gives the fastest results, but since the results are all wrong it's not very useful.

So currently I get the best results using LIBM for Vec8f and the standard math libraries for Vec4d.

Have you had similar experiences with these libraries? Do you have any other suggestions on how to use them?


mathematical library functions

Author: chad	Date: 2013-03-30 16:30

I found a bug in the LibM branch of vectormath.h (VECTORMATH = 1) for the Vec2d cos function.

static inline Vec2d cos (Vec2d const & x) { //cosine
	return amd_vrd2_sin(x);
}
It should be amd_vrd2_cos(x). When I change it the LibM library works for double floating point as well now.
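
A sin/cos mix-up like this is easy to catch with a quick comparison against the standard library. The following is a minimal sketch under assumptions: max_cos_error is a hypothetical helper, and the candidate is passed in as a scalar callable so that the example compiles without LibM or vectorclass. To test a vector wrapper, the callable would broadcast x into a vector, call the wrapper, and extract one element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Largest absolute difference between candidate(x) and std::cos(x)
// over evenly spaced sample points in [lo, hi].
template <typename F>
double max_cos_error(F candidate, double lo, double hi, std::size_t samples) {
    assert(samples >= 2);  // need at least both end points
    double worst = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        double t = static_cast<double>(i) / static_cast<double>(samples - 1);
        double x = lo + (hi - lo) * t;
        double err = std::fabs(candidate(x) - std::cos(x));
        if (err > worst) worst = err;
    }
    return worst;
}
```

A wrapper that really computes the cosine should give an error near the rounding level, while one that returns the sine gives an error of about 1 as soon as the range includes x = 0.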

In other news, I tried my Mandelbrot code on Linux. I installed the free non-commercial Intel C++ compiler for Linux. SVML works fine for me on Linux and is faster than LibM. I still have not got it to work on Windows.

In case you're interested, here is a class I found which implements exp, log, sin, cos, and sincos with AVX. It gives me the fastest results for my Mandelbrot code, even faster than SVML. It only works for Vec8f, though, and does not work with Visual Studio. I may try to generalize it for Vec8f, Vec4f, and Vec2d for GCC and Visual Studio.
software-lisc.fbk.eu/avx_mathfun/


mathematical library functions

Author: Agner	Date: 2013-04-01 05:56
chad wrote:
I found a bug in the LibM branch of vectormath.h

Thank you. I have fixed the bug now.
