Agner`s CPU blog

Software optimization resources | E-mail subscription to this blog | www.agner.org

 
thread Random number generator with vector output - Agner - 2014-10-17
replythread Random number generator with vector output - Just_Coder - 2014-10-17
last reply Random number generator with vector output - Agner - 2014-10-17
last replythread Random number generator with vector output - Yan Zhou - 2014-10-22
last replythread Random number generator with vector output - Agner - 2014-10-22
last replythread Random number generator with vector output - Yan Zhou - 2014-10-22
last reply Random number generator with vector output - Agner - 2014-10-23
 
Random number generator with vector output
Author: Agner Date: 2014-10-17 03:28
I have added a new random number generator to the vector class library. It is optimized for the vector processing capabilities and multiprocessing capabilities of the newest microprocessors, including the forthcoming AVX512 instruction set. It can generate pseudo-random integers and floating point numbers as scalars and vectors. The quality and precision is high. It is intended for large scale simulations and other Monte Carlo applications. The theory is explained in a separate article.
www.agner.org/optimize/#vectorclass
   
Random number generator with vector output
Author: Just_Coder Date: 2014-10-17 04:44
Are there any detailed benchmarks for the library ?
   
Random number generator with vector output
Author: Agner Date: 2014-10-17 06:35
Just_Coder wrote:
Are there any detailed benchmarks for the library ?
I have some timing results for the random number generator with different instruction sets and different compilers in this theoretical article. The Microsoft compiler gives poor performance, Gnu, Clang and Intel compilers give good performance.
   
Random number generator with vector output
Author:  Date: 2014-10-22 09:22
Have you heard a library called Random123. The paper by the author is at the following link: dl.acm.org/authorize?6509660 which I recall won the best paper at the SC11 conference. The library can be found at www.deshawresearch.com/resources_random123.html

It is a library for parallel random number generating. Unlike conventional PRNG, their generators are based on Cryptographic ideas of using the bijection y_i = f(x_i) where x_i's are only need to be different, such as 1, 2, 3,..., and y_i will appear to be random. All RNG discussed in there paper are Crush resistance (the BigCrush to be specific).

Is there any real advantage of your new library against theirs? The only one I can think of is that some PRNG has a much longer period. Theirs, on the other hand can easily provides 2^256 independent streams, each with 2^256 period, which is more than enough for most situations except the most demanding ones.

Performance-wise, generators in their library usually are ~10cpB on a Haswell CPU. The performance results included in their paper are faster, but were obtained in more idealized situation. I managed to re implement their algorithms and reduced the cpB to ~2 for most algorithms (I was only able to do it after reading your optimization manual, many thanks). In some cases their algorithm can be made to perform even better, < 1cpB (on average), but at the cost of larger space requirement (but the state size of the RNG are still considerably smaller than conventional PRNG such as MT19937).

   
Random number generator with vector output
Author: Agner Date: 2014-10-22 10:49
Yan Zhou wrote:
Have you heard a library called Random123.
Yes, I am referring to it in the theoretical article www.agner.org/random/theory/randomvector.pdf

The AES instruction doesn't come in vector sizes bigger than 128 bits. That's why I am not using it.

   
Random number generator with vector output
Author:  Date: 2014-10-22 19:11
Agner wrote:
The AES instruction doesn't come in vector sizes bigger than 128 bits. That's why I am not using it.
Is that a big issue? The AES instruction has a latency of 7 on Sandy bridge and later, but can be started every 1 cycle. Therefore by generating random numbers in 8 blocks, the theoretical peak performance is 0.64 cpB (though in practice I only managed 0.8cpB per core), which I believe is better than most RNG out there. I have only saw Intel MKL's SFMT19937 and MT2203 being better (0.5cpB, the last time tested them).

Anyway, it is indeed a shame that the newer VEX encoded AES instructions only work on the lower part XMM of YMM

   
Random number generator with vector output
Author: Agner Date: 2014-10-23 00:39
There is no good theoretical analysis of how random the output of AES is. I would combine it with another generator with extremely long cycle length anyway, for reasons described in my theory article. There is also a problem with portability to platforms that don't have AES.