AVX-512 stand-alone C code for neural nets

37ef.ced3
Posts: 1
Joined: 2020-11-29, 21:47:37

AVX-512 stand-alone C code for neural nets

Post by 37ef.ced3 » 2020-11-29, 22:01:55

NN-512 is an open-source program that generates fully AVX-512 vectorized, human-readable, stand-alone C implementations of convolutional neural nets. The generated code is an example of AVX-512 programming using GCC's AVX-512 intrinsics: https://NN-512.com

robinkyle11
Posts: 1
Joined: 2021-08-24, 19:43:16

Re: AVX-512 stand-alone C code for neural nets

Post by robinkyle11 » 2021-08-24, 19:49:05

NN-512 is an open-source Go program that generates fully AVX-512 vectorized, human-readable, stand-alone C implementations of convolutional neural nets.
The generated C code is an example of AVX-512 programming using GCC's AVX-512 intrinsics. AVX-512 is exciting for three reasons: its masking simplifies edge cases (partial loads, partial stores, etc.), it provides 32 wide vector registers, and it has excellent shuffle/permutation instructions (in particular, the two-input permute by variable index). Recent versions of GCC produce very good object code from C intrinsics.
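To make the point about masking concrete, here is a minimal sketch of a masked tail. It is not taken from NN-512's generated code: the function names (scale_f32, scale_avx512, scale_scalar) are hypothetical, and it assumes GCC or Clang, since it relies on the target attribute and __builtin_cpu_supports. The 16-wide main loop handles full vectors, and one masked load/store pair handles the remaining 1 to 15 floats without a scalar cleanup loop.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: dst[i] = src[i] * s for n floats, using AVX-512F.
   The target attribute lets this compile without -mavx512f. */
__attribute__((target("avx512f")))
static void scale_avx512(float *dst, const float *src, size_t n, float s)
{
    const __m512 vs = _mm512_set1_ps(s);
    size_t i = 0;

    /* Full 16-float vectors. */
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(src + i);
        _mm512_storeu_ps(dst + i, _mm512_mul_ps(v, vs));
    }

    /* Tail: build a mask with the low `rem` bits set. Masked-out lanes
       are neither read from src nor written to dst. */
    size_t rem = n - i;
    if (rem) {
        __mmask16 k = (__mmask16)((1u << rem) - 1u);
        __m512 v = _mm512_maskz_loadu_ps(k, src + i);
        _mm512_mask_storeu_ps(dst + i, k, _mm512_mul_ps(v, vs));
    }
}
#endif

/* Portable fallback with the same result. */
static void scale_scalar(float *dst, const float *src, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * s;
}

/* Hypothetical entry point: use AVX-512 only if the CPU reports it. */
void scale_f32(float *dst, const float *src, size_t n, float s)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(dst, src, n, s);
        return;
    }
#endif
    scale_scalar(dst, src, n, s);
}
```

Calling scale_f32(dst, src, 19, 2.0f) processes one full vector of 16 floats and then a masked tail of 3, leaving dst[19] and beyond untouched. The runtime check is only there so the sketch also runs on machines without AVX-512; code generated for a known AVX-512 target would not need it.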

The goal of NN-512 is efficient neural net inference on inexpensive, CPU-only cloud instances. For example, a Skylake-X cloud compute instance costs $10 per CPU-core per month at Vultr, and the NN-512 generated code does about 18 DenseNet121 inferences per CPU-core per second (run in series, not batched).
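As a rough way to read those two figures together, the sketch below converts a per-core monthly price and a sustained inference rate into a cost per million inferences. The function names are hypothetical, and the calculation assumes a 30-day month and a core that is kept fully busy, neither of which is stated in the post.

```c
/* Inferences one core completes in `days` days at a sustained rate.
   Assumes 100% utilization (an assumption, not a measured figure). */
static double inferences_per_month(double per_second, double days)
{
    return per_second * 60.0 * 60.0 * 24.0 * days;
}

/* Dollar cost per one million inferences on one core. */
static double cost_per_million(double dollars_per_core_month,
                               double per_second, double days)
{
    return dollars_per_core_month
           / inferences_per_month(per_second, days) * 1e6;
}
```

With the numbers quoted above (18 inferences per second, $10 per core per month), this gives 46,656,000 inferences per core in a 30-day month, or roughly $0.21 per million inferences. Real utilization would be lower, so this is a lower bound on the cost.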

As AVX-512 becomes better supported by Intel and AMD chips, it becomes more attractive as an alternative to expensive GPU instances for workloads that mix small amounts of inference with other computation.
