Agner's CPU blog


Gnu support for CPU dispatching - sort of...
Author: Agner Date: 2011-07-08 06:04
The Gnu tools have added a new feature for automatic CPU dispatching. This means that you can have multiple versions of the same function, each optimized for a different CPU or a different instruction set. For example, you may want to have three different versions of an important library function: one that is compatible with any old CPU, a better one for CPUs with SSE2, and a still better one for CPUs with the AVX instruction set.
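To make the idea concrete, here is a minimal sketch of picking between three versions by hand. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available; on other compilers or CPUs it simply falls back to the generic version. The names `sum_generic`, `sum_sse2`, `sum_avx` and `select_sum` are hypothetical, and the SSE2 and AVX bodies are placeholders standing in for code that would really be tuned for each instruction set.

```cpp
#include <cstddef>

// Three versions of the same function (hypothetical names). In real code each
// body would be optimized for its instruction set; here they are placeholders.
int sum_generic(const int* a, std::size_t n) {
    int s = 0;
    for (std::size_t i = 0; i < n; i++) s += a[i];
    return s;
}
int sum_sse2(const int* a, std::size_t n) { return sum_generic(a, n); }
int sum_avx(const int* a, std::size_t n)  { return sum_generic(a, n); }

typedef int (*sum_func)(const int*, std::size_t);

// Pick the best version that the CPU running this code supports.
sum_func select_sum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   // initialize the CPU feature data before querying it
    if (__builtin_cpu_supports("avx"))  return &sum_avx;
    if (__builtin_cpu_supports("sse2")) return &sum_sse2;
#endif
    return &sum_generic;    // compatible with any CPU
}
```

A caller would store the result of `select_sum()` in a function pointer once and call through that pointer afterwards. The indirect function feature discussed here moves that bookkeeping into the linker and loader.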

This feature, called "Gnu indirect function", was introduced two years ago (link). Since then, I have waited impatiently for an implementation that works. Now I have discovered that this feature is actually used for a few functions in the standard function library (glibc v. 2.13). The official documentation (link) says that you can use __attribute__ ((ifunc("name_of_dispatch_function"))), but this doesn't work.

After some experimentation, I found that the method shown below actually works:

// Example of Gnu indirect function
#include <stdio.h>
#include <time.h>

// Define different versions of my function
int myfunc1() {
   return 1;
}

int myfunc2() {
   return 2;
}

// Prototype for the common entry point
extern "C" int myfunc();
__asm__ (".type myfunc, @gnu_indirect_function");

// Make the dispatcher function. This returns a pointer to the desired function version
typeof(myfunc) * myfunc_dispatch (void) __asm__ ("myfunc");
typeof(myfunc) * myfunc_dispatch (void)  {
   if (time(0) & 1) {
      // If time is odd at first call, use version 1
      return &myfunc1;
   }
   else {
      // else use version 2
      return &myfunc2;
   }
}

int main() {
   // Test the call to myfunc
   printf("\nCalled function number %i\n", myfunc());
   return 0;
}

The function call is resolved via the normal procedure linkage table (PLT). The PLT entry is changed to point to the desired version of the function, either at load time or at the first call. The PLT entry initially points to myfunc_dispatch. This dispatcher is called only once, and the pointer it returns replaces its own entry in the PLT.
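The mechanism can be imitated in portable C++ with a function pointer that overwrites itself, which may help in picturing what the loader does. This is only a sketch of the idea under simplified assumptions, not how the PLT is actually implemented. All names here (`work_v1`, `work_v2`, `work_dispatch`, `work_first_call`, `work_ptr`, `dispatch_calls`) are hypothetical.

```cpp
// Two versions of the function.
int work_v1() { return 1; }
int work_v2() { return 2; }

typedef int (*work_func)();

int dispatch_calls = 0;          // counts how often the dispatcher runs

// Dispatcher: returns a pointer to the version to use. A real one would
// test the CPU; this one always picks version 2 to keep the sketch simple.
work_func work_dispatch() {
    dispatch_calls++;
    return &work_v2;
}

int work_first_call();           // forward declaration of the stub

// Plays the role of the PLT entry: it initially points to the stub.
work_func work_ptr = &work_first_call;

// Stub: runs the dispatcher once, overwrites the pointer, forwards the call.
int work_first_call() {
    work_ptr = work_dispatch();
    return work_ptr();
}

// Common entry point used by callers.
inline int work() { return work_ptr(); }
```

After the first call to `work()`, `work_ptr` points directly at the chosen version, so later calls cost one indirect call and the dispatcher is not run again.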

The "Gnu indirect function" feature requires support in the assembler, linker and loader, which is found in binutils version 2.20 and later.

The Gnu standard library glibc uses this feature to implement multiple versions of a few memory and string functions, including memmove, memset, memcmp, strcmp, strstr, but - strangely - not the most important one: memcpy.

This feature can be useful for anybody who wants to make a highly optimized function library for Linux. It is not possible in Windows, but it may be implemented in BSD and Mac systems. See my manual Optimizing software in C++ for a method that works on all platforms.