Optimization manuals updated
Author: Agner Date: 2012-03-02 06:52

The next update to the manuals is finally here. The most important additions are:

  • Test results for AMD Bulldozer (manual 2, 3, 4).
  • Shared objects in Unix systems are inefficient because of position-independent code, symbol interposition, the global offset table (GOT) and the procedure linkage table (PLT). All references to public symbols (and even some references to local symbols) require a table lookup, which is a complete waste of time if we don't need the symbol interposition feature, which we rarely do. The updated C++ manual gives advice on how to avoid these time-consuming complications for shared objects (*.so) in Linux, BSD and Mac systems (manual 1).
  • Updated advice on memory copying, with description of false memory dependence (manual 2).
  • Methods for integer division by a constant revised, with more references and support in the asmlib library (manual 2).
  • Chapter on vectorization updated and revised, including AVX (manual 1).
  • SSE4.2 string instructions (manual 2).
  • Small modifications in chapter on multithreading (manual 1).
  • The instruction tables are now available both as a .pdf file (link) and as a spreadsheet (link) because several people have requested this (manual 4).
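
To illustrate the shared object item above, here is a minimal sketch of one common way to avoid interposition overhead with GCC or Clang: giving internal functions hidden visibility so that calls inside the library can bind directly instead of going through the PLT. This is not necessarily the exact advice given in the manual. The macro names `API_EXPORT` and `API_LOCAL` and the functions `scale` and `library_entry` are hypothetical and chosen only for this example.

```cpp
// Assumption: GCC or Clang; other compilers get empty macros.
#if defined(__GNUC__)
#define API_EXPORT __attribute__((visibility("default")))
#define API_LOCAL  __attribute__((visibility("hidden")))
#else
#define API_EXPORT
#define API_LOCAL
#endif

// Internal helper (hypothetical). Hidden visibility means it cannot be
// interposed from outside, so calls from within the library need no
// PLT entry.
API_LOCAL int scale(int x) {
    return 3 * x;
}

// Public entry point (hypothetical). Only this symbol is exported
// from the shared object.
extern "C" API_EXPORT int library_entry(int x) {
    return scale(x) + 1;
}
```

A possible build line, assuming GCC, is `g++ -O2 -fPIC -shared -fvisibility=hidden lib.cpp -o libdemo.so`. With `-fvisibility=hidden` every symbol is hidden by default, so only the functions explicitly marked `API_EXPORT` remain visible to users of the library.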

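As a small illustration of integer division by a constant, the sketch below replaces an unsigned 32-bit division by 10 with a multiplication and a shift. This is the well-known reciprocal multiplication technique and not necessarily the revised method described in the manual. The function name `div10` is hypothetical.

```cpp
#include <cstdint>

// Divide an unsigned 32-bit integer by 10 without a division instruction.
// The multiplier 0xCCCCCCCD equals ceil(2^35 / 10). The product is formed
// in 64 bits and the quotient is obtained by shifting right 35 bits.
inline std::uint32_t div10(std::uint32_t x) {
    const std::uint64_t product = static_cast<std::uint64_t>(x) * 0xCCCCCCCDu;
    return static_cast<std::uint32_t>(product >> 35);
}
```

Optimizing compilers normally apply this transformation on their own when the divisor is known at compile time. Doing it by hand is mainly of interest when the divisor is only known at run time but is reused many times, in which case the multiplier and shift count have to be computed once beforehand.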