Agner's CPU blog


 
thread Optimization manuals updated - Agner Fog - 2009-09-27
replythread Optimization manuals updated - Marko Rekston - 2009-09-28
reply Optimization manuals updated - Agner Fog - 2009-09-29
last replythread Optimization manuals updated - Mohamed Sheikhaldeen - 2009-10-02
last reply Optimization manuals updated - Agner Fog - 2009-10-02
replythread Optimization manuals updated - Giovanni - 2009-12-06
last reply Optimization manuals updated - Agner Fog - 2009-12-06
reply Optimization manuals updated - Agner Fog - 2010-02-16
reply Optimization manuals updated - Agner Fog - 2010-09-25
reply Optimization manuals updated - Bobcat tested - Agner - 2011-06-08
last reply Optimization manuals updated - Agner - 2012-03-02
 
Optimization manuals updated
Author: Agner Fog Date: 2009-09-27 11:24
My optimization manuals have now been updated.

I have included tests of the Intel Core i7 processor and the lightweight processors Intel Atom and VIA Nano. Furthermore, there are many small additions to all five manuals.

www.agner.org/optimize/#manuals

   
Optimization manuals updated
Author: Marko Rekston Date: 2009-09-28 15:07
Thank you, Doktor Agner. I always follow your advice. Your manuals are great for new to medium-level programmers such as me, and they cover some of the most important topics, like unions, enums, and CPU dispatching.


But I would like more coverage of dynamic cache allocation (the _alloca function in the MSVC++ libraries) and of the cache split problem. Many software developers are unable to cope with cache splitting.

Thanks. Good day.

   
Optimization manuals updated
Author: Agner Fog Date: 2009-09-29 01:47
You are right that cache use is difficult to optimize.
The _alloca method is described in chapter 8.6 "Dynamic memory allocation" of my C++ manual. I can recommend this method where it is possible. I have made some examples of container classes that use contiguous memory in order to optimize cache use. You can find them in www.agner.org/optimize/cppexamples.zip. These container classes are useful as an alternative to the standard template library (STL).
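To illustrate the stack-allocation idea discussed above, here is a minimal sketch. It is not taken from the manual or from cppexamples.zip. The function name median_on_stack, the STACK_ALLOC macro and the kMaxStackInts limit are assumptions made up for this example. It assumes MSVC (_alloca in <malloc.h>) or a system that provides <alloca.h>, such as glibc or macOS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
#include <malloc.h>                    // _alloca on MSVC
#define STACK_ALLOC(bytes) _alloca(bytes)
#else
#include <alloca.h>                    // alloca on glibc and macOS (assumption)
#define STACK_ALLOC(bytes) alloca(bytes)
#endif

// Arbitrary illustrative limit: the stack is small, so large requests
// should go to the heap instead.
const std::size_t kMaxStackInts = 1024;

// Hypothetical helper: returns the (upper) median of data[0..n-1] using a
// scratch copy that lives on the stack. The scratch space is released
// automatically when the function returns; there is no free() call.
int median_on_stack(const int* data, std::size_t n) {
    assert(n > 0 && n <= kMaxStackInts);
    int* tmp = static_cast<int*>(STACK_ALLOC(n * sizeof(int)));
    std::copy(data, data + n, tmp);              // contiguous scratch copy
    std::nth_element(tmp, tmp + n / 2, tmp + n); // partial sort around the middle
    return tmp[n / 2];
}
```

The scratch buffer sits next to the other local variables, so it is likely to be in the cache already. The size check matters because an oversized request overflows the stack rather than failing cleanly.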
   
Optimization manuals updated
Author: Mohamed Sheikhaldeen Date: 2009-10-02 09:04
Hi,
Killer! Thanks for the updates! Those have been a reference for me for years.
I can't think of an addition; frankly, I don't have enough expertise to suggest any.

P.S.: Considering any "GPU" guide any time soon?

   
Optimization manuals updated
Author: Agner Fog Date: 2009-10-02 09:44
Mohamed Sheikhaldeen wrote:
considering any "GPU" guide any time soon?
A GPU guide would be very useful since people are putting more and more work into graphics processing units (GPUs) instead of CPUs - even for non-graphics calculations. However, I am not going to make a GPU guide. Somebody else will have to do that.
   
Optimization manuals updated
Author: Giovanni Date: 2009-12-06 07:40
Hi Doctor Agner,

I've been reading your excellent optimization manuals for years, but I've just discovered your blog. I have a small request. Every time you publish a new version of your manuals, I eagerly download them for reading, but it is hard to know what is new in each release. Could you publish a small changelog in every manual so that one can directly go to new or changed sections?

Thanks,

-Giovanni Deretta

   
Optimization manuals updated
Author: Agner Fog Date: 2009-12-06 10:40
Giovanni wrote:
could you publish a small changelog in every manual so that one can directly go to new or changed sections?
Several people have asked for this. However, a revision log would be very long because there are lots of small changes, such as fixed typos, numbers added to tables, small details added, etc.

It might be possible to compare the files and track changes using Compare Suite, but I have not tried it.

The most important additions in the latest version are:

  • New processors tested: VIA Nano, Intel Atom, Intel Core i7
  • More details about the forthcoming AVX instruction set
  • Discussion of hyperthreading
  • Metaprogramming
   
Optimization manuals updated
Author: Agner Fog Date: 2010-02-16 08:05
A new generation of the VIA Nano processor is going to be released soon. I have tested a prerelease sample of a VIA Nano L3050. The results are in my microarchitecture manual and in the instruction tables.

The chapter on CPU dispatching in the C++ manual has been rewritten and expanded.

There are minor changes in the other manuals.

The updated manuals are, as always, at www.agner.org/optimize/#manuals

More comments on www.agner.org/optimize/blog/read.php?i=88

   
Optimization manuals updated
Author: Agner Fog Date: 2010-09-25 10:26
My optimization manuals have been updated. There is more advice on CPU dispatching in the C++ manual. There are minor changes and updates in the other manuals.

The updated manuals are at www.agner.org/optimize/#manuals

   
Optimization manuals updated - Bobcat tested
Author: Agner Date: 2011-06-08 11:01
Today, I have updated my manuals. The most important changes are:

  • AMD Bobcat tested and described

  • Short discussion of optimizing C/C++ code in embedded systems and small microprocessors

  • More recommendations on CPU dispatch strategies

  • Updated calling conventions for 256-bit vector parameters

  • Description of the instruction decoder in Sandy Bridge corrected
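For readers unfamiliar with CPU dispatching, the following is a minimal sketch of one common pattern: pick an implementation once at run time and call it through a function pointer. It is not the strategy recommended in the manual. The function names are made up for this example, and the AVX2 branch assumes GCC or Clang on x86, where __builtin_cpu_supports and the target attribute are available. On other compilers the sketch falls back to the generic version.

```cpp
#include <cstddef>

using SumFn = long (*)(const int*, std::size_t);

// Generic version: runs on any CPU.
static long sum_generic(const int* a, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same source, but the compiler may use AVX2 instructions in this function.
// It must only be called after the run-time check below has succeeded.
__attribute__((target("avx2")))
static long sum_avx2(const int* a, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
#endif

// Dispatcher: inspects the CPU and returns the best available version.
static SumFn choose_sum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_generic;
}

// Public entry point. The choice is made on the first call only
// (initialization of a local static is thread-safe since C++11).
long sum_dispatched(const int* a, std::size_t n) {
    static const SumFn impl = choose_sum();
    return impl(a, n);
}
```

Dispatching on detected features rather than on the vendor name or CPU model keeps the fast path available on any processor that supports the required instructions.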

   
Optimization manuals updated
Author: Agner Date: 2012-03-02 06:52

The next update to the manuals is finally here. The most important additions are:

  • Test results for AMD Bulldozer (manuals 2, 3, 4).
      
  • Shared objects in Unix systems are inefficient because of position-independent code, symbol interposition, global offset table (GOT) and procedure linkage table (PLT). All references to public symbols (and even some references to local symbols) require a table lookup, which is a complete waste of time if we don't need the symbol interposition feature, which we rarely do. The updated C++ manual gives advice on how to avoid these time-consuming complications for shared objects (*.so) in Linux, BSD and Mac systems (manual 1).
      
  • Updated advice on memory copying, with description of false memory dependence (manual 2).
      
  • Methods for integer division by a constant revised, with more references and support in the asmlib library (manual 2).
      
  • Chapter on vectorization updated and revised, including AVX (manual 1).
      
  • SSE4.2 string instructions (manual 2).
      
  • Small modifications in chapter on multithreading (manual 1).
     
  • The instruction tables are now available both as a .pdf file and as a spreadsheet because several people have requested this (manual 4).
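As an illustration of the shared object point above, here is a minimal sketch of one widely used way to limit symbol interposition with GCC or Clang: give internal functions hidden visibility so that calls to them from inside the library bind directly. This is not necessarily the advice given in the manual. The macro and function names are made up for this example.

```cpp
// Assumption: GCC or Clang. On other compilers the macros expand to nothing.
#if defined(__GNUC__)
#define API_EXPORT __attribute__((visibility("default")))
#define API_LOCAL  __attribute__((visibility("hidden")))
#else
#define API_EXPORT
#define API_LOCAL
#endif

// Internal helper. Hidden symbols are not exported from the shared object,
// so they cannot be interposed by another module.
API_LOCAL int scale_internal(int x) {
    return 3 * x;
}

// Public entry point of the hypothetical library. Only this symbol is
// visible to users of the shared object.
extern "C" API_EXPORT int lib_compute(int x) {
    return scale_internal(x) + 1;
}
```

A possible build line, assuming the file is called lib.cpp, is `g++ -O2 -fPIC -shared -fvisibility=hidden lib.cpp -o libdemo.so`. With -fvisibility=hidden everything is hidden by default, so only symbols marked API_EXPORT stay public.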