Agner's CPU blog


Test results for AMD Bulldozer processor
Author: Massimo  Date: 2012-03-13 12:43
Hi Agner,
I'd like to know your opinion on a few things I have been thinking about regarding BD's architecture:

* What do you think of BD's AGUs not being able to issue LS-related (load/store) instructions like mov r/m? I.e., K10 could issue memory instructions in the AGUs, whereas BD cannot, and PD is the same (only mov r/r and the like take the AGU path for renaming, I think). According to the BD manual, almost no instruction ends up alone in an AGU (contrary to K10). It seems to me they moved toward a fixed maximum throughput of 2 instr/cycle/core, a huge step down from the previous (ideal) 6. Considering that MOVs are everywhere and decode fast, this looks like a huge limit on overall IPC.
* The split L2 cache access: do you think they would do better with a contention mechanism for the whole cache, instead of splitting access to it in half?
* Do you think AMD will add a trace cache to fix the bad dual-core decoder throughput, like Intel did? I can't figure out a fix for that (decoding 6 instructions would not work, and making two x-1/x-1 decoders would double the first-instruction decoder).
* What do you think about the choice of a write-through (WT) L1D with higher latency (coupled with a WCC halfway to the L2)? In your view, does it hurt speed much?

On a last note: I was thinking about BD's IPC: 2 ALU + 2 ALU (+2 FPU, but they share the LS with the ALUs...). SB could sustain 4 instr/cycle in loops thanks to the TC, but the BD decoder would likely cut the IPC to 1.x/core, no? Is the shared decoder the bigger bottleneck for BD, or the reworked AGU?
Do you think that if AMD reworks the front end to get near 2 instr/cycle/core, it will still fall short without the ability to parallelize MOVs?

Thanks,
Massimo
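
The back-of-envelope reasoning behind the last two questions can be sketched as a simple bound: a core cannot sustain more instructions per cycle than its share of the decoder delivers, nor more than its execution pipes can accept. This is only a rough sketch. The function name `ipc_bound` is hypothetical, and the figures are assumptions chosen to match the numbers discussed in the post (a 4-wide decoder shared by the two cores of a module, and 2 or 4 usable pipes per core), not measurements.

```python
def ipc_bound(decode_width, active_cores, exec_pipes):
    """Rough upper bound on sustained instructions per cycle for one core.

    The bound is the smaller of the core's share of the (shared) decoder
    and the number of pipes that can accept the instruction mix.
    It ignores caches, dependencies and everything else that lowers IPC.
    """
    decode_share = decode_width / active_cores
    return min(decode_share, exec_pipes)


# Assumed, illustrative figures only (not measurements):
# decode_width=4 is shared by the two cores of one module;
# exec_pipes=2 models "ALU pipes only", exec_pipes=4 models a
# hypothetical core where MOVs could also issue in the two AGUs.
for cores, pipes in [(2, 2), (2, 4), (1, 2), (1, 4)]:
    bound = ipc_bound(4, cores, pipes)
    print(f"active cores={cores}, usable pipes={pipes}: bound={bound}")
```

Under these assumptions, letting the AGUs take MOVs raises the bound only when one core of the module is idle. With both cores busy, the shared decoder is already the tighter limit.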


Hi Agner, I see you've updated the instruction table, and it seems different from AMD's! So MOV r/m is issued in the AGU... but mov m/r is not???
 