That's not entirely true. K8-K10 cores can do 3 register additions, subtractions, shifts, or moves per cycle, while Bulldozer can do only two. In K8-K10 the AGU was fused with the ALU (this is the reason why out-of-order loads/stores were impossible on those architectures), but K8-K10 were still able to execute more than 2 (2.7) arithmetic instructions with a memory operand per cycle.
Here is the throughput table:
http://gmplib.org/~tege/x86-timing.pdf
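
To put those per-cycle figures in perspective, here is a rough back-of-the-envelope sketch. It assumes a stream of fully independent instructions that is limited only by the peak rates quoted above (3 and 2 register ops per cycle, 2.7 memory-operand ops per cycle on K8-K10). The function name and the instruction count of 600 are made up for illustration; real code will be slower because of dependencies, decode limits and cache misses.

```python
import math

# Peak rates quoted in this post (instructions per cycle), not measured here.
REG_OPS_PER_CYCLE = {"K8-K10": 3.0, "Bulldozer": 2.0}
K10_MEM_OPS_PER_CYCLE = 2.7  # arithmetic with a memory operand on K8-K10


def min_cycles(n_instructions, per_cycle):
    """Lower bound on cycles for n independent instructions at a peak rate."""
    return math.ceil(n_instructions / per_cycle)


if __name__ == "__main__":
    n = 600  # hypothetical count of independent register ops
    for core, rate in REG_OPS_PER_CYCLE.items():
        print(core, "register ops:", min_cycles(n, rate), "cycles")
    print("K8-K10 memory-operand ops:", min_cycles(n, K10_MEM_OPS_PER_CYCLE), "cycles")
```

So for the same 600 independent register operations, the best case is 200 cycles on K8-K10 versus 300 on Bulldozer, if the quoted peak rates hold.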