Quote, originally posted by AliG:
Correct, there are 3 full integer pipelines in K8 and later that can do either ALU or AGU operations,
No, it is not "either", it is both. What do you not understand about the quote from Hans' article?
but as I understand it, thanks to improved prefetchers and the smaller die size, a simplified 2+2 design (2 ALUs + 2 AGUs) is more efficient.
That is correct. The IPC of typical code today is around 1, and I think Nehalem achieves 1.5-1.7 in the best cases. Thus, 2 pipes are enough.
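As a rough back-of-the-envelope check of the "2 pipes are enough" argument, the sketch below divides the IPC estimates from the post by the number of integer pipes. It deliberately oversimplifies by assuming every instruction needs exactly one integer pipe, which real code does not, and `pipe_utilization` is a hypothetical helper, not anything from AMD or Intel documentation.

```python
# Rough sketch: fraction of the integer issue width a given IPC would occupy.
# Assumption (simplification): every retired instruction uses one integer pipe.
def pipe_utilization(ipc: float, pipes: int) -> float:
    """Return a sustained IPC as a fraction of the available pipes."""
    return ipc / pipes

# IPC figures are the rough estimates quoted in the post, not measurements.
estimates = {
    "typical code": 1.0,
    "Nehalem, best case (low)": 1.5,
    "Nehalem, best case (high)": 1.7,
}

for label, ipc in estimates.items():
    for pipes in (2, 3):
        share = pipe_utilization(ipc, pipes)
        print(f"{label}: IPC {ipc} on {pipes} pipes -> {share:.0%} busy")
```

Under these assumptions, even the best-case estimate stays below the peak of a 2-pipe design, which is the intuition behind the reply.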

Quote, originally posted by informal:
Yes, but at the back end the macro-ops are retired, and K8/10h can retire 3 of those per cycle while each Bulldozer integer core can retire 4. That is a 33% difference.
Yes, you are right, but I never said anything against that point ;-)
One note on that, because I read it earlier: the AGU results are not retired. They go straight to the load/store units so that the waiting µOp can get its memory data ;-) Later, once the µOp's calculation has finished, that µOp is retired.
So, in short, the ratio of retire slots to execution units (ExUs) is 1:2 for both, not 1:3: for K10 it is 3:6, and for BD it is 4:8.
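The ratio arithmetic above, together with informal's 33% figure, can be checked with a few lines of Python. This is only a sketch: the retire widths and execution-unit counts are the ones stated in this thread (3 and 6 for K10, 4 and 8 for BD), taken as given rather than independently verified, and `retire_to_exu` is a hypothetical helper name.

```python
from fractions import Fraction

# Figures as stated in the thread (assumed, not independently verified).
designs = {
    "K10": {"retire": 3, "exec_units": 6},
    "BD": {"retire": 4, "exec_units": 8},
}

def retire_to_exu(retire: int, exec_units: int) -> Fraction:
    """Reduce retire slots : execution units to lowest terms."""
    return Fraction(retire, exec_units)

for name, d in designs.items():
    r = retire_to_exu(d["retire"], d["exec_units"])
    print(f"{name}: {d['retire']}:{d['exec_units']} = {r.numerator}:{r.denominator}")

# Relative increase in retire width from 3 to 4 macro-ops per cycle.
increase = Fraction(designs["BD"]["retire"] - designs["K10"]["retire"],
                    designs["K10"]["retire"])
print(f"Retire width increase, K10 -> BD: {float(increase):.0%}")
```

Using `Fraction` keeps the ratios exact, so both designs reduce to the same 1:2, while the retire-width step from 3 to 4 works out to one third, i.e. the 33% quoted above.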