Agner Fog's microarchitecture.pdf is a good place to start. It has a section that tries to identify the bottlenecks in every major x86 design today, including 10h (often wrongly called K10). Essentially, 10h can in theory execute as many as 9 (nine) "micro ops"* per cycle but retire only 3 "macro ops"** per cycle, so there is a bottleneck in the retirement part of the design. As the document says, the utilization of the 9 units can't be measured effectively in the real world, but it is clear that the execution units are underutilized some of the time, especially the 3rd AGU, which is redundant because there are only 2 ports to the L1D cache.
* A macro op is split into these micro ops, which are then sent to the execution units.
** A macro op is the unit the decoder deals with; 1 x86 instruction typically = 1 or 2 macro ops.
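To make the retirement bottleneck concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the two peak figures quoted above (9 micro ops executed, 3 macro ops retired per cycle). The function names and the "micro ops per macro op" ratio are my own illustrative assumptions, not numbers from Agner's document, and real throughput depends on much more than this.

```python
# Peak per-cycle figures as quoted in the post for 10h.
EXEC_MICRO_OPS_PER_CYCLE = 9    # micro ops the execution units can start
RETIRE_MACRO_OPS_PER_CYCLE = 3  # macro ops that can retire


def sustained_macro_ops_per_cycle(micro_ops_per_macro_op: float) -> float:
    """Upper bound on macro ops per cycle for a hypothetical average split ratio."""
    # Execution side: how many macro ops' worth of micro ops fit in one cycle.
    exec_limit = EXEC_MICRO_OPS_PER_CYCLE / micro_ops_per_macro_op
    # The slower of the two stages sets the sustained rate.
    return min(exec_limit, RETIRE_MACRO_OPS_PER_CYCLE)


def bottleneck(micro_ops_per_macro_op: float) -> str:
    """Name the stage that limits throughput under this simplified model."""
    exec_limit = EXEC_MICRO_OPS_PER_CYCLE / micro_ops_per_macro_op
    if exec_limit > RETIRE_MACRO_OPS_PER_CYCLE:
        return "retirement"
    if exec_limit < RETIRE_MACRO_OPS_PER_CYCLE:
        return "execution"
    return "balanced"


if __name__ == "__main__":
    # The ratios below are made-up inputs, chosen only to show the crossover.
    for ratio in (1, 2, 3, 4.5):
        print(ratio, sustained_macro_ops_per_cycle(ratio), bottleneck(ratio))
```

Under this simplified model, retirement stays the limit until the average macro op splits into 3 or more micro ops, which is one way to see why the 9 units would rarely all be busy.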
edit:
Continuing on to Bulldozer:
The front end can take up to 4 x86 instructions (I can't tell how these relate to the RISC-like macro ops in the 10h decoder stage) and dispatch them in 2 groups of 4 (macro ops?). Each integer core can do 4 instructions (2 arithmetic and 2 address generation, though the AGen units can maybe do some math work too). A lot is still unknown, so we can't say what else is in there or how AMD organized it, at least not until launch.