AMD's Bobcat and Bulldozer

**informal** · 08-30-2010, 05:03 PM

Originally Posted by Motiv

Being an absolute noob, could someone explain this to me?

How many pipelines are on the P2 (X4, for argument's sake), and how do they normally feed the ALUs and AGUs?

To me it looks like Bulldozer has cut one ALU and one AGU per 'core'.

Agner Fog's microarchitecture.pdf is a good place to start. It has a section that tries to identify the bottlenecks in every major x86 design of today, so 10h (wrongly called K10) is covered. Essentially, 10h can in theory execute a massive 9 (nine) "micro-ops"* per clock but retire only 3 "macro-ops"**. So there is a bottleneck in the retirement stage of the design. As the document says, the utilization of all 9 units can't be measured effectively in real-world code, but it is clear that the execution units are underutilized some of the time, especially the third AGU, which is redundant because the L1D cache has only 2 ports.

*A macro-op is split into these micro-ops, which are then sent to the execution units.
**A macro-op is what the decoder deals with; one x86 instruction typically decodes into 1 or 2 macro-ops.
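A toy way to see the 10h mismatch: treat each stage as a per-clock budget and see which one runs out first. The widths (3 ALU, 3 AGU, 3 FPU, retire 3 macro-ops, 2 L1D ports) are the figures from this post; `MacroOp` and `min_cycles` are made-up names, and the model ignores dependencies, decode and scheduling, so it only gives a rough lower bound, not real timing.

```python
from dataclasses import dataclass
from math import ceil

# Per-clock limits for one K10 (family 10h) core, as given in the post.
RETIRE_WIDTH = 3   # macro-ops retired per clock
ALU_PIPES = 3
AGU_PIPES = 3
FPU_PIPES = 3
L1D_PORTS = 2      # L1D accesses per clock; why the 3rd AGU is often idle


@dataclass(frozen=True)
class MacroOp:
    """Hypothetical macro-op: how many micro-ops of each kind it splits into."""
    alu: int = 0
    agu: int = 0   # assumption: every AGU micro-op makes one L1D access
    fpu: int = 0


def min_cycles(ops: list[MacroOp]) -> tuple[int, str]:
    """Lower bound on cycles for a block of macro-ops, plus the limiting stage.

    Ignores dependencies, decode and scheduling: each stage is just a
    per-clock budget, and the busiest one sets the pace.
    """
    alu = sum(op.alu for op in ops)
    agu = sum(op.agu for op in ops)
    fpu = sum(op.fpu for op in ops)
    bounds = {
        "retire": ceil(len(ops) / RETIRE_WIDTH),
        "alu": ceil(alu / ALU_PIPES),
        "agu": ceil(agu / AGU_PIPES),
        "l1d": ceil(agu / L1D_PORTS),
        "fpu": ceil(fpu / FPU_PIPES),
    }
    worst = max(bounds, key=bounds.get)  # first stage with the highest bound
    return bounds[worst], worst


if __name__ == "__main__":
    # 6 integer + 6 FP register ops: only 3 macro-ops can retire per clock
    print(min_cycles([MacroOp(alu=1)] * 6 + [MacroOp(fpu=1)] * 6))  # (4, 'retire')
    # 6 load-op instructions (ALU + AGU parts): the 2 L1D ports run out first
    print(min_cycles([MacroOp(alu=1, agu=1)] * 6))  # (3, 'l1d')
```

In the first mix the ALUs and FPUs sit half idle because retirement is the limit; in the second the AGUs still have room when the L1D ports are already saturated, which is the "redundant 3rd AGU" effect.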

Edit:
continuing on to Bulldozer.
The front end can take up to 4 x86 instructions (I can't tell how these relate to the RISC-like macro-ops of the 10h decoder stage) and dispatch them in 2 groups of 4 (macro-ops?). Each integer core can do 4 instructions (2 arithmetic and 2 address, though the AGU may be able to do some math work too). A lot is still unknown, so we can't say what else is in there or how AMD organized it, at least not until launch.
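On the question of Bulldozer losing one ALU and one AGU per core, a rough sketch: count the cycles needed just to issue a block of integer work on one core. The ALU/AGU counts come from this thread; the 2-accesses-per-clock L1D limit for Bulldozer is an assumption, `issue_cycles` is a made-up name, and any extra math the AGUs might do is ignored.

```python
from math import ceil

# Integer resources per core, as described in this thread.
# The L1D limit of 2 accesses per clock for Bulldozer is an assumption.
CORES = {
    "K10": {"alu": 3, "agu": 3, "l1d_ports": 2},
    "Bulldozer": {"alu": 2, "agu": 2, "l1d_ports": 2},
}


def issue_cycles(core: str, alu_ops: int, mem_ops: int) -> int:
    """Cycles to issue a block of integer work on one core (toy model).

    Each memory op needs one AGU slot and one L1D port; dependencies,
    decode, retire and any ALU-style work done by the AGUs are ignored.
    """
    r = CORES[core]
    return max(
        ceil(alu_ops / r["alu"]),
        ceil(mem_ops / r["agu"]),
        ceil(mem_ops / r["l1d_ports"]),
    )


if __name__ == "__main__":
    for core in CORES:
        # memory-heavy mix (4 ALU, 6 mem), then ALU-heavy mix (12 ALU, 2 mem)
        print(core, issue_cycles(core, 4, 6), issue_cycles(core, 12, 2))
```

Under these assumptions the memory-heavy mix takes 3 cycles on both cores, since the L1D ports were already the limit on K10, while the ALU-heavy mix goes from 4 cycles to 6. So the missing AGU costs little and the missing ALU is what shows up per core.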
