No, the front end is vertically multithreaded, i.e. in one clock all 4 decoders serve one thread, and in the next clock the other. If the 2nd thread has nothing to decode, then obviously the other thread can keep the front end for longer than 1 clock cycle ;-)
[Image: bulldozer4b.jpg]
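To make the alternation concrete, here is a minimal sketch of the scheme as I describe it above. It is a toy model and not a description of the real hardware: the function name `schedule_decoders`, the `pending` queues and the fallback rule for an idle thread are my own assumptions for illustration.

```python
def schedule_decoders(pending, cycles, width=4):
    """Toy model of a vertically multithreaded front end (hypothetical).

    pending: two ints, the number of instructions waiting per thread.
    Returns one (thread, decoded_count) tuple per clock cycle.
    """
    pending = list(pending)
    trace = []
    turn = 0  # thread with first claim on the decoders this cycle
    for _ in range(cycles):
        # All decoders go to one thread per clock. If that thread has
        # nothing to decode, the other thread keeps the front end.
        if pending[turn] == 0 and pending[1 - turn] > 0:
            active = 1 - turn
        else:
            active = turn
        decoded = min(width, pending[active])
        pending[active] -= decoded
        trace.append((active, decoded))
        turn = 1 - active  # next clock, the other thread gets first claim
    return trace


if __name__ == "__main__":
    print(schedule_decoders([8, 8], 4))   # both threads busy
    print(schedule_decoders([12, 0], 3))  # second thread idle
```

With both threads busy the trace alternates between thread 0 and thread 1 at full width; with the second thread idle, thread 0 holds all 4 decoders on every clock.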
How do you get to 5 now? Are you including macro-op fusion, too? There are only 3 fast-path decoders plus 1 complex decoder in Intel's design. Officially they count 4:
[Image: 32_m.png]
Anyway, macro-op fusion is now used in Bulldozer, too, so you would have to count 5 for AMD as well (although in fewer cases: AMD's fusion is at Conroe's level, Nehalem got more fusion capabilities, and I'm not sure about Sandy Bridge).
As said above, each thread has 4 decoders, or 5 if you count fusion. How is Intel running Hyper-Threading on the 3+1 decoders? Does each thread get 4 decoders, so 8 in total? That would be news to me, and to Intel. If Intel does it differently from AMD, then they have to run both threads simultaneously. However, that would mean "only" 2 decoders per thread, and that's exactly the baaaad case you described above in your incorrect statement about AMD's decoder.
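The two options can be put side by side with a bit of arithmetic. This is only a sketch under my own assumptions: "alternating" is the per-clock switching described above, and "split" is the hypothetical simultaneous case from the question, not a statement of how Intel's front end actually works. The function name `decode_width` is made up for the example.

```python
def decode_width(scheme, cycle, thread, width=4, threads=2):
    """Decoders available to `thread` in `cycle` under a given scheme.

    "alternating": all decoders go to one thread per clock (assumed model).
    "split": decoders are divided evenly every clock (hypothetical case).
    """
    if scheme == "alternating":
        return width if cycle % threads == thread else 0
    if scheme == "split":
        return width // threads
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("alternating", "split"):
        widths = [decode_width(scheme, c, thread=0) for c in range(4)]
        print(scheme, widths, "peak:", max(widths), "total:", sum(widths))
```

In this toy model both schemes hand thread 0 the same total number of decode slots over 4 clocks, but only the alternating scheme ever gives a single thread the full width of 4 in one clock; the split case never exceeds 2.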
Discussion is always fine; however, in the above case I think you are wrong.
cheers
Opteron