As Mechromancer said, everything is speculative. From what I've seen and read, I think the decoders would suit the needs of a clustered CMT back-end. They are 4-wide, with the capability to decode microcoded (complex) and fast path instructions in parallel (for 2 threads). That means up to 8 Macro-Ops per clock (4 per thread; I found indications that they will stick with Macro-Ops, i.e. groups of one ALU/FP op and one AGU op). Even with only fast path instructions, this would result in 4 Macro-Ops per clock being distributed to the threads, alternately or in some other pattern, while each cluster can execute at most 2 of these Macro-Ops per clock.
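To make the arithmetic concrete, here is a rough toy model in Python. It only uses the speculative numbers from this post (4 fast path Macro-Ops decoded per clock, 2 clusters, 2 Macro-Ops executed per cluster per clock). The function name, the two distribution patterns and the unbounded per-thread queue are my own assumptions for illustration, not anything known about the actual design.

```python
# Toy model of the speculated front-end/back-end balance.
# All numbers are the speculative figures from the post, not confirmed specs.
DECODE_WIDTH = 4       # fast path Macro-Ops decoded per clock (speculative)
CLUSTERS = 2           # one cluster per thread (speculative)
EXEC_PER_CLUSTER = 2   # Macro-Ops a cluster can execute per clock (speculative)


def simulate(cycles, pattern="alternate"):
    """Return (executed, queued) Macro-Op counts per thread.

    pattern="alternate": all decoded Macro-Ops go to one thread per clock,
                         switching threads every clock.
    pattern="split":     decoded Macro-Ops are divided evenly every clock.
    The per-thread queue is assumed unbounded (hypothetical simplification).
    """
    queued = [0] * CLUSTERS
    executed = [0] * CLUSTERS
    for cycle in range(cycles):
        # Decode stage: hand out this clock's Macro-Ops.
        if pattern == "alternate":
            queued[cycle % CLUSTERS] += DECODE_WIDTH
        else:
            for t in range(CLUSTERS):
                queued[t] += DECODE_WIDTH // CLUSTERS
        # Execute stage: each cluster drains at most EXEC_PER_CLUSTER per clock.
        for t in range(CLUSTERS):
            n = min(queued[t], EXEC_PER_CLUSTER)
            queued[t] -= n
            executed[t] += n
    return executed, queued


if __name__ == "__main__":
    for pattern in ("alternate", "split"):
        executed, queued = simulate(100, pattern)
        print(pattern, "executed:", executed, "still queued:", queued)
```

In this model the decode rate (4 per clock) equals the combined execution rate (2 clusters x 2 per clock), so either distribution pattern keeps both clusters busy once the queues are primed. Anything beyond 4 Macro-Ops per clock from the decoders would just pile up in the queues.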