Quote: Originally posted by Mechromancer
Bulldozer may very well not have CMT, but the generation after, or even a refresh (like K8 to K10), may. One thing that may be safe to assume is Bulldozer having a much higher IPC than anything today. Lots of cores with high IPC per core is just fine and doesn't lend itself to the weaknesses of certain multi-threading implementations. I still can't imagine what "commonly held beliefs" JF is talking about. It better be exciting as !
And how exactly do you get that "high IPC"? K10 and Core 2 struggle to reach 1 IPC on average; Nehalem does slightly better (think 1.2), even though it has the most advanced prefetchers in the world by far (thanks to Netburst). That is less than 1/3 of what is possible.

The reason? x86. It simply lacks the ingredients that would allow high IPC to be extracted. This is why Netburst was designed in the first place. You can improve performance in 2 ways:
-increase IPC
-increase frequency
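
To put rough numbers on those two levers: sustained throughput is roughly IPC times clock frequency. A minimal sketch follows; the `throughput` helper and the two design points are made up for illustration (they are not measurements of any real chip):

```python
def throughput(ipc, freq_ghz):
    """Rough sustained throughput in billions of instructions per second.

    Assumes the simple model throughput = IPC * frequency and ignores
    everything else (memory stalls, turbo, workload mix).
    """
    return ipc * freq_ghz


# Hypothetical design points, chosen only to show the trade-off.
designs = {
    "wide, slow clock": (4.0, 1.5),    # high IPC, low frequency
    "narrow, fast clock": (1.2, 5.0),  # low IPC, high frequency
}

for name, (ipc, ghz) in designs.items():
    print(f"{name}: {throughput(ipc, ghz):.1f} G instr/s")
```

In this simple model the two hypothetical designs end up at the same throughput, which is the point: either lever can get you there, as long as you can actually pull it.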

Since IPC is so hard to get, Intel decided to push frequency as high as possible. Of course, breaking new ground back in the late '90s with 180/130/90nm proved quite a challenge. Without a thermal limit, they would probably have reached POWER6-like frequencies (5-6GHz).

This was one of the paths they took. The other was EPIC, a completely new instruction set designed for parallelism from the ground up. We're talking about an instruction set that tells the processor everything it needs to know, because the optimizations happen at compile time (the compiler inserts hints in the code flow: process this, then that, etc.). In x86 the CPU needs to find the parallelism at run time; well, most of the time you're SOL.
Itanium is able to get 4-5 IPC, but being so wide costs it frequency.
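
To make the compile-time idea concrete, here is a toy sketch of a compiler packing independent instructions into fixed-width bundles ahead of time. This is not how any real EPIC compiler works; the `bundle` function, the instruction format, and the one-bundle latency are all assumptions for illustration, and each register is assumed to be written only once:

```python
def bundle(instrs, width):
    """Greedily pack instructions into bundles of at most `width` slots.

    instrs: list of (dest, sources) tuples in program order.
    An instruction may only go in a bundle after the ones that
    produce its sources (assumed latency: one bundle).
    """
    bundles = []
    ready_at = {}  # register -> first bundle index where its value is usable
    for dest, srcs in instrs:
        # Earliest bundle allowed by data dependencies.
        slot = max((ready_at.get(s, 0) for s in srcs), default=0)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(dest)
        ready_at[dest] = slot + 1
    return bundles


# Hypothetical program: c needs a and b, d needs c, f needs e.
instrs = [("a", []), ("b", []), ("c", ["a", "b"]),
          ("d", ["c"]), ("e", []), ("f", ["e"])]

packed = bundle(instrs, width=3)
print(packed)
print("static IPC:", len(instrs) / len(packed))
```

The dependency chain a/b -> c -> d limits how densely the slots can be filled no matter how wide the machine is, which is why the parallelism has to be there in the code to begin with.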

Basically it boils down to this: if you chase IPC you lose frequency because of the complexity (prefetchers, decoders, run-ahead, scouts); if you chase frequency you can't have a high IPC.
The solution is in the middle: K8/K10, Core 2/Nehalem. Expecting something revolutionary, like very high IPC in x86, is simply wishing for pigs to fly. Some do try; most fail.

Why do you think Bulldozer was delayed some 2+ years? Where are they now compared with the original expectations? Could it be that Bulldozer is more Niagara-like: lots of small, simpler cores with accelerators + GPUs for the hard crunching (FP)? That's the path everyone seems to be taking, not the uber Alpha EV8-like cores.