Originally Posted by Solus Corvus
This is quite a fascinating architecture. If that RWT article is accurate then I am extremely interested in seeing some benchmarks.
I don't buy that overall per-core IPC must necessarily decrease (relative to K10) because of the reduced number of integer ALUs. Compared to a 3 or 4 ALU core, they will obviously miss out in cases where int ILP is greater than 2. But in cases where the code is more of a mix of int and memory ops, IPC could go up relative to K10, based on available execution resources alone. Which case is more common depends on the specific code being run. Though I'd suggest that a program with consistently high integer ILP would be more efficient using packed integers (handled by the FPU) anyway.
If we add to that the fact that branch mispredictions and cache misses (the handling of both is significantly improved in BD) have a much greater effect on overall IPC than some missed ILP cases, it's clear that claiming lower IPC than K10 isn't really justified based on fewer ALUs alone. I doubt that BD will have lower per-core IPC than K10. In reality it's probably somewhere in the vast gulf between PII and SB.
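To put rough numbers on the stalls-versus-ALUs point, here is a toy model in Python. It is only a sketch: it uses the textbook additive stall formula (effective CPI = base CPI + branch stall CPI + memory stall CPI), and every input below is a made-up illustrative value, not a measurement of K10, BD, or any real chip. The function name and the "wide"/"narrow" cores are hypothetical.

```python
def effective_ipc(peak_ipc, branch_frac, mispredict_rate, mispredict_penalty,
                  mem_frac, miss_rate, miss_penalty):
    """Average IPC under a simple additive stall model.

    All parameters are illustrative assumptions:
      peak_ipc            best-case instructions per cycle with no stalls
      branch_frac         fraction of instructions that are branches
      mispredict_rate     fraction of branches that are mispredicted
      mispredict_penalty  cycles lost per misprediction
      mem_frac            fraction of instructions that access memory
      miss_rate           fraction of memory accesses that miss the cache
      miss_penalty        cycles lost per cache miss
    """
    base_cpi = 1.0 / peak_ipc
    branch_stall_cpi = branch_frac * mispredict_rate * mispredict_penalty
    memory_stall_cpi = mem_frac * miss_rate * miss_penalty
    return 1.0 / (base_cpi + branch_stall_cpi + memory_stall_cpi)


# Hypothetical "wide" core: up to 3 int ops per cycle, weaker predictor and caches.
wide = effective_ipc(3.0, 0.2, 0.05, 15, 0.3, 0.03, 100)

# Hypothetical "narrow" core: only 2 per cycle at best, but fewer mispredicts and misses.
narrow = effective_ipc(2.0, 0.2, 0.03, 15, 0.3, 0.02, 100)

print(f"wide core effective IPC:   {wide:.3f}")
print(f"narrow core effective IPC: {narrow:.3f}")
```

With these invented inputs the narrow core comes out ahead, because the stall terms are much larger than the difference in base CPI. Different inputs can flip the result, which is exactly why benchmarks are needed.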
As already noted though, IPC isn't the only factor in a processor's performance. This is obviously a high-frequency design. The memory and cache subsystems are a big leap forward for AMD. They are designed to keep a large number of cores well fed, minimizing the amount of time that execution resources spend waiting on data and thus increasing efficiency. Intel will probably continue to lead in IPC by a significant margin. Whether AMD can increase frequency enough to make single-threaded performance competitive remains to be seen. On the multi-threaded side BD sounds like a monster.
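The frequency question comes down to simple arithmetic: to a first approximation, single-threaded performance is IPC times clock speed. A quick sketch in Python, where the IPC ratio and the rival clock are placeholders I picked for illustration and not real figures for SB or BD:

```python
def breakeven_clock_ghz(rival_ipc, rival_clock_ghz, own_ipc):
    """Clock needed to match a rival, assuming performance = IPC * clock.

    This ignores memory latency, turbo modes, and everything else that
    makes real chips scale non-linearly with frequency.
    """
    return rival_ipc * rival_clock_ghz / own_ipc


# Hypothetical inputs: rival IPC normalized to 1.0 at 3.4 GHz,
# our chip at 80% of the rival's IPC.
needed = breakeven_clock_ghz(rival_ipc=1.0, rival_clock_ghz=3.4, own_ipc=0.8)
print(f"clock needed to break even: {needed:.2f} GHz")
```

So under that assumed 20% IPC deficit, the chip would need a clock 25% higher just to tie on a single thread.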
If AMD can't match Intel's single-threaded performance, it looks like we will have a split market come 2011. Office users and gamers might do best with SB, while people doing encoding, folding, heavy multitasking, or HPC, and those running servers, might do best with BD.