Quote: Originally Posted by Solus Corvus
I didn't say it was a problem. Sometimes you really can do more with less. If they kept the extra ALU or even added more to satisfy certain high-ILP code (i.e., not average) then they would just be wasting power and transistors that could have been, and were, spent elsewhere (more cores).
Maybe in rare cases, but on average ILP depends much more on MLP (memory-level parallelism) than on how many ALUs you have.

Quote: Originally Posted by Apokalipse
Right. I mean, it wouldn't be that hard to increase IPC by 25 or 30% from K10.5. But it would be very hard to do it without using a lot of transistors, using a lot of power and/or sacrificing a lot of frequency.
Instead of increasing IPC in any way possible, you'd want to increase IPC in inexpensive ways, and try not to spend transistor budget in areas that won't get much use.
Increasing ILP (or IPC) matters more for non-parallel (single-threaded) code, but for highly parallel code what matters more is how many cores you have and how much power they dissipate. That is the main paradigm of Bulldozer's CMT: good single-thread ILP using all of the shared resources, and average multithread ILP from a lot of low-power cores. If you have a fat CPU core, you can add a little to it and then you have two tiny cores. Because of that, a BD module acts like either one fat CPU core or two tiny ones, within the power budget of one fat core.
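
To put rough numbers on that trade-off, here is a toy Amdahl's-law sketch in Python. Everything in it is an assumption for illustration: the function name `relative_speed`, the choice of one fat core as the 1.0 baseline, and especially the 0.8 per-thread figure for a module running two threads. None of it is measured Bulldozer data.

```python
def relative_speed(parallel_fraction, serial_perf, parallel_cores, parallel_perf):
    """Toy Amdahl's-law model: speed relative to a core with performance 1.0.

    The serial part of the work runs on one core at serial_perf; the parallel
    part is split evenly across parallel_cores, each running at parallel_perf.
    """
    serial_time = (1.0 - parallel_fraction) / serial_perf
    parallel_time = parallel_fraction / (parallel_cores * parallel_perf)
    return 1.0 / (serial_time + parallel_time)


# Hypothetical figures, not benchmarks:
# - one fat core: performance 1.0 for everything
# - one CMT module: a lone thread gets all shared resources (1.0),
#   two threads each get an assumed 0.8
for p in (0.0, 0.5, 0.9, 1.0):
    fat = relative_speed(p, serial_perf=1.0, parallel_cores=1, parallel_perf=1.0)
    module = relative_speed(p, serial_perf=1.0, parallel_cores=2, parallel_perf=0.8)
    print(f"parallel fraction {p:.1f}: fat core {fat:.2f}, module {module:.2f}")
```

With these made-up numbers the module matches the fat core on purely serial code and pulls ahead as the parallel fraction grows, which is the behaviour described above. How big the gap is depends entirely on the assumed per-thread figure.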