Quote: Originally posted by informal
AMD invested heavily in both instruction and data prefetch, which are now an order of magnitude better than in the 10h family.
Do you have a source for that? Not that I find it impossible, but I like to work with verified info.

On SMT, I suspect Bulldozer doesn't really get all the benefits from it. SMT is used to justify building a very wide execution engine, which is what you'd want to maximise single-threaded performance. It may be (and of course we can't be sure) that AMD had to make concessions in this regard, e.g. for efficiency reasons: execution units draw a lot of power if kept running, despite their small size, right? It's a bit doubtful that AMD chose to implement gating of parts of a module. If that's the case, the module approach may be 'smoother' and will rock for heavily threaded apps, but a full-on SMT approach might still have been better for single-threaded apps.

I do believe AMD made the right choice, by the way - the module is probably better suited for server workloads. And regarding the client space, I suspect that, especially in the future, most cases where the CPU is the bottleneck will be multithreaded ones.

And of course they needn't be twice as fast. Since it's probably about 40% larger than a SB quad though, it would be nice if they could sell it for at least some 50% more (given lower yields etc.), to have similar margins. The "more cores == higher single-threaded frequency" thought is a nice one though.
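
To put rough numbers on the die size vs. price point, here is a back-of-the-envelope sketch in Python. Everything in it beyond the "about 40% larger" figure is my own assumption: the function name is made up, the defects-per-die inputs are invented, the yield formula is the simple textbook Poisson model rather than anything AMD or Intel publish, and it treats die cost as the only cost (no packaging, test or R&D).

```python
import math

def relative_cost_per_good_die(area_ratio, base_defects_per_die):
    """Cost of one good die relative to the baseline die (hypothetical helper).

    area_ratio: die area divided by the baseline die area (1.4 = 40% larger).
    base_defects_per_die: ASSUMED average number of killer defects on the
        baseline die (baseline area times defect density). Invented input.
    """
    # Candidate dies per wafer scale roughly with 1/area (edge losses ignored).
    dies_ratio = 1.0 / area_ratio
    # Poisson yield model: yield = exp(-area * defect_density).
    base_yield = math.exp(-base_defects_per_die)
    big_yield = math.exp(-base_defects_per_die * area_ratio)
    good_dies_ratio = dies_ratio * big_yield / base_yield
    # The same wafer cost is spread over fewer good dies.
    return 1.0 / good_dies_ratio

# Sweep a few made-up defect levels for a die that is 40% larger.
for defects in (0.0, 0.1, 0.2, 0.3):
    ratio = relative_cost_per_good_die(1.4, defects)
    print(f"defects/die={defects:.1f} -> cost x{ratio:.2f}")
```

With zero defects the larger die costs exactly its area ratio; any yield penalty pushes the cost, and therefore the price needed for the same percentage margin, above that. How far above depends entirely on the defect figure, which is a guess here.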

Anyway, I was just commenting on the graph. I'd find it fairly absurd if, with billions upon billions of dollars' worth of research, Intel had missed just that one thing that could double their performance per mm^2. :P That's just to express my skepticism about the graph - not so much about the Bulldozer architecture.