Everything can be found here:
http://www.hotchips.org/conference-a.../hot-chips-22/
It's a video from HC22. Mike Butler presented the Bulldozer design, and there was a Q&A after the presentation. You can find all about the aggressiveness (he couldn't emphasize it more) in the presentation, and some of it in the Q&A.
SMT does zero for single-threaded workloads, so I don't know what you mean by this. In Bulldozer the shared front end does all the work of instruction dispatch, and the integer schedulers do the actual work according to the dispatcher's orders. The FP unit in a Bulldozer module is a full-on SMT design, thanks to the latency-tolerant nature of the instructions it deals with. L2 is shared dynamically by the 2 running threads (so SMT in essence, as AMD describes it in one of the slides). Everything else in the module is organized as vertical MT (switching back and forth between the threads).

If that's the case it may be 'smoother', and will rock for heavily threaded apps, but a full-on SMT approach might still have been better for single-threaded apps.
So while the integer cores do share the L1 instruction cache and the front end, they are fully independent. They can do opportunistic prefetch into the L2, and the data can then be used by either of the cores. The FP unit is either dedicated or shared: dedicated, it can assign a whole 256-bit FMA to one core; shared, it is SMT-organized and split between the 2 cores as 2 x 128-bit FMA (even being able to do 2x ADD or 2x MUL, a feat not possible on any of today's x86 cores).
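To make the dedicated vs. shared FP modes concrete, here is a toy sketch. It is only an illustration of the split described above, not how the real scheduler decides anything; the function name and the per-cycle "has FP work" flags are made up for the example.

```python
def fp_width_per_core(core0_has_fp_work, core1_has_fp_work):
    """Toy model: FMA width (in bits) available to each core of one module.

    Assumption for illustration: the module's FP unit is 2 x 128-bit FMA
    pipes. If only one core has FP work it gets both pipes (256 bits);
    if both have FP work, each gets one pipe (128 bits).
    """
    pipe_bits = 128
    pipes = 2
    active = [core0_has_fp_work, core1_has_fp_work]
    n_active = sum(active)
    if n_active == 0:
        return (0, 0)
    if n_active == 1:
        # Dedicated mode: the whole unit goes to the one core that needs it.
        return tuple(pipes * pipe_bits if a else 0 for a in active)
    # Shared (SMT) mode: one 128-bit pipe per core.
    return (pipe_bits, pipe_bits)


if __name__ == "__main__":
    for flags in [(True, False), (False, True), (True, True)]:
        print(flags, fp_width_per_core(*flags))
```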
Intel is suffering from "not invented here" syndrome. They don't like using ideas developed by their competitors unless they really have to (think AMD64 ISA). They had some of the people behind the "CMT" approach working at Intel back in the day, e.g. Andy Glew, but they never really backed the idea. After Glew moved to AMD he presented them with the same concept. Coincidentally, some of the AMD folks were looking in a similar direction, and they decided to pursue the CMT approach, roughly in 2005. One year before that, Glew left AMD. Chuck Moore gave a presentation back in 2005 describing CMT as the best choice for next-gen multithreaded MPUs from a perf./watt/mm^2 perspective. Fred Weber also hinted back in 2005 at where AMD was heading with their next gen of multithreaded CPUs.

I do believe AMD made the right choice, by the way - the module is probably better suited for server workloads. And regarding the client space, I suspect that, especially in the future, most of the times the CPU is a bottleneck will be in multithreaded situations.
And of course they needn't be twice as fast. Since it's probably about 40% larger than an SB quad, though, it would be nice if they could sell it for at least some 50% more (given lower yields et al.) to have similar margins. The "more cores == higher single-threaded frequency" thought is a nice one, though.
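To put rough numbers on the "40% larger, so about 50% more expensive" point, here is a minimal sketch. It assumes a simple Poisson yield model (yield = exp(-defects per die)) and that raw cost per die scales with area. The function name and the 0.2 defects-per-baseline-die figure are invented for illustration; they are not real process data for either chip.

```python
import math


def relative_cost_per_good_die(area_ratio, defects_per_base_die):
    """Cost of one good large die relative to one good baseline die.

    Assumptions (illustrative only):
      * raw cost per die is proportional to die area
      * yield follows a Poisson model: exp(-expected defects per die)
      * expected defects scale linearly with die area
    """
    base_yield = math.exp(-defects_per_base_die)
    large_yield = math.exp(-defects_per_base_die * area_ratio)
    # (area_ratio / large_yield) divided by (1 / base_yield)
    return area_ratio * base_yield / large_yield


if __name__ == "__main__":
    # Hypothetical: die is 1.4x the baseline area, 0.2 defects per baseline die.
    ratio = relative_cost_per_good_die(1.4, 0.2)
    print(f"relative cost per good die: {ratio:.2f}x")
```

With zero defects the ratio is just the area ratio; any nonzero defect density pushes it above 1.4x, which is why the price would need to go up by more than the die size does to keep margins similar.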
Anyway, I was just commenting on the graph. I'd find it fairly absurd that in billions upon billions of dollars worth of research, Intel missed just that one thing that could double their performance per mm^2. :P Just to express my skepticism about the graph - not so much the Bulldozer architecture.