Agreed, I hope Bulldozer is the same. One thing we can all agree on: the more competition the better it is for the consumer.
That figure appears to come from an AMD chart focusing on one projection, "Floating Point performance".
It shows MC at "28", and Interlagos at "43" (at max, the line fades starting around 40). That's roughly a 43% to 54% improvement.
But given that Interlagos/BD has AVX (including FMA), this really isn't all that impressive, is it?
"Integer performance" on the same chart goes from "29" to "36-38.5" (from start fade to end fade)
That's a 24% to 33% improvement for 33% more cores.
Hmmm, doesn't really look like much single-threaded improvement there.
In sum:
From a 7% loss to roughly break-even per core on "Integer"
From about 7% to 15% performance gain per core on "Floating point" -- and that's with AVX!
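For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-core calculation. The bar heights are only my readings of the chart (28 and 40-43 for FP, 29 and 36-38.5 for integer), the core counts are the 12 and 16 discussed in this thread, and the function and variable names are made up for illustration.

```python
def per_core_change(old_score, new_score, old_cores=12, new_cores=16):
    """Return (total_gain, per_core_gain) as fractions, e.g. 0.43 means +43%."""
    total_ratio = new_score / old_score
    core_ratio = new_cores / old_cores
    return total_ratio - 1.0, total_ratio / core_ratio - 1.0


# Bar heights read off the chart by eye (assumed readings, not official data).
# Each entry: (MC value, (Interlagos fade start, Interlagos end of bar)).
CHART = {
    "Floating point": (28.0, (40.0, 43.0)),
    "Integer": (29.0, (36.0, 38.5)),
}


def summarize(chart=CHART):
    """Compute total and per-core change at both ends of each faded bar."""
    rows = {}
    for name, (old, (low, high)) in chart.items():
        rows[name] = [per_core_change(old, new) for new in (low, high)]
    return rows


if __name__ == "__main__":
    for name, results in summarize().items():
        for total, per_core in results:
            print(f"{name}: total {total:+.1%}, per core {per_core:+.1%}")
```

Dividing the total ratio by the core ratio (16/12) is the only "method" here; it assumes throughput scales linearly with core count, which is a simplification.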
Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"
Other than AVX helping FP perf, somewhat less than expected, I don't see any single-threaded gains from MC --> BD, not based on these performance projections, anyhow.
About Bulldozer and single-thread performance though, it seems AMD is going to take a page from Intel by employing an even more aggressive core boosting strategy (a la SB) in single-thread scenarios. Which means the focus may not be so much on IPC tweaks, but rather on ultra-high core frequency boosting, a scenario that would be greatly helped by power-gating to ensure the chip stays within its TDP limits.
http://www.anandtech.com/show/2879/3
A 16 core (8 modules) Bulldozer based CPU is 60 to 80% faster than a 12 core Magny Cours in integer performance when using SPECInt_rate as a benchmark according to Anandtech.
I'm talking about design philosophy. You're comparing a product that reached 3.4GHz in 130nm with Nehalem which reaches 3.5 with 45nm.
The actual speed P4 reached isn't relevant (not to mention it was held back at 65nm); it's a product designed in the late '90s. Nehalem was done by the same team that did Netburst and it took 6 years; obviously, all they've learned with Netburst was put to good use.
And to sum it up: my point is that you can either aim for frequency or for IPC. The middle path is designs like Core/i7/K10 and possibly Bulldozer. I don't expect any wonders from either, be it in IPC or frequency.
Even so, it has a lower frequency, thus it has other bottlenecks in the design.
Nothing new here; Pentium 4 did this back in 2000. The integer core was clocked at 2x the core clock. Ultimately it ended badly: although they tried every technique (low-swing circuits, domino logic), running something at 6-8GHz means a lot of power used, and performance per watt is poor.
Quote:
You could also go for a more hybrid approach, like double clocking those parts in a core that make sense. Clock domains within a single core in other words. You could for example run the schedulers and execution units at double the clockspeed of the fetch and decode stage. I'm not saying they will, but it's another approach.
lol. intel copied amd by having LSU's!
integrated FPU, TLB, IMC, OoO, register renaming, superscalar pipeline, branch prediction, integrated L2 cache, L3 cache, among others are all things that intel did first (for x86) that amd has used.
on a more serious note intel doesn't even need to beat amd's uarch. they have the best process and circuit design teams in the world.
Well, I mean, what can I say? It is AMD's own chart from 6 months back. :rofl:
I'll stick with AMD's official figures over Johan's tidbits, especially when the latter date (and source) from the SAME time of the "5% extra die size for 80% performance gain" nonsense that was later clarified.
Unless JF-AMD wants to reiterate any performance claim that is different from what that AMD slide shows?
JF, is Interlagos really 60-80% better than MC (12-core) in specInt_rate, despite the AMD slide showing "integer performance" is only 24-33% better?
Or was your comment to Johan in error?
You know, I've been looking more at that chart from AMD.
I've got a very close fit for the Y-axis:
**** The base results for specInt_rate and specFP_rate, for 2-socket systems with the noted processors, divided by 10. ****
Look them up. They are dead on for the 2009 "2435 Istanbul 2-socket", and very close for 2008 and 2007 as well.
Now, you say, but 2010, MC is better than this: the chart would give:
290 for int_rate (base), 280 for fp_rate for a 2-socket 2.3 MC system.
When we look we find: 309 int, 290 fp.
But recall that JF likes to say that they over-delivered with MC vs what was promised... so I think this is ok.
If I got it right, this chart from AMD calls for
=================================
Interlagos top-bin 2-socket system:
SpecInt_rate(base): 360-390
SpecFP_rate(base): 400-430
(lower numbers are where the fade starts, upper is end of bar)
=================================
The upper end would amount to an FP improvement of 48% (thank you, AVX), and integer is 390/309 = 26%.
Note that per-core, this is 148/133 = 11% better SpecFP_rate, but about 5% worse on SpecInt_rate.
It would make sense that these charts would be some form of SpecInt/FP rates, and base is easier to project than peak, and it must be 2-socket (or 1, but that doesn't make much sense) systems from the 2xx initial parts chosen.
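Here is the same arithmetic as a small Python sketch, in case anyone wants to plug in different readings. The Interlagos numbers are only my reading of the chart's faded bars, the MC baseline is the measured 2-socket result quoted above, and the names are purely illustrative.

```python
MC_CORES = 12
INTERLAGOS_CORES = 16

# Measured 2-socket Magny-Cours base results quoted in this post.
MC_MEASURED = {"int_rate": 309, "fp_rate": 290}

# Chart-derived Interlagos range (my reading): (start of fade, end of bar).
INTERLAGOS_PROJECTED = {"int_rate": (360, 390), "fp_rate": (400, 430)}


def uplift(projected, baseline):
    """Total throughput gain as a fraction, e.g. 0.48 means +48%."""
    return projected / baseline - 1.0


def per_core_uplift(projected, baseline):
    """Gain after normalizing for the 12 -> 16 core increase."""
    core_ratio = INTERLAGOS_CORES / MC_CORES
    return (projected / baseline) / core_ratio - 1.0


if __name__ == "__main__":
    for metric, (low, high) in INTERLAGOS_PROJECTED.items():
        base = MC_MEASURED[metric]
        print(
            f"{metric}: total {uplift(low, base):+.0%} to {uplift(high, base):+.0%}, "
            f"per core {per_core_uplift(low, base):+.0%} to "
            f"{per_core_uplift(high, base):+.0%}"
        )
```

Swapping the measured MC baseline for the chart's own MC values (290 int, 280 fp) would make the uplift look larger, which is part of why the choice of baseline matters here.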
--------------------
Anyhow, given that I now think this chart is giving specInt/FP_rate projections, the Johan specInt_rate tidbit from JF is completely at odds with this chart, and as they were both put out at the same time... gotta think the chart stands unless JF wants to (re-)claim otherwise.
Johan got his information directly from AMD. The chart has fading bars, and AMD (JF) already stated that only they know how high the bars actually go (that's the purpose of the fading btw, to not actually disclose the true perf. projection). You are reading waaaay too much into that chart, especially knowing that AMD couldn't possibly predict the clock speeds they would milk from the BD silicon at the time they made the chart. A 60-80% uplift from MC is a good bet, but seeing how AMD delivered and over-delivered with Shanghai, Istanbul and especially MC, you can bet they will do all they can to over-deliver with BD when it launches.
edit:
a question: why are you so obsessed with AMD, BD performance/tapeout and 2011? Any chance you're an Intel shareholder?
Johan got his info from JF, at the same time of the 5%-die-size thing, the same day (or so) that AMD released this chart. Hence the question to JF.
***EDIT: Could it be that it was just a misread? What *is* 80% better on SpecInt_rate (per that chart) is MC over Istanbul. (rather than Interlagos over MC, which the chart shows at 35%)
AMD can make a reasonable stab at Interlagos clocks... remember that power is what is really gating things here. (more so than with a Zambezi 1-die part) But I agree there's a bin or so of "not sure", which is why the bars fade out.
I'm sure AMD will try to over-deliver, my point is merely that the chart shows BD relative to a slightly-worse-than-reality version of MC.
The numbers are interesting:
With int_rate, both Nehalem-EX and Westmere are already at the low-end of BD's projected range, so I think Westmere-EX (25% core increase, higher clocks), and also SB (33% core increase, new arch, higher mem bandwidth) will have no trouble maintaining dominance here.
With fp_rate, Intel has a lot further to go to catch a (2-socket) 400-430 SpecFP_rate(base). But presumably this is where AVX comes in, as well as more cores/bandwidth.
For single-to-low-threaded stuff, I expect Intel will win across the board, probably substantially.
edit: obsessed? Isn't the whole point of these boards/threads speculation? Some people find it fun, you know. ;) It's a challenge trying to decode these AMD performance projection slides, but the results can be informative, no?
anecdote:
back then their CEO was craig barrett, and he used to be an engineer for intel who started working for them in the 70's. he was on the materials side of things, so he pushed process over arch. netburst uarch was a really bad idea from the start. even researchers then knew about future power issues. ever since prescott/tejas intel has focused on making a good uarch.
my point is that intel will almost always be ahead in process/physical design. amd can match or beat them in uarch but the only realistic way intel would lose is to fall behind in uarch (a la netburst). and fwiw hand optimized circuits can be up to 7x more power efficient over a synthesized counterpart.
At the end of the day, whoever can encode my video faster wins. Currently this is Intel and has been for several years now.
And if one person responds to this and mentions something like badaboom I'm going to :banana::banana::banana::banana:! :down:
No, you are completely wrong here.
PLEASE DELETE THAT CHART.
I have explained several times:
1. The chart was drawn in PowerPoint; it was not done in Excel, where you would have exact numbers.
2. The chart uses a fade to purposely hide the actual performance estimates, because we were not making actual estimates at the time.
Anyone that obsesses about that chart would also notice that the Magny Cours performance increase over Istanbul was also underestimated.
If you want to refer to any performance estimate for bulldozer, there is one official one: 50% greater total throughput than Magny Cours.
We won't be saying anything else for the foreseeable future.
Any other guess, no matter how complicated the math or methodology, will be wrong.