AMD to Disclose Details About Bulldozer Micro-Architecture in August


06-23-2010, 12:35 PM
Klarko

Quote:

Originally Posted by Mad Pistol

I think AMD is on the right track. They've posted Quarterly profits I think for the first time in years this past quarter (correct me if I'm wrong on that. they have made a profit though) so that shows that they're doing something right. Obviously, it's chump change compared to Intel's earnings, but they are doing fine. AMD/ATI's R&D budget is much smaller than Intel's or Nvidia's, so that means they're being smart about what they're doing. The Radeon cards this generation have been fantastic. Who's to say that Bulldozer won't be the same way?

Agreed, I hope Bulldozer is the same. One thing we can all agree on: the more competition the better it is for the consumer.
06-23-2010, 12:44 PM
terrace215

Quote:

Originally Posted by -Sweeper_

looks like they wont talk about clock speeds :)

150+% of the performance with 133% the number of cores

thats 12,7+% more performance per core, something easily achievable with higher clocks (magny cours works at a modest 2.3ghz)

That figure appears to come from an AMD chart focusing on one projection, "Floating Point performance".

It shows MC at "28", and Interlagos at "43" (at max, the line fades starting around 40). 43% to 53% improvement.

But given that Interlagos/BD has AVX (including FMA), this really isn't all that impressive, is it?

"Integer performance" on the same chart goes from "29" to "36-38.5" (from start fade to end fade)

That's a 24% to 33% improvement for 33% more cores.

Hmmm, doesn't really look like much single-threaded improvement there.

In sum:

From -7% to 0% performance loss per core on "Integer"
From 7.5% to 15% performance gain per core on "Floating point" -- and that's with AVX!

Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"

Other than AVX helping FP perf, somewhat less than expected, I don't see any single-threaded gains from MC --> BD, not based on these performance projections, anyhow.
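
The per-core figures in this post can be reproduced with a few lines of Python. This is only a sketch: the bar heights are the eyeballed chart readings quoted above, the 12 and 16 core counts are Magny-Cours and Interlagos, and the function name `per_core_change` is made up for illustration.

```python
def per_core_change(old_score, new_score, old_cores, new_cores):
    """Return the fractional change in score per core (0.10 means +10%)."""
    return (new_score / new_cores) / (old_score / old_cores) - 1.0

MC_CORES, BD_CORES = 12, 16  # Magny-Cours vs Interlagos

# (MC bar, Interlagos bar where the fade starts, Interlagos bar where it ends)
# These are readings taken by eye from the chart, not published numbers.
readings = {
    "Integer": (29.0, 36.0, 38.5),
    "Floating point": (28.0, 40.0, 43.0),
}

results = {}
for name, (mc, bd_low, bd_high) in readings.items():
    low = per_core_change(mc, bd_low, MC_CORES, BD_CORES)
    high = per_core_change(mc, bd_high, MC_CORES, BD_CORES)
    results[name] = (low, high)
    print(f"{name}: {low:+.1%} to {high:+.1%} per core")
```

With these readings the integer range lands at roughly -7% to 0% and the floating point range at roughly +7% to +15%, so the result is only as good as the chart readings fed into it.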
06-23-2010, 12:48 PM
-Sweeper_

Quote:

Originally Posted by terrace215

Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"

:rofl: thats true
06-23-2010, 01:14 PM
OhNoes!

Quote:

Originally Posted by terrace215

That figure appears to come from an AMD chart focusing on one projection, "Floating Point performance".

It shows MC at "28", and Interlagos at "43" (at max, the line fades starting around 40). 43% to 53% improvement.

But given that Interlagos/BD has AVX (including FMA), this really isn't all that impressive, is it?

"Integer performance" on the same chart goes from "29" to "36-38.5" (from start fade to end fade)

That's a 24% to 33% improvement for 33% more cores.

Hmmm, doesn't really look like much single-threaded improvement there.

In sum:

From -7% to 0% performance loss per core on "Integer"
From 7.5% to 15% performance gain per core on "Floating point" -- and that's with AVX!

Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"

Other than AVX helping FP perf, somewhat less than expected, I don't see any single-threaded gains from MC --> BD, not based on these performance projections, anyhow.

About Bulldozer and single thread performance though, it seems AMD is going to take a page from Intel by employing an even more aggressive core boosting strategy (ala SB) in single thread scenarios. Which means the focus may not be so much on ipc tweaks, but rather ultra high core frequency boosting, a scenario that would be greatly helped by power-gating to ensure the chip stays within its tdp limits.
06-23-2010, 01:19 PM
Helmore

Quote:

Originally Posted by terrace215

That figure appears to come from an AMD chart focusing on one projection, "Floating Point performance".

It shows MC at "28", and Interlagos at "43" (at max, the line fades starting around 40). 43% to 53% improvement.

But given that Interlagos/BD has AVX (including FMA), this really isn't all that impressive, is it?

"Integer performance" on the same chart goes from "29" to "36-38.5" (from start fade to end fade)

That's a 24% to 33% improvement for 33% more cores.

Hmmm, doesn't really look like much single-threaded improvement there.

In sum:

From -7% to 0% performance loss per core on "Integer"
From 7.5% to 15% performance gain per core on "Floating point" -- and that's with AVX!

Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"

Other than AVX helping FP perf, somewhat less than expected, I don't see any single-threaded gains from MC --> BD, not based on these performance projections, anyhow.

http://www.anandtech.com/show/2879/3

A 16 core (8 modules) Bulldozer based CPU is 60 to 80% faster than a 12 core Magny Cours in integer performance when using SPECInt_rate as a benchmark according to Anandtech.
06-23-2010, 01:27 PM
savantu

Quote:

Originally Posted by cegras

What are you talking about? The fastest P4 released was 3.4 ghz (from wiki?), while core i7's under turbo (e.g. single core to single core) can approach that speed. You seem to have handily ignored things like transistor shrinkage as well as better design ...

I'm talking about design philosophy. You're comparing a product that reached 3.4GHz in 130nm with Nehalem which reaches 3.5 with 45nm.
The actual speed P4 reached isn't relevant (not to mention it was held back in 65nm); it's a product designed in the late '90s. Nehalem was done by the same team that did Netburst and it took 6 years; obviously, all they've learned with Netburst was put to good use.

And to sum it up : my point is that you can either aim for frequency or for IPC. The middle path are designs like Core/I7/K10 and possibly Bulldozer. I don't expect any wonders from either be it in IPC or frequency.
06-23-2010, 01:31 PM
savantu

Quote:

Originally Posted by Helmore

That's not what I said. I said, lower IPC than Phenom II while clocking higher and Phenom II has a lower IPC than Core 2 AFAIK.

Even so, it has a lower frequency, thus it has other bottlenecks in the design.

Quote:

You could also go for a more hybrid approach, like double clocking those parts in a core that make sense. Clock domains within a single core in other words. You could for example run the schedulers and execution units at double the clockspeed of the fetch and decode stage. I'm not saying they will, but it's another approach.

Nothing new here; Pentium 4 did this back in 2000. The integer core was clocked 2x the core clock. Ultimately it ended badly: although they tried every technique (low swing circuits, domino logic), running something at 6-8GHz means a lot of power used, and performance per watt is poor.
06-23-2010, 02:04 PM
Hornet331

Quote:

Originally Posted by terrace215

Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"

Well my nickname for him is mister negative, and that for a good reason. :p:
06-23-2010, 02:08 PM
wuttz

1 Attachment(s)

Quote:

Originally Posted by OhNoes!

it seems AMD is going to take a page from Intel by employing an even more aggressive core boosting strategy (ala SB) in single thread scenarios.

amd copying intel? really.
06-23-2010, 02:32 PM
informal

Quote:

Originally Posted by Helmore

http://www.anandtech.com/show/2879/3

A 16 core (8 modules) Bulldozer based CPU is 60 to 80% faster than a 12 core Magny Cours in integer performance when using SPECInt_rate as a benchmark according to Anandtech.

Also keep in mind that terrace based all of his math on an old,vague chart that has fading bars.... :rofl:
06-23-2010, 03:10 PM
Motiv

Quote:

Originally Posted by JF-AMD

All I have said is that bulldozer will be faster than current products. I have not made any clock speed statements.

The statement that I made was that Interlagos would have 33% more cores and will be 50%+ faster than Magny Cours. If you are more than 50% faster with 33% more cores, then your "per core" performance is faster. That is the only statement that we will make on performance.

Sounds to me like the IPC is better.
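
For what it's worth, the floor implied by that statement is easy to work out. A rough sketch, assuming 12 cores for Magny Cours, 16 for Interlagos, and taking "50%+" as exactly 1.5x; the 2.3 GHz figure is the Magny Cours clock mentioned earlier in the thread, and the variable names are just illustrative.

```python
from fractions import Fraction

# Assumptions: 12 -> 16 cores, and "50%+ faster" treated as a 1.5x floor.
core_ratio = Fraction(16, 12)        # 4/3, i.e. 33% more cores
throughput_ratio = Fraction(3, 2)    # at least 1.5x total throughput

per_core_ratio = throughput_ratio / core_ratio   # 9/8
min_per_core_gain = float(per_core_ratio - 1)
print(f"Minimum per-core throughput gain: {min_per_core_gain:.1%}")

# Hypothetical case: IPC stays flat and the whole gain comes from clock speed.
mc_clock_ghz = 2.3
clock_if_ipc_flat = mc_clock_ghz * float(per_core_ratio)
print(f"Clock needed with unchanged IPC: about {clock_if_ipc_flat:.1f} GHz")
```

Per-core throughput is IPC times clock, so the statement on its own can't tell you which of the two went up.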
06-23-2010, 03:51 PM
Chumbucket843

Quote:

Originally Posted by wuttz

amd copying intel? really.

lol. intel copied amd by having LSU's!

integrated FPU, TLB, IMC, OoO, register renaming, superscalar pipeline, branch prediction, integrated L2 cache, L3 cache, among others are all things that intel did first (for x86) that amd has used.

on a more serious note intel doesnt even need to beat amd's uarch. they have the best process and circuit design teams in the world.
06-23-2010, 05:03 PM
terrace215

Quote:

Originally Posted by informal

Also keep in mind that terrace based all of his math on an old,vague chart that has fading bars.... :rofl:

Well, I mean, what can I say? It is AMD's own chart from 6 months back. :rofl:

I'll stick with AMD's official figures over Johan's tidbits, especially when the latter date (and source) from the SAME time of the "5% extra die size for 80% performance gain" nonsense that was later clarified.

Unless JF-AMD wants to reiterate any performance claim that is different from what that AMD slide shows?

JF, is Interlagos really 60-80% better than MC (12-core) in specInt_rate, despite the AMD slide showing "integer performance" is only 24-33% better?

Or was your comment to Johan in error?
06-23-2010, 05:05 PM
Hornet331

Quote:

Originally Posted by Chumbucket843

on a more serious note intel doesnt even need to beat amd's uarch. they have the best process and circuit design teams in the world.

Well it only can fix that much... we have seen how it turned out for the P4, the process was good (as the pentium-m has proven) but if the cpu design itself isn't up to the task, the best process can't help you.
06-23-2010, 05:13 PM
informal

Quote:

Originally Posted by terrace215

Well, I mean, what can I say? It is AMD's own chart from 6 months back. :rofl:

I'll stick with AMD's official figures over Johan's tidbits, especially when they date (and source) from the time of the "5% extra die size for 80% performance gain" nonsense that was later "clarified".

You can stick with whatever you like,but if it ends up like Johan said(or even better ;)) ,it won't matter. JF already stated that the "jump" will be very similar if not better to what we have got with Istanbul->MC transition.
06-23-2010, 05:36 PM
haylui

Quote:

Originally Posted by informal

You can stick with whatever you like,but if it ends up like Johan said(or even better ;)) ,it won't matter. JF already stated that the "jump" will be very similar if not better to what we have got with Istanbul->MC transition.

core to core performance, how much does MC gain over Istanbul?
I thought both were K10.5 micro-architecture?
06-23-2010, 06:04 PM
terrace215

Quote:

Originally Posted by informal

You can stick with whatever you like,but if it ends up like Johan said(or even better ;)) ,it won't matter. JF already stated that the "jump" will be very similar if not better to what we have got with Istanbul->MC transition.

You know, I've been looking more at that chart from AMD.

I've got a very close fit for the Y-axis:

**** The base results for specInt_rate and specFP_rate, for 2-socket systems with the noted processors, divided by 10. ****

Look them up. They are dead on for the 2009 "2435 Istanbul 2-socket", and very close for 2008 and 2007 as well.

Now, you say, but 2010, MC is better than this: the chart would give:

290 for int_rate (base), 280 for fp_rate for a 2-socket 2.3 MC system.

When we look we find: 309 int, 290 fp.

But recall that JF likes to say that they over-delivered with MC vs what was promised... so I think this is ok.

If I got it right, this chart from AMD calls for

=================================
Interlagos top-bin 2-socket system:

SpecInt_rate(base): 360-390
SpecFP_rate(base): 400-430

(lower numbers are where the fade starts, upper is end of bar)

=================================

The upper end would amount to an FP improvement of 48% (thank you, AVX), and integer is 390/309 = 26%.

Note that per-core, this is 148/133 = 11% better SpecFP_rate, but about 5% worse on SpecInt_rate.

It would make sense that these charts would be some form of SpecInt/FP rates, and base is easier to project than peak, and it must be 2-socket (or 1, but that doesn't make much sense) systems from the 2xx initial parts chosen.

--------------------

Anyhow, given that I now think this chart is giving specInt/FP_rate projections, the Johan specInt_rate tidbit from JF is completely at odds with this chart, and as they were both put out at the same time... gotta think the chart stands unless JF wants to (re-)claim otherwise.
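
For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the chart really is rate scores divided by 10, uses the top of the fading bars for Interlagos, and takes the 309/290 Magny Cours scores as quoted above without verifying them; the names are illustrative only.

```python
CORE_RATIO = 16 / 12  # Interlagos cores over Magny Cours cores

# 2-socket scores. "chart" is the chart reading times 10; "published" is the
# pair of Magny Cours results quoted in this post (not independently checked).
mc_chart = {"int_rate": 290, "fp_rate": 280}
mc_published = {"int_rate": 309, "fp_rate": 290}
bd_top = {"int_rate": 390, "fp_rate": 430}  # end of the fading bars

def gains(bd_score, mc_score):
    """Return (total gain, per-core gain) as fractions."""
    total = bd_score / mc_score
    return total - 1.0, total / CORE_RATIO - 1.0

table = {}
for metric in ("int_rate", "fp_rate"):
    for label, baseline in (("chart", mc_chart), ("published", mc_published)):
        total, per_core = gains(bd_top[metric], baseline[metric])
        table[(metric, label)] = (total, per_core)
        print(f"{metric} vs {label} MC: {total:+.1%} total, {per_core:+.1%} per core")
```

The choice of baseline matters: against the chart's own Magny Cours bar the integer gain is about 34%, against the quoted published score it is about 26%.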
06-23-2010, 06:11 PM
informal

Johan got his information directly from AMD. The chart has fading bars, and AMD (JF) already stated that only they know how high the bars actually go (that's the purpose of the fading btw, to not actually disclose the true perf. projection). You are reading waaaay too much into that chart, especially knowing that AMD couldn't possibly predict the clock speeds they would milk from the BD silicon at the time they made the chart. A 60-80% uplift from MC is a good bet, but seeing how AMD delivered and over-delivered with Shanghai, Istanbul and especially MC, you can bet they will do all they can to over-deliver with BD when it launches.

edit:
a question: why are you so obsessed with AMD, BD performance/tapeout and 2011? Any chance you're an Intel shareholder?
06-23-2010, 06:33 PM
terrace215

Quote:

Originally Posted by informal

Johan got his information directly from AMD. The chart has fading bars, and AMD (JF) already stated that only they know how high the bars actually go (that's the purpose of the fading btw, to not actually disclose the true perf. projection). You are reading waaaay too much into that chart, especially knowing that AMD couldn't possibly predict the clock speeds they would milk from the BD silicon at the time they made the chart. A 60-80% uplift from MC is a good bet, but seeing how AMD delivered and over-delivered with Shanghai, Istanbul and especially MC, you can bet they will do all they can to over-deliver with BD when it launches.

Johan got his info from JF, at the same time of the 5%-die-size thing, the same day (or so) that AMD released this chart. Hence the question to JF.

***EDIT: Could it be that it was just a misread? What *is* 80% better on SpecInt_rate (per that chart) is MC over Istanbul. (rather than Interlagos over MC, which the chart shows at 35%)

AMD can make a reasonable stab at Interlagos clocks... remember that power is what is really gating things here. (more so than with a Zambezi 1-die part) But I agree there's a bin or so of "not sure", which is why the bars fade out.

I'm sure AMD will try to over-deliver, my point is merely that the chart shows BD relative to a slightly-worse-than-reality version of MC.

The numbers are interesting:

With int_rate, both Nehalem-EX and Westmere are already at the low-end of BD's projected range, so I think Westmere-EX (25% core increase, higher clocks), and also SB (33% core increase, new arch, higher mem bandwidth) will have no trouble maintaining dominance here.

With fp_rate, Intel has a lot further to go to catch a (2-socket) 400-430 SpecFP_rate(base). But presumably this is where AVX comes in, as well as more cores/bandwidth.

For single-to-low-threaded stuff, I expect Intel will win across the board, probably substantially.

edit: obsessed? Isn't the whole point of these boards/threads speculation? Some people find it fun, you know. ;) It's a challenge trying to decode these AMD performance projection slides, but the results can be informative, no?
06-23-2010, 06:35 PM
Chumbucket843

Quote:

Originally Posted by Hornet331

Well it only can fix that much... we have seen how it turned out for the P4, the process was good (as the pentium-m has proven) but if the cpu design itself isn't up to the task, the best process can't help you.

anecdote:
back then their CEO was craig barrett and he used to be an engineer for intel who started working for them in the 70's. he was on the materials side of things so he pushed process over arch. netburst uarch was a really bad idea from the start. even researchers then knew about future power issues. ever since prescott/tejas intel has focused on making a good uarch.

my point is that intel will almost always be ahead in process/physical design. amd can match or beat them in uarch but the only realistic way intel would lose is to fall behind in uarch (a la netburst). and fwiw hand optimized circuits can be up to 7x more power efficient over a synthesized counterpart.
06-23-2010, 06:49 PM
saint-francis

At the end of the day whoever can encode my video faster wins. Currently this is Intel and has been for several years now.

And if one person responds to this and mentions something like badaboom I'm going to :banana::banana::banana::banana:! :down:
06-23-2010, 07:57 PM
haylui

Quote:

Originally Posted by saint-francis

At the end of the day whoever can encode my video faster wins. Currently this is Intel and has been for several years now.

And if one person responds to this and mentions something like badaboom I'm going to :banana::banana::banana::banana:! :down:

hm....Jaguar for u?
06-23-2010, 09:31 PM
JF-AMD

Quote:

Originally Posted by -Sweeper_

looks like they wont talk about clock speeds :)

Nobody talks about clock speeds before launch.
06-23-2010, 09:32 PM
JF-AMD

Quote:

Originally Posted by terrace215

That figure appears to come from an AMD chart focusing on one projection, "Floating Point performance".

It shows MC at "28", and Interlagos at "43" (at max, the line fades starting around 40). 43% to 53% improvement.

But given that Interlagos/BD has AVX (including FMA), this really isn't all that impressive, is it?

"Integer performance" on the same chart goes from "29" to "36-38.5" (from start fade to end fade)

That's a 24% to 33% improvement for 33% more cores.

Hmmm, doesn't really look like much single-threaded improvement there.

In sum:

From -7% to 0% performance loss per core on "Integer"
From 7.5% to 15% performance gain per core on "Floating point" -- and that's with AVX!

Can you imagine the outcry from saaya were SB to compare to Westmere in such a way? I can hear it now... "Epic Fail!!!!!111"

Other than AVX helping FP perf, somewhat less than expected, I don't see any single-threaded gains from MC --> BD, not based on these performance projections, anyhow.

No, you are completely wrong here.

PLEASE DELETE THAT CHART.

I have explained several times:

1. The chart was drawn in PowerPoint; it was not done in Excel, where you would have exact numbers
2. The chart uses a fade to purposely hide the actual performance estimates because we were not making actual estimates at the time.

Anyone that obsesses about that chart would also notice that the Magny Cours performance increase over Istanbul was also underestimated.

If you want to refer to any performance estimate for bulldozer, there is one official one: 50% greater total throughput than Magny Cours.

We won't be saying anything else for the foreseeable future.

Any other guess, no matter how complicated the math or methodology, will be wrong.
06-23-2010, 10:35 PM
Stukov

Quote:

Originally Posted by JF-AMD

If you want to refer to any performance estimate for bulldozer, there is one official one: 50% greater total throughput than Magny Cours.

Mmmm juicy.
