AMD Bulldozer Thread


11-18-2011, 12:21 PM
fellix_bg

Actually, Intel does a separate SRAM cell design for their L3 caches that's much denser. AMD simply re-uses the SRAM cells from its L2 design for the L3.
11-18-2011, 08:07 PM
mAJORD

Guys 2B never made sense in the first place when you did the rough sums, 1.2B sounds closer but too little IMO:

these figures may be slightly out, but close enough to get an idea how wrong 2B sounds.

4 Core Deneb:
6MB L3 cache: 458M
2MB L2: 152M
4 cores: 140M
CPU-NB+misc: ~8M

Total: 758M

6 Core Thuban:
6MB L3 cache: 458M
3MB L2: 228M
6 cores: 210M
CPU-NB+misc: ~8M

Total: 904M

4 Module Bulldozer:

Module transistor count based on AMD's pre release slide stating 268M Transistors for 1 module including 2MB cache

8MB L3 Cache: ~610M
8MB L2 Cache: ~610M
4 Modules: ~240M (at ~60M each)
CPU-NB+misc: ~8M

Total: ~1.46B
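
For anyone who wants to re-run the rough sums above, here is a minimal Python sketch. The figures are copied from this post (in millions of transistors); the per-core 35M and per-module 60M values are simply the post's core totals divided out, and the dictionary layout and names are illustrative only.

```python
# Rough transistor budgets in millions, taken from the estimates in this post.
# These are forum estimates, not official AMD numbers.
estimates = {
    "Deneb (4 cores)": {"L3": 458, "L2": 152, "cores": 4 * 35, "CPU-NB+misc": 8},
    "Thuban (6 cores)": {"L3": 458, "L2": 228, "cores": 6 * 35, "CPU-NB+misc": 8},
    "Bulldozer (4 modules)": {"L3": 610, "L2": 610, "modules": 4 * 60, "CPU-NB+misc": 8},
}


def total_millions(parts):
    """Sum the per-block estimates for one chip."""
    return sum(parts.values())


totals = {chip: total_millions(parts) for chip, parts in estimates.items()}

for chip, total in totals.items():
    print(f"{chip}: ~{total}M transistors")
```

The Bulldozer sum comes out a little under 1.5B, which is the point of the post: nowhere near 2B.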
11-19-2011, 12:21 AM
tom1

Quote:

Originally Posted by Gambit_2K

How is that a review? It's an analysis of the architecture, they don't even have any form of performance test in the article...

Don't worry, the second part is coming :)
11-19-2011, 01:03 AM
Leeghoofd

Never ever use performance slides from the manufacturer in a review... mostly that will backfire on you !!
11-22-2011, 03:52 AM
Olivon

AMD's Bulldozer server benchmarks are here, and they're a catastrophe - Ars Technica
11-22-2011, 04:20 AM
STaRGaZeR

I wouldn't call that a catastrophe, just horrible performance. AMD needs to abandon this architecture, and fast.
11-22-2011, 04:25 AM
informal

What a rubbish article... The guy is acknowledging that it's faster than 12C MC and Xeon BUT... He then says it's "not fast enough" since it has 33% more cores and scores a bit lower than that: "only" 27/32% faster in SPEC JBB2005/SAP. What happened to Ars Technica? Don't bother with the 3rd page of the "article".
11-22-2011, 04:55 AM
Hornet331

Well, everyone still clings to the 33% more cores, 50% more performance claim... that was touted all over the internet for months like gospel... and he has a point... How would a K10 with 2 more cores on 32nm have done? Personally I think not much worse.
11-22-2011, 04:57 AM
-Boris-

Quote:

Originally Posted by informal

What a rubbish article... The guy is acknowledging that it's faster than 12C MC and Xeon BUT... He then says it's "not fast enough" since it has 33% more cores and scores a bit lower than that: "only" 27/32% faster in SPEC JBB2005/SAP. What happened to Ars Technica? Don't bother with the 3rd page of the "article".

Well it has a much worse performance per dollar and performance per watt than both MC and Xeon, how is that not bad? It's only faster than Xeon when comparing to relatively cheap and slow Xeons. Per dollar performance is still worse.
11-22-2011, 08:08 AM
behrouz

This article (Ars Technica) is not bad at all, but it just said:

Quote:

AMD faces an uphill struggle just to compete with its own old chips—let alone with Intel.

Did Anandtech ever say this?

Quote:

So if performance/watt is your first priority, we think the current Xeons are your best option.

Quote:

If performance/dollar is your first priority, we think the Opteron 6276 is an attractive alternative.

from heise.de or English

In LINPACK: Opteron 6276 vs Xeon 5680: 205~239 GFLOPs vs 144 GFLOPs

With the AMD Open64 compiler vs Intel Composer 2011 SP1: 454 to 349 in integer and 337 to 246 in floating point

Also 502 MFLOPS/watt (6276) compared with 311 MFLOPS/watt (5680)
11-23-2011, 10:55 AM
savantu

The comparison simply shows how FMA can double your FP throughput. FYI, Intel claims AVX enabled 8 core SB Xeons will get 2.1x improvement in Linpack over current high end Xeons. That would mean 300 GFLOPs, completely changing the situation.
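
A quick sanity check of that 300 GFLOPs figure, as a minimal sketch. It assumes the 144 GFLOPs Xeon 5680 LINPACK number quoted earlier in the thread is a fair stand-in for the "current high end Xeons" baseline, which Intel's claim does not spell out; the variable names are illustrative.

```python
# Baseline: Xeon 5680 LINPACK result quoted earlier in the thread
# (assumed here to represent "current high end Xeons").
xeon_5680_gflops = 144.0

# Intel's claimed LINPACK improvement for AVX-enabled 8-core SB Xeons.
claimed_speedup = 2.1

projected_sb_gflops = xeon_5680_gflops * claimed_speedup
print(f"Projected 8-core SB Xeon: ~{projected_sb_gflops:.0f} GFLOPs")
```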
11-24-2011, 12:46 AM
flyck

Quote:

Originally Posted by savantu

The comparison simply shows how FMA can double your FP throughput. FYI, Intel claims AVX enabled 8 core SB Xeons will get 2.1x improvement in Linpack over current high end Xeons. That would mean 300 GFLOPs, completely changing the situation.

And if you take AMD's top model, Intel will have a ~30 GFLOPs advantage in Linpack when both use optimized compilers. That indeed changes the situation from 100 GFLOPs slower to 30 GFLOPs faster.
11-24-2011, 04:09 AM
informal

Quote:

Originally Posted by flyck

And if you take AMD's top model, Intel will have a ~30 GFLOPs advantage in Linpack when both use optimized compilers. That indeed changes the situation from 100 GFLOPs slower to 30 GFLOPs faster.

Never mind the fact that 6282SE will not be the top model forever. Whenever Intel launches the new 8C SB-E that scores 300 GFLOPs in Linpack, AMD will be refreshing their lineup by that time. We can expect a 2.8GHz stock model, so it's roughly 2.8/2.3=1.21 or 21% faster than what the 6276 gets in Linpack (or around 289 GFLOPs). This is just a tad (~3%) behind Intel's projected performance with AVX enabled on their highest(?) end model. Price difference will be huge between the two chips though.
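
The projection above can be reproduced with a short sketch. It assumes Linpack scales perfectly linearly with clock speed (optimistic, since memory bandwidth does not scale with core clock), takes the upper 239 GFLOPs figure quoted earlier for the 6276, and treats the 300 GFLOPs Intel number as given. The function and variable names are illustrative.

```python
def scale_by_clock(score_gflops, base_ghz, target_ghz):
    """Project a score to a new clock, assuming perfect linear scaling."""
    return score_gflops * target_ghz / base_ghz


opteron_6276_gflops = 239.0   # upper LINPACK figure quoted earlier in the thread
intel_projected_gflops = 300.0  # projected 8C SB-E figure from the discussion

# Hypothetical 2.8GHz model scaled from the 2.3GHz Opteron 6276.
projected = scale_by_clock(opteron_6276_gflops, base_ghz=2.3, target_ghz=2.8)

# Fraction by which the projected Opteron trails the projected Intel part.
gap = 1 - projected / intel_projected_gflops

print(f"Projected 2.8GHz Opteron: ~{projected:.0f} GFLOPs")
print(f"Gap to projected Intel part: ~{gap:.1%}")
```

Without rounding the clock ratio to 1.21 the projection lands slightly above 289 GFLOPs, and the gap stays at roughly 3%.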
11-24-2011, 05:37 AM
savantu

Quote:

Originally Posted by informal

Never mind the fact that 6282SE will not be the top model forever. Whenever Intel launches the new 8C SB-E that scores 300 GFLOPs in Linpack, AMD will be refreshing their lineup by that time.

You assume the process will improve significantly in 2-3 months. The 6282SE is a 140W chip, pumping the stock frequency another 200MHz could be an issue without a new stepping.

Quote:

We can expect a 2.8GHz stock model, so it's roughly 2.8/2.3=1.21 or 21% faster than what the 6276 gets in Linpack (or around 289 GFLOPs). This is just a tad (~3%) behind Intel's projected performance with AVX enabled on their highest(?) end model. Price difference will be huge between the two chips though.

Intel was never top dog in Linpack. MC pushed a lot more GFLOPs at a significantly lower $/GFLOPs. Looking at the HPC wins, I'd say price is less of a factor than assumed, otherwise Xeon wouldn't dominate. It would be interesting to see how 16 (assuming 2P nodes) really fat SB cores will do compared with 32 skinnier BD cores in HPC codes (except Linpack, which is best case for both).
11-24-2011, 05:47 AM
informal

Quote:

Originally Posted by savantu

You assume the process will improve significantly in 2-3 months. The 6282SE is a 140W chip, pumping the stock frequency another 200MHz could be an issue without a new stepping.

Intel was never top dog in Linpack. MC pushed a lot more GFLOPs at a significantly lower $/GFLOPs. Looking at the HPC wins, I'd say price is less of a factor than assumed, otherwise Xeon wouldn't dominate. It would be interesting to see how 16 (assuming 2P nodes) really fat SB cores will do compared with 32 skinnier BD cores in HPC codes (except Linpack, which is best case for both).

Well, the guy who knows about GloFo stuff (rich_wargo @ SA forum) hints at an improved process node in Q1. So maybe they will fix the yield and clock/power issues that obviously plague both Llano and Bulldozer. They managed to launch a 16C/8M 2.6GHz chip within the max TDP bracket on G34, on this crappy process. So I expect another speed bump in Q1. 100MHz is too low for a speed bump, so the next step is 2.8GHz. This chip would put AMD in a good position in SPEC rate tests (both integer and FP throughput). It would be a good duel to watch in HPC workloads: 4P 8C SB-EP @ 3GHz @ 150W vs 2.8GHz 8M/16C Opteron @ 140W.
11-24-2011, 06:15 AM
savantu

Quote:

Originally Posted by informal

Well, the guy who knows about GloFo stuff (rich_wargo @ SA forum) hints at an improved process node in Q1. So maybe they will fix the yield and clock/power issues that obviously plague both Llano and Bulldozer. They managed to launch a 16C/8M 2.6GHz chip within the max TDP bracket on G34, on this crappy process. So I expect another speed bump in Q1. 100MHz is too low for a speed bump, so the next step is 2.8GHz. This chip would put AMD in a good position in SPEC rate tests (both integer and FP throughput). It would be a good duel to watch in HPC workloads: 4P 8C SB-EP @ 3GHz @ 150W vs 2.8GHz 8M/16C Opteron @ 140W.

C'mon, rich knows nada. And I doubt the process is solely to blame. BD is massive and its high speed nature could mean it's just like Prescott reloaded: no matter how good the process is/was, it can't make BD/Prescott shine. Intel's 90nm was outstanding by any metric and Dothan fully showed that. However, that couldn't save Prescott's bacon. I have the impression something similar is going on here: the process is reasonably OK, yields are poorer than planned due to intrinsic things like gate first, BUT BD and Llano aren't first class engineering jobs.

And with the relationship getting really sour, GF probably doesn't give a damn about AMD's issues with 32nm and is simply waiting for the pay-only-good-die deal to end. GF is taking huge losses and part of the blame is the design, which they have no influence over.

And their other customers care more about 28nm bulk than 32nm SOI HKMG. Last yield figures put 28nm at 1-2 good dies per wafer. They must be dancing in the aisles at GF.

Edit : just found something to reinforce my point that the process is acceptable :

Quote:

Meanwhile, Globalfoundries said it would not comment on its customer's foundry selection process or on their products unless they did so first. The spokesman also said problems with Llano had been specific to that product and that yields for AMD's 32/28nm Bulldozer products were on target and not affecting AMD's ability to meet customer commitments.

“We are still the only foundry producing HKMG products that can be purchased in stores now,” the Globalfoundries spokesman said, noting that the fab expected to ship “far more” HKMG volume in 2011 than all other foundries combined.

http://www.eetimes.com/electronics-n...benefits-TSMC-
11-24-2011, 07:46 AM
SEA

Quote:

Originally Posted by savantu

[.... and that yields for AMD's 32/28nm Bulldozer products were on target and not affecting AMD's ability to meet customer commitments]

That statement is about yield only, i.e. the % of good chips.
Consider this: chips with very different power draw can be delivered at the same yield.
11-24-2011, 10:30 AM
LightSpeed

Quote:

Originally Posted by savantu

I have the impression something similar is going on here: the process is reasonably OK, yields are poorer than planned due to intrinsic things like gate first, BUT BD and Llano aren't first class engineering jobs.

There are some serious issues with the process. Anand mentioned it along with a few other informed people. Yes, maybe the engineering has some issues as well since they have tried something very different so hopefully it will be fixed in later revisions. But the process definitely is not reasonably ok.
11-26-2011, 09:37 AM
Dresdenboy

Quote:

Originally Posted by mAJORD

Guys 2B never made sense in the first place when you did the rough sums, 1.2B sounds closer but too little IMO:

these figures may be slightly out, but close enough to get an idea how wrong 2B sounds.

4 Core Deneb:
6MB L3 cache: 458M
2MB L2: 152M
4 cores: 140M
CPU-NB+misc: ~8M

Total: 758M

6 Core Thuban:
6MB L3 cache: 458M
3MB L2: 228M
6 cores: 210M
CPU-NB+misc: ~8M

Total: 904M

4 Module Bulldozer:

Module transistor count based on AMD's pre release slide stating 268M Transistors for 1 module including 2MB cache

8MB L3 Cache: ~610M
8MB L2 Cache: ~610M
4 Modules: ~240M (at ~60M each)
CPU-NB+misc: ~8M

Total: ~1.46B

Each module with 2MB L2 has 213M transistors according to AMD's ISSCC papers.
11-26-2011, 09:40 AM
freeloader

Quote:

Originally Posted by savantu

C'mon, rich knows nada. And I doubt the process is solely to blame. BD is massive and its high speed nature could mean it's just like Prescott reloaded: no matter how good the process is/was, it can't make BD/Prescott shine. Intel's 90nm was outstanding by any metric and Dothan fully showed that. However, that couldn't save Prescott's bacon. I have the impression something similar is going on here: the process is reasonably OK, yields are poorer than planned due to intrinsic things like gate first, BUT BD and Llano aren't first class engineering jobs.

And with the relationship getting really sour, GF probably doesn't give a damn about AMD's issues with 32nm and is simply waiting for the pay-only-good-die deal to end. GF is taking huge losses and part of the blame is the design, which they have no influence over.

And their other customers care more about 28nm bulk than 32nm SOI HKMG. Last yield figures put 28nm at 1-2 good dies per wafer. They must be dancing in the aisles at GF.

Edit : just found something to reinforce my point that the process is acceptable :

http://www.eetimes.com/electronics-n...benefits-TSMC-

Jesus Christ, the world is coming to an end when I agree with Savantu. :)
11-26-2011, 11:49 PM
Dresdenboy

The Netburst based Prescott design used a lot of high speed dynamic logic, which is not only faster (as required for an aggressive 8 FO4 (IIRC) frequency goal) but uses much more power and more transistors. BD is a static CMOS design using faster logic styles for single speed paths.

A look into the BD/Llano ISSCC papers (incl. the L3 shmoo plot) should indicate how they expected the designs to behave using the 32nm process.
11-27-2011, 02:45 AM
Dresdenboy

Quote:

Originally Posted by STaRGaZeR

I know you like to suppose a lot, but the official figures are ~2B transistors for the die and this is pure BS. That or AMD's PR department just hit another level of mediocrity. 1.2B on a process that is known to be more dense than the competition? ;);)

There are many areas on the die which seem to be empty and might just contain wires and repeaters.

And as already said, there are different types of transistors w/ different specs and sizes.

IIRC Llano contains 1B T.

AMD also works with macro blocks containing specific logic circuits. These might cause slightly less efficient placement while being size optimized in themselves.

12-02-2011, 02:17 AM
informal

Quote:

Originally Posted by Dresdenboy

There are many areas on the die which seem to be empty and might just contain wires and repeaters.

And as already said, there are different types of transistors w/ different specs and sizes.

IIRC Llano contains 1B T.

AMD also works with macro blocks containing specific logic circuits. These might cause slightly less efficient placement while being size optimized in themselves.


It's official now. AMD contacted AT:
http://www.anandtech.com/show/5176/a...unt-12b-not-2b

Quote:

This is a bit unusual. I got an email from AMD PR this week asking me to correct the Bulldozer transistor count in our Sandy Bridge E review. The incorrect number, provided to me (and other reviewers) by AMD PR around 3 months ago was 2 billion transistors. The actual transistor count for Bulldozer is apparently 1.2 billion transistors...
12-02-2011, 02:38 AM
STaRGaZeR

Quote:

Originally Posted by informal

It's official now. AMD contacted AT:
http://www.anandtech.com/show/5176/a...unt-12b-not-2b

Wasn't it official 2 weeks ago? ;)

Quote:

Originally Posted by informal

Official number has been corrected now, it's 1.2B and die size is 315mm^2.
12-02-2011, 02:55 AM
informal

Quote:

Originally Posted by STaRGaZeR

Wasn't it official 2 weeks ago? ;)

Well it kinda was but the website that claimed they were contacted by AMD never really posted what AMD said. Apparently AMD contacted several websites and AT was the only one to post anything substantial.

The funny thing is we still didn't get the explanation about the 2B figure...
