AMD's Bobcat and Bulldozer

**Opteron146** · 08-31-2010, 01:32 PM

Originally Posted by MTd2

JF, so each BD is faster clock per clock than the Phenom cores?

Ok, I assume JF will be too frustrated to answer this, so I will do him a favour and answer:
Yes, of course ...

If you want to hear it from JF directly, read "informal's" signature ;-) ;-)

**Olivon** · 08-31-2010, 01:46 PM

AMD to Test Upcoming Bobcat Processors in Servers

"We're definitely in the process of examining this as a design point," said Donald Newell, AMD's new server chief technology officer, in an interview. "It would be foolish not to."

"There's only a few papers ... and there's a lot more data to collect," Newell said. "It really depends on a number of factors ... to whether or not that's a good design point."

"It's hard for Arm to move up in the server world, like x86 would be to move down to dishwashers," Newell said.

AMD also is looking to mold graphics processors and separate accelerator units into its server offerings. Right now GPUs and accelerators are designed for specialist computing needs, but the company wants to build chips where all the architectural elements flawlessly work together, Newell said.

**blindbox** · 08-31-2010, 01:49 PM

Originally Posted by Olivon

AMD to Test Upcoming Bobcat Processors in Servers

I think someone took Ars Technica's advice and tested it anyway.

**informal** · 08-31-2010, 01:53 PM

"We're definitely in the process of examining this as a design point" is not the same as actually testing Bobcat in a server environment. Today's journalists really like to twist words and jump to (wrong) conclusions. Before they get to actual testing, they have to see whether it makes sense in the first place.

Originally Posted by Opteron146

Yes, you are right, but I never said anything against that point ;-)
One note on that, because I read it earlier: the AGU results are not retired. They go immediately into the LD/STR units, so the waiting µOp can get its memory data ;-) Later, once that µOp's calculation is finished, the µOp is retired.
So, in short, the retire/ExU ratio is 1:2 for both, not 1:3. For K10 it's 3:6 and for BD it's 4:8.

That's correct.

BTW, I'm sure AMD at least investigated the other ALU/AGU possibilities and came out with the most efficient one. Wasted resources and power, or diminishing returns, are not what they would want from a design like Bulldozer, especially with the clock targets they have in mind.

**JF-AMD** · 08-31-2010, 01:56 PM

I have debunked this in several places. We are NOT "testing" bobcat in servers.

We are looking at the market to determine whether there is a place for it. It would be irresponsible not to consider every piece of silicon and IP that we have access to. But, as Bobcat is defined today, it does not meet the needs of the server market, just as Atom and ARM are coming up short as well. When you can get six cores @ 35W TDP in an Opteron 4000, why would you want to build more servers and have more physical hardware? The folks looking at really low-power environments are looking at embedded, or they are looking to reduce management and power costs. 12 cores @ 35W/CPU in a single server makes a lot more sense than 6 low-power (and low-performance) dual-core 1P servers. When you talk to the big cloud guys, core density is critical because that means fewer systems to manage.
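The core-density argument is simple arithmetic: how many physical boxes does it take to deliver the same number of cores? A minimal sketch, assuming the "12 cores @ 35W/CPU in a single server" means a 2P server with two 6-core Opteron 4000 parts, and comparing it with 1P servers built on a dual-core low-power chip. The function `servers_needed` is hypothetical and only illustrates the count.

```python
import math


def servers_needed(total_cores: int, sockets_per_server: int, cores_per_socket: int) -> int:
    """Physical servers needed to host `total_cores` cores (rounded up)."""
    cores_per_server = sockets_per_server * cores_per_socket
    return math.ceil(total_cores / cores_per_server)


# Assumed: a 2P box with two 6-core, 35 W Opteron 4000 CPUs -> 12 cores per server
dense = servers_needed(12, sockets_per_server=2, cores_per_socket=6)
# Assumed: 1P boxes built on a low-power dual-core chip -> 2 cores per server
sparse = servers_needed(12, sockets_per_server=1, cores_per_socket=2)

print(f"12 cores: {dense} dense server(s) vs {sparse} low-power 1P servers")
```

Every extra box is another system to patch, monitor and power, which is why the post weighs density so heavily.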

**qcmadness** · 08-31-2010, 02:04 PM

Originally Posted by JF-AMD

I have debunked this in several places. We are NOT "testing" bobcat in servers. ...

I would expect Bobcat / Ontario will only prevail in HTPC / Set-top box markets.

**Opteron146** · 08-31-2010, 03:16 PM

Anybody who wants to speculate about clock rates?

Just remembered IBM's 4.25 GHz POWER7 8-core chip with 4-way SMT. That is on 45nm.

So far I thought 5 GHz for BD was fanboy dreaming, but compared to that monstrous 45nm chip it now seems rather reasonable that a smaller BD die produced on 32nm with high-k metal gates should be able to achieve that.

What do you think? Is it OK to speculate on x86 clocks by comparing them to POWER / RISC numbers?

@informal:
I agree totally ;-)

Thanks

**AliG** · 08-31-2010, 03:32 PM

Originally Posted by Opteron146

Anybody who wants to speculate about clock rates? ...

So far I thought 5 GHz for BD is fanboy dreaming ...

I have a hard time believing 5 GHz stock, as that's just never been done before, as far as I can recall. However, Intel's Sandy Bridge lineup covers 2.5-3.4 GHz, and assuming that they will have an IPC advantage, AMD may end up covering 3-4 GHz (pre-turbo numbers on both sides).

Overclocking BD should be fun if it is truly a high-frequency design. Even though NetBurst CPUs are just about worthless in terms of performance, they are still some of the most fun to mess with. AMD could perhaps combine the best of both worlds, giving BD more IPC than K10.5 while still letting it clock like the P4s (that would be a major win among enthusiasts now that Intel is locking the FSB).

**informal** · 08-31-2010, 07:34 PM

Originally Posted by Opteron146

Anybody who wants to speculate about clock rates? ...

I will give it a try

@95W we have 6 cores on 45nm working at 2.8GHz. If BD were done on the same node, I guess, with the targeted 20% clock-speed increase due to pipeline changes, we could have 2.8x1.2=3.36, or rounded up, 3.4GHz. BUT it will go to 32nm high-k/metal gate instead. I would still pick the same clock and power draw values just to be conservative (let's disregard the 45->32nm node improvement, since we have 33% more cores). That's a 4-module part. Now, if we count in a 10-15% IPC improvement (pick the average, 12.5%) and 33% more cores, and finally divide by 1.1 (10%) for the "performance hit" in fully loaded modules, in multithreaded workloads we get performance equivalent to a 4.65GHz X6 Thuban. This is with no Turbo over stock.
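The multithreaded estimate above chains four speculative factors. A small sketch that reproduces it, using only the post's own assumptions (20% pipeline clock uplift, 12.5% IPC, 8 vs 6 cores, a 10% shared-module penalty); none of these are AMD figures. Computed without intermediate rounding it lands at about 4.64 GHz, which the post rounds to 4.65 GHz.

```python
# Inputs are the speculative figures from the post above, not AMD data.
THUBAN_CLOCK_GHZ = 2.8   # 95 W, 6-core 45 nm Thuban
PIPELINE_GAIN = 1.20     # targeted ~20% clock uplift from pipeline changes
IPC_GAIN = 1.125         # midpoint of the 10-15% IPC guess
MODULE_PENALTY = 1.10    # ~10% hit when both cores of every module are busy

bd_clock = round(THUBAN_CLOCK_GHZ * PIPELINE_GAIN, 1)  # 3.36 -> rounded up to 3.4 GHz
core_ratio = 8 / 6                                     # 4 modules = 8 cores vs 6 Thuban cores
mt_equiv = bd_clock * IPC_GAIN * core_ratio / MODULE_PENALTY

print(f"{bd_clock:.1f} GHz 4-module BD ~ {mt_equiv:.2f} GHz X6 Thuban (multithreaded)")
```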

Now, with the new Turbo (at least half of the cores idle, as with Thuban's Turbo conditions), I would expect a ~20-30% clock increase; take 25% as the middle. We get 3.4x1.25=4.25GHz in poorly threaded or single-threaded applications. Now add the speculated 10-15% IPC jump (pick 12.5% as the arithmetic mean) to get the equivalent Thuban-class core clock: 4.25x1.125≈4.8GHz Thuban in single-threaded workloads (no 10% hit here). If the power gating happens in a way that 2 modules are gated, we have the 10% hit due to core scaling in modules: 4.8/1.1=4.36GHz Thuban-class core speed in poorly threaded workloads (1 < number of active threads <= 4).
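The Turbo case follows the same pattern. This sketch reproduces it under the post's assumptions (3.4 GHz base, 25% Turbo, 12.5% IPC, 10% module penalty when two modules are gated and the active threads share the remaining two). Like the post, it rounds the single-threaded figure to 4.8 GHz before applying the module penalty.

```python
# Speculative inputs from the post above (not AMD data).
BD_BASE_GHZ = 3.4        # base clock from the multithreaded estimate
TURBO_GAIN = 1.25        # midpoint of the ~20-30% Turbo guess
IPC_GAIN = 1.125         # midpoint of the 10-15% IPC guess
MODULE_PENALTY = 1.10    # ~10% hit when both cores in a module are active

turbo_clock = BD_BASE_GHZ * TURBO_GAIN            # 4.25 GHz
st_equiv = round(turbo_clock * IPC_GAIN, 1)       # 4.78 -> rounded to 4.8 GHz, as in the post
poorly_threaded = st_equiv / MODULE_PENALTY       # 2 modules gated, 2-4 threads share modules

print(f"Turbo {turbo_clock:.2f} GHz")
print(f"single-threaded ~ {st_equiv:.1f} GHz Thuban")
print(f"poorly threaded ~ {poorly_threaded:.2f} GHz Thuban")
```

Without the intermediate rounding the poorly threaded figure would be about 4.35 GHz rather than 4.36 GHz; the difference is well inside the uncertainty of the inputs.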

So to sum it up, I expect a 95W 3.4GHz "X8" Bulldozer model with 4.25GHz effective Turbo and a 10-15% (pick 12.5%) IPC jump. This would be equal to a:
- 4.8GHz Thuban in purely single-threaded workloads,
- 4.36GHz Thuban-class core in poorly threaded workloads, and
- 4.65GHz X6 Thuban in multithreaded workloads.

In the 125W range I would expect 3.6 and 3.8GHz models, and, if they really want to push the limit, a 4GHz 125W model. Turbo would be smaller percentage-wise, and similar or slightly lower frequency-wise, than in the earlier example. So effectively just add 0.2, 0.4 and 0.6GHz on top of the three "equivalent Thuban" numbers above and you will have a projection of how these three 125 or 140W parts could perform (the top model, the hypothetical 125/140W 4GHz one, could easily be equivalent to a 4.7-5.4GHz Thuban-class core, depending on the workload).
Enough of xtreme speculation from me

**MTd2** · 08-31-2010, 07:46 PM

Assuming that the "50% more performance with 33% more cores" refers to IPC, BD has a 12.5% increase in IPC relative to K10h. Estimate that the area of a module on 32nm is the same as that of a core of the previous generation, and that the power envelope (just of the core now, not the whole chip) is the same for the same area as in the previous generation. If we have a 20% higher frequency for the same power envelope, we've got a 35% increase for the same thermal envelope.
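The two percentages in that paragraph follow from dividing out the core count and then multiplying in the clock. A minimal sketch with exact fractions, taking the post's premise that the slide's gain is pure per-core throughput and that the 20% clock uplift comes at equal power (both are the post's assumptions, not AMD statements):

```python
from fractions import Fraction

perf_gain = Fraction(3, 2)    # "50% more performance"
core_gain = Fraction(4, 3)    # "33% more cores"
clock_gain = Fraction(6, 5)   # assumed 20% higher clock at the same power

ipc_gain = perf_gain / core_gain           # 9/8  -> +12.5% per core per clock
per_core_gain = ipc_gain * clock_gain      # 27/20 -> +35% per core at equal power

print(f"IPC: +{float(ipc_gain - 1):.1%}, per core at equal power: +{float(per_core_gain - 1):.0%}")
```

Using `Fraction` keeps 1.5 / (4/3) exactly 9/8, so the 12.5% is not an artifact of rounding 33% down from 33.3%.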

Each module is about 30mm^2, so the total will be 120mm^2 for 4 modules. Add some 8MB of L3 cache, say 60mm^2, and we have 180mm^2. The previous generation put out 125W at 3.4GHz, so this one will be 4.1GHz at 95W, with turbo at 5GHz.
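The die-area and base-clock numbers above can be checked in a few lines. A sketch under the post's own guesses (30 mm^2 per module, ~60 mm^2 for 8 MB of L3, 20% clock uplift over a 3.4 GHz previous-generation part); the 125 W to 95 W power step rests on the post's equal-power-per-area assumption and is not modelled here.

```python
MODULE_AREA_MM2 = 30     # per-module estimate from the post
MODULES = 4
L3_AREA_MM2 = 60         # rough guess for ~8 MB of L3
die_area = MODULES * MODULE_AREA_MM2 + L3_AREA_MM2   # 180 mm^2

PREV_CLOCK_GHZ = 3.4     # previous-generation part used as the baseline
CLOCK_GAIN = 1.2         # assumed 20% higher clock in the same power envelope
bd_clock = PREV_CLOCK_GHZ * CLOCK_GAIN               # 4.08 -> ~4.1 GHz

print(f"die ~{die_area} mm^2, base clock ~{bd_clock:.1f} GHz")
```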

Let's see performance-wise. For 4 modules/8 cores, the performance of a Bulldozer will be 70% higher than a 3.4GHz PhII, while consuming 30% less.

The IPC of SB is 50% higher than PhII's, but SB won't clock as high as BD. At 95W, a 4-core SB will be 3.3GHz. So, at this power envelope, BD will be 20% faster than SB, with about the same die area as SB, or slightly smaller.

Of course, Intel will release an 8-core SB, but its die area should be around 320mm^2, and there's no way that at 3.3GHz its power consumption will be lower than 150W. For servers, Intel must counter with at least a 10-core, absolute minimum.

So, you see, BD will be a competitor for Ivy Bridge, not Sandy Bridge.

**tifosi** · 08-31-2010, 08:37 PM

Originally Posted by AliG

... I honestly don't think it will be enough to compete with Sandy Bridge.

At least from my perspective, it seems to me that AMD is done challenging for the top enthusiast performance spot. They seem to have shifted in a new direction, trying to offer the most performance per dollar, especially over the long run when you consider electricity bills. That's quite reasonable, as Intel has spent far more money on their fabrication process, and thus has denser, faster caches, which seriously help in applications like Super Pi.

As you rightly said, Intel chippery has faster and denser caches, which help in most desktop situations. AMD will be good, but beat Intel? Not unless some multi-threading is thrown into the picture...

Then again, this is off topic, but you've got to look beyond architecture to see whether the compilers involved in creating software are any bother, and how much. As far as I'm aware, the latest Intel compilers do not allow AVX to work on any chip other than a "GenuineIntel" one. This would shut up fanboys from both sides :P
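To make the concern concrete: a runtime dispatcher can gate its fast path on the CPU's feature flags alone, or additionally on the CPUID vendor string. The toy function below is hypothetical and does not describe how any real compiler's dispatcher is written; it only contrasts the two policies. "GenuineIntel" and "AuthenticAMD" are the vendor strings Intel and AMD CPUs report via CPUID.

```python
def pick_code_path(vendor: str, has_avx: bool, vendor_gated: bool) -> str:
    """Toy CPU dispatcher choosing between an AVX path and a baseline path.

    vendor_gated=True also requires the vendor string "GenuineIntel";
    vendor_gated=False looks only at the AVX feature flag.
    """
    if has_avx and (not vendor_gated or vendor == "GenuineIntel"):
        return "avx"
    return "baseline"


# An AVX-capable AMD chip reports "AuthenticAMD"
print(pick_code_path("AuthenticAMD", has_avx=True, vendor_gated=True))   # baseline
print(pick_code_path("AuthenticAMD", has_avx=True, vendor_gated=False))  # avx
```

Under a vendor-gated policy, an AVX-capable non-Intel chip falls back to the slower path, which is the effect the post is worried about when comparing benchmark results.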

Originally Posted by MTd2

... The IPC of a SB is 50% higher than PhII, but won't clock as high as a BD. At 95W, a 4core will be 3.3GHz. So, we have that at this power envelope, BD will be 20% faster than SB, with about the same die area of a SB, or slightly smaller.

Of course, Intel will release a 8 core SB, but its die area should be around 320mm^2, and no way that at 3.3GHz the power consumption will be lower than 150W. For servers, Intel must counter at least with a 10 core, absolute minimum.

So, you see, BD will be a competitor for Ivy Bridge, not Sandy Bridge.

Yes. In the server arena, which is the most lucrative for both Intel and AMD, BD will help AMD gain a competitive edge. People aren't using MC much yet, as the big OEM partners have yet to come out with servers that make the best of MC. However, will they be able to resist BD? Actually, Eagleton could not come soon enough for Intel, but as far as I've learned, it will be based on Sandy Bridge and not Ivy Bridge (please correct me if I'm wrong on this bit). What I'm saying is, BD is going to be a major win for AMD in the server arena, which is where they'll make most of their money.

AMD is keeping quiet because the product is two or more quarters away, and revealing more would give Intel enough time to out-maneuver them in the market on factors like price, limiting AMD's options. Hence my "Intel employee" theory when people want to know more than AMD has already offered :P

**JumpingJack** · 08-31-2010, 09:07 PM

Originally Posted by MTd2

Assuming that the 50% more performance with 33% more cores refers to IPC, ... So, you see, BD will be a competitor for Ivy Bridge, not Sandy Bridge.

That was quite a bit of work

**savantu** · 08-31-2010, 09:44 PM

Originally Posted by MTd2

Assuming that the 50% more performance with 33% more cores referes to IPC, we have that BD´s, we have a 12,5% increase in IPC in relation to K10h. ...

Or not. Add in the calculation a 20% frequency increase and see what gain there is.
BD should be 20% higher frequency from the process change alone, irrespective of uarch changes to facilitate higher clocks.

**blindbox** · 08-31-2010, 10:21 PM

Originally Posted by savantu

Or not. Add in the calculation a 20% frequency increase and see what gain there is.
BD should be 20% higher frequency from the process change alone, irrespective of uarch changes to facilitate higher clocks.

20% higher clocks, 33% more cores is a little bit too much for the TDP, don't you think? Unless you insist IPC isn't better than K10 despite how many times JF-AMD says it.

Well anyway MTd2, they're server workloads.

**kl0012** · 08-31-2010, 10:34 PM

Originally Posted by MTd2

Assuming that the 50% more performance with 33% more cores refers to IPC, ... So, you see, BD will be a competitor for Ivy Bridge, not Sandy Bridge.

Right.

**-Boris-** · 08-31-2010, 11:17 PM

Seems like we all know what we need to know then.

**Florinmocanu** · 09-01-2010, 12:01 AM

Originally Posted by blindbox

20% higher clocks, 33% more cores is a little bit too much for the TDP, don't you think? Unless you insist IPC isn't better than K10 despite how many times JF-AMD says it.

Well anyway MTd2, they're server workloads.

AMD did a 50% core increase at the same frequency on the 45nm node without increasing TDP; in fact, they lowered it. So don't hold your breath yet.

**savantu** · 09-01-2010, 12:48 AM

Originally Posted by blindbox

20% higher clocks, 33% more cores is a little bit too much for the TDP, don't you think? Unless you insist IPC isn't better than K10 despite how many times JF-AMD says it.

Well anyway MTd2, they're server workloads.

In this very thread, people are discussing frequencies of 3.3-4 GHz for BD, which is significantly higher than MC (max 2.3 GHz).

The 50% more performance, 33% more cores applies versus Magny Cours. You also need to factor in frequency since this was the unknown part in the AMD slide.
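The point about frequency can be quantified. If the slide's 50% gain comes from a part with 33% more cores (assumed here to be 16 BD cores vs 12-core Magny-Cours) running at some unknown clock, then whatever is left after dividing out cores and clock is the per-core, per-clock gain. The sketch assumes throughput scales linearly with both core count and clock, which real server workloads only approximate; `implied_per_clock_gain` is a hypothetical helper.

```python
MC_CLOCK_GHZ = 2.3   # top Magny-Cours clock cited above


def implied_per_clock_gain(bd_clock_ghz: float, perf_gain: float = 1.5,
                           core_gain: float = 16 / 12) -> float:
    """Per-core, per-clock gain left once core count and clock are factored out.

    Assumes throughput scales linearly with core count and clock speed.
    """
    return perf_gain / (core_gain * bd_clock_ghz / MC_CLOCK_GHZ)


for clock in (2.3, 2.6):
    print(f"{clock} GHz -> {implied_per_clock_gain(clock):.3f}x per core per clock")
```

At the same 2.3 GHz the slide would imply the 12.5% per-clock gain discussed earlier; at 2.6 GHz the implied per-clock gain drops to roughly zero. Without knowing the clock, the slide alone cannot pin down IPC.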

**Mechanical Man** · 09-01-2010, 01:13 AM

Originally Posted by savantu

In this very thread people discuss frequencies of 3.3-4 GHz for BD which is significantly higher than MC ( max 2.3GHz ).

The 50% more performance, 33% more cores applies versus Magny Cours. You also need to factor in frequency since this was the unknown part in the AMD slide.

MC is the 12-core variant. People were discussing >3.4GHz for the 4-module (8-core) Bulldozer variant, not the 16-core one that would have 33% more cores than MC.

**savantu** · 09-01-2010, 01:29 AM

Originally Posted by Mechanical Man

MC is the 12-core variant. People were discussing >3.4GHz for the 4-module (8-core) Bulldozer variant, not the 16-core one that would have 33% more cores than MC.

And what frequency do you expect the 16 core variant to reach at launch ?

**madcho** · 09-01-2010, 01:59 AM

Originally Posted by savantu

And what frequency do you expect the 16 core variant to reach at launch ?

I think the top frequency in G34 will be higher than 2.3GHz; I would say 20% higher than the 2.2GHz 95W ACP part, so around 2.6GHz.

But I don't think an 8-module desktop part is in the fabs.

**Sn0wm@n** · 09-01-2010, 02:17 AM

The 8-core version will most likely be at 3.0 and maybe 3.2GHz... 6 cores at 3.4 and higher... and so on.

**Mechanical Man** · 09-01-2010, 02:28 AM

Originally Posted by savantu

And what frequency do you expect the 16 core variant to reach at launch ?

<3GHz, maybe 2.6GHz. Also, I expect the IPC gain to be at least 10% in integer code, and more in float when running a "normal" application, one that does not consist only of floating-point calculations. That kind of code would be better run on GPUs anyway.

**MTd2** · 09-01-2010, 02:45 AM

Originally Posted by Sn0wm@n

8 cores version will most likely be @ 3.0 & 3.2 maybe .... 6 cores 3.4 and higher ... and on and on

I considered that the clock advantage would be related to the core area, which is really small in BD, such that 2 cores have about the same die space as a PhII core. So, the clock advantage is even higher from the point of view of a number of cores.

So, if the clocks are as low as you say, BD is more or less tied with, or a bit lower than, SB in perf per watt.

A 16-core, matching MC, would run at 2.6 GHz by my reasoning.

**Sn0wm@n** · 09-01-2010, 02:58 AM

On the MCM part they are still bound by TDP... now glue 2 of those together and you need to lower your clocks considerably to stay within the desired TDP, so I don't think you will see clocks higher than 2.3 for the 16-core version...

Maybe 2.5 for the 12-core MCM part on 32nm... and higher the fewer cores they have.
