AMD to Test Upcoming Bobcat Processors in Servers
"We're definitely in the process of examining this as a design point," said Donald Newell, AMD's new server chief technology officer, in an interview. "It would be foolish not to."
"There's only a few papers ... and there's a lot more data to collect," Newell said. "It really depends on a number of factors ... to whether or not that's a good design point."
"It's hard for Arm to move up in the server world, like x86 would be to move down to dishwashers," Newell said.
AMD also is looking to mold graphics processors and separate accelerator units into its server offerings. Right now GPUs and accelerators are designed for specialist computing needs, but the company wants to build chips where all the architectural elements flawlessly work together, Newell said.
"We're definitely in the process of examining this as a design point" is not the same as actually testing Bobcat in a server environment. Today's journalists really like to twist words and jump to (wrong) conclusions. To get to actual testing, they first have to see whether it makes sense at all.
That's correct. BTW, I'm sure AMD at least investigated the other ALU/AGU possibilities and came out with the most efficient one. Wasted resources and power, or diminishing returns, are not what they would want from a design like Bulldozer, especially with the clock targets they have in mind.
I have debunked this in several places. We are NOT "testing" Bobcat in servers.
We are looking at the market to determine whether there is a place for it. It would be irresponsible not to consider every piece of silicon and IP that we have access to. But, as Bobcat is defined today, it does not meet the needs of the server market, just as Atom and ARM are coming up short as well. When you can get six cores @ 35W TDP in an Opteron 4000, why would you want to build more servers and have more physical hardware? The folks looking at really low-power environments are looking at embedded, or they are looking to reduce management and power costs. 12 cores @ 35W/CPU in a single server makes a lot more sense than 6 low-power (and low-performance) dual-core 1P servers. When you talk to the big cloud guys, core density is critical because that means fewer systems to manage.
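To put numbers on the core density point, here is a minimal sketch that counts how many physical systems it takes to reach a given core count. The function name is made up for illustration, and the two-socket configuration for the Opteron 4000 case is inferred from "12 cores @ 35W/CPU" with six cores per CPU; it is not stated outright in the post.

```python
import math

def servers_needed(total_cores, cores_per_socket, sockets_per_server):
    """Number of physical servers required to provide total_cores."""
    cores_per_server = cores_per_socket * sockets_per_server
    return math.ceil(total_cores / cores_per_server)

# Assumed 2P box with six-core Opteron 4000 parts (inferred from the post).
opteron_boxes = servers_needed(12, cores_per_socket=6, sockets_per_server=2)

# Low-power dual-core 1P servers, as described in the post.
low_power_boxes = servers_needed(12, cores_per_socket=2, sockets_per_server=1)

print(opteron_boxes, low_power_boxes)
```

Under those assumptions the same 12 cores take one system in the first case and six in the second, which is the "fewer systems to manage" argument.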
Does anybody want to speculate about clock rates?
I just remembered IBM's 4.25 GHz p7 8-core chip with 4x SMT. That is on 45nm.
So far I thought 5 GHz for BD was fanboy dreaming, but compared to that monstrous 45nm chip it seems rather reasonable now that a smaller BD die, produced in 32nm together with high-k interconnects, should be able to achieve that.
What do you think? Is it OK to speculate on x86 clocks by comparing them to Power / RISC numbers?
@informal:
I agree totally ;-)
Thanks
I have a hard time believing 5 GHz stock, as that's just never been done before that I can recall. However, Intel's Sandy Bridge lineup covers 2.5-3.4 GHz, and assuming that they will have an IPC advantage, AMD may end up covering 3-4 GHz (numbers pre-Turbo on both sides).
Overclocking BD should be fun if it is truly a high-frequency design. Even though Netburst CPUs are just about worthless in terms of performance, they are still some of the most fun to mess with. AMD could perhaps combine the best of both worlds and give it more IPC than K10.5 while still making it clock like P4s (that would be a major win amongst enthusiasts now that Intel is locking the FSB).
I will give it a try
At a 95W envelope we have 6 cores done on 45nm working at 2.8GHz. If BD were done on the same node, I guess, with the targeted 20% gain in clock speed due to pipeline changes, we could have 2.8 x 1.2 = 3.36, or rounded up, 3.4GHz. BUT it will go to 32nm high-k/MG instead. I would still pick the same clock and power draw values just to be conservative (let's disregard the 45->32nm node improvement since we have 33% more cores). That's a 4-module part. Now, if we count in a 10-15% IPC improvement (pick the average, 12.5%) and 33% more cores, and lastly divide by 1.1 (10%) for the "performance hit" in fully loaded modules, in multithreaded workloads we get performance equivalent to a 4.65GHz X6 Thuban. This is with no Turbo over stock.
Now, with the new Turbo (<= 1/2 of the cores idle, picking Thuban's Turbo conditions), I would expect a ~20-30% clock increase; take 25% as the middle. We get 3.4 x 1.25 = 4.25GHz in poorly threaded or single-threaded applications. Now add the speculated 10-15% IPC jump (pick 12.5% as the arithmetic mean) to get the equivalent Thuban-class core clock: 4.25 x 1.125 ~= 4.8GHz Thuban in single-threaded workloads (no 10% hit here). If the power gating happens in such a way that 2 modules are gated, we have the 10% hit due to core scaling in modules: 4.8 / 1.1 = 4.36GHz Thuban-class core speed in poorly threaded workloads (1 < no. of active threads <= 4).
So to sum it up, I expect a 95W 3.4GHz "X8" Bulldozer model, with 4.25GHz effective Turbo and a 10-15% (pick 12.5%) IPC jump. This would be equal to a:
- 4.8GHz Thuban in purely single-threaded workloads,
- 4.36GHz Thuban-class core in poorly threaded workloads, and
- 4.65GHz X6 Thuban in multithreaded workloads.
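For anyone who wants to play with the inputs, the arithmetic in this estimate can be reproduced with a short script. This is only a sketch: every constant is the speculation from the post, not a measured or announced figure, and the function and variable names are made up for illustration.

```python
# Back-of-envelope reproduction of the estimate in the post.
# All inputs are the poster's guesses, not AMD figures.

THUBAN_CLOCK_GHZ = 2.8   # 6-core 45nm part at 95W, per the post
CLOCK_GAIN = 1.20        # targeted clock gain from pipeline changes
IPC_GAIN = 1.125         # midpoint of the speculated 10-15% IPC gain
CORE_RATIO = 8 / 6       # 8 cores vs 6, i.e. 33% more cores
MODULE_PENALTY = 1.10    # 10% hit when both cores of a module are loaded
TURBO_GAIN = 1.25        # midpoint of the speculated 20-30% Turbo gain

def thuban_equivalents(base_clock_ghz=THUBAN_CLOCK_GHZ):
    """Return the stock/Turbo clocks and 'equivalent Thuban' clocks in GHz."""
    stock = round(base_clock_ghz * CLOCK_GAIN, 1)  # the post rounds 3.36 up to 3.4
    turbo = stock * TURBO_GAIN
    single = turbo * IPC_GAIN
    return {
        "stock_ghz": stock,
        "turbo_ghz": turbo,
        "single_thread": single,
        "poorly_threaded": single / MODULE_PENALTY,
        "multi_thread": stock * IPC_GAIN * CORE_RATIO / MODULE_PENALTY,
    }

for name, ghz in thuban_equivalents().items():
    print(f"{name}: {ghz:.2f} GHz")
```

Without intermediate rounding the results come out at roughly 4.78, 4.35 and 4.64 GHz, slightly below the 4.8, 4.36 and 4.65 quoted in the post, which rounds at each step.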
In the 125W range I would expect 3.6 and 3.8GHz models, and if they really want to push the limit, a 4GHz 125W model. Turbo would be smaller percentage-wise, and similar or slightly lower frequency-wise, than in the earlier example. So effectively just add 0.2, 0.4 and 0.6GHz on top of the 3 "equivalent Thuban" numbers above and you will have a projection of how these three 125 or 140W models could perform (the top model, the hypothetical 125/140W 4GHz one, could easily be equivalent to a 4.7-5.4GHz Thuban-class core, depending on the workload).
Enough of xtreme speculation from me!
Assuming that the 50% more performance with 33% more cores refers to IPC, BD has a 12.5% increase in IPC relative to K10h (1.50 / 1.33 ≈ 1.125). Estimate that the area of a module in 32nm is the same as that of a core of the previous generation, and that the power envelope (just of the core now, not the whole chip) is the same for the same area as in the previous generation. If we have a 20% higher frequency for the same power envelope, we've got a 35% increase for the same thermal envelope (1.125 x 1.20 = 1.35).
Each module has 30mm^2, so the total will be 120mm^2 for a 4-module part. Plus some 8MB of L3 cache, say 60mm^2, we have 180mm^2. The previous generation ran at 125W at 3.4GHz, so this one will be 4.1GHz at 95W, with turbo at 5GHz.
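The chain of numbers in this estimate is easy to check with a few lines. This is a sketch of the poster's reasoning only: the 30mm^2 module area, the 60mm^2 L3 area and the 20% clock gain are the post's guesses, the function name is hypothetical, and applying the 20% gain to the 3.4GHz base is my reading of how the post arrives at 4.1GHz.

```python
def bd_estimate(module_area_mm2=30, modules=4, l3_area_mm2=60,
                base_clock_ghz=3.4, clock_gain=1.20):
    """Recompute the per-core gain, die area and clock from the post's inputs."""
    # 50% more performance from 33% (4/3) more cores -> per-core gain.
    per_core_gain = 1.50 / (4 / 3)
    return {
        "per_core_gain": per_core_gain,                  # ~1.125, i.e. +12.5%
        "combined_gain": per_core_gain * clock_gain,     # ~1.35, i.e. +35%
        "die_area_mm2": modules * module_area_mm2 + l3_area_mm2,
        "clock_ghz": base_clock_ghz * clock_gain,        # ~4.08, quoted as 4.1
    }

print(bd_estimate())
```

The 95W figure and the 5GHz turbo in the post do not follow from these inputs, so they are left out of the sketch.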
Let's look at it performance-wise. For 4 modules/8 cores, the performance of a Bulldozer will be 70% higher while consuming 30% less than a PhII at 3.4GHz.
The IPC of SB is 50% higher than PhII, but it won't clock as high as BD. At 95W, a 4-core part will run at 3.3GHz. So at this power envelope, BD will be 20% faster than SB, with about the same die area as SB, or slightly smaller.
Of course, Intel will release an 8-core SB, but its die area should be around 320mm^2, and there is no way that at 3.3GHz the power consumption will be lower than 150W. For servers, Intel must counter with at least a 10-core part, absolute minimum.
So, you see, BD will be a competitor for Ivy Bridge, not Sandy Bridge.
As you rightly said... Intel chippery has faster and denser caches, which help in most desktop situations... AMD will be good, but beat Intel... not unless some multi-threading is thrown into the picture...
Then again, this is off topic, but you've got to look beyond architecture to see whether the binaries involved in creating software are a bother, and how much... As far as I'm aware, the latest Intel binaries do not allow AVX to work on any chip other than "Genuine Intel." This would shut up fanboys from both sides :P
Yes, in the server arena, which is the most lucrative for both Intel and AMD, BD will help AMD gain a competitive edge... People aren't yet using MC as much, as the big OEM partners are yet to come out with servers featuring MC at its best... However, will they be able to resist BD? Actually, Eagleton could not come any sooner for Intel... but as far as I have learned, it would be based on Sandy-bridge and not Ivy-league (please correct me if I'm wrong on this bit). What I'm saying is, BD is going to be a major win for AMD in the server arena, which is where they'll make most of the money.
AMD is keeping it quiet because the product is 2 or more quarters away, and talking would give Intel enough time to out-maneuver them in the market on factors like price, limiting AMD's options... Hence my theory of "Intel Employee" when people want to know more than AMD has already offered :P
Last edited by tifosi; 08-31-2010 at 09:05 PM.
Seems like we all know what we need to know then.
In this very thread people discuss frequencies of 3.3-4 GHz for BD, which is significantly higher than MC (max 2.3GHz).
The 50% more performance, 33% more cores applies versus Magny Cours. You also need to factor in frequency since this was the unknown part in the AMD slide.
I considered that the clock advantage would be related to the core area, which is really small in BD, such that 2 cores have about the same die space as a PhII core. So, the clock advantage is even higher from the point of view of a number of cores.
So, if the clocks are as low as you say, BD is more or less tied with, or a bit lower than, SB in perf per watt.
A 16-core part, matching MC, would run at 2.6 GHz by my reasoning.
On the MCM part they are still bound by TDP ... now have 2 of those glued together and you need to lower your clocks considerably so you still stay in the desired TDP ... so I don't think you will see clocks higher than 2.3 for the 16-core version ...
maybe 2.5 for the 12-core MCM part on 32nm ... and higher the fewer cores they have