AMD's Bobcat and Bulldozer

**superrugal** · 08-31-2010, 06:31 AM

Is this clear enough?

**informal** · 08-31-2010, 06:47 AM

Each integer core takes 4 macro-ops from the dispatch group buffers, while each 10h (Istanbul) core takes 3 macro-ops.

**Sn0wm@n** · 08-31-2010, 06:48 AM

Where did you get those slides? :O They are more than clear about how the arch works compared to Westmere and Istanbul... Thanks

**z3et** · 08-31-2010, 06:51 AM

Originally Posted by Sn0wm@n

Where did you get those slides? :O They are more than clear about how the arch works compared to Westmere and Istanbul... Thanks

From here. It's a very good article.

**Hornet331** · 08-31-2010, 06:54 AM

Originally Posted by z3et

From here. It's a very good article.

It's not like it hadn't already been posted here in this thread:
http://www.xtremesystems.org/forums/...&postcount=685

**z3et** · 08-31-2010, 07:13 AM

I didn't see the post.

PS: I read it on the first day it was published: http://www.google.com/realtime

**STaRGaZeR** · 08-31-2010, 07:15 AM

Originally Posted by -Boris-

BD has more resources since it can use 2 ALUs and 2 AGUs every clock; Phenom II averages 1.5 ALUs and 1.5 AGUs since they share pipes. Again, if you can't use it, it isn't a resource. 2 + 2 = 4, (3 + 3) / 2 = 3
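The counting argument above can be written out as a tiny model. This is only a sketch of the poster's reasoning, not a description of the real pipelines: it assumes that on K10 each pipe pairs one ALU with one AGU that share issue, so on average each pipe delivers half an ALU op and half an AGU op per clock, while BD's ALUs and AGUs are all usable every clock.

```python
# Back-of-the-envelope integer "pipe" counting, following the argument above.
# Assumption (the poster's model, not a verified fact): on K10 the ALU and AGU
# in each pipe share issue, so only half of them are usable per clock on average.

def usable_per_clock(alus: int, agus: int, shared_pipes: bool) -> float:
    """Average ALU + AGU operations a core can use per clock under this model."""
    total = alus + agus
    return total / 2 if shared_pipes else float(total)

bulldozer_core = usable_per_clock(alus=2, agus=2, shared_pipes=False)  # 2 + 2
k10_core = usable_per_clock(alus=3, agus=3, shared_pipes=True)         # (3 + 3) / 2

print(f"Bulldozer integer core: {bulldozer_core}")  # 4.0
print(f"K10 core: {k10_core}")                      # 3.0
```

The whole disagreement in the replies is over the `shared_pipes` assumption: drop it and K10 counts 6, not 3.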

The thing is that it does use them. Whether the CPU can use all 6 at the same time is another matter. All 6 will get used at some point, even if not all at once. Either way, they are on the die, they're connected, and they are used, so they are a resource. K10 has more resources than BD (per integer "cluster").

Originally Posted by -Boris-

The discussion is still about IPC, even if you try to make it look different. And it's still about BD's integer execution capacity compared to K8 (10h); we are pointing out that BD's 4 pipes seem a bit stronger than K8's 3 pipes.
And by adding the different parts of K8's pipeline together, some people here are trying to make them look twice as strong.
4 pipes equals more resources than 3.

Instructions per clock (compared to K10). Frequency doesn't matter, this is per clock:

IPC (CPU level) --> Will be higher: more "modules", double integer resources per "module", fewer resources per integer "cluster", better use of available resources per integer "cluster".
IPC ("module" level) --> Will be higher: double integer resources per "module", fewer resources per integer "cluster", better use of available resources per integer "cluster".
**IPC (single integer "cluster") --> Fewer resources, better use of available resources. Higher or lower instructions per clock?**

The bold part is likely lower, and that's exactly what savantu, terrace and others are discussing here: IPC per integer "cluster". We don't know for sure, since JF just says "IPC will be higher". At which of the previous levels? After all the BS, bans, etc., he still hasn't answered this question.

Now, if you throw frequency into the mix, knowing that it will be higher than on current K10 CPUs, of course you can say single integer "cluster" performance is higher. Just notice how he never uses "IPC", "higher" and "per integer cluster" in the same sentence. The only info we have about single-thread performance is that it will "be higher". Of course, that could be because of the higher frequency, not because IPC is higher.
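The point about frequency follows from single-thread performance being roughly IPC × clock. A quick sketch shows why "single-thread performance is higher" does not by itself settle whether per-cluster IPC went up. The IPC and GHz values are hypothetical placeholders, not AMD figures.

```python
# Single-thread performance modeled as IPC x clock frequency.
# The IPC and GHz values below are hypothetical placeholders, not AMD figures.

def instructions_per_second(ipc: float, freq_ghz: float) -> float:
    """Instructions completed per second by one thread."""
    return ipc * freq_ghz * 1e9

old_core = instructions_per_second(ipc=1.00, freq_ghz=3.0)  # hypothetical K10-style core
new_core = instructions_per_second(ipc=0.95, freq_ghz=3.5)  # hypothetical: 5% lower IPC, higher clock

speedup = new_core / old_core
print(f"relative single-thread speed: {speedup:.3f}")  # 1.108
```

Under these made-up numbers, a 5% IPC loss is more than offset by a roughly 17% clock increase, so the thread still runs about 11% faster.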

JF just has to answer the question and this debate is going to end fast: IPC per integer cluster has been increased or not? No BS, just yes or no.

**Mechanical Man** · 08-31-2010, 07:27 AM

Originally Posted by STaRGaZeR

Blaablaa..

JF just has to answer the question and this debate is going to end fast: IPC per integer cluster has been increased or not? No BS, just yes or no.

Originally Posted by JF-AMD

Blaablaa..

A BD integer core will do more IPC and perform single threads faster than an old core.

There, it was posted a few pages back. So why did it not end?

**informal** · 08-31-2010, 07:27 AM

Stargazer, can you read or not? The man said IPC will be higher and single-thread performance will be higher. Can't you just stop beating the dead horse already? It's dead, alright?
K10 has a clear bottleneck in the retirement unit. It has a massive 9 execution units available (3 ALUs, 3 AGUs, 3 FPUs) but can retire only 3 macro-ops per cycle.

**Calmatory** · 08-31-2010, 07:31 AM

What would make the IPC lower on "integer cluster level"? Deeper pipeline + "less" resources, L1D cut down by 75%?

None of those. Fewer absolute resources, but more practical resources per thread. This alone could compensate for any IPC loss caused by the deeper pipeline, let alone the improvements in other areas. If the cache is actually inclusive, then that alone would compensate for every possible CPU-level change that could reduce IPC, however many the fiercest Intel fan could think of.

The potential integer throughput of those 2 ALUs/2 AGUs says very little about IPC, let alone single-thread performance or whole-product performance. All you'd need is slightly faster cache access and more aggressive prefetching and branch prediction to get a 10% IPC increase even with a 10% penalty from the "integer clusters".

**Particle** · 08-31-2010, 07:33 AM

What BS? He has already stated that both single-threaded and multithreaded performance are higher.

**blindbox** · 08-31-2010, 07:34 AM

informal, some people can't believe their eyes.

Anyway, yeah, read the article at anand. It's quite clear about uarch changes there (and probably the only site other than realworldtech which bothered to make their own diagrams to help better understand). I haven't read the one on realworldtech yet but judging from that pic at post 726, it might be good.

**Mechromancer** · 08-31-2010, 07:48 AM

David Kanter over at Real World Tech has a writeup about Bulldozer's uArch.

http://www.realworldtech.com/page.cf...WT082610181333

I figured this didn't have to be posted as a completely new thread.

**madcho** · 08-31-2010, 08:01 AM

Originally Posted by Mechromancer

David Kanter over at Real World Tech has a writeup about Bulldozer's uArch.

http://www.realworldtech.com/page.cf...WT082610181333

I figured this didn't have to be posted as a completely new thread.

already posted in this thread but thx

**Chumbucket843** · 08-31-2010, 08:05 AM

Originally Posted by savantu

BD taped out a month or two ago. If they were lucky, the silicon is mostly functional. If not, they are working overtime to fix it and get working samples. Silicon is being characterized and is in the pre-validation stage.
In other words, benchmarks and performance are second place at this time, most important is getting a functional chip.

Do you have any clue as to what happens in post-Si validation? Based on your post, I'd bet against it.

**Mechromancer** · 08-31-2010, 08:07 AM

Originally Posted by madcho

already posted in this thread but thx

I didn't go back enough pages when I checked. Oh wellz..

**Hornet331** · 08-31-2010, 08:20 AM

Originally Posted by Mechanical Man

There, it was posted a few pages back. So why did it not end?

2 pages is a few?

And finally we got a statement. It's the first time he explicitly mentioned this and the question was answered, ironically after the one who asked the question first was banned...

If he had done so much earlier, we could have saved at least 15 pages of nonsense... Anyway, I'm satisfied with the answer and there is nothing more to ask.

**blindbox** · 08-31-2010, 08:23 AM

Or the guy who made 15 pages of nonsense could be willing to read a few articles on the front page before making assumptions (and indirectly accusing people/slides of lying). Two ways of looking at it..

**tifosi** · 08-31-2010, 08:30 AM

Originally Posted by STaRGaZeR

1) The bold part is likely lower, and that's exactly what savantu, terrace and others are discussing here: IPC per integer "cluster". We don't know for sure, since JF just says "IPC will be higher". At which of the previous levels? After all the BS, bans, etc., he still hasn't answered this question.

2) ... Of course, because of the higher frequency, not because IPC is higher.

3) JF just has to answer the question and this debate is going to end fast: IPC per integer cluster has been increased or not? No BS, just yes or no.

1) As for what percentage improvement could be seen... JF has already said that with 33% more cores, a 50% performance gain in server workloads could be seen. This is the only information JF is willing to share, and unless you hold Intel stock or work for them, I see no reason why you'd press so hard for that information... which he already explained he couldn't share because the product is still some time away from launch (I assume a good 2 quarters or so...). Personally speaking, AMD wouldn't want Intel to have information on an upcoming product, as it would give Intel an edge and possibly a chance to outmaneuver them. It works the same way in the opposite direction... The only time Intel leaked information on an upcoming architecture (remember C2D) was when AMD was kicking them around left, right and center in all segments of the market... Now, if Intel finds out stuff, they could possibly devise a new pricing strategy (given their scale and market share, that's easier now) or something else to counter a competitive product. Competitive BD is...
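Taking the "33% more cores, 50% more performance" figure at face value, a line of arithmetic gives the implied per-core gain. This sketch assumes throughput = core count × per-core throughput, and uses 12 → 16 cores only as an illustrative way to get "33% more". It says nothing about how the per-core gain splits between IPC and clock speed.

```python
from fractions import Fraction

core_ratio = Fraction(16, 12)  # 33% more cores, e.g. 12 -> 16 (illustrative)
perf_ratio = Fraction(3, 2)    # 50% more throughput, per the claim quoted above

# Assumes total throughput = cores x per-core throughput.
per_core = perf_ratio / core_ratio
print(per_core, float(per_core))  # 9/8 1.125
```

So the claim implies roughly 12.5% more throughput per core in those server workloads, which is still compatible with either higher or lower per-cluster IPC once clock speed is included.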

2) IPC is higher...

3) IPC compared to previous AMD architectures is higher... he said as much, many times over...

**AliG** · 08-31-2010, 08:42 AM

Originally Posted by tifosi

1) As for what percentage improvement could be seen... JF has already said that with 33% more cores, a 50% performance gain in server workloads could be seen. This is the only information JF is willing to share, and unless you hold Intel stock or work for them, I see no reason why you'd press so hard for that information... which he already explained he couldn't share because the product is still some time away from launch (I assume a good 2 quarters or so...). Personally speaking, AMD wouldn't want Intel to have information on an upcoming product, as it would give Intel an edge and possibly a chance to outmaneuver them. It works the same way in the opposite direction... The only time Intel leaked information on an upcoming architecture (remember C2D) was when AMD was kicking them around left, right and center in all segments of the market... Now, if Intel finds out stuff, they could possibly devise a new pricing strategy (given their scale and market share, that's easier now) or something else to counter a competitive product. Competitive BD is...

2) IPC is higher...

3) IPC compared to previous AMD architectures is higher... he said as much, many times over...

To be fair, historically, companies that hide information until days before launch tend to have problems with their product, especially the ones that have had many delays. Even if BD does have a sizeable increase over K10.5, which it should, I honestly don't think it will be enough to compete with Sandy Bridge.

That preview by Intel was a red cape for AMD to charge at, and I'm willing to bet that if they had a better product, they would have released their own preview, challenging for the top spot. My guess is that BD will be a fine product, just still not as powerful as Intel's in terms of pure performance. To me it seems it's more about power efficiency, as JF keeps mentioning 50% more performance from 33% more cores. Well, why not 100% from 33% more cores? Because the thermal envelopes would just be too high, not to mention the power draw would be astronomical, considering they don't have a working 32nm process.

At least from my perspective, it seems to me that AMD is done challenging for the top enthusiast performance spot. They seem to have shifted to a new direction, trying to offer the most performance per dollar, especially over the long run when you consider electricity bills. That's quite reasonable, as Intel spends far more money on its fabrication process and thus has denser, faster caches, which seriously help in applications like Super Pi.

**spursindonesia** · 08-31-2010, 08:50 AM

Originally Posted by JF-AMD

OK, so let me get the gist of all of this whole thread down to two statements:

1. People are claiming Bulldozer will be slower than existing products because they are sharing resources in the processor and sharing is inherently worse.

2. People are claiming that even though Bulldozer has dedicated resources relative to the old architecture that shares them, this is worse.

OK, I got it now.

Like I said recently in this thread, this is sooooo predictable: the work of Intel trolls. Better to take them lightly, as dry comedy and entertainment.

**AliG** · 08-31-2010, 09:00 AM

Originally Posted by spursindonesia

Like I said recently in this thread, this is sooooo predictable: the work of Intel trolls. Better to take them lightly, as dry comedy and entertainment.

this is how I think of them

**Andi64** · 08-31-2010, 09:00 AM

Is the OS aware of the cores sharing resources? If 2 cores of a module have 80% of the performance of two independent cores, when an application is using 2 threads (most games, for example), will the OS schedule them on two different modules or on a single module?
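The 80% figure in the question makes the placement trade-off easy to quantify. The following sketch uses the questioner's hypothetical 80% scaling for two threads sharing a module. It also assumes, without any source, that a lone thread on a module runs at full single-core speed.

```python
# Hypothetical model from the question above: two threads sharing one module
# deliver 80% of what two fully independent cores would. A lone thread on a
# module is assumed (not confirmed) to run at full single-core speed.
MODULE_SCALING = 0.8

def throughput(threads_per_module: list[int]) -> float:
    """Total throughput, in 'independent core' units, for a thread placement."""
    total = 0.0
    for n in threads_per_module:
        if n not in (0, 1, 2):
            raise ValueError("a two-core module runs 0, 1 or 2 threads")
        total += 2 * MODULE_SCALING if n == 2 else float(n)
    return total

packed = throughput([2, 0])  # both threads on one module
spread = throughput([1, 1])  # one thread per module
print(packed, spread)  # 1.6 2.0
```

Under these assumptions, a 2-thread game would lose about 20% of its throughput if the OS packed both threads onto one module while another module sat idle.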

I'm reading the thread, sorry if it has already been answered...

**AliG** · 08-31-2010, 09:17 AM

Originally Posted by Andi64

Is the OS aware of the cores sharing resources? If 2 cores of a module have 80% of the performance of two independent cores, when an application is using 2 threads (most games, for example), will the OS schedule them on two different modules or on a single module?

I'm reading the thread, sorry if it has already been answered...

No one is sure; all JF has said is that AMD is working with MS to devise the core utilization order, etc.

I would imagine that, ideally, for multithreaded tasks you would want the same module due to the shared L2, but for separate tasks you would want different modules due to the performance loss from sharing components.

**Motiv** · 08-31-2010, 09:21 AM

Originally Posted by AliG

No one is sure; all JF has said is that AMD is working with MS to devise the core utilization order, etc.

I would imagine that, ideally, for multithreaded tasks you would want the same module due to the shared L2, but for separate tasks you would want different modules due to the performance loss from sharing components.

It was answered on the blog that the shared L2 cache wouldn't really help.

As for multitasking, I suspect it will work like Intel's HT. As far as I'm aware, that doesn't load up one core only; it spreads threads across the other physical cores first and foremost.
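A "spread first" policy like the one described for HT could look roughly like this: hand out one core per module before using any module's second core. This is a hypothetical sketch. The core numbering (module m owns cores 2m and 2m+1) is an assumption, and nothing in this thread says how Windows will actually handle Bulldozer modules.

```python
# Hypothetical "spread first" placement: use one core in every module before
# doubling up, similar to how schedulers treat Hyper-Threading siblings.
# Assumed numbering: module m owns cores m*cores_per_module .. +cores_per_module-1.

def spread_first_order(num_modules: int, cores_per_module: int = 2) -> list[int]:
    """Logical core IDs in the order a spread-first scheduler would use them."""
    order = []
    for slot in range(cores_per_module):      # first core of each module, then the second
        for module in range(num_modules):
            order.append(module * cores_per_module + slot)
    return order

def place_threads(num_threads: int, num_modules: int, cores_per_module: int = 2) -> list[int]:
    """Pick cores for num_threads threads using the spread-first order."""
    order = spread_first_order(num_modules, cores_per_module)
    if num_threads > len(order):
        raise ValueError("more threads than cores")
    return order[:num_threads]

print(spread_first_order(4))  # [0, 2, 4, 6, 1, 3, 5, 7]
print(place_threads(2, 4))    # [0, 2]
```

The "same module for the shared L2" idea from earlier in the thread would simply swap the two loops, giving the packed order [0, 1, 2, 3, ...].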
