
Thread: AMD cuts to the core with 'Bulldozer' Opterons


  1. #1
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by -Sweeper_ View Post
    An ~11% penalty is not what we'd get even with "independent" cores?
    That's the penalty due to the shared front end. You get around a 10% penalty for much, much less die space investment (no need for another front-end stage, the int cores can use the full potential of the 2 FMAC units, shared L2 per module, etc.).
    Also, this means that each core inside a module performs a bit better on its own (not counting the new Turbo that BD will have).

    Quote Originally Posted by -Boris- View Post
    But they perform the same when bottlenecked?
    So L3 matters, but performance doesn't? A Phenom II 910 and a Core i7 980X will perform the same in games in 2014? I have a feeling that the benchmarks made today which aren't bottlenecked will give a good clue about the 980X scenario.




    This is all wrong; there are no numbers for scaling across multiple modules. The only number we have is within a module, between one core and two.
    And 1.8x is 90% performance per core.

    How can you get from 80% scaling to an 11% penalty? The numbers are 90% and 10%.
    The Phenom II 910 is a quad core; the other is a 6-core Westmere. Yes, L3 matters, and you can see this with Agena vs Deneb. By 2014 games will use more than 4 cores, so yes, it (Westmere) will be the better performer, but not for the reasons you believe.

    I can't see what you don't understand. You have scaling within a module. Anything outside the module should behave the same as Lisbon does today (scaling to 6 cores which communicate over a shared L3). Each module is a "super core", if you will, and each of those will scale the same, hence 4 x 1.8 (or 8 x 0.9, since you like the 90% number more).
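    The arithmetic being argued here is easy to spell out (a sketch with the assumed numbers from this thread, not measurements):

```python
# Sketch of the module-scaling arithmetic from the discussion above.
# Assumed numbers: ~10% penalty per core from the shared front end,
# and modules scaling across the chip like independent cores do today.
def module_throughput(cores_per_module=2, per_core=1.0, penalty=0.10):
    """Throughput of one module when both cores share a front end."""
    return cores_per_module * per_core * (1 - penalty)

def chip_throughput(modules=4, **kw):
    """Whole-chip throughput, assuming modules scale independently."""
    return modules * module_throughput(**kw)

print(module_throughput())  # ~1.8: the '1.8x within a module' figure
print(chip_throughput())    # ~7.2: 4 x 1.8, or equivalently 8 x 0.9
```

    Under these assumptions the "90% per core" and "10% penalty" figures are the same claim stated two ways.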

    Quote Originally Posted by savantu View Post
    Back again to the same old stuff: ICC 8.0 did a check for vendor ID; newer versions (currently ICC 10) have the check removed and will check for feature flags (basically, whatever the CPU supports, the compiler will throw at it). However, Intel claims no responsibility for code quality and bugs.
    They say the check in 8.0 was introduced simply because AMD did not give them the detailed errata list for their CPUs (apparently AMD refrains from sending samples to Intel for validation).
    It would be like AMD sampling BD to Intel now so future updates to Intel's compiler can support BD features.
    You should read Agner Fog's latest blog posts, then.

    Quote Originally Posted by Agner Fog
    Intel have released a new version of their Math Kernel Library (v. 10.3) in beta test.

    I have tested the new libraries and found that the CPU dispatching works basically the same way as before. The standard math library, vector math library, short vector math library and the 64-bit version of other math kernel library functions still use an inferior code path for non-Intel processors.

    I have found the following differences from previous versions:

    * Many functions now have a branch for the forthcoming AVX instruction set, but still only for Intel processors. This will increase the difference in performance between Intel and AMD processors on these functions. Both Intel and AMD are planning to support AVX in 2011.

    * The CPU dispatcher for the vector math library has a new branch for non-Intel processors with SSE2. Unlike the generic branch, the new non-Intel SSE2 branch is used only on non-Intel processors, and it is inferior in many cases to the branch used by Intel processors with the same instruction set. The non-Intel SSE2 branch is implemented in the 32-bit Windows version and the 32-bit Linux version, but not in the 64-bit versions of the library.

    * A new Summary Statistics library uses the same CPU dispatcher as the vector math library.

    Obviously, I haven't tested all functions in the library. There may be more differences that I haven't discovered. But it is clear that many functions in the new version of the library still cripple performance on non-Intel processors. I don't understand how they can do this without violating the legal settlement with AMD.
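    The dispatching pattern Agner Fog describes can be modelled like this (a hypothetical sketch of the two strategies, not Intel's actual code; the path names are made up):

```python
# Hypothetical model of the two dispatching strategies discussed above.
# Vendor-gated dispatch sends non-Intel CPUs down a generic path even
# when they report the same feature flags; feature-based dispatch picks
# the code path from the flags alone, regardless of vendor string.

def dispatch_by_vendor(vendor, features):
    """The behaviour Agner Fog observed: fast paths are gated on the
    vendor string, not on what the CPU actually supports."""
    if vendor == "GenuineIntel":
        if "avx" in features:
            return "avx_path"
        if "sse2" in features:
            return "sse2_path"
    return "generic_path"

def dispatch_by_features(vendor, features):
    """What a vendor-neutral dispatcher would do instead."""
    if "avx" in features:
        return "avx_path"
    if "sse2" in features:
        return "sse2_path"
    return "generic_path"

# Same feature set, different vendor string:
print(dispatch_by_vendor("GenuineIntel", {"sse2"}))    # sse2_path
print(dispatch_by_vendor("AuthenticAMD", {"sse2"}))    # generic_path
print(dispatch_by_features("AuthenticAMD", {"sse2"}))  # sse2_path
```

    This is why removing the vendor-ID check in later ICC versions matters: the first function degrades an SSE2-capable AMD chip to the generic path; the second does not.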
    Last edited by informal; 08-05-2010 at 08:45 AM.

  2. #2
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    The Phenom II 910 is a quad core; the other is a 6-core Westmere. Yes, L3 matters, and you can see this with Agena vs Deneb. By 2014 games will use more than 4 cores, so yes, it (Westmere) will be the better performer, but not for the reasons you believe.
    Stop this nonsense. You claim total performance doesn't matter in the real world, with this as proof?!
    But then you claim that L3 has an invisible impact, and core count too?
    Let me get this straight. Say I have my Phenom II running a bunch of games today, clocked at 2GHz and at 4.8GHz in two runs. Limited by my GPU, I get the exact same FPS in both cases. Will both perform equally in the games released in 2014? Same core count, same cache, same everything except frequency.

    Quote Originally Posted by informal View Post
    I can't see what you don't understand. You have scaling within a module. Anything outside the module should behave the same as Lisbon does today (scaling to 6 cores which communicate over a shared L3). Each module is a "super core", if you will, and each of those will scale the same, hence 4 x 1.8 (or 8 x 0.9, since you like the 90% number more).
    I'm not too sure; we don't know how fast the L3 is. It could be much improved. I often see the i7 with 3 times the bandwidth. A faster L3 could improve scaling.
    Last edited by -Boris-; 08-05-2010 at 09:03 AM.

  3. #3
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by -Boris- View Post
    Stop this nonsense. You claim total performance doesn't matter in real world, with this as proof?!
    But after that you claim that L3 has an invisible impact, and core count?
    Let me get this straight. If I have my Phenom II running a bunch of games today clocked at 2GHz and 4.8GHz in two runs. Limited by my GPU I get the exact same FPS in both cases. Will both perform equally in the games realesed in 2014? Same core count, same cache, same everything except frequency.
    What kind of game would run the same on a Deneb chip at 2GHz and at 4.8GHz?? There is no such game, and if there were, it would be extremely GPU-bound (I can't emphasize the word extremely enough). Both cache and core count are important (again, look at Agena vs Deneb, and look at C2D vs C2Q in modern games). There is a point where games stop scaling with CPU clocks, since the GPU (yes, even a 5970) starts to bottleneck and can't process enough data. This happens with both Deneb and Nehalem.

    Also, first you need to find and show me "a bunch of games today" that run the same at 2GHz and 4.8GHz. There are no such games, as I mentioned above. Second, even if there were, it wouldn't mean the games of 2014 would run the same at those two CPU clocks, since a) the games would be hardly playable with your current GPU if you stuck with it, and b) scaling would stop somewhere between those two frequencies if you bought a new GPU. The only way a game from 2014 could scale perfectly with clock speeds from 2 to 4.8GHz is if it were coded with excellent multi-core support and used all available CPU resources to the maximum (highly unlikely).

    I'm not too sure; we don't know how fast the L3 is. It could be much improved. I often see the i7 with 3 times the bandwidth. A faster L3 could improve scaling.
    L3 sharing policies are very different in Deneb/Thuban and Nehalem, so you can't just compare the bandwidth like that. AMD uses a victim (spill-over) cache, while Intel uses an inclusive one.
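    One concrete reason the raw numbers aren't comparable is effective capacity (a toy illustration with made-up sizes, not real Deneb/Nehalem figures):

```python
# Toy model of inclusive vs victim (exclusive) L3 capacity.
# Sizes are assumptions for the example, not real chip specs.
L2_KB, L3_KB = 512, 6144

# Inclusive L3 (Nehalem-style): every line in L2 is also kept in L3,
# so the duplicated L2 contents cost L3 capacity.
inclusive_unique_kb = L3_KB

# Victim L3 (Deneb-style): the L3 holds only lines evicted from L2,
# so L2 and L3 contents are disjoint and capacities add up.
victim_unique_kb = L2_KB + L3_KB

print(inclusive_unique_kb)  # unique data cached: 6144 KB
print(victim_unique_kb)     # unique data cached: 6656 KB
```

    Inclusion buys cheaper snoop filtering at the cost of duplicated capacity; a victim cache trades that the other way, which is why bandwidth alone doesn't tell you which design scales better.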

  4. #4
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by informal View Post
    What kind of game would run the same on a Deneb chip at 2GHz and at 4.8GHz?? There is no such game, and if there were, it would be extremely GPU-bound (I can't emphasize the word extremely enough). Both cache and core count are important (again, look at Agena vs Deneb, and look at C2D vs C2Q in modern games). There is a point where games stop scaling with CPU clocks, since the GPU (yes, even a 5970) starts to bottleneck and can't process enough data. This happens with both Deneb and Nehalem.

    Also, first you need to find and show me "a bunch of games today" that run the same at 2GHz and 4.8GHz. There are no such games, as I mentioned above. Second, even if there were, it wouldn't mean the games of 2014 would run the same at those two CPU clocks, since a) the games would be hardly playable with your current GPU if you stuck with it, and b) scaling would stop somewhere between those two frequencies if you bought a new GPU. The only way a game from 2014 could scale perfectly with clock speeds from 2 to 4.8GHz is if it were coded with excellent multi-core support and used all available CPU resources to the maximum (highly unlikely).


    L3 sharing policies are very different in Deneb/Thuban and Nehalem, so you can't just compare the bandwidth like that. AMD uses a victim (spill-over) cache, while Intel uses an inclusive one.
    I admit, I exaggerated a bit. Say a 2.8GHz Phenom II and a 4.8GHz one. Would they handle the games released in 2014 equally?


    And I know there are differences between the i7 and Phenom II caches. My point is we don't know if BD has 48-way, 64-way or 96-way caches. We don't know if they operate around 2GHz or if they are at core speed. We know nothing about the speed of communication between modules.

  5. #5
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by -Boris- View Post
    I admit, I exaggerated a bit. Say a 2.8GHz Phenom II and a 4.8GHz one. Would they handle the games released in 2014 equally?


    And I know there are differences between the i7 and Phenom II caches. My point is we don't know if BD has 48-way, 64-way or 96-way caches. We don't know if they operate around 2GHz or if they are at core speed. We know nothing about the speed of communication between modules.
    2.8GHz to 4.8GHz is a 71% improvement, and this brings only a very minor improvement in today's games. In future games I think GPU-based physics (not necessarily meaning NV's approach, just using the term) will play a much bigger role, and then, again, the CPU will become even less of a factor than it is today. What will matter is a number of relatively fast cores (IMO 4) and cache (by this I mean at least Penryn- and Deneb-class at 2.8-3GHz). With physics offloaded to the GPU and massively shader-heavy game engines, the CPU will use additional cores to offload AI, for example, among other things. Also, we have to keep in mind that games may use the new AVX instruction set, and this may play a role in how a certain chip performs.
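    For the record, the clock-speed arithmetic above works out like this:

```python
# The 2.8GHz -> 4.8GHz comparison, spelled out.
base_ghz, oc_ghz = 2.8, 4.8
improvement = oc_ghz / base_ghz - 1  # relative gain over the base clock
print(f"{improvement:.0%}")  # 71%
```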

  6. #6
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Slightly OT, but from what I see, AMD has an excellent product with the MCs; they just need to get the clocks up.
    A 12-core MC at 3000MHz would be a force to be reckoned with.
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  7. #7
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Movieman View Post
    Slightly OT, but from what I see, AMD has an excellent product with the MCs; they just need to get the clocks up.
    A 12-core MC at 3000MHz would be a force to be reckoned with.
    Yeah, that's true. The only problem is they can't crank the clocks that high AND stay in the 105W ACP bracket. What I find interesting is that MC and Lisbon (D1) appear to still not be using the low-k dielectric tweak AMD implemented in Thuban silicon (E0), which practically made it possible for AMD to build a hex-core desktop chip with the same clock as the QC equivalent (955BE; 3.6GHz on 3 cores with Turbo) while staying within the 125W TDP and actually drawing less than the 955BE under full load. If they were to use this major process-node tweak on server parts, I think they could bring the clocks on MC up by at least 15% while staying in the same power bands as today.

    Quote Originally Posted by -Boris- View Post
    So you believe that there will be no practical difference between the two? You are very vague.
    No, I believe the difference won't be big or impact gameplay that much. Not to mention there won't be a 4.8GHz Deneb chip to actually compare this with (unless you do a dry-ice session and test it yourself). But I figure many will be looking to replace their SB or Bulldozer desktop chips with something new in 2014, so testing a Deneb on dry ice with the latest GPUs wouldn't be on anyone's priority list.
    Last edited by informal; 08-05-2010 at 10:48 AM.

  8. #8
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Quote Originally Posted by Movieman View Post
    Slightly OT, but from what I see, AMD has an excellent product with the MCs; they just need to get the clocks up.
    A 12-core MC at 3000MHz would be a force to be reckoned with.
    More info here: http://www.youtube.com/watch?v=V0UcQDUR-fU
    Last edited by Drwho?; 08-08-2010 at 06:58 PM. Reason: Joking ... ;-)
    DrWho, The last of the time lords, setting up the Clock.
