I won't turn this into a semantics debate, but I wouldn't call a module a dual core just because it can run two threads. The cores share a front end that cannot be separated from them and still function.
No they will not; hell, I am 200% sure SB can't do that :D. The idea behind the arch is to make the cores smaller so that there are more of them, which also helps them reach better speeds. Intel does things a bit differently, and HT is not super effective everywhere (see X3 435 vs i5 530).
Sorry if this has been brought up, I'm struggling to follow this thread now! But if you look at the 50% higher performance claim as applying to a fully utilised 4-module / 8-core CPU, then shouldn't single threaded performance actually be a strong point, since a single core processing a thread isn't sharing the front end, nor the cache?
The statement that performance "scales 80%" from 1 core to 2 cores in a module kind of proves this point.
This also makes any "12.5%" IPC per core calculation wrong.
I asked this question earlier (I think), but I'll ask again. Does anyone know how 4 threads are handled by the OS on a 4-module / 8-core BD?
In theory it should, given that it uses its resources exclusively. But it depends on the uarch: is it similar to the K7 lineage or something new?
Scaling is a meaningless number in this context.
Quote:
The statement that performance "scales 80%" from 1 core to 2 cores in a module kind of proves this point.
A module vs. 2 cores:
- one multithreaded front end / 2 front ends
- 2 integer clusters / 2 integer clusters
- one FPU / 2 FPUs
- shared L2 / shared or exclusive L2s
You can get to 80% from 1 to 2 cores in a single scenario: integer calculations.
1.5x more performance / 1.33x more cores = 1.1278x more performance per core.
Quote:
This also makes any "12.5%" IPC per core calculation wrong.
12.78%.
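Spelling that arithmetic out (a back-of-the-envelope sketch of the same numbers from this thread, nothing measured or official):

```python
# Per-core gain implied by "+50% throughput with 33% more cores".
total_speedup = 1.50   # claimed throughput increase (8 BD cores vs 6 current cores)
core_ratio = 8 / 6     # ~1.333x more cores

per_core_gain = total_speedup / core_ratio
print(f"{per_core_gain:.4f}x per core, ~{(per_core_gain - 1) * 100:.2f}%")
# -> ~1.125x (~12.5%); the 12.78% figure comes from rounding the core ratio to 1.33
```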
No different than a current 4 core CPU.
Quote:
I asked this question earlier (I think), but I'll ask again. Does anyone know how 4 threads are handled by the OS on a 4-module / 8-core BD?
Try to read what I am saying. NOT in real world gaming, but in situations that aren't GPU-limited. Because these processors won't be GPU-limited in 2014 in the same way, just as an A64 4400+ bought in 2006 isn't GPU-limited with a 5770.
So, instead of reading your bench that says an Athlon II X2 can compete with a 980X, you do an ordinary game bench where you measure the differences between the CPUs, not the GPU. That way you know which CPU is most likely to still have adequate performance in 2014.
I say that if an i7 can outperform a Phenom II by 50% in the games we have today, it's much more likely to achieve playable FPS in 2014. But if we do it your way, and limit them down to the same FPS in today's games with an underpowered GPU, we will be fooled into believing they have the same gaming performance.
Example:
In the year 2010, CPU A limits a game at 50FPS, CPU B limits the same game at 100FPS.
In the year 2014, CPU A limits a new game at 25FPS, CPU B limits the same game at 50FPS. CPU B is still good.
BUT, if we do the 2010 test with a GPU that limits the game to 30 FPS, we won't see the difference, and will be disappointed when CPU A turns out to have poor longevity.
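To make that concrete, here is a toy illustration with the same hypothetical numbers, simply modeling delivered FPS as capped by whichever of CPU or GPU is slower:

```python
# Toy model: the frame rate you actually see is limited by whichever of the
# CPU or the GPU is the bottleneck.
def delivered_fps(cpu_fps_limit, gpu_fps_limit):
    return min(cpu_fps_limit, gpu_fps_limit)

# 2010 game with a strong GPU: the CPU difference is visible (50 vs 100 FPS).
print(delivered_fps(50, 120), delivered_fps(100, 120))
# Same game with a GPU that caps at 30 FPS: both CPUs look identical (30 vs 30).
print(delivered_fps(50, 30), delivered_fps(100, 30))
# 2014 game that is twice as CPU-heavy: only CPU B stays playable (25 vs 50).
print(delivered_fps(25, 60), delivered_fps(50, 60))
```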
And of course I know that you can't predict this exactly and that their relationship performance wise may change. But that's beside the point.
In 2004 my A64 ran just as fast as my friend's P4, since we had the same GPU. But in 2007, my old A64 was the only one that still performed well in games.
I'm not talking about quads, I'm talking about any CPU against any other CPU. And no, I don't expect requirements to increase 4x, but I do expect them to increase. And IF requirements double every four years, then the CPU with 100% more performance today will last several years longer.
I'm not saying that you should buy a more expensive CPU because of the games of 2014. All I'm trying to say is that if two processors are comparable today, around the same performance and price, except that one of them has much better performance in games, then the one with better performance in games has an advantage in a couple of years.
I actually don't understand how people can say that large differences in game performance today won't matter later. If you buy a new CPU every year, maybe, but most people don't.
You misunderstand me. ;) I'm saying that hyperthreading can cost some performance in single threaded applications, but a module design might actually gain performance in single threaded applications compared to a conventional core design, since it can now focus the prefetch and the FPU on only one thread. :)
The two halves of the shared FPU might increase performance in single threaded programs, since one thread can use the whole unit. If the FPU wasn't shared, this wouldn't be possible.
Yes, you are right.
I've stated in a previous post that the performance must be higher than 112.5% in single threaded applications if Bulldozer scales badly over 4 threads (8 for MCM). But I've also said that it must be lower than 112.5% if it scales better than MC.
You can't have more than a 12.5% performance increase and 33% more cores at once and still only deliver 50% more performance with the same scaling. That's why the scaling must be worse in Bulldozer for it to achieve more than a 12.5% performance increase in single threaded applications.
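A rough sketch of that constraint (my own rearrangement of the thread's numbers, not anything official): total gain = single-thread gain × core-count ratio × relative multithreaded scaling, so with the total pinned at 1.5x and cores at ~1.33x, any single-thread gain above ~12.5% forces the scaling term below 1.

```python
total_speedup = 1.50
core_ratio = 4 / 3     # 8 cores vs 6

def required_mt_scaling(single_thread_gain):
    # Relative multithreaded scaling needed to land exactly at +50% total.
    return total_speedup / (single_thread_gain * core_ratio)

print(required_mt_scaling(1.125))  # 1.00 -> same scaling as today
print(required_mt_scaling(1.25))   # 0.90 -> scaling must be ~10% worse
```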
I think (hope) that it will try to run one thread per module at a time. Four heavy threads would be spread out over 4 modules. It will need 5 heavy threads or more before it gives a module two threads.
But unfortunately I don't really have a clue. :(
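For what it's worth, if the scheduler doesn't do this automatically you could force the placement yourself. Here's a minimal Linux sketch, assuming (and this is only an assumption) that the two cores of a module show up as adjacent logical CPUs (0/1 = module 0, 2/3 = module 1, and so on):

```python
import multiprocessing
import os

MODULES = 4            # assumed 4-module / 8-core part
CORES_PER_MODULE = 2   # assumed adjacent logical CPU numbering per module

def heavy_work(module_index):
    # Pin this process to the first core of "its" module, so no module
    # ever runs two of our heavy threads at once.
    os.sched_setaffinity(0, {module_index * CORES_PER_MODULE})
    # ... heavy integer work goes here ...

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=heavy_work, args=(m,))
             for m in range(MODULES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```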
Yes, and a slightly different process on Thuban. But we're talking performance-wise. Is the ALU or FPU different? Any internal latencies? Or will there be no difference in performance if you negate the MCM design?
I think there is no difference in performance between two Istanbuls and one Magny-Cours that can be attributed to the MCM design (better interconnect).
Or if it is, can you enlighten me what that difference would be.
However, most of the architectural changes made to the core weren't publicized by AMD... Now that it is available on the open market and next gen tech is almost here... could we possibly have more information on the core tech in Magny-Cours?
Oh, before someone flames me, I know JF has a job to do and I respect that. If he shares info, well and good. If he can't, that's alright just as well. However, I had to ask. Curiosity killed the cat and now it's me. :D
I only know that the Thuban core includes low-k.
What do you guys guess the single thread performance will be when only 1 core per module gets used instead of 2? Do you think the extra resources will give much of a boost compared to using both cores? I know there's the shared L2 cache, but I'm not sure what else can be used this way.
i like these threads ... cheap entertainment
This is only a statement about the effect of adding a second core; power etc. doesn't seem to be included here. So if only one cluster is being used, a thread could run ~11% faster based solely on this statement (it's not a fact, nothing is measured here). Furthermore, the saved power of the second cluster, L1D$ etc., and lower utilization of the 256-bit FPU could leave some headroom to boost the single running cluster to maybe +30% overall.
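Spelling out where the ~11% comes from (the same back-of-the-envelope reading of the 80% statement, nothing measured):

```python
# If the second core only adds 80% more throughput, then a lone thread that
# gets the whole module to itself runs at 2 / 1.8 of a loaded module's
# per-core share.
two_core_scaling = 1.8
lone_thread_gain = 2 / two_core_scaling
print(f"~{(lone_thread_gain - 1) * 100:.0f}% faster")   # ~11%
# Anything on top of that (idle cluster's power budget, full 256-bit FPU)
# is the speculative part of the "+30% overall" guess.
```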
You have a point. And of course, anything regarding actual performance is based on statements and nothing more at this point.
I mentioned the possibility of good single thread performance before, based on shared resources like the FPU focusing on only one core.
What do you think about turbo in Bulldozer? Will it be there? And will it work at core level or module level? Personally I guess that it will be a turbo at core level, further increasing performance with 1-4 threads.
That's 12.78% with the same number of threads/cores (12 in this example).
If you only get 80% (integer) performance scaling going from 1 to 2 cores, then for single threaded performance, or in Boris' example with only 1 thread per module, the same calculation gives more like 22-25% more performance per core.
The multithreaded front end can be wider because its higher throughput will be useful.
The one FPU is twice as powerful in throughput.
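One way to read the 22-25% estimate above (my interpretation of the arithmetic, not an official number) is to stack the ~12.78% per-core figure on the ~11% a lone thread gains from owning the whole module:

```python
per_core_gain = 1.5 / 1.33     # ~1.1278, the per-core figure from earlier
whole_module_bonus = 2 / 1.8   # ~1.11, one thread getting the whole module
print(per_core_gain * whole_module_bonus)   # ~1.25 -> roughly 25% more per core
```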
John already wrote that BD will have some improved form of turbo, and I posted one of my interpretations here. There is a lot that can be done. Furthermore, a 16-core chip like Interlagos will have its memory channels saturated, since they're not only used for normal data fetches but also for prefetches, which are sometimes useless. So running fewer threads not only creates power headroom, but also memory bandwidth headroom available for more aggressive prefetches or modern techniques like runahead execution/scout threads. If there is not much memory bandwidth left, not much can be done with such methods, since they have lower priority than actual requests.
Don't try to calculate the +% performance, you'll never really find out how much it is.
Just read blogs and wait for the 24th.
Hot Chips is about the architecture, you will not see performance estimates or benchmarks.
You'll see a lot more about what is inside the core, but we won't make statements about performance other than what we have already said, because a) it is not that type of event, and b) we don't release any benchmarks unless we have final silicon or are very confident that final silicon will behave the same as what we have in hand.
Thanks for replying mate! :up:
p.s: dunno if you checked this other thread, but here's a tip for your legal team. :D
http://www.xtremesystems.org/forums/...ld#post4500153
Who resurrected this thread... why? This is from February, man!
More info here: http://www.youtube.com/watch?v=V0UcQDUR-fU :rofl: