I won't turn this into a semantics debate, but I wouldn't call a module a dual core just because it can run two threads. The cores share a front end that cannot be separated from them and still function.
No they will not; hell, I am 200% sure SB can't do that :D. The idea behind the arch is to make the cores smaller so that there are more of them, which also helps them reach better speeds. Intel does things a bit differently, and HT is not super effective everywhere (see X3 435 vs i5 530).
Sorry if this has been brought up, I'm struggling to follow this thread now! But if you look at the 50% higher performance claim as applying to a fully utilised 4-module / 8-core CPU, then shouldn't single threaded performance actually be a strong point, since a single core processing a thread isn't sharing the front end, nor the cache?
The statement that performance "scales 80%" from 1 core to 2 cores in a module kind of proves this point.
This also makes any "12.5%" IPC per core calculation wrong.
I asked this question earlier (I think), but I'll ask again. Does anyone know how 4 threads are handled by the OS on a 4-module / 8-core BD?
In theory it should, given that it uses its resources exclusively. But it depends on the uarch: is it similar to the K7 lineage or something new?
Scaling is a meaningless number in this context.
Quote:
The statement that performance "scales 80%" from 1 core to 2 cores in a module kind of proves this point.
A module vs. 2 cores:
- one multithreaded front end / 2 front ends
- 2 integer clusters / 2 integer clusters
- one FPU / 2 FPUs
- shared L2 / shared or exclusive L2s
You can get to 80% from 1 to 2 cores in a single scenario: integer calculations.
1.5x more performance / 1.33x more cores = 1.1278x more performance per core.
Quote:
This also makes any "12.5%" IPC per core calculation wrong.
12.78%.
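Spelling that arithmetic out (a back-of-the-envelope sketch of the same numbers from this thread, nothing measured or official):

```python
# Per-core gain implied by "+50% throughput with 33% more cores".
total_speedup = 1.50   # claimed throughput increase (8 BD cores vs 6 current cores)
core_ratio = 8 / 6     # ~1.333x more cores

per_core_gain = total_speedup / core_ratio
print(f"{per_core_gain:.4f}x per core, ~{(per_core_gain - 1) * 100:.2f}%")
# -> ~1.125x (~12.5%); the 12.78% figure comes from rounding the core ratio to 1.33
```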
No different than a current 4 core CPU.
Quote:
I asked this question earlier (I think), but I'll ask again. Does anyone know how 4 threads are handled by the OS on a 4-module / 8-core BD?
Try to read what I am saying. NOT in real world gaming, but in situations that aren't GPU-limited. Because these processors won't be GPU-limited in 2014 in the same way, just as an A64 4400+ bought in 2006 isn't GPU-limited with a 5770.
So, instead of reading your bench that says an Athlon II X2 can compete with a 980X, you do an ordinary game bench where you measure the differences between the CPUs, not the GPU. That way you know which CPU is most likely to still have adequate performance in 2014.
I say that if an i7 can outperform a Phenom II by 50% in the games we have today, it's much more likely to achieve playable FPS in 2014. But if we do it your way, and limit them down to the same FPS in today's games with an underpowered GPU, we will be fooled into believing they have the same gaming performance.
Example:
In the year 2010, CPU A limits a game at 50FPS, CPU B limits the same game at 100FPS.
In the year 2014, CPU A limits a new game at 25FPS, CPU B limits the same game at 50FPS. CPU B is still good.
BUT, if we do the 2010 test with a GPU that limits the game to 30 FPS, we won't see the difference, and will be disappointed when CPU A turns out to have poor longevity.
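To make that concrete, here is a toy illustration with the same hypothetical numbers, simply modeling delivered FPS as capped by whichever of CPU or GPU is slower:

```python
# Toy model: the frame rate you actually see is limited by whichever of the
# CPU or the GPU is the bottleneck.
def delivered_fps(cpu_fps_limit, gpu_fps_limit):
    return min(cpu_fps_limit, gpu_fps_limit)

# 2010 game with a strong GPU: the CPU difference is visible (50 vs 100 FPS).
print(delivered_fps(50, 120), delivered_fps(100, 120))
# Same game with a GPU that caps at 30 FPS: both CPUs look identical (30 vs 30).
print(delivered_fps(50, 30), delivered_fps(100, 30))
# 2014 game that is twice as CPU-heavy: only CPU B stays playable (25 vs 50).
print(delivered_fps(25, 60), delivered_fps(50, 60))
```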
And of course I know that you can't predict this exactly and that their relationship performance wise may change. But that's beside the point.
In 2004 my A64 ran just as fast as my friend's P4, since we had the same GPU. But in 2007, my old A64 was the only one that still performed well in games.
I'm not talking about quads, I'm talking about any CPU against any other CPU. And no, I don't expect requirements to increase 4x, but I do expect them to increase. And IF requirements double every four years, then the CPU with 100% more performance today will last several years longer.
I'm not saying that you should buy a more expensive CPU because of the games of 2014. All I'm trying to say is that if two processors are comparable today, around the same performance and price, except that one of them has much better performance in games, then the one with better performance in games has an advantage in a couple of years.
I actually don't understand how people can say that large differences in game performance today won't matter later. If you buy a new CPU every year, maybe, but most people don't.
You misunderstand me. ;) I'm saying that hyperthreading can cost some performance in single threaded applications, but a module design might actually gain performance in single threaded applications compared to a conventional core design, since it can now focus the prefetch and the FPU on only one thread. :)
The two halves of the shared FPU might increase performance in single threaded programs, since one thread can use the whole unit. If the FPU wasn't shared, this wouldn't be possible.
Yes, you are right.
I've stated in a previous post that the performance must be higher than 112.5% in single threaded applications if Bulldozer scales badly over 4 threads (8 for MCM). But I've also said that it must be lower than 112.5% if it scales better than MC.
You can't have more than a 12.5% performance increase and 33% more cores at once and still only deliver 50% more performance with the same scaling. That's why the scaling must be worse in Bulldozer for it to achieve more than a 12.5% performance increase in single threaded applications.
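A rough sketch of that constraint (my own rearrangement of the thread's numbers, not anything official): total gain = single-thread gain × core-count ratio × relative multithreaded scaling, so with the total pinned at 1.5x and cores at ~1.33x, any single-thread gain above ~12.5% forces the scaling term below 1.

```python
total_speedup = 1.50
core_ratio = 4 / 3     # 8 cores vs 6

def required_mt_scaling(single_thread_gain):
    # Relative multithreaded scaling needed to land exactly at +50% total.
    return total_speedup / (single_thread_gain * core_ratio)

print(required_mt_scaling(1.125))  # 1.00 -> same scaling as today
print(required_mt_scaling(1.25))   # 0.90 -> scaling must be ~10% worse
```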
I think (hope) that it will try to run one thread per module at a time. Four heavy threads would be spread out over 4 modules. It will need 5 heavy threads or more before it gives a module two threads.
But unfortunately I don't really have a clue. :(
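For what it's worth, if the scheduler doesn't do this automatically you could force the placement yourself. Here's a minimal Linux sketch, assuming (and this is only an assumption) that the two cores of a module show up as adjacent logical CPUs (0/1 = module 0, 2/3 = module 1, and so on):

```python
import multiprocessing
import os

MODULES = 4            # assumed 4-module / 8-core part
CORES_PER_MODULE = 2   # assumed adjacent logical CPU numbering per module

def heavy_work(module_index):
    # Pin this process to the first core of "its" module, so no module
    # ever runs two of our heavy threads at once.
    os.sched_setaffinity(0, {module_index * CORES_PER_MODULE})
    # ... heavy integer work goes here ...

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=heavy_work, args=(m,))
             for m in range(MODULES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```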
Yes, and a slightly different process on Thuban. But we're talking performance-wise. Is the ALU or FPU different? Any internal latencies? Or will there be no difference in performance if you negate the MCM design?
I think there is no difference in performance between two Istanbuls and one Magny-Cours that can be attributed to the MCM design (better interconnect).
Or if it is, can you enlighten me what that difference would be.
However, most of the architectural changes made to the core weren't publicized by AMD... Now that it is available on the open market and next gen tech is almost here... could we possibly have more information on the core tech in Magny-Cours?
Oh, before someone flames me, I know JF has a job to do and I respect that. If he shares info, well and good. If he can't, that's alright just as well. However, I had to ask. Curiosity killed the cat and now it's me. :D
I only know that the Thuban core includes low-k.
What do you guys guess the single thread performance will be when only 1 core per module gets used instead of 2? Do you think the extra resources will give much of a boost compared to using both cores? I know there's the shared L2 cache, but I'm not sure what else can be used this way.
i like these threads ... cheap entertainment
This is only a statement about the effect of adding a second core; power etc. doesn't seem to be included here. So if only one cluster is being used, a thread could run ~11% faster based solely on this statement (it's not a fact, nothing is measured here). Furthermore, the saved power of the second cluster, L1D$ etc., and lower utilization of the 256-bit FPU could leave some headroom to boost the single running cluster to maybe +30% overall.
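Spelling out where the ~11% comes from (the same back-of-the-envelope reading of the 80% statement, nothing measured):

```python
# If the second core only adds 80% more throughput, then a lone thread that
# gets the whole module to itself runs at 2 / 1.8 of a loaded module's
# per-core share.
two_core_scaling = 1.8
lone_thread_gain = 2 / two_core_scaling
print(f"~{(lone_thread_gain - 1) * 100:.0f}% faster")   # ~11%
# Anything on top of that (idle cluster's power budget, full 256-bit FPU)
# is the speculative part of the "+30% overall" guess.
```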
You have a point. And of course, anything regarding actual performance is based on statements and nothing more at this point.
I mentioned the possibility of good single thread performance before, based on shared resources like the FPU focusing on only one core.
What do you think about turbo in Bulldozer? Will it be there? And will it work at core level or module level? Personally I guess that it will be a turbo at core level, further increasing performance with 1-4 threads.
That's 12.78% with the same number of threads/cores (12 in this example).
If you only get 80% (integer) performance scaling going from 1 to 2 cores, then for single threaded performance, or in Boris' example with only 1 thread per module, the same calculation gives more like 22-25% more performance per core.
The multithreaded front end can be wider because its higher throughput will be useful.
The one FPU is twice as powerful in throughput.
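One way to read the 22-25% estimate above (my interpretation of the arithmetic, not an official number) is to stack the ~12.78% per-core figure on the ~11% a lone thread gains from owning the whole module:

```python
per_core_gain = 1.5 / 1.33     # ~1.1278, the per-core figure from earlier
whole_module_bonus = 2 / 1.8   # ~1.11, one thread getting the whole module
print(per_core_gain * whole_module_bonus)   # ~1.25 -> roughly 25% more per core
```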
John already wrote that BD will have some improved form of turbo, and I posted one of my interpretations here. There is a lot that can be done. Furthermore, a 16-core chip like Interlagos will have its memory channels saturated, since they're not only used for normal data fetches but also for prefetches, which are sometimes useless. So running fewer threads not only creates power headroom, but also memory bandwidth headroom available for more aggressive prefetches or modern techniques like runahead execution/scout threads. If there is not much memory bandwidth left, not much can be done with such methods, since they have lower priority than actual requests.
Don't try to calculate the +% performance, you'll never really find out how much it is.
Just read blogs and wait for the 24th.
Hot Chips is about the architecture, you will not see performance estimates or benchmarks.
You'll see a lot more about what is inside the core, but we won't make statements about performance other than what we have already said, because a) it is not that type of event, and b) we don't release any benchmarks unless we have final silicon or are very confident that final silicon will behave the same as what we have in hand.
Thanks for replying mate! :up:
p.s: dunno if you checked this other thread, but here's a tip for your legal team. :D
http://www.xtremesystems.org/forums/...ld#post4500153
Who resurrected this thread... why? This is from February, man!
More info here: http://www.youtube.com/watch?v=V0UcQDUR-fU :rofl: