I'm not disagreeing with you, but some people you just can't please. It has shared resources, big deal. I read somewhere that the module is 80-90% of the efficiency of a true dual core. Close enough for me to call a module two cores.
As quoted by LowRun: "So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"
It will be a few weeks before Newegg puts it on sale, so let's just stop arguing... unless people here really work on AMD's BD design team. Lol
It's in the official slides from FAD 2010. This is an average number; there are probably some cases where sharing doesn't cost any performance and others where it costs more. BTW, Bulldozer has turbo on all (integer) cores, so this alone will negate some of the penalties that *may* occur.
Also note that there is no multicore chip design out there that scales perfectly with more threads and at the same time uses none of the MT technologies (fine-grained, coarse-grained, SMT). Conventional CMP designs (a la Opteron, Conroe) usually scale ~90% with a second thread on well-multithreaded workloads.
Last edited by informal; 08-26-2011 at 04:49 PM.
AMD's Bulldozer Blog (centralized around Server, though). The exact entry I had linked to about 1/3 or 1/2 way through the thread, but for a different reason. I'd have to read it again to make sure this is accurate (which I've not the time to do atm), but I believe it said something on the order that if each module were working on one thread, it would yield quite a bit more performance than a single Magny-Cours core (heh); however, a single module would only be about 90% of the performance of two of Magny's cores.
Since I'm 98% sure what I read was by JF (John) anyways, this is just as valid as whatever blog post I'm referring to (though I think it might be one of the 20 Questions posts found here):
http://www.xtremesystems.org/forums/...1&postcount=67
What I wrote might not be verbatim, hence my disclaimer that it might not be accurate and that I am going off memory lol. Regardless of whether I was close or wayyyy off, I read pretty much all of the Bulldozer-related blog entries at AMD. All three of the 20-Questions entries are a good read, and that is where I think John may have said what I was going off of, but I was on vacation for all of July (without internet), so my brain has been making room for new info while playing catch-up haha. If I get time after editing reviews, I'll try to dig for the specific blog link.
No.
If both cores in each module are being used, AMD says it would get 80% higher performance (i.e. 180% of single-core performance).
That percentage had nothing to do with K10.5 - it's looking solely at one BD module, with single thread performance vs two thread performance.
If you ran each thread in separate modules, you'd get better scaling than that though.
JF didn't say that. He said that one module gets 1.8x performance scaling with two threads on it. A single module running two threads would deliver 90% of the performance of two BD modules running one thread each. In other words, a CMT (clustered multithreading) module gives 90% of the performance of two chip-multiprocessed cores of the same microarchitecture. There is no comparison with Magny-Cours or 10h.
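To make the 1.8x vs 90% relationship concrete, here is a minimal sketch. It assumes single-thread throughput of one module is normalized to 1.0 and that "two true cores" would ideally give exactly 2.0; the function name is mine, not anything from AMD.

```python
# Assumption: one thread on one module = 1.0 (normalized throughput).
ONE_THREAD = 1.0
CMT_SCALING = 1.8  # two threads on one module, figure quoted in this thread


def cmt_efficiency(two_thread_throughput, one_thread_throughput=ONE_THREAD):
    """Fraction of the ideal two-core throughput that one module delivers."""
    ideal_two_cores = 2 * one_thread_throughput
    return two_thread_throughput / ideal_two_cores


print(f"{cmt_efficiency(CMT_SCALING):.0%}")  # 90%
```

So "1.8x with two threads" and "90% of two separate cores" are the same statement seen from two sides.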
He talks about a mythical 6-core Bulldozer:
http://www.xtremesystems.org/forums/...1&postcount=67
And someone concluded that the IPC decreases. :D
Mythical 6-core Bulldozer:
100% + 95% + 95% + 95% + 95% + 95% = 575%
Orochi die with 4 modules:
180% + 180% + 180% + 180% = 720%
What if we had just done a 4 core and added HT (keeping in the same die space):
100% + 95% + 95% + 95% + 18% + 18% + 18% + 18% = 457%
What about a 6 core with HT (has to assume more die space):
100% + 95% + 95% + 95% + 95% + 95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%
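For anyone who wants to check the sums, here is a quick sketch that just re-adds the per-core percentages listed above. The config labels are mine; the numbers are taken straight from the lists.

```python
# Per-thread contributions in percent, copied from the lists above.
configs = {
    "6-core, no HT": [100] + [95] * 5,
    "Orochi, 4 modules": [180] * 4,
    "4-core + HT": [100] + [95] * 3 + [18] * 4,
    "6-core + HT": [100] + [95] * 5 + [18] * 6,
}

# Total throughput of each design is simply the sum of its contributions.
totals = {name: sum(parts) for name, parts in configs.items()}

for name, value in totals.items():
    print(f"{name}: {value}%")
```

The totals come out to 575%, 720%, 457% and 683%, matching the figures in the post.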
And from that moment on, there have been so many people who think that BD has lower IPC than 10h.
IPC = Instructions Per Cycle. This number tells us how many instructions a CPU can retire per cycle. Modern out-of-order processors have many execution units, so they can execute more than one instruction per cycle, in parallel and out of order. Because program code is written in order, it is quite difficult to make such a machine execute out of order: data dependencies, too many branches in the code, etc. get in the way.
IPC is a performance measure that depends on the software. If code is properly optimised, it can run on a CPU, or even a GPU, with more instructions per cycle. However, a faster CPU design can also run the same software with a higher IPC.
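As a tiny illustration of the definition, IPC is just retired instructions divided by elapsed cycles. The counter values below are made up for the example, not measurements of any real chip.

```python
def ipc(instructions_retired, cycles):
    """Instructions per cycle: retired instructions divided by elapsed cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


# Hypothetical counter readings for the same program on two designs.
print(ipc(3_000_000, 2_000_000))  # 1.5
print(ipc(3_000_000, 1_500_000))  # 2.0
```

Same instruction count, fewer cycles, higher IPC: that is all the number says, and it always belongs to a CPU and a piece of software together.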
If BD has 50% more throughput than a 6-core 10h, at probably the same clock, that means the server workload uses all cores. Core for core, that means BD has 12.5% more throughput than 10h, but a single module can do 25% more serialized, single-thread work than one single core. Maybe this is not about core to core, but about cores to cores (6 vs 8, etc.).
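Here is one way the 12.5% and 25% figures can be reproduced. This is my reading of the arithmetic, assuming an 8-core BD against a 6-core 10h at equal clocks and the 1.8x two-thread module scaling mentioned earlier in the thread; the post itself does not spell out the steps.

```python
THROUGHPUT_RATIO = 1.5  # 8-core BD vs 6-core 10h, fully loaded (figure from the post)
BD_CORES = 8            # assumption: 4 modules x 2 integer cores
K10_CORES = 6
CMT_SCALING = 1.8       # assumption: two threads on one module vs one thread

# Fully loaded BD core relative to one 10h core.
per_core = THROUGHPUT_RATIO * K10_CORES / BD_CORES

# A fully loaded module is two such cores.
per_module = 2 * per_core

# Undo the sharing penalty to get one thread running alone on a module.
single_thread = per_module / CMT_SCALING

print(f"per core:      {per_core - 1:.1%} more than a 10h core")
print(f"single thread: {single_thread - 1:.1%} more than a 10h core")
```

Under those assumptions the per-core gain is 12.5% and a lone thread on a module comes out about 25% ahead of a 10h core, which is the opposite of an IPC decrease.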
Last edited by drfedja; 08-27-2011 at 05:09 PM.
"That which does not kill you only makes you stronger." ---Friedrich Nietzsche
PCAXE