Considering that max Turbo kicks in at half-module use, it could deliver optimal performance, and it could be more efficient too.
You asked "why not be happy they showed us what kind of air/water OC we can expect." and i answered.
Why should i be happy about how it overclocks if i don't even know how it performs.
As for your concept car stuff... sorry, i couldn't make any sense out of it :shrug:
Windows has been multithread-aware (SMT at least) since XP; however, 'aware' and 'optimized' are two very different things. Vista did not improve on that, but Windows 7 implemented SMT parking (google it, you will find several references). While not nearly perfect, the improvement in performance in lightly threaded applications can be quite high. Though it's a completely different architecture, BD may or may not benefit from scheduling threads across modules as opposed to within a module, but it would make sense that Windows may initially view a module as a dual-threaded 'core', enumerate the contexts as such, and take advantage of better scheduling. Total guess, but it would make sense. I, personally, would take this with a healthy dose of skepticism until both final silicon and final Windows builds are actually released.
Yes, it's an anagram of DNA. What did I win ? :)
For me, it's the same thing as movieman (or whatever his name is) saying that AMD has a winner, etc...
Why is there still an ongoing discussion of cores vs. modules.........
I don't give a rats ass what Marketing calls the chip.
AMD's patent draws a clear picture. They say a picture is worth a thousand words, right? This is AMD's own picture from their own patent.
Note it's core 100, not module 100, aka core 0, and inside core 0 are 2 clusters, A and B.
Case closed.
Attachment 120124
I guess it doesn't matter what marketing calls it, and you are right, chew*. What matters is performance. If it fails to beat the X6 Thuban in multithreaded workloads then it doesn't warrant an 8C marketing name IMO. They would be better off calling it a 4C/8T part, since in that case they would get praised for being much faster than the X4 in many applications. But this way, equal to or barely faster than Thuban with sky-high Turbo clocks... it is not going to bring them much praise in reviews.
The only reason I have tried to point this out many times so far is people's expectations. Those expecting 100% native 8-core multithreaded performance have unrealistic expectations. Hopefully this gives them a better idea so they can have more realistic expectations.
a good marketing team can take anything and make it sound good
8 cores for the price of Intel's 4 cores
vs
our 4-core chips are faster than Intel's 4-core chips (pending perf results)
see.
Well, to be honest, they already said so at the Analyst Day presentations. They stated an "80% of CMP" approach while having less die area and less power draw. So this 80% means 0.8x the performance of a "native" X8 Bulldozer done the old way of stacking cores next to each other. In other words, without knowing whether they actually improved the "cores"/clusters versus K10, the speedup is a far cry from the 33% more we have on paper and in marketing slides.
tell that to apple, but you're absolutely right that only engineers create the product, while marketing creates the demand.
but you need both, and some companies depend on one more than the other. we all hope that a great product will sell itself, but that just isn't true these days with how much marketing affects our lives. and i don't mind some of it, where they try to push the good features of a great product, but i can't stand commercials which make their products appear a necessity rather than a luxury, or convince someone they are inferior until they buy it.
the benchmark i'm waiting for is OCed gaming perf for a price range vs the competition. Cores and Hz don't mean crap until then.
So... for example, 4C/8SMT > 6C/6T??? If effective 6C scaling is about 5.5x, for example, then 4C/8SMT could be about 6.3x?
Well, 80% of CMP is 1.6x to be exact (or 25% lower than perfect scaling). 4 "real cores", as chew said, each with 2 hardware threads running on them, should therefore get you around 6.4x, or 6.4/5.5 = 1.16x speedup over the 6C Thuban, provided same clock and same IPC. We have hints that IPC may be lower sometimes and higher sometimes. Also we have a 9% higher base clock than the 1100T. All in all, 20% faster than the 1100T should be the expected result, but as we can see from the Cinebench result for example, it is not faster. Maybe there are corner cases where modules don't perform so well and some cases where they perform exactly like 2 cores. So it averages to, say, 1.5x or so.
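The estimate above can be written out in a few lines of Python. To be clear, the 1.6x per-module scaling, the 5.5x Thuban scaling and the 9% clock advantage are this thread's assumptions, not measured numbers:

```python
# Rough model of the speedup estimate in the post above. All inputs are the
# thread's assumptions, not benchmark measurements.

per_module_scaling = 1.6            # two threads on one module vs. one thread
modules = 4
bd_scaling = per_module_scaling * modules    # 4 modules -> 6.4x over one core

thuban_scaling = 5.5                # assumed 6-core Thuban multithreaded scaling
print(bd_scaling / thuban_scaling)  # ~1.16 -> ~16% faster at equal clock and IPC

clock_advantage = 1.09              # assumed 9% higher base clock than the 1100T
print(bd_scaling / thuban_scaling * clock_advantage)  # ~1.27 before IPC differences
```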
80% is the second thread, so it would be 1.8x, not 1.6x. i asked JF about this on an AMD blog he did a while back.
OK, think about it like this: 100 pts is a base number for 2 full cores' performance running some workload, so performance without any "compromises". Call it a hypothetical dual-core Bulldozer done the old way. 80% of this is how much? 100 x 0.8 = 80 pts. How much slower is this than the CMP BD used for comparison? 100/80 = 1.25, or 25% slower. Or if you want to use 100 pts as a base for a single core and even count non-perfect scaling (95%) due to software scheduling limitations: 1 hypothetical BD core is 100 pts, 2 of those in a CMP design 195 pts. 80% of this is how much exactly? 195 x 0.8 = 156 pts. How much slower than the hypothetical CMP BD is this? Yes, 25%: 195/156 = 1.25.
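The points arithmetic above, written out as a quick check. All values are the hypothetical "pts" units used in the post, not benchmark scores:

```python
# 80%-of-CMP arithmetic from the post above, in hypothetical "pts" units.

cmp_dual = 100.0                 # old-style CMP dual core = 100 pts
module = 0.8 * cmp_dual          # module quoted at 80% of CMP -> 80 pts
print(cmp_dual / module)         # 1.25 -> the module is 25% slower than CMP

# Same result starting from 100 pts per core with 95% software scaling:
single = 100.0
cmp_pair = single * 1.95         # two CMP cores with imperfect scaling = 195 pts
module2 = 0.8 * cmp_pair         # 80% of 195 = 156 pts
print(cmp_pair / module2)        # 1.25 again
```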
How about official AMD slide? Is that way off too?
Attachment 120134
Read carefully what it says at the bottom. Where does it say exactly, in the official AMD presentation, that the second core adds 80%? The whole module has 80% of CMP performance while having less die area and less power. The CMP approach is 2 cores done the old way, or if you want another AMD slide, here it is:
Attachment 120135
http://blogs.amd.com/work/2010/08/30...ge-1/#comments
Quote:
Manicdan August 30, 2010
the 80% thing is still confusing many,
If I have 2 cores, I get 100% on top of the 100% of the first core = 100% each
If I have 2 BD cores in 1 module, do I get 80% on top of the 100%, for 90% each?
Or do we get 60% on top of the 100%, for 80% each?
Considering the 50% performance increase over MC, there really is no wrong answer here, but it does play a very fun role in the conspiracy theory math we like for trying to determine single threaded performance
John Fruehe August 30, 2010
It is all about throughput. To your question it is like 90% each.
One thread on one core = 100 units of throughput
Two threads on two cores in the same module = ~180 units of throughput
Two threads running on 2 cores in 2 different modules = ~200 units of throughput (I know Amdahl’s law says it won’t be straight scaling so it is actually less than that, just relax on that one for a moment.)
The point is that there is a small penalty for a shared environment. But, what is the payoff for that? How about more cores. If we did not share resources, that same die space that holds 16 cores might only hold, perhaps 12 cores, or so. (NOTE TO CONSPIRACY THEORISTS: this is just for example, don’t start making die space assumptions….) Would you give up 20% performance (or less) in order to get 33% more cores? If your application was highly threaded you would do that in a heartbeat.
People are fixating on what you give up by sharing and not what you gain. Think of SMT. You share integer pipelines. But in the example above, you would only get ~120 units of throughput vs. the 200 units of two full cores. So that penalty is 80% for sharing. Funny that nobody ever brings that up.
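For reference, JF's unit figures can be laid out side by side. The 100/180/200/120 numbers come straight from his post; nothing here is measured:

```python
# JF's hypothetical throughput units from the quoted blog comment.

scenarios = {
    "1 thread on 1 core":            100,
    "2 threads, 1 module (shared)":  180,   # ~90 units per thread
    "2 threads, 2 separate modules": 200,   # ignoring Amdahl, as he says
    "2 threads on 1 SMT core":       120,   # his SMT comparison
}

two_full_cores = scenarios["2 threads, 2 separate modules"]
for name, units in scenarios.items():
    print(f"{name}: {units} units ({units / two_full_cores:.0%} of two full cores)")
```

So sharing a module costs ~10% per pair of threads in his numbers, while sharing an SMT core's integer pipelines costs ~40%.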
Well, he is a marketing guy; the presentation I linked was done by the chief architect, Mike Butler. Who do you think knows better?
Note also that it was said to be an average figure. This means performance can be equal to or better than CMP (so yes, 1.8x applies) or 1.5x or even lower in some corner cases. It all depends on the micro-benchmark used. What matters is the average, and it is 80% (of the dual-core CMP approach).
or it means that on average the second core gives 80% when looking across many benchmarks. which makes sense to me because it's talking about die size and power in the same sentence. why say 80% perf when you're comparing one module to 2 cores? it makes more sense that the extra core costs less area than a traditional core and uses less power.
EDIT
btw if the second core is really that weak then it means single threaded stuff should be very strong.
Because the whole presentation was about a module and not about a single core running on it... You can even see it, it's so painfully obvious. It talks about 2 hardware execution threads and the good predictability of their performance (which they estimate at 80% of the "old" CMP dual-core approach). Read the first bullet point: "What it (the module) is? A monolithic dual core building block that supports two threads of execution". Then at the end: "Customer benefits: Estimated average of 80% of CMP performance (this is 2 cores, by the way) with much less die area and power". It clearly speaks about the module, since the module now has much less die area than 2 cores done the CMP way.
edit: sorry for the bold parts, I just had to point out the obvious in the slide since it somehow escapes you guys.
edit no2:
Single thread should definitely be stronger when running alone on the module; similar goes for SB and Nehalem. The difference is that this single thread should be a lot stronger than a single thread of SB (not directly compared, but compared when both run in isolation on their respective cores/modules).
Quote:
Originally Posted by Manicdan
SB sees a 0-30% speedup with SMT on. The Bulldozer module sees around a 1.6x speedup on average. This tells us that there is no erratic behavior with the module approach and that it is at least predictable. You can expect a 10-15% better single-thread result than what you would get from the multicore-scaled result.
The question is: is that single core running on a module alone still noticeably faster than a K10 core? If it isn't, then the multicore result will be a lot less impressive and you won't see 33% better scores over Thuban at similar clocks. You may see 10-20% better, depending on the application. Maybe even less than that (as Cinebench indicates: no better than Thuban...).
This again?
I can hardly wait to hear the kind of arguing that will ensue when they subtract the FP unit out of the module and send all the FP/SIMD work to the GPU. How many cores will it have then?
i tried to get clarity on what the 80% was because it's clearly confusing people.
i got an answer, and i trust that answer until we are told it was a mistake.