Originally posted by JumpingJack:
So I just glanced over your data and have not fully digested it yet, but all the benchmarks you ran are multithreaded. You have not shown anything that explains single-threaded performance. Rather, what you have done is force a situation with 4 available contexts: in one case, 4 threads are scheduled across 2 modules (a shared situation), and in the other, 4 threads are spread over 4 modules (a non-shared situation). What you are showing is the performance hit taken when front-end resources are shared (i.e., caches, TLBs, BTBs, etc.). The results are exactly what I would have expected.
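Reproducing the two layouts described above means pinning threads to specific logical CPUs. A minimal Python sketch, assuming (this is not stated in the post) that logical CPUs 2k and 2k+1 are the two cores of module k. That is a common enumeration for these chips, but check it on your own system. `cpus_for_layout`, `affinity_mask` and `pin_current_process` are hypothetical helper names; `os.sched_setaffinity` exists only on Linux, and on Windows the hex mask could instead be passed to the `start /affinity` switch.

```python
import os

def cpus_for_layout(modules: int, threads_per_module: int, cores_per_module: int = 2) -> list[int]:
    """Logical CPU ids for a module/thread layout.

    ASSUMPTION: logical CPUs 2k and 2k+1 are the two cores of module k.
    """
    if not 1 <= threads_per_module <= cores_per_module:
        raise ValueError("threads_per_module must be between 1 and cores_per_module")
    cpus: list[int] = []
    for module in range(modules):
        first = module * cores_per_module  # first logical CPU of this module
        cpus.extend(range(first, first + threads_per_module))
    return cpus

def affinity_mask(cpus: list[int]) -> int:
    """Bitmask with bit i set for each logical CPU i (the form Windows affinity masks use)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def pin_current_process(cpus: list[int]) -> None:
    """Restrict the calling thread to the given CPUs (threads spawned later inherit it). Linux only."""
    os.sched_setaffinity(0, cpus)

# "4m/4t": one thread per module on 4 modules, no front-end sharing
spread = cpus_for_layout(modules=4, threads_per_module=1)  # [0, 2, 4, 6]
# "2m/4t": two threads per module on 2 modules, front end shared
packed = cpus_for_layout(modules=2, threads_per_module=2)  # [0, 1, 2, 3]
print(hex(affinity_mask(spread)), hex(affinity_mask(packed)))  # 0x55 0xf
```

The only difference between the two runs is which CPUs are in the set; the thread count stays at 4, which is exactly why this setup measures sharing losses rather than single-threaded speed.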

What is more interesting about your data is that you can now check AMD's claim of 180%, i.e. the 1.8x scaling factor of one module versus two discrete cores.

Here is an experiment to try: repeat this, but with just one application (Fritz Chess), because it lets you specify the number of threads to spawn.

Do it for 1, 2, 3 and 4 threads. Then turn everything on (all modules, clusters, cores, whatever you want to call them), do the same runs for 1 through 8 threads, and plot scaling vs. thread count for the three different configurations.
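Turning the per-thread-count results into scaling numbers is simple arithmetic: speedup is the score at n threads divided by the 1-thread score, and efficiency is speedup divided by n. A minimal sketch; the function names, the dict layout and the configuration labels are assumptions, and the scores have to come from your own Fritz Chess runs (no numbers are supplied here).

```python
def scaling_curve(scores: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map thread count -> (speedup over 1 thread, parallel efficiency).

    `scores` maps thread count to a benchmark score where higher is better
    (e.g. Fritz Chess kilonodes/s). A 1-thread baseline is required.
    """
    if 1 not in scores:
        raise ValueError("need a 1-thread baseline score")
    base = scores[1]
    curve = {}
    for n, score in sorted(scores.items()):
        speedup = score / base
        curve[n] = (speedup, speedup / n)
    return curve

def print_curve(label: str, scores: dict[int, float]) -> None:
    """Print one configuration's scaling curve as a small text table."""
    print(label)
    for n, (speedup, efficiency) in scaling_curve(scores).items():
        print(f"  {n} thread(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")

# Usage: one dict per configuration, filled with your own measurements, e.g.
# print_curve("2m/4t", {1: ..., 2: ..., 3: ..., 4: ...})
# print_curve("4m/4t", {1: ..., 2: ..., 3: ..., 4: ...})
# print_curve("4m/8t", {n: ... for n in range(1, 9)})
```

If the prediction below holds, the 1-thread rows of all three curves should be essentially identical. A plotting library such as matplotlib could draw the curves, but it is left out to keep the sketch dependency-free.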

You will find that the fritz run for 1 thread will not be different regardless of how you configure the modules, clusters, cores (again, call them whatever you want to call them).

jack
Jack, hardware.fr already tried something along those lines with 4m/4t (4 modules, one thread each) and 2m/4t (2 modules, two threads each), both with Turbo on. In the first case, the maximum Turbo clock for all 4 threads is 3.9 GHz since all modules are active. In the second case it's 4.2 GHz across 2 modules (4 threads). The difference in Turbo clock (~7%) is not nearly enough to make up for the sharing losses, as can be seen here:
http://www.hardware.fr/articles/842-...windows-8.html

At a fixed 3.6 GHz, 4m/4t is 26% faster(!) than 2m/4t, and with both running at their maximum allowed Turbo clocks it is still 15% faster. Now comes the power-draw story.
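The fixed-clock result allows a rough check of AMD's 1.8x claim. Under a deliberately simple model (an assumption, not something hardware.fr or AMD states): a module running one thread delivers 1 unit of throughput, a module running two threads delivers s units, and modules add up linearly, ignoring shared L3 and memory effects. Then 4m/4t gives 4 units and 2m/4t gives 2s, so a 26% gap implies s = 2/1.26. The same sketch estimates the Turbo-on gap if performance scaled linearly with clock (another assumption). Function names are hypothetical.

```python
def implied_module_scaling(spread_vs_packed: float) -> float:
    """Throughput of one module running 2 threads, relative to 1 thread.

    Toy model (assumption): 4m/4t -> 4 * 1 = 4 units, 2m/4t -> 2 * s units,
    so spread_vs_packed = 4 / (2 * s), i.e. s = 2 / spread_vs_packed.
    """
    return 2.0 / spread_vs_packed

def clock_adjusted_gap(fixed_clock_gap: float, spread_clock: float, packed_clock: float) -> float:
    """Expected spread/packed ratio if performance scaled linearly with clock (assumption)."""
    return fixed_clock_gap * spread_clock / packed_clock

s = implied_module_scaling(1.26)           # 4m/4t is 26% faster at a fixed 3.6 GHz
print(f"implied module scaling: {s:.2f}")  # 1.59, vs. AMD's claimed ~1.8
gap = clock_adjusted_gap(1.26, 3.9, 4.2)   # Turbo: 3.9 GHz (4m/4t) vs. 4.2 GHz (2m/4t)
print(f"expected Turbo gap: {gap:.2f}")    # 1.17, vs. the ~1.15 measured
```

Under these assumptions the implied per-module scaling comes out noticeably below 1.8, and the clock-adjusted estimate lands close to the measured 15%. Both are back-of-the-envelope figures, not measurements.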
If you look at the power draw, you will see that the faster configuration consumes 20% more power, and I suspect this is why AMD didn't set up the core priorities that way. I think that when PD (Piledriver) arrives, power draw will drop enough to schedule threads the faster way and still get good power numbers. Even with the current BD (Bulldozer) core, gaining 26% performance for 20% more power is not a bad tradeoff. If GloFo (GlobalFoundries) gets its act together and enables AMD to produce a 3.6 GHz, 5-module PD chip with this thread-affinity capability, it could very well be significantly more powerful than Thuban, even in single-threaded (ST) work at a fixed clock, and noticeably more powerful than BD in both ST and multithreaded (MT) work, with Turbo both on and off.
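Taking the post's two figures at face value, and assuming the 26% performance gain and the 20% power increase were measured under the same conditions (the post doesn't say whether the power figure is at the fixed 3.6 GHz), the tradeoff works out to a modest gain in performance per watt and slightly less energy per completed task. A minimal sketch with hypothetical helper names:

```python
def perf_per_watt_change(perf_ratio: float, power_ratio: float) -> float:
    """Relative change in performance per watt (e.g. 0.05 means +5%)."""
    return perf_ratio / power_ratio - 1.0

def energy_per_task_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Energy for a fixed amount of work: power x time, and time scales as 1/perf."""
    return power_ratio / perf_ratio

# 4m/4t relative to 2m/4t, using the figures quoted in the post
print(f"perf/W: {perf_per_watt_change(1.26, 1.20):+.1%}")                # +5.0%
print(f"energy per task: {energy_per_task_ratio(1.26, 1.20) - 1:+.1%}")  # -4.8%
```

In other words, the higher power draw is more than paid back by finishing the work sooner, which supports the "not a bad tradeoff" reading, as long as the two percentages really are comparable.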

By the way, great thread, DGLee.