A patch may be needed to change the order in which cores are used.
Now : Core 0 --> Core 1 --> Core 2 --> Core 3 --> Core 4 --> Core 5 --> Core 6 --> Core 7
Right priority : Core 0 --> Core 2 --> Core 4 --> Core 6 --> Core 1 --> Core 3 --> Core 5 --> Core 7
CMIIW
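To make the proposed order concrete, here is a minimal sketch. It assumes that cores 2k and 2k+1 belong to the same module (that numbering is my assumption, not something confirmed in this thread), and the function name is made up for illustration.

```python
def module_first_order(n_cores=8, cores_per_module=2):
    """Return core indices so that one core of every module is used
    before the second core of any module.

    Assumption: cores 2k and 2k+1 sit in the same module.
    """
    order = []
    for slot in range(cores_per_module):
        # slot 0 -> first core of each module, slot 1 -> second core
        order.extend(range(slot, n_cores, cores_per_module))
    return order


print(module_first_order())  # [0, 2, 4, 6, 1, 3, 5, 7]
```

This only generates the fill order; it says nothing about how Windows would actually be made to follow it.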
@Thread starter: Good comparison. Could you also show Thuban numbers with 4 cores at the same clock? Also, is turbo disabled on all of them? (It would be more accurate that way.)
PS: where is the thank you button?
Va fail, dh'oine.
"I am going to hunt down people who have strong opinions on subjects they dont understand " - Dogbert
Always rooting for the underdog ...
AMD FX-8350 (1237 PGN) | Asus Crosshair V Formula (bios 1703) | G.Skill 2133 CL9 @ 2230 9-11-10 | Sapphire HD 6870 | Samsung 830 128Gb SSD / 2 WD 1Tb Black SATA3 storage | Corsair TX750 PSU
Watercooled ST 120.3 & TC 120.1 / MCP35X XSPC Top / Apogee HD Block | WIN7 64 Bit HP | Corsair 800D Obsidian Case
First Computer: Commodore Vic 20 (circa 1981).
Windows 7 is already handling things like this for Intel processors with HT, using real cores first and logical cores later.
However, according to AMD there are situations where you don't even want this behavior.
Take a look at the first two pictures at THG:
http://www.tomshardware.co.uk/fx-815...-32295-23.html
Because of the shared L1-Cache it does indeed make sense that in some cases it can be faster to use the whole module instead of splitting things up and partially utilizing two modules. This means that the scheduler has to be more intelligent, though. It's not enough to just assign each new task to a new core, as it does now. Instead it must be able to guess which tasks should be grouped onto one module and which should be split over two (or more) modules.
I'm no coder but I can imagine easier projects than making the scheduler aware of such a complex problem.
Power Rig: Core i7-5930K, ASRock X99 Extreme6/3.1, 16GB G.Skill DDR4-2400, Asus Strix GTX980 OC
Time Sink: Core i7-5775C, ASRock Z97E-ITX/ac, 16GB AMD DDR3-2133, Silverstone PT-09 w/ 120W Power Brick
HTPC: Athlon 5350, ASRock AM1H-ITX, 4GB DDR3, Supermicro SC-101i
There is no real or logical core in BD.
There are clusters, simple as that.
When you disable a cluster in BIOS, you do the same thing as AMD's diagram.
AMD's diagram
Core 0 - shared
Core 1 - one cluster
Core 2 - shared
Core 3 - one cluster (uses all resources for 1 thread)
What we're doing
Core 0 - one cluster
Core 1 - disabled
Core 2 - one cluster
Core 3 - disabled
I'm aware of that and I never said that it works like that on BD.
At the moment Windows sees the processor as having 8 real cores and assigns the tasks accordingly but doesn't care (know) about the whole module thing.
That's fine (except your wording is inaccurate). Somehow we should trick it to use this method for BD, as well...
"However, according to AMD there are situations where you don't even want this behavior."
It depends on whether the penalty of forcing those closely related threads to communicate through L3 (instead of L2) is greater or smaller than the gain from not sharing resources. It seems most applications only benefit from separation:
img0033832.gif
So, there could be a little patch that simply enables SMT-style scheduling (à la Hyper-Threading) in Win7, which it already supports (if that's true)...
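Until such a patch exists, the closest thing from user space is probably an affinity mask that keeps a process on one core per module. A minimal sketch, again assuming cores 2k and 2k+1 share a module (the function name is hypothetical):

```python
def one_core_per_module_mask(n_cores=8, cores_per_module=2):
    """Build an affinity bitmask selecting the first core of each module.

    Assumption: cores 2k and 2k+1 sit in the same module.
    """
    mask = 0
    for core in range(0, n_cores, cores_per_module):
        mask |= 1 << core  # set the bit for cores 0, 2, 4, 6
    return mask


print(hex(one_core_per_module_mask()))  # 0x55
```

On Windows the resulting hex value could be handed to something like `start /affinity 55 program.exe`. Note that this is not the same as SMT-style scheduling: it restricts the process to those four cores instead of merely preferring them, so a fifth thread would not spill over to the remaining cores.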
Quoted from the article:
"According to AMD, Windows 8 will more intelligently align threads so that, when they can benefit from sharing a module, they will. The implication is that when two threads can be consolidated onto one module (despite the fact that they’re forced to share resources), putting an entire module to sleep and potentially enabling a higher p-state (a faster Turbo Core setting) outweighs any performance penalty tied to sharing."
And so the default behaviour will be separation (contrary to what JF said all along)? It would be just stupid if not... Of course, power consumption is higher because more modules are active, but here we can also see that with turbo enabled the energy efficiency is really the same...
Well, unless there is a fix coming (HW or SW or both) that greatly reduces the penalty of sharing resources. The current numbers are much worse (anywhere between 95% and 160%, with one case of 180%) than what they've advertised (180% across the board), so one can think there is some flaw somewhere here, as well. (And there is indeed the case of L1D thrashing, which they claim is responsible for only a 3% decrease.)
What diagram? Do you mean this? Which part of it?
"What we're doing
Core 0 - one cluster
Core 1 - disabled
Core 2 - one cluster
Core 3 - disabled"
Do you mean, if we disable every other "core" in the BIOS? Then no, you will get this:
Core (Module) 0 - one cluster
Core (Module) 1 - one cluster
Core (Module) 2 - one cluster
Core (Module) 3 - one cluster
ps. perhaps the title of the thread should be changed to "Thread separation vs. turbo", or something like that, to be more meaningful.
Last edited by dess; 10-14-2011 at 05:40 PM.
Sorry, I should have typed out:
Core 0 - one cluster
Core 1 - disabled
Core 2 - one cluster
Core 3 - disabled
Core 4 - one cluster
Core 5 - disabled
Core 6 - one cluster
Core 7 - disabled
...but I thought everyone would be smart enough to get the picture.
The Stilt is also correct about module vs CU, compute/computational unit is the correct term.
I thought you were calling a CU a core (just like others before you), not just because you listed four of them twice, but because you used terms like "shared" and "one cluster" next to them. Why call something by two names, anyway?
Now, care to elaborate, for the stupid like me, what all this means and which diagram it refers to:
AMD's diagram
Core 0 - shared
Core 1 - one cluster
Core 2 - shared
Core 3 - one cluster (uses all resources for 1 thread)
"The Stilt is also correct about module vs CU, compute/computational unit is the correct term."
Effectively everybody uses the term Module here, but fine, let it be "Compute Unit"... (No, not Computational Unit. While it means the same, the term is not used in that form.)
Last edited by dess; 10-15-2011 at 02:28 AM.
That image shows Thread 1 (a/b) sharing a CU while Thread 2 has its own CU, does it not?
By disabling every other core in BIOS, Thread 1 runs on its own CU, Thread 2 is disabled.
That is what I was describing. There is no Thread 2 a/b in the diagram despite the whole article talking about how AMD wants threads to share modules and benefit from higher p-states.
I would also like to add that though I never called you stupid, I find your post quite rude and very demeaning.
Last edited by BeepBeep2; 10-15-2011 at 11:02 AM.
Yes.
"By disabling every other core in BIOS, Thread 1 runs on its own CU, Thread 2 is disabled."
No. That way Thread 1a, Thread 1b and Thread 2 would each run on a separate CU.
(You can't "disable" a SW thread this way, anyway. If fewer cores are enabled than there are threads to run, the threads will simply share the available cores more.)
"There is no Thread 2 a/b in the diagram despite the whole article talking about how AMD wants threads to share modules and benefit from higher p-states."
Why should there be a Thread 2b? The SW threads are a given. The diagram represents a situation where there are three threads to run. Two of them are closely related (Thread 1a and 1b), i.e. working on the same dataset. The third is a separate one, Thread 2. Then there are two cases regarding core/CU utilization, one sub-optimal and the other optimal (shown above). The latter is optimal because the related threads share a CU (and so can share data in the L2 cache), while the separate one has a whole CU to itself, and the unneeded CUs can go to sleep, enabling a higher turbo mode for the first two CUs.
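As a rough illustration of that "optimal" case, here is a small sketch that packs related threads into the same CU and gives unrelated threads a CU of their own. The function, its input format and the assumption that the scheduler already knows which threads are related are all mine; a real scheduler would have to guess that.

```python
def place_threads(groups, n_cus=4, cores_per_cu=2):
    """Map threads to (cu, core_in_cu) slots.

    groups: list of lists of thread names; threads in the same inner
    list are assumed to work on the same dataset (hypothetical input).
    """
    placement = {}
    cu = 0
    for group in groups:
        # A group larger than one CU is split into CU-sized chunks.
        for start in range(0, len(group), cores_per_cu):
            if cu >= n_cus:
                raise ValueError("not enough CUs for this simple sketch")
            chunk = group[start:start + cores_per_cu]
            for slot, thread in enumerate(chunk):
                placement[thread] = (cu, slot)
            cu += 1  # the next chunk or group starts on a fresh CU
    return placement


print(place_threads([["1a", "1b"], ["2"]]))
# {'1a': (0, 0), '1b': (0, 1), '2': (1, 0)}
```

With the three threads from the diagram, only CU 0 and CU 1 are used, so CU 2 and CU 3 stay idle and could be put to sleep.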
Now, according to the findings, running all three of these threads on separate CUs (i.e. with every other core [cluster] disabled) would in most cases still be better than letting them share CUs(*) in order to limit the number of CUs in use and so get a higher turbo. That is because with unsharing you usually win more than with turbo... (This shouldn't be the case if everything worked as planned, or at least as marketed, but still, it is.)
* At least if done in the wrong way (Thread 2 and Thread 1a/1b in one CU). But it's possible it's true even if it's done in the "optimal" way. Further tests are needed to tell.
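A back-of-the-envelope way to see the trade-off: if two threads sharing a CU reach a combined throughput of `scaling` times one thread alone, each of them runs at `scaling / 2` of its unshared speed, so turbo has to make up the rest. The sketch below assumes performance scales linearly with clock speed, which is a simplification of mine, and it reads the percentages quoted earlier in the thread as that combined throughput.

```python
def break_even_turbo(scaling):
    """Clock ratio (turbo / base) at which two threads sharing one CU
    match the per-thread speed of running on separate CUs at base clock.

    scaling: combined throughput of two threads in one CU relative to
    one thread alone (1.8 means 180%).
    Assumption: performance scales linearly with clock speed.
    """
    per_thread_speed = scaling / 2.0
    return 1.0 / per_thread_speed


for s in (1.8, 1.6, 0.95):
    print(f"{s:.2f} -> {break_even_turbo(s):.2f}x")
# 1.80 -> 1.11x
# 1.60 -> 1.25x
# 0.95 -> 2.11x
```

Under these assumptions, at the advertised 180% a turbo boost of about 11% would be enough to break even, while at 160% it would already take 25%.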
"I would also like to add that though I never called you stupid, I find your post quite rude and very demeaning."
Well, if I were you, I would have apologized for being confusing instead of writing that.
Last edited by dess; 10-15-2011 at 05:57 PM.