
Thread: AMD FX "Bulldozer" Review - (4) !exclusive! Excuse for 1-Threaded Perf.


  1. #1
    Registered User
    Join Date
    Mar 2008
    Location
    Indonesia
    Posts
    46
    Maybe need to patch for running core priority.

    Now : Core 0 --> Core 1 --> Core 2 --> Core 3 --> Core 4 --> Core 5 --> Core 6 --> Core 7
    Right priority : Core 0 --> Core 2 --> Core 4 --> Core 6 --> Core 1 --> Core 3 --> Core 5 --> Core 7

    CMIIW
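
    A minimal sketch of the ordering proposed above, in Python. It assumes (this is not confirmed in the thread) that cores 2k and 2k+1 are the two cores of the same module; the function name is made up for illustration.

```python
def module_first_order(num_cores, cores_per_module=2):
    """Return core IDs ordered so every module gets one thread
    before any module gets a second one.

    Assumption: cores are numbered so that siblings in a module are
    adjacent (0/1, 2/3, ...).
    """
    order = []
    for slot in range(cores_per_module):
        # slot 0 picks the first core of every module, slot 1 the second
        order.extend(range(slot, num_cores, cores_per_module))
    return order


if __name__ == "__main__":
    print(" --> ".join(f"Core {c}" for c in module_first_order(8)))
```

    This only produces the fill order; it says nothing about how an OS scheduler would actually apply it.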





  2. #2
    Xtreme Member
    Join Date
    Mar 2009
    Location
    Unknown
    Posts
    266
    @Thread starter: Good comparison. Could you also show Thuban numbers with 4 cores at the same clock? Also, is turbo disabled on all of them? (It would be more accurate that way.)

    PS: where is the thank you button ?
    Va fail, dh'oine.

    "I am going to hunt down people who have strong opinions on subjects they dont understand " - Dogbert

    Always rooting for the underdog ...

  3. #3
    Xtreme Mentor
    Join Date
    Dec 2007
    Location
    State of Confusion, USA
    Posts
    2,513
    Quote Originally Posted by bondhahnmrt85 View Post
    Maybe need to patch for running core priority.

    Now : Core 0 --> Core 1 --> Core 2 --> Core 3 --> Core 4 --> Core 5 --> Core 6 --> Core 7
    Right priority : Core 0 --> Core 2 --> Core 4 --> Core 6 --> Core 1 --> Core 3 --> Core 5 --> Core 7

    CMIIW
    I don't have a lot of knowledge about the inner workings of a CPU, but to a layman this sounds brilliant and should be easy enough to implement.
    Thoughts?
    AMD FX-8350 (1237 PGN) | Asus Crosshair V Formula (bios 1703) | G.Skill 2133 CL9 @ 2230 9-11-10 | Sapphire HD 6870 | Samsung 830 128Gb SSD / 2 WD 1Tb Black SATA3 storage | Corsair TX750 PSU
    Watercooled ST 120.3 & TC 120.1 / MCP35X XSPC Top / Apogee HD Block | WIN7 64 Bit HP | Corsair 800D Obsidian Case








    First Computer: Commodore Vic 20 (circa 1981).

  4. #4
    Registered User
    Join Date
    Oct 2005
    Location
    Austria
    Posts
    68
    Quote Originally Posted by bondhahnmrt85 View Post
    Maybe need to patch for running core priority.

    Now : Core 0 --> Core 1 --> Core 2 --> Core 3 --> Core 4 --> Core 5 --> Core 6 --> Core 7
    Right priority : Core 0 --> Core 2 --> Core 4 --> Core 6 --> Core 1 --> Core 3 --> Core 5 --> Core 7

    CMIIW
    Windows 7 is already handling things like this for Intel processors with HT, using real cores first and logical cores later.

    However, according to AMD there are situations where you don't even want this behavior.
    Take a look at the first two pictures at THG:
    http://www.tomshardware.co.uk/fx-815...-32295-23.html

    Because of the shared L1-Cache it does make sense that in some cases it can be faster to use the whole module instead of splitting things up and partially utilizing two modules. This means that the scheduler has to be more intelligent though: it's not enough to just assign each new task to a new core as it does now. Instead it must be able to guess which tasks should be grouped on one module and which should be split over two (or more) modules.

    I'm no coder but I can imagine easier projects than making the scheduler aware of such a complex problem.
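
    To make the grouping idea concrete, here is a rough Python sketch of a placement policy. It sidesteps the hard part: it assumes the scheduler is simply told which threads are related, which a real scheduler would have to guess. The function name and the thread labels are hypothetical.

```python
def place_threads(groups, num_modules=4, cores_per_module=2):
    """Map each thread to a (module, core-slot) pair.

    groups: list of lists; threads in the same inner list are assumed
    to work on the same data, so they are packed onto shared modules.
    Unrelated groups always start on a fresh module.
    """
    placement = {}
    next_module = 0
    for group in groups:
        for i, thread in enumerate(group):
            module = next_module + i // cores_per_module
            if module >= num_modules:
                raise ValueError("not enough modules for this simple policy")
            placement[thread] = (module, i % cores_per_module)
        # advance past the modules this group used (ceiling division)
        next_module += -(-len(group) // cores_per_module)
    return placement


if __name__ == "__main__":
    # two related threads plus one independent thread
    print(place_threads([["1a", "1b"], ["2"]]))
```

    A real scheduler would also have to handle threads arriving and leaving over time, which this static sketch ignores.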
    Power Rig: Core i7-5930K, ASRock X99 Extreme6/3.1, 16GB G.Skill DDR4-2400, Asus Strix GTX980 OC
    Time Sink: Core i7-5775C, ASRock Z97E-ITX/ac, 16GB AMD DDR3-2133, Silverstone PT-09 w/ 120W Power Brick
    HTPC: Athlon 5350, ASRock AM1H-ITX, 4GB DDR3, Supermicro SC-101i

  5. #5
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by pumero View Post
    Windows 7 is already handling things like this for Intel processors with HT, using real cores first and logical cores later.

    However, according to AMD there are situations where you don't even want this behavior.
    Take a look at the first two pictures at THG:
    http://www.tomshardware.co.uk/fx-815...-32295-23.html

    Because of the shared L1-Cache it does make sense that in some cases it can be faster to use the whole module instead of splitting things up and partially utilizing two modules. This means that the scheduler has to be more intelligent though: it's not enough to just assign each new task to a new core as it does now. Instead it must be able to guess which tasks should be grouped on one module and which should be split over two (or more) modules.

    I'm no coder but I can imagine easier projects than making the scheduler aware of such a complex problem.
    There is no real or logical core in BD.
    There are clusters, simple as that.
    When you disable a cluster in BIOS, you do the same thing as AMD's diagram.

    AMD's diagram
    Core 0 - shared
    Core 1 - one cluster
    Core 2 - shared
    Core 3 - one cluster (uses all resources for 1 thread)

    What we're doing
    Core 0 - one cluster
    Core 1 - disabled
    Core 2 - one cluster
    Core 3 - disabled

  6. #6
    Registered User
    Join Date
    Oct 2005
    Location
    Austria
    Posts
    68
    I'm aware of that and I never said that it works like that on BD.
    At the moment Windows sees the processor as having 8 real cores and assigns the tasks accordingly but doesn't care (know) about the whole module thing.

  7. #7
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by pumero View Post
    Windows 7 is already handling things like this for Intel processors with HT, using real cores first and logical cores later.
    That's fine (except your wording is inaccurate). Somehow we should trick it into using this method for BD as well...

    However, according to AMD there are situations where you don't even want this behavior.
    It depends on whether the penalty of forcing those closely related threads to communicate through L3 (instead of L2) is larger or smaller than the gain from not sharing resources. It seems most applications only benefit from it:

    img0033832.gif

    So, there could be a little patch that simply enables SMT-style scheduling in Win7, which it already supports (if true)...

    Quoted from the article:
    According to AMD, Windows 8 will more intelligently align threads so that, when they can benefit from sharing a module, they will. The implication is that when two threads can be consolidated onto one module (despite the fact that they’re forced to share resources), putting an entire module to sleep and potentially enabling a higher p-state (a faster Turbo Core setting) outweighs any performance penalty tied to sharing.
    And so the default behaviour will be separation (contrary to what JF said all along)? It would be just stupid if not... Of course, power consumption is higher because more modules are active, but here we can also see that with turbo enabled the energy efficiency is really the same...

    Well, unless there is a fix coming (HW or SW or both) that largely reduces the penalty of sharing resources. The current numbers are much worse (anywhere between 95% and 160%, with one case of 180%) than what they've advertised (180% across the board), so one can suspect there is some flaw here as well. (And there is indeed the case of L1D thrashing, which they claim is responsible for only a 3% decrease.)

    Quote Originally Posted by BeepBeep2 View Post
    When you disable a cluster in BIOS, you do the same thing as AMD's diagram.
    What diagram? Do you mean this? Which part of it?

    What we're doing
    Core 0 - one cluster
    Core 1 - disabled
    Core 2 - one cluster
    Core 3 - disabled
    Do you mean, if we disable every other "core" in the BIOS? Then no, you will get this:
    Core (Module) 0 - one cluster
    Core (Module) 1 - one cluster
    Core (Module) 2 - one cluster
    Core (Module) 3 - one cluster
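
    A small Python sketch of that mapping, assuming (as above in the thread, not verified here) that cores 2k and 2k+1 belong to the same compute unit. The helper name is made up.

```python
def active_cores_per_cu(enabled_cores, cores_per_cu=2):
    """Group the enabled core IDs by the compute unit they belong to.

    Assumption: core N belongs to compute unit N // cores_per_cu.
    """
    cus = {}
    for core in sorted(enabled_cores):
        cus.setdefault(core // cores_per_cu, []).append(core)
    return cus


if __name__ == "__main__":
    # every other core disabled in the BIOS: cores 1, 3, 5, 7 are off
    print(active_cores_per_cu([0, 2, 4, 6]))
```

    With every other core disabled, each of the four compute units is left with exactly one active core, so no two threads share a unit's resources at the same time.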

    ps. perhaps the title of the thread should be changed to "Thread separation vs. turbo", or something like that, to be more meaningful.
    Last edited by dess; 10-14-2011 at 05:40 PM.

  8. #8
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by dess View Post
    That's fine (except your wording is inaccurate). Somehow we should trick it to use this method for BD, as well...



    It depends on whether the penalty of forcing those closely related threads to communicate through L3 (instead of L2) is larger or smaller than the gain from not sharing resources. It seems most applications only benefit from it:

    img0033832.gif

    So, there could be a little patch that simply enables SMT-style scheduling in Win7, which it already supports (if true)...

    Quoted from the article:


    And so the default behaviour will be separation (contrary to what JF said all along)? It would be just stupid if not... Of course, power consumption is higher because more modules are active, but here we can also see that with turbo enabled the energy efficiency is really the same...

    Well, unless there is a fix coming (HW or SW or both) that largely reduces the penalty of sharing resources. The current numbers are much worse (anywhere between 95% and 160%, with one case of 180%) than what they've advertised (180% across the board), so one can suspect there is some flaw here as well. (And there is indeed the case of L1D thrashing, which they claim is responsible for only a 3% decrease.)


    What diagram? Do you mean this? Which part of it?


    Do you mean, if we disable every other "core" in the BIOS? Then no, you will get this:
    Core (Module) 0 - one cluster
    Core (Module) 1 - one cluster
    Core (Module) 2 - one cluster
    Core (Module) 3 - one cluster

    ps. perhaps the title of the thread should be changed to "Thread separation vs. turbo", or something like that, to be more meaningful.
    Sorry, I should have typed out:
    Core 0 - one cluster
    Core 1 - disabled
    Core 2 - one cluster
    Core 3 - disabled
    Core 4 - one cluster
    Core 5 - disabled
    Core 6 - one cluster
    Core 7 - disabled

    ...but I thought everyone would be smart enough to get the picture.

    The Stilt is also correct about module vs CU, compute/computational unit is the correct term.

  9. #9
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by BeepBeep2 View Post
    Sorry, I should have typed out:
    Core 0 - one cluster
    Core 1 - disabled
    Core 2 - one cluster
    Core 3 - disabled
    Core 4 - one cluster
    Core 5 - disabled
    Core 6 - one cluster
    Core 7 - disabled

    ...but I thought everyone would be smart enough to get the picture.
    I thought you were calling a CU a core (just like others before you), not only because you listed four of them twice, but also because you used terms like "shared" and "one cluster" next to them. Why call something by two names, anyway?

    Now, care to elaborate, for the stupid like me, what it all means and which diagram you mean:
    AMD's diagram
    Core 0 - shared
    Core 1 - one cluster
    Core 2 - shared
    Core 3 - one cluster (uses all resources for 1 thread)

    The Stilt is also correct about module vs CU, compute/computational unit is the correct term.
    Effectively everybody uses the term Module here, but fine, let it be "Compute Unit"... (No, not "Computational Unit"; while it means the same, the term is not used in that form.)
    Last edited by dess; 10-15-2011 at 02:28 AM.

  10. #10
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by dess View Post
    I thought you were calling a CU a core (just like others before you), not only because you listed four of them twice, but also because you used terms like "shared" and "one cluster" next to them. Why call something by two names, anyway?

    Now, care to elaborate for the stupid like me what it all means and what diagram:




    Effectively everybody uses the term Module here, but fine, let it be "Compute Unit"... (No, not Computational Unit, while it means the same, this term is not used in this form.)

    That image shows Thread 1 (a/b) sharing a CU while Thread 2 has its own CU, does it not?
    By disabling every other core in BIOS, Thread 1 runs on its own CU, Thread 2 is disabled.
    That is what I was describing. There is no Thread 2 a/b in the diagram despite the whole article talking about how AMD wants threads to share modules and benefit from higher p-states.


    I would also like to add that though I never called you stupid, I find your post quite rude and very demeaning.
    Last edited by BeepBeep2; 10-15-2011 at 11:02 AM.

  11. #11
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by BeepBeep2 View Post
    That image shows Thread 1 (a/b) sharing a CU while Thread 2 has its own CU, does it not?
    Yes.

    By disabling every other core in BIOS, Thread 1 runs on its own CU, Thread 2 is disabled.
    No. That way all of Thread 1a, Thread 1b and Thread 2 would run on a separate CU.
    (You can't "disable" a SW thread this way, anyway. If there are fewer cores enabled than threads to run, then the threads will simply share the available cores more.)

    There is no Thread 2 a/b in the diagram despite the whole article talking about how AMD wants threads to share modules and benefit from higher p-states.
    Why should there be a Thread 2b? SW threads are given. The diagram represents a situation where there are three threads to run. Two of them are closely related (Thread 1a and 1b), i.e. working on the same dataset. The third, Thread 2, is a separate one. And then there are two cases regarding core/CU utilization, one being sub-optimal and the other optimal (shown above). The latter is optimal because the related threads share a CU (and so can share data in the L2 cache), while the separate one has a whole CU to itself, and the unneeded CUs can go to sleep, enabling a higher turbo mode for the first two CUs.

    Now, according to the findings, running all three of these threads on separate CUs (so with every other core [cluster] disabled) would in most cases still be better than letting them share CUs(*) in order to limit the number of CUs in use and so get higher turbo. Because with unsharing you usually win more than with turbo... (This shouldn't be the case if everything worked as planned, or at least as marketed, but still, it is.)

    * At least if done in the wrong way (Thread 2 and Thread 1a/1b in one CU). But it's possible it's true even if it's done in the "optimal" way. It needs further tests to tell.
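
    A toy model of that trade-off in Python, for the three-thread case only. Everything numeric here is an assumption for illustration: the two clock values are placeholders, and pair_scaling stands for how much work two threads on one CU get done relative to one thread alone (the 1.6 and 1.8 figures echo the percentages mentioned earlier in the thread). The model ignores any benefit from related threads sharing data in L2.

```python
def throughput(threads_per_cu, clock_ghz, pair_scaling):
    """Relative throughput of a placement.

    threads_per_cu: e.g. [1, 1, 1] (all separate) or [2, 1] (one shared CU).
    A CU running one thread counts as 1.0, a CU running two threads
    counts as pair_scaling; the sum is scaled by the clock.
    """
    total = 0.0
    for n in threads_per_cu:
        if n == 1:
            total += 1.0
        elif n == 2:
            total += pair_scaling
    return total * clock_ghz


def break_even_scaling(separated_clock, shared_clock):
    """pair_scaling at which [2, 1] at shared_clock equals [1, 1, 1]
    at separated_clock: (s + 1) * shared = 3 * separated."""
    return 3 * separated_clock / shared_clock - 1


if __name__ == "__main__":
    low, high = 3.9, 4.2  # hypothetical turbo clocks, not measured values
    for s in (1.6, 1.8):
        sep = throughput([1, 1, 1], low, s)
        shr = throughput([2, 1], high, s)
        print(f"scaling {s}: separate={sep:.2f} shared={shr:.2f}")
    print(f"break-even scaling: {break_even_scaling(low, high):.3f}")
```

    Under these assumed clocks, sharing a CU only pays off when the pair scaling is close to the marketed 180%; at 160% the separated placement comes out ahead, which is the pattern described above.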

    I would also like to add that though I never called you stupid, I find your post quite rude and very demeaning.
    Well, if I were you, I would have apologized for being confusing, instead of writing what you wrote there.
    Last edited by dess; 10-15-2011 at 05:57 PM.
