Page 5 of 11
Results 101 to 125 of 267

Thread: AMD FX "Bulldozer" Review - (4) !exclusive! Excuse for 1-Threaded Perf.

  1. #101
    Xtreme Member
    Join Date
    Mar 2009
    Location
    Miltown, Wisconsin
    Posts
    353
    I think AMD should have given out more samples to us here at XS; then this lack of support could have been fixed well before launch. We are doing almost all the R&D for them right now anyway. This was just a way too sloppy and rushed release. I would rather have had a proper release with support than a half-a$$ one with all this negative publicity hurting the product.
    Quote Originally Posted by ***Deimos*** View Post
    WARNING GTX480 - may cause dizziness, blurred vision, dry mouth, dehydration, shortness of breath, headaches, nausea, explosive diarrhea


    Foxconn Bloodrage P11 ( 2.1 SLIC MOD )
    Corei7 980 (3118B583) 4.2ghz 24-7 with stock vcore
    2x8GB PNY 1600c9 @ 1600mhz 9-9-9-24-1T
    nVidia GTX 770
    256gb OCZ Vertex4 FW 1.5
    2TB Green Barracuda
    Antec HCG-620w PSU
    Corsair H50 ( Sucks Hairry Balls IMHO )
    Coolermaster Storm Sniper Black Custom Sleeved
    3 x Dell U2410 H-IPS 1920x1200 Surround
    Windows 7 x64 Ultimate




  2. #102
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by The Stilt View Post
    Indeed there should be an update coming for Windows which optimizes Turbo Core functionality on Zambezi.
    Currently Windows (7 at least) is throwing the load from core to core, which sometimes neutralizes the effect of the Turbo Core feature.
    This is because the load is not being run on the currently boosted core(s).
    So you mean one that huddles threads together as much as possible to enable max. turbo, right? But we want the opposite, as the findings prove it's more beneficial to populate only one integer cluster per CU... So we really need only a rather small patch that enables the SMT-aware scheduling Win7 already knows, as some say.

    But guys... please...

    When talking about Zambezi please use the correct terms to avoid any further confusion.

    A Zambezi node consists of: Four compute units and eight cores.
    Each compute unit contains two cores.

    In some of the slides a compute unit was called a module; however, that's not the official term.
    Well, in the patent papers they call the former a core and the latter an integer cluster. Not so surprisingly, I may add.
    Also, some people just refuse to call the latter a core anyway, because that name is more marketing than engineering.
    I would use compute unit or CU for the former and integer cluster for the latter.
    (Although I find "compute unit" a little laboured and awkward.)
    Last edited by dess; 10-14-2011 at 08:22 PM.

  3. #103
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    Quote Originally Posted by dess View Post
    So you mean one that huddles threads together as much as possible to enable max. turbo, right? But we want the opposite, as the findings prove it's more beneficial to populate only one integer cluster per CU... So we really need only a rather small patch that enables the SMT-aware scheduling Win7 already knows, as some say.
    I'll give you an example of what currently happens:

    An FX-8150 (Turbo Core enabled) running at stock settings, with SuperPI (a single-threaded program) being executed:

    Because there is load on only one thread (in theory), the Turbo Core feature boosts a couple of cores up to 4.2GHz while the rest operate at 3.6-3.9GHz. What currently happens is that Windows does not keep the load (SuperPI) on the boosted core (4.2GHz) but instead throws it between the cores.
    And executing such a program (or any program, actually) is naturally faster on a core operating at 4.2GHz than on one operating at 3.6-3.9GHz.

  4. #104
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    I know that, and this idiotic behaviour needs to be addressed as well, of course. I'm just not sure it's going to happen with Win7 too. In the short term the lesser patch would also be fine, as it would boost the performance of lightly threaded apps, like games, even without max. turbo.

    Quote Originally Posted by The Stilt View Post
    And executing such a program (or any program, actually) is naturally faster on a core operating at 4.2GHz than on one operating at 3.6-3.9GHz.
    Not really any program. It's considerably faster to execute a program with 4 threads on 4 CUs, one cluster/CU, even at stock frequency (but usually all-cores turbo can work), than on 2 CUs, two clusters/CU, at max. turbo. Just see the findings across the topic!
    Last edited by dess; 10-14-2011 at 09:03 PM.

  5. #105
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by dess View Post
    That's fine (except your wording is inaccurate). Somehow we should trick it to use this method for BD, as well...



    It depends on whether the penalty of forcing those closely related threads to communicate through L3 (instead of L2) is more or less than the gain from not sharing resources. It seems most applications only benefit from it:

    Attachment 121261

    So, there could be a little patch that simply enables scheduling à la SMT in Win7, which it already supports (if true)...

    Quoted from the article:


    And so the default behaviour will be separation (contrary to what JF said all along)? Would be just stupid if not... Of course, power consumption is higher because more modules are active, but here we can also see that with turbo enabled the energy efficiency is really the same...

    Well, unless there is a fix coming (HW or SW or both) that largely reduces the penalty of sharing resources. Because the current numbers are much worse (anywhere from 95% to 160%, with one case of 180%) than what they've advertised (180% across the board), one can think there is some flaw somewhere here as well. (And there is indeed the case of L1D thrashing, which they claim is responsible for only a 3% decrease.)


    What diagram? Do you mean this? Which part of it?


    Do you mean, if we disable every other "core" in the BIOS? Then no, you will get this:
    Core (Module) 0 - one cluster
    Core (Module) 1 - one cluster
    Core (Module) 2 - one cluster
    Core (Module) 3 - one cluster

    ps. perhaps the title of the thread should be changed to "Thread separation vs. turbo", or something like that, to be more meaningful.
    Sorry, I should have typed out:
    Core 0 - one cluster
    Core 1 - disabled
    Core 2 - one cluster
    Core 3 - disabled
    Core 4 - one cluster
    Core 5 - disabled
    Core 6 - one cluster
    Core 7 - disabled

    ...but I thought everyone would be smart enough to get the picture.

    The Stilt is also correct about module vs CU, compute/computational unit is the correct term.

  6. #106
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by BeepBeep2 View Post
    Sorry, I should have typed out:
    Core 0 - one cluster
    Core 1 - disabled
    Core 2 - one cluster
    Core 3 - disabled
    Core 4 - one cluster
    Core 5 - disabled
    Core 6 - one cluster
    Core 7 - disabled

    ...but I thought everyone would be smart enough to get the picture.
    I thought you were calling a CU a core (just like others before you), not just because you listed four of them twice, but also because you used terms like "shared" and "one cluster" next to them. Why call something by two names, anyway?

    Now, care to elaborate for the stupid like me what it all means and what diagram:
    AMD's diagram
    Core 0 - shared
    Core 1 - one cluster
    Core 2 - shared
    Core 3 - one cluster (uses all resources for 1 thread)

    The Stilt is also correct about module vs CU, compute/computational unit is the correct term.
    Effectively everybody uses the term Module here, but fine, let it be "Compute Unit"... (No, not Computational Unit; while it means the same, the term is not used in that form.)
    Last edited by dess; 10-15-2011 at 02:28 AM.

  7. #107
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by dess View Post
    I thought you were calling a CU a core (just like others before you), not just because you listed four of them twice, but also because you used terms like "shared" and "one cluster" next to them. Why call something by two names, anyway?

    Now, care to elaborate for the stupid like me what it all means and what diagram:




    Effectively everybody uses the term Module here, but fine, let it be "Compute Unit"... (No, not Computational Unit; while it means the same, the term is not used in that form.)

    That image shows Thread 1 (a/b) sharing a CU while Thread 2 has its own CU, does it not?
    By disabling every other core in BIOS, Thread 1 runs on its own CU, Thread 2 is disabled.
    That is what I was describing. There is no Thread 2 a/b in the diagram despite the whole article talking about how AMD wants threads to share modules and benefit from higher P-states.


    I would also like to add that though I never called you stupid, I find your post quite rude and very demeaning.
    Last edited by BeepBeep2; 10-15-2011 at 11:02 AM.

  8. #108
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by BeepBeep2 View Post
    That image shows Thread 1 (a/b) sharing a CU while Thread 2 has its own CU, does it not?
    Yes.

    By disabling every other core in BIOS, Thread 1 runs on its own CU, Thread 2 is disabled.
    No. That way Thread 1a, Thread 1b and Thread 2 would each run on a separate CU.
    (You can't "disable" a SW thread this way, anyway. If there are fewer cores enabled than threads to run, then the threads will simply share the available cores more.)

    There is no Thread 2 a/b in the diagram despite the whole article talking about how AMD wants threads to share modules and benefit from higher p-states.
    Why should there be a Thread 2b? SW threads are given. The diagram represents a situation where there are three threads to run. Two of them are closely related (Thread 1a and 1b), i.e. working on the same dataset. The third is a separate one, Thread 2. And then there are two cases regarding core/CU utilization, one being sub-optimal and the other optimal (shown above). The latter is optimal because the related threads share a CU (and so they can share data in the L2 cache), while the separate one can have a whole CU to itself, and the unneeded CUs can go to sleep, enabling a higher turbo mode for the first two CUs.

    Now, according to the findings, running all three of these threads simply on separate CUs (so with every other core [cluster] disabled) would in most cases still be better than allowing them to share CUs(*) in order to limit the number of CUs used and so get higher turbo. Because with unsharing you usually win more than with turbo... (This shouldn't be the case if everything worked as planned, or at least as marketed, but still, it is.)

    * At least if done in the wrong way (Thread 2 and Thread 1a/1b in one CU). But it's possible it's true even if it's done in the "optimal" way. It needs further tests to tell.

    I would also like to add that though I never called you stupid, I find your post quite rude and very demeaning.
    Well, if I were you, I would have apologized for being confusing instead of writing what you wrote there.
    Last edited by dess; 10-15-2011 at 05:57 PM.

  9. #109
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    i think this is going to get a very mixed opinion if you're thinking about client vs server applications

    2 threads sharing a CU is going to be better for servers, since that second thread is getting a pretty generous bonus for less additional power consumption.

    for client side we have to keep in mind what happens when the cpu isn't running at 100% load. those same few threads sharing CUs would only be worth it if turbo was able to compensate. when the speedup was 1.8x, a 10% bonus to clocks would work out fine. but since it seems we only get about 1.5x, we need turbo to increase speed by 25% for it to work out. and that is not going to happen no matter how much the process matures. so for client apps that use about 4 threads or less the optimal answer seems to be to split everything up (unless you're trying to somehow save power; it is quite rare for desktops to care about ~10% efficiency).
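    The break-even reasoning in the paragraph above can be put into a toy model. This is only a sketch under assumptions that are not from the thread: throughput is taken to scale linearly with clock, and two threads on separate CUs each run at the full single-thread rate. The function name `required_clock_gain` is made up for illustration. Under this model the uplift needed is 2/S - 1, which comes out slightly above the rounded 10% and 25% quoted in the post but points to the same conclusion.

```python
def required_clock_gain(shared_scaling: float) -> float:
    """Clock uplift needed for two threads sharing one CU to match the
    same two threads running on two separate CUs at the base clock.

    shared_scaling is the throughput of a CU running two threads relative
    to the same CU running one thread (1.8 as marketed, about 1.5 as
    reported in this thread). Assumes throughput is linear in clock.
    """
    if not 1.0 <= shared_scaling <= 2.0:
        raise ValueError("shared_scaling is expected to lie in [1.0, 2.0]")
    separate_throughput = 2.0          # two threads, one per CU
    shared_throughput = shared_scaling  # two threads in one CU
    return separate_throughput / shared_throughput - 1.0


for scaling in (1.8, 1.5):
    gain = required_clock_gain(scaling)
    print(f"scaling {scaling:.1f}x -> turbo must add about {gain:.0%}")
```

    A real chip will not scale perfectly with clock (memory-bound code gains less), so the actual uplift needed is probably even larger than this model suggests.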

    i am curious about the whole 1a and 1b thread thing where threads want to share resources. i assume this means they use the same L2 to their advantage, but i'm curious if software can determine that they want to do that, and then can windows give them the right thread assignments to accomplish it? and even if it did, how much of a bonus would there be? if the end result is just a little less L2 being used, then it could never be better than independent CUs when looking at it from just performance. but if the data sharing increases code efficiency then there could be a lot more perf. but i don't really know much about this kind of stuff
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  10. #110
    Brilliant Idiot
    Join Date
    Jan 2005
    Location
    Hell on Earth
    Posts
    11,015
    Quote Originally Posted by The Stilt View Post
    Indeed there should be an update coming for Windows which optimizes Turbo Core functionality on Zambezi.
    Currently Windows (7 at least) is throwing the load from core to core, which sometimes neutralizes the effect of the Turbo Core feature.
    This is because the load is not being run on the currently boosted core(s).

    But guys... please...

    When talking about Zambezi please use the correct terms to avoid any further confusion.

    A Zambezi node consists of: Four compute units and eight cores.
    Each compute unit contains two cores.

    In some of the slides a compute unit was called a module; however, that's not the official term.
    This is AMD's patent for BD.

    It indicates that a node has 4 cores and 8 compute units, or clusters. It also details how clusters share resources.

    Disabling "cores" (actually clusters, 150B) 1, 3, 5 and 7 in the BIOS effectively disables resource sharing.
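    A software-side analogue of disabling cores 1, 3, 5 and 7 is to restrict a program to cores 0, 2, 4 and 6 with an affinity mask. The sketch below only computes such a mask. It assumes (this is not confirmed in the thread) that the OS numbers the two clusters of each CU consecutively, so cores 0/1 share a CU, 2/3 share the next, and so on; the helper names are made up for illustration.

```python
def cores_one_cluster_per_cu(num_cus: int, clusters_per_cu: int = 2) -> list:
    """Return the first core index of every CU, assuming the clusters of a
    CU are numbered consecutively (0/1, 2/3, ...). That numbering is an
    assumption, not something the thread verifies."""
    return [cu * clusters_per_cu for cu in range(num_cus)]


def affinity_mask(cores) -> int:
    """Build a bitmask with one bit set per allowed core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return mask


cores = cores_one_cluster_per_cu(4)   # four CUs, as on an FX-8150
mask = affinity_mask(cores)
print(cores, format(mask, "X"))
```

    On Windows, a hexadecimal mask like this can be handed to the `start /affinity` command of cmd.exe when launching a program. Unlike the BIOS setting, this leaves the second cluster of each CU powered and available to other processes, so it is not a full equivalent.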

    [Attachment: BD.jpg, ID 121294]
    Last edited by chew*; 10-15-2011 at 07:08 PM.
    heatware chew*
    I've got no strings to hold me down.
    To make me fret, or make me frown.
    I had strings but now I'm free.
    There are no strings on me

  11. #111
    Registered User
    Join Date
    Apr 2008
    Location
    Brasil-RS
    Posts
    88
    Disabling "cores" (actually clusters, 150B) 1, 3, 5 and 7 in the BIOS effectively disables resource sharing.
    Can I suppose that the 256-bit FPU remains fully active, so the FP performance will not be directly affected?
    FX 8320 4GHz@1.3V
    Antec 920
    Sabertooth 990FX
    32GB Kingston 1333
    Asus 5770
    Dell 2405FPW
    2x Seagate 2TB
    CWT 750w

  12. #112
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by Manicdan View Post
    i am curious about the whole 1a and 1b thread thing where threads want to share resources. i assume this means they use the same L2 to their advantage, but i'm curious if software can determine that they want to do that, and then can windows give them the right thread assignments to accomplish it? and even if it did, how much of a bonus would there be? if the end result is just a little less L2 being used, then it could never be better than independent CUs when looking at it from just performance. but if the data sharing increases code efficiency then there could be a lot more perf. but i don't really know much about this kind of stuff
    I wonder how Win8 judges which approach to choose "real-time". Perhaps it simply couples child-threads with their mother thread? That's not always good.

    Anyway, if they're not going to implement it in Win7, as well, I hope at least they will enable the SMT-aware approach for BD (if Win7 really supports it already), that's still much better than the default one.

    Quote Originally Posted by zhadoom View Post
    Can I suppose that the 256-bit FPU remains fully active?
    Most probably, but worth a test.

    so the FP performance will not be directly affected.
    Actually, it raises FP performance, as well, for less-threaded apps.
    Last edited by dess; 10-16-2011 at 04:08 AM.

  13. #113
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by dess View Post
    I wonder how Win8 judges which approach to choose "real-time". Perhaps it simply couples child-threads with their mother thread? That's not always good.

    Anyway, if they're not going to implement it in Win7, as well, I hope at least they will enable the SMT-aware approach for BD (if Win7 really supports it already), that's still much better than the default one.


    Most probably, but worth a test.


    Actually, it raises FP performance, as well, for less-threaded apps.
    "Real time" is mean to CPUs: you can't Alt-Tab out of a game if you set it in AOD. I just tried it with Fable 3 and GTA IV and was stuck in the game until I quit.
    It can stop keyboard and mouse input.
    Last edited by demonkevy666; 10-16-2011 at 09:32 AM.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  14. #114
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by demonkevy666 View Post
    "Real time" is mean to CPUs: you can't Alt-Tab out of a game if you set it in AOD. I just tried it with Fable 3 and GTA IV and was stuck in the game until I quit.
    It can stop keyboard and mouse input.
    I didn't mean the task priority option in Windows; I used the term in the sense of something happening on the fly. Sorry if I wasn't clear enough.
    Last edited by dess; 10-16-2011 at 01:16 PM.

  15. #115
    Brilliant Idiot
    Join Date
    Jan 2005
    Location
    Hell on Earth
    Posts
    11,015
    Well, the plot thickens. 23 chips later I came to a conclusion: while this sounds good on paper, it's exactly why AMD won't release a 4-core disabled this way.

    Feel free to test this yourself, but after testing I found that you will take a significant clock hit, so much so that it defeats the gained efficiency...
    heatware chew*
    I've got no strings to hold me down.
    To make me fret, or make me frown.
    I had strings but now I'm free.
    There are no strings on me

  16. #116
    Xtreme Addict
    Join Date
    Feb 2008
    Posts
    1,209
    What exactly do you mean by clock hit? I don't think you will get lower max clocks. How can that be?

    Or maybe you mean so much more clock is needed that it eats up the efficiency?
    1. ASUS Sabertooth 990fx | FX 8320 || 2. DFI DK 790FXB-M3H5 | X4 810
    8GB Samsung 30nm DDR3-2000 9-10-10-28 || 4GB PSC DDR3-1333 6-7-6-21
    Corsair TX750W | Sapphire 6970 2GB || BeQuiet PurePower 450w | HD 4850
    EK Supreme | AC aquagratix | Laing Pro | MoRa 2 || Aircooled

  17. #117
    Xtreme Enthusiast
    Join Date
    Nov 2009
    Posts
    526
    Quote Originally Posted by chew* View Post
    Well, the plot thickens. 23 chips later I came to a conclusion: while this sounds good on paper, it's exactly why AMD won't release a 4-core disabled this way.

    Feel free to test this yourself, but after testing I found that you will take a significant clock hit, so much so that it defeats the gained efficiency...
    So are you basically saying that one int "core" in a module hardly matters for power usage at all?

  18. #118
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by Oese View Post
    What exactly do you mean by clock hit? I don't think you will get lower max clocks. How can that be?

    Or maybe you mean so much more clock is needed that it eats up the efficiency?
    No, he means that you get lower clocks with 4 than with 8.

  19. #119
    Registered User
    Join Date
    Apr 2008
    Location
    Brasil-RS
    Posts
    88
    Quote Originally Posted by chew* View Post
    Well, the plot thickens. 23 chips later I came to a conclusion: while this sounds good on paper, it's exactly why AMD won't release a 4-core disabled this way.

    Feel free to test this yourself, but after testing I found that you will take a significant clock hit, so much so that it defeats the gained efficiency...
    If I understand what you are trying to say, this will reduce the occasions when max turbo (4.2GHz) kicks in. The major effect will be in 2-thread apps, because they will use two cores in different compute units (modules or something like that...).
    FX 8320 4GHz@1.3V
    Antec 920
    Sabertooth 990FX
    32GB Kingston 1333
    Asus 5770
    Dell 2405FPW
    2x Seagate 2TB
    CWT 750w

  21. #121
    Xtreme Enthusiast
    Join Date
    Nov 2009
    Posts
    526
    Quote Originally Posted by BeepBeep2 View Post
    No, he means that you get lower clocks with 4 than with 8.
    I thought he was comparing to the 2CU FX-4100; basically, with 2CU you get so much higher clocks that it greatly outweighs the gain of 4CU with the second int core disabled.

  22. #122
    Xtreme Addict
    Join Date
    Feb 2008
    Posts
    1,209
    If only turbo is reduced, I would not mind, as long as I could disable it altogether.

    From what I hear, turbo is very unpredictable and bugged; sometimes cores clock below stock even when fully loaded and with C&Q off (which doesn't seem to work, at least on some boards...).

    Max OC should be the same or higher with cores disabled anyhow? If not, that would be very strange...
    Last edited by Oese; 10-16-2011 at 02:08 PM.
    1. ASUS Sabertooth 990fx | FX 8320 || 2. DFI DK 790FXB-M3H5 | X4 810
    8GB Samsung 30nm DDR3-2000 9-10-10-28 || 4GB PSC DDR3-1333 6-7-6-21
    Corsair TX750W | Sapphire 6970 2GB || BeQuiet PurePower 450w | HD 4850
    EK Supreme | AC aquagratix | Laing Pro | MoRa 2 || Aircooled

  23. #123
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by chew* View Post
    Well, the plot thickens. 23 chips later I came to a conclusion: while this sounds good on paper, it's exactly why AMD won't release a 4-core disabled this way.
    We were mostly talking about the case of lightly threaded apps on an 81xx, and it definitely works there.

    Feel free to test this yourself, but after testing I found that you will take a significant clock hit, so much so that it defeats the gained efficiency...
    According to findings here and in some other places, there is roughly a 10% performance advantage with 4M/4C at 3.6 GHz compared to 2M/4C at 4.2 GHz.
    In other words, even if the 4M/4C part worked only at 3.6 GHz, it would usually still be faster than a current 4170.
    Of course, it depends more or less on the given application. What were your results?
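    As a rough cross-check of the 10% figure, one can back out what CU-sharing scaling it implies. This is a sketch only: it assumes performance is linear in clock and that four threads are running, neither of which the thread establishes, and `implied_shared_scaling` is a name invented here.

```python
def implied_shared_scaling(clock_separate_ghz: float,
                           clock_shared_ghz: float,
                           advantage: float,
                           threads: int = 4) -> float:
    """Back out the two-threads-per-CU scaling factor S from a measured
    advantage of 'one thread per CU' over 'two threads per CU'.

    Model (an assumption): throughput is linear in clock, so
      separate = threads * clock_separate
      shared   = (threads / 2) * S * clock_shared
    and advantage = separate / shared.
    """
    cus_when_shared = threads / 2
    return (threads * clock_separate_ghz) / (
        cus_when_shared * clock_shared_ghz * advantage
    )


# 4M/4C at 3.6 GHz is said to be roughly 10% faster than 2M/4C at 4.2 GHz.
s = implied_shared_scaling(3.6, 4.2, advantage=1.10)
print(f"implied scaling for two threads in one CU: {s:.2f}x")
```

    Under these assumptions the result lands close to the roughly 1.5x scaling mentioned earlier in the thread, rather than the marketed 1.8x.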

    ---

    [snip]
    Last edited by dess; 10-16-2011 at 07:49 PM.

  24. #124
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by dess View Post
    You've got it wrong, dude. 100/90=1.111, but it's not 11%, but 90%! So, you should get the reciprocals, and that will show you the percentages.

    Chess 11800/8813=1.3389 -> 74.7%
    Wprime 13.814/9.531=1.4494 -> 69%
    Winrar 4467/3027=1.4757 -> 67.7%
    3d06 5803/4134=1.4037 -> 71.4%
    3dvantage 19215/12102=1.5878 -> 63%
    3d11 6340/4289=1.4782 -> 67.6%
    CB R10 20552/15033=1.3671 -> 73.1%
    CB R11.5 6/3.8=1.5789 -> 63.3%
    Blender 9.76/7.16=1.3631 -> 73.3%
    X264 37.23/25.18=1.4786 -> 67.6%

    (These are the 4CU/8C vs. 4CU/4C numbers!)
    i really don't follow your math,
    if 4CU/8T gets 100, and 4CU/4T gets 90, that's a speedup of 11% through CMT. saying that things are 90% as fast with alternating cores turned off just makes a confusing statement, even if true.

    i think his numbers were right, CMT is giving us 34-59% speedup
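    The two ways of reading the same pair of scores can be laid side by side. The sketch below uses the score pairs exactly as listed in the quoted post (taken at face value, in the order given there); the function names are made up for illustration.

```python
# Score pairs copied from the quoted list, in the order (first, second)
# used there for each ratio.
results = {
    "Chess": (11800, 8813),
    "Wprime": (13.814, 9.531),
    "Winrar": (4467, 3027),
    "3d06": (5803, 4134),
    "3dvantage": (19215, 12102),
    "3d11": (6340, 4289),
    "CB R10": (20552, 15033),
    "CB R11.5": (6, 3.8),
    "Blender": (9.76, 7.16),
    "X264": (37.23, 25.18),
}


def speedup_pct(first: float, second: float) -> float:
    """How much larger the first figure is than the second, in percent."""
    return (first / second - 1.0) * 100.0


def relative_pct(first: float, second: float) -> float:
    """The second figure expressed as a percentage of the first."""
    return 100.0 * second / first


for name, (first, second) in results.items():
    print(f"{name:10s} speedup {speedup_pct(first, second):5.1f}%   "
          f"reciprocal {relative_pct(first, second):5.1f}%")

speedups = [speedup_pct(a, b) for a, b in results.values()]
print(f"range: {min(speedups):.0f}% to {max(speedups):.0f}%")
```

    Both columns describe the same data; the speedup column is the one that matches the 34-59% range, and the reciprocal column is the 63-75% reading from the quoted post.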

    as for chew's point, from a stock chip perspective of 4CU/4T vs 2CU/4T, i think it is a little wrong.
    there was a perf/power consumption test and with turbo they all came out near identical, i'll try and find the chart.

    EDIT: this chart
    Last edited by Manicdan; 10-16-2011 at 07:15 PM.
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  25. #125
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Manicdan: You're right, my mistake. I was playing with numbers that were about relative performance before, and when I saw this list it escaped my attention that these are speed-up numbers, so rightly the reciprocals. I guess I need some sleep. (Edited out that part of my message.)
