
Thread: 460/560, etc. efficiency

  1. #1
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139

    460/560, etc. efficiency

    This thread assumes you have figured out how to run BOINC/GPUGrid on your system; this is not intended to be a BOINC setup guide. If you would like some help with the base setup, please check out the great "everything you needed to know" thread by Otis11.
    http://www.xtremesystems.org/forums/...d.php?t=246583

    Issue: Currently, the Compute Capable (CC) 2.1 cards (GTX460, GTX560 Ti) run at a much lower GPU utilization than CC2.0 cards (GTX465, GTX470, GTX480, GTX570, GTX580, GTX590). Not that I pretend to understand CUDA programming, but the core performance bottleneck is the 1x48 core-to-shader ratio (48 shaders per SM) compared to the 1x32 ratio on the CC2.0 cards. If your card is not listed above then this thread probably isn't going to help you.

    Goal: Maximize output from a GTX460 running GPUGrid applications on Windows by running 2 instances of ACEMD on a single card at the same time, via a custom app_info.xml configuration file.
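
    For anyone who hasn't built one before, here's a minimal sketch of the kind of app_info.xml that makes BOINC run two tasks per card. The executable name and version number below are placeholders ... copy the real ones from your own projects\www.gpugrid.net folder, because a mismatch will error out anything in your queue. The key entry is the coproc count of 0.5, which tells BOINC each task only needs half a GPU.

    Code:
    <app_info>
        <app>
            <name>acemd</name>
        </app>
        <file_info>
            <name>acemd_example.exe</name>  <!-- placeholder: use the actual ACEMD executable name -->
            <executable/>
        </file_info>
        <app_version>
            <app_name>acemd</app_name>
            <version_num>613</version_num>  <!-- placeholder: must match the version you already have -->
            <avg_ncpus>0.5</avg_ncpus>
            <max_ncpus>1.0</max_ncpus>
            <coproc>
                <type>CUDA</type>
                <count>0.5</count>  <!-- half a GPU per task, so two tasks share one card -->
            </coproc>
            <file_ref>
                <file_name>acemd_example.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
    </app_info>

    Stop BOINC before dropping the file in, and back up your BOINC data folder first.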

    Factors: Other factors are how the OS handles drivers (Vista and Win7 slow, XP fast), CPU speed and availability, and, perhaps the most challenging factor to try and control, the fact that GPUGrid has multiple different types of Work Units (WUs) that run internally using different input parameters (the experiments themselves), which in turn directly affect the efficiency of the overall GPUGrid application, ACEMD. I have also seen conflicting information on whether PCIe slot bandwidth can be a performance factor, but most of the information points to it being a non-issue. If I have time I will try to address that topic, but seriously, I have a life and there's only so much I can do.

    <aside>: The previously accepted methodology was to simply turn networking off, make a backup of your BOINC data folder, and when the first run was complete, turn BOINC off, restore the backup, reconfigure and run again, rinse and repeat until you had all configurations accounted for. This doesn't seem to work anymore in Win7, likely because I just haven't figured out which file(s) in the BOINC program folder need to be copied as well. The upshot is that within the same WU type the total runtimes are within a few percent of each other, which doesn't really matter as we are looking for substantial increases. Anything less than 10% is probably not worth the effort, and anyone who thinks it is has probably already done all this testing anyway :-)
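
    For the record, the old procedure as a batch sketch (paths assume a default Win7 install with the data directory at C:\ProgramData\BOINC and BOINC running as a service; as noted, something extra seems to be needed on Win7, so treat this as a starting point only):

    Code:
    rem Snapshot the BOINC data directory before running config A
    net stop "BOINC"
    xcopy "C:\ProgramData\BOINC" "C:\BOINC_backup" /E /I /H /Y
    net start "BOINC"

    rem ... run config A with networking off and record runtimes, then restore ...
    net stop "BOINC"
    rmdir /S /Q "C:\ProgramData\BOINC"
    xcopy "C:\BOINC_backup" "C:\ProgramData\BOINC" /E /I /H /Y
    net start "BOINC"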


    Baseline: All that being out of the way, I want to see if we can increase the efficiency of perhaps the most common OS configuration, Win7x64. I am going to start with what I have come to think of as the best-balance approach for an i7-920 system, which is running with HT on and leaving one thread free for GPUGrid, using the SWAN_SYNC environment variable to dedicate that free thread to GPUGrid.
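
    In case anyone is setting this up for the first time, SWAN_SYNC is just a system-wide environment variable. A quick sketch from an elevated command prompt (the "BOINC" service name assumes a service-mode install; if you run the client normally, just exit and relaunch it):

    Code:
    rem Set SWAN_SYNC=0 machine-wide so ACEMD keeps a dedicated CPU thread
    setx SWAN_SYNC 0 /M

    rem Restart the BOINC client so it picks up the new variable
    net stop "BOINC"
    net start "BOINC"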

    Off to set this up, which is going to take a while ... move the PC to the basement, break it out of (gasp) its case, break down the cable management, swap cards and PSU (crap, I have to break down the other PC to do this ... WTF was I thinking putting them in cases and making them all nice and neat???)

    Three cheers go out to our GPUGrid teammate sponsoring the hardware for these tests!

    ----------------------------------------------------------------
    Baseline Win7x64: CPU (20*200) w/ HT ON, 1 thread free, SWAN_SYNC=0, running 1 instance of ACEMD

    WU Type: KASHIF_HIVPR_wo and KASHIF_HIVPR_maca_wo:
    Across 10 runs, there was only a 0.0143% runtime difference between shortest and longest, so I'm not gonna bother with a stddev here.

    avg 40251.28 seconds per WU: 87-90% GPU Utilization


    sidebar: there are two different subtypes of HIV WUs, wo and so.
    The above stat was based only on the "wo" subtype: KASHIF_HIVPR_wo, KASHIF_HIVPR_maca_wo. Back-of-the-napkin calcs tell me the "so" subtype would take about an hour and 7 minutes longer per WU on a 460.
    ----------------------------------------------------------------
    Test 1: Win7x64, CPU (20*200) w/ HT ON, 2 threads free, SWAN_SYNC=0, running 2 instances of ACEMD

    Only halfway through the first set of 2-instance runs I can already call this a "no win", as there is no substantive improvement. The time doubled almost exactly. While the GPU usage did go up into the mid-to-high 90s, I believe that internal resource contention inside the GPU caused by running 2 instances negated the almost 10% utilization increase, for a net 0% overall efficiency gain.

    ----------------------------------------------------------------
    Last edited by Snow Crash; 04-03-2011 at 04:45 AM.

  2. #2
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    5,152
    Quote Originally Posted by Snow Crash View Post
    Off to set this up, which is going to take a while ... move the PC to the basement, break it out of (gasp) its case, break down the cable management, swap cards and PSU (crap, I have to break down the other PC to do this ... WTF was I thinking putting them in cases and making them all nice and neat???)
    I know... took me a while to learn that lesson too.

    Looking forward to the results.


    24 hour prime stable? Please, I'm 24/7/365 WCG stable!

    So you can do Furmark, Can you Grid???

  3. #3
    Xtreme Cruncher
    Join Date
    Jan 2009
    Location
    Nashville
    Posts
    4,162
    Hmmm SC getting serious and experimenting, should we call his mom??



  4. #4
    Xtreme Cruncher
    Join Date
    Mar 2010
    Posts
    451
    Subscribed!

    I've got a couple of 460s, so I'm very interested in seeing if this pans out.

    Quote Originally Posted by Snow Crash View Post
    ... WTF was I thinking putting them in cases and making them all nice and neat???
    I'm going to go out on a limb and guess that you're married? The fairer sex often has a thing about tidiness (good thing for me).

    Good Luck!

  5. #5
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    Quote Originally Posted by SMTB1963 View Post
    Subscribed!
    I'm going to go out on a limb and guess that you're married? The fairer sex often has a thing about tidiness (good thing for me).
    I can't blame this one on my wife; it's more like my Type A personality feeling inadequate while looking through the air cooling subforum and seeing all the insane cable work people have done.

    quick update ... after some things came up yesterday that I was not expecting, I decided I can get to 90% of my target config just by swapping hard drives. I know the XP system is not an issue, but I wonder how many times I have done this to Win7 and if it's gonna make me call M$ to get registered properly.

  6. #6
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    back up and running, one instance of a short KASHIF WU running on Win7x64; utilization is 90%, memory @ 19% ... looks good to me!
    Man, this card runs cool ... fan @ 67% and temps don't even touch 50C

    I'll add this to the first post as I get more results, but I would seriously be remiss if I didn't make sure everyone knows this really is a team effort: it's only happening because a super generous XS team member is loaning me this card for the tests. You know who you are, and you ROCK!!!

  7. #7
    Devil kept pokin'
    Join Date
    Jan 2010
    Location
    South Kakalaky
    Posts
    1,299
    Sub'd.
    These results should apply to 450/550s as well, since they are CC2.1 cards with the same 1x48 shader ratio.

  8. #8
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    updated OP with average for baseline for the KASHIF_HIV WU type

  9. #9
    I am Addicted!
    Join Date
    Feb 2006
    Posts
    1,772
    Would like to see some solid results. Keep us posted, as I'm sure you will.

    Off topic: what is the ideal card to get nowadays? I sold my GTX 275s a while back, but I'm thinking that may have been a bad idea. Current cards are a bit above my spending limit. I am also not sure they are truly worth their cost compared to the older cards, but I could be extremely wrong.

    I currently only fold on the grid with 1 GTX 260-216 and I have a cheap backup card (GT240). The GT240 only averages approx 15K PPD and the 260 I think is around 35K or so. Thanks
    XTREMESupercomputer: Phase 2
    Live up to your name - May 1 - 8
    Crunch with us, the XS WCG team

  10. #10
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    5,152
    Quote Originally Posted by INFRNL View Post
    Would like to see some solid results. Keep us posted, as I'm sure you will.

    Off topic: what is the ideal card to get nowadays? I sold my GTX 275s a while back, but I'm thinking that may have been a bad idea. Current cards are a bit above my spending limit. I am also not sure they are truly worth their cost compared to the older cards, but I could be extremely wrong.

    I currently only fold on the grid with 1 GTX 260-216 and I have a cheap backup card (GT240). The GT240 only averages approx 15K PPD and the 260 I think is around 35K or so. Thanks
    According to SKgiven, the best card for ppd/$ is the 470, but I think it might be better to get the 570 if you pay your own electric bills...

    In either case:

    The GTX580 and GTX570 are the best. Tomorrow you will be able to add the GTX590 to the top of the list. The GTX465, GTX470 and GTX480 cards are all better for crunching here than the GTX560 Ti, GTX550 Ti, GTX460 and GTS450.

    The difference is in the architectures:

    GTX465, GTX470 and GTX480 are Compute Capable (CC) 2.0 cards (GF100)
    GTX460 is CC 2.1 (GF104)
    GTS450 is CC 2.1 (GF106)
    GT440, GT430 and GT420 are CC 2.1 (GF108)

    GTX580 and GTX570 are CC2.0 (GF110)
    GTX560 is CC 2.1 (GF114)
    GTX550 is CC 2.1 (GF116)

    GTX 590 is presumably CC2.0 (GF110)

    All the CC2.0 cards have 32 CUDA cores (shaders) per SM (core) and all the CC2.1 cards have 48 shaders per SM. Unfortunately the CC2.1 cards underperform relative to their shader count, acting as if they only have 32 shaders per SM, which makes them about 33% slower per SM than their specs suggest.

    Bottom line - Get a CC2.0 card and not a CC2.1 card.
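
    To put rough numbers on that (an illustration only, assuming ACEMD extracts about 32 shaders' worth of work from each SM; the SM and shader counts are the published specs for these two cards):

    Code:
    # Illustration of the CC2.1 penalty: only ~32 shaders per SM get used
    cards = {
        "GTX470 (CC2.0)": (14, 32),  # 14 SMs x 32 shaders = 448 cores
        "GTX460 (CC2.1)": (7, 48),   # 7 SMs x 48 shaders = 336 cores
    }
    for name, (sms, shaders_per_sm) in cards.items():
        total = sms * shaders_per_sm
        effective = sms * min(shaders_per_sm, 32)
        idle_pct = 100 * (total - effective) / total
        print(f"{name}: {total} shaders on paper, ~{effective} used ({idle_pct:.0f}% idle)")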


    24 hour prime stable? Please, I'm 24/7/365 WCG stable!

    So you can do Furmark, Can you Grid???

  11. #11
    Xtreme Cruncher
    Join Date
    Mar 2010
    Posts
    451
    Quote Originally Posted by Snow Crash View Post
    updated OP with average for baseline for the KASHIF_HIV WU type
    Cool...so how many KASHIF WUs make up the 40250 sec avg?

  12. #12
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    The original avg. was based on only a couple of runs, but they are very consistent so I doubt there will be much change with a larger number of WUs used as a baseline. I will be posting up averages tonight/tomorrow morning for all WUs that were run over the past week, and I'll add how many were used in determining the averages. Unless someone thinks it would be valuable I had not planned on including standard deviation ... anyone interested in that stat?

  13. #13
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    5,152
    Quote Originally Posted by Snow Crash View Post
    The original avg. was based on only a couple of runs, but they are very consistent so I doubt there will be much change with a larger number of WUs used as a baseline. I will be posting up averages tonight/tomorrow morning for all WUs that were run over the past week, and I'll add how many were used in determining the averages. Unless someone thinks it would be valuable I had not planned on including standard deviation ... anyone interested in that stat?
    I'd be interested in standard deviation if only to support that the lengths are fairly consistent...

    Look forward to the update.


    24 hour prime stable? Please, I'm 24/7/365 WCG stable!

    So you can do Furmark, Can you Grid???

  14. #14
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    OP updated ... baseline complete, and there was only an increase of 1 sec from the original estimate.
    I will be setting up the 2-instance runs later today (maybe tomorrow). Wish me luck that I don't crash out and get stuck in the GPUGrid "no WUs for you" limbo.
    If I keep getting the same WU type I will post up results after the first couple of runs to see if we're making progress or if it's time to start a new config test.

  15. #15
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    OP updated with multi instance results

    If you want the most from a 460 then run LONG WUs; just make sure to keep your BOINC cache really low so you always return inside the 24-hour bonus deadline. Even if you don't make the deadline, you will still get better PPD than running the short ones.

    In case you're wondering ... no, you'll never be able to run 2 instances of LONG WUs and return them within the bonus deadline.

    To validate my original baseline numbers I poked around through individual tasks for various team members who have "show my computers on the website" set to YES at GPUGrid, and found that our averages (for the same WU subtype) vary quite a bit, but it's mostly explainable ... those of you not running with the environment variable SWAN_SYNC set to 0 are getting hammered, and I fully understand and respect your choice to not take a core away from WCG.

    I *think* I'm seeing that 260.99 drivers run a bit faster than 266.58 but it only amounts to 20 minutes over the course of an entire WU.

    I checked our standard "faster shader freq. = faster runtime" - still true.
    I checked our standard "WinXP is faster than Vista/Win7" - still true.

    Interesting side note: The 480 did see an overall efficiency improvement, but I'll be testing to see if that holds true for the LONG WUs also, as they typically run at a higher utilization rate to begin with.

    The only other experiments at this point that I think can help identify ways to increase PPD for the 460 series of cards (this includes all of the Compute Capable 2.1 cards), beyond everything we are already aware of, are driver version and PCIe. I would only do one of those at a time, so how about we informally vote on it.

    Drivers or PCIe first?

  16. #16
    Xtreme Cruncher
    Join Date
    Mar 2010
    Posts
    451
    Quote Originally Posted by Snow Crash View Post
    Only halfway through the first set of 2-instance runs I can already call this a "no win", as there is no substantive improvement. The time doubled almost exactly. While the GPU usage did go up into the mid-to-high 90s, I believe that internal resource contention inside the GPU caused by running 2 instances negated the almost 10% utilization increase, for a net 0% overall efficiency gain.
    Bummer. But thanks for taking the time to look into it.


    +1 for Drivers!

  17. #17
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    I'll be doing this with long WUs.
    Forward to 270.xx beta or backward to 260.99?

  18. #18
    Xtreme Cruncher
    Join Date
    Mar 2010
    Posts
    451
    Quote Originally Posted by Snow Crash View Post
    I'll be doing this with long WUs.
    Forward to 270.xx beta or backward to 260.99?
    Otis11 started a thread inquiring about performance under the 270 beta driver, looks like a good opportunity to get some, erm...beta data.

  19. #19
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    5,152
    Quote Originally Posted by SMTB1963 View Post
    Otis11 started a thread inquiring about performance under the 270 beta driver, looks like a good opportunity to get some, erm...beta data.
    Yeah, I'd go newer to older and see what's best. I'd imagine newer is better, but that has no basis (that I know of).


    24 hour prime stable? Please, I'm 24/7/365 WCG stable!

    So you can do Furmark, Can you Grid???

  20. #20
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    OK ... after a week of 265.90 (yes, I forgot I was running a hacked Quadro driver) and a week of 270.51 beta, the 270.51 looks to be slightly more efficient, but I will say the difference is well within normal variation between WUs of the same type. It was difficult to get similar WU types as the new TONI_AGG started to ramp up AFTER I switched drivers, and I got none of the IBUCH_***_EFGR after I switched. That being said, on average the driver gained 5 minutes per WU.

    Unless there is something else we can think of to test, and I can get permission from the card's true owner, I think we may be done with this round of testing on a 460 with no real optimizations identified, just a confirmation of things we already knew.

    Run *long* WUs and keep a small cache to make sure you always get the <24 hour return point multiplier bonus.
    The latest drivers (270.51) *may* provide a performance benefit, but it is very small: there is no real reason to upgrade, and no reason not to.

    Kirk out

  21. #21
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago, IL
    Posts
    840
    Something I've noticed recently that is related to your thread...


    Because of the efficiency issues with Windows Vista and 7 I have been using Windows XP x64 to handle the dedicated crunchers for GPUGRID. My typical configuration has been to set aside one core for GPUGRID on the XP machines. For instance, on my GPUGRID cruncher I'd typically set up 87.5% CPU utilization for WCG. On my SR-2 rig that would leave GPUGRID a free core for crunching and another free core for actual application use. For whatever reason I found this helped maximize the response of my computer when using it while also providing for maximum crunching. When observing the "time remaining + time elapsed" for the long WUs I'd regularly see 14+ hours for a single WU. Of course, this was usually disappointing because the long WUs claim to be 8-12 hours. I'm running a GTX 570 for the above data and I've been disappointed because I was lucky if it finished in anything less than 14 hours.

    Then I did something...

    On my SR-2 rig I lowered the CPU utilization from 87.5% to 75% about 5 days ago because it was starting to get warm in the room. I also lowered the overclock by about 10% because of the heat that was causing WU errors. Since GPUGRID WUs typically cause 99% CPU utilization for one core I was sure that my WUs would finish more slowly for GPUGRID. Instead, I've been watching my GPU crunching time and since I've lowered the CPU utilization I've noticed that the GPUGRID times have always been less than 12 hours now.

    Perhaps the issue with GPUGRID performing so poorly in Windows 7 and Vista is related to us maxing out our cores for WCG and GPUGRID? I just find it very strange that I lowered my OC and lowered the CPU utilization and my GPU crunching times have improved.

    Anyone know if GPUGRID changed something recently that could account for this change and it was just a coincidence that I changed my settings around the same time? I know a few days ago they had an issue where GPUGRID was handing out tons of WUs to everyone and filling our queues.

  22. #22
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    A couple of things come to mind ... there are different types of WUs which will run different lengths and will give different points. Another is that if you are cranking your GPU too high you can actually be causing *recoverable* errors ... you'll never know this is happening until you start to see poor overall runtimes. Here's a dirty little secret for you ... the amount of CPU used by a GPU app through BOINC is not governed by the CPU utilization preference, so in general the more free CPU you have, the better GPUGrid will run. None of this has anything to do with the new DMA in Vista/Win7, and afaik there just isn't anything that can be done about that.
    Last edited by Snow Crash; 05-07-2011 at 12:55 PM.

  23. #23
    Xtreme Cruncher
    Join Date
    Dec 2008
    Location
    Texas
    Posts
    5,152
    Quote Originally Posted by Snow Crash View Post
    A couple of things come to mind ... there are different types of WUs which will run different lengths and will give different points. Another is that if you are cranking your GPU too high you can actually be causing *recoverable* errors ... you'll never know this is happening until you start to see poor overall runtimes. Here's a dirty little secret for you ... the amount of CPU used by a GPU app through BOINC is not governed by the CPU utilization preference, so in general the more free CPU you have, the better GPUGrid will run. None of this has anything to do with the new DMA in Vista/Win7, and afaik there just isn't anything that can be done about that.
    The other way you might notice those errors is to find the standard dev of your run times (you have to keep projects separate). They should be VERY close if you aren't having any issues. (My SDEV is ~1.6% for reference.)
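
    If anyone wants to compute that themselves, here's a minimal sketch (the runtimes are made-up placeholders; paste in the seconds from your own task pages, one WU type per list):

    Code:
    import statistics

    # Placeholder runtimes (seconds) for one WU type on one host
    runtimes = [40212.0, 40255.0, 40198.0, 40301.0, 40260.0]

    mean = statistics.mean(runtimes)
    sdev = statistics.stdev(runtimes)  # sample standard deviation
    print(f"avg {mean:.0f} s, stddev {sdev:.0f} s ({100 * sdev / mean:.2f}% of mean)")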


    24 hour prime stable? Please, I'm 24/7/365 WCG stable!

    So you can do Furmark, Can you Grid???
