This thread assumes you have already figured out how to run BOINC/GPUGrid on your system; it is not intended to be a BOINC setup guide. If you would like some help with the base setup, please check out the great "everything you needed to know" thread by Otis11:
http://www.xtremesystems.org/forums/...d.php?t=246583
Issue: Currently, the Compute Capability (CC) 2.1 cards (GTX460, GTX560 Ti) run at a much lower GPU utilization than CC 2.0 cards (GTX465, GTX470, GTX480, GTX570, GTX580, GTX590). Not that I pretend to understand CUDA programming, but the core performance bottleneck comes down to the CC 2.1 design of 48 shaders per multiprocessor versus 32 per multiprocessor on CC 2.0, which a single ACEMD task can't keep fully occupied. If your card is not listed above then this thread probably isn't going to help you.
Goal: Maximize output from a GTX460 running GPUGrid applications on Windows by running 2 instances of ACEMD on a single card at the same time via a custom app_info.xml configuration file (rough sketch below).
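For anyone who wants to peek ahead, here is a sketch of the kind of app_info.xml I'm talking about. The key line is <count>0.5</count> under <coproc>, which tells BOINC each task needs only half a GPU, so the client will schedule two on the card at once. Fair warning: the app name, executable file name, version number, and plan class below are just examples, not gospel. Copy the exact values out of your own client_state.xml, because a mismatched app_info.xml will trash your cached work.

<app_info>
    <app>
        <name>acemd2</name> <!-- example name; use the one from your client_state.xml -->
    </app>
    <file_info>
        <name>acemd2_6.52_windows_intelx86__cuda31.exe</name> <!-- example file name -->
        <executable/>
    </file_info>
    <app_version>
        <app_name>acemd2</app_name>
        <version_num>652</version_num> <!-- example version -->
        <plan_class>cuda31</plan_class> <!-- example plan class -->
        <avg_ncpus>0.5</avg_ncpus>
        <max_ncpus>1.0</max_ncpus>
        <coproc>
            <type>CUDA</type>
            <count>0.5</count> <!-- half a GPU per task = 2 tasks per card -->
        </coproc>
        <file_ref>
            <file_name>acemd2_6.52_windows_intelx86__cuda31.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

The file goes in the GPUGrid project folder under the BOINC data directory (projects\www.gpugrid.net), and you need to exit and restart BOINC for it to take effect.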
Factors: Other factors are how the OS handles drivers (Vista and Win7 slow, XP fast), CPU speed and availability, and, perhaps the most challenging factor to try to control, the fact that GPUGrid has multiple different types of Work Units (WUs) that run internally with different input parameters (the experiments themselves), which in turn directly affect the efficiency of the overall GPUGrid application, ACEMD. I have also seen conflicting information on whether PCIe slot bandwidth can be a performance factor, but most of the information points to it being a non-issue. If I have time I will try to address that topic, but seriously, I have a life and there's only so much I can do.
<aside>: The previously accepted methodology was to simply turn networking off, make a backup of your BOINC data folder, and when the first run was complete, turn BOINC off, restore the backup, reconfigure, and run again, rinse and repeat until you had all configurations accounted for. This doesn't seem to work anymore in Win7, likely because I just haven't figured out which file(s) in the BOINC program folder need to be copied as well. The upshot is that within the same WU type the total runtimes are within a few percent, which doesn't really matter as we are looking for substantial increases. Anything less than 10% is probably not worth the effort, and anyone who thinks it is has probably already done all this testing anyway :-)
Baseline: All that being out of the way, I want to see if we can increase the efficiency of perhaps the most common OS configuration, Win7 x64. I am going to start with what I have come to think of as the best-balance approach for an i7-920 system: running with HT on, leaving one thread free for GPUGrid, and using the SWAN_SYNC environment variable to dedicate that free thread to GPUGrid (quick reference below).
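Quick reference for that config, since I'll keep coming back to it: SWAN_SYNC is just a system environment variable. Set it through Control Panel > System > Advanced system settings > Environment Variables, or from an elevated command prompt with "setx SWAN_SYNC 0 /M", then restart BOINC so ACEMD picks it up. As for leaving a thread free, the usual route is the "use at most X% of the processors" computing preference (87.5% = 7 of 8 threads on an i7-920 with HT), but a cc_config.xml in the BOINC data directory works too. A minimal sketch:

<cc_config>
    <options>
        <!-- schedule CPU work on only 7 of the 8 HT threads,
             leaving one free for GPUGrid -->
        <ncpus>7</ncpus>
    </options>
</cc_config>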
Off to set this up, which is going to take a while ... move the PC to the basement, break it out of (gasp) its case, tear down the cable management, swap cards and PSU (crap, I have to break down the other PC to do this ... WTF was I thinking putting them in cases and making them all nice and neat???).
Three cheers go out to our GPUGrid teammate sponsoring the hardware for these tests!
----------------------------------------------------------------
Baseline Win7x64: CPU (20*200) w/ HT ON, 1 thread free, SWAN_SYNC=0, running 1 instance of ACEMD
WU Type: KASHIF_HIVPR_wo and KASHIF_HIVPR_maca_wo:
Across 10 runs, there was only a 0.0143% runtime difference between the shortest and longest, so I'm not gonna bother with a stddev here.
Avg: 40251.28 seconds per WU; 87-90% GPU utilization
Sidebar: there are two different subtypes of HIV WUs, "wo" and "so".
The stat above was based on the "wo" subtype only: KASHIF_HIVPR_wo and KASHIF_HIVPR_maca_wo. Back-of-the-napkin calcs tell me the "so" subtype would take about an hour and 7 minutes longer per WU on a 460.
----------------------------------------------------------------
Test 1: Win7x64, CPU (20*200) w/ HT ON, 2 threads free, SWAN_SYNC=0, running 2 instances of ACEMD
Only halfway through the first set of 2-instance runs I can already call this a "no win", as there is no substantive improvement. The per-WU time doubled almost exactly, so throughput is unchanged: two WUs completing together in roughly 80,500 seconds works out to the same ~40,250 seconds per WU as the baseline. While GPU usage did go up into the mid-to-high 90s, I believe internal resource contention inside the GPU caused by running 2 instances negated the almost 10% utilization increase, for a net 0% gain in overall efficiency.
----------------------------------------------------------------