
Thread: GPUPI - SuperPI on the GPU - Post results!

  1. #1
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106

    GPUPI - SuperPI on the GPU - Post results!



    Over the last few weeks I've implemented GPUPI, a benchmark that computes pi in parallel via OpenCL. It was something I had wanted to do for years but never got around to. Now it's finally working and currently in beta. Have a look at it, it's pretty fun to crunch pi on your GPU.


    GPUPI 1B: AMD Radeon R9 290, NVIDIA GeForce GTX 980 and Intel Core i7-4960X@4 GHz


    I am aware that FUGGER and his team have wanted to do something like this for some time. I don't want to piss anybody off, on the contrary! I just think that SuperPI on the GPU has something magical about it, and the idea never got out of my head.

    I'd also like to dedicate this benchmark to our beloved Turrican. He will always be missed.

    Download, technical details & FAQ: GPUPI Beta 1.2

    ______________________________________________

    Please post any results you get in this thread! I would love to see some heavy Quadro and FirePro cards, although double precision performance should not be too much of an advantage. If you find a bug, run into other issues or have feedback, just let me know. I will monitor this thread closely.

    Important: Please use the latest graphics/OpenCL drivers available. Old drivers seem to be pretty buggy with double precision.

  2. #2
    Xtreme Enthusiast
    Join Date
    Jan 2008
    Location
    Athens -> Hellas
    Posts
    944
    This is my result with stock GPU clocks:


  3. #3
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Here's mine. ASUS R9 290x ROG something.


  4. #4
    Xtreme Owner Charles Wirth's Avatar
    Join Date
    Jun 2002
    Location
    Las Vegas
    Posts
    11,653
    Not a problem at all, I will poke at Nvidia again to see if we can complete the CUDA version.

    With time we will see a good sample of AMD GPU's performance in a new way.

    Will this run on APUs too?
    Intel 9990XE @ 5.1Ghz
    ASUS Rampage VI Extreme Omega
    GTX 2080 ti Galax Hall of Fame
    64GB Galax Hall of Fame
    Intel Optane
    Platimax 1245W

    Intel 3175X
    Asus Dominus Extreme
    GTX 1080ti Galax Hall of Fame
    96GB Patriot Steel
    Intel Optane 900P RAID

  5. #5
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    I already have a CUDA version; I implemented it before porting to OpenCL. CUDA has the nicer toolkit and makes it much easier to develop an idea. But the goal was to let NVIDIA and AMD compete in the same bench.

    Yes, AMD has very nice integer performance and that helps a lot in this bench. So whatever is good at mining bitcoins should also perform well in GPUPI. But it's not that easy: the benchmark uses multiple precisions for various parts of the calculation. I guess it's about 75% 64-bit integer performance and 25% double precision. That's just a rough guess; I could measure it sometime if it's important for understanding and optimizing the benchmark.

    Yes, it runs on APUs and even on CPUs.

  6. #6
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    Hmmm, maybe I can try an APU quickly this week.
    ROG Power PCs - Intel and AMD
    CPUs: i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350, 1x AMD FX-8320, 1x AMD FX-8300, 2x AMD FX-6300, 2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K, AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K, 2x i7-4770K, Intel i7-3930K
    AMD Cinebench R10 challenge | AMD Cinebench R15 thread | Intel Cinebench R15 thread

  7. #7
    Xtreme Addict
    Join Date
    Jun 2005
    Location
    NYC
    Posts
    1,943
    can i run this in crossfire lol
    Amd Nvidia/Ati -3dmark06 scorebord revisted

    asus L1N64-ws or /b depending on bios chip
    4x1gig 8500 gkill bpk
    2x opteron 8224 @ 3.8ghz
    http://www.xtremesystems.org/forums/...&postcount=236
    vga= 8800gt
    winxp pro

    custom chiller -31 water
    2x dtek fuzions
    bix3-with x3panaflo hi output
    antec 850 quattro

    heat under msimax abitmax and dfimax

  8. #8
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Currently SLI and Crossfire are not supported; please have a look at the FAQ. But it's definitely possible in the future. I just need multiple cards to develop and implement it. Let's see if the vendors can help us out.

  9. #9
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    So, my GPU at stock, R9-270X Toxic and FX-9590 at 4715 MHz (hmmm, it's not bad in SuperPI with all threads...)




  10. #10
    Xtreme Addict
    Join Date
    May 2009
    Location
    Switzerland
    Posts
    1,972
    CPU: - I7 4930K (EK Supremacy )
    GPU: - 2x AMD HD7970 flashed GHZ bios ( EK Acetal Nickel Waterblock H2o)
    Motherboard: Asus x79 Deluxe
    RAM: G-skill Ares C9 2133mhz 16GB
    Main Storage: Samsung 840EVO 500GB / 2x Crucial RealSSD C300 Raid0

  11. #11
    Xtreme Member
    Join Date
    May 2006
    Posts
    101
    Thanks for this _mat_, but why is this faster on CPUs than SuperPI? It's easy to get below 2s for 1M.
    Was the old code obsolete after all?
    Is it quicker because of OpenCL and the use of modern CPU instructions?

  12. #12
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    First of all, it uses a completely different algorithm to calculate pi: the BBP (Bailey-Borwein-Plouffe) formula, one of the few pi formulas that can be computed in parallel. That said, it only crunches 9 digits from 1M, exactly the 1,000,001st to the 1,000,010th. SuperPi calculates every digit up to 1M, as most serial formulas do, because each step of the calculation depends on the result of the step before. That is why they can't make use of more than one core, let alone hundreds of cores on a graphics card.
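
    For readers unfamiliar with BBP digit extraction, here is a minimal Python sketch of the textbook approach. It is not GPUPI's actual kernel code: the names bbp_series and pi_hex_digits are made up for illustration, and because it relies on plain doubles it should only be trusted for roughly 8 hexadecimal digits per call. The formula is pi = sum over k >= 0 of 16^-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)); multiplying by 16^n and keeping only the fractional parts leaves the digits that follow position n.

```python
def bbp_series(j, n):
    """Fractional part of sum_k 16^(n-k) / (8k + j)."""
    s = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerator small,
    # because only the fractional part of each term matters.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n: each one is at least 16 times smaller than the last,
    # so a handful of them is enough for double precision.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digits(n, count=8):
    """Hex digits of pi at positions n+1 .. n+count after the point."""
    x = (4 * bbp_series(1, n) - 2 * bbp_series(4, n)
         - bbp_series(5, n) - bbp_series(6, n)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        d = int(x)              # next hex digit
        digits.append("0123456789abcdef"[d])
        x -= d
    return "".join(digits)
```

    For example, pi_hex_digits(0) returns "243f6a88", the first eight hex digits of pi after the point. Every term of the inner loop is independent of the others, which is what makes the approach attractive for a GPU.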

    My benchmark is not really optimized for CPUs; call it a side effect that it even runs properly on them. It uses FMA every now and then and a double2 data type (for double-double arithmetic), but that's pretty much it. The OpenCL driver, which compiles the kernels on the fly, can do some autovectorization to improve performance, even though I have not intentionally optimized the kernel code for CPUs the way I have for AMD and NVIDIA cards.
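
    As a rough illustration of the double-double idea mentioned above, a value can be stored as an unevaluated sum hi + lo of two doubles, with lo holding the rounding error that hi cannot represent. The Python sketch below shows only addition, using Knuth's error-free TwoSum. It is an assumption about the general technique, not a copy of the benchmark's double2 code, and the names two_sum, dd_add and dd_to_fraction are hypothetical.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and a + b == s + e."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double values given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])   # exact sum of the high parts
    e += x[1] + y[1]             # fold in the low parts (this step may round)
    return two_sum(s, e)         # renormalize so that |lo| stays tiny next to hi


def dd_to_fraction(x):
    """Exact rational value of a (hi, lo) pair, handy for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

    With plain doubles, 1.0 + 1e-20 rounds back to 1.0, while dd_add((1.0, 0.0), (1e-20, 0.0)) keeps the small part in the lo component.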

  13. #13
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote: Originally Posted by TimPrice
    Thanks for this _mat_, but why is this faster on CPUs than SuperPI? It's easy to get below 2s for 1M.
    Was the old code obsolete after all?
    Is it quicker because of OpenCL and the use of modern CPU instructions?

    SuperPi computes all (decimal) digits from start to N. The algorithm used in SuperPi is (for simplicity) not parallelizable. That's why the initial SuperPi on GPU project didn't get anywhere.

    As mentioned by __mat__, his program does something completely different. It computes a few (hexadecimal?) digits starting at the N'th place. This task uses the BBP formula, which is very parallelizable.
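
    To make the "very parallelizable" point concrete, here is a small Python sketch under a few assumptions: the helper names partial_sum and chunked_sum are invented, only the k <= n part of one BBP sub-series is shown, and threads are used purely to demonstrate that the chunks do not depend on each other (CPython threads will not actually make this faster). Each worker sums its own range of terms, and the fractional parts are simply added at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(j, n, start, stop):
    """Fractional part of sum_{k=start}^{stop-1} 16^(n-k) / (8k + j), for k <= n."""
    s = 0.0
    for k in range(start, stop):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    return s


def chunked_sum(j, n, workers=4):
    """Split k = 0..n into independent chunks and combine the partial results."""
    step = (n + 1 + workers - 1) // workers          # ceiling division
    bounds = [(lo, min(lo + step, n + 1)) for lo in range(0, n + 1, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: partial_sum(j, n, *b), bounds)
    # No chunk needs another chunk's result, so the order of work is irrelevant.
    return sum(parts) % 1.0
```

    Up to floating-point rounding, chunked_sum(1, 1000) matches partial_sum(1, 1000, 0, 1001). A serial algorithm, where each step consumes the previous step's result, cannot be cut into chunks like this.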

    Unfortunately, all known efficient algorithms to compute all digits from start to N are difficult to parallelize. Not to hijack this thread, but my y-cruncher program attempts to do this. But it doesn't do it well enough for GPU (not to mention memory bottlenecks).


    Nice work __mat__! This is the first BBP-based Pi benchmark for GPUs that I've ever seen.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  14. #14
    I am Xtreme FlanK3r's Avatar
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    It's clear this is quicker than SuperPi. The old SuperPi doesn't use modern instructions and runs on only one core (x87 only, and not very efficient even on that one core)...

  15. #15
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Quote: Originally Posted by poke349
    Nice work __mat__! This is the first BBP-based Pi benchmark for GPUs that I've ever seen.

    Hey poke, I've been watching your y-cruncher project for some time now. I even used it to externally validate my hexadecimal results for 32B. Your kind words are very much appreciated!

  16. #16
    Xtreme Addict
    Join Date
    Jun 2005
    Location
    NYC
    Posts
    1,943
    first run


  17. #17
    Xtreme Legend
    Join Date
    Nov 2003
    Location
    Helsinki, Finland
    Posts
    1,692
    Is this software using double precision?
    Changing the DPFP from 1/8 to 1/2 on Hawaii makes barely any difference at all?

    Also compiling the kernel appears to give several errors in CodeXL for example.

  18. #18
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Quote: Originally Posted by The Stilt
    Is this software using double precision?
    Changing the DPFP from 1/8 to 1/2 on Hawaii makes barely any difference at all?

    Also compiling the kernel appears to give several errors in CodeXL for example.

    It uses double-double arithmetic, splitting the precision across two doubles, but only for a final high-precision division. Double precision performance matters more for calculations smaller than 1B. 1B and above have to use special uint128 arithmetic to calculate modpow via Montgomery exponentiation, so 1B is pretty much bound by the 64-bit integer performance of the device.
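
    For anyone curious what modpow via Montgomery exponentiation looks like, below is a minimal Python sketch of the general technique with R = 2^64. It is an illustration, not the benchmark's kernel: the function names are hypothetical, Python's big integers stand in for the uint64/uint128 types a GPU kernel would need, and the method as shown requires an odd modulus (even moduli would need separate handling). It also needs Python 3.8 or newer for pow(m, -1, R).

```python
R_BITS = 64
R = 1 << R_BITS
MASK = R - 1


def montgomery_setup(m):
    """Precompute constants for an odd modulus m < 2**64."""
    assert m % 2 == 1 and m < R
    m_neg_inv = (-pow(m, -1, R)) & MASK   # -m^-1 mod R
    r2 = (R * R) % m                      # R^2 mod m, used to enter Montgomery form
    return m_neg_inv, r2


def redc(t, m, m_neg_inv):
    """Montgomery reduction: t * R^-1 mod m for 0 <= t < m * R."""
    q = ((t & MASK) * m_neg_inv) & MASK   # chosen so that t + q*m is divisible by R
    u = (t + q * m) >> R_BITS             # exact division by R, result is below 2*m
    return u - m if u >= m else u


def mont_mul(a, b, m, m_neg_inv):
    """Product of two Montgomery-form values; a * b is the 128-bit intermediate."""
    return redc(a * b, m, m_neg_inv)


def mont_pow(base, exp, m):
    """Compute base**exp mod m by square-and-multiply in Montgomery form."""
    m_neg_inv, r2 = montgomery_setup(m)
    x = mont_mul(base % m, r2, m, m_neg_inv)   # base * R mod m
    acc = redc(r2, m, m_neg_inv)               # R mod m, i.e. 1 in Montgomery form
    while exp:
        if exp & 1:
            acc = mont_mul(acc, x, m, m_neg_inv)
        x = mont_mul(x, x, m, m_neg_inv)
        exp >>= 1
    return redc(acc, m, m_neg_inv)             # convert back to a normal residue
```

    The result should agree with Python's built-in pow(base, exp, m) for any odd m below 2^64. The appeal on a GPU is that the reduction step uses only multiplications, additions and shifts, with no division by m inside the loop, which fits the remark above about 64-bit integer performance being the bottleneck.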

    I will have a look at CodeXL, but I don't care too much about the AMD APP SDK, it's a mess. The kernels are compiled in realtime on the target device anyways.

  19. #19
    Xtreme Addict
    Join Date
    Jun 2005
    Location
    NYC
    Posts
    1,943

  20. #20
    Xtreme Enthusiast
    Join Date
    Feb 2007
    Location
    So near, yet so far.
    Posts
    737
    Here's mine from the living-room pc.

    [[Daily R!G]]
    Core i7 920 D0 @ 4.0GHz w/ 1.325 vcore.
    Rampage II Gene||CM HAF 932||HX850||MSI GTX 660ti PE OC||Corsair H50||G.Skill Phoenix 3 240GB||G.Skill NQ 6x2GB||Samsung 2333SW

    flickr

  21. #21
    Registered User
    Join Date
    Nov 2014
    Posts
    2
    EVGA GTX 980 ACX SC @ 1580 / 8ghz

