Page 9 of 10 FirstFirst ... 678910 LastLast
Results 201 to 225 of 226

Thread: SuperPi on GPU, we're going CUDA

  1. #201
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by K404 View Post
    But only benchers care about it. For CUDA to develop it has to have real-world uses that developers can score funding to code for and turn heads to CUDA that way.
    It does have its uses and it has a lot of potential. But a lot of people don't like it because it's too "closed" right now (Nvidia-specific).

    Programming for CUDA, I would guess, is kind of a nightmare because it's difficult to parallelize work across so many cores.

    Think about it. There's already so much trouble about several cores... but hundreds?
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)

  2. #202
    Wanna look under my kilt?
    Join Date
    Jun 2005
    Location
    Glasgow-ish U.K.
    Posts
    4,396
    I'm not a programmer but I agree completely. Allocation and re-assembly across that many "nodes" could be a royal nightmare, but I do believe it will need something as big as Photoshop compatibility or the ability to crunch Pixar movies before things will REALLY get moving.

    Something that will either be used by thousands of people, including a % at professional level, or clients willing to drop millions on something that works
    Quote Originally Posted by T_M View Post
    Not sure i totally follow anything you said, but regardless of that you helped me come up with a very good idea....
    Quote Originally Posted by soundood View Post
    you sigged that?

    why?
    ______

    Sometimes, it's not your time. Sometimes, you have to make it your time. Sometimes, it can ONLY be your time.

  3. #203
    Registered User
    Join Date
    Jan 2013
    Location
    FL,USA
    Posts
    39
    Frankenstein necro-bump for a good idea that never came to fruition. There's gotta be some way to do this; for ATI cards, too. Err... "AMD" now.

  4. #204
    Xtreme Member
    Join Date
    May 2006
    Posts
    101
    Of course, it's called OpenCL.
    Go back to the grave.

  5. #205
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Come back to life, sweet thread ...



    Over the last few weeks I've implemented GPUPI, a benchmark that computes pi in parallel via OpenCL. It was something that I wanted to do for years, but never got that far. Somehow I successfully implemented it and it's currently in beta. Have a look at it, it's pretty fun to crunch pi on your GPU.


    GPUPI 1B: AMD Radeon R9 290, NVIDIA GeForce GTX 980 and Intel Core i7-4960X@4 GHz


    I am aware that FUGGER and his team wanted to do something like that for some time. I don't want to piss anybody off, on the contrary! I just think that SuperPI on the GPU has something magical and the idea never got out of my head.

    I'd also like to dedicate this benchmark to our beloved Turrican. He will always be missed.

    Download, technical details & FAQ: GPUPI Beta 1.2

  6. #206
    Xtreme Enthusiast
    Join Date
    Jan 2008
    Location
    Athens -> Hellas
    Posts
    944
    ^ I tried this, this is my result with stock GPU clocks :


  7. #207
    Xtreme Member
    Join Date
    Jun 2005
    Location
    Bulgaria, Varna
    Posts
    447
    So, this is essentially a double precision benchmark? It is unfortunate that many consumer GPUs have artificially limited DP performance.

    Anyway, here's an old Fermi:


  8. #208
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Quote Originally Posted by fellix_bg View Post
    So, this is essentially a double precision benchmark? It is unfortunate that many consumer GPUs have artificially limited DP performance.
    Some parts use double precision, but most of it is integer.
    The calculation itself is split into two parts, which are processed in smaller packages called batches. A batch consists of a number of partial calculations, configured by the Batch Size, and the memory reduction that accumulates all results. The calculation uses at least 64-bit integers and double-double arithmetic for 500M and smaller. Starting with 1 billion digits, each kernel has to make use of additional 128-bit integer routines at a certain point of the calculation. The higher precision is achieved with two unsigned 64-bit integers, but needs more complex algorithms for all basic math functions.
    felix_w, nice to see some FirePro scores.
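
For anyone wondering what "two unsigned 64-bit integers" means in practice, here is a minimal Python sketch of the general technique. It is not GPUPI's actual kernel code; the names `add128`, `mul64x64` and `to_int` are made up for illustration. Python integers are arbitrary precision, so the masks below only emulate the fixed-width words a GPU kernel would work with.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def add128(a, b):
    """Add two 128-bit values stored as (hi, lo) pairs of 64-bit words.

    The result wraps around at 2**128, like fixed-width hardware would.
    """
    lo = (a[1] + b[1]) & MASK64
    # If the low word wrapped around, it ends up smaller than either input.
    carry = 1 if lo < a[1] else 0
    hi = (a[0] + b[0] + carry) & MASK64
    return (hi, lo)


def mul64x64(x, y):
    """Full 128-bit product of two 64-bit words, returned as (hi, lo).

    The inputs are split into 32-bit limbs so that every partial product
    would fit into a 64-bit register.
    """
    x0, x1 = x & MASK32, x >> 32
    y0, y1 = y & MASK32, y >> 32

    p00 = x0 * y0
    p01 = x0 * y1
    p10 = x1 * y0
    p11 = x1 * y1

    # Middle column: upper half of p00 plus the lower halves of the cross terms.
    mid = (p00 >> 32) + (p01 & MASK32) + (p10 & MASK32)

    lo = (p00 & MASK32) | ((mid & MASK32) << 32)
    hi = (p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32)) & MASK64
    return (hi, lo)


def to_int(v):
    """Convert a (hi, lo) pair back to a plain Python integer for checking."""
    return (v[0] << 64) | v[1]


if __name__ == "__main__":
    x = 0xFFFFFFFFFFFFFFFF
    y = 0xFEDCBA9876543210
    # Compare against Python's built-in big integers.
    assert to_int(mul64x64(x, y)) == x * y
    assert add128((0, MASK64), (0, 1)) == (1, 0)
    print("128-bit helpers agree with Python's big integers")
```

Even a plain multiply turns into four partial products plus carry handling, which gives a feel for why the basic math functions get more complex once 64 bits are no longer enough.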

  9. #209
    Administrator
    Join Date
    Nov 2007
    Location
    Stockton, CA
    Posts
    3,568
    WOW I am going to have to give this a try. Thanks !

  10. #210
    Xtreme Enthusiast
    Join Date
    Jan 2008
    Location
    Athens -> Hellas
    Posts
    944
    _mat_, should we start a new thread to keep a list of the scores as well?

  11. #211
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Sure, I will start a new one!

    Edit: Here it is: http://www.xtremesystems.org/forums/...=1#post5242169

  12. #212
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Hmmm...

    Your R9 290 results.


    My R9 290x results.


    I feel utterly defeated.

  13. #213
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    Yes, the 290 seems very strong even though it has fewer compute units. I don't own any R9 cards, so I can't really tell you what is going on there. But there must be a good reason for this, maybe driver related, maybe a difference in the architecture of the GPU.

  14. #214
    I am Xtreme
    Join Date
    May 2008
    Location
    Czech republic
    Posts
    6,823
    many thanks man, wow, Radeon is so strong here...
    ROG Power PCs - Intel and AMD
    CPUs:i9-7900X, i9-9900K, i7-6950X, i7-5960X, i7-8086K, i7-8700K, 4x i7-7700K, i3-7350K, 2x i7-6700K, i5-6600K, R7-2700X, 4x R5 2600X, R5 2400G, R3 1200, R7-1800X, R7-1700X, 3x AMD FX-9590, 1x AMD FX-9370, 4x AMD FX-8350,1x AMD FX-8320,1x AMD FX-8300, 2x AMD FX-6300,2x AMD FX-4300, 3x AMD FX-8150, 2x AMD FX-8120 125 and 95W, AMD X2 555 BE, AMD x4 965 BE C2 and C3, AMD X4 970 BE, AMD x4 975 BE, AMD x4 980 BE, AMD X6 1090T BE, AMD X6 1100T BE, A10-7870K, Athlon 845, Athlon 860K,AMD A10-7850K, AMD A10-6800K, A8-6600K, 2x AMD A10-5800K, AMD A10-5600K, AMD A8-3850, AMD A8-3870K, 2x AMD A64 3000+, AMD 64+ X2 4600+ EE, Intel i7-980X, Intel i7-2600K, Intel i7-3770K,2x i7-4770K, Intel i7-3930KAMD Cinebench R10 challenge AMD Cinebench R15 thread Intel Cinebench R15 thread

  15. #215
    Xtreme Addict
    Join Date
    May 2009
    Location
    Switzerland
    Posts
    1,972
    Single 7970 stock (well, 1050 MHz)

    CPU: - I7 4930K (EK Supremacy )
    GPU: - 2x AMD HD7970 flashed GHZ bios ( EK Acetal Nickel Waterblock H2o)
    Motherboard: Asus x79 Deluxe
    RAM: G-skill Ares C9 2133mhz 16GB
    Main Storage: Samsung 840EVO 500GB / 2x Crucial RealSSD C300 Raid0

  16. #216
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    My 290X 1050/1325 does 22.102 in Windows 7 Pro x64 With beta Cat. 14.7.

    I too was surprised that the plain R9 290 was faster at a lower clock, so I tried clock and memory scaling, with no surprises, e.g. lower clock = slower time.
    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV

  17. #217
    Xtreme Member
    Join Date
    Apr 2010
    Location
    Austria
    Posts
    106
    This 290X on Haswell-E is currently the fastest card in the bench: http://hwbot.org/submission/2673063_...0x_19sec_690ms

    I wonder if the CPU somehow helps achieve the good performance. The whole execution to call the kernels is measured too (well, there isn't much of an alternative), so maybe it's pushing the kernels to the GPU quicker.

  18. #218
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    What 290x's are you guys using? Some cards with Elpida memory will score poorly due to broken memory clock tables in the bioses and pushing clocks too far will result in the same.

    for example my sapphire with stock bios and elpida memory gets about 24 seconds.
    Last edited by STEvil; 11-13-2014 at 06:59 PM.

    All along the watchtower the watchmen watch the eternal return.

  19. #219
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    If it's in the BIOS, maybe the memory timings? Lightman mentioned that scaling down the memory clock drops down the results as well. I haven't tested downclocking the memory myself. I'll try it later.

    @Mat, hard to say. Those clocks aren't exactly comparable. Let's see, going graphics clockwise:-

    core clock 1215/1050 = 1.157
    mem clock 1500/1300 = 1.111

    assuming 1000 ops,

    at 1215 MHz = 50.787 ops/sec
    at 1050 MHz = 43.033 ops/sec ---> that means the 1215 MHz run is 18% faster, with a roughly 16% increase in core clock and an 11% increase in mem clock.

    We can conclude that the processor does play a small role. I'm on Haswell 3.7 GHz ~ 3.9 GHz (probably running at 3.9Ghz if it's loading the processor single-threadedly)

    But doesn't that mean we're still losing out to R9 290? 10% shaders is a lot IMO. Need more tests.
    Last edited by blindbox; 11-13-2014 at 08:31 PM.
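
The scaling arithmetic above can be redone in a few lines of Python. This is only a sketch using the figures quoted in the post; the "1000 ops" workload is the post's own arbitrary unit and the variable names are made up here. One thing it shows: 1500/1300 works out to about 1.154 and not 1.111 (1.111 would match 1500/1350), so it isn't clear from the post which memory clock was actually meant.

```python
WORK = 1000.0  # arbitrary amount of work, as assumed in the post

# Throughput figures quoted in the post (ops/sec)
fast_ops = 50.787  # run at 1215 MHz core
slow_ops = 43.033  # run at 1050 MHz core

# Run times implied by those throughputs (seconds)
fast_time = WORK / fast_ops
slow_time = WORK / slow_ops

# Relative gains
speedup = fast_ops / slow_ops
core_ratio = 1215 / 1050
mem_ratio_as_written = 1500 / 1300   # clocks as written in the post
mem_ratio_if_1350 = 1500 / 1350      # assumption: what a 1.111 ratio would need

if __name__ == "__main__":
    print(f"implied run times: {fast_time:.3f} s vs {slow_time:.3f} s")
    print(f"throughput gain:   {(speedup - 1) * 100:.1f} %")
    print(f"core clock gain:   {(core_ratio - 1) * 100:.1f} %")
    print(f"mem clock gain:    {(mem_ratio_as_written - 1) * 100:.1f} % (1500/1300)")
    print(f"mem clock gain:    {(mem_ratio_if_1350 - 1) * 100:.1f} % (1500/1350)")
```

The throughput gain comes out slightly higher than the core clock gain, which is what the "processor plays a small role" conclusion rests on. With both runs on different memory clocks as well, that gap is small enough that more tests would be needed to pin it on the CPU.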

  20. #220
    Xtreme X.I.P.
    Join Date
    Oct 2006
    Posts
    1,260
    Nice benchmark. Have to give it a shot.
    --->TeamPURE<---

  21. #221
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by STEvil View Post
    What 290x's are you guys using? Some cards with Elpida memory will score poorly due to broken memory clock tables in the bioses and pushing clocks too far will result in the same.

    for example my sapphire with stock bios and elpida memory gets about 24 seconds.
    Yeah, mine is Elpida so this is the differentiator.
    When I was mining my R9 290 Hynix card was faster at it with lower clocks than my full R9 290X. I had to run R9 290X @1500Mem to get same hashing speed as Hynix card with stock 1250MHz memory (which didn't OC at all :P).

  22. #222
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Quote Originally Posted by Lightman View Post
    Yeah, mine is Elpida so this is the differentiator.
    When I was mining my R9 290 Hynix card was faster at it with lower clocks than my full R9 290X. I had to run R9 290X @1500Mem to get same hashing speed as Hynix card with stock 1250MHz memory (which didn't OC at all :P).
    I hope it doesn't mean anything in gaming.

  23. #223
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    It does and is why some review cards (and most early retail cards with Elpida memory) gave crap numbers even with a "better" fan bios.


  24. #224
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    That really sucks. Quite disappointed with ASUS too, since this is a ROG MATRIX card.

  25. #225
    Xtreme Enthusiast
    Join Date
    Nov 2009
    Posts
    526
    Quote Originally Posted by STEvil View Post
    It does and is why some review cards (and most early retail cards with Elpida memory) gave crap numbers even with a "better" fan bios.
    So, is there any way to fix it? Do those memory tables reside in the BIOS?

    My Asus 290x gave "00h 00m 24.034s PI value output -> 5895585A0"
