Thread: AMD does reverse GPGPU, announces OpenCL SDK for x86

  1. #51
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816

    300 done ...

    done ...


    The time I saved programming the thing for sure makes up for the time of the PhD you need for GPGPU "trivial" coding of factorial ...

    I am now calculating 1000000! ... what about you?
    DrWho, The last of the time lords, setting up the Clock.

  2. #52
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    Dude, I can download a factorial calculator too. You're still avoiding my questions.

  3. #53
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816

    Speakers vs Doers

    Quote Originally Posted by trinibwoy View Post
    The entire premise of your argument is false. Factorial isn't a strictly iterative calculation, as all multiplications of every two numbers in the sequence can happen in parallel. For example, 10!, using 4 threads.

    Clock #1
    Thread 1: 10*9
    Thread 2: 8*7
    Thread 3: 6*5
    Thread 4: 4*3
    Clock #2
    Thread 1: 90*56
    Thread 2: 30*12
    Clock #3
    Thread 1: 5040 * 360
    Clock #4
    Thread 1: 1814400 * 2

    Only 4 clocks on a theoretical 4 core processor. How is your naive iterative algorithm supposed to be faster when it takes 8 clocks? You really need to read up on parallel scans, there isn't even any branching in this algorithm so I'm really curious where you're getting all of this bad info from.
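The schedule quoted above can be sketched in a few lines of Python. This is only an illustration of the pairwise (tree) reduction, not code from anyone in the thread: the name `tree_factorial` is made up, each pass of the loop stands in for one "clock", and it assumes there are always enough threads to do every multiplication of a round at once.

```python
def tree_factorial(n):
    """Return (n!, rounds) by multiplying neighbours pairwise, round by round."""
    values = list(range(n, 1, -1)) or [1]  # 10, 9, ..., 2 for n = 10
    rounds = 0
    while len(values) > 1:
        # Every product in this list is independent of the others,
        # so in principle each one could run on its own thread.
        paired = [values[i] * values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element waits for the next round
        values = paired
        rounds += 1
    return values[0], rounds


result, rounds = tree_factorial(10)
print(result, rounds)
```

For n = 10 the rounds are 90, 56, 30, 12 (with 2 left over), then 5040 and 360, then 1814400, then the final multiply by 2, which is the same four steps as in the quote. It says nothing about how expensive each multiply gets once the numbers no longer fit in a register.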
    Did you ever try to move a result from one stream unit to another one in a GPU?
    The thing that I know is that it took me 1 hour to prove that it is easy to do it on a CPU ... and you are still trying to do the theory ...

    What you do not realize is that the argument that I am using is what makes x86 the most formidable architecture: I can put software together at the speed of light, while you keep thinking about how to do it ...
    I gave you the link about how to parallelize n!, I am fully aware of it ... just try to implement it, and you'll figure out that your "trivial" claim is 100% bogus. In the meantime, I am demonstrating how easy it was for me to calculate the trivial part.

    It is always easy to say it is trivial, it is another story to actually do the work ... and it is my entire point ...

    I am a Doer ... I do stuff, I don't do claims without knowing exactly what I am talking about.

    AMD is right, on OpenCL, it is more important to start with the CPU side.

    Back to topic, thanks for not using "trivial" anymore in your posting.
    My 1000000! number is coming. Just for the fun of it, I'll thread it.

  4. #54
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    i wonder what it is like to be brainwashed by intel?

  5. #55
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    Quote Originally Posted by Drwho? View Post
    It is always easy to say it is trivial, it is another story to actually do the work ... and it is my entire point ...
    Ummmm it is trivial, any high school student can understand how easily parallelizable it is. Don't know why that seems to escape you.

    I am a Doer ... I do stuff, I don't do claims without knowing exactly what I am talking about.
    You haven't done anything except paste the result from a program you downloaded on the internet. What you have done though is demonstrate that you believe running something slowly on a CPU is good because it's easier to code. That's pretty backward thinking my friend.

  6. #56
    Xtreme Addict
    Join Date
    Oct 2007
    Location
    Chicago,Illinois
    Posts
    1,182
    'for' loop initial declaration used outside C99 mode .



  7. #57
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Quote Originally Posted by trinibwoy View Post
    Ummmm it is trivial, any high school student can understand how easily parallelizable it is. Don't know why that seems to escape you.



    You haven't done anything except paste the result from a program you downloaded on the internet. What you have done though is demonstrate that you believe running something slowly on a CPU is good because it's easier to code. That's pretty backward thinking my friend.
    Dude ... I assembled it, made the MFC frame ... and proved that you did not do it :-P
    Again ... I showed you the link before you came up with your own // "idea".
    Here: http://www.luschny.de/math/factorial/Benchmark.html
    I understand it is perfectly threadable ... TRY TO DO IT ON THE GPU AND YOU'LL FIGURE OUT THAT YOU ARE SLOWER THAN THE CPU!

    CLEAR!??


    I don't know who is the one with 'understanding' problems here...

  8. #58
    Xtreme Member
    Join Date
    Mar 2008
    Location
    germany-münster
    Posts
    375
    why are you arguing about lame factorials
    system:

    Phenom II 920 3.5Ghz @ 1.4v, benchstable @ over 3,6Ghz (didnt test higher)
    xigmatek achilles
    sapphire hd4870 1gb @ 820 1020
    Gigabyte GA-MA790GP-DS4H
    8gb a-data 4-4-4-12 800
    x-fi xtrememusic
    rip 2x 160gb maxtor(now that adds up to 4...)
    320gb/250gb/500gb samsung

  9. #59
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by trinibwoy View Post
    Ummmm it is trivial, any high school student can understand how easily parallelizable it is. Don't know why that seems to escape you.
    hey im in high school and i take that as an offense but i do see how easy that would be to parallelize and then paste into a forum. it looks like you can do this with a lot of sequences.

  10. #60
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Quote Originally Posted by clonez View Post
    why are you arguing about lame factorials
    That is what trinibwoy picked from my posting to say that I was totally wrong ... His big opinion, nothing to back it up ... That is the conclusion.
    Last edited by Drwho?; 08-15-2009 at 02:01 PM.

  11. #61
    Xtreme Mentor
    Join Date
    Sep 2007
    Location
    Ohio
    Posts
    2,977
    You should probably tell ATI, Nvidia, and Intel that their future plans of doing processing on the GPU were all wrong, and that the CPU really was faster all along.

    Probably a tough sell, but you might make millions if you can convince them?

    How long until all games are converted over to run just on the CPU for the added performance you have displayed here with your...
    31765153303077535897513456514742416740151747072083 91018699899932793649108926879247397058141528555439 65954222603919059265825637344676406359525838966981 51198395988660368375304201799032818594556941255051 90663028548695333776829846000318080938221300381022 14387057461181304251961916405970456035183121708151 65864735655654053292841174862895708285679230005352 58463770612805914520355463899321278759063496278379 75871352588618213252263577038396202737385324908353 68049799008570152248330343952519734465334299465256 52360967428345505237397339023742618088717992837222 85366293439240895762913154442106573609205481842139 36589386771554284247727510016673435774309363894844 45647643771840738743794710078671510704495546576262 81566137550730763768080600031844296233977808233311 35978757713698301281757162567168328728151193733668 57894371090977485812228681268241223172726811849752 07863453107495331708260153159440253645365524453587 95203474521342924891664450480435535228197772198197 18690548841768963987827047820661269214725486182478 59626434279190274503452994769367997217285165465591 79947178906788568727857447008428972377823476308074 09195129662383464278396538650173246658501921440916 94630371265581197700774682562035198318782913591013 99781730363517376470671438399281029122446084832051 89832483488551310255397215831849316536707322731729 95431750775475634748127320956655431851879586978172 49172170086576809890832783083824043773797445534252 56887128988555131809670124978594542906096273705906 59970784172738420721605576789060565167694565490120 38816577586193923092436298338954985727987452339809 04998584674848503995091093988342104246931136178759 78611803096108774362764990414655167545507613665725 91499337611434024376291029038413588853131259113254 48492258960071848511693901939854346494154837823383 02531368775990005443722332901462568184095998830522 52158532859983399033659541893269668016326589935823 46632470803240204297913574257554985493728961920916 'ssss

    I still think GPU computing is the future.
    Last edited by Talonman; 08-15-2009 at 02:15 PM.
    Asus Maximus SE X38 / Lapped Q6600 G0 @ 3.8GHz (L726B397 stock VID=1.224) / 7 Ultimate x64 /EVGA GTX 295 C=650 S=1512 M=1188 (Graphics)/ EVGA GTX 280 C=756 S=1512 M=1296 (PhysX)/ G.SKILL 8GB (4 x 2GB) SDRAM DDR2 1000 (PC2 8000) / Gateway FPD2485W (1920 x 1200 res) / Toughpower 1,000-Watt modular PSU / SilverStone TJ-09 BW / (2) 150 GB Raptor's RAID-0 / (1) Western Digital Caviar 750 GB / LG GGC-H20L (CD, DVD, HD-DVD, and BlueRay Drive) / WaterKegIII Xtreme / D-TEK FuZion CPU, EVGA Hydro Copper 16 GPU, and EK NB S-MAX Acetal Waterblocks / Enzotech Forged Copper CNB-S1L (South Bridge heat sink)

  12. #62
    I am Xtreme
    Join Date
    Sep 2006
    Posts
    10,374
    at least my GPU is bloody fast at folding@home... I like that together with my 8 threads daily

    Go out into the sun Sir Francois... !!
    Question : Why do some overclockers switch into d*ckmode when money is involved

    Remark : They call me Pro Asus Saaya yupp, I agree

  13. #63
    Xtremely Kool
    Join Date
    Jul 2006
    Location
    UK
    Posts
    1,875
    Quote Originally Posted by Talonman View Post
    You should probably tell ATI, Nvidia, and Intel that their future plans of doing processing on the GPU were all wrong, and that the CPU really was faster all along.

    Probably a tough sell, but you might make millions if you can convince them?
    I still think GPU computing is the future.
    You really do see things as absolutes, as black & white, to the point that you put words into people's mouths.

    All I see is DrWho saying that the CPU is still faster at some things & the GPU is not suitable for everything.

    And then you turn around to make out like he's been saying that nothing should be done on the GPU.

  14. #64
    Xtreme Mentor
    Join Date
    Sep 2007
    Location
    Ohio
    Posts
    2,977
    I am actually sitting here hoping he posts some more numbers....

    They make for such a good read.

    Here is one...

    10000000000000000000000000000000000000000000000000 000000000000000000000000000000000000

    That's a big one too.

    "the CPU is still faster at some things & the GPU is not suitable for everything."

    That I agree with.
    Last edited by Talonman; 08-15-2009 at 02:51 PM.

  15. #65
    Xtreme Cruncher
    Join Date
    Jul 2006
    Posts
    1,374
    Let me say first of all that I'm guessing some of this based on what I've read in the CUDA manuals (I'm no programming expert, obviously). If I get this all wrong, then correct me politely; I'm always happy to be corrected, as I don't want to be wrong. That being said, the factorial calculation can be computed in parallel, but you'd have to break up the problem and keep track of what has and hasn't been calculated, which would be done more effectively on the CPU. Issuing the commands to multiply x*x to each unit is really easy after that point (and would be faster on the GPU). Data flows one way in the GPU, so you can't really go back and effectively reference what has and hasn't been done. So, yes, if you try to calculate the factorials ONLY on the GPU, it could be slow due to the initial pre-processing. The actual calculation would indeed be fast on the GPU. By effectively balancing the use of the CPU and GPU in tandem, you can likely break down the problem into chunks, maximizing the strengths of each processor type. The balancing, of course, is the key, and requires the PhD that Francois speaks of...
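A rough sketch of that split, for what it's worth. It is plain Python, not CUDA or OpenCL: the thread pool is only a stand-in for the GPU units, and `split_range`, `partial_product` and `chunked_factorial` are hypothetical names made up for this example. The host decides the chunks and does the bookkeeping, the workers only multiply, and the host combines the partial results at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod


def split_range(n, chunks):
    """Host-side step: cut 2..n into at most `chunks` contiguous (lo, hi) ranges."""
    total = max(n - 1, 0)               # how many factors there are in 2..n
    size = max(1, -(-total // chunks))  # ceiling division, at least 1
    return [(lo, min(lo + size - 1, n)) for lo in range(2, n + 1, size)]


def partial_product(bounds):
    """Worker-side step: multiply every integer in one range."""
    lo, hi = bounds
    return prod(range(lo, hi + 1))


def chunked_factorial(n, workers=4):
    ranges = split_range(n, workers)                      # bookkeeping on the host
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_product, ranges))  # independent work
    return prod(partials)                                 # final combine on the host


print(split_range(10, 4))
print(chunked_factorial(10))
```

Because of Python's global interpreter lock this will not run any faster than a plain loop; it only shows the division of labour. On real hardware the cost of moving the partial results back to the host is part of the balancing act mentioned above.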

  16. #66
    I am Xtreme
    Join Date
    Jul 2007
    Location
    SF bay area, CA
    Posts
    15,871
    Quote Originally Posted by Talonman View Post
    You should probably tell ATI, Nvidia, and Intel that their future plans of doing processing on the GPU were all wrong, and that the CPU really was faster all along.

    Probably a tough sell, but you might make millions if you can convince them?

    ......

    I still think GPU computing is the future.
    Why tell Intel that? They run x86 on the shader, and I think that AMD is working towards that also.
    5930k, R5E, samsung 8GBx4 d-die, vega 56, wd gold 8TB, wd 4TB red, 2TB raid1 wd blue 5400
    samsung 840 evo 500GB, HP EX 1TB NVME , CM690II, swiftech h220, corsair 750hxi

  17. #67
    Xtreme Mentor
    Join Date
    Sep 2007
    Location
    Ohio
    Posts
    2,977
    I bet both of those GPU's will be faster than a CPU at Factorial calculating too.

    I just don't think you can make a blanket statement that CPU's are faster than GPU's at Factorial Calculating...

    What CPU, what GPU?

    Can we say that Nvidia GPU's are faster than ATI GPU's and not take some flack? A review on performance should be more specific.

    Yes, some CPU's might be faster than some GPU's. I can acknowledge that.
    My Q6600 probably would beat my 8600 GT...

  18. #68
    Xtreme Mentor
    Join Date
    Sep 2007
    Location
    Ohio
    Posts
    2,977
    Quote Originally Posted by xVeinx View Post
    Let me say first of all that I'm guessing some of this based on what I've read in the CUDA manuals (I'm no programming expert, obviously). If I get this all wrong, then correct me politely; I'm always happy to be corrected, as I don't want to be wrong. That being said, the factorial calculation can be computed in parallel, but you'd have to break up the problem and keep track of what has and hasn't been calculated, which would be done more effectively on the CPU. Issuing the commands to multiply x*x to each unit is really easy after that point (and would be faster on the GPU). Data flows one way in the GPU, so you can't really go back and effectively reference what has and hasn't been done. So, yes, if you try to calculate the factorials ONLY on the GPU, it could be slow due to the initial pre-processing. The actual calculation would indeed be fast on the GPU. By effectively balancing the use of the CPU and GPU in tandem, you can likely break down the problem into chunks, maximizing the strengths of each processor type. The balancing, of course, is the key, and requires the PhD that Francois speaks of...
    That sounds correct to me, for what that's worth.

    "By effectively balancing the use of the CPU and GPU in tandem, you can likely break down the problem into chunks, maximizing the strengths of each processor type."

    That is some big truth right there...

  19. #69
    Banned
    Join Date
    Aug 2008
    Posts
    1,052
    Quote Originally Posted by Talonman View Post
    I still think GPU computing is the future.
    The GPU is going to become part of the CPU.

    I think much of this push for GPU computing is just marketing on the GPU makers part as they desperately try to stay alive and be relevant.

    A standalone GPU company like Nvidia won't even exist in 10 years time.

  20. #70
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by Chad Boga View Post
    The GPU is going to become part of the CPU.
    i could see this becoming the norm in low power and mobile devices but with large caches and multicore cpu's they really dont have the die space for a gpu, especially a high performance nvidia gpu.

  21. #71
    Banned
    Join Date
    Aug 2008
    Posts
    1,052
    Quote Originally Posted by Chumbucket843 View Post
    i could see this becoming the norm in low power and mobile devices but with large caches and multicore cpu's they really dont have the die space for a gpu, especially a high performance nvidia gpu.
    That is the case today, but in three or four shrinks time?

  22. #72
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Quote Originally Posted by trinibwoy View Post
    Clock #1
    Thread 1: 10*9
    Thread 2: 8*7
    Thread 3: 6*5
    Thread 4: 4*3
    Clock #2
    Thread 1: 90*56
    Thread 2: 30*12
    Clock #3
    Thread 1: 5040 * 360
    Clock #4
    Thread 1: 1814400 * 2
    Actually, after looking at the threading of the factorial, it is far from being trivial, knowing that you have to deal with numbers that are way beyond what the CPU or GPU registers can deal with ... Close to a million digits ... It is not as simple as explained in the quote ...
    You've got to write many loops doing multiplication between partial thread results, and multiplying numbers with thousands of digits with each other ... managing "carry" at each step ...
    I am telling you my friend, this is not going to be even 5% of what the CPU is doing ...
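To make the "carry" point concrete, here is a minimal sketch of one big-number step: multiplying a number stored as a list of limbs by a small integer. The base of 10**9 and the names `mul_small`, `factorial_limbs` and `to_int` are assumptions picked for this example. Python integers are already arbitrary precision, so the limb list exists only to expose the carry chain that a fixed-width register forces on you.

```python
BASE = 10**9  # each limb holds nine decimal digits (arbitrary choice for this sketch)


def mul_small(limbs, k):
    """Multiply a little-endian limb list by a small non-negative integer k."""
    out = []
    carry = 0
    for limb in limbs:
        # The carry out of this limb feeds the next one, so this loop
        # is a sequential dependency chain as written.
        carry, digit = divmod(limb * k + carry, BASE)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, BASE)
        out.append(digit)
    return out


def factorial_limbs(n):
    """Naive iterative factorial on limb lists: one mul_small per factor."""
    limbs = [1]
    for k in range(2, n + 1):
        limbs = mul_small(limbs, k)
    return limbs


def to_int(limbs):
    """Convert a limb list back to a Python int, for checking only."""
    return sum(limb * BASE**i for i, limb in enumerate(limbs))


print(to_int(factorial_limbs(25)))
```

Multiplying two partial results that are both thousands of limbs long needs a full long multiplication (or something faster) on top of this, which is where the extra loops come from.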

    Think about it, your (1000000-2)! is 8.2639316883315556986809062972533 E+5565701 using Stirling's approximation ... that is a lot of digits ... and you have to multiply by 999,999 then by 1,000,000 ...
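The size of such a number can be estimated without computing it. The sketch below does not use Stirling's formula itself; it swaps in the log-gamma function from Python's standard library, since log10(n!) = lgamma(n + 1) / ln(10). The name `factorial_digits` is made up for this example, and because it is a floating-point estimate it could be off by one if n! happened to land extremely close to a power of ten.

```python
import math


def factorial_digits(n):
    """Estimate the number of decimal digits of n! via the log-gamma function."""
    if n < 2:
        return 1  # 0! and 1! are both 1
    log10_factorial = math.lgamma(n + 1) / math.log(10)
    return math.floor(log10_factorial) + 1


for n in (10, 25, 999_998, 1_000_000):
    print(n, factorial_digits(n))
```

At nine decimal digits per 32-bit limb, dividing the digit count by nine gives a rough number of limbs for one such value.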


    You don't have enough memory on the GPU ... say hello to memory swapping through PCI Express ...
    The fastest way to do it is to use the integer registers ... and thread reasonably.

    So, today, I demonstrated that it is easy to claim that the GPU can do well on everything; the reality is that there is a majority of cases where the GPU will NOT be even close to a regular CPU ... with some memory :-P
    AMD made the right choice.

    Last edited by Drwho?; 08-15-2009 at 06:27 PM.

  23. #73
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    And Intel just shot itself in the foot.

  24. #74
    Xtreme Enthusiast
    Join Date
    Dec 2007
    Posts
    816
    Quote Originally Posted by Calmatory View Post
    And Intel just shot itself in the foot.
    Well, honesty is great, no?
    Actually, I believe that x86 will outperform OpenCL, using the same software architecture, so AMD and Intel are almost doing the same thing. AMD for the moment is not going x86; I would not be surprised if they do it one day for the GPU.

    Francois

  25. #75
    Xtreme Legend
    Join Date
    Jan 2003
    Location
    Stuttgart, Germany
    Posts
    929
    Quote Originally Posted by Drwho? View Post
    Well, honesty is great, no?
    Actually, I believe that x86 will outperform OpenCL, using the same software architecture, so AMD and Intel are almost doing the same thing. AMD for the moment is not going x86; I would not be surprised if they do it one day for the GPU.

    Francois
    i dont think it is possible to make a general statement like "x86 will outperform opencl".

    from what i understand, the unique feature of opencl is that it can run threads on the cpu and the gpu. so run iterative/branching/recursive stuff on the cpu and use the gpu for stuff like matrix multiplications.

    of course in a commercial environment it is also important how quickly you can get things working. your factorial is a great example for that. 1 hour to write cpu code, weeks to write gpu code. coders cost money, time you often don't have, hardware is cheap, just get a few dozen boxes running the cpu implementation
