
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast


  1. #1
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    damn, it needs to be that fast??? so i would need what, a dual 10GbE card in each node, then connect all 8 wires to the switch?
    did you get my PM? i havent gotten any offers on anything, maybe they are all too expensive?? but its yours until i can sell it. then ill let you do some remote login work for your classes, and to see what magic you can work with the NUMA programming when you have access to this for testing purposes.. hehe

    what i really would love doing with it is some HPC work using BOINC to really rack up the stats.... do you think this would work better, or a blade center? ive got all the software i need for it, i just need to know what sort of hardware ive got to get. would you be able to do a lil research and tell me which would be better for what i have in mind, the C6100 or this http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26 ?
    It's not overkill if it works.


  2. #2
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by skycrane View Post
    damn, it needs to be that fast??? so i would need what, a dual 10GbE card in each node, then connect all 8 wires to the switch?
    did you get my PM? i havent gotten any offers on anything, maybe they are all too expensive?? but its yours until i can sell it. then ill let you do some remote login work for your classes, and to see what magic you can work with the NUMA programming when you have access to this for testing purposes.. hehe

    what i really would love doing with it is some HPC work using BOINC to really rack up the stats.... do you think this would work better, or a blade center? ive got all the software i need for it, i just need to know what sort of hardware ive got to get. would you be able to do a lil research and tell me which would be better for what i have in mind, the C6100 or this http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26 ?
    Yeah, it kinda does. At least enough to match the internal bandwidth of the socket <-> socket connection. That's the problem when you try to use distributed memory like shared memory: latencies can be hidden pretty well with HT and good cache locality, but bandwidth can't.
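    As a rough back-of-envelope (the figures below are my own assumptions for illustration, not numbers from this thread): a Nehalem-era QPI link moves on the order of 25.6 GB/s, while a single 10GbE port tops out around 1.25 GB/s before protocol overhead. Matching socket-to-socket bandwidth with Ethernet would take something like:

```python
import math

# Assumed figures, for illustration only:
QPI_GB_S = 25.6      # approx. peak bandwidth of one QPI link (6.4 GT/s era)
TEN_GBE_GB_S = 1.25  # 10 Gbit/s Ethernet = 1.25 GB/s raw, before overhead

# Number of 10GbE links needed just to match the raw interconnect bandwidth
links_needed = math.ceil(QPI_GB_S / TEN_GBE_GB_S)
print(links_needed)  # 21
```

    Which is why bolting a couple of 10GbE cards onto each node doesn't turn a cluster into a big shared-memory box.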

    FWIW, we had 2 GB/s of disk bandwidth alone when we did the 10 trillion digit computation of Pi. Even that was severely limiting, and that's with a program specifically optimized for using disk.
    There is somewhat of a fundamental problem though: the FFT algorithm requires very high bisection bandwidth to run efficiently.
    Of course this doesn't exist - even on the best-connected supercomputers. So the efficiency on them is extremely poor, even with specialized distributed implementations.
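    To see where the bisection-bandwidth requirement comes from: a distributed FFT needs at least one global transpose, an all-to-all step in which every node ships almost all of its local data to the other nodes. A minimal sketch (the transform size, node count, and link speed are made-up examples, not figures from the Pi run):

```python
def all_to_all_bytes_per_node(total_bytes: int, nodes: int) -> int:
    """Bytes one node must send over the network in a single all-to-all
    transpose: it holds total/nodes locally and keeps only its own
    1/nodes slice of that."""
    local = total_bytes // nodes
    return local * (nodes - 1) // nodes

# Example: a 1 TiB transform spread over 8 nodes.
sent = all_to_all_bytes_per_node(2**40, 8)
print(sent)             # 120259084288 bytes (112 GiB per node)
print(sent / 1.25e9)    # ~96 seconds per transpose over one 10GbE link
```

    Every pass that crosses the node boundary pays that cost again, which is why efficiency collapses on clusters whose bisection bandwidth is far below their aggregate memory bandwidth.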

    That's not to say I can't find a way to do any better. But I have a full-time job now and I don't have as much time as I used to.

    but its yours until i can sell it.
    I would feel pretty bad taking another machine from you. I also kind of broke the promise of putting the quad Opteron on WCG. I had it running for a few months, then I realized that I had no way to monitor the health of the machine (with summer approaching). So I took it off and used it only for things that needed the NUMA (to preserve its operational life). So that's how it is right now: it's off most of the time, but every once in a while I'll boot it up to run some scalability testing.

    what i really would love doing with it is some HPC work using BOINC to really rack up the stats.... do you think this would work better, or a blade center? ive got all the software i need for it, i just need to know what sort of hardware ive got to get. would you be able to do a lil research and tell me which would be better for what i have in mind, the C6100 or this http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26 ?
    That looks really cheap for a Sandy Bridge blade (if I'm reading it right). I would imagine that simple high-end desktops (OCed) would be the cheapest and most power-efficient approach for truly distributed tasks that require little communication. The main reason to go with multi-socket boards is to get fast bandwidth between the two chips, but I guess that's not needed here.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)
