
Thread: New Multi-Threaded Pi Program - Faster than SuperPi and PiFast


  1. #1
    Xtreme Cruncher
    Join Date
    Jun 2005
    Location
    Northern VA
    Posts
    1,285
    damn, it needs to be that fast??? so i would need what, a dual 10GbE card in each node, then connect all 8 wires to the switch?
    did you get my PM? i havent gotten any offers on anything, maybe they are all too expensive?? but its yours until i can sell it. then ill let you do some remote login work for your classes, and to see what magic you can work with the NUMA programming when you have access to this for testing purposes.. hehe

    what i really would love doing with it is some HPC work using BOINC to really rack up the stats.... do you think this would work better, or a blade center? ive got all the software i need for it, i just need to know what sort of hardware ive got to get. would you be able to do a lil research and tell me which would be better for what i have in mind, the C6100 or this http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26 ?
    It's not overkill if it works.


  2. #2
    Xtreme Enthusiast
    Join Date
    Mar 2009
    Location
    Bay Area, California
    Posts
    705
    Quote Originally Posted by skycrane View Post
    damn, it needs to be that fast??? so i would need what, a dual 10GbE card in each node, then connect all 8 wires to the switch?
    did you get my PM? i havent gotten any offers on anything, maybe they are all too expensive?? but its yours until i can sell it. then ill let you do some remote login work for your classes, and to see what magic you can work with the NUMA programming when you have access to this for testing purposes.. hehe

    what i really would love doing with it is some HPC work using BOINC to really rack up the stats.... do you think this would work better, or a blade center? ive got all the software i need for it, i just need to know what sort of hardware ive got to get. would you be able to do a lil research and tell me which would be better for what i have in mind, the C6100 or this http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26 ?
    Yeah, it kinda does. At least enough to match the internal bandwidth of the socket <-> socket connection. That's the problem when you try to use distributed memory like shared memory: latencies can be hidden pretty well with HT and good cache locality, but bandwidth can't.
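    As a rough back-of-envelope (the figures below are my own assumptions for illustration, not numbers from this thread): a Nehalem-era QPI link moves on the order of 25.6 GB/s, while a single 10GbE port tops out around 1.25 GB/s before protocol overhead. Matching socket-to-socket bandwidth with Ethernet would take something like:

```python
import math

# Assumed figures, for illustration only:
QPI_GB_S = 25.6      # approx. peak bandwidth of one QPI link (6.4 GT/s era)
TEN_GBE_GB_S = 1.25  # 10 Gbit/s Ethernet = 1.25 GB/s raw, before overhead

# Number of 10GbE links needed just to match the raw interconnect bandwidth
links_needed = math.ceil(QPI_GB_S / TEN_GBE_GB_S)
print(links_needed)  # 21
```

    Which is why bolting a couple of 10GbE cards onto each node doesn't turn a cluster into a big shared-memory box.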

    FWIW, we had 2 GB/s of disk bandwidth alone when we did the 10 trillion digit computation of Pi. Even that was severely limiting, and that's with a program specifically optimized for using disk.
    There is somewhat of a fundamental problem though: the FFT algorithm requires very high bisection bandwidth to run efficiently.
    Of course this doesn't exist - even on the best-connected supercomputers. So the efficiency on them is extremely poor, even with specialized distributed implementations.
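    To see where the bisection-bandwidth requirement comes from: a distributed FFT needs at least one global transpose, an all-to-all step in which every node ships almost all of its local data to the other nodes. A minimal sketch (the transform size, node count, and link speed are made-up examples, not figures from the Pi run):

```python
def all_to_all_bytes_per_node(total_bytes: int, nodes: int) -> int:
    """Bytes one node must send over the network in a single all-to-all
    transpose: it holds total/nodes locally and keeps only its own
    1/nodes slice of that."""
    local = total_bytes // nodes
    return local * (nodes - 1) // nodes

# Example: a 1 TiB transform spread over 8 nodes.
sent = all_to_all_bytes_per_node(2**40, 8)
print(sent)             # 120259084288 bytes (112 GiB per node)
print(sent / 1.25e9)    # ~96 seconds per transpose over one 10GbE link
```

    Every pass that crosses the node boundary pays that cost again, which is why efficiency collapses on clusters whose bisection bandwidth is far below their aggregate memory bandwidth.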

    That's not to say I can't find a way to do any better. But I have a full-time job now and I don't have as much time as I used to.

    but its yours until i can sell it.
    I would feel pretty bad taking another machine from you. I also kind of broke the promise of putting the quad Opteron on WCG. I had it running for a few months, then I realized that I had no way to monitor the health of the machine (with summer approaching). So I took it off and used it only for things that needed the NUMA (to preserve its operational life). So that's how it is right now: it's off most of the time, but every once in a while I'll boot it up to run some scalability testing.

    what i really would love doing with it is some HPC work using BOINC to really rack up the stats.... do you think this would work better, or a blade center? ive got all the software i need for it, i just need to know what sort of hardware ive got to get. would you be able to do a lil research and tell me which would be better for what i have in mind, the C6100 or this http://www.ebay.com/itm/HP-C7000-BLA...91034888783%26 ?
    That looks really cheap for a Sandy Bridge blade (if I'm reading it right). I would imagine that simple high-end desktops (OCed) would be the cheapest and most power-efficient approach for truly distributed tasks that require little communication. The main reason to go with multi-socket boards is to get fast bandwidth between the two chips, but I guess that's not needed here.
    Main Machine:
    AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate

    Miscellaneous Workstations for Code-Testing:
    Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)
