New Multi-Threaded Pi Program - Faster than SuperPi and PiFast

**NapalmV5** · 08-23-2011, 03:25 PM

thanks.. can't wait till you release the version that takes advantage of storage setup

then gpu.. you got to.. the whole system: cpu/ram/gpu/storage

if that doesn't bring down my all liquid cooled system then nothing will

**poke349** · 08-23-2011, 05:07 PM

I'm not sure exactly what you mean by "storage setup". If you were referring to the raid that I mentioned a few posts back, the current version already has raid-0 support. (You just have to find it.)
Go into Custom Compute (option 3) and change the number of digits (option 3) to something large (> 100,000,000 digits).
Then select Computation Mode (option 8) and select one of the swap modes.
Once you're in a swap mode, a new option will appear: "Swap Disks" (option 9).
You can now specify how many drives you wish to use, as well as their paths. The program will use all the paths you enter as raid-0, so the combined disk performance will be the number of drives you enter times the speed of the slowest drive.
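To make the "number of drives times the slowest drive" rule concrete, here is a minimal sketch of generic round-robin raid-0 striping. The stripe size, function names, and drive speeds are hypothetical and not taken from y-cruncher; the point is only that consecutive stripes rotate across all drives, so each drive holds 1/N of the data and a large transfer finishes when the slowest drive does.

```python
STRIPE_BYTES = 64 * 1024  # hypothetical stripe unit; not y-cruncher's actual value


def locate(logical_offset, num_drives, stripe_bytes=STRIPE_BYTES):
    """Map a logical byte offset to (drive index, byte offset on that drive)."""
    stripe_index, within = divmod(logical_offset, stripe_bytes)
    drive = stripe_index % num_drives   # stripes rotate round-robin across drives
    row = stripe_index // num_drives    # how many full passes over all drives so far
    return drive, row * stripe_bytes + within


def combined_bandwidth(drive_speeds):
    """Each drive holds 1/N of the data, so the slowest drive sets the pace."""
    return len(drive_speeds) * min(drive_speeds)


if __name__ == "__main__":
    print(locate(5 * STRIPE_BYTES + 10, num_drives=4))  # -> (1, 65546)
    print(combined_bandwidth([120, 100, 130]))          # -> 300 (hypothetical MB/s)
```

With three drives at 120, 100, and 130 MB/s, the array behaves like three 100 MB/s drives, which is why mixing one slow drive into the list drags the whole set down.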

The hybrid raid0+3 that I mentioned in the earlier post will be in v0.6.1. It's basically done and it works - even through "simulated" hard drive failures.
The only thing missing right now is the ability to rebuild a dead drive. Since I haven't started writing this feature yet, I'm probably going to put it off to v0.6.2.
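The post doesn't describe the exact raid-0+3 layout, so the following is only a generic sketch of the raid-3 half: the data blocks in a stripe are protected by a dedicated parity block equal to their byte-wise XOR, which lets any single missing block (data or parity) be recomputed from the survivors. The function names and block contents are hypothetical. Rebuilding a dead drive would amount to running the recovery step for every stripe and writing the results to a replacement drive.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumes all blocks have the same length)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return the data blocks followed by one dedicated parity block (raid-3 style)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, failed_index):
    """Recompute the block on one failed drive by XOR-ing all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = write_stripe([b"pi31", b"4159", b"2653"])  # 3 data drives + 1 parity drive
    lost = 1                                          # "simulated" failure of drive 1
    print(recover(stripe, lost) == stripe[lost])      # -> True
```

A single parity block only covers one failure per stripe; losing a second drive before the first is rebuilt would lose data.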

As for GPU, it's unlikely that it will help much (if at all). Computing Pi does not fit the GPU programming model very well. There isn't much parallelism that is exploitable with a GPU.
The other major problem is the GPU <-> CPU memory bottleneck. It doesn't matter how powerful your GPU is if you can't get data to and from memory. This is already a problem right now - on a CPU.
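A back-of-envelope way to see the bottleneck argument: if every pass over the data has to cross the CPU <-> GPU link, the transfer time doesn't shrink no matter how fast the GPU computes. The sketch below compares the two times; the function name, parameters, and example numbers are illustrative placeholders, not measurements of any real hardware or of y-cruncher.

```python
def transfer_bound(bytes_moved, flops, link_bytes_per_s, device_flops_per_s):
    """Compare link transfer time with compute time for one pass over the data.

    Returns (transfer_seconds, compute_seconds, is_transfer_bound).
    """
    transfer_seconds = bytes_moved / link_bytes_per_s
    compute_seconds = flops / device_flops_per_s
    return transfer_seconds, compute_seconds, transfer_seconds > compute_seconds


if __name__ == "__main__":
    # Placeholder numbers only: 8 GB moved, 4e10 operations, 8 GB/s link.
    for gpu_flops in (1e12, 1e13):  # a 10x faster GPU...
        t_xfer, t_comp, bound = transfer_bound(8e9, 4e10, 8e9, gpu_flops)
        # ...cuts the compute time 10x but leaves the 1 s transfer untouched.
        print(f"transfer {t_xfer:.2f} s, compute {t_comp:.3f} s, transfer-bound: {bound}")
```

With these placeholder numbers the transfer takes 1 s while the compute takes 0.04 s, so making the GPU faster changes almost nothing about the total.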

Anyways, more v0.6.1 updates:
Again, I still can't provide an ETA since the program is in the middle of a partial re-write and there is still a lot of code that needs to be updated.

Hybrid Raid-0+3: Nested raid 0+3 to provide fault-tolerance.
New Stress-Tester: I'm replacing the current stress-tester with four "component testers". Each of the 4 testers corresponds to one of the 4 major algorithms used by y-cruncher for large multiplication.
Each of these algorithms varies in what it stresses the most. On Sandy Bridge (with AVX enabled), one of these tests runs 5-10C hotter than the current stress-tester.
Detailed Status Output: The progress indicator will show more than just a %. It will show more of the sub-steps of the computation (though I don't expect everyone to easily figure out what it means).
This feature has always been in the program, but it was always disabled in public releases. I'll be enabling it starting from v0.6.1.
FMA4 + XOP: This is also done and tested via emulation*. However, I don't expect the speedup to be significant except for large computations above 50 billion digits, where there is enough disk bandwidth for the computation to become CPU-bound.
Native 64-bit Arithmetic: The small arithmetic library that I use has been completely rewritten with native support for 64-bit arithmetic. The speedup isn't very noticeable, but I had to do this to get rid of the old library, which was incompatible with most of the new code.
New Base Conversion Algorithm (50% faster): I mentioned this a few posts back. I still need to implement it for disk.
New Multiplication Algorithm: This will be used for computations larger than 50 billion digits. It is heavily vectorized and uses AVX, FMA4, and XOP.
This new algorithm is the one that will run 5-10C hotter than the current version of y-cruncher.
Various other speedups: Computing e will be slightly faster. I may or may not get around to optimizing some of the other constants.
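The post doesn't say how the new base conversion works. For background only, the textbook approach to converting a huge binary integer to decimal is divide and conquer: split the number by a power of ten into a high half and a low half, convert each half recursively, and concatenate them with zero padding. The sketch below shows that generic technique; the function names and the 18-digit cutoff are hypothetical, it is not y-cruncher's algorithm, and it makes no attempt at speed.

```python
from functools import lru_cache

BASE_CASE_DIGITS = 18  # hypothetical cutoff below which built-in conversion is used


@lru_cache(maxsize=None)
def pow10(k):
    """Cache powers of ten so repeated splits at the same size are reused."""
    return 10 ** k


def to_decimal(n, digits):
    """Return n as exactly `digits` decimal characters (zero-padded).

    Assumes 0 <= n < 10**digits.
    """
    if digits <= BASE_CASE_DIGITS:
        return str(n).zfill(digits)
    low_digits = digits // 2
    high, low = divmod(n, pow10(low_digits))
    # The low half must keep its leading zeros, hence the fixed width.
    return to_decimal(high, digits - low_digits) + to_decimal(low, low_digits)


if __name__ == "__main__":
    print(to_decimal(42, 5))                           # -> 00042
    print(to_decimal(10**100 - 1, 100) == "9" * 100)   # -> True
```

The caller passes the digit count explicitly, which fits a computation where the number of digits is chosen up front. A serious implementation would precompute the powers of ten once and rely on fast multiplication for the divisions.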

*I don't plan on getting a Bulldozer machine unless it's a lot better than Sandy Bridge (and the leaked ES benchmarks are showing the opposite...).
So I'm gonna need help from someone to do some (real) tests before I can add it to the v0.6.1 release.

Things I'll be removing:

The current stress-tester. I may keep it as a sub-option to the new stress-tester.
Basic Swap Mode. This mode was useful before I added Advanced Swap Mode in v0.5.2. Now it's useless code that's bloating the program. So I'm getting rid of it completely.

Possible Features: (probably not for v0.6.1)

Swap computation in benchmark mode (0). This was suggested by Massman for HWBOT.
More detailed output with timestamps. Also suggested by Massman. If I implement this, all timestamps will also be printed into the validation file for easier verification.
Denser swap-mode checkpoints. (more checkpoints to reduce the time between them)
