CUDA Factorial Benchmark from TOC.ru

**poke349** · 08-30-2009, 03:38 PM

I wonder what algorithm it uses...

750000! can be done well under a second with Mathematica 6.0 single-threaded on any i7.

Yet it takes 1+ minutes using CUDA?

*Sorry for being off topic.

**[XC] riptide** · 08-30-2009, 03:49 PM

Originally Posted by poke349

I wonder what algorithm it uses...

750000! can be done well under a second with Mathematica 6.0 single-threaded on any i7.

Yet it takes 1+ minutes using CUDA?

*Sorry for being off topic.

Have you got mathematica? Can you show us the result? We already know there are insanely fast algorithms and approximations that can be used http://www.luschny.de/math/factorial/Benchmark.html .... but we also know there are many algorithms for calculation Pi to 1million places.. all with diff times.

**poke349** · 08-30-2009, 03:59 PM

Originally Posted by [XC] riptide

Have you got mathematica? Can you show us the result? We already know there are insanely fast algorithms and approximations that can be used http://www.luschny.de/math/factorial/Benchmark.html .... but we also know there are many algorithms for calculation Pi to 1million places.. all with diff times.

Code:

Timing[750000!;]
{0.405, Null}

i7 920 @ 3.5 GHz

Mathematica can't multi-thread high-precision arithmetic. So there's no way see how much better it can go.

On the other hand... I can mod the program in my siggy to do multi-threaded factorials (using a sub-optimal algorithm)... And I'm certain I can beat 0.405 seconds.

But again... off topic.

**[XC] riptide** · 08-30-2009, 04:05 PM

Originally Posted by poke349

Code:

Timing[750000!;]
{0.405, Null}

i7 920 @ 3.5 GHz

Can you export all the digits? I mean does it deliver all the digits? There should be 4080578 digits.

**poke349** · 08-30-2009, 04:10 PM

Originally Posted by [XC] riptide

Can you export all the digits? I mean does it deliver all the digits? There should be 4080578 digits.

That run of 0.405 seconds did only binary digits. Printing it out in decimal requires an expensive conversion.

Code:

750000!
\!\(\*
TagBox[
RowBox[{"2646896442810456334473283390526976189442958803731348335812907\
9334567747113504796887022327350144664381155203676817108918748679291696\
6443372148573575453227479621798163102781469763477812875007762400556456\
3838296982600913849826449820515029294880777450379489322119687361868491\
51503071358153700424169800424565", 
RowBox[{"<<", "4079973", ">>"}], 
    "00000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000"}],
Short[#, 5]& ]\)

Took roughly 6 seconds including conversion.

**Talonman** · 09-06-2009, 12:58 PM

(Thanks for the Benchmark link.)

I have a question. When I ran the CUDA Factorial Benchmark program, I set it up for 900,000 and 4 threads @ 3.81GHz on my Q6600.
My GPU is listed as GTX 295, and is running 2.5 times faster than my CPU. Just for the record, that GPU test is only calculating on 1/2 of my 295, correct?
(Basically, running on the equivalent of a single GTX 275.)

CPU takes 3 minutes, 20.828s
GPU takes 1 minute, 21.580s
2.5 x faster...

Note that if I set the benchmark for 999,000 my GPU moves up to 3.1 X as fast as my CPU. The bigger the order, the more the GPU appears to gain.
CPU takes 4 minutes, 23.231s
GPU takes 1 minute, 27.306s
Checksum: 578712543720173939

I would love to find the app, that checks to see if you have more than 1 GPU in your system, then use them all.
3 instances of folding, can load up 3 GPU's... (But that is 3 separate programs running...)
A game with PhysX can run graphics in SLI, and PhysX on another... (Still, that's Graphics on 2 or more, and PhysX on the other...)
But still not 1 benchmark program to use all GPU's in your system.

To be fair, I think CUDA apps couldn't grab cards in SLI until a recent release... I believe?
I would love to see this CUDA Factorial benchmark, grab all available GPU's in your system, in a later update.

If 1/2 of my 295 can be 3 X as fast as my Q6600, I have to wonder how many times faster it would be with both my 295 and 280 in on the deal...

I imagine DX11 will also use just 1 GPU for video transcoding, not all GPU's in your system?

Thread: CUDA Factorial Benchmark from TOC.ru

Thread Tools

Search Thread

Rate This Thread

Display

Hybrid View

Bookmarks

Bookmarks

Posting Permissions