
Thread: SuperPi on GPU, we're going CUDA

  1. #76
    Xtreme Member
    Join Date
    Apr 2006
    Location
    Belgrade, Serbia
    Posts
    187


    Quote Originally Posted by BenchZowner View Post
    Guys, we're just grabbing an opportunity to run a pure number-crunching benchmark on the GPU.
    Fine, just remember that such a benchmark won't tell you how good your video card will be for running Crysis.

    In my opinion, benching just the number-crunching part of a GPU is just as insane as benching just the FPU in a CPU.

    Simply put, a CPU has other units that can improve its performance in other uses (for example SSE, as used in the DivX encoder), and a GPU has other units that can limit its performance in other uses (the number of ROPs, TMUs, etc., which determine actual game performance).

    What I am trying to say is that I am not sure we really need yet another benchmark with relative instead of absolute performance numbers.

    What we also don't need is the CUDA MP3 encoder (and Linux-only, mind you), yet NVIDIA still organized a contest around it.

    MP3 encoding is already ridiculously fast on a CPU; disk I/O is the bottleneck. It would be much better if they organized a contest for an x264 encoder on the GPU, or better yet wrote one themselves and open-sourced it.

  2. #77
    Xtreme Addict
    Join Date
    Dec 2007
    Posts
    1,030
    Quote Originally Posted by audiofreak View Post
    It would be much better if they organized a contest for an x264 encoder on the GPU, or better yet wrote one themselves and open-sourced it.
    Don't know about the open-source part, but an x264 encoder via CUDA seems to be in the works.

    http://www.youtube.com/watch?v=8C_Pj1Ep4nw
    Are we there yet?

  3. #78
    One-Eyed Killing Machine
    Join Date
    Sep 2006
    Location
    Inside a pot
    Posts
    6,340
    Quote Originally Posted by audiofreak View Post
    Fine, just remember that such a benchmark won't tell you how good your video card will be for running Crysis.
    That's not my purpose.
    If I wanted to test a graphics card's gaming performance, I know what to run and how to run it.

    Notice the word BENCHING.

    Quote Originally Posted by audiofreak View Post
    In my opinion, benching just the number-crunching part of a GPU is just as insane as benching just the FPU in a CPU.
    Once again, BENCHING.
    We're talking about programs that we (overclockers) use to measure the performance of our overclocked systems in specific applications and workloads.

    Quote Originally Posted by audiofreak View Post
    What I am trying to say is that I am not sure we really need yet another benchmark with relative instead of absolute performance numbers.
    I repeat: benching.
    We (overclockers) want more, and we like it.

    -- We need applications to take advantage of our GPUs for normal usage, but this is NOT the thread to talk about CUDA & "real-life" usage.
    Coding 24/7... Limited forums/PMs time.

    -Justice isn't blind, Justice is ashamed.

    Many thanks to: Sue Wu, Yiwen Lin, Steven Kuo, Crystal Chen, Vivian Lien, Joe Chan, Sascha Krohn, Joe James, Dan Snyder, Amy Deng, Jack Peterson, Hank Peng, Mafalda Cogliani, Olivia Lee, Marta Piccoli, Mike Clements, Alex Ruedinger, Oliver Baltuch, Korinna Dieck, Steffen Eisentein, Francois Piednoel, Tanja Markovic, Cyril Pelupessy (R.I.P. ), Juan J. Guerrero

  4. #79
    Xtreme Member
    Join Date
    Feb 2008
    Location
    Portugal
    Posts
    324
    How much of an impact can CUDA have on general performance?

    What is the biggest advantage? How will this influence the market and tech evolution in the near future?

    Cheers and thanks


    SILVERSTONE TJ07 . ASUS RAMPAGE EXTREME . INTEL C2D E8600@ Q822A435 . 6GB CELLSHOCK PC3 15000 . EVGA GTX 285 . WD VELOCIRAPTOR 300HLFS . WD AAKS 640GB ''RAID0 . CORSAIR HX 1000W . X-Fi FATAL1TY TITANIUM . LOGITECH WAVE . G9 LASER . Z5500 . DELL ULTRASHARP 2047WFP
    Aquaero VFD . Enzotech revA . Laing DDC 12v . Black Ice GTS-Lite 360 . Swiftech Mcres Micro . 3/8"
    By MrHydes®

    sales
    feedback Techzone

  5. #80
    Xtreme Member
    Join Date
    Jan 2007
    Location
    Kilkenny, Ireland
    Posts
    259
    Well, you can see it in the tech demo posted above ^

    GPUs are animals at video encoding and the like

  6. #81
    Xtreme Legend
    Join Date
    Jan 2003
    Location
    Stuttgart, Germany
    Posts
    929
    gpus are only good at workloads that can be parallelized (hundreds of parallel threads). if there is a sequential execution flow, gpus can't show their performance and will be slower than cpus

    quick example .. imagine a large excel sheet with a number of rows (the money you spent on drinking, partying and getting laid) that you want to sum up.

    one way is to go through the rows one by one and add each row to the previous result -> sequential, like it would run on any CPU today; this will take you N steps for N rows (actually N-1, but let's keep it simple).

    on a GPU you could parallelize this and launch a large number of threads that add up groups of two rows each first, like (1+2=a, 3+4=b, 5+6=c..), all of those additions are done at the same time in parallel on the gpu in a single step. once that is done you sum up a+b=a1, c+d=a2 etc... repeat until you have only two numbers left to add together and you get the final result. in total this will take you log2(N) steps. (e.g. for 256 rows -> log2(256) = 8 steps only)

    for a small number of rows there won't be much difference and the higher clock speed of the CPU will still outweigh small gains. but once you increase the number of rows you can clearly see what a huge difference this makes. (yes this is simplified, you do not have an infinite amount of execution units)

    however, note how much more complex the second example is. most programmers today have coded for their whole life like example 1. now they are supposed to switch to example 2...
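
    as a rough illustration only (a toy sketch, not code from any real benchmark), the pairwise idea in cuda could look something like this, assuming a single thread block and a power-of-two number of values:

    Code:
    #include <cstdio>
    #include <cuda_runtime.h>

    // pairwise tree sum: each step halves the number of active values,
    // so N values are summed in about log2(N) steps instead of N-1.
    // only valid for a single block and a power-of-two n (toy example).
    __global__ void sumPairs(float *data, int n)
    {
        int i = threadIdx.x;
        for (int stride = n / 2; stride > 0; stride /= 2) {
            if (i < stride)
                data[i] += data[i + stride];   // fold the upper half onto the lower half
            __syncthreads();                   // wait for the whole step before halving again
        }
    }

    int main()
    {
        const int N = 256;                     // 256 "rows" -> 8 reduction steps
        float host[N];
        for (int i = 0; i < N; ++i) host[i] = 1.0f;

        float *dev;
        cudaMalloc((void **)&dev, N * sizeof(float));
        cudaMemcpy(dev, host, N * sizeof(float), cudaMemcpyHostToDevice);

        sumPairs<<<1, N>>>(dev, N);            // one block of N threads

        float sum;
        cudaMemcpy(&sum, dev, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %.0f (expected %d)\n", sum, N);

        cudaFree(dev);
        return 0;
    }

    a real implementation would use shared memory and many blocks, but the step count is the point here.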
    Last edited by W1zzard; 05-27-2008 at 03:18 PM.

  7. #82
    Xtreme Addict
    Join Date
    Dec 2007
    Posts
    1,030
    Quote Originally Posted by W1zzard View Post
    however, note how much more complex the second example is. most programmers today have coded for their whole life like example 1. now they are seduced to switch to example 2...
    Fixed

    Loved the post, W1zzard, really enlightening.
    Last edited by Luka_Aveiro; 05-27-2008 at 03:40 PM.
    Are we there yet?

  8. #83
    Xtreme Member
    Join Date
    May 2007
    Posts
    341
    The benchmark side of this is not something that really impresses me (just another benchmark). However, the potential calculation speed difference is something that does impress me. Can CUDA be used to factor numbers, with the promise of faster performance than on a CPU?

  9. #84
    Xtreme Owner Charles Wirth
    Join Date
    Jun 2002
    Location
    Las Vegas
    Posts
    11,653
    Michael, though I am not up to speed on the tweaks needed to compile it correctly, there are two people working on getting me the assistance to get it done.

    As to the name, SuperPi 1.6 CUDA GPU

    Is there a way to make a universal binary for both manufacturers? Larrabee should be a GPGPU as well.
    Intel 9990XE @ 5.1Ghz
    ASUS Rampage VI Extreme Omega
    GTX 2080 ti Galax Hall of Fame
    64GB Galax Hall of Fame
    Intel Optane
    Platimax 1245W

    Intel 3175X
    Asus Dominus Extreme
    GTX 1080ti Galax Hall of Fame
    96GB Patriot Steel
    Intel Optane 900P RAID

  10. #85
    Xtreme Legend
    Join Date
    Jan 2003
    Location
    Stuttgart, Germany
    Posts
    929
    Quote Originally Posted by FUGGER View Post
    there are two people working on getting me the assistance to get it done.
    if those are somehow affiliated with nvidia, maybe call it a "tech demo"?

  11. #86
    Moderator
    Join Date
    Mar 2006
    Posts
    8,556
    If anyone is more interested in the general use of GPUs, you can follow the feeds here: http://www.gpgpu.org/

  12. #87
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    Quote Originally Posted by W1zzard View Post
    yep, even opengl gpgpu is quite easy to do. but ctm/cuda, especially ctm give you much more options to improve performance and flexibility
    hmmm how much of a boost, and at what expense though? coding for ctm and cuda is a lot more complex than coding for gpgpu via directx or ogl, right?

    Quote Originally Posted by W1zzard View Post
    imagine a large excel sheet with a number of rows (the money you spent on drinking, partying and getting laid) that you want to sum up.
    you keep a record of all that? heheheh

    thanks for the example, very interesting
    so basically anything that is coded to run on a server cluster should work well on a gpu, right? so every application that has to do with audio/video processing, filtering, compression etc. should work well on gpus then, right? i'm curious when we will see a gpu divx codec

  13. #88
    Tyler Durden
    Join Date
    Oct 2003
    Location
    Massachusetts, USA
    Posts
    5,623
    Quote Originally Posted by saaya View Post
    you keep a record of all that? heheheh
    Don't we all?
    Formerly XIP, now just P.

  14. #89
    Xtreme n00berclocker
    Join Date
    Mar 2006
    Location
    San Jose, CA
    Posts
    1,445
    Quote Originally Posted by EnJoY View Post
    Don't we all?
    It's called previous orders from Newegg, lol. Too bad that's only a quarter of all the hardware I've spent money on.
    Quote Originally Posted by 3oh6
    damn you guys...am i in a three way and didn't know it again
    Quote Originally Posted by Brian y.
    Im exclusively benching ECS from this point forward

  15. #90
    Xtreme Owner Charles Wirth
    Join Date
    Jun 2002
    Location
    Las Vegas
    Posts
    11,653
    My guys (the ones doing the work) are not with Nvidia, but I do have developer assistance from Nvidia.

    Sascha, one could assume so, but the gains vary; for the examples given they usually range from 8x to beyond 100x.
    Intel 9990XE @ 5.1Ghz
    ASUS Rampage VI Extreme Omega
    GTX 2080 ti Galax Hall of Fame
    64GB Galax Hall of Fame
    Intel Optane
    Platimax 1245W

    Intel 3175X
    Asus Dominus Extreme
    GTX 1080ti Galax Hall of Fame
    96GB Patriot Steel
    Intel Optane 900P RAID

  16. #91
    Xtreme Member
    Join Date
    Aug 2006
    Location
    Warsaw, Poland
    Posts
    148
    Quote Originally Posted by saaya View Post
    so basically anything that is coded to run on a server cluster should work well on a gpu, right?
    The first requirement is - of course - multithreading; and yes, MULTI, not the 2 or 4 threads we are happy with when playing with CPUs.
    The second requirement is to recode the program to run on a GPU without losing that ~30x "possible" performance boost in the process.
    I'm far from being accurate or anything; that's just a simple explanation as I see it.

  17. #92
    Xtreme Member
    Join Date
    Apr 2007
    Posts
    386
    Quote Originally Posted by audiofreak View Post

    What we also don't need is the CUDA MP3 encoder (and Linux-only, mind you), yet NVIDIA still organized a contest around it.

    MP3 encoding is already ridiculously fast on a CPU; disk I/O is the bottleneck. It would be much better if they organized a contest for an x264 encoder on the GPU, or better yet wrote one themselves and open-sourced it.

    I can't understand that statement. How can you say we don't need this?

    I would love a program I could throw all my MP3s at to have them recoded and normalised, especially if I didn't have to dedicate a PC for a day to do it.

    Anything they can do to improve PC usage is great (there is already a CUDA-based H.264 encoder).

    This is the most exciting thing to come onto Xtreme news in a long time. Have fun, Fugger (and yes, open source would be great!!)
    Gaming Box:: q6600 @3.0 :: 9800gtx :: Abit IP35 :: 4gb :: 1.4TB :: akasa eclipse :: Win7
    Development:: PhenomII 955BE @3.2 :: 4200 :: asus M4A785 M Evo :: 1.25TB ::Win7
    Media Centre :: q6600 @3.0 :: x1950pro :: asus p35 epu :: 8gb :: 320 GB :: Lc17B :: Win7
    server:: I7 860 :: p55 gd65 :: 3450 :: 8 TB :: 8gb :: Rebel 12 :: server 2008 R2

  18. #93
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    well, he's right, we don't need this, but it's still interesting.
    if it turns out to be a pointless benchmark that doesn't really scale realistically, then it will most likely be forgotten pretty soon...
    but not necessarily... it might be fun; things don't need to make sense to be fun

  19. #94
    Xtreme Enthusiast
    Join Date
    May 2007
    Posts
    831
    The problem I have found is that the algorithm SuperPi uses (Gauss-Legendre) cannot be multithreaded very well.
    The way it works, each "iteration" produces more and more digits, so each result depends on the previous result.

    There is probably an algorithm out there that is very good for multithreading.

    I think something like wPrime being ported to GPGPU code would be good, because the workload can be distributed to many threads.
    If you want to calculate 100 prime numbers, have each thread calculate 1 prime number. (assuming there are 100 threads)
    If you want to do 1000, have each thread calculate 10 prime numbers. (assuming there are 100 threads)
    The workload can be distributed.
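
    As a rough sketch only (purely illustrative, not wPrime's actual code or method), distributing the work across GPU threads in CUDA could look something like this; each thread checks one candidate on its own, so no result has to wait for a previous one:

    Code:
    #include <cstdio>
    #include <cuda_runtime.h>

    // trial-division check; fine for an illustration, not for record hunting
    __device__ bool isPrime(unsigned int n)
    {
        if (n < 2) return false;
        for (unsigned int d = 2; d * d <= n; ++d)
            if (n % d == 0) return false;
        return true;
    }

    // thread i tests candidate start + i; no thread depends on another thread's result
    __global__ void checkPrimes(unsigned int start, int count, int *flags)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < count)
            flags[i] = isPrime(start + i) ? 1 : 0;
    }

    int main()
    {
        const int COUNT = 100000;              // candidates 2 .. 100001, one per thread
        int *devFlags;
        cudaMalloc((void **)&devFlags, COUNT * sizeof(int));

        int threads = 256;
        int blocks  = (COUNT + threads - 1) / threads;
        checkPrimes<<<blocks, threads>>>(2, COUNT, devFlags);

        // copy the flags back and tally on the CPU; the GPU part was embarrassingly parallel
        int *flags = new int[COUNT];
        cudaMemcpy(flags, devFlags, COUNT * sizeof(int), cudaMemcpyDeviceToHost);
        int primes = 0;
        for (int i = 0; i < COUNT; ++i) primes += flags[i];
        printf("%d primes between 2 and %d\n", primes, COUNT + 1);

        delete[] flags;
        cudaFree(devFlags);
        return 0;
    }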

    This might just be a tech demo for nVIDIA.
    Gigabyte P35-DQ6 | Intel Core 2 Quad Q6700 | 2x1GB Crucial Ballistix DDR2-1066 5-5-5-15 | MSI nVIDIA GeForce 7300LE

  20. #95
    Xtreme Addict
    Join Date
    Aug 2007
    Location
    Toon
    Posts
    1,570
    Nice one. Shame I haven't got my G92 any more. Any chance that this time round it can have a continuous mode for stress testing?
    Intel i7 920 C0 @ 3.67GHz
    ASUS 6T Deluxe
    Powercolor 7970 @ 1050/1475
    12GB GSkill Ripjaws
    Antec 850W TruePower Quattro
    50" Full HD PDP
    Red Cosmos 1000

  21. #96
    Xtreme Addict
    Join Date
    Aug 2005
    Location
    Germany
    Posts
    2,247
    Quote Originally Posted by GoThr3k View Post
    [...]

    And ATI has something like CUDA, called CTM (Close To Metal); too bad you have to program in assembler with CTM, while in CUDA you can program in C & C++.
    but then, ati did something wrong, as i never heard anything about CTM. i know that there's a folding@home client for ati gpus, but ati never caused a sensation with this.
    and now nvidia teases customers with marketing regarding their CUDA environment.

    seems like ati somehow missed the train to advertise their feature properly?
    1. Asus P5Q-E / Intel Core 2 Quad Q9550 @~3612 MHz (8,5x425) / 2x2GB OCZ Platinum XTC (PC2-8000U, CL5) / EVGA GeForce GTX 570 / Crucial M4 128GB, WD Caviar Blue 640GB, WD Caviar SE16 320GB, WD Caviar SE 160GB / be quiet! Dark Power Pro P7 550W / Thermaltake Tsunami VA3000BWA / LG L227WT / Teufel Concept E Magnum 5.1 // SysProfile


    2. Asus A8N-SLI / AMD Athlon 64 4000+ @~2640 MHz (12x220) / 1024 MB Corsair CMX TwinX 3200C2, 2.5-3-3-6 1T / Club3D GeForce 7800GT @463/1120 MHz / Crucial M4 64GB, Hitachi Deskstar 40GB / be quiet! Blackline P5 470W

  22. #97
    Xtreme Enthusiast
    Join Date
    Jun 2005
    Posts
    525
    Will there be a way to test an individual core on a GPU?

    On a side note, it would be nice if there was a common compiler for all GPUs (ATI's, NVIDIA's and Intel's up-and-coming one), but that would take these guys working together...

  23. #98
    Xtreme Addict
    Join Date
    Aug 2007
    Location
    Toon
    Posts
    1,570
    If there is a point to this, it is to let the GPU do the maths that the GPU does best (massively parallel work: DSP, video, audio, CAD, etc.). Offload whatever can be offloaded to the GPU while letting the CPU do what it does best.
    Intel i7 920 C0 @ 3.67GHz
    ASUS 6T Deluxe
    Powercolor 7970 @ 1050/1475
    12GB GSkill Ripjaws
    Antec 850W TruePower Quattro
    50" Full HD PDP
    Red Cosmos 1000

  24. #99
    Xtreme Enthusiast
    Join Date
    May 2007
    Posts
    831
    Quote Originally Posted by RaZz! View Post
    but then, ati did something wrong, as i never heard anything about CTM. i know that there's a folding@home client for ati gpus, but ati never caused a sensation with this.
    and now nvidia teases customers with marketing regarding their CUDA environment.

    seems like ati somehow missed the train to advertise their feature properly?
    This is true.
    I already got an e-mail from eVGA saying that you should join their Folding@Home team.
    I base this on absolutely nothing, but didn't ATI "overly" advertise R600?
    Last edited by MuffinFlavored; 05-31-2008 at 06:02 PM.
    Gigabyte P35-DQ6 | Intel Core 2 Quad Q6700 | 2x1GB Crucial Ballistix DDR2-1066 5-5-5-15 | MSI nVIDIA GeForce 7300LE

  25. #100
    I am Xtreme
    Join Date
    Feb 2005
    Location
    SiliCORN Valley
    Posts
    5,543
    I just got an e-mail already from eVGA, Folding@Home on nVIDIA.
    and what did that email say?
    "These are the rules. Everybody fights, nobody quits. If you don't do your job I'll kill you myself.
    Welcome to the Roughnecks"

    "Anytime you think I'm being too rough, anytime you think I'm being too tough, anytime you miss-your-mommy, QUIT!
    You sign your 1248, you get your gear, and you take a stroll down washout lane. Do you get me?"

    Heat Ebay Feedback

