A Fermi benchmark of sorts...

**Talonman** · 10-18-2009, 01:31 AM

A Fermi benchmark of sorts...

If you do consider folding a benchmark, we have had a Fermi report if you haven't heard.

http://www.evga.com/forums/tm.asp?m=...1&key=

I make these bold assumptions about Fermi.

1) It will use about 320 watts when loaded up.

2) (1) Fermi GPU can produce 40K PPD, or more.

3) Fermi will fold near or exceed 4X the speed of the 200 Series GPU's.
What now takes 60 minutes to produce, will soon be done in 15 minutes.

Fermi will also have 1.5GB of memory, and is made up of 3.0 billion transistors and features 512 CUDA processing cores organized into 16 SMs (Streaming Multiprocessors) of 32 cores each.

**trinibwoy** · 10-18-2009, 04:25 AM

http://foldingforum.org/viewtopic.ph...=11717#p114890

btw, Fermi only has 16 cores but with 32 shaders each.

**DAK1640** · 10-18-2009, 05:50 AM

Omg...:d

**OldChap** · 10-18-2009, 06:47 AM

I've not seen pics yet but what connections will this have? pcie =75w, 6pin = 75w, 8pin = 150w, one of each = 300watts. Going to need to upgrade psu's for this?

**Talonman** · 10-18-2009, 06:55 AM

Originally Posted by trinibwoy

http://foldingforum.org/viewtopic.ph...=11717#p114890

btw, Fermi only has 16 cores but with 32 shaders each.

Thanks!

Fixed:

Fermi will also have 1.5GB of memory, and is made up of 3.0 billion transistors and features 512 CUDA processing cores organized into 16 SMs (Streaming Multiprocessors) of 32 cores each.

**W1zzard** · 10-18-2009, 07:00 AM

there's no proof that this is indeed fermi folding .. a lot of information that doesnt add up.. just wishful thinking by the nv fanboys

**jfromeo** · 10-18-2009, 07:14 AM

Originally Posted by OldChap

I've not seen pics yet but what connections will this have? pcie =75w, 6pin = 75w, 8pin = 150w, one of each = 300watts. Going to need to upgrade psu's for this?

PCI-e 2.0 is 150W, 75W is 1.x.

**Olivon** · 10-18-2009, 07:32 AM

Originally Posted by W1zzard

there's no proof that this is indeed fermi folding .. a lot of information that doesnt add up.. just wishful thinking by the nv fanboys

^Right

1) It will use about 320 watts when loaded up.

It seems weird, no?

**Talonman** · 10-18-2009, 07:36 AM

Originally Posted by W1zzard

there's no proof that this is indeed fermi folding .. a lot of information that doesnt add up.. just wishful thinking by the nv fanboys

Wishful thinking that will most likely turn out to be fact.

Do you not think (1) Fermi GPU can produce 40K PPD, or more?

BTW - His 24hr PPD is up to 344,845.

http://folding.extremeoverclocking.c...hp?s=&u=477950

344,845 - 10K PPD for the (4) i7 cores = 334,845.

334,845 / 7 Fermi = 47,835. Wow!!

It might actually be closer to 50K PPD per Fermi?
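The per-card estimate in this post is easy to check in a few lines. A minimal sketch in Python, assuming (as the post does) a flat 10K PPD for the i7 and seven identical cards; the function name `ppd_per_gpu` is made up for illustration and none of the inputs are measured values.

```python
def ppd_per_gpu(total_ppd, cpu_ppd, gpu_count):
    """Split the 24hr total evenly across the GPUs after removing the CPU share."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return (total_ppd - cpu_ppd) / gpu_count

# Figures taken from the post above.
total = 344_845   # 24hr PPD from the stats page
cpu = 10_000      # assumed CPU contribution, not measured
cards = 7         # card count claimed by FahMan, unverified

print(round(ppd_per_gpu(total, cpu, cards)))
```

The same function can be re-run with any later 24hr total from the stats page.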

**Jamesrt2004** · 10-18-2009, 08:34 AM

Originally Posted by W1zzard

there's no proof that this is indeed fermi folding .. a lot of information that doesnt add up.. just wishful thinking by the nv fanboys

+1..

**Talonman** · 10-18-2009, 08:55 AM

Originally Posted by jfromeo

PCI-e 2.0 is 150W, 75W is 1.x.

A 295's TDP is 289 Watts, and can produce 18K PPD.

If a Fermi's TDP is about 320 watts, not too bad if it can produce 50K PPD...

It would take 2.777 295's using 802 watts to do the same.

Actually here: http://www.brightsideofnews.com/news...not-300w!.aspx

"In case of upcoming high-memory configurations nVidia Tesla, Quadro and GeForce cards, the company had to install a 6-pin and an 8-pin connector, getting 300W of power to play with. However, this was a precautionary measure. According to information we have at hand, the GT300 board [yeah, featuring "Fermi" CUDA architecture] barely missed 225W cut-off for the 6+6 pin if the board comes with 6GB of GDDR5 memory."

The power it takes to run (1) Fermi might actually be less than what I am guessing? Near 225W sounds outstanding...
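The power and efficiency figures traded in the posts above can be put side by side. A rough sketch in Python, assuming the 75W slot, 75W 6-pin and 150W 8-pin limits quoted by OldChap, the 18K PPD and 289W figures for the 295, and a hypothetical 50K PPD Fermi; the helper names are invented for illustration and every Fermi number is a guess from this thread.

```python
# Connector limits as quoted in this thread (watts).
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150

def board_budget(six_pins, eight_pins):
    """Maximum board power for a connector mix, slot included."""
    return SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W

def ppd_per_watt(ppd, watts):
    """Folding output per watt of board power."""
    return ppd / watts

print("6+6 pin budget:", board_budget(2, 0), "W")
print("6+8 pin budget:", board_budget(1, 1), "W")

gtx295 = ppd_per_watt(18_000, 289)
for fermi_watts in (320, 300, 225):   # the guesses floated in this thread
    fermi = ppd_per_watt(50_000, fermi_watts)
    print(f"Fermi at {fermi_watts} W: {fermi:.0f} PPD/W vs {gtx295:.0f} PPD/W for a 295")

# How many 295s it would take to match one hypothetical 50K PPD Fermi.
cards_needed = 50_000 / 18_000
print(f"{cards_needed:.3f} x 295 = {cards_needed * 289:.0f} W")
```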

**NeedMoMegaHurtZ** · 10-18-2009, 09:05 AM

2) (1) Fermi GPU can produce 40K PPD, or more.

3) Fermi will fold near or exceed 4X the speed of the 200 Series GPU's.
What now takes 60 minutes to produce, will soon be done in 15 minutes.

While this is interesting, you can't compare apples to oranges.

He is using GPU3 beta client, not the standard GPU2 client.

How much do today's cards generate with that GPU3 client? He's running 200 instances of GPU3. I think GPU2 only allows for 8 ?

How big are GPU3 workunits?

I don't see how a direct comparison is possible? Especially when you don't even know what he is using ?

**Talonman** · 10-18-2009, 09:08 AM

But we do know his PPD!

And we do know his rig is (1) i7 CPU using 4 cores, and (7) Fermi.

I am thinking the BETA Client lets you run more instances of folding, but probably doesn't give a performance increase all by itself.
It would probably need Fermi to take advantage of that new feature.

**Chumbucket843** · 10-18-2009, 10:13 AM

Originally Posted by Talonman

A Fermi benchmark of sorts...

If you do consider folding a benchmark, we have had a Fermi report if you haven't heard.

http://www.evga.com/forums/tm.asp?m=...1&key=

I make these bold assumptions about Fermi.

1) It will use about 320 watts when loaded up.

PCI SIG only allows 300 watts.

2) (1) Fermi GPU can produce 40K PPD, or more.

even with 48Kb of cache it still wont get that.

3) Fermi will fold near or exceed 4X the speed of the 200 Series GPU's.
What now takes 60 minutes to produce, will soon be done in 15 minutes.

Fermi will also have 1.5GB of memory, and have 16 cores which contain 32 shaders each.

he is folding on over 200 processors. there is no way that is fermi. its just gpu3 beta. 31 gpu's and an i7. g80 and up are MIMD arrays of SIMDs. it sounds like it is a current gpu because he is getting 700 points per SM. that sounds like g92. i find it ironic that if there are 248 active cpus that means exactly 31 gpu's.

**Talonman** · 10-18-2009, 10:22 AM

ORIGINAL: FahMan

Hi Folks!
Thanks for your interest in my humble contribution to team EVGA.
Some of you wanted to know about my hardware so please check:

www.fahmanfolding.webs.com

As you can see in the photos, this single PC is able to produce more than 200k PPD (around 250k with extreme overclocking).

Some additional explanations:
I was forced to use a USB monitor as the GPUs haven't any video output (these engineering samples of Fermi are Tesla-like, but they have 1.5GB of memory each like GT300 will).
Because of the new MIMD architecture (they have 32 clusters of 16 shaders) i was not able to load them at 100% in any other way but to launch 1 F@H client per cluster and per card. Every client is GPU3 core Beta (OpenMM library). I suppose it is much more efficient than the previous GPU2. In addition they need very little memory to run. Having 16GB of DDR3 and using Windows 7 Enterprise I've managed to run 200 instances of F@H GPU and 4 CPU (i7 processor, HT off). The 7th card is not fully loaded. This could also be an issue with the EVGA X58 mobo.
I use two Silverstone Strider PSUs together, 1500W each, which is probably too much, but now I experiment with overclocking (cards are factory unlocked). Max power consumption I've noticed was 2400W.
The whole system is cooled by my own construction of liquid CO2, which is heavy and inconvenient, and I have to supply a new cylinder every 5 days.
That's It. I wish everybody to improve your folding speed using GT300 soon!
Just keep folding...

"(they have 32 clusters of 16 shaders) i was not able to load them at 100% in any other way but to launch 1 F@H client per cluster and per card."

1 folding instance per cluster, is 32 per card.

32 x 7 = 224 (GPU folding instances.)

248 - 224 = 24

The 24 is probably just from experimenting? Not sure.

You may be right on the power, I am starting to think it does take less than 320 watts when loaded up.
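The instance counts being argued over can be laid out for each reading of the hardware. A small sketch in Python, assuming 7 cards and the 248 "active CPUs" figure from the stats page; the 8-clients-per-GPU divisor is only an inference from Chumbucket843's "exactly 31 gpu's" remark, and `instance_gap` is a made-up helper name.

```python
CARDS = 7
REPORTED_ACTIVE = 248   # "active CPUs" figure discussed in the thread

def instance_gap(cards, per_card, reported):
    """Return (expected instances, reported minus expected)."""
    expected = cards * per_card
    return expected, reported - expected

layouts = {
    "32 clusters per card (FahMan's description)": 32,
    "16 SMs per card (published Fermi layout)": 16,
}

for label, per_card in layouts.items():
    expected, gap = instance_gap(CARDS, per_card, REPORTED_ACTIVE)
    print(f"{label}: {expected} instances, {gap} unaccounted for")

# Alternative reading: ordinary GPUs at 8 clients each (an assumption).
print("GPUs at 8 clients each:", REPORTED_ACTIVE / 8)
```

Neither Fermi layout lands on 248 exactly, while the 8-per-GPU reading divides evenly, which is the coincidence Chumbucket843 points at.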

**Pixie_marj** · 10-18-2009, 10:32 AM

is it 32 clusters of 16 shaders or 16 clusters of 32 shaders?

Which is it?

And when did benchmarks become wildly theoretical?

**owcraftsman** · 10-18-2009, 10:41 AM

I found fahman's real setup!

a little levity.
http://www.evga.com/forums/tm.asp?m=100973448

**Talonman** · 10-18-2009, 10:47 AM

Originally Posted by Pixie_marj

is it 32 clusters of 16 shaders or 16 clusters of 32 shaders?

Which is it?

Not sure now...

According to this, Fermi is 16 SMs which contain 32 "CUDA cores":

http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&p=4

"The control hierarchy is similar to the GT200, with a global scheduler that issues work to each SM. Previously the global scheduler (and hence the GPU) could only have a single kernel in flight. Nvidia’s newer scheduler can maintain state for up to 16 different kernels, one per SM. Each SM runs a single kernel, but the ability to keep multiple kernels in flight increases utilization especially when one kernel begins to finish and has fewer blocks left. More importantly, assigning a kernel per core means that smaller kernels can be efficiently dispatched to the GPU.

The latency for context switch between kernels has also been reduced by 10X to around 25 microseconds, this delay is largely due to cleaning up the state that each kernel must track – such as TLBs, dirty data in caches, registers, shared memory and the other kernel context."

Now I think he is running 16 folding instances, 1 per each SM calculating on 32 CUDA Cores.

But 7 * 16 is only 112.

I don't have a grip on this part yet...
"Fermi SM Overview

The cores (or SMs) in Fermi have been tremendously beefed up and resources have been shifted around substantially. At a high level, the execution resources have quadrupled, but are shared between two scalar execution pipelines; each pipeline has twice the execution resources (or vector lanes) of the GT200 cores. It’s essential to note that while the two pipelines can execute two warps from the same thread block, they are not superscalar in the sense of a CPU. The memory pipeline has also been brought into the core, whereas previously each memory pipeline was shared between three cores. More importantly, the shared memory has been folded into a (semi-coherent) L1 data cache, giving each core a real memory hierarchy.

In many respects, these changes are conceptually reminiscent of the improvements between Niagara I and II. Niagara II doubled the thread count to 8, but each set of 4 threads had a dedicated scheduler and integer (ALU) pipeline, compared to dedicated ALUs and floating point (FPU) pipelines for Fermi. All 8 threads in a Niagara II core shared memory pipelines, just like Fermi, and FPUs, which are analogous to the special function units (SFU).

To utilize those execution resources, the number of threads in flight for each Fermi core has increased by 50% to 1536, spread across 8 concurrent thread blocks. This means that to fully utilize one of the new cores, 192 threads per block are required up from 128 in GT200. As with the current generation, execution within an SM occurs at the granularity of a warp, which is a set of 32 threads. With the increase in threads, each core can have up to 48 warps in-flight at once.

As with all Nvidia DX10 hardware, Fermi has several different clock domains in each core – principally the regular clock for front-end and scheduling, and then the fast clock for actual execution units that runs at twice the regular clock."

I still don't know the exact Max number of folding instances you could load up on 1 Fermi GPU.
Any help would be appreciated on the subject.

Originally Posted by Pixie_marj

And when did benchmarks become wildly theoretical?

Since we started trying to get benchmarks for a GPU that is still under NDA but is already being used, and whose production numbers we get to see.

**trinibwoy** · 10-18-2009, 11:26 AM

Originally Posted by Talonman

Thanks!

Did you even read the link?

**Talonman** · 10-18-2009, 01:03 PM

Yes...

But I don't think they believe the guy.

I still don't know the exact Max number of folding instances you could load up on 1 Fermi GPU.
Any help would be appreciated on the subject. What is your opinion trinibwoy?

He is now up to 382,179 in 24 hours!
- 10K for the CPU

372,179 / (7) Fermi = 53,168 PPD per each Fermi.

I simply have a hard time believing a guy would show up with so much processing power, then give us all that info about his system, with it flat out being a lie.
It just goes against human nature. He would naturally want to tell his Folding Team about the rig he is running.

Or, are we to believe it is a bunch of guys, that all started folding together, and just picked the name FahMan? No way!

I believe he typed the truth as best he knew it. That simple.

**trinibwoy** · 10-18-2009, 02:50 PM

Originally Posted by Talonman

What is your opinion trinibwoy?

Don't have one. We have one guy claiming he's busting up FaH on some Fermis and another guy saying he's got confirmation that it's a lie.

My only question is why somebody with that kind of PPD throughput would be motivated to lie about running Fermis. My first thought was that it was the Nvidia hype machine at work but that would be really out there.

**Talonman** · 10-18-2009, 03:05 PM

I agree. Too far out there for it to be the nVidia hype machine...

Reading here...

http://www.pcper.com/article.php?aid=789

(1) Fermi has 16 SMs, with 32 CUDA Cores each.

The SMs (streaming multiprocessors) execute threads in groups of 32 called “warps” that help to improve efficiency of the GPU.

The GPU is made up of 3.0 billion transistors and features 512 CUDA processing cores organized into 16 streaming multiprocessors of 32 cores each.

If he had the BETA folding program, and could load up more than one instance on a Fermi, does this mean:

The groups of 32 Threads called “warps” x the 16 SMs = 512 total threads per (1) Fermi, all processing on 512 CUDA cores?
What does that say to how many folding instances he could load up on 1 GPU, if he wanted to go for Max load?

**Chumbucket843** · 10-18-2009, 03:26 PM

warps handle WAY more threads than that. fermi has about 24,000 and gt200 handles about 30,000. the gpu cant run a program on each thread like a cpu though. the threads are designed to hide latency. yes, a single folding instance has multiple threads. thats why they are so much faster than cpu's. there are many many calculations to do that can be done independently so they run very well on gpu's.
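The "about 24,000" and "about 30,000" figures follow from the per-SM thread limits quoted from the realworldtech article earlier in the thread. A back-of-the-envelope sketch in Python; the 16 SMs and 1536 threads per SM for Fermi come from the quoted articles, 1024 per SM for GT200 follows from the "increased by 50%" remark, and the 30-SM count for GT200 is an assumption not stated anywhere in this thread.

```python
WARP_SIZE = 32

def resident_threads(sm_count, threads_per_sm):
    """Threads the whole GPU can keep in flight at once."""
    return sm_count * threads_per_sm

fermi = resident_threads(16, 1536)
gt200 = resident_threads(30, 1024)   # 30 SMs is an assumption, see lead-in

print("Fermi threads in flight:", fermi)
print("GT200 threads in flight:", gt200)
print("Warps per Fermi SM:", 1536 // WARP_SIZE)
print("Threads per block to fill an SM with 8 blocks:", 1536 // 8)
```

These are limits on threads in flight, not on how many separate folding clients a card can host, which is the question the thread keeps circling.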

**Talonman** · 10-18-2009, 03:31 PM

Sorry, I did edit my post a bit.

Thanks for the info.

Do you have an opinion Chumbucket843 on how many GPU folding instances (1) Fermi could load up in theory?

This seems to be a big point of yours:

Originally Posted by Chumbucket843

He is folding on over 200 processors. there is no way that is fermi. its just gpu3 beta. 31 gpu's and an i7. g80 and up are MIMD arrays of SIMDs. it sounds like it is a current gpu because he is getting 700 points per SM. that sounds like g92. i find it ironic that if there are 248 active cpus that means exactly 31 gpu's.

But I don't understand why the 248 is such a big deal.

That is only about 35 folding instances per GPU. I thought Fermi could do that, no problem?

**trinibwoy** · 10-18-2009, 04:46 PM

Originally Posted by Talonman

Do you have an opinion Chumbucket843 on how many GPU folding instances (1) Fermi could load up in theory?

Fermi can run 16 compute kernels in parallel, one per core.
