View Full Version : Folding@4850


initialised
06-27-2008, 06:11 PM
I borrowed a pair of HIS 4850 512MB GDDR3 :D
Swapped them out for my 2900s, no driver reinstall was required, but a total of three reboots (off, install, boot, reboot, reboot) were needed.

Folding on Crossfire 4850s

http://farm4.static.flickr.com/3113/2616370895_1641168b80_o.jpg

~936 iterations per second with 96% load on GPU1 and 24% on GPU2. All GPU usage can be attributed to F@H since at the desktop I get 0 GPU usage on both cards.

http://farm4.static.flickr.com/3263/2616371117_f10a5b912b_o.jpg

I may be wrong, but I believe that the 24% on GPU2 is folding activity, since it remains at 24% when the display is closed.

http://farm4.static.flickr.com/3159/2617246396_a8cda93124_o.jpg

However, with Crossfire disabled I get ~910-970 iterations per second (image shows 961/97%; 940/96% is typical), 0 iterations per second with 96% GPU.

http://farm4.static.flickr.com/3070/2617298618_826a6ec6bb_o.jpg

So it would appear that the 24% usage on GPU2 is pretty much wasted, since a single card gives similar performance. However, in Crossfire, GPU utilisation and folding performance were more consistent. Hopefully this will improve with ATi's drivers and future releases of F@H-GPU.

The core doesn't clock to 625MHz as it should under load on either card; when this is fixed in the client, expect to see a further 25%.

On my HD2900XT cards in Crossfire I get ~640 iterations per second so that is a 46% increase in folding performance. Also am I right in thinking that one iteration is approximately 1GFLOP?
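For anyone checking the 46% figure, here is a minimal sketch of the arithmetic in Python. The two iteration rates are the ones quoted in the post; the function name is just an illustrative choice. It only compares iterations per second and says nothing about how an iteration maps to GFLOPS.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Iteration rates quoted in the post (iterations per second).
crossfire_4850 = 936.0
crossfire_2900xt = 640.0

increase = percent_increase(crossfire_4850, crossfire_2900xt)
print(f"4850s vs 2900XTs: about {increase:.0f}% more iterations per second")
```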

System used:
E4600 @ 3.2GHz (400x8)
2x1GB DDR2 FleXLC 4-4-4-12 800MHz
Ragimus Formula BIOS: RF-401

lowfat
06-27-2008, 06:17 PM
I don't think this is exactly news...But while we are here what kind of PPD are you getting per card?

bowman
06-27-2008, 06:22 PM
I get over 4200 iter/s on my 8800GTX. Such a shame about the ATI client, the 4870 really ought to be almost twice as fast if not twice as fast. I hear they've got an update 'focused on the 4 series' on the way, let's hope it can really unleash the cards.

Calmatory
06-27-2008, 06:27 PM
I get over 4200 iter/s on my 8800GTX. Such a shame about the ATI client, the 4870 really ought to be almost twice as fast if not twice as fast. I hear they've got an update 'focused on the 4 series' on the way, let's hope it can really unleash the cards.

If OP's guess/whatsoever is anywhere near the truth, 8800GTX should be 4,2 TFLOP card. :D

Guess this thrashes OP's guess, or then the iterations can't be compared for some unknown reason.

Someone enlighten us!

xVeinx
06-27-2008, 06:37 PM
It seems that the client this time around was written using the CUDA API from nvidia, so it seems more natural for the ATI cards to (at least initially) not perform as well. On the other hand, past performance of the 3xxx series wasn't as good with the Brook+ version coded with it in mind. As people have noted here and on the F@H forums, the size of the WUs processed and other factors have to be taken into consideration, but it seems that CUDA has made some significant leaps in its usability and general performance from the state it was in prior to this last update or so. If CUDA or at least Brook+/OpenCL can be made to take advantage of the full architecture of the 3xxx/4xxx series, that would be great...

initialised
06-27-2008, 06:50 PM
I don't think this is exactly news...But while we are here what kind of PPD are you getting per card?
No, maybe not news, but a few people out there wanted to see folding performance and having it here seems relevant since nVidia has been shouting about Folding@Home (amongst other things) recently. BTW 'Game physics capability' is listed on the back of the box, under 'Features & Benefits'; I expect to see more on this soon.

Admins: feel free to move this thread to a more relevant area if appropriate.

The cards have been in my system for less than a day so I don't know PPD yet AND CAN'T ACCESS MY STATS (http://fah-web.stanford.edu/cgi-bin/main.py?qtype=userpage&username=Initialised&teamnum=35947) right now. But looking at the second active folding screenshot you can see that it's doing a GPU WU in <3 hours on a single card, so around 8.2 GPU WU per day; if my maths is right, that is around 182 PPD.
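The PPD estimate above can be sketched as a couple of lines of Python. Only the 8.2 WU/day and 182 PPD figures come from the post; the time per WU and the credit per WU are back-calculated from them here, so treat both as implied values rather than official project numbers.

```python
HOURS_PER_DAY = 24.0


def work_units_per_day(hours_per_wu: float) -> float:
    """How many work units finish in a day at a given time per unit."""
    return HOURS_PER_DAY / hours_per_wu


def points_per_day(hours_per_wu: float, credit_per_wu: float) -> float:
    """PPD = work units per day * credit per work unit."""
    return work_units_per_day(hours_per_wu) * credit_per_wu


# Figures quoted in the post.
wus_per_day = 8.2
ppd = 182.0

# Values implied by those figures (not stated in the post).
implied_hours_per_wu = HOURS_PER_DAY / wus_per_day   # just under 3 hours
implied_credit_per_wu = ppd / wus_per_day            # points per work unit

print(f"{implied_hours_per_wu:.2f} h per WU, {implied_credit_per_wu:.1f} points per WU")
```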

zerazax
06-27-2008, 06:51 PM
Is the latest client optimized / compatible for 4800's yet? Last I heard, they haven't brought 4800's to it yet but I haven't paid attention since

xVeinx
06-27-2008, 06:53 PM
No, maybe not news but a few people out there wanted to see folding performance and having it here seems relevant since nVidia has been shouting about Folding@Home (amongst other things) recently. BTW 'Game physics capability' is listed on the back of the box, under 'Features & Benefits' I expect to see more on this soon.

Admins: feel free to move this thread to a more relevant area if appropriate.

The cards have been in my system for less than a day so I don't know PPD yet AND CAN'T ACCESS MY STATS (http://fah-web.stanford.edu/cgi-bin/main.py?qtype=userpage&username=Initialised&teamnum=35947) right now. But looking at the second active folding screen shot you can see that it's doing a GPU WU in <3 hours on a single card so around 8.2 GPU WU per day if my maths is right that is around 182 PPD.

Are you able to increase this by having two separate clients set to use each core while not in crossfire?

bowman
06-27-2008, 06:54 PM
If OP's guess/whatsoever is anywhere near the truth, 8800GTX should be 4,2 TFLOP card. :D

Guess this thrashes OP's guess, or then the iterations can't be compared for some unknown reason.

Someone enlighten us!

The ATI client doesn't even fully utilize the RV670, let alone the RV770. There's an upgrade coming which, according to an AMD dev on the f@h forums, is 'focused on the 4 series', which one can hope will fix all of this.

The iterations can be compared, but they only reflect the state of the client running on each card, not the architecture/general flunky hotness of the card running it. ;)

initialised
06-27-2008, 07:06 PM
The ATI client doesn't even fully utilize the RV670, let alone the RV770. There's an upgrade coming which, according to an AMD dev on the f@h forums, is 'focused on the 4 series', which one can hope will fix all of this.

The iterations can be compared, but they only reflect the state of the client running on each card, not the architecture/general flunky hotness of the card running it. ;)
Surely there is a test WU that can be used to benchmark a GPU or an equation for translating iterations per second to GFLOPS.

Anyway, what struck me is that the second GPU is apparently smoothing the performance of the first GPU in an application that is not supposed to benefit from Crossfire and surely this can be built to do more than just smooth. Has this been observed on other Crossfire enabled systems?

SKYMTL
06-27-2008, 07:31 PM
I've been told by ATI that this current client is not optimized at all for the new RV770 cards. Basically, it means the 4800-series will fold around the level of a HD3870.

Pontos
06-27-2008, 07:33 PM
I'm getting 70% core utilization on a single 3850. Is that normal? I'm using the latest client from the f@h page.
They surely changed something (in reference to the comment about it being written in CUDA), because the last time I tried with the old core, it would just discard all the units with the error "CoreStatus = FFFFFFFF (-1)"

zerazax
06-27-2008, 07:38 PM
Okay just read this:

From Mike Houston (http://foldingforum.org/viewtopic.php?p=31446#p31446)
The code needs to be retuned to scale from the 320 SPs in the 38XX boards to the 800 SPs in the 4850. At the moment, at least on the smaller proteins FAH is currently running, the 4850 is a little faster clock for clock than a 38XX, but the clocks are lower so it's currently a small drop in performance. We are tuning away, but it's going to be a little while until things are tweaked to utilize the 2.5X increase in SPs in the 4850 over the 38XX boards. We went for stability first, and are now just a little way into a heavy tuning pass. 2XXX/3XXX will get faster as well, but the 4XXX has the most headroom.

and here (http://forum.beyond3d.com/showpost.php?p=1183136&postcount=25)

There are a few things going on. We need to get the CPU load down to show the scaling and we need tweaks to the code to get things to launch "wider". But, Amdahl's law comes into play. Until we get the serial code tuned and under control, parallelism doesn't help. We have tuning and optimization in the pipeline, including usage of new compute features. We *are* also still working on 6XX tuning, but 7XX has lots more headroom.

And here (http://forum.beyond3d.com/showpost.php?p=1183199&postcount=30)

As for the CPU intensive side, it's nothing per se about our GPU architecture, but the way the GROMACS code, Brook, CAL, and the kernel drivers all work together. In an upcoming core, we think we have largely dealt with the CPU overheads on the ATI side of the software stack and are working with Stanford, as is Nvidia, on reducing the overheads in the GROMACS core when offloading work to the GPU. On the GPU optimization side, we are working with code that has evolved all the way back to R5XX days. We have done a fair amount of optimization work on the 6XX hardware, the impact you will hopefully see soon in an updated core, and we've just started into 7XX. 7XX offers lots of new features to explore, which means it may take time until we have things fully tuned, but we hope to show some improvement on the small proteins soon and we already show scaling on the larger proteins.

The important thing to remember is that it takes time to tune and optimize in the face of numerical sensitivity. On the ATI side, we've been at this awhile and know that overoptimization and instability can lead to issues that can't be seen for awhile. This is one of the reasons for GPU2 we worked with the Folding@Home guys to insert more robust checks into the core (for both vendors). This does increase CPU overheads, but it also helps to protect the science and better detect when something funky is going on.

The beta has only been out for a short time, so I wouldn't be quick to draw conclusions just yet. But yes, Nvidia has the fastest client, especially on GT200, but we will give them a run for their money. We are ramping up performance and Nvidia has been making nice gains in stability. The good news is that unlike tuning for games and other benchmarks, all this hard work from the vendors (ATI/AMD, Nvidia, and Sony) all goes to help a good cause.


Looks like the optimizations aren't in yet and the client is still in beta right now.
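To put a rough shape on the Amdahl's law point in the quotes above, here is a minimal Python sketch. The 320 and 800 SP counts come from the quote; the parallel fractions are purely hypothetical values picked for illustration, and the sketch ignores the clock-speed difference between the boards, so the numbers are not measurements of the actual core.

```python
def amdahl_speedup(parallel_fraction: float, resource_ratio: float) -> float:
    """Amdahl's law: overall speedup when only the parallel part of the
    work gets `resource_ratio` times more execution resources."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if resource_ratio <= 0.0:
        raise ValueError("resource_ratio must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / resource_ratio)


# SP counts from the quote: 320 on 38XX boards, 800 on the 4850.
SP_RATIO = 800 / 320  # the "2.5X increase in SPs"

# Hypothetical parallel fractions, for illustration only.
for fraction in (0.5, 0.9, 1.0):
    speedup = amdahl_speedup(fraction, SP_RATIO)
    print(f"parallel fraction {fraction:.0%}: at most {speedup:.2f}x faster")
```

The closer the serial (CPU-side) share is to zero, the closer the result gets to the full 2.5x, which is why getting the serial code under control comes first.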

initialised
06-27-2008, 07:43 PM
I'm getting 70% core utilization on a single 3850. Is that normal? I'm using the latest client from the f@h page.
They surely changed something (in reference to the comment about it being written in CUDA), because the last time I tried with the old core, it would just discard all the units with the error "CoreStatus = FFFFFFFF (-1)"
I was initially getting 75% but switched the priority (Configure\Advanced) and it rose to 96%.

zerazax
06-27-2008, 07:53 PM
I don't think the GPU utilization is reading properly for the shaders anyway since, as the quotes from Mike Houston say, they haven't even coded it for more than 320 at this moment.

IvanAndreevich
06-27-2008, 11:26 PM
I don't think the GPU utilization is reading properly for the shaders anyway since, as the quotes from Mike Houston say, they haven't even coded it for more than 320 at this moment.

They haven't OPTIMIZED for more than 320. This probably means that all 800 are used, but inefficiently.

Aivas47a
06-28-2008, 08:37 PM
I've been told by ATI that this current client is not optimized at all for the new RV770 cards. Basically, it means the 4800-series will fold around the level of a HD3870.

That's my experience. I actually get slightly less PPD on a 4850 than on 3870.:down: Hope the clients are better optimized for ATI soon.

SocketMan
06-28-2008, 11:45 PM
Whoever you "borrowed" the card from got ripped off:
mine came with 800 sp and not 480 :eek:


Now seriously, I've tried getting my point across many times, one more can't hurt


Take a look at the circled numbers (benchmarks) that is an "older" ATI
gpu2 project.

Now instead of the credit of 30 let's put in the credit of 98 (that's what NV is getting from Stanford for a similar project): our ppd will grow quite a bit (do the math). ATI will
still get less ppd but the difference (compared to NV) would be minimal ;)
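Doing the math in a few lines of Python: the credits of 30 and 98 are the ones named in the post, while the starting PPD is a made-up placeholder, since the benchmark screenshot is not reproduced here. This assumes PPD scales linearly with credit per WU at an unchanged completion rate.

```python
ATI_CREDIT = 30.0  # credit per WU named in the post for the ATI project
NV_CREDIT = 98.0   # credit per WU named in the post for the similar NV project


def rescale_ppd(ppd: float, old_credit: float, new_credit: float) -> float:
    """PPD if the same WUs were worth `new_credit` instead of `old_credit`."""
    return ppd * new_credit / old_credit


scale = NV_CREDIT / ATI_CREDIT

# Hypothetical starting PPD, for illustration only.
example_ppd = 1000.0
print(f"credit ratio: {scale:.2f}x")
print(f"{example_ppd:.0f} PPD would become {rescale_ppd(example_ppd, ATI_CREDIT, NV_CREDIT):.0f} PPD")
```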

So it looks like for some reason Stanford is giving more credit to NV; this could be
to attract more people (they have succeeded) or for some other reasons, go figure..

Just like they say: ppd's will change more than a few times..


Daamit is not getting enough credit (not talking ppd's here) for the work they've done over the years, all NV had to do was - show up for dinner, the table was already set.
They (NV) have done some amazing performance tuning over the last couple of months, no question there, but it was ATI/AMD who made it possible for us and NV to enable our "pwning" cards to do much more than that.
Just wanted to remind people about that, as we all seem to forget about it while chasing higher ppd's :)

Kingcarcas
06-29-2008, 04:26 PM
Daamit is not getting enough credit (not talking ppd's here) for the work they've done over the years, all NV had to do was - show up for dinner, the table was already set.
They (NV) have done some amazing performance tuning over the last couple of months, no question there, but it was ATI/AMD who made it possible for us and NV to enable our "pwning" cards to do much more than that.
Just wanted to remind people about that, as we all seem to forget about it while chasing higher ppd's :)
It's the idea that we're doing this all for a good cause that stops me from getting psst off :cool: Sitting on my 2000PPD 3870 waiting for real numbers on 4850/70.