5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)

**jaredpace** · 09-28-2009, 08:17 AM

Originally Posted by JimmyH

750core 530mem
http://farm3.static.flickr.com/2549/...bd963579_b.jpg

750core 900mem
http://farm3.static.flickr.com/2662/...2c401421_b.jpg

Thanks, nice work, JimmyH. Don't suppose you could run that again on your HD4870 at 750/574, 750/618 & 750/662. Those speeds represent the compute/bandwidth ratios of 850/1300, 850/1400, and 850/1500 on the HD5870's memory. Thank you
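The equivalent clocks requested above can be reproduced with a little arithmetic. This is only a sketch under stated assumptions: the HD4870 has 800 stream processors and the HD5870 has 1600, both use the same memory bus width, and compute scales with shaders times core clock. The function name is made up for illustration.

```python
# Assumed specs (not stated in the post): shader counts and stock 5870 core clock.
HD4870_SHADERS = 800
HD5870_SHADERS = 1600
HD5870_CORE_MHZ = 850


def equivalent_4870_mem_clock(mem_5870_mhz, core_4870_mhz=750):
    """4870 memory clock giving the same bandwidth-per-compute ratio as a 5870.

    Both cards are assumed to share the same bus width, so bandwidth is
    proportional to memory clock and compute to shaders * core clock.
    """
    compute_5870 = HD5870_SHADERS * HD5870_CORE_MHZ
    compute_4870 = HD4870_SHADERS * core_4870_mhz
    return mem_5870_mhz * compute_4870 / compute_5870


for mem in (1200, 1300, 1400, 1500):
    print(mem, round(equivalent_4870_mem_clock(mem), 1))
```

Under those assumptions, 1300, 1400 and 1500 on the 5870 land within a MHz of the 574, 618 and 662 quoted for the 4870.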

**Mr. K6** · 09-28-2009, 08:59 AM

Originally Posted by jaredpace

When they increased the memory by 100mhz they got the same boost as when they increased the core by 65mhz.

You have to consider the values as percents, not the raw MHz. The 65MHz increase on the core is ~7.6% while the 100MHz increase on the RAM is ~8.3% - both very close, and the increase in FPS reflects this. Because increasing either the core or the RAM frequency nets noticeable performance gains, it appears that neither is bottlenecking the other (at the frequencies tested in the game tested). I'd imagine that in older games, the RAM will become more of a bottleneck to the FPS rate than would the core frequency.
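The percentages can be checked in a couple of lines. A minimal sketch, assuming the baselines are the stock HD5870 clocks of 850MHz core and 1200MHz memory (the post does not state them explicitly):

```python
def percent_increase(base_mhz, delta_mhz):
    """Relative clock increase in percent."""
    return 100.0 * delta_mhz / base_mhz


# Assumed baselines: stock HD5870 clocks of 850MHz core and 1200MHz memory.
core_gain = percent_increase(850, 65)
mem_gain = percent_increase(1200, 100)
print(f"core +{core_gain:.1f}%, memory +{mem_gain:.1f}%")
```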

**jaredpace** · 09-28-2009, 09:15 AM

Okay Mr. K6 - good point. They've basically raised the core & memory 8%. They get the same increase in framerates from either +8% core or +8% memory. To me this means that the 5870 could use all the core or memory speed you can give it.

This card is going to give very good results to anyone who overclocks the ... out of it.

**Mr. K6** · 09-28-2009, 09:36 AM

Originally Posted by jaredpace

Okay Mr. K6 - good point. They've basically raised the core & memory 8%. They get the same increase in framerates from either +8% core or +8% memory. To me this means that the 5870 could use all the core or memory speed you can give it.

This card is going to give very good results to anyone who overclocks the ... out of it.

Oh most definitely. I'm very much anticipating a stable release of Rivatuner so I can get to work on the voltages. As it stands, AMD GPU Clock Tool isn't working with my card (drivers?), so I currently can't go above the limits of CCC (unless I'm using the program incorrectly). Anyway, with a little voltage, it seems these cores will easily do 1GHz+ on the GPU. I'd be interested to see not only the performance gains from this speed, but also if a memory bottleneck finally shows itself.

**Lightman** · 09-28-2009, 09:38 AM

Originally Posted by iandh

I'm benching my unlocked 720BE against a guy who has a 5870 and i7 over at OCN... results are less than exciting (for me at least)

I just ran Crysis Warhead bench 0.33 (avalanche flythrough)

1920 0xAA

i7 @ 4.2Ghz Min: 29.55 Max: 46.81 Avg: 37.04

Phenom II x4 @ 3.6Ghz Min: 25.12 Max: 36.73 Avg: 30.99

1920 8xAA

i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

Phenom II x4 @ 3.6Ghz Min: 18.24 Max: 29.33 Avg: 23.83

note: Unlock from X3 to X4 gave me a slight avg fps increase, and almost 2fps increase in min fps.

It appears since the min FPS numbers are all very close, they are revealing spots where the bench is GPU limited, and the Max FPS numbers are quite different, showing spots where the bench is CPU limited

Anyone who owns a 5870 and wants to add benches into the mix is welcome... I'll run the same bench if possible on my system and we can see what can be worked out about this. OBVIOUSLY a 3.6Ghz Phenom II is no match for a 4.2Ghz i7, but I would have thought that the results would have been much more GPU limited in this bench.

OK here is mine on:
PhII @3750MHz/2500MHz/1000MHz (core/nb/mem)
HD5870 default
DX10, Enth., 64bit on Vista x64

DirectX 10 ENTHUSIAST 3X @ Map: avalanche @ 0 1920 x 1200 AA 0x
==> Framerate [ Min: 15.05 Max: 46.67 Avg: 32.30 ]
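To put numbers on the quoted claim that the minimums are close while the maximums diverge, the i7's lead can be computed for each figure. This is just a sketch over the quoted results; the helper name is made up.

```python
# Crysis Warhead results quoted above: (min, max, avg) fps.
results = {
    "1920 0xAA": {"i7": (29.55, 46.81, 37.04), "phenom": (25.12, 36.73, 30.99)},
    "1920 8xAA": {"i7": (21.29, 37.74, 28.65), "phenom": (18.24, 29.33, 23.83)},
}


def cpu_advantage(i7, phenom):
    """Percent lead of the i7 over the Phenom II for min, max and avg fps."""
    return tuple(round(100.0 * (a / b - 1.0), 1) for a, b in zip(i7, phenom))


for setting, r in results.items():
    lo, hi, avg = cpu_advantage(r["i7"], r["phenom"])
    print(f"{setting}: min +{lo}%  max +{hi}%  avg +{avg}%")
```

The lead at max fps comes out noticeably larger than at min fps in both settings, which fits the CPU-limited reading of the peaks, although the minimums are still some way apart.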

**iandh** · 09-28-2009, 03:15 PM

Originally Posted by zalbard

Uh, oh, someone merge this thread with http://www.xtremesystems.org/forums/...d.php?t=235181, thanks!

This thread that we are now in is about the 5870's internal memory bottlenecking the card's performance.

The thread I purposely made separate was to investigate CPU bottlenecking of the 5870 as a whole.

Originally Posted by Ashraf

Done.

At your discretion of course as a moderator, but I think maybe these threads should be un-merged. They were for two totally different topics and merging them will most likely prevent the original goal of this thread's OP from being reached.

edit: Ashraf edited thread title to "5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)" to reflect two different topics being discussed within thread.

Thanks!

**Chumbucket843** · 09-28-2009, 03:27 PM

i dont see any different scaling in benchmarks. the 4870 was 2x faster than rv670 and it had 2.5 times more shaders. i would not expect doubling the shaders to double effective performance. the problem might be the fact that there really isnt much to compare it to. might be l1 to l2 cache bandwidth.

                     RV770      RV870
Texture units        40         80
L1 cache bandwidth   480GB/s    1,000GB/s
L1 to L2 bandwidth   384GB/s    435GB/s

**~~purecain~~** · 09-28-2009, 03:50 PM

please unmerge the thread... we can't have a coherent discussion if every thread about the 5870 gets merged... meh....

**iandh** · 09-28-2009, 04:56 PM

Originally Posted by purecain

please unmerge the thread... we can't have a coherent discussion if every thread about the 5870 gets merged... meh....

Yeah, it's pretty screwed now as far as coherency goes. I wish the person who reported the threads had bothered to read them before clicking submit... just because two threads have a similar title doesn't mean they are discussing the same exact subject. I can understand Ashraf's mistake because he probably just was going down the job queue and saw the thread titles, then hit "merge threads".

I asked Ashraf and he said that there isn't really an "unmerge" option, so basically I'd have to start a new empty thread, and then delete all the posts from my old thread that are now in this thread.

**hurleybird** · 09-28-2009, 08:58 PM

Got my 5870 today, but I can't seem to get the GPUclock utility to work. I'm using the RC7 drivers leaked from MSI, maybe that's the problem...

**saaya** · 09-29-2009, 12:08 AM

Originally Posted by iandh

i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

i7 @ 3.2Ghz Min: 15.93 Max: 37.27 Avg: 28.46

those numbers make no sense, identical max fps, but lower min fps... yet he still gets the same avg fps? 0_o

this only makes sense if the min fps is REALLY short, like it drops to 15fps once and thats it...
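A quick way to see how little a brief dip moves the average: treat the run as segments of constant framerate and compute total frames over total time. The 60-second run length and the one-second dip are hypothetical, chosen only to illustrate the point.

```python
def average_fps(segments):
    """Average fps over (duration_seconds, fps) segments: total frames / total time."""
    frames = sum(d * f for d, f in segments)
    seconds = sum(d for d, _ in segments)
    return frames / seconds


# Hypothetical 60-second run: steady 28.7 fps, with or without a one-second dip.
steady = average_fps([(60, 28.7)])
one_dip = average_fps([(59, 28.7), (1, 15.93)])
print(round(steady, 2), round(one_dip, 2))
```

With these made-up durations the dip costs only about 0.2 fps on the average, which is the same order as the gap between the two quoted averages.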

**Astennu** · 09-29-2009, 01:53 AM

Originally Posted by demonkevy666

I haven't found any reference for this to be true; nowhere does it say you can only enable 1 of 5 parts of the SIMD.

It depends on the game and the driver. You can see it like this:

So if instructions can be grouped together you can have quite a performance boost. The downside is that if you can't group instructions you only have 1/4 or 1/5 of the performance.

I don't know if it's true, but I have heard the AMD compiler does quite a good job, grouping 3-4 of them most of the time.

But if you have a heavily nVidia-optimised game you can have lower values and bad shader performance (you might get 1-3 then).
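The effect of instruction grouping on throughput can be sketched numerically. The figures below are assumptions rather than anything stated in the post: 1600 ALUs arranged as 320 five-wide units, an 850MHz core, and 2 flops per ALU per clock (multiply-add). The function name is made up.

```python
# Assumed HD5870 figures: 320 five-wide units, 850MHz core,
# 2 flops per ALU per clock (multiply-add).
UNITS = 320
LANES = 5
CORE_GHZ = 0.85


def effective_gflops(avg_ops_per_bundle):
    """Throughput when the compiler fills avg_ops_per_bundle of the 5 slots."""
    utilization = avg_ops_per_bundle / LANES
    return UNITS * LANES * 2 * CORE_GHZ * utilization


for packed in (1, 3, 4, 5):
    print(packed, round(effective_gflops(packed)))
```

Under these assumptions, going from 3-4 grouped instructions down to 1 cuts effective throughput to a third or a quarter of what well-packed code gets.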

Then about the performance of the HD5870: it does not seem memory bandwidth limited to me. But I also think there is more performance inside this core than we see now. It might be driver related. It won't surprise me if we get up to 20% higher performance in the future.

The RV870 core is still new, and I think AMD could optimize the scheduling of the threads a bit better so you can keep those 1600 ALUs fed with data. It would be nice if there was a way to see the load on those shader units and compare it to the load on RV770 and RV790 cores. Don't forget those have been well optimized over the last years of driver releases.

**JimmyH** · 09-29-2009, 05:09 AM

Originally Posted by jaredpace

Thanks, nice work, JimmyH. Don't suppose you could run that again on your HD4870 at 750/574, 750/618 & 750/662. Those speeds represent the compute/bandwidth ratios of 850/1300, 850/1400, and 850/1500 on the HD5870's memory. Thank you

OK heres more:

750/575

750/618

750/663

800/530

800/575

835/575

The framerate does fluctuate a bit when standing still so I am taking screens at the minimum.

**Mr. K6** · 09-29-2009, 05:13 AM

Interesting read: http://firingsquad.com/hardware/ati_...ng/default.asp

**CrimInalA** · 09-29-2009, 05:31 AM

Originally Posted by Mr. K6

Interesting read: http://firingsquad.com/hardware/ati_...ng/default.asp

Very interesting indeed. It seems that core clocks give the most gains in the games they tested.

I actually can't wait to see what these cards can do when more voltage is given. And I also want to see what the little brother 5850 can pull out of its hat in terms of overclocking.

**JimmyH** · 09-29-2009, 05:35 AM

Originally Posted by Mr. K6

Interesting read: http://firingsquad.com/hardware/ati_...ng/default.asp

Thanks for the link. Doesn't look like the 5870 is that bandwidth limited here. But it seems bottlenecked by something else. Most games return smaller gains than the increase on the core and memory. Poor drivers? Can't help thinking that the impressive power consumption of this card could actually be the shaders sitting around doing nothing.

Anandtech's power test with OCCT shows the 5870 actually uses a lot more power than the other reviews suggest when loaded with a highly optimized application.

http://www.anandtech.com/video/showdoc.aspx?i=3643&p=26

**jaredpace** · 09-29-2009, 07:13 AM

based on jimmyh's cod4 benchmark:
(compute power to bandwidth ratio)
memory:
4870 speeds / 5870 equivalent / FPS / % increase
750 / 530 --- 850 / 1200 ----- 100 --- 0%
750 / 575 --- 850 / 1300 ----- 104 --- 4%
750 / 618 --- 850 / 1400 ----- 107 --- 2.9%
750 / 663 --- 850 / 1500 ------ 112 --- 4.6%
750 / 900 --- 850 / 2040 ------ 121 --- n/a
core:
4870 speeds / 5870 equivalent / FPS / % increase
750 / 530 --- 850 / 1200 ------ 100 --- 0%
800 / 530 --- 906 / 1200 ------ 103 --- 3%
750 / 575 --- 850 / 1300 ----- 104 --- 0%
800 / 575 --- 906 / 1300 ----- 107 --- 2.9%
835 / 575 --- 946 / 1300 ----- 110 --- 2.8%
http://www.xtremesystems.org/forums/...7&postcount=88
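One way to compare the memory and core rows is to divide the percentage fps gain by the percentage clock gain, so that 1.0 would mean perfect scaling. A minimal sketch using two rows from the HD4870 table above; the helper name is made up.

```python
def scaling_efficiency(clock_before, clock_after, fps_before, fps_after):
    """Percent fps gained per percent of clock added (1.0 = perfect scaling)."""
    clock_gain = clock_after / clock_before - 1.0
    fps_gain = fps_after / fps_before - 1.0
    return fps_gain / clock_gain


# HD4870 figures from the table above.
mem_step = scaling_efficiency(530, 575, 100, 104)   # memory 530 -> 575 at 750 core
core_step = scaling_efficiency(750, 800, 100, 103)  # core 750 -> 800 at 530 memory
print(round(mem_step, 2), round(core_step, 2))
```

Both steps return a bit under half of the clock increase as fps, so at this operating point neither clock looks like the sole limiter.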

Lightman's 3d06 benchmark:
core / mem / FPS / + fps / + %
850 / 3600 / 091.7 / 0 / 0%
850 / 4000 / 095.2 / +3.5 / +3.8%
850 / 4400 / 097.9 / +2.7 / +2.9%
850 / 4800 / 100.3 / +2.5 / +2.5%
850 / 5200 / 102.4 / +2.1 / +2.1%
http://www.xtremesystems.org/forums/...1&postcount=23

Extrahardware.CZ Crysis 1920 × 1200, 4× AA
core / mem / fps / +fps / +%
memory:
850 / 4400 - 40,9 - 0 / 0%
850 / 4800 - 42,0 - +1.1 +2.7%
850 / 5200 - 43,1 - +1.1 +2.6%
core:
785 / 4800 - 40,1 - 0 / 0%
850 / 4800 - 42,0 - +1.9 +4.7%
915 / 4800 - 43,2 - +1.2 +2.9%
core and memory:
785 / 4400 - 39,3 - 0 / 0%
850 / 4800 - 42,0 - +2.7 +6.9%
900 / 5200 - 44,7 - +2.7 +6.4%
http://www.extrahardware.cz/pretakto...adeonu-hd-5870

Firingsquad Crysis 1920 × 1200, 2× AA
core / mem / fps / +fps / +% (from stock)
memory:
850 / 4800 - 31.6 - 0 / 0%
850 / 5272 - 32.3 - +0.6 +2.2%
core:
850 / 4800 - 31.6 - 0 / 0%
930 / 4800 - 33.1 - +1.5 +4.7%
core and memory:
850 / 4800 - 31.6 - 0 / 0%
930 / 5400 - 34.3 - +2.7 +8.5%
http://firingsquad.com/hardware/ati_...king/page5.asp

Meh. You can get gains from overclocking core, or memory, or both together. You get higher gains (usually) from overclocking the core. Only in the game Batman did firingsquad get higher performance from overclocking memory vs. core. Conclusion: overclock core & memory as much as possible. Memory bandwidth isn't the bottleneck, otherwise we would have relatively no gain from core overclocking...

?

Originally Posted by JimmyH

Can't help thinking that the impressive power consumption of this card could actually be the shaders sitting around doing nothing.

Yeah seems like it... jeez.

**JimmyH** · 09-29-2009, 07:45 AM

Originally Posted by jaredpace

Meh. You can get gains from overclocking core, or memory, or both together. You get higher gains (usually) from overclocking the core. Only in the game Batman did firingsquad get higher performance from overclocking memory vs. core. Conclusion: overclock core & memory as much as possible. Memory bandwidth isn't the bottleneck, otherwise we would have relatively no gain from core overclocking...

?

Yeah seems like it... jeez.

I wouldn't say it isn't memory bandwidth bottlenecked just because a core overclock works. A memory bandwidth bottleneck isn't a hard-capped bottleneck like the GPU core. From my experience, overclocking the 4870's memory from stock by over 10% returns less than 1% fps gain. So in comparison, yes, the 5870 is relatively bandwidth starved compared to the 4870/4890. Batman highlights memory's importance: core +9% and memory +12% increased fps by 10.5%.

**jaredpace** · 09-29-2009, 08:06 AM

Jimmy - what situation maxes out memory bandwidth? Highest texture quality + quality AA filtering? I'm not sure if this maximizes the need for bandwidth. I know it loads up a greater amount of video memory potentially maximizing capacity, but how do you go about maxing out bandwidth? Is there a special test?

So you DO think it has a memory bottleneck? IMO it could definitely use faster memory. IDC @ anandtech says, "If you can overclock the GPU cores and see a performance improvement that exceeds that which comes from increasing the memory clocks then that is about as close to proof you are going to get that your compute system is not memory bandwidth constrained."
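The quoted rule of thumb is easy to apply to the single-variable overclocks posted earlier. This is only a sketch of that rule, not a proper test: it ignores that the two overclocks may differ in size, and the function name and return strings are made up.

```python
def likely_limiter(core_oc_fps_gain_pct, mem_oc_fps_gain_pct):
    """Apply the quoted rule of thumb to two single-variable overclock results."""
    if core_oc_fps_gain_pct > mem_oc_fps_gain_pct:
        return "not memory-bandwidth constrained"
    return "possibly memory-bandwidth constrained"


# Firingsquad Crysis figures from the earlier post: +4.7% from core, +2.2% from memory.
print(likely_limiter(4.7, 2.2))
```

In the Firingsquad Crysis run the two overclocks were of similar size (roughly 9-10% each), so the comparison is at least fair there.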

**iandh** · 09-29-2009, 08:33 AM

Originally Posted by saaya

those numbers make no sense, identical max fps, but lower min fps... yet he still gets the same avg fps? 0_o

this only makes sense if the min fps is REALLY short, like it drops to 15fps once and thats it...

Yeah that's what I thought too, he got lower min fps than my 720be at stock, must be a fluke or something

**JimmyH** · 09-29-2009, 08:47 AM

Originally Posted by jaredpace

Jimmy - what situation maxes out memory bandwidth? Highest texture quality + quality AA filtering? I'm not sure if this maximizes the need for bandwidth. I know it loads up a greater amount of video memory potentially maximizing capacity, but how do you go about maxing out bandwidth? Is there a special test?

So you DO think it has a memory bottleneck? IMO it could definitely use faster memory. IDC @ anandtech says, "If you can overclock the GPU cores and see a performance improvement that exceeds that which comes from increasing the memory clocks then that is about as close to proof you are going to get that your compute system is not memory bandwidth constrained."

I don't know. Maybe look for game benchmarks where the 4870 outperformed the 4850 by significantly more than 20%. 16xAF could likely be one scenario. A colorfill test could show up limitations here too: http://www.techreport.com/articles.x/17618/6

Well the gpu core is doing the actual rendering work so increasing that usually gains more unless you are really badly bottlenecked by slow memory.

**Bojamijams** · 09-29-2009, 09:36 AM

Has it been confirmed that all cards can have their voltages adjusted if flashed with the ASUS bios?

**saaya** · 09-29-2009, 09:43 AM

Originally Posted by jaredpace

http://www.extrahardware.cz/pretakto...adeonu-hd-5870

what cpu speed did they test with?
might stop scaling cause cpu is limiting?

thx JimmyH for the 4870 results!

so reducing 4870s bandwidth by about 41% (900 down to 530 on the memory), to the same bw/compute ratio as a 5870, results in a mere 17% performance drop (121 to 100 fps). sounds like yet another hint that 5870 is NOT held back a lot by memory bandwidth...
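A quick check of that comparison, using JimmyH's HD4870 COD4 figures as tabulated earlier in the thread (750/900 at 121 fps, 750/530 at 100 fps). Memory clock is used as a stand-in for bandwidth, which assumes the bus width is unchanged.

```python
def percent_drop(before, after):
    """Percent reduction going from before to after."""
    return 100.0 * (before - after) / before


# HD4870 COD4 figures posted earlier: 750/900 gave 121 fps, 750/530 gave 100 fps.
bandwidth_drop = percent_drop(900, 530)
fps_drop = percent_drop(121, 100)
print(round(bandwidth_drop, 1), round(fps_drop, 1))
```

On those figures the memory clock falls by roughly 41% while the framerate falls by roughly 17%.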

Originally Posted by demonkevy666

so you're saying the bottleneck is the thread dispatch and it's only being used at about 31.25%, if it were redesigned to use all 1024 dispatch threads at once and not have those 5 ALUs grouped. 5 ALUs is one SIMD. changing this to be all separate ALUs shouldn't be too hard, the ALUs themselves are quite small already.

it seems to me it's more that programmers will go for whatever is easy to shorten coding time.
easy isn't the best possible way to do things.

i think hes saying the thread count is NOT a limitation cause even if all parts of the gpu are fully loaded its only using 30% of the max possible threads the dispatch processor can coordinate. and it can handle that many threads cause in xfire one dispatch processor apparently runs as master and oversees the threads running on all gpus in the system, hence the hint that in quad gpu configs the thread dispatch MIGHT limit.

Originally Posted by Chumbucket843

i dont see any different scaling in benchmarks. the 4870 was 2x faster than rv670 and it had 2.5 times more shaders. i would not expect doubling the shaders to double effective performance. the problem might be the fact that there really isnt much to compare it to. might be l1 to l2 cache bandwidth.

                     RV770      RV870
Texture units        40         80
L1 cache bandwidth   480GB/s    1,000GB/s
L1 to L2 bandwidth   384GB/s    435GB/s

3870 to 4870 was a 150% shader unit boost that resulted in a 100% performance boost. this time we have a 100% boost of not only shader units but tmus and rops too! yet the perf boost is only 40% or even less in some cases... that would be as if 4870 would only have been 60% faster with a 150% logic boost instead of 100% faster. theres def something limiting...
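The comparison above boils down to performance gained per unit of hardware added. A minimal sketch using only the percentages from the post; the function name is made up.

```python
def scaling_ratio(unit_increase_pct, perf_increase_pct):
    """Fraction of the added hardware that showed up as performance."""
    return perf_increase_pct / unit_increase_pct


rv670_to_rv770 = scaling_ratio(150, 100)  # 3870 -> 4870
rv770_to_rv870 = scaling_ratio(100, 40)   # 4870 -> 5870
# What the 4870 would have gained had it scaled as poorly as the 5870.
hypothetical_4870_gain = 150 * rv770_to_rv870
print(round(rv670_to_rv770, 2), rv770_to_rv870, round(hypothetical_4870_gain))
```

That gives about 0.67 for the previous generation against 0.4 for this one, and reproduces the 60% figure.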

l1 to l2 cache bw... interesting!
was looking for 770 figures but couldnt find any...
l1 to l2 barely increased at all... but then again, doesnt each 5way processor or alu or whatever you wanna call it have its own L1? and each group of those shares the l2 right? the grouping hasnt changed, so then l1 to l2 bandwidth actually shouldnt matter and could have remained the same...

maybe it actually is the same if you normalize those numbers clockspeed wise for 770 and 870?
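Normalizing by clock speed is a one-liner. This sketch assumes core clocks of 750MHz for RV770 (HD4870) and 850MHz for RV870 (HD5870), and that the cache runs at core clock; the L1 to L2 figures are the ones from the quoted table.

```python
# Assumed core clocks: 750MHz for RV770 (HD4870) and 850MHz for RV870 (HD5870).
CLOCK_MHZ = {"RV770": 750, "RV870": 850}
L1_TO_L2_GBPS = {"RV770": 384, "RV870": 435}


def bytes_per_clock(gb_per_s, clock_mhz):
    """Convert GB/s at a given clock into bytes moved per core clock cycle."""
    return gb_per_s * 1e9 / (clock_mhz * 1e6)


for chip in ("RV770", "RV870"):
    print(chip, round(bytes_per_clock(L1_TO_L2_GBPS[chip], CLOCK_MHZ[chip])))
```

Under those assumptions both chips come out at roughly 512 bytes per clock, which would suggest the higher RV870 figure is just the clock bump.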

**Chumbucket843** · 09-29-2009, 01:53 PM

Originally Posted by Astennu

So if instructions can be grouped together you can have quite a performance boost. The downside is that if you can't group instructions you only have 1/4 or 1/5 of the performance.

I don't know if it's true, but I have heard the AMD compiler does quite a good job, grouping 3-4 of them most of the time.

But if you have a heavily nVidia-optimised game you can have lower values and bad shader performance (you might get 1-3 then).

it depends on what it is doing. you can see here Ati is the intel of synthetic benchmarks: http://www.bit-tech.net/hardware/gra...ture-review/10
the wider you make a vector, the harder it is to keep under full load.

Then about the performance of the HD5870: it does not seem memory bandwidth limited to me. But I also think there is more performance inside this core than we see now. It might be driver related. It won't surprise me if we get up to 20% higher performance in the future.

on average one flop needs about 1 byte of memory traffic. this translates to 2 terabytes per second of required bandwidth for rv870, so every gpu made is bottlenecked by this. the only way is to further raise the calculation to memory operation ratio. it's already 100:1 but it must go higher.
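For a sense of scale, peak compute can be set against peak memory bandwidth. The figures below are assumptions, not from the post: 1600 ALUs, 2 flops per ALU per clock, 850MHz core, and a 256-bit bus at 4800 MT/s effective.

```python
# Assumed HD5870 figures: 1600 ALUs, 2 flops per ALU per clock, 850MHz core,
# 256-bit memory bus at 4800 MT/s effective.
peak_gflops = 1600 * 2 * 0.85
bandwidth_gbs = 256 / 8 * 4.8

bytes_per_flop_needed = 1.0  # the rule of thumb from the post
required_gbs = peak_gflops * bytes_per_flop_needed
flops_per_byte_available = peak_gflops / bandwidth_gbs
print(round(required_gbs), round(bandwidth_gbs, 1), round(flops_per_byte_available, 1))
```

At 1 byte per flop the requirement lands in the same ballpark as the terabytes-per-second figure in the post, against an available bandwidth of roughly 150GB/s, so the card has to do on the order of 18 flops per byte fetched from memory to stay busy.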

**Robin BP** · 09-29-2009, 09:17 PM

Now that the 5850 is out, I think it's clear that something is holding the 5870 back, but what?

Conclusion

When you take the Cypress based Radeon HD 5870 and cut out 2 SIMDs and 15% of the clock speed to make a Radeon HD 5850, on paper you have a card 23% slower. In practice, that difference is only between 10% and 15% depending on the resolution. What’s not a theory is AMD’s pricing: they may have cut off 15% of the performance to make the 5850, but they have also cut the price by well more than 15%; 31% to be precise.

http://www.anandtech.com/video/showdoc.aspx?i=3650&p=14
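The "on paper" figure in the quote can be reconstructed as SIMD fraction times clock fraction. The SIMD count (20 on the HD5870) and the clocks (850MHz and 725MHz) are assumptions here, since the quote only gives "2 SIMDs" and "15%".

```python
# Assumptions: 20 SIMDs on the HD5870, 850MHz vs 725MHz core clocks.
SIMDS_5870 = 20
SIMDS_5850 = SIMDS_5870 - 2
CLOCK_5870_MHZ = 850
CLOCK_5850_MHZ = 725

paper_ratio = (SIMDS_5850 / SIMDS_5870) * (CLOCK_5850_MHZ / CLOCK_5870_MHZ)
paper_deficit_pct = 100.0 * (1.0 - paper_ratio)
print(round(paper_deficit_pct, 1))

# Share of the paper deficit that shows up in the observed 10-15% gap.
for observed in (10, 15):
    print(observed, round(observed / paper_deficit_pct, 2))
```

With those assumptions the paper deficit comes out at about 23%, matching the quote, and the observed 10-15% gap is only around half of it.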
