Thanks, nice work, JimmyH. I don't suppose you could run that again on your HD4870 at 750/574, 750/618 and 750/662? Those speeds give the same compute/bandwidth ratios as an HD5870 at 850/1300, 850/1400 and 850/1500. Thank you :up:
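The clock pairs above come from holding the ratio of shader throughput to memory bandwidth constant. A minimal sketch, assuming shader count times core clock as the compute measure (800 ALUs on the HD4870, 1600 on the HD5870) and that both cards have a 256-bit GDDR5 bus, so memory clock alone stands in for bandwidth; the helper names are hypothetical:

```python
# Stock ALU counts (assumed reference specs); both cards use a 256-bit bus,
# so memory clock alone is proportional to bandwidth.
HD4870_ALUS = 800
HD5870_ALUS = 1600

def hd4870_mem_equivalent(core_5870, mem_5870, core_4870=750):
    """HD4870 memory clock (MHz) giving the same compute/bandwidth ratio
    as an HD5870 at core_5870/mem_5870 MHz (hypothetical helper)."""
    compute_ratio = (HD4870_ALUS * core_4870) / (HD5870_ALUS * core_5870)
    return mem_5870 * compute_ratio

def hd5870_mem_equivalent(mem_4870, core_4870=750, core_5870=850):
    """Inverse: HD5870 memory clock matching an HD4870 setting."""
    compute_ratio = (HD5870_ALUS * core_5870) / (HD4870_ALUS * core_4870)
    return mem_4870 * compute_ratio

for mem in (1200, 1300, 1400, 1500):
    print(f"850/{mem} on the HD5870 ~ 750/{hd4870_mem_equivalent(850, mem):.0f} on the HD4870")
print(f"750/900 on the HD4870 ~ 850/{hd5870_mem_equivalent(900):.0f} on the HD5870")
```

At 850/1200 the formula gives about 529 MHz, close to the 530 MHz baseline used in the runs further down, and the inverse turns the HD4870's stock 900 MHz into a 2040 MHz HD5870 equivalent.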
You have to consider the values as percentages, not raw MHz. The 65 MHz increase on the core is ~7.6%, while the 100 MHz increase on the RAM is ~8.3%. Both are very close, and the increase in FPS reflects this. Because increasing either the core or the RAM frequency nets noticeable performance gains, it appears that neither is bottlenecking the other (at least at the frequencies tested, in the game tested). I'd imagine that in older games the RAM will become more of a bottleneck to the frame rate than the core frequency would.
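Expressed as percentages, using the HD5870's stock 850 MHz core and 1200 MHz memory as the base (the 915 MHz and 1300 MHz targets are inferred from the +65 and +100 MHz figures):

```python
def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100

core_gain = pct_increase(850, 915)    # +65 MHz on the core
mem_gain = pct_increase(1200, 1300)   # +100 MHz on the memory
print(f"core: +{core_gain:.1f}%, memory: +{mem_gain:.1f}%")
```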
Okay Mr. K6, good point. They've basically raised the core and memory by 8% each, and they get the same increase in frame rate from either +8% core or +8% memory. To me this means the 5870 could use all the extra core or memory speed you could give it.
This card is going to give very good results to anyone who overclocks the :banana::banana::banana::banana: out of it.
Oh most definitely :D. I'm very much anticipating a stable release of Rivatuner so I can get to work on the voltages :). As it stands, AMD GPU Clock Tool isn't working with my card (drivers?), so I currently can't go above the limits of CCC (unless I'm using the program incorrectly). Anyway, with a little voltage, it seems these cores will easily do 1GHz+ on the GPU. I'd be interested to see not only the performance gains from this speed, but also if a memory bottleneck finally shows itself :eek:
This thread that we are now in is about the 5870's internal memory bottlenecking the card's performance.
The thread I purposely made separate was to investigate CPU bottlenecking of the 5870 as a whole.
At your discretion of course as a moderator, but I think maybe these threads should be un-merged. They were for two totally different topics and merging them will most likely prevent the original goal of this thread's OP from being reached.
edit: Ashraf edited thread title to "5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)" to reflect two different topics being discussed within thread.
Thanks!
I don't see any different scaling in benchmarks. The 4870 was 2x faster than the RV670 and it had 2.5 times as many shaders, so I would not expect doubling the shaders to double effective performance. The problem might be that there really isn't much to compare it to. It might be L1 to L2 cache bandwidth.
Spec | RV770 | RV870
Texture units | 40 | 80
L1 cache bandwidth | 480 GB/s | 1,000 GB/s
L1 to L2 bandwidth | 384 GB/s | 435 GB/s
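Comparing each row's growth against the doubling of shader count makes the point concrete. A short sketch using the table's figures plus the stock ALU counts (800 and 1600, which are not in the table):

```python
# (RV770, RV870) figures; ALU counts are stock specs, the rest come from the table
specs = {
    "Shader ALUs": (800, 1600),
    "Texture units": (40, 80),
    "L1 cache bandwidth (GB/s)": (480, 1000),
    "L1 to L2 bandwidth (GB/s)": (384, 435),
}

growth = {name: rv870 / rv770 for name, (rv770, rv870) in specs.items()}
for name, g in growth.items():
    print(f"{name}: {g:.2f}x")
```

Everything roughly doubles except the L1 to L2 path, which grows by only about 13%.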
Please unmerge the thread... we can't have a coherent discussion if every thread about the 5870 gets merged... meh....
Yeah, it's pretty screwed now as far as coherency goes. I wish the person who reported the threads had bothered to read them before clicking submit... just because two threads have a similar title doesn't mean they are discussing the same exact subject. I can understand Ashraf's mistake because he probably just was going down the job queue and saw the thread titles, then hit "merge threads".
I asked Ashraf and he said there isn't really an "unmerge" option, so basically I'd have to start a new empty thread and then delete all the posts from my old thread that are now in this thread. :shakes:
Got my 5870 today, but I can't seem to get the GPUclock utility to work. I'm using the RC7 drivers leaked from MSI, maybe that's the problem...
It depends on the game and the driver. You can see it like this:
http://images.anandtech.com/reviews/...I/4800/ilp.png
So if instructions can be grouped together, you can get quite a performance boost. The downside is that if you can't group instructions, you only get 1/4 or 1/5 of the performance.
I don't know if it's true, but I have heard the AMD compiler does quite a good job, grouping 3-4 of them most of the time.
But in a heavily nVidia-optimised game you can get lower values and bad shader performance (you might hit 1-3 then).
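The effect of packing can be put in numbers. A rough sketch, assuming the HD5870's 1600 ALUs are organised as 320 five-wide units that each issue one bundle per clock at 850 MHz, with each filled slot doing one multiply-add (2 flops); `avg_slots_filled` is a hypothetical stand-in for how well the compiler groups instructions:

```python
VLIW_UNITS = 320          # 1600 ALUs / 5 slots per unit
SLOTS_PER_UNIT = 5
CLOCK_GHZ = 0.85
FLOPS_PER_SLOT = 2        # one multiply-add per slot per clock

def effective_gflops(avg_slots_filled):
    """Shader throughput when the compiler fills avg_slots_filled of 5 slots."""
    filled = max(0.0, min(avg_slots_filled, SLOTS_PER_UNIT))
    return VLIW_UNITS * filled * FLOPS_PER_SLOT * CLOCK_GHZ

peak = effective_gflops(SLOTS_PER_UNIT)
for filled in (1, 2, 3, 4, 5):
    gflops = effective_gflops(filled)
    print(f"{filled} slot(s): {gflops:.0f} GFLOPS ({gflops / peak:.0%} of peak)")
```

Three to four slots, the range mentioned above, works out to 60-80% of peak; a shader that only fills one or two slots gets 20-40%.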
Then about the performance of the HD5870: it does not seem memory bandwidth limited to me. But I also think there is more performance inside this core than we see now. It might be driver related. It won't surprise me if we get up to 20% higher performance in the future.
The RV870 core is still new, and I think AMD could optimize the scheduling of the threads a bit better so you can keep those 1600 ALUs fed with data. It would be nice if there was a way to see the load on those shader units and compare it to the load on the RV770 and RV790 cores. Don't forget those have been well optimized over the last year's driver releases.
OK, here's more:
750/575
http://farm3.static.flickr.com/2502/...a2681904_b.jpg
750/618
http://farm4.static.flickr.com/3434/...97c11f34_b.jpg
750/663
http://farm3.static.flickr.com/2578/...10d40b55_b.jpg
800/530
http://farm4.static.flickr.com/3432/...597305c5_b.jpg
800/575
http://farm3.static.flickr.com/2629/...481b9ef8_b.jpg
835/575
http://farm3.static.flickr.com/2487/...2cf224a3_b.jpg
The framerate does fluctuate a bit when standing still so I am taking screens at the minimum.
Interesting read: http://firingsquad.com/hardware/ati_...ng/default.asp
Very interesting indeed. It seems that core clocks give the most gains in the games they tested.
I actually can't wait to see what these cards can do when more voltage is given. And I also want to see what the little brother, the 5850, can pull out of its hat in terms of overclocking :)
Thanks for the link. It doesn't look like the 5870 is that bandwidth limited here, but it seems bottlenecked by something else. Most games return smaller gains than the increase on the core and memory. Poor drivers? I can't help thinking that the impressive power consumption of this card could actually be down to the shaders sitting around doing nothing.
AnandTech's power test with OCCT shows that the 5870 actually uses a lot more power than other reviews suggest when loaded with a highly optimized application.
http://www.anandtech.com/video/showdoc.aspx?i=3643&p=26
Based on JimmyH's CoD4 benchmark (compute power to bandwidth ratio):
memory:
4870 core / mem (MHz) | 5870 equivalent (MHz) | FPS | % increase
750 / 530 | 850 / 1200 | 100 | 0%
750 / 575 | 850 / 1300 | 104 | 4%
750 / 618 | 850 / 1400 | 107 | 2.9%
750 / 663 | 850 / 1500 | 112 | 4.6%
750 / 900 | 850 / 2040 | 121 | n/a
core:
4870 core / mem (MHz) | 5870 equivalent (MHz) | FPS | % increase
750 / 530 | 850 / 1200 | 100 | 0%
800 / 530 | 906 / 1200 | 103 | 3%
750 / 575 | 850 / 1300 | 104 | 0%
800 / 575 | 906 / 1300 | 107 | 2.9%
835 / 575 | 946 / 1300 | 110 | 2.8%
http://www.xtremesystems.org/forums/...7&postcount=88
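The % increase column is each row's FPS gain over the row before it. A small helper (hypothetical name) that recomputes it from the FPS values:

```python
def step_increases(fps):
    """Percent increase of each FPS value over the one before it, to 0.1%."""
    return [round((b - a) / a * 100, 1) for a, b in zip(fps, fps[1:])]

memory_runs = [100, 104, 107, 112]   # 750/530, 750/575, 750/618, 750/663
core_runs = [104, 107, 110]          # 750/575, 800/575, 835/575
print(step_increases(memory_runs))
print(step_increases(core_runs))
```

This reproduces the table except for the last memory step, which rounds to 4.7% rather than 4.6%.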
Lightman's 3d06 benchmark:
core / mem (MHz) | FPS | +FPS | +%
850 / 3600 | 91.7 | 0 | 0%
850 / 4000 | 95.2 | +3.5 | +3.8%
850 / 4400 | 97.9 | +2.7 | +2.9%
850 / 4800 | 100.3 | +2.5 | +2.5%
850 / 5200 | 102.4 | +2.1 | +2.1%
http://www.xtremesystems.org/forums/...1&postcount=23
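One way to see the diminishing returns is to divide each step's FPS gain by its memory clock gain (1.0 would mean FPS tracks memory clock perfectly). A sketch using Lightman's numbers; `scaling_per_step` is a hypothetical helper:

```python
mem_mhz = [3600, 4000, 4400, 4800, 5200]   # effective memory clock, core at 850 MHz
fps = [91.7, 95.2, 97.9, 100.3, 102.4]     # Lightman's 3d06 results

def scaling_per_step(clocks, results):
    """Relative FPS gain divided by relative clock gain, for each step."""
    steps = []
    for i in range(1, len(clocks)):
        fps_gain = (results[i] - results[i - 1]) / results[i - 1]
        clock_gain = (clocks[i] - clocks[i - 1]) / clocks[i - 1]
        steps.append(fps_gain / clock_gain)
    return steps

for mhz, s in zip(mem_mhz[1:], scaling_per_step(mem_mhz, fps)):
    print(f"-> {mhz} MHz: {s:.2f}")
```

Each extra 400 MHz buys a little less, and even the first step returns only about a third of the clock increase as FPS.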
Extrahardware.CZ Crysis 1920 × 1200, 4× AA
core / mem (MHz) | FPS | +FPS | +%
memory:
850 / 4400 | 40.9 | 0 | 0%
850 / 4800 | 42.0 | +1.1 | +2.7%
850 / 5200 | 43.1 | +1.1 | +2.6%
core:
785 / 4800 | 40.1 | 0 | 0%
850 / 4800 | 42.0 | +1.9 | +4.7%
915 / 4800 | 43.2 | +1.2 | +2.9%
core and memory:
785 / 4400 | 39.3 | 0 | 0%
850 / 4800 | 42.0 | +2.7 | +6.9%
900 / 5200 | 44.7 | +2.7 | +6.4%
http://www.extrahardware.cz/pretakto...adeonu-hd-5870
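Because the core and memory steps are different sizes, comparing FPS gain per percent of clock gain is fairer than comparing raw FPS. A sketch using the first step of each Extrahardware.cz sweep (the `scaling` helper is hypothetical; 1.0 would mean FPS rises as fast as the clock):

```python
def scaling(clock_old, clock_new, fps_old, fps_new):
    """Relative FPS gain per relative clock gain (1.0 = FPS tracks the clock exactly)."""
    fps_gain = (fps_new - fps_old) / fps_old
    clock_gain = (clock_new - clock_old) / clock_old
    return fps_gain / clock_gain

# Extrahardware.cz Crysis runs, first step of each sweep
memory = scaling(4400, 4800, 40.9, 42.0)   # core fixed at 850 MHz
core = scaling(785, 850, 40.1, 42.0)       # memory fixed at 4800 MHz
print(f"memory: {memory:.2f}, core: {core:.2f}")
```

In this run the core returns nearly twice as much FPS per percent of clock as the memory does.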
Firingsquad Crysis 1920 × 1200, 2× AA
core / mem (MHz) | FPS | +FPS | +% (from stock)
memory:
850 / 4800 | 31.6 | 0 | 0%
850 / 5272 | 32.3 | +0.6 | +2.2%
core:
850 / 4800 | 31.6 | 0 | 0%
930 / 4800 | 33.1 | +1.5 | +4.7%
core and memory:
850 / 4800 | 31.6 | 0 | 0%
930 / 5400 | 34.3 | +2.7 | +8.5%
http://firingsquad.com/hardware/ati_...king/page5.asp
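If the core and memory overclocks acted independently, their gains would roughly compound. A quick check against FiringSquad's combined run, as a sketch (a simple multiplicative model, not a claim about how the GPU actually behaves):

```python
base = 31.6        # 850/4800 (stock)
mem_only = 32.3    # 850/5272
core_only = 33.1   # 930/4800
both = 34.3        # 930/5400

# If the two overclocks acted independently, their gains would compound.
predicted = (mem_only / base) * (core_only / base) - 1
observed = both / base - 1
print(f"predicted: +{predicted:.1%}, observed: +{observed:.1%}")
```

The combined run beats the compounded prediction, but its memory clock (5400 MHz) is higher than in the memory-only run (5272 MHz), so some of the extra gain may simply be the additional bandwidth.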
Meh. You can get gains from overclocking the core, the memory, or both together, and you (usually) get higher gains from overclocking the core. Only in Batman did FiringSquad get higher performance from overclocking memory vs. core. Conclusion: overclock core and memory as much as possible. Memory bandwidth isn't the bottleneck, otherwise we would get relatively no gain from core overclocking...:confused:?
Yeah seems like it... jeez.
I wouldn't say it isn't memory bandwidth bottlenecked just because a core overclock works. A memory bandwidth bottleneck isn't a hard cap like the GPU core. In my experience, overclocking the 4870's memory by over 10% from stock returns less than 1% FPS gain, so in comparison, yes, the 5870 is relatively bandwidth starved compared to the 4870/4890. Batman highlights memory's importance: core +9% and memory +12% increased FPS by 10.5%.
Jimmy - what situation maxes out memory bandwidth? Highest texture quality + quality AA filtering? I'm not sure if this maximizes the need for bandwidth. I know it loads up a greater amount of video memory potentially maximizing capacity, but how do you go about maxing out bandwidth? Is there a special test?
So you DO think it has a memory bottleneck? IMO it could definitely use faster memory. IDC @ AnandTech says, "If you can overclock the GPU cores and see a performance improvement that exceeds that which comes from increasing the memory clocks then that is about as close to proof you are going to get that your compute system is not memory bandwidth constrained."
I don't know. Maybe look for game benchmarks where the 4870 outperformed the 4850 by significantly more than 20%. 16xAF could be one such scenario. A colorfill test could show limitations here too: http://www.techreport.com/articles.x/17618/6
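The 20% threshold is the core clock gap between the two cards; the bandwidth gap is much larger. A sketch, assuming reference-board specs (HD4850: 625 MHz core, 993 MHz GDDR3; HD4870: 750 MHz core, 900 MHz GDDR5; both 256-bit with 800 ALUs); `bandwidth_gbs` is a hypothetical helper:

```python
def bandwidth_gbs(bus_bits, effective_mts):
    """Peak memory bandwidth in GB/s from bus width and effective transfer rate (MT/s)."""
    return bus_bits / 8 * effective_mts / 1000

hd4850 = {"core": 625, "bw": bandwidth_gbs(256, 1986)}   # GDDR3 at 993 MHz (double data rate)
hd4870 = {"core": 750, "bw": bandwidth_gbs(256, 3600)}   # GDDR5 at 900 MHz (quad data rate)

core_ratio = hd4870["core"] / hd4850["core"]
bw_ratio = hd4870["bw"] / hd4850["bw"]
print(f"core: +{core_ratio - 1:.0%}, bandwidth: +{bw_ratio - 1:.0%}")
```

With about 20% more core clock and about 81% more bandwidth, a 4870 lead well beyond 20% points to a bandwidth-sensitive scenario.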
Well, the GPU core is doing the actual rendering work, so increasing its clock usually gains more unless you are really badly bottlenecked by slow memory.
Has it been confirmed that all cards can have their voltages adjusted if flashed with the ASUS bios?
What CPU speed did they test with?
Might it stop scaling because the CPU is limiting?
thx JimmyH for the 4870 results! :toast:
So reducing the 4870's bandwidth by 88% to the same bandwidth/compute ratio as a 5870 results in a mere 20% performance drop. Sounds like yet another hint that the 5870 is NOT held back much by memory bandwidth...
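The size of the cut and the FPS change can be recomputed from JimmyH's 750/900 and 750/530 CoD4 runs (121 and 100 FPS). A short sketch using only those numbers:

```python
stock_mem, matched_mem = 900, 530   # HD4870 memory clock (MHz), core at 750 MHz
stock_fps, matched_fps = 121, 100   # JimmyH's CoD4 results at those settings

bw_cut = 1 - matched_mem / stock_mem      # same bus width, so clock ratio = bandwidth ratio
fps_drop = 1 - matched_fps / stock_fps
print(f"bandwidth: -{bw_cut:.0%}, FPS: -{fps_drop:.0%}")
```

Measured from the stock setting, that is a cut of about 41% in bandwidth (equivalently, stock has about 70% more) for a drop of about 17% in FPS (stock is about 21% faster).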
I think he's saying the thread count is NOT a limitation, because even if all parts of the GPU are fully loaded, it's only using 30% of the maximum number of threads the dispatch processor can coordinate. It can handle that many threads because in CrossFire one dispatch processor apparently runs as master and oversees the threads running on all GPUs in the system, hence the hint that in quad-GPU configs the thread dispatch MIGHT be a limit.
3870 to 4870 was a 150% shader unit boost that resulted in a 100% performance boost. This time we have a 100% boost not only in shader units but in TMUs and ROPs too! Yet the performance boost is only 40%, or even less in some cases... That would be as if the 4870 had been only 60% faster with a 150% logic boost instead of 100% faster. There's definitely something limiting...
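The comparison treats performance gain divided by logic gain as a scaling efficiency. A sketch with the post's own percentages (a linear model, which is a simplification; `scaling_efficiency` is a hypothetical name):

```python
def scaling_efficiency(logic_increase, perf_increase):
    """Share of the added execution resources that shows up as added performance."""
    return perf_increase / logic_increase

rv670_to_rv770 = scaling_efficiency(1.50, 1.00)  # +150% shaders, +100% performance
rv770_to_rv870 = scaling_efficiency(1.00, 0.40)  # +100% shaders/TMUs/ROPs, ~+40% performance
print(f"RV670 -> RV770: {rv670_to_rv770:.0%} efficiency")
print(f"RV770 -> RV870: {rv770_to_rv870:.0%} efficiency")
# RV870's efficiency applied to RV770's +150% gives the hypothetical 4870 figure
print(f"RV770 at that efficiency: +{1.50 * rv770_to_rv870:.0%}")
```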
L1 to L2 cache bandwidth... interesting!
I was looking for RV770 figures but couldn't find any...
L1 to L2 barely increased at all... but then again, doesn't each 5-way processor (or ALU, or whatever you want to call it) have its own L1? And each group of those shares the L2, right? The grouping hasn't changed, so L1 to L2 bandwidth actually shouldn't matter and could have remained the same...
Maybe it actually is the same if you normalize those numbers for clock speed on RV770 and RV870?
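Normalizing per clock answers the question directly. A sketch, assuming the RV770 figures in the table were quoted for an HD4870 at 750 MHz and the RV870 figures for an HD5870 at 850 MHz (the table does not say which clocks were used):

```python
def bytes_per_clock(gb_per_s, core_mhz):
    """Convert a bandwidth figure (GB/s) into bytes moved per core clock."""
    return gb_per_s * 1e9 / (core_mhz * 1e6)

# Assumed clocks: RV770 figures at 750 MHz (HD4870), RV870 at 850 MHz (HD5870)
rows = [("L1 cache", 480, 1000), ("L1 to L2", 384, 435)]
for name, rv770, rv870 in rows:
    print(f"{name}: {bytes_per_clock(rv770, 750):.0f} vs {bytes_per_clock(rv870, 850):.0f} bytes/clock")
```

On that assumption the L1 to L2 path moves the same ~512 bytes per clock on both chips, so its ~13% increase comes entirely from the higher core clock (850/750 is also ~1.13), while L1 bandwidth per clock rises by roughly 84%.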
It depends on what it is doing. You can see here that ATI is the Intel of synthetic benchmarks: http://www.bit-tech.net/hardware/gra...ture-review/10. The wider you make a vector, the harder it is to keep it under full load.
On average, one flop takes 1 byte of memory traffic. For RV870 this translates to roughly 2.7 terabytes per second of required bandwidth, so every GPU made is bottlenecked by this. The only way around it is to further raise the ratio of calculations to memory operations. It's already 100:1, but it must go higher.
Quote:
Then about the performance of the HD5870: it does not seem memory bandwidth limited to me. But I also think there is more performance inside this core than we see now. It might be driver related. It won't surprise me if we get up to 20% higher performance in the future.
Now that the 5850 is out, I think it's clear that something is holding the 5870 back, but what?
http://www.anandtech.com/video/showdoc.aspx?i=3650&p=14
Quote:
Conclusion
When you take the Cypress based Radeon HD 5870 and cut out 2 SIMDs and 15% of the clock speed to make a Radeon HD 5850, on paper you have a card 23% slower. In practice, that difference is only between 10% and 15% depending on the resolution. What’s not a theory is AMD’s pricing: they may have cut off 15% of the performance to make the 5850, but they have also cut the price by well more than 15%; 31% to be precise.
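The 23% figure follows from multiplying the two cuts, assuming Cypress has 20 SIMDs in total and the HD5850 runs at 725 MHz against the HD5870's 850 MHz:

```python
simd_ratio = 18 / 20      # HD5850: 2 of Cypress's 20 SIMDs disabled
clock_ratio = 725 / 850   # HD5850 vs HD5870 core clock (MHz)

paper = simd_ratio * clock_ratio
print(f"HD5850 on paper: {paper:.1%} of an HD5870 ({1 - paper:.0%} slower)")
print(f"HD5870 paper advantage: +{1 / paper - 1:.0%}")
```

Inverted, the HD5870 has about 30% more paper throughput than the HD5850, while the measured gap is only 10-15%.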