what cpu speed did they test with?
scaling might stop because the cpu is the limit?

thx JimmyH for the 4870 results!
so reducing the 4870's bandwidth by 88% to the same bw/compute ratio as a 5870 results in a mere 20% performance drop. sounds like yet another hint that the 5870 is NOT held back much by memory bandwidth...

Originally posted by demonkevy666:
so you're saying the bottleneck is the thread dispatch and it's only being used at about 31.25%? what if it were redesigned to use all 1024 dispatch threads at once and not have those 5 alus grouped? 5 alus is one SIMD. changing this to all separate alus shouldn't be too hard, the alus themselves are quite small already.

it seems to me it's more like programmers will go for whatever is easy to cut down the time it takes to code things.
easy isn't the best possible way to do things.
i think he's saying the thread count is NOT a limitation, cause even if all parts of the gpu are fully loaded it's only using about 30% of the max possible threads the dispatch processor can coordinate. and it can handle that many threads cause in xfire one dispatch processor apparently runs as master and oversees the threads running on all gpus in the system, hence the hint that in quad gpu configs the thread dispatch MIGHT become a limit.

Originally posted by Chumbucket843:
i don't see any different scaling in benchmarks. the 4870 was 2x faster than rv670 and it had 2.5 times the shaders. i would not expect doubling the shaders to double effective performance. the problem might be the fact that there really isn't much to compare it to. might be l1 to l2 cache bandwidth.


                      RV770       RV870
Texture units         40          80
L1 cache bandwidth    480 GB/s    1,000 GB/s
L1 to L2 bandwidth    384 GB/s    435 GB/s
3870 to 4870 was a 150% shader unit boost that resulted in a 100% performance boost. this time we have a 100% boost of not only shader units but tmus and rops too! yet the perf boost is only 40% or even less in some cases... that would be as if the 4870 had only been 60% faster with a 150% logic boost instead of 100% faster. there's def something limiting...
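
to put rough numbers on that comparison, here is a minimal python sketch. it only uses the ballpark percentages from this post (150% logic for 100% perf, then 100% logic for 40% perf), and the function name is made up for illustration, so treat it as back-of-the-envelope math and not as benchmark data.

```python
def scaling_efficiency(logic_boost_pct, perf_boost_pct):
    """Performance gained per unit of added hardware (hypothetical helper)."""
    return perf_boost_pct / logic_boost_pct

# rough figures quoted in this post, not measured results
rv670_to_rv770 = scaling_efficiency(150, 100)   # 3870 -> 4870
rv770_to_rv870 = scaling_efficiency(100, 40)    # 4870 -> 5870

# what the 3870 -> 4870 jump would have delivered at the 5870's efficiency
hypothetical_gain = 150 * rv770_to_rv870

print(f"3870 -> 4870: {rv670_to_rv770:.2f} perf per unit of logic")
print(f"4870 -> 5870: {rv770_to_rv870:.2f} perf per unit of logic")
print(f"4870 at 5870-style scaling: about {hypothetical_gain:.0f}% faster")
```

so under these assumptions the 5870 gets noticeably less performance out of each added unit of hardware than the 4870 did.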

l1 to l2 cache bw... interesting!
was looking for 770 figures but couldn't find any...
l1 to l2 barely increased at all... but then again, doesn't each 5-way processor or alu or whatever you wanna call it have its own L1? and each group of those shares the l2, right? the grouping hasn't changed, so l1 to l2 bandwidth actually shouldn't matter and could have remained the same...

maybe it actually is the same if you normalize those numbers by clock speed for 770 and 870?
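
a quick python sketch of that normalization, using the l1 to l2 numbers from the table quoted above. the core clocks are my assumption and are not from this thread: 750 MHz for the 4870 and 850 MHz for the 5870, the reference clocks as far as i know. the function name is just made up for the example.

```python
def bytes_per_clock(bandwidth_gb_s, core_clock_mhz):
    """Convert a bandwidth in GB/s into bytes moved per core clock cycle."""
    return (bandwidth_gb_s * 1e9) / (core_clock_mhz * 1e6)

# l1 to l2 bandwidth from the quoted table; clocks are assumed reference clocks
rv770_bpc = bytes_per_clock(384, 750)   # 512.0
rv870_bpc = bytes_per_clock(435, 850)   # roughly 511.8

print(f"RV770: {rv770_bpc:.1f} bytes/clock")
print(f"RV870: {rv870_bpc:.1f} bytes/clock")
```

if those clocks are right, both chips move about 512 bytes per clock between l1 and l2, so the higher GB/s figure on the 870 would come from the clock bump alone.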