5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)



hurleybird
09-23-2009, 03:47 PM
The 5870 is fast, that's for sure, but looking like an enhanced and doubled 4890 it should be much faster, and it definitely shouldn't be beaten by 4890 CF across the board. The card has power, but obviously isn't able to use it very well. The two main culprits are immature drivers and a possible bandwidth bottleneck. While we have to wait and see on drivers, anyone with a 5870 in hand can easily test whether the memory system is at fault.

This is a call to anyone who has a 5870 right now, and if no one steps up to the plate I will when mine comes in. A good methodology would be dropping the core and memory frequencies in half, then observing how performance changes across various programs (especially those the 5870 doesn't scale well in, like HAWX) as core and memory are individually adjusted. Come on guys, let's get to it!

ownage
09-23-2009, 04:03 PM
Bs!

Razrback16
09-23-2009, 05:15 PM
I'd like to see a comparison between 4890 CF and a single 5870 in direct benchmarks on the same system. Didn't see any today in the reviews.

TheBlueChanell
09-23-2009, 09:00 PM
I'd imagine drivers also have something to do with it. The 5870 should pretty much always be faster than a 4870X2.

Mango31
09-23-2009, 10:06 PM
So it will take months again until the drivers catch up? *yawn*

Nobody should touch the latest and greatest until they run like they should - it would speed up this process by a fair margin... reds and greens alike :rolleyes:

cstkl1
09-23-2009, 11:22 PM
Don't think there will be a bottleneck from PCIe.

Compared to the 4890, the 5870 is bottlenecked by its memory bandwidth... if only it were 512-bit or 7-8GHz...
Then, in my opinion, it might bottleneck PCIe 2.0.

But in high-res games it practically kills the 4890...
This, I guess, is where bandwidth doesn't play as much of a role as computing power.

Jamesrt2004
09-24-2009, 01:44 AM
Don't think there will be a bottleneck from PCIe.

Compared to the 4890, the 5870 is bottlenecked by its memory bandwidth... if only it were 512-bit or 7-8GHz...
Then, in my opinion, it might bottleneck PCIe 2.0.

But in high-res games it practically kills the 4890...
This, I guess, is where bandwidth doesn't play as much of a role as computing power.

And min fps... it kills EVERYTHING at min fps :D

mcmeat51
09-24-2009, 02:20 AM
Compared to the 4890, the 5870 is bottlenecked by its memory bandwidth... if only it were 512-bit or 7-8GHz...
Then, in my opinion, it might bottleneck PCIe 2.0.

But in high-res games it practically kills the 4890...
This, I guess, is where bandwidth doesn't play as much of a role as computing power.

I think that's partly the case; however, from reading ~5 reviews so far, I think it's partly that and partly VRAM-bound. Looking at the benches, the only cards (single or multi-GPU) able to outperform it are cards with more than 1GB. If I had the money to buy a 5870, I would wait and get one (or two ;) ) with 2GB of memory, as I was assuming that would be the case.

Plus, I'm not sure how many places used the 9.10 beta drivers that AMD released; in the few test setups I scanned, they used 9.8 I think.

hurleybird
09-24-2009, 08:24 AM
Looking at the benches, the only cards (single or multi-GPU) able to outperform it are cards with more than 1GB. If I had the money to buy a 5870, I would wait and get one (or two ;) ) with 2GB of memory, as I was assuming that would be the case.

In multi-GPU cards, that 2GB is mostly the same texture memory mirrored for each GPU, so it behaves more like a 1GB card, although it's possible there are some optimizations to reduce memory usage.

I've been told my card is shipping this morning. When I get it, if no one else has stepped up to the plate, I'll settle once and for all whether memory bandwidth is what's holding the card back.

mcmeat51
09-24-2009, 09:24 AM
In multi-GPU cards, that 2GB is mostly the same texture memory mirrored for each GPU, so it behaves more like a 1GB card, although it's possible there are some optimizations to reduce memory usage.

True, however that seems like a trend to me. What is the theoretical memory bus throughput of a 5870 and the GTX 295?

Farinorco
09-24-2009, 09:45 AM
True, however that seems like a trend to me. What is the theoretical memory bus throughput of a 5870 and the GTX 295?

Theoretical memory bus throughput in a dual-GPU card (with AFR) is the same case as memory capacity. It has twice the bandwidth, because there is one bus per GPU, but chances are that most of the second bus's bandwidth is spent sending the second GPU the same data the first bus is sending to the first GPU, because both need the same data.

With Alternate Frame Rendering, you have to think of it as two completely different cards, each one rendering half of the frames (indeed, that's exactly what you have), so the 2nd card can render the next frame while the 1st is rendering the current one, provided the CPU has finished processing that next frame.

You can't compare theoretical specs of dual and single GPU solutions so easily.
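
To make that concrete, here's a rough sketch in Python of why doubling the bus in an AFR card doesn't double the useful bandwidth. The per-GPU figure and the mirrored fraction are illustrative assumptions, not measured values:

```python
# Rough model: each GPU in an AFR pair has its own bus, but a fraction of
# the second bus carries duplicate data (textures, geometry) that the
# first bus already carries.
def afr_effective(per_gpu_gbps, mirrored_fraction=1.0):
    paper_spec = 2 * per_gpu_gbps
    useful = per_gpu_gbps * (1 + (1 - mirrored_fraction))
    return paper_spec, useful

# Illustrative: a 4870 X2 has ~115.2 GB/s per GPU (900MHz GDDR5, 256-bit).
paper, useful = afr_effective(115.2)
print(f"paper spec: {paper:.1f} GB/s, useful if fully mirrored: {useful:.1f} GB/s")
```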

russian boy
09-24-2009, 11:40 AM
True, however that seems like a trend to me. What is the theoretical memory bus throughput of a 5870 and the GTX 295?

AFAIK, theoretically AFR on a GTX 295 means you have a GTX 275 with the same amount of SPs, TMUs, ROPs, and VRAM, but with effectively 2x the core, SP, and memory clocks, and 1 additional frame of input lag (or the same input lag as one GTX 275).

jaredpace
09-24-2009, 02:28 PM
I've been told my card is shipping this morning. When I get it, if no one else has stepped up to the plate, I'll settle once and for all whether memory bandwidth is what's holding the card back.

Take a couple of games and try them at three settings:
(1. medium quality, lower res, no AA)
(2. high quality, 1080p, 4xAA) and
(3. insane-o maximum quality, 1920 or 2560 or Eyefinity-range res, 8xAA, 16xAF)

Clock your memory from 800MHz to 1400+MHz in 100MHz or so increments, and record the data; a rough sketch of that sweep follows below. Guru3D has a new GPU tool for overclocking and overvolting core/mem on 5800 cards. I would be interested to see your benchmarks. Another exciting bench coming from a poster at ocforums.com is 5870 CrossFire tested extensively on a P55 vs. an X58 to compare 8x/8x vs. 16x/16x PCIe.

I am really interested to see how your memory performance benchmarks go. good luck, and thanks!
:clap:
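
For what it's worth, the sweep described above could be scripted along these lines; bench() is a placeholder for whatever benchmark you actually run, and the preset values are assumptions:

```python
# Placeholder sweep: three quality presets crossed with a memory-clock
# ladder. The preset details and bench() are stand-ins, not a real API.
presets = {
    "medium, lower res, no AA":   {"res": (1280, 1024), "aa": 0},
    "high, 1080p, 4xAA":          {"res": (1920, 1080), "aa": 4},
    "max, 2560/Eyefinity, 8xAA":  {"res": (2560, 1600), "aa": 8},
}

def bench(mem_mhz, settings):
    # Set the memory clock here (e.g. via a vendor tool), run the game's
    # timedemo at `settings`, and return the average FPS.
    ...

results = {}
for name, settings in presets.items():
    for mem_mhz in range(800, 1500, 100):   # 800..1400 MHz in 100MHz steps
        results[(name, mem_mhz)] = bench(mem_mhz, settings)
```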

hurleybird
09-24-2009, 02:37 PM
Actually, the plan is to clock both the memory and the core down to about half. That way the compute/bandwidth ratio stays the same and it's much easier to increase one or the other. Then I'll have the core or the memory stay the same while I increase one or the other in increments of maybe 150MHz. If the card reacts substantially more favorably to either core or memory frequency, we'll know there's an imbalance in the design.

Hopefully I get mine tomorrow (express shipping), but I do live somewhat north...

jaredpace
09-24-2009, 02:39 PM
That might work too, but I would try both ways just to replicate real world scenarios.

hurleybird
09-24-2009, 03:29 PM
We'll just have to see how far my card overclocks. If I can't get the memory high enough I won't get much data the other way. A good compromise might be to lower both clocks to 3/4 instead of half and go from there.

saaya
09-24-2009, 11:58 PM
Lots of talk, but barely anybody has a 5870 lol...
I'm really curious about this myself. The 5870 SHOULD be much faster than a 4890 considering ATI doubled everything except memory bandwidth, which is still improved notably over a 4890... so it's really odd the performance is only 40% higher than a 4890 on average... could it be that the shader cores are less efficient now that DX11 is added?

W1zzard disabled some SPs on his 5870 to simulate a 5850; it would be really cool if we could disable half the SPs and the connected ROPs, and ideally TMUs, to basically have a 5870 cut down to the same specs as a 4890 so we can do a clock-for-clock comparison...

db87
09-25-2009, 06:06 AM
BS mate!

http://www.techpowerup.com/reviews/AMD/HD_5870_PCI-Express_Scaling/4.html

jaredpace
09-25-2009, 06:40 AM
BS mate!

http://www.techpowerup.com/reviews/AMD/HD_5870_PCI-Express_Scaling/4.html
This review is testing PCIe lane link width. Hurleybird is testing memory bandwidth on the board by under- and over-clocking the memory frequency. :confused:

a 5870 cut down to the same specs as a 4890 so we can do a clock-for-clock comparison...
that would be sweet.

Lightman
09-25-2009, 07:26 AM
What tests do you want, guys?

I can run them for you now :)

BTW, the first and foremost bottleneck is triangle setup. Same speed per clock as in RV770!
Let's look for more :yepp:

russian boy
09-25-2009, 08:53 AM
BTW, the first and foremost bottleneck is triangle setup. Same speed per clock as in RV770!

So G300 won't be much faster than the GTX 285 either?

jaredpace
09-25-2009, 09:00 AM
What tests do you want, guys?
I can run them for you now :)


Bench a video game at different RAM frequencies, 800MHz-1400MHz :D

Lightman
09-25-2009, 10:24 AM
Bench a video game at different RAM frequencies, 800MHz-1400MHz :D

I came close; CCC limits memory to 900-1300MHz :)

All tests done @1920x1200

http://img12.imageshack.us/img12/5499/scaling850900.png

http://img12.imageshack.us/img12/2028/scaling8501000.png

http://img12.imageshack.us/img12/4706/scaling8501100.png

http://img12.imageshack.us/img12/8480/scaling8501200.png

http://img12.imageshack.us/img12/3311/scaling8501300.png

http://img12.imageshack.us/img12/6264/scaling9001300.png

Enjoy! :D

W1zzard
09-25-2009, 10:29 AM
Don't forget to check for memory error correction (more details on the overclocking page of my review).

hurleybird
09-25-2009, 10:48 AM
Well, from the initial results it looks like the HD 5870 responds to more memory bandwidth, but a 100MHz memory OC doesn't tell us much. For all we know, 150 more MHz on the memory would stop scaling. What we want to know is the point where overclocking the memory stops giving more performance, as well as the point where overclocking the core stops giving more performance. Here's a better methodology:

1. Download AMD GPU clock tool for HD 5870 (http://downloads.guru3d.com/AMD-GPU-Clock-Tool-v0.9.26.0-For-HD-5870-download-2383.html)

2. Underclock both core and memory to half frequency (425/600) to keep the same compute/bandwidth ratio.

3. Test at that frequency

4. While keeping the core @ 425MHz, start increasing the memory clock by some increment, say 50-100MHz, until it stops scaling or becomes unstable.

5. While keeping the memory @ 600MHz, start increasing the core clock by some increment, say 50-100MHz*, until it stops scaling or becomes unstable.

6. Repeat with as many programs as you care for.

7. Analyze

*Might be easier to use the same % increment as for memory. For example, if you use 100MHz increments on both, each increment is more substantial for compute resources because it starts at a lower frequency (425 vs. 600MHz). If you want proper proportions, a 100MHz increase on memory has the same significance as a ~71MHz (70.833) increase in compute. In other words, multiply whatever you choose as the memory increment by 0.70833 to get the amount you should increment the core by; see the sketch below.
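
A quick sketch of that proportional-increment arithmetic (clock values from step 2 above):

```python
# Same relative step for both domains: core_step/425 == mem_step/600.
BASE_CORE, BASE_MEM = 425, 600          # half of the 5870's stock 850/1200

def core_step_for(mem_step_mhz):
    return mem_step_mhz * BASE_CORE / BASE_MEM   # = mem_step * 0.70833

print(core_step_for(100))   # ~70.8 MHz of core per 100 MHz of memory
print(core_step_for(150))   # ~106.3 MHz
```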

Raptor-X
09-25-2009, 10:49 AM
Don't forget to check for memory error correction (more details on the overclocking page of my review).

Link Please

W1zzard
09-25-2009, 10:51 AM
http://www.techpowerup.com/reviews/ATI/Radeon_HD_5870/33.html

Raptor-X
09-25-2009, 10:53 AM
http://www.techpowerup.com/reviews/ATI/Radeon_HD_5870/33.html

Thank you.

jaredpace
09-25-2009, 10:58 AM
Thanks Lightman. Wish elmore would let us know how he got his vmem up and clocked the memory at 1400MHz...
However, from your tests, the Canyon Flight FPS @ 850MHz core:

memory MHz / FPS / FPS increase
 900 /  91.7 /  -   ( 0%)
1000 /  95.2 / +3.5 (+3.8%)
1100 /  97.9 / +2.7 (+2.9%)
1200 / 100.3 / +2.5 (+2.5%)
1300 / 102.4 / +2.1 (+2.1%)

According to this test, gains are still evident as the memory clock increases from 1200 to 1300MHz. Even though they diminish as the clock goes higher, you are still getting benefits above the stock memory clock, which backs people's assumption that the card is bottlenecked @ 153GB/s. I would like to see memory at 1400MHz; if it yields a ~2% increase over 1300MHz, that's even further proof.
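
For anyone checking the math, the per-step percentages above fall straight out of Lightman's Canyon Flight numbers:

```python
# Lightman's Canyon Flight FPS at a fixed 850MHz core.
readings = {900: 91.7, 1000: 95.2, 1100: 97.9, 1200: 100.3, 1300: 102.4}

clocks = sorted(readings)
for prev, cur in zip(clocks, clocks[1:]):
    gain = readings[cur] - readings[prev]
    print(f"{prev} -> {cur} MHz: +{gain:.1f} fps (+{100 * gain / readings[prev]:.1f}%)")
```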

Lightman's overall FPS is still increasing, so his memory is not producing errors yet.


"Overclocking the memory on these cards is quite different from any other card so far. Normally you'd expect rendering errors or crashes, but not with these cards. Thanks to the new error correction algorithm in the memory controller, every memory error is just retransmitted until everything is fine. So once you exceed the "stable" clock frequency, memory errors will appear more often, get retransmitted, but the rendered output will still look perfectly fine. The only difference is that performance drops, the further you increase the clocks, the lower the performance gets. As a result a normal "artifact scanning" approach to memory overclocking on the HD 5800 Series will not work. You have to manually increase the clocks and observe the framerate until you find the point where performance drops."
http://www.techpowerup.com/reviews/ATI/Radeon_HD_5870/33.html
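
In other words, the usual artifact scan has to be replaced by a performance-knee search. A minimal sketch of that loop, assuming a bench(mhz) callable that sets the clock and returns average FPS (the step and tolerance values are arbitrary):

```python
def find_mem_ceiling(bench, start_mhz=1200, step_mhz=25, tolerance_fps=0.5):
    """Climb the memory clock until EDC retransmissions eat the gain.
    bench(mhz) must set the clock and return a stable average FPS."""
    best_clock, best_fps = start_mhz, bench(start_mhz)
    mhz = start_mhz + step_mhz
    while True:
        fps = bench(mhz)
        if fps < best_fps + tolerance_fps:   # no real improvement: past the knee
            return best_clock
        best_clock, best_fps = mhz, fps
        mhz += step_mhz
```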

Lightman
09-25-2009, 11:40 AM
Yes, I'm aware of the ECC on Cypress. So far my memory won't cooperate @1400, but it is working @1350 (more tests needed to check for ECC effects).
This leads me to believe 1300MHz should be stable.
BTW, my core scales to 950MHz on default volts.

From my tests, what caught my attention was the fillrate tests and how nicely they respond to memory (for pixel) and engine (for texel) clocks! Also, Perlin noise seems to be purely GPU-limited.

I think, for what it's worth, 1200MHz is a good compromise at this point in time.
This leaves AMD with the door open for an HD 5890 with higher clocks and at the same time higher performance! It looks like 1GHz/1400 should be doable on current tech using 6Gbps ICs (interconnect limited?).

saaya
09-26-2009, 01:05 AM
BTW, the first and foremost bottleneck is triangle setup. Same speed per clock as in RV770!
Let's look for more :yepp:
Huh? How come?

But it can't be a big limitation, otherwise a 4870X2 would pull ahead notably in most tests...

saaya
09-26-2009, 01:51 AM
Thanks Lightman for the info!
You don't happen to have a 4890 or 4870, or better yet a 4870X2, so you can do exactly the same thing on that card in the same rig? :D

I threw the numbers into a graph:
Pixel shader, multi-texture fillrate, Perlin noise, and simple vertex shader are not bandwidth-limited... yes, some of these increase slightly, but the boost they get from increasing the memory clock, and hence bandwidth, by almost 50% can be ignored...

Complex vertex shader, shader particles, and single-texture fillrate are clearly bandwidth-limited... but the gains are linear and proportional, so the GPU is definitely not bandwidth-crippled or starved like some people suggested... I don't think memory bandwidth is limiting the 5870... I think it's internal bandwidth or some internal architecture bottleneck...

CrimInalA
09-26-2009, 02:24 AM
very interesting information :up:

demonkevy666
09-26-2009, 01:42 PM
Thanks Lightman for the info!
You don't happen to have a 4890 or 4870, or better yet a 4870X2, so you can do exactly the same thing on that card in the same rig? :D

I threw the numbers into a graph:
Pixel shader, multi-texture fillrate, Perlin noise, and simple vertex shader are not bandwidth-limited... yes, some of these increase slightly, but the boost they get from increasing the memory clock, and hence bandwidth, by almost 50% can be ignored...

Complex vertex shader, shader particles, and single-texture fillrate are clearly bandwidth-limited... but the gains are linear and proportional, so the GPU is definitely not bandwidth-crippled or starved like some people suggested... I don't think memory bandwidth is limiting the 5870... I think it's internal bandwidth or some internal architecture bottleneck...

http://www.xbitlabs.com/articles/video/display/radeon-hd5870.html

I'm still looking for this "bottleneck" in the architecture.
Page three...

chispy
09-26-2009, 02:22 PM
http://www.techpowerup.com/reviews/ATI/Radeon_HD_5870/33.html

Thank you W1zzard :up:.

Regards. Chispy

hurleybird
09-26-2009, 03:32 PM
Well, I'm getting my card on Monday and should be able to do some more detailed tests, but at first glance it doesn't seem like bandwidth is that big of a bottleneck. Either there's some obscure and unforeseen bottleneck in the architecture, in which case we can probably expect some very slightly modified but much faster, fixed "RV890" chip to compete against GT300, perhaps timed to coincide with Nvidia's launch. More likely, though, it's just immature drivers and broken optimizations that need to be updated for RV870. The fact that the card is launching relatively early, while Eyefinity isn't working over CF and SSAA is a broken blurry mess in the current drivers, would seem to suggest this.

Again, there's no reason an enhanced and doubled HD 4890 shouldn't outperform the HD 4870X2, 4890 CF, and the GTX 295. At the moment performance is good, but it isn't anywhere close to where it should be.

wiak
09-26-2009, 08:40 PM
Sounds like a good combo: GDDR5 memory rated at 5Gbps, plus a 1GHz core and 2GB of memory.
If you remember, going from 512MB to 1GB on 4870 cards did boost performance.

STEvil
09-27-2009, 12:53 AM
Just so you guys know, clocking the card to 1/2 speed will not result in the performance "scaling" to 1/2 performance.

Crankybugga
09-27-2009, 03:02 AM
Just so you guys know, clocking the card to 1/2 speed will not result in the performance "scaling" to 1/2 performance.
Don't let any facts get in the way of good ol' 5870 bashing. :rofl:

I can see some experts here trying to track down some mysterious bottleneck crippling the HD 5870, even though it outperforms Nvidia's biggest and best dual-GPU monstrosity in some game benchmarks and poses a serious threat in most others. :)
Only nvidiots would portray a 40% performance improvement between models as too little, then claim to know about a mysterious bottleneck to back this opinion.
It's the same every model release when the opposing side feels threatened. :up:

dekruyter
09-27-2009, 04:21 AM
Don't let any facts get in the way of good ol' 5870 bashing. :rofl:

I can see some experts here trying to track down some mysterious bottleneck crippling the HD 5870, even though it outperforms Nvidia's biggest and best dual-GPU monstrosity in some game benchmarks and poses a serious threat in most others. :)
Only nvidiots would portray a 40% performance improvement between models as too little, then claim to know about a mysterious bottleneck to back this opinion.
It's the same every model release when the opposing side feels threatened. :up:

Gotta say, I love this response, and agree. The 5870 is what it is... a "product", built on design, manufacturing, cost, and marketing/sales limitations. If the components are not equally matched, and thus there is hidden performance, so what. It is what it is. No one could ever claim that the 5870's components were perfectly tuned... say, to within a few % of each other. It is "technically" interesting what these tests show, but in the real world what do they mean... they certainly have nothing to do with the decision to buy, or not buy.

And for disclosure's sake, I get my XFX 5870 on Monday.

okorop
09-27-2009, 04:26 AM
Gotta say, I love this response, and agree. The 5870 is what it is... a "product", built on design, manufacturing, cost, and marketing/sales limitations. If the components are not equally matched, and thus there is hidden performance, so what. It is what it is. No one could ever claim that the 5870's components were perfectly tuned... say, to within a few % of each other. It is "technically" interesting what these tests show, but in the real world what do they mean... they certainly have nothing to do with the decision to buy, or not buy.

And for disclosure's sake, I get my XFX 5870 on Monday.

I agree with you, but that 2% more performance from only overclocking the memory is a sign that the card is a little bit bandwidth-limited, at least in some games. I will buy two 5870s for sure, but I would like to see some tests with the memory at 1400MHz to see the difference...

CrimInalA
09-27-2009, 05:51 AM
Gotta say, I love this response, and agree. The 5870 is what it is... a "product", built on design, manufacturing, cost, and marketing/sales limitations. If the components are not equally matched, and thus there is hidden performance, so what. It is what it is. No one could ever claim that the 5870's components were perfectly tuned... say, to within a few % of each other. It is "technically" interesting what these tests show, but in the real world what do they mean... they certainly have nothing to do with the decision to buy, or not buy.

And for disclosure's sake, I get my XFX 5870 on Monday.

This has everything to do with buying the best product and overclocking it to the max. We are on XtremeSystems here...
Most people here want to get the most out of the product they have bought. And not only that, but we want to learn more about this product and squeeze every last bit of performance out of it.

You seem to be one of those few people on here who is not too interested in this material. But that's OK too :up:

saaya
09-27-2009, 06:05 AM
I'm still looking for this "bottleneck" in the architecture.
Page three...
My guess is that having the units in two 800-SP blocks might cause problems for the HW scheduler... that should be possible to at least improve, if not fix, with driver updates though, I think...


Well, I'm getting my card on Monday and should be able to do some more detailed tests, but at first glance it doesn't seem like bandwidth is that big of a bottleneck. Either there's some obscure and unforeseen bottleneck in the architecture, in which case we can probably expect some very slightly modified but much faster, fixed "RV890" chip to compete against GT300, perhaps timed to coincide with Nvidia's launch. More likely, though, it's just immature drivers and broken optimizations that need to be updated for RV870. The fact that the card is launching relatively early, while Eyefinity isn't working over CF and SSAA is a broken blurry mess in the current drivers, would seem to suggest this.



Again, there's no reason an enhanced and doubled HD 4890 shouldn't outperform the HD 4870X2, 4890 CF, and the GTX 295. At the moment performance is good, but it isn't anywhere close to where it should be.
I'll correct myself: bandwidth IS limiting the 5870, but not by a lot...

Yup, I find it hard to believe that a 4870X2 uses its two GPUs 100% efficiently and hence performs the same as a chip that has the same resources in one GPU... that makes no sense whatsoever...

Not to mention the 5870 is even slower on average than a 4870X2... there's definitely something limiting...


Just so you guys know, clocking the card to 1/2 speed will not result in the performance "scaling" to 1/2 performance.
Yeah, like with any chip there is a base performance... the 5870's response to memory bandwidth is about the same as most VGAs though; it doesn't seem very memory bandwidth limited at all...


Don't let any facts get in the way of good ol' 5870 bashing. :rofl:

I can see some experts here trying to track down some mysterious bottleneck crippling the HD 5870, even though it outperforms Nvidia's biggest and best dual-GPU monstrosity in some game benchmarks and poses a serious threat in most others. :)
Only nvidiots would portray a 40% performance improvement between models as too little, then claim to know about a mysterious bottleneck to back this opinion.
It's the same every model release when the opposing side feels threatened. :up:
Sigh, thanks for... contributing to this thread :up:

If only there were more people like you who just ignorantly take things for what they are and mock people who try to understand and possibly improve things around them, this world would be a much better place... NOT :p:

You might wanna check out Afghanistan next vacation, you'd find lots of friends over there for sure :D
Fertilizers? God made plants grow fast enough as it is, there is no limitation slowing them down, impossible! Only a blasphemer would doubt god's perfect creations! ATI is great! Err, god I mean! :lol:


This has everything to do with buying the best product and overclocking it to the max. We are on XtremeSystems here...
Most people here want to get the most out of the product they have bought. And not only that, but we want to learn more about this product and squeeze every last bit of performance out of it.

You seem to be one of those few people on here who is not too interested in this material. But that's OK too :up:
Well said bro! :toast:

And yes, if you're not interested in this, that's fine, but why would you bash people and call them fanboys and idi0ts for being curious how the tech they bought works and how to make it faster? This is THE essence of overclocking and XtremeSystems!!! Posting here, on THE overclocking, tweaking and PC technology forum worldwide, mocking this is quite bizarre...

saaya
09-27-2009, 10:03 AM
This might be interesting to benchmark! I'd love to test a 4870 vs. a 5870 with half its blocks disabled to see how big of a difference this makes!


The computing cores can communicate on both local and global levels. ATI claims a considerable increase in cache bandwidth. Particularly, the speed of fetching data from the L1 cache is now as high as 1 terabyte per second, while the bandwidth of the link between the L1 and L2 caches is increased to 435GBps. The L2 caches have grown from 64 to 128KB.

Xbitlabs keeps mentioning the reduced bandwidth "probably" causing some impact at high resolutions compared to the 4870X2... they claim to have written this passage of the article before doing any tests, but I don't buy it ;)
I'm pretty sure they KNEW what performance was like when they wrote this part... :D

Anyway, if bandwidth is really the limiting factor, then why doesn't the 5870 drop behind the 4870X2 notably at high resolutions and with high anti-aliasing modes, and why doesn't the 5870 perform notably better at low resolutions where memory bandwidth doesn't matter?

Yes, you CAN see this trend, it's definitely there, but it's veeery subtle... we are talking about a few percentage points here, nothing major like you would expect from the 5870 having more than 30% less bandwidth than a 4870X2...

I think I found something...
RV870 is a REALLY massive chip with a huge load of raw processing power... keeping all those SPs busy is not easy... and I think that's what's holding the 5870 back...

http://img24.imageshack.us/img24/5228/d11vsd10.png

The thread limit was only increased from 768 to 1024...
Could this explain the limitation? 768 to 1024 is only a ~33% boost... and a 5870 is roughly 40% faster than a 4890...

hurleybird
09-27-2009, 10:18 AM
First of all, that's just the number of compute threads possible, and it has nothing to do with limiting shader power in games. Secondly, the 5870 can only run a maximum of 1600 / 5 = 320 threads, because you can only run one thread per group of 5 "stream processors", so in terms of DirectCompute even TriFire 5870s will not be held back by the thread limit, although QuadFire might be.
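
The arithmetic behind that claim, spelled out (ALU count and thread limit as cited above):

```python
ALUS, GROUP_WIDTH, DC_THREAD_LIMIT = 1600, 5, 1024

threads_per_gpu = ALUS // GROUP_WIDTH      # 320 threads on one 5870
for gpus in (1, 2, 3, 4):
    total = threads_per_gpu * gpus
    status = "under" if total <= DC_THREAD_LIMIT else "over"
    print(f"{gpus} GPU(s): {total} threads ({status} the {DC_THREAD_LIMIT} limit)")
# 3 GPUs: 960 (under), 4 GPUs: 1280 (over) -- TriFire fine, QuadFire not.
```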

Chumbucket843
09-27-2009, 10:19 AM
Thanks Lightman for the info!
You don't happen to have a 4890 or 4870, or better yet a 4870X2, so you can do exactly the same thing on that card in the same rig? :D

I threw the numbers into a graph:
Pixel shader, multi-texture fillrate, Perlin noise, and simple vertex shader are not bandwidth-limited... yes, some of these increase slightly, but the boost they get from increasing the memory clock, and hence bandwidth, by almost 50% can be ignored...

Complex vertex shader, shader particles, and single-texture fillrate are clearly bandwidth-limited... but the gains are linear and proportional, so the GPU is definitely not bandwidth-crippled or starved like some people suggested... I don't think memory bandwidth is limiting the 5870... I think it's internal bandwidth or some internal architecture bottleneck...

They doubled the registers and caches, so internally it should be fine.

Chumbucket843
09-27-2009, 10:22 AM
I think I found something...
RV870 is a REALLY massive chip with a huge load of raw processing power... keeping all those SPs busy is not easy... and I think that's what's holding the 5870 back...

http://img24.imageshack.us/img24/5228/d11vsd10.png

The thread limit was only increased from 768 to 1024...
Could this explain the limitation? 768 to 1024 is only a ~33% boost... and a 5870 is roughly 40% faster than a 4890...

It is held back by the compiler. I don't see how this would be any different from RV770, though, so obviously scaling linearly with shaders isn't going to happen for ATI unless they change something in their memory system.

iandh
09-27-2009, 01:01 PM
I'm benching my unlocked 720BE against a guy that has a 5870 and an i7 over at OCN... results are less than exciting (for me at least).

I just ran Crysis Warhead bench 0.33 (avalanche flythrough)


1920 0xAA

i7 @ 4.2Ghz Min: 29.55 Max: 46.81 Avg: 37.04

Phenom II x4 @ 3.6Ghz Min: 25.12 Max: 36.73 Avg: 30.99


1920 8xAA

i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

Phenom II x4 @ 3.6Ghz Min: 18.24 Max: 29.33 Avg: 23.83



Note: unlocking from X3 to X4 gave me a slight avg FPS increase, and almost a 2 FPS increase in min FPS.

It appears that since the min FPS numbers are all very close, they reveal spots where the bench is GPU-limited, while the max FPS numbers are quite different, showing spots where the bench is CPU-limited.

Anyone who owns a 5870 and wants to add benches to the mix is welcome... I'll run the same bench if possible on my system and we can see what can be worked out. OBVIOUSLY a 3.6GHz Phenom II is no match for a 4.2GHz i7, but I would have thought the results would be much more GPU-limited in this bench. :confused:

jaredpace
09-27-2009, 01:46 PM
Crysis likes CPU as much as GPU.

Gurr
09-27-2009, 01:47 PM
Not sure why you're less than excited. You're 600MHz below him and only 5-10 FPS behind him in possibly the most graphically intense game there is. Still.

Nice results; are you two going to be doing any other tests together?

iandh
09-27-2009, 01:55 PM
Crysis likes CPU as much as GPU.

Yeah, I'm going to try to do some other titles.


Not sure why you're less than excited. You're 600MHz below him and only 5-10 FPS behind him in possibly the most graphically intense game there is. Still.

Nice results; are you two going to be doing any other tests together?

Less than excited because I'm a hardware whore, and I don't need any excuses, no matter how weak, to buy new hardware :D

Considering our min FPS are still pretty much the same, meaning we both would get roughly the same gameplay experience, I still love my little engine-that-could Phenom II :)

The OP of that thread (the guy I'm benching against) seems to have disappeared, so ATM no, but I'm hoping to round up someone else with a 5870 and a heavily OC'ed i7.

demonkevy666
09-27-2009, 03:35 PM
I can't believe you guys think it's really only 320 shaders.

Chumbucket843
09-27-2009, 04:26 PM
I can't believe you guys think it's really only 320 shaders.

It IS 320 shaders. There are 5 ALUs per shader; they are called stream processors. The 320 shaders are then grouped into 20 clusters.

demonkevy666
09-27-2009, 06:19 PM
It IS 320 shaders. There are 5 ALUs per shader; they are called stream processors. The 320 shaders are then grouped into 20 clusters.

"...are actually 320 rather complex 5-stage computing subunits"

Which means one stream processor in each group handles the complex instructions while the other stream processors are simple/general-purpose.


The general principle of the computing section has not changed much in the RV870. It is still based on shader processors with a superscalar design, each processor incorporating five ALUs, four of which are general-purpose ALUs and the fifth a special-purpose ALU capable of executing complex instructions like SIN, COS, LOG, EXP, etc. Besides the ALUs, each shader processor also contains a branch control unit and an array of general-purpose registers.


When we are talking about 1600 stream processors in the RV870, we must keep it in mind that there are actually 320 rather complex 5-stage computing subunits. Provided sufficient code optimization, this design of the GPU’s computing section helps achieve a much higher level of performance than with Nvidia’s scalar architecture.

Which is why Nvidia has a separate shader clock, while ATI's shader clock is locked to the core speed.

CPU is the bottleneck lol

nascasho
09-27-2009, 06:31 PM
I wish I could break the 3.8GHz barrier to release some more potential outta these guys. Damn C0s.

demonkevy666
09-27-2009, 06:36 PM
First of all, that's just the number of compute threads possible, and it has nothing to do with limiting shader power in games. Secondly, the 5870 can only run a maximum of 1600 / 5 = 320 threads, because you can only run one thread per group of 5 "stream processors", so in terms of DirectCompute even TriFire 5870s will not be held back by the thread limit, although QuadFire might be.

I haven't found any reference for this to be true. Nowhere does it say you can only enable 1 of 5 parts of the SIMD.

:shakes::shrug:

Chumbucket843
09-27-2009, 06:39 PM
"...are actually 320 rather complex 5-stage computing subunits"

Which means one stream processor in each group handles the complex instructions while the other stream processors are simple/general-purpose.

CPU is the bottleneck lol

I don't see what you are disagreeing with. Let's just stick with ALU, because that's what they are; ATI can call them whatever they want. The 5th ALU can also handle double precision, btw.


Realistically, they say they will get more performance than Nvidia, but they don't tell you that their GPUs are heavily dependent on compilers to effectively keep the GPU under full load. Not saying a GT200 is faster than RV870, though.

zerazax
09-27-2009, 07:15 PM
It'll be interesting to see if the DX11 changes affected performance.

Obviously, it's hard to say, since DX11 supposedly gives inherent performance boosts over DX10/10.1, but given that DX11 had some changes that might affect how the compiler works and how things are executed, I'm wondering if it's possible one setup is bottlenecked.

From what I've seen/heard too, the CPU power needed to keep up with one of these things might be a factor as well.

saaya
09-27-2009, 08:00 PM
4.2 isn't really heavily OC'ed :D
That's kinda what all 920s max out at... heavily OC'ed means 4.4-4.6, which IS possible with good chips on water or even good air :)
I'm quite surprised it needs that much CPU oomph...

especially at 1920x1080... strange...

saaya
09-27-2009, 08:12 PM
First of all, that's just the number of compute threads possible, and it has nothing to do with limiting shader power in games. Secondly, the 5870 can only run a maximum of 1600 / 5 = 320 threads, because you can only run one thread per group of 5 "stream processors", so in terms of DirectCompute even TriFire 5870s will not be held back by the thread limit, although QuadFire might be.
Ahhhhhh, alright, thanks for clearing that up man! :toast:


They doubled the registers and caches, so internally it should be fine.
Well, that's what they say... it could still be limited internally somehow...
Does anybody know how to disable parts of the GPU, or get the driver to not use them, like W1zz did in his 5870 review to simulate a 5850?



CPU is the bottleneck lol
I don't think so... HWCanucks benched with an i7 at 4GHz or even 4.2 IIRC...
and their numbers aren't all that different from other reviews...

nascasho, only 3.8? Is your CPU multiplier dropping to the default multiplier under load? You're probably limited by max TDP/TDC; ask EVGA how to disable or manipulate the current feedback sensing of your CPU on that board, either by BIOS or hardmod, and you should get 4GHz+ :)

It's usually a single small resistor that needs to be removed or pencilled to adjust the resistance; as a result your CPU will think current draw is low and won't throttle.

iandh
09-27-2009, 08:17 PM
4.2 isn't really heavily OC'ed :D
That's kinda what all 920s max out at... heavily OC'ed means 4.4-4.6, which IS possible with good chips on water or even good air :)
I'm quite surprised it needs that much CPU oomph...

especially at 1920x1080... strange...

Yeah, I just meant heavily OC'ed for 24/7 on air; he does have a 965 though. I'm more used to pre-D0 OCs too.

I was pretty surprised as well with the results



He just posted a bench of a downclocked i7; avg stayed about the same, but min FPS took a BIG hit... worse than my 720BE now.


i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

i7 @ 3.2Ghz Min: 15.93 Max: 37.27 Avg: 28.46

hurleybird
09-27-2009, 10:34 PM
I haven't found any reference for this to be true. Nowhere does it say you can only enable 1 of 5 parts of the SIMD.

:shakes::shrug:

Not what I was saying. What I was saying is that one thread is assigned to each group of five ALUs, unlike Nvidia's architecture where each ALU gets its own thread. Because of this, RV870 can only run a maximum of 1600 / 5 = 320 threads, which is way below the 1024-thread limit of DirectCompute 11.

drunkenmaster
09-28-2009, 12:57 AM
Yeah, I just meant heavily OC'ed for 24/7 on air; he does have a 965 though. I'm more used to pre-D0 OCs too.

I was pretty surprised as well with the results



He just posted a bench of a downclocked i7; avg stayed about the same, but min FPS took a BIG hit... worse than my 720BE now.


i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

i7 @ 3.2Ghz Min: 15.93 Max: 37.27 Avg: 28.46

Errm, as his numbers show, the minimum is the CPU-limited situation and the max is GPU-limited, not the other way around; his numbers prove that.

The Phenom with 4 cores should give almost identical results to an i7 in most games; 99% of games aren't CPU-limited. The biggest difference between your results is the max, not the minimum, meaning there's just something very different between his rig and yours: forced AF, AA, or something, different drivers, something.

You can't remotely investigate CPU limits on two different rigs set up differently by two people.

As he's shown by dropping CPU speed, that's how you test CPU limits: an identical system and setup, with a different CPU speed. Which shows completely reversed results and conclusions compared to your initial "testing".


Either way, benchmarks which aren't a great reflection of the in-game experience aren't very useful for testing the real limits of a card. Most benchmarks are designed as just that: not necessarily to give you an impression of the performance you'll receive in game, but something for sites to use for years to come.

Most other games, and most in-game testing, should have minimum FPS as the GPU-limited number, but the Crysis benchmark stresses the CPU just as much as the GPU, and the lowest numbers are normally in the very heaviest physics sections.

But while the minimum changed drastically, the average results showed what, a 0.2 FPS difference, meaning that "minimum fps" result is rare enough that it is barely in the benchmark. The average result shows, for that benchmark at least, that the 1GHz difference in CPU speed made a completely unnoticeable difference in average/max FPS. It killed the min FPS, but probably only had that min FPS up for around a quarter of a second anyway, so it's pretty much irrelevant.

If you can draw any conclusions, it's that your rig has a major problem that's severely affecting performance; the Crysis benchmark isn't remotely CPU-limited by a 3.2GHz i7, and moving to a 4.2GHz i7 makes no noticeable difference, so there's no point testing further till you sort out your rig. Frankly, you should be getting a very similar average FPS with the same card.

astrallite
09-28-2009, 02:11 AM
http://img9.imageshack.us/img9/8168/5870overclocking.jpg

Definitely diminishing returns on the memory; if all you do is pump the memory, the gains are negligible. But incremental core & memory increases together seem to have a real benefit. I ran each benchmark twice (each is 3 loops long) and results never differed by more than 0.2 fps (which makes the tiny final gains look even more meaningless).

I was gonna simply have the image display, but it was too damn big. Link instead.

Summary:

Crysis VH @ 1080p
850 core / 1200 memory - 37.20 fps
890 core / 1220 memory - 38.34 fps
900 core / 1230 memory - 39.06 fps
900 core / 1300 memory - 39.33 fps

astrallite
09-28-2009, 02:22 AM
With Crysis, the minimum typically happens in area-to-area or area-to-cutscene transitions, and a faster CPU can definitely help in these instances. Crysis has a long view distance, but most far objects are not rendered. When you quickly move into a new area, there's a sudden strain on the CPU to process a ton of new objects. I definitely found my min framerates go up a bit moving from 3.2GHz to 3.6GHz on an i7.

Check out PCGH's GTX 280 CPU scaling review. In Crysis, minimum framerates scale linearly with CPU clock speed, from 2.4GHz to 3.6GHz.

purecain
09-28-2009, 02:41 AM
Just out of interest, I'll run this bench for you...

I can see I'm gonna have to put some effort into OC'ing the CPU above 4.1 for this...

I'll be back in England and in front of my PC by tomorrow evening, hopefully... expect an update then...

And thanks for an interesting thread... :up:

zalbard
09-28-2009, 03:56 AM
Uh, oh, someone merge this thread with http://www.xtremesystems.org/forums/showthread.php?t=235181, thanks!

zalbard
09-28-2009, 03:57 AM
I wish I could break the 3.8Ghz barrier to release some more potential outta these guys. Damn C0's.
What's your vCore for 3.8?

Ashraf
09-28-2009, 03:59 AM
Uh, oh, someone merge this thread with http://www.xtremesystems.org/forums/showthread.php?t=235181, thanks!

Done.

FragMagnet
09-28-2009, 04:44 AM
http://img9.imageshack.us/img9/8168/5870overclocking.jpg

Summary:

Crysis VH @ 1080p
850 core / 1200 memory - 37.20 fps
890 core / 1220 memory - 38.34 fps
900 core / 1230 memory - 39.06 fps
900 core / 1300 memory - 39.33 fps

Interesting, so a 5-6% gain from a slight overclock! Given the game you used, that seems impressive to me.

jaredpace
09-28-2009, 04:57 AM
http://www.extrahardware.cz/files/images/clanky/2009/09zari/rv870_oc_aa/graphs_tables/gpu_scaling.png
http://www.extrahardware.cz/files/images/clanky/2009/09zari/rv870_oc_aa/graphs_tables/mem_scaling.png

http://www.extrahardware.cz/pretaktovani-anti-aliasing-radeonu-hd-5870

Mr. K6
09-28-2009, 06:19 AM
http://www.extrahardware.cz/pretaktovani-anti-aliasing-radeonu-hd-5870

It seems like the memory system complements the GPU well - neither appears to be a complete bottleneck at the speeds tested.

jaredpace
09-28-2009, 06:26 AM
When they increased the memory by 100MHz they got the same boost as when they increased the core by 65MHz.

JimmyH
09-28-2009, 07:19 AM
This could mean nothing, but I tested a 4870 1GB in COD4 at 750/530 (same compute-to-bandwidth ratio as the 5870) and at 750/900 to see the impact of memory bandwidth. Settings: 1440x900, all options max, 4xAA, max AF.

750core 530mem
http://farm3.static.flickr.com/2549/3962966726_debd963579_b.jpg

750core 900mem
http://farm3.static.flickr.com/2662/3962966884_872c401421_b.jpg

Crysis being much more shader-intensive could mean memory overclocks have less effect on overall performance.
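
For reference, here's the ratio math behind JimmyH's 750/530 setting, assuming RV870 simply doubles RV770's ALU count at the same 256-bit bus width:

```python
# Assumes RV870 = 2x RV770's ALUs on the same 256-bit GDDR5 bus, so a
# 4870 needs half the 5870's memory clock, scaled by the core-clock ratio.
SHADER_RATIO = 1600 / 800

def hd4870_mem_for(core_4870, core_5870, mem_5870):
    return mem_5870 * (core_4870 / core_5870) / SHADER_RATIO

for mem_5870 in (1200, 1300, 1400, 1500):
    print(f"5870 @ 850/{mem_5870} -> 4870 @ 750/{hd4870_mem_for(750, 850, mem_5870):.0f}")
# 1200 -> ~529, 1300 -> ~574, 1400 -> ~618, 1500 -> ~662
```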

demonkevy666
09-28-2009, 07:27 AM
Not what I was saying. What I was saying is that one thread is assigned to each group of five ALUs, unlike Nvidia's architecture where each ALU gets its own thread. Because of this, RV870 can only run a maximum of 1600 / 5 = 320 threads, which is way below the 1024-thread limit of DirectCompute 11.

So you're saying the bottleneck is the thread dispatch, and it's only being used at about 31.25% (320 of 1024)? If it were redesigned to use all 1024 dispatch threads at once and not have those 5 ALUs grouped... 5 ALUs are one SIMD unit. Changing this to all-separate ALUs shouldn't be too hard; the ALUs themselves are quite small already.

It seems to me it's more that programs go for whatever is easy, for shorter coding time.
Easy isn't the best possible way to do things.

jaredpace
09-28-2009, 08:17 AM
750core 530mem
http://farm3.static.flickr.com/2549/3962966726_debd963579_b.jpg

750core 900mem
http://farm3.static.flickr.com/2662/3962966884_872c401421_b.jpg


Thanks, nice work, JimmyH. Don't suppose you could run that again on your HD 4870 at 750/574, 750/618 & 750/662? Those speeds represent the compute/bandwidth ratios of 850/1300, 850/1400, and 850/1500 on the HD 5870's memory. Thank you :up:

Mr. K6
09-28-2009, 08:59 AM
When they increased the memory by 100MHz they got the same boost as when they increased the core by 65MHz.

You have to consider the values as percentages, not raw MHz. The 65MHz increase on the core is ~7.6% while the 100MHz increase on the RAM is ~8.3% - both very close, and the increase in FPS reflects this. Because increasing either the core or the RAM frequency nets noticeable performance gains, it appears that neither is bottlenecking the other (at the frequencies tested, in the game tested). I'd imagine that in older games, the RAM will become more of a bottleneck to the FPS rate than the core frequency.
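
Spelled out with the stock 5870 clocks:

```python
core_base, mem_base = 850, 1200      # stock HD 5870 clocks (MHz)
core_step, mem_step = 65, 100        # the steps being compared

print(f"core: +{100 * core_step / core_base:.1f}%")   # +7.6%
print(f"mem:  +{100 * mem_step / mem_base:.1f}%")     # +8.3%
```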

jaredpace
09-28-2009, 09:15 AM
Okay Mr. K6 - good point. They've basically raised the core & memory 8%. They get the same increase in framerates from either +8% core or +8% memory. To me this means the 5870 could use all the core or memory speed you could give it.

This card is going to give very good results to anyone who overclocks the :banana::banana::banana::banana: out of it.

Mr. K6
09-28-2009, 09:36 AM
Okay Mr. K6 - good point. They've basically raised the core & memory 8%. They get the same increase in framerates from either +8% core or +8% memory. To me this means the 5870 could use all the core or memory speed you could give it.

This card is going to give very good results to anyone who overclocks the :banana::banana::banana::banana: out of it.

Oh, most definitely :D. I'm very much anticipating a stable release of RivaTuner so I can get to work on the voltages :). As it stands, AMD GPU Clock Tool isn't working with my card (drivers?), so I currently can't go above the limits of CCC (unless I'm using the program incorrectly). Anyway, with a little voltage, it seems these cores will easily do 1GHz+ on the GPU. I'd be interested to see not only the performance gains at that speed, but also whether a memory bottleneck finally shows itself :eek:

Lightman
09-28-2009, 09:38 AM
I'm benching my unlocked 720BE against a guy that has a 5870 and an i7 over at OCN... results are less than exciting (for me at least).

I just ran Crysis Warhead bench 0.33 (avalanche flythrough)


1920 0xAA

i7 @ 4.2Ghz Min: 29.55 Max: 46.81 Avg: 37.04

Phenom II x4 @ 3.6Ghz Min: 25.12 Max: 36.73 Avg: 30.99


1920 8xAA

i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

Phenom II x4 @ 3.6Ghz Min: 18.24 Max: 29.33 Avg: 23.83


Note: unlocking from X3 to X4 gave me a slight avg FPS increase, and almost a 2 FPS increase in min FPS.

It appears that since the min FPS numbers are all very close, they reveal spots where the bench is GPU-limited, while the max FPS numbers are quite different, showing spots where the bench is CPU-limited.

Anyone who owns a 5870 and wants to add benches to the mix is welcome... I'll run the same bench if possible on my system and we can see what can be worked out. OBVIOUSLY a 3.6GHz Phenom II is no match for a 4.2GHz i7, but I would have thought the results would be much more GPU-limited in this bench. :confused:

OK here is mine on:
PhII @3750MHz/2500MHz/1000MHz (core/nb/mem)
HD5870 default
DX10, Enth., 64bit on Vista x64


DirectX 10 ENTHUSIAST 3X @ Map: avalanche @ 0 1920 x 1200 AA 0x
==> Framerate [ Min: 15.05 Max: 46.67 Avg: 32.30 ]

iandh
09-28-2009, 03:15 PM
Uh, oh, someone merge this thread with http://www.xtremesystems.org/forums/showthread.php?t=235181, thanks!

This thread that we are now in is about the 5870's memory bandwidth bottlenecking the card's performance.

The thread I purposely made separate was to investigate CPU bottlenecking of the 5870 as a whole.



Done.

At your discretion of course as a moderator, but I think maybe these threads should be un-merged. They were for two totally different topics and merging them will most likely prevent the original goal of this thread's OP from being reached.

edit: Ashraf edited thread title to "5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)" to reflect two different topics being discussed within thread.

Thanks!

Chumbucket843
09-28-2009, 03:27 PM
I don't see any different scaling in benchmarks. The 4870 was 2x faster than RV670, and it had 2.5 times more shaders. I would not expect doubling the shaders to double effective performance. The problem might be the fact that there really isn't much to compare it to. It might be L1-to-L2 cache bandwidth:


                     RV770      RV870
Texture units           40         80
L1 cache bandwidth   480GB/s   1,000GB/s
L1-to-L2 bandwidth   384GB/s     435GB/s
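
Putting the table's growth rates side by side makes the suspected imbalance obvious:

```python
# Growth from RV770 to RV870, using the figures in the table above.
specs = {
    "texture units":           (40, 80),
    "L1 bandwidth (GB/s)":     (480, 1000),
    "L1->L2 bandwidth (GB/s)": (384, 435),
}
for name, (rv770, rv870) in specs.items():
    print(f"{name}: +{100 * (rv870 - rv770) / rv770:.0f}%")
# texture units +100%, L1 +108%, but the L1->L2 path only +13%
```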

purecain
09-28-2009, 03:50 PM
Please unmerge the thread... we can't have a coherent discussion if every thread about the 5870 gets merged... meh...

iandh
09-28-2009, 04:56 PM
Please unmerge the thread... we can't have a coherent discussion if every thread about the 5870 gets merged... meh...

Yeah, it's pretty screwed now as far as coherency goes. I wish the person who reported the threads had bothered to read them before clicking submit... just because two threads have a similar title doesn't mean they are discussing the exact same subject. I can understand Ashraf's mistake, because he was probably just going down the job queue, saw the thread titles, and hit "merge threads".

I asked Ashraf and he said there isn't really an "unmerge" option, so basically I'd have to start a new empty thread and then delete all the posts from my old thread that are now in this thread. :shakes:

hurleybird
09-28-2009, 08:58 PM
Got my 5870 today, but I can't seem to get the GPU clock utility to work. I'm using the RC7 drivers leaked from MSI; maybe that's the problem...

saaya
09-29-2009, 12:08 AM
i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

i7 @ 3.2Ghz Min: 15.93 Max: 37.27 Avg: 28.46
Those numbers make no sense: identical max FPS but lower min FPS... yet he still gets the same avg FPS? 0_o

This only makes sense if the min FPS is REALLY short, like it drops to 15 FPS once and that's it...

Astennu
09-29-2009, 01:53 AM
I haven't found any reference for this to be true. Nowhere does it say you can only enable 1 of 5 parts of the SIMD.

:shakes::shrug:

It depends on the game and the driver; you can see it like this:

http://images.anandtech.com/reviews/video/ATI/4800/ilp.png

So if instructions can be grouped together you can get quite a performance boost. The downside is that if you can't group instructions, you only get 1/4th or 1/5th of the performance.

I don't know if it's true, but I have heard the AMD compiler does quite a good job, grouping 3-4 of them most of the time.

But if you have a heavily Nvidia-optimised game you can get lower values and bad shader performance (you might hit 1-3 then). A toy model of this is sketched below.
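
Here's that toy model, with illustrative fill rates (the real packing ratio depends on the shader code and compiler):

```python
# Average VLIW5 slots the compiler fills determines how much of the
# 1600-ALU peak is usable. Slot counts here are made-up examples.
PEAK_ALUS, WIDTH = 1600, 5

for avg_slots in (1, 3, 3.5, 4, 5):
    effective = PEAK_ALUS * avg_slots / WIDTH
    print(f"{avg_slots} of {WIDTH} slots filled -> "
          f"{effective:.0f} effective ALUs ({100 * avg_slots / WIDTH:.0f}% of peak)")
```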


Then, about the performance of the HD 5870: it does not seem memory bandwidth limited to me. But I also think there is more performance inside this core than we see now. It might be driver-related; it won't surprise me if we get up to 20% higher performance in the future.

The RV870 core is still new, and I think AMD could optimize the scheduling of the threads a bit better to keep those 1600 ALUs fed with data. It would be nice if there was a way to see the load on those shader units and compare it to the load on RV770 and RV790 cores. Don't forget those have been well optimized by the last year's driver releases.

JimmyH
09-29-2009, 05:09 AM
Thanks, nice work, JimmyH. Don't suppose you could run that again on your HD 4870 at 750/574, 750/618 & 750/662? Those speeds represent the compute/bandwidth ratios of 850/1300, 850/1400, and 850/1500 on the HD 5870's memory. Thank you :up:

OK, here's more:

750/575
http://farm3.static.flickr.com/2502/3965277121_4da2681904_b.jpg

750/618
http://farm4.static.flickr.com/3434/3965277499_5197c11f34_b.jpg

750/663
http://farm3.static.flickr.com/2578/3965277839_f610d40b55_b.jpg

800/530
http://farm4.static.flickr.com/3432/3966052492_af597305c5_b.jpg

800/575
http://farm3.static.flickr.com/2629/3965278571_77481b9ef8_b.jpg

835/575
http://farm3.static.flickr.com/2487/3965279181_642cf224a3_b.jpg

The framerate does fluctuate a bit when standing still, so I am taking screens at the minimum.

Mr. K6
09-29-2009, 05:13 AM
Interesting read: http://firingsquad.com/hardware/ati_radeon_5870_overclocking/default.asp

CrimInalA
09-29-2009, 05:31 AM
Interesting read: http://firingsquad.com/hardware/ati_radeon_5870_overclocking/default.asp

Very interesting indeed. It seems that core clocks give the most gains in the games they tested.

I actually can't wait to see what these cards can do when more voltage is given. And I also want to see what the little brother 5850 can pull out of its hat in terms of overclocking :)

JimmyH
09-29-2009, 05:35 AM
Interesting read: http://firingsquad.com/hardware/ati_radeon_5870_overclocking/default.asp

Thanks for the link. It doesn't look like the 5870 is that bandwidth-limited here, but it seems bottlenecked by something else. Most games return smaller gains than the increase in core and memory clocks. Poor drivers? Can't help thinking that the impressively low power consumption of this card could actually be the shaders sitting around doing nothing.

Anandtech's power test with OCCT shows the 5870 actually uses a lot more power than the other reviews suggest when loaded with a highly optimized application.

http://www.anandtech.com/video/showdoc.aspx?i=3643&p=26

jaredpace
09-29-2009, 07:13 AM
Based on JimmyH's COD4 benchmark
(compute power to bandwidth ratio)

Memory:
4870 speeds / 5870 equivalent / FPS / % increase
750 / 530 --- 850 / 1200 --- 100 --- 0%
750 / 575 --- 850 / 1300 --- 104 --- +4.0%
750 / 618 --- 850 / 1400 --- 107 --- +2.9%
750 / 663 --- 850 / 1500 --- 112 --- +4.7%
750 / 900 --- 850 / 2040 --- 121 --- n/a

Core:
4870 speeds / 5870 equivalent / FPS / % increase
750 / 530 --- 850 / 1200 --- 100 --- 0%
800 / 530 --- 906 / 1200 --- 103 --- +3.0%
750 / 575 --- 850 / 1300 --- 104 --- 0% (new baseline)
800 / 575 --- 906 / 1300 --- 107 --- +2.9%
835 / 575 --- 946 / 1300 --- 110 --- +2.8%
http://www.xtremesystems.org/forums/showpost.php?p=4038557&postcount=88

Lightman's 3DMark06 benchmark:
core / mem (effective) / FPS / +fps / +%
850 / 3600 / 91.7 / - / 0%
850 / 4000 / 95.2 / +3.5 / +3.8%
850 / 4400 / 97.9 / +2.7 / +2.9%
850 / 4800 / 100.3 / +2.5 / +2.5%
850 / 5200 / 102.4 / +2.1 / +2.1%
http://www.xtremesystems.org/forums/showpost.php?p=4032821&postcount=23

Extrahardware.cz, Crysis 1920x1200, 4xAA:
core / mem (effective) / fps / +fps / +%
Memory:
850 / 4400 - 40.9 - 0 / 0%
850 / 4800 - 42.0 - +1.1 / +2.7%
850 / 5200 - 43.1 - +1.1 / +2.6%
Core:
785 / 4800 - 40.1 - 0 / 0%
850 / 4800 - 42.0 - +1.9 / +4.7%
915 / 4800 - 43.2 - +1.2 / +2.9%
Core and memory:
785 / 4400 - 39.3 - 0 / 0%
850 / 4800 - 42.0 - +2.7 / +6.9%
900 / 5200 - 44.7 - +2.7 / +6.4%
http://www.extrahardware.cz/pretaktovani-anti-aliasing-radeonu-hd-5870

FiringSquad, Crysis 1920x1200, 2xAA:
core / mem (effective) / fps / +fps / +% (from stock)
850 / 4800 - 31.6 - 0 / 0%
850 / 5272 - 32.3 - +0.7 / +2.2%
930 / 4800 - 33.1 - +1.5 / +4.7%
930 / 5400 - 34.3 - +2.7 / +8.5%
http://firingsquad.com/hardware/ati_radeon_5870_overclocking/page5.asp

Meh. You can get gains from overclocking core, or memory, or both together. You get higher gains (usually) from overclocking the core. Only in Batman did FiringSquad get higher performance from overclocking memory vs. core. Conclusion: overclock core & memory as much as possible. Memory bandwidth isn't the bottleneck, otherwise we would see relatively no gain from core overclocking... :confused:?
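
One way to compare those compiled results on a common scale is gain per percent of overclock; a sketch using the extrahardware.cz Crysis runs above:

```python
def sensitivity(fps0, fps1, clk0, clk1):
    """% FPS gain per % clock gain; ~1.0 would be perfect scaling."""
    return ((fps1 - fps0) / fps0) / ((clk1 - clk0) / clk0)

# extrahardware.cz Crysis runs from the tables above:
print(sensitivity(40.9, 43.1, 4400, 5200))   # memory sweep, core fixed: ~0.30
print(sensitivity(40.1, 43.2, 785, 915))     # core sweep, memory fixed: ~0.47
# Both well under 1.0 -- neither axis alone gives anywhere near full scaling.
```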

Can't help thinking that the impressively low power consumption of this card could actually be the shaders sitting around doing nothing.
Yeah, seems like it... jeez.

JimmyH
09-29-2009, 07:45 AM
Meh. You can get gains from overclocking core, or memory, or both together. You get higher gains (usually) from overclocking the core. Only in Batman did FiringSquad get higher performance from overclocking memory vs. core. Conclusion: overclock core & memory as much as possible. Memory bandwidth isn't the bottleneck, otherwise we would see relatively no gain from core overclocking... :confused:?

Yeah, seems like it... jeez.

I wouldn't say it isn't memory bandwidth bottlenecked just because core overclocking works. A memory bandwidth bottleneck isn't a hard-capped bottleneck like the GPU core. From my experience, overclocking a 4870's memory from stock by over 10% returns less than a 1% FPS gain. So in comparison, yes, the 5870 is relatively bandwidth-starved compared to the 4870/4890. Batman highlights memory's importance: core +9% plus memory +12% increased FPS by 10.5%.

jaredpace
09-29-2009, 08:06 AM
Jimmy - what situation maxes out memory bandwidth? Highest texture quality + quality AA filtering? I'm not sure if this maximizes the need for bandwidth. I know it loads up a greater amount of video memory potentially maximizing capacity, but how do you go about maxing out bandwidth? Is there a special test?

So you DO think it has a memory bottleneck? IMO it could definitely use faster memory. IDC @ anandtech says, "[I]f you can overclock the GPU cores and see a performance improvement that exceeds that which comes from increasing the memory clocks then that is about as close to proof you are going to get that your compute system is not memory bandwidth constrained."

iandh
09-29-2009, 08:33 AM
those numbers make no sense, identical max fps, but lower min fps... yet he still gets the same avg fps? 0_o

this only makes sense if the min fps dip is REALLY short, like it drops to 15fps once and that's it...

Yeah, that's what I thought too, he got a lower min fps than my 720BE at stock, must be a fluke or something

JimmyH
09-29-2009, 08:47 AM
Jimmy - what situation maxes out memory bandwidth? Highest texture quality + quality AA filtering? I'm not sure if this maximizes the need for bandwidth. I know it loads up a greater amount of video memory potentially maximizing capacity, but how do you go about maxing out bandwidth? Is there a special test?

So you DO think it has a memory bottleneck? IMO it could definitely use faster memory. IDC @ anandtech says, "[I]f you can overclock the GPU cores and see a performance improvement that exceeds that which comes from increasing the memory clocks then that is about as close to proof you are going to get that your compute system is not memory bandwidth constrained."

I don't know. Maybe look for game benchmarks where the 4870 outperformed the 4850 by significantly more than 20%. 16xAF could likely be one scenario. A colorfill test could show up limitations here too: http://www.techreport.com/articles.x/17618/6

Well, the GPU core is doing the actual rendering work, so increasing that usually gains more unless you are really badly bottlenecked by slow memory.

Bojamijams
09-29-2009, 09:36 AM
Has it been confirmed that all cards can have their voltages adjusted if flashed with the ASUS bios?

saaya
09-29-2009, 09:43 AM
http://www.extrahardware.cz/files/images/clanky/2009/09zari/rv870_oc_aa/graphs_tables/gpu_scaling.png
http://www.extrahardware.cz/files/images/clanky/2009/09zari/rv870_oc_aa/graphs_tables/mem_scaling.png

http://www.extrahardware.cz/pretaktovani-anti-aliasing-radeonu-hd-5870
what cpu speed did they test with?
might it stop scaling because the cpu is limiting?

thx JimmyH for the 4870 results! :toast:
so reducing the 4870's bandwidth by ~41% (down to the same bw/compute ratio as a 5870) results in a mere ~17% performance drop. sounds like yet another hint that the 5870 is NOT held back a lot by memory bandwidth...
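quick arithmetic on jimmyh's numbers (assuming 900MHz is the 4870's stock memory clock):

# Bandwidth cut vs. fps cost in JimmyH's CoD4 run.
stock_mem, cut_mem = 900, 530
stock_fps, cut_fps = 121, 100
print(f"bandwidth -{1 - cut_mem / stock_mem:.0%}, fps -{1 - cut_fps / stock_fps:.0%}")
# -> bandwidth -41%, fps -17%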


so you're saying the bottleneck is the thread dispatch and it's only being used at about 31.25%? if it were redesigned to use all 1024 dispatch threads at once and not have those 5 ALUs grouped... 5 ALUs is one SIMD. changing this to be all separate ALUs shouldn't be too hard, the ALUs themselves are quite small already.

it seems to me programmers will just go for whatever is easiest and quickest to code. easy isn't the best possible way to do things.
i think he's saying the thread count is NOT a limitation, because even if all parts of the gpu are fully loaded it's only using 30% of the max possible threads the dispatch processor can coordinate. and it can handle that many threads because in xfire one dispatch processor apparently runs as master and oversees the threads running on all gpus in the system, hence the hint that in quad gpu configs the thread dispatch MIGHT limit.


i don't see any different scaling in benchmarks. the 4870 was 2x faster than rv670 and it had 2.5 times more shaders. i would not expect doubling the shaders to double effective performance. the problem might be the fact that there really isn't much to compare it to. might be l1 to l2 cache bandwidth.


____________________________RV770______RV870
Texture units _________________40 ________80
L1 cache bandwidth _______480GB/s _1,000GB/s
L1 to L2 bandwidth _______384GB/s ___435GB/s
3870 to 4870 was a 150% shader unit boost that resulted in a 100% performance boost. this time we have a 100% boost of not only shader units but tmus and rops too! yet the perf boost is only 40% or even less in some cases... that would be as if the 4870 had only been 60% faster from its 150% logic boost instead of 100% faster. there's def something limiting...

l1 to l2 cache bw... interesting!
was looking for 770 figures but couldn't find any...
l1 to l2 barely increased at all... but then again, doesn't each 5-way processor (or alu or whatever you wanna call it) have its own L1? and each group of those shares the l2, right? the grouping hasn't changed, so l1 to l2 bandwidth actually shouldn't matter and could have remained the same...

or maybe it actually did stay the same per clock, if you normalize those numbers for the 770's and 870's clock speeds?

Chumbucket843
09-29-2009, 01:53 PM
So if instructions can be grouped together you can have quite a performance boost. The downside is if you can't group instructions you only get 1/4th or 1/5th of the performance.

I don't know if it's true, but I have heard the AMD compiler does quite a good job, grouping 3-4 of them most of the time.

But if you have a heavily nVidia-optimised game you can get lower values and bad shader performance (you might hit 1-3 then).

it depends on what it is doing. you can see here that ATI is the intel of synthetic benchmarks: http://www.bit-tech.net/hardware/graphics/2008/09/02/ati-radeon-4850-4870-architecture-review/10. the wider you make a vector, the harder it is to keep it fully loaded.
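to show what grouping means, here's a toy python model of 5-wide issue (nothing like AMD's real compiler, just greedy packing of whatever ops have their inputs ready):

# Toy VLIW5 packing model (hypothetical, for illustration only): up to 5
# independent ops issue per bundle; dependent chains can't be packed.
def vliw5_utilization(ops, deps):
    """ops: list of op ids; deps: dict op -> set of ops it depends on."""
    done, bundles = set(), []
    while len(done) < len(ops):
        ready = [o for o in ops if o not in done and deps.get(o, set()) <= done]
        bundles.append(ready[:5])           # issue at most 5 independent ops
        done.update(ready[:5])
    return len(ops) / (5 * len(bundles))

# 10 independent ops pack into 2 full bundles -> 1.0 (100% of the slots used)
print(vliw5_utilization(list(range(10)), {}))
# 10 ops in a dependency chain -> 1 op per bundle -> 0.2 (20%)
print(vliw5_utilization(list(range(10)), {i: {i - 1} for i in range(1, 10)}))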




Then about the performance of the HD5870: it does not seem to me memory bandwidth limited. But I also think there is more performance inside this core than we see now. It might be driver related. It won't surprise me if we get up to 20% higher performance in the future.
on average one flop takes 1 byte per second of memory performance. this translates to over 2 terabytes per second of required bandwidth for rv870, so every gpu made is bottlenecked by this. the only way out is to further reduce the memory-operation-to-calculation ratio. it's already 100:1 but it must go higher.
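quick back-of-envelope on that rule of thumb, using the 5870's public specs (1600 ALUs x 850MHz x 2 ops per clock for a MAD; the 1 byte/flop figure is the rule of thumb, not a measurement):

# Required vs. available bandwidth under the 1-byte-per-flop rule of thumb.
flops = 1600 * 850e6 * 2            # 5870 peak: 2.72e12 single-precision flop/s
needed = flops * 1.0                # bytes/s if every flop pulled 1 byte from DRAM
actual = 256 / 8 * 4.8e9            # 256-bit bus at 4.8Gbps per pin = 153.6GB/s
print(needed / 1e12, "TB/s needed vs", actual / 1e9, "GB/s available")
# -> ~2.7 TB/s vs 153.6 GB/s, ~18x short. hence caches, registers and data reuse.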

Robin BP
09-29-2009, 09:17 PM
As the 5850 is out, I think it's clear that something is holding the 5870 back, but what?


Conclusion

When you take the Cypress based Radeon HD 5870 and cut out 2 SIMDs and 15% of the clock speed to make a Radeon HD 5850, on paper you have a card 23% slower. In practice, that difference is only between 10% and 15% depending on the resolution. What’s not a theory is AMD’s pricing: they may have cut off 15% of the performance to make the 5850, but they have also cut the price by well more than 15%; 31% to be precise.

http://www.anandtech.com/video/showdoc.aspx?i=3650&p=14

Chickenfeed
09-29-2009, 11:13 PM
I don't expect to have my card until Friday at the earliest (figures, the Asus are in stock now but I'll wait for the XFX...), but if anyone has a request for a game / test run that might serve as a good comparison, let me know and I'll try to set it up now. I have the majority of the bigger games from the last 2 years, so just hit me up with a PM, and if it is easy enough to compare, I'll do some clock scaling. I want to do it at 1920x1200 with 8x adaptive AA and also with no AA so there is that contrast (obviously 8x MSAA uses more bandwidth / vram). I don't see the point in testing at lower resolutions personally.

From what I've seen so far here and elsewhere, though, clocks don't seem to scale any more than on past designs, so it is tough to say what is holding it back. I'm still a tad disappointed that the 4870x2 pulls ahead in as many things as it does. Multi-GPU has gotten better, yes, really...

saaya
09-30-2009, 12:37 AM
it depends on what it is doing. you can see here that ATI is the intel of synthetic benchmarks: http://www.bit-tech.net/hardware/graphics/2008/09/02/ati-radeon-4850-4870-architecture-review/10. the wider you make a vector, the harder it is to keep it fully loaded.

on average one flop takes 1 byte per second of memory performance. this translates to over 2 terabytes per second of required bandwidth for rv870, so every gpu made is bottlenecked by this. the only way out is to further reduce the memory-operation-to-calculation ratio. it's already 100:1 but it must go higher.

that's too much theory there if you ask me...
this bw/flop number is a guideline... just think about it for a second: applied as a RULE, every gpu with the same flop performance would require the same memory bandwidth... which is clearly not true. as a guideline it may work very well, but even then it depends so much on what you're actually doing: how much or how little data repetition you're facing, how well you can group instructions that use the same data, and how many instructions use data that a previous instruction has just created inside the gpu, in a register or in cache, so it never needs to load any data from memory etc etc etc... ;)

as a rough guideline it works, but it's not a rule :)

the anandtech review is very interesting...
so basically a 5850 with 2 fewer simd units and 15% lower clocks performs 10-15% worse than a 5870...
that means a 5850 at 5870 speeds will most likely perform so similarly that you might not be able to tell the difference...
maybe there will be fake 5870s in china which are actually 5850s :lol:

Kuntz
09-30-2009, 09:04 PM
I do not think the 5870 is bandwidth limited at all. There are many examples out there, and in this very thread, showing that 10%-12% increases in memory bandwidth are not increasing frame rates much (1.5% - 3%). This small increase in frame rate is due to the drop in absolute memory latency and not the increase in memory bandwidth. The same phenomenon happens when overclocking system memory. This benchmark (http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page7.asp) shows that a 10% increase in core speed translates to a 5.5% increase in frame rate, whereas the same 10% increase in memory bandwidth only increases frame rates by 1.5%.

You guys are getting marginal increases in frame rates when bumping up the memory speeds from the drop in absolute memory latency, and you're confusing that with a memory bottleneck.

JimmyH
10-01-2009, 09:07 AM
5870 AF performance:

http://www.bit-tech.net/hardware/graphics/2009/09/30/ati-radeon-hd-5870-architecture-analysis/12

Looks like there's a huge performance drop from enabling 16xAF.

Chickenfeed
10-01-2009, 09:27 AM
Interesting, so currently ATI has better anti-aliasing scaling but still inferior AF scaling. I expect GT300 to have both good AF scaling and AA performance, so it shall be interesting. I just installed the card so I'll have to see what the deal is as far as AF quality (as 5-series AF is supposed to be the highest quality, or so we are told).

I hope someone makes a tool that works with the memory error correction (as in, it can detect errors) as that would make overclocking much simpler. If anyone has any clock requests (eg at less than stock) please let me know and I'll do some testing over the weekend.

saaya
10-01-2009, 11:27 AM
You guys are getting marginal increases in frame rates when bumping up the memory speeds from the decrease in GDDR5 memory timings, and you're confusing that with a memory bottleneck.
you're saying that performance increases minimally because the mem timings get adjusted automatically to the higher memory speeds...

but what do you mean, people confuse that with a memory bottleneck?

a memory bottleneck indication would be if performance DID increase when ocing the memory, not when ocing memory barely does anything at all, like in this case...

and yes, the timings get adjusted automatically on ati cards, on nv cards as well afaik... but the increase in timings is not linear, so higher mem clocks still increase effective bandwidth even though the timings get loosened.

Chumbucket843
10-01-2009, 01:19 PM
you're saying that performance increases minimally because the mem timings get adjusted automatically to the higher memory speeds...

but what do you mean, people confuse that with a memory bottleneck?

a memory bottleneck indication would be if performance DID increase when ocing the memory, not when ocing memory barely does anything at all, like in this case...

and yes, the timings get adjusted automatically on ati cards, on nv cards as well afaik... but the increase in timings is not linear, so higher mem clocks still increase effective bandwidth even though the timings get loosened.

i do have an idealistic view on memory bandwidth, but obviously you have to consider that some things require a lot of bandwidth. in games, AA, AF (http://www.bit-tech.net/hardware/graphics/2009/09/30/ati-radeon-hd-5870-architecture-analysis/12) and fillrate (http://www.bit-tech.net/hardware/graphics/2009/09/30/ati-radeon-hd-5870-architecture-analysis/14) are bandwidth intensive. in gpgpu the bottleneck will usually be bandwidth; transcoding is a good example of this. fortunately some things don't have that problem, like perlin noise, which loves vectors, or texture sampling. i think it caches really well too.

Kuntz
10-01-2009, 02:41 PM
your saying that performance increases minimally because the mem timings get adjusted automatically to the higher memory speeds...

No, I think the memory timings are static; they do not change with memory clock adjustment. So if they are at, for example, a total cycle delay of 10, that would mean:

10 cycles @ 1200MHz = 8.3ns delay
10 cycles @ 1300MHz = 7.7ns delay

Because there is no memory bottleneck on the 5870 cards, when you increase the memory speed all you are really doing is decreasing the nanosecond delay on memory operations, which is why a 10% increase in memory speed only gets a 1.5% increase in performance.
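the arithmetic spelled out (same hypothetical 10-cycle delay):

# Fixed cycle count -> wall-clock latency falls as the memory clock rises.
def delay_ns(cycles, mhz):
    return cycles / mhz * 1000      # cycles/MHz is microseconds; x1000 -> ns

print(delay_ns(10, 1200))           # 8.33 ns
print(delay_ns(10, 1300))           # 7.69 ns (~7% lower)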

jaredpace
10-01-2009, 03:03 PM
No, I think the memory timings are static; they do not change with memory clock adjustment. So if they are at, for example, a total cycle delay of 10, that would mean:

10 cycles @ 1200MHz = 8.3ns delay
10 cycles @ 1300MHz = 7.7ns delay

Because there is no memory bottleneck on the 5870 cards, when you increase the memory speed all you are really doing is decreasing the nanosecond delay on memory operations, which is why a 10% increase in memory speed only gets a 1.5% increase in performance.


How would you explain all these performance increases from memory overclocking that are greater than 1.5% per 100mhz?



850 / 3600 / 091.7 / 0 / 0%
850 / 4000 / 095.2 / +3.5 +3.8%
850 / 4400 / 097.9 / +2.7 +2.9%
850 / 4800 / 100.3 / +2.5 +2.5%
850 / 5200 / 102.4 / +2.1 +2.1%

850 / 4400 - 40.9 - 0 / 0%
850 / 4800 - 42.0 - +1.1 +2.7%
850 / 5200 - 43.1 - +1.1 +2.6%

785 / 4400 - 39.3 - 0 / 0%
850 / 4800 - 42.0 - +2.7 +6.9%
900 / 5200 - 44.7 - +2.7 +6.4%

850 / 4800 - 31.6 - 0 / 0%
850 / 5272 - 32.3 - +0.6 +2.2%


And this?
http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/images/bat1920.gif
http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page10.asp

Is this because of the latency of the memory (in nanoseconds), and not due to increased bandwidth for the chip's computational throughput?

Kuntz
10-01-2009, 03:31 PM
How would you explain all these performance increases from memory overclocking that are greater than 1.5% per 100mhz?

I've already explained them, they fall under the same explanation.


http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/images/bat1920.gif
http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page10.asp

2.9% perf increase from 9.8% memory speed increase.


850 / 4800 / 100.3 / +2.5 +2.5%
850 / 5200 / 102.4 / +2.1 +2.1%

2.1% perf increase from 8.3% memory speed increase.


I'll summarize the entire article in percentages. Each game has 3 graphs representing various resolutions. Each memory overclock was 9.8%:

Call of Duty WaW:
+1.3%
+0.7%
+2.0%

Crysis:
+2.2%
+2.2%
+0.3%

Crysis Very High:
+2.1%
+2.2%
+1.0%

Far Cry 2:
+2.3%
+2.4%
+2.3%

Stalker Clear Sky:
+0.5%
+0.8%
+1.6%

Left 4 Dead:
+0.8%
+2.0%
+2.9%

Res Evil 5:
+1.4%
+2.0%
+2.2%

Batman:
+3.3%
+2.9%
+3.9%

HAWX:
+1.2%
+2.7%
+3.6%

The evidence is pretty clear to me: the 5870 has no memory bottleneck. There is no meaningful increase in frame rates from increasing the memory bandwidth. These marginal increases are just from the decrease in absolute memory latency.

I am no expert, this is just my opinion & conclusion based on the evidence floating around the internet.

If ATI went with a 512-bit bus, or 384-bit like nVidia, this would have increased memory bandwidth; however, memory latency would remain the same on the GDDR5 ICs, thus there would be no performance increase from the increased bus width. The cost of the card would be astronomical compared to its entry price of $379, and it would all be for a 0% performance increase.
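For reference, the paper bandwidth a wider bus would buy at the same 4.8Gbps GDDR5 (hypothetical widths; the 5870 actually shipped with 256-bit):

# Theoretical GDDR5 bandwidth for a few bus widths at 4.8Gbps per pin.
def bandwidth_gb_s(bus_bits, gbps_per_pin=4.8):
    return bus_bits / 8 * gbps_per_pin

for bits in (256, 384, 512):
    print(bits, "bit ->", bandwidth_gb_s(bits), "GB/s")
# 256 -> 153.6, 384 -> 230.4, 512 -> 307.2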

Chumbucket843
10-01-2009, 03:39 PM
if one of you guys has a 5870, do the same thing i did.

this is furmark at 1280x1024 8xAA 16xAF on a gtx 260 192.
all clocks but memory are the same; i want to see how the regression for the 5870 compares. i know it's a little ghetto but i don't care. the memory clocks are on the bottom
http://img194.imageshack.us/img194/417/furmemaa.jpg

Chickenfeed
10-01-2009, 04:20 PM
Here you go.

Furmark 1.7
1280x1024 8x AA 60000MS
HD 5870 1GB

Mem - Score

1300 - 2733

1200 - 2537

1100 - 2360

1000 - 2187

900 - 1963

http://img194.imageshack.us/img194/9013/graphtv.jpg (http://img194.imageshack.us/i/graphtv.jpg/)



I used what CCC offers for memory range.

I noticed a weird thing though. When Overdrive is enabled the gpu 2d clocks are 157 / 300, but when it is not enabled they are 400 / 1200. This is really strange. Also, I suffer from the flickering display issue on the monitor connected to the second DVI port when Overdrive is enabled. It's fine for benching but too annoying to be usable for games. I'm sure it will be fixed soon though.

EDIT: I discovered that the correct 2d PowerPlay clocks are in fact 157 / 300. HOWEVER, due to the timing differences when using more than 1 display, the lowest they go is 400/1200. In other words, so much for 27W idle with more than one display :( Realistically I don't care that much, I'll take a second display over a few watts any day. It is still an issue, however, as you are unable to properly overclock when using more than 1 display as it stands. I'm glad I've found this out as no reviews I've seen mention it (as most reviews use a single 24-30" display). I'd be fine with the same 400/1200 clocks at idle if it meant the display wouldn't flicker.

jaredpace
10-01-2009, 05:04 PM
Nice test chickenfeed. SO the 5870 CAN use all the bandwidth you can give it. It's scaling almost linearly.
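here's how close to linear it is, computed from chickenfeed's table (score gain divided by clock gain, 900MHz as baseline):

# Scaling efficiency of the Furmark memory sweep (1.0 = perfectly linear).
data = [(900, 1963), (1000, 2187), (1100, 2360), (1200, 2537), (1300, 2733)]
base_clk, base_score = data[0]
for clk, score in data[1:]:
    eff = (score / base_score - 1) / (clk / base_clk - 1)
    print(clk, round(eff, 2))
# -> 1.03, 0.91, 0.88, 0.88: the score tracks the memory clock almost 1:1,
# so this particular workload really is bandwidth-bound.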

Chickenfeed
10-01-2009, 05:27 PM
Yeah seems to be the case.

I tried doing a workaround for the monitor flickering issue with Overdrive enabled by setting custom clock states in a profile, but sadly the memory still idles at 300 regardless of the setting in the profile, meaning that I can't overclock with more than 1 display. I am going to use my 20" for the time being (my 24" has image retention so I'm going to let it take a break for a week or so).

I will do some further testing with furmark with the gpu clock as well and gpu / memory combined. I am going to do the tests at 3 resolutions to give us the full picture as well.

demowhc
10-01-2009, 05:35 PM
Interesting, so currently ATI have better anti aliasing scaling but still have inferior AF scaling.

This is true, but you have to take into account that AF quality is superior on the 5800 series as well..

gamervivek
10-01-2009, 05:38 PM
Nice test chickenfeed. SO the 5870 CAN use all the bandwidth you can give it. It's scaling almost linearly.

or the test is simply bandwidth starved. I used to get 100 more fps by overclocking memory on my 3870 in the ATI Tool window; it made no difference in games.

jaredpace
10-01-2009, 05:43 PM
or the test is simply bandwidth starved. I used to get 100 more fps by overclocking memory on my 3870 in the ATI Tool window; it made no difference in games.

yeah good point. all we're really looking for is fps increase in games. firing squad already proved that for us.

lutjens
10-01-2009, 05:46 PM
What may also be possible is that the card is intentionally detuned or purposely not optimized. Given that ATI has played its hand much earlier than NVidia, it wouldn't surprise me if ATI left some performance potential in reserve to counter NVidia. This would give NVidia the illusion that the 5870 is weaker than it is.

Just a thought...:)

Chickenfeed
10-01-2009, 06:04 PM
I did the same test again but this time I left memory stock and changed the core speed from 600-900 in 50Mhz increments.

Furmark 1.7
1280x1024 8x AA 60000MS
HD 5870 1GB

Core - Score

900 - 2555

850 - 2551

800 - 2539

750 - 2531

700 - 2527

650 - 2524

600 - 2115

http://img9.imageshack.us/img9/5492/grapho.jpg (http://img9.imageshack.us/i/grapho.jpg/)

What is interesting is that from 650-900 performance barely changes, yet at 600 it takes a considerable dive.

Because this is a relatively low resolution, I am going to redo both these tests at 1680x1050 and 1920x1200 over the weekend. Perhaps there will be more gains from core increases at those resolutions, but if 1280x1024 was any indication, I'll assume Furmark likes memory bandwidth over anything.

If I get really bored I might do Vantage (although it would take a full day to properly do :ROTF: ; I'll probably just run the GPU tests on Extreme and only look at the GPU score and avgs). I am trying to pick out some games that are easily benched to do similar tests as well. I'll most likely do Far Cry 2 at the very least, as its built-in bench tool makes life oh so easy and it is also a recent and stressful enough game to actually show any scaling.

By the way I made the graphs at : http://nces.ed.gov/nceskids/createAgraph/default.aspx. Google searches FTW and LOL at kids zone.

demowhc
10-01-2009, 06:15 PM
the 5870 is most definitely bandwidth starved imo, I just don't see the ~153GB/s being enough..

I suspect the refresh, 5890?, will have much faster mem and a slight bump in core speed, with a huge performance gain.

I think ATi may have slightly gimped the 5870 on purpose to get more life out of the 5800 series with future revisions..

Chumbucket843
10-01-2009, 06:27 PM
i went with texture filtering on high because ATi said that the 5870 is bandwidth limited 50% of the time.

ROUND 2: crysis demo benchmark
0x AA 0x AF
all settings on very high. gtx 260
http://img39.imageshack.us/img39/6995/crysisbecnh.jpg
framerates were MUCH smoother with memory at full speed, something you have to see to believe. a 100% increase in bandwidth gained 30% speed and much smoother fps. this should be interesting.

Kuntz
10-01-2009, 06:50 PM
i went with texture filtering on high because ATi said that the 5870 is bandwidth limited 50% of the time.

ROUND 2: crysis demo benchmark
0x AA 0x AF
all settings on very high.
http://img39.imageshack.us/img39/6995/crysisbecnh.jpg
framerates were MUCH smoother with memory at full speed, something you have to see to believe. a 100% increase in bandwidth gained 30% speed and much smoother fps. this should be interesting.

You increased memory speed by 15% (1000 -> 1150) and frame rate only went up 3.4% (29 -> 30). That is not from the increased bandwidth, that is from the decrease in memory latency.

Chumbucket843
10-01-2009, 06:59 PM
You increased memory speed by 15% (1000 -> 1150) and frame rate only went up 3.4% (29 -> 30). That is not from the increased bandwidth, that is from the decrease in memory latency.

went from 550mhz to 1150mhz. i wish my memory went to 1500mhz :rolleyes:. the frame rate went from 23.6 to 29.7. gpu's are designed to hide latency: my card can handle 24,576 threads, so any time spent accessing memory is covered up by working on other threads. this wasn't meant to be memory intensive at all; nobody buys a high end card and doesn't use any texture filtering. it might be memory size too. double the shaders, double the memory needed to feed all of those alu's. it's kind of like a 4870 512mb.
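rough little's-law sketch of what that latency hiding means (illustrative numbers: ~112GB/s is the GTX 260's paper bandwidth, the ~500ns latency is my assumption):

# Bytes that must be in flight to keep memory busy = bandwidth x latency.
bandwidth = 112e9                   # ~112 GB/s
latency = 500e-9                    # assumed ~500 ns round trip
print(bandwidth * latency / 1024, "KB in flight")   # ~55 KB
# thousands of resident threads exist precisely to keep that many
# outstanding requests going while each one waits on memory.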

Astennu
10-02-2009, 12:11 AM
I agree with kuntz. Furmark is a nice tool but it's not a game test. You don't know what it does or why the memory scales well there. If a card really is starved and you overclock the memory 10%, you would see a 9-10% boost in performance. In games that's not the case. It seems to be the case in furmark, but like I said, it's not a game test. That kind of load will not happen in games; it's there to fully stress the VGA card to see if it's stable. The load is way higher than any game on the market. So yes, there are situations where there is not enough bandwidth for the RV870 core, but that's not the reason why its current game performance is a bit too low for our liking.


went from 550mhz to 1150 mhz. i wish my memory went to 1500mhz:rolleyes:. the frame rate went from 23.6 to 29.7. gpu's are designed to hide latency. my card can handle 24,576 threads so any time spent accessing memory is covered up by working on other threads. this wasnt meant to be memory intensive at all. nobody buys a high end card and not use any texture filtering. it might be memory size too. double the shaders, double the memory needed for all of those alu's. its kind of like a 4870 512mb.

You could be right about the memory. The HD4870 gained a lot but the HD4850 did not gain much going from 512 to 1024mb. I have also seen it in the past with the Radeon 8500: the 128mb version was a lot faster than the 64mb version while the clocks were the same.

It could be the same problem here, but personally I don't think that is the case. If it is, we could see 10-25% performance gains from the 2GB version. We will know when it comes out in 3-4 weeks.

saaya
10-02-2009, 01:02 AM
No, I think the memory timings are static; they do not change with memory clock adjustment. So if they are at, for example, a total cycle delay of 10, that would mean:

10 cycles @ 1200MHz = 8.3ns delay
10 cycles @ 1300MHz = 7.7ns delay

Because there is no memory bottleneck on the 5870 cards, when you increase the memory speed all you are really doing is decreasing the nanosecond delay on memory operations, which is why a 10% increase in memory speed only gets a 1.5% increase in performance.
r600, rv670 and rv770 all adjusted the timings when you overclocked the memory... i'd be surprised if rv870 doesn't do it anymore...

a drop from 8.3 to 7.7ns is a latency reduction of 7% though, so how come we still only get a 1.5% boost? and memory timings tend to not really matter on vgas, they need bw bw bw... they usually run cas15 or so...

i don't think the boost we see is only from reduced memory latency. as a matter of fact, i don't think memory latency decreases a lot: since r600 all gpus use a formula to calculate memory timings based on memory clock. the latency still decreases, and bw still increases, otherwise we wouldn't see any gains at all, but it's less than we would see from a static timing config with increased memory speed.

you are probably thinking of system memory performing better with lower timings; for vgas that doesn't apply... they do score better with lower timings but the boost is tiny compared to cpu system memory...

nice tests chumbucket and chickenfield (sp?) :D
what it shows is that in furmark the 5870 def is bw limited, but that doesn't seem to be the case in actual games...
furmark is a 99% shader-heavy load, isn't it?

Astennu
10-02-2009, 01:27 AM
Furmark has a strange way of stressing cards to their max. You can overheat a HD4870 or HD4890 with ease, even without renaming the .exe file. A Twin Turbo could not cool a HD4890; the VRMs got too hot. The stock cooler does a better job there, but you do get a lot more noise. The power draw is also way higher. So it's definitely doing something. I also think furmark is able to keep all 800 (RV770 and RV790) or 1600 shaders (RV870) fully loaded where games can not.

saaya
10-02-2009, 02:55 AM
i don't think it's that all the shader cores are loaded, games can def do that. what furmark does is load the cores completely, all 5 parts of each, and occt even feeds the rest of the logic with work and keeps texture and geometry setup busy with bogus work, i think... it's kinda like linx that bombards cpus with random instructions without any dependencies so every single unit is fully loaded. i think... :D

Astennu
10-02-2009, 04:11 AM
i don't think it's that all the shader cores are loaded, games can def do that. what furmark does is load the cores completely, all 5 parts of each, and occt even feeds the rest of the logic with work and keeps texture and geometry setup busy with bogus work, i think... it's kinda like linx that bombards cpus with random instructions without any dependencies so every single unit is fully loaded. i think... :D

Indeed, that's what i tried to say :P Games mostly load 2-4 out of 5 alu's, and sometimes they get up to 5.

largon
10-02-2009, 07:19 AM
Furmark 1.7
1280x1024 8x AA 60000MS
HD 5870 1GB

Mem - Score
1300 - 2733
1200 - 2537
1100 - 2360
1000 - 2187
900 - 1963

http://img194.imageshack.us/img194/9013/graphtv.jpg (http://img194.imageshack.us/i/graphtv.jpg/)

My HD4890 at SXGA, 8xMSAA scored 1694 at core 850MHz / memory 900MHz, and 1859 at memory 1000MHz. So the HD5870 at the same clocks is a whopping 16-17% faster. And that, ladies and gentlemen, is nothing short of messed up considering Cypress' overwhelming HW-level superiority and the fact Furmark is very, very shader dependent...

Miwo
10-02-2009, 07:33 AM
Any conspiracy theories yet?

Witholding performance via driver for GT300 launch? lol...

gamervivek
10-02-2009, 07:42 AM
My HD4890 at SXGA, 8xMSAA scored 1694 at core 850MHz / memory 900MHz, and 1859 at memory 1000MHz. So the HD5870 at the same clocks is a whopping 16-17% faster. And that, ladies and gentlemen, is nothing short of messed up considering Cypress' overwhelming HW-level superiority and the fact Furmark is very, very shader dependent...

doesn't furmark have to be updated for better loading of newer cards?

wez
10-02-2009, 07:47 AM
Any conspiracy theories yet?

Witholding performance via driver for GT300 launch? lol...

Well, we haven't seen what the X2 can do yet. It's always possible that they've figured out a more efficient way for the GPUs to work than traditional CFX..

wild speculations :)

largon
10-02-2009, 07:48 AM
gamervivek,
Of course not. It's just an OpenGL benchmark app. It's not like games and all 3D apps need to be patched to "support" new HW.
:stick:

Chickenfeed
10-02-2009, 08:26 AM
My HD4890 at SXGA, 8xMSAA scored 1694 at core 850MHz / memory 900MHz, and 1859 at memory 1000MHz. So the HD5870 at the same clocks is a whopping 16-17% faster. And that, ladies and gentlemen, is nothing short of messed up considering Cypress' overwhelming HW-level superiority and the fact Furmark is very, very shader dependent...

I will be doing this test at more "real world" resolutions soon (real world as far as these cards go - I know 1280x1024 is still widely used). I do agree that something is pretty messed up given the small difference.

I'll mention I've found that I get a "display driver has stopped running" error at 900/1300 when running OCCT's furmark-type test after 30-45min (this got the core to like 88C and the fan to over 50%, which gets quite loud). With actual games, though, it usually is in the 70s and only Crysis has managed to get any higher. I'm not sure if this is an indication of the card not being able to handle the load at these clocks in something like Furmark, or merely drivers in their infancy. All I know is no games or other programs have shown signs of issues so far.

Kuntz
10-02-2009, 08:53 AM
r600, rv670 and rv770 all adjusted the timings when you overclocked the memory... i'd be surprised if rv870 doesn't do it anymore...

a drop from 8.3 to 7.7ns is a latency reduction of 7% though, so how come we still only get a 1.5% boost? and memory timings tend to not really matter on vgas, they need bw bw bw... they usually run cas15 or so...

i don't think the boost we see is only from reduced memory latency. as a matter of fact, i don't think memory latency decreases a lot: since r600 all gpus use a formula to calculate memory timings based on memory clock. the latency still decreases, and bw still increases, otherwise we wouldn't see any gains at all, but it's less than we would see from a static timing config with increased memory speed.

you are probably thinking of system memory performing better with lower timings; for vgas that doesn't apply... they do score better with lower timings but the boost is tiny compared to cpu system memory...

nice tests chumbucket and chickenfield (sp?) :D
what it shows is that in furmark the 5870 def is bw limited, but that doesn't seem to be the case in actual games...
furmark is a 99% shader-heavy load, isn't it?

Ah I never knew that about the memory timings on ATI's cards.

I still don't think we have a memory bandwidth issue in games.

If a core increase of 10% nets 5% on average, and a memory bandwidth increase only nets 0.5% to 3.5%, I'd say we have a core bottleneck more than anything. :p

Though this benchmark stands out for oddity:

http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/images/c2560.gif

-A 10% increase in memory nets ... 0% increase in frame rate.
-A 10% increase in core nets ... 0.6% increase in frame rate.
-Both increases combined and now we have a 7.4% increase in frame rate.

Chumbucket843
10-02-2009, 12:55 PM
i don't think it's that all the shader cores are loaded, games can def do that. what furmark does is load the cores completely, all 5 parts of each, and occt even feeds the rest of the logic with work and keeps texture and geometry setup busy with bogus work, i think... it's kinda like linx that bombards cpus with random instructions without any dependencies so every single unit is fully loaded. i think... :D

it's more complicated than that. if a shader program is flow-heavy it will do a lot better on nvidia cards. loading all of the alu's is harder than it sounds. it's vliw too, so not all instructions can take full advantage of all of the shaders; they encourage programmers to use certain float and vec types to get the fastest performance. memory size is probably the issue. think about it: if you double the amount of threads, you double the required memory. it's like having a 4870 512mb. it makes sense, seeing that they doubled everything but memory size and bandwidth.

Crankybugga
10-02-2009, 04:20 PM
Any conspiracy theories yet?

Witholding performance via driver for GT300 launch? lol...
Im still waiting for this "investigation" to be finalised so we can see how ATi have completely stuffed the HD5870 :rofl:

Soultaker52
10-02-2009, 07:40 PM
Any conspiracy theories yet?

Witholding performance via driver for GT300 launch? lol...

I know that's what's on my mind. I tried asking on Terry Makedon's (CatalystMaker) twitter, but he only evaded the question, then stopped answering when I pushed harder. I'm really beginning to think they're holding the card back.

JimmyH
10-02-2009, 10:28 PM
One theory why the 5870 is slower than its specs suggest: its 1600 shaders are nowhere near working at max capacity. This would explain why

1) increasing memory clocks does not improve results as much as expected.
2) load power is lower than other cards in games but much higher in occt.
3) raising memory has a greater effect in furmark

flopper
10-03-2009, 12:39 AM
"if" they can hold it back a little, it make sense as they wait for the late and very late g300 performances.

iandh
10-03-2009, 01:18 AM
You know, ATI holding the card back really goes pretty far into tinfoil hat territory, but it almost makes sense...

Mr. K6
10-03-2009, 07:44 AM
I think it might be accidentally held back due to drivers. If I understand it correctly, a lot of the performance from ATI's design comes from the GPU being fed information optimally. I'd wait for a few driver releases to see final performance numbers.

zalbard
10-03-2009, 08:50 AM
Yep, need better drivers, hard to expect some crappy release candidate version to offer 100% performance. :yepp:

Astennu
10-04-2009, 11:59 PM
I think it's driver related, but I don't think AMD does this on purpose, because the card now looks bad vs the HD4870 X2. The X2 is outgunning it. They did change a lot in the core, so I expect they will be able to boost performance by 10-20% in the future. We have also seen massive gains with newer drivers for the HD48xx cards.

And about furmark: I know AMD does something in their drivers to cut down the load on the GFX card. Furmark puts a rare load on the card, causing some to overheat; because of that AMD has put a limiter in the driver. You can disable it by renaming the .exe to something else. You might wanna try that if you want a fair comparison, but keep an eye on your temperatures. They can get higher than before!

JimmyH
10-11-2009, 08:16 AM
So despite what they claimed, the new AF does take a more serious hit than before:

http://pclab.pl/art38674-9.html

Might be a better idea to run these cards at 8xAF instead of 16x

fornowagain
10-11-2009, 11:59 AM
r600, rv670 and rv770 all adjusted the timings when you overclocked the memory... i'd be surprised if rv870 doesn't do it anymore...

That's very interesting, I remember reading that before somewhere. Don't suppose you have a link?

PaganII
10-11-2009, 01:39 PM
Maybe a bottleneck in crossfire.

http://www.hardwarezone.com/articles/view.php?cid=3&id=3032

Hornet331
10-11-2009, 03:56 PM
... it's kinda like linx that bombards cpus with random instructions without any dependencies so every single unit is fully loaded. i think... :D

Sry, but linx is nothing more than a gui for linpack... :rolleyes:

demonkevy666
10-14-2009, 06:36 PM
http://www.xtremesystems.org/forums/showthread.php?t=235831&page=2

look at the scores on the pages, before the oc on the card and after.
seems more like a bios problem; note this was an AMD system though :shrug:

Metatron
10-26-2009, 09:49 AM
One theory why the 5870 is slower than its specs suggest: its 1600 shaders are nowhere near working at max capacity. This would explain why

1) increasing memory clocks does not improve results as much as expected.
2) load power is lower than other cards in games but much higher in occt.
3) raising memory has a greater effect in furmark

If you want to verify the capacity of the ALUs you should consider testing the same stuff on the "Froblin" demo with a 4870 and a 5870, because you get adaptive tessellation even without DX11.
That somewhat disconnects the internal from the external bandwidth requirements, as no (proportionally) more geometry is streamed into the processor.
I would also make the very vague guess that crunching on up-LODed geometry creates a situation that somewhat favors shader-instruction grouping. But that's very hard to say. It can really only be verified if you can play around with the LOD strength in the demo.

You'd check something like these combinations:

LOD: 1, 2, 4
Chips: 4870, 5870
Core-Clock: x, y, z

BTW the architecture is well described (ATI published a 392-page doc for the linux guys), so someone eager may find the reason in there.

Good luck

jaredpace
11-16-2009, 03:19 PM
http://img.techpowerup.org/091116/wolf5.jpg

Unigine Heaven Benchmark - 1680x1050 - 4xAA, 16xAF - max w/tessellation - Cat 9.10

650/900 = 23.9 fps
650/950 = 24.4 fps
650/1000 = 24.7 fps
650/1050 = 24.9 fps
650/1100 = 25.1 fps
650/1150 = 25.2 fps
650/1200 = 25.5 fps
650/1250 = 25.7 fps
650/1300 = 25.8 fps

725/900 = 26.0 fps
725/950 = 26.2 fps
725/1000 = 26.7 fps
725/1050 = 27.0 fps
725/1100 = 27.3 fps
725/1150 = 27.4 fps
725/1200 = 27.8 fps
725/1250 = 28.0 fps
725/1300 = 28.2 fps

800/900 = 27.7 fps
800/950 = 28.2 fps
800/1000 = 28.6 fps
800/1050 = 28.9 fps
800/1100 = 29.3 fps
800/1150 = 29.6 fps
800/1200 = 29.8 fps
800/1250 = 30.1 fps
800/1300 = 30.4 fps

875/900 = 29.4 fps
875/950 = 29.9 fps
875/1000 = 30.2 fps
875/1050 = 30.7 fps
875/1100 = 31.1 fps
875/1150 = 31.4 fps
875/1200 = 31.8 fps
875/1250 = 32.1 fps
875/1300 = 32.4 fps

950/900 = 30.7 fps
950/950 = 31.1 fps
950/1000 = 31.7 fps
950/1050 = 32.1 fps
950/1100 = 32.6 fps
950/1150 = 33.3 fps
950/1200 = 33.6 fps
950/1250 = 33.8 fps
950/1300 = 34.4 fps

mem vs core

900/650 = 23.9 fps
900/725 = 26.0 fps
900/800 = 27.7 fps
900/875 = 29.4 fps
900/950 = 30.7 fps

950/650 = 24.4 fps
950/725 = 26.2 fps
950/800 = 28.2 fps
950/875 = 29.9 fps
950/950 = 31.1 fps

1000/650 = 24.7 fps
1000/725 = 26.7 fps
1000/800 = 28.6 fps
1000/875 = 30.2 fps
1000/950 = 31.7 fps

1050/650 = 24.9 fps
1050/725 = 27.0 fps
1050/800 = 28.9 fps
1050/875 = 30.7 fps
1050/950 = 32.1 fps

1100/650 = 25.1 fps
1100/725 = 27.3 fps
1100/800 = 29.3 fps
1100/875 = 31.1 fps
1100/950 = 32.6 fps

1150/650 = 25.2 fps
1150/725 = 27.4 fps
1150/800 = 29.6 fps
1150/875 = 31.4 fps
1150/950 = 33.3 fps

1200/650 = 25.5 fps
1200/725 = 27.8 fps
1200/800 = 29.8 fps
1200/875 = 31.8 fps
1200/950 = 33.6 fps

1250/650 = 25.7 fps
1250/725 = 28.0 fps
1250/800 = 30.1 fps
1250/875 = 32.1 fps
1250/950 = 33.8 fps

1300/650 = 25.8 fps
1300/725 = 28.2 fps
1300/800 = 30.4 fps
1300/875 = 32.4 fps
1300/950 = 34.4 fps

http://forums.techpowerup.com/showthread.php?t=106814&page=15

jaredpace
11-16-2009, 03:24 PM
I think this says the core at 650mhz already gets a good increase as memory bandwidth increases. Also, quite interestingly, as the core clock increases towards 1000mhz, the performance increase associated with the same memory bandwidth increase becomes greater itself (the degree of scaling increases as well!).
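same thing in numbers: fps gain for the full 900->1300 memory sweep at each core clock (pulled straight from the table above):

# Memory OC is worth more as the core clock rises.
runs = {650: (23.9, 25.8), 725: (26.0, 28.2), 800: (27.7, 30.4),
        875: (29.4, 32.4), 950: (30.7, 34.4)}
for core, (fps_at_900, fps_at_1300) in runs.items():
    print(core, f"+{fps_at_1300 / fps_at_900 - 1:.1%}")
# -> 650: +7.9%, 725: +8.5%, 800: +9.7%, 875: +10.2%, 950: +12.1%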

Smartidiot89
11-17-2009, 12:04 PM
Strange idea:

Anyone tried changing the PCI-E MHz at all to see if it affects performance? :D

Blkout
11-17-2009, 03:26 PM
Strange idea:

Anyone tried changing the PCI-E MHz at all to see if it affects performance? :D

I got a small performance boost from raising the PCI-E from 100 to 115MHz when I used my 285GTX, but I get no gain when I change it with my 5850's.

***Deimos***
12-09-2009, 03:43 PM
No, I think the memory timings are static, they do not change with memory adjustment. So if they are at, for example, a total cycle delay of 10, that would mean:

10 cycles @ 1200MHz = 8.3ns delay
10 cycles @ 1300MHz = 7.7ns delay

Because there is no memory bottleneck on the 5870 cards, when you increase the memory speed, all you are really doing is decreasing the nanosecond delay on memory operations, which is why a 10% increase in memory speed is only getting a 1.5% increase in performance.
Here is good table of specification comparison of GDDR3, GDDR4 and GDDR5.
http://theovalich.wordpress.com/2008/11/22/100th-story-gddr5-analysis-or-why-gddr5-will-rule-the-world/gddr5_03_gddr345-diferences/

1. GDDR5 has high latency, and so Fermi/Cypress have lots of cache/buffers. Unless a special ultra-high-resolution benchmark with very heavy texturing is used, the mem system isn't CONSTANTLY used.

2. A higher mem clock does marginally improve performance, but the gains may be partly offset by increasing errors.

3. Remember the i7/Phenom "uncore" clock. You only see significant mem scaling when you also increase the GPU clock... perhaps because this also speeds up the mem controller and cache, i.e. it doesn't help to get data to the chip faster when the chip itself is slower and takes time carrying the data where it's needed.

4. Driver optimizations have far greater impact than ALL OTHER FACTORS COMBINED. A well-tuned shader compiler which makes good use of registers and cache can make minimal use of memory bandwidth - of course the behaviour would be different for other vendors/cards.

5. Issues of PCB quality (impedance matching), clock skew, and training. At a super crazy high 5,000,000,000 bits per second, GDDR5 is certainly not CAS3, nor much higher than CAS9. The 8-bit prefetch latency is probably a big, maybe overriding, factor.

SCAVENGER1
12-11-2009, 11:54 PM
Got my 5870 today, but I can't seem to get the GPUclock utility to work. I'm using the RC7 drivers leaked from MSI, maybe that's the problem...

maybe try the 8.663.1_Beta5_Hemlock_VistaWin7_Nov11 drivers and see if they help. they are the drivers i am using on vista and win7 right now.

so far so good, and they run better than the official 9.11 did, which caused some shader problems in borderlands for me.

but the 9.12 betas fix that for me.

as for me, i have not overclocked my 5870 yet since i have no clue how or what to do.

and from what i have read in this forum i really do not want to, since this is my 1st rma 5870.

my first 5870 came to me bad, and this one seems to be ok

http://www.filefront.com/15115547/8.663.1_Beta5_Hemlock_VistaWin7_Nov11.exe

***Deimos***
12-12-2009, 09:52 AM
Even though this IS xtremesystems, I don't blame you for being cautious with overclocking an expensive new (RMA) 5870.
After all, it's not like you need to overclock to catch up.. it's ALREADY the FASTEST.

Besides, 1000MHz, 1100MHz, even 1300MHz on the GPU... that's peanuts in terms of performance improvement compared to what driver optimizations bring (which also have the advantage of being cumulative and compounded!!).

Judging by the driver timelines of the X1900, HD3870, and HD4870 - the HD5xxx will need 3-4 months for most major improvements (DX9/DX10) and probably till summer '10 to top out (especially since there are virtually no DX11 titles to optimize for right now).

Your HD5870 is only gonna get faster ;)