AMD's Radeon HD 6870 benchmarked? (updated more screens)

**Dimitriman** · 08-27-2010, 11:54 AM

Originally Posted by Manicdan

in the last thread someone showed referenced values for other cards:

i find this value too high. but i do expect such gains for dx11 stuff, cause honestly thats what this card is being built for.
for Dx10, maybe 10-15% due to higher clocks and better efficiency of power, but the architecture changes i doubt will do much for efficiency of perf/watt, just perf/mm2

I find around 11~12k proper for the Cayman 6870, sits right between 5870 and 5970 and beats the 480GTX (480 cuda) by about 15~20%

That does not seem too high to me at all; it's exactly what I would expect from anything that earns the name 6870. In fact the jump achieved from 4870 to 5870 was much more significant than that.

Thus in my opinion, 11k Vantage is actually rather low for a generation change.

But we can forgive AMD given the lack of 32nm production process.

**Manicdan** · 08-27-2010, 12:00 PM

Originally Posted by Dimitriman

I find around 11k proper for the Cayman 6870, sits right between 5870 and 5970 and beats the 480GTX (480 cuda) by about 10%

That does not seem too high to me at all; it's exactly what I would expect from anything that earns the name 6870. In fact the jump achieved from 4870 to 5870 was much more significant than that.

Thus in my opinion, 11k Vantage is actually rather low for a generation change.

But we can forgive AMD given the lack of 32nm production process.

4870 to 5870 was a shrink with double the SPs

im thinking this should be treated more like a 2900 to 3870, built to save money, not increase performance massively.

keep in mind that whatever they do now will probably just be doubled when they go to 28nm, which means it will be near the same size. so if they do 300mm2 now, it will be right around 300mm2 at 28nm, with about double the perf.

if all they are doing is dropping the 5th SP to shave a little space, it might get them back 5-10% board space, for a 2% perf loss. which they then use to beef up tessellation, and we're back to the same size, but 30% faster in Dx11, same in older games.

however im betting its not a simple SP layout change, something massive has to happen to see any real perf change, and im betting the 6870 will not be any larger than a 5870
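As a rough sanity check of the shrink argument above, here is a minimal sketch that assumes ideal area scaling with the square of the feature size. Real processes rarely scale this cleanly, so treat it as a ballpark only. The 300mm2 die and the doubling come from the post; the function and variable names are made up for illustration.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Area factor for an unchanged design, ASSUMING ideal quadratic scaling."""
    return (new_nm / old_nm) ** 2

die_40nm_mm2 = 300.0       # example die size from the post
transistor_factor = 2.0    # "just be doubled" at the new node, per the post

scale = ideal_area_scale(40.0, 28.0)
die_28nm_mm2 = die_40nm_mm2 * transistor_factor * scale

print(f"area factor 40nm -> 28nm: {scale:.2f}")
print(f"doubled design at 28nm:   {die_28nm_mm2:.0f} mm2")
```

Under that assumption a doubled design lands just under the original 300mm2, which is consistent with the "near the same size" estimate.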

**Calmatory** · 08-27-2010, 12:18 PM

33 % more memory bandwidth and 20 % more SPs with a minor loss couldn't lead to 40+ % performance increase. There has to be more to this than 1600->1920 SPs. Obviously assuming that both runs were run with ~similar hardware and software, which I have a hard time believing.

Or then it's fake.
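The arithmetic behind that objection can be sketched as follows. The SP counts, the 33 % bandwidth figure and the 40 % claim are taken from the post; the assumption that performance scales at best linearly with the most-improved resource is a simplification added here, not something the thread establishes.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0

SP_OLD, SP_NEW = 1600, 1920   # stream processor counts quoted in the post
BANDWIDTH_GAIN = 33.0         # percent, as quoted in the post
CLAIMED_GAIN = 40.0           # percent, the "40+ %" figure from the post

sp_gain = pct_increase(SP_OLD, SP_NEW)

# ASSUMPTION: at best, performance scales linearly with the most-improved resource.
best_case = max(sp_gain, BANDWIDTH_GAIN)
exceeds_simple_scaling = CLAIMED_GAIN > best_case

print(f"SP gain: {sp_gain:.0f} %, bandwidth gain: {BANDWIDTH_GAIN:.0f} %")
print(f"claimed gain exceeds simple scaling: {exceeds_simple_scaling}")
```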

**Dimitriman** · 08-27-2010, 12:21 PM

Originally Posted by Calmatory

33 % more memory bandwidth and 20 % more SPs with a minor loss couldn't lead to 40+ % performance increase. There has to be more to this than 1600->1920 SPs.

Or then it's fake.

Where did you read that there isn't more than that?

Do you really expect that AMD will add 20% more SP and call it a day? There would be no need for secrecy if that were the case.

keep in mind that whatever they do now will probably just be doubled when they go to 28nm, which means it will be near the same size. so if they do 300mm2 now, it will be right around 300mm2 at 28nm, with about double the perf.

if all they are doing is dropping the 5th SP to shave a little space, it might get them back 5-10% board space, for a 2% perf loss. which they then use to beef up tessellation, and we're back to the same size, but 30% faster in Dx11, same in older games.

however im betting its not a simple SP layout change, something massive has to happen to see any real perf change, and im betting the 6870 will not be any larger than a 5870

I really think that AMD will be targeting a major increase in DX11 and Tessellation performance strictly for the 6870. I wouldn't expect much improvement in DX10 and older games, this seems to be more of a change to 5k series architecture to make it more "future proof".
But I find it very very VERY unlikely that cayman will NOT beat 480GTX. It's very reasonable to believe that it will target a 512 Cuda Fermi (DX11 ofc).

**ColonelCain** · 08-27-2010, 12:37 PM

Oh God.... Already with these shenanigans?

**Chumbucket843** · 08-27-2010, 12:47 PM

Originally Posted by Calmatory

33 % more memory bandwidth and 20 % more SPs with a minor loss couldn't lead to 40+ % performance increase. There has to be more to this than 1600->1920 SPs. Obviously assuming that both runs were run with ~similar hardware and software, which I have a hard time believing.

Or then it's fake.

there is more going on than sp's and memory bandwidth.

for a long time it has been a mystery as to why the 5870 was only 60% faster than the 4890 while doubling almost everything. then the card was still liked because nvidia could barely beat their previous gen, and everyone stopped talking about why it wasnt much faster.

perhaps whatever bottleneck(s) the 5870 had has been improved in SI.
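One way to put a number on that mystery is Amdahl's law. This is a textbook model swapped in for illustration, not anything AMD has described: assume a fraction p of the work scales with the doubled resources and the rest does not. The 60% and "doubling" figures are from the post; the function names are hypothetical.

```python
def amdahl_speedup(p: float, k: float) -> float:
    """Overall speedup when a fraction p of the work is sped up by factor k."""
    return 1.0 / ((1.0 - p) + p / k)

def scalable_fraction(speedup: float, k: float) -> float:
    """Invert Amdahl's law: which fraction p explains the observed speedup?"""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / k)

observed_speedup = 1.6   # "only 60% faster", from the post
resource_factor = 2.0    # "doubling almost everything", from the post

p = scalable_fraction(observed_speedup, resource_factor)
print(f"fraction of work that scaled: {p:.2f}")
print(f"check: {amdahl_speedup(p, resource_factor):.2f}x")
```

Under this simple model roughly a quarter of the work did not benefit from the doubled units, which is the kind of fixed bottleneck the post is speculating about.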

**tajoh111** · 08-27-2010, 12:54 PM

I think one thing that makes this part slightly suspect for a release and performance is that 32nm was canceled last November, and considering NI/SI was built around this, AMD would only have months to do a redesign to modify it to work on 40nm again if they wanted to get things released this October. This in the grand scheme of things is not that long a time considering the planning process is years. I don't think they would have enough time to make huge changes, at least to get 30-40% more performance, unless this chip is 400 mm2+. And I cannot see performance going up without a huge jump in power consumption on TSMC's current process because at this point, I think with the timeframe AMD had to work with, SI has to be more Cypress than NI.

One more thing: AMD has got to add something to increase geometry performance and thus tessellation performance. This is not going to come free from changing 5D shaders to 4D shaders. One of AMD's bigger priorities at this point has to be delivering better DirectX 11 support, and I don't see how AMD is going to do this simply by changing the shaders to 4D instead of 5D.

They are going to have to add something and this is going to increase the size of the chip substantially.

**Lightman** · 08-27-2010, 01:05 PM

Originally Posted by tajoh111

I think one thing that makes this part slightly suspect for a release and performance is that 32nm was canceled last November, and considering NI/SI was built around this, AMD would only have months to do a redesign to modify it to work on 40nm again if they wanted to get things released this October. This in the grand scheme of things is not that long a time considering the planning process is years. I don't think they would have enough time to make huge changes, at least to get 30-40% more performance, unless this chip is 400 mm2+. And I cannot see performance going up without a huge jump in power consumption on TSMC's current process because at this point, I think with the timeframe AMD had to work with, SI has to be more Cypress than NI.

Good reasoning, but do you think TSMC wasn't kind enough to let AMD and nVidia (the two biggest future 32nm customers) know that this node was no longer on the roadmap well before the official announcement?

As a major customer I would be quite upset if I found out about an event like this from the press (release) ...

**informal** · 08-27-2010, 01:08 PM

Originally Posted by Chumbucket843

there is more going on than sp's and memory bandwidth.

for a long time it has been a mystery as to why the 5870 was only 60% faster than the 4890 while doubling almost everything. then the card was still liked because nvidia could barely beat their previous gen, and everyone stopped talking about why it wasnt much faster.

perhaps whatever bottleneck(s) the 5870 had has been improved in SI.

There was talk (or a rumor) that the efficiency of their execution unit block (superscalar and five ALUs wide) is not that stellar, so they worked on improving that part of the design.

**Chumbucket843** · 08-27-2010, 01:21 PM

it's not superscalar. that's just marketing so they can 1-up nvidia's scalar architecture.

i'm not worried about the performance of shaders. they are already extremely fast. the efficiency is actually pretty good on the majority of shaders, in fact the 4870 was 50% faster at game shaders than the gtx280 so the bottleneck must be elsewhere.

**informal** · 08-27-2010, 01:28 PM

Well from the alleged fishy screenshot we see the memory clocks are raised quite a bit(if the screenshot is real that is). So they thought the new/reworked Evergreen core benefits from additional memory BW more than the old one.

**Hornet331** · 08-27-2010, 01:48 PM

Originally Posted by informal

Well from the alleged fishy screenshot we see the memory clocks are raised quite a bit(if the screenshot is real that is). So they thought the new/reworked Evergreen core benefits from additional memory BW more than the old one.

I guess they upped the bandwidth for higher resolutions + AA, since that's one of the weaknesses of the HD5870 compared to 480GTX.

**qcmadness** · 08-27-2010, 02:04 PM

Originally Posted by Chumbucket843

there is more going on than sp's and memory bandwidth.

for a long time it has been a mystery as to why the 5870 was only 60% faster than the 4890 while doubling almost everything. then the card was still liked because nvidia could barely beat their previous gen, and everyone stopped talking about why it wasnt much faster.

perhaps whatever bottleneck(s) the 5870 had has been improved in SI.

The IPC of each SP for AMD and NVIDIA for Dx11 generation is generally lower than Dx10 equivalents.

That means Dx11 has something to do with latency / complexity of instructions?

**Eastcoasthandle** · 08-27-2010, 02:22 PM

Originally Posted by Calmatory

He merely said that he sees no flaws in this faked pic.

This is what w1z said. Which explains why he is of that opinion of what w1z stated.

**Chumbucket843** · 08-27-2010, 02:24 PM

Originally Posted by qcmadness

The IPC of each SP for AMD and NVIDIA for Dx11 generation is generally lower than Dx10 equivalents.

That means Dx11 has something to do with latency / complexity of instructions?

i have not seen much about shader model 5.0 but it is probably trivial to change the hardware for it. it's pretty complex how all of this gets broken down. first a programmer writes a shader. it is compiled at runtime. this allows the shader to run on any compliant architecture. the code goes through a couple software layers until it is in the form of machine code (0's and 1's) and then it runs on the hardware.

the problem comes from the fact that computer graphics changes so fast that hardware slows it down. reusing parts of the design is hard because it will be too slow but redesigning it takes too long. a careful balance is necessary to remain on top.

looking at current archs we see nvidia screwed up somewhere. i cant say exactly but they definitely tried to do too much and got behind schedule. that probably happened early on judging by a 6 month delay. ati went for quickest time to market and tried to get their old arch updated enough to be acceptable at dx11. from that we must figure out what is slowing down the architecture. it's hard because documentation is scarce and you cant view all the parts independently.

**hurleybird** · 08-27-2010, 02:27 PM

Originally Posted by Chumbucket843

it's not superscalar. that's just marketing so they can 1-up nvidia's scalar architecture.

Well, there's some debate as to whether the architecture is superscalar or VLIW (it's the latter, technically), but the one thing for sure is that it isn't scalar

**Chumbucket843** · 08-27-2010, 02:39 PM

Originally Posted by hurleybird

Well, there's some debate as to whether the architecture is superscalar or VLIW (it's the latter, technically), but the one thing for sure is that it isn't scalar

there is no argument. look at the instruction set manual. 448 bit instruction words. that's a very long instruction word.

superscalar is a microarchitecture technique.

stream core: The fundamental, programmable computation units, responsible for performing integer, single precision floating point, double precision floating point, and transcendental operations. They execute VLIW instructions for a particular thread. Each stream processor stream core handles a single instruction within the VLIW instruction.
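To illustrate what "each stream core handles a single instruction within the VLIW instruction" means for slot usage, here is a toy bundle packer. It is only a sketch under stated assumptions: the greedy scheduling rule, the op names and the tiny example shader are all invented for illustration and do not describe AMD's actual shader compiler.

```python
def pack_bundles(ops, width):
    """Greedily pack ops into VLIW bundles of at most `width` slots.

    ops: list of (name, deps) in program order. An op may only be placed in a
    bundle once all of its deps were issued in an EARLIER bundle.
    """
    issued = set()
    pending = list(ops)
    bundles = []
    while pending:
        bundle = []
        for name, deps in pending:
            if len(bundle) < width and all(d in issued for d in deps):
                bundle.append(name)
        if not bundle:
            raise ValueError("unsatisfiable dependency in op list")
        issued.update(bundle)
        pending = [(n, d) for n, d in pending if n not in issued]
        bundles.append(bundle)
    return bundles

def utilization(bundles, width):
    """Fraction of available slots that actually hold an op."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * width)

# Hypothetical shader: four independent multiplies, then two dependent ops.
shader = [
    ("mul_x", []), ("mul_y", []), ("mul_z", []), ("mul_w", []),
    ("sum", ["mul_x", "mul_y", "mul_z", "mul_w"]),
    ("rsq", ["sum"]),
]

for width in (5, 4):
    bundles = pack_bundles(shader, width)
    print(width, len(bundles), f"{utilization(bundles, width):.0%}")
```

In this made-up shader both widths need the same number of bundles, so the narrower unit simply leaves fewer slots idle. Real shaders vary a lot, so this says nothing about actual hardware efficiency.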

**demonkevy666** · 08-27-2010, 03:02 PM

Split Shader clocks makes it unified with shader cores

Not super scalar it's only scalar with shader cores.

Ati is Super Scalar, because shader is linked to core speed.

**hurleybird** · 08-27-2010, 03:04 PM

Originally Posted by demonkevy666

Ati is Super Scalar, because shader is linked to core speed.

Uh, what?

**LordEC911** · 08-27-2010, 06:05 PM

Originally Posted by Calmatory

33 % more memory bandwidth and 20 % more SPs with a minor loss couldn't lead to 40+ % performance increase. There has to be more to this than 1600->1920 SPs. Obviously assuming that both runs were run with ~similar hardware and software, which I have a hard time believing.

Or then it's fake.

320 5d -> 480 4d = 50% increase...

Also, the other change that we will hopefully see this generation is the dual triangle setup, which will help with tessellation.

All of that in under 400mm2 is amazing, again if it is all true which we will have to wait at least a few more weeks until we get more leaks/info.
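The "320 5d -> 480 4d" line reconciles with the 1600->1920 SP figures if you count execution units instead of individual lanes. A quick sketch of that arithmetic, using only the numbers from the thread (the variable names are made up):

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0

units_old, lanes_old = 320, 5   # 5D units, as quoted in the post
units_new, lanes_new = 480, 4   # rumored 4D units, as quoted in the post

sps_old = units_old * lanes_old
sps_new = units_new * lanes_new

unit_gain = pct_increase(units_old, units_new)
sp_gain = pct_increase(sps_old, sps_new)

print(f"SPs: {sps_old} -> {sps_new} ({sp_gain:.0f} % more lanes)")
print(f"units: {units_old} -> {units_new} ({unit_gain:.0f} % more units)")
```

So the same rumored chip is 20 % bigger by lane count but 50 % bigger by unit count; which number matters more depends on how often the extra lanes were actually filled.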

**spursindonesia** · 08-27-2010, 06:36 PM

Originally Posted by Manicdan

4870 to 5870 was a shrink with double the SPs

im thinking this should be treated more like a 2900 to 3870, built to save money, not increase performance massively.

keep in mind that whatever they do now will probably just be doubled when they go to 28nm, which means it will be near the same size. so if they do 300mm2 now, it will be right around 300mm2 at 28nm, with about double the perf.

if all they are doing is dropping the 5th SP to shave a little space, it might get them back 5-10% board space, for a 2% perf loss. which they then use to beef up tessellation, and we're back to the same size, but 30% faster in Dx11, same in older games.

however im betting its not a simple SP layout change, something massive has to happen to see any real perf change, and im betting the 6870 will not be any larger than a 5870

I respect your opinion, but sometimes people forget that within the same process node, ATi CAN add quite a significant performance increase and/or efficiency, just take a look at R 520 --> R 580 and RV 670 --> RV 770. And as for the Evergreen family to Northern Islands family, i think they got the improved efficiency nailed down.

Die size & power consumption WOULD increase, but not in proportion to the added performance. It certainly would be more beneficial for them should Barts (HD 6770) successfully reach HD 5850 performance but only around 260-270 mm^2 in size, then Cayman (HD 6870) while perhaps around 390-400 mm^2 and 210-220 w TDP, manages to beat GTX 480 by 20%.

Regarding transistor budget, i think we all know that ATi has implemented many efforts and workarounds in their Evergreen design to make up for TSMC's faulty/underperforming & underyielding 40 nm process, so with the process now matured, perhaps they can reduce the extent of those workarounds & save transistors in the process.

Originally Posted by Calmatory

33 % more memory bandwidth and 20 % more SPs with a minor loss couldn't lead to 40+ % performance increase. There has to be more to this than 1600->1920 SPs. Obviously assuming that both runs were run with ~similar hardware and software, which I have a hard time believing.

Or then it's fake.

I think the key words for this to work are added efficiency and mArch improvements (such as 5d to 4d SP array reshuffle).

R 520 --> R 580 grew somewhat in die size & TDP (but not extraordinarily so), still using the same process, but in modern games that are shader heavy, the performance difference can be like night & day. The key ? R 580's 48 pixel shader pipelines vs 16 of them in R 520.

**vietthanhpro** · 08-27-2010, 07:09 PM

Originally Posted by LordEC911

320 5d -> 480 4d = 50% increase...

Also, the other change that we will hopefully see this generation is the dual triangle setup, which will help with tessellation.

All of that in under 400mm2 is amazing, again if it is all true which we will have to wait at least a few more weeks until we get more leaks/info.

become.....
[image not preserved]

**spursindonesia** · 08-27-2010, 07:18 PM

Originally Posted by vietthanhpro

become.....
[image not preserved]

Now that's interesting, so in layman's terms, can we expect a half rate DP FLOP rate from Northern Islands chip family instead of a fifth rate in Evergreen ?

....

**Mungri** · 08-27-2010, 07:53 PM

11k score in vantage extreme mode is still impressive for a single card, if true.

**Chumbucket843** · 08-27-2010, 08:25 PM

Originally Posted by LordEC911

320 5d -> 480 4d = 50% increase...

Also, the other change that we will hopefully see this generation is the dual triangle setup, which will help with tessellation.

All of that in under 400mm2 is amazing, again if it is all true which we will have to wait at least a few more weeks until we get more leaks/info.

since ATi uses a 16 wide SIMD it should be 64 sp's per core for SI. you could have 448sp's or 512sp's. 480 is extremely unlikely.

essentially that is what they have now but with some baggage. each register is still going to be 128bits with 12 read and 4 write ports. this is about 45% of the shader unit and it contributes a lot of power.
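The divisibility argument above can be checked in a few lines. This sketch just follows the post's own premise (16-wide SIMD with 4 lanes per unit, so 64 SPs per core) and the three counts it mentions; whether the rumored chip actually groups its units this way is an assumption.

```python
SIMD_WIDTH = 16      # units per SIMD, per the post
LANES_PER_UNIT = 4   # rumored 4D layout, per the thread
SPS_PER_CORE = SIMD_WIDTH * LANES_PER_UNIT  # 64, as stated in the post

def fits_whole_cores(sp_count: int) -> bool:
    """True if sp_count splits evenly into 64-SP cores."""
    return sp_count % SPS_PER_CORE == 0

for count in (448, 480, 512):
    cores = count / SPS_PER_CORE
    print(f"{count}: {cores} cores, whole number: {fits_whole_cores(count)}")
```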
