Radeon HD 6870 could widely outperform gtx 480

**HelixPC** · 07-22-2010, 03:15 PM

Originally Posted by 570091D

and you, of course, realize this news came from kitguru... the same site that saw a custom gtx480 pcb and declared the coming of the 512sp gtx485.

and yes i do think that amd would ship a modified version of evergreen with better tessellation performance and only a modest improvement for all other aspects of the chip.

hahaha, yes i remember that. I'm hoping that the 6000 is a fast card. who doesn't want a faster video card? i sure do! bring it on AMD!!!

**informal** · 07-23-2010, 05:02 AM

Little speculation on my part. From here we see this table:

If SI has 1920 SPs, then it's a 20% increase in stream processor count, meaning the die size is roughly ~3.2% higher (since a 40% bigger die, at the same node, gets AMD ~250% more SPs). A ~4% die area investment means ~347mm2, basically the same die size as Cypress. If the SPs are reorganized in a 4D scheme and utilization is better (as rumored) compared to Cypress' approach, then the SI 6000 series can bring more than a 20% performance improvement with almost no die space investment. Keep in mind that there was a tessellation improvement mentioned in the news too, so overall SI, done @ 40nm, could mean more performance @ the same die space and the same or slightly higher TDP envelopes. It doesn't have to be named 6870, a 6770 would suffice.
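For anyone who wants to play with the numbers, here is a minimal Python sketch of the estimate above. It only replays the post's own reasoning and takes its figures at face value. The 334 mm2 Cypress die size is an assumption (it is not stated in the post, only implied by ~347mm2 being ~4% bigger), and the function name is made up for illustration.

```python
def estimated_die_growth(sp_increase_pct,
                         ref_sp_increase_pct=250.0,
                         ref_die_increase_pct=40.0):
    """Scale die growth linearly from a reference transition.

    The defaults are the post's own figures for RV670 -> RV770:
    ~250% more SPs for a ~40% bigger die at the same node.
    """
    return sp_increase_pct * ref_die_increase_pct / ref_sp_increase_pct


CYPRESS_SPS = 1600
CYPRESS_DIE_MM2 = 334.0  # assumption, not stated in the post

sp_increase = (1920 - CYPRESS_SPS) / CYPRESS_SPS * 100  # percent more SPs
growth = estimated_die_growth(sp_increase)              # percent more die area

print(f"SP increase: {sp_increase:.1f}%")
print(f"Estimated die growth: {growth:.1f}%")
# The post rounds the growth up to ~4% before converting to mm2.
print(f"Die at +4%: {CYPRESS_DIE_MM2 * 1.04:.0f} mm2")
```

Swapping in a different reference ratio (the later replies dispute the RV670 -> RV770 one) changes the answer a lot, which is really the weak point of this kind of estimate.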

**Lokinhow** · 07-23-2010, 05:22 AM

Originally Posted by SnipingWaste

I find it funny that y'all think there can't be much improvement because it's going to be on 40nm. Just look at the RV670 and RV770. Both are on 55nm and there is a nice improvement from the RV670 to the RV770.

there was a nice improvement because the RV770's shader count is 150% bigger than the RV670's.
It can't apply here, RV670 was REALLY small and didn't eat much power, so there was a lot of room for that improvement.
but RV870 is not small, such a big jump in shader count would mean a huge, really huge chip.
it's going to have some nice improvements, but in no way comparable to RV670 -> RV770. That will only be possible at 28nm.

**tbone8ty** · 07-23-2010, 07:41 AM

Originally Posted by Lokinhow

there was a nice improvement because the RV770's shader count is 150% bigger than the RV670's.
It can't apply here, RV670 was REALLY small and didn't eat much power, so there was a lot of room for that improvement.
but RV870 is not small, such a big jump in shader count would mean a huge, really huge chip.
it's going to have some nice improvements, but in no way comparable to RV670 -> RV770. That will only be possible at 28nm.

read above

**WaterFlex** · 07-23-2010, 09:20 AM

impossible vga. what about 6970

**SnipingWaste** · 07-23-2010, 10:29 AM

Originally Posted by Lokinhow

there was a nice improvement because the RV770's shader count is 150% bigger than the RV670's.
It can't apply here, RV670 was REALLY small and didn't eat much power, so there was a lot of room for that improvement.
but RV870 is not small, such a big jump in shader count would mean a huge, really huge chip.
it's going to have some nice improvements, but in no way comparable to RV670 -> RV770. That will only be possible at 28nm.

I'm not saying that we will see the same improvement as from RV670 to RV770. There were many posts saying there can be little to no improvement because the next GPU is on 40nm. One thing I can see is some improvement in shader efficiency, because to me it's lower on the Evergreens than on the RV770. I can see a nice improvement if shader efficiency is improved, tessellation is improved, and more shaders are added.

**madcho** · 07-23-2010, 11:15 AM

nvidia's latest gen made a nice improvement in geometry, i hope ATI thought to upgrade this part too.

**LordEC911** · 07-23-2010, 11:55 AM

Originally Posted by informal

Little speculation on my part. From here we see this table:

If SI has 1920 SPs, then it's a 20% increase in stream processor count, meaning the die size is roughly ~3.2% higher (since a 40% bigger die, at the same node, gets AMD ~250% more SPs). A ~4% die area investment means ~347mm2, basically the same die size as Cypress. If the SPs are reorganized in a 4D scheme and utilization is better (as rumored) compared to Cypress' approach, then the SI 6000 series can bring more than a 20% performance improvement with almost no die space investment. Keep in mind that there was a tessellation improvement mentioned in the news too, so overall SI, done @ 40nm, could mean more performance @ the same die space and the same or slightly higher TDP envelopes. It doesn't have to be named 6870, a 6770 would suffice.

320ALUs 5d(1600SPs) to 480ALUs 4d(1920SPs) is a 50% increase...
You can't really use the RV670->RV770 transition to base future changes to the architecture since it isn't a true comparison because they went from a ringbus to a hub.
6700 is definitely larger than Cypress, with current rumors putting it just a bit under 400mm2, most specific number I have heard is 395mm2. Performance target is obviously a "full" GF100 and I'm guessing the TDP would be around GTX470 levels, 210-220w.

**Manicdan** · 07-23-2010, 11:58 AM

if they go for a 400mm2 chip at 40nm, then 28nm with double everything is still going to be close to 400mm2 and maybe 200W+. which does raise the question: how stripped down will the 6970 have to be for a dual gpu card?
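The "still close to 400mm2" part can be sanity-checked with a quick Python sketch. It assumes an ideal shrink where area scales with the square of the node ratio, which real processes rarely achieve, so treat the result as a lower bound. The function name and parameters are made up for illustration.

```python
def scaled_area(area_mm2, old_node_nm, new_node_nm, logic_multiplier=1.0):
    """Ideal-shrink estimate: area scales with (new_node / old_node) ** 2.

    logic_multiplier is how much more logic goes on the new chip
    (2.0 for "double everything").
    """
    shrink = (new_node_nm / old_node_nm) ** 2
    return area_mm2 * shrink * logic_multiplier


# 400 mm2 at 40nm, doubled and moved to 28nm
doubled_at_28nm = scaled_area(400.0, 40, 28, logic_multiplier=2.0)
print(f"{doubled_at_28nm:.0f} mm2")
```

Under that assumption a 40nm to 28nm shrink roughly halves the area, so doubling the logic lands just under the original die size.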

**Solus Corvus** · 07-23-2010, 12:08 PM

Originally Posted by LordEC911

320ALUs 5d(1600SPs) to 480ALUs 4d(1920SPs) is a 50% increase...

Why would a 4d ALU be as big as a 5d ALU?

**Manicdan** · 07-23-2010, 12:11 PM

how big is that 5th SP in the ALU? removing 320 of those, then adding 160 of the other 4 sounds like it will make it a lot bigger

**LordEC911** · 07-23-2010, 12:11 PM

Originally Posted by Solus Corvus

Why would a 4d ALU be as big as a 5d ALU?

It shouldn't be... I was just stating that some of the larger die savings from RV670->RV770 aren't going to happen with Cypress->6700. So you can't use the SP increase to die size increase ratio to estimate 6700.

Originally Posted by Manicdan

how big is that 5th SP in the ALU? removing 320 of those, then adding 160 of the other 4 sounds like it will make it a lot bigger

If they are all the same size, going from 320 5d to 480 4d should only be a ~20% increase in total shader space, not taking into account other units: scheduler, cache, TMU/TFUs etc.
A 50% shader increase for a 20% size increase seems like an efficient design if 5d is really only averaging a max utilization of ~80% in most situations, meaning the actual performance of a 5d vs 4d ALU should be about the same.
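A small Python sketch of that reasoning, for illustration only. It assumes every SP is the same size and, optimistically, that a 4d ALU keeps all four lanes busy, so the speedup it gives is an upper bound rather than a prediction. The helper names are made up.

```python
def total_sps(alus, lanes_per_alu):
    """Total stream processors for a given ALU count and width."""
    return alus * lanes_per_alu


def busy_lanes(alus, lanes_per_alu, utilization):
    """Average number of lanes doing useful work across the chip."""
    return alus * lanes_per_alu * utilization


cypress_sps = total_sps(320, 5)  # 1600
rumored_sps = total_sps(480, 4)  # 1920

alu_increase = 480 / 320 - 1                      # 50% more ALUs
sp_area_increase = rumored_sps / cypress_sps - 1  # ~20% more SPs (equal-size SPs)

# 5d at ~80% utilization keeps about 4 of its 5 lanes busy per ALU.
busy_5d = busy_lanes(320, 5, 0.80)
# Assumption: 4d lanes are fully utilized (best case).
busy_4d = busy_lanes(480, 4, 1.00)

print(f"ALUs: +{alu_increase:.0%}, SP area: +{sp_area_increase:.0%}")
print(f"Best-case throughput ratio: {busy_4d / busy_5d:.2f}x")
```

The point is that per ALU, 5 lanes at 80% and 4 lanes at 100% do the same work, so the ALU count is what drives the estimate.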

Originally Posted by Manicdan

if they go for a 400mm2 chip at 40nm, then 28nm with double everything is still going to be close to 400mm2 and maybe 200W+. which does raise the question: how stripped down will the 6970 have to be for a dual gpu card?

That's what I was thinking about back in April, when I first heard that the 28nm beast will be 512bit, though I don't know if the 512bit part is true or if it is a single GPU or dual GPU?
It really depends on how "mature" 28nm is, what clocks and what power savings they can get from it.

**Chumbucket843** · 07-23-2010, 03:27 PM

TSMC's 28nm HP is 40% faster at the same leakage. that could mean a 2GHz 480sp fermi or a cypress with 1600sp @1200MHz with a smaller die and cheaper. or is ATi going to GloFo? if so that makes prediction a lot harder.

leakage increases exponentially with more voltage and linearly with transistor count.
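Here is a rough Python sketch of where those clock figures could come from, plus a toy version of the leakage rule of thumb. The baseline clocks (1401 MHz GTX 480 shader clock, 850 MHz HD 5870 core clock) are assumptions taken from the cards' published specs, not from the post, and the constant `k` in the leakage model is invented purely for illustration.

```python
import math

SPEEDUP = 1.40  # "40% faster at the same leakage", per the post


def scaled_clock(base_mhz, speedup=SPEEDUP):
    """Clock after applying the claimed process speedup."""
    return base_mhz * speedup


def relative_leakage(transistor_ratio, delta_v, k=10.0):
    """Toy model: linear in transistor count, exponential in voltage.

    k (per volt) is a made-up constant; only the shape of the
    relationship is meant to match the post.
    """
    return transistor_ratio * math.exp(k * delta_v)


print(round(scaled_clock(1401)), "MHz")  # Fermi shader clock, assumed baseline
print(round(scaled_clock(850)), "MHz")   # Cypress core clock, assumed baseline
print(relative_leakage(2.0, 0.0))        # doubling transistors doubles leakage
```

With those baselines the 40% figure gives roughly 2GHz and 1200MHz, which lines up with the numbers quoted above.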

**-Boris-** · 07-23-2010, 03:42 PM

Originally Posted by informal

Little speculation on my part. From here we see this table:

If SI has 1920 SPs, then it's a 20% increase in stream processor count, meaning the die size is roughly ~3.2% higher (since a 40% bigger die, at the same node, gets AMD ~250% more SPs). A ~4% die area investment means ~347mm2, basically the same die size as Cypress. If the SPs are reorganized in a 4D scheme and utilization is better (as rumored) compared to Cypress' approach, then the SI 6000 series can bring more than a 20% performance improvement with almost no die space investment. Keep in mind that there was a tessellation improvement mentioned in the news too, so overall SI, done @ 40nm, could mean more performance @ the same die space and the same or slightly higher TDP envelopes. It doesn't have to be named 6870, a 6770 would suffice.

So if AMD made a die double the size they would have many thousand SPs, and hundreds of ROPs and TMUs?
I would say that you are making some mistakes here. First, you can never use numbers like that since you don't know how much of the die increase is due to shaders. And second, you can't compare different architectures like that. HD3870 had a big ringbus and AMD made some major changes to the layout, they increased the transistor density a bit too.

If you are going to compare chips like that, compare modern chips from the same generation, like Cypress and Juniper. And you can see that it scales pretty linearly.

I would say that SI will have at least 20% larger die, since the amount of shaders has increased. But we don't know much about TMUs and ROPs. Besides they have made some architectural changes that we don't know too much about either.

So, my guess at the moment, around 400mm² and about 20-50% better performance depending on situation.
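The Cypress/Juniper comparison can be sketched in a few lines of Python. The die sizes and SP counts below (Juniper: 800 SPs, ~166 mm2; Cypress: 1600 SPs, ~334 mm2) are assumptions taken from published specs, not from this thread, and the straight-line fit assumes TMUs, ROPs and everything else scale along with the shaders, which is exactly the unknown mentioned above.

```python
def fit_line(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# (stream processors, die size in mm2); assumed figures, see lead-in
JUNIPER = (800, 166.0)
CYPRESS = (1600, 334.0)

slope, intercept = fit_line(JUNIPER, CYPRESS)
estimate = slope * 1920 + intercept  # extrapolate to the rumored 1920 SPs
growth = estimate / CYPRESS[1] - 1

print(f"{slope:.2f} mm2 per SP")
print(f"1920 SPs -> ~{estimate:.0f} mm2 ({growth:.0%} larger than Cypress)")
```

Under those assumptions the extrapolation lands right around the 400mm² guess, though it says nothing about any architectural changes.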

**Manicdan** · 07-23-2010, 04:31 PM

Originally Posted by LordEC911

If they are all the same size, going from 320 5d to 480 4d should only be a ~20% increase in total shader space, not taking into account other units: scheduler, cache, TMU/TFUs etc.
50% shader increase for 20% size increase, seems like an efficient design if 5d is really only averaging a max utilization of ~80% in most situations, meaning the actual performance of a 5d vs 4d ALU should be about the same.

i didn't think they were the same size, i thought the first was much larger than the other 4, which is why they went with 5, instead of like 2-3

removing 1 may get them 5-10% more space, for a 2-3% in game perf loss (unless you're like furmark, which hopefully means 20% less heat and power consumption) then adding in

i really would like to get a much more knowledgeable answer about the size of the ALUs and SPs.

im gonna go with really bad fake numbers that are made up in my head
first SP is 30%, next are 10% each, (30+10+10+10+10) total is 70% of the chip from SPs
removing 1 SP per ALU will net them 10% space (aiming high), with 320 ALUs that's 0.03125% of chip space per small SP
adding in 160 more ALUs will add 480 small SPs, which is 15% more space
adding 160 large SPs is another 15%.
-10+15+15=
20% bigger for 1920 SPs using 480 ALUs of 4d

please be aware that i do not know crap about the accuracy of those numbers
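The same back-of-envelope math as a Python sketch, so the shares can be swapped out easily. Every percentage here is one of the made-up figures from the post above, not a measurement, and the variable names are only for illustration.

```python
ALUS_NOW = 320
ALUS_NEW = 480

# Made-up chip shares from the post (percent of the whole chip).
LARGE_SHARE = 30.0  # all 320 large SPs together
SMALL_SHARE = 10.0  # each set of 320 small SPs

per_small_sp = SMALL_SHARE / ALUS_NOW  # 0.03125% of the chip each
per_large_sp = LARGE_SHARE / ALUS_NOW  # 0.09375% of the chip each

extra_alus = ALUS_NEW - ALUS_NOW       # 160 new 4d ALUs

removed = ALUS_NOW * per_small_sp            # drop one small SP per existing ALU
added_small = extra_alus * 3 * per_small_sp  # 480 new small SPs
added_large = extra_alus * per_large_sp      # 160 new large SPs
net = -removed + added_small + added_large   # change in percent of the chip

shader_share_now = LARGE_SHARE + 4 * SMALL_SHARE  # 70% of the chip
shader_ratio = (shader_share_now + net) / shader_share_now

print(f"-{removed:.0f} +{added_small:.0f} +{added_large:.0f} = {net:.0f}% of the chip")
print(f"Shader area alone: {shader_ratio:.2f}x")
```

Note that the 20% is measured against the whole chip. Measured against the shader area alone it is about 1.29x, because these numbers make the large SP three times the size of a small one.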

**iMacmatician** · 07-24-2010, 05:26 AM

If they go from 4+1 configuration to 3+1 (assuming no other factors are involved in die area)…

If the fat SPU is 1.0x the size of a regular SPU:
480[3+1] is 1.20x the area of 320[4+1]

If the fat SPU is 1.2x the size of a regular SPU:
480[3+1] is 1.21x the area of 320[4+1]

If the fat SPU is 1.5x the size of a regular SPU:
480[3+1] is 1.23x the area of 320[4+1]

If the fat SPU is 2.0x the size of a regular SPU:
480[3+1] is 1.25x the area of 320[4+1] (and 384[3+1] is the same area as 320[4+1])

If the fat SPU is 3.0x the size of a regular SPU:
480[3+1] is 1.29x the area of 320[4+1]

If the fat SPU is 5.0x the size of a regular SPU:
480[3+1] is 1.33x the area of 320[4+1]

Even with a really fat SPU, the increase in fat SPU density by going from 4+1 to 3+1 doesn't result in a large increase in total SPU area.
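The figures above follow from one ratio, sketched here in Python. Like the post, it assumes nothing but SPU area matters, with a regular SPU as the unit of area. The function name and default arguments are made up for illustration.

```python
def area_ratio(fat_size, new_alus=480, old_alus=320):
    """Area of new_alus [3+1] ALUs relative to old_alus [4+1] ALUs.

    A regular SPU counts as 1 unit of area; the fat SPU counts as
    fat_size units.
    """
    return new_alus * (3 + fat_size) / (old_alus * (4 + fat_size))


for fat in (1.0, 1.2, 1.5, 2.0, 3.0, 5.0):
    print(f"fat SPU {fat:.1f}x -> 480[3+1] is {area_ratio(fat):.2f}x the area of 320[4+1]")

# Break-even case from the 2.0x line: 384[3+1] vs 320[4+1]
print(area_ratio(2.0, new_alus=384))
```

As the fat SPU grows, the ratio creeps towards 1.5x (the ALU count ratio) but only very slowly, which is why even a 5.0x fat SPU only gets to about 1.33x.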

**Olivon** · 07-28-2010, 10:04 AM

ATI to Start Releasing Second-Gen DX11 Chips in Late October (Rumours) - XbitLabs

**Tao~** · 07-28-2010, 10:50 AM

1920 SP is going to be hardly any faster than 1600 SP without significant background improvements (which I presume is called NI )
Higher clocks at same or lower TDP, better Tessellation and GPGPU features could be good enough to counter GF104 till we have a ATI 'Fermi'.
Heck, even Tessellation and other DX11 stuff is useless ATM, higher clocks would be enough to just go clear.

**-Boris-** · 07-28-2010, 10:46 PM

Originally Posted by Tao~

1920 SP is going to be hardly any faster than 1600 SP without significant background improvements (which I presume is called NI )
Higher clocks at same or lower TDP, better Tessellation and GPGPU features could be good enough to counter GF104 till we have a ATI 'Fermi'.
Heck, even Tessellation and other DX11 stuff is useless ATM, higher clocks would be enough to just go clear.

Up to 50% more power on the same frequency is not bad at all on the same node.
Everyone is assuming that the 1920 SPs are a fact even if we don't know anything about that yet. But if it is, we will have 50% more ALUs with a total of 20% more SPs, the difference will be quite big. Especially if you take into account how much each SP has been utilized earlier.

**Tao~** · 07-28-2010, 11:13 PM

Why would they go back from 4+1 to 3+1? i thought we were moving towards the parallel computing era. But it depends what the long term strategy is: they can trade die space for better ILP or more Shader Units

It is worth noting that ATi simply doesn't need anything much faster than what it has right now to dominate the market, it simply needs to improve the efficiency of the current-gen arch. I doubt Nvidia has anything left to offer in the Perf/Enthusiast segment till maybe Q1 2011.

Once 28nm comes the whole equation changes. The improvements made now must be progressive, that's all.

**blindbox** · 07-28-2010, 11:28 PM

Tao~, go read a few uarch details in the 4xxx and 5xxx reviews. The 4+1 config is probably underutilized. A 3+1 config will probably be utilized more but, as a result, requires more die space, due to the increase in the number of fat shaders. The upside is, DP performance increases.

Radeon HD 6xxx as we know it is a big experiment by AMD. They want HD 7xxx, their next uarch to be smooth. Might as well take the opportunity to earn some profit from it, because they can. I think nvidia would've made a fermi+G90 uarch hybrid if they could, just as a stopgap before the arrival of HD 5xxx.

One other thing people haven't taken into account is how 4+1 is a MUST for DP (the shaders combine to do DP. I think techreport explains this at length). If AMD were to do a 3+1 config, there must have been a change in how the shaders work. That's for SI. NI probably implements the whole uarch. All speculation on my part.

SI is the upcoming. NI is the future.

If AMD wants SI to do 1920 shaders in 40nm, plus a 3+1 config, there HAS to be a change in the shader uarch (or anything directly related to it). They're the most space-consuming after all.

**Johnny87au** · 07-29-2010, 01:41 AM

Not sure if this has been mentioned but any release date? haven't been on much lately..

**informal** · 07-29-2010, 02:15 AM

Originally Posted by Johnny87au

Not sure if this has been mentioned but any release date? haven't been on much lately..

5 posts above yours ....
