ATI Radeon HD 4000 Series discussion

**AliG** · 06-15-2008, 03:29 AM

meh, it's well known that GPU-Z uses a database to identify the card. It does not magically read the card and know what everything is. W1zzard himself has said the database is simply outdated and that the RV770 data is wrong.

**trinibwoy** · 06-15-2008, 03:31 AM

Originally Posted by DilTech

You see, when you run multiple cards, both cards must have all the information in their individual memory banks, as both cards are likely to be showing the same textures as one another, as well as the same shaders. As such, you're effectively doubling the same exact info in both physical memory, as well as using the memory bandwidth of both cards to perform identical tasks. Seeing as how they can't share the same information from the same set of ram, they're effectively doing the same job twice.

No, ORBR is more accurate in this case. The duplicate transfer of textures from the host to the GPU happens over the PCIe bus and it doesn't happen every frame. So even if the same transfer occurs to both cards it doesn't have anything to do with the on-board bandwidth usage.

What he's referring to is the use of the on-board bandwidth to render the frame. The two cards are not doing "identical tasks". They are processing different geometry and rendering different framebuffers for their individual frame. However, there may be instances where the two cards generate the same render target or other buffers that normally would be reused in a single-card scenario which is a bit of a waste.

In any case both 256-bit buses are being used somewhat in parallel to process two different frames at once. So it is closer to the bandwidth of a 512-bit bus than it is to a single 256-bit one.

It's very simple really. Compare 9600GT-SLI to an 8800GTS-512. The former is much faster even though the number of processing units is the same. Why? Double bandwidth.

**AliG** · 06-15-2008, 03:44 AM

No, diltech is correct, the data is mirrored on both memory banks, so while the cards do not render the same frame, when using sli you really only get the memory bandwidth and pool of one card

**AliG** · 06-15-2008, 03:46 AM

Originally Posted by trinibwoy

No, ORBR is more accurate in this case. The duplicate transfer of textures from the host to the GPU happens over the PCIe bus and it doesn't happen every frame. So even if the same transfer occurs to both cards it doesn't have anything to do with the on-board bandwidth usage.

What he's referring to is the use of the on-board bandwidth to render the frame. The two cards are not doing "identical tasks". They are processing different geometry and rendering different framebuffers for their individual frame. However, there may be instances where the two cards generate the same render target or other buffers that normally would be reused in a single-card scenario which is a bit of a waste.

In any case both 256-bit buses are being used somewhat in parallel to process two different frames at once. So it is closer to the bandwidth of a 512-bit bus than it is to a single 256-bit one.

It's very simple really. Compare 9600GT-SLI to an 8800GTS-512. The former is much faster even though the number of processing units is the same. Why? Double bandwidth.

Nope. The G94 GPU is simply far more optimised for current games; even as a single card it is just a few frames behind the 8800GT as long as you don't crank up the detail too much. Once you hit high detail, the single GTS will win, which shows that you aren't doubling the bandwidth, because if you were, the G94s would get better frames at max details.

**trinibwoy** · 06-15-2008, 03:48 AM

Originally Posted by AliG

No, diltech is correct, the data is mirrored on both memory banks, so while the cards do not render the same frame, when using sli you really only get the memory bandwidth and pool of one card

Gah! You guys are confusing the duplication of data on the framebuffer with memory bandwidth. They are two completely separate things!

I'm not sure how else to explain it but I'll try.

The duplication happens when the host CPU sends texture and other reference data to the GPU over the PCIe bus. This happens only when needed...if a texture is already in local memory on the GPU it's not resent. At this point both cards have the same data.

Bandwidth however is used to do the work to actually render the frame. Reading texture data into the chip, writing buffers back out to memory etc. This is not duplicate work as it's being done for two different frames. This is stuff a single GPU would have to do twice anyway so it's not wasted bandwidth.
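
A toy model of that distinction, as a minimal sketch assuming ideal AFR with every asset mirrored on each card. The `Card` class, the `afr_totals` helper, and the 512 MB / 57.6 GB/s figures are illustrative assumptions, not measurements, and real scaling will fall short of this upper bound.

```python
from dataclasses import dataclass

@dataclass
class Card:
    memory_mb: int          # local memory on the card
    bandwidth_gb_s: float   # peak local memory bandwidth

def afr_totals(cards):
    """Return (usable memory, upper-bound bandwidth) for an AFR setup."""
    # Storage: assets are mirrored, so the usable pool is one card's worth.
    usable_memory_mb = min(card.memory_mb for card in cards)
    # Bandwidth: each card reads and writes its own memory while rendering
    # its own frame, so the theoretical ceiling is the sum across cards.
    peak_bandwidth_gb_s = sum(card.bandwidth_gb_s for card in cards)
    return usable_memory_mb, peak_bandwidth_gb_s

# Hypothetical pair of identical cards.
pair = [Card(512, 57.6), Card(512, 57.6)]
memory, bandwidth = afr_totals(pair)
print(f"usable memory: {memory} MB, bandwidth ceiling: {bandwidth:.1f} GB/s")
```

The memory pool stays at one card's size while the bandwidth ceiling doubles, which is the storage versus processing split described above.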

**trinibwoy** · 06-15-2008, 03:51 AM

Originally Posted by AliG

Nope. The G94 GPU is simply far more optimised for current games; even as a single card it is just a few frames behind the 8800GT as long as you don't crank up the detail too much. Once you hit high detail, the single GTS will win, which shows that you aren't doubling the bandwidth, because if you were, the G94s would get better frames at max details.

No, G92 is just bandwidth limited. Show me where a single G92 is faster than SLI 9600GT at any settings in a title that scales with SLI.

http://techreport.com/articles.x/14168/8

**Extelleron** · 06-15-2008, 03:53 AM

Originally Posted by trinibwoy

No, G92 is just bandwidth limited. Show me where a single G92 is faster than SLI 9600GT at any settings in a title that scales with SLI.

http://techreport.com/articles.x/14168/8

9600GT SLI also has a total of 32 ROPs, meanwhile G92 cards have only 16. So pixel processing & memory bandwidth are the weaknesses of G92.

**trinibwoy** · 06-15-2008, 03:58 AM

Originally Posted by Extelleron

9600GT SLI also has a total of 32 ROPs, meanwhile G92 cards have only 16. So pixel processing & memory bandwidth are the weaknesses of G92.

Correct. And the reason those 32 ROPs can be effective is because the additional bandwidth is available.

**AliG** · 06-15-2008, 04:04 AM

It is already well known that G92 is bandwidth limited, but check this review out:
http://www.neoseeker.com/Articles/Ha...it9600gtsonic/

With the exception of World in Conflict (where the GTS beats the SLI setup in every test), in general the SLI setup wins until you hit the very high resolutions with AA (such as with Crysis), and then the single GTS wins.

**adamsleath** · 06-15-2008, 04:05 AM

memory bandwidth are the weaknesses of G92.

the single GTS will win, which shows that you aren't doubling the bandwidth, because if you were, the G94s would get better frames at max details

one of these statements is wrong.

why does a single gts beat the 9600gt sli at high settings?
ie. @16x12 AA=4...9600gt sli is bottlenecked by something it seems to me.

**Xello** · 06-15-2008, 04:10 AM

Originally Posted by DilTech

2x256bit doesn't equal 512bit.

**adamsleath** · 06-15-2008, 04:20 AM

i still say (because those who know more than me have said it) that 9600gt's are bandwidth limited, even when you have 2 of them,
even vs an 8800gt, a gts or a 9800gtx, and particularly vs the 8800gtx/ultra and the new 512-bit cards.

sli gives you an fps boost, but there seems to be a bottleneck @ high res + AA, and i'm not sure what it is.
i thought it was the bandwidth.
people have rabbited on about the bus also, but i don't really understand it.

all i know is i dont like busses because they get stuck in traffic

**AliG** · 06-15-2008, 04:25 AM

no, the 9600gt isn't bandwidth limited, why would it be? Think about it, it performs at its full potential and is almost at the level of the 8800gt with just over half the shaders but same bandwidth

But either way, it doesn't matter because this is an ATI thread

**adamsleath** · 06-15-2008, 04:27 AM

no, the 9600gt isn't bandwidth limited

well it is limited by something; i dont care whether you call it kentucky fried chicken limited.

Think about it

think about what?
youve provided the evidence that 9600gt is bottlenecked at 16x12 4xAA and yet you still have not provided the explanation.

but dont worry about actually answering; just say it's an ati thread

**AliG** · 06-15-2008, 04:30 AM

It's limited by shader power (well, I shouldn't really say limited, as it's pretty much at full potential). Two 64-shader cards in SLI != 128 shaders on a single card; SLI just isn't that efficient yet. If you gave the 8800GTS 512MB a 512-bit memory interface, it would fly, as it is definitely held back by memory bandwidth.

But once again, this is the ATI thread, lets drop it here, if you wish to continue, bring it up at the nvidia thread

**adamsleath** · 06-15-2008, 04:32 AM

It's limited by shader power (well, I shouldn't really say limited, as it's pretty much at full potential). Two 64-shader cards in SLI != 128 shaders on a single card; SLI just isn't that efficient yet. If you gave the 8800GTS 512MB a 512-bit memory interface, it would fly, as it is definitely held back by memory bandwidth.

you could have said that b4 now.
shader diplomacy? but thx for the clarification there.

**trinibwoy** · 06-15-2008, 04:55 AM

Originally Posted by DilTech

You see, both GPU's have to store the same information, EVEN IF IT'S AFR mode! Why? Because 99% chance says Frames 1,3,5,7,9, and 11 have 90% the same information(textures, shaders, etc) as 2,4,6,8,10, and 12. You see, when you run multiple cards, both cards must have all the information in their individual memory banks, as both cards are likely to be showing the same textures as one another, as well as the same shaders.

This statement is correct regarding STORAGE.

using the memory bandwidth of both cards to perform identical tasks. Seeing as how they can't share the same information from the same set of ram, they're effectively doing the same job twice.

This statement is incorrect regarding PROCESSING. A single card would have to do this job twice over two frames. Two cards do the same thing somewhat in parallel.

As such, it's the same as a single 256bit memory bus, and a single set of 512mb of ram. That's why most stores make sure to specify 256bit x2 and 512mb x2. 2x256bit doesn't equal 512bit.

No it is not the same as a single 256-bit memory bus. It certainly isn't the same as a single 512-bit one either but it's definitely closer.

**ORBR** · 06-15-2008, 04:58 AM

Originally Posted by DilTech

In a perfect world, maybe.

You see, both GPU's have to store the same information, EVEN IF IT'S AFR mode! Why? Because 99% chance says Frames 1,3,5,7,9, and 11 have 90% the same information(textures, shaders, etc) as 2,4,6,8,10, and 12.

You see, when you run multiple cards, both cards must have all the information in their individual memory banks, as both cards are likely to be showing the same textures as one another, as well as the same shaders. As such, you're effectively doubling the same exact info in both physical memory, as well as using the memory bandwidth of both cards to perform identical tasks. Seeing as how they can't share the same information from the same set of ram, they're effectively doing the same job twice.

I'm quoting you here, but the GPUs aren't doing the same job, because they are processing different frames.
So they can use 90% of the same assets while each still doing 100% different work.

**Unbornchild** · 06-15-2008, 06:13 AM

ORBR and trinibwoy:
1. what is your explanation for the microstuttering in 3870X2 and do you expect it to be solved in 4870X2?

2. Is/will be crossfire in practice more efficient on 4870X2 than on 2 single 4870 cards?

3. If a game "X" isn't optimized for cross/X-fire in drivers, will the 4870X2 in that case automatically act as a single card (2nd gpu not being a hindrance), and performance never drop to below that of a single 4870?

Many thanks in advance for answering.

**trinibwoy** · 06-15-2008, 06:23 AM

Originally Posted by Unbornchild

ORBR and trinibwoy:
1. what is your explanation for the microstuttering in 3870X2 and do you expect it to be solved in 4870X2?

2. Is/will be crossfire in practice more efficient on 4870X2 than on 2 single 4870 cards?

3. If a game "X" isn't optimized for cross/X-fire in drivers, will the 4870X2 in that case automatically act as a single card (2nd gpu not being a hindrance), and performance never drop to below that of a single 4870?

Many thanks in advance for answering.

Heh, if we had the answers to those questions everybody else would too. At this point nobody knows what AMD has done to improve on the 3870X2 formula.

1. There are a lot of threads explaining the issue. Essentially the frames rendered by the individual GPU's are not distributed evenly either in terms of game time or the length of time they remain on-screen. This is an inherent issue with AFR and the only true solution is to have all the GPU's co-operating on the same frame (Supertiling, SFR etc). Of course you don't get the geometry processing scaling of AFR.

2. It should be. Rumours have it that they are switching to a faster on-board PCIe interface between the chips or doing something totally different. I see no reason why two individual cards should be faster.

3. If it is AFR based then yes, if the game is not AFR friendly you will have the same issues that we have today with multi-GPU scaling.

But like I said nobody has the answers at this point.

**Xello** · 06-15-2008, 06:29 AM

Originally Posted by trinibwoy

I see no reason why two individual cards should be faster.

I.m.o it makes sense that 2 individual cards would be faster (as 2 x 3870's often outperform the 3870X2) - it's a sacrifice made to get both the GPU's on a single slot, a compromise if you like which of course has advantages as well as disadvantages, for example the possibility of crossfire-like performance on boards that don't support it, and obviously Quad-Xfire potential on boards that do

This of course goes for NV's dual-gpu solutions as much as the radeons.

**Helmore** · 06-15-2008, 06:57 AM

Originally Posted by Xello

I.m.o it makes sense that 2 individual cards would be faster (as 2 x 3870's often outperform the 3870X2) - it's a sacrifice made to get both the GPU's on a single slot, a compromise if you like which of course has advantages as well as disadvantages, for example the possibility of crossfire-like performance on boards that don't support it, and obviously Quad-Xfire potential on boards that do

This of course goes for NV's dual-gpu solutions as much as the radeons.

The fact that 2 x 3870 can outperform a 3870X2 has to do with memory: the 3870X2 uses GDDR3 at 1800 MHz, while a 3870 uses GDDR4 at 2250 MHz. There is also a core clock difference: the 3870X2 has its cores clocked at 825 MHz, and a 3870 has its core clocked at 775 MHz. Then you also have the fact that there is a PCIe gen 1 bridge chip on the 3870X2, so two 3870s get more bandwidth to communicate if they are installed on a mobo with two PCIe gen 2 x16 slots.
This makes 2 x 3870 faster when bandwidth is the limit, while the 3870X2 will be faster when raw crunching power matters more.
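
To put rough numbers on the memory difference, here is a minimal sketch of the usual peak-bandwidth arithmetic (bus width in bytes times effective data rate). It assumes a 256-bit bus per GPU on both cards and treats the 1800 MHz and 2250 MHz figures above as effective data rates; the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_rate_mhz):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mhz * 1e6 / 1e9

BUS_WIDTH_BITS = 256  # assumption: 256-bit memory bus per GPU

hd3870 = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 2250)            # GDDR4
hd3870x2_per_gpu = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 1800)  # GDDR3

print(f"3870:            {hd3870:.1f} GB/s")
print(f"3870X2, per GPU: {hd3870x2_per_gpu:.1f} GB/s")
print(f"ratio:           {hd3870 / hd3870x2_per_gpu:.2f}x")
```

Under those assumptions each GPU on a standalone 3870 has 25% more peak memory bandwidth than a GPU on the 3870X2, which fits the bandwidth-bound case described above.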

**pest** · 06-15-2008, 06:57 AM

Originally Posted by trinibwoy

1. There are a lot of threads explaining the issue. Essentially the frames rendered by the individual GPU's are not distributed evenly either in terms of game time or the length of time they remain on-screen.

You're describing the symptoms not the source of them.
Microstuttering is a result of the CPU delivering the processing data too fast in heavy GPU-bound scenes.

**Luka_Aveiro** · 06-15-2008, 07:13 AM

Originally Posted by pest

You're describing the symptoms not the source of them.
Microstuttering is a result of the CPU delivering the processing data too fast in heavy GPU-bound scenes.

micro-stuttering is a result of AFR solutions, and it comes down to how each GPU's frames are timed relative to the other GPU's.

Example:

GPU1 -> frames 1, 3, 5, 7, 9, 11, 13 @ 0,01 sec, 0,05, 0,09, 0,13, 0,17, 0,21, 0,25

GPU2 -> frames 2, 4, 6, 8, 10, 12, 14 @ 0,02 sec, 0,06, 0,10, 0,14, 0,18, 0,22, 0,26

If you look closely, each GPU2 frame arrives only 0,01 sec after the GPU1 frame before it, but then there is a 0,03 sec gap until the next GPU1 frame. That's the micro-stuttering: you get 2 frames displayed immediately one after another, and then a 0,03 sec gap.

This is just a quick example nothing really deep.
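
The same timestamps can be checked with a few lines of Python, as a minimal sketch that only uses the numbers from the example above (the function name is made up for illustration):

```python
# Frame presentation times in seconds, taken from the example above.
gpu1 = [0.01, 0.05, 0.09, 0.13, 0.17, 0.21, 0.25]  # frames 1, 3, 5, ...
gpu2 = [0.02, 0.06, 0.10, 0.14, 0.18, 0.22, 0.26]  # frames 2, 4, 6, ...

def frame_intervals(a, b):
    """Merge both GPUs' timestamps and return the gap between consecutive frames."""
    times = sorted(a + b)
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]

intervals = frame_intervals(gpu1, gpu2)
average = sum(intervals) / len(intervals)

print(intervals)
print(f"shortest gap: {min(intervals)} s, longest gap: {max(intervals)} s")
print(f"average gap:  {average:.4f} s")
```

The gaps alternate between 0.01 s and 0.03 s, so an fps counter that only reports the average hides the fact that every second frame is held three times as long as the one before it.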

**trinibwoy** · 06-15-2008, 07:16 AM

Originally Posted by pest

You're describing the symptoms not the source of them.
Microstuttering is a result of the CPU delivering the processing data too fast in heavy GPU-bound scenes.

Read what I wrote again. How is "the CPU delivering the processing data too fast" different from "the frames are not distributed evenly in game time"? Obviously game time is controlled by the CPU. Don't argue just for arguing's sake...a little reading comprehension goes a long way.
