Ok. Stop the flame war, please.
First of all, I'll be happy to show my source code to any expert. But I will not give it to anybody. That's the first point. No copies.
Here are the tests that led to the conclusions shown there. I'll sum them up for you.
And please understand that I'm not a hardware guru. I'm a developer. I've got a good understanding of hardware design, yes. But when it comes down to knowing how a VRM works, well... that's well beyond my knowledge. I've always been honest about that. That's why I'm here. Help. That's what I need.
Cards affected
- HD4870
- HD4890
- All in all, GDDR5-based ATI cards at stock frequencies.
All of them ?
- No. The only card that could withstand the test was a specific PowerColor design with a built-in 4-stage digital VRM. All the others failed.
- All the other claims made before this thread that "it runs fine" came from people who did not use the test configuration that gives the maximum load. The cases reported on this board still need investigating. Can you help me there ?

PC and components used
- Watercooled cards.
- Temporarily over-cooled cards (added 120mm fans on the cards)
- Cards were plugged into AMD and Intel PCs, with i7, C2D and other CPUs, and various northbridges.
- Power supplies were all trusted brands (Corsair, Antec, Seasonic), min 550W, max 1500W (yes, we had the bug happening on a 1500W ToughPower).
Test 1 : Debugging the app
- I spent about 3 days doing that. The only debugging runs that worked were those that cut the load down. It made no sense whatsoever.
- Conclusion : inconclusive. Or I suck at debugging.
Test 2 : OSes
- The problem occurs on the following OSes : XP 32-bit, Vista 32- and 64-bit, Windows 7 32- and 64-bit
- Conclusion : This rules out the OS as the cause
Test 3 : Driver version
- 4 driver versions were used, from the most recent to older ones, official and beta. Same behaviour.
- Conclusion : the problem is not related to a specific driver version. It could be common to all driver versions, but that is unlikely, as the HD4850 passes the test and uses the same chip. Memory-related problem ? Seems unlikely to me, but hey... who knows.
Test 4 : Lowering the load on the card
- This was done by lowering the resolution of the test, lowering the shader complexity value (although this is less relevant, as it changes the shader that is used), and running in windowed mode. It worked.
- Conclusion : lowering the load makes the test work, even with the same algorithm applied (in the windowed mode / lower resolution cases). The shader program is not at fault.
Test 5 : Lowering the frequencies
- Lowering the frequencies made the test stable. It ran perfectly fine.
- Conclusion : the program is not at fault. The shader, or the program, does not depend on the internal clocks of the card in any way. If lowering the frequencies of the card makes the test stable, it strongly indicates a hardware failure, whether it is temperature or a failure of a power supply stage.
Test 6 : Increasing the vGPU
- We had a card running the test fine at lower frequencies. Then we increased the vGPU, and only the vGPU. This only increased the load on the VRM and the temperature of the card. It had no impact on the frequencies. Changing only the vGPU made the test unstable.
- Conclusion : this shows that increasing the temps and the VRM load makes the test unstable. We did not touch the frequencies, yet the problem is reproduced. We seem to have isolated the problem. The card IS the cause, as it is the only thing that changed between the runs.
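As a rough sanity check on Test 6, here is a minimal sketch of why raising only the vGPU puts more load on the VRM. It assumes the usual first-order approximation for dynamic CMOS power, P ≈ C·V²·f, and ignores leakage, so take it as an illustration and not as a model of these cards. The function names and the 10% voltage bump are made up for the example.

```python
def relative_power(v_ratio: float, f_ratio: float = 1.0) -> float:
    """Dynamic power relative to baseline, assuming P ~ C * V^2 * f."""
    return v_ratio ** 2 * f_ratio


def relative_current(v_ratio: float, f_ratio: float = 1.0) -> float:
    """VRM output current relative to baseline: I = P / V ~ C * V * f."""
    return relative_power(v_ratio, f_ratio) / v_ratio


if __name__ == "__main__":
    # Hypothetical example: +10% vGPU, clocks untouched.
    print(f"power   x{relative_power(1.10):.2f}")    # x1.21
    print(f"current x{relative_current(1.10):.2f}")  # x1.10
```

Under that assumption, a 10% voltage bump alone asks the VRM for about 10% more current, and the card dissipates about 21% more power, with the clocks unchanged.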
Test 7 : Switching cards
- We swapped the PowerColor HD4870, which is not affected because of its specific design, for a reference card. The problem did not happen with the PowerColor card. It showed up with the reference card. Of course, same OS, same app, same driver version.
- Conclusion : I don't think software is at fault anymore.
Let's combine the conclusions :
Test 1 : inconclusive.
Test 2 : the OS is not the cause.
Test 3 : a specific driver version is not the cause, unless the problem is shared by all driver versions. Later conclusions show that this is VERY unlikely.
Test 4 : lowering the load makes the test stable, with the very same algorithm behind it. So the algorithm is supported by the card. Why only in a cut-down version ?
Test 5 : lowering the card frequencies instead of cutting down the test makes the test stable again. The test ran at full load, which was unstable before. So the test is supported by the driver and the OS. This indicates a hardware failure.
Test 6 : taking a stable configuration and increasing the vGPU, and only this value, makes the test unstable. This changes only the temps and the load on the VRM and GPU. This narrows down our search even more.
Test 7 : swapping an unaffected, specific-design card (the PowerColor one) for an affected card makes the bug appear where it never existed before. As everything else in the PC is the same apart from the graphics card, the card IS the cause.
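The elimination above can be written down as a small table, which may make it easier to argue about. This is only a sketch of the reasoning: the outcomes are copied from Tests 2 to 7 (Test 1 was inconclusive and is left out), and the names `RESULTS` and `split_factors` are made up for the example.

```python
# Each entry: (what was changed, did changing it alter the crash behaviour?)
# Outcomes are taken from Tests 2 to 7 of this post.
RESULTS = [
    ("OS", False),
    ("driver version", False),
    ("load (resolution, windowed mode)", True),
    ("card frequencies", True),
    ("vGPU", True),
    ("card (PowerColor design vs reference)", True),
]


def split_factors(results):
    """Separate the factors that changed the outcome from those that did not."""
    relevant = [name for name, changed in results if changed]
    ruled_out = [name for name, changed in results if not changed]
    return relevant, ruled_out


if __name__ == "__main__":
    relevant, ruled_out = split_factors(RESULTS)
    print("ruled out:", ", ".join(ruled_out))
    print("still in play:", ", ".join(relevant))
```

Everything left in the "still in play" list is a property of the card or of the load put on it, which is why the card is where I am looking.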
Now, we came to the "82A" hypothesis (which is STILL a hypothesis, I remind you) because on ALL cards the crash was triggered when we reached this value, as reported by RivaTuner. Temps were read from RivaTuner too, but this current was the common value shared by all cards.
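For anyone who wants to check this hypothesis on their own card, here is a minimal sketch of the comparison I have in mind: take the last current reading before each crash and see whether they all agree. The function name, the 2A tolerance and the sample values are assumptions for illustration, not measurements from our cards, and the readings are assumed to have been copied by hand from the monitoring tool.

```python
def common_crash_value(readings, tolerance=2.0):
    """Return the mean of the readings if they all agree within
    `tolerance` amps of that mean, otherwise None."""
    if not readings:
        return None
    mean = sum(readings) / len(readings)
    if all(abs(r - mean) <= tolerance for r in readings):
        return mean
    return None


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT real measurements.
    last_reading_before_crash = [82.0, 81.5, 82.5]
    print(common_crash_value(last_reading_before_crash))
```

If the function returns None for a set of real readings, the cards do not crash at a common current and the hypothesis does not hold for them.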
And there you have it. Everything we've done. All the research, with all our limited means.
My final point is : what can you do to :
- help me understand what's going on ?
- narrow down why this is happening on some cards and not on others ?
Remember, only GDDR5-based cards (i.e. HD4870 and HD4890) seem to be affected by the problem, and only those with a 3-stage VRM power supply, which is the vast majority of the cards.
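To show why the number of VRM stages could matter, here is a back-of-the-envelope sketch. It assumes the 82A reading is the total VRM output current and that the current is shared perfectly evenly between the stages. Both are assumptions on my part, and a real VRM will not balance perfectly. The function name is made up for the example.

```python
def per_stage_current(total_amps: float, stages: int) -> float:
    """Current carried by each VRM stage, assuming ideal even sharing."""
    if stages <= 0:
        raise ValueError("stages must be a positive integer")
    return total_amps / stages


if __name__ == "__main__":
    for stages in (3, 4):
        amps = per_stage_current(82.0, stages)
        print(f"{stages}-stage VRM: {amps:.1f} A per stage")
    # 3-stage VRM: 27.3 A per stage
    # 4-stage VRM: 20.5 A per stage
```

Under those assumptions each stage of a 4-stage design carries a quarter less current than on a 3-stage design for the same total, which would fit with the PowerColor card surviving. It is still only a hypothesis.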
I am at work at the moment, so I can't do a lot of research. Could you dig up what kind of VRM stages the cards that work use ? Reference ones, custom ones ? Thanks for your help.
Last edited by Tetedeiench; 05-19-2009 at 11:31 PM.