**UPDATE 05/23/2009**
The problem has been confirmed by professional testing done by French websites in their labs (I'll let you use Google Translate to understand what's going on).
http://www.canardpc.com/news-36049-g...et_4890__.html
http://www.pcinpact.com/actu/news_multi/51011.htm
It is *NOT* related to temperature, but to a weakness in the power supply stages of the reference-design HD4870/4890 cards from AMD. We don't know yet whether it is the VRMs themselves that fail, or the OCP (over-current protection) that triggers.
The problem occurs with the OCCT GPU:3D test at stock frequencies. You can also make it appear with FurMark (with the exe renamed to bypass the ATI/AMD driver limitations) by bumping the vGPU a little; FurMark is a tad less effective than GPU:3D on those cards, hence the extra voltage.
It seems the problem occurs when you reach around 82-83 A on the VRM on those cards. Use RivaTuner with the appropriate plugins for monitoring.
Original Post:
First of all, if you want to reproduce the crash, be sure to read the "Is there a specific test configuration to reproduce it?" section!
Hello guys,
Who are you?
I'm the guy behind the well-known program OCCT, which has recently become an all-round stability-testing program (CPU, GPU, power supply). English is not my native language, so please excuse any mistakes.
What's the point?
Recently, I've been working a lot on the RC1 version, and especially on the GPU:3D test. It has been improved dramatically, and this new test revealed a hardware design flaw in the new Radeon HD4870/4890 cards that follow the ATI/AMD reference design. Cards like the PowerColor HD4870 PCS+ are NOT affected, as they use a custom design.
They basically crash because they can't handle the load. Early testing shows the VRMs cannot supply enough current to the GPU; that's all I could dig up so far with limited testing means. They seem limited to 82-83 A, which at a typical ~1.25 V core voltage puts the ceiling at roughly 100 W on the GPU core rail alone. Please help us dig further!
For a 3D developer, that means: "Do not optimize your code too much for ATI cards, or you could reach this limit and crash them."
What's this test doing?
The new test is still a furry donut, but among its new features it sports a shader complexity parameter which, as its name states, controls how complex the shader will be, i.e. how much work the graphics card has to do in one pass. The highest value is not always the most stressful; for HD4XXX cards, 3 is the best value.
Let me stress the following points:
- This test uses DirectX 9, which is updated SEPARATELY from DirectX 10 and 11. Install the latest DirectX 9 runtime from the Microsoft website.
- I do NOT use anything beyond DirectX's basic functions. Really, nothing fancy: Shader Model 3 shaders, a lot of alpha blending... and that's it! (See the sketch below for the general idea.)
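To give a concrete idea of what "shader complexity" means here, below is a minimal sketch. OCCT's actual shader source is not public, so the HLSL body and the COMPLEXITY knob are illustrative assumptions only; the compilation call, however, is the standard D3DX one for Shader Model 3.

```cpp
// Minimal sketch: an SM3 pixel shader whose arithmetic per pass scales with
// a compile-time COMPLEXITY define. Illustrative only, NOT OCCT's real code.
#include <stdio.h>
#include <d3dx9.h>

static const char g_ps[] =
    "float4 main(float2 uv : TEXCOORD0) : COLOR0\n"
    "{\n"
    "    float4 c = float4(uv, 0.5f, 0.5f);\n"
    "    for (int i = 0; i < COMPLEXITY * 16; ++i)  // more ALU work per pass\n"
    "        c = sin(c * 1.37f) + cos(c.yzwx * 2.11f);\n"
    "    return c;  // result is then alpha-blended, adding blend-stage load\n"
    "}\n";

HRESULT CompileStressShader(int complexity, LPD3DXBUFFER* ppCode)
{
    char value[8];
    sprintf_s(value, "%d", complexity);  // e.g. 3 for HD4XXX cards
    D3DXMACRO defines[] = { { "COMPLEXITY", value }, { NULL, NULL } };
    return D3DXCompileShader(g_ps, sizeof(g_ps) - 1, defines, NULL,
                             "main", "ps_3_0", 0, ppCode, NULL, NULL);
}
```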
Is there a specific test configuration to reproduce it?
Yes. First, download OCCT by going to the official website and grabbing the RC1: http://www.ocbase.com/forum/viewtopic.php?f=5&t=68
Next, the goal is simple: maximize the GPU load. Here is how to do so on those cards. Be sure to use these settings (3D developers will find a rough code-level equivalent sketched below)!
- Enable Fullscreen Mode
- Disable error-check mode (comparing images is NOT effective here)
- Use a high resolution, preferably your screen's native resolution (e.g. 1680x1050 for a 22" LCD)
- Set Shader Complexity to 3 for HD4XXX cards
Click "Go" and watch your screen go black. Your card has gone into protection mode. Frequencies dropped to 200Mhz. Reboot is needed.
How many cards are affected?
Right now, we've successfully crashed about 10 different cards using this test, with a lot of different power supplies (ranging from a 550 W Antec to a 1500 W ToughPower (!!!); we also had Seasonic, Corsair, etc.), so the power supply itself is not the culprit.
Why are you so sure your crappy test is not at fault here?
Hey, very good question. Here is why:
- Underclocking the card makes the test run fine! So my code is handled correctly by the GPU and drivers.
- The test runs fine on HD3XXX cards.
- The test runs when you lower the load somehow, e.g. by drastically lowering the resolution or by lowering the Shader Complexity parameter to 0. (Enabling v-sync does NOT lower the load, as EVERY FRAME is still calculated, just not sent to the screen; see the sketch after this list.)
- We had an HD4870 with a PowerColor-specific design that did NOT suffer from the symptoms described here.
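On the v-sync point, here is a sketch of why waiting for vblank does not reduce the GPU work in a test like this, assuming each presented frame packs many heavy passes. DrawFurryDonutPass is a hypothetical helper standing in for one full-screen, alpha-blended pass; this is not OCCT's actual render loop:

```cpp
#include <d3d9.h>

void DrawFurryDonutPass(IDirect3DDevice9* dev);  // hypothetical heavy pass

void RenderFrame(IDirect3DDevice9* dev, int passes)
{
    dev->BeginScene();
    for (int i = 0; i < passes; ++i)
        DrawFurryDonutPass(dev);  // every pass shades and blends every pixel
    dev->EndScene();

    // Even if Present() blocks here waiting for vblank (v-sync on), the
    // draws above have already loaded the GPU fully in the meantime.
    dev->Present(NULL, NULL, NULL, NULL);
}
```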
Are you 100% sure of what you're saying?
No, 95% sure. I have limited testing hardware: in fact, just my own computers and my beta-testers'. I tried to contact ATI/AMD, but they haven't answered my mails yet. I also tried to contact professional websites, so far with no luck. So I thought I'd start communicating the info to people who have supported OCCT before and know the program, and see what happens.
Do you have recordings that show the cards crashing?
Yes. Below are RivaTuner screenshots from one of my testers. He used two HD4870s:
The first one is the PowerColor design (GPU0); its VRM has 4 phases instead of the reference design's 3, making it more robust.
The second one is the reference design (GPU1).
The first picture shows that everything is fine: the tests increase the cards' power-stage load by raising the cards' frequencies, everything runs fine, and the GPU1 (reference design) card never hits the 82-83 A borderline:
For those who wonder, here are the frequencies that were used:
* 500/500 for both
* 500/600 for both
* 500/700 for both
* 800/500 on GPU0, 750/500 on GPU1
* 800/600 on GPU0, 750/600 on GPU1
* 800/700 on GPU0, 750/700 on GPU1
This picture shows the test going into protection mode. The card drops to 200 MHz. The current spike is too brief to be recorded, unfortunately:
That's it, you now know everything about the problem. Please help us dig deeper into it, and if I'm wrong, please, oh please, correct me! But I really do think we have something here...
If a hardware guru, or someone else, could conduct more advanced testing and confirm all this...