05-19-2009, 11:31 AM
|
#1
|
|
Xtreme Member
Join Date: Dec 2006
Posts: 211
|
OCCT 3.1.0 shows HD4870/4890 design flaw - they can't handle the new GPU test !
**UPDATE 05/23/2009**
The problem has been confirmed by professional testing done by French websites in their labs (I'll let you use Google translation to understand what's going on).
http://www.canardpc.com/news-36049-g...et_4890__.html
http://www.pcinpact.com/actu/news_multi/51011.htm
It is *NOT* related to temperature, but to a weakness in the power supply stages of the reference-design HD4870/4890 cards from AMD. We don't know yet whether it is the VRMs themselves that give out, or the OCP (over-current protection) that triggers.
The problem occurs with the OCCT GPU:3D test at stock frequencies. You can also make it appear with Furmark (with the exe renamed to bypass the ATI/AMD limitations) by bumping the vGPU a little; Furmark is a tad less effective than GPU:3D on those cards, which is why the voltage bump is needed.
It seems the problem occurs when you reach around 82A/83A on the VRMA on those cards. Use RivaTuner with the appropriate plugins for Monitoring.
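As a rough illustration of checking monitoring data against that limit, here is a minimal sketch. It assumes the VRM current readings have already been exported as a plain list of (seconds, amps) samples; the sample values, the function name and the list layout are made up for the example and are not RivaTuner's actual log format. The 82 A constant is only the approximate figure observed in this thread, not a datasheet value.

```python
from typing import Iterable, Optional, Tuple

# Approximate limit observed in this thread (assumption, not a datasheet value).
VRM_LIMIT_AMPS = 82.0


def first_over_limit(
    samples: Iterable[Tuple[float, float]],
    limit: float = VRM_LIMIT_AMPS,
) -> Optional[Tuple[float, float]]:
    """Return the first (time, amps) sample at or above the limit, or None."""
    for timestamp, amps in samples:
        if amps >= limit:
            return (timestamp, amps)
    return None


# Made-up samples: the current climbs, crosses the limit, then collapses
# as the card drops into protection mode.
example_samples = [(0.0, 61.5), (1.0, 74.2), (2.0, 80.9), (3.0, 82.4), (4.0, 12.0)]
print(first_over_limit(example_samples))
```

Keep in mind that a very brief spike may fall between two samples, so an empty result does not prove the limit was never reached.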
Original Post :
First of all, if you want to reproduce the crash, be sure to read the Is there a specific test configuration to reproduce it ? section !
Hello guys,
Who are you ?
I'm the guy behind the well-known program OCCT, which has recently become an all-round stability-check program (CPU, GPU, power supply). English is not my native language, so please excuse the mistakes.
What's the point ?
Recently, I've been working a lot on the RC1 version, and especially on the GPU:3D test. It has been improved dramatically, and this new test revealed a hardware design flaw in the new Radeon HD4870/4890 cards that follow the ATI/AMD reference design. Cards like the PowerColor 4870 PCS+ are NOT affected, as they use a custom design.
They basically crash because they can't handle the load. Early testing shows the VRM cannot supply enough power to the card. That's the only thing I could dig up so far, with limited testing means. The cards seem limited to 82/83A. Please help us dig deeper!
For a 3D developer, that means: "Do not optimize your code too much for ATI cards, or you could reach the limits and crash them".
What's this test doing ?
The new test is still a furry donut, but it sports, among other new features, a shader complexity parameter, which is, as its name states, how complex the shader will be, i.e. the amount of work the graphics card will have to do in one pass. The highest value is not always the best. For HD4XXX cards, 3 is the best value.
Let me stress the following points:
- This test uses DirectX 9, which is updated SEPARATELY from DirectX 10 and 11. Install it from the Microsoft website.
- I do NOT use anything other than DirectX's basic functions. Really, nothing fancy. Shader Model 3 shaders, a lot of alpha blending... and that's it!
Is there a specific test configuration to reproduce it ?
Yes. First download OCCT by going to the official website and grabbing the RC1: http://www.ocbase.com/forum/viewtopic.php?f=5&t=68
Next, the goal is simple: maximise the GPU load. Here is how to do so on those cards. Be sure to use these settings!
- Enable Fullscreen Mode
- Disable Errorcheck Mode (comparing images is NOT effective)
- Use a high resolution, preferably the native resolution of your screen (e.g. 1680x1050 for a 22" LCD)
- Shader Complexity 3 for HD4XXX cards
Click "Go" and watch your screen go black. Your card has gone into protection mode. Frequencies have dropped to 200MHz. A reboot is needed.
How many cards are affected ?
Right now, we've successfully crashed about 10 different cards using this test, with a lot of different power supplies (ranging from a 550W Antec to a 1500W ToughPower (!!!); we also had Seasonic, Corsair, etc.).
Why are you so sure your crappy test is not at stake here ?
Hey, very good question. Here is why:
- Underclocking the card makes the test run fine! So my code is supported by the GPU and drivers.
- The test runs on HD3XXX cards.
- The test runs when you lower the load somehow. You can do that by lowering the resolution drastically, lowering the Shader Complexity parameter to 0, etc. (Enabling V-Sync DOES NOT lower the load, as EVERY FRAME is still calculated, just not sent to the screen.)
- We had a HD4870 that used a Powercolor-specific design that did NOT suffer the symptoms described here.
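To put a rough number on the resolution point in the list above, here is a small sketch comparing how many pixels have to be shaded per frame. The 1680x1050 figure is the native resolution mentioned earlier; the 800x600 value is just a hypothetical "drastically lowered" resolution picked for the example. Treating the load as proportional to the pixel count is a simplification, not something measured here.

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels the GPU has to fill for one full-screen frame."""
    return width * height


native = pixels_per_frame(1680, 1050)   # resolution quoted in the settings list
reduced = pixels_per_frame(800, 600)    # hypothetical lowered resolution

print(native)            # pixels per frame at native resolution
print(reduced)           # pixels per frame at the lowered resolution
print(native / reduced)  # how many times more pixels the native run shades
```

Under that simplification, the native-resolution run shades several times more pixels per frame, which is consistent with the test only crashing the cards at high resolution.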
Are you 100% sure of what you're saying ?
No. 95% sure. I have limited testing hardware: in fact, just my own computers and those of my beta-testers. I couldn't reach ATI/AMD; they haven't answered my mails yet. I tried to contact professional websites, so far with no luck. So I thought I'd start by communicating the info to people who already supported OCCT and knew the program, and see what happens and whether it would be of interest.
Do you have recordings that show the cards crashing ?
Yes. You'll see the RivaTuner screenshots from one of my testers. He used 2 HD4870s:
The first one is of PowerColor design (GPU0). Its VRM is 4-phase instead of 3-phase as in the reference design. More robust.
The 2nd one is the reference design (GPU1).
The first picture shows that everything is fine: the tests increase the load on the cards' power stages by changing the cards' frequencies, everything runs fine, and he doesn't hit the 82/83A borderline on the GPU1 (reference design) card:

For those who wonder, here are the frequencies that were used:
* 500/500 for both
* 500/600 for both
* 500/700 for both
* 800/500 GPU0 750/500 GPU1
* 800/600 GPU0 750/600 GPU1
* 800/700 GPU0 750/700 GPU1
This picture shows you the test going into protection mode. The card goes to 200MHz. The current spike is too brief for it to be recorded, unfortunately:
That's it, you know everything about the problem. Please help us dig deeper into it, and if I'm wrong, please, oh please, correct me! But I really do think we have something there...
If a hardware guru, or someone else, could conduct more advanced testing and confirm all this...
Last edited by Tetedeiench; 05-23-2009 at 04:39 AM.
|
|
|
05-19-2009, 11:35 AM
|
#2
|
|
Xtreme Addict
Join Date: Oct 2006
Posts: 1,138
|
Sorry, but it just seems highly unlikely to me that, if modern GPUs sometimes pull 100 amps during full load in modern games, ATI would design a card that can only handle a max load of 83 amps.
__________________
Quote:
|
Originally Posted by Shintai
Imagine Microsoft said that Windows 7 would only work on Acer/HP/Dell/Lenovo machines.
Its just Apple thats full of BS
|
Tyan S2932G2NR-SI
AMD Opteron quads @ 2.6GHz (Shanghai) x2
16GB of Kingston DDR-800 ECC RAM
ATI 4890 1GB
ASUS M3A79-T Deluxe
Phenom II 940 @ 3.75GHz w/ 1.4v
NB and HTT @ 2.75GHz w/ 1.4v & 1.3v
G.Skill PI @ DDR1150 w/ 2.1v
ATI 4890 @ 1022MHz core
|
|
|
05-19-2009, 11:38 AM
|
#3
|
|
We swam on!
Join Date: Sep 2004
Posts: 1,289
|
What drivers were you using? Possible driver hiccup? If this is the case I would imagine that someone would have experienced this before outside of OCCT.
|
|
|
05-19-2009, 11:39 AM
|
#4
|
|
Xtreme Mentor
Join Date: Dec 2007
Posts: 2,560
|
why does the memory clock jump to almost 3200mhz when the core clock drops to 200 on that last picture showing it go into protection mode?
|
|
|
05-19-2009, 11:39 AM
|
#5
|
|
Xtreme Member
Join Date: Oct 2006
Location: South Africa
Posts: 386
|
Very interesting!
Nice work i think
__________________
920 c0
6gig Mushkin 1600 6-7-6-18 (Many thanks to TheGoatEater)
285gtx
Dfi x58 t3eh6
|
|
|
05-19-2009, 11:40 AM
|
#6
|
|
Xtreme Addict
Join Date: Jul 2008
Location: Shimla , India
Posts: 1,129
|
I can confirm that the VRMs used on some of the 4850s were a joke: they got really hot under load. Maybe your continuous load caused some wear and tear due to heat, making them lose some power for a brief moment just when the core needed it the most. Heat does lower efficiency.
Try putting a heatsink on the VRM and see if that solves anything!!
|
|
|
05-19-2009, 11:42 AM
|
#7
|
|
Xtreme Member
Join Date: Mar 2008
Posts: 106
|
Unlikely or not, that's what the tests are showing. I'm getting a 4890 later this week and have a 4870 sitting in a friend's PC; I will test both within the next week and get back to you about this.
__________________
Mother used to say, if you want you'll find a way, but mother never danced through fire showers.
|
|
|
05-19-2009, 11:46 AM
|
#8
|
|
Xtreme Mentor
Join Date: Dec 2007
Posts: 2,560
|
I'm also not sure if you noticed, but the amps being put out are different for each card: GPU1 is ahead by 2-5A across the board, except in the last test, where it's lower by almost 5A; then the failure happens after you've hit the next jump.
I would adjust the program to render a slightly more complex image each frame. I really don't know if you could or not, but try to make it a very slight increase, and OC the card so you know that it will hit the cap of 82/83A before your program gives it the most complex image to render.
|
|
|
05-19-2009, 11:48 AM
|
#9
|
|
Xtreme Member
Join Date: Dec 2006
Posts: 211
|
Quote:
Originally Posted by EniGmA1987
Sorry, but it just seems highly unlikely to me that, if modern GPUs sometimes pull 100 amps during full load in modern games, ATI would design a card that can only handle a max load of 83 amps.
|
Well, we had an overclocked 4870 PCS+ (Asus) that could pull 106A, flawlessly, with the very same test configuration.
This value was reached with a heavy overclock.
My test pulls about 87A (measured on the non-reference design) at stock frequencies. That's an estimate.
|
|
|
05-19-2009, 11:49 AM
|
#10
|
|
Xtreme Member
Join Date: Dec 2006
Posts: 211
|
Quote:
Originally Posted by Reznik Akime
What drivers were you using? Possible driver hiccup? If this is the case I would imagine that someone would have experienced this before outside of OCCT.
|
We tried about 4 driver versions. As underclocking makes the test work, the driver version problem is ruled out. Especially since the Asus design, with the very same driver version, works.
My test pulls about 30/40% more power on the VRM than Crysis. That's why you don't see that outside OCCT.
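A back-of-the-envelope sketch of what that implies, using the roughly 87A stock estimate given earlier in this thread. It assumes the VRM output voltage is the same in both cases, so that "30/40% more power" translates directly into 30/40% more current; that is an assumption made for the arithmetic, not something measured. The function name is made up for the example.

```python
OCCT_AMPS = 87.0       # estimated stock-frequency draw of the test (from this thread)
VRM_LIMIT_AMPS = 82.0  # approximate limit observed on reference cards (from this thread)


def baseline_amps(test_amps: float, extra_fraction: float) -> float:
    """Current of a lighter load, if the test draws `extra_fraction` more than it."""
    return test_amps / (1.0 + extra_fraction)


crysis_high = baseline_amps(OCCT_AMPS, 0.30)  # test draws 30% more than the game
crysis_low = baseline_amps(OCCT_AMPS, 0.40)   # test draws 40% more than the game

print(round(crysis_low, 1), round(crysis_high, 1))
print(crysis_high < VRM_LIMIT_AMPS)
```

Under that assumption the game would land somewhere in the low-to-mid 60s of amps, comfortably under the limit, while the test overshoots it.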
|
|
|
05-19-2009, 11:52 AM
|
#11
|
|
Xtreme Member
Join Date: Dec 2006
Posts: 211
|
Quote:
Originally Posted by Manicdan
I'm also not sure if you noticed, but the amps being put out are different for each card: GPU1 is ahead by 2-5A across the board, except in the last test, where it's lower by almost 5A; then the failure happens after you've hit the next jump.
I would adjust the program to render a slightly more complex image each frame. I really don't know if you could or not, but try to make it a very slight increase, and OC the card so you know that it will hit the cap of 82/83A before your program gives it the most complex image to render.
|
It is doable, but it would only tell you, programmatically, that it would fail at complexity 3.
We did a lot of testing (using the frequencies, as that is the easiest way to increase the load on the VRM), and we came to this conclusion: if you go through the 82A barrier, your card goes into protection mode.
The problem is: an app as simple as OCCT GPU can make this happen. And believe me: I'm not a 3D guru.
It is as if a CPU could not handle Linpack, and crashed.
|
|
|
05-19-2009, 11:54 AM
|
#12
|
|
Xtreme Guru
Join Date: Jul 2005
Posts: 4,372
|
Having done this, which games show the same problem as this new test of yours?
__________________
|
|
|
05-19-2009, 11:54 AM
|
#13
|
|
Xtreme Addict
Join Date: Apr 2006
Location: Cairo
Posts: 2,280
|
my HD4850 went past 115C with older version of the test , it is non reference HD4850 with zalman cooling so i wont be surprised if some cards will fail
__________________
Intel Core I7 920 @ 3.8GHZ 1.28V (Core Contact Freezer)
Asus X58 P6T
6GB OCZ Gold DDR3-1600MHZ 8-8-8-24
XFX HD5870
WD 1TB Black HD
Corsair 850TX
Cooler Master HAF 922
|
|
|
05-19-2009, 11:55 AM
|
#14
|
|
Xtreme Member
Join Date: Dec 2006
Posts: 211
|
UPDATE: the 4870 in the test is a 4870 PCS+ from PowerColor whose VRM is a 4-phase digital VRM instead of 3-phase, and that's why it is not crashing.
Sorry, I thought it was an Asus design. It is not. My mistake.
|
|
|
05-19-2009, 11:59 AM
|
#15
|
|
Xtreme Guru
Join Date: Jul 2005
Posts: 4,372
|
Quote:
Originally Posted by Tetedeiench
UPDATE: the 4870 in the test is a 4870 PCS+ from PowerColor whose VRM is a 4-phase digital VRM instead of 3-phase, and that's why it is not crashing.
Sorry, I thought it was an Asus design. It is not. My mistake.
|
So you basically designed a new test which is made for a 4-phase digital VRM instead of a 3-phase one?
__________________
Last edited by Eastcoasthandle; 05-19-2009 at 04:17 PM.
|
|
|
05-19-2009, 11:59 AM
|
#16
|
|
We swam on!
Join Date: Sep 2004
Posts: 1,289
|
Quote:
Originally Posted by Tetedeiench
We tried about 4 driver versions. As underclocking makes the test work, the driver version problem is ruled out. Especially since the Asus design, with the very same driver version, works.
My test pulls about 30/40% more power on the VRM than Crysis. That's why you don't see that outside OCCT.
|
Ah, gotcha. Well, this kinda miffs me. A latent problem with almost all early 48xx series cards, save for the non-reference ones! I would test this on my own, but I've got transfers going and can't do a reboot.
|
|
|
05-19-2009, 12:00 PM
|
#17
|
|
.
Join Date: Dec 2007
Location: CR:IA
Posts: 268
|
Seems to me you're putting an unrealistic load on the GPU anyway.
__________________
|
|
|
05-19-2009, 12:01 PM
|
#18
|
|
Xtreme Addict
Join Date: Nov 2005
Posts: 1,082
|
Design flaw? lol
You have a bench that doesn't run on those cards. So what? What does it prove?
Yes, you can say that there are design flaws, but not this way.
By the way, every chip these days has design flaws. Look at the CPU errata from Intel and AMD.
__________________
Quote:
Originally Posted by Shintai
And AMD is only a CPU manufactor due to stolen technology and making clones.
|
|
|
|
05-19-2009, 12:05 PM
|
#19
|
|
LOVE=MC²
Join Date: Dec 2002
Posts: 3,444
|
crashes? turn up the fan
fan settings has been plaguing the 3000/4000 series from the beginning
the cards that crash ^ need fan bios settings fix/manual fan settings
|
|
|
05-19-2009, 12:06 PM
|
#20
|
|
Xtreme Enthusiast
Join Date: May 2008
Posts: 622
|
IIRC the reference design has an over current and over voltage protection.
My guess is you are hitting this. So I dont think this is a design flaw, but an intended result of the design.
__________________
Quote:
Originally Posted by alacheesu
If you were consistently able to put two pieces of lego together when you were a kid, you should have no trouble replacing the pump top.
|
|
|
|
05-19-2009, 12:09 PM
|
#21
|
|
Xtreme Mentor
Join Date: Dec 2007
Posts: 2,560
|
Quote:
Originally Posted by Tetedeiench
It is doable, but it would only tell you, programmatically, that it would fail at complexity 3.
We did a lot of testing (using the frequencies, as that is the easiest way to increase the load on the VRM), and we came to this conclusion: if you go through the 82A barrier, your card goes into protection mode.
The problem is: an app as simple as OCCT GPU can make this happen. And believe me: I'm not a 3D guru.
It is as if a CPU could not handle Linpack, and crashed.
|
Going from 0 complexity to 3 would involve too many major steps. The goal would be to have it increase in extremely small amounts and watch the power load over time.
It seems like the card that failed was increasing its amps by a much larger amount, then on reaching 80A gave in and couldn't provide any more until it reached the failure point.
Also, could you show the total PC watts when running a game and when running your benchmark, so we can see the load difference it is able to put onto the cards?
sounds like this is another version of furmark which ATI purposefully changed to run at a lower frequency on that program to prevent damage
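The small-step ramp suggested at the top of this post could be sketched roughly as follows. Everything here is hypothetical: OCCT is not known to expose fractional complexity levels, `fake_measure` is a made-up linear stand-in for a real current reading (chosen only so that level 3 lands on the roughly 87A estimate quoted in this thread), and the 82A limit is the approximate value reported here.

```python
from typing import Callable, List, Optional


def ramp(start: float, stop: float, steps: int) -> List[float]:
    """Return `steps` evenly spaced load levels from start to stop, inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps")
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def last_stable_level(
    levels: List[float],
    measure_amps: Callable[[float], float],
    limit: float = 82.0,
) -> Optional[float]:
    """Walk up the ramp and return the last level whose current stayed under the limit."""
    last = None
    for level in levels:
        if measure_amps(level) >= limit:
            break
        last = level
    return last


def fake_measure(level: float) -> float:
    """Made-up linear model: 60A at level 0, 87A at level 3."""
    return 60.0 + 9.0 * level


levels = ramp(0.0, 3.0, 13)  # steps of 0.25 instead of whole complexity levels
print(last_stable_level(levels, fake_measure))
```

With a real measurement in place of `fake_measure`, the finer the steps, the more precisely the last stable level brackets the point where the card gives up.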
|
|
|
05-19-2009, 12:09 PM
|
#22
|
|
LOVE=MC²
Join Date: Dec 2002
Posts: 3,444
|
Quote:
Originally Posted by Aberration
IIRC the reference design has an over current and over voltage protection.
My guess is you are hitting this. So I dont think this is a design flaw, but an intended result of the design.
|
thats a possibility.. then again it may just be bad fan settings/not working properly ^
|
|
|
05-19-2009, 12:16 PM
|
#23
|
|
Thrusters on full!
Join Date: Dec 2006
Location: Poland
Posts: 2,245
|
OCCT is a waste of time.....it's not ranked on HWBot
So the card has an OCP, and it's now considered a design flaw? Have you seen any game that puts that much load on a GPU? Have you at least created a special map for any game that would put that much load?
__________________
Quote:
Originally Posted by miahallen
FurMark is a waste of time.....it's not ranked on HWBot 
|
W3520 on Rampage II Gene | hardmounted TRUE + 2x 120mm push-pull | 3x 2gig of HCF0
Zombie GTX280 | look mom, i has no IHS | Thermalright HR-03 GT
Lian-Li PC-A70B | Brushed aluminium tank
MOA 2009 Poland #2
Test bench: Clarkdale, several AMD Confidentials
|
|
|
05-19-2009, 12:18 PM
|
#24
|
|
Xtreme Mentor
Join Date: Jul 2007
Location: Austria
Posts: 3,469
|
Quote:
Originally Posted by xoqolatl
OCCT is a waste of time.....it's not ranked on HWBot
So the card has an OCP, and it's now considered a design flaw? Have you seen any game that puts that much load on a GPU? Have you at least created a special map for any game that would put that much load?
|
Try furmark and rename the exe, its the same (or nearly the same).
If you rename the exe on furmark, it puts so much load on the card that the VRMs can reach 130°C+, and if you let them run for a while -> *poof*
__________________
|
|
|
05-19-2009, 12:29 PM
|
#25
|
|
Xtreme Member
Join Date: Dec 2008
Location: Sweden
Posts: 286
|
Quote:
Originally Posted by Hornet331
Try furmark and rename the exe, its the same (or nearly the same).
If you rename the exe on furmark, it puts so much load on the card that the VRMs can reach 130°C+, and if you let them run for a while -> *poof*
|
You play furmark often? :P
This seems like a new furmark so it wasn't that shocking to me. One thing I find interesting is that the ATI cards seem to have a lot more power "left" compared to Nvidia's since power consumption goes through the roof with the former and not so much with the latter (please correct me if I'm using that wrong).
|
|
|
All times are GMT -8. The time now is 04:17 PM.
|