
Thread: OCCT 3.1.0 shows HD4870/4890 design flaw - they can't handle the new GPU test!

  1. #1
    Xtreme Member
    Join Date
    Dec 2006
    Posts
    213

    OCCT 3.1.0 shows HD4870/4890 design flaw - they can't handle the new GPU test!

    **UPDATE 05/23/2009**

    The problem has been confirmed by professional testing done by French websites in their labs (I'll let you use Google Translate to understand what's going on).
    http://www.canardpc.com/news-36049-g...et_4890__.html
    http://www.pcinpact.com/actu/news_multi/51011.htm

    It is *NOT* related to temperature, but to a weakness in the power supply stages of the reference-design HD4870/4890 cards from AMD. We don't know yet whether it is the VRMs themselves or the OCP (over-current protection) that triggers.

    The problem occurs with the OCCT GPU:3D test at stock frequencies. You can also make it appear with Furmark (with the exe renamed to bypass ATI/AMD's limitations) by bumping vGPU a little; Furmark is a tad less effective than GPU:3D on those cards, which is why the extra voltage is needed.

    It seems the problem occurs when the VRM current reaches around 82-83A on those cards. Use RivaTuner with the appropriate plugins for monitoring.
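A minimal Python sketch of that check, assuming the VRM current readings have already been exported from the monitoring tool as a plain list of amp values (the export format and the function names here are hypothetical, not RivaTuner's). Keep in mind that, as noted further down, the trip spike can be too brief to show up in sampled data, so a clean log does not prove the card stayed under the limit.

```python
from typing import Optional, Sequence

# Lower end of the ~82/83 A figure reported in this thread (assumption:
# treat reaching it as the danger zone).
VRM_LIMIT_AMPS = 82.0


def first_over_limit(samples: Sequence[float],
                     limit: float = VRM_LIMIT_AMPS) -> Optional[int]:
    """Return the index of the first sample at or above `limit`, or None."""
    for index, amps in enumerate(samples):
        if amps >= limit:
            return index
    return None


def headroom(samples: Sequence[float], limit: float = VRM_LIMIT_AMPS) -> float:
    """Amps left between the highest recorded sample and `limit`."""
    return limit - max(samples)


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    log = [64.0, 71.5, 78.0, 81.0]
    print(first_over_limit(log))  # None: never reached 82 A
    print(headroom(log))          # 1.0
```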



    Original post:





    First of all, if you want to reproduce the crash, be sure to read the "Is there a specific test configuration to reproduce it?" section!

    Hello guys,

    Who are you?
    I'm the guy behind the well-known program OCCT, which has recently become an all-round stability-checking program (CPU, GPU, power supply). English is not my native language, so please excuse the mistakes.

    What's the point?
    Recently, I've been working a lot on the RC1 version, especially on the GPU:3D test. It has been improved dramatically, and this new test revealed a hardware design flaw in the new Radeon HD4870/4890 cards that follow the ATI/AMD reference design. Cards like the PowerColor 4870 PCS+ are NOT affected, as they use a custom design.

    They basically crash because they can't handle the load. Early testing shows the VRM cannot supply enough power to the card. That's all I could dig up so far with my limited testing means. The cards seem limited to 82/83A. Please help us dig further!

    For a 3D developer, that means: "Do not optimize your code too much for ATI cards, or you could reach the limits and crash them."

    What's this test doing?
    The new test is still a furry donut, but among its new features it sports a shader complexity parameter which, as its name states, sets how complex the shader will be, i.e. the amount of work the graphics card has to do in one pass. The highest value does not always produce the highest load; for HD4XXX cards, 3 is the best value.

    Let me stress the following points:
    • This test uses DirectX 9, which is updated SEPARATELY from DirectX 10 and 11. Install it from Microsoft's website.
    • I do NOT use anything beyond DirectX's basic functions. Really, nothing fancy: Shader Model 3 shaders, a lot of alpha blending... and that's it!


    Is there a specific test configuration to reproduce it?
    Yes. First, download OCCT by going to the official website and grabbing the RC1: http://www.ocbase.com/forum/viewtopic.php?f=5&t=68

    Next, the goal is simple: maximise the GPU load. Here is how to do that on those cards. Be sure to use these settings!
    • Enable Fullscreen Mode
    • Disable Errorcheck Mode (comparing images is NOT effective)
    • Use a high resolution, preferably the native resolution of your screen (e.g. 1680x1050 for a 22" LCD)
    • Shader Complexity 3 for HD4XXX cards
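The same checklist, written as a small Python sketch that compares a set of settings against the recommendations above. The field names and the `GpuTestSettings` type are hypothetical; they do not reflect OCCT's actual configuration format, and the 1680x1050 default only stands in for whatever your screen's native resolution is.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GpuTestSettings:
    """Hypothetical representation of the GPU:3D test options listed above."""
    fullscreen: bool
    error_check: bool
    width: int
    height: int
    shader_complexity: int


def deviations(s: GpuTestSettings,
               native: Tuple[int, int] = (1680, 1050)) -> List[str]:
    """List which recommended settings (for HD4XXX cards) are not met."""
    issues = []
    if not s.fullscreen:
        issues.append("enable fullscreen mode")
    if s.error_check:
        issues.append("disable error check mode")
    if (s.width, s.height) != native:
        issues.append("use the native resolution")
    if s.shader_complexity != 3:
        issues.append("set shader complexity to 3")
    return issues
```

For example, `deviations(GpuTestSettings(True, False, 1680, 1050, 3))` returns an empty list, meaning every recommendation is met.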

    Click "Go" and watch your screen go black: your card has gone into protection mode and its frequencies have dropped to 200MHz. A reboot is needed.

    How many cards are affected?
    Right now, we've successfully crashed about 10 different cards with this test, using a lot of different power supplies, ranging from a 550W Antec to a 1500W ToughPower (!!!). We had Seasonic, Corsair... etc.

    Why are you so sure your crappy test is not at fault here?
    Hey, very good question. Here is why:
    • Underclocking the card makes the test run fine! So my code is supported by the GPU and drivers.
    • The test runs on HD3XXX cards.
    • The test runs when you lower the load somehow, e.g. by lowering the resolution drastically or lowering the shader complexity parameter to 0. (Enabling V-Sync does NOT lower the load, as EVERY FRAME is still calculated, just not sent to the screen.)
    • We had an HD4870 with a PowerColor-specific design that did NOT suffer the symptoms described here.
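To put a rough number on "lower the resolution drastically": if most of the shader work is done per pixel (an assumption; the thread does not break the load down), the work per frame scales roughly with the pixel count. A quick Python check comparing 1680x1050 with 1024x768 (the lower resolution is only an example):

```python
from typing import Tuple


def pixel_ratio(high: Tuple[int, int], low: Tuple[int, int]) -> float:
    """Ratio of pixel counts between two (width, height) resolutions."""
    return (high[0] * high[1]) / (low[0] * low[1])


# 1680*1050 = 1,764,000 pixels vs 1024*768 = 786,432 pixels
print(round(pixel_ratio((1680, 1050), (1024, 768)), 2))  # 2.24
```

Under that per-pixel assumption, dropping to 1024x768 cuts the pixel work per frame to well under half.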


    Are you 100% sure of what you're saying?
    No, 95% sure. I have limited testing hardware: my own computers and my beta-testers'. I couldn't reach ATI/AMD; they haven't answered my mails yet. I tried to contact professional websites, with no luck so far. So I thought I'd start sharing the info with people who have supported OCCT before and know the program, and see whether it is of interest.

    Do you have recordings that show the cards crashing?
    Yes. Here are the RivaTuner screenshots from one of my testers. He used two HD4870s:
    The first one is a PowerColor design (GPU0). Its VRM is 4-phase instead of the reference design's 3-phase, which makes it more robust.
    The second one is the reference design (GPU1).

    The first picture shows that everything is fine: the tests increase the load on the cards' power stages by changing the cards' frequencies, everything runs fine, and he doesn't hit the 82/83A borderline on the GPU1 (reference design) card:

    For those who wonder, here are the frequencies that were used:
    * 500/500 for both
    * 500/600 for both
    * 500/700 for both
    * 800/500 GPU0 750/500 GPU1
    * 800/600 GPU0 750/600 GPU1
    * 800/700 GPU0 750/700 GPU1

    This picture shows the test going into protection mode: the card drops to 200MHz. Unfortunately, the current spike is too brief to be recorded:


    That's it, you know everything about the problem. Please help us dig further into it, and if I'm wrong, please, oh please, correct me! But I really do think we have something here...

    If a hardware guru, or someone else, could conduct more advanced testing and confirm all this...
    Last edited by Tetedeiench; 05-23-2009 at 03:39 AM.

  2. #2
    Xtreme Addict
    Join Date
    Oct 2006
    Posts
    2,141
    Sorry, but it just seems highly unlikely to me that, if modern GPUs sometimes pull 100 amps under full load in modern games, ATI would design a card that can only handle a max load of 83 amps.
    Rig 1:
    ASUS P8Z77-V
    Intel i5 3570K @ 4.75GHz
    16GB of Team Xtreme DDR-2666 RAM (11-13-13-35-2T)
    Nvidia GTX 670 4GB SLI

    Rig 2:
    Asus Sabertooth 990FX
    AMD FX-8350 @ 5.6GHz
    16GB of Mushkin DDR-1866 RAM (8-9-8-26-1T)
    AMD 6950 with 6970 bios flash

    Yamakasi Catleap 2B overclocked to 120Hz refresh rate
    Audio-GD FUN DAC unit w/ AD797BRZ opamps
    Sennheiser PC350 headset w/ hero mod

  3. #3
    Xtreme Addict
    Join Date
    Sep 2004
    Posts
    1,023
    What drivers were you using? Possible driver hiccup? If this is the case I would imagine that someone would have experienced this before outside of OCCT.

  4. #4
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Why does the memory clock jump to almost 3200MHz when the core clock drops to 200MHz in that last picture showing it going into protection mode?

  5. #5
    Xtreme Member
    Join Date
    Oct 2006
    Location
    South Africa
    Posts
    388
    Very interesting!

    Nice work, I think.
    920 c0
    6gig Mushkin 1600 6-7-6-18 (Many thanks to TheGoatEater)
    285gtx
    Dfi x58 t3eh6

  6. #6
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    I can confirm that the VRMs used on some of the 4850s were a joke; they got really hot under load. Maybe your continuous load caused some wear and tear due to heat, so they lost some power for a fraction of a second just when the core needed it most. Heat does lower efficiency.

    Try putting a heatsink on the VRM and see if that solves anything!!

  7. #7
    Xtreme Member
    Join Date
    Mar 2008
    Posts
    170
    Unlikely or not, that's what the tests are showing. I'm getting a 4890 later this week and have a 4870 sitting in a friend's PC; I will test both within the next week and get back to you about this.
    Quote Originally Posted by ryba
    I don't carre about PCMark - it's for gays with moustache
    A Stranger's thoughts...

  8. #8
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    I'm also not sure if you noticed: the amps being drawn are different for each card. GPU1 is ahead by 2-5A across the board, except in the last test, where it's lower by almost 5A; then the failure happens after you've hit the next jump.

    I would adjust the program to make each frame's render very slightly more complex. I really don't know whether you can, but try to make it a very slight increase, and OC the card so you know it will hit the 82/83A cap before your program gives it the most complex image to render.

  9. #9
    Xtreme Member
    Join Date
    Dec 2006
    Posts
    213
    Quote Originally Posted by EniGmA1987
    sorry but it just seems highly unlikely to me that if modern GPUs pull 100amps sometimes during full load on modern games, ATI wouldnt design a card that can only handle a max load of 83 amps.
    Well, we had an overclocked 4870 PCS+ (Asus) that could pull 106A flawlessly with the very same test configuration.

    That value was reached with a heavy overclock.

    My test pulls about 87A (measured on the non-reference design) at stock frequencies. That's an estimate.

  10. #10
    Xtreme Member
    Join Date
    Dec 2006
    Posts
    213
    Quote Originally Posted by Reznik Akime
    What drivers were you using? Possible driver hiccup? If this is the case I would imagine that someone would have experienced this before outside of OCCT.
    We tried about four driver versions. Since underclocking makes the test work, a driver problem is ruled out, especially since the Asus design works with the very same driver version.

    My test puts about 30-40% more power through the VRM than Crysis. That's why you don't see this outside OCCT.
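Combined with the ~87A estimate given above, this gives a rough idea of why games stay clear of the limit. Assuming "30-40% more power" translates into 30-40% more VRM current (i.e. roughly constant vGPU), a Crysis-like load would sit around 62-67A, well under 82A. This is back-of-the-envelope arithmetic, not a measurement:

```python
def implied_baseline(test_amps: float, extra_fraction: float) -> float:
    """Current of a load that `test_amps` exceeds by `extra_fraction` (0.3 = 30%)."""
    return test_amps / (1.0 + extra_fraction)


TEST_AMPS = 87.0  # estimate quoted in the thread (non-reference card, stock clocks)
for extra in (0.30, 0.40):
    print(f"{extra:.0%} more -> baseline ~{implied_baseline(TEST_AMPS, extra):.1f} A")
# 30% more -> baseline ~66.9 A
# 40% more -> baseline ~62.1 A
```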

  11. #11
    Xtreme Member
    Join Date
    Dec 2006
    Posts
    213
    Quote Originally Posted by Manicdan
    im also not sure if you noticed, the Amps being put out are different for each, GPU1 is ahead by 2-5A across the board, except the last test where its lower by almost 5A, then the failure happens after youve hit the next jump.

    i would adjust the program to give it a slight bit more complex render each frame. i really dont know if you could or not, but try to make it a very slight increase, and OC the card so you know that it will hit the cap of 82/83A before your program gives it the most complex image to render.
    It is doable, but it would only tell you, programmatically, that it would fail at complexity 3.

    We did a lot of testing (using the frequencies, as that is the easiest way to increase the load on the VRM), and we came to this conclusion: if you go through the 82A barrier, your card goes into protection mode.

    The problem is: a simple app like OCCT GPU can make this happen. And believe me, I'm not a 3D guru.

    It is as if a CPU couldn't handle Linpack and crashed.

  12. #12
    I am Xtreme
    Join Date
    Jul 2005
    Posts
    4,811
    Having done this, which games show the problem this new test of yours reveals?

  13. #13
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    Cairo
    Posts
    2,366
    My HD4850 went past 115°C with the older version of the test. It is a non-reference HD4850 with Zalman cooling, so I won't be surprised if some cards fail.
    Intel Core I7 920 @ 3.8GHZ 1.28V (Core Contact Freezer)
    Asus X58 P6T
    6GB OCZ Gold DDR3-1600MHZ 8-8-8-24
    XFX HD5870
    WD 1TB Black HD
    Corsair 850TX
    Cooler Master HAF 922

  14. #14
    Xtreme Member
    Join Date
    Dec 2006
    Posts
    213
    UPDATE: the 4870 in the test is a 4870 PCS+ from PowerColor, whose VRM is a 4-phase digital VRM instead of a 3-phase one, and that's why it is not crashing.

    Sorry, I thought it was an Asus design. It is not. My mistake.

  15. #15
    I am Xtreme
    Join Date
    Jul 2005
    Posts
    4,811
    Quote Originally Posted by Tetedeiench
    UPDATE : the 4870 on the test is a 4870 PCS+ from PowerColor whose VRM are a 4-phase numerical VRM instead of 3, and that's why it is not crashing.

    Sorry, i thought it was an Asus design. It is Not. My mistake.
    So you basically designed a new test for 4-phase digital VRMs instead of 3-phase ones?
    Last edited by Eastcoasthandle; 05-19-2009 at 03:17 PM.

  16. #16
    Xtreme Addict
    Join Date
    Sep 2004
    Posts
    1,023
    Quote Originally Posted by Tetedeiench
    We tried about 4 driver versions. As underclocking make the test work, the driver version problem is ruled out. Especially since the asus design, with the very same driver version, works.

    My test pulls about 30/40% more power on the VRM than Crysis. That's why you don't see that outside OCCT.
    Ah, gotcha. Well, this kinda miffs me. A latent problem with almost all early 48xx series cards, save for non-reference ones! I would test this on my own, but I've got transfers going and can't do a reboot.

  17. #17
    Xtreme Member
    Join Date
    Dec 2007
    Location
    CR:IA
    Posts
    384
    Seems to me you're putting an unrealistic load on the GPU anyway.
    PC-A04 | Z68MA-ED55 | 2500k | 2200+ XPG | 7970 | 180g 520 | 2x1t Black | X3 1000w

  18. #18
    Xtreme Addict
    Join Date
    Nov 2005
    Posts
    1,084
    Design flaw? lol

    You have a bench that doesn't run on those cards. So what? What does it prove?
    Yes, you can say there are design flaws, but not this way.

    By the way, every chip these days has design flaws. Look at Intel's and AMD's CPU errata.
    Quote Originally Posted by Shintai
    And AMD is only a CPU manufactor due to stolen technology and making clones.

  19. #19
    Xtreme Guru
    Join Date
    Dec 2002
    Posts
    4,046
    Crashes? Turn up the fan.

    Fan settings have been plaguing the 3000/4000 series from the beginning.

    The cards that crash ^ need a fan BIOS settings fix or manual fan settings.

  20. #20
    Xtreme Addict
    Join Date
    May 2008
    Posts
    1,192
    IIRC, the reference design has over-current and over-voltage protection.

    My guess is you are hitting this. So I don't think this is a design flaw, but an intended result of the design.
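To make the over-current idea concrete, here is a toy Python model of a latching protection that matches the symptoms described in the first post: once the current goes past a trip point, the core clock is forced to 200MHz and stays there until a reboot. The 82A trip point comes from this thread; the 750MHz "normal" clock is a placeholder, and nothing here describes how AMD's actual protection circuit works.

```python
from dataclasses import dataclass


@dataclass
class LatchingOcp:
    """Toy model of a latching over-current protection (not AMD's design)."""
    trip_amps: float = 82.0      # figure reported in this thread
    normal_core_mhz: int = 750   # placeholder clock, for illustration only
    safe_core_mhz: int = 200     # protection-mode clock reported in the first post
    tripped: bool = False

    def sample(self, amps: float) -> int:
        """Feed one current reading and return the resulting core clock."""
        if amps > self.trip_amps:
            self.tripped = True  # latches: lower readings won't clear it
        return self.safe_core_mhz if self.tripped else self.normal_core_mhz

    def reboot(self) -> None:
        """Only a reboot clears the latch, as described in the first post."""
        self.tripped = False
```

Feeding it readings of 80, 83 and then 60 amps yields 750, 200 and 200MHz: the drop persists even after the load falls, which is the behavior an intentional protection (rather than an overheating part) would produce.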
    Quote Originally Posted by alacheesu
    If you were consistently able to put two pieces of lego together when you were a kid, you should have no trouble replacing the pump top.

  21. #21
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by Tetedeiench
    It is doable, but it would only tell you, programatically, that it would fail @complexity 3.

    We did alot of testing (using the frequencies, as it is the easiest way to increase the load on the VRM), and we came to the conclusion : if you go through the 82A barrier, your card go into protection mode.

    problem is : a simple app as OCCT GPU can make this happen. And believe me : i'm not a 3d guru.

    It is as if a CPU would not support Linpack, and crash.
    Going from complexity 0 to 3 involves steps that are too large. The goal would be to have it increase in extremely small amounts and watch the power load over time.
    It seems like the card that failed was increasing its amps by a much larger amount, then, on reaching 80A, gave in and couldn't provide any more until it reached the failure point.
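A sketch of that stepped approach in Python: raise a load level in small increments, record the current at each step, and stop at the first step that reaches the limit. The `measure_amps` callback is hypothetical; OCCT's shader complexity setting only offers coarse values, so a real test would need some finer-grained load knob plus a way to read VRM current (e.g. from a monitoring tool).

```python
from typing import Callable, Iterable, List, Optional, Tuple


def ramp_until_limit(
    levels: Iterable[float],
    measure_amps: Callable[[float], float],
    limit: float = 82.0,
) -> Tuple[List[Tuple[float, float]], Optional[float]]:
    """Step through `levels`, logging (level, amps); stop at the first reading >= limit.

    Returns the history and the level that hit the limit (None if none did).
    """
    history: List[Tuple[float, float]] = []
    for level in levels:
        amps = measure_amps(level)
        history.append((level, amps))
        if amps >= limit:
            return history, level
    return history, None


if __name__ == "__main__":
    # Fake, linear current model used only to exercise the function.
    def fake_measure(level: float) -> float:
        return 60.0 + 0.8 * level

    history, tripped_at = ramp_until_limit(range(31), fake_measure)
    print(tripped_at, len(history))  # 28 29
```

Keeping the full history, rather than just the trip point, is what lets you compare how quickly current rises per step on each card, which is the comparison being suggested here.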

    Also, could you show the total PC wattage when running a game and when running your benchmark, so we can see the load difference it is able to put on the cards?

    Sounds like this is another version of Furmark, for which ATI purposely made the cards run at a lower frequency to prevent damage.

  22. #22
    Xtreme Guru
    Join Date
    Dec 2002
    Posts
    4,046
    Quote Originally Posted by Aberration
    IIRC the reference design has an over current and over voltage protection.

    My guess is you are hitting this. So I dont think this is a design flaw, but an intended result of the design.

    That's a possibility... then again, it may just be bad fan settings or fans not working properly ^

  23. #23
    all outta gum
    Join Date
    Dec 2006
    Location
    Poland
    Posts
    3,390
    OCCT is a waste of time... it's not ranked on HWBot.

    So the card has an OCP, and it's now considered a design flaw? Have you seen any game that puts that much load on a GPU? Have you at least created a special map for any game that would put that much load?
    www.teampclab.pl
    MOA 2009 Poland #2, AMD Black Ops 2010, MOA 2011 Poland #1, MOA 2011 EMEA #12

    Test bench: empty

  24. #24
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by xoqolatl
    OCCT is a waste of time.....it's not ranked on HWBot

    So the card has an OCP, and it's now considered a design flaw? Have you seen any game that puts that much load on a GPU? Have you at least created a special map for any game that would put that much load?
    Try Furmark with the exe renamed; it's the same (or nearly the same).

    If you rename Furmark's exe, it puts so much load on the card that the VRMs can reach 130°C+, and if you let them run for a while -> *poof*

  25. #25
    Xtreme Member
    Join Date
    Dec 2008
    Location
    Sweden
    Posts
    450
    Quote Originally Posted by Hornet331
    Try furmark and rename the exe, its the same (or nearly the same).

    If you rename the exe on furmark its but so much load on the card that vrms can reach 130°C+ and if you let them run for time -> *poof*
    You play Furmark often? :P

    This seems like a new Furmark, so it wasn't that shocking to me. One thing I find interesting is that ATI cards seem to have a lot more power "left" than Nvidia's, since power consumption goes through the roof with the former and not so much with the latter (please correct me if I'm using that term wrong).
