Not sure about you, but I'd much rather run the program for 4 minutes and have it tell me my comp is unstable than run a game for 4 hours and still not know for sure.
That's the whole point of it: to know whether you're stable or not.
I don't see the point of this. Everything has a limit; push it hard enough and it crashes. I'm sure you could produce the same thing on an Nvidia card if you pushed enough amps or whatever through it.
Meh.
It's interesting that nobody has asked what exactly the test is stressing. The reason games with "complex" shaders don't stress hardware as much is that various functional units are often idle, waiting on high-latency memory or texturing operations.
If Tetedeiench's test is very math heavy, with high shader utilization for extended periods of time, that should throw up a red flag for any sort of GPGPU application. All of the excuses about games not stressing hardware are bollocks, as it's trivial to whip up an OpenCL or CS application that runs full tilt on the shader core.
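Just to illustrate what "full tilt on the shader core" could look like (this is purely a sketch of my own, not Tetedeiench's shader - the kernel name and the iteration count are made up): a kernel whose inner loop is nothing but dependent multiply-adds, with no memory or texture accesses, never gives the ALUs a chance to idle.

Code:
// hypothetical ALU-burner kernel (OpenCL C) - not the actual OCCT shader
__kernel void alu_burn(__global float *out, const int iterations)
{
    int gid = get_global_id(0);
    float a = (float)gid * 0.001f + 1.0f;
    float b = 1.0001f;

    for (int i = 0; i < iterations; ++i) {
        // long dependent chain of MADs keeps the ALUs busy every cycle
        a = a * b + 0.0001f;
        b = b * a + 0.0001f;
    }

    // write the result so the compiler can't optimize the loop away
    out[gid] = a + b;
}

Nothing in there is exotic; any compute API (or a pixel shader doing the same math per fragment) can generate this kind of sustained load.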
Tetedeiench, you wouldn't be willing to share your source code, would you?
Let's not mention NV.
It crashed all three of my 4890s... but hell, I hate the Prime/FurMark stuff anyway.
A CPU or GPU must pass ANY stress test at stock speeds, no matter what, as long as it's not the drivers, the system, or the test itself that is flawed :)
Can you imagine AMD telling some researcher not to optimize his algorithm for maximum utilization because the hardware isn't meant for that? But we really can't say anything for sure unless we see the shader code. Because it might be doing something very inefficient that has no performance or IQ benefit.
I think there is definitely a design flaw or something with the 4890.
When I bench 3DMark06, I can run the 4890 at 1000 MHz (this is with 300 CFM of air directly on top, mind you, and a 100 percent fan setting). When I run 3DMark05 I can run it at 975, and with 3DMark03 I can't even run it at 945.
I don't see why there is so much variability in the clocks of these cards. Stability is definitely an issue when there is such variation in runnable bench speeds. AMD needs to stop the marketing propaganda (e.g. the AnandTech overclocking extravaganza) that these cards are bulletproof for overclocking, when they might have issues down the road even at stock speed.
I think these types of tests are important because they show how well a card is built for the long term. I have used this analogy before, but I liken it to the testing of elevators. We don't test elevators at only their rated weight; they are tested way beyond that to ensure long-term strength. If something can run at 150% capacity, then running at its designed specs will be a cakewalk, and running at that speed for a couple of years is assured.
I am sure there is a reason ATI-exclusive companies like PowerColor have reduced their warranties from lifetime to 1 year. The engineers are the best people to know this type of stuff. Diamond is another company that offers a 1-year warranty. I think AMD cards are not built for long-term lifetime warranties. Sapphire has by far the best warranty among AMD exclusives (two or three years), and they are famous for their bad RMA service.
Interesting. You did run the test in fullscreen mode, didn't you?
I wonder if Sapphire used the reference-design VRM for its card... that would be an interesting thing to look into.
I'm going to sleep (it's well past time for me to do so); I'll keep looking at this thread tomorrow. Please don't start a flame war. I'm just trying to get this sorted out and to know what's going on...
Yeah, trinibwoy, and maybe I want to use OCCT as my screensaver :D If it can't run that, then I return my product as faulty :)
Before going to sleep, I have to answer that: sorry, but the answer is no.
I don't think there are people skilled enough to analyze the effect a particular shader instruction produces on the die of a GPU, and how that would relate to the crash we're encountering.
Don't you think?
I've always kept my code to myself, for the following reasons:
- I like to know where my code is used, for what purpose, by whom
- I don't want to see branches popping up everywhere
- I don't want comments on my way of coding ;)
So sorry, the answer is no ;)
So a new power virus...
If I (as well as others) am not experiencing any problems, I really don't see the need for it at this time. :up:
Yet the games and programs currently available don't produce this problem. So the question in your case becomes: do I use my video card for that particular program, or for the games I play? My answer is: for the games I play. If a person is not experiencing any problems, I simply don't see the need for it.
Then OCCT doesn't apply to you.
GPUpi might be better suited.
It's for those who use Linpack, SuperPi 32MB x2, Prime95 32-bit/x64, wPrime, Folding@home, etc.
Since you don't use OCCT's GPU test, you don't have a problem, so why complain? :cool:
What this shows is that if developers coded to make 100% use of the shaders available on the 4870/4890, the cards would crash.
At least that's my opinion.
No, I've been asked a few times to use it; I am answering why it's not necessary. Anyone can run those applications on their stock (for example) CPU and GPU, obtain a result, and it may or may not reflect what they see in the other programs and games they use. :D
I don't have a complaint, but I have asked what purpose this serves if folks using any video card currently have no issues with the programs/games (SuperPi, FAH, etc.) they use. ;)
Based on what exactly? Has this been tested by a game developer already? If so, what game is that?
Quote:
What this shows is that if developers coded to make 100% use of the shaders available on the 4870/4890, the cards would crash.
At least that's my opinion.
It can help explain why some are stuck with what could be considered an unsatisfactory overclock :D
If you believe that any game out there currently makes 100% use of these GPUs, I will never be able to convince you of anything new, so arguing with you is 100% pointless, much like how you feel this program is.
What I am saying is that, due to this newfound flaw, we may never see more performance unlocked through code optimization, or more efficient F@H GPU cores that make use of everything these GPUs have to offer.
But I am going to end it here. This back and forth really hasn't led to anything constructive. Enjoy!
A theory has no facts.
This has multiple facts now:
FurMark and OCCT.
Let's not write this off as a coincidence without proof now.
I've read all the comments and I must say some of you just don't know what he's talking about (I'll give no possible reasons...).
All he wants is for you to try his benchmark and report your results, so he can see whether the problem persists or whether he's mistaken. He has already begged you twice, so that should be enough.
And as someone said, everything will eventually reach its limits. But it's nevertheless interesting that the PWM is holding the core back from reaching its limits... so I don't think the core is flawed, just that the power management is insufficient.
Some should think about their reading skills, because it's important to comprehend texts. :2cents:
Anyone with watercooling or the fan at 100% tried this test? It might be a case of insufficient cooling rather than insufficient VRMs.
It's something that carried over from being a purely NV company, though.
Also, I think another problem is that these things are running way too close to maximum capacity. If you look at the watercooling section, a lot of chips die from VRM damage. Some of it might be due to overvolting, but I think a lot of it is because with air you at least have the guarantee that air is blowing over the VRMs; with water, contact might be weak and eventually you lose the card.
These things are running way too hot, and I think they should have over-budgeted a bit rather than provisioning just enough for these cards.
I don't think these things should even need dedicated cooling. They should use enough components, or better-quality ones, so they are not so close to running at max spec.
Hey Tetedeiench, I have tried your test both with the fan at 100% and at stock. Even without the fan manually set to 100%, my card reacts quickly and runs the fan up to 100% as soon as the test is applied, and the Sapphire runs your test without problems. The test does seem to ramp up the pressure on the card gradually, taking it to 100 percent over about 20 seconds, which is good; it gives the card time to respond with the fan.

I think you are right. I think some cards are not being given adequate on-card power supplies to handle the load your test creates, because this test really maxes the card out quickly. Perhaps a lot of manufacturers saw that most games can't take full advantage of the cards and fudged a bit on the power-supply end to save money.

It would be good if the guys in here would stop bickering off topic and posturing as GPU know-it-alls and just help out by posting the results for their cards. I think having a test that can max out any hardware is a good thing. It can help separate the wheat from the chaff and be a very nice aid when making hardware purchase decisions.
What 'specifications' is the card exceeding? I read through your first post but found no mention of design specs, nor links to published specs on AMD's website.
Your rather limited sample of power supplies (there's nothing special about the card crashing under both 550W and 1500W power supplies, so long as the PSU itself can adequately supply power), as well as some rather broad, sweeping generalizations you make (that optimizing 3D code too much will crash the card), call your claims into serious question.
Hardware should never fail due to software giving it valid instructions, end of story. Period. You whiners can't cope with the idea of having faulty hardware.
This. If it does, it's faulty. People saying "But but but there are no applications (games) that load the GPU like this!" is just hilarious. I'd love to see what they'll say when company X releases a game, GPGPU software or whatever that loads the GPU at this same level and the GPU turns off in consequence. Or when they're doing their favorite task with their CPU and the computer turns off, then at startup a popup appears saying "load was too high, please don't stress me so much". LOL :ROTF: :eh:
[EMO]I'm so depressed that my CPU can't handle 365 days straight of Prime95 and i feel like killing myself because my graphics card cannot handle hours on end of OCCT and Furmark. I'm so sad, why won't anyone listen to me.[/EMO]
Show me the cards crashing on a REAL application, then I'll be vaguely interested.
How is that a question?
Where was I referring to an erratum?
I read where the OP supposedly has this HUGE sample of cards that crash... whoops, that was only 2 cards, and 1 of them works.
In this thread I read that 3 people had a problem even getting the app to work, and another 2 people have a "problem card" but it worked just fine.
So right now... 4 samples:
1 reference card crashes
1 non-reference card works
2 reference cards work
also 3 cannot get app to work...
Obviously there is a problem with AMD/ATi making faulty cards, let's send all cards they make back and boycott them, that is the only solution.:rolleyes:
Some of us like numbers and statistics to base our claims on. Others like to jump to conclusions.
Edit - To the people complaining about AMD/ATi fanboys and all the supposed whining going on in this thread:
The OP tested 2 cards, jumped to a conclusion that all RV770 cards are faulty because they cannot handle his new power virus.
The title is very misleading and shows a large bias. What he could have/should have done is simply ask people here or in the GPU section to try out his new stability app and grab a consensus. After getting a good number of samples, then you might be able to post a thread like this, SHOWING the numbers.
The path he took is simply very wrong and a sad way to try to get attention for his app.
Wow, this thread quickly points out who are the fanboys out there and who are sane and willing to be reasoned with.
Wow, false, FALSE. FALSE!
A theory has plenty of facts. You're reminding me of the argument around the theory of evolution and how a lot of evolution-doubters claim it doesn't have facts because it's a theory.
The truth is, the words theory and law are very misleading to the general public. Newton's laws of gravity SEEM to apply universally, but the truth is they also break down and fail at certain scales in the universe - at the atomic level, or at massive scales. Same for the theory of relativity (which has been tested repeatedly in particle accelerators and in observations, so there are plenty of facts proving that its equations and premises work).
I could argue about this all day since I did plenty of EE work in my day, but these conversations get brought up daily around the interwebs.
Until the OP is willing to release the source code, you can't claim the cards are failing because of proper code. Since the OP is unwilling to release the source code, no logical conclusion can be made that a piece of hardware is failing due to PROPER software. I quote:
So I don't care about whether people can understand your code. There are plenty of board members here who can understand shader instructions or coding, and if not, certainly experts out there can read this and understand it.
The simple fact is, unless you can prove you're giving the GPU proper instructions and that the software cannot be to blame, then you can't conclude the hardware is at fault while the software side remains to be seen.
So to tie this back in with the science lesson, this is why research work is PEER reviewed! Until then, it's just hypothesizing. And without proper checking of the work, conclusions are pure FUD.
Disassemble and reverse engineer his source from the binary then.
I believe it is the OCP (Over-Current Protection).
We overclockers on LN2 have this problem with most cards when raising the Vcore well beyond spec and benching long, hard apps. You can overcome it by performing a physical modification of the card that disables the over-current protection or raises its threshold.
OCP is different from OVP, but the two are similar. OVP is Over-Voltage Protection: a similar result would be seen if his application slowly increased the core voltage; once the OVP threshold is surpassed, the card fails safe and shuts down, or "throttles", in an attempt to protect itself.
-------OP-------
Can you remove the heatsink from a reference card and from the ASUS non-reference card and tell me what Vreg is being used on each?
Also, for the non-reference ASUS card, can you tell me what amperage the GPU is pulling during your test?
And what motherboard are you using? Does your BIOS support adjusting the PCI Express power allocation, i.e. increasing the maximum PCI Express slot power draw?
Exactly. The guy said it was just DirectX, and we have no reason to doubt him. So unless you're willing to make the leap and claim that some combination of DX shaders constitutes a "power virus", the code isn't doing anything evil.
I don't know of any high-level language or API in the x86 world for which you could make the kind of excuses popping up in this thread. But then again, GPUs have never had to be anywhere near as stable as CPUs. That's all changing now, though.
To everyone saying it's the application's fault... then why don't we see any NV cards fail? Why don't we see any OLD ATI cards fail? Why is it only the HD 4870 / HD 4890? Why not the GTX 260 or HD 4650, for example? Logically, every other card works BUT the HD4870/HD4890, and even some of those with a good-quality PCB work. So, logically again, is it the application's fault? Of course not. If every other card works, then there is something wrong with this one. End of story.
So please, less blah-blah chit-chat, more testing.
Thank you.
How long do I have to run it for it to crash?
I only let it run for an hour.
http://i231.photobucket.com/albums/e...hr/occtpic.jpg
The answers to this question...
...are yes or no (and why, if you like). You haven't answered it yet.
Quote:
So now you're comparing a crash caused by an erratum with one caused by a crappy PWM?
Some of us use our little and insignificant brains to think about why ATI capped FurMark, a less demanding application than this one: because it burned a few cards' PWMs. Then we wonder what will happen when we test this thing on more cards, like what happened with FurMark. The same people said back then, "bah, this application can't be right." It was so right that ATI capped it. What a surprise. However, right now it's just a conclusion, indeed.
I've tested it on my 4870 and it FAILS. Let me ask you another question (will you answer this one?): will you admit the cards have such a problem when ATI caps OCCT in a future driver release, as will probably happen?
Anybody can. Why? Because the same cards that fail the test at default clocks pass it with zero problems if they're underclocked. This has also been shown several times in this thread by the author you don't want to believe. Or did you miss it?
I do not think ATI says anything in their warranty / proper-use terms about OCCT or test programs, as opposed to games.
But I believe that, in order to cut costs, some cards may have insufficient VRMs...
http://www.xtremesystems.org/forums/...ghlight=4870x2
Here is an example of what I was talking about earlier.
Really? When did you test it?
Did you read the thread?
Why didn't you post your results before?
This thread is obviously turning into a crapshoot thanks to the way the OP went about it.
Wtf?
Learn to read. He had a 4800 running that didn't crash... Congrats.
Who cares about this OCCT synthetic stress test? We have no reports of massive HD4xxx failures, nor have any users who OC and overvolt their cards reported such behavior, even in the most stressful tests like Crysis, Far Cry 2, etc.
On the other hand, if anyone wants to test the quality of the board design and VRM circuitry, let him run this test and see if it bricks his card. He can later say his Radeon has a flaw and RMA it for an NV card :p:
Scientific method anyone? Did we all skip grade school? I mean if his idea is bunk at least disprove it using tests and evidence based off of what he has provided. Glad to see a thread with a simple request can still dissolve into a flame war.
I will agree with this only for the simple fact that people, and I mean overclockers, base their entire system stability on tests that are designed to stress their hardware in ways that you would NEVER, let me repeat that, N E V E R, stress your PC in normal daily use.
Quote:
Who cares about this OCCT synthetic stress test? We have no reports of massive HD4xxx failures
Even under the biggest gaming load you can't stress it the way OCCT does for CPU/MEM/PSU/GPU.
It's just not going to happen, and for people to keep believing that OCCT/Futuremark/Prime/etc. are the golden rule for system stability is absolutely f'ing childish.
If your system is stable doing what you normally do, then it's stable... period.
Four samples hardly prove or disprove someone's theory, especially with a benchmark program. We'd need numbers on how many cards were sold, then narrow it down to the affected models to get an effective sample size (even then it would be a flawed study, because everyone's rig is different; there are too many variables for a good, standardized study). It really comes down to how the cards were manufactured, and if it is a hardware issue, why are people raising such a fuss? Yeah, it sucks; return the card and buy a different one, or for your next purchase buy Nvidia. Of course I doubt we'll get any of that, but that's what we'd need; for now it's just what is available on this board. Would the Fire cards be affected by this? I have access to two at work, but I don't know which model would be comparable to the 4870/90.
Exactly. The OP had 2 samples, yet he decided to post this thread with this title.
Also trinibwoy-
http://en.wikipedia.org/wiki/Thermal_Design_Power
The simple fact is that all real applications sit well inside the TDP and design specs of what they were designed for, as evidenced by the fact that nobody has seen these issues in games
Quote:
The TDP is typically not the most power the chip could ever draw, such as by a power virus, but rather the maximum power that it would draw when running real applications. This ensures the computer will be able to handle essentially all applications without exceeding its thermal envelope, or requiring a cooling system for the maximum theoretical power, which would cost more and achieve no benefit.
(or even when overclocking their boards with games). However, apps like FurMark and this one are not real-world tests.
OP would get similar results by striking a match and screaming fire in a crowded theater.
Really?
So gathering the actual results and revealing the glaring bias of the OP is "stubborn and biased interventions?"
Thanks for your insight.:rofl:
So, as was mentioned before, if this is a huge RV770 design flaw, why are the 4850/4830 unaffected with their even "cheaper" VRMs?
Aren't there two versions of the 4870 floating around with some variations in the VRM circuitry?
My 4870 runs at 95-98% load 24/7 with no issues running F@H. For kicks I'll download this and run it, just to see. My 4870 is a 1st-gen VisionTek right from the first release last year.
OMG. He posted some results he had found with his program and was hoping (this being XS and all) that some people who own 4870/4890s would maybe give it a shot to confirm or disprove his results, and the fanboys went crazy.
At least show some respect to the guy who developed a free and useful program that we have all used at some point.
OK, I downloaded it, set the settings, and ran it. Nice fuzzy red donut thingy waving around on my screen. Got bored of it after a few minutes and shut it off. How long is it supposed to take before it blacks out?
My 4870 is overclocked to 790 core too, by the way (the highest stupid CCC will allow me).
Didn't kill my Asus DK card.
Answer 1: http://en.wikipedia.org/wiki/Optimism_bias :up::up::up:
Answer 2: http://en.wikipedia.org/wiki/Somebody_Else%27s_Problem :shrug:
Answer 3: http://en.wikipedia.org/wiki/Rosy_retrospection :up:
Answer 4: http://en.wikipedia.org/wiki/Positivity_effect :up:
Answer 5: http://en.wikipedia.org/wiki/Bandwagon_effect :up::up::up::up::up::up::up::up::up::up::up::up:
Answer 6: http://en.wikipedia.org/wiki/Commitment_bias :up::up::up::up::up:
and so forth..
--
Please note: the preceding list of cognitive biases may not necessarily apply 100%.
Someone disprove the OP's assertions or kindly STFU.
XmX
... :down: AMD, is it possible that you've let me down again? :down: ...
Is it a flaw if it performs to the level that ATI intended? If they had prevented this in their drivers by not loading the card so heavily, would it still be a flaw?
I think what has been found is fascinating and all, but it's nothing more than a way to stress the board more than it was intended to be stressed. IMHO, not a flaw.
I was pretty sure I was going to pass.
I had already fixed my problem with the March DirectX distribution and Vista.
I couldn't get any stress test to cause a crash.
http://i41.tinypic.com/fnfmmt.png
No black screen here. GeCube 4870 and Sapphire 4870 1GB in Xfire. I didn't like how high the temps went in such a short time, though!! That is an intense test.
To address the real-world applicability of it... any game that put this kind of stress on hardware would be out of business pretty quickly, considering most of us here have decent hardware versus the masses... I don't think many Dells or eMachines would last, haha.
As for not being able to use the "full" potential of the card, it seems ATI designed it for the 99.9% of people who will never hit this wall. They could have made the card impervious to this by adding more PWM/VRM capacity, but at a cost. The cost-benefit analysis isn't surprising: this condition will not surface in the wild, and for the few cases where it does, it's cheaper to RMA those cards than to change the design to cover such strenuous, rare occurrences. That said, my cards worked fine; I just don't want to fry them for no reason :P
I don't think any optimization of a game would make use of all 800 shader cores, the ROPs, the TMUs, etc. all at the same time. In other words, an unrealistic load.
LOL, again you didn't answer any of my questions, yet you write more. I'm putting you on the list of XS people who love to argue a lot and think they're always right, but when you ask direct questions that compromise their argument they just STFU or act like you, with evasions and answering a question with another, unrelated one (Speederlander, Shintai and the like). You know which list that is, don't you?
Learn how to do things properly:
- Some hours ago.
- Yes.
- Because I didn't want to. Last time I checked, I decide when and what I write on the Internet. But who knows, maybe there's a conspiracy and I was hiding the result? Maybe I invented it? Ahhh... ;)
I guess you never heard of Intel/AMD's in-house power viruses.
Although what we might be seeing here is an insane case of the ALUs not being TEX-limited at all, plus heavy ROP utilization.
No, a GPGPU application will NOT get to this level of utilization.
Try it with 4770/4650. I bet you'd see an INSANE power draw increase over the general load.
The bug is happening only on the 4870/4890 so far. We found 4850s to be unaffected (we even had a CrossFire of 4850s be "bug-free").
And we reproduced this bug on about 10 different cards, as I stated clearly in the first post, using AMD and Nvidia platforms, a lot of different northbridges, different power supply brands and so on. A lot of hypotheses came up along the way, and all were ruled out until the VRM one remained. Not 2 cards. Please be constructive.
I mean, we even had a reference-design card, underclocked, running fine under 82A; we bumped only the vGPU, and... well... black screen.
The 82A limit on the VRM is, right now, just a hypothesis. Really, it's the only thing we can think of that could cause a crash that quick. I can't see temperature rising quickly enough to cause such a crash. If temperature were the issue, the crash wouldn't be that abrupt; you would see the donut with artifacts first, as usual.
Anyway, my goal is to understand what is going on. Again:
- Only the 4870/4890 are affected (that's what our testing shows) at stock frequencies.
- Only reference designs (i.e. cards that use the reference-design 3-stage VRM) are affected. That is a huge number of cards, but that is what we see.
I have to admit I'm surprised at the amount of "hate" I'm getting here. I mean, on any other board I visit, OCCT is already pretty well known, and people know I'm not doing this for fame or anything else (hell, OCCT has been around for 6 years now; why would I seek that just NOW?).
Right now, I've seen news of the RC1 popping up everywhere (to my discouragement - I hate people spreading betas), and people complaining that OCCT isn't compatible with ATI cards. Websites refuse to help me understand what is going on... they ignore my emails. I finally got through to a French website - I do hope it'll start the chain.
Posting on the boards is the only thing left for me to do to understand what is going on, because I truly think my test is not at fault here, as underclocking makes the cards pass it.
Well, I'll say "thanks" for being open and honest with us. :up:
I'm guessing there was no way for ATI to know that there would be such a demanding graphics application available for these cards. I'm sure the PWMs are just fine for everything but this, and if that's the case, just leave the program's setting at 2; there is no reason to ensure your card's stability under a load that no game can reproduce.
The point of a stability test is to put an unrealistic load on a component :stick: Why don't people complain about running Memtest or Linpack for hours? These tests also stress specific components beyond what is to be expected under normal use. They are designed to put as much load on the components as possible; if you don't like that, then you're rejecting the idea of a stress test as a whole, so don't direct your anger at just this one person.
Another thing to think about: what would you say if a new Intel CPU couldn't run Prime95 at stock speeds - that it's an unrealistic load?
As the OP has stated multiple times already, the same cards that fail will pass the test when sufficiently underclocked, suggesting a hardware rather than a software problem.
And I find it amusing that you keep pushing me to answer a question regarding something I didn't say.
You are intentionally misunderstanding my statement, twisting my words, and then trying to provoke me into answering a question comparing things I didn't mention.
Also, when you say "questions" you mean question, right? There was only one.
Well, I guess I will explain my thinking on your question.
I was stating that specially made software can cause hardware to fail, which was my original statement. You then ask me to compare an erratum, which I didn't mention, to a VRM failure, which also wasn't in my statement.
Why am I answering a question about comparing an erratum to a VRM failure when I never made that comparison?
This "supposed" VRM failure makes no sense, seeing as lesser models (4850/4830) are not affected, and there is simply no proof to back up the claim that it is a VRM failure.
Also, when I asked you about why you didn't post your results right away, yes I was implying you didn't do the test.
Why would you wait until your fourth post in the thread to post your results, in a response to my starting to gather data, when you and the OP are trying to help the community with this thread?
Also, some updated results...
Samples: 9
4 crashes - Stargazer's 4870, cowie's 3x 4890
5 non-crashes - SparkyJJO's 4870 @ 790MHz, Telperion's Asus DK, AMDDeathstar's VisionTek 4870s in CF @ 780MHz, MsB's Sapphire 4870 1GB
Plus two people who tested a 4850 and didn't have a problem.
Then some are having problems with the app itself:
Zanzabar, Jamesrt2004's 4870,
Plastok's 4890
Again, there is no conclusive data here.
I am trying to; I guess I am not allowed to forget anything covered in the first post ~9 hours after I read it.
I guess what I was remembering was the test which you had the screenshots of.
@Tetedeiench
Have you presented your findings to AMD/ATI?
If yes, what did they say?
There are also certain stability tests that can permanently damage some Intel CPUs, and nothing really stresses them that much in real-world use.
But one could argue that doing everything you possibly could at once is not a meaningful mode of operation.
Oh really? He proved it with two cards? How many other members of this board just posted that they encountered no such problem?
All sorts of cards will have varying tolerances, and unless the OP has more than a SMALL SAMPLE SIZE, his "conclusion" is irrelevant.
Unless you've got a hundred cards that can all REPRODUCE this same issue, you're playing with small sample sizes.
This is like basketball when someone shoots 12/15 for a game. Can you conclude this guy is a .800 shooter? And right now, from our own thread's results, the guy isn't even at 50% for the small sample size, so how can you conclude either way?
It's silly to compare GPU tests like this with CPU 'stress tests'.
A CPU is very complex, with a number of different units and execution paths and a bunch of cache... it is very difficult to really stress all of it simultaneously.
A GPU, on the other hand (especially one like the 4xxx series), is a whole bunch of math units, which can easily be synthetically activated to stress the entire GPU.
Obviously ATi knows that in "real-world use" (even GPGPU), applications can't utilise their chip in this way... hence they made a design decision to use lower-cost power circuitry that meets "real-world" power demands. Oh well, that's their choice, and if you're in the 0.0001% of people who care about "synthetic stress testing" then you should avoid ATi products!
You can kind of think of it like a car engine. It can run from 0-7000rpm, but if you run it at 7000rpm 24/7 it won't be very happy :) Car makers tune the engine according to durability vs. cost and make design decisions based on real-world usage. But sure, go stress your car engine :P
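To put a rough number on "a whole bunch of math units" (back-of-the-envelope, using the commonly quoted specs): RV770 has 800 stream processors, each able to issue a multiply-add (2 flops) per clock, so at the 4870's 750MHz that's 800 x 2 x 0.75GHz ≈ 1.2 TFLOPS peak. A synthetic ALU loop can sit near that ceiling indefinitely, while a game that spends much of its time waiting on textures and memory never gets close - which is exactly why the power draw in the two cases is so different.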
I fixed the app; I had to reinstall the DX redistributable, but games were working fine and I had installed it previously, so I'm just confused.
I get an extra 3°C on my 3870 (I know it's not the problem card), but it is doing more than before.
I'm thinking that it is the RAM.
What about asking in a game developers' forum like GameDev?
For me, I still use games as the stability tester for video cards. I think it would be more helpful for OCCT to simulate realistic gaming loads rather than an unrealistic load that will crash the card.
Ok. Stop the flame war, please.
First of all, I'll be happy to show my source code to any expert, but I'll not hand it out to anybody. That's the first point. No copies.
Here are the tests that led to the conclusions shown there. I'll sum them up for you.
And please understand that I'm not a hardware guru. I'm a developer. I've got a good understanding of hardware design, yes. But when it comes down to knowing how a VRM works, well... that's well beyond my knowledge. I've always been honest about that. That's why I'm here. Help. That's what I need.
Cards affected:
- HD4870
- HD4890
- All in all, GDDR5-based ATI cards @ stock frequencies.
All of them?
- No. The only card that could withstand the test was a specific PowerColor design, with a 4-stage digital VRM built in. All others failed.
- All earlier claims that "it runs fine", before this board, were from people who did not use the test configuration for maximum load. The cases on this board need investigating. Can you help me there? :)
PCs and components used:
- Watercooled cards.
- Temporarily over-cooled cards (extra 120mm fans added to the cards).
- Cards were plugged into AMD and Intel PCs: i7, C2D, and other CPUs and northbridges.
- Power supplies were all trusted brands (Corsair, Antec, Seasonic), min 550W, max 1500W (yes, we had the bug happen on a 1500W ToughPower).
Test 1: Debugging the app
- I spent about 3 days doing that. The only debug builds that worked were those that crippled the load. Made no sense whatsoever.
- Conclusion: inconclusive. Or I suck at debugging.
Test 2: OSes
- The problem occurs on the following OSes: XP 32, Vista 32 & 64, 7 32 & 64.
- Conclusion: this rules out the OS as the cause.
Test 3: Driver version
- 4 driver versions were used, from the most recent to older ones, official and beta. Same behaviour.
- Conclusion: the problem is not tied to a specific driver version. It could in theory be common to all driver versions, but that's unlikely, as the HD4850 passes the test and uses the same chip. A memory-related problem? Seems unlikely to me, but hey... who knows.
Test 4: Lowering the load on the card
- This was done in the following ways: lowering the resolution of the test, lowering the shader complexity value (less relevant, as this changes the shader that is used), and running in windowed mode. It worked.
- Conclusion: lowering the load makes the test work, even with the same algorithm applied (in the case of windowed mode / lower resolution). The shader program is not at fault.
Test 5: Lowering the frequencies
- Lowering the frequencies made the test stable. It ran perfectly fine.
- Conclusion: the program is not at fault. The shader, or program, doesn't depend on the internal clocks of the card in any way. If lowering the frequencies makes the test stable, it strongly indicates a hardware failure, whether that be temperature or a power-stage failure.
Test 6: Increasing the vGPU
- We had a card running the test fine at lower frequencies. Then we increased the vGPU, and only the vGPU. This only increases the load on the VRM and the temperature of the card; it has no impact on the frequencies. The test was made unstable by changing only the vGPU.
- Conclusion: this shows that increasing the temps and the VRM load makes the test unstable. We did not touch the frequencies, yet the problem is reproduced. We seem to have isolated the problem: the card IS the cause, as it is the only thing that changed between the tests.
Test 7: Switching cards
- We swapped the unaffected (because of its specific design) PowerColor HD4870 for a reference card. The problem was not happening with the PowerColor card; it showed up with the second card. Of course: same OS, same app, same driver version.
- Conclusion: I don't think software is at fault anymore.
Let's combine the conclusions:
Test 1: inconclusive.
Test 2: the OS is not the cause.
Test 3: a specific driver version is not the cause, or else the issue is shared by all driver versions; the later conclusions show that this is VERY unlikely.
Test 4: lowering the load makes the test stable, with the very same algorithm behind it. So the algorithm is supported by the card. Why only in a crippled version?
Test 5: lowering the card's frequencies, instead of crippling the test, makes the test stable again. The test ran at full power, which was unstable before. So the test is supported by the driver and OS. This indicates a hardware failure.
Test 6: taking a stable configuration and increasing the vGPU, and only that value, makes the test unstable. This changes only the temps and the load on the VRM and GPU. This narrows our search down even further.
Test 7: swapping an unaffected, specific-design card (the PowerColor one) for an affected card makes the bug appear where it never existed before. As everything else in the PC is the same except the graphics card, the card IS the cause.
Now, we came to the "82A" hypothesis (which is STILL a hypothesis, I remind you) because on ALL cards the crash was triggered when we reached this value, as reported by RivaTuner. Temps varied from card to card, but this current value was the one common to all of them.
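As a rough sanity check on that figure (back-of-the-envelope, and note the voltage is my assumption, not a measured value): with a stock vGPU somewhere around 1.25-1.3V, 82A on the core rail works out to roughly 82 x 1.26 ≈ 103W delivered by the GPU VRM alone, before memory and conversion losses - already a large chunk of the board's total rated power. So a current limit in that region on a 3-stage VRM is at least plausible.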
And there you have it. All we've done. All the research, with all our limited means.
My final point is: what can you do to
- help me find out what's going on?
- narrow down why this is happening on some cards and not on others?
Remember, only GDDR5-based cards (i.e. the HD4870 and HD4890) seem affected by the problem, and only those with a 3-stage VRM power supply, which is the vast majority of the cards.
I am at work at the moment, so I can't do a lot of research. For the cards that work, could you dig up what kind of VRM stages they use? Reference ones, custom ones? Thanks for your help.
How can they know that? What is the point of releasing monsters with 800 shader processors if you can't utilize all of them?
We can't know for sure, but it is possible that some games would look much better (at the same FPS) if game designers did better optimization during development. Or maybe they did, and then ran into the same instability issue? So maybe we need fewer shader processors but better game optimization?
Okay, one thing I'd be interested in: were the cards that failed ever taken apart to see if there's proper contact between the heatsink and the VRMs? I can imagine that if there isn't, temps with this test would skyrocket so fast that the cards would just shut down very quickly. Was this checked? Does it even apply? Seeing that we have vanilla cards that do not fail this test, I think the question is valid.
(i.e. maybe it's the manufacturer's fault. See mainboard VRM heatsinks, which don't always have good contact.)
Which raises the next question - why? Ideally, we could pop the sinks off all the cards tested and see if there are differences in the components that correspond to pass/fail. Obviously that's not feasible, but that doesn't mean it wouldn't be interesting nonetheless.
That aside, there have been cards coming out lately with boosted power circuits - if the manufacturer is going to make a big deal over their non-reference boosted power circuits, aren't we allowed to at least question why there's room for improvement over the base design? It's not as though this industry doesn't have its share of incidents where a manufacturer tried to intelligently cut corners, but flubbed it.
Seriously - somebody puts a processor under liquid Helium, and we all cheer and throw in our armchair opinions. We'll discuss/argue whether a motherboard having "only" 8-phase CPU power is a "problem". But somebody writes some software that can overpower certain stock cards, and suddenly a whole bunch of people are stonewalling questions and shouting that it's not a 'realistic scenario'? :rolleyes:
I hope you don't think I was flaming you with my source code comment. I am just saying that we can't necessarily rule it out either. Even the best programmers make mistakes and it helps to have extra eyes looking at it.
If it only happens on GDDR5 cards, what about the 4870X2 or 4770? I would test it on my X2, except that I sold it. I'll be sure to test it on the 4890 when I get it - it had better pass at 1.45V and 1100MHz core :mad: :ROTF:.
What kind of water blocks were used? Full cover?