Gosh -- this is what I meant by the 4870 X2 is better shown paired with an Intel CPU:
http://www.legionhardware.com/document.php?id=770
You can do some research --- generally, average and min track to an extent. Though I don't disagree completely, average FPS does not show the whole story. But neither does minimum by itself.
Average is just a statistical representation of the population it is sampling. If the CPUs were not important, then the statistical average would not be affected and the means of the two populations would come out statistically equivalent. That is not true here: the average FPS clearly responds to the power of the CPU (hence the reason for the CPU scaling article). Thus, average is not meaningless overall; it does allow one to conclude which CPU supports the GPU better. Average goes higher as both min and max go higher.
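As a toy illustration of that (my own sketch in C++, with made-up frame times, not numbers from any of the linked benchmarks), both average and minimum FPS fall straight out of the per-frame times, and a CPU that stretches frame times drags both down:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // Hypothetical per-frame render times in seconds (one slow hitch included).
    std::vector<double> frameTimes = {0.016, 0.017, 0.016, 0.050, 0.018, 0.016};

    double total = 0.0;
    double worst = 0.0;   // the longest frame time gives the minimum FPS
    for (double t : frameTimes) {
        total += t;
        worst = std::max(worst, t);
    }

    double avgFps = frameTimes.size() / total;   // average FPS over the run
    double minFps = 1.0 / worst;                 // FPS during the worst frame

    std::cout << "average FPS: " << avgFps << "  minimum FPS: " << minFps << "\n";
}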
Fraps'ing any of those games shows that Intel Min is also greater than AMD min....
However, my original point is that the 4870 X2 is a darn fast card -- and drives to the CPU limited gaming domain even at high end resolutions. This is the reason review sites use the fastest possible CPU to evaluate the capabilities of a GPU ... otherwise, it hides the performance and skews the evaluation. Regardless of which CPU is faster, the interesting question to ask is... why? Why (pre-Core 2 Duo) did AMD perform better at gaming code, but with C2D the tables turn?
Jack
I agree with this, good post :)
But one thing is clear, if the min fps is above 30, smoothness is guaranteed.
No sky high avg or max fps numbers can guarantee smoothness, it may still be very choppy during some parts.
With that said, fps graphs like the ones @ [H] are very useful, in conjunction with min and avg numbers.
Yep, I prefer to match the refresh rate of my monitor (no chance of tearing) ... Phenom is a great gaming CPU; only in one or two cases could I really see an Intel quad having an advantage over Phenom. For the vast majority, both will yield the same game play experience.
Those games are not that heavy (hard on the hardware) to run (the CoH test is messed up). AMD will not show advantages until Intel's L2 cache stops being such an advantage (more demanding threads and more memory use decrease that advantage, so Intel can't use its large cache as well) or until communication with other hardware increases. It is impossible to see that just by viewing the numbers Legion shows.
What that test at Legion showed was pretty much the difference the L2 cache (size and speed) makes for whatever run the test was done on. It is hard to draw more conclusions than that. Deneb, which will have 6 MB of L3 cache, will gain some, and if the latency is decreased it will gain some more in average FPS.
What was most interesting (just knowing the average FPS) in the Legion test was that you will get the same game experience with a cheap X2 as with an expensive C2Q using a 4870 X2 to run those games.
Well, I agree about COH, Intel blows AMD away on that test by >50% clock for clock.... so I don't accept it as representative.
however, every one of those tests is a high-res multithreaded game (1920x1200, full AA).... it isn't that it is weak on the software, it's that the 4870 X2 is such a powerful card that at high resolutions and full AA the bottleneck stays on the CPU... in fact, there is evidence that even the highest end Intel CPU at 3.0 GHz still bottlenecks this card, meaning there is even more performance being held back.
What the Legion tests show is exactly what I have been showing in this thread.... and counters your assertion. Intel is currently better at gaming code, single or multithreaded. It also explains, again, why every (and I mean EVERY) site that evaluated the 4870 X2 used an Intel CPU.... again, to evaluate the competency of a component, you remove the ambiguity of any other bottlenecks by using the fastest support system available.
But it is not L2 cache that is doing this.... Intel has far superior branch prediction, and gaming code is the branchiest you will find. Deneb will be an improvement over Agena no doubt, but it will not overtake even a Kentsfield in gaming.
Jack
Actually, my theory is that the branch predictors are the only Netburst features to be ported over into the Core microarchitecture.
Netburst, at the end of its life, had some 31 stages in its pipeline... without question the largest performance hit was mispredicted branches, where some 30 cycles would be wasted just flushing the pipeline and repopulating it.
To get as much as they could, the designer likely really worked over the branch predictors to avoid this penalty. Unfortunately, I cannot post the data, it is not mine, but RealWorldTech asked me for some inputs on an article they are working on specifically asking 'why is Core so much better at gaming than K10'... so I know the answer already as I have seen the data. :)
Jack
High res is something the video card needs to handle. You know, what I first thought -- that games take advantage of higher resolution on the CPU side -- was not right. The processor does the same work regardless of the resolution.
You could do the exact same test with a slower video card just by decreasing the resolution.
Here is a very simple explanation showing how it works (single threaded game)
+ = processor is working
- = video card is working
Fast processor - low res
++--++--++--++--++--++--
Slow processor - low res
++++--++++--++++--++++--++++--++++--
Fast processor - high res
++---------++---------++---------++---------++---------++---------
Slow processor - high res
++++---------++++---------++++---------++++---------++++---------++++---------
With more threads, some could do work while one thread is waiting for the video card to paint the picture.
But the processor and the video card aren't working independently. They need to wait for each other. Some sort of synchronization is needed in a multithreaded environment as well.
When you say the processor bottlenecks the video card, can you explain what you mean?
It's quite simple: the CPU bottlenecks the GPU when the card can render the image faster than the CPU can provide the data that is necessary for the game (AI/physics etc.) to advance. Just think of a massive explosion with hundreds of parts, but the parts have a rather simple form and only plain textures.
The GPU could render this scene at several hundred fps, but the CPU can only provide the location of all those parts 10 times a second (aka 10 fps), so your framerate would be 10 fps, regardless of what the gfx card can do.
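To put rough numbers on that explosion example, here is a little C++ sketch (a toy model of my own, assuming the CPU can prepare the next frame while the GPU draws the current one):

#include <algorithm>
#include <iostream>

int main() {
    // Assumed per-frame costs: the CPU needs 100 ms to update the physics of
    // the explosion, while the GPU could draw the simple parts in 5 ms.
    double cpuMs = 100.0;
    double gpuMs = 5.0;

    // If the two overlap (CPU prepares frame N+1 while the GPU draws frame N),
    // the slower of the two sets the frame time.
    double frameMs = std::max(cpuMs, gpuMs);
    std::cout << "frame rate: " << 1000.0 / frameMs << " fps\n";   // ~10 fps
}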
Branches are done in all software. You do it all the time; conditions and loops are everywhere. I think that games try to avoid these in order to gain speed. That is a trick I use when I need better performance. Aligning data is also common and has the same speed target: to have one single flow of data.
Do you know how the branch prediction works on K10?
Yeah, there is lots of literature on the branch predictors for both K10 and Core...
Branches in software are generated by code; branch predictors attempt to predict which way a branch will go and load that block of code into the pipeline... branch prediction is entirely in the hardware and is a hard-wired logical algorithm intended to increase IPC. This is a fundamental comp sci topic, you need to do some research.
EDIT: However, game code contains more conditional logic than, say, 'encoding' or 'rendering' code ... hang tight, I have several papers somewhere that document the number of branches for different code types. Games are the highest, and this is where strong branch prediction shines, hence the reason Intel wallops AMD in gaming code.
Intel's branch prediction algorithms are much stronger than AMD's.
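For a flavor of what a predictor actually is, here is a minimal sketch of a 2-bit saturating counter in C++ -- the basic building block these schemes are built from (a toy model, not either vendor's actual hardware):

#include <iostream>

// 2-bit saturating counter: states 0,1 predict not-taken; 2,3 predict taken.
struct TwoBitCounter {
    int state = 2;   // start at "weakly taken"

    bool predict() const { return state >= 2; }

    void update(bool taken) {
        if (taken  && state < 3) ++state;
        if (!taken && state > 0) --state;
    }
};

int main() {
    TwoBitCounter c;
    // A loop branch that is taken 7 times, then falls through once.
    bool history[] = {true, true, true, true, true, true, true, false};
    int mispredicts = 0;
    for (bool taken : history) {
        if (c.predict() != taken) ++mispredicts;
        c.update(taken);
    }
    std::cout << "mispredicts: " << mispredicts << "\n";   // only the loop exit
}

Real predictors index thousands of these counters by branch address and global history, but the idea is the same: the hardware guesses from past behavior and only pays the flush penalty when it guesses wrong.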
jack
Read this thread, all the explanations are there.... I have posted numerous links where even nVidia explains how to determine bottlenecking at the CPU.
Wow.
What it essentially means is that the Phenom is not a good CPU to pair with a 4870 X2 even at high resolution game play, you are holding back potential of the video card.
Do you have a good link about K10?
About code that needs speed and branches: the general rule is to avoid branches as much as you can. There are numerous ways to do that. It will also be harder for the compiler to optimize code if there are a lot of branches. I don't think games have a lot of branches; logically it would be the opposite. There are a lot of talks that look at game code and how to avoid branches.
If you need branches, then try to have one single flow and avoid moving the instruction pointer with a conditional branch.
In a chain of work to get a job done, everything that takes time in that chain adds to the total time. If you have a slow processor you may gain the most by changing the processor; if you have a slow video card you gain more by changing the video card. Hopefully the developers have optimized the game. This talk about 'my processor bottlenecks my video card' or 'the video card bottlenecks the CPU' I do not understand. They both participate in the work to render the frame, and they both use time to render the frame. Buying a faster video card or a faster processor will decrease the total time, but both will still use up time that adds to the total time to render the frame.
Here is a good paper: http://ati.amd.com/developer/gdc/PerformanceTuning.pdf
http://www.realworldtech.com/page.cf...WT051607033728
Quote:
The branch prediction in the K8 also received a serious overhaul. The K8 uses a branch selector to choose between using a bi-modal predictor and a global predictor. The bi-modal predictor and branch selector are both stored in the ECC bits of the instruction cache, as pre-decode information. The global predictor combines the relative instruction pointer (RIP) for a conditional branch with a global history register that tracks the last 8 branches to index into a 16K entry prediction table that contains 2 bit saturating counters. If the branch is predicted as taken, then the destination must be predicted in the 2K entry target array. Indirect branches use a single target in the array, while CALLs use a target and also update the return address stack. The branch target address calculator (BTAC) checks the targets for relative branches, and can correct predictions from the target array, with a two cycle penalty. Returns are predicted with the 12 entry return address stack.
Barcelona does not fundamentally alter the branch prediction, but improves the accuracy. The global history register now tracks the last 12 branches, instead of the last 8. Barcelona also adds a new indirect predictor, which is specifically designed to handle branches with multiple targets (such as switch or case statements). Indirect branch prediction was first introduced with Intel’s Prescott microarchitecture and later the Pentium M. Indirect branches with a single target still use the existing 2K entry branch target buffer. The 512 entry indirect predictor allocates an entry when an indirect target is mispredicted; the target addresses are indexed by the global branch history register and branch RIP, thus taking into account the path that was used to access the indirect branch and the address of the branch itself. Lastly, the return address stack is doubled to 24 entries.
According to our own measurements for several PC games, between 16-50% of all branch mispredicts were indirect (29% on average). The real value of indirect branch misprediction is for many of the newer scripting or high level languages, such as Ruby, Perl or Python, which use interpreters. Other common indirect branch common culprits include virtual functions (used in C++) and calls to function pointers. For the same set of games, we measured that between 0.5-5% (1.5% on average) of all stack references resulted in overflow, but overflow may be more prevalent in server workloads.
Now think game code ... a player shoots a weapon:
a) In the evaluation loop (a loop is itself a branch condition on when to exit), it needs to check whether the player pulls the trigger (a branch).
b) If the trigger is pulled, what weapon is he firing (another branch).
c) Calculate the physics, does he hit the bad guy (yes or no) another branch
d) Where does he hit the bad guy (head, arm, neck)
Game code is the absolute branchiest of all code classes. I am still looking for those papers that show it 30-80% higher than any other major kind of code. The reason for this is the total amount of variability in the progression of the game. Game code does not know if you are going to jump, crouch, turn left or right, die or blow up, as opposed to something like, say, a 3D renderer, which only needs to take data, do a calculation, then move to the next pixel, use the information, do the calculation. Same with encoding: take one frame of data, calculate the attributes based on the other pixels around it, move to the next pixel... very linear. This is why P4's could do well at multimedia but sucked so badly at gaming... so long as there was little branching in the code, P4's could handle the load.
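Just to make the a) through d) list above concrete, here is a stripped-down C++ sketch (hypothetical types and names, not from any real engine) -- every if and switch below is a branch the predictor has to guess, and the outcome depends on player input:

// Hypothetical per-frame update; every if/switch below is a branch
// whose outcome depends on player input, so the predictor has to guess it.
enum class Weapon { Pistol, Rocket };

struct Player { bool triggerPulled; Weapon weapon; };
struct Enemy  { float x, y, health; };

void updateCombat(Player& p, Enemy& e) {
    if (!p.triggerPulled)              // a) did the player pull the trigger?
        return;

    float damage = 0.0f;
    switch (p.weapon) {                // b) which weapon is being fired?
        case Weapon::Pistol: damage = 10.0f; break;
        case Weapon::Rocket: damage = 80.0f; break;
    }

    bool hit = (e.x > 0.0f && e.y > 0.0f);   // c) stand-in for the real trajectory test
    if (hit) {
        if (e.health > damage)         // d) how hard did the hit land?
            e.health -= damage;
        else
            e.health = 0.0f;           // enemy goes down
    }
}

int main() {
    Player p{true, Weapon::Rocket};
    Enemy  e{1.0f, 2.0f, 50.0f};
    updateCombat(p, e);                // exercises branches a) through d)
}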
I do agree, branchy code is to be avoided at all costs ... but some applications simply demand a large number of checks and conditions that generate new code paths (games are the biggest one -- how much fun would a game be if you did the exact same thing every time?).
Jack
Eeehhh.. Are you joking with me?
Firstly, I can tell you that firing a weapon will need many more branches than that ;). There is a lot going on in the processor. Code isn't exactly like "if shoot then boom", if you know what I mean.
One type of branchy code is parsers. The complexity of the stream being parsed and how many different paths could be taken add to how branchy it is. But this can be optimized using different techniques.
Branch prediction is in fact mostly for stupid developers who don't understand how to write fast code for the processor.
The paper you showed about branch prediction was for K8; K10 has some improvements there, but as far as I know they are not showing how that works.
EDIT: Most branches in normal code are in fact used to handle errors.
What the hell are you talking about? First you say that game code is light on branches, I give you one case of a general algorithm that pins down how something as simple as shooting a weapon generates branches, then you say it does much more than that.
Gosh, frankly -- in all this discussion, it is becoming more and more clear you are pretty ignorant of how CPUs actually function, it is not worth my time any more.
You will continue to spam forums and get the flak you get for two reasons: you have a preconceived notion of how you think it works, and that notion and reality are two different things; and the majority of your applications of these concepts are wrong. This is why you will always get flak from the community.
It would be worth your time to spend a bit more time studying basic, fundamental computer science, unlearn what you think you have learned, and re-establish yourself in the fundamentals.
Take care...
jack
Yes, that paper discusses CPU AND GPU bottlenecks, and how you decrease bottlenecks. But where do you see that one bottleneck accounts for 100% of the total time? If you had a video card running at 2000 GHz it would maybe render the picture in no time; almost all the time needed to render the picture would then be processor time. But even if the GPU is extremely fast, it still adds a fraction of the total time.
If you have something that is slow you will of course look for areas that takes most time. But even if something takes very much time it doesn't use 100% of the time.
You could say something similar to any developer and they will immediately know that you haven't seen code. Code is so extremely detailed. Maybe you mean scripting, but there is a lot going on in the background.
You mean that you can't handle that someone doesn't agree with you or have knowledge that you could learn from?
Software development is a VERY strong area for me, you know that I work with it. I have done some teaching in advanced C++ for developers who are already good.
i don't think he is really interested in learning anything; hell, even i picked up some things, and computers are only my hobby.... from a person that works on a daily basis with such a device i would figure they would be interested in how they work...
for myself i can't imagine using any kind of "tool" without knowing how the principles behind it work.
Anyways, i think everything worth mentioning has already been said, and now we are only running in circles, over and over again.
Well, if they do, that's their problem. Most people are good at something. The problem in these debates is that it is so extremely sensitive to talk about what Intel isn't that good at. If someone says something that isn't right, it is even hard to point that out, and if you do they will sometimes react very immaturely.
What people think of me I don't give a **** about ;) Maybe I did when I was young. You tend to grow away from those types of feelings.
There is nothing wrong with preferring AMD CPUs, and it is not fanaticism to argue AMD's strengths. I don't see Gosh so much as an AMD fanatic; rather, I see him as utterly ignorant of the general workings of computer architecture (his assertion of being a world class programmer is meaningless, program code != architectural execution, and it does not provide any specific insight into how two different CPUs -- architecturally -- execute that code).
Gosh truly believes in how he thinks things work, though it contradicts all the empirical benchmark data available as well as all the source material from both industry and academia that has been written on the topic.
If there is a measure of fanaticism in his approach it is his inability to accept what is currently true: Intel makes the better CPU -- IPC is better, clock speed is better, thermals are better ... he seems to see this as a diss against the quality of the Phenom -- which he shouldn't; the CPU is a fine CPU and an engineering accomplishment on many levels... it does not change the fact that Intel renders the computational result faster in this particular segment in time.
It was not long ago, of course, that the competitive positioning we see today did not hold -- in the P4 vs A64 era a few years ago it was reversed... and there are reasons for that situation as well. For me, it is mostly about delving into the details of each architecture, be it P4, A64, Core, or Phenom, because without a comparative analysis for the sake of contrast, the salient details of the device do not shine through -- for example, Intel currently being better with gaming code and the strength of the branch predictors as a result.
There are plenty of points of Intel's current implementations that are inferior to AMD's (Gosh loves to harp on the FSB, following along the AMD PR lines), which is true -- but not generally true, other architectural implementations remove the deficiencies (manifested in large, fast L2 cache for example) and only make it an issue in select applications (which is not gaming code as Gosh would want to believe).
AMD knows they have an inferior performing product, and have adjusted their pricing to hit a price to performance ratio that keeps them competitive, otherwise we would still be paying 650 bucks for a 5000+. Intel was in the same boat through most of 2003 to 2005. Unfortunately for AMD, their cost structure and revenue stream do not support profitability at these levels, and something has gotta give ... one of those things is AMD appears (based on rumor) ready to carve itself up to shore up capital and strengthen the balance sheet.
Jack
No, what I did was post a nicely executed study by Legionhardware that demonstrated the reason why no site would use a Phenom to evaluate the 4870 X2.... if they had, then it would have shown the 4870 X2 performance to be the same as a 260 GTX -- since the Phenom would have capped all the highest end cards to the same framerate.
jack
No I don't, but I know how to develop applications (it's my work). And seeing all this talk about this and that comparing CPUs got me curious (I don't normally play games). What I have found is that this area is extremely hard to talk about. If you say anything good about AMD you will not be liked.
It's not about saying good things or bad things about one or the other, it is about arriving at the correct conclusion based on data.
Your original post showed a GPU limited test where Phenom was roughly at parity with Intel.... but you concluded that the CPU was equally good when the GPU was capping the result. This is false and incorrect.
are you ever wrong?
EDIT:
I am working just now so not that much time to answer.
I was wrong in thinking that games used the higher resolutions to increase detail on the CPU side. But what I found instead was that AMD in fact is a very good game processor. It isn't as good as Intel on single threaded applications. Core 2 will be the best processor for those applications next year, I think. Maybe in 2010 it will be beaten in that area. Other processors can win if there is some special requirement like memory speed or something else that is needed.
BUT, even if Intel is better at single threaded applications, games can be executed easily on processors designed for threading. It isn't a problem.
The problem starts when games start to scale: with not that much scaling it isn't a problem, but with more scaling it soon will be. And the biggest problem is when the game needs to work hard (in the low-FPS areas).
He gave you system to system comparisons and even bought a 4870 x2 to give you real data to compare the two systems without a gpu bottleneck.
It's not a matter of blatantly saying you are right or wrong because the information is there to draw your own conclusion. He gave all of us the basic insight and raw information to be able to figure things out ourselves.
If you were to remove the brand names (Intel/AMD) from the systems and had to draw a conclusion about which system performed better overall in the tests performed, the answer would be crystal clear.
But it doesnt work that way. You could post some questions about bottlenecks here http://www.gamedev.net/community/forums/
or check this: http://www.xtremesystems.org/forums/...&postcount=413
As the saying goes, "give a man a fish and he'll eat for a day, teach a man to fish and he'll eat for a lifetime."
Well, we should have just given the man a fish and moved along, because the man's not willing to fish.
Isn't this discussion closed yet after all the tests and such we saw? Each CPU has its strengths, all of them have flaws... to each his own...
Some people don't understand some things, some people don't understand a single word, and some people choose to not understand things depending on their opinion/strategy.
I tried to explain in that message, so that was the reason for showing it.
Going back to that message: when some here say that the CPU bottlenecks the GPU, is this what they mean?
+ = processor is working
- = video card is working
Slow processor
+++-----+++-----+++-----+++-----+++-----
Fast processor
-------------------------
Do you mean that if the processor is fast, then time spent by the processor isn't a factor. The only time that counts is the gpu
Or if you say that the video card is bottlenecked by the processor is it like this then?
++++++++++++++++++++++++++++++++++++++++++++++++
What I have read is that the CPU will always bottleneck the video card and the video card will always bottleneck the CPU. What is a bottleneck? If something uses 5% of the total time, is that a bottleneck? Maybe what we are arguing about is when something becomes a bottleneck, or I don't understand. It isn't possible to remove the processor's or the video card's time completely.
You're correct, there's always a bottleneck, but you don't want to have a CPU bottleneck. Gamers want to reach the GPU bottleneck to get the maximum out of their rig.
Also, your graphs are kinda bad for showing either a CPU or GPU bottleneck; you can't only examine the CPU or the GPU, you have to watch both:
working: +
idle: -
GPU bottleneck:
GPU ++++++++++++++++++++++++++++++++++++++
CPU --+--+---+----+-----+----+----+----++-----+++-
CPU bottleneck
GPU --+--+---+----+-----+----+----+----++-----+++-
CPU ++++++++++++++++++++++++++++++++++++++
both are extreme cases, but with today's graphics cards you reach the CPU limit far more often than the GPU limit, meaning that potential of your graphics card goes unused.
If the game is bottlenecked by the processor, is that processor working at 100% then?
I didn't understand your "+-" thing.
There will always be some waiting for the CPU and the GPU -- do you mean that too? Of course, if some hardware is slower and is used a lot, then that will be the MAIN bottleneck. Say you have a game that just draws one pixel, but this pixel is very complicated to calculate. The video card could render it at something like 100,000 FPS. But if the CPU needs 0.1 seconds to calculate the pixel's position it will be 10 FPS. Still, the video card is going to take 1/100,000 of a second for each frame. Total time is 1/10 + 1/100,000.
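To put my numbers side by side with the other model people here describe, a quick C++ sketch (my own toy calculation; 'overlapped' assumes the CPU prepares the next frame while the GPU draws the current one):

#include <algorithm>
#include <iostream>

int main() {
    double cpuSec = 1.0 / 10.0;        // CPU needs 0.1 s per frame
    double gpuSec = 1.0 / 100000.0;    // GPU needs 1/100,000 s per frame

    // Serial view: every frame pays both costs, one after the other.
    double serialFps = 1.0 / (cpuSec + gpuSec);

    // Overlapped view: the slower stage sets the pace.
    double overlappedFps = 1.0 / std::max(cpuSec, gpuSec);

    std::cout << "serial:     " << serialFps     << " fps\n";   // ~9.999 fps
    std::cout << "overlapped: " << overlappedFps << " fps\n";   // 10 fps
}

With a GPU this fast the two views give practically the same answer, about 10 FPS either way.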
Errr, or maybe he needs to see the code that generated the "+-" thing.
Maybe this link will jog his memory!! :rofl:
http://www.xtremesystems.org/forums/...&postcount=413
god... you have given the answer to yourself... plus it's the same as what i wrote a few posts before you. If the graphics card can render at 100k fps and the cpu only at 10 fps, you will only see 10 fps on your monitor, regardless of what resolution you use or what graphics settings....
damn this feels like talking with a rubber wall...
sure there is a synchronization between them...
how on earth would the render thread else know what the physics or AI thread is doing and where to draw the correct particle, etc.?
The slowest part of the system determines the overall performance, if the cpu lacks the adequate power you gimp the graphics card, period.
And we are back to what jack said, and why everyone is using C2D in reviews.
Exactly! And that means that no hardware that participates in the rendering of a frame is a 100% bottleneck. The render thread (if they use one thread for this) needs to wait for the GPU, and other threads need to synchronize with the render thread. The more threads that send data to the GPU, the more they all need to serialize the data sent.
It is extremely difficult to do asynchronous development. One area where this is done is sending data over the internet (sockets).
If there is heavy threading and a lot of memory use, then AMD might work faster and thereby use less time. If the game runs in a way where mostly one thread is used, maybe less memory is used, no physics, then C2D is faster compared to AMD.
How the processor performs in a game depends on what types of actions are done in the game.
This is so incredible, even a moron could figure it out. The GPU renders the scene; the CPU calculates the physics, the AI, etc. etc. If the GPU finishes rendering a frame before the CPU finishes the calculations for the next frame, the GPU must wait until it has that information to start the next render. Conversely, if the CPU finishes its calculations for the next frame before the GPU finishes rendering the scene, the CPU must wait. One will bottleneck the other... guaranteed.
When the results of two computational resources each depend on the other finishing, there will always be a case where one limits the ability of the other.
Oodles and oodles of data show this to be true ... the LegionHardware article shows the Phenom bottlenecking a 4870 X2, severely.
I have shown lost planet, 3DMark data, company of heros, world in conflict ... the lost planet is a good one as it produces two different scenes one GPU limited, the other CPU limited... how hard is this to figure out .
IT HAS NOTHING TO DO with threads.... one thread or 4 threads, Intel is faster at producing a result, thus it will show better performance in the absence of a GPU limitation... hence, a 4870 X2 (the fastest single card solution you can get) ends up FASTER on Intel -- this is clear from what I posted, this is clear from what LegionHardware showed. Company of Heroes, Enemy Territory: Quake Wars, Crysis, Unreal 3, Lost Planet, ALL are MULTITHREADED, ALL ARE FASTER ON AN INTEL + 4870 X2 than on a PHENOM + 4870 X2 (AT THE SAME FRIGGIN' clock, meaning Intel is superior clock for clock, higher IPC). Heck, Legion showed the Q6600 beating the 3.0 GHz OC Phenom in most cases.
If you use a 3870 ... even a mediocre dual core is fine, the GPU is so slow ... the CPU makes NO DIFFERENCE. Hence, the Phenom 'appears' to you to be equivalent to an Intel... this is not TRUE ... the GPU sets the framerate at those resolutions.
How someone can pore over the data, all the links provided, and not understand this is incomprehensible.
It is not saying anything bad about AMD when one states the obvious, AMD has a weaker architecture and cannot clock as high. This does not make the Phenom a bad processor, but in a two horse race someone comes in second. For the past 2 years this has been AMD, the data is irrefutable.
Does it make a difference in the gaming experience ... nope, Phenom is completely capable of supporting the necessary frame rates to make a good gaming system, but that is not the same as comparing them then making some off the wall, illogical, and incorrect statement on the capability of the CPU when you show a GPU-limited data point.
Jack: The CPU waits for the GPU and the GPU waits for the CPU. They bottleneck each other because they will always wait for each other. If you take a processor running at 1 GHz and compare it with a processor running at 2 GHz, then the 2 GHz processor will perform the CPU computation twice as fast. But they still need to wait for each other, and if you call this waiting bottlenecking, then they both bottleneck the other. The 1 GHz CPU is a bigger bottleneck compared to the 2 GHz CPU, but they both are bottlenecks. The video card is also a bottleneck. EVERYTHING is a bottleneck.
What I don't understand in these discussions is why you just talk about one of the two. Slower memory and faster memory are also both bottlenecks, but the slower memory will be a bigger bottleneck compared to the faster memory.
The video card will always perform as fast as that video card can once data has arrived. It doesn't matter which processor is feeding the video card with data. If it is a slow processor the card will need to wait a bit longer before the data arrives, but once the data is there the speed is the same.
When someone says this CPU is bad for the video card, you might think that the video card will work more slowly. It doesn't; it will always work at the same speed. The only thing that works more slowly is the processor, if it is a slower processor. But the video card needs to wait for the processor, whether the processor is slow or fast.
About Intel compared to AMD, they are good at different things. These processors are built differently, and whether AMD is better than Intel or Intel is better than AMD depends on the type of code they are running.
If you read some game code you will see that most games are written more with Intel processors in mind, to avoid slow areas for that processor, because Intel is the much more common processor.
The strong area for Intel is the HUGE L2 cache and the weak area is communication (high latency). Things like branch prediction etc. do very little for total performance, and it is a more important feature for Intel because Intel is sensitive to external communication. All processors have branch prediction and all processors handle the most common case best, which is when the branch is not taken. I think almost all more advanced programmers know this.
No one, and I mean no one, can lack this level of conceptual fortitude....
PRECISELY!!! This is it! What can't you understand....Quote:
The video card will always perform as fast as that video card can when data has arrived. It doesn’t matter which processor that is feeding the video card with data. If it is a slow processor it will need to wait a bit longer before has arrived but when the data is there the speed is the same.
Gosh.... there can only be ONE bottleneck at any given time; either the GPU is too slow and it will bottleneck, or the CPU will be too slow and it will bottleneck. A system, regardless of it being a computer, a chemical reaction, anything occurring in time with a rate of output, will be dependent upon the SLOWEST step. Period.
This is where the cliche 'a chain is only as strong as its weakest link' originates.
One or the other will be the slower link in the chain; that weakest link will determine the rate at which the computation can complete. This must be true....
This is basic, not even basic, it is just simple common sense.
If you pair a Phenom at 5 GHz or an Intel quad at 4 GHz with an nVidia 8600 GT and run any game at 1920x1200, you will get the same frame rates.... why? Because the GPU is the slowest part in this situation....
If you pair a Sempron 2800+ or a Celeron 550 with a quad CrossFire 4870 X2 setup, you will get the same framerates as the same systems with an 8600 GT, because the CPUs are TOO slow!
On average, Phenoms appear to be the same as an Intel CPU in your original post because the GPUs are capping the frame rates. Intel slaughters the Phenom with 4870 X2's (all the data I have shown, all the data shown at LegionHW) because the 4870 X2 is faster at completing its rendering tasks than the Phenom can supply it with the next step of information. It is rare, at very high resolution settings, for the GPU not to be the rate limiter.... increasing resolution increases the computational load on the GPU because the GPU is responsible for rendering the image, not the CPU. The rate of progress for GPUs has been such that we are beginning to see GPUs surpass CPUs in their ability to perform, such that CPUs are now the rate limiters. It happened with the G80 introduction, and there AMD also was a limiter with the X2s of the time. We are seeing it again, where now the 280 GTX and the 4870 X2 are so fast they can push the bottleneck back to the CPU even at resolutions as high as 1920x1200 full AA.
To evaluate the ability of a CPU to crunch gaming code -- use a high end graphics card with lower resolutions to ensure the CPU is the rate limiting step, thus you will see different FPS if you use faster or slower CPUs ... in fact, it is when you see the dependency of FPS on CPU clock speed that you can say... ahhhh, I am in a CPU limited situation.
This is why most gamers are going Intel at this time.... they want the CPU to be more powerful than the GPU .... within reason, high graphics cards are more expensive than CPUs.
Maybe an analogy will help ---
A housing developer is building houses. He needs two things to build his houses: lumber and nails. A lumber factory can produce enough lumber to build houses at a rate of one per week; the nail factory can make enough nails to produce one house per day. Given this information, what is the fastest the housing developer can make houses? Answer: at most one house per week, he will always be waiting on lumber. Now, in another part of the country, the lumber mill is very fast, it can supply enough lumber to make two houses per week, but the nail factory is very slow, it can only supply enough nails to make one house per month. Question: how fast can the housing developer make houses? Answer: one house per month.
The concept of a bottleneck is just that, there can be only one.
Jack: You don't get the same framerate. You will get close to the same framerate if one type of task is very slow, but it will not be the same. If five tasks are done in parallel, then the slowest task decides the total speed. But if those five tasks are done in series, the total time will be the sum of all five tasks.
Yes! This is very easy. But you don't seem to understand? Thinking it is like the weakest link in a chain isn't right.
The example you gave about houses, lumber and nails is a parallel situation. This situation isn't like the GPU.
It would be more accurate to say that the developer creates the lumber, and while he does that he can't use the nails to build the house from lumber and nails. When the lumber is done, then he can build the house.
Lumber for one house = one week
Nails for one house = one day
Total time for building house = one week + one day for the developer
Wouldn't the best way to end this argument be to:
1. Find at what clock a C2D/Quad stops bottlenecking eg. a 4870
2. Find at what clock Phenom stops bottlenecking the same card
With all the variables accounted for, there should be a significant clock difference between the two platforms. Sorry if this has been addressed already. Secondly, could code optimization significantly skew performance in the case of CPUs?
does anyone think that maybe the intel system architecture, with the memory controller and PCI-e controller on the same piece of silicon, has a slight advantage over AMD's (with the PCI-e controller in the mobo chipset and the memory controller on the CPU), as the GFX card in an intel system can do direct memory accesses a little quicker (fewer hops and less system overhead)?
just speculating, because the legionhardware results show the phenom bottlenecking the 4870x2 far harder than its slightly lower IPC can explain..
Yes it is... the GPU renders the scene, shades the pixels, yada yada; at the same time the CPU is calculating the physics for the next frame. If the GPU finishes first it waits; if the CPU finishes first, it must wait for the GPU before it can do the next frame. One will limit the other depending on who finishes first. Period.Quote:
Yes! This is very easy. But you don't seem to understand? Thinking it is like the weakest link in a chain isn't right.
The example you gave about houses, lumber and nails is a parallel situation. This situation isn't like the GPU.
This is why, with the 4870 X2, Intel is faster: the GPU is so fast that it is now waiting on the CPU most all the time, hence when the CPU changes frequency (gets faster or slower) you see a response in the frame rate. On weaker GPUs, the CPU finishes first and waits on the GPU .. the GPU determines frame rate. So when the CPU clock varies, there is no systematic change in frame rate:
This is classic CPU limited behavior. Notice how the FPS responds to CPU speed using a 4870 X2 at 1920x1200 full AA.
http://www.legionhardware.com/Bench/...70_X2/ETQW.png
http://www.legionhardware.com/document.php?id=770&p=7
This is classic GPU limited behavior. using the much slower 4870 (non-X2) at a meager 1600x1200 full AA. Notice, how FPS does not change with clock speed or type of CPU. This is demonstrating a GPU limited regime.
http://www.firingsquad.com/hardware/...ges/ep1920.gif
http://www.firingsquad.com/hardware/...view/page9.asp
No ... this is insane, you don't understand what a bottleneck is.... this is your problem.
The lumber company can supply enough lumber to build a house in a week.
The nail company can supply enough nails to build 1 house per day.
Day 1.... enough nails arrive to build the house, some lumber arrives. Day 2, day 3, day 4.... day 7 the total lumber arrives. The fastest the house builder can build houses is 1 per week; the nail company is not the RATE LIMITER, the lumber company is... just
i think i know what he is referring to; it seems he says that there is some sort of pipeline and each thread gets processed one after another, hence the "1/10 + 1/100 000".
I don't know much about programming, but i think this form of programming is kinda "antique"; even on a single core machine you can try to run more threads in parallel where it's possible.
Jack: Do you mean that calls to the video card are asynchronous?
No.... :)
Dude this is almost painful to watch.
A CPU crunches the physics, AI, and other non-graphical portions of the game; it then wraps all that information up in a small package and sends it through the DX API, where it is loaded into the command buffer for the GPU. The GPU uses that info plus the local texture and vertex information in video RAM to render the frame. All the rendering duties have been moved off the CPU for more than 5 years now.
In a GPU limitation, before the CPU can send the next package of information, the GPU must complete its work... conversely, in a CPU limited regime, if the GPU finishes the frame before the CPU has completed the next frame, the GPU must wait until the CPU finishes its work. This does not mean work cannot be done in parallel -- say the GPU is rendering frame 12110, the CPU can be working on the next frame 12111 AT THE SAME TIME -- but one will finish the task before the other -- it has to happen, in which case to 'synchronize' the next frame of rendering one will wait on the other or vice versa. The GPU shades the pixels, resolves visibility with the Z-buffer, applies the anti-aliasing corrections. The CPU calculates the physics, collision boundaries, the AI, the animation of characters, etc. etc. BUT DOES NOT participate in creating the image; this is why the GPU is called the Graphics Processing Unit, it processes the graphics. Changing resolution changes the load on the GPU, not the CPU, which is why with weaker GPUs you can overwhelm the GPU at high resolutions and move into the GPU-limited regime.... all the data shows this... it is not difficult to see.
The slowest of the two will determine the observed frame rate... period. All the data around the web shows this to be true. It has nothing to do with how well or how poorly a game is threaded; it has everything to do with when the CPU finishes its work relative to when the GPU finishes its work. If the CPU is the slowest component it determines the output frame rate. If the GPU is the slowest it determines the observed output frame rate.
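As a toy model of that fill/empty handshake, here is a C++ sketch (my own simplification: a single-slot command buffer, fixed 8 ms CPU and 12 ms GPU costs; real drivers buffer more frames and do far more work):

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <thread>

// Toy single-slot "command buffer": the CPU thread fills it, the GPU thread
// empties it; whichever side is slower leaves the other one waiting.
std::mutex m;
std::condition_variable cv;
std::optional<int> slot;   // the packet of commands for one frame
bool done = false;

void cpuThread(int frames, int cpuMs) {
    for (int f = 0; f < frames; ++f) {
        std::this_thread::sleep_for(std::chrono::milliseconds(cpuMs)); // game logic
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !slot.has_value(); });  // wait for an empty slot
        slot = f;
        cv.notify_all();
    }
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return !slot.has_value(); });      // let the last frame drain
    done = true;
    cv.notify_all();
}

void gpuThread(int gpuMs) {
    while (true) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return slot.has_value() || done; });
        if (!slot.has_value()) return;                  // CPU side has finished
        slot.reset();                                   // take the packet
        lk.unlock();
        cv.notify_all();
        std::this_thread::sleep_for(std::chrono::milliseconds(gpuMs)); // render
    }
}

int main() {
    const int frames = 60, cpuMs = 8, gpuMs = 12;       // assumed per-frame costs
    auto start = std::chrono::steady_clock::now();
    std::thread cpu(cpuThread, frames, cpuMs), gpu(gpuThread, gpuMs);
    cpu.join();
    gpu.join();
    double sec = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    std::cout << frames / sec << " fps (set by the slower 12 ms GPU stage)\n";
}

Swap the two numbers and the same code reports the CPU side as the limiter instead; whichever stage is slower sets the frame rate.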
Read the whole thread, nVidia (even your link) shows the flow charts to figure out how to determine which one is the rate limiter.
If the CPU is the limiter, then increasing the performance of the CPU will vary the FPS... (which is what the LegionHW data shows).... if the GPU is the slowest component, then varying the CPU speed will have no effect on FPS... as shown by the 4870 data in the same game, lower resolution but weaker GPU, above from Firingsquad. This is not rocket science.
EDIT: Also -- it is a one way street -- the CPU is the host controller in the current programming model --- I have linked references in this thread that explain this; the CPU receives no data from the GPU, the CPU sends commands and object information (non-world assets to be exact) to the GPU buffer, which initiates the GPU to do its work. Spend some time researching it... it will educate you.
Remember, I told you shortly after the 4870 X2 launch that the card was so fast that almost all situations at 1920x1200 would be CPU limited, and Phenom would be significantly behind... this is what the LegionHW data shows to be true... even the fastest Intel processor can still bottleneck this card in many games (Devil May Cry is an exception) at 1920x1200 full AA; the 4870 X2 is one hell of a card.
Question:
In these discussions I am at least talking generally. When I say the GPU is working or the CPU is working, I mean the areas that the CPU uses and the GPU uses. You seem to be very picky when you talk about GPU work and general when you talk about CPU work. To you the GPU is only rendering and all other work is CPU?
The CPU waits for a lot of things: memory, cache, disk etc. If you are picky about one piece of hardware you should be picky about the other hardware too.
If we take your pickiness about the GPU then: if the GPU is under heavy work, even if the frames are buffered, it will be BOTH that decide the total speed. The CPU (you say it too in your text) can't produce image after image stored as data for the GPU to process; it needs to wait for the GPU to process the frame, and while it waits it can't work on new images. Suddenly the GPU is ready and then it can start to work again. If the CPU needs to wait for the GPU, that doesn't mean that CPU time will be 0% and GPU time will be 100%.
You could compare this with other types of applications. Take databases. They need fast hard drives. But you never say that the hard drive is bottlenecked by the processor or the CPU is bottlenecked by the hard drive. They both add up to the total time. If one of the two is very slow, then that hardware may account for almost 100% of the total time, and this I think is what you call bottlenecking the performance?
Buying a faster processor even if the hard drive uses 99% of total time will increase performance but it will not be noticed because other parts are using that much time.
Now, if the GPU can handle all the data sent from the CPU and render faster, and we use your picky definition of what CPU work is, then that work still does not use up only a very little time. If you increase the resolution, the GPU will have a harder time rendering all the frames. For some frames it may not keep up with the CPU, for others it does. Increase the load more and more and it will stall the CPU. But this situation will be BOTH CPU and GPU time.
This isn't what I mean when I say GPU work or CPU work. When I say GPU and CPU work I mean it generally for both: CPU work is work that only uses the CPU and no video card, but as soon as the CPU sends data to the GPU, that is GPU work. If you are picky about hardware then you need to add latency for communication, memory, etc.
http://img358.imageshack.us/img358/4...800x600sw9.jpg
:spam::spam::spam::spam::spam::spam::spam:
^
i loled :rofl:
Me too. Fits like a glove :D
Jack is clever, gosh is a retard, and that picture is well funny, lol.
Thing I don't get, is how DX canes so much CPU... It seems to have a mysterious 2ghz overhead.
With an efficient OpenGL engine you can underclock ur CPU to 500mhz, still push 300m polys/sec on an 8800, and run awesome physics with the spare 400 mhz...
jack: I have some new questions :)
I have been reading, and I have some clues as to why the FSB problem is hard to find. I know that there are a lot of people here who believe this problem doesn't exist; my "problem" is that I have tested it with code and found problems. It is easy to create an application that will run much faster on AMD, but those who use games to test CPUs sometimes say that Intel is better all over. The problem could be that FPS is a very bad test for checking how the game behaves, even if you get exact time spans between frames. And if one thinks that FPS is exact, then it could be problematic to understand differences in how processors behave.
Situation 1:
There is a demanding scene for the video card (GPU) but the CPU is able to handle it well (the game is running at 30 FPS). Frames are triple buffered, and that means that the video card is three frames behind the CPU. Responsiveness for the mouse is vital, and if you press the mouse button to shoot, the picture will show it three frames after the CPU got the information that the fire button was pressed.
Will this make the game feel unresponsive?
Situation 2:
This again is a demanding scene for the GPU and the CPU handles it well. The image is triple buffered (three frames are queued because the CPU is much faster) and suddenly there is a need to reload some information from memory. This will stall the CPU, but you can't see it in the frame rate because the GPU has three images to render. When the CPU is ready the GPU has only one frame queued (two frames were rendered during the CPU stall) and the CPU starts to feed the GPU with data again. This could be a small stop in the game even if frames are produced.
This would mean that it isn't possible (it is an inexact measurement) to check latency issues for the FSB (and other latency issues) in a game by checking frame rates, because the video card will hide them?
Situation 3:
Two different CPUs are used. One is very good at synchronization between threads and one is bad.
This is a very demanding frame for the CPU that doesn't synchronize well; the scene is easy, but something happens that delays the CPU (synchronization of memory for a quad using the FSB while data is sent to the video card, or a thread is moved from one core to another). The CPU is delayed for just one frame. The video card this time can queue only one frame, and that frame is being rendered while the CPU does this synchronization (multiple threads are used).
The fast-synchronizing CPU is ready when half of the frame is done in the video card and starts to produce new images.
The slow-synchronizing CPU is ready half a frame after the rendered frame is done.
Let's say that this frame took 1/20 of a second (0.05 seconds) to render with no CPU delay. On the fast-synchronizing CPU the new image started being produced 0.05 - 0.05/2 = 0.025 seconds after the frame before. On the slow CPU it was produced 0.05 + 0.05/2 = 0.075 seconds after.
The real difference between these frames, comparing the two processors, would be 0.05 seconds. But when frame times are checked it will show a difference of only 0.025 seconds. The GPU is masking 0.025 seconds of delay.
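Situations 2 and 3 can be checked with a small simulation; this C++ sketch uses made-up numbers (fixed 50 ms GPU frames, a 10 ms CPU that hitches once for 75 ms, and a queue that lets the CPU run a few frames ahead):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    const int    frames  = 12;
    const double gpuMs   = 50.0;   // GPU needs 50 ms per frame (a 20 fps scene)
    const double cpuMs   = 10.0;   // normal CPU cost per frame
    const int    stall   = 6;      // on this frame the CPU stalls...
    const double stallMs = 75.0;   // ...taking 75 ms instead of 10 ms
    const int    depth   = 3;      // triple buffering: CPU may run 3 frames ahead

    std::vector<double> cpuDone(frames), gpuDone(frames);
    for (int i = 0; i < frames; ++i) {
        double work     = (i == stall) ? stallMs : cpuMs;
        double slotFree = (i >= depth) ? gpuDone[i - depth] : 0.0; // buffer slot available
        double prevCpu  = (i > 0) ? cpuDone[i - 1] : 0.0;
        cpuDone[i] = std::max(prevCpu, slotFree) + work;           // CPU finishes frame i
        double prevGpu  = (i > 0) ? gpuDone[i - 1] : 0.0;
        gpuDone[i] = std::max(cpuDone[i], prevGpu) + gpuMs;        // GPU shows frame i
        if (i > 0)
            std::cout << "frame " << i << " arrives " << gpuDone[i] - gpuDone[i - 1]
                      << " ms after the previous one\n";
    }
}

With depth = 3 every frame still arrives 50 ms apart, so the hitch never shows up in the frame times; drop depth to 2 and the same run shows one 75 ms gap.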
Situation 4:
Two different CPUs are used. One is a C2D with an extremely high clock and a fast, big cache. The other is an AMD Phenom, better with more threads.
Test a game where you first walk on a road for 100 seconds and then are suddenly attacked with lots of bombs using physics for 10 seconds.
First test is on 800x600: C2D produces a lot of frames walking on the road, GPU handles all frames and only two threads are used. C2D gets about 200 FPS on the road and AMD gets 100 FPS. When you are attacked two new threads are activated, one for physics and one AI thread for enemies. Here AMD gets 50 FPS and C2D gets 30 FPS. First test average will be higher on C2D.
Calculation:
C2D = (100 * 200 + 10 * 30) / 110 = 184
Phenom = (100 * 100 + 10 * 50) / 110 = 95
Second test is on 1680x1050: now the video card has difficulty rendering more than 50 FPS. So when walking on the road both C2D and Phenom need to wait; C2D waits more. In the attack scene, though, the GPU and Phenom will be exactly matched, but the C2D slows the GPU.
Calculation:
C2D =( 100 * 50 + 10 * 30) / 110 = 48
Phenom = (100 * 50 + 10 * 50) / 110 = 50
Third test is on 1920x1200: The video card now has problems to render more than 30 FPS
C2D =( 100 * 30 + 10 * 30) / 110 = 30
Phenom = (100 * 30 + 10 * 30) / 110 = 30
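The averages above are just time-weighted averages over the two parts of the run; in code (same made-up numbers as the 800x600 test):

#include <iostream>

// Time-weighted average FPS over two segments of a run
// (equivalent to total frames divided by total time).
double avgFps(double sec1, double fps1, double sec2, double fps2) {
    return (sec1 * fps1 + sec2 * fps2) / (sec1 + sec2);
}

int main() {
    // 100 s walking on the road, 10 s under attack (the 800x600 numbers above).
    std::cout << "C2D:    " << avgFps(100, 200, 10, 30) << " fps\n";   // ~184.5
    std::cout << "Phenom: " << avgFps(100, 100, 10, 50) << " fps\n";   // ~95.5
}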
Conclusions (if the reasoning above is right):
1: If the video card is buffering images and the video card is slower than the processor, it is very hard (impossible) to find latency problems in the FSB by checking frame rates.
2: It is very difficult to find the exact difference in the low FPS values (the GPU smooths it out) if the game is suddenly stalled by the processor, because the video card hides some of it. If the images are buffered (two or more) it is even harder.
3: The CPU needs to be "close" to the current frame at low FPS values if the game is to be responsive and smooth.
4: It is difficult to find advantages in games from more cores and better synchronization if you don't know exactly how the game is built and what is being tested at that exact moment.
5: FPS isn't a good measurement for finding out how the game feels. It is possible to get high FPS values but the game could still feel strange.
gOJDO: Cut the crap, are you three years old?
Don't read the thread if you are so sensitive
Oh dear, this is difficult -- please empty your mind of what you have embedded in it .... :) The concept is really very simple ... two computing resources, one depends upon the other to finish, the slowest one to complete the duty assigned to it will be the rate determinant. This is the concept of a bottleneck. So yes, the GPU is the rendering processor and the CPU does all the other work not related to the rendering. This is a fact and not debatable, I will prove this to you.
Now, I will eventually address all your points... (your database analogy is completely irrelevant here, but I will discuss that later in a different post).
But let's focus on this first.... I thought I have been very clear.... the GPU is responsible for computing the transforms, shading, and texturing of the 3D objects and translating them to a 2D image projected on a screen; the CPU is responsible for calculating the bad guy AI, collision boundaries, physics, and all things not related to producing the image -- this is not being general, I am being specific. The CPU will, for example, take a gun shot, calculate the trajectory in 3D space, and send coordinate information, frame by frame, as the bullet/rocket/whatever travels through space; this info is sent to the GPU to render that shot as it flies through space, each frame getting a new set of 3D coordinates. The CPU does NOT calculate the pixel intensity nor the position on the 2D screen, this is the job of the GPU.
As I have made clear over and over, the CPU finishes the calculation for a frame and sends the information to a command buffer, the GPU takes the information from the command buffer and builds the scene. The CPU fills the buffer, the GPU empties it. Period.
Two scenarios ...
1) The CPU finishes its work faster than the GPU can empty the buffer, so the CPU must wait until the buffer is empty and available. This is GPU limited.
2) The GPU finishes the work fastest, and must wait for the CPU to fill the command buffer, this is CPU limited.
I have stated this over and over and over again, not based on my opinion or my preconceived ideas of how it works, but rather on concepts based on researching the literature that describes in morbid detail how it works. I have linked and provided details of the command-buffer relationship via this documentation over and over again. I fail to understand why you are incapable of accepting this as true. This is not ME SAYING THIS, this is the experts, people who specialize in the field of graphics processing, saying this.... you are not disagreeing with me, you are disagreeing with them.... either you're right and they are wrong, or they are right and you are wrong. I will choose to accept them as right.
A better look at the history of the GPU-CPU interaction is probably easier, so I will attempt this from that angle. Back, way way back, the one that started it all ... DOOM by iD software. Back then, there was no such thing as a 3D accelerator or GPU; all the work was done by the CPU, it did its thing and transposed the pixel-by-pixel information to the small little tiny bit of video RAM that the RAMDAC used to produce the image on the screen. (This historically is called the framebuffer, errantly referenced by HW review sites as the entire video RAM, which is false and incorrect, and somewhat irresponsible of them.)
As time progressed, 3DFx produced a 3D accelerator, which offloaded some of the massive calculations from the CPU and enabled faster rendering. nVidia entered the picture and ATI jumped on board, and more functions moved to the GPU, such as triangle setup and such. Up to this point, your concept of the CPU-GPU interaction is rooted squarely in the mid 1990's -- the problem is, it was not called a GPU then, it was called a 3D accelerator. More time progressed, and the last bit of the graphics pipeline finally made it to the GPU, which was transform and lighting (i.e. taking 3D coordinates, transforming them to a 2D image, and shading pixels to represent different light intensities).
This is best summarized by nVidia's little diagram from their technical brief, written when the last vestige of rendering transitioned from the CPU to the GPU: http://www.nvidia.com/object/Technical_Brief_TandL.html (see PDF file)
http://forum.xcpus.com/gallery/d/7768-1/cpu-gpu.jpg
Ultimately, by 2000 all rendering duties had been transitioned to the GPU... to quote the same nVidia reference above:
The bolded is exactly what I have been telling you for the last 15+ pages of debate. This comes straight from people who make GPUs ... don't you think it is time you ask yourself ... "maybe I really don't understand how this works?"Quote:
All of the work in the 3D graphics pipeline is divided between the CPU and the graphics processor. The line that divides the CPU tasks from those performed on the graphics processor moves as the capabilities of the graphics processor continue to grow. 1999 is the year when graphics processors with integrated T&L engines can be sold at mainstream PC price points and create a compelling value proposition for any PC user. Figure 8 graphically shows the growing role in the last few years of the dedicated graphics processors for mainstream PCs. The complete graphics pipeline is now computed by the graphics processing unit, hence the term GPU.
Now, what influences the amount of work the GPU must do? Resolution, for one, because the GPU has to calculate the position and intensity of every pixel on the screen (as well as map textures onto each surface, etc.). At 640x480 it only needs to worry about 307,200 pixels, but at 1920x1200 it must perform calculations for 2,304,000 pixels. So you answer the question: if you run at 307,200 pixels and then repeat the exact same run on the exact same GPU at 2,304,000 pixels, in which case will the GPU take longer to finish one frame? (This should be a no-brainer and hence rhetorical.) What else influences the GPU? Antialiasing, because the GPU is then interpolating by oversampling adjacent pixels, adding new attributes to smooth out edges and provide new detail, which adds more complexity to the workload.
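Just to put numbers on that (simple arithmetic, nothing more):
Code:
# Pixel counts per frame at the two resolutions mentioned above.
low  = 640 * 480       # 307,200 pixels
high = 1920 * 1200     # 2,304,000 pixels
print(high / low)      # 7.5x more pixels to compute, before AA is even applied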
So it makes sense: if I change the resolution and the frame rate changes, then over that span of resolutions the run must be GPU limited, because I am changing how long the GPU takes to finish a frame, and that change shows up in the output only if the GPU truly is the limit.
So how can we test for CPU-limited runs? Well, if the CPU is the limiting factor and I change the amount of time it takes for the CPU to finish its task, then it should affect the results, no? Of course it should. So if I have a CPU that finishes its work in time x and try a different CPU that finishes in time y, such that y < x, then the FPS should change (get better) if the run was CPU limited to begin with. At low resolutions, where the GPU is not taxed, you see that difference easily on most mid-range to high-end cards: a 2.6 GHz Phenom will produce higher FPS than a 2.3 GHz Phenom, and a 2.5 GHz Q9300 will have higher FPS than either of those because it is faster (multithreaded games included, just search the data). This is simply the way it is. Phenom is not a bad gaming CPU at all; it can very well sustain FPS higher than the refresh rate of the monitor, so don't take this as a diss on the Phenom, it is a fine CPU. But on the computer-science side of the question, Intel has the faster CPU.
Another way of looking at it: if I have a setup that is GPU limited, i.e. the GPU determines the frame rate, then no matter the CPU, the output FPS is the same. Now I stick in a faster GPU -- that is, I upgrade and try the same game and resolution again with a range of CPUs varying in capability/speed -- and ahhh, the FPS now changes and is higher. This is what happened with the 4870 X2: it is such a powerful card that it moved almost all games at 1920x1200 with full AA into a CPU-limited scenario. This is the legionhardware data. (A quick numerical illustration follows.)
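Here is the same idea with made-up per-frame costs (the numbers are invented for illustration only); in this simple pipelined model the steady-state frame rate is roughly 1000 / max(cpu_ms, gpu_ms):
Code:
# Invented per-frame costs in milliseconds, just to show the bottleneck moving.
cpus = {"slower CPU": 12.0, "faster CPU": 8.0}
gpus = {"older card": 25.0, "much faster card": 7.0}

for gpu_name, gpu_ms in gpus.items():
    for cpu_name, cpu_ms in cpus.items():
        fps = 1000.0 / max(cpu_ms, gpu_ms)
        print(gpu_name, "+", cpu_name, "->", round(fps, 1), "fps")

# older card: both CPUs land on 40.0 fps (GPU limited, the CPU doesn't matter)
# faster card: 83.3 fps vs 125.0 fps (now CPU limited, the faster CPU pulls ahead)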
This concept is basic and standard, and is even taught in some computer science courses (a little time spent on the Google machine yields oodles of information). I will leave you with this: a PPT from a computer science course designed specifically for graphics and game programming. When you talk of thread synchronization, you are completely incorrect that it has anything to do with multithreading on the CPU; it has everything to do with the GPU rendering thread (which is on the GPU, proprietary to the GPU architecture, and performed by the GPU) and the CPU threads. They communicate, as I have pounded on and on about, through the command buffer.
http://www.cse.ohio-state.edu/~crawf...chitecture.ppt
Slide 13
Odd that a computer science course at a major university would say this, don't you think? Actually, no, because it is absolutely true. In fact, you should study all the information in this PPT if you can (if you have PowerPoint; if not, download a viewer!), it will enlighten you! (NOTE: I just found this today, but it nicely summarizes everything I have been trying to beat into your head.) When you say something like "when someone talks about a GPU limited game, they don't know how computers work", you are saying that people who do this for a living do not know how it works :) ... the professor's name is on the title page of this PPT, email him if you like.
Quote:
If this command buffer is drained empty, we are CPU limited and the GPU will spin around waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
If the command buffer fills up, the CPU will spin around waiting for the GPU to consume it, and we are effectively GPU limited
So when you see two CPUs, one obviously faster than the other, yielding the same or approximately the same FPS, you must conclude the run is GPU limited, and no matter what you do you will never get above that FPS under the stated conditions. You can make it slower (just add a slower CPU until it becomes CPU limited), but you will never make it faster for that GPU, resolution, and application combination.
Hence the GPU-limited case of Enemy Territory (a multithreaded game) on a 4870: the GPU is slower at rendering the frame than any of the CPUs is at producing the necessary CPU-side information:
http://www.firingsquad.com/hardware/...ges/ep1920.gif
So when you post information, like the lead post of this thread, showing multiple CPUs of different speed classes yielding the same or roughly the same FPS, you are going to get a chorus of 'GPU limited' rants like mine, because that is what is going on. Period.
I will address your other points in follow up posts.
Jack
Jack: Yes, I see what you mean. This is just like two threads running in parallel, and all the stuff about FSB, memory and HyperTransport is the CPU's side of the work.
But that also means the frame rate isn't a good measure of how smooth the game is. The frames may be a bit old, because how "up to date" the picture is depends on when the CPU did its work for that frame.
If the CPU decides the frame rate and it is above 30 to 40 FPS, with the low FPS above 25, then the game will be smooth. But if the game averages 60 FPS and it is the GPU that holds the game to that frame rate, the game can in fact feel more unresponsive and delayed than at the slower frame rate decided by the CPU.
gOJDO: When the frame rate depends on the GPU you aren't measuring the CPU. And if the CPU does all of the rest of the computer's work, then that also includes the mouse handling, as one example.
The GPU and the CPU don't know about each other, and the GPU is in fact asynchronous. That means there isn't any way for the CPU to know exactly when the frame was produced. If it doesn't know that, it can't calculate the exact time difference it needs to decide how much movement etc. should be applied.
In this discussion, that will mask bottlenecks in the CPU if you are testing the GPU.
w = work
i = idle
Code:
Sample (only one frame is buffered):

GPU |wwwwwwww| wwwwwwww | wwwwwwww |
CPU |wwwiiiii| wwwiiiii | wwwiiiii |

In this situation the picture that will be shown on the screen is (big W) wwWiiiii.

Now something happens and the CPU needs to do extra work:

GPU |wwwwwwww| wwwwwwwwiii | wwwwwwww |
CPU |wwwiiiii| wwwwwwwwwww | wwwiiiii |

Here the difference between the two pictures, when the CPU needed to do extra work, is wwWiiiii wwwwwwwwwwW = 16 "units", but the frame time is 11 "units" (max).

Another example (the most extreme) with no difference in GPU frame rate:

GPU |wwwwwwww| wwwwwwww | wwwwwwww |
CPU |wiiiiiii| wwwwwwww | wwwiiiii |

Here the difference between the two pictures, when the CPU needed to do extra work, is Wiiiiiii wwwwwwwW = 15 "units", while the frame time is 8 "units".

If you buffer more than one picture this error will increase.
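A small numeric restatement of the first timeline above, reading the diagram the way the post does (toy "units", one frame buffered; the specific numbers come straight from the ASCII picture):
Code:
# Each entry: (time the CPU finished that frame's game state, time the GPU shows it),
# in the same made-up "units" as the diagram above.
first_frame  = (3, 8)     # CPU data ready at unit 3, frame on screen at unit 8
second_frame = (19, 19)   # CPU needed 11 units of work, GPU waited and shows it at 19

sim0, disp0 = first_frame
sim1, disp1 = second_frame
print("display interval:", disp1 - disp0)   # 11 units between the two shown frames
print("game-time jump  :", sim1 - sim0)     # 16 units of simulated time
# The on-screen picture advances by more game time than the display interval,
# which is the stutter/latency effect the post is describing.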
Congratulations!!! 20 pages of BS and you are still failing to make a point.
When I think you have already posted the greatest crapload of BS ever, you come up with even more marvelous BS. I fail to see how you arrived at your theories, but if you have ever tried to understand anything about how the CPU and GPU work, you have understood it completely wrong.
Actually, the point is that you didn't have a clue what really happens between the CPU and GPU when you started this thread, and you still seem either very lost or just hard-headed. You should be posting a page-long thank-you to JumpingJack for the free education.
Man, this thread is so awesome. Just one recommendation to Jack: stop writing those kinds of posts, it's not worth the effort with some guys. Save your time :)
Jack is like a mixture of Gandhi and Mother Teresa.
After the first gems, I have preserved my brain by not reading those walls of text written by gosh.
Yes, I have read about how some solutions are done in hardware. But that isn't a rule when a programmer develops against a driver. How the video card or the driver works isn't something the programmer needs to know, and different video card vendors can implement the APIs as they like. The programmer just calls the APIs and measures time spans.
And that isn't really what this thread is about. It's about how the CPU works and why there are so many Intel fanboys out there.
What's been said in this thread about how the video card works is just that if you buy an Intel processor your computer will use a lot more power, because it will produce frame rates that aren't needed (max FPS and average FPS). Also, checking how this works will rule out the need for faster processors in current games (those that have been tested). Intel runs well when it finds data in the cache (almost 20 times faster than going to memory), and the frame rates in that case are sky high with fast video cards. If the processor really needs to work for the game, using threads, it is a different scenario.
The reason I am in this discussion is that I find this strange; it got me curious. Doing other types of development, it is easy to set up scenarios where AMD is better than Intel and where Intel is better than AMD. Why are people willing to spend more money on something that isn't noticeable?
Also, most people know the behavior of Intel's processors: Intel has worked on its fast parts and skipped the weak parts.
Oh my gosh.
Are we riding a merry-go-round?
Once again you're implying that AMD is faster in games, and specifically that it gives you better (higher) minimum framerates (which isn't true).
I for one give up, nobody and nothing can change your mind.
Doubtful because:
1) Clock per clock the Core 2 CPUs are faster than the Phenoms
2) The operating frequencies of the Core 2 CPUs are way higher than those of AMD's Phenoms.
3) The "scaling" advantage of AMD's with Quad-Cores isn't good enough to cover the Clock per clock performance gap.
4) Unfortunately in most games ( if not all ) in real-life gaming settings ( resolutions, game details and AA/AF ) the minimum framerate depends on the graphics card, just like the average & maximum framerates.
But I can assure you that no game developer will build a game that requires a processor running at 3.0 GHz or more (too few buyers). What is very easy for the programmer to check is how the processor behaves in the game. If raw clock speed is what matters, that will be noticed immediately. If the processor slows the game down, it will probably be because something strange is happening; maybe the cache needs to be refreshed, or the latency of something else is high. That is harder for the developer to check.
omfg :ROTF: ... this post of yours leaves only two conclusions:
a) you're a total AMD fanboy, or
b) your perception doesn't even reach beyond your own front door and you reject reality and substitute it with your own. (god, finally I could use that quote :ROTF:)
So much BS in that bold highlighted part, it isn't even funny any more...
Just in case it slipped your attention:
Phenom consumes more power while delivering less performance than a C2Q.
http://www.computerbase.de/artikel/h...stungsaufnahme
This chart shows full load on both CPU (Prime95) and GPU ("Firefly Forest" & "Canyon Flight" endless loop in 3DMark06).
Clock for clock, AMD's Phenom consumes 3% more power than a Kentsfield and 20% more than a Yorkfield... all with the same graphics card.
And the third option: c) He's just trolling us.
I vote for a mixture of a) + b) + c). I can't believe otherwise :ROTF:
Well, for each AMD fanboy you will find at least 10 Intel fanboys, and they are very sensitive to hearing anything that suggests AMD could be better.
My friend and I just tested computers running at idle to see how much they draw. He has an E6600 and a 7900 GTX; it used 140 W at idle. I had an Opteron 165 and a 7900 GTX, and it used 130 W at idle. We also tested other computers, and the strange thing is that the AMD machines often seem to draw less power than the Intel ones at idle. Now I have one with a 9750 and an ATI HD 3850 that uses 100 W at idle, and a server with an X2 3600+ that uses 52 W. All measured at the wall.
GPUs use much more power, and the CPUs aren't maxed out (quads).
I have no doubt whatsoever he is trolling.
If you are an AMDZone True Believer as he is, then your warped perceptions can never be altered, and that is why I posted the below to Jack earlier in this thread:
Jack is completely wrong to think Gosh has taken on board anything he has been told; all he has done is zigzag to keep his trolling efforts alive.
Also, I think he has taken pleasure in getting Jack to do so much legwork for nothing (well, at least as far as Gosh is concerned), which is why he keeps writing nonsense replies hoping to get Jack to keep wasting hours of his time.
Only when that something is complete and utter bullsh1t.
If it is true, like K8 being better than P4 for gaming, you will get no dispute; but when an AMDZone loopie wants to claim that clock for clock K10 is better than Penryn, then of course the claims will be rubbished for the nonsense that they are.
I think everyone has learned quite a bit in this thread. Check what was said at the start and you will find that much of it has been cleared up later in the thread.
The problem with talking about AMD's good parts is well known. Most forums have quite a few people who like Intel, and they will immediately jump on any AMD talk. It is even hard to ask about processors; they just can't handle it.
I haven't seen anyone else say that the CPU and GPU run asynchronously, which they in fact do.
From the market's perspective:
1) Insignificantly so in games
2) For a cost
3) Scaling is the same as for Intel's CPUs; platform scaling is higher
4) True, though I'm not sure about the "unfortunately" part. There should be a reason to buy high-end GPUs :D
From a mod's point of view:
I think some of you guys need to edit your posts, as per forum policy: keep it clean, with no flaming or name-calling.
Apparently these people cannot disagree in a polite manner.
That view is so flawed it's not even funny...
1) You compare a 2.4 GHz processor to a 1.8 GHz processor.
2) You compare different systems with different PSUs -> different efficiency -> different results (and the differences can be quite large depending on which PSUs are used).
3) I gave you a review that shows both CPU and GPU maxed out, with the same PSU and the same graphics cards.
It's time to paraphrase some Willy Wonka:
It's all there, black and white, clear as crystal! You lose! Good day sir!