Sure there is synchronisation between them...
How else would the render thread know what the physics or AI thread is doing and where to draw the correct particle, etc.?
The slowest part of the system determines the overall performance; if the CPU lacks adequate power you gimp the graphics card, period.
And we are back to what Jack said, and why everyone is using a C2D in reviews.
Exactly! And that means that no single piece of hardware that participates in rendering a frame is a 100% bottleneck. The render thread (if they use one thread for this) needs to wait for the GPU, and the other threads need to synchronize with the render thread. The more threads that send data to the GPU, the more they all need to serialize the data they send.
It is extremely difficult to do asynchronous development. One area where this is done is sending data over the internet (sockets).
If there is heavy threading and a lot of memory use, then the AMD might work faster and thereby use less time. If the game runs in a way where mostly one thread is used, maybe less memory is used and there is no physics, then the C2D is faster than the AMD.
How the processor performs in a game depends on what type of work that game is doing.
This is so incredibly simple, even a moron could figure it out. The GPU renders the scene; the CPU calculates the physics, the AI, etc. If the GPU finishes rendering a frame before the CPU finishes its calculations for the next frame, the GPU must wait until it has that information to start the next render. Conversely, if the CPU finishes its calculations for the next frame before the GPU finishes rendering the scene, the CPU must wait. One will bottleneck the other... guaranteed.
When two computational resources each depend on the other finishing, there will always be a case where one limits the ability of the other.
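A minimal sketch of that relationship in Python, with made-up per-frame times (none of these numbers come from any benchmark): when the CPU prepares frame N+1 while the GPU renders frame N, the steady-state frame time is simply whichever stage is slower.

# Toy model of the two-stage frame pipeline: the CPU prepares frame N+1
# while the GPU renders frame N. Whoever finishes first waits, so the
# steady-state frame time is the larger of the two stage times.
def steady_state_fps(cpu_ms_per_frame, gpu_ms_per_frame):
    frame_time_ms = max(cpu_ms_per_frame, gpu_ms_per_frame)
    return 1000.0 / frame_time_ms

# Hypothetical numbers: one GPU (8 ms/frame) paired with three CPUs.
print(steady_state_fps(cpu_ms_per_frame=12.0, gpu_ms_per_frame=8.0))  # ~83 FPS, CPU limited
print(steady_state_fps(cpu_ms_per_frame=9.0, gpu_ms_per_frame=8.0))   # ~111 FPS, still CPU limited
print(steady_state_fps(cpu_ms_per_frame=5.0, gpu_ms_per_frame=8.0))   # 125 FPS, GPU limited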
Oodles and oodles of data show this to be true... the LegionHardware article shows the Phenom bottlenecking a 4870 X2, severely.
I have shown Lost Planet, 3DMark data, Company of Heroes, World in Conflict... Lost Planet is a good one as it produces two different scenes, one GPU limited and the other CPU limited... how hard is this to figure out?
IT HAS NOTHING TO DO with threads.... one thread or 4 threads, Intel is faster at producing a result, thus it will show better performance in the absence of a GPU limitation... hence, a 4870 X2 (the fastest single-card solution you can get) ends up FASTER on Intel -- this is clear from what I posted, and from what LegionHardware showed. Company of Heroes, Enemy Territory: Quake Wars, Crysis, Unreal 3, Lost Planet, ALL are MULTITHREADED, ALL ARE FASTER ON AN INTEL + 4870 X2 than on a PHENOM + 4870 X2 (AT THE SAME FRIGGIN' clock, meaning Intel is superior clock for clock, higher IPC). Heck, Legion showed the Q6600 beating the 3.0 GHz OC Phenom in most cases.
If you use a 3870... even a mediocre dual core is fine; the GPU is so slow that the CPU makes NO DIFFERENCE. Hence, the Phenom 'appears' to you to be equivalent to an Intel... this is not TRUE... the GPU sets the framerate at those resolutions.
How someone can pore over the data, all the links provided, and not understand this is incomprehensible.
It is not saying anything bad about AMD to state the obvious: AMD has a weaker architecture and cannot clock as high. This does not make the Phenom a bad processor, but in a two-horse race someone comes in second. For the past two years this has been AMD; the data is irrefutable.
Does it make a difference in the gaming experience... nope. The Phenom is completely capable of supporting the frame rates needed for a good gaming system, but that is not the same as comparing the two and then making some off-the-wall, illogical, and incorrect statement about the capability of the CPU based on a GPU-limited data point.
Jack: The CPU waits for the GPU and the GPU waits for the CPU. They bottleneck each other because they will always wait for each other. If you take a processor running at 1 GHz and compare it with a processor running at 2 GHz, the 2 GHz processor will do the CPU computation twice as fast. But they still need to wait for each other, and if you call this waiting "bottlenecking" then they both bottleneck each other. The 1 GHz CPU is a bigger bottleneck than the 2 GHz CPU, but they are both bottlenecks. The video card is also a bottleneck. EVERYTHING is a bottleneck.
What I don't understand in these discussions is why you only talk about one of the two. Slower memory and faster memory are both bottlenecks too, but the slower memory is a bigger bottleneck than the faster memory.
The video card will always run as fast as that video card can once the data has arrived. It doesn't matter which processor is feeding the video card with data. If it is a slow processor, the card will need to wait a bit longer before the data arrives, but once the data is there the speed is the same.
When someone says "this CPU is bad for the video card" you might think that the video card will work more slowly. It doesn't; it always works at the same speed. The only thing that works more slowly is the processor, if it is a slower processor. But the video card needs to wait for the processor, whether the processor is slow or fast.
As for Intel compared to AMD, they are good at different things. These processors are built differently, and whether AMD is better than Intel or Intel is better than AMD depends on the type of code they are running.
If you read some game code you will see that most games are written with Intel processors in mind, avoiding that processor's slow paths, because Intel is a much more common processor.
The strong area for Intel is the HUGE L2 cache and the weak area is communication (high latency). Things like branch prediction do very little for total performance, and it is a more important feature for Intel because Intel is sensitive to external communication. All processors have branch prediction and all processors handle the most common case best, which is that the branch is not taken. I think almost all more advanced programmers know this.
No one, and I mean no one, can lack this level of conceptual fortitude....
PRECISELY!!! This is it! What can't you understand? "The video card will always perform as fast as that video card can once the data has arrived. It doesn't matter which processor is feeding the video card with data. If it is a slow processor, the card will need to wait a bit longer before the data arrives, but once the data is there the speed is the same."
Gosh.... there can only be ONE bottleneck at any given time: either the GPU is too slow and it will bottleneck, or the CPU is too slow and it will bottleneck. Any system -- a computer, a chemical reaction, anything occurring in time with a rate of output -- will be dependent upon the SLOWEST step. Period.
This is where the cliché 'a chain is only as strong as its weakest link' originates.
One or the other will be the slower link in the chain, and that weakest link will determine the rate at which the computation can complete. This must be true....
This is basic, not even basic, it is just simple common sense.
If you pair a Phenom at 5 GHz or an Intel quad at 4 GHz with an nVidia 8600 GT and run any game at 1920x1200, you will get the same frame rates.... why? Because the GPU is the slowest part of that setup....
If you pair a Sempron 2800+ or a Celeron 550 with a quad-CrossFire 4870 X2 setup, you will get the same framerates as those same systems would with an 8600 GT, because the CPUs are TOO slow!
On average, Phenoms appear to be the same as an Intel CPU in your original post because the GPUs are capping the frame rates. Intel slaughters the Phenom with a 4870 X2 (all the data I have shown, all the data shown at LegionHW) because the 4870 X2 finishes its rendering tasks faster than the Phenom can supply it with the next batch of information. It is rare at very high resolution settings for the GPU not to be the rate limiter.... increasing resolution increases the computational load on the GPU, because the GPU is responsible for rendering the image, not the CPU. The rate of progress for GPUs has been such that we are beginning to see GPUs surpass CPUs in their ability to perform, such that CPUs are now the rate limiters. It happened with the G80 introduction, where the AMD X2s of the time were also a limiter. We are seeing it again, now that the 280 GTX and the 4870 X2 are so fast they can push the bottleneck back onto the CPU even at resolutions as high as 1920x1200 with full AA.
To evaluate the ability of a CPU to crunch gaming code, use a high-end graphics card at lower resolutions to ensure the CPU is the rate-limiting step; then you will see different FPS with faster or slower CPUs... in fact, it is when you see FPS depend on CPU clock speed that you can say... ahhhh, I am in a CPU-limited situation.
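As a rough sketch of that test (the clock/FPS pairs below are invented, not from any review): if the FPS climbs with CPU clock you are CPU limited; if it stays flat you are GPU limited.

# Classify a benchmark as CPU- or GPU-limited from runs at different CPU clocks.
# 'runs' is a list of (cpu_clock_ghz, avg_fps) pairs at a fixed GPU/resolution.
def classify(runs, tolerance=0.05):
    runs = sorted(runs)                          # order by CPU clock
    low_fps, high_fps = runs[0][1], runs[-1][1]
    # If FPS rises noticeably with clock, the CPU is the rate limiter.
    return "CPU limited" if (high_fps - low_fps) / low_fps > tolerance else "GPU limited"

print(classify([(2.3, 61), (2.6, 69), (3.0, 79)]))   # FPS scales with clock -> CPU limited
print(classify([(2.3, 74), (2.6, 75), (3.0, 74)]))   # FPS flat -> GPU limited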
This is why most gamers are going Intel at this time.... they want the CPU to be more powerful than the GPU.... within reason, since high-end graphics cards are more expensive than CPUs.
Maybe an analogy will help ---
A housing developer is building houses. He needs two things to build his houses: lumber and nails. The lumber mill can produce enough lumber to build houses at a rate of one per week; the nail factory can make enough nails to build one house per day. Given this, what is the fastest the developer can build houses? Answer: at most one house per week; he will always be waiting on lumber. Now, in another part of the country, the lumber mill is very fast and can supply enough lumber for two houses per week, but the nail factory is very slow and can only supply enough nails for one house per month. Question: how fast can the developer build houses there? Answer: one house per month.
The concept of a bottleneck is just that, there can be only one.
Jack: You don't get the same framerate. You will get close to the same framerate if one type of task is very slow, but it will not be the same. If five tasks are done in parallel, then the slowest task decides the total speed. But if those five tasks are done in series, the total time will be the sum of all five tasks.
Yes! This is very easy. But you don't seem to understand? Thinking of it as the weakest link in a chain isn't right.
The example you gave about the house, lumber and nails: that example is a parallel situation. That situation isn't like the GPU (the two readings are written out in the sketch after the numbers below).
It would be more accurate to say that the developer produces the lumber himself, and while he does that he can't use the nails to build the house from lumber and nails. Only when the lumber is done can he build the house.
Lumber for one house = one week
Nails for one house = one day
Total time for building house = one week + one day for the developer
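For what it's worth, here are the two readings we are arguing about, written out with the numbers from the analogy (a rough sketch only, to show how differently the two models behave):

# Two ways to combine the lumber stage (7 days/house) and the nail stage (1 day/house).
lumber_days, nails_days = 7.0, 1.0

# Weakest-link / pipelined reading (Jack's point): both stages run at the same
# time, so throughput is set by the slowest stage alone.
houses_per_day_parallel = 1.0 / max(lumber_days, nails_days)   # ~0.143 -> one house per week

# Strictly serial reading (my point): one stage cannot start until the other
# has finished, so the stage times add up.
houses_per_day_serial = 1.0 / (lumber_days + nails_days)       # 0.125 -> one house per 8 days

print(houses_per_day_parallel, houses_per_day_serial)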
Wouldn't the best way to end this argument be to:
1. Find at what clock a C2D/Quad stops bottlenecking eg. a 4870
2. Find at what clock Phenom stops bottlenecking the same card
With all the variables accounted for, there should be a significant clock difference between the two platforms. Sorry if this has been addressed already. Secondly, could code optimization significantly skew performance in the case of CPUs?
Does anyone think that maybe the Intel system architecture, with the memory controller and PCI-e controller on the same piece of silicon, has a slight advantage over AMD's (with the PCI-e controller on the motherboard chipset and the memory controller on the CPU), since the GFX card in an Intel system can do direct memory accesses a little quicker (fewer hops and less system overhead)?
Just speculating, because the LegionHardware results show the Phenom bottlenecking the 4870 X2 far harder than its slightly lower IPC can explain.
Yes it is... the GPU renders the scene, shades the pixels, yada yada; at the same time the CPU is calculating the physics for the next frame. If the GPU finishes first, it waits. If the CPU finishes first, it must wait for the GPU before it can do the next frame. One will limit the other depending on who finishes first. Period.
This is why, with the 4870 X2, Intel is faster: the GPU is so fast that it is now waiting on the CPU most of the time, so when the CPU changes frequency (gets faster or slower) you see a response in the frame rate. On weaker GPUs the CPU finishes first and waits on the GPU... the GPU determines the frame rate, so when the CPU clock varies there is no systematic change in frame rate.
This is classic CPU-limited behavior: notice how the FPS responds to CPU speed using a 4870 X2 at 1920x1200 with full AA:
http://www.legionhardware.com/document.php?id=770&p=7
This is classic GPU-limited behavior: using the much slower 4870 (non-X2) at a meager 1600x1200 with full AA, notice how the FPS does not change with clock speed or type of CPU. This demonstrates a GPU-limited regime:
http://www.firingsquad.com/hardware/...view/page9.asp
No ... this is insane, you don't understand what a bottleneck is.... this is your problem.
The lumber company can supply enough lumber to build a house in a week.
The nail company can supply enough nails to build one house per day.
Day 1.... enough nails arrive to build the house, and some lumber arrives. Day 2, day 3, day 4.... by day 7 all the lumber has arrived. The fastest the house builder can build houses is one per week; the nail company is not the RATE LIMITER, the lumber company is.
I think I know what he is referring to; it seems he is saying there is some sort of pipeline and each thread gets processed one after another, hence the "1/10 + 1/100 000".
I don't know much about programming, but I think this form of programming is kind of "antique"; even on a single-core machine you can try to run more threads in parallel where it's possible.
Jack: Do you mean that calls to the video card are asynchronous?
No....
Dude this is almost painful to watch.
A CPU crunches the physics, AI, and other non-graphical portions of the game; it then wraps all that information up in a small package and sends it through the DX API, where it is loaded into the command buffer for the GPU. The GPU uses that info, plus the texture and vertex data in local video RAM, to render the frame. All the rendering duties have been moved off the CPU for more than 5 years now.
In a GPU-limited regime, before the CPU can send the next package of information, the GPU must complete its work... conversely, in a CPU-limited regime, if the GPU finishes the frame before the CPU has completed the next frame, the GPU must wait until the CPU finishes its work. This does not mean work cannot be done in parallel -- say the GPU is rendering frame 12110, the CPU can be working on the next frame 12111 AT THE SAME TIME -- but one will finish its task before the other -- it has to happen, in which case, to 'synchronize' the next frame of rendering, one will wait on the other or vice versa. The GPU shades the pixels, determines visibility with the Z-buffer, applies the anti-aliasing corrections. The CPU calculates the physics, collision boundaries, the AI, the animation of characters, etc., BUT DOES NOT participate in creating the image; this is why the GPU is called the Graphics Processing Unit, it processes the graphics. Changing resolution changes the load on the GPU, not the CPU, which is why with weaker GPUs you can overwhelm the GPU at high resolutions and move into the GPU-limited regime.... all the data shows this... it is not difficult to see.
The slower of the two will determine the observed frame rate... period. All the data around the web shows this to be true. It has nothing to do with how well or how poorly a game is threaded on the CPU; it has everything to do with when the CPU finishes its work relative to when the GPU finishes its work. If the CPU is the slower component, it determines the frame rate. If the GPU is the slower, it determines the observed frame rate.
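A stripped-down sketch of that command-buffer relationship (toy timings and buffer depth, not real driver code): the CPU fills a bounded buffer, the GPU drains it, and whichever side is slower ends up doing the waiting.

import queue, threading, time

CMD_BUFFER = queue.Queue(maxsize=2)          # toy command buffer between CPU and GPU
CPU_FRAME_S, GPU_FRAME_S = 0.004, 0.008      # seconds of work per frame (GPU slower here)
FRAMES = 20

def cpu_side():
    for frame in range(FRAMES):
        time.sleep(CPU_FRAME_S)              # physics, AI, animation for this frame
        CMD_BUFFER.put(frame)                # blocks (CPU waits) if buffer is full -> GPU limited

def gpu_side():
    for _ in range(FRAMES):
        frame = CMD_BUFFER.get()             # blocks (GPU waits) if buffer is empty -> CPU limited
        time.sleep(GPU_FRAME_S)              # render the frame

start = time.time()
t = threading.Thread(target=cpu_side)
t.start()
gpu_side()
t.join()
print("avg frame time:", (time.time() - start) / FRAMES)   # ~GPU_FRAME_S, the slower stage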
Read the whole thread, nVidia (even your link) shows the flow charts to figure out how to determine which one is the rate limiter.
If the CPU is the limiter, then increasing the performance of the CPU will vary the FPS (which is what the LegionHW data shows).... if the GPU is the slowest component, then varying the CPU speed will have no effect on FPS, as shown by the 4870 data above from FiringSquad: the same game at a lower resolution but with a weaker GPU. This is not rocket science.
EDIT: Also -- it is a one-way street -- the CPU is the host controller in the current programming model -- I have linked references in this thread that explain this. The CPU receives no data back from the GPU; the CPU sends commands and object information (non-world assets, to be exact) to the GPU's buffer, which sets the GPU off to do its work. Spend some time researching it... it will educate you.
Remember, I told you shortly after the 4870 X2 launch that the card was so fast that almost all situations at 1920x1200 would be CPU limited, and the Phenom would be significantly behind... this is what the LegionHW data shows to be true... even the fastest Intel processor can still bottleneck this card in many games (Devil May Cry is an exception) at 1920x1200 with full AA; the 4870 X2 is one hell of a card.
Question:
In these discussions, at least, I am talking generally. When I say the GPU is working or the CPU is working, I mean the whole area the CPU handles and the whole area the GPU handles. You seem to be very picky when you talk about GPU work and general when you talk about CPU work. To you, GPU work is rendering and only rendering, and all other work is CPU work?
The CPU waits for a lot of things: memory, cache, disk, etc. If you are picky about one piece of hardware, you should be just as picky about the others.
If we take your pickiness about the GPU, then: if the GPU is under heavy load, even with frames buffered, it will be BOTH that decide the total speed. The CPU (you say it yourself in your text) can't produce image after image, stored as data, for the GPU to process. It needs to wait for the GPU to process the frame, and while it waits it can't work on new images. Suddenly the GPU is ready and then the CPU can start working again. If the CPU needs to wait for the GPU, that doesn't mean CPU time becomes 0% and GPU time becomes 100%.
You could compare this with other types of applications. Take databases. They need fast hard drives. But you never say that the hard drive is bottlenecked by the processor or that the CPU is bottlenecked by the hard drive. They both add up to the total time. If one of the two is very slow, then that hardware may account for almost 100% of the total time, and I think that is what you call bottlenecking the performance?
Buying a faster processor, even when the hard drive uses 99% of the total time, will increase performance, but it will not be noticed because the other part is using that much time.
Now if the GPU can handle all the data sent from the CPU and render faster, and we use your pickiness about what CPU work is, then the GPU will use very little of the total time. If you increase the resolution, the GPU will have a harder time rendering all the frames. For some frames it may not keep up with the CPU, for others it does. Increasing the resolution more and more will stall the CPU. But this situation is BOTH CPU and GPU time.
This isn't what I mean when I say GPU work or CPU work. When I say GPU and CPU work I mean it generally for both: CPU work is work that is only done by the CPU and not by the video card, but as soon as the CPU sends data to the GPU, that is GPU work. If you are going to be picky about hardware, then you also need to add latency for communication, memory, etc.
Me too. Fits like a glove!
Jack is clever, gosh is a retard, and that picture is well funny, lol.
Thing I don't get is how DX canes so much CPU... It seems to have a mysterious 2 GHz overhead.
With an efficient OpenGL engine you can underclock your CPU to 500 MHz, still push 300M polys/sec on an 8800, and run awesome physics with the spare 400 MHz...
Jack: I have some new questions.
I have been reading, and I have some clues as to why the FSB problem is hard to find. I know there are a lot of people here who believe this problem doesn't exist; my "problem" is that I have tested it with code and found problems. It is easy to create an application that will run much faster on AMD, but those who use games to test CPUs sometimes say that Intel is better across the board. The problem could be that FPS is a very bad way to test how the game behaves, even if you get exact time spans between frames. And if one thinks that FPS is exact, then it could be problematic to understand differences in how processors behave.
Situation 1:
There is a demanding scene for the video card (GPU) but the CPU handles it well (the game is running at 30 FPS). Frames are triple buffered, which means the video card is three frames behind the CPU. Responsiveness to the mouse is vital, and if you press the mouse button to shoot, the picture will show it three frames after the CPU got the information that the fire button was pressed.
Will this make the game feel unresponsive?
Situation 2:
This is again a demanding scene for the GPU and the CPU handles it well. The image is triple buffered (three frames are queued because the CPU is much faster), and suddenly the CPU needs to reload some information from memory. This stalls the CPU, but you can't see it in the frame rate because the GPU has three images to render. When the CPU is ready, the GPU has only one frame queued (two frames were rendered during the CPU stall), and the CPU starts feeding the GPU with data again. This could be a small hitch in the game even though frames are being produced.
This would mean that it isn't possible (it is an inexact measurement) to check latency issues for the FSB (and other latency issues) in a game by checking frame rates, because the video card will hide them?
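A rough numeric sketch of Situation 2 (all numbers invented): the GPU keeps emitting frames from its queue while the CPU stalls, so the stall never shows up in the frame-rate log, only in the buffer depth.

# Situation 2 as a toy timeline.
GPU_FRAME = 1 / 30          # the GPU needs ~33 ms per frame (a 30 FPS scene)
queued = 3                  # three CPU-prepared frames already sit in the buffer
cpu_stall = 0.070           # the CPU stalls for 70 ms (e.g. reloading from memory)

frames_shown_during_stall = min(queued, int(cpu_stall / GPU_FRAME))
frames_left_in_queue = queued - frames_shown_during_stall

# The GPU delivered a frame every ~33 ms the whole time, so the measured FPS
# never dips; only the number of buffered frames changed.
print("frames shown during the stall:", frames_shown_during_stall)   # 2
print("frames still queued afterwards:", frames_left_in_queue)       # 1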
Situation 3:
Two different CPUs are used. One is very good at synchronization between threads and one is bad.
This is a very demanding frame for the CPU that doesn't synchronize well; the scene itself is easy, but something happens that delays the CPU (synchronizing memory across a quad over the FSB while data is sent to the video card, or a thread being moved from one core to another). The CPU is delayed for just one frame. The video card this time can only queue one frame, and that frame is being rendered while the CPU does this synchronization (multiple threads are used).
The fast-synchronizing CPU is ready when half of the frame has been rendered by the video card and starts producing new images.
The slow-synchronizing CPU is ready half a frame after the frame being rendered is finished.
Let's say this frame took 1/20 of a second (0.05 s) to render with no CPU delay. On the fast-synchronizing CPU the new image started being produced 0.05 - 0.05/2 = 0.025 seconds after the previous frame; on the slow CPU it started 0.05 + 0.05/2 = 0.075 seconds after.
The real delay between these frames, comparing the two processors, would be 0.05 seconds. But when the frame times are checked, they will only show a difference of 0.025 seconds. The GPU is masking 0.025 seconds of the delay.
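The same arithmetic as a small check, using only the numbers given in Situation 3:

# Situation 3 with the post's numbers: the GPU needs 0.05 s to render the frame.
gpu_frame = 0.05

# The fast-synchronizing CPU is ready half a frame early; the GPU is still busy,
# so the next frame still appears one full GPU frame after the previous one.
gap_fast = max(gpu_frame, gpu_frame / 2)        # 0.05 s between displayed frames

# The slow-synchronizing CPU is ready half a frame late; the GPU idles 0.025 s,
# so the next frame appears 0.075 s after the previous one.
gap_slow = gpu_frame + gpu_frame / 2            # 0.075 s between displayed frames

cpu_difference = gpu_frame                      # the CPUs really differ by 0.05 s
measured_difference = gap_slow - gap_fast       # the frame log only shows 0.025 s
print(measured_difference, "masked by the GPU:", cpu_difference - measured_difference)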
Situation 4:
Two different CPUs are used. One is a C2D with an extremely high clock and a big, fast cache. The other is an AMD Phenom, which is better with more threads.
Test a game where you first walk along a road for 100 seconds and are then suddenly attacked with lots of bombs, using physics, for 10 seconds.
First test is at 800x600: the C2D produces a lot of frames walking along the road, the GPU handles all the frames, and only two threads are used. The C2D gets about 200 FPS on the road and the AMD gets 100 FPS. When you are attacked, two new threads are activated: one for physics and one AI thread for the enemies. Here the AMD gets 50 FPS and the C2D gets 30 FPS. The first test's average will be higher on the C2D.
Calculation:
C2D = (100 * 200 + 10 * 30) / 110 ≈ 184
Phenom = (100 * 100 + 10 * 50) / 110 ≈ 95
Second test is at 1680x1050: now the video card has difficulty rendering more than 50 FPS. So when walking along the road, both the C2D and the Phenom have to wait, the C2D more so. In the attack scene, though, the GPU and the Phenom match exactly, but the C2D holds the GPU back.
Calculation:
C2D = (100 * 50 + 10 * 30) / 110 ≈ 48
Phenom = (100 * 50 + 10 * 50) / 110 = 50
Third test is at 1920x1200: the video card now has trouble rendering more than 30 FPS.
C2D = (100 * 30 + 10 * 30) / 110 = 30
Phenom = (100 * 30 + 10 * 30) / 110 = 30
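Those three test cases boil down to a time-weighted FPS average (equivalently, total frames divided by total time). A quick script with the hypothetical FPS figures from above:

# Time-weighted average FPS for the road (100 s) + attack (10 s) runs.
def avg_fps(segments):                        # segments: list of (seconds, fps)
    total_time = sum(t for t, _ in segments)
    total_frames = sum(t * fps for t, fps in segments)
    return total_frames / total_time

print(avg_fps([(100, 200), (10, 30)]))        # C2D    @ 800x600   -> ~184.5
print(avg_fps([(100, 100), (10, 50)]))        # Phenom @ 800x600   -> ~95.5
print(avg_fps([(100, 50), (10, 30)]))         # C2D    @ 1680x1050 -> ~48.2
print(avg_fps([(100, 50), (10, 50)]))         # Phenom @ 1680x1050 -> 50.0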
Conclusions (if the reasoning above is right):
1: If the video card is buffering images and the video card is slower than the processor, it is very hard (impossible) to find latency problems in the FSB by checking frame rates.
2: It is very difficult to measure the exact difference at low FPS values (where the GPU slows things down) if the game is suddenly stalled by the processor, because the video card hides some of it. If the image is buffered (two frames or more) it is even harder.
3: The CPU needs to be "close" to the current frame at low FPS values if the game is to feel responsive and smooth.
4: It is difficult to find advantages in games for more cores and better synchronization if you don't know exactly how the game is written and what is being tested at that exact moment.
5: FPS isn't a good measurement for finding out how a game feels. It is possible to get high FPS values and still have the game feel strange.
gOJDO: Cut the crap, are you three years old?
Don't read the thread if you are so sensitive.
Oh dear, this is difficult -- please empty your mind of what you have embedded in it.... The concept is really very simple... two computing resources, where one depends upon the other to finish; the slower one to complete its assigned duty will be the rate determinant. This is the concept of a bottleneck. So yes, the GPU is the rendering processor and the CPU does all the other work not related to rendering. This is a fact and not debatable; I will prove this to you.
Now, I will eventually address all your points... (your database analogy is completely irrelevant here, but I will discuss that later in a different post).
But let's focus on this first.... I thought I had been very clear.... the GPU is responsible for computing the transforms, shading, and texturing of the 3D objects and translating them into a 2D image projected on the screen; the CPU is responsible for calculating the bad-guy AI, collision boundaries, physics, and all things not related to producing the image -- this is not being general, I am being specific. The CPU will, for example, take a gun shot, calculate the trajectory in 3D space, and send coordinate information, frame by frame, as the bullet/rocket/whatever travels through space; this info is sent to the GPU to render that shot as it flies, each frame getting a new set of 3D coordinates. The CPU does NOT calculate the pixel intensity nor the position on the 2D screen; that is the job of the GPU.
As I have made clear over and over, the CPU finishes the calculation for a frame and sends the information to a command buffer; the GPU takes the information from the command buffer and builds the scene. The CPU fills the buffer, the GPU empties it. Period.
Two scenarios ...
1) The CPU finishes its work faster than the GPU can empty the buffer; the CPU must wait until space in the buffer is available. This is GPU limited.
2) The GPU finishes its work faster and must wait for the CPU to fill the command buffer; this is CPU limited.
I have stated this over and over and over again, not based on my opinion or my preconceived ideas of how it works; the concepts are based on researching the literature that describes in morbid detail how it works. I have linked and provided details of the command-buffer relationship via this documentation over and over again. I fail to understand why you are incapable of accepting this as true. This is not ME SAYING THIS; this is the experts, people who specialize in the field of graphics processing, saying this.... you are not disagreeing with me, you are disagreeing with them.... either you are right and they are wrong, or they are right and you are wrong. I will choose to accept them as right.
A look at the history of GPU-CPU interaction is probably easier, so I will attempt it from that angle. Back, way way back, to the one that started it all... DOOM by id Software. Back then there was no such thing as a 3D accelerator or GPU; all the work was done by the CPU. It did its thing and transposed the pixel-by-pixel information into the tiny bit of video RAM that the RAMDAC used to produce the image on the screen. (Historically this is called the framebuffer, a term hardware review sites errantly apply to the entire video RAM, which is false, incorrect, and somewhat irresponsible of them.)
As time progressed, 3dfx produced a 3D accelerator, which offloaded some of the massive calculations from the CPU and enabled faster rendering. nVidia entered the picture and ATI jumped on board, moving more functions to the graphics chip, such as triangle setup. Up to this point, your concept of the CPU-GPU interaction is rooted squarely in the mid 1990s -- the problem is, it was not called a GPU then, it was called a 3D accelerator. More time passed, and the last bit of the graphics pipeline finally made the move as well: transform and lighting (i.e. taking the 3D coordinates, transforming them to a 2D image, and shading pixels to represent different light intensities).
This is best summarized by nVidia's diagram in their technical brief from when the last vestige of rendering transitioned from the CPU to the GPU: http://www.nvidia.com/object/Technical_Brief_TandL.html (see the PDF file)
Ultimately, by 2000 all rendering duties had transitioned to the GPU... to quote the same nVidia reference above:
"All of the work in the 3D graphics pipeline is divided between the CPU and the graphics processor. The line that divides the CPU tasks from those performed on the graphics processor moves as the capabilities of the graphics processor continue to grow. 1999 is the year when graphics processors with integrated T&L engines can be sold at mainstream PC price points and create a compelling value proposition for any PC user. Figure 8 graphically shows the growing role in the last few years of the dedicated graphics processors for mainstream PCs. The complete graphics pipeline is now computed by the graphics processing unit, hence the term GPU."
The bolded is exactly what I have been telling you for the last 15+ pages of debate. This comes straight from the people who make GPUs... don't you think it is time you ask yourself... "maybe I really don't understand how this works?"
Now, what influences the amount of work the GPU must do... well, resolution is one thing, because it has to calculate the position and intensity of each pixel on the screen. At 640x480 it only needs to worry about 307,200 pixels (as well as each surface to map textures onto, etc.), but at 1920x1200 it needs to perform calculations for 2,304,000 pixels. So you answer the question: if you run at 307,200 pixels, then repeat the exact same run on the exact same GPU at 2,304,000 pixels, in which case will the GPU take longer to finish one frame? (This should be a no-brainer and hence rhetorical.) What else influences the GPU? Anti-aliasing, because the GPU is then interpolating, based on oversampling adjacent pixels, new attributes to smooth out edges and provide new detail, adding more complexity to the workload.
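Just to make the scale of that concrete, the pixel counts behind the two resolutions mentioned above:

# Per-frame pixel counts at the two resolutions in the example.
low = 640 * 480        # 307,200 pixels
high = 1920 * 1200     # 2,304,000 pixels
print(high / low)      # 7.5x more pixels for the GPU to shade every frame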
So it makes sense -- if I change resolution and the frame rate changes, then over that span of resolutions it must be GPU limited, because I am changing the time it takes for the GPU to finish, and that will only affect the output if it is truly GPU limited.
So how can we test for CPU-limited runs? Well, if the CPU is the limiting factor and I change the amount of time it takes for the CPU to finish its task, then it should affect the results, no? Of course it should.... so if I have a CPU that finishes its work in time x and try a different CPU that finishes in time y, such that y < x, then the FPS should change (get better) if it was CPU limited to begin with. At low resolutions -- where the GPU is not taxed -- you see that difference easily on most mid-range to high-end cards: a 2.6 GHz Phenom will produce higher FPS than a 2.3 GHz Phenom, and a 2.5 GHz Q9300 will have higher FPS than either of those because it is faster (multithreaded games included, just search the data). This is simply the way it is... the Phenom is not a bad gaming CPU at all; it can easily support FPS higher than the refresh rate of the monitor... so don't take this as a diss on the Phenom, it is a fine CPU. But on the computer-science question, Intel has the faster CPU.
Another way of looking at it... if I have a setup that is GPU limited, i.e. the GPU determines the frame rate, then no matter the CPU the output FPS is the same. Now I stick in a faster GPU... that is, I upgrade and try the same game and resolution again, but with a range of CPUs varying in capability/speed... ahhh, the FPS now changes and is higher... this is what happened with the 4870 X2: it is such a powerful card it moved almost all games at 1920x1200 full AA into a CPU-limited scenario -- this is the LegionHardware data.
This concept is basic and standard, and even taught in some computer science courses (a little time spent on the Google machine yields oodles of information). I will leave you with this.... a PPT from a computer science course designed specifically for graphics and game programming. When you talk of thread synchronization, you are completely incorrect that it has anything to do with multithreading on the CPU; it has everything to do with the GPU rendering thread (which is on the GPU, proprietary to the GPU architecture, and performed by the GPU) and the CPU threads. They communicate, as I have pounded on and on about, through the command buffer.
http://www.cse.ohio-state.edu/~crawf...chitecture.ppt
Slide 13
"If this command buffer is drained empty, we are CPU limited and the GPU will spin around waiting for new input. All the GPU power in the universe isn't going to make your application faster! If the command buffer fills up, the CPU will spin around waiting for the GPU to consume it, and we are effectively GPU limited."
Odd that a computer science course at a major university would say this, don't you think... actually, no... because it is absolutely true. In fact, you should study all the information in this PPT if you can (if you have PowerPoint; if not, download a viewer!), it will enlighten you! (NOTE: I just found this today, but it nicely summarizes everything I have been trying to beat into your head.) When you say something like "when someone talks about a GPU-limited game, they don't know how computers work," you are saying that people who do this for a living do not know how it works.... the professor's name is on the title page of this PPT, email him if you like.
So when you see two CPUs, one obviously faster than the other, yielding the same or approximately the same FPS, you must conclude it is GPU limited, and no matter what you do you will never get above that FPS under the stated conditions... you can make it slower -- just add a slower CPU until it becomes CPU limited -- but you will never make it faster for that GPU, resolution, and application combination.
Hence the GPU-limited case of Enemy Territory: Quake Wars (a multithreaded game) on a 4870... the GPU is slower at rendering the frame than any of the CPUs are at producing the necessary CPU-side information.
So when you post information like the lead post of this thread, showing multiple CPUs of different speed classes yielding the same or roughly the same FPS, you are going to get a chorus of 'GPU limited' rants like mine, because that is what is going on. Period.
I will address your other points in follow up posts.
Jack