
Thread: Intel Q9450 vs Phenom 9850 - ATI HD3870 X2

    Quote Originally Posted by gosh
    The only one here trying to explain facts and how the processor works is me. I haven't seen anyone else here trying to explain.
    Maybe you could explain WHY some of you think the FSB isn't a problem even though there are thousands of links describing this problem. Games are one type of application with the highest demands on hardware. Maybe you could provide some tests showing games at high resolution comparing AMD and Intel?
    Using tests that run games at low resolution, or games that don't use advanced graphics, is another matter.

    The repeated sentence "games are 100% GPU limited" is proof enough that the person doesn't understand how the computer works. You can get close to 100% if something is very slow, but you will never reach 100% when two or more parts are working together to get the job done.
    We have been explaining it to you until we are blue in the face. And please link up those thousands of links showing the FSB is a problem. What I read is that the FSB is aging and has lower bandwidth than a point-to-point solution, a line typically trumpeted by AMD PR (hint: do not trust anyone telling you something who is trying to sell you something). FSB bandwidth is only a problem when the workload demands more bandwidth than is available; this is the case in many HPC and some server applications, but for desktop/gaming it is a non-issue, and the data shows that. AMD has bandwidth limitations as well that can hold back performance; just look at the latency of a non-local transaction in their NUMA architecture.

    Games split their work into two workloads divided between two computational resources. The CPU calculates the collision boundaries, AI, physics (though that is changing), etc. The GPU is responsible for rendering the scene: plotting vertices, painting textures.

    The GPU does this pixel by pixel, which is why it is called rasterization. Each pixel has its color and intensity calculated based on the information presented to the GPU. The GPU acquires the data, stores it in memory on the card dedicated to the GPU, and paints the texture based on a complex set of 3D-to-2D transformations. This is why GPUs are built the way they are... they do not need to virtualize memory, they do not need to predict branches, they just need to compute a lot of numbers. The sheer volume of data is the reason GPUs are designed with such high-bandwidth memory interconnects, so high that they put either AMD or Intel to shame. No, contrary to your misconception that the threads reach out to the GPU for all of its work, the GPU is a standalone processor with (until recently) a single fixed-function purpose in mind... grab data as quickly as possible from video RAM (where it gets most of its information) and render it to the frame buffer, where the RAMDACs can then put it on the screen.

    Thus, the load on the GPU is directly proportional to the total number of pixels; the speed or rate at which the frame buffer is updated depends on the quality and speed of the GPU. However, before the GPU can render a frame, it must have all the information about that scene finished... such as what the camera angle is, where each character is standing, what the bad guys' models are doing in terms of animation. This is the duty of the CPU. Conversely, the CPU cannot calculate its next frame of information until the GPU has finished doing its job.

    So you have two computational sources, each working on its own set of data, and each depending on the other to finish before it moves on....

    Thus, if the GPU is waiting on the CPU, as is the case in Lost Planet's Cave scene due to all the models floating around, then the game is CPU limited. The opposite holds as well: if the GPU is at full load crunching as fast as it can while the CPU waits on it to finish, then the game is GPU limited.
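    The dependency above can be sketched as a toy model (this is my own illustration, not measured data; the millisecond numbers are invented): each frame, the CPU must finish the scene before the GPU can render it, so once the pipeline is full the frame rate is gated by whichever stage is slower.

    ```python
    # Toy two-stage pipeline model of the CPU/GPU dependency described above.
    # All timing numbers are invented for illustration only.

    def frame_rate(cpu_ms_per_frame: float, gpu_ms_per_frame: float) -> float:
        """FPS when CPU simulation and GPU rendering overlap frame-to-frame:
        the slower stage sets the pace."""
        bottleneck_ms = max(cpu_ms_per_frame, gpu_ms_per_frame)
        return 1000.0 / bottleneck_ms

    # "Cave"-like load: heavy AI/physics, light rendering -> CPU limited.
    print(frame_rate(cpu_ms_per_frame=20.0, gpu_ms_per_frame=8.0))   # 50.0
    # "Snow"-like load: heavy shading, light simulation -> GPU limited.
    print(frame_rate(cpu_ms_per_frame=8.0, gpu_ms_per_frame=25.0))   # 40.0
    ```

    Note that in the first case a faster GPU changes nothing; only a faster CPU raises the frame rate, which is exactly what "CPU limited" means.
    
    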

    This is not terribly hard to understand. At a resolution of 640x480 the GPU must shade 307,200 pixels (there is a reason there is an 'x' when they quote resolutions). At 1680x1050, however, the GPU must shade 1,764,000 pixels... 5.74 times as many; multiply this by whatever oversampling you are doing and the computational demands become enormous. Demonstrating this is straightforward: run a game and measure the FPS from very low to very high resolution, but plot it against total pixels rendered. If the GPU is the one and only determinant of the output rate, then the results should drop monotonically as the number of pixels increases.... However, if the GPU is not the determinant of the output rate, all other things being equal, there should be no observed change in rate.
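    The pixel arithmetic above is just width times height, which is easy to verify:

    ```python
    # Total shaded pixels per frame is width x height; raising the resolution
    # multiplies the GPU's per-frame workload while the CPU's simulation work
    # stays the same.

    def total_pixels(width: int, height: int) -> int:
        return width * height

    low = total_pixels(640, 480)       # 307200
    high = total_pixels(1680, 1050)    # 1764000
    print(low, high, round(high / low, 2))  # 307200 1764000 5.74
    ```
    
    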

    I use Lost Planet; it is my favorite for this because it has two scenes that push the envelope on either end. You have probably often read or heard that Snow is GPU bound and Cave is CPU bound.

    Lost Planet puts the latest hardware to good use via DirectX 10 and multiple threads—as many as eight, in fact. Lost Planet's developers have built a benchmarking tool into the game, and it tests two different levels: a snow-covered outdoor area with small numbers of large villains to fight, and another level set inside of a cave with large numbers of small, flying creatures filling the air. The former doesn't appear to be CPU-bound, so we'll be looking at the latter.
    http://techreport.com/articles.x/14756/5

    This is indeed true: run this game from 640x480 up to 1280x1024 and observe the Snow vs. Cave behavior. Snow is clearly GPU bound, as it responds monotonically to the load presented to the GPU via the resolution selection. It has NOTHING to do with threading the CPU; the load and onus are on the GPU as resolution changes, all other things being equal. Cave, on the other hand, is clearly CPU bound, and consequently responds better to the strength of the CPU.
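    That diagnostic can be written down as a small sketch (my own illustration; the FPS samples and the flatness tolerance are invented, not from the Lost Planet runs below): sweep resolutions, record FPS against total pixels, and call the scene GPU bound if the rate falls monotonically with pixel count, or CPU bound if it stays roughly flat.

    ```python
    # Sketch of the resolution-sweep diagnostic described above.
    # Sample data and the 10% flatness tolerance are invented for illustration.

    def classify(samples: list[tuple[int, float]], flat_tolerance: float = 0.10) -> str:
        """samples: (total_pixels, fps) pairs sorted by ascending pixel count."""
        fps = [f for _, f in samples]
        spread = (max(fps) - min(fps)) / max(fps)
        if spread <= flat_tolerance:
            return "CPU bound"      # frame rate barely responds to pixel load
        if all(a >= b for a, b in zip(fps, fps[1:])):
            return "GPU bound"      # frame rate drops monotonically with load
        return "inconclusive"

    snow = [(307_200, 90.0), (786_432, 60.0), (1_310_720, 42.0)]  # falls with load
    cave = [(307_200, 55.0), (786_432, 54.0), (1_310_720, 53.0)]  # nearly flat
    print(classify(snow), classify(cave))  # GPU bound CPU bound
    ```
    
    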

    QX9650 @ 2.5 GHz (FSB 1333) for Lost Planet:

    [plot: FPS vs. total pixels rendered, Snow and Cave]

    Now, if we look at a Phenom 9850 @ 2.5 GHz (2000 MHz NB), again the results are clear. The difference is that at lower pixel loading (where the GPU is less taxed), the Snow curve levels off.... This is where the Phenom's weakness really shows up.

    [plot: same runs on the Phenom 9850]

    Now, this is a multithreaded game that scales just fine with core count on both platforms. And when raising the question of which CPU runs threaded game code better, most reviewers rightly look at the CPU-bound cases; otherwise, by using uber-high resolutions, you are extrapolating a statement about the CPU when the observation is actually dictated by the strength of the GPU, which results in false conclusions and a bunch of people like you spouting junk throughout the web.

    Explaining threading, and how the CPU, memory, cache and their management function, is probably beyond your understanding, and would take much longer.

    EDIT: Here are the screen dumps for the runs that generated those plots:
    http://forum.xcpus.com/gallery/v/Jum...QX9650Screens/
    http://forum.xcpus.com/gallery/v/Jum...PhenomScreens/

    And the original article I wrote on this very topic:
    http://www.xcpus.com/GetDoc.aspx?doc=12&page=1

    jack
    Last edited by JumpingJack; 08-10-2008 at 12:52 PM.
    One hundred years from now
    It won't matter
    What kind of car I drove
    What kind of house I lived in
    How much money I had in the bank
    Nor what my clothes looked like....
    But the world may be a little better
    Because I was important
    In the life of a child.
    -- from "Within My Power" by Forest Witcraft
