
Thread: Intel Q9450 vs Phenom 9850 - ATI HD3870 X2


#10
Xtreme Mentor
Join Date: Mar 2006
Posts: 2,978
Quote Originally Posted by Boschwanza
    Here is my theory

And that case can happen in games too. So why are we focusing on high resolution benchmarks? When you are at low resolutions and the prefetcher works well, the Core architecture will always give you a bunch more FPS than an AMD, which is why you get higher average frames; in fact, there is no focus on the low FPS. You won't see the results of what I will call the "latency hole" of a C2Q.

First, you have to know that a GPU-bound situation doesn't exclude a CPU-bound situation.

So when you are at high resolutions, the graphics card acts like a frame limiter and more weight falls on the low FPS. Where there are more latency holes, frames will drop much more than with a K10, and you might get a better average score with the K10 because the higher peak FPS of a Core 2 Quad are simply cut off.

That's what happened in the overclockersclub review with World in Conflict: they used a graphics card that ran into a GPU-bound situation very early and showed that a Phenom performed better in the CPU-bound situations. Using a better graphics card just pushes such a scenario out to higher resolutions.

Again: that's not always the case and is in fact a very rare scenario, depending on the game and how it's written. Jack provided us with very good data that has shown this perfectly, and I really appreciate his work.

And Jack, is there a program which limits the frames software-wise? Maybe you could check out my theory. Thanks a lot.
This is where research and study pay off. I will respectfully disagree and provide a rationale for my reasoning. Please take the time to read; I am wordy here.... but I am using your post as a hook to provide more detail. I address your points along the way.

Here is an example. First, a screenshot from the ATI Toy Shop demo:

[image: Toy Shop demo screenshot, 1600x1200]

The image above was taken at 1600x1200 resolution. Next, the wireframes overlaid for the same scene, one at 1600x1200 and another at 1280x1024:

[image: wireframe overlay, 1600x1200]

[image: wireframe overlay, 1280x1024]
Study these two scenes carefully ... look at the trash cans for example... look at the eaves along the walls, and the rain drops hitting the pavement. Take a game for example: increasing the resolution makes the overall scene look sharper, but it does not change the 'blockiness' of the characters, objects, or world. Changing the resolution did not change a) the number of vertices (and consequently the number of polygons) and b) the number of objects (such as rain drops, trash cans, etc.).

The Toy Shop scene is quite educational ... and I chose it because it was the quickest and easiest way I knew off the top of my head to generate a wireframe rendition (I knew at one time how to do it in games too)... nonetheless, increasing resolution does not change the 3D complexity of the vertices, but it does change the textures and the total number of pixels that make up the frame. The second point to note is that the total number of raindrops does not change, just their visual sharpness. Run the demo yourself (if you have an ATI card), watch it, study it... if you want to change resolution, edit the sushi.ini file and change it there.

This is a good example because the CPU is responsible for various things: creating objects, calculating AI, translating objects along their physics trajectories, resolving the rain drops colliding with the pavement, etc. However, changing the resolution of a game does not change the load placed on the CPU, as it will still calculate the same number of rain drops and will still calculate their respective positions in space regardless... the same holds in a game: changing the resolution does not affect the particles generated (unless you change the particle and physics options -- I always run my CPU test benches with physics and particles at max) or the number of bad guys in the game (whose AI is CPU duty), etc.
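To make that concrete, here is a minimal sketch (my own illustration, not code from the Toy Shop demo or any game engine): the CPU-side per-frame work scales with the number of objects being simulated, while the render resolution only ever shows up on the GPU side of the fence.

Code:
// Hypothetical CPU-side simulation step: cost is O(number of drops);
// the render resolution never appears anywhere in this work.
#include <vector>

struct RainDrop { float x, y, z, vy; };

void updateRain(std::vector<RainDrop>& drops, float dt)
{
    for (RainDrop& d : drops) {
        d.y += d.vy * dt;        // integrate position (physics is CPU work)
        if (d.y <= 0.0f)         // drop hit the pavement...
            d.y = 50.0f;         // ...respawn it (splash effects, AI, etc. would go here too)
    }
}
// The GPU's cost, by contrast, scales with the pixels it must shade:
// 1280x1024 is ~1.3 million pixels per frame, 1600x1200 is ~1.9 million,
// yet updateRain() does exactly the same amount of work at either setting.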

    Thus, resolution is irrelevant to the CPU -- only the objects, physics, and trajectory in 3D space matter. What does matter to the CPU is the state of the GPU ... i.e. is the GPU ready to take the next chunk of commands.

In fact, AMD even did a presentation on this particular demo; the rain was the CPU limiter (due to so many drops, I suspect, coupled with the fact they used a Pentium 4 3.2 GHz): http://ati.amd.com/developer/eurogra...onFestival.pdf. However, all things being the same, the rain drops in this case will be the same in number and density regardless of whether the resolution is 1024x768, 1280x1024, 1600x1200 or 1920x1200; the CPU will still carry the same calculation burden.

Therefore, low resolution versus high resolution is not the issue when evaluating the CPU for executing the portion of the gaming code specific to the CPU. The nVidia article I linked was very clear on looking for bottlenecks ... change the parameters that affect the visual rendering, and if the FPS varies with that change, the bottleneck resides somewhere inside the GPU pipeline. Conversely, if it does not change, then the bottleneck resides on the host processor (in this case the CPU).
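As a rough illustration of that test (a hypothetical benchmark harness, not nVidia's or anyone else's actual tool), the decision rule boils down to something like this:

Code:
// Vary only the render resolution and watch whether the frame rate moves with it.
#include <cstdio>

struct Sample { int width, height; double fps; };

// If FPS falls noticeably as resolution rises, the GPU pipeline is the limiter;
// if FPS stays roughly flat, the host CPU is the limiter.
const char* diagnose(const Sample& lowRes, const Sample& highRes, double tolerance = 0.05)
{
    double drop = (lowRes.fps - highRes.fps) / lowRes.fps;   // relative FPS change
    return (drop > tolerance) ? "GPU limited" : "CPU limited";
}

int main()
{
    Sample lowRes  = {1024,  768, 98.0};   // made-up numbers, purely for illustration
    Sample highRes = {1600, 1200, 97.0};
    std::printf("%s\n", diagnose(lowRes, highRes));          // prints "CPU limited"
    return 0;
}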

As such, the ability of the CPU to complete its task depends only on the complexity of the code it is responsible for.... nothing more, nothing less.

The GPU and CPU are two different processors providing two different processing functions, where one result depends upon the other.... the GPU cannot render its frame unless the CPU has finished providing the information for that frame; conversely, the CPU cannot send a frame of information to the GPU until the GPU has finished its prior obligation and is ready to receive that information.

Last time I went to nVidia's technical documentation.... so this time let's turn to AMD instead; they describe in detail how the CPU and GPU work together in this document: http://developer.amd.com/gpu_assets/...ation_v1.2.pdf
Jump to section 4.2, Host Programming Model Description, where AMD (ATI) describes the CPU as the host controller. They discuss two methods by which the CPU interacts with the GPU: one is writing directly to the GPU command buffer, which lives in the VRAM on the graphics card; the other is sharing a section of system memory that both the CPU and GPU access. The GPU always reads, the CPU always writes -- it is a one-way street. Let's consider the latter, the pull model.

The CPU writes its command packets into the ring buffer and then advances the write pointer so the GPU knows how far it may fetch; the GPU consumes those packets and advances the read pointer, which the CPU checks to know how much of the buffer is free for the next block of commands, and so forth and so on. The CPU and GPU do this via the graphics driver, which sits beneath the DirectX API (the API being the universal language programmers use so they do not need to program to each different architecture, hence API).

Now follow what happens: if the CPU does not finish in time to advance the write pointer, what does the GPU do once it catches up to it ... re-read the same block in the ring buffer? Nope, it waits until the pointer is advanced. Conversely, what happens if the GPU is late working through the buffer and the CPU's write pointer catches up to the read pointer ... does it overwrite? Nope, it waits.... this is sort of that syncing thing that Gosh keeps going on about, except it has nothing to do with threading the processor and everything to do with the processor completing its refresh of the frame correctly and on time.
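For anyone who wants to see the shape of that handshake in code, here is a generic single-producer/single-consumer ring buffer sketch. It is not AMD's actual register interface, just the same idea: the CPU only ever advances the write index, the GPU only ever advances the read index, and whichever side runs ahead has to wait.

Code:
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

struct CommandPacket { std::uint32_t words[16]; };

class RingBuffer {
    static constexpr std::size_t N = 256;
    std::array<CommandPacket, N> slots{};
    std::atomic<std::size_t> readIdx{0};   // advanced only by the consumer (the GPU)
    std::atomic<std::size_t> writeIdx{0};  // advanced only by the producer (the CPU)

public:
    // CPU side: returns false ("must wait") when the buffer is full,
    // i.e. the write index has caught up to the read index.
    bool tryWrite(const CommandPacket& pkt) {
        std::size_t w = writeIdx.load(std::memory_order_relaxed);
        if ((w + 1) % N == readIdx.load(std::memory_order_acquire))
            return false;                          // GPU is behind -- CPU waits
        slots[w] = pkt;
        writeIdx.store((w + 1) % N, std::memory_order_release);
        return true;
    }

    // GPU side: returns false when there is nothing new to fetch,
    // i.e. the read index has caught up to the write index.
    bool tryRead(CommandPacket& out) {
        std::size_t r = readIdx.load(std::memory_order_relaxed);
        if (r == writeIdx.load(std::memory_order_acquire))
            return false;                          // CPU is behind -- GPU waits
        out = slots[r];
        readIdx.store((r + 1) % N, std::memory_order_release);
        return true;
    }
};

In the real driver, that "waiting" is exactly what shows up as a CPU-limited or GPU-limited frame rate, depending on which side keeps coming up empty.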

When two computational resources must handshake, there is no guarantee they will always line up in time to complete the tasks at hand... and the only way to determine whether one is bottlenecking the other is to vary the workload on one and observe the output.

In this case, it is easiest to simply vary the load on the GPU by changing the resolution ... the CPU will complete its assigned workload in the same amount of time regardless of resolution, so it will either hand its result to a ready GPU or have to wait for the GPU to become ready ... so changing resolution boils down to a simple observation: does the FPS vary? If yes, then GPU limited; if the FPS does not vary, then CPU limited.

Now, the architectural efficiency and capability of the CPU is a different question altogether, but if one wants to compare the ability of a CPU to complete the CPU-specific gaming code, then you must observe it unhindered by the GPU ... i.e., at the lowest resolutions.

Now, in a follow-up post, I will demonstrate how this works with some other simple observations..... before I do, think carefully about the hypothesis...

If I run a game sequence at very low resolution and then again at very high resolution, should I see a change in CPU load, since it will need to wait on the GPU? Hmmmm, food for thought.

(Ohhhh.... I don't buy the latency argument for one second, at least not in terms of the ability to feed the command buffer; at most it might affect the CPU's own performance... here is why ... let's make this easy: 100 FPS. At this frame rate it takes the entire system 1/100 of a second to produce a frame, that's 0.01 seconds.... latency on the order we are discussing is a few hundred nanoseconds... compare that .... 0.01 seconds to, say, 200 nanoseconds or 0.0000002 seconds ... it is a blink, not even a blip in the grand scheme of the time spent in computation; it has no effect. In terms of Overclockers Club's data, they always run GPU limited in their games, from what I have read of their data sets -- as such, any CPU-related conclusion based on their gaming data is bunk, meaningless, take it all with a grain of salt ... my personal thought is that Intel's PCIe implementation sux, coupled with the fact that 2 or 3 frames out of 30 or 40 is almost in the noise).
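For completeness, the same back-of-the-envelope arithmetic in one small snippet, using the 100 FPS and 200 ns figures above:

Code:
#include <cstdio>

int main()
{
    double frameTime  = 1.0 / 100.0;   // 100 FPS -> 0.01 s per frame
    double memLatency = 200e-9;        // ~200 ns of memory latency
    double ratio      = frameTime / memLatency;
    std::printf("frame time / latency = %.0f\n", ratio);   // prints 50000
    return 0;                          // one latency hit is 1/50,000th of the frame budget
}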


    Jack
    Last edited by JumpingJack; 08-18-2008 at 10:33 PM.
One hundred years from now it won't matter
What kind of car I drove, what kind of house I lived in,
How much money I had in the bank, nor what my clothes looked like....
But the world may be a little better
Because I was important in the life of a child.
    -- from "Within My Power" by Forest Witcraft
