
Thread: Intel Q9450 vs Phenom 9850 - ATI HD3870 X2

  1. #301
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    178
    Quote Originally Posted by JumpingJack View Post
    .... , only in a few cases can you see this really be a problem
    Here is my theory

    And that case can happen in games too. So why are we focusing on high-resolution benchmarks? At low resolutions, when the prefetcher works well, the Core architecture will always give you a lot more FPS than an AMD chip, which is why you get higher average frames; there is no focus on the low FPS. You won't see the results of what I'll call the "latency hole" of a C2Q.

    First you have to know that a GPU-bound situation doesn't exclude a CPU-bound situation.

    So at high resolutions the graphics card acts like a frame limiter and the focus shifts to the low FPS; where there are more latency holes, frames drop much further than with a K10, and you might even get a better average score with the K10 because the higher peak FPS of a Core 2 Quad are simply cut off.

    That's what happened in the Overclockers Club review with World in Conflict: they used a graphics card that ran into a GPU-bound situation very early, and it showed the Phenom performing better in the CPU-bound situations. Using a faster graphics card just pushes that scenario out to a higher resolution.

    Again: that's not always the case, and in fact it's a very rare scenario, depending on the game and how it's written. Jack has provided us with very good data that shows this perfectly, and I really appreciate his work.

    And Jack, is there a program that limits the frame rate in software? Maybe you can check out my theory. Thanks a lot.
    Last edited by Boschwanza; 08-18-2008 at 07:36 AM.

  2. #302
    Xtreme Addict
    Join Date
    Mar 2007
    Location
    United Kingdom
    Posts
    1,597
    I believe the strange results showing the Phenom beating much faster QX9770 systems in the Overclockers Club review were caused by a rather annoying BIOS bug on the X38/X48 based boards. Basically, the PCI-E 2.0 controller sometimes does not allocate full bandwidth to a PCI-E 2.0 card.
    There is a thread on this forum here.
    Basically, the fixed BIOSes improve the performance of PCI-E 2.0 cards which use a lot of the bandwidth.
    Core 2 is much better than Phenom at the same clock, and most Core 2s are clocked higher than Phenoms, so this whole debate is a non-issue.
    John
    Stop looking at the walls, look out the window

  3. #303
    Xtreme Enthusiast
    Join Date
    Mar 2008
    Posts
    750
    Quote Originally Posted by JohnZS View Post
    I believe the strange results showing the Phenom beating much faster QX9770 systems in the Overclockers Club review were caused by a rather annoying BIOS bug on the X38/X48 based boards. Basically, the PCI-E 2.0 controller sometimes does not allocate full bandwidth to a PCI-E 2.0 card.
    There is a thread on this forum here.
    Basically, the fixed BIOSes improve the performance of PCI-E 2.0 cards which use a lot of the bandwidth.
    If this proves true, then... maybe there's more to test, Jack. Don't you have an X38 board?
    Motherboard: ASUS P5Q
    CPU: Intel Core 2 Quad Q9450 @ 3.20GHz (1.07v vCore! )
    RAM: 2GB Kingston HyperX 800MHz
    GPU: MSI Radeon HD 4870 @ 780/1000 (default)

  4. #304
    Xtreme Member
    Join Date
    Mar 2005
    Posts
    276
    Quote Originally Posted by gosh View Post
    Do you have any tests comparing AMD and Intel at high res and/or advanced settings using fast video cards?
    I will soon...with phase change no less...I'm going to do the Intel system with a QX9770 and the AMD with a 9950BE...both systems will be tested with a 9800GX2 and then 4870X2 Crossfire...I have no doubt that the Intel system will destroy the AMD...I will use my 42 inch 1080p TV, which has a resolution of 1920x1080...Maybe after that, I will borrow my brother's Apple 30 incher to test 2560x1600 as well.
    Ultra Aluminus/Silverstone SUGO 05/Thermaltake SPEDO
    Asus P6TD Deluxe/Zotac Mini-ITX GeForce 9300
    Core I3 3ghz/Core i7 920 D0
    TRUE Black/CM Hyper 212 Plus
    G.Skill 4GB PC8500/3GB Corsair DDR3-1866/6GB G.Skill DDR3-2000
    ATI HD4850 1GB/ATI HD4890 1GB/ATI HD5850
    Samsung 305T Plus
    Asus Xonar HDAV 1.3/Creative X-Fi PCI-E
    4 Intel X-25M 80GB SSD RAID-0/Areca 1231ML
    LG 8X Blu-Ray Burner/LG 10X Blu-Ray Burner/Sony Slimline Blu-Ray Optical
    Ultra X3 1600W/Ultra X3 1000W

  5. #305
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by ACE76 View Post
    I will soon...with phase change no less...I'm going to do the Intel system with a QX9770 and the AMD with a 9950BE...both systems will be tested with a 9800GX2 and then 4870X2 Crossfire...I have no doubt that the Intel system will destroy the AMD...I will use my 42 inch 1080p TV, which has a resolution of 1920x1080...Maybe after that, I will borrow my brother's Apple 30 incher to test 2560x1600 as well.
    woot, keep us updated on that.

  6. #306
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by demonkevy666 View Post
    334 > 200

    & not ?

    200 = 200

    gosh said you can't clock AMD's L3 cache; that's not true

    charged3800z24 and I have both gotten our NB to 2.4GHz, which is the L3 cache speed.
    =___________________________________________=
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  7. #307
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Boschwanza View Post
    Here is my theory

    And that case can happen in games too. So why are we focusing on high-resolution benchmarks? At low resolutions, when the prefetcher works well, the Core architecture will always give you a lot more FPS than an AMD chip, which is why you get higher average frames; there is no focus on the low FPS. You won't see the results of what I'll call the "latency hole" of a C2Q.

    First you have to know that a GPU-bound situation doesn't exclude a CPU-bound situation.

    So at high resolutions the graphics card acts like a frame limiter and the focus shifts to the low FPS; where there are more latency holes, frames drop much further than with a K10, and you might even get a better average score with the K10 because the higher peak FPS of a Core 2 Quad are simply cut off.

    That's what happened in the Overclockers Club review with World in Conflict: they used a graphics card that ran into a GPU-bound situation very early, and it showed the Phenom performing better in the CPU-bound situations. Using a faster graphics card just pushes that scenario out to a higher resolution.

    Again: that's not always the case, and in fact it's a very rare scenario, depending on the game and how it's written. Jack has provided us with very good data that shows this perfectly, and I really appreciate his work.

    And Jack, is there a program that limits the frame rate in software? Maybe you can check out my theory. Thanks a lot.
    This is where research and study pay off. I will respectfully disagree and provide a rationale for my reasoning. Please take the time to read; I am wordy here.... but I am using your post as a hook to provide more detail. I address your points at some point.

    Here is an example. First, a screen shot from the ATI demo toy shop:



    The image above was taken at 1600x1200 resolution. Next, the wireframes overlaid on the same scene, one at 1600x1200 and another at 1280x1024:


    [wireframe overlay at 1600x1200]


    Study these two scenes carefully ... look at the trash cans for example... look at the eaves along the walls, and the rain drops hitting the pavement. Take a game for example: increasing the resolution makes the overall scene look better in terms of sharpness, but it does not change the 'blockiness' of the characters, objects, or world. Changing the resolution did not change a) the number of vertices (and consequently the number of polygons) and b) the number of objects (such as rain drops, trashcans, etc).

    The toy shop scene is quite educational ... and I chose it because it was the quickest, easiest way I knew off the top of my head to generate a wireframe rendition (I knew at one time how to do it in games too)... nonetheless, increasing resolution does not change the 3D complexity of the vertices, but it does change that of the textures and the total number of pixels that make up the frame. The second point to note is that the total number of raindrops does not change, just their visual sharpness. Run the demo yourself (if you have an ATI card), watch it, study it... if you want to change the resolution, edit the sushi.ini file and change it there.

    This is a good example because the CPU is responsible for various things: creating objects, calculating AI, translating objects along their physics trajectories, colliding the rain drops with the pavement, etc. However, changing the resolution of a game does not change the load placed on the CPU: it will still calculate the same number of rain drops, and it will still calculate their respective positions in space regardless... same in a game, changing the resolution does not affect the particles generated (unless you change the particle and physics options -- I always run my CPU test benches with physics and particles at max) or the number of bad guys in the game (whose AI is CPU duty), etc.

    Thus, resolution is irrelevant to the CPU -- only the objects, physics, and trajectory in 3D space matter. What does matter to the CPU is the state of the GPU ... i.e. is the GPU ready to take the next chunk of commands.

    In fact AMD even did a presentation on this particular demo; the rain was the CPU limiter (due to so many drops, I suspect, coupled with the fact that they used a Pentium 4 3.2 GHz) http://ati.amd.com/developer/eurogra...onFestival.pdf. However, all things being the same, the rain drops in this case will be the same in number/density regardless of whether the resolution is 1024x768, 1280x1024, 1600x1200 or 1920x1200; the CPU will still carry the same calculation burden.

    Therefore, low resolution or high resolution, evaluating the CPU on the portion of gaming code specific to the CPU is not the issue. The nVidia article I linked was very clear about looking for bottlenecks ... change the parameters that affect the visual rendering, and if the FPS varies with that change, the bottleneck resides inside the GPU pipeline somewhere. Conversely, if it does not change, then the bottleneck resides on the host processor (in this case the CPU).
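    That test is mechanical enough to write down. Below is a minimal sketch of the decision rule just described, with made-up FPS numbers and a hypothetical tolerance; it is only meant to make the logic concrete, not to represent any particular benchmark.

```python
def classify_bottleneck(fps_low_res, fps_high_res, tolerance=0.05):
    """Vary only the GPU load (resolution): if FPS falls, the GPU pipeline
    is the limiter; if FPS stays flat, the host CPU is."""
    if fps_high_res < fps_low_res * (1.0 - tolerance):
        return "GPU-limited"
    return "CPU-limited"

# Illustrative numbers only:
print(classify_bottleneck(fps_low_res=120.0, fps_high_res=68.0))  # GPU-limited
print(classify_bottleneck(fps_low_res=75.0, fps_high_res=74.0))   # CPU-limited
```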

    As such, the ability of the CPU to complete its task comes down to the complexity of the code it is responsible for.... nothing more, nothing less.

    The GPU and CPU are two different processors providing two different processing functions, where one result depends upon the other.... the GPU cannot render its frame unless the CPU has finished providing the information for that frame; conversely, the CPU cannot send a frame of information to the GPU until the GPU has finished its prior obligation and is ready to receive that information.

    Last time I turned to nVidia's technical documentation.... so let's turn to AMD instead; they describe in detail how the CPU and GPU work together in this document: http://developer.amd.com/gpu_assets/...ation_v1.2.pdf
    Jump to section 4.2, Host Programming Model Description, where AMD (ATI) describes the CPU as the host controller. They discuss two methods by which the CPU interacts with the GPU: one is writing directly to the GPU command buffer, which is located in the VRAM on the graphics card; the other is to share a section of system memory (that both the CPU and GPU access). The GPU always reads, the CPU always writes -- it is a one-way street. Let's consider the latter, the pull model.

    The GPU sets the write pointer when it retrieves a section of the ring buffer; the CPU then reads the write pointer to know where to send the command packet; once sent, it sets the read pointer that the GPU will then use for the next block of commands, and so on. The CPU and GPU do this via the graphics driver, which rides beneath the DirectX API (the API is the universal language programmers use so they do not need to program to different architectures).

    Now follow what happens: if the CPU does not finish in time to provide the next read pointer to the GPU, what does the GPU do .. re-read the same block in the ring buffer? Nope, it waits until the correct pointer is set. Conversely, what happens if the GPU is late to the buffer and the CPU picks the same write pointer as before ... does it overwrite? Nope, it waits.... this is sort of the syncing thing that Gosh keeps going on about, except it has nothing to do with threading on the processor and everything to do with the processor completing its refresh of the frame correctly and on time.
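    To make that handshake concrete, here is a toy producer/consumer model of a command ring buffer. It is a generic simulation under my own simplifications, not ATI's actual protocol: the "CPU" thread can only write when a slot is free, the "GPU" thread can only read when commands are there, and whichever side gets ahead simply waits.

```python
import threading
from collections import deque

RING_SLOTS = 4
ring = deque()
cond = threading.Condition()

def cpu_thread(frames):
    for frame in range(frames):
        commands = f"draw commands for frame {frame}"  # stand-in for real command packets
        with cond:
            while len(ring) >= RING_SLOTS:  # GPU is behind: the CPU stalls and does no work
                cond.wait()
            ring.append(commands)
            cond.notify_all()

def gpu_thread(frames):
    for _ in range(frames):
        with cond:
            while not ring:                 # CPU is behind: now the GPU is the one waiting
                cond.wait()
            commands = ring.popleft()
            cond.notify_all()
        print("rendered:", commands)        # "render" outside the lock

producer = threading.Thread(target=cpu_thread, args=(8,))
consumer = threading.Thread(target=gpu_thread, args=(8,))
producer.start(); consumer.start()
producer.join(); consumer.join()
```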

    When two competing computational resources must handshake, there is no guarantee that they will always line up in time to complete the tasks at hand... and the only way to determine whether one is bottlenecking the other is to vary the workload on one and observe the output.

    In this case, it is easiest to simply vary the load on the GPU by changing the resolution ... the CPU will complete its assigned workload in the same amount of time regardless of resolution, so it will either hand its result to a ready GPU or have to wait for the GPU to become ready ... so changing resolution boils down to a simple observation: does the FPS vary? If yes, then it is GPU-limited; if the FPS does not vary, then it is CPU-limited.

    Now, the efficiency and capability of the CPU architecturally is a different question altogether, but if one wants to compare the ability of a CPU to complete the gaming code specific to the CPU, then you must observe it unhindered by the GPU ... i.e. at the lowest resolutions.

    Now, in a follow-up post, I will demonstrate how this works with some other simple observations..... before I do, think carefully about the hypothesis...

    If I run a game sequence at very low resolution and then again at very high resolution, should I see a load change on the CPU, since it will need to wait on the GPU? Hmmmm, food for thought.

    (Ohhhh.... I don't buy into the latency argument for one second, except insofar as it might affect the CPU's own performance .. not its ability to feed the command buffer... here is why ... let's make this easy: 100 FPS. At this frame rate it takes the entire system 1/100 of a second to produce a frame, that's 0.01 seconds.... latency of the order we are discussing is a few hundred nanoseconds... compare that .... 0.01 seconds to, say, 200 nanoseconds or 0.0000002 seconds ... it is a blink, not even a blip in the grand scheme of the time spent in computation; it has no effect. In terms of the Overclockers Club data, they always run GPU-limited in their games, from what I have read of their data sets -- as such, any CPU-related conclusion based on their gaming data is bunk, meaningless, take it all with a grain of salt ... my personal thought is that Intel's PCIe implementation sucks, coupled with the fact that 2 or 3 frames out of 30 or 40 is almost in the noise).
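    For the record, the ratio in that back-of-the-envelope comparison is easy to check; the numbers below are the same illustrative ones from the paragraph above (100 FPS, ~200 ns).

```python
frame_time_s = 1.0 / 100  # 0.01 s per frame at 100 FPS
latency_s = 200e-9        # ~200 ns, the order of magnitude under discussion

print(f"frame time: {frame_time_s:.6f} s")
print(f"latency:    {latency_s:.9f} s")
# one frame is ~50,000x longer than a single latency hit of this size
print(f"ratio:      {frame_time_s / latency_s:,.0f}x")
```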


    Jack
    Last edited by JumpingJack; 08-18-2008 at 10:33 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  8. #308
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Ok, to demonstrate that the CPU can indeed be waiting on the GPU I ran the following experiment.

    I ran the Lost Planet performance script in two regimes. The first was at 1920x1200 with everything maxed out and one GPU on the 4870 turned off (you can select this as an option in the Lost Planet setup screens); I then repeated the same run at 640x480 with everything turned down to minimum and both GPUs turned on. The former demonstrates the GPU-limited scenario and the latter the CPU-limited scenario... during the runs I used Everest's logging utility to log the total CPU utilization as well as each individual core.
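    (For anyone who wants to reproduce this without Everest, a rough stand-in is sketched below; it assumes Python with the psutil package is available and simply samples total and per-core utilization to a CSV while the benchmark runs. It is not the tool used for the plots here.)

```python
import csv
import time

import psutil  # assumption: psutil is installed; Everest was used for the actual runs


def log_cpu(outfile="cpu_util.csv", duration_s=120, interval_s=1.0):
    """Sample total and per-core CPU utilization once per interval and write a CSV."""
    cores = psutil.cpu_count(logical=True)
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "total_pct"] + [f"core{i}_pct" for i in range(cores)])
        start = time.time()
        while time.time() - start < duration_s:
            per_core = psutil.cpu_percent(interval=interval_s, percpu=True)
            total = round(sum(per_core) / cores, 1)
            writer.writerow([round(time.time() - start, 1), total] + per_core)


if __name__ == "__main__":
    log_cpu()  # start this, then launch the benchmark run
```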

    In the case of high res, max everything, the CPU should be waiting on the GPU on average, hence the total utilization should be lower compared to the low resolution case... these are the results:

    (PARDON the TYPO's I am not gonna regenerate the plot)


    For those wondering whether LP is a multithreaded game, here is the utilization core by core for the same run:
    Last edited by JumpingJack; 08-18-2008 at 10:35 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  9. #309
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    178
    Quick answer.. gotta go to work

    Jack, I never said high resolutions give a lot more CPU utilization than low resolutions (if I got you right). My intention was to make clear that an Intel Core CPU will always give you a lot more overall FPS at low res simply because of its strength when the L2 cache and prefetchers do their best work. And that those higher frames get cut off at high res just because the card acts like a frame limiter and shifts the focus onto the low FPS.

    To get away from resolution:

    Let's take a game where the sweet spot is around 60 FPS (min FPS around 30, max FPS around 90). The AMD will push the graphics card to around 60 FPS, and the Intel will go up to 90. Enable Vsync at 60 FPS and everything above 60 gets cut off, so the focus shifts to the low FPS. Run the game with different CPUs (or platforms) and let's see which one handles the critical situations better.
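    (On the earlier question of limiting frames in software: in its simplest form a frame cap is just a sleep inside the render loop, as in the hypothetical sketch below; real limiters hook the game's present call, but the effect on the measurement is the same -- fast frames are absorbed by the cap and only the slow ones remain visible.)

```python
import time


def run_capped(render_frame, cap_fps=60, frames=600):
    """Run a hypothetical render_frame() callable under a software frame cap."""
    budget = 1.0 / cap_fps
    for _ in range(frames):
        start = time.perf_counter()
        render_frame()                    # CPU + GPU work for one frame
        elapsed = time.perf_counter() - start
        if elapsed < budget:
            time.sleep(budget - elapsed)  # frames faster than the cap are cut off here
```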

  10. #310
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Boschwanza View Post
    Quick answer.. gotta go to work

    Jack, I never said high resolutions give a lot more CPU utilization than low resolutions (if I got you right). My intention was to make clear that an Intel Core CPU will always give you a lot more overall FPS at low res simply because of its strength when the L2 cache and prefetchers do their best work. And that those higher frames get cut off at high res just because the card acts like a frame limiter and shifts the focus onto the low FPS.

    To get away from resolution:

    Let's take a game where the sweet spot is around 60 FPS (min FPS around 30, max FPS around 90). The AMD will push the graphics card to around 60 FPS, and the Intel will go up to 90. Enable Vsync at 60 FPS and everything above 60 gets cut off, so the focus shifts to the low FPS. Run the game with different CPUs (or platforms) and let's see which one handles the critical situations better.
    No, I understood what you meant; I simply disagree -- resolution has nothing to do with it ... those prefetchers work just the same on the same code and load ... the CPU will crunch through the code as fast as it can unless it has to wait on the GPU; if that is the case, the CPU stalls -- waiting. Low resolutions simply allow you to observe the CPU performance free of any GPU bottlenecks.

    This is why you see the utilization drop for the high res case ... the CPU is simply waiting for the GPU most of the time on average.

    Nonetheless, your response was a well thought out post and gave me a chance to bounce off some details.

    EDIT: Your second point is a good experiment: check the CPU utilization on the Phenom vs the Intel quad by artificially capping the FPS ... interesting idea... not sure if we will see a good signal, but might as well check nonetheless.

    EDIT2: Completely unrelated -- but I am extra disappointed by the performance of the 4870 X2 in Lost Planet ... (it rocks on most everything else), but I took out my favorite gaming bench to study this stuff.... I read through AMD's 8.7 release notes, where they claim a 1.7x improvement in LP for x-fire, but this is not a good 4870 X2 ready driver... need to wait for 8.8.

    jack
    Last edited by JumpingJack; 08-18-2008 at 10:44 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  11. #311
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    http://http.download.nvidia.com/deve...orialNotes.pdf

    This is probably the best paper to read (much of it a rehash of a link posted earlier) ... it goes through the history of the GPU as it developed and the slow migration of rendering calculations off the CPU (early 1990s) onto the GPU (2004).

    It also goes into detail on the relative functions of each computational resource, and makes very pointed remarks about avoiding access to GPU resources during rendering (i.e. the CPU should wait until the GPU is done) ....

    As power and programmability increase in modern
    GPUs, so does the complexity of extracting every bit
    of performance out of the machine. Whether your
    goal is to improve the performance of a slow
    application, or look for areas where you can improve
    image quality “for free”, a deep understanding of the
    inner workings of the graphics pipeline is required.
    As the GPU pipeline continues to evolve in the
    coming years, the fundamental ideas of optimization
    will still apply: first identify the bottleneck, by varying
    the load or computational power of each of the units;
    then systematically attack those bottlenecks with an
    understanding of the behavior of the various units in
    the pipeline.
    This was written by an nVidia engineer. Interestingly, even nVidia suggests varying the resolution and texture aliasing to identify bottlenecks.

    jack
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  12. #312
    Xtreme Enthusiast
    Join Date
    Dec 2003
    Posts
    510
    Quote Originally Posted by Boschwanza View Post
    Give me an Intel rig and I sure will. And again, the FSB does not limit things bandwidth-wise; it all depends on the prefetcher and how well it's actually working.
    The benchmark doesn't appear to be limited by the FSB, memory latency or cache size since the 2.4GHz/2MB L2/800MHz FSB E4600 beats the 2.33GHz/4M L2/1333MHz FSB E6550.

    "Wrong" Data in the Cache (simplyfied) -> accessing memory -> high latency -> bad coherency -> bad performance
    This isn't a prefetcher issue, it's an issue within the core itself. The Core 2 has some problems with data that isn't aligned to a nice power of 2 value, causing it to take far longer to load than if the data was aligned. Even if the Core 2 had an IMC, it wouldn't help at all.
    Last edited by accord99; 08-19-2008 at 12:10 AM.

  13. #313
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    178
    Quote Originally Posted by accord99 View Post
    The benchmark doesn't appear to be limited by the FSB, memory latency or cache size since the 2.4GHz/2MB L2/800MHz FSB E4600 beats the 2.33GHz/4M L2/1333MHz FSB E6550.
    I never said that. Again: the FSB's physical latency always stays the same for an Intel Core, no matter how large the cache or how fast the FSB is. The traces soldered onto the board impose a fixed latency which the FSB protocol has to respect. That's the weak point, in my view. If the prefetcher catches the wrong data, the whole cache (no matter what size) is polluted with useless data and a very, very long access to memory (through all the PCB layers) becomes necessary. In that case the great advantage of the Core architecture (getting the data fast and very close to the processing unit) is turned upside down.

    Jack, I just made a quick test with CoH and a 1950 Pro; unfortunately I cannot go higher with the resolution because of my 19" TFT.


    The first shows resolution 1280x1024, CPU settings max, GPU settings max.
    The second shows resolution 800x600, CPU settings max, GPU settings low.




    I have a lot more action on core 3 with the higher GPU settings; I'm confused. Maybe you can check this out. Besides, can you give me a script or program which records the CPU utilization?

    I'll try to get more data later.
    Last edited by Boschwanza; 08-19-2008 at 04:21 AM.

  14. #314
    One-Eyed Killing Machine
    Join Date
    Sep 2006
    Location
    Inside a pot
    Posts
    6,340
    Jack, I'm inclined to believe that your thoughts about Intel's PCI-Express 2.0 controller are correct.
    When I did the testing for the Asus Striker II Extreme ( nFORCE 790i ) review I found out something weird.
    In GPU-limited scenarios, the Striker II Extreme was getting higher framerates ( minimum, average and maximum ), about 2fps higher than on the Intel chipsets ( P35, P45, X48 ) [ all tests done with a single card ].
    Then I turned the tables and set up some CPU-limited settings in a few games and saw the Striker lagging behind ( just a tad ), which is "reasonable" since, clock-for-clock, the nVIDIA chipsets have computationally been slower than Intel's ( however the nFORCE 790i did a good job of closing this gap ).
    So... in GPU-limited tests ( where the PCI-Express 2.0 controller can become a factor ) the nFORCE 790i proved to be faster than Intel ( ~2fps advantage ), while
    in CPU-limited tests ( the PCI-Express 2.0 controller not playing a part due to low GPU usage, low load, etc ) the Intel chipsets were faster than the nFORCE 790i.

    All tests were done with the PCI-Express frequency at 100MHz on all boards.
    Coding 24/7... Limited forums/PMs time.

    -Justice isn't blind, Justice is ashamed.

    Many thanks to: Sue Wu, Yiwen Lin, Steven Kuo, Crystal Chen, Vivian Lien, Joe Chan, Sascha Krohn, Joe James, Dan Snyder, Amy Deng, Jack Peterson, Hank Peng, Mafalda Cogliani, Olivia Lee, Marta Piccoli, Mike Clements, Alex Ruedinger, Oliver Baltuch, Korinna Dieck, Steffen Eisentein, Francois Piednoel, Tanja Markovic, Cyril Pelupessy (R.I.P. ), Juan J. Guerrero

  15. #315
    Xtreme Addict
    Join Date
    May 2006
    Location
    Colorado Springs
    Posts
    1,173
    dang, this thread has gone on a lot longer than I thought

    good info JumpingJack

  16. #316
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    So all that "AMD is better at high res" has nothing to do with the CPUs themselves, but rather with how the different vendors implement the PCIe 2.0 bus/controller.

  17. #317
    Xtreme Addict
    Join Date
    Mar 2007
    Location
    United Kingdom
    Posts
    1,597
    Quote Originally Posted by Hornet331 View Post
    So all that "AMD is better at high res" has nothing to do with the CPUs themselves, but rather with how the different vendors implement the PCIe 2.0 bus/controller.
    Which has been addressed in BIOS updates for X38/X48 boards, as discussed here (the BIOSes discussed in that thread are betas fresh from ASUS and in most cases are not released yet, but will be soon; there are links to the BIOSes in the thread).
    John
    Stop looking at the walls, look out the window

  18. #318
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    178
    I have done the CoH test again, recording the utilization with Everest.

    Phenom 2.5 GHz / ATI 1950 Pro / 2 GB RAM @ 1000 MHz



    So resolution changes essentially nothing in terms of CPU utilization, at least for CoH:OF. I think the small differences are within measurement tolerance. I will try Call of Duty next.

    Call of Duty 4


    I have chosen the mission where you are heading to the ship in the ocean, because it's heavily scripted.



    Call of Duty behaves like Jack says: CPU utilization decreases as the resolution goes up. There is something odd, though: during the intro sequence to the mission at 640x480 there is almost no utilization (7%), but if you switch to 1280x1024 it suddenly jumps to about 25%. I re-ran this test a couple of times and got exactly the same results in every run.

    World in Conflict next

    World in Conflict



    This is quite interesting: during the atomic bomb attack and the bomber raid the CPU utilization increases.

    Edit:

    Just played around a bit with WiC, comparing "very high" settings at 1280x1024 and 800x600.



    Well, it's hard to draw an exact conclusion. During the incoming bombers I gained the most FPS at 800x600 compared to 1280x1024, so in this particular case Jack is right. But when the real action starts (atomic bomb & bombing raid) the utilization somehow increases, so there has to be more calculation going on. I've got to think about this "issue". Good night for now.
    Last edited by Boschwanza; 08-19-2008 at 05:01 PM.

  19. #319
    Xtreme Enthusiast
    Join Date
    May 2008
    Posts
    612
    Quote Originally Posted by JumpingJack View Post
    Changing the resolution did not change a) the number of vertices (and consequently the number of polygons) and b) the number of objects (such as rain drops, trashcans, etc).
    I must admit that I didn't know games were done like that. I thought it was possible to set the viewable area, that it wasn't static, and that the area increased a bit at higher resolutions (I don't do much gaming..). If the area used for painting objects doesn't increase with higher resolution, then the burden on the processor obviously isn't changed (only more detail will add work, but not that much I think). These game programmers may not be as good as you think they are, because that was a bit off. That would also mean the game might not look as good on a 30" monitor as at lower resolutions if the textures etc. are stretched, and of course it looks worse at low res.

    About the FSB and speed: what I have noticed is that it doesn't change much for long streams of data. But if the same amount of data is sent in small pieces and in different directions, then it gets slower. That could be one explanation for why performance doesn't change much when modifying the FSB clock, or why it depends on how the transfer is done. If the viewable area is the same, then it would be easier to control the data sent to the GPU, and that could make it easier to send the whole wireframe (all points in 3D space) in one transaction (or a few transactions), and then send the commands needed to describe how to paint it.

    Is it possible to test Race Driver Grid?
    Last edited by gosh; 08-19-2008 at 04:58 PM.

  20. #320
    Xtreme Enthusiast
    Join Date
    May 2008
    Posts
    612
    Quote Originally Posted by RunawayPrisoner View Post
    Uhm... basically, I'm wondering why, on the QX9650, there was always 2% of the time where the fps was between 25 and 40. If you look at the screenshots again, on the QX9650, although the fps is skyrocketing, 2% is always between 25 and 40. On the Phenom system, once overclocked to 3GHz, it's always over 40, and the minimum was 45.
    Is FEAR single threaded?

    Testing a game that uses more threads (more memory, more detail) will probably show whether AMD slows down less.

  21. #321
    Xtreme Member
    Join Date
    Jun 2007
    Location
    Wayne NJ
    Posts
    206
    Jack

    wow long post.

    A couple of questions....

    Using your example scenario where the CPU writes a data pointer to memory and the GPU takes this data to process it further.
    Assume the GPU is slower at getting the data from memory than the CPU is at writing it.

    You said:
    Conversely, what happens if the GPU is late to the buffer and the CPU picks the same write pointer as before ... does it overwrite? Nope, it waits....


    Doesn't this mean that while the CPU waits, the particular task which writes the data to memory is idle during that short period of time?
    Wouldn't that mean that overall, in such a case, you should see a slight drop in CPU utilization?

    You said:
    If I run a game sequence at very low resolution then again at very high resolution, should I see a load change on the CPU since it will need to wait on the GPU? Hmmmm food for thought.

    I guess yes; see above.

    Some other thoughts:
    CPU-limited in this case would probably mean that the CPU calculations are not producing results fast enough for the task which writes the result into memory.

    Also, don't underestimate that changing the resolution on the graphics card, as well as the detail level, does affect GPU performance. So in the worst case you could drift from a GPU-bound to a CPU-bound case, e.g. the GPU is just a little quicker at reading the CPU's data while it operates at low resolution.
    When the resolution is suddenly set high, the GPU cannot grab the new data from memory as quickly as before because it is suddenly framebuffer-bandwidth limited, and now the CPU provides the data quicker than the GPU can process it.
    --------------------------------------------------
    AMD Phenom II 1090T @ 4GHz Asus Crosshair IV
    HD6970
    LSI Megaraid 9260-4i 4xMomentus XT
    OCZ Vertex 3@SB850
    8 gig Patriot Viper 7-7-7-24 T1
    Swiftech Watercooling
    Filco Majestouch 2
    Zowie EC1
    --------------------------------------------------

  22. #322
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by gosh View Post
    Is FEAR single threaded?

    Testing a game that uses more threads (more memory, more detail) will probably show whether AMD slows down less.
    Yes, FEAR is single threaded. I will test it for you.


    jack
    Last edited by JumpingJack; 08-19-2008 at 05:02 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  23. #323
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Boschwanza View Post
    I have done the CoH test again, recording the utilization with Everest.

    Phenom 2.5 GHz / ATI 1950 Pro / 2 GB RAM @ 1000 MHz



    So resolution changes essentially nothing in terms of CPU utilization, at least for CoH:OF. I think the small differences are within measurement tolerance. I will try Call of Duty next.

    Call of Duty 4


    I have chosen the mission where you are heading to the ship in the ocean, because it's heavily scripted.



    Call of Duty behaves like Jack says: CPU utilization decreases as the resolution goes up. There is something odd, though: during the intro sequence to the mission at 640x480 there is almost no utilization (7%), but if you switch to 1280x1024 it suddenly jumps to about 25%. I re-ran this test a couple of times and got exactly the same results in every run.

    World in Conflict next

    World in Conflict



    This is quite interesting: during the atomic bomb attack and the bomber raid the CPU utilization increases.

    Edit:

    Just played around a bit with WiC, comparing "very high" settings at 1280x1024 and 800x600.



    Well, it's hard to draw an exact conclusion. During the incoming bombers I gained the most FPS at 800x600 compared to 1280x1024, so in this particular case Jack is right. But when the real action starts (atomic bomb & bombing raid) the utilization somehow increases, so there has to be more calculation going on. I've got to think about this "issue". Good night for now.
    First, and foremost, thanks for taking the time to do this... data is always, always welcome ....

    Yeah, there are cases in your runs where it is evident. At these lower resolutions it is harder to see any effect... though the 1950 should be throttling more than I would have suspected...

    I will repeat the very gross extremes I did with the 4870 X2 on the Phenom and publish those as well.

    Hey -- how are you benching COD4?

    EDIT: Also, I just tested WiC -- and am getting interesting results -- on CPU utilization. Did you use the presets to set high vs low?
    Last edited by JumpingJack; 08-19-2008 at 07:33 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  24. #324
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by xPliziT View Post
    Jack

    wow long post.

    A couple of questions....

    Using your example scenario where the CPU writes a data pointer to memory and the GPU takes this data to process it further.
    Assume the GPU is slower at getting the data from memory than the CPU is at writing it.

    You said:
    Conversely, what happens if the GPU is late to the buffer and the CPU picks the same write pointer as before ... does it overwrite? Nope, it waits....


    Doesn't this mean that while the CPU waits, the particular task which writes the data to memory is idle during that short period of time?
    Wouldn't that mean that overall, in such a case, you should see a slight drop in CPU utilization?
    The answer is yes, which is why I forced the extreme situations in the Lost Planet case. As I argued above, increasing the resolution is a burden on the GPU, not the CPU. As such, the scenario is that if the CPU waits, it stalls and does nothing; this should translate into lower CPU utilization ... but you have to push it. Some of the data posted by Boschwanza shows the same effect, in other cases not .. but he correctly points out that it may be due to the limitation he has on the overall resolution.



    CPU-limited in this case would probably mean that the CPU calculations are not producing results fast enough for the task which writes the result into memory.

    Also, don't underestimate that changing the resolution on the graphics card, as well as the detail level, does affect GPU performance. So in the worst case you could drift from a GPU-bound to a CPU-bound case, e.g. the GPU is just a little quicker at reading the CPU's data while it operates at low resolution.
    When the resolution is suddenly set high, the GPU cannot grab the new data from memory as quickly as before because it is suddenly framebuffer-bandwidth limited, and now the CPU provides the data quicker than the GPU can process it.
    Yeah... the PDF from nVidia is a good layman's outline of how the GPU and CPU subsystems work... in their bottlenecking flow chart they pretty much enumerate which parameters affect the GPU loading.

    Workload affecting the GPU
    - Resolution
    - Aliasing
    - Filtering
    - Vertex size
    - Vertex code/instructions

    The workloads that affect the CPU are not enumerated, but you can gather them from other resources, even the link Gosh posted about how the WiC development went...

    Things that change the workload on the CPU
    - Objects (bad guys, rain drops)
    - Physics (particles)
    - Character AIs

    I have other papers; I will see if I can link up publicly available PDFs of them. Much of the stuff I get is not available online, or if it is, it requires a subscription...

    Here is one: http://www.digra.org:8080/Plone/dl/db/06278.34239.pdf

    The general flow of a gaming program goes as follows:

    1. Start of the frame
    2. Poll the input devices
    3. Calculate the physics for objects
    4. Calculate the AI
    5. Sound decomposition
    6. Write frame information
    7. Repeat until exit

    Each of these steps can be split into separate threads depending on the circumstance. For example, calculating the 3D translation of a falling rock can be done independently of establishing the next AI move for a bad guy. Single-threaded, each segment must be carried out in order, and all segments must be complete before the frame info is sent to the command buffer for rendering...

    Multithreading this can speed things up tremendously (see the sketch below)... this is why you see Lost Planet, for example, scale so well... UT3 scales to 3 cores very nicely; the 4th doesn't add much.
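    Here is a minimal sketch of that per-frame flow with the independent segments farmed out to worker threads. All of the function names are hypothetical stand-ins; the only point is that physics, AI and sound can overlap, while the frame submission still waits for every segment to finish.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-frame work items; each returns a placeholder result.
def poll_input():          return {"keys": []}
def update_physics(dt):    return "object positions"
def update_ai(dt):         return "AI decisions"
def mix_sound():           return "audio buffer"
def submit_frame(*parts):  pass  # stand-in for writing frame info to the command buffer

def run_frames(frames, dt=1.0 / 60):
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(frames):
            inputs = poll_input()                      # 2. poll the input devices
            physics = pool.submit(update_physics, dt)  # 3. physics on its own thread
            ai = pool.submit(update_ai, dt)            # 4. AI on its own thread
            sound = pool.submit(mix_sound)             # 5. sound on its own thread
            # 6. every segment must finish before the frame is handed over
            submit_frame(inputs, physics.result(), ai.result(), sound.result())

run_frames(3)
```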
    Last edited by JumpingJack; 08-19-2008 at 07:04 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  25. #325
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by gosh View Post
    I must admit that I didn't know games were done like that. I thought it was possible to set the viewable area, that it wasn't static, and that the area increased a bit at higher resolutions (I don't do much gaming..). If the area used for painting objects doesn't increase with higher resolution, then the burden on the processor obviously isn't changed (only more detail will add work, but not that much I think). These game programmers may not be as good as you think they are, because that was a bit off. That would also mean the game might not look as good on a 30" monitor as at lower resolutions if the textures etc. are stretched, and of course it looks worse at low res.

    About the FSB and speed: what I have noticed is that it doesn't change much for long streams of data. But if the same amount of data is sent in small pieces and in different directions, then it gets slower. That could be one explanation for why performance doesn't change much when modifying the FSB clock, or why it depends on how the transfer is done. If the viewable area is the same, then it would be easier to control the data sent to the GPU, and that could make it easier to send the whole wireframe (all points in 3D space) in one transaction (or a few transactions), and then send the commands needed to describe how to paint it.

    Is it possible to test Race Driver Grid?
    Gosh -- Race Driver Grid is a hard game (at the moment) to test for a few reasons. This is not to say I won't, it will just take me some time.

    First reason -- I just bought a retail copy, so I will want to use that instead of the demo I downloaded. It is more representative.

    Second reason -- RDG does not have a way to reproducibly recreate the 'action' needed for a good, repeatable run. I really wish they had put an 'instant replay save' feature into the game, or some way to record a lap around the track. So far, all reviewers are using FRAPS to evaluate RDG by lapping once around the track.

    Ok, so really two reasons... what I mean is that I will do it, but it will take a day or two to figure out a way to get a reproducible bench setup going for this game. At that point I will run a CPU-stressing experiment like the one above.

    On large packets vs. small packets --- this is true ... latency is the time from the initiation of a request for a packet to the actual receipt of that packet... like priming a pump: once the stream comes, it comes. Several small packets and requests accumulate more latency than a single large block of data in one request.
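    A rough illustration with assumed numbers (a ~200 ns per-request latency and an 8 GB/s link, neither taken from any measurement here): moving the same 64 KB as one request versus 64 separate 1 KB requests.

```python
latency_s = 200e-9    # assumed per-request latency (~200 ns)
bandwidth = 8e9       # assumed link bandwidth, 8 GB/s
total_bytes = 64 * 1024

one_big = latency_s + total_bytes / bandwidth
many_small = 64 * (latency_s + (total_bytes / 64) / bandwidth)

print(f"1 x 64 KB : {one_big * 1e6:.2f} us")     # ~8.4 us, latency paid once
print(f"64 x 1 KB : {many_small * 1e6:.2f} us")  # ~21.0 us, latency paid 64 times
```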


    jack
    Last edited by JumpingJack; 08-19-2008 at 07:00 PM.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft
