
Thread: Intel Q9450 vs Phenom 9850 - ATI HD3870 X2


  #11 | Xtreme Mentor | Join Date: Mar 2006 | Posts: 2,978
    Quote Originally Posted by gosh
    Question: Is there any information about the difference in latency when reading and writing data, comparing HyperTransport and the Front Side Bus?
    HyperTransport is designed to be a very fast point-to-point link (if I am right). If this communication is very fast, it could be one explanation why AMD is even with, or sometimes has slightly better numbers than, Intel in some tests of single-threaded games at very high detail and high resolution.
    If communication to external hardware is similar on AMD and Intel, then Intel should always win in single-threaded games when clocked the same, even if the main bottleneck is the GPU. A 6 MB L2 cache at 15 clocks that can be used by one core, compared to 512 KB of L2 cache on AMD, does make a huge difference (I think more than 10% in all single-threaded games) when processor performance is compared.
    Gosh ....

    You are a bit confused about how communication to and from the GPU works on the different platforms.

    Let's take getting a chunk of data from system memory to the GPU (which does happen, just not at the volume you think it does)....

    AMD's HyperTransport connects the chipset to the CPU, and the CPU to memory. Intel's layout connects memory to the chipset, and the chipset to the GPU. PCIe 2.0 is spec'ed to be its own bus master, as opposed to earlier implementations which used DMA. On Intel's platform, the GPU has direct access to a low-level window of system memory for various data, and the CPU simply writes the command buffer in GPU memory, since the GPU does not need to go through the front-side bus to get to that memory data in the first place. The GPU is only one hop away from memory on the Intel platform; it is two hops away on AMD's.
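    To picture the two paths just described (a simplified sketch, not a full block diagram):

        Intel:  GPU <-PCIe-> GMCH/northbridge (memory controller) <-> DRAM              (one hop)
        AMD:    GPU <-PCIe-> chipset <-HT-> CPU (on-die memory controller) <-> DRAM     (two hops)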

    In terms of the data the GPU actually pulls from system memory, it is in fact very little, not enough to saturate the FSB or HT. The CPU populates the command buffer on the GPU for the GPU to do its work; all the other large data elements are precached in GPU memory (hence the reason GPU card makers keep upping memory, to keep pace with the large textures of today's games).

    http://people.cs.uchicago.edu/~robis.../gpu_paper.pdf
    The GPU is able to make calls to a certain window of the system’s main memory and is responsible for loading the data it will operate on into its own memory. The CPU directs this activity by writing commands to a command buffer on the GPU. On the old fixed-function pipeline, these commands associated matrices with vertex data. Now, CPU commands could very well point to program code that will be fetched and executed on the GPU.
    This is essentially your second misunderstanding: you are thinking that all the data for a game event is stored in system memory. It is not. I provided you a link that showed the video memory usage per game; perhaps you did not realize it was reporting the memory on the video card and not system memory. All the heavy-duty data needed for rendering a level is loaded into the GPU's local memory first (textures, vertex data, etc.). This is why, when you start a game, it takes several seconds (20, 30, or even a few minutes) to load... it is transferring that data over the low-bandwidth bus (both HT and the FSB are low bandwidth compared to the memory bandwidth of a GPU).
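    To make the split concrete, here is a minimal OpenGL-style sketch of the pattern described above (names like loadLevelData/drawFrame and the data pointers are placeholders of mine; it assumes an OpenGL 1.5+ context and a loader such as GLEW is already set up). The heavy data crosses the bus once at load time; each frame the CPU only issues small commands.

        #include <GL/glew.h>

        GLuint vbo, tex;

        // Load time: push the big, static data across the bus ONCE into video RAM.
        void loadLevelData(const void* vertexData, GLsizeiptr vertexBytes,
                           const void* pixels, GLsizei texW, GLsizei texH)
        {
            glGenBuffers(1, &vbo);
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glBufferData(GL_ARRAY_BUFFER, vertexBytes, vertexData, GL_STATIC_DRAW); // geometry -> GPU memory

            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texW, texH, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, pixels);                         // textures -> GPU memory
        }

        // Per frame: only small commands go into the GPU's command buffer;
        // the GPU renders out of its own high-bandwidth local memory.
        // (Vertex attribute setup omitted for brevity.)
        void drawFrame(GLsizei vertexCount)
        {
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glBindTexture(GL_TEXTURE_2D, tex);
            glDrawArrays(GL_TRIANGLES, 0, vertexCount);
        }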

    Even nVidia describes this partition between main memory and GPU memory:

    http://http.developer.nvidia.com/GPU...gems_ch28.html

    The point is.... on an Intel platform the Graphics Memory Controller Hub (GMCH) gives the GPU one-hop access to memory, while AMD's arrangement makes it a two-hop access. If anything, Intel provides lower latency for the graphics card's access to main memory. In either case it is irrelevant, since all the texture and geometry data is loaded into video RAM (with its high-bandwidth interface) before run time.

    This is moot regardless, because the volume of data needed by the GPU from main memory is very small, since all the data that the GPU needs is placed in the Vertex, Texture, Mesh, and other buffers on the local GPU memory. The rendering for the scene is done by the GPU via commands written to the command buffer.

    If the bottleneck is the GPU, then AMD and Intel will tie, +/- a few FPS of measurement noise; single-threaded, multithreaded -- it does not matter. Again, Gosh ... moving to parallel computational methods, for gaming code or any code, simply produces the computational result faster than a single thread would. A single-task application will always speed up if you can run segments in parallel instead of in simple sequential execution. The trick, and challenge, of multithreading in gaming is the interdependency of the segments on one another. This is why you see some speed-up but not a 2x gain, for example, going from one thread to two. This is really nothing more than an example of Amdahl's Law.
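    A quick sketch of Amdahl's Law (the 60% fraction below is a made-up number, just to show why two threads do not give 2x):

        #include <cstdio>

        // Amdahl's Law: if only a fraction p of the work can be parallelized,
        // the best overall speedup on n threads is 1 / ((1 - p) + p / n).
        double amdahl_speedup(double p, int n)
        {
            return 1.0 / ((1.0 - p) + p / n);
        }

        int main()
        {
            // Hypothetical: 60% of a frame's CPU work parallelizes cleanly.
            std::printf("2 threads: %.2fx\n", amdahl_speedup(0.60, 2)); // ~1.43x, not 2x
            std::printf("4 threads: %.2fx\n", amdahl_speedup(0.60, 4)); // ~1.82x
            return 0;
        }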

    Intel produces a computational result (clock for clock) faster than AMD, and as such, the CPU-dependent code will finish faster whether single- or multithreaded; hence Intel will be faster in games.

    You are being fooled and misled by forum posters who run their tests up to the GPU limit; you then draw the incorrect conclusion that the result is somehow a single-threaded vs. multithreaded effect. This happens all the time....

    At low resolutions, Intel wins by 20, 30, even up to 50% clock for clock in gaming, but at high resolutions they appear tied... this is, again, due to the GPU bottlenecking the computational flow.

    nVidia states it in their own words:
    http://http.download.nvidia.com/deve...erformance.pdf

    They show you a bottleneck flow chart that does exactly what we have been telling you: to find a GPU bottleneck, vary the resolution. If the FPS varies, the limiter is the GPU or some component within the GPU pipeline; if not, it is the CPU (see the flow chart in that PDF).


    Varying or increasing the graphically important parameters in a game changes the computational workload on the GPU (NOT THE CPU). This is why, when one wants to assess the computational capability of a CPU on gaming code, it is important to observe the CPU as the limiter (i.e. low resolutions) in order to make a statement about how well the CPU handles the parts of the game's code that require the CPU (i.e. non-graphical code such as physics, AI, boundary collisions, etc.).
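    That resolution-scaling test boils down to something like this (a toy sketch; the FPS numbers, the likely_bottleneck name, and the 5% threshold are all made up for illustration):

        #include <cstdio>

        // Run the same timedemo at a low and a high resolution and compare FPS.
        // If the frame rate drops noticeably as resolution rises, the GPU pipeline
        // is the limiter; if it barely moves, the CPU is.
        const char* likely_bottleneck(double fps_low_res, double fps_high_res)
        {
            double drop = (fps_low_res - fps_high_res) / fps_low_res;
            return (drop > 0.05) ? "GPU-limited" : "CPU-limited";
        }

        int main()
        {
            std::printf("%s\n", likely_bottleneck(180.0, 95.0));  // GPU-limited
            std::printf("%s\n", likely_bottleneck(120.0, 118.0)); // CPU-limited
            return 0;
        }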

    Let's go back to your latency question.... it is MOOT; even if the FSB latency were 3x longer, it would not make a difference.

    At 200 frames per second, the GPU is busy rendering each frame for ~1/200 of a second, or 0.005 seconds. This is 5 milliseconds, or 5,000 microseconds, or 5,000,000 nanoseconds. A latency of even 200 nanoseconds is a wink compared to the time the GPU spends in its calculation, even at a very high frame rate.
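    The arithmetic, spelled out (the 200 ns figure is just an assumed worst-case memory latency for the comparison):

        #include <cstdio>

        int main()
        {
            // One frame at 200 FPS: 1/200 s = 5 ms = 5,000,000 ns.
            double frame_ns   = 1e9 / 200.0;
            double latency_ns = 200.0; // assumed pessimistic memory-access latency

            std::printf("frame time:        %.0f ns\n", frame_ns);
            std::printf("latency per frame: %.4f%% of the frame\n",
                        100.0 * latency_ns / frame_ns); // ~0.004%
            return 0;
        }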

    The more interesting question to ask is: what architectural feature of the Core uArch allows Intel to perform so much better at executing gaming code vs. AMD's solution?

    Jack
    Last edited by JumpingJack; 08-17-2008 at 08:57 AM.
    One hundred years from now it won't matter what kind of car I drove, what kind of house I lived in, how much money I had in the bank, nor what my clothes looked like.... But the world may be a little better because I was important in the life of a child.
    -- from "Within My Power" by Forest Witcraft
