No I don't!
I have tried to explain this in the thread.
Here are some of AMD's strong points (as I see them) in gaming and why they will show in more complex games. I am NOT a game programmer (that type of programming is very boring to me), but I have some knowledge of DirectX (very little, so I might be wrong here).
If a game has live graphics (faces that can show feelings, wind that can move trees, etc.) then the processor needs to calculate the picture. This of course increases the burden on the processor, but the cache used for that type of calculation is probably similar on AMD and Intel. The processor calculates points in a 3D space and packs them into memory blocks. When the points for one block have been calculated, the block is transferred to the video card. This is done by allocating video memory (I think memory on the video card is somehow mapped into the computer's address space); there are special commands for this. When the block has been allocated, the calculated points are copied to the video card, maybe using memcpy in C++. When the points have been copied, some sort of command is issued that works on those points. Video card communication has high latency, so there can't be too many requests between the video card and the processor; the bandwidth is high, though, so if you can pack more data into each block you will gain speed.
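Roughly like this (a minimal sketch from me, assuming Direct3D 9; the vertex format, buffer size and function name are invented for the example, and error checking is skipped):

[code]
// Sketch: copy CPU-calculated points into video memory (Direct3D 9).
// 'device' is assumed to be a valid IDirect3DDevice9*; the vertex
// layout is made up for the example.
#include <d3d9.h>
#include <cstring>

struct Vertex { float x, y, z; DWORD color; };

void SendPointsToGpu(IDirect3DDevice9* device,
                     const Vertex* points, UINT count)
{
    IDirect3DVertexBuffer9* vb = NULL;

    // Allocate a block of video memory for the calculated points.
    device->CreateVertexBuffer(count * sizeof(Vertex), D3DUSAGE_WRITEONLY,
                               D3DFVF_XYZ | D3DFVF_DIFFUSE,
                               D3DPOOL_DEFAULT, &vb, NULL);

    // Map the block into the CPU's address space...
    void* dst = NULL;
    vb->Lock(0, count * sizeof(Vertex), &dst, 0);

    // ...and copy the calculated points into it (the memcpy I mentioned).
    memcpy(dst, points, count * sizeof(Vertex));
    vb->Unlock();

    // Then a command that works on those points.
    device->SetStreamSource(0, vb, 0, sizeof(Vertex));
    device->SetFVF(D3DFVF_XYZ | D3DFVF_DIFFUSE);
    device->DrawPrimitive(D3DPT_POINTLIST, 0, count);

    vb->Release();
}
[/code]

The Lock/memcpy/Unlock part is the copy into video memory I was talking about, and DrawPrimitive is the command that then works on those points.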
AMD uses HyperTransport for GPU data. If this copying of memory travels over HyperTransport, these blocks will not hold back any memory transfers, because memory transfers are done through the IMC. On Intel, all this copying of points to the GPU has to go through the FSB, and that can sometimes block memory transfers so latency goes up. If one thread needs that memory it will be slower, and if another thread is waiting for that thread, it will also be slower. In the worst case it's like a chain reaction.
In single-threaded applications all operations are done in sequence, so there are no conflicts; the advantage of having a separate passage for one type of traffic doesn't exist there. In multithreaded applications one thread can be used for sending data while other threads prepare buffers for the sender thread, so the work is done in parallel. If those threads are using memory while the sender thread is sending data, there is a conflict on Intel, and that delays operations.
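Something like this pattern (a minimal sketch in plain C++ with standard threads; all names and sizes are mine, not from any real game):

[code]
// Sketch: one sender thread transmits finished buffers while worker
// threads prepare the next ones in parallel.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<std::vector<float>> ready;   // buffers waiting to be sent
std::mutex m;
std::condition_variable cv;
bool done = false;

void Worker(int id)
{
    // Prepare a buffer (in a real game: calculate points for one block).
    std::vector<float> buf(1024, float(id));
    {
        std::lock_guard<std::mutex> lock(m);
        ready.push(std::move(buf));
    }
    cv.notify_one();
}

void Sender()
{
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !ready.empty() || done; });
        if (ready.empty()) return;      // done and nothing left to send
        std::vector<float> buf = std::move(ready.front());
        ready.pop();
        lock.unlock();
        // Send 'buf' to the video card here. Meanwhile the workers keep
        // using memory to prepare new buffers; on a shared FSB those
        // memory accesses and this transfer compete for the same bus.
    }
}

int main()
{
    std::thread sender(Sender);
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) workers.emplace_back(Worker, i);
    for (auto& w : workers) w.join();
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_one();
    sender.join();
}
[/code]

With an IMC plus HyperTransport the workers' memory traffic and the sender's GPU traffic take different paths; on a shared FSB they collide.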
I think live graphics need to send much more data to the video card because the picture has to be calculated. In Race Driver: GRID, for example, if there is a crash with smoke you can see that the processor works harder, and probably more data is being sent between the video card and the processor. If parts of the picture don't change, then probably not as much data is transferred.
In complex games (live graphics), when the resolution goes up, the game will probably use more memory (more data needs to be calculated). That makes the processor fetch more data from memory instead of finding it in the cache (the cache on Intel is HUGE, so it might take a very high resolution and a complex picture before this matters). If threads need to communicate more (smaller threads), that traffic also adds to the burden on the FSB for the C2Q. And if memory transfers to the video card are running at the same time, synchronization will have higher latency. On AMD, threads talk to each other through the L3 cache.
When and how different system designs come out ahead depends on the game, of course. But complex games that calculate much of the picture and use more than one thread to do it will let the AMD system design show some of its advantages. Increasing thread counts will shift things further in AMD's favour.
If you compare only processor speed, then Intel is faster because of the cache. But if AMD and Intel get the SAME FPS, then it is something OTHER than the processor that makes the FPS equal, and the answer to that can only lie in what differs between AMD and Intel.
If you compare these designs you will also see that more bottlenecks exist on Intel. On raw processor power Intel wins, but if the game does something special, or there are big changes and a lot of data needs to be processed or recalculated, then that takes more time on Intel.
EDIT: About mapped memory and HyperTransport
http://www.amd.com/us-en/assets/cont...docs/40546.pdf
Appendix B
AMD Family 10h processors support four write-combining buffers. Although the number of buffers available for write combining depends on the specific CPU revision, current designs provide as many as four write buffers for WC memory mapped I/O address spaces. These same buffers are used for streaming store instructions. The number of write-buffers determines how many independent linear 64-byte streams of WC data the CPU can simultaneously buffer.
Having multiple write-combining buffers that can combine independent WC streams has implications on data throughput rates (bandwidth), especially when data is written by the CPU to WC memory mapped I/O devices, residing on the AGP, PCI, PCI-X® and PCI Express® buses including:
• Memory Mapped I/O registers—command FIFO, etc.
• Memory Mapped I/O apertures—windows to which the CPU uses programmed I/O to send data to a hardware device
• Sequential block of 2D/3D graphic engine registers written using programmed I/O
• Video memory residing on the graphics accelerator—frame buffer, render buffers, textures, etc.
HyperTransport™ Tunnels and Write Chaining
HyperTransport™ tunnels are HyperTransport-to-bus bridges. Many HyperTransport tunnels use a hardware optimization feature called write-chaining. In write-chaining, the tunnel device buffers and combines separate HyperTransport packets of data sent by the CPU, creating one large burst on the underlying bus when the data is received by the tunnel in sequential address order. Using larger bursts results in better throughput since bus efficiency is increased.
[...]
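To show what those streaming store instructions look like in practice, here is a minimal sketch from me using SSE2 intrinsics (the WC-mapped destination, the 16-byte alignment and the size being a multiple of 64 are all assumptions for the example):

[code]
// Sketch: a 64-byte-at-a-time copy using non-temporal (streaming)
// stores, the kind that goes through the write-combining buffers.
// 'dst' is assumed to point at a WC memory mapped I/O aperture;
// both pointers are assumed 16-byte aligned, 'bytes' a multiple of 64.
#include <emmintrin.h>
#include <cstddef>

void StreamToWc(void* dst, const void* src, size_t bytes)
{
    char* d = static_cast<char*>(dst);
    const char* s = static_cast<const char*>(src);

    for (size_t i = 0; i < bytes; i += 64) {
        // Four 16-byte streaming stores fill one 64-byte WC buffer
        // line, which the hardware can then send out as one burst.
        __m128i a = _mm_load_si128(reinterpret_cast<const __m128i*>(s + i));
        __m128i b = _mm_load_si128(reinterpret_cast<const __m128i*>(s + i + 16));
        __m128i c = _mm_load_si128(reinterpret_cast<const __m128i*>(s + i + 32));
        __m128i e = _mm_load_si128(reinterpret_cast<const __m128i*>(s + i + 48));
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + i), a);
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + i + 16), b);
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + i + 32), c);
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + i + 48), e);
    }
    _mm_sfence();   // make the streamed data globally visible
}
[/code]

Each group of four 16-byte streaming stores fills one of the 64-byte write-combining buffers the document describes, and write-chaining in the HyperTransport tunnel can then combine those packets into one large burst on the underlying bus.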