I also want to draw your attention to a potential AMD/ATI killer -- MOVNTDQA, which arrives with Penryn.
That instruction enables extremely fast reads from memory-mapped I/O space that is mapped as USWC (uncacheable speculative write combining), as video memory typically is.
What does that mean? Well, it means that instead of, say, 800 MB/sec readback from a video card you will get 7,000 MB/sec, which is close to a 9x speedup. That was measured with two threads and a 1066 MHz FSB, and it is very close to the theoretical peak of 8.5 GB/sec for that FSB speed (1066 million transfers/sec x 8 bytes per transfer).
This will IMO have a great (positive) impact on GPGPU applications. The main obstacle in GPGPU today is that moving data to and from the GPU is slow, and readback is especially slow. MOVNTDQA should change that once GPU vendors optimize their drivers to use it.
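To make the idea concrete, here is a minimal sketch of what a readback loop built on MOVNTDQA might look like, using the `_mm_stream_load_si128` intrinsic from SSE4.1. The function name `copy_from_uswc`, the 64-byte granularity and the alignment requirements on the destination are my own assumptions for illustration, not anything taken from a real driver. As far as I understand Intel's guidance, the streaming load works on a whole 64-byte line at a time, so the loop reads all four 16-byte chunks of a line back to back before storing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>  /* declares _mm_stream_load_si128 (SSE4.1) */

/* Hypothetical helper: copy `bytes` (a multiple of 64) from a 16-byte-aligned
 * source, assumed here to be a USWC mapping, into 16-byte-aligned ordinary
 * memory. On normal write-back memory MOVNTDQA behaves like a plain load,
 * so the function is still correct there, just not faster. */
__attribute__((target("sse4.1")))   /* GCC/Clang: build without -msse4.1 */
void copy_from_uswc(void *dst, const void *src, size_t bytes)
{
    /* MOVNTDQA faults on unaligned addresses, so check up front. */
    assert(((uintptr_t)src % 16) == 0 && ((uintptr_t)dst % 16) == 0);
    assert(bytes % 64 == 0);

    /* Some headers declare the intrinsic with a non-const pointer. */
    __m128i *s = (__m128i *)src;
    __m128i *d = (__m128i *)dst;

    /* i counts 16-byte chunks; each iteration handles one 64-byte line. */
    for (size_t i = 0; i < bytes / 16; i += 4) {
        /* Four streaming loads covering one full line of the source. */
        __m128i a = _mm_stream_load_si128(s + i + 0);
        __m128i b = _mm_stream_load_si128(s + i + 1);
        __m128i c = _mm_stream_load_si128(s + i + 2);
        __m128i e = _mm_stream_load_si128(s + i + 3);

        /* Ordinary aligned stores into cacheable destination memory. */
        _mm_store_si128(d + i + 0, a);
        _mm_store_si128(d + i + 1, b);
        _mm_store_si128(d + i + 2, c);
        _mm_store_si128(d + i + 3, e);
    }
}
```

In a real driver the source pointer would come from whatever call maps the video memory, and the code would first check CPUID for SSE4.1 support and fall back to a plain copy on older CPUs.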
Of course it can also be used to speed up disk access for huge RAID0 arrays, network I/O, etc.
If AMD CPUs don't get MOVNTDQA any time soon, I believe that they will have a serious problem. That problem will be called "ATI video cards working faster with Intel CPUs" -- irony at its best.
That brings us to the conclusion -- Intel has become very aggressive in promoting new extensions this time. You can already download the Instruction Set Reference, an SDK and even an emulator to test your code. A new compiler is in the works as we speak (currently it is at version 10.0.018 beta) and I expect the final 10.0 release to have SSE4 support.
Everything I wrote here you can find at this link.