The 32-byte fetch can help with decoding long instructions, but K10 is still limited by its 3-wide pipeline (Core has a 4-wide pipeline). It can't help in legacy code with short instructions.
But Core(tm) features a 64-byte fetch buffer, which can help short loops run faster (in any code).
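To make the fetch-versus-decode-width argument concrete, here is a toy throughput model. It assumes instructions per cycle is simply the smaller of the decoder width and the number of average-length instructions that fit in one fetch block. The 32-byte fetch and 3-wide decode for K10 and the 4-wide decode for Core come from the post; the 16-byte fetch block for Core 2 is an assumption added here. Branches, alignment, pre-decode limits and the loop buffer are ignored, and `decode_rate` is a hypothetical helper, not a real tool.

```python
def decode_rate(fetch_bytes: float, decode_width: int, avg_insn_len: float) -> float:
    """Instructions per cycle in a toy front-end model.

    The rate is capped by whichever is smaller: the decoder width, or how many
    average-length instructions fit in one fetch block. Everything else
    (branches, alignment, pre-decode limits, loop buffers) is ignored.
    """
    return min(decode_width, fetch_bytes / avg_insn_len)


# K10: 32-byte fetch, 3-wide decode (from the post).
# Core 2: 4-wide decode (from the post); the 16-byte fetch block is an ASSUMPTION.
K10 = (32, 3)
CORE2 = (16, 4)

for avg_len in (3, 4, 6, 8):
    k10 = decode_rate(*K10, avg_len)
    core2 = decode_rate(*CORE2, avg_len)
    print(f"avg length {avg_len} bytes: K10 {k10:.2f} IPC, Core 2 {core2:.2f} IPC")
```

Under these assumptions K10 stays at 3 instructions per cycle for every length, while the Core 2 figure only drops below 3 once the average instruction length exceeds about 5.3 bytes (16 / 3), which is the long-instruction case the wider fetch is meant to help.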
Core(tm) is still better in almost everything related to the memory subsystem. The load forwarding capabilities of K8 are quite deficient (none!) compared to Core 2 (load forwarding was already in the Pentium Pro), which means that their inclusion in K10 will give an even bigger boost than Core 2 got from it. Too bad the clock rate is low and the cache is relatively small.
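In general terms, store-to-load forwarding lets a load take its data directly from an older store that has not yet been written to the cache. The sketch below is a toy software model of that idea only, not a description of how K8, K10 or Core 2 actually implement it; the `StoreBuffer` class and its methods are hypothetical, and access sizes, partial overlaps and timing are ignored.

```python
from __future__ import annotations


class StoreBuffer:
    """Toy store buffer with store-to-load forwarding (exact-address matches only)."""

    def __init__(self, memory: dict[int, int]):
        self.memory = memory                      # stands in for the cache/memory
        self.pending: list[tuple[int, int]] = []  # (address, value), oldest first

    def store(self, addr: int, value: int) -> None:
        # Stores wait in the buffer instead of writing memory immediately.
        self.pending.append((addr, value))

    def load(self, addr: int) -> tuple[int, bool]:
        """Return (value, forwarded)."""
        # Search youngest to oldest so a load sees the most recent store to its address.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value, True
        # No matching pending store: read from memory (unwritten addresses read as 0).
        return self.memory.get(addr, 0), False

    def drain(self) -> None:
        # Retire pending stores to memory in program order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()


mem = {0x100: 1}
sb = StoreBuffer(mem)
sb.store(0x100, 7)
sb.store(0x100, 9)
print(sb.load(0x100))  # (9, True): forwarded from the youngest matching store
print(sb.load(0x200))  # (0, False): no pending store, read from memory
sb.drain()
print(mem[0x100])      # 9
```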
http://www.xbitlabs.com/articles/cpu...k10.html#sect0
As a result, we see that the memory subsystem of K10 processors has undergone some positive improvements. But we still have to say that it may still fall behind the memory subsystem of Intel processors in some respects. Among these are: no speculative execution of loads ahead of earlier stores whose addresses are still unknown, lower L1D cache associativity, a narrower bus between the L1 and L2 caches (in terms of data transfer rate), a smaller L2 cache and simpler prefetch. Despite all the improvements, Core 2 prefetch is potentially more powerful than K10 prefetch. For example, K10 has no prefetch based on instruction addresses, which would let it keep track of individual instructions, and no prefetch from L2 to L1, which could hide L2 latency effectively. These factors affect different applications in different ways, but in most cases they will give Intel processors higher performance.
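Prefetching at instruction addresses means a stride prefetcher indexed by the address of the load instruction, so each load's access pattern is tracked separately. The following is a minimal sketch of that technique under stated assumptions: the `IPStridePrefetcher` and `StrideEntry` names, the confidence threshold of 2 and the unbounded table are illustrative choices, not Intel's or AMD's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_addr: int       # last data address touched by this load instruction
    stride: int = 0      # last observed difference between consecutive addresses
    confidence: int = 0  # how many times in a row that stride has repeated


class IPStridePrefetcher:
    """Toy prefetcher that keeps one stride entry per load instruction (pc).

    Assumptions: unbounded table, no replacement, no page-boundary checks.
    """

    def __init__(self, threshold: int = 2):
        self.threshold = threshold  # repeats needed before issuing a prefetch (assumed)
        self.table: dict[int, StrideEntry] = {}

    def access(self, pc: int, addr: int) -> int | None:
        """Record a load of `addr` by instruction `pc`; return an address to prefetch or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence += 1
        else:
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return addr + entry.stride
        return None


# Usage: two loads in one loop, walking different arrays with different strides.
pf = IPStridePrefetcher()
for i in range(5):
    a = pf.access(pc=0x401000, addr=0x10000 + 64 * i)
    b = pf.access(pc=0x401008, addr=0x80000 + 8 * i)
    print(i, None if a is None else hex(a), None if b is None else hex(b))
```

Because each load has its own entry, both interleaved streams in the usage loop are recognized; a single detector watching the combined address sequence would see alternating jumps and find no constant stride.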