That is, in principle, the huge advantage of the C2D. It loads the data early enough that the application can start computing immediately, without having to wait on main memory.
This approach can run into problems with SMT, and it clashes with ongoing data transfers (depending on the type of transfer, what matters is how much computation is done on the data). For streaming data the K10 wins; for something like video encoding there is more computation per byte, and the C2D takes advantage of its huge L2 cache and the mighty prefetcher working in the background.
Now here comes the problem: with SMT and 4 (or more) threads the prefetcher has trouble working out which data is needed, so it is only partly effective. When the prefetcher guesses right, the result is very good and the L2 and prefetcher work great together.
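To make the point about thread count a bit more concrete, here is a rough toy model in Python. It is only a sketch under my own assumptions, not how the real C2D prefetcher works: it pretends the prefetcher is a simple next-line predictor with a small number of stream slots (`trackers`, a made-up parameter) and LRU replacement, and it feeds it interleaved sequential streams, one per thread. The function names are hypothetical as well.

```python
from collections import OrderedDict


def prefetch_hit_rate(accesses, trackers=2):
    """Toy next-line prefetcher (assumed model, not real hardware).

    Each slot remembers the address one stream is predicted to touch next.
    Returns the fraction of accesses that had been predicted.
    """
    expected = OrderedDict()  # predicted next address -> marker, oldest first
    hits = 0
    for addr in accesses:
        if addr in expected:
            # The prefetcher predicted this access: count a hit, free the slot.
            hits += 1
            del expected[addr]
        elif len(expected) >= trackers:
            # Unknown stream and no free slot: evict the oldest prediction.
            expected.popitem(last=False)
        # Predict the next line of the stream we just saw.
        expected[addr + 1] = True
    return hits / len(accesses)


def interleave(streams, length):
    """Round-robin accesses from `streams` sequential streams far apart in memory."""
    return [s * 1_000_000 + i for i in range(length) for s in range(streams)]


if __name__ == "__main__":
    for n in (1, 2, 4):
        rate = prefetch_hit_rate(interleave(n, 100), trackers=2)
        print(f"{n} stream(s): predicted {rate:.0%} of accesses")
```

With two slots, one or two streams are predicted almost perfectly, while four interleaved streams keep evicting each other and the toy prefetcher predicts nothing. Real prefetchers track more streams and detect strides, so the real drop-off is less abrupt, but the mechanism is the same idea.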
But when something unexpected happens, a very long memory access is needed and the whole scheme collapses, while the K10 can still handle coherency thanks to its shared L3 cache. Even if the data is not in the L3 cache, the K10 can load it 3 times faster than the C2Q. Again: this will only happen with massive SMT.