Nehalem actually goes one notch beyond the "native" quad-core design of K10. It ties the cache hierarchy more closely together by making the L3 cache inclusive of the L2 arrays, which allows for quicker coherent data updates between threads: a core just has to pick up the L3 cache line.
Of course, this comes at the cost of the total usable L3 size, as follows: 8 MB - (4 * 256 KB) = 7 MB. That's the reason for the rather modest L2 per core, rather than purely a low-latency design goal.
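To make the arithmetic explicit, here is a rough sketch in Python. The function name and the 512 KB "what if" case are my own illustration, not anything from Intel's documentation; it only assumes, as argued above, that every L2 line is duplicated in the inclusive L3.

```python
KIB = 1024
MIB = 1024 * KIB


def effective_l3_bytes(l3_bytes, cores, l2_per_core_bytes):
    """Return the L3 capacity left for unique data when the L3 is
    inclusive of every core's L2 (hypothetical helper for illustration)."""
    duplicated = cores * l2_per_core_bytes  # L2 contents mirrored in L3
    if duplicated > l3_bytes:
        raise ValueError("an inclusive L3 must be at least as large as all L2s combined")
    return l3_bytes - duplicated


# The configuration discussed above: 8 MB L3, 4 cores, 256 KB L2 each.
nehalem = effective_l3_bytes(8 * MIB, 4, 256 * KIB)
print(nehalem / MIB)  # 7.0

# Assumed what-if: doubling the L2 to 512 KB per core costs another 1 MB of L3.
bigger_l2 = effective_l3_bytes(8 * MIB, 4, 512 * KIB)
print(bigger_l2 / MIB)  # 6.0
```

The second case shows why a larger per-core L2 gets expensive with an inclusive L3: every extra KB of L2 is paid for again, once per core, in the L3.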
Read it again: inclusive relationship with the L2 arrays!
Originally Posted by Cronos