Quote: Originally posted by fellix_bg
Of course, this comes at the cost of total available L3 size, as follows: 8 MB - (4 × 256 KB) = 7 MB. That's the reason for the rather small L2 per core, not purely the low-latency design intentions.

Read it again: inclusive relationship with the L2 arrays!
The L3 still holds 8 MB of data for quick access, which gets pulled into L2, then L1, when needed. The L3 is only there to provide a quick access point for the L2 to grab data from, and each L2 has access to a pool of 8 MB worth of data. When the L2 fetches data from the L3, that's a successful hit.

Regardless of the fact that data may be stored simultaneously in the L2 and the L3, each L2 has access to 8 MB of L3 (and if the data isn't tied to one core, you could have all cores using the same data).
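
To put numbers on both views, here's a rough sketch of the arithmetic. The function name `cache_capacity` and the exclusive-cache comparison are my own additions for illustration, not anything from Intel's docs. It ignores L1 and assumes the worst case where every L2 holds different lines.

```python
KB = 1024
MB = 1024 * KB

def cache_capacity(l3_bytes, l2_bytes, cores, inclusive=True):
    """Rough capacity breakdown for a shared L3 with private per-core L2s.

    Assumption: worst case, where every L2 holds distinct lines.
    L1 is ignored to keep the arithmetic simple.
    """
    l2_total = l2_bytes * cores
    if inclusive:
        # Every L2 line also has a copy in L3, so the L3 bounds unique data.
        unique = l3_bytes
        l3_only = l3_bytes - l2_total   # lines in L3 that are in no L2
    else:
        # Hypothetical exclusive design: L2 and L3 never hold the same line.
        unique = l3_bytes + l2_total
        l3_only = l3_bytes
    return {"l3_holds": l3_bytes, "unique": unique, "l3_only": l3_only}

incl = cache_capacity(8 * MB, 256 * KB, 4, inclusive=True)
excl = cache_capacity(8 * MB, 256 * KB, 4, inclusive=False)

for name, result in (("inclusive", incl), ("exclusive", excl)):
    print(name, {k: v / MB for k, v in result.items()})
```

So both numbers show up: the L3 holds a full 8 MB that any core can hit, while at most 1 MB of that duplicates what already sits in the four L2s, which is where the 7 MB figure comes from.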