Cache

  1. nn_step
    Misses per 1000 instructions, rounded up (average for x86_64 on Linux with 8-way set associativity, across all applications in the Debian source repository):

    Size     Instruction cache   Data cache   Unified cache
    16KB     3.81                46.57        52.89
    32KB     1.27                37.30        42.20
    64KB     0.76                22.38        25.32
    128KB    0.16                 4.69         5.31
    256KB    0.04                 1.06         1.19
    512KB    0.02                 0.73         0.82
    1MB      0.02                 0.71         0.81
    2MB      0.02                 0.70         0.81
    4MB      0.01                 0.69         0.80
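
    To make the metric concrete, here is a minimal sketch (not the methodology actually used to gather the numbers above) of how misses can be counted against an 8-way set-associative cache with LRU replacement. The 32KB capacity, 64-byte line size, and the synthetic address stream in main() are assumptions for illustration, and it counts misses per 1000 accesses rather than per 1000 instructions, which a real measurement would derive from a full instruction trace.

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_SIZE   64            /* bytes per cache line (assumed)     */
    #define WAYS        8             /* 8-way set associative              */
    #define CACHE_SIZE  (32 * 1024)   /* 32KB cache for this example        */
    #define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * WAYS))

    typedef struct {
        uint64_t tag[WAYS];
        uint8_t  valid[WAYS];
        uint8_t  lru[WAYS];           /* 0 = most recently used             */
    } cache_set_t;

    static cache_set_t sets[NUM_SETS];
    static uint64_t accesses, misses;

    /* Simulate one access; returns 1 on a miss. */
    static int access_cache(uint64_t addr)
    {
        uint64_t line    = addr / LINE_SIZE;
        uint64_t set_idx = line % NUM_SETS;
        uint64_t tag     = line / NUM_SETS;
        cache_set_t *s   = &sets[set_idx];
        int way = -1, miss = 0;

        accesses++;
        for (int w = 0; w < WAYS; w++)                /* look for a hit      */
            if (s->valid[w] && s->tag[w] == tag) { way = w; break; }

        if (way < 0) {                                /* miss: fill a victim */
            miss = 1;
            misses++;
            way = 0;
            for (int w = 0; w < WAYS; w++) {          /* empty way, else LRU */
                if (!s->valid[w]) { way = w; break; }
                if (s->lru[w] > s->lru[way]) way = w;
            }
            s->tag[way]   = tag;
            s->valid[way] = 1;
            s->lru[way]   = WAYS;                     /* oldest until updated */
        }

        for (int w = 0; w < WAYS; w++)                /* age younger lines   */
            if (s->lru[w] < s->lru[way]) s->lru[w]++;
        s->lru[way] = 0;                              /* mark most recent    */
        return miss;
    }

    int main(void)
    {
        /* Synthetic stream: a hot 16KB region plus a streaming pass.  A real
           measurement would replay an instruction/data trace instead. */
        for (uint64_t i = 0; i < 1000000; i++) {
            access_cache((i % 256) * LINE_SIZE);      /* mostly hits         */
            access_cache(i * LINE_SIZE);              /* mostly misses       */
        }
        printf("misses per 1000 accesses: %.2f\n", 1000.0 * misses / accesses);
        return 0;
    }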
  2. nn_step
    Thus the only logical conclusion is that 256KB to 512KB is the optimal cache size for a random application. Extrapolating the data suggests that 1MB will provide only a 0 to 10 percent performance advantage over 512KB, 2MB only a 0 to 7 percent advantage over 1MB, and 4MB only a 0 to 5 percent advantage over 2MB.
    Real-world comparisons such as Venice vs San Diego and Conroe-XE vs Conroe vs Allendale vs Conroe-L show these estimates to be approximately accurate (+/- 1%).
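
    As a rough illustration of how such percentages can be extrapolated, the sketch below plugs the unified-cache miss rates from the table into a simple CPI model. CPI_BASE and MISS_PENALTY are assumed values, not figures from any measurement, and the resulting percentages shift considerably under different penalties or per-application miss rates.

    #include <stdio.h>

    #define CPI_BASE     1.0   /* assumed cycles/instruction with a perfect cache  */
    #define MISS_PENALTY 100.0 /* assumed cycles per miss to the next level/memory */

    /* Unified-cache misses per 1000 instructions, taken from the table above. */
    static const struct { const char *size; double mpki; } rows[] = {
        {"16KB", 52.89}, {"32KB", 42.20}, {"64KB", 25.32},
        {"128KB", 5.31}, {"256KB", 1.19}, {"512KB", 0.82},
        {"1MB",   0.81}, {"2MB",   0.81}, {"4MB",   0.80},
    };

    static double cpi(double mpki)
    {
        return CPI_BASE + (mpki / 1000.0) * MISS_PENALTY;
    }

    int main(void)
    {
        /* Estimated speedup of each size over the next smaller one. */
        for (unsigned i = 1; i < sizeof rows / sizeof rows[0]; i++)
            printf("%-6s over %-6s: %+.2f%%\n", rows[i].size, rows[i - 1].size,
                   (cpi(rows[i - 1].mpki) / cpi(rows[i].mpki) - 1.0) * 100.0);
        return 0;
    }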

    Thus, when designing for many cores, 32KB L1 instruction and data caches with a 256KB L2 are optimal, but for server workloads 64KB L1 instruction and data caches with a 1MB L2 should be approximately optimal.
  3. nn_step
    Exclusive versus inclusive:

    The advantage of exclusive caches is that they store more data in total. This advantage is larger when the L1 cache is comparable in size to the L2 cache, and diminishes if the L2 cache is many times larger than the L1. When the L1 misses and the L2 hits on an access, the hitting cache line in the L2 is exchanged with a line in the L1. This exchange is quite a bit more work than simply copying a line from L2 to L1, which is what an inclusive cache does.
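
    A minimal sketch of the difference between the two fill paths, using a hypothetical line_t record and ignoring lookup, victim selection, and dirty-line handling:

    #include <stdio.h>

    /* Hypothetical cache-line record; real lookup, victim selection, and
       dirty-line handling are elided. */
    typedef struct { unsigned long tag; unsigned char data[64]; } line_t;

    /* Exclusive hierarchy: on an L1 miss that hits in L2, the L1 victim and
       the hitting L2 line trade places, so each line lives in exactly one
       of the two caches. */
    static void fill_exclusive(line_t *l1_victim, line_t *l2_hit)
    {
        line_t tmp = *l1_victim;   /* evicted L1 line moves down into L2 ...   */
        *l1_victim = *l2_hit;      /* ... the hitting L2 line moves up into L1 */
        *l2_hit    = tmp;          /* a swap: two line moves instead of one    */
    }

    /* Inclusive hierarchy: the L2 keeps its copy and the L1 receives a
       duplicate; a clean L1 victim can simply be dropped because the
       inclusive L2 already holds it. */
    static void fill_inclusive(line_t *l1_victim, const line_t *l2_hit)
    {
        *l1_victim = *l2_hit;      /* a single line move */
    }

    int main(void)
    {
        line_t l1 = { .tag = 0x10 }, l2 = { .tag = 0x20 };
        fill_exclusive(&l1, &l2);
        printf("after exclusive fill: L1 tag %#lx, L2 tag %#lx\n", l1.tag, l2.tag);
        fill_inclusive(&l1, &l2);
        printf("after inclusive fill: L1 tag %#lx (L2 copy retained)\n", l1.tag);
        return 0;
    }

    The swap in fill_exclusive is the extra work referred to above; fill_inclusive performs a single line move and relies on the L2 retaining its copy.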
  4. nn_step
    The advantage of inclusive caches is that the larger cache can use larger cache lines, which reduces the size of the secondary cache tags. (Exclusive caches require both caches to have the same cache line size, so that lines can be swapped on an L1 miss, L2 hit.) If the secondary cache is an order of magnitude larger than the primary, and the cache data is an order of magnitude larger than the cache tags, the tag area saved can be comparable to the incremental area needed to store the L1 cache data in the L2.
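
    A rough worked example of that tag arithmetic, assuming a 40-bit physical address, an 8-way 1MB L2, and no per-line state or ECC bits (all of which shift the exact figures):

    #include <math.h>
    #include <stdio.h>

    /* Bits of tag storage for a physically tagged set-associative cache. */
    static double tag_bits(double cache_bytes, double line_bytes, int ways, int addr_bits)
    {
        double lines = cache_bytes / line_bytes;
        double sets  = lines / ways;
        double tag   = addr_bits - log2(sets) - log2(line_bytes);  /* tag bits per line */
        return lines * tag;
    }

    int main(void)
    {
        const double KB   = 1024.0;
        const int    addr = 40;                               /* assumed physical address width */

        double l2_64  = tag_bits(1024 * KB,  64.0, 8, addr);  /* 1MB L2, 64B lines  */
        double l2_128 = tag_bits(1024 * KB, 128.0, 8, addr);  /* 1MB L2, 128B lines */

        printf("L2 tag storage, 64B lines : %5.1f KB\n", l2_64 / 8 / KB);
        printf("L2 tag storage, 128B lines: %5.1f KB\n", l2_128 / 8 / KB);
        printf("tag area saved            : %5.1f KB\n", (l2_64 - l2_128) / 8 / KB);
        printf("L1 data duplicated (I+D)  : %5.1f KB\n", 2 * 64.0);
        return 0;
    }

    How close the saved tag area comes to the duplicated L1 data depends on the assumed address width, the line-size ratio, and the per-line state bits left out here.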

    In short, for a system with a 32KB L1 and a 256KB L2, exclusive is the way to go, but for a system with a 64KB L1 and a 1MB L2, inclusive is the way to go.

    A shared L3, however, depends heavily on the application, and no reasonable recommendation can be made.