Quote: Originally posted by kl0012
1 MB per core? It's not funny at all. After all, you don't buy such a CPU for single-thread performance. If I remember correctly, AMD's estimate for SPECfp was 38% more performance for 50% more cores.
Also, even a bigger cache can't help apps that stream data. We can look at benchmarks from Tech Report:
How is it compiled?
What does the code look like?

Compare reading one byte per read with reading 8 bytes per read, and you will see huge differences. Aligning reads to cache lines (to the size of each line) will improve speed as well.
Or why not optimize it with SSE2?
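
To show what I mean by read width, here is a rough sketch in C. It is not the code from that benchmark (I haven't seen it), and the function names xor_bytewise and xor_wordwise are made up for this post. Both functions compute the same XOR over a buffer; the first issues one load per byte, the second one load per 8 bytes. How big the gap is depends on the compiler and flags, since an optimizing compiler may widen the first loop by itself, which is exactly why "how is it compiled?" matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: XOR of all bytes, one byte per load. */
uint8_t xor_bytewise(const uint8_t *buf, size_t len)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= buf[i];
    return acc;
}

/* Same result, but the main loop reads 8 bytes per load. */
uint8_t xor_wordwise(const uint8_t *buf, size_t len)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        /* memcpy keeps the load legal for any alignment;
           compilers turn it into a single 64-bit load. */
        memcpy(&w, buf + i, sizeof w);
        acc ^= w;
    }

    /* Fold the 8 byte lanes of the accumulator into one byte. */
    acc ^= acc >> 32;
    acc ^= acc >> 16;
    acc ^= acc >> 8;
    uint8_t out = (uint8_t)acc;

    /* Leftover bytes when len is not a multiple of 8. */
    for (; i < len; i++)
        out ^= buf[i];
    return out;
}
```

Time both over a large buffer with and without optimization before drawing any conclusion from a single number.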

You just can't take one test and say "this is how fast it is". Bad code isn't fast.

L3 cache is shared among cores.