Quote Originally Posted by gosh View Post
prefetch:

Reading one byte will not read just one byte from memory; it reads a whole cache line (64 bytes).
If the CPU guesses what comes next and loads the next cache line (another 64 bytes), isn't that going to thrash the cache very fast?
You have a very narrow view of the prefetch mechanism. You're missing that prefetchers don't prefetch data all the time, but only when a specific scenario (such as reading sequential data) is detected. In that case there is an excellent chance that the prefetched line will be required later on. Indeed, the prefetcher can make mistakes, but we don't want to get rid of caches (because of the high latency on a cache miss) or branch predictors (because of the pipeline flush on a wrong prediction) for the same reason. Besides, there are many different techniques to avoid the unwanted side effects (such as cache thrashing).
Your view of the cache architecture seems narrow to me too. Higher associativity leads to higher latency, so you cannot conclude that more associativity is always better. There is no clear winner here.

Quote Originally Posted by JumpingJack View Post
He is goading you ... you will not find that kind of information, nor does he know it; caching policy and algorithms have never been disclosed to any level of detail that would satisfy a good answer to his question.

What he is trying to get you to admit that for sparse memory locality, large cache line fetches pollute the cache -- and Intel's prefetching sux rocks.
Yeh, now I see it.