OK sorry ajaidev, but I think you don't really understand prefetching.
Prefetching is a technique for predicting which data the CPU will request in the future based on its access patterns in the past. Simple prefetching mechanisms are next-line (prefetch the sequential pieces of memory after a load/store request), stride-based (detect a constant stride between accesses (e.g. 0, 4, 8, 12, 16...) and prefetch based on that stride, as in the sketch below), and target-based (keep track of branches that cause misses).
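To make the stride case concrete, here's a minimal sketch of the kind of table a stride prefetcher keeps. The names (stride_entry, train_prefetcher) and thresholds are made up for illustration; real hardware indexes these tables by the load's PC and is considerably more involved:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-instruction stride table entry (illustrative only). */
struct stride_entry {
    uint64_t last_addr;   /* address of the previous access */
    int64_t  stride;      /* last observed delta between accesses */
    int      confidence;  /* bumped when the same stride repeats */
};

/* Train on one access; return a prefetch address once confident. */
uint64_t train_prefetcher(struct stride_entry *e, uint64_t addr)
{
    int64_t delta = (int64_t)(addr - e->last_addr);

    if (delta == e->stride) {
        e->confidence++;        /* same stride again: pattern holding */
    } else {
        e->stride = delta;      /* new stride: retrain */
        e->confidence = 0;
    }
    e->last_addr = addr;

    /* After two matching strides in a row, predict the next address. */
    if (e->confidence >= 2)
        return addr + (uint64_t)e->stride;
    return 0;                   /* not confident yet: no prefetch issued */
}

int main(void)
{
    struct stride_entry e = {0};
    /* Accesses at 0, 4, 8, 12, 16 -> constant stride of 4. */
    for (uint64_t a = 0; a <= 16; a += 4) {
        uint64_t p = train_prefetcher(&e, a);
        if (p)
            printf("access %llu -> prefetch %llu\n",
                   (unsigned long long)a, (unsigned long long)p);
    }
    return 0;
}
```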
Generally speaking, prefetch requests are the lowest-priority accesses to memory, so they will not "slow down the work that is done". However, an overly aggressive prefetcher can evict cache lines that will be used in the future. For most applications this problem is avoided by profiling the application and tuning the prefetching algorithm.
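You can see the same tuning tradeoff in software prefetching. Here's a sketch using GCC/Clang's __builtin_prefetch; the distance of 16 elements is an assumption you'd tune per machine, which is exactly the profiling step described above:

```c
#include <stddef.h>

/* PF_DIST is the prefetch distance in elements: a tuning knob.
 * Too small: the data doesn't arrive before the loop needs it.
 * Too large: prefetched lines can evict data the loop still
 * needs (the pollution problem described above). */
#define PF_DIST 16

double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            __builtin_prefetch(&a[i + PF_DIST],
                               0 /* read */, 3 /* high temporal locality */);
        s += a[i];
    }
    return s;
}
```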
As a general rule of thumb, prefetching improves the performance of most applications regardless of how much bandwidth is available, and even with "low latency" DDR. Remember that a "low latency" DDR3 access is still ~50-70ns round trip, which is 150-210 cycles at 3GHz. Compare this to even a slow 30-cycle L3. The fact that prefetching is done even from L3 -> L2 or L2 -> L1 (moving lines from the outer caches closer to the core), saving a "mere" 10 or so cycles, should clue you in on how important it is.
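To put rough numbers on that (just multiplying out the figures above):

```c
#include <stdio.h>

int main(void)
{
    double freq_ghz = 3.0;            /* core clock */
    double dram_ns[] = {50.0, 70.0};  /* DDR3 round-trip range from above */
    double l3_cycles = 30.0;          /* slow L3 hit */

    for (int i = 0; i < 2; i++) {
        double dram_cycles = dram_ns[i] * freq_ghz;  /* ns * GHz = cycles */
        printf("%.0fns DRAM = %.0f cycles (%.1fx a %.0f-cycle L3 hit)\n",
               dram_ns[i], dram_cycles, dram_cycles / l3_cycles, l3_cycles);
    }
    return 0;
}
```

That's a 5-7x gap between an L3 hit and a DRAM round trip, which is the window a prefetcher is hiding.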