I agree about write cache, but I was actually talking about read cache. In some environments, I've seen measurably better perf with read cache disabled. SQL Server is notorious in that way: since it has its own rich cache and manages its own read-ahead, the odds are very small that a remote cache will contain needed data that isn't already cached by SQL Server itself. If the cache were 100% transparent from a performance perspective, it wouldn't matter; unfortunately, that's not the case for many implementations.
I'm not 100% sure of this, but I'm pretty sure that in Windows, at least, the cluster size also determines the minimum physical read size, since clusters, not blocks, are the unit of on-disk addressability. Apps that are sloppy about how they read files can benefit from larger clusters, because the data they need for the next N reads may already be in RAM. OS read-ahead tends to reach only one cluster ahead, and may use an extra I/O so that it doesn't delay the originally requested block.
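If you want to see what your own volume is using, here's a minimal sketch (Windows-only, assuming the C: volume; the cluster_size helper name is just for illustration) that reads the cluster size via the Win32 GetDiskFreeSpaceW call:

```
# Minimal sketch: query the cluster size of a Windows volume via the Win32
# GetDiskFreeSpaceW call. Assumes the C:\ volume; adjust the root path as needed.
import ctypes
from ctypes import wintypes

def cluster_size(root=u"C:\\"):
    sectors_per_cluster = wintypes.DWORD()
    bytes_per_sector = wintypes.DWORD()
    free_clusters = wintypes.DWORD()
    total_clusters = wintypes.DWORD()
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(root),
        ctypes.byref(sectors_per_cluster),
        ctypes.byref(bytes_per_sector),
        ctypes.byref(free_clusters),
        ctypes.byref(total_clusters),
    )
    if not ok:
        raise ctypes.WinError()
    return sectors_per_cluster.value * bytes_per_sector.value

if __name__ == "__main__":
    print("cluster size: %d bytes" % cluster_size())
```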
I agree about the application side, but some related perf issues are OS-oriented. Application start-up time, for example: for an app that requires multiple DLLs, if they are spread out among multiple equal-speed LUNs, the OS can issue the I/O requests in parallel; if they're all on the same drive, the requests are serialized.
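This isn't how the Windows loader actually schedules its I/O, but here's a rough way to see the effect for yourself: read the same set of files serially and then through a thread pool. The paths below are placeholders, and you need a cold cache (or unbuffered reads) for the timings to mean anything:

```
# Rough illustration only: time serial reads vs. parallel reads of a set of files.
# If the files live on different physical drives/LUNs, the parallel version can
# overlap the I/O; on a single drive the requests largely serialize anyway.
import time
from concurrent.futures import ThreadPoolExecutor

FILES = [r"D:\libs\a.dll", r"E:\libs\b.dll", r"F:\libs\c.dll"]  # placeholder paths

def read_all(path):
    with open(path, "rb") as f:
        return len(f.read())

def timed(label, fn):
    t0 = time.perf_counter()
    fn()
    print("%s: %.3f s" % (label, time.perf_counter() - t0))

timed("serial", lambda: [read_all(p) for p in FILES])

with ThreadPoolExecutor(max_workers=len(FILES)) as pool:
    timed("parallel", lambda: list(pool.map(read_all, FILES)))
```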
It can help for read-heavy apps with certain access patterns. Keeping all drives active on every read, even at a QD of 1, maximizes read throughput compared to issuing separate requests to each drive and waiting for them to finish. Of course, this assumes that caching is effective and that the per-device internal striping doesn't interfere.
Bottlenecking is only a problem with QD > 1; otherwise, in both RAID-4 and -5, you're always writing one or more data drives and a parity drive. One potential advantage of RAID-4 vs. RAID-5 on SSD is that you could use SLC for the heavily written parity drive, which should help maximize array life.
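To make the parity-placement difference concrete, here's a toy sketch (the drive count is a made-up example, and the RAID-5 rotation shown is left-symmetric, just one common layout): RAID-4 sends every stripe's parity write to the same drive, while RAID-5 spreads them around, which is exactly why the fixed parity drive only hurts once you have multiple writes in flight.

```
# Toy sketch of parity placement. RAID-4 pins parity to one drive; RAID-5
# rotates it per stripe (left-symmetric rotation shown, one common layout).
N_DRIVES = 4

def parity_drive_raid4(stripe):
    return N_DRIVES - 1                            # always the same drive

def parity_drive_raid5(stripe):
    return (N_DRIVES - 1) - (stripe % N_DRIVES)    # rotates every stripe

for stripe in range(8):
    print("stripe %d: RAID-4 parity on drive %d, RAID-5 parity on drive %d"
          % (stripe, parity_drive_raid4(stripe), parity_drive_raid5(stripe)))
```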
With a RAID-0 or RAID-5 array, with QD = 1, if my app issues a request that's a single strip in length or less, only a single drive is active. Only with higher QD or larger request sizes will multiple drives be active. I think this is why many desktop users don't experience performance improvements when they move to RAID: at any instant, they are only using a single drive.
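A back-of-the-envelope sketch of that mapping for RAID-0 (strip size and drive count are made-up example values): a request no longer than one strip lands on a single member drive, and only larger requests, or a deeper queue, get the other drives busy at the same time.

```
# Back-of-envelope: which RAID-0 member drives does a single request touch?
# Strip size and drive count here are made-up example values.
STRIP_SIZE = 64 * 1024   # 64 KiB strip
N_DRIVES = 4

def drives_touched(offset, length):
    first_strip = offset // STRIP_SIZE
    last_strip = (offset + length - 1) // STRIP_SIZE
    return sorted({s % N_DRIVES for s in range(first_strip, last_strip + 1)})

print(drives_touched(0, 16 * 1024))    # 16 KiB read  -> [0]: one drive busy
print(drives_touched(0, 256 * 1024))   # 256 KiB read -> [0, 1, 2, 3]: all drives busy
```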
+1. The Intel SSDs have ten internal channels, but Intel apparently still considers the details of how those channels work to be proprietary. I have some ideas on how to reverse engineer it, but as you said, it's time-consuming.