Can you define a "stutter"? How do you measure it? Is it a subjective experience for the user, or can you objectively measure it? Would that "threshold value" be different for different users?
I think there is a high correlation between what many would describe as a "stutter" and IOmeter's reported "maximum write response time". But I don't think there is necessarily a linear relationship between IOPS and "stutter"; you can't say "at this IOPS, you get stutter". For example, you can use a regular HD and have very low 4K random write IOPS, but it's not the same perceptible experience as "stutter" on an SSD.
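Just to illustrate what I mean by looking at maximum write response time rather than average IOPS, here's a rough Python sketch (not a substitute for IOmeter; the file name, region size, and write count are made-up values, and it goes through the filesystem with fsync rather than raw/unbuffered I/O, so treat the numbers as ballpark only). It issues 4K random writes, times each one, and reports both average IOPS and the single worst write latency. On a drive that "stutters" you'd expect the average to look fine while the maximum spikes.

```python
# Rough sketch: time individual 4K random writes and report the worst case.
# Assumes a scratch file on the drive under test; values below are arbitrary.
import os
import random
import time

TEST_FILE = "stutter_test.bin"   # hypothetical scratch file on the drive under test
FILE_SIZE = 256 * 1024 * 1024    # 256 MiB region to scatter writes over
BLOCK_SIZE = 4096                # 4K writes, as in the workload discussed above
NUM_WRITES = 2000

def run_test():
    # Pre-allocate the file so writes land inside an existing region.
    with open(TEST_FILE, "wb") as f:
        f.truncate(FILE_SIZE)

    latencies = []
    block = os.urandom(BLOCK_SIZE)
    with open(TEST_FILE, "r+b") as f:
        for _ in range(NUM_WRITES):
            offset = random.randrange(0, FILE_SIZE // BLOCK_SIZE) * BLOCK_SIZE
            start = time.perf_counter()
            f.seek(offset)
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force each write out to the device
            latencies.append(time.perf_counter() - start)

    os.remove(TEST_FILE)
    total = sum(latencies)
    print(f"average IOPS:          {len(latencies) / total:.0f}")
    print(f"average write latency: {1000 * total / len(latencies):.2f} ms")
    print(f"maximum write latency: {1000 * max(latencies):.2f} ms")

if __name__ == "__main__":
    run_test()
```

Two drives can post similar average numbers here while one of them shows a maximum latency orders of magnitude above its average, and that outlier is what I'd expect to line up with the perceived "stutter".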
I think the best tests are the ones that reflect real usage patterns/conditions.