Alright, got a file ready for myself, weighing in at 42,022,123,868 bytes. C300 64GB should start tomorrow so long as UPS sticks to their delivery date.
I'm also going to test an SF-1200 drive now that the prospect of no-LTT has emerged (and if anyone wants to test a 25nm vs. my 34nm, let me know...it's easier to arrange testing and setup in pairs!). With a SandForce back on the scene, I wanted to examine the compression settings in Anvil's app and see if any were suited to mimicking 'real' data. With the discovery of SMART attribute 233, we can now see NAND writes in addition to host writes, so if we can also write 'real' data we can kill two birds with one stone: see how long a drive lasts under 'real' use and how much writing the NAND can survive.
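For context, the reason having both counters matters is that together they give a write-amplification estimate. Here's a minimal sketch of that arithmetic; the numbers are placeholders and attribute 233's units/scaling vary from drive to drive, so treat it as illustrative only:

```python
# Illustrative only: placeholder numbers, and the units/scaling of SMART
# attribute 233 differ between drives, so check your drive's documentation.
host_writes_gib = 1000.0   # host writes reported by the drive
nand_writes_gib = 620.0    # NAND writes derived from attribute 233

# Write amplification = what the NAND actually absorbed vs. what the host sent.
# On a SandForce drive fed compressible data this can land below 1.0.
write_amplification = nand_writes_gib / host_writes_gib
print(f"Write amplification: {write_amplification:.2f}x")
```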
So what did I do?
First, I took two of my drives, C: and D:, which hold the OS and applications (C:) and documents (D:; .jpg, .png, .dng, and .xlsx probably make up 95% of the data on it), and froze them into separate single-file, zero-compression .rar archives. I then took those two .rar files (renamed to .r files...WinRAR wasn't too happy RARing a single .rar file) and ran them through six different compression settings: WinRAR's Fastest, Normal, and Best RAR settings, and 7-Zip's Fastest, Normal, and Ultra LZMA settings. I then normalized the output file sizes.
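For anyone who wants to reproduce this, here's roughly what the normalization step looks like: divide each archive's output size by the size of the original .r file, so every setting becomes a "fraction of original size". The paths and the setting order are placeholders/assumptions on my part, and dividing by the original size is just one reasonable reading of "normalized":

```python
import os

# Settings ordered roughly from lightest to heaviest compression effort
# (my assumption about the curve's left-to-right axis).
SETTINGS = ["RAR Fastest", "7-Zip Fastest", "RAR Normal",
            "7-Zip Normal", "RAR Best", "7-Zip Ultra"]

def compression_curve(original_path, archive_paths):
    """Return each archive's size as a fraction of the original file's size."""
    original_size = os.path.getsize(original_path)
    return [os.path.getsize(p) / original_size for p in archive_paths]

# Placeholder paths, not my actual files:
# c_curve = compression_curve("C_drive.r",
#                             ["C_rf.rar", "C_7f.7z", "C_rn.rar",
#                              "C_7n.7z", "C_rb.rar", "C_7u.7z"])
```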
Doing this created two 'compression curves' showing how my real data responds to various levels of compression. My thinking was that if any of Anvil's data-compressibility settings produced similarly shaped and similarly sized (after normalization) outputs, it would be a good candidate for mimicking real data and would allow the use of 'real' data in SF testing. Note that real data != 'real' data; 'real' data is just the best attempt to generate gobs of data that walk, talk, and act like real data. A great candidate would be a generated data set whose compression curve sits between the two real-data curves, across the entire curve.
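That "between the two real curves across the entire curve" test is easy to express in code, building on the curves from the sketch above (again, just a sketch):

```python
def between_real_curves(candidate, curve_c, curve_d):
    """True if the candidate's normalized size falls between the two real-data
    curves at every compression setting (whichever real curve is lower can
    change from setting to setting)."""
    for cand, c_val, d_val in zip(candidate, curve_c, curve_d):
        lo, hi = min(c_val, d_val), max(c_val, d_val)
        if not (lo <= cand <= hi):
            return False
    return True
```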
Once I had those curves mapped out, I made ~8GB files at each of the compressibility settings in Anvil's app (0-fill, 8%, 25%, 46%, 67%, and 101%) and built curves for each of them.
All put together, they look like this:
The green zone is where the potential candidates should show up. Only one candidate landed in that range, however: 67%. Unfortunately, it fell out of the zone pretty aggressively under the stronger compression settings. So I turned off the "Allow Deduplication" setting, generated another 8GB file and compression curve, and it was a little better.
While dedicated hardware can be orders of magnitude more efficient than a CPU at an intensive task, I doubt the SF-1200 controller's ability to out-compress and out-dedupe even low-resource RAR/LZMA (RAR Fastest and 7-Zip Fastest), so the left-most part of the green zone is shaded a stronger green, as I feel that's the most important section of the curve. Unfortunately, I can't get more granular compression curves at the low end (left side) of the curve, so I'll have to make do with overall compression curves and just put extra emphasis on the low end.
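Since I can only weight the existing points rather than get finer-grained data at the low end, one way to formalize that emphasis is to score each candidate by how far it strays outside the real-data band, with heavier weights on the Fastest settings. The weights here are made up; this is just a sketch of the idea:

```python
def low_end_fit_score(candidate, curve_c, curve_d, weights=(4, 4, 2, 2, 1, 1)):
    """Sum of how far the candidate strays outside the real-data band at each
    setting, weighted so the light/fast settings (left side) count the most.
    Lower is better; 0 means it stays inside the band everywhere."""
    score = 0.0
    for w, cand, c_val, d_val in zip(weights, candidate, curve_c, curve_d):
        lo, hi = min(c_val, d_val), max(c_val, d_val)
        if cand < lo:
            score += w * (lo - cand)
        elif cand > hi:
            score += w * (cand - hi)
    return score
```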
Of all the data I have available, the 67% compression setting with "Allow Deduplication" unchecked looks like the best fit for use as a 'real' data setting when I start testing the SF-1200. Hopefully anybody else who plans to test a controller with compression and deduplication will find this useful as well!



