Quote Originally Posted by johnw:
I mentioned in my post that I put a ~40GB static file on the SSD. To be precise, it is 41,992,617,078 bytes (I imagine that is a typical amount of static data for a 64GB SSD). Anvil's app and its data are also on the SSD. All settings in Anvil's app are at their defaults, except that I checked the box about keeping running totals for GB written (an option just added yesterday).

For reference, the md5sum of the 42GB file is: 0d1c4ec44d9f4ece86e907ab479da280
Alright, I've got a file ready for myself, weighing in at 42,022,123,868 bytes. The C300 64GB should start tomorrow so long as UPS sticks to its delivery date.
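For anyone else lining up a static file for their own drive, a checksum like johnw's can be computed in fixed-size chunks so the whole ~42GB file never has to fit in RAM. A minimal Python sketch (the file name is just an example):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 of a large file by streaming it in 1MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example: compare against a published checksum before starting the test.
# md5_of_file("static42gb.bin") == "0d1c4ec44d9f4ece86e907ab479da280"
```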

I'm also going to test an SF-1200 drive now that the prospect of no-LTT has emerged (and if anyone wants to test a 25nm drive against my 34nm, let me know...it's easier to arrange testing and setup in pairs!). With a SandForce drive back on the scene, I wanted to examine the compression settings in Anvil's app and see if any were suited to mimic 'real' data. With the discovery of SMART attribute 233, we can now see NAND writes in addition to Host writes, so if we can also write 'real' data we can kill two birds with one stone: see how long a drive lasts with 'real' use and how many writes the NAND can survive.

So what did I do?

First, I took two of my drives, C: and D:, which contain OS and applications (C:) and documents (D:; .jpg, .png, .dng, and .xlsx probably make up 95% of the data on it), and froze them into separate single-file, zero-compression .rar archives. I then took those two .rar files (renamed to .r files...WinRAR wasn't too happy RARing a single .rar file) and ran them through 6 different compression settings: WinRAR Fastest RAR, WinRAR Normal RAR, WinRAR Best RAR, 7-zip Fastest LZMA, 7-zip Normal LZMA, and 7-zip Ultra LZMA. I then normalized the output file sizes.
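The same kind of curve can be sketched without WinRAR or 7-zip by using Python's built-in codecs as stand-ins (zlib for a fast DEFLATE-style pass, lzma for the LZMA family that 7-zip uses). These are not the exact settings above, just an assumed approximation of a fast-to-strong effort sweep:

```python
import lzma
import zlib

def compression_curve(data):
    """Normalized compressed sizes across an effort sweep, fast to strong.

    Stand-ins, not the forum's actual settings: zlib level 1 ~ 'Fastest',
    lzma preset 9 ~ '7-zip Ultra'. Sizes are normalized to the input size,
    so 1.0 means incompressible and lower means more compressible.
    """
    points = {
        "zlib-fast":  len(zlib.compress(data, 1)),
        "zlib-best":  len(zlib.compress(data, 9)),
        "lzma-fast":  len(lzma.compress(data, preset=0)),
        "lzma-ultra": len(lzma.compress(data, preset=9)),
    }
    return {name: size / len(data) for name, size in points.items()}
```

Plotting the normalized sizes in effort order gives the same shape of curve described above, just with different absolute numbers than the RAR/7-zip runs.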

Doing this created two 'compression curves' showing how my real data responds to various levels of compression. My thinking was that if any of Anvil's data compressibility settings had a similarly shaped and similarly sized (after normalization) output, it would be a good candidate to mimic real data and allow the use of 'real' data with SF testing. Real data != 'real' data; 'real' data is just the best attempt to generate gobs of data that walk, talk, and act like real data. A great candidate would be a generated data set whose compression curve sits between the two real-data curves across the entire curve.
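That "between the two real-data curves across the entire curve" criterion is easy to check mechanically once each curve is a list of normalized sizes in the same effort order. A small sketch (curve values are hypothetical, not my measured numbers):

```python
def within_envelope(candidate, curve_a, curve_b):
    """True if the candidate's normalized compressed size lies between the
    two real-data curves at every compression level -- i.e. the candidate
    stays inside the 'green zone' for the whole sweep."""
    return all(
        min(a, b) <= c <= max(a, b)
        for c, a, b in zip(candidate, curve_a, curve_b)
    )

# Hypothetical normalized sizes, ordered fastest -> strongest setting:
c_drive = [0.62, 0.55, 0.51]   # OS + applications curve
d_drive = [0.80, 0.74, 0.70]   # documents curve
candidate = [0.70, 0.66, 0.60] # a generated data set
print(within_envelope(candidate, c_drive, d_drive))
```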

[Attached image: AnvilCompressionCurvesStart.png — compression curves for the two real-data sets]

Once I had those curves mapped out, I made an ~8GB file at each of Anvil's compressibility settings (0-fill, 8%, 25%, 46%, 67%, and 101%) and made curves for each of them.
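Anvil's generator isn't public, so I can't say how it builds data at each setting, but a crude stand-in for a "target compressibility" generator is just mixing incompressible random bytes with highly compressible filler. A sketch under that assumption:

```python
import os

def make_test_block(size, compressible_fraction):
    """Generate `size` bytes that are roughly `compressible_fraction`
    compressible, by concatenating random bytes (incompressible) with a
    run of zeros (compresses to almost nothing). This only approximates
    whatever scheme Anvil's app actually uses internally, and a real SSD
    test would also need to shuffle the layout per write."""
    random_part = int(size * (1 - compressible_fraction))
    return os.urandom(random_part) + b"\x00" * (size - random_part)
```

With this, a block built at `compressible_fraction=0.5` should compress to roughly half its size under most compressors, giving a rough analogue of a mid-range setting.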

All put together, they look like this:
[Attached image: AnvilCompressionCurves.png — all compression curves together]

The green zone is where the potential candidates should show up. Only one candidate was in that range, however: 67%. Unfortunately, its curve dropped away pretty aggressively under the stronger compression algorithms. So I turned off the "Allow Deduplication" setting and generated another 8GB file and compression curve; it was a little better.

While dedicated hardware can be orders of magnitude more efficient than a CPU at an intensive task, I do doubt the SF-1200 controller's ability to out-compress and out-dedup even low-resource LZMA/RAR (R-Fastest and 7-Fastest), so the left-most part of the green zone is a stronger green, as I feel that's the most important section of the curve. Unfortunately, I don't have the ability to get more granular compression curves at the low end (left side) of the curve, so I'll have to make do with overall compression curves with just an emphasis on the low end.
From all the data I have available, the 67% compression setting with "Allow Deduplication" unchecked looks like the best fit for use as a 'real' data setting when I start testing the SF-1200. Hopefully anybody else who plans to test a controller with compression and deduplication will find this useful as well.