Anvil, I've just switched to the new app you sent me. I used the 100% incompressible setting and things look very different: the average is ~42 MB/s. The files in the first app must have been nearly 100% compressible.
Thanks for the explanation!
Looks like the file I/O operation activity as described basically coincides with the random/sequential metrics observed by hIOmon further down at the "physical disk" level within the Windows OS I/O stack.
I think that it's important to distinguish between I/O operations performed "logically" within a file and the I/O operations subsequently performed at the "physical device".
As you mentioned, writing to a file can consist of three successive "sequential" write file I/O operations (which together comprise the entire length/size of the file).
However, each of these three write file I/O operations can actually be to non-contiguous locations upon the device. And so, hIOmon observes the three write I/O operations to the device as "random" write I/O operations.
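To make the logical vs. physical distinction concrete, here is a minimal sketch. The rule used (an I/O is "sequential" if it starts exactly where the previous one ended) is only an assumption for illustration, not hIOmon's documented classification, and all offsets are hypothetical.

```python
def classify(ops):
    """Label each (start_offset, length) op relative to the op before it."""
    labels = []
    prev_end = None
    for start, length in ops:
        if prev_end is None:
            labels.append("first")       # nothing to compare against yet
        elif start == prev_end:
            labels.append("sequential")  # begins where the last op ended
        else:
            labels.append("random")      # jumps elsewhere
        prev_end = start + length
    return labels

MiB = 1024 * 1024

# Hypothetical: three 1 MiB writes that together cover a 3 MiB file (file offsets).
file_ops = [(0, MiB), (1 * MiB, MiB), (2 * MiB, MiB)]

# Hypothetical: where the filesystem happened to place those writes on the device.
device_ops = [(500 * MiB, MiB), (120 * MiB, MiB), (873 * MiB, MiB)]

file_labels = classify(file_ops)
device_labels = classify(device_ops)
print(file_labels)    # file level
print(device_labels)  # device level
```

The same three writes come out as sequential at the file level and random at the device level, which is the pattern described above.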
Filesystem (re)allocation of clusters can also be a factor - but I don't want to go off topic here.
In any case, many thanks to you, Ao1, and One_Hertz for undertaking this exercise!
That complicates things needlessly. Just how compressible is a "realistic load"? And how much does the Sandforce actually compress when given typical data? (not very much, I think). Better to stick with incompressible data so that the test is repeatable and we know how much is actually written to the flash.
Ao1,
Try using the Randomize compressibility option; it would be very close to real-life I/O, as it is a complete mix of the different levels of compression that are available.
I'll switch to the new version shortly, it won't change anything for the Intels though.
The incompressible level you selected is actually 101%, as it's totally incompressible. (It would result in a file 101% of the original size using 7-Zip.)
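The "more than 100%" effect is easy to reproduce. The sketch below uses Python's zlib as a stand-in (an assumption; it is not 7-Zip and not whatever the SandForce controller does), so the exact overhead will differ from the 101% figure, but random data still comes out slightly larger than it went in.

```python
import random
import zlib

def compressed_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(data, level)) / len(data)

n = 1024 * 1024  # 1 MiB test buffers

# Seeded pseudo-random bytes, so the run is repeatable.
rng = random.Random(2011)
random_data = rng.getrandbits(8 * n).to_bytes(n, "little")

# Zero-fill, the other extreme.
zero_data = bytes(n)

random_ratio = compressed_ratio(random_data)
zero_ratio = compressed_ratio(zero_data)
print(f"random data: {random_ratio:.4%} of original size")
print(f"zero fill:   {zero_ratio:.4%} of original size")
```

The small excess on the random buffer is the container and block overhead the compressor has to add when it cannot find anything to squeeze.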
Overthere,
I can see that hIOmon sees it that way and I don't think it can do otherwise.
@johnw
There aren't that many files that are incompressible; MP3, JPG, PNG, and raw image files from DSLRs are all examples of so-called incompressible files.
Program files, DLLs, documents, most temporary files, data files, and Windows log files are usually easily compressible.
Unless you are solely working with images or producing MPEGs, there is bound to be a lot of easily compressible data.
You'd be surprised: try making a backup image of your C drive using e.g. Acronis TrueImage; it is very likely that it would be somewhere between 33-50% of the actual size.
Last edited by Anvil; 05-21-2011 at 09:50 AM.
I think 100% is the best test for NAND durability (and parity scheme, I suppose). But if you're entering an SF into the test, one of the big features that sets it apart is the compression, and negating it doesn't represent what the SF can do (for most usage models).
I'd like to see both tested, of course, but the Call for MOAR in testing is always there
Take what you have on a drive, copy it to another, compress the files with the 2nd lowest level of RAR compression (the lowest being 0 compression, just making a single file out of many), and observe the compressibility.
I kept hIOmon running, so the stats below reflect the data from both versions of the app. However, the max response times jumped up straight away following the switch to non-compressible data.
I believe the Max Control (4.69s) is a TRIM related operation (?)
Throughput is swinging between 20 and 40 MB/s.
Got to pop out for a bit, will switch to more compressible data later.
I believe AnandTech mentioned in the Vertex 3 preview that all of their in-house SandForce drives had a write amplification of about 0.6x. So assuming a write amplification of 1.1 for incompressible data (seems reasonable since that's what Intel and other good controllers without compression/dedupe can achieve), as an OS drive the controller can compress host writes by about 45% on average.
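For anyone wanting to check that arithmetic, here it is spelled out. The model is an assumption: NAND writes = host writes x compressed fraction x baseline write amplification, so the compressed fraction is the observed WA divided by the baseline WA. The function name is made up for illustration.

```python
def implied_host_write_reduction(observed_wa: float, baseline_wa: float) -> float:
    """Fraction of host writes removed by compression under the simple model
    observed_wa = compressed_fraction * baseline_wa."""
    compressed_fraction = observed_wa / baseline_wa
    return 1.0 - compressed_fraction

# 0.6 is the figure quoted from AnandTech; 1.1 is the assumed WA for incompressible data.
reduction = implied_host_write_reduction(observed_wa=0.6, baseline_wa=1.1)
print(f"{reduction:.1%}")  # 45.5%
```

If the baseline WA were higher than 1.1, the implied compression would be larger, so the 45% figure is only as good as that assumption.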
I think it would be far more valuable to see how well the SF controller can deal with more realistic workloads like this than completely compressible or incompressible data. But just my $0.02.
Last edited by frostedflakes; 05-21-2011 at 09:53 AM.
I read that, too. The problem is that it is likely a bogus number, since Anand's drives get a lot of benchmarks run on them which write highly compressible (and unrealistic) data.
I think what you are talking about should be a separate experiment. You could put monitoring programs on computers to record months of writes, and then find a way to play those writes back and determine how much Sandforce can compress them.
But for this experiment, I think it is important to know how much data is actually written to the flash, and the only way to know that with a Sandforce drive is to use random data.
Random data also has the benefit of being a worst-case scenario for Sandforce, and I think that is what is most valuable to know, since then you can state with confidence that a more typical workload will likely result in a drive lasting AT LEAST X amount of writes. Anything other than random data and you have to make wishy-washy statements like, well, it might last that long, but if you write data that is less compressible, it will probably last a shorter time.
Last edited by johnw; 05-21-2011 at 10:05 AM.
I got the impression from the article that these weren't review drives. So the WA should be typical of what a regular user would see if they used it as an OS drive.
http://www.anandtech.com/show/4159/o...t-sf2500-ssd/2
"Thankfully one of the unwritten policies at AnandTech is to actually use anything we recommend. If we're going to suggest you spend your money on something, we're going to use it ourselves. Not in testbeds, but in primary systems. Within the company we have 5 SandForce drives deployed in real, every day systems. The longest of which has been running, without TRIM, for the past eight months at between 90 and 100% of its capacity."
You're right about it being a good indicator of worst-case durability, though, and we could still extrapolate about more ideal situations with more compressible data from this.
I read that, too. I still think you are wrong about those drives seeing "typical" use. Do you really think that the guys using those drives aren't playing around and running a boatload of benchmarks on them? I seriously doubt Anand's crew bought Sandforce drives and gave them to "typical" users to work with, without playing with them themselves first.
Here are the options in Anvil's app. I could run at 100% incompressible for as long as I ran the 0-fill; I would then be at ~50%. I could then go to the 46%, or I could just carry on with 100% incompressible?
I'm inclined to keep at 100% incompressible but if there is a consensus otherwise I can do whatever.
46% is much more realistic than incompressible, a mix would be interesting though.
Too many options with that drive
Added a graph to the first post, have a look at it. If that's going to work, we need to agree on milestones where we give input.
Make the chart an XY rather than a line graph...any input pair on the X and Y axes will work, rather than set milestones.
Also, no need for the 3D effect
Yes, impossible to have a 'control' when dealing with unknown compression factors/percentages.
"But for this experiment, I think it is important to know how much data is actually written to the flash, and the only way to know that with a Sandforce drive is to use random data."
I think you are being imprecise -- you have just lumped together at least 3 questions, which is bad experimental procedure:
1) What exactly does this 46% data consist of, and how can someone duplicate it if they want to try to repeat Ao1's experiment?
2) Is the data that you call 46% a realistic representation of data that many people will be writing to their drives?
3) How much does the Sandforce drive actually compress the data you call 46%?
Dealing with (1) is a hassle, since the exact data specification would need to be written and posted, and anyone wanting to repeat the experiment would need to carefully duplicate the exact data.
Questions (2) and (3) are very difficult to answer, and even if they could be answered, I think we do not know the answers at this time.
Better to sidestep all of those issues and go with completely random data.
Depends on what question you're most interested in answering. If NAND durability is the question, 100% incompressible. If SSD durability is the question, definitely not 100% incompressible or 0-fill data. None of us know what the 'world average' is for data compressibility, but I bet we can all agree it is not at (or near) either extreme.
With WinRAR, I took my C: and D: drives and looked at what kind of compressibility they have.
C: drive (Windows + applications) was able to be compressed to 55.2% of the original size with the fastest compression algorithm.
D: drive (documents + photos) was able to be compressed to 79.4% of the original size with the fastest compression algorithm.
Real world data is somewhat compressible with even lightweight algorithms.
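If anyone wants to repeat this without WinRAR, a rough equivalent can be scripted. This is only a sketch: it uses zlib at its fastest level as a stand-in for WinRAR's fastest setting (an assumption, so the numbers will not match exactly), compresses each file separately, and skips files it cannot read.

```python
import os
import tempfile
import zlib

def tree_compressed_fraction(root: str, level: int = 1, chunk_size: int = 1 << 20) -> float:
    """Total compressed bytes / total original bytes for every readable file under root."""
    original = 0
    compressed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    comp = zlib.compressobj(level)
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        original += len(chunk)
                        compressed += len(comp.compress(chunk))
                    compressed += len(comp.flush())
            except OSError:
                continue  # locked or unreadable file: skip it
    return compressed / original if original else float("nan")

# Small self-contained demo on a throwaway directory with one log-like file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "sample.log"), "wb") as fh:
        fh.write(b"2011-05-21 INFO write completed\n" * 10000)
    fraction = tree_compressed_fraction(tmp)

print(f"compressed to {fraction:.1%} of original size")
```

Pointing `tree_compressed_fraction` at a copy of a real drive would give a figure comparable in spirit to the percentages above, though a different compressor will land on a different number.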
So, yeah, if you're interested in seeing when the NAND dies out, do 100% incompressible and take the controller's dedup/compression out of it (although parity schemes will still be a factor). If you're interested in seeing how the controller can mitigate some NAND wear (relative to the X25-V which has the same NAND [right?] but different controller), test with the 46% (seems to be the least compressible, but still compressible, option).
I do agree that 100% incompressible with the SF controller is something that should be tested, but picking between 100% incompressible and somewhat-compressible data, I'd have to pick somewhat-compressible to be tested first.
But MUCH less compressible by a Sandforce controller than your test might indicate. I doubt the controller can have a power budget of more than 1 W. And it has to compress general data at a rate of hundreds of MBs per second in a single pass. Whatever algorithm it is using cannot be very good -- certainly not as good as even the fastest WinRAR algorithm.
Anand hypes the compression of the Sandforce controller, but I think he (and many others) greatly overestimate the compression that can be achieved on realistic data under such difficult conditions as real-time SSD data compression.
The question is, what do we want to know?
100% incompressible data is of course an option as long as the static part of the drive is filled with realistic data, e.g. a Windows 7 installation.
100% incompressible data is just not possible as long as the drive is used as an OS drive; the OS and applications are roughly 50% compressible, possibly more if one uses the pagefile.
Using 100% incompressible data would tell us about the quality of the NAND used and the ability of the SF controller to do wear-levelling of both static and dynamic data (I can't think of any others right now), but one of the main points of the SF controller, compression, would be totally lost.
Of course I agree with you about what is a proper way to conduct such a test, which in short = knowing all the variables.
Real life just isn't like that, so we need a lot of SF drives to get the answers we want.
To answer your question about 46%, it's just a level of compression that is a likely average for all files on your OS drive. (an SSD in particular, storage excluded)
From what I've found playing with compression on the SF 12XX controller there aren't that many "levels" of compression.
Meaning that when you reach a certain level of compressibility it doesn't matter if the data is e.g. 30 or 40 percent compressible, they are handled with the same speed.
We can't get all the answers using 1 SF drive, so it's just a matter of making a decision about what's more important.
edit:
reading Vapor's post...
---
We need more SF drives to get to the answers
Last edited by Anvil; 05-21-2011 at 01:17 PM.
The data may be "50% compressible" by a standard compression program, but it is unlikely that the Sandforce controller can achieve such a level. Possibly what you are seeing with your levels of compressibility test is an artifact of the data you are using. It is very difficult to achieve set levels of compression without mixing together random and highly compressible data in some pattern. But that does not necessarily model real data very well.
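As an illustration of that mixing approach, here is a sketch that builds a buffer from chunks of part random data and part zero-fill, then checks how far a standard compressor gets. This is a guess at one possible pattern, not how Anvil's app actually generates its data, and zlib is again only a stand-in for the controller.

```python
import random
import zlib

def mixed_buffer(total_size: int, compressible_fraction: float,
                 chunk_size: int = 4096, seed: int = 0) -> bytes:
    """Build a buffer where each chunk is random bytes followed by zero-fill."""
    rng = random.Random(seed)
    zeros_per_chunk = int(chunk_size * compressible_fraction)
    random_per_chunk = chunk_size - zeros_per_chunk
    out = bytearray()
    while len(out) < total_size:
        if random_per_chunk:
            out += rng.getrandbits(8 * random_per_chunk).to_bytes(random_per_chunk, "little")
        out += bytes(zeros_per_chunk)
    return bytes(out[:total_size])

# Target: roughly 46% of the bytes are trivially compressible zero-fill.
data = mixed_buffer(total_size=1 << 20, compressible_fraction=0.46)
ratio = len(zlib.compress(data)) / len(data)
print(f"compressed to {ratio:.1%} of original size")
```

The result lands close to the random share of the buffer, which is the point: the "level" is set by the mix ratio, and says little about how a controller would fare on real files with subtler redundancy.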
Here is a screenshot of a fresh run with the 100% non-compressible option.
Anvil, are you sure the xfers are 100% non-compressible?