There have been a couple of reviews on Anandtech recently about new 6TB drives in consumer-class and enterprise variants, e.g. http://anandtech.com/show/8263/6-tb-...ate-ec-hgst-he That review in particular disturbed me: a consumer-class drive (the WD Red) was discussed alongside two enterprise-class drives as suitable for home NAS use, with no mention of the high chance of unrecoverable read errors due to the sheer size of these beasts.
I want to copy here the maths I laid out in a comment on that article, and maybe provoke a discussion. Since this is an Xtreme forum, my guess is that many people may be thinking that bigger/newer is better, salivating at the idea of 6TB in a single package and building huge arrays, and thinking that a RAID6 will keep their data (relatively) safe. Or even thinking they are OK for use in a NAS in a mirror set, for instance. I'm not discussing the need for backups here, just the issue of keeping the array running and being able to rebuild in the case of drive failure. It's possible that running ZFS will mitigate this issue, but these drives will no doubt be used by misinformed users in normal RAID or mirror sets and that's where the danger lies.
The most important number for this discussion is the URE figure for a drive, or unrecoverable bit read error rate. For normal consumer-class drives this is quoted at 1 in 10^14 bits, and for enterprise-class drives it is 1 in 10^15 bits. This is partly why those drives (as well as being designed/marketed specifically for 24/7 array usage) are more expensive. As a first reference, read these great articles which lay out the core of the problem: http://www.zdnet.com/blog/storage/wh...ng-in-2009/162 and http://www.zdnet.com/blog/storage/wh...ng-in-2019/805
So here's the maths which (to me) shows that 6TB drives are completely crazy in consumer-class ranges.
6TB is approximately 0.5 x 10^14 bits (6 x 10^12 bytes x 8 = 0.48 x 10^14). That means if you read the entire disk (as you have to do to rebuild a parity or mirrored array from the data held on all the remaining array disks) then there's roughly a 50% chance of a disk read error for a consumer-class disk with a 1 in 10^14 unrecoverable read error rate. Equivalently, there's only a 50% chance that there WON'T be a read error.
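That per-disk figure is trivial to sketch in a few lines of Python, using the simple linear model above (bits read multiplied by the URE rate); the drive size and quoted URE rate are the only inputs:

```python
# Per-disk probability of completing a full read without a URE,
# under the simple linear model (bits read x quoted URE rate).
DRIVE_BITS = 6 * 10**12 * 8   # 6TB in bits, ~0.48 x 10^14
URE_RATE = 1e-14              # consumer-class: 1 error per 10^14 bits read

p_error = DRIVE_BITS * URE_RATE   # ~0.48, i.e. roughly the 50% above
p_clean = 1 - p_error             # chance the whole disk reads cleanly

print(f"P(error) ~ {p_error:.0%}, P(clean full read) ~ {p_clean:.0%}")
```

Using the exact bit count gives 48%/52% rather than the rounded 50%/50%; it makes no real difference to the argument.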
Let's say you have a nice 24TB RAID6 array built from six of these new 6TB WD Red drives: four disks' worth of data, two of parity. RAID6, so good redundancy, right? Must be safe! One of your disks dies. You still have redundancy to spare, so surely you're fine? Unfortunately, the chance of rebuilding the array without ANY of the five remaining disks suffering an unrecoverable read error is: 50% (for the first disk) x 50% (for the second) x 50% (for the third) x 50% (for the fourth) x 50% (for the fifth). Yes, that's about a 3% chance of rebuilding safely. Most RAID controllers will barf on the first error from a disk, stop the rebuild and declare that disk failed for the array. Would you go to Vegas to play those odds of success?
If those 6TB disks had been enterprise-class drives (say WD RE, although there's no RE 6TB drive yet, or the HGST and Seagates reviewed by Anandtech) they would have a 1 in 10^15 unrecoverable bit read error rate, an order of magnitude better. How does the maths look now? Each disk now has a 5% chance of erroring during the array rebuild, or a 95% chance of not. So the rebuild success probability is 95% x 95% x 95% x 95% x 95% - that's about 77.4% FOR THE SAME SIZE OF DISKS. Obviously you've invested more heavily in better disks to achieve that, but the end result is that you have a better-than-even chance of completing an array rebuild if you lose a drive!
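The consumer versus enterprise comparison above can be wrapped in a small helper. This is a sketch of the same linear model, using the exact bit counts rather than the rounded 50%/5% per-disk figures, so the outputs land slightly above the percentages quoted in the text (about 4% and 78% instead of 3% and 77.4%):

```python
def rebuild_success(drive_tb, ure_rate, surviving_disks):
    """Probability that every surviving disk is read end-to-end
    without a URE during a rebuild (simple linear per-disk model)."""
    bits = drive_tb * 10**12 * 8
    p_clean = 1 - bits * ure_rate      # per-disk clean-read probability
    return p_clean ** surviving_disks  # ALL disks must read cleanly

# Six 6TB drives, one failed, five must be read end-to-end:
consumer = rebuild_success(6, 1e-14, 5)     # ~0.04: near-certain failure
enterprise = rebuild_success(6, 1e-15, 5)   # ~0.78: better-than-even odds

print(f"consumer: {consumer:.0%}, enterprise: {enterprise:.0%}")
```

The order-of-magnitude difference in URE rate is the whole story here; nothing else in the calculation changes.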
Note that this success/failure probability is NOT PROPORTIONAL to the size of the disk and the URE rate: the per-disk probability is raised to a POWER, squared, then cubed, and so on, as the number of disks remaining in the array grows. That means using smaller disks than these 6TB monsters matters to the health of the array, and using disks with much better URE figures than consumer-class drives matters to an enormous extent, as shown by the probability figures above.
For instance, suppose you'd used an eight-disk RAID6 of 4TB Red drives to get the same 24TB array as in the first example. Your non-error probability per full disk read is now roughly 68% (better, since there are fewer bits being read per disk), so the probability of no read errors across the 7-disk rebuild after a single failed drive is 68% x 68% x 68% x 68% x 68% x 68% x 68%, or roughly 7%. Better than 3%, but not by much. The same calculation using 2TB disks (84% per disk) for a 13-disk rebuild of a 14-disk 24TB array gives about 10% - still dismal, unfortunately. However, all other things being equal, using far smaller disks (but more of them) to build the same size of array IS marginally safer for your data.
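Running all three array layouts through the same model makes the trend obvious. Again a sketch only, with exact bit counts, consumer-class 1-in-10^14 drives throughout, and a single failed drive in each case:

```python
# Same ~24TB usable RAID6 built three ways from consumer drives.
# After one failure, every remaining disk must be read end-to-end
# without a URE for the rebuild to complete.
URE = 1e-14

for tb, total_disks in [(6, 6), (4, 8), (2, 14)]:
    p_clean = 1 - (tb * 10**12 * 8) * URE   # per-disk clean-read chance
    p_rebuild = p_clean ** (total_disks - 1)
    print(f"{total_disks:2d} x {tb}TB: P(rebuild survives UREs) ~ {p_rebuild:.0%}")
```

Smaller disks help (roughly 4% to 7% to 10% survival), but no consumer-class layout of this capacity gets anywhere near safe.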
Before anyone rushes to say none of this is significant compared to the chance of a drive mechanically failing in some other way during the rebuild: sure, but that's an ADDITIONAL risk of array failure on top of the pretty shocking probabilities above. The bottom line for me is that anyone thinking of using huge consumer-class drives and simply adding extra redundancy drives to parity arrays to compensate is ignoring the fact that those brag-worthy huge drives will likely fail and kill the array as soon as it is stressed by a full rebuild after a drive failure. Consumer-class drives are intrinsically UNSAFE for your data at these bloated multi-terabyte sizes, however much you think you're saving by buying the biggest available, because the quoted URE rates have not improved in step with the technology cramming ever more bits into smaller spaces.
I'm extremely disappointed that the Anandtech review completely ignored this fact and didn't point out that the 6TB Red drives are a seriously risky proposition for the home/NAS usage they were being reviewed for. Especially as the review even reported an array rebuild failure using 6TB WD Reds that could well have been such a bit read error, dismissing it merely as a possible "compatibility issue" with the particular NAS. EDIT - Ganesh, the Anandtech reviewer, has since responded to me personally, showing the error was not a URE during the rebuild, but an odd write error apparently occurring afterwards as the rebuilt array was being tested.
Without a serious improvement in the build quality/URE figures for these huge drives, I feel the conclusion of the article should have been a very clear warning that these consumer-class 6TB drives should be avoided at all costs. Despite the obvious technological and engineering achievement in squeezing 6TB into standard-size HD packages, my gut instinct is summed up by the Jurassic Park quote: "your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should."