RAID And You (A Guide To RAID-0/1/5/6/xx)

**Serra** · 07-03-2007, 01:27 AM

Index:
Section 1: Overview
Section 2: How it Works 1 - The Basics
Section 3: How it Works 2 - More Advanced
Section 4: Side Notes
Section 5: Advantages (summary)
Section 6: Disadvantages (summary)
Section 7: Rules of Thumb
Section 8: FAQ (various)
Section 9: Future of RAID-0
Section 10: References / Further Reading

Section 1: Overview

RAID-0 is the simplest RAID level that exists, and runner-up for the most misunderstood (see RAID-1 for that). The basic idea is elegant in its simplicity: all hard drives in the array work simultaneously to read and write all data. The primary advantage of this technique is obvious: data read and write operations can complete more quickly than on a single disk, with no disk space lost and no extra partitions needed. The disadvantages are more technical in nature, which is why this RAID level is so misunderstood. The rest of this examination will begin with the basics of RAID-0 and then delve into the more technical details, so as to explain just what the disadvantages are and illustrate the many situations in which a RAID-0 array is not the end-all, be-all of speed that many enthusiasts blindly believe.

To simplify my job, all examples used will assume an array with 3 hard drives. It is possible to create an array with only 2 (or for that matter more than 3), however it has been my experience that people need an example with more than 2 drives before they really understand, and 3 is the next most convenient number.

Section 2: How it Works 1 - The Basics
2a. Writing to the Array
2b. Reading from the Array

Fortunately for the author, the processes involved with RAID-0 are relatively easy to explain. This section is broken up into two main parts: how data is written to an array and how data is read from it (a hard drive cannot do both at the same instant, so the mixed case is left for subsection 3e).

2a. Writing to the Array:
When your computer has generated data that it needs to write to a regular single disk, that data is simply laid down on the platter in its entirety. When data is to be written to a RAID-0 array however the data is instead broken into equal, fixed-size blocks by the array controller (either in hardware or software) before it is written. Once the data has been broken down into these fixed-size blocks the RAID controller simply sends them to each hard drive in rotating sequence until no blocks remain to be written. In this fashion each drive in our example 3-drive set needs only to do 1/3rd of the total work.

What happens next is best illustrated by example. Let's say you have a piece of data which the controller has broken into blocks A - I and you have 3 disks in your array. The result would look something like this:

Disk 1: A , D , G
Disk 2: B , E , H
Disk 3: C , F , I

And so voila - you have just cut the time it takes to write the data to one third of what a single disk would need! (or so it appears - be sure to read on)
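The rotating distribution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular controller is implemented; the function name `stripe` is made up for this example.

```python
def stripe(blocks, num_disks):
    """Hand out blocks to disks in rotating (round-robin) order.

    Illustrative only: real controllers work on fixed-size byte ranges,
    not on labelled blocks like these.
    """
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


layout = stripe(list("ABCDEFGHI"), 3)
for number, contents in enumerate(layout, start=1):
    print(f"Disk {number}: {' , '.join(contents)}")
```

Run as-is, it reproduces the three-disk layout from the example.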

2b. Reading from the Array:
Reading from the array works much like writing - each disk simply reads back the blocks it holds and the RAID controller puts them back together in order. Using the same example as above, Disk 1 would read blocks A, D, G, Disk 2 would read B, E, H, and so on. Again, you *appear* to have completed the same operation as a single disk would in one third the time (but as above, be sure to read on - it is not necessarily as it seems).
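As a rough sketch of the reassembly step, the Python below interleaves the per-disk block lists back into the original order. The function name `unstripe` and the list-of-lists representation are assumptions made for the example only.

```python
def unstripe(disks):
    """Rebuild the original block order from per-disk block lists.

    Block i lives on disk (i mod N), at position (i div N) on that disk.
    """
    num_disks = len(disks)
    total_blocks = sum(len(disk) for disk in disks)
    return [disks[i % num_disks][i // num_disks] for i in range(total_blocks)]


disks = [["A", "D", "G"], ["B", "E", "H"], ["C", "F", "I"]]
print("".join(unstripe(disks)))  # prints ABCDEFGHI
```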

Section 3: How it Works 2 - More Advanced
3a. Reading a "small" File
3b. Reading a "large" SEQUENTIAL File
3c. Reading a "large" NON-SEQUENTIAL File
3d. Writing to the Array
3e. Reading & Writing from/to the Array (ie. uncompressing a RAR file)

Below you will find an explanation that goes into some of the finer details of RAID-0 read and write operations. This section requires that you have read and understood the first How it Works section, as well as the introductory post about basic hard drive mechanics. Unfortunately for the author, once you dig down a bit into the mechanics of how a RAID-0 array reads data, you necessarily run into the fact that read sizes and disk states greatly affect the outcome. This subsection must therefore be broken down into a number of different possibilities, each examined independently. A summary provided later lumps these details together and shows how they affect performance in the real world. A case that applies to both reading from and writing to the array will be explained first.

Case 0: The array has spun down or is starting fresh
It is important to the operation of RAID-0 to note that in the event the array is either just starting up (ie. you're starting your computer and reading your OS from it) or has spun down (ie. you have not used your array in a while and it spun down), the array will - on average - start up slower than any random single hard disk. The reason is fairly intuitive: if we assume that there is variation in spin up time among disks (and there is), the law of averages dictates that an array of multiple disks will probably have at least one disk that is slower than the other(s). Because all disks must be fully spun up before the array is useful, the array is only ready when its slowest disk is ready. The consequence is that an array will take longer on average to spin up than a regular disk AND the more disks there are in the array, the longer (on average) this will take.

While this particular delay may not be too long itself, remember that the latency you experience is an accumulation of various events, and this will add into that equation.

The rest of this section assumes that the disks are already spun up and operating at full speed. See Case 0 above for what happens if they are not.

3a. Reading a "small" file
This is the most important scenario to consider when weighing the various RAID options in your mind. A "small" file for the purposes of this discussion is one which meets the following criteria:

The size of the file is small enough such that it would take a greater amount of time to reach the point where the data begins than it would to read the data itself once the head has reached the start point.

Now we will again use the law of averages. On average, let's say it takes your hard drive's head 10ms to reach any random data point from any other random point on a disk. As more drives are added to your RAID array the law of averages suggests it is likely that one drive will reach the data faster than another, and the more drives there are in the array the more likely it becomes that there is a larger difference between the fastest and slowest drives in the array. Take it as given that a file cannot properly be read from a number of drives until all drives have arrived at their data points. As such, if we define a small file as one where it takes longer to get to the data point than it takes to read the data, then the logical conclusions are:
1. The largest source of delay in reading this file will be to get to the data itself and,
2. This delay will, on average, be longer in an array than on any given single disk and,
3. This delay will increase with the number of disks
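To put rough numbers on those three conclusions, here is a small Python sketch. It rests on two loud assumptions that are NOT from any drive datasheet: each disk's seek time is independent of the others, and it is uniformly distributed between 0 and twice the average. Under those assumptions the expected slowest of N seeks is (2 x average) x N / (N + 1). Real drives will behave differently; only the trend matters here.

```python
def expected_slowest_seek_ms(mean_seek_ms, num_disks):
    """Expected seek time of the slowest disk out of num_disks.

    Assumption (illustrative only): seeks are independent and uniform
    on [0, 2 * mean_seek_ms], so E[max of N] = upper * N / (N + 1).
    """
    upper = 2 * mean_seek_ms
    return upper * num_disks / (num_disks + 1)


for n in (1, 2, 3, 4):
    print(f"{n} disk(s): {expected_slowest_seek_ms(10, n):.2f} ms")
```

With the 10ms average from the text, the model gives 10ms for one disk and about 13.3ms, 15ms and 16ms for two, three and four disks.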

3b. Reading a "large" SEQUENTIAL file
A sequential file is a file laid out on your disks such that if a disk needs blocks A, B, C, they sit next to each other and no additional seeking is necessary after reading A or B; the disk can just continue reading. This occurs on well-defragmented disks. This case is the reverse of the one above: here we assume that the file is large enough that the time it takes to reach the data start point is trivial compared to the time it takes to read the data. Using similar reasoning as above, we draw three new conclusions:
1. The largest source of delay in reading this file will be to read the data itself and,
2. This delay will then, on average, be shorter in an array than on any given single disk due to the fact that multiple disks are breaking up the work and,
3. Performance will increase almost proportionately with the number of disks (be warned however that the minimum file size necessary for a large sequential read will increase at the same rate)
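A back-of-the-envelope model of this case, sketched in Python. The 10ms seek and 60 MB/s per-disk transfer rate are made-up illustrative defaults, and the model deliberately ignores overhead and the seek-time growth discussed in 3a; it only shows why the transfer term dominates for big files.

```python
def read_time_ms(file_mb, num_disks, seek_ms=10.0, mb_per_s=60.0):
    """Idealised sequential read time: one seek plus a striped transfer.

    seek_ms and mb_per_s are hypothetical per-disk figures, not measurements.
    """
    transfer_ms = file_mb / (mb_per_s * num_disks) * 1000.0
    return seek_ms + transfer_ms


for n in (1, 2, 3):
    print(f"{n} disk(s): {read_time_ms(600, n):.1f} ms for a 600 MB file")
```

For a 600 MB file the seek is a rounding error next to the transfer, so three disks come out very close to three times faster than one in this model.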

3c. Reading a "large" ***NON***-SEQUENTIAL file
This case is included for completeness only. As you may have realized, a non-sequential file is (unlike the above) a file that is fragmented, causing the heads of your disk to seek between file reads. In this case the amount of fragmentation will cause the seek latency issue to become a larger factor... if enough fragmentation occurs, you may even be looking at the issue described in subsection 3a.

3d. Writing to the Array:
Fortunately the cases for writing to the array are effectively the same as those for reading from the array. Just as with reading, seek time is the key component to the downside of RAID-0, which will otherwise scale fairly well (to a point) with the number of disks.

3e. Reading & Writing From/To the Array: (ie. unRARing files)
Another important note about the mechanics of RAID-0 is that it fails utterly when dealing with simultaneous read/write operations (such as uncompressing a large file back to the same array) as compared to a solution where the disks are independent and one is read from while another is written to. The explanation for this behavior is simple: on the RAID array the operation will perform a portion of the read, then have to reposition the heads to write a bit of data, then move the heads again, and so on. This happens very frequently and incurs a very large seek time penalty. Independent disks, by comparison, are free to simply read and write with next to no seek time hindrance.

Section 4: Side Notes
4a. When Drives go Bad...
4b. Software vs. Hardware RAID-0
4c. RAID-0 Performance Scaling With # of Drives

4a. When Drives go Bad...
You lose all your data. There is no way to recover data from a failed array. And when you think about it, the chance of at least one drive failing is roughly F x N (more precisely 1 - (1 - F)^N, assuming drives fail independently), where F is the chance of any one drive failing and N is the number of drives in the array... so the more drives you have, the more likely you are to lose all your data.
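For the curious, the F x N figure is a handy approximation; if drive failures are assumed to be independent, the exact chance of losing the array is 1 - (1 - F)^N. The Python sketch below compares the two. The 3% per-drive failure chance is a made-up number for illustration only.

```python
def array_failure_probability(drive_failure_prob, num_drives):
    """Chance that at least one of num_drives fails (independent failures)."""
    return 1.0 - (1.0 - drive_failure_prob) ** num_drives


F = 0.03  # hypothetical per-drive failure chance, not a real statistic
for n in (1, 2, 3, 8):
    exact = array_failure_probability(F, n)
    print(f"{n} drive(s): exact {exact:.2%}, F x N estimate {F * n:.2%}")
```

The two stay close for small arrays, and either way the risk climbs steadily with every drive you add.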

4b. Software vs. Hardware RAID-0
Because RAID-0 requires no logic beyond the division of data to be written into blocks of pre-determined size, this array type alone stands as a type which receives next to no benefit from a hardware controller versus a software-based controller. For example, an AMD Athlon 4000+ (single core) processor using software RAID over 3 drives only accumulates approximately 2-3% more utilization than the same rig with a hardware controller... and given that recent system advances have come so far since that point, the current difference should be less than 1-2% for heavy I/O.

4c. RAID-0 Performance Scaling With # of Drives
One of the things I have always hated seeing here is people who have RAID-0 arrays with 7 or 8 drives. Aside from the clear danger of disk failure, RAID-0 may scale well with a few drives, but much less so as you add more. For example, versus one disk and assuming theoretical maximums (the figure on the right is the additional share of the single-disk time saved by adding that one drive):

1 Disk = Baseline
2 Disks = 1/2 Time Decrease = 50% of baseline time saved vs. 1 disk
3 Disks = 2/3 Time Decrease = ~17% more saved vs. 2 disks
4 Disks = 3/4 Time Decrease = ~8% more saved vs. 3 disks
5 Disks = 4/5 Time Decrease = 5% more saved vs. 4 disks
6 Disks = 5/6 Time Decrease = ~3% more saved vs. 5 disks

And one must take into account the fact that as the number of drives increases, so too does the minimum size of the file required to be considered a "large" file. Add in the fact that overhead alone accounts for a few % of performance and you can see that past 3 disks your *theoretical maximum* increase is sitting in the single digit range.
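The figures in the table come from the simple idea that, at the theoretical maximum, N disks finish in 1/N of the single-disk time. A minimal Python sketch of that arithmetic follows (the function name is made up for the example); computed values may differ by a point from hand-rounded ones.

```python
from fractions import Fraction


def marginal_time_saved(num_disks):
    """Extra share of single-disk time saved going from N-1 to N disks."""
    return Fraction(1, num_disks - 1) - Fraction(1, num_disks)


for n in range(2, 7):
    total_saved = 1 - Fraction(1, n)
    extra = float(marginal_time_saved(n))
    print(f"{n} disks: {total_saved} total time decrease, {extra:.1%} from the last disk")
```

The last column shrinks as 1/(N x (N-1)), which is why each extra drive buys so much less than the one before it.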

Section 5: Advantages (summary)

In short,
1. Fastest "large" file reads versus single disk or any other RAID type
2. Fastest "large" file writes versus single disk or any other RAID type
3. Trivial CPU impact even with software RAID (and hence #4)
4. Cheapest array type per GB (no hardware cost, no drives lost to parity bits)

Section 6: Disadvantages (summary)

In short,
1. Poor "small" file read performance
2. Poor "small" file write performance
3. No fault tolerance - one disk dies, it's all gone

Section 7: Rules of Thumb
7a. When to use RAID-0
7b. When NOT to use RAID-0
7c. One Thing you *DO* Need

7a. When to use RAID-0
Use RAID-0 when:
1. You do not care about fault tolerance
2. You work with "large" files often, except
2a. do not use RAID-0 if your work will see files read from and written to the hard drive in the same operation
3. You want a boost in startup time for many OSes (but this applies to many RAID levels, not just 0)

7b. When Not To use RAID-0
1. You care that your data will be lost if any drive fails
2. You're just an average desktop wanderer - most desktop work involves very small files
3. You like to do video or other large file manipulation from and to the same source

7c. One Thing You *Do* Need
You *DO* need a good disk defragmenting program. RAID-0 suffers with fragmentation, so something beyond the simple Windows Defragment tool is highly suggested. My personal favorite is Diskeeper, but O&O is another widely used program.

Section 8: FAQ (various)

Q: Is it true you shouldn't mix hard drive types?
A: Yes. You will be constrained by the slowest drive in your array, so it makes no real sense to mix two different hard drives with different properties.

Q: What's better, a 10k rpm drive or a 7.2k rpm drive for RAID-0?
A: For some reason some people think this may make a difference. With RAID-0 as with anything else, a faster drive is always faster.

Q: What's better, a single 10k rpm drive or an array of 2x 7.2k rpm drives?
A: For small file/average desktop use, the 10k rpm drive will be better because its lower access time is what matters for small file work, but the larger the files get that you're looking to play with, the more the array will start to shine.

Q: Will a RAID-0 array help with game X or application Y?
A: I have neither the time nor the inclination to look up and examine every game's/app's file usage patterns and thus cannot answer any type of question like this.

Q: Why don't you ever define a file size for "large"/"small"?
A: Because it's fully dependent upon the hard drive in use. The faster the drive is at seeking, the smaller a file will have to be to be considered "small", and the faster it reads, the larger a file will have to be to be considered "large".
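If you want a ballpark for your own drive, one rough way to think about it is the break-even size at which reading the data takes exactly as long as seeking to it. The Python sketch below works that out; the 10ms and 60 MB/s figures are placeholders, so plug in your own drive's numbers.

```python
def break_even_kb(seek_ms, mb_per_s):
    """File size (KB) at which transfer time equals seek time on one drive."""
    return seek_ms / 1000.0 * mb_per_s * 1024.0


# Placeholder figures only: roughly 614 KB for a 10 ms seek at 60 MB/s.
print(f"{break_even_kb(10, 60):.0f} KB")
```

Files well below that size behave as "small" and files far above it as "large"; the boundary moves exactly as described in the answer above.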

Q: Why don't you provide more benchmarks?
A: Right now, this guide is new and some are coming. With that being said, to truly illustrate all points of all RAID levels comparatively, someone needs to invent a new benchmarking program. Until that happens, there really isn't a great application out there yet for that kind of thing because it's all so heavily dependent on usage patterns, fragmentation, etc. Even a user's hardware setup (less with RAID-0, but more with RAID 1/5/6) can make a monumental difference through the presence or absence of a few key code optimizations. When in doubt, test it out.

Q: How do I implement RAID-0 on my motherboard/OS?
A: Given the hundreds of motherboards that exist and dozens of operating systems, I have no intention of even trying to explore this issue. RTFM.

Section 9: Future of RAID-0

As drives shift from platter-based to solid-state disks (and as solid-state disks improve in quality/speed), RAID-0 will correspondingly undergo a massive shift. With solid-state disks access time is a non-issue, eliminating the idea of "small" files, and as solid-state disks contain no moving parts their failure rate *should* (one would hope) be noticeably lower.

Section 10: References / Further Reading

Coming Soon.
