Index:
Section 1: Overview
Section 2: How it Works 1 - The Basics
Section 3: File Reading Models
Section 4: How it Works 2 - More Advanced
Section 5: Side Notes
Section 6: Advantages (summary)
Section 7: Disadvantages (summary)
Section 8: Rules of Thumb
Section 9: FAQ (various)
Section 10: Future of RAID-1
Section 11: References / Further Reading
Section 1: Overview
In the broad strokes, RAID-1 (or mirroring) is the simplest RAID form there is. It requires two disks which are exact copies of one another. The RAID controller simply duplicates the stream of data that was to be written to the array and sends one copy to each hard drive. This provides a number of advantages, such as full fault tolerance, potentially amazing read speeds, and the ability to create the array without formatting. The costs are that you lose half of your total potential storage capacity (one disk simply mirrors the other), the array cannot be scaled by adding more disks to increase capacity, and write speeds suffer a penalty.
You may notice that throughout this thread many references are made to the complexity of RAID-1, yet I also just said that in the broad strokes it is the simplest RAID form there is. Both are true. The real complexity comes into play when you start to examine the various methods that exist to read data from a RAID-1 array that are not available in any other RAID model, which you will find explored in more detail in Section 3: File Reading Models (performance for various file sizes is in How it Works 2).
Section 2: How it Works 1 - The Basics
2a. Writing to the Mirror
2b. Reading from the Mirror
2a. Writing to the Mirror
As mentioned in the introduction, the basic writing mechanism is quite simple: the RAID controller duplicates the stream of data that was to be written to the array and sends one copy to each hard drive. Still, for a more graphical representation, here is a little diagram showing the result on the physical disks when the processor requests that data blocks A, B, C, D be written to the array (where the array is, as you'll recall from earlier sections, the name for the logical grouping of disks):
Disk 1: A, B, C, D
Disk 2: A, B, C, D
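As a minimal sketch of the write path above, the following Python models each disk as an in-memory list of block labels (an assumption for illustration; real controllers work on sectors) and duplicates every write to both disks. The class name MirroredArray is hypothetical.

```python
class MirroredArray:
    """Toy two-disk RAID-1 model (hypothetical): each disk is a list of block labels."""

    def __init__(self) -> None:
        self.disks: list[list[str]] = [[], []]

    def write(self, blocks: list[str]) -> None:
        # Duplicate the incoming stream: every block goes to every disk.
        for disk in self.disks:
            disk.extend(blocks)


array = MirroredArray()
array.write(["A", "B", "C", "D"])
for number, disk in enumerate(array.disks, start=1):
    print(f"Disk {number}: {', '.join(disk)}")
# Disk 1: A, B, C, D
# Disk 2: A, B, C, D
```

The printed result matches the diagram: neither disk holds anything the other lacks, which is where the full fault tolerance comes from.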
2b. Reading from the Mirror
Reading from the mirror is where RAID-1 becomes more complicated, and I highly suggest reading Section 3: File Reading Models for a more thorough explanation of the nuances and the different methods in use. For now it suffices to say that there are two major models, and which one you get depends on the controller:
1. All data is read from only one disk
2. Data can be read from both disks
As you may have guessed, the second model has the potential to provide significantly better read performance.
Section 3: File Reading Models
3a. Single Disk
3b. Both Disks - Per Job Load Balancing
3c. Both Disks - Per "Stripe" Load Balancing
3d. Both Disks - Read Optimizations - Elevator Seek
3e. Both Disks - Read Optimizations - Shortest Seek First
The potential strength of RAID-1 comes entirely from the variety of read methods available. Some methods provide no benefit over a single disk solution, while others can handily beat the speed of a RAID-0 array.
3a. Single Disk
The single disk read model is one you will find in many basic RAID chipsets, and it is solely responsible for the bad rap that RAID-1 receives from the enthusiast community. As the name implies, although both disks hold the same data, only one disk is used for reads (and though some implementations switch that disk from time to time, the result is the same). This is a gross inefficiency that reduces read performance to approximately that of a single disk not in an array. As a small consolation, it should be noted that even on cheap onboard RAID chipsets this implementation uses less CPU time than any other.
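A minimal sketch of the single disk model, assuming reads are simply counted per disk (the function name single_disk_reads is hypothetical). It shows why the mirror adds no read throughput: the second disk never receives any work.

```python
from collections import Counter


def single_disk_reads(requests: list[int], primary: int = 0) -> Counter:
    """Single disk read model: every read is served by the same disk."""
    served: Counter = Counter()
    for _block in requests:
        served[primary] += 1  # the mirror disk stays idle for reads
    return served


load = single_disk_reads([10, 42, 7, 99])
print(load)  # Counter({0: 4})
print(load[1])  # 0: the second disk did no read work
```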
3b. Both Disks - Per Job Load Balancing
Per-job load balancing sends each read request (or "job") to one of the two disks, chosen according to various criteria (see below). For example, say you as a user were going to open two files simultaneously. Each file is then read from a different hard drive. Already you can see how this is a significant improvement over the single disk model: instead of reading two jobs off one disk, one job is read off each disk and the pair is completed in roughly half the time. Another advantage of this type of load balancing is that it is a popular method on many software-based RAID cards and may even be employed by operating systems' own software RAID, yet it costs the CPU no more than 1-2%. This makes it a very cost-efficient solution. The criteria that controllers can use to decide how to assign jobs include (but are not limited to): disk queue length (common), elevator seek (see 3d, often used in conjunction with disk queue length), shortest seek first (see 3e, also used with disk queue length), and basic round robin.
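A minimal sketch of per-job load balancing using the disk queue length criterion. Queue length is measured here as total outstanding blocks, and ties go to the lower-numbered disk; both choices, and the name assign_jobs, are assumptions for illustration rather than any particular controller's logic.

```python
def assign_jobs(job_sizes: list[int], num_disks: int = 2) -> list[list[int]]:
    """Per-job load balancing by disk queue length (a sketch).

    Each whole job (e.g. one file read) is queued on the disk with the least
    outstanding work, measured here in blocks; ties go to the lower-numbered disk.
    """
    queues: list[list[int]] = [[] for _ in range(num_disks)]
    for size in job_sizes:
        target = min(range(num_disks), key=lambda d: sum(queues[d]))
        queues[target].append(size)
    return queues


# Two files opened at once: one is read from each disk.
print(assign_jobs([500, 500]))  # [[500], [500]]
# A small request is not stuck behind a large one.
print(assign_jobs([5000, 8]))  # [[5000], [8]]
```

Because each disk handles one whole job, two equal jobs finish in about half the time, but a single large job still runs on one disk, which is why this scheme does nothing for one big file (see 4b).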
The "disadvantage" of this type of load balancing is that it often does little for a desktop user, because in day-to-day use most people do not run into a large read request holding up smaller ones.
This type of load balancing does, however, lend itself naturally to multi-user environments, where one user's large request will not make a second user wait a long time for a small request to be filled. Likewise, if you have one program that does intensive reading and it interrupts your small side-requests (or vice versa), RAID-1 with per-job load balancing may be the way to go for you.
3c. Both Disks - Per Stripe Load Balancing
For lack of a better name I have decided to call this Per Stripe Load Balancing, though a mirrored array has no real need for "stripes" in the way that RAID-0 and RAID-5/6 do. In this scenario both drives team up on each request that comes in, with one drive reading (for example) all the odd blocks of data and the other reading all the even blocks. This method allows RAID-0 style read speeds for large file reads while ensuring that if one disk fails you do not lose your data. Like per-job load balancing, per-stripe load balancing can also make use of optimizations such as elevator seek and shortest seek first, which can overcome the RAID-0 small file seek problem and vault RAID-1 onto the throne of fastest disk reading solution.
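A minimal sketch of the odd/even split described above. The chunk parameter is an assumption: on a spinning disk, skipping every other tiny block saves little because the platter still rotates past it, so a practical implementation would plausibly alternate larger chunks. With chunk=1 the sketch is exactly the odd/even example. The name split_request is hypothetical.

```python
def split_request(blocks: list[int], chunk: int = 1) -> tuple[list[int], list[int]]:
    """Per-stripe load balancing sketch: alternate chunks of one request between two mirrors.

    Both disks hold every block, so any split is valid; this one sends
    even-numbered chunks to disk 0 and odd-numbered chunks to disk 1.
    """
    disk0: list[int] = []
    disk1: list[int] = []
    for block in blocks:
        target = disk0 if (block // chunk) % 2 == 0 else disk1
        target.append(block)
    return disk0, disk1


print(split_request(list(range(8))))  # ([0, 2, 4, 6], [1, 3, 5, 7])
print(split_request(list(range(8)), chunk=2))  # ([0, 1, 4, 5], [2, 3, 6, 7])
```

Each disk reads half of the request, which is the source of the RAID-0 style large file speed; unlike RAID-0, losing either disk still leaves every block on the other.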
3d. Both Disks - Read Optimizations - Elevator Seek
The best explanation of this optimization is Wikipedia's article on the elevator algorithm (see Section 11). In short, the disk head keeps moving in one direction, servicing pending requests as it reaches them, and only reverses once no requests remain ahead of it, much like an elevator.
This particular optimization is seen extensively in both software and hardware implementations. It does not really require any additional processing power, and it allows a two-disk RAID-1 array to overcome the largest pitfall of a two-disk RAID-0 array: small file access time.
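A minimal sketch of elevator ordering for one disk's queue, using the LOOK variant (the head reverses at the last pending request rather than at the edge of the disk). Cylinder numbers are hypothetical, and in a RAID-1 array each disk would presumably keep its own queue ordered this way. The comparison is against serving the same requests first-come, first-served.

```python
def elevator_order(head: int, pending: list[int], moving_up: bool = True) -> list[int]:
    """Order pending requests the way an elevator would (LOOK-style variant).

    The head keeps moving in its current direction, servicing requests as it
    reaches them, then reverses once nothing is left ahead of it.
    """
    if moving_up:
        ahead = sorted(p for p in pending if p >= head)
        behind = sorted((p for p in pending if p < head), reverse=True)
    else:
        ahead = sorted((p for p in pending if p <= head), reverse=True)
        behind = sorted(p for p in pending if p > head)
    return ahead + behind


def seek_distance(head: int, order: list[int]) -> int:
    """Total cylinders travelled when servicing requests in the given order."""
    total = 0
    for position in order:
        total += abs(position - head)
        head = position
    return total


requests = [95, 10, 60, 35, 80]
print(elevator_order(50, requests))  # [60, 80, 95, 35, 10]
print(seek_distance(50, requests))  # 250 (first-come, first-served)
print(seek_distance(50, elevator_order(50, requests)))  # 130
```

In this made-up example the elevator order roughly halves head travel, which is the kind of saving that shows up as lower average access time for small files.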
3e. Both Disks - Read Optimizations - Shortest Seek First
I'll start off by stating that I have never specifically seen this implemented in RAID-1, but it does get discussed and is probably in use somewhere. This algorithm is designed to provide the shortest access times possible by buffering requests and tracking the cylinder each one targets. Each hard drive then picks the pending job closest to its current head position, providing exceptional service time. The downside is that if requests keep coming in, this method can starve requests for areas of the disk further from the head. This limitation is often addressed with micro-optimizations governing when the head should service those other requests. In theory this method has the potential to provide the shortest seek times of any optimization, but because of its complexity and limited implementation very little data about its actual performance exists.
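A minimal sketch of Shortest Seek First on one disk's queue, with hypothetical cylinder numbers and ties broken toward the lower cylinder (an arbitrary choice). The second half simulates a steady stream of new requests landing just past the head, to show the starvation problem described above.

```python
def sstf_order(head: int, pending: list[int]) -> list[int]:
    """Shortest Seek First: always service the pending request nearest the head.

    Ties are broken toward the lower cylinder, an arbitrary choice for this sketch.
    """
    remaining = list(pending)
    order: list[int] = []
    while remaining:
        nearest = min(remaining, key=lambda p: (abs(p - head), p))
        remaining.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


print(sstf_order(50, [95, 10, 60, 35, 80]))  # [60, 80, 95, 35, 10]

# Starvation: after each service a new (hypothetical) request arrives 2 cylinders
# past the head, so the far request at cylinder 10 keeps getting passed over.
head, pending, served = 50, [10, 52], []
for _ in range(3):
    nearest = sstf_order(head, pending)[0]
    pending.remove(nearest)
    served.append(nearest)
    head = nearest
    pending.append(head + 2)
print(served)  # [52, 54, 56]
```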
Section 4: How it Works 2 - More Advanced
4a. Reading a "small" File
4b. Reading a "large" File
4c. Reading a "large" NON-SEQUENTIAL File
4d. Writing to the Array
4e. Reading & Writing from/to the Array (i.e. uncompressing a large RAR)
Please Note: This section *requires* an understanding of Section 3.
4a. Reading a "small" File
Without any optimizations, every read model yields the same result: performance about equal to a single regular disk. That said, any non-integrated solution (meaning a hardware or software controller separate from the motherboard) should implement at least an elevator seek algorithm, at which point small file performance excels. In fact, RAID-1 is the only RAID level that performs better than a single disk on average for random seeks (with optimizations, of course). In my own testing (which was by no means conclusive) I found that implementing elevator seek in software alone resulted in a 10% reduction in average seek time versus a single disk (and let's not forget that other array types are actively worse than a single disk).
4b. Reading a "large" File
Obviously the read scheme makes a difference here. The single disk read model will not help at all, and on a per-file basis neither will per-job load balancing, but per-stripe load balancing can deliver 2-disk RAID-0 speeds for large file reads.
4c. Reading a "large" NON-SEQUENTIAL File
Take everything from 4a and 4b and combine them here. A proper per-stripe load balancing scheme with elevator seek will provide not only RAID-0 like speeds for the sequential portions, but can also reduce seek time. Note, however, that making effective use of this requires more sophisticated controller logic, which has to send one drive's heads to one fragment of data while the other drive's heads go to a different fragment, and keep track of it all. In reality I have not seen any data conclusively showing that any hardware performs differently from RAID-0 on a test like this.
4d. Writing to the Array
Writing to the array is where RAID-1 takes a hit. The two disks never finish a write at exactly the same moment, and a write job is not complete until both are done, so on average a write takes longer than it would on a single disk.
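A small Monte Carlo sketch of why this happens, under a purely hypothetical model in which each disk's latency for a given write is independent and uniform between 4 and 12 ms. The mirrored write takes the maximum of the two latencies, so its average can never be lower than one disk's.

```python
import random


def mirrored_write_time(disk_times_ms: list[float]) -> float:
    """A mirrored write completes only when the slowest disk finishes."""
    return max(disk_times_ms)


def average_write_times(trials: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Compare mean single-disk vs. mirrored write latency.

    Hypothetical model: each disk's latency for a write is independent and
    uniform between 4 and 12 ms.
    """
    rng = random.Random(seed)
    single_total = mirror_total = 0.0
    for _ in range(trials):
        t1, t2 = rng.uniform(4, 12), rng.uniform(4, 12)
        single_total += t1
        mirror_total += mirrored_write_time([t1, t2])
    return single_total / trials, mirror_total / trials


single, mirror = average_write_times()
print(f"single disk: {single:.2f} ms, mirror: {mirror:.2f} ms")
```

Under this model the expected values are 8 ms for a single disk and about 9.3 ms (4 + 8 × 2/3) for the mirror, so the printed averages should land near those figures.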
4e. Reading & Writing from/to the Array (i.e. uncompressing a large RAR)
As with the other RAID levels, there is simply no benefit to using RAID-1 for this purpose. This job is best left to multiple single drives.
Section 5: Side Notes
5a. When Drives go Bad...
5b. Software vs. Hardware RAID-1
5a. When Drives go Bad...
Your data is fully protected. One of your two drives can die on you and the other still contains a full working copy of all your data. Some controllers allow a drive to fail without interrupting service, while others require a reboot. To rebuild the array, simply install another hard drive and re-create the mirror; generally no formatting is required. Rebuilding will take a fairly significant amount of time, however: roughly as long as it takes the new drive to write all the data from the surviving drive onto its platters.
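A rough back-of-the-envelope sketch of that rebuild estimate. It assumes a steady sequential write rate and no other I/O on the array; the 500 GB and 60 MB/s figures are hypothetical. Some controllers copy the entire disk surface regardless of how full it is, in which case plug in the drive's capacity instead of the amount of data.

```python
def rebuild_hours(data_gb: float, write_mb_per_s: float) -> float:
    """Rough rebuild time: the new drive must write a full copy of the surviving drive's data.

    Assumes a steady sequential write rate and no other I/O on the array.
    Uses decimal units (1 GB = 1000 MB), as drive makers do.
    """
    seconds = data_gb * 1000 / write_mb_per_s
    return seconds / 3600


# Hypothetical example: 500 GB to copy at a sustained 60 MB/s.
print(f"{rebuild_hours(500, 60):.1f} hours")  # 2.3 hours
```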
5b. Software vs. Hardware RAID-1
While there is no specific requirement to use hardware RAID-1, there is a persuasive argument for it. Software RAID-1 requires very few computer resources and can implement any optimization the programmers come up with without any real additional processing cost. The main split, however, comes when you look at the type of load balancing each approach offers. To date, all software load balancing I have seen uses per-job load balancing, as do many cards (the Areca 1210, for example)... but if per-stripe load balancing is what you are looking for, a hardware RAID controller is almost certainly what you will need.
Section 6: Advantages (summary)
1. Complete single disk failure redundancy
2. Low cost (hardware controller not necessary, though read section 5b)
3. Potential for faster read access than a 2-disk RAID-0 array overall (see 4a, 4b)
Section 7: Disadvantages (summary)
1. High cost per GB (you lose 1/2 your potential storage capacity)
2. Write speed is lower than a single disk on average
3. Does not protect against data corruption
Section 8: Rules of Thumb
8a. When to use RAID-1
8b. When NOT to use RAID-1
8a. When to use RAID-1
Use RAID-1 when:
1. You want the fastest read speed you can pull out of two identical drives
2. You have a game or application that would greatly benefit from a reduction in seek time (lots of small file reading going on)
3. Data preservation is important to you
8b. When NOT to use RAID-1
1. When you care that you're losing half your storage capacity
2. When you will be using the array for lots of writing
Section 9: FAQ
Q: Should I just read the RAID-0 FAQ first because it's boring to repeat the same questions and answers?
A: Yes.
Section 10: Future of RAID-1
As SSDs come out and hardware that can perform XOR calculations becomes more mainstream, RAID-1 will slowly die out. As seek times stop being an issue and platter mechanics disappear, the speed benefit of RAID-1 will disappear as well, and redundancy will likely be covered more by RAID-5 and RAID-6 arrays, which offer more storage capacity overall.
Section 11: References / Further Reading
Wikipedia, "Elevator algorithm": http://en.wikipedia.org/wiki/Elevator_algorithm


