
RAID – AN IN-DEPTH LOOK

WHITE PAPER BY

SIVANESSEN E PILLAI, ENTERPRISE SERVICES

Redundant array of independent disks

In computing, a redundant array of independent disks, also known as redundant array of inexpensive disks (commonly abbreviated RAID) is a system which uses multiple hard drives to share or replicate data among the drives. Depending on the version (RAID Level) chosen, the benefit of RAID is one or more of increased data integrity, fault-tolerance, throughput or capacity compared to single drives. In its original implementations (in which it was an abbreviation for "redundant array of inexpensive disks"), its key advantage was the ability to combine multiple low-cost devices using older technology into an array that offered greater capacity, reliability, speed, or a combination of these things, than was affordably available in a single device using the newest technology.

At the very simplest level, RAID combines multiple hard drives into a single logical unit. Thus, instead of seeing several different hard drives, the operating system sees only one. RAID is typically used on server computers, and is usually (but not necessarily) implemented with identically sized disk drives.

Hardware vs. software

RAID can be implemented either in dedicated hardware or custom software running on standard hardware. Additionally, there are hybrid RAIDs that are partly software- and partly hardware-based solutions.

With a software implementation, the operating system manages the disks of the array through the normal drive controller (IDE/ATA, SCSI, Fibre Channel, etc.). With present CPU speeds, software RAID can be faster than hardware RAID, though at the cost of CPU power that might be better used for other tasks. One major exception is where the hardware implementation of RAID incorporates a battery-backed write-back cache, which can speed up an application such as an OLTP database server. In this case, the hardware RAID implementation flushes the write cache to secure storage to preserve data at a known point if there is a crash. The hardware approach is faster than accessing the disk drive and is limited by RAM speed, the rate at which the cache can be mirrored to another controller, the amount of cache, and how fast the cache can be flushed to disk. For this reason, battery-backed caching disk controllers are often recommended for high-transaction-rate database servers. In the same situation, the software solution is limited to no more flushes per second than the number of rotations or seeks per second of the drives. Another disadvantage of a pure software RAID is that, depending on the disk that fails and the boot arrangements in use, the computer may not be able to be rebooted until the array has been rebuilt.

A hardware implementation of RAID requires at a minimum a special-purpose RAID controller. On a desktop system, this may be a PCI expansion card or a capability built into the motherboard. In larger RAIDs, the controller and disks are usually housed in an external multi-bay enclosure. The disks may be IDE, ATA, SATA, SCSI, Fibre Channel, or any combination thereof. The controller links to the host computer(s) with one or more high-speed SCSI, Fibre Channel or iSCSI connections, either directly, through a fabric, or as network attached storage. This controller handles the management of the disks and performs the parity calculations needed for many RAID levels. This option tends to provide better performance and makes operating system support easier. Hardware implementations also typically support hot swapping, allowing failed drives to be replaced while the system is running.

Both hardware and software versions may support the use of a hot spare, a preinstalled drive which is used to immediately (and almost always automatically) replace a failed drive. This reduces the mean time to repair, the period during which a second drive failure in the same RAID redundancy group can result in loss of data.

Examples:

Software RAID: Veritas Volume Manager from Veritas, Solaris Volume Manager / Solstice DiskSuite from Sun Microsystems, and Logical Volume Manager from HP.

Hardware RAID: Sun StorEdge 3510 FC, Hitachi Thunder 9500 series, EMC CLARiiON series, and HP EVA, VA and MSA.

Note: Please refer to the vendor-specific documentation on the above examples for more details.

RAID Levels:

We will study the different types of RAID levels prevailing in the industry and discuss the various advantages and disadvantages of each. Since the layout of the RAID levels implemented on the storage subsystem has a significant impact on the overall performance of the system and its applications, it is crucial to understand these RAID levels in detail.

RAID levels can be classified into the following categories:

1. Standard RAID levels

2. Nested RAID levels

3. Proprietary RAID levels

Standard RAID Levels:

RAID 0 :

A RAID 0 (also known as a striped set) splits data evenly across two or more disks with no parity information for redundancy. It is important to note that RAID 0 is not redundant. RAID 0 is normally used to increase performance.

A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to the size of the smallest disk. RAID 0 implementations with more than two disks are also possible; however, the reliability of a given RAID 0 set is equal to the average reliability of each disk divided by the number of disks in the set. Since the file system at the operating environment level is distributed across all the disks, a single disk failure results in file system corruption and loss of data. Hot swapping is not possible at this RAID level, since all the disks depend on each other for the data.

While the block size can technically be as small as a byte, it is almost always a multiple of the hard disk sector size of 512 bytes. This lets each drive seek independently when randomly reading or writing data on the disk. If all the accessed sectors are entirely on one disk, then the apparent seek time is the same as for a single disk. If the accessed sectors are spread evenly among the disks, then the apparent seek time is reduced by half for two disks, by two-thirds for three disks, and so on, assuming identical disks. For normal data access patterns the apparent seek time of the array lies between these two extremes. The transfer speed of the array is the transfer speeds of all the disks added together.

RAID 0 is useful for setups where the data is read-only, downtime is not an important factor for the business, and performance during operation is mandatory. This RAID level is popular for gaming systems, where performance is mandatory and the data is not important.

In the diagram depicted above, virtual blocks are divided into groups of four and written to successive disks. Each corresponding group of four blocks (VB000 to VB011) is called a stripe. The number of consecutive virtual disk blocks mapped to a single physical disk is called the stripe depth. The stripe depth multiplied by the number of disks is called the stripe size. The stripe depth is sometimes referred to as the stripe element or segment size, depending on the storage vendor. Hence an alternative definition: the stripe element multiplied by the number of disks gives the stripe size.
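
As an illustration of this mapping (an added sketch, not from the original paper; the function name and parameters are hypothetical), the following Python snippet maps a virtual block number to a member disk and physical block for a RAID 0 set:

    def raid0_map(virtual_block, num_disks, stripe_depth):
        """Map a virtual block number to (disk index, block on that disk) in a RAID 0 set.

        stripe_depth = number of consecutive virtual blocks placed on one disk
        stripe size  = stripe_depth * num_disks (blocks per full stripe)
        """
        stripe_size = stripe_depth * num_disks
        stripe_number = virtual_block // stripe_size           # which full stripe
        offset_in_stripe = virtual_block % stripe_size
        disk_index = offset_in_stripe // stripe_depth          # which member disk
        block_on_disk = stripe_number * stripe_depth + offset_in_stripe % stripe_depth
        return disk_index, block_on_disk

    # Example: 3 disks with a stripe depth of 4 blocks (stripe size = 12 blocks).
    # Virtual blocks 0-3 land on disk 0, 4-7 on disk 1, 8-11 on disk 2, 12-15 back on disk 0.
    for vb in range(16):
        print(vb, raid0_map(vb, num_disks=3, stripe_depth=4))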

In the figure referred to above, a single file containing records 000 to 009 is split across three disks.

Concatenation (JBOD)

Although a concatenation of disks (also called JBOD, or "Just a Bunch of Disks") is not one of the numbered RAID levels, it is a popular method for combining multiple physical disk drives into a single virtual one. As the name implies, disks are merely concatenated together, end to beginning, so they appear to be a single large disk.

In that it consists of an array of independent disks with no redundancy, it can be thought of as a distant relation to RAID. JBOD is sometimes used to turn several odd-sized drives into one useful drive. For example, JBOD could combine a 3 GB, a 15 GB, a 5.5 GB, and a 12 GB drive into a single 35.5 GB logical drive, which is often more useful than the individual drives separately.
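
As a rough illustration of concatenation (an added sketch, not from the paper; the function is hypothetical and the drive sizes reuse the example above), the following Python snippet maps a logical offset to a member disk:

    def jbod_map(logical_gb, disk_sizes_gb):
        """Map a logical offset (in GB) to (disk index, offset on that disk) in a concatenation.

        Disks are laid out end to beginning, so the first disk is filled
        before the second is used, and so on.
        """
        start = 0
        for i, size in enumerate(disk_sizes_gb):
            if logical_gb < start + size:
                return i, logical_gb - start
            start += size
        raise ValueError("offset beyond the end of the concatenated volume")

    # The 35.5 GB example from the text: 3 GB + 15 GB + 5.5 GB + 12 GB drives.
    print(jbod_map(2.0,  [3, 15, 5.5, 12]))   # (0, 2.0)  -> still on the 3 GB drive
    print(jbod_map(20.0, [3, 15, 5.5, 12]))   # (2, 2.0)  -> on the 5.5 GB drive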

Software RAID products such as Veritas Volume Manager and Logical Volume Manager can effectively use JBOD to create a single large virtual disk with redundancy. However, you will not get the performance benefits that you would with a hardware RAID 0.

RAID 1

A RAID 1 creates an exact copy (or mirror) of a set of data on two or more disks. This is useful when read performance is more important than minimizing the storage capacity used for redundancy. The array can only be as big as its smallest member disk, however. A classic RAID 1 mirrored pair contains two disks, which increases reliability by a factor of two over a single disk, but it is possible to have many more than two copies. Since each member can be addressed independently if the other fails, reliability is a linear multiple of the number of members. To get the full redundancy benefits of RAID 1, independent disk controllers are recommended, one for each disk; some refer to this practice as splitting or duplexing.

When reading, both disks can be accessed independently. As with RAID 0, the average seek time is reduced by half when randomly reading, but because each disk holds exactly the same data, the requested sectors can always be split evenly between the disks and the seek time remains low.
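
The read/write behaviour described above can be sketched in a few lines of Python (an illustration added here, not from the paper; the class and method names are hypothetical and failure handling is ignored):

    class Raid1Mirror:
        """Minimal RAID 1 sketch: every write goes to all copies; reads alternate
        between copies so random reads can be spread across the mirrors."""

        def __init__(self, num_copies, num_blocks):
            self.copies = [[None] * num_blocks for _ in range(num_copies)]
            self._next_copy = 0

        def write(self, block, data):
            for copy in self.copies:                 # a write must update every mirror
                copy[block] = data

        def read(self, block):
            copy = self.copies[self._next_copy]      # round-robin across mirrors
            self._next_copy = (self._next_copy + 1) % len(self.copies)
            return copy[block]

    mirror = Raid1Mirror(num_copies=2, num_blocks=8)
    mirror.write(0, b"record-000")
    print(mirror.read(0))                            # either copy can satisfy the read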

Mirroring is the simplest form of RAID. RAID 1 has many administrative advantages. For instance, in some 365x24 environments, it is possible to "split the mirror": declare one disk as inactive, take a backup of that disk, and then "rebuild" the mirror. This requires that the application support recovery from the image of data on the disk at the point of the mirror split.

Also, one common practice is to create an extra mirror of a volume (known as a Business Continuance Volume, or BCV, in EMC terminology, and as ShadowImage in Hitachi's) which is meant to be split from the source RAID set and used independently. In some implementations, these extra mirrors can be split and then incrementally re-established, instead of requiring a complete RAID set rebuild.

RAID 2

A RAID 2 stripes data at the bit (rather than block) level, and uses a Hamming code for error correction. The disks are synchronized by the controller to run in perfect tandem. This is the only original RAID level that is not currently used. Extremely high data transfer rates are possible.

RAID 3

It uses byte-level striping with a dedicated parity disk. RAID 3 is very rare in practice. One of the side effects of RAID 3 is that it generally cannot service multiple requests simultaneously. This comes about because any single block of data will by definition be spread across all members of the set and will reside in the same location, so any I/O operation requires activity on every disk.

In our example below, a request for block "A1" would require all three data disks to seek to the beginning and reply with their contents. A simultaneous request for block B1 would have to wait.

Traditional RAID 3

  A1   A2   A3   Ap(1-3)
  A4   A5   A6   Ap(4-6)
  A7   A8   A9   Ap(7-9)
  B1   B2   B3   Bp(1-3)

Put simply, in RAID 3 the user data blocks are distributed across all the disks, and the bottleneck will be the disk that holds the parity. If the parity disk has a lower RPM and lower performance, the total RAID 3 performance will be limited to that of this disk.

RAID 3: Illustration

In the above figure, Disk A and Disk B hold the user data and Disk C holds the parity data. In the event of a Disk A failure, the data would be regenerated from Disk B and Disk C. Parity check data is computed by a bit-by-bit exclusive OR of all the user data on the disks. In a RAID 3 array you effectively give up one disk to check data, so the overhead is lower from a cost perspective than with RAID 1, where the overhead is 50%.
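
To make the exclusive-OR relationship concrete, here is a minimal Python sketch (added for illustration, not part of the paper) that computes parity over two data disks and rebuilds a failed disk from the survivors:

    def xor_blocks(*blocks):
        """Bit-by-bit exclusive OR of equally sized byte blocks."""
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                result[i] ^= byte
        return bytes(result)

    disk_a = b"\x0f\xf0\xaa"                  # user data on Disk A
    disk_b = b"\x33\x3c\x55"                  # user data on Disk B
    disk_c = xor_blocks(disk_a, disk_b)       # parity on Disk C

    # If Disk A fails, the XOR of the surviving disks regenerates its contents.
    rebuilt_a = xor_blocks(disk_b, disk_c)
    assert rebuilt_a == disk_a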

RAID 4

A RAID 4 uses block-level striping with a dedicated parity disk. RAID 4 looks similar to RAID 3 except that it stripes at the block, rather than the byte level. This allows each member of the set to act independently when only a single block is requested. If the disk controller allows it, a RAID 4 set can service multiple read requests simultaneously.

In our example below, a request for block "A1" would be serviced by disk 1. A simultaneous request for block B1 would have to wait, but a request for B2 could be serviced concurrently.

Traditional RAID 4

  A1   A2   A3   Ap
  B1   B2   B3   Bp
  C1   C2   C3   Cp
  D1   D2   D3   Dp

In the real world, RAID 4 implementations can be seen in NetApp filers.

RAID 5

A RAID 5 uses block-level striping with parity data distributed across all member disks. RAID 5 has achieved popularity due to its low cost of redundancy. Generally RAID 5 is implemented with hardware support for parity calculations.

Sample RAID 5 Illustration.

Every time a block is written to a disk in a RAID 5, a parity block is generated within the same stripe. A block is often composed of many consecutive sectors on a disk. A series of blocks (one block from each of the disks in an array) is collectively called a "stripe". If another block, or some portion of a block, is written on that same stripe, the parity block (or some portion of it) is recalculated and rewritten. For small writes, this requires reading the old parity, reading the old data, writing the new parity, and writing the new data. The disk used for the parity block is staggered from one stripe to the next, hence the term "distributed parity blocks".

RAID 5 writes are expensive (the write penalty is higher in RAID 5) in terms of disk operations and traffic between the disks and the controller. Hence RAID 5 is not recommended for write-intensive applications. However, when an application writes a full new stripe, it takes only two writes per disk: one for the data and another for the parity.

The parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish performance. The parity blocks are read, however, when a read of a data sector results in a cyclic redundancy check (CRC) error. Distributing the parity across all the disks reduces the I/O overhead caused by the need to update the parity.
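
As an illustration of the small-write (read-modify-write) sequence described above, here is a short Python sketch (added here, not from the paper; the names are hypothetical) showing how the new parity can be derived from the old data, the old parity, and the new data:

    def xor_bytes(a, b):
        """Bit-by-bit exclusive OR of two equally sized byte strings."""
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_small_write(old_data, old_parity, new_data):
        """Read-modify-write parity update for a single block in one stripe.

        Four disk operations: read old data, read old parity,
        write new data, write new parity.
        new_parity = old_parity XOR old_data XOR new_data
        """
        new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
        return new_data, new_parity

    # Example stripe: two data blocks and their parity.
    d0, d1 = b"\x01\x02", b"\x10\x20"
    parity = xor_bytes(d0, d1)

    # Overwrite d0 and update the parity without touching d1.
    new_d0, parity = raid5_small_write(d0, parity, b"\xff\x00")
    assert parity == xor_bytes(new_d0, d1)    # parity still covers the whole stripe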

Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with the data blocks from the surviving disks to reconstruct the data on the failed drive "on the fly".

RAID 5 can accommodate a single disk failure, but not a two-disk failure.

RAID 6

A RAID 6 extends RAID 5 by adding an additional parity block, thus it uses block-level striping with two parity blocks distributed across all member disks.

Like RAID 5 the parity is distributed in stripes, with the parity blocks in a different place in each stripe.

Traditional RAID 5          Typical RAID 6

  A1   A2   A3   Ap           A1   A2   A3   Ap   Aq
  B1   B2   Bp   B3           B1   B2   Bp   Bq   B3
  C1   Cp   C2   C3           C1   Cp   Cq   C2   C3
  Dp   D1   D2   D3           Dp   Dq   D1   D2   D3

RAID 6 is inefficient when used with a small number of drives, but as arrays become bigger and have more drives, the loss in storage capacity becomes less important and the probability of two disks failing at once becomes greater. RAID 6 provides protection against double disk failures and against failures that occur while a single disk is rebuilding. Where there is only one array, it makes more sense than having a "hot spare" disk.
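
To quantify that efficiency point, the following small Python sketch (an added illustration, not from the paper) computes the fraction of raw capacity left for data as a RAID 6 set grows:

    def raid6_usable_fraction(num_drives):
        """Fraction of raw capacity available to data in RAID 6 (two parity blocks per stripe)."""
        if num_drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (num_drives - 2) / num_drives

    for n in (4, 6, 8, 12, 16):
        print(f"{n} drives: {raid6_usable_fraction(n):.0%} usable")
    # 4 drives: 50% usable ... 16 drives: 88% usable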

RAID 6 does not have a performance penalty for read operations, but it does have a performance penalty on write operations due to the overhead associated with the additional parity calculations.

Nested RAID:

RAID 0+1

A RAID 0+1 (also called RAID 01, though it shouldn't be confused with RAID 10) is a RAID level used for both replicating and sharing data among disks. The difference between RAID 0+1 and RAID 1+0 is the order in which the levels are layered: RAID 0+1 is a mirror of stripes. Consider an example of RAID 0+1 in which six 120 GB drives need to be set up. Below is an example where two 360 GB level 0 arrays are mirrored, creating 360 GB of total storage space:

RAID 1

/--------------------------\

| |

RAID 0 RAID 0

/-----------------\ /-----------------\

| | | | | |

120 GB 120 GB 120 GB 120 GB 120 GB 120 GB

A1 A2 A3 A1 A2 A3

A4 A5 A6 A4 A5 A6

B1 B2 B3 B1 B2 B3

B4 B5 B6 B4 B5 B6

The maximum storage space here is 360 GB, spread across two arrays. The advantage is that when a hard drive fails in one of the level 0 arrays, the missing data can be transferred from the other array. However, adding an extra hard drive to one stripe requires you to add an additional hard drive to the other stripe to balance the storage between the arrays.

RAID 0+1 is not as robust as RAID 1+0 and cannot tolerate two simultaneous disk failures unless both are from the same stripe. That is, once a single disk fails, each of the disks in the other stripe becomes a single point of failure. Also, once the single failed disk is replaced, all the disks in the array must participate in the rebuild in order to reconstruct its data.

RAID 10

A RAID 10, sometimes called RAID 1+0 or RAID 1&0, is similar to a RAID 0+1 except that the RAID levels used are reversed: RAID 10 is a stripe of mirrors. Below is an example where three 120 GB level 1 arrays are striped together to give 360 GB of total storage space:

RAID 0

/-----------------------------------\

| | |

RAID 1 RAID 1 RAID 1

/--------\ /--------\ /--------\

| | | | | |

120 GB 120 GB 120 GB 120 GB 120 GB 120 GB

A1 A1 A2 A2 A3 A3

A4 A4 A5 A5 A6 A6

B1 B1 B2 B2 B3 B3

B4 B4 B5 B5 B6 B6

All but one drive from each RAID 1 set could fail without damaging the data. However, if the failed drive is not replaced, the single working hard drive in the set then becomes a single point of failure for the entire array. If that single hard drive then fails, all data stored in the entire array is lost.

Extra 120GB hard drives could be added to any one of the level 1 arrays to provide extra redundancy. Unlike RAID 0+1, all the "sub-arrays" do not have to be upgraded simultaneously.

RAID 10 is often the primary choice for high-load databases, because the lack of parity calculation gives it faster write speeds. If one disk fails in a RAID 1 set, only the data within that RAID 1 set needs to be rebuilt, so the total time to rebuild the data is much shorter than with RAID 0+1.
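
As a rough illustration of this robustness difference (an added sketch, not from the paper), the following Python snippet counts which second-disk failures are survivable in the six-disk RAID 0+1 and RAID 1+0 examples above:

    from itertools import combinations

    # Six disks numbered 0-5.
    # RAID 0+1: two stripes {0,1,2} and {3,4,5}, mirrored. Data survives as long
    #           as at least one whole stripe is intact.
    # RAID 1+0: three mirror pairs (0,1), (2,3), (4,5), striped. Data survives as
    #           long as no mirror pair loses both of its disks.

    def raid01_survives(failed):
        stripes = [{0, 1, 2}, {3, 4, 5}]
        return any(stripe.isdisjoint(failed) for stripe in stripes)

    def raid10_survives(failed):
        pairs = [{0, 1}, {2, 3}, {4, 5}]
        return all(not pair.issubset(failed) for pair in pairs)

    two_disk_failures = list(combinations(range(6), 2))
    print(sum(raid01_survives(set(f)) for f in two_disk_failures), "of", len(two_disk_failures))  # 6 of 15
    print(sum(raid10_survives(set(f)) for f in two_disk_failures), "of", len(two_disk_failures))  # 12 of 15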

RAID 50 (RAID 5+0)

A RAID 50 combines the block-level striping with distributed parity of RAID 5, with the straight block-level striping of RAID 0. This is a RAID 0 array striped across RAID 5 elements.

Below is an example where three RAID 5 sets, each built from three 120 GB drives, are striped together to give 720 GB of total storage space:

RAID 0

/-----------------------------------------------------\

| | |

RAID 5 RAID 5 RAID 5

/-----------------\ /-----------------\ /-----------------\

| | | | | | | | |

120 GB 120 GB 120 GB 120 GB 120 GB 120 GB 120 GB 120 GB 120 GB

A1   A2   Ap        A3   A4   Ap        A5   A6   Ap
B1   Bp   B2        B3   Bp   B4        B5   Bp   B6
Cp   C1   C2        Cp   C3   C4        Cp   C5   C6
D1   D2   Dp        D3   D4   Dp        D5   D6   Dp

One drive from each of the RAID 5 sets could fail without loss of data. However, if the failed drive is not replaced, the remaining drives in that set then become a single point of failure for the entire array. If one of those drives fails, all data stored in the entire array is lost. The time spent in recovery (detecting and responding to a drive failure, and rebuilding onto the newly inserted drive) represents a period of vulnerability for the RAID set.

The configuration of the RAID 5 sets affects the overall fault tolerance. A construction of three seven-drive RAID 5 sets has higher capacity and storage efficiency, but can tolerate a maximum of only three drive failures (one per set). A construction of seven three-drive RAID 5 sets can handle as many as seven drive failures (again, one per set) but has lower capacity and storage efficiency.
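
To make this trade-off concrete, here is a small Python sketch (added for illustration; the 120 GB drive size is assumed to match the earlier example) comparing the two 21-drive constructions mentioned above:

    def raid50_summary(num_sets, drives_per_set, drive_gb):
        """Usable capacity, storage efficiency, and maximum tolerated failures of a RAID 50 layout.

        Each RAID 5 set gives up one drive's capacity to parity and can
        survive exactly one drive failure.
        """
        usable_gb = num_sets * (drives_per_set - 1) * drive_gb
        raw_gb = num_sets * drives_per_set * drive_gb
        max_failures = num_sets                   # at most one failure per RAID 5 set
        return usable_gb, usable_gb / raw_gb, max_failures

    # 21 drives of 120 GB arranged two ways:
    print(raid50_summary(num_sets=3, drives_per_set=7, drive_gb=120))   # (2160, ~0.86, 3)
    print(raid50_summary(num_sets=7, drives_per_set=3, drive_gb=120))   # (1680, ~0.67, 7)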

Proprietary RAID Levels:

RAID S or Parity RAID

RAID S is EMC Corporation's proprietary striped parity RAID system used in their Symmetrix storage systems. Each volume exists on a single physical disk, and multiple volumes are arbitrarily combined for parity purposes. EMC originally referred to this capability as RAID S, and then renamed it Parity RAID for the Symmetrix DMX platform. EMC now offers standard striped RAID 5 on the Symmetrix DMX as well.

Traditional RAID 5          EMC RAID S

  A1   A2   A3   Ap           A1   B1   C1   1p
  B1   B2   Bp   B3           A2   B2   C2   2p
  C1   Cp   C2   C3           A3   B3   C3   3p
  Dp   D1   D2   D3           A4   B4   C4   4p

IBM ServeRAID 1E

The IBM ServeRAID adapter series supports 2-way mirroring on an arbitrary number of drives. For example, mirroring on five drives would look like this:

A1 A2 A3 A4 A5

A5 A1 A2 A3 A4

B1 B2 B3 B4 B5

B5 B1 B2 B3 B4
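
As an illustration of this shifted-mirror pattern (an added sketch that follows the table above and may differ in detail from IBM's actual implementation), the following Python snippet generates the two rows written for each stripe:

    def serveraid_1e_rows(stripe_blocks):
        """Two rows for one stripe: the data row, then the same blocks rotated
        right by one drive, so each block's mirror sits on the neighbouring drive."""
        data_row = list(stripe_blocks)
        mirror_row = [data_row[-1]] + data_row[:-1]
        return data_row, mirror_row

    for stripe in (["A1", "A2", "A3", "A4", "A5"], ["B1", "B2", "B3", "B4", "B5"]):
        for row in serveraid_1e_rows(stripe):
            print("  ".join(row))
    # Reproduces the five-drive layout shown above.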

The layout shown in the table above is tolerant of non-adjacent drives failing. Other storage systems, including Sun's StorEdge T3, support this mode as well.

Comparison of all RAID Levels