RAID Strategies in SSD Deployments: Optimizing Enterprise Arrays

White Paper: RAID STRATEGIES IN SSD DEPLOYMENTS Optimizing Enterprise Arrays for Reliability, Capacity and Performance



For any IT professional, RAID (Redundant Array of Independent Drives) is likely a familiar concept. The earliest forms of RAID technology date all the way back to the 1960s. Originally, RAID was developed to replace high-performance but expensive mainframe disk drives with an array of lower-cost PC hard disk drives (HDDs). While PC HDDs were substantially slower than mainframe disk drives, configuring multiple PC HDDs in an array multiplied performance because the HDDs could be accessed in parallel. Additionally, by implementing redundancy in the form of parity, an array of drives was also far more reliable than any large single drive because the array could withstand bit corruption or a drive failure without data loss, as the data could be recovered from parity.

Even though mainframe disk drives are now ancient history by modern technology standards, the need for RAID has remained because the benefits of a redundant array versus a large single disk are indisputable. Because enterprise storage manages highly critical company and customer data, the storage architecture must have built-in redundancy against unexpected drive failures.

While SSDs have higher reliability and lower failure rates than HDDs, and Samsung’s enterprise SSDs have specifically been designed and built with reliability as the highest priority, a deployment of thousands of drives will statistically always include a small number of drives that experience an early, unexpected failure. Because enterprise data is so critical, IT administrators can’t take any risk of losing data because of a drive failure.

There are a variety of different types of RAID, each with its own advantages and disadvantages. No RAID type, or level as it is often called, is superior to all others because every level makes tradeoffs between capacity, performance and reliability. This white paper explains the most commonly used RAID levels in detail and compares them in different use scenarios.

INTRODUCTION: RAID STRATEGIES IN SSD DEPLOYMENTS

[Figure: Evolution of Storage]


RAID implementations can be split into two groups: hardware-based and software-based solutions. Traditionally, RAID in the enterprise has been implemented in hardware by using RAID-optimized Host Bus Adapters (HBAs). HBAs connect to the host’s PCIe interface and contain a custom controller chip with multiple SATA/SAS connectors, which allows the RAID array to operate independently from the host. In other words, the array simply shows up as one large logical volume to the host because all the array management is done by the HBA.

The advantages of hardware RAID have been reduced host CPU load and increased capacity due to a higher number of SATA/SAS ports. Parity-based RAID 5 and 6 in particular require considerable compute power, but the calculations are very repetitive, so a custom chip is far more efficient than performing the calculations on the host’s general-purpose CPU. A hardware-based solution also ensures consistent performance because the controller chip is dedicated to RAID, whereas the host CPU has various other processes to run, which would impact the performance of the RAID array.

However, with the rise of data centers and advances in CPU performance, software-based solutions have become more attractive due to their lower cost. Many modern filesystems, such as ZFS, and operating systems also have built-in RAID-like features, which make software RAID more convenient and efficient to implement.

Furthermore, the SSD industry has adopted PCI Express as the new standard interface for SSDs, which connects SSDs directly to the CPU. With PCIe, HBAs are no longer needed and would only add unnecessary latency, meaning that the future is software-based RAID solutions, or an alternative technology such as erasure codes.

THE CHANGE IN RAID IMPLEMENTATION

[Figure: The Past: CPU, connected over PCI Express to a Host Bus Adapter, connected over SATA/SAS to the SSD/HDD. The Future: CPU connected over PCI Express directly to a PCIe NVMe SSD.]


RAID 0: In RAID 0, data is divided (“striped”) between all the members of the array. When the array receives a new write I/O, the data is divided into smaller chunks determined by the stripe size, and the chunks are then distributed equally to the member drives of the array. Because data is written to and read from all member drives simultaneously, the performance of a RAID 0 array is, at best, single-drive performance multiplied by the number of member drives. The capacity of a RAID 0 array is the combined capacity of all member drives.

The downside of RAID 0 is that it doesn’t have any redundancy. If one of the drives in the array fails, all data in the array will be lost, which means that the reliability of a RAID 0 array is inversely proportional to the number of member drives. Because of this unreliability, RAID 0 is not recommended for any enterprise use. RAID 0 can, however, be a good option for PC enthusiasts and power users who need high performance but are not processing critical data.
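The striping described for RAID 0 can be sketched in a few lines of Python. This is only an illustration of the round-robin idea, not how any particular controller or driver implements it; the function names `stripe_raid0` and `read_raid0`, the `stripe_size` parameter and the in-memory lists standing in for drives are all assumptions made for the example.

```python
def stripe_raid0(data: bytes, num_drives: int, stripe_size: int) -> list[list[bytes]]:
    """Split data into stripe_size chunks and deal them round-robin to the drives."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    if stripe_size < 1:
        raise ValueError("stripe_size must be positive")
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        chunk = data[offset:offset + stripe_size]
        drives[index % num_drives].append(chunk)  # chunk i lands on drive i mod n
    return drives


def read_raid0(drives: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting the drives in the same round-robin order."""
    chunks = []
    for row in range(max(len(drive) for drive in drives)):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


# Eight two-byte chunks over two hypothetical drives.
layout = stripe_raid0(b"A1A2A3A4A5A6A7A8", num_drives=2, stripe_size=2)
for number, chunks in enumerate(layout):
    print(f"Drive {number}:", b" ".join(chunks).decode())
```

Note that `read_raid0` needs every drive to be present: if one list is missing, the remaining chunks cannot reproduce the original data, which is the lack of redundancy discussed above.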

RAID 1: In RAID 1, data is written (“mirrored”) to every member drive of the array. Because a write I/O is written in full to every drive, RAID 1 doesn’t improve write performance, but read performance is improved as data can be read from multiple drives simultaneously. The benefit of RAID 1 is its very high redundancy: because all member drives contain exactly the same data, the array is fully functional as long as one member drive is online.

The downside, however, is that the capacity of a RAID 1 array is only the capacity of one member drive. Hence RAID 1 isn’t cost efficient for arrays consisting of a large number of drives, unless extreme redundancy and reliability are needed. Due to the poor cost efficiency with large arrays, RAID 1 is rarely used in enterprise storage deployments. RAID 1 is ideal for consumer and small business storage where the capacity of a single drive is sufficient but increased reliability is needed.
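The opposite reliability behavior of RAID 0 and RAID 1 can be made concrete with a small probability sketch. It assumes that drives fail independently and with the same probability over some period, which is a simplification; the function names and the 2% figure are hypothetical and chosen only for illustration, not measured failure rates.

```python
def raid0_loss_probability(p_drive: float, num_drives: int) -> float:
    """RAID 0 loses the whole array as soon as ANY member drive fails."""
    return 1.0 - (1.0 - p_drive) ** num_drives


def raid1_loss_probability(p_drive: float, num_drives: int) -> float:
    """RAID 1 loses data only when ALL mirrored drives fail."""
    return p_drive ** num_drives


p = 0.02  # hypothetical per-drive failure probability, for illustration only
for n in (2, 4, 8):
    print(f"{n} drives: RAID 0 {raid0_loss_probability(p, n):.6f}, "
          f"RAID 1 {raid1_loss_probability(p, n):.2e}")
```

For small failure probabilities the RAID 0 result grows roughly in proportion to the number of drives, while the RAID 1 result shrinks with every added mirror.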

THE DIFFERENT LEVELS OF RAID

[Figure: RAID 0 block layout]

Drive 0: A1, A3, A5, A7
Drive 1: A2, A4, A6, A8

[Figure: RAID 1 block layout]

Drive 0: A1, A2, A3, A4
Drive 1: A1, A2, A3, A4


RAID 10: RAID 10, or RAID 1+0 as it’s sometimes called, merges the functionality of RAID 1 and RAID 0. It’s effectively a stripe of mirrors, meaning that two or more RAID 1 sub-arrays are combined into one RAID 0 array. By doing so, RAID 10 features some of the performance and capacity benefits of RAID 0 because the RAID 1 sub-arrays can be accessed simultaneously, and the RAID 1 sub-arrays have redundancy due to mirroring for higher reliability.

The read performance of a RAID 10 array is a multiple of the number of member drives because data can be read from every drive simultaneously. Write performance, on the other hand, depends on the number of RAID 1 sub-arrays: data is striped across all sub-arrays as in RAID 0, but within each sub-array the data is written to every member drive as in RAID 1. The total capacity of a RAID 10 array also depends on the number of RAID 1 sub-arrays, rather than the number of drives, because the capacity of one RAID 1 sub-array is the capacity of one member drive.

The redundancy of a RAID 10 array depends on the number of drives in each RAID 1 sub-array. As data is mirrored to every drive in the sub-array, redundancy and reliability increase with more drives, but capacity efficiency decreases.

For example, with nine 1TB drives and three-way mirroring, the capacity of a RAID 10 array would be 3TB because the RAID 0 array consists of three 1TB RAID 1 sub-arrays. The minimum drive failure tolerance would be two drives due to three-way mirroring (i.e. three drives per RAID 1 sub-array). Maximum read performance would be 9x that of a single drive, whereas write performance would be threefold.
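The arithmetic in this example generalizes to any RAID 10 configuration. The sketch below is a minimal calculator for those figures; the class name `Raid10Layout` and its field names are hypothetical, and the read and write multipliers are the theoretical maximums described in the text rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid10Layout:
    total_drives: int
    mirror_ways: int          # drives per RAID 1 sub-array
    drive_capacity_tb: float

    def __post_init__(self) -> None:
        if self.mirror_ways < 2:
            raise ValueError("each RAID 1 sub-array needs at least two drives")
        if self.total_drives % self.mirror_ways != 0:
            raise ValueError("total_drives must be a multiple of mirror_ways")
        if self.total_drives // self.mirror_ways < 2:
            raise ValueError("RAID 10 needs at least two RAID 1 sub-arrays")

    @property
    def sub_arrays(self) -> int:
        return self.total_drives // self.mirror_ways

    @property
    def usable_capacity_tb(self) -> float:
        # Each sub-array contributes the capacity of a single member drive.
        return self.sub_arrays * self.drive_capacity_tb

    @property
    def minimum_failure_tolerance(self) -> int:
        # Worst case: every failure hits the same sub-array.
        return self.mirror_ways - 1

    @property
    def max_read_multiplier(self) -> int:
        return self.total_drives  # every drive can serve reads

    @property
    def max_write_multiplier(self) -> int:
        return self.sub_arrays    # writes are striped across sub-arrays only


# The nine-drive, three-way mirrored example from the text.
layout = Raid10Layout(total_drives=9, mirror_ways=3, drive_capacity_tb=1.0)
print(layout.sub_arrays, layout.usable_capacity_tb,
      layout.minimum_failure_tolerance,
      layout.max_read_multiplier, layout.max_write_multiplier)
```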

RAID 10 requires at least four drives, so it’s generally out of reach for consumers and is more suitable for enterprise storage deployments. The ability to select the redundancy level through the RAID 1 sub-array configuration makes RAID 10 very versatile for a wide range of enterprise applications because it can offer the reliability required by critical data while maintaining high performance and reasonable cost efficiency.

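[Figure: RAID 10 block layout, a RAID 0 stripe across two RAID 1 sub-arrays]

RAID 1 sub-array 1:
Drive 0: A1, A3, A5, A7
Drive 1: A1, A3, A5, A7

RAID 1 sub-array 2:
Drive 2: A2, A4, A6, A8
Drive 3: A2, A4, A6, A8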


RAID 5: RAID 5 combines RAID 0-like striping with a concept called parity data. Whereas in RAID 1 all data is mirrored to every member drive of the array, RAID 5 creates parity data and distributes it among all member drives. When a RAID 5 array receives a new write I/O, the data is divided into smaller parts as in RAID 0, but in addition parity data is created and written to one member drive. The drive that receives the parity rotates from stripe to stripe, so every member drive contains both user and parity data.

A RAID 5 array can withstand one drive failure without any data loss as the array can be rebuilt from parity data. RAID 5 is also very capacity efficient because only the capacity of one drive is used for parity regardless of the number of drives, leaving the capacity of the rest available to the user.
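How a lost block is recovered from parity can be shown with a short sketch. It assumes XOR parity, which is the usual choice for RAID 5, although this paper does not name the parity function. The function names and the rotation rule (parity starting on the last drive and moving one drive to the left per stripe) are assumptions made for the illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: list[bytes], stripe_index: int, num_drives: int) -> list[bytes]:
    """Lay out one stripe: num_drives - 1 data blocks plus one rotating parity block."""
    if len(data_blocks) != num_drives - 1:
        raise ValueError("a stripe holds exactly num_drives - 1 data blocks")
    # Assumed rotation: last drive for stripe 0, then one drive to the left per stripe.
    parity_drive = (num_drives - 1 - stripe_index) % num_drives
    stripe = list(data_blocks)
    stripe.insert(parity_drive, xor_blocks(data_blocks))
    return stripe  # stripe[i] is the block stored on drive i


def rebuild_block(stripe: list[bytes], failed_drive: int) -> bytes:
    """Recover the block of one failed drive by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_drive]
    return xor_blocks(survivors)


stripe_a = write_stripe([b"A1", b"A2", b"A3"], stripe_index=0, num_drives=4)
print(rebuild_block(stripe_a, failed_drive=1))  # recovers the block from drive 1
```

The same XOR recovers a data block or the parity block, which is why any single drive can fail. Losing two blocks of the same stripe leaves the XOR underdetermined, so a second failure is not survivable with one parity block.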

Theoretically, RAID 5 features very high performance because data can be written to and read from all drives. In the real world, however, write performance in particular is dictated by the specific software/hardware implementation of RAID 5 because parity calculations require quite a bit of processing power.

RAID 5 is very popular in enterprise storage deployments and also among PC enthusiasts. With great capacity efficiency, good performance and redundancy against drive failures, RAID 5 is a good fit for most enterprise applications.

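[Figure: RAID 5 block layout across four drives; "p" marks the parity block of each stripe]

Drive 0: A1, B1, C1, Dp
Drive 1: A2, B2, Cp, D1
Drive 2: A3, Bp, C2, D2
Drive 3: Ap, B3, C3, D3

[Figure: RAID 6 block layout across five drives; "p" and "q" mark the two parity blocks of each stripe]

Drive 0: A1, B1, C1, Dp, Eq
Drive 1: A2, B2, Cp, Dq, E1
Drive 2: A3, Bp, Cq, D1, E2
Drive 3: Ap, Bq, C2, D2, E3
Drive 4: Aq, B3, C3, D3, Ep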

RAID 6: RAID 6 is an extension of RAID 5 that adds redundancy against two drive failures by storing a second parity block in every stripe, which consumes the capacity of a second drive. As a result, the capacity and performance of RAID 6 are slightly lower than those of RAID 5, but reliability is increased.
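The capacity rules stated for each level can be collected into one small function for side-by-side comparison. This is a sketch under the assumption that all member drives have the same capacity; the function name, the level strings and the `mirror_ways` parameter are hypothetical, and the minimum drive counts follow the figures given in this paper.

```python
MIN_DRIVES = {"RAID 0": 2, "RAID 1": 2, "RAID 10": 4, "RAID 5": 3, "RAID 6": 4}


def usable_capacity_tb(level: str, num_drives: int, drive_tb: float,
                       mirror_ways: int = 2) -> float:
    """Usable capacity of an array of identical drives for the given RAID level."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unknown RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "RAID 0":
        return num_drives * drive_tb           # no redundancy overhead
    if level == "RAID 1":
        return drive_tb                        # every drive holds the same data
    if level == "RAID 10":
        if mirror_ways < 2 or num_drives % mirror_ways != 0:
            raise ValueError("num_drives must be a multiple of mirror_ways (>= 2)")
        return (num_drives // mirror_ways) * drive_tb
    parity_drives = 1 if level == "RAID 5" else 2   # RAID 6 stores two parity blocks
    return (num_drives - parity_drives) * drive_tb


# Eight hypothetical 1TB drives under each level.
for level in MIN_DRIVES:
    print(level, usable_capacity_tb(level, num_drives=8, drive_tb=1.0))
```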


As the comparison table shows, every RAID level makes a tradeoff between performance, capacity efficiency and drive failure tolerance. Thus the most suitable RAID level depends on the requirements set by the workload and the criticality of the data. In enterprise storage, the choice is generally between RAID 10, 5 and 6.

RAID 10 vs RAID 5/6: RAID 10 and RAID 5/6 are fundamentally different because RAID 10 relies on mirroring, whereas RAID 5 and 6 use parity data to create redundancy against drive failures. Parity data is substantially more capacity efficient than mirroring, but the disadvantage is long downtime due to the rebuild in case of a drive failure. In other words, a RAID 10 array can continue to operate fully despite a drive failure, whereas a RAID 5/6 array experiences a noticeable loss in performance because data has to be recovered from parity on the fly. Additionally, the rebuild time of a RAID 5/6 array is considerably longer. Depending on the hardware and software configuration, the rebuild time can range from tens of minutes to several days when dealing with large, high-capacity arrays, resulting in longer downtime compared to a RAID 10 array.

When managing critical data that needs to be available at all times, any downtime can be out of the question. Such enterprise uses include, for example, financial or medical applications, where downtime can result in serious financial loss or even put a human life in danger. In short, for applications where data availability is more important than capacity efficiency, RAID 10 is often a better option than RAID 5 or 6, given the inherent benefits of mirroring. Under most scenarios, two-way mirroring (2-drive RAID 1 sub-arrays) provides a sufficient level of redundancy, but the most critical data may require three-way mirroring to further reduce the risk of data loss or downtime.

RAID 5 vs RAID 6: For any other application, where data availability isn’t extremely critical, RAID 5 and 6 present a better option due to significantly higher capacity efficiency, which reduces cost.

Selecting between RAID 5 and 6 is much simpler than choosing between RAID 10 and RAID 5/6 because the difference between the two is mostly the level of redundancy. In a nutshell, RAID 5 provides protection against a single drive failure in the array, whereas RAID 6 protects against two.

Applications that are more sensitive to downtime will benefit from the increased level of redundancy in RAID 6 because the likelihood that three drives fail simultaneously, so that all the data of the array has to be recovered from a backup, is negligible. Moreover, RAID 6 reduces the risk of an unrecoverable error during a rebuild. As all data of the array has to be read during a rebuild, there is a chance that a bit has become corrupt over time, which may corrupt the whole array. As RAID 6 stores two sets of parity, the corrupt bit can still be recovered from the second set. This issue mostly affects hard disk drives (HDDs), which don’t have as strong error correction code (ECC) engines as SSDs, but especially with large SSD arrays the possibility of a bit error should be taken into account.
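The rebuild risk described here can be estimated with a simple model. The sketch below assumes that bit errors are independent and occur at a fixed unrecoverable bit error rate (UBER), which is a simplification; the function name, the decimal terabyte conversion and the example rates are illustrative assumptions, not figures from this paper or from any drive specification.

```python
import math


def rebuild_error_probability(surviving_drives: int, drive_capacity_tb: float,
                              uber: float) -> float:
    """Chance of hitting at least one unrecoverable bit error while reading
    every surviving drive in full during a single-parity rebuild."""
    bits_read = surviving_drives * drive_capacity_tb * 1e12 * 8
    # 1 - (1 - uber) ** bits_read, written in a numerically stable form.
    return -math.expm1(bits_read * math.log1p(-uber))


# Eight-drive array of 1TB drives with one failed drive: seven drives are read.
for label, rate in (("weaker ECC (hypothetical)", 1e-15),
                    ("stronger ECC (hypothetical)", 1e-17)):
    print(label, rebuild_error_probability(7, 1.0, rate))
```

The result scales with the amount of data that has to be read, which is why the risk matters more for large arrays even when the per-bit error rate is low.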

CHOOSING THE MOST SUITABLE RAID LEVEL WITH SSDs

RAID Level   Read Performance   Write Performance   Capacity Efficiency   Drive Failure Tolerance   Minimum Number of Drives
RAID 0       ooooo              ooooo               ooooo                 o                         2
RAID 1       ooooo              o                   o                     ooooo                     2
RAID 10      ooooo              oooo                oo                    ooo                       4
RAID 5       oooo               oooo                oooo                  ooo                       3
RAID 6       oooo               ooo                 ooo                   oooo                      4


The RAID level can have a significant impact on user experience and overall storage cost. Understanding the criticality of the data before selecting the RAID level is very important because the RAID level is ultimately determined by the redundancy needs and sensitivity to downtime. If the characteristics of the usage are not well known or understood, it’s better to be conservative and opt for a RAID level with higher redundancy to ensure that the service is not put at risk. However, more redundancy means higher cost, so it’s not wise to select a RAID level with unnecessarily high redundancy. Under most circumstances, RAID 5 and 6 provide the best mix of performance, capacity and redundancy. RAID 6 is recommended for large multi-drive arrays and applications that require an extra level of data protection, but RAID 5 is generally the best option for most SSD deployments. Only some specific data-critical applications will benefit from RAID 10, and in such applications the better data availability will outweigh the higher cost of a RAID 10 array.
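The guidance above can be condensed into a short decision helper. This is only a rough encoding of the recommendations in this paper, not a sizing tool; the function name and the two boolean parameters are hypothetical, and a real decision would also weigh cost, capacity and workload.

```python
def recommend_raid_level(availability_critical: bool,
                         needs_extra_protection: bool) -> str:
    """Map the two questions raised in the text to a RAID level.

    availability_critical: downtime is unacceptable (e.g. financial or medical use).
    needs_extra_protection: large multi-drive array, or data that warrants
    protection against a second drive failure.
    """
    if availability_critical:
        return "RAID 10"   # mirroring keeps the array fully operational
    if needs_extra_protection:
        return "RAID 6"    # tolerates two drive failures
    return "RAID 5"        # best capacity efficiency for most deployments


print(recommend_raid_level(availability_critical=False, needs_extra_protection=True))
```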

CONCLUSION

Learn more: samsung.com/enterprisessd | insights.samsung.com | 1-866-SAM4BIZ

Follow us: youtube.com/samsungbizusa | @SamsungBizUSA | insights.samsung.com

©2016 Samsung Electronics America, Inc. All rights reserved. Samsung is a registered trademark of Samsung Electronics Co., Ltd. All products, logos and brand names are trademarks or registered trademarks of their respective companies. This white paper is for informational purposes only. Samsung makes no warranties, express or implied, in this white paper. WHP-SSD-RAIDSTRATEGIES-FEB16J

About the Author: Kristian Vättö is a technical marketing specialist who started his career as a news editor at AnandTech.com in 2011. He later became the site’s SSD editor and was responsible for producing highly detailed and professional SSD reviews. In addition to his work with Samsung, Kristian is currently studying economics at the University of Tampere in Finland.

SAMSUNG ENTERPRISE SSD PORTFOLIO

PM863 Series Data Center SSDs
• 3-bit MLC NAND
• Designed for read-intensive applications
• Built-in Power Loss Protection
• SATA 6Gb/s Interface
• Form-factors: 2.5”

SM863 Series Data Center SSDs
• 2-bit MLC NAND
• Designed for write-intensive applications
• Built-in Power Loss Protection
• SATA 6Gb/s Interface
• Form-factors: 2.5”

950 Pro Series Client PC SSDs
• 2-bit MLC V-NAND
• Designed for high-end PCs
• PCIe Interface
• NVMe protocol
• Form-factors: M.2

[Figure: RAID 5, RAID 6, RAID 10, RAID 1 and RAID 0 positioned between the tradeoff axes Reliability, Capacity and Performance]