Tiered Storage Architecture: Taking Advantage of New Classes of Data Center Storage

White Paper: Tiered Storage Architecture Taking Advantage of New Classes of Data Center Storage


The world is becoming smarter all the time. Smartphones, tablets and smart TVs are already common household items, and with the Internet of Things (IoT), more and more everyday items are becoming smart in one way or another. While smart devices are making life more convenient, they are also putting enormous pressure on enterprise storage. Generating and consuming data are the cornerstones of connected devices, so data must be stored, analyzed and made quickly accessible for users to enjoy the full potential that connected devices offer.

With more Internet-connected devices in use, the amount of data that is generated on a yearly basis is growing exponentially, and by 2020 the data universe is expected to reach 44 zettabytes with growth forecast to continue at 40% year over year. According to analyst firm IDC, however, about 90% of the world’s data is considered to be “cold data”, which is only accessed infrequently. In other words, we only access about 10% of the world’s data on a regular basis.
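To make these growth figures concrete, the back-of-the-envelope sketch below compounds the 44-zettabyte 2020 estimate forward at the cited 40% annual rate and splits each year's total into the roughly 10% hot and 90% cold shares. The function name `project` and the choice of a three-year horizon are illustrative assumptions, not part of any forecast.

```python
def project(base_zb: float, rate: float, years: int) -> list[tuple[int, float, float, float]]:
    """Compound base_zb forward at `rate` per year for `years` years.

    Returns (years after base, total ZB, hot ZB, cold ZB), assuming a fixed
    10% hot / 90% cold split in every year.
    """
    rows = []
    for n in range(years + 1):
        total = base_zb * (1 + rate) ** n
        rows.append((n, total, 0.10 * total, 0.90 * total))
    return rows


for n, total, hot, cold in project(44.0, 0.40, 3):
    print(f"2020+{n}: {total:6.1f} ZB total, {hot:5.1f} ZB hot, {cold:6.1f} ZB cold")
```

At 40% a year the total nearly doubles every two years (1.4² = 1.96), and the cold share grows just as fast as the hot share, which is why where cold data is stored matters so much for cost.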

Facebook is an excellent real-world example of hot versus cold data. When a photo is uploaded to Facebook, it is pushed to friends’ and followers’ timelines, making the photo “hot” as it is accessed by hundreds, even thousands, of users within minutes. The photo also generates associated data in the form of likes, comments and tags, but eventually it fades from people’s timelines and becomes just another photo in one’s profile. The photo may still be viewed every now and then, but the photo and its associated data, once hot and frequently accessed, have become cold, infrequently accessed data.

Given the different access frequencies, hot and cold data have different storage requirements. Because hot data is accessed frequently, possibly by thousands of people at the same time, it needs to be stored on a device that provides high performance and low latency. However, high-performance storage, such as PCIe NVMe SSDs, also carries a higher cost per gigabyte. For this reason, it is not cost-efficient to store all data on the same type of device when only a fraction of it is accessed regularly.

A tiered storage architecture offers a cost-efficient solution for today’s enterprise storage needs. Instead of storing all data in one class of storage, a tiered architecture provides several tiers of storage, each with its own performance and cost characteristics. Tiering balances performance and cost by placing data in the appropriate tier based on how frequently it is accessed: hot data can reside in a high-performance tier to ensure low access latency, whereas colder data can be stored in tiers with a lower cost per gigabyte.
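At its core, a tiering policy maps each piece of data to a tier according to how often it is accessed. The sketch below assumes a hypothetical three-tier layout and hypothetical thresholds measured in accesses per day; real systems tune such thresholds per workload, and the names `Tier` and `place` are illustrative only.

```python
from enum import Enum


class Tier(Enum):
    HOT = 1    # e.g. PCIe NVMe SSD
    WARM = 2   # e.g. SATA SSD
    COLD = 3   # e.g. HDD


# Hypothetical thresholds in accesses per day; tune these per workload.
HOT_THRESHOLD = 100
WARM_THRESHOLD = 1


def place(accesses_per_day: float) -> Tier:
    """Return the tier an object should live in, given its access frequency."""
    if accesses_per_day >= HOT_THRESHOLD:
        return Tier.HOT
    if accesses_per_day >= WARM_THRESHOLD:
        return Tier.WARM
    return Tier.COLD


# Echoing the photo example: a fresh upload, a month-old photo, an old album.
photos = {"new_upload": 5000, "last_month": 3, "2012_album": 0.01}
for name, rate in photos.items():
    print(name, place(rate).name)
```

As the photo’s access rate decays, re-running `place` would return a lower tier, which is the signal a tiering engine uses to move the data down.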

Smart World: The Challenge for Enterprise Storage

Figure: Data in zettabytes, 2010 to 2020 (Source: Oracle 2012). Data growing at a 40% compound annual rate; 10% hot data, 90% cold data. Identifying “hot” and “cold” data is key to tiering storage.


Computer memory architecture has always consisted of several tiers. A modern CPU alone contains three levels of SRAM cache (L1, L2 and L3), which are backed by system DRAM (main memory) on the DDR interface. It is natural to extend the tiered architecture to storage as well, because modern storage devices span multiple latency tiers. PCIe NVMe SSDs offer the highest performance and lowest latency, while SATA 6Gbps SSDs provide a lower cost per gigabyte. Hard disk drives (HDDs) sit at the bottom of the latency hierarchy and are orders of magnitude slower than even SATA 6Gbps SSDs, but their cost per gigabyte is the lowest by a significant margin.

When speaking of caching and tiering, it is important to draw a clear distinction between the two. Caching comes in three forms (write-around, write-through and write-back) and is typically implemented with memory (SRAM/DRAM) in front of storage (SSD/HDD), but it can be used between different classes of storage as well. In every case, the faster form of memory or storage acts as a temporary data cache that improves read and/or write performance, depending on the cache type chosen.

• Write-around cache is effectively a read-only cache: all data is first written to the slower tier, and caching algorithms then determine which data is accessed frequently and copy it to the faster tier for lower-latency access.

• Write-through cache writes to both the fast and slow tiers simultaneously, and a write operation is considered complete only once it has been written to both. Write-through caching therefore accelerates only reads, because write performance is still bound by the slower tier; the idea behind it is that the most recently written data is also the most likely to be read next.

• Write-back cache is the only one of the three that improves write performance: a write is completed as soon as the data reaches the faster tier, and the data is written to the slower tier later.
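The three write policies differ only in where a write lands first and when it reaches the slower tier. The toy model below is a minimal sketch, assuming two Python dicts stand in for the fast and slow tiers; the class name `Cache` is hypothetical, it promotes data on every read rather than using a frequency-based algorithm, and it omits eviction, so it illustrates the semantics of each policy rather than a production cache.

```python
class Cache:
    """Toy two-tier cache illustrating write-around, write-through and write-back.

    `fast` and `slow` are plain dicts standing in for, e.g., an SSD and an HDD.
    """

    POLICIES = ("write-around", "write-through", "write-back")

    def __init__(self, policy: str):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fast: dict[str, bytes] = {}
        self.slow: dict[str, bytes] = {}
        self.dirty: set[str] = set()  # keys in the fast tier not yet written to the slow tier

    def write(self, key: str, value: bytes) -> None:
        if self.policy == "write-around":
            self.slow[key] = value      # bypass the fast tier entirely
            self.fast.pop(key, None)    # drop any stale cached copy
        elif self.policy == "write-through":
            self.fast[key] = value      # complete only once both tiers hold the data
            self.slow[key] = value
        else:  # write-back
            self.fast[key] = value      # acknowledged after the fast write alone
            self.dirty.add(key)

    def read(self, key: str) -> bytes:
        if key in self.fast:
            return self.fast[key]
        value = self.slow[key]
        self.fast[key] = value          # copy (not move) into the fast tier
        return value

    def flush(self) -> None:
        """Write all dirty write-back data down to the slow tier."""
        for key in self.dirty:
            self.slow[key] = self.fast[key]
        self.dirty.clear()


wb = Cache("write-back")
wb.write("a", b"1")
print("a" in wb.slow)  # False: the slow tier has not seen the write yet
wb.flush()
print("a" in wb.slow)  # True
```

The write-back case shows both its benefit and its risk: the write completes without touching the slow tier, but until `flush` runs, the only copy lives in the fast tier.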

Tiering is fundamentally different from caching because there are no temporary caches, just different tiers of permanent storage. Whereas caching copies data from the slower tier to a faster one for improved read latency, tiering never copies data: it always moves data in full from one tier to another. This creates a space-efficiency advantage, because the capacity of all storage tiers is available to the host, whereas in caching only the capacity of the slower tier is usable, since data is copied rather than moved.
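The move-versus-copy distinction, and its effect on usable capacity, fits in a few lines. This sketch assumes dicts stand in for storage devices and uses hypothetical device sizes; `promote` and `usable_capacity_gb` are illustrative names, not an existing API.

```python
def promote(key: str, slow: dict, fast: dict) -> None:
    """Tiering-style promotion: move the data, so the slow tier keeps no copy."""
    fast[key] = slow.pop(key)


def usable_capacity_gb(tier_sizes_gb: list[float], mode: str) -> float:
    """Host-visible capacity of a two-or-more-device setup.

    Tiering moves data, so every tier holds unique data and all capacity counts.
    Caching copies data, so the faster device only mirrors part of the slower
    one and only the slowest (last) tier's capacity is usable.
    """
    if mode == "tiering":
        return sum(tier_sizes_gb)
    if mode == "caching":
        return tier_sizes_gb[-1]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical configuration: a 1,600 GB NVMe SSD paired with an 8,000 GB HDD.
config = [1600, 8000]
print(usable_capacity_gb(config, "tiering"))  # 9600
print(usable_capacity_gb(config, "caching"))  # 8000
```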

Another fundamental difference between tiering and caching is that tiering supports more than two tiers of storage. Because data is moved between tiers based on access frequency, there is practically no upper limit on the number of tiers that a tiered storage architecture can have. Caching architectures typically only work with two tiers, as the faster tier is used to accelerate the slower main tier where all data is ultimately stored long-term. A multi-tier architecture enables higher performance and lower cost because the whole storage architecture can be designed to best fit a specific workload, which may consist of varying levels of data access frequencies.
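Because tiering only needs a ranking by access frequency, it generalizes naturally to any number of tiers. The sketch below is a simplified, hypothetical rebalancing pass: it assumes equal-sized objects, measures each tier’s capacity in object slots, and ignores the cost of moving data, which a real tiering engine would also have to weigh. The function name `rebalance` is illustrative.

```python
def rebalance(access_counts: dict[str, int], tier_slots: list[int]) -> dict[str, int]:
    """Assign objects to N tiers, fastest first.

    The most-accessed objects fill tier 0 until it is full, then tier 1,
    and so on. Returns a mapping of object name -> tier index.
    """
    if sum(tier_slots) < len(access_counts):
        raise ValueError("not enough capacity across all tiers")
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement: dict[str, int] = {}
    tier, used = 0, 0
    for obj in ranked:
        while used >= tier_slots[tier]:  # current tier is full: spill to the next one
            tier, used = tier + 1, 0
        placement[obj] = tier
        used += 1
    return placement


counts = {"a": 900, "b": 40, "c": 35, "d": 2, "e": 0}
print(rebalance(counts, [1, 2, 2]))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 2}
```

Adding a fourth tier (for example HDD below two SSD tiers) is just one more entry in `tier_slots`; nothing else in the policy changes.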

Understanding the Difference Between Tiering and Caching

Latencies in Modern Computer Architecture

Type            Latency                   Typical Size
L1 Cache        1-3 ns                    32KB per CPU core
L2 Cache        3-10 ns                   256KB per CPU core
L3 Cache        10-20 ns                  2-20MB per CPU package
DRAM            30-60 ns                  2-32GB per module
PCIe NVMe SSD   20,000-100,000 ns         400-3,200GB per drive
SATA SSD        40,000-110,000 ns         120-3,840GB per drive
HDD             3,000,000-10,000,000 ns   500-8,000GB per drive

A nanosecond (ns) is one billionth (10⁻⁹) of a second.
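The latency ranges above can be turned directly into slowdown ratios between adjacent tiers. This short sketch uses only the table’s figures; the helper name `slowdown` is illustrative. It reports the narrowest and widest gap implied by each pair of ranges.

```python
import math

# Latency ranges in nanoseconds (min, max), from the latency table.
LATENCY_NS = {
    "DRAM": (30, 60),
    "PCIe NVMe SSD": (20_000, 100_000),
    "SATA SSD": (40_000, 110_000),
    "HDD": (3_000_000, 10_000_000),
}


def slowdown(slow: str, fast: str) -> tuple[float, float]:
    """How many times slower `slow` is than `fast`: (narrowest gap, widest gap)."""
    s_lo, s_hi = LATENCY_NS[slow]
    f_lo, f_hi = LATENCY_NS[fast]
    return s_lo / f_hi, s_hi / f_lo


for slow, fast in [("HDD", "SATA SSD"), ("PCIe NVMe SSD", "DRAM")]:
    lo, hi = slowdown(slow, fast)
    print(f"{slow} vs {fast}: {lo:,.0f}x to {hi:,.0f}x "
          f"(10^{math.log10(lo):.1f} to 10^{math.log10(hi):.1f})")
```

By these figures an HDD is roughly 27x to 250x slower than a SATA SSD (about 1.4 to 2.4 orders of magnitude), and even the fastest SSD class is roughly 333x to 3,333x slower than DRAM.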


Generally speaking, the cost of storage is dictated by performance: the more performance a drive provides, the higher its price per gigabyte. Once variation in data access frequencies is added to the equation, it makes little sense to use just one tier of storage. If only high-performance storage were used, the cost would be prohibitive, and since the array would end up storing mostly cold data, the return on investment (ROI) would be very poor. Conversely, if only low-cost storage were used, performance would be severely limited, degrading the user experience and potentially limiting the growth of the business.

Performance and capacity per dollar are the key metrics in tiering. Each type of storage has its unique performance and cost characteristics, so a multi-tiered storage architecture enables the highest cost efficiency as each storage type can be used in its appropriate tier.

The most frequently accessed data and all incoming writes are initially stored in Tier 1, which therefore sees very high read and write activity. Because PCIe NVMe SSDs provide the highest performance per dollar and per watt, using them in Tier 1 is the most cost-efficient solution. It would take multiple SATA SSDs or thousands of HDDs to match the performance of a single PCIe NVMe SSD, which would be far costlier to acquire even though PCIe NVMe SSDs command a higher cost per gigabyte.
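The “how many drives to match Tier 1” argument is simple arithmetic once per-drive figures are known. The IOPS values below are hypothetical placeholders, not Samsung specifications, and the sketch assumes performance scales linearly with drive count, which real arrays only approximate. The names `IOPS` and `drives_to_match` are illustrative.

```python
import math

# Hypothetical random-read IOPS per drive, for illustration only;
# substitute datasheet figures for real capacity planning.
IOPS = {"PCIe NVMe SSD": 500_000, "SATA SSD": 90_000, "HDD": 200}


def drives_to_match(target_iops: int, drive: str) -> int:
    """Drives of type `drive` needed to reach target_iops, assuming linear scaling."""
    return math.ceil(target_iops / IOPS[drive])


target = IOPS["PCIe NVMe SSD"]
for drive in ("SATA SSD", "HDD"):
    print(drive, drives_to_match(target, drive))  # SATA SSD 6, then HDD 2500
```

Multiplying each count by a per-drive price and power draw gives the acquisition and electricity side of the comparison.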

Additionally, multiple SATA SSDs, and especially thousands of HDDs, would consume significantly more power, so a PCIe NVMe SSD also lowers total cost of ownership (TCO) by reducing electricity expenses.

2-bit MLC SATA SSDs, such as the Samsung SM863 Series, are the optimal choice for Tier 2 because they still provide high performance but offer a lower cost per gigabyte than PCIe NVMe SSDs. Compared to 3-bit MLC, 2-bit MLC also has higher write endurance, which is beneficial because data in the upper tiers is more likely to be modified than the static, cold data that has already reached Tier 3. On the other hand, 3-bit MLC offers read performance similar to 2-bit MLC at a lower cost per gigabyte, making it ideal for Tier 3, where data is mostly read-only.

While SSDs offer superior performance, density and power efficiency, HDDs are priced noticeably lower per gigabyte. HDDs may therefore still have a place at the bottom of the storage hierarchy, especially where large amounts of cold data need to be stored. In such scenarios, even though SSDs offer higher density (fewer racks and less space required) and better power efficiency (lower electricity costs), the acquisition cost of SSD-only storage may be too high if hundreds or thousands of petabytes are needed. For deep archives of very large datasets, tape can even be used below the HDD tier for data that is almost never accessed but still needs to be retained for possible future use.

Tiering Storage for Performance and TCO

For more details about the differences between 2-bit and 3-bit MLC, please refer to the “Evaluating MLC vs TLC vs V-NAND for Enterprise Applications” white paper.


Figure: Storage Options for Each Tier. Tier 1: PCIe NVMe SSD; Tier 2: 2-bit MLC SSD (SM863); Tier 3: 3-bit MLC SSD (PM863); Tier 4: HDDs. Axes: performance per $ and capacity per $.


Tiering allows enterprises to take full advantage of all the different classes of storage available in today’s marketplace. Cost-efficient, high-performance storage is one of the key factors enabling the growth of modern data centers and enterprises, and by combining the highest-performance storage with a large pool of slower tiers, tiering delivers the necessary performance for hot data while keeping TCO as low as possible, even for very large datasets.

Learn more: samsung.com/enterprisessd | 1-866-SAM4BIZ

Follow us: youtube.com/samsungbizusa | @SamsungBizUSA | insights.samsung.com

© 2016 Samsung Electronics America, Inc. All rights reserved. Samsung is a registered trademark of Samsung Electronics Co., Ltd. All products, logos and brand names are trademarks or registered trademarks of their respective companies. This white paper is for informational purposes only. Samsung makes no warranties, express or implied, in this white paper. WHP-SSD-TIERINGSTORAGE-JAN16J

SAMSUNG ENTERPRISE SSD PORTFOLIO

PM863 Series Data Center SSDs
• 3-bit MLC NAND
• Designed for read-intensive applications
• Built-in Power Loss Protection
• SATA 6Gb/s Interface
• Form factor: 2.5”

SM863 Series Data Center SSDs
• 2-bit MLC NAND
• Designed for write-intensive applications
• Built-in Power Loss Protection
• SATA 6Gb/s Interface
• Form factor: 2.5”

Conclusion

About the Author

Kristian Vättö is a technical marketing specialist who started his career as a news editor at AnandTech.com in 2011. He later became the site’s SSD editor and was responsible for producing highly detailed and professional SSD reviews. In addition to his work with Samsung, Kristian is currently studying economics at the University of Tampere in Finland.