Redhat Gluster Storage-Arch


  • 7/30/2019 Redhat Gluster Storage-Arch


    RED HAT STORAGE SERVER

    An introduction to Red Hat Storage Server architecture

    ABSTRACT

    Over the past ten years, enterprises have seen enormous gains in scalability, flexibility, and
    affordability as they migrated from proprietary, monolithic server architectures to architectures
    that are virtualized, open source, standardized, and commoditized.

    Unfortunately, storage has not kept pace with computing. The proprietary, monolithic, shared-all,
    and scale-up solutions that dominate the storage industry today do not deliver the scalability,
    flexibility, and economics that modern datacenter and cloud computing environments need
    in a hyper-growth, virtualized, and increasingly cloud-based world. Red Hat Storage was created
    to address this gap.

    TABLE OF CONTENTS

    Abstract
    Red Hat Storage design goals
        Elasticity
        Linear scaling
        Scale-out with Red Hat Storage
    Technical differentiators
        Software-only
        Open source
        Complete storage operating system stack
        User space
        Modular, stackable architecture
        Data stored in native formats
        No metadata with the elastic hash algorithm
        Red Hat Storage global namespace technology
        Standards-based file and object store
    Red Hat Storage advanced topics
        Elastic volume management
        Renaming or moving files
    Unified file and object
    High availability
        N-way local synchronous replication
        Geo-Rep long-distance asynchronous replication
        Replication in the private cloud/datacenter, public cloud, and hybrid cloud environments
    Conclusion
    Glossary


    RED HAT STORAGE SERVER DESIGN GOALS

    Red Hat Storage Server is a scale-out network-attached storage (NAS) and object storage
    software solution for private cloud or datacenter, public cloud, and hybrid cloud environments.
    It is software-only, open source, and designed to meet unstructured, semi-structured, and big
    data storage requirements. It enables enterprises to combine large numbers of commodity storage
    and compute resources into a high-performance, virtualized, and centrally managed storage
    pool. Both capacity and performance can scale linearly and independently on demand, from a
    few terabytes to petabytes and beyond, using both on-premise commodity hardware and public
    cloud compute and storage infrastructure. By combining commodity economics with a
    scale-out approach, Red Hat customers can achieve radically better price and performance in
    an easily deployed and managed solution that can be configured for increasingly demanding
    workloads.

    At the heart of Red Hat Storage Server is GlusterFS, an open source, massively scalable
    distributed file system. This whitepaper discusses some of the unique technical aspects of the
    Red Hat Storage Server architecture, speaking to those aspects of the system that are designed
    to provide linear scale-out of both performance and capacity without sacrificing resiliency.

    Red Hat Storage Server was designed to achieve several major goals:

    Elasticity

    Elasticity is the notion that an enterprise should be able to flexibly adapt to the growth (or
    reduction) of data and to add or remove resources to a storage pool as needed without
    disrupting the system. Red Hat Storage Server was designed to allow enterprises to add or delete
    users, application data, volumes, storage nodes, etc., without disrupting any running workloads
    within the infrastructure.

    Linear scaling

    Linear scaling is a much-abused phrase within the storage industry. It should mean, for example,
    that twice the number of storage systems will deliver twice the realized performance: twice the
    throughput (as measured in gigabytes per second) with the same average response time per
    external file system I/O event (i.e., how long an NFS client will wait for the file server to
    return the information associated with each NFS client request).

    Similarly, if an organization has acceptable levels of performance but wants to increase
    capacity, it should be able to do so without decreasing performance or getting non-linear
    returns in capacity.

    Unfortunately, most storage systems do not demonstrate linear scaling. This seems somewhat
    counter-intuitive, since it is so easy to purchase another set of disks to double the size of
    available storage. The caveat in doing so is that the scalability of storage has multiple
    dimensions, capacity being only one of them.

    Adding capacity is only one dimension; the systems managing the disk storage need to scale as
    well. There needs to be enough CPU capacity to drive all of the spindles at their peak capacity.
    The file system must scale to support the total size. The metadata telling the system where all
    the files are located must scale at the same rate disks are added. The network capacity
    available must scale to meet the increased number of clients accessing those disks. In short, it
    is not storage that needs to scale as much as it is the complete storage system that needs to
    scale.

    Traditional file system models and architectures are unable to scale in this manner and
    therefore can never achieve true linear scaling of performance. For traditional distributed
    systems, each storage node must always incur the overhead of interacting with one or more other
    storage nodes for every file operation, and that overhead subtracts from the scalability simply
    by adding to the list of tasks and the amount of work to be done.

    Even if those additional tasks could be done with near-zero effort (in the CPU and other system
    resources sense of the term), latency problems remain. Latency results from waiting for the
    responses across the networks connecting the distributed storage nodes in those traditional
    system architectures and nearly always impacts performance. This type of latency varies with
    the speed and responsiveness (or lack thereof) of the network connecting the nodes to each
    other. Attempts to minimize coordination overhead often result in unacceptable increases in
    risk.

    This is why claims of linear scalability often break down for traditional distributed
    architectures. Instead, as illustrated in Figure 1, most traditional systems demonstrate
    logarithmic scalability: storage's useful capacity grows more slowly as it gets larger. This is
    due to the increased overhead necessary to maintain data resiliency. Examining the performance
    of some storage networks reflects this limitation, as larger units offer slower aggregate
    performance than their smaller counterparts.

    Scale-out with Red Hat Storage Server

    Red Hat Storage Server is designed to provide a scale-out architecture for both performance
    and capacity. This implies that the system should be able to scale up (or down) along multiple
    dimensions.

    By aggregating the disk, CPU, and I/O resources of large numbers of inexpensive systems, an
    enterprise should be able to create one very large and high-performing storage pool. If the
    enterprise wants to add more capacity to scale out a system, it can do so by adding more
    inexpensive disks. If the enterprise wants to gain performance, it can do so by deploying more
    inexpensive server nodes.

    Red Hat Storage Server's unique architecture is designed to deliver the benefits of scale-out
    (more units means more capacity, more CPU, and more I/O), while avoiding the corresponding
    overhead and risk associated with keeping large numbers of storage nodes in sync.


    Figure 1


    In practice, both performance and capacity can be scaled out linearly with Red Hat Storage
    Server. We can do this by employing three fundamental techniques:

    • The elimination of a metadata server
    • Effective distribution of data to achieve scalability and reliability
    • The use of parallelism to maximize performance via a fully distributed architecture

    To illustrate how Red Hat Storage Server scales, Figure 2 shows how a baseline system can be
    scaled to increase both performance and capacity. The discussion below uses some illustrative
    performance and capacity numbers.

    A typical direct-attached Red Hat Storage Server configuration will have a moderate number
    of disks attached to two or more server nodes, which act as NAS heads (or storage nodes). For
    example, to support a requirement for 24 TB of capacity, a deployment might have two servers,
    each of which contains 12 one-terabyte SATA drives. (See Config A.)

    If a customer has found that the performance levels are acceptable but wants to increase
    capacity by a third (from 24 TB to 32 TB), they could add another four one-terabyte drives to
    each server and will not generally experience performance degradation (i.e., each server would
    have 16 one-terabyte drives). (See Config B.) Note that they do not need to upgrade to larger
    or more powerful hardware; they simply add eight more inexpensive SATA drives.

    On the other hand, if the customer is happy with 24 TB of capacity but wants to double
    performance, they could distribute the drives among four servers, rather than two (i.e., each
    server would have six one-terabyte drives, rather than 12). Note that in this case, they are
    adding two more low-price servers and can simply redeploy existing drives. (See Config C.)


    Figure 2


    If they want to quadruple both performance and capacity, they could distribute among eight
    servers (i.e., each server would have 12 one-terabyte drives). (See Config D.)

    Note that by the time a solution has approximately 10 drives, the performance bottleneck has
    generally already moved to the network. (See Config D.)

    In order to maximize performance, we can upgrade from a 1-Gigabit Ethernet network to a
    10-Gigabit Ethernet network. Note that performance in this example is 25 times what
    we saw in the baseline. This is evidenced by an increase in performance from 200 MB/s in the
    baseline configuration to 5,000 MB/s. (See Config E.)
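
    The arithmetic behind Configs A through E can be checked with a short sketch. The server and
    drive counts for A through D and the 200 MB/s and 5,000 MB/s totals come from the text above;
    the per-server throughput figures, and the assumption that Config E keeps Config D's eight
    servers and 12 drives each, are illustrative assumptions rather than published specifications.

```python
from dataclasses import dataclass

DRIVE_TB = 1  # one-terabyte SATA drives, as in the text


@dataclass(frozen=True)
class Config:
    name: str
    servers: int
    drives_per_server: int
    per_server_mb_s: int  # assumed network-bound throughput per server

    @property
    def capacity_tb(self) -> int:
        return self.servers * self.drives_per_server * DRIVE_TB

    @property
    def throughput_mb_s(self) -> int:
        return self.servers * self.per_server_mb_s


GBE_1 = 100   # MB/s per server on 1 GbE (baseline: 200 MB/s across 2 servers)
GBE_10 = 625  # MB/s per server on 10 GbE (assumed: 5,000 MB/s across 8 servers)

CONFIGS = [
    Config("A", 2, 12, GBE_1),   # baseline
    Config("B", 2, 16, GBE_1),   # more drives, same servers
    Config("C", 4, 6, GBE_1),    # same drives, more servers
    Config("D", 8, 12, GBE_1),   # both scaled 4x
    Config("E", 8, 12, GBE_10),  # assumed: Config D on a faster network
]

for c in CONFIGS:
    print(f"Config {c.name}: {c.capacity_tb} TB, {c.throughput_mb_s} MB/s")
```

    Under these assumptions, capacity follows the drive count and throughput follows the server
    count and network speed, which is the independence of the two dimensions described above.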


    As you will note, the power of the scale-out model is that both capacity and performance can
    scale linearly to meet requirements. It is not necessary to know what performance levels will
    be needed two or three years out. Instead, configurations can be easily adjusted as the need
    demands.

    While the above discussion used round, theoretical numbers, actual performance tests have
    proven this linear scaling. The results, illustrated in Figure 3, show write throughput scaling
    linearly from 100 MB/s on one server (i.e., storage node) to 800 MB/s on eight systems in a
    1 GbE environment. However, on an InfiniBand network, we have seen write throughput scale from
    1.5 GB/s (one system) to 12 GB/s (eight systems).
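
    One way to read these figures is as a scaling efficiency: the measured speed-up divided by the
    ideal speed-up for the number of nodes added. The helper below is a hypothetical illustration
    (its name and signature are not part of any Red Hat tool); it only restates the numbers quoted
    above.

```python
def scaling_efficiency(base_nodes, base_throughput, nodes, throughput):
    """Measured speed-up divided by ideal (linear) speed-up; 1.0 means linear."""
    ideal = nodes / base_nodes
    measured = throughput / base_throughput
    return measured / ideal


# Write-throughput figures quoted in the text.
print(scaling_efficiency(1, 100, 8, 800))   # 1 GbE, MB/s      -> 1.0
print(scaling_efficiency(1, 1.5, 8, 12.0))  # InfiniBand, GB/s -> 1.0
```

    A value below 1.0 would indicate the sub-linear behavior attributed earlier to traditional
    architectures.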

    We have experience with Red Hat Storage Server being deployed in a multitude of scale-out
    scenarios. For example, Red Hat Storage Server has been successfully deployed in multi-petabyte
    archival scenarios, where the goal was moderate performance in the


    Figure: Modular and stackable architecture

    • Red Hat Storage Server is a complete storage stack in user space
    • Everything is a volume
    • Volume = C shared-object library
    • Stack volumes to create a configuration
    • Customize to match unique workload needs
    • Open APIs: volume-to-volume and platform-to-world
    • Development is like application programming
    • Rapid time to market
    • Third-party development

    synchronous replication as well as asynchronous long-distance replication via
    Red Hat Geo-Replication. In essence, by taking a lesson from micro-kernel architectures, we
    have designed Red Hat Storage Server to deliver a complete storage operating system stack in
    user space.

    User space

    Unlike traditional file systems, Red Hat Storage Server operates in user space. This makes
    installing and upgrading Red Hat Storage Server significantly easier. And it means that users
    who choose to develop on top of Red Hat Storage Server need only have general C programming
    skills, not specialized kernel expertise.

    Modular, stackable architecture

    Red Hat Storage Server is designed using a modular and stackable architecture approach. To
    configure Red Hat Storage Server for highly specialized environments (e.g., large numbers of
    large files, huge numbers of very small files, environments with cloud storage, various
    transport protocols, etc.), it is a simple matter of including or excluding particular
    modules.
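
    The stacking idea can be sketched in a few lines. The source describes each volume as a C
    shared-object library; the Python classes below (Volume, Brick, Replicate, ReadCache) are
    hypothetical names used only to show how modules that share one interface can be layered, and
    they do not mirror the actual module API.

```python
class Volume:
    """Common interface: every module in the stack reads and writes by path."""

    def write(self, path, data):
        raise NotImplementedError

    def read(self, path):
        raise NotImplementedError


class Brick(Volume):
    """Bottom of the stack: stores data (here, in a dict standing in for a disk)."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class Replicate(Volume):
    """Forwards every write to all children; reads from the first child."""

    def __init__(self, *children):
        self.children = children

    def write(self, path, data):
        for child in self.children:
            child.write(path, data)

    def read(self, path):
        return self.children[0].read(path)


class ReadCache(Volume):
    """Optional layer: caches reads, invalidates the entry on write."""

    def __init__(self, child):
        self.child = child
        self.cache = {}

    def write(self, path, data):
        self.cache.pop(path, None)
        self.child.write(path, data)

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.child.read(path)
        return self.cache[path]


# Including or excluding a module is a matter of how the stack is composed.
stack = ReadCache(Replicate(Brick(), Brick()))
stack.write("/reports/q1.txt", b"quarterly numbers")
print(stack.read("/reports/q1.txt"))
```

    Dropping the ReadCache layer, or swapping Replicate for another module, changes the
    configuration without touching the other layers.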

    For the sake of stability, certain options should not be changed once the system is in use
    (for example, one would not remove a function such as replication if high availability was a
    desired functionality).

    Data stored in native formats

    With Red Hat Storage Server, data is stored on disk using native formats (e.g., XFS). Red Hat
    Storage Server has implemented various self-healing processes for data. As a result, the system
    is extremely resilient. Furthermore, files are naturally readable without Red Hat Storage
    Server. If a customer chooses to migrate away from Red Hat Storage Server, their data is still
    completely usable without any required modifications or data migration.

    No metadata with the elastic hash algorithm

    In a scale-out system, one of the biggest challenges is keeping track of the logical and
    physical location of data (location metadata). Most distributed systems solve this problem by
    creating a separate index with file names and location metadata. Unfortunately, this creates
    both a central point of failure and a huge performance bottleneck. As traditional systems add
    more files, more servers, or more disks, the central metadata server becomes a performance
    chokepoint. This becomes an even bigger challenge if the workload consists primarily of small
    files and the ratio of metadata to data increases.

    Unlike other storage systems with a distributed file system, Red Hat Storage Server does not
    create, store, or use a separate index of metadata in any way. Instead, Red Hat Storage Server
    places and locates files algorithmically. All storage node servers in the cluster have the
    intelligence to locate any piece of data without looking it up in an index or querying another
    server. All a storage node server needs to do to locate a file is to know the pathname and
    filename and apply the algorithm. This fully parallelizes data access and ensures linear
    performance scaling. The performance, availability, and stability advantages of not using
    metadata are significant and, in some cases, dramatic.
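
    Algorithmic placement can be illustrated with a minimal sketch. This is not the elastic hash
    algorithm itself, whose details this paper does not give: SHA-256, the modulo step, the
    function name locate, and the node names are all stand-in assumptions, and the sketch maps
    paths straight to nodes rather than to logical volumes.

```python
import hashlib
from typing import Sequence


def locate(path: str, nodes: Sequence[str]) -> str:
    """Pick the storage node for `path` using only the path and the node list."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


NODES = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical node names

# Any client or server computes the same answer independently; no index
# or metadata server is consulted.
for p in ["/projects/a/report.txt", "/projects/b/data.bin"]:
    print(p, "->", locate(p, NODES))
```

    Because the result depends only on the path and the node list, lookups need no coordination
    between nodes, which is the property the text relies on.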

    Red Hat Storage Server global namespace technology

    While many extol the virtues of their namespace capability as enabling easier management of
    network storage, Red Hat Storage Server global namespace technology enables an even greater
    capability and has enabled innovative IT solutions that are changing the way Red Hat customers
    leverage cloud technology and legacy applications.

    In the private cloud or datacenter

    Red Hat Storage Server global namespace technology enables Red Hat customers who linearly
    scale their Red Hat Storage Server NAS environments to tie together hundreds of storage
    nodes and associated files into one global namespace. The result is one common mount point
    for a large pool of network-attached storage. In some cases, where the Red Hat Storage Server
    native access client is used in place of NFSv3, multiple parallel access is supported by the
    file system.

    In the public cloud, Red Hat Storage Server global namespace technology enables multiple
    compute instances and storage to be configured in a massively scalable pool of network-attached
    storage. For example, within the AWS cloud, Red Hat Storage Server global namespace enables
    the pooling of large quantities of Elastic Compute Cloud (EC2) instances and Elastic Block
    Storage (EBS) to form a NAS in the AWS cloud. EC2 instances (storage nodes) and EBS can be
    added non-disruptively, resulting in linear scaling of both performance and capacity. Being able
    to deploy NAS in the cloud and run POSIX-compliant applications within the cloud using Red Hat
    Storage Server file storage accelerates cloud adoption and enables new and creative enterprise
    customer business solutions.

    Standards-based file and object store

    With Red Hat Storage Server, all standard industry clients for file and object access are
    supported, including NFS, CIFS/SMB, and OpenStack Swift. Applications accessing storage do not
    have to deal with proprietary clients and closed interfaces, ensuring application portability
    and no vendor lock-in.


    RED HAT STORAGE SERVER ADVANCED TOPICS

    Elastic volume management

    Given that the elastic hashing approach assigns files to logical volumes, a question often
    arises: How do you assign logical volumes to physical volumes?

    Volume management is completely elastic. Storage volumes are abstracted from the underlying
    hardware and can grow, shrink, or be migrated across physical storage nodes as necessary.
    Storage node servers can be added or removed on the fly, with data automatically
    rebalanced across the cluster. Data is always online, and there is no application downtime. File
    system configuration changes are accepted at runtime and propagated throughout the cluster,
    allowing changes to be made dynamically as workloads fluctuate or for performance tuning.

    Renaming or moving files

    If a file is renamed, the hashing algorithm will obviously result in a different value, which
    will frequently result in the file being assigned to a different logical volume, which might
    itself be located in a different physical location. Since files can be large and rewriting and
    moving files is generally not a real-time operation, Red Hat Storage Server solves this problem
    by creating a pointer at the time a file (or set of files) is renamed. Thus, a client looking
    for a file under the new name would look in a logical volume and be redirected to the old
    logical volume location. As background processes result in files ultimately being moved, the
    pointers are then removed.

    Similarly, if files need to be moved or reassigned (e.g., if a disk becomes hot or degrades in
    performance), reassignment decisions can be made in real time, while the physical migration of
    files can happen as a background process.
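
    The pointer mechanism can be sketched as follows. This is a simplified in-memory model under
    stated assumptions: the Cluster class, its method names, and the SHA-256 stand-in hash are
    hypothetical, and the real on-disk representation of pointers is not described in this paper.

```python
import hashlib


def hashed_volume(name, volumes):
    """Logical volume that `name` hashes to (stand-in hash, not the real one)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return volumes[int.from_bytes(digest[:8], "big") % len(volumes)]


class Cluster:
    def __init__(self, volume_names):
        self.names = list(volume_names)
        # Each volume maps a file name to ("data", bytes) or ("pointer", volume).
        self.volumes = {v: {} for v in self.names}

    def _holder(self, name):
        """Volume that physically holds `name`, following a pointer if present."""
        home = hashed_volume(name, self.names)
        kind, value = self.volumes[home][name]
        return value if kind == "pointer" else home

    def write(self, name, data):
        self.volumes[hashed_volume(name, self.names)][name] = ("data", data)

    def read(self, name):
        return self.volumes[self._holder(name)][name][1]

    def rename(self, old, new):
        holder = self._holder(old)
        old_home = hashed_volume(old, self.names)
        if holder != old_home:
            del self.volumes[old_home][old]  # drop the pointer for the old name
        # Rename in place where the data already lives: no data is copied.
        self.volumes[holder][new] = self.volumes[holder].pop(old)
        new_home = hashed_volume(new, self.names)
        if new_home != holder:
            self.volumes[new_home][new] = ("pointer", holder)

    def migrate(self, name):
        """Background step: move data to its hashed volume, replacing the pointer."""
        home = hashed_volume(name, self.names)
        holder = self._holder(name)
        if holder != home:
            self.volumes[home][name] = self.volumes[holder].pop(name)


cluster = Cluster(["vol-0", "vol-1", "vol-2", "vol-3"])
cluster.write("draft.txt", b"hello")
cluster.rename("draft.txt", "final.txt")
print(cluster.read("final.txt"))  # readable at once, before any data moves
cluster.migrate("final.txt")
print(cluster.read("final.txt"))  # same data, now at its hashed location
```

    The rename itself touches only small entries; the potentially large data copy is deferred to
    migrate, matching the split between real-time decisions and background migration above.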

    Unified file and object

    Red Hat Storage Server unifies NAS and object storage technology. It provides a system for
    data storage that enables users to access the same data, both as an object and as a file, thus
    simplifying management and controlling storage costs. Red Hat Storage Server based on
    GlusterFS already allows users to store and retrieve data as files using traditional file
    system/NAS interfaces like NFS, CIFS, and native FUSE. In Red Hat Storage Server 2.0, object
    access has been added (based on OpenStack Swift), which allows users to store and retrieve
    content through a simple REST (Representational State Transfer) API as objects. This is very
    useful when there is a need for one interface to store data (as object) and a separate
    interface to retrieve and process the same data (as file).

    HIGH AVAILABILITY

    N-way local synchronous replication

    Generally speaking, we recommend the use of mirroring (2, 3, or n-way) to ensure availability.
    In this scenario, each storage node server's file data is replicated to another storage node
    server using synchronous writes. The benefit of this strategy is full fault tolerance; failure
    of a single storage server is completely transparent to Red Hat Storage Server clients. In
    addition, reads are spread across all members of the mirror. Using Red Hat Storage Server, there
    can be an unlimited number of storage node members in a mirror. While the elastic hashing
    algorithm assigns files to unique logical volumes, Red Hat Storage Server ensures that every
    file is located on at least two different storage system server nodes.
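
    A toy model of n-way mirroring shows the two behaviors described above: writes complete only
    after every member has the data, and reads are spread across members. The Mirror class and its
    methods are hypothetical names for illustration, with dicts standing in for storage nodes.

```python
import itertools


class Mirror:
    """n-way mirror: a write is acknowledged only after every replica has it."""

    def __init__(self, replica_count):
        if replica_count < 2:
            raise ValueError("a mirror needs at least two replicas")
        self.replicas = [{} for _ in range(replica_count)]
        self._next = itertools.cycle(range(replica_count))

    def write(self, name, data):
        for replica in self.replicas:  # synchronous: update every copy
            replica[name] = data
        return True                    # acknowledge only after all copies exist

    def read(self, name):
        # Rotate through members to spread reads; skip a member lacking the file.
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._next)]
            if name in replica:
                return replica[name]
        raise KeyError(name)

    def fail(self, index):
        """Simulate losing one storage node's copy of the data."""
        self.replicas[index] = {}


mirror = Mirror(replica_count=3)
mirror.write("invoice.pdf", b"...")
mirror.fail(0)                      # one node is lost
print(mirror.read("invoice.pdf"))   # still served by a surviving member
```

    In this model the client never learns that a member failed, which is the transparency the
    text describes; repairing the failed member is outside the scope of the sketch.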


    Geo-Rep long-distance asynchronous replication

    For long-distance data replication requirements, Red Hat Storage Server supports Geo-Rep
    long-distance replication. Customers can configure storage server nodes and Red Hat Storage
    Server to asynchronously replicate data over vast geographical distances.
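
    The difference from synchronous mirroring can be sketched with a change queue: a write returns
    once the primary site has the data, and queued changes are shipped to the remote site later.
    The class and method names below are hypothetical, and the sketch says nothing about how
    Geo-Rep actually detects or transfers changes.

```python
from collections import deque


class GeoReplicatedVolume:
    def __init__(self):
        self.primary = {}       # local site
        self.remote = {}        # distant site
        self.pending = deque()  # changes not yet shipped to the remote site

    def write(self, name, data):
        self.primary[name] = data
        self.pending.append((name, data))  # returns without waiting on the WAN

    def sync(self, max_changes=None):
        """Ship queued changes to the remote site, oldest first."""
        shipped = 0
        while self.pending and (max_changes is None or shipped < max_changes):
            name, data = self.pending.popleft()
            self.remote[name] = data
            shipped += 1
        return shipped


volume = GeoReplicatedVolume()
volume.write("a.txt", b"1")
volume.write("b.txt", b"2")
print(len(volume.remote))  # 0: the remote site lags until the next sync
volume.sync()
print(volume.remote == volume.primary)  # True
```

    The remote copy can trail the primary between syncs, which is the trade-off that makes
    replication over long distances practical.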

    Replication in the private cloud/datacenter, public cloud, and hybrid cloud environments

    Both n-way synchronous and Geo-Rep asynchronous data replication are supported in the
    private cloud/datacenter, public cloud, and hybrid cloud environments.

    Within the Amazon Web Services (AWS) cloud, Red Hat Storage Server supports n-way synchronous
    replication across availability zones and Geo-Rep asynchronous replication across AWS
    Regions. In fact, Red Hat Storage Server is the only way to ensure high availability for NAS
    storage within the AWS infrastructure.

    While Red Hat Storage Server offers software-level disk and server redundancy at the storage
    node server level, in some cases we also recommend the use of hardware RAID (e.g., RAID 5 or
    6) within individual storage system servers to provide an additional level of protection where
    required. Red Hat solutions architects can advise you on the best form of storage-level and
    storage-node-level data protection and replication strategies for your specific requirements.

    CONCLUSION

    By delivering increased scalability, flexibility, affordability, performance, and ease of use
    in concert with reduced acquisition and maintenance costs, Red Hat Storage Server is a
    revolutionary step forward in data management. Multiple advanced architectural design decisions
    make it possible for Red Hat Storage Server to deliver greater performance, greater flexibility,
    greater manageability, and greater resilience at a significantly reduced overall cost. The
    complete elimination of location metadata via the use of the Elastic Hashing Algorithm is at
    the heart of many of Red Hat Storage Server's fundamental advantages, including its remarkable
    resilience, which dramatically reduces the risk of data loss, data corruption, and data
    becoming unavailable.

    Red Hat Storage Server can be deployed in the private cloud or datacenter via Red Hat Storage
    Server for On-premise software, an ISO image installed on commodity server and storage
    hardware, resulting in a powerful, turn-key, massively scalable, and highly available NAS
    environment. Additionally, Red Hat Storage Server can be deployed in the public cloud via
    Red Hat Storage Server for Public Cloud (e.g., within the AWS cloud), which delivers all the
    features and functionality possible in the private cloud/datacenter to the public cloud:
    essentially, massively scalable and highly available NAS in the cloud.

    To start a functional trial of Red Hat Storage Server, visit www.redhat.com. To speak with a
    Red Hat representative about how to solve your storage challenges, call 1-888-REDHAT-1.


    GLOSSARY

    Distributed file system: Any file system that allows access to files from multiple hosts
    sharing via a computer network.

    Metadata: Data providing information about one or more other pieces of data.

    Namespace: An abstract container or environment created to hold a logical grouping of unique
    identifiers or symbols. Each Red Hat Storage Server cluster exposes a single namespace as a
    POSIX mount point that contains every file in the cluster.

    POSIX (Portable Operating System Interface [for UNIX]): A family of related standards
    specified by the IEEE to define the application programming interface (API), along with shell
    and utilities interfaces, for software compatible with variants of the UNIX operating system.
    Red Hat Storage Server exports a fully POSIX-compliant file system.

    RAID (Redundant Array of Inexpensive Disks): A technology that provides increased storage
    reliability through redundancy, combining multiple low-cost, less-reliable disk drive
    components into a logical unit where all drives in the array are interdependent.

    User space: Applications running in user space don't directly interact with hardware, instead
    using the kernel to moderate access. User space applications are generally more portable than
    applications in kernel space. Red Hat Storage Server is a user space application.

    N-way replication: Local synchronous data replication typically deployed across campus or
    Amazon Web Services Availability Zones.

    Geo-Rep (Replication): Long-distance replication typically deployed from one private cloud/
    datacenter to another, or from one cloud region (e.g., an AWS Region) to another, located more
    than 50 miles from the primary data location.


    ABOUT RED HAT

    Red Hat was founded in 1993 and is headquartered in Raleigh, NC. Today, with more than 70
    offices around the world, Red Hat is the largest publicly traded technology company fully
    committed to open source. That commitment has paid off over time, for us and our customers,
    proving the value of open source software and establishing a viable business model built around
    the open source way.

    SALES AND INQUIRIES

    LATIN AMERICA
    +54 11 4329 7300
    www.latam.redhat.com

    NORTH AMERICA
    1-888-REDHAT-1
    www.redhat.com

    EUROPE, MIDDLE EAST AND AFRICA
    00800 7334 2835
    www.europe.redhat.com

    ASIA PACIFIC
    +65 6490 4200
    www.apac.redhat.com

    Copyright 2012 Red Hat, Inc. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss,
    MetaMatrix, and RHCE are trademarks of Red Hat, Inc., registered in the U.S. and other
    countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.

    www.redhat.com #9535877_0612