
  • ibm.com/redbooks

    Front cover

    IBM XIV Storage System: Concepts, Architecture, and Usage

    Bertrand Dufrasne
    Giacomo Chiapparini
    Attila Grosz
    Mark Kremkus
    Lisa Martinez
    Markus Oscheka
    Guenter Rebmann
    Christopher Sansone

    Explore the XIV concepts and architecture

    Install XIV in your environment

    Understand usage guidelines and scenarios

  • International Technical Support Organization

    IBM XIV Storage System: Concepts, Architecture, and Usage

    January 2009

    SG24-7659-00

  • Copyright International Business Machines Corporation 2009. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

    First Edition (January 2009)

    This edition applies to the XIV Storage System (2810-A14) with Version 10.0.0 of the XIV Storage System software (5639-XXA) and the XIV Storage System GUI and Extended Command Line Interface (XCLI) Version 2.2.43.

    Note: Before using this information and the product it supports, read the information in Notices on page ix.

  • Contents

    Notices . . . . . . . . . . ix
    Trademarks . . . . . . . . . . x

    Preface . . . . . . . . . . xi
    The team that wrote this book . . . . . . . . . . xi
    Become a published author . . . . . . . . . . xiv
    Comments welcome . . . . . . . . . . xiv

    Chapter 1. IBM XIV Storage System overview . . . . . . . . . . 1
      1.1 Overview . . . . . . . . . . 2
        1.1.1 System components . . . . . . . . . . 2
        1.1.2 Key design points . . . . . . . . . . 3
        1.1.3 The XIV Storage System Software . . . . . . . . . . 4

    Chapter 2. XIV logical architecture and concepts . . . . . . . . . . 7
      2.1 Architectural overview . . . . . . . . . . 8
      2.2 Massive parallelism . . . . . . . . . . 10
        2.2.1 Grid architecture over monolithic architecture . . . . . . . . . . 10
        2.2.2 Logical parallelism . . . . . . . . . . 14
      2.3 Full storage virtualization . . . . . . . . . . 14
        2.3.1 Logical system concepts . . . . . . . . . . 16
        2.3.2 System usable capacity . . . . . . . . . . 20
        2.3.3 Storage Pool concepts . . . . . . . . . . 22
        2.3.4 Capacity allocation and thin provisioning . . . . . . . . . . 24
      2.4 Reliability, availability, and serviceability . . . . . . . . . . 33
        2.4.1 Resilient architecture . . . . . . . . . . 33
        2.4.2 Rebuild and redistribution . . . . . . . . . . 37
        2.4.3 Minimized exposure . . . . . . . . . . 42

    Chapter 3. XIV physical architecture and components . . . . . . . . . . 45
      3.1 IBM XIV Storage System Model A14 . . . . . . . . . . 46
        3.1.1 Hardware characteristics . . . . . . . . . . 46
      3.2 IBM XIV hardware components . . . . . . . . . . 47
        3.2.1 The rack and the UPS modules . . . . . . . . . . 48
        3.2.2 Data and Interface Modules . . . . . . . . . . 50
        3.2.3 SATA disk drives . . . . . . . . . . 57
        3.2.4 The patch panel . . . . . . . . . . 59
        3.2.5 Interconnection and switches . . . . . . . . . . 60
        3.2.6 Hardware needed by support and IBM SSR . . . . . . . . . . 61
      3.3 Redundant hardware . . . . . . . . . . 61
        3.3.1 Power redundancy . . . . . . . . . . 62
        3.3.2 Switch/interconnect redundancy . . . . . . . . . . 62
      3.4 Hardware parallelism . . . . . . . . . . 62

    Chapter 4. Physical planning and installation . . . . . . . . . . 63
      4.1 Overview . . . . . . . . . . 64
      4.2 Ordering IBM XIV hardware . . . . . . . . . . 64
        4.2.1 Feature codes and hardware configuration . . . . . . . . . . 64
        4.2.2 2810-A14 Capacity on Demand ordering options . . . . . . . . . . 66
      4.3 Physical planning . . . . . . . . . . 67
        4.3.1 Site requirements . . . . . . . . . . 67
      4.4 Basic configuration planning . . . . . . . . . . 69
      4.5 Network connection considerations . . . . . . . . . . 70
        4.5.1 Fibre Channel connections . . . . . . . . . . 70
        4.5.2 iSCSI connections . . . . . . . . . . 73
        4.5.3 Mixed iSCSI and Fibre Channel host access . . . . . . . . . . 74
        4.5.4 Management connectivity . . . . . . . . . . 74
        4.5.5 Mobile computer ports . . . . . . . . . . 75
        4.5.6 Remote access . . . . . . . . . . 75
      4.6 Remote Copy connectivity . . . . . . . . . . 76
        4.6.1 Remote Copy links . . . . . . . . . . 76
        4.6.2 Remote Target Connectivity . . . . . . . . . . 76
      4.7 Planning for growth . . . . . . . . . . 77
        4.7.1 Future requirements . . . . . . . . . . 77
      4.8 IBM XIV installation . . . . . . . . . . 77
        4.8.1 Physical installation . . . . . . . . . . 77
        4.8.2 Basic configuration . . . . . . . . . . 78
        4.8.3 Complete the physical installation . . . . . . . . . . 78

    Chapter 5. Configuration . . . . . . . . . . 79
      5.1 IBM XIV Storage Management software . . . . . . . . . . 80
        5.1.1 XIV Storage Management software installation . . . . . . . . . . 81
      5.2 Managing the XIV Storage System . . . . . . . . . . 84
        5.2.1 Launching the Management Software GUI . . . . . . . . . . 86
        5.2.2 Log on to the system with XCLI . . . . . . . . . . 90
      5.3 Storage Pools . . . . . . . . . . 93
        5.3.1 Managing Storage Pools with XIV GUI . . . . . . . . . . 94
        5.3.2 Manage Storage Pools with XCLI . . . . . . . . . . 101
      5.4 Volumes . . . . . . . . . . 103
        5.4.1 Managing volumes with the XIV GUI . . . . . . . . . . 104
        5.4.2 Managing volumes with XCLI . . . . . . . . . . 112
      5.5 Host definition and mappings . . . . . . . . . . 113
        5.5.1 Managing hosts and mappings with XIV GUI . . . . . . . . . . 113
        5.5.2 Managing hosts and mappings with XCLI . . . . . . . . . . 121
      5.6 Scripts . . . . . . . . . . 122

    Chapter 6. Security . . . . . . . . . . 125
      6.1 Physical access security . . . . . . . . . . 126
      6.2 User access security . . . . . . . . . . 126
        6.2.1 Role Based Access Control . . . . . . . . . . 126
        6.2.2 Manage user rights with the GUI . . . . . . . . . . 128
        6.2.3 Managing users with XCLI . . . . . . . . . . 134
        6.2.4 LDAP and Active Directory . . . . . . . . . . 138
      6.3 Password management . . . . . . . . . . 139
      6.4 Managing multiple systems . . . . . . . . . . 140
      6.5 Event logging . . . . . . . . . . 141
        6.5.1 Viewing events in the XIV GUI . . . . . . . . . . 142
        6.5.2 Viewing events in the XCLI . . . . . . . . . . 143
        6.5.3 Define notification rules . . . . . . . . . . 145

    Chapter 7. Host connectivity . . . . . . . . . . 147
      7.1 Connectivity overview . . . . . . . . . . 148
        7.1.1 Module, patch panel, and host connectivity . . . . . . . . . . 149
        7.1.2 FC and iSCSI simplified access . . . . . . . . . . 150
        7.1.3 Remote Mirroring connectivity . . . . . . . . . . 151
      7.2 Fibre Channel (FC) connectivity . . . . . . . . . . 152
        7.2.1 Preparation steps . . . . . . . . . . 152
        7.2.2 FC configurations . . . . . . . . . . 153
        7.2.3 Zoning and VSAN . . . . . . . . . . 155
        7.2.4 Identification of FC ports (initiator/target) . . . . . . . . . . 156
        7.2.5 IBM XIV logical FC maximum values . . . . . . . . . . 159
      7.3 iSCSI connectivity . . . . . . . . . . 159
        7.3.1 iSCSI configurations . . . . . . . . . . 160
        7.3.2 Link aggregation . . . . . . . . . . 162
        7.3.3 IBM XIV Storage System iSCSI setup . . . . . . . . . . 162
        7.3.4 Identifying iSCSI ports . . . . . . . . . . 164
        7.3.5 Using iSCSI hardware or software initiator (recommendation) . . . . . . . . . . 167
        7.3.6 IBM XIV logical iSCSI maximum values . . . . . . . . . . 168
        7.3.7 Boot from iSCSI target . . . . . . . . . . 169
      7.4 Logical configuration for host connectivity . . . . . . . . . . 170
        7.4.1 Required generic information and preparation . . . . . . . . . . 170
        7.4.2 Prepare for a new host: XIV GUI . . . . . . . . . . 173
        7.4.3 Prepare for a new host: XCLI . . . . . . . . . . 177

    Chapter 8. OS-specific considerations for host connectivity . . . . . . . . . . 179
      8.1 Attaching Microsoft Windows host to XIV . . . . . . . . . . 180
        8.1.1 Windows host FC configuration . . . . . . . . . . 181
        8.1.2 Windows host iSCSI configuration . . . . . . . . . . 185
        8.1.3 Management volume LUN 0 . . . . . . . . . . 192
      8.2 Attaching AIX hosts to XIV . . . . . . . . . . 193
        8.2.1 AIX host FC configuration . . . . . . . . . . 194
        8.2.2 AIX host iSCSI configuration . . . . . . . . . . 199
        8.2.3 Management volume LUN 0 . . . . . . . . . . 203
      8.3 Linux . . . . . . . . . . 204
        8.3.1 Support issues that distinguish Linux from other operating systems . . . . . . . . . . 204
        8.3.2 FC and multi-pathing for Linux using PROCFS . . . . . . . . . . 204
        8.3.3 FC and multi-pathing for Linux using SYSFS . . . . . . . . . . 210
        8.3.4 Linux iSCSI configuration . . . . . . . . . . 216
      8.4 Sun Solaris . . . . . . . . . . 218
        8.4.1 FC and multi-pathing configuration for Solaris . . . . . . . . . . 218
        8.4.2 iSCSI configuration for Solaris . . . . . . . . . . 221
      8.5 VMware . . . . . . . . . . 223
        8.5.1 FC and multi-pathing for VMware ESX . . . . . . . . . . 223
        8.5.2 ESX Server iSCSI configuration . . . . . . . . . . 224

    Chapter 9. Performance characteristics . . . . . . . . . . 235
      9.1 Performance concepts . . . . . . . . . . 236
        9.1.1 Full disk resource utilization . . . . . . . . . . 236
        9.1.2 Caching considerations . . . . . . . . . . 236
        9.1.3 Data mirroring . . . . . . . . . . 237
        9.1.4 SATA drives compared to FC drives . . . . . . . . . . 238
        9.1.5 Snapshot performance . . . . . . . . . . 238
        9.1.6 Remote Mirroring performance . . . . . . . . . . 238
      9.2 Best practices . . . . . . . . . . 239
        9.2.1 Distribution of connectivity . . . . . . . . . . 239
        9.2.2 Host configuration considerations . . . . . . . . . . 239
        9.2.3 XIV sizing validation . . . . . . . . . . 240
      9.3 Performance statistics gathering with XIV . . . . . . . . . . 240
        9.3.1 Using the GUI . . . . . . . . . . 240
        9.3.2 Using the XCLI . . . . . . . . . . 246

    Chapter 10. Monitoring . . . . . . . . . . 249
      10.1 System monitoring . . . . . . . . . . 250
        10.1.1 Monitoring with the GUI . . . . . . . . . . 250
        10.1.2 Monitoring with XCLI . . . . . . . . . . 255
        10.1.3 SNMP-based monitoring . . . . . . . . . . 260
        10.1.4 XIV SNMP setup . . . . . . . . . . 262
        10.1.5 Using IBM Director . . . . . . . . . . 264
      10.2 Call Home and remote support . . . . . . . . . . 273
        10.2.1 Setting up Call Home . . . . . . . . . . 273
        10.2.2 Remote support . . . . . . . . . . 281
        10.2.3 Repair flow . . . . . . . . . . 282

    Chapter 11. Copy functions . . . . . . . . . . 285
      11.1 Snapshots . . . . . . . . . . 286
        11.1.1 Architecture of snapshots . . . . . . . . . . 286
        11.1.2 Volume snapshots . . . . . . . . . . 288
        11.1.3 Consistency Groups . . . . . . . . . . 300
        11.1.4 Snapshot with Remote Mirror . . . . . . . . . . 310
        11.1.5 Windows Server 2003 Volume Shadow Copy Service . . . . . . . . . . 311
        11.1.6 MySQL database backup . . . . . . . . . . 312
      11.2 Volume Copy . . . . . . . . . . 317
        11.2.1 Architecture . . . . . . . . . . 318
        11.2.2 Performing a Volume Copy . . . . . . . . . . 318
        11.2.3 Creating an OS image with Volume Copy . . . . . . . . . . 319

    Chapter 12. Remote Mirror . . . . . . . . . . 323
      12.1 Remote Mirror . . . . . . . . . . 324
        12.1.1 Remote Mirror overview . . . . . . . . . . 324
        12.1.2 Boundaries . . . . . . . . . . 325
        12.1.3 Initial setup . . . . . . . . . . 325
        12.1.4 Coupling . . . . . . . . . . 331
        12.1.5 Synchronization . . . . . . . . . . 332
        12.1.6 Disaster Recovery . . . . . . . . . . 333
        12.1.7 Reads and writes in a Remote Mirror . . . . . . . . . . 333
        12.1.8 Role switchover . . . . . . . . . . 333
        12.1.9 Remote Mirror step-by-step illustration . . . . . . . . . . 334
        12.1.10 Recovering from a failure . . . . . . . . . . 351

    Chapter 13. Data migration . . . . . . . . . . 353
      13.1 Overview . . . . . . . . . . 354
      13.2 Handling I/O requests . . . . . . . . . . 355
      13.3 Data migration stages . . . . . . . . . . 356
        13.3.1 Initial configuration . . . . . . . . . . 357
        13.3.2 Testing the configuration . . . . . . . . . . 359
        13.3.3 Activate . . . . . . . . . . 359
        13.3.4 Migration process . . . . . . . . . . 359
        13.3.5 Synchronization complete . . . . . . . . . . 360
        13.3.6 Delete the data migration . . . . . . . . . . 360

    Related publications . . . . . . . . . . 361
    IBM Redbooks publications . . . . . . . . . . 361
    Other publications . . . . . . . . . . 361
    Online resources . . . . . . . . . . 362
    How to get IBM Redbooks publications . . . . . . . . . . 362
    Help from IBM . . . . . . . . . . 362

    Index . . . . . . . . . . 363


  • Notices

    This information was developed for products and services offered in the U.S.A.

    IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

    IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

    The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

    This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

    Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

    IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

    Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

    This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

    COPYRIGHT LICENSE:

    This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

  • Trademarks

    IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

    The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

    AIX 5L, AIX, Alerts, BladeCenter, DB2 Universal Database, DB2, DS4000, DS6000, DS8000, i5/OS, IBM, NetView, POWER, Redbooks, Redbooks (logo), System Storage, System x, System z, Tivoli, TotalStorage, XIV

    The following terms are trademarks of other companies:

    Disk Magic, and the IntelliMagic logo are trademarks of IntelliMagic BV in the United States, other countries, or both.

    Snapshot, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries.

    SUSE, the Novell logo, and the N logo are registered trademarks of Novell, Inc. in the United States and other countries.

    Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates.

    QLogic, and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered trademark in the United States.

    VMware, the VMware "boxes" logo and design are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.

    Java, MySQL, RSM, Solaris, Sun, Sun StorEdge, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

    Active Directory, ESP, Microsoft, MS, SQL Server, Windows Server, Windows Vista, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

    Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

    UNIX is a registered trademark of The Open Group in the United States and other countries.

    Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

    Mozilla, Firefox, as well as the Firefox logo are owned exclusively by the Mozilla Foundation. All rights in the names, trademarks, and logos of the Mozilla Foundation, including without limitation

    Other company, product, or service names may be trademarks or service marks of others.

  • Preface

    This IBM Redbooks publication describes the concepts, architecture, and implementation of the IBM XIV Storage System (2810-A14).

    The XIV Storage System is designed to be a scalable enterprise storage system that is based upon a grid array of hardware components. It can attach to both Fibre Channel Protocol (FCP) and IP network Small Computer System Interface (iSCSI) capable hosts. This system is a good fit for clients who want to be able to grow capacity without managing multiple tiers of storage to maximize performance and minimize cost. The XIV Storage System is well suited for mixed or random access workloads, such as the processing of transactions, video clips, images, and e-mail; for industries such as telecommunications, media and entertainment, finance, and pharmaceutical; and for new and emerging workload areas, such as Web 2.0.

    The first chapters of this book provide details about several of the unique and powerful concepts that form the basis of the XIV Storage System logical and physical architecture. We explain how the system was designed to eliminate direct dependencies between the hardware elements and the software that governs the system.

    In subsequent chapters, we explain the planning and preparation tasks that are required to deploy the system in your environment, which is followed by a step-by-step procedure describing how to configure and administer the system. We provide illustrations about how to perform those tasks by using the intuitive, yet powerful XIV Storage Manager GUI or the Extended Command Line Interface (XCLI). We explore and illustrate the use of snapshots and Remote Copy functions.

    The book also outlines the requirements and summarizes the procedures for attaching the system to various host platforms.

    This IBM Redbooks publication is intended for those people who want an understanding of the XIV Storage System and also targets readers who need detailed advice about how to configure and use the system.

    The team that wrote this book

    This book was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

    Bertrand Dufrasne is an IBM Certified I/T Specialist and Project Leader for System Storage disk products at the International Technical Support Organization, San Jose Center. He has worked at IBM in various I/T areas. He has authored many IBM Redbooks publications and has also developed and taught technical workshops. Before joining the ITSO, he worked for IBM Global Services as an Application Architect. He holds a degree in Electrical Engineering.

    Giacomo Chiapparini is an IBM accredited Senior IT Specialist for Storage Solutions in the Field Sales Support Team Switzerland. He has over 10 years of experience with IBM in storage hardware and software support, design, deployment, and management. He has developed and taught IBM classes in all areas of storage hardware. Before joining the STG FTSS team in 2008, he was the storage services technical expert in IBM Switzerland.

  • Attila Grosz is a Field Technical Sales Specialist at the IBM Systems and Technology Group in Budapest, Hungary. He is a member of the CEMAAS STG Systems Architect team. He is responsible for System Storage presales technical support within STG. He has 10 years of experience with storage in Open Systems environments, including AIX, Linux, and Windows. Attila has worked at IBM since 1999, in various divisions. He holds a Communication-Technical Engineering degree from the University of Godollo, Hungary.

    Mark Kremkus is a Senior Accredited I/T Specialist based in Austin, Texas. He has seven years of experience providing consultative sales support for the full spectrum of IBM Storage products. His current area of focus involves creating and presenting Disk Magic studies for the full family of DS4000, DS6000, DS8000, and SAN Volume Controller (SVC) products across a broad range of open and mainframe environments. He holds a Bachelor of Science degree in Electrical Engineering from Texas A&M University, and graduated with honors as an Undergraduate Research Fellow in MRI technology.

    Lisa Martinez is a Senior Software Engineer working in the DS8000 System Test Architecture in Tucson, Arizona. She has nine years of experience in Enterprise Disk Test. She holds a Bachelor of Science degree in Electrical Engineering from the University of New Mexico and a Computer Science degree from New Mexico Highlands University. Her areas of expertise include Open Systems and IBM System Storage DS8000 including Copy Services, with recent experience in System z.

    Markus Oscheka is an IT Specialist for Proof of Concepts and Benchmarks in the Enterprise Disk High End Solution Europe team in Mainz, Germany. His areas of expertise include setup and demonstration of IBM System Storage and TotalStorage solutions in various environments, such as AIX, Linux, Windows, Hewlett-Packard UNIX (HP-UX), and Solaris. He has worked at IBM for seven years. He has performed many Proof of Concepts with Copy Services on DS6000/DS8000, as well as Performance-Benchmarks with DS4000/DS6000/DS8000. He has written extensively in various IBM Redbooks publications and acts as the co-project lead for these IBM Redbooks publications, including DS6000/DS8000 Architecture and Implementation and DS6000/DS8000 Copy Services. He holds a degree in Electrical Engineering from the Technical University in Darmstadt.

    Guenter Rebmann is an IBM Certified Specialist for High End Disk Solutions, working for the EMEA DASD Hardware Support Center in Mainz, Germany. Guenter has more than 20 years of experience in large system environments and storage hardware. Currently, he provides support for the EMEA Regional FrontEnd Support Centers with High End Disk Subsystems, such as the ESS, DS8000, and previous High End Disk products. Since 2004, he has been a member of the Virtual EMEA Team (VET) Support Team.

    Christopher Sansone is a performance analyst located in IBM Tucson. He currently works with DS8000, DS6000, and XIV storage products to generate marketing material and assist with performance issues related to these products. Prior to working in performance, he worked in several development organizations writing C code for Fibre Channel storage devices, including DS8000, ESS 800, TS7740, and Virtual Tape Server (VTS). Christopher holds a Masters degree in Electrical Engineering from NTU and a Bachelors degree in Computer Engineering from Virginia Tech.

  • Figure 1 The team: Lisa, Attila (back), Christopher, Bert, Markus, Guenter, Mark, and Giacomo

    Special thanks to:

    John Bynum
    Worldwide Technical Support Management
    IBM US, San Jose

    For their technical advice and support, many thanks to:

    Rami Elron
    Aviad Offer

    Thanks to the following people for their contributions to this project:

    Barbara Reed
    Darlene Ross
    Helen Burton
    Juan Yanes
    John Cherbini
    Richard Heffel
    Jim Sedgwick
    Brian Sherman
    Dan Braden
    Rosemary McCutchen
    Kip Wagner
    Maxim Kooser
    Izhar Sharon
    Melvin Farris
    Dietmar Dausner

  • Alexander Warmuth
    Ralf Wohlfarth
    Wenzel Kalabza
    Theeraphong Thitayanun

    Become a published author

    Join us for a two- to six-week residency program. Help write a book dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You will have the opportunity to team with IBM technical professionals, IBM Business Partners, and Clients.

    Your efforts will help increase product acceptance and client satisfaction. As a bonus, you will develop a network of contacts in IBM development labs, and increase your productivity and marketability.

    Find out more about the residency program, browse the residency index, and apply online at:

    ibm.com/redbooks/residencies.html

    Comments welcome

    Your comments are important to us.

    We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

    - Use the online Contact us review IBM Redbooks publications form found at:

      ibm.com/redbooks

    - Send your comments in an e-mail to:

      [email protected]

    - Mail your comments to:

      IBM Corporation, International Technical Support Organization
      Dept. HYTD Mail Station P099
      2455 South Road
      Poughkeepsie, NY 12601-5400

  • Chapter 1. IBM XIV Storage System overview

    IBM XIV Storage System is an innovative, fully scalable enterprise storage system that is based on a grid of standard, off-the-shelf hardware components.

    This chapter provides a high level overview of the IBM XIV Storage System.


  • 1.1 Overview

    The XIV Storage System architecture is designed to deliver performance, scalability, and ease of management while harnessing the high capacity and cost benefits of Serial Advanced Technology Attachment (SATA) drives. The system employs off-the-shelf components, in contrast to traditional offerings, which rely on more expensive, proprietary designs.

    1.1.1 System components

    The XIV Storage System comprises the following components, which are visible in Figure 1-1:

    - Host Interface Modules: six modules, each containing 12 SATA disk drives
    - Data Modules: nine modules, each containing 12 SATA disk drives
    - An Uninterruptible Power Supply (UPS) module complex made up of three redundant UPS units
    - Two Ethernet switches and an Ethernet Switch Redundant Power Supply (RPS)
    - A Maintenance Module
    - An Automatic Transfer Switch (ATS) for external power supply redundancy
    - A modem, connected to the Maintenance Module for externally servicing the system

    Figure 1-1 IBM XIV Storage System 2810-A14 components: front and rear view

  • All of the modules in the system are linked through an internal redundant Gigabit Ethernet network, which enables maximum bandwidth utilization and is resilient to at least any single component failure.

    The system and all of its components come pre-assembled and wired in a lockable rack.

    1.1.2 Key design points

    Next, we describe the key aspects of the XIV Storage System architecture.

    Massive parallelism

    The system architecture ensures full exploitation of all system components. Any I/O activity involving a specific logical volume in the system is always inherently handled by all spindles. The system harnesses all storage capacity and all internal bandwidth, and it takes advantage of all available processing power, which is as true for host-initiated I/O activity as it is for system-initiated activity, such as rebuild processes and snapshot generation. All disks, CPUs, switches, and other components of the system contribute at all times.

    Workload balancing

    The workload is evenly distributed over all hardware components at all times. All disks and modules are utilized equally, regardless of access patterns. Even though applications might access certain volumes, or certain parts of a volume, more frequently than others, the load on the disks and modules remains balanced.

    Pseudo-random distribution ensures consistent load-balancing even after adding, deleting, or resizing volumes, as well as adding or removing hardware. This balancing of all data on all system components eliminates the possibility of a hot-spot being created.
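
    To make the pseudo-random distribution concrete, here is a short Python sketch. It is our own illustration rather than XIV code, and the disk count, the 1 MB partition size, and the hash-based placement function are assumptions. Hashing each partition of a volume onto the set of disks deterministically spreads the data almost perfectly evenly, which is why no individual disk can turn into a hot spot regardless of host access patterns:

        import hashlib
        from collections import Counter

        DISKS = [f"disk-{i:03d}" for i in range(180)]   # assumed: 15 modules x 12 SATA drives
        PARTITION_MB = 1                                # assumed distribution granularity

        def place(volume: str, partition_index: int) -> str:
            """Deterministically map one partition of a volume to a disk."""
            key = f"{volume}:{partition_index}".encode()
            digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
            return DISKS[digest % len(DISKS)]

        # Distribute a 100 GB volume (102 400 partitions of 1 MB) and inspect the balance.
        placements = Counter(place("vol01", p) for p in range(100 * 1024))
        print("busiest disk:", max(placements.values()), "partitions")
        print("idlest disk :", min(placements.values()), "partitions")

    The real system additionally keeps a second copy of every partition on a different module, but the balancing effect shown here is the essential point.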

    Self-healing

    Protection against double disk failure is provided by an efficient rebuild process that brings the system back to full redundancy in minutes. In addition, the XIV Storage System extends the self-healing concept, resuming redundancy even after failures in components other than disks.
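
    To picture why such a rebuild is fast, the following Python sketch is a hedged illustration (the disk count, the random layout, and the helper names are our assumptions, not XIV internals). Because the two copies of each partition sit on disks scattered throughout the system, re-creating the copies lost with one failed disk becomes a many-to-many copy job that nearly every surviving disk shares, instead of a serial copy onto a single spare:

        import random

        random.seed(0)
        DISKS = [f"disk-{i:03d}" for i in range(180)]

        def mirrored_layout(partitions: int) -> dict:
            """Place each partition's two copies on two different, randomly chosen disks."""
            return {p: tuple(random.sample(DISKS, 2)) for p in range(partitions)}

        def rebuild_plan(layout: dict, failed: str) -> list:
            """For every partition that lost a copy, pick a new disk to hold the re-created copy."""
            plan = []
            for part, (primary, secondary) in layout.items():
                if failed in (primary, secondary):
                    survivor = secondary if primary == failed else primary
                    target = random.choice([d for d in DISKS if d not in (failed, survivor)])
                    plan.append((part, survivor, target))   # copy survivor -> target
            return plan

        layout = mirrored_layout(100_000)
        plan = rebuild_plan(layout, failed="disk-000")
        sources = {src for _, src, _ in plan}
        targets = {dst for _, _, dst in plan}
        print(f"{len(plan)} partitions to re-copy, read from {len(sources)} disks, "
              f"written to {len(targets)} disks in parallel")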

    True virtualization

    Unlike other system architectures, storage virtualization is inherent to the basic principles of the XIV Storage System design. Physical drives and their locations are completely hidden from the user, which dramatically simplifies storage configuration, letting the system lay out the user's volume in the optimal way. The automatic layout maximizes the system's performance by leveraging system resources for each volume, regardless of the user's access patterns.

    Thin provisioning

    The system enables thin provisioning, which is the capability to allocate storage to applications on a just-in-time, as-needed basis, allowing significant cost savings compared to traditional provisioning techniques. The savings are achieved by defining a logical capacity that is larger than the physical capacity. This capability allows users to improve storage utilization rates, thereby significantly reducing capital and operational expenses, because capacity is allocated based on total space consumed rather than total space allocated.
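
    The accounting behind thin provisioning can be illustrated with the minimal Python sketch below; the class, the figures, and the 80% warning threshold are our own assumptions, not product behavior. The point is simply that the logical capacity promised to hosts is tracked separately from the physical capacity consumed by data that was actually written:

        class ThinPool:
            """Minimal thin-provisioning accounting sketch (illustrative only)."""

            def __init__(self, physical_gb: int):
                self.physical_gb = physical_gb   # installed capacity
                self.logical_gb = 0              # sum of volume sizes presented to hosts
                self.consumed_gb = 0             # space actually written by hosts

            def create_volume(self, size_gb: int) -> None:
                # Logical size may exceed physical capacity: that is the point of thin provisioning.
                self.logical_gb += size_gb

            def write(self, new_data_gb: int) -> None:
                # Physical capacity is consumed only when data is actually written.
                self.consumed_gb += new_data_gb
                if self.consumed_gb > 0.8 * self.physical_gb:
                    print("warning: plan to add physical capacity")

        pool = ThinPool(physical_gb=10_000)
        pool.create_volume(8_000)
        pool.create_volume(8_000)    # 16 TB logical on 10 TB physical is acceptable
        pool.write(3_000)            # only written data consumes physical space
        print(pool.logical_gb, "GB logical,", pool.consumed_gb, "GB consumed")

    Physical capacity only has to grow when the consumed space approaches the installed capacity, which is where the cost savings described above come from.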

  • We discuss these key design points and underlying architectural concepts in detail in Chapter 2, XIV logical architecture and concepts on page 7.

    1.1.3 The XIV Storage System Software

    The IBM XIV software provides the functions of the system, which include:

    Support for 16 000 snapshots utilizing advanced writable snapshot technology

    The snapshot capabilities within the XIV Storage System Software utilize a metadata, redirect-on-write design that allows snapshots to occur in a subsecond time frame with little performance overhead. Up to 16 000 full or differential copies can be taken. Any of the snapshots can be made writable, and then snapshots can be taken of the newly writable snapshots. Volumes can even be restored from these writable snapshots.
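
    The redirect-on-write idea can be pictured with the toy Python model below; it is our own simplification, because the real implementation operates on distributed partition metadata inside the modules. Taking a snapshot copies only a pointer map, which is why it completes in a subsecond time frame, and a later host write is directed to a new block while the snapshot continues to reference the old one:

        class RedirectOnWriteVolume:
            """Toy redirect-on-write model: blocks are immutable, volumes are pointer maps."""

            def __init__(self):
                self.store = {}        # physical block id -> data
                self.map = {}          # logical block number -> physical block id
                self.snapshots = []    # each snapshot is a frozen copy of the pointer map
                self._next_id = 0

            def _new_block(self, data: str) -> int:
                self.store[self._next_id] = data
                self._next_id += 1
                return self._next_id - 1

            def write(self, lbn: int, data: str) -> None:
                # Redirect-on-write: never overwrite in place, just repoint the volume map.
                self.map[lbn] = self._new_block(data)

            def snapshot(self) -> dict:
                snap = dict(self.map)              # metadata copy only: near-instant
                self.snapshots.append(snap)
                return snap

            def read(self, mapping: dict, lbn: int) -> str:
                return self.store[mapping[lbn]]

        vol = RedirectOnWriteVolume()
        vol.write(0, "v1")
        snap = vol.snapshot()
        vol.write(0, "v2")                                # new data lands in a new block
        print(vol.read(vol.map, 0), vol.read(snap, 0))    # -> v2 v1

    A writable snapshot is simply another pointer map that is allowed to change, which is why snapshots of snapshots and restores from snapshots fall out of the same mechanism.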

    Support for synchronous mirroring to another XIV 2810-A14 device

    Synchronous mirroring can be performed over Fibre Channel (FC) or IP network Small Computer System Interface (iSCSI) connections and is protocol agnostic (iSCSI volumes can be mirrored over FC, and the reverse is also true). It is also possible to test the secondary mirror site without stopping the mirroring.
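
    As a hedged sketch of what synchronous means in this context (the function and class names are ours, not XIV internals), the primary system acknowledges a host write only after the secondary system has also stored it, so the two copies cannot silently diverge:

        def send_to_secondary(store: dict, lbn: int, data: bytes) -> None:
            """Stands in for a round trip over the FC or iSCSI mirror link."""
            store[lbn] = data

        class MirroredVolume:
            """Illustrative synchronous mirror: the host acknowledgment waits for both copies."""

            def __init__(self):
                self.primary = {}
                self.secondary = {}

            def write(self, lbn: int, data: bytes) -> str:
                self.primary[lbn] = data
                send_to_secondary(self.secondary, lbn, data)   # blocks until the remote copy is stored
                return "ack"                                   # only now does the host see the ack

        vol = MirroredVolume()
        print(vol.write(7, b"payload"), vol.primary == vol.secondary)   # -> ack True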

    Support of thin provisioning

    Thin provisioning allows storage administrators to define logical volume sizes that are larger than the physical capacity of the system. Unlike other approaches, the physical capacity only needs to be larger than the actual written data, not larger than the logical volumes. Physical capacity needs to be increased only when actual written data increases.

    Support for in-band data migration of heterogeneous storage

    The XIV Storage System Software is also capable of acting as a host, gaining access to volumes on an existing system. The machine is then configured as a proxy to answer requests between the current hosts and the current storage while migrating all existing data in the background. In addition, XIV supports thick-to-thin data migration, which allows the XIV Storage System to reclaim any allocated space that is not occupied by actual data. In other words, it automatically shrinks volumes upon migrating data from a non-XIV system, offering great power and space savings.
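
    The thick-to-thin behavior can be pictured with the short sketch below; it is illustrative only, and the block size and the all-zero test are our assumptions. While the source volume is copied in the background, blocks that were allocated on the old system but never written do not consume space on the target:

        def migrate_thick_to_thin(read_source_block, block_count, write_target_block):
            """Copy a source volume block by block, skipping all-zero (never written) blocks."""
            skipped = 0
            for lbn in range(block_count):
                data = read_source_block(lbn)
                if data.count(0) == len(data):      # assumed zero-detection heuristic
                    skipped += 1                    # nothing to write: the thin target stays thin
                    continue
                write_target_block(lbn, data)
            return skipped

        # Tiny usage example with an in-memory "source" of 8 blocks, half of them empty.
        source = [bytes(512) if i % 2 else bytes([i + 1]) * 512 for i in range(8)]
        target = {}
        empty = migrate_thick_to_thin(lambda i: source[i], len(source), target.__setitem__)
        print(f"{empty} empty blocks skipped, {len(target)} blocks copied")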

    Robust user auditing with access control lists

    The XIV Storage System Software offers the capability for robust user auditing with access control lists (ACLs) in order to provide more control and historical information.

    IBM XIV Storage Manager GUI

    IBM XIV Storage Manager acts as the management console for the XIV Storage System. A simple and intuitive GUI enables storage administrators to manage and monitor all system aspects easily, with almost no learning curve.

  • Figure 1-2 The IBM XIV Storage Manager GUI

    IBM XIV Storage XCLI

    The XIV Storage System also offers a comprehensive set of Extended Command Line Interface (XCLI) commands to configure and monitor the system. The XCLI can be used in a shell environment to interactively configure the system or as part of a script to perform lengthy and complex tasks.


  • Chapter 2. XIV logical architecture and concepts

    This chapter elaborates on several of the XIV underlying design and architectural concepts that were introduced in the executive overview chapter.

    The topics described in this chapter include:

    - Architectural elements
    - Parallelism
    - Virtualization
    - Data distribution
    - Rebuild redundancy
    - Self-healing and resiliency
    - Caching
    - Thin provisioning


  • 2.1 Architectural overview

    The XIV Storage System architecture incorporates a variety of features designed to uniformly distribute data across key internal resources. This unique data distribution method fundamentally differentiates the XIV Storage System from conventional storage subsystems, delivering numerous availability, performance, and management benefits across both physical and logical elements of the system.

    Hardware elements

    In order to convey the conceptual principles that comprise the XIV Storage System architecture, it is useful to first provide a glimpse of the physical infrastructure.

    The primary components of the XIV Storage System are known as modules. Modules provide processing, cache, and host interfaces and are based on standard Intel and Linux systems. They are redundantly connected to one another through an internal switched Ethernet fabric. All of the modules work together concurrently as elements of a grid architecture, and therefore, the system harnesses the powerful parallelism inherent to a distributed computing environment, as shown in Figure 2-1. We discuss the grid architecture in 2.2, Massive parallelism on page 10.

    Figure 2-1 IBM XIV Storage System major hardware elements

    Data Modules

    At a conceptual level, the Data Modules function as the elementary building blocks of the system, providing physical capacity, processing power, and caching, in addition to advanced system-managed services that comprise the system's internal operating environment. The equivalence of hardware across Data Modules and the Data Modules' ability to share and manage system software and services are key elements of the physical architecture, as depicted in Figure 2-2 on page 9.


    Although externally similar in appearance, Data and Interface/Data Modules differ in functions, interfaces, and in how they are interconnected.

  • Interface Modules

    Fundamentally, Interface Modules are equivalent to Data Modules in all aspects, with the following exceptions:

    - In addition to disk, cache, and processing resources, Interface Modules are designed to include both Fibre Channel and iSCSI interfaces for host system connectivity as well as Remote Mirroring. Figure 2-2 conceptually illustrates the placement of Interface Modules within the topology of the IBM XIV Storage System architecture.

    - The system services and software functionality associated with managing external I/O reside exclusively on the Interface Modules.

    Ethernet switches

    The XIV Storage System contains a redundant switched Ethernet fabric that conducts both data and metadata traffic between the modules. Traffic can flow in the following ways:

    - Between two Interface Modules
    - Between an Interface Module and a Data Module
    - Between two Data Modules

    Figure 2-2 Architectural overview

    Note: It is important to realize that Data Modules and Interface Modules are not connected to the Ethernet switches in the same way. For further details about the hardware components, refer to Chapter 3, XIV physical architecture and components on page 45.

    Note: Figure 2-2 depicts the conceptual architecture of the system only; do not interpret details such as the number of connections as a precise hardware layout.

    Interface and Data Modules are connected to each other through an internal IP switched network.

  • 2.2 Massive parallelism

    The concept of parallelism pervades all aspects of the XIV Storage System architecture by means of a balanced, redundant data distribution scheme in conjunction with a pool of distributed (or grid) computing resources. In order to explain the principle of parallelism further, it is helpful to consider the ramifications of both the hardware and software implementations independently. We subsequently examine virtualization principles in 2.3, Full storage virtualization on page 14.

    2.2.1 Grid architecture over monolithic architecture

    Conceptually, while both monolithic and grid architectures are capable of accomplishing identical tasks, the topologies are fundamentally divergent by definition.

    For purposes of this discussion, first consider the concept of a computing resource, which we define as an independent entity containing all components necessary to process, store, transmit, and receive data:

    - Monolithic architectures are characterized by a single powerful, integrated, customized, and generally costly computing resource. Monolithic architectures are slower to adopt new or external technologies, and while they can demonstrate scalability to a certain extent, they require careful matching of system resources, which cannot easily be attained across the system just by adding more of the existing building blocks. Note that a clustered topology, while not truly a single system, is still considered a monolithic architecture. Unlike in a grid architecture, a given work task remains undivided from the perspective of the computing resource as a whole.

    - Grid architectures utilize a number of horizontally distributed, relatively expendable, identical (or very similar) computing resources tied together through sophisticated operational algorithms. A given work task is strategically split and performed in parallel across many computing resources before being consolidated into a cohesive work product. The power of the system is increased by adding more modules (that execute jobs of a similar type) so that the overall workload is distributed across more modules, as sketched below.
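
    To illustrate that contrast in a few lines of Python (a conceptual sketch of the grid idea, not XIV internals), a grid-style design splits one request into pieces that identical workers process in parallel and then consolidates the results, so adding another worker adds throughput directly:

        from concurrent.futures import ThreadPoolExecutor

        def handle_piece(chunk: bytes) -> int:
            """Stand-in for the work one module performs on its share of a request."""
            return sum(chunk)           # placeholder computation

        def grid_process(request: bytes, modules: int) -> int:
            # Split the request across identical modules, work in parallel, then consolidate.
            step = max(1, len(request) // modules)
            chunks = [request[i:i + step] for i in range(0, len(request), step)]
            with ThreadPoolExecutor(max_workers=modules) as pool:
                return sum(pool.map(handle_piece, chunks))

        print(grid_process(bytes(range(256)) * 64, modules=6))

    In a monolithic design the same request would be handled by the single central resource, so growth comes only from making that resource bigger.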

  • Monolithic subsystems

    Conventional storage subsystems utilize proprietary, custom-designed hardware components (rather than generally available hardware components) and interconnects that are specifically engineered to be integrated together to achieve target design performance and reliability objectives. The complex, high-performance architectural aspects of redundant monolithic systems generally result in one or more of the following characteristics:

    - Openness: Components that need to be replaced due to a failure or a hardware upgrade are generally manufacturer-specific due to the custom design inherent to the system. The system cannot easily leverage newer hardware designs or components introduced to the market.

    - Performance: Even in an N+1 clustered system, the loss of a clustered component not only might have a significant impact on the way that the system functions, but might also impact the performance experienced by hosts and host applications.

    - Upgradability and scalability:

      The ability to upgrade the system by scaling up resources:

      - Though the system might remain operational, the process of upgrading system resources has the potential to impact performance and availability for the duration of the upgrade procedure.

      - Upgrades generally require careful and potentially time-consuming planning and administration, and might even require a degree of outage under certain circumstances.

      - Although a specific layer of the vertically integrated monolithic storage subsystem hierarchy can be enhanced during an upgrade, it is possible that:

        - It will result in an imbalance by skewing the ratio of resources, such as cache, processors, disks, and buses, thus precluding the full benefit of the upgrade by allowing certain resources, or portions thereof, to go unused.

        - Architectural limitations of the monolithic system might prevent a necessary complementary resource from scaling. For example, a disk subsystem might accommodate an upgrade to the number of drives, but not the processors, resulting in a limitation of the performance potential of the overall system.

      Generally, monolithic systems cannot be scaled out by adding computing resources:

      - The major disadvantage of monolithic architectures is their proprietary nature, which impedes the adoption of new technologies, even partially. Monolithic architectures are harder to extend through external products or technologies, even though they typically contain all of the necessary ingredients for functioning.

      - At a certain point, it is necessary to simply migrate data to a newer subsystem, because the upgradeability of the current system has been exhausted, resulting in:

        - The need for a large initial acquisition or hardware refresh.
        - The necessity of potentially time-consuming data migration planning and administration.

    Figure 2-3 illustrates the concept of a monolithic storage subsystem architecture.

Figure 2-3 Monolithic storage subsystem architecture (building blocks: disks, cache, controllers, interfaces, and interconnects)

IBM XIV Storage System grid overview

The XIV grid design entails the following characteristics:

- Both Interface Modules and Data Modules work together in a distributed computing sense. In other words, although Interface Modules have additional interface ports and assume certain unique functions, they contribute to system operations equally with the Data Modules.
- The modules communicate with each other through the internal, redundant Ethernet network.
- The software services and distributed computing algorithms running within the modules collectively manage all aspects of the operating environment.

Design principles

The XIV Storage System grid architecture, by virtue of its distributed topology and standard Intel and Linux building-block components, ensures that the following design principles are possible:

- Performance:
  - The relative effect of the loss of a given computing resource, or module, is minimized.
  - All modules are able to participate equally in handling the total workload. This design principle is true regardless of access patterns: the system architecture enables excellent load balancing, even if certain applications access certain volumes, or certain parts within a volume, more frequently.
- Openness: Modules consist of standard, ready-to-use components. Because components are not specifically engineered for the subsystem, the resources and time required for the development of newer hardware technologies are minimized. This benefit, coupled with the efficient integration of computing resources into the grid architecture, enables the subsystem to realize the rapid adoption of the newest hardware technologies available without the need to deploy a whole new subsystem.
- Upgradability and scalability: Computing resources can be dynamically changed:
  - Scaled out, by either adding new modules to accommodate both new capacity and new performance demands, or by tying together groups of modules
  - Scaled up, by upgrading modules

    Figure 2-4 on page 13 depicts a conceptual view of the XIV Storage System grid architecture and its design principles.

Important: While the grid architecture of the XIV Storage System enables the potential for great flexibility, the currently supported hardware configuration contains a fixed number of modules.

Figure 2-4 IBM XIV Storage System scalable conceptual grid architecture (design principles: massive parallelism, granular distribution, coupled disk, RAM, and CPU, off-the-shelf components, and user simplicity)

Proportional scalability

Within the XIV Storage System, each module is a discrete computing (and capacity) resource containing all of the pertinent hardware elements that are necessary for a grid topology (processing, caching, and storage). All modules are connected through a scalable network. This aspect of the grid infrastructure enables the relative proportions of cache, processors, disk, and interconnect bandwidth to remain optimal even when modules are added or removed:

- Linear cache growth: The total system cache size and cache bandwidth increase linearly with disk capacity, because every module is a self-contained computing resource that houses its own cache. Note that the cache bandwidth scales linearly in terms of both host-to-cache and cache-to-disk throughput, and the close proximity of cache, processor, and disk is maintained.
- Proportional interface growth: Interface Modules house iSCSI and Fibre Channel host interfaces and are able to access not only the local resources within the module, but the entire system. Adding modules to the system proportionally scales both the number of host interfaces and the bandwidth to the internal resources.
- Constant switching capacity: The internal switching capacity is designed to scale proportionally as the system grows, preventing bottlenecks regardless of the number of modules. This capability ensures that internal throughput scales proportionally to capacity.
- Embedded processing power: Because each module incorporates its own processing power in conjunction with cache and disk components, the ability of the system to perform processor-intensive tasks, such as aggressive prefetch caching, sophisticated cache updates, snapshot management, and data distribution, is always maintained regardless of the system capacity.

    Important: Figure 2-4 is a conceptual depiction of the XIV Storage System grid architecture, and therefore, is not intended to accurately represent numbers of modules, module hardware, switches, and so on.


2.2.2 Logical parallelism

In addition to the hardware parallelism, the XIV Storage System also employs sophisticated and patented data distribution algorithms to achieve optimal parallelism.

Pseudo-random algorithm

The spreading of data occurs in a pseudo-random fashion. While a discussion of random algorithms is beyond the scope of this book, the term pseudo-random is intended to describe the uniform but random spreading of data across all available disk hardware resources while maintaining redundancy. Figure 2-5 on page 17 provides a conceptual representation of the pseudo-random distribution of data within the XIV Storage System.

    For more details about the topic of data distribution and storage virtualization, refer to 2.3.1, Logical system concepts on page 16.

Modular software design

The XIV Storage System internal operating environment stems from a set of software functions that are loosely coupled with the hardware modules. Various aspects of managing the system are associated with instances of specific software functions that can reside on one or more modules and can be redistributed among modules as required, thus ensuring resiliency under changing hardware conditions.
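As a purely illustrative sketch of this loose coupling (the service names, module numbering, and round-robin policy below are assumptions made for illustration, not the actual XIV software functions or placement logic), the redistribution of software services can be pictured as a simple metadata reassignment:

```python
# Illustrative only: a registry of software services that can be
# redistributed across modules when hardware conditions change.
def assign_services(services, modules):
    """Spread service instances round-robin across the available modules."""
    return {svc: modules[i % len(modules)] for i, svc in enumerate(services)}

services = ["cache_management", "snapshot_management", "goal_distribution"]
modules = [1, 2, 3, 4]
placement = assign_services(services, modules)

# Module 2 is phased out: the affected service instances are simply
# reassigned to the remaining modules, preserving resiliency.
modules.remove(2)
placement = assign_services(services, modules)
print(placement)
```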

2.3 Full storage virtualization

While the concept of virtualized storage is far from new, the data distribution algorithms employed by the XIV Storage System are novel in that they are deeply integrated into the subsystem architecture itself, instead of at the host or storage area network level. In the latter case, storage virtualization occurs across separate storage subsystems through an additional layer of virtualization between hosts and storage. The XIV Storage System is unique in that it is based on an innovative implementation of full storage virtualization within the subsystem itself. This approach permits logical structures within the subsystem to change dynamically and transparently while maintaining excellent load balancing and resource utilization at the lowest level of hardware granularity.

    In order to fully appreciate the value inherent to the virtualization design that is used by the XIV Storage System, it is helpful to remember several aspects of the physical and logical relationships that comprise conventional storage subsystems. Specifically, traditional subsystems rely on storage administrators to carefully plan the relationship between logical structures, such as Storage Pools and volumes, and physical resources, such as drives and arrays, in order to strategically balance workloads, meet capacity demands, eliminate hot-spots, and provide adequate performance. Thus, whenever there is a new demand for capacity or performance, or a change to existing resources is required, the storage administrator is responsible for replanning and rebalancing resources, which can be an error-prone and time-consuming process.

Note: The XIV Storage System exploits mass parallelism at both the hardware and software levels.

IBM XIV Storage System virtualization benefits

The implementation of full storage virtualization employed by the XIV Storage System eliminates many of the potential logistical and operational drawbacks that can be present with conventional storage subsystems, while maximizing the overall potential of the subsystem.

The XIV Storage System virtualization offers the following benefits:

- Easier volume management:
  - Logical volume placement is driven by the distribution algorithms, freeing the storage administrator from planning and maintaining volume layout. The data distribution algorithms manage all of the data in the system collectively, without deference to specific logical volume definitions.
  - Any interaction with a specific logical volume, whether host or subsystem driven, is inherently handled by all resources; it harnesses all storage capacity, all internal bandwidth, and all processing power currently available in the subsystem.
  - Logical volumes are not exclusively associated with a subset of physical resources, nor is there a permanent, static relationship between logical volumes and specific physical resources:
    - Logical volumes can be dynamically resized.
    - Logical volumes can be thinly provisioned, as discussed in 2.3.4, Capacity allocation and thin provisioning on page 24.
- Consistent performance and scalability:
  - Hardware resources are always utilized equitably, because all logical volumes always span all physical resources and are therefore able to reap the performance potential of the full subsystem:
    - Virtualization algorithms automatically redistribute the logical volumes' data and workload when new hardware is added, thereby maintaining the system balance while preserving transparency to the attached hosts.
    - Conversely, equilibrium and transparency are maintained during the phase-out of old or defective hardware resources.
  - There are no pockets of capacity, orphaned spaces, or resources that are inaccessible due to array mapping constraints or data placement.
- Maximized availability and data integrity: The full virtualization scheme enables the IBM XIV Storage Subsystem to manage and maintain data redundancy as hardware changes:
  - In the event of a hardware failure or when hardware is phased out, data is automatically, efficiently, and rapidly rebuilt across all of the drives and modules in the system, thereby preserving host transparency, equilibrium, and data redundancy at all times while virtually eliminating any performance penalty associated with conventional RAID rebuild activities.
  - When new hardware is added to the system, data is transparently redistributed across all resources to restore equilibrium to the system.
- Flexible snapshots: Full storage virtualization incorporates snapshots that are differential in nature; only updated data consumes physical capacity:
  - Many concurrent snapshots are possible (up to 16 000 volumes and snapshots can be defined), because a snapshot uses physical space only after a change has occurred on the source.
  - Multiple snapshots of a single master volume can exist independently of each other. Snapshots can be cascaded, in effect creating snapshots of snapshots.
  - Snapshot creation and deletion do not require data to be copied and hence occur immediately.
  - While updates occur to master volumes, the system's virtualized logical structure enables it to elegantly and efficiently preserve the original point-in-time data associated with any and all dependent snapshots by simply redirecting the update to a new physical location on disk. This process, which is referred to as redirect on write, occurs transparently from the host perspective by virtue of the virtualized remapping of the updated data and minimizes any performance impact associated with preserving snapshots, regardless of the number of snapshots defined for a given master volume.
  - Because the process uses redirect on write and does not necessitate data movement, the size of a snapshot is independent of the source volume size.
- Data migration efficiency:
  - XIV supports thin provisioning. When migrating from a system that only supports regular (or thick) provisioning, XIV allows thick-to-thin provisioning of capacity. Thin-provisioned capacity is discussed in 2.3.4, Capacity allocation and thin provisioning on page 24.
  - Due to the XIV pseudo-random and uniform distribution of data, the performance impact of data migration on production activity is minimized, because the load is spread evenly over all resources.

2.3.1 Logical system concepts

In this section, we elaborate on the logical system concepts, which form the basis for the system's full storage virtualization.

Logical constructs

The XIV Storage System logical architecture incorporates constructs that underlie the storage virtualization and distribution of data, which are integral to its design. The logical structure of the subsystem ensures that there is optimum granularity in the mapping of logical elements to both modules and individual physical disks, thereby guaranteeing an ideal distribution of data across all physical resources.

Partitions

The fundamental building block of logical volumes is known as a partition. Partitions have the following characteristics:

- All partitions are 1 MB (1024 KB) in size.
- A partition contains either a primary copy or a secondary copy of data.
- Each partition is mapped to a single physical disk:
  - This mapping is dynamically managed by the system through a proprietary pseudo-random distribution algorithm in order to preserve data redundancy and equilibrium. For more information about the topic of data distribution, refer to Logical volume layout on physical disks on page 19.
  - The storage administrator has no control over, or knowledge of, the specific mapping of partitions to drives.
- Secondary partitions are always placed onto a physical disk that does not contain the primary partition. In addition, secondary partitions are also placed in a module that does not contain the corresponding primary partition.

Note: The XIV snapshot process uses redirect on write, which is more efficient than the copy on write that is used by other storage subsystems.

Figure 2-5 Pseudo-random data distribution¹

¹ Copyright 2005-2008 Mozilla. All Rights Reserved. All rights in the names, trademarks, and logos of the Mozilla Foundation, including without limitation, Mozilla, Firefox, as well as the Firefox logo, are owned exclusively by the Mozilla Foundation.

Important: In the context of the XIV Storage System logical architecture, a partition consists of 1 MB (1024 KB) of data. Do not confuse this definition with other definitions of the term partition.

The diagram illustrates that data is uniformly and randomly distributed over all disks. Each 1 MB of data is duplicated in a primary and a secondary partition. For the same data, the system ensures that the primary partition and its corresponding secondary partition are not located on the same disk and are not within the same module.
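The following Python sketch illustrates the placement rule shown in the figure: each 1 MB partition receives a primary and a secondary copy, chosen pseudo-randomly but never within the same module (and therefore never on the same disk). It assumes 15 modules of 12 disks each, which gives the 180-disk total mentioned in this chapter; everything else, including the simple seeded random generator, is an assumption for illustration and not the actual, proprietary distribution algorithm.

```python
import random

MODULES = 15
DISKS_PER_MODULE = 12   # 15 x 12 = 180 disks in total (assumed layout)

def place_partition(partition_id, seed=0):
    """Pseudo-randomly choose primary and secondary locations for one
    1 MB partition, keeping the two copies in different modules."""
    rng = random.Random((seed << 32) | partition_id)
    primary_module = rng.randrange(MODULES)
    primary_disk = rng.randrange(DISKS_PER_MODULE)
    # Secondary copy: any module except the module holding the primary.
    secondary_module = rng.choice(
        [m for m in range(MODULES) if m != primary_module])
    secondary_disk = rng.randrange(DISKS_PER_MODULE)
    return (primary_module, primary_disk), (secondary_module, secondary_disk)

# Example: distribute the first 1 GB (1024 partitions) of a volume and
# check that no primary and secondary copy ever share a module.
layout = {p: place_partition(p) for p in range(1024)}
assert all(primary[0] != secondary[0] for primary, secondary in layout.values())
```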

Logical volumes

The XIV Storage System presents logical volumes to hosts in the same manner as conventional subsystems; however, both the granularity of logical volumes and the mapping of logical volumes to physical disks fundamentally differ:

- As discussed previously, every logical volume is comprised of 1 MB (1024 KB) pieces of data known as partitions.
- The physical capacity associated with a logical volume is always a multiple of 17 GB (decimal). Therefore, while it is possible to present a block-designated (refer to Creating volumes on page 107) logical volume to a host that is not a multiple of 17 GB, the maximum physical space that is allocated for the volume will always be the sum of the minimum number of 17 GB increments needed to meet the block-designated capacity (see the sketch following this list). Note that the initial physical capacity actually allocated by the system upon volume creation can be less than this amount, as discussed in Hard and soft volume sizes on page 25.
- The maximum number of volumes that can be concurrently defined on the system is limited by:
  - The logical address space limit: The logical address range of the system permits up to 16 377 volumes, although this constraint is purely logical and, therefore, is not normally a practical consideration. Note that the same address space is used for both volumes and snapshots.
  - The limit imposed by the logical and physical topology of the system for the minimum volume size: The physical capacity of the system, based on 180 drives with 1 TB of capacity per drive and assuming the minimum volume size of 17 GB, limits the maximum volume count to 4 605 volumes. Again, a system with active snapshots can have more than 4 605 addresses assigned collectively to both volumes and snapshots, because volumes and snapshots share the same address space.

- Logical volumes are administratively managed within the context of Storage Pools, discussed in 2.3.3, Storage Pool concepts on page 22. Storage Pools are not part of the logical hierarchy inherent to the system's operational environment, because the concept of Storage Pools is administrative in nature.
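As a hedged, worked illustration of the allocation granularity and the volume-count limits above, the sketch below rounds a requested block-designated size up to the next 17 GB increment. It assumes that the 17 GB (decimal) increment corresponds to 16 GiB (17 179 869 184 bytes), which is consistent with the 4 605-volume figure but is not stated explicitly in this section.

```python
import math

GB = 10**9
INCREMENT_BYTES = 16 * 2**30      # assumption: the 17 GB (decimal) increment is 16 GiB
USABLE_CAPACITY_GB = 79_113       # net usable capacity, from 2.3.2
ADDRESS_SPACE_LIMIT = 16_377      # shared by volumes and snapshots

def allocated_capacity(requested_bytes):
    """Physical space set aside for a volume: the smallest number of
    17 GB increments that covers the requested block-designated size."""
    return math.ceil(requested_bytes / INCREMENT_BYTES) * INCREMENT_BYTES

# A 50 GB (decimal) block-designated volume occupies three increments (about 51.5 GB).
print(allocated_capacity(50 * GB) / GB)

# Capacity-based limit for minimum-size (17 GB) volumes: roughly 4 605,
# well below the 16 377 addresses available in the logical address space.
print(USABLE_CAPACITY_GB * GB / INCREMENT_BYTES, ADDRESS_SPACE_LIMIT)
```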

Storage Pools

Storage Pools are purely logical entities that enable storage administrators to manage relationships between volumes and snapshots and to define separate capacity provisioning and snapshot requirements for separate applications and departments. Storage Pools are not tied in any way to specific physical resources, nor are they part of the data distribution scheme. We discuss Storage Pools and their associated concepts in 2.3.3, Storage Pool concepts on page 22.

Important: The logical address limit is ordinarily not a practical consideration during planning, because under most conditions, this limit will not be reached; it is intended to exceed the number of volumes needed under all conceivable circumstances.

Snapshots

A snapshot represents a point-in-time copy of a volume. Snapshots are governed by almost all of the principles that apply to volumes. Unlike volumes, snapshots incorporate dependent relationships with their source volumes, which can be either logical volumes or other snapshots. Because they are not independent entities, a given snapshot does not necessarily wholly consist of partitions that are unique to that snapshot. Conversely, a snapshot image will not share all of its partitions with its source volume if updates to the source occur after the snapshot was created. Chapter 11, Copy functions on page 285 examines snapshot concepts and practical considerations, including locking behavior and implementation.

Logical volume layout on physical disks

The XIV Storage System facilitates the distribution of logical volumes over disks and modules by means of a dynamic relationship between primary data copies, secondary data copies, and physical disks. This virtualization of resources in the XIV Storage System is governed by a pseudo-random algorithm.

Partition table

The mapping between a logical partition number and its physical location on disk is maintained in a partition table. The partition table thus maintains the relationship between the partitions that comprise a logical volume and their physical locations on disk.

Volume layout

At a high level, the data distribution scheme is an amalgam of mirroring and striping. While it is tempting to think of this scheme in the context of RAID 1+0 (10) or 0+1, the low-level virtualization implementation precludes the usage of traditional RAID algorithms in the architecture. Conventional RAID implementations cannot incorporate dynamic, intelligent, and automatic management of data placement based on knowledge of the volume layout, nor is it feasible for a traditional RAID system to span all drives in a subsystem, due to the vastly unacceptable rebuild times that would result.

As discussed previously, the XIV Storage System architecture divides logical volumes into 1 MB partitions. This granularity and the mapping strategy are integral elements of the logical design that enable the system to realize the following features and benefits:

- Partitions are distributed on all disks using what is defined as a pseudo-random distribution function, which was introduced in 2.2.2, Logical parallelism on page 14.
- The distribution algorithms seek to preserve the statistical equality of access among all physical disks under all conceivable real-world aggregate workload conditions and associated volume access patterns. Essentially, while not truly random in nature, the distribution algorithms in combination with the system architecture preclude the occurrence of the phenomenon traditionally known as hot-spots:
  - The XIV Storage System contains 180 disks, and each volume is allocated across at least 17 GB (decimal) of capacity that is distributed evenly across all disks.
  - Each logically adjacent partition on a volume is distributed across a different disk; partitions are not combined into groups before they are spread across the disks.
  - The pseudo-random distribution ensures that logically adjacent partitions are never striped sequentially across physically adjacent disks. Refer to 2.2.2, Logical parallelism on page 14 for a further overview of the partition mapping topology.

Note: Both the distribution table and the partition table are redundantly maintained among the modules.

- Each disk has its data mirrored across all other disks, excluding the disks in the same module.
- Each disk holds approximately one percent of the data of any other disk in other modules.
- Disks have an equal probability of being accessed from a statistical standpoint, regardless of aggregate workload access patterns.

As discussed previously in IBM XIV Storage System virtualization benefits on page 15:

- The storage system administrator does not plan the layout of volumes on the modules.
- Provided there is space available, volumes can always be added or resized instantly with negligible impact on performance.
- There are no unusable pockets of capacity known as orphaned spaces.
- When the system is scaled out through the addition of modules, a new goal distribution is created whereby only a minimum number of partitions is moved to the newly allocated capacity to arrive at the new distribution table. The new capacity is fully utilized within several hours, with no need for any administrative intervention. Thus, the system automatically returns to a state of equilibrium among all resources.
- Upon the failure or phase-out of a drive or a module, a new goal distribution is created whereby data in non-redundant partitions is copied and redistributed across the remaining modules and drives. The system rapidly returns to a state in which all partitions are again redundant, because all disks and modules participate in the enforcement of the new goal distribution.
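To illustrate why a new goal distribution needs to move only a small fraction of the partitions when capacity is added, the following sketch uses rendezvous (highest-random-weight) hashing as a stand-in for the proprietary XIV distribution algorithm. The disk counts and the hashing method are assumptions; only the qualitative behavior (minimal data movement) is the point.

```python
import hashlib

def owner(partition_id, disks):
    """Rendezvous hashing: each partition goes to the disk with the highest
    hash score, so adding disks relocates only the partitions they 'win'."""
    def score(disk):
        digest = hashlib.sha256(f"{partition_id}:{disk}".encode()).hexdigest()
        return int(digest, 16)
    return max(disks, key=score)

partitions = range(2_000)                                   # small sample
old_disks = [f"disk{i}" for i in range(168)]                # e.g. 14 modules
new_disks = old_disks + [f"disk{i}" for i in range(168, 180)]  # add a 12-disk module

before = {p: owner(p, old_disks) for p in partitions}
after = {p: owner(p, new_disks) for p in partitions}

moved = sum(before[p] != after[p] for p in partitions)
print(f"{moved / len(partitions):.1%} of partitions moved")  # roughly 12/180, about 7%
```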

Volumes and snapshots

The relationship between volumes and snapshots in the context of the data layout is as follows:

- Volumes and snapshots are mapped using the same distribution scheme.
- A given partition of a primary volume and of its snapshot is stored on the same disk drive. As a result, a write to this partition is redirected within the module, minimizing the latency and utilization associated with additional interactions between modules.
- As updates occur to master volumes, the system's virtualized logical structure enables it to elegantly and efficiently preserve the original point-in-time data associated with any and all dependent snapshots by simply redirecting the update to a new physical location on the disk. This process, which is referred to as redirect on write, occurs transparently from the host perspective by virtue of the virtualized remapping of the updated data and minimizes any performance impact associated with preserving snapshots regardless of the number of snapshots defined for a given master volume.
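A minimal sketch of the redirect-on-write idea follows, assuming a toy partition table that maps partition numbers to physical locations (the class, method names, and location strings are illustrative, not the XIV implementation): creating a snapshot duplicates only pointers, and a later write redirects the master volume's pointer to a new location while the snapshot keeps the original.

```python
class Volume:
    """Toy model: a partition table mapping partition numbers to
    physical locations; the data itself never moves on snapshot."""
    def __init__(self, table):
        self.table = table                     # partition -> physical location

    def snapshot(self):
        # Creating a snapshot only duplicates pointers (metadata), not data,
        # so it completes immediately and consumes no physical capacity yet.
        return Volume(dict(self.table))

    def write(self, partition, new_location):
        # Redirect on write: the update lands in a fresh physical location;
        # any snapshot still points at the untouched original partition.
        self.table[partition] = new_location

master = Volume({0: "disk7:slot42", 1: "disk3:slot9"})
snap = master.snapshot()

master.write(0, "disk7:slot77")         # only the master's pointer changes
assert snap.table[0] == "disk7:slot42"  # the point-in-time image is preserved
```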

2.3.2 System usable capacity

The XIV Storage System reserves physical disk capacity for:

- Global spare capacity
- Metadata, including statistics and traces
- Mirrored copies of data

Note: When the number of disks or modules changes, the system defines a new data layout that preserves redundancy and equilibrium. This target data distribution is called the goal distribution and is discussed in Goal distribution on page 37.

Global spare capacity

The dynamically balanced distribution of data across all physical resources by definition obviates the need for the dedicated spare drives that are necessary with conventional RAID technologies. Instead, the XIV Storage System reserves capacity on each disk in order to provide adequate space for the redistribution and re-creation of redundant data in the event of a hardware failure.

This global spare capacity approach offers advantages over dedicated hot spare drives, which sit idle until a failure occurs and therefore reduce the number of spindles that the system can leverage for better performance. Also, those non-operating disks are typically not subject to background scrubbing processes, whereas in XIV, all disks are operating and subject to examination, which helps detect potential reliability issues with drives. Finally, it is thought that exposing drives to a sudden spike in usage, which happens when dedicated hot spare disks are brought into service upon failure, increases the likelihood of failure for those drives.

The global reserved space includes sufficient space to withstand the failure of a full module in addition to three disks, enabling the system to execute a new goal distribution, which is discussed in 2.4.2, Rebuild and redistribution on page 37, and to return to full redundancy even after multiple hardware failures. Hot spare space is reserved as a percentage of each drive's overall capacity, because the reserve spare capacity does not reside on dedicated disks.

Metadata and system reserve

The system reserves roughly 4% of the physical capacity for statistics and traces, as well as for the distribution and partition tables.

Net usable capacity

The net usable capacity of the system is calculated as the total disk count, less the disk space reserved for sparing (the equivalent of one module plus three more disks), multiplied by the amount of capacity on each disk that is dedicated to data (that is, 96%), and finally reduced by a factor of 50% to account for data mirroring.

Important: The system will tolerate multiple hardware failures, including up to an entire module in addition to three subsequent drive failures outside of the failed module, provided that a new goal distribution is fully executed before a subsequent failure occurs. If the system is less than 100% full, it can sustain more subsequent failures based on the amount of unused disk space that can be used as spare capacity in the event of a failure. For a thorough discussion of how the system uses and manages reserve capacity under specific hardware failure scenarios, refer to 2.4, Reliability, availability, and serviceability on page 33.

    Note: The XIV Storage System does not manage a global reserved space for snapshots. We explore this topic in the next section.

Note: The calculation of the usable space is:

Usable capacity = [drive capacity x (fraction utilized for data) x (total drives - hot spare reserve)] / 2
Usable capacity = [1000 GB x 0.96 x (180 - (12 + 3))] / 2 ≈ 79 113 GB (decimal)
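The same calculation expressed as a short Python check (the per-drive capacity and the 96% data fraction are the rounded figures used above, which is why the result differs slightly from the 79 113 GB quoted for the real system):

```python
DRIVE_CAPACITY_GB = 1000        # decimal GB per drive
DATA_FRACTION = 0.96            # roughly 4% reserved for metadata and traces
TOTAL_DRIVES = 180
HOT_SPARE_DRIVES = 12 + 3       # one full module plus three disks

usable_gb = (DRIVE_CAPACITY_GB * DATA_FRACTION
             * (TOTAL_DRIVES - HOT_SPARE_DRIVES)) / 2   # halved for mirroring
print(usable_gb)   # about 79 200 GB with these rounded inputs,
                   # close to the 79 113 GB quoted for the real system
```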

2.3.3 Storage Pool concepts

While the hardware resources within the XIV Storage System are virtualized in a global sense, the available capacity in the system can be administratively apportioned into separate and independent Storage Pools. The concept of Storage Pools is purely administrative in that they are not a layer of the functional hierarchical logical structure employed by the system operating environment, which is discussed in 2.3.1, Logical system concepts on page 16. Instead, the flexibility of Storage Pool relationships from an administrative standpoint derives from the granular virtualization within the system. Essentially, Storage Pools function as a means to effectively manage a related group of logical volumes and their snapshots.

Improved management of storage space

Storage Pools form the basis for controlling the usage of storage space by specific applications, a group of applications, or departments, enabling isolated management of relationships within the associated group of logical volumes and snapshots while imposing a capacity quota.

A logical volume is defined within the context of one, and only one, Storage Pool. Because a volume is equally distributed among all system disk resources, it follows that all Storage Pools also span all system resources.

As a consequence of the system virtualization, there are no limitations on the size of Storage Pools or on the associations between logical volumes and Storage Pools. In fact, manipulation of Storage Pools consists exclusively of metadata transactions and does not involve any data copying from one disk or module to another disk or module. Therefore, changes are completed instantly and without any system overhead or performance degradation.
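A small sketch of this administrative model (the pool names, quota semantics, and method names are assumptions for illustration only): because a Storage Pool is simply a grouping with a capacity quota, reassigning a volume to another pool touches only metadata and no data moves.

```python
class StoragePool:
    """Toy model of a Storage Pool: an administrative grouping of volumes
    with a capacity quota; it owns no specific disks or modules."""
    def __init__(self, name, quota_gb):
        self.name, self.quota_gb, self.volumes = name, quota_gb, {}

    def used_gb(self):
        return sum(self.volumes.values())

    def add(self, volume, size_gb):
        if self.used_gb() + size_gb > self.quota_gb:
            raise ValueError("pool quota exceeded")
        self.volumes[volume] = size_gb       # metadata only; no data moves

def move_volume(volume, source, target):
    """Reassigning a volume to another pool is a pure metadata transaction."""
    size_gb = source.volumes.pop(volume)
    target.add(volume, size_gb)

erp = StoragePool("ERP", quota_gb=3400)
test = StoragePool("Test", quota_gb=1700)
erp.add("erp_db_01", 1020)                   # sizes follow 17 GB multiples
move_volume("erp_db_01", erp, test)          # completes instantly
```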

Consistency Groups

A Consistency Group is a group of volumes of which a snapshot can be made at the same point in time, thus ensuring a consistent image of all volumes within the group at that time. The concept of a Consistency Group is ubiquitous among storage subsystems, because there are many circumstances in which it is necessary to perform concurrent operations collectively across a set of volumes, so that the result of the operation preserves the consistency among the volumes. For example, effective storage management activities for applications that span multiple volumes, or the creation of point-in-time backups, are not possible without first employing Consistency Groups.

    A notable practical scenario necessitating Consistency Groups arises when a consistent, instantaneous image of the database application (spanning both the database and the transaction log) is required. Obviously, taking snapshots of the volumes serial