Administering VMware Site Recovery Manager 5.0 (VMware Press Technology)


  • VMware Press is the official publisher of VMware books and training materials, which provide guidance on the critical topics facing today's technology professionals and students. Enterprises, as well as small- and medium-sized organizations, adopt virtualization as a more agile way of scaling IT to meet business needs. VMware Press provides proven, technically accurate information that will help them meet their goals for customizing, building, and maintaining their virtual environment. With books, certification and study guides, video training, and learning tools produced by world-class architects and IT experts, VMware Press helps IT professionals master a diverse range of topics on virtualization and cloud computing and is the official source of reference materials for preparing for the VMware Certified Professional Examination. VMware Press is also pleased to have localization partners that can publish its products into more than forty-two languages, including, but not limited to, Chinese (Simplified), Chinese (Traditional), French, German, Greek, Hindi, Japanese, Korean, Polish, Russian, and Spanish. For more information about VMware Press please visit http://www.vmware.com/go/vmwarepress

  • Administering VMware Site Recovery Manager 5.0

    Technology Hands-on

    Mike Laverick

  • Upper Saddle River, NJ Boston Indianapolis San Francisco

    New York Toronto Montreal London Munich Paris Madrid

    Capetown Sydney Tokyo Singapore Mexico City

  • Administering VMware Site Recovery Manager 5.0
    Copyright © 2012 VMware, Inc.

    Published by Pearson Education, Inc. Publishing as VMware Press

    All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

    All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. The publisher cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark. VMware terms are trademarks or registered trademarks of VMware in the United States, other countries, or both.

    VMWARE PRESS PROGRAM MANAGER: Andrea Eubanks de Jounge
    ASSOCIATE PUBLISHER: David Dusthimer
    ACQUISITIONS EDITOR: Joan Murray
    DEVELOPMENT EDITOR: Susan Zahn
    MANAGING EDITOR: John Fuller
    FULL-SERVICE PRODUCTION MANAGER: Julie B. Nahil
    COPY EDITOR: Audrey Doyle
    PROOFREADER: Kelli M. Brooks
    INDEXER: Jack Lewis
    EDITORIAL ASSISTANT: Vanessa Evans
    BOOK DESIGNER: Gary Adair
    COMPOSITOR: Kim Arney

    Warning and Disclaimer

    Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an "as is" basis. The author, VMware Press, VMware and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of the CD or programs accompanying it. The opinions expressed in this book belong to the author and are not necessarily those of VMware.

    Corporate and Government Sales

  • VMware Press offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact U.S. Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside the United States, please contact International Sales, [email protected].

    Library of Congress Control Number: 2011919183

    ISBN-13: 978-0-321-79992-0
    ISBN-10: 0-321-79992-5

    Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana. First printing, December 2011

  • This book is dedicated to Carmel, for putting up with me

    and my endless ramblings about virtualization.

  • Contents

    Preface

    Acknowledgments

    About the Author

    Chapter 1. Introduction to Site Recovery Manager

    What's New in Site Recovery Manager 5.0

    vSphere 5 Compatibility

    vSphere Replication

    Automated Failback and Reprotect

    VM Dependencies

    Improved IP Customization

    A Brief History of Life before VMware SRM

    What Is Not a DR Technology?

  • vMotion

    VMware HA Clusters

    VMware Fault Tolerance

    Scalability for the Cloud

    What Is VMware SRM?

    What about File Level Consistency?

    Principles of Storage Management and Replication

    Caveat #1: All Storage Management Systems Are the Same

    Caveat #2: All Storage Vendors Sell Replication

    Caveat #3: Read the Manual

    Summary

    Chapter 2. Getting Started with Dell EqualLogic Replication

    Creating an EqualLogic iSCSI Volume

    Granting ESXi Host Access to the EqualLogic iSCSI Volume

  • Enabling Replication for EqualLogic

    Configuring Replication Partners

    Configuring Replication of the iSCSI Volume

    Configuring a Schedule for Replication

    Using EqualLogic Host Integration for VMware Edition (HIT-VE)

    Summary

    Chapter 3. Getting Started with EMC Celerra Replication

    Creating an EMC Celerra iSCSI Target

    Granting ESX Host Access to the EMC Celerra iSCSI Target

    Creating a New File System

    Creating an iSCSI LUN

    Configuring Celerra Replication

    Summary

  • Chapter 4. Getting Started with EMC CLARiiON MirrorView

    Creating a Reserved LUN Pool

    Creating an EMC LUN

    Configuring EMC MirrorView

    Creating a Snapshot for SRM Tests

    Creating Consistency Groups (Recommended)

    Granting ESX Host Access to CLARiiON LUNs

    At the Recovery Site CLARiiON (New Jersey)

    At the Protected Site CLARiiON (New York)

    Using the EMC Virtual Storage Integrator Plug-in (VSI)

    Summary

    Chapter 5. Getting Started with the HP StorageWorks P4000 Virtual SAN Appliance with Remote Copy

    Some Frequently Asked Questions about the HP P4000 VSA

    Downloading and Uploading the VSA

  • Importing the StorageWorks P4000 VSA

    Modifying the VSA's Settings and First-Power-On Configuration

    Primary Configuration of the VSA Host

    Installing the Management Client

    Configuring the VSA (Management Groups, Clusters, and Volumes)

    Adding the VSAs to the Management Console

    Adding the VSAs to Management Groups

    Creating a Cluster

    Creating a Volume

    Licensing the HP VSA

    Configuring the HP VSA for Replication

    Monitoring Your Replication/Snapshot

    Adding ESX Hosts and Allocating Volumes to Them

    Adding an ESX Host

  • Allocating Volumes to ESX Hosts

    Granting ESX Host Access to the HP VSA iSCSI Target

    Monitoring Your iSCSI Connections

    The HP StorageWorks P4000 VSA: Creating a Test Volume at the Recovery Site

    Shutting Down the VSA

    Summary

    Chapter 6. Getting Started with NetApp SnapMirror

    Provisioning NetApp NFS Storage for VMware ESXi

    Creating a NetApp Volume for NFS

    Granting ESXi Host Access to NetApp NFS Volumes

    Creating NetApp Volumes for Fibre Channel and iSCSI

    Granting ESXi Host Access to the NetApp iSCSI Target

    Configuring NetApp SnapMirror

  • Confirm IP Visibility (Mandatory) and Name Resolution (Optional)

    Enable SnapMirror (Both the Protected and Recovery Filers)

    Enable Remote Access (Both the Protected and Recovery Filers)

    Configure SnapMirror on the Recovery Site NetApp Filer (New Jersey)

    Introducing the Virtual Storage Console (VSC)

    Summary

    Chapter 7. Installing VMware SRM

    Architecture of the VMware SRM

    Network Communication and TCP Port Numbers

    Storage Replication Components

    VMware Components

    More Detailed Information about Hardware and Software Requirements

    Scalability of VMware SRM

  • Designed for Both Failover and Failback?

    A Word about Resignaturing VMFS Volumes

    VMware SRM Product Limitations and Gotchas

    Licensing VMware SRM

    Setting Up the VMware SRM Database with Microsoft SQL Server 2008

    Creating the Database and Setting Permissions

    Configuring a DSN Connection on the SRM Server(s)

    Installing the VMware SRM Server

    Installing the SRM Software

    Installing a Storage Replication Adapter: Example HP SRA

    Installing the vSphere Client SRM Plug-in

    Handling Failures to Connect to the SRM Server

    Summary

    Chapter 8. Configuring vSphere Replication (Optional)

    How vSphere Replication Works

    vSphere Replication Limitations

    Installing vSphere Replication

    Setting the vCenter Managed IP Address

    Configuring a Database for the VRMS

    Deploying the VRMS

    Configuring the VRMS

    Configuring the VRMS Connection

    Deploying the VRS

    Registering the VRS

    Enabling and Monitoring vSphere Replication

    Moving, Pausing, Resuming, Removing, and Forcing Synchronization

    Enabling Replication for Physical Couriering

    Configuring Datastore Mappings

  • Summary

    Chapter 9. Configuring the Protected Site

    Connecting the Protected and Recovery Site SRMs

    Configuring Inventory Mappings

    Configuring Resource Mappings

    Configuring Folder Mappings

    Configuring Network Mappings

    Assigning Placeholder Datastores

    Configuring Array Managers: An Introduction

    Configuring Array Managers: Dell EqualLogic

    Configuring Array Managers: EMC Celerra

    Configuring Array Managers: EMC CLARiiON

    Configuring Array Managers: NetApp FSA

    Creating Protection Groups

    Failure to Protect a Virtual Machine

  • Bad Inventory Mappings

    Placeholder VM Not Found

    VMware Tools Update Error: Device Not Found: CD/DVD Drive 1

    Delete VM Error

    It's Not an Error, It's a Naughty, Naughty Boy!

    Summary

    Chapter 10. Recovery Site Configuration

    Creating a Basic Full-Site Recovery Plan

    Testing Storage Configuration at the Recovery Site

    Overview: First Recovery Plan Test

    Practice Exercise: First Recovery Plan Test

    Cleaning Up after a Recovery Plan Test

    Controlling and Troubleshooting Recovery Plans

    Pause, Resume, and Cancel Plans

    Error: Cleanup Phase of the Plan Does Not Always Happen with iSCSI

    Error: Loss of the Protection Group Settings

    Error: Cleanup Fails; Use Force Cleanup

    Error: Repairing VMs

    Error: Disconnected Hosts at the Recovery Site

    Recovery Plans and the Storage Array Vendors

    Dell EqualLogic and Testing Plans

    EMC Celerra and Testing Plans

    NetApp and Testing Plans

    Summary

    Chapter 11. Custom Recovery Plans

    Controlling How VMs Power On

    Configuring Priorities for Recovered Virtual Machines

    Adding VM Dependencies

    Configuring Start-Up and Shutdown Options

  • Suspending VMs at the Recovery Site

    Adding Additional Steps to a Recovery Plan

    Adding Prompt Steps

    Adding Command Steps

    Adding Command Steps with VMware PowerCLI

    Managing PowerCLI Authentication and Variables

    Adding Command Steps to Call Scripts within the Guest Operating System

    Configuring IP Address Changes for Recovery Virtual Machines

    Creating a Manual IP Guest Customization

    Configuring Bulk IP Address Changes for the Recovery Virtual Machine (dr-ip-exporter)

    Creating Customized VM Mappings

    Managing Changes at the Protected Site

    Creating and Protecting New Virtual Machines

    Renaming and Moving vCenter Inventory Objects

  • Other Objects and Changes in the vSphere and SRM Environment

    Storage vMotion and Protection Groups

    Virtual Machines Stored on Multiple Datastores

    Virtual Machines with Raw Device/Disk Mappings

    Multiple Protection Groups and Multiple Recovery Plans

    Multiple Datastores

    Multiple Protection Groups

    Multiple Recovery Plans

    The Lost Repair Array Managers Button

    Summary

    Chapter 12. Alarms, Exporting History, and Access Control

    vCenter Linked Mode and Site Recovery Manager

    Alarms Overview

    Creating a New Virtual Machine to Be Protected by an Alarm (Script)

    Creating a Message Alarm (SNMP)

    Creating an SRM Service Alarm (SMTP)

    Exporting and History

    Exporting Recovery Plans

    Recovery Plan History

    Access Control

    Creating an SRM Administrator

    Summary

    Chapter 13. Bidirectional Relationships and Shared Site Configurations

    Configuring Inventory Mappings

    Refreshing the Array Manager

    Creating the Protection Group

    Creating the Recovery Plan

    Using vApps to Control Start-Up Orders

  • Shared Site Configurations

    Installing VMware SRM with Custom Options to the New Site (Washington DC)

    Installing VMware SRM Server with Custom Options to the Recovery Site

    Pairing the Sites Together

    Decommissioning a Site

    Summary

    Chapter 14. Failover and Failback

    Planned Failover: Protected Site Is Available

    Dell EqualLogic and Planned Recovery

    NetApp and Planned Recovery

    Automated Failback from Planned Migration

    Unplanned Failover

    Protected Site Is Dead

    Planned Failback after a Disaster

  • Summary

    Chapter 15. Scripting Site Recovery

    Scripted Recovery for a Test

    Managing the Storage

    Rescanning ESX Hosts

    Resignaturing VMFS Volumes

    Mounting NFS Exports

    Creating an Internal Network for the Test

    Adding Virtual Machines to the Inventory

    Fixing VMX Files for the Network

    Summary

    Chapter 16. Upgrading from SRM 4.1 to SRM 5.0

    Upgrading vSphere

    Step 1: Run the vCenter Host Agent Pre-Upgrade Checker

  • Step 2: Upgrade vCenter

    Step 3: Upgrade the vCenter Client

    Step 4: Upgrade the VMware Update Manager (VUM)

    Step 5: Upgrade the VUM Plug-in

    Step 6: Upgrade Third-Party Plug-ins (Optional)

    Step 7: Upgrade the ESX Hosts

    Upgrading Site Recovery Manager

    Step 8: Upgrade SRM

    Step 9: Upgrade VMware Tools (Optional)

    Step 10: Upgrade Virtual Hardware (Optional)

    Step 11: Upgrade VMFS Volumes (Optional)

    Step 12: Upgrade Distributed vSwitches (Optional)

    Summary

    Index

  • Preface

    This edition of Administering VMware Site Recovery Manager 5.0 is not only a new edition of this book but one of the first books published by VMware Press.

    About This Book

    Version 5.0 represents a major milestone in the development of VMware Site Recovery Manager (SRM). The need to write a book on SRM 5.0 seems more pressing than ever because of the many new features and enhancements in this version. I think these enhancements are likely to draw to the product a whole new raft of people who previously may have overlooked it. Welcome to the wonderful world that is Site Recovery Manager!

    This is a complete guide to using SRM. The version of both ESX and vCenter that we use in the book is 5.0. This book was tested against the ESX5i release. This is in marked contrast to the first edition of this book and the SRM product, where ESXi was not initially supported. In the previous edition of the book I used abstract names for my vCenter structures, literally calling the vCenter in the Protected Site virtualcenterprotectedsite.rtfm-ed.co.uk. Later I used two cities in the United Kingdom (London and Reading) to represent a Protected Site and a Recovery Site. This time around I have done much the same thing, but the protected location is New York and the recovery location is New Jersey. I thought that as most of my readers are from the United States, and there isn't a person on the planet who hasn't heard of these locations, people would more quickly latch on to the scenario. Figure P.1 shows my structure, with one domain (corp.com) being used in New York and New Jersey. Each site has its own Microsoft Active Directory domain controller, and there is a router between the sites. Each site has its own vCenter, Microsoft SQL Server 2008, and SRM Server. In this case I chose not to use the linked mode feature of vCenter 5; I will introduce that configuration later in the book. I made this decision merely to keep the distinction clear: that I have two separate locations or sites.

    Figure P.1 Two vCenter environments side by side

    You, the Reader

    I have a very clear idea of the kind of person reading this book. Ideally, you have been working with VMware vSphere for some time; perhaps you have attended an authorized course in vSphere 4 such as the Install, Configure and Manage class, or even the Fast Track class. On top of this, perhaps you have pursued VMware Certified Professional (VCP) certification. So, what am I getting at? This is not a dummy's or idiot's guide to SRM. You are going to need some background, or at least read my other guides or books, to get up to speed. Apart from that, I will be gentle with you, assuming that you have forgotten some of the material from those courses, such as VMFS metadata, UUIDs, and VMFS resignaturing, and that you just have a passing understanding of storage replication.

  • Finally, the use of storage products in this book shouldn't be construed as a recommendation of any particular vendor. I just happened to meet the HP LeftHand Networks guys at VMworld Europe 2008 Cannes. They very kindly offered to give me two NFR licenses for their storage technologies. The other storage vendors who helped me while I was writing this book have been equally generous. In 2008, both Chad Sakac of EMC and Vaughn Stewart of NetApp arranged for my lab environment to be kitted out in the very latest versions of their CLARiiON/Celerra and NetApp FSA systems. This empowered me to be much more storage-neutral than I was in previous editions of this book. For this version of the book I was fortunate to also add coverage of the Dell EqualLogic system. Toward that end, I would like to thank Dylan Locsin and William Urban of Dell for their support.

    What This Book Covers

    Here is a quick rundown of what is covered in Administering VMware Site Recovery Manager 5.0.

    Chapter 1, Introduction to Site Recovery Manager

    This chapter provides a brief introduction to Site Recovery Manager and discusses some use cases.

    Chapter 2, Getting Started with Dell EqualLogic Replication

    This chapter guides readers through the configuration of replication with Dell EqualLogic arrays, and covers the basic configuration of the ESXi iSCSI initiator.

    Chapter 3, Getting Started with EMC Celerra Replication

    This chapter guides readers through the configuration of replication with EMC Celerra arrays, and covers the basic configuration of the ESXi iSCSI initiator.

    Chapter 4, Getting Started with EMC CLARiiON MirrorView

    This chapter guides readers through the configuration of replication with CLARiiON arrays.

    Chapter 5, Getting Started with the HP StorageWorks P4000 Virtual SAN Appliance with Remote Copy

    This chapter guides readers through the configuration of replication with the HP P4000 VSA, and covers the basic configuration of the ESXi iSCSI initiator.

    Chapter 6, Getting Started with NetApp SnapMirror

    This chapter guides readers through the configuration of NetApp replication arrays, and covers configuration for FC, iSCSI, and NFS.

    Chapter 7, Installing VMware SRM

    This chapter covers the installation of VMware Site Recovery Manager, and details post-configuration steps such as installing an array vendor's Site Recovery Adapter software.

    Chapter 8, Configuring vSphere Replication (Optional)

    This optional chapter details the steps required to configure vSphere Replication (VR).

    Chapter 9, Configuring the Protected Site

    This chapter covers the initial setup of the Protected Site and deals with such steps as pairing the sites, inventory mappings, array manager configuration, and placeholder datastore configuration. It also introduces the concept of the SRM Protection Group.

    Chapter 10, Recovery Site Configuration

    This chapter covers the basic configuration of the Recovery Plan at the Recovery Site.

    Chapter 11, Custom Recovery Plans

    This chapter discusses how Recovery Plans can have very detailed customization designed around a business need. It also explains the use of message prompts, command steps, and the re-IP of virtual machines.

    Chapter 12, Alarms, Exporting History, and Access Control

    This chapter outlines how administrators can configure alarms and alerts to assist in the day-to-day maintenance of SRM. It details the reporting functionality available in the History components. Finally, it covers a basic delegation process to allow others to manage SRM without using built-in permission assignments.

    Chapter 13, Bidirectional Relationships and Shared Site Configurations

    The chapter outlines more complicated SRM relationships where SRM protects VMs at multiple sites.

    Chapter 14, Failover and Failback

    This chapter covers the real execution of a Recovery Plan, rather than merely a test. It details the planned migration and disaster recovery modes, as well as outlining the steps required to fail back VMs to their original locale.

    Chapter 15, Scripting Site Recovery

    This chapter covers what to do if Site Recovery Manager is not available. It discusses how to do manually everything that Site Recovery Manager automates.

    Chapter 16, Upgrading from SRM 4.1 to SRM 5.0

    This chapter offers a high-level view of how to upgrade SRM 4.1 to SRM 5.0. It also covers upgrading the dependencies that allow SRM 5.0 to function, including upgrading ESX, vCenter, Update Manager, and virtual machines.

    Hyperlinks

    The Internet is a fantastic resource, as we all know. However, printed hyperlinks are often quite lengthy, are difficult to type correctly, and frequently change. I've created a very simple Web page that contains all the URLs in this book. I will endeavor to keep this page up to date to make life easy for everyone concerned. The single URL you need for all the links and online content is www.rtfm-ed.co.uk/srm.html

    Please note that depending on when you purchased this book, the location of my resource blog might have changed. Beginning in late January 2012, I should have a new blog for you to access all kinds of virtualization information: www.mikelaverick.com

    At the time of this writing, there are still a number of storage vendors that have yet to release their supporting software for VMware Site Recovery Manager. My updates on those vendors will be posted to this book's Web page: http://informit.com/title/9780321799920

    Author Disclaimer

    No book on an IT product would be complete without a disclaimer. Here is mine: Although every precaution has been taken in the preparation of this book, the contributors and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein. Phew, glad that's over with!

    Thank you for buying this book. I know I'm not quite James Joyce, but I hope that people find reading this book both entertaining and instructive.

  • Acknowledgments

    Before we move on to Chapter 1, I would like to thank the many people who helped me as I wrote this book. First, I would like to thank Carmel Edwards, my partner. She puts up with my ranting and raving about VMware and virtualization. Carmel is the first to read my work and did the first proofread of the manuscript.

    Second, I would like to thank Adam Carter, formerly of HP LeftHand Networks; Chad Sakac of EMC; Vaughn Stewart of NetApp; and Andrew Gilman of Dell. All four individuals were invaluable in allowing me to bounce ideas around and to ask newbie-like questions, regarding not just their technologies, but storage issues in general. If I sound like some kind of storage guru in this book, I have these guys to thank for that. (Actually, I'm not a guru at all, even in terms of VMware products. I can't even stand the use of the word guru.) Within EMC, I would like to especially thank Alex Tanner, who is part of "Chad's Army" and was instrumental in getting me set up with the EMC NS-120 systems as well as giving me ongoing help and support as I rewrote the material in the previous edition for use in this edition of the book. I would also like to thank Luke Reed of NetApp, who helped in a very similar capacity in updating my storage controllers so that I could use them with the latest version of ONTAP.

    Third, I would like to thank Jacob Jenson of the VMware DR/BC Group and the SRM Team generally. I would also like to thank Mornay Van Der Walt of VMware. Mornay is the director for Enterprise & Technical Marketing. I first met Mornay at Cannes in 2008, and he was instrumental in introducing me to the right people when I first took on SRM as a technology. He was also very helpful in assisting me with my more obscure technical questions surrounding the early SRM product, without which the idea of writing a book would have been impossible. I would also like to thank Lee Dilworth of VMware in the UK. Lee has been very helpful in my travels with SRM, and it's to him that I direct my emails when even I can't work out what is going on!

    I would like to thank Cormac Hogan, Tim Oudin, Craig Waters, and Jeff Drury for their feedback. I'm often asked how much of a technical review books like mine go through. The answer is a great deal, and this review process is often as long as the writing process. People often offer to review my work, but almost never have the time to do it. So I would like to thank these guys for taking the time and giving me their valuable feedback.

  • About the Author

    Mike Laverick is a former VMware instructor with 17 years of experience in technologies such as Novell, Windows, Citrix, and VMware. He has also been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group. Laverick is the man behind the virtualization website and the blog RTFM Education, where he publishes free guides and utilities for VMware customers. Laverick received the VMware vExpert award in 2009, 2010, and 2011.

    Since joining TechTarget as a contributor, Laverick has also found the time to run a weekly podcast called, alternately, the Chinwag and the Vendorwag. Laverick helped found the Irish and Scottish VMware user groups and now regularly speaks at larger regional events organized by the Global VMUG in North America, EMEA, and APAC. Laverick previously published several books on VMware Virtual Infrastructure 3, vSphere 4, Site Recovery Manager, and View.

  • We Want to Hear from You!

    As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we're doing right, what we could do better, what areas you'd like to see us publish in, and any other words of wisdom you're willing to pass our way.

    As an associate publisher for Pearson, I welcome your comments. You can email or write me directly to let me know what you did or didn't like about this book, as well as what we can do to make our books better.

    Please note that I cannot help you with technical problems related to the topic of this book. We do have a User Services group, however, where I will forward specific technical questions related to the book.

    When you write, please be sure to include this book's title and author as well as your name, email address, and phone number. I will carefully review your comments and share them with the author and editors who worked on the book.

    Email: [email protected]

    Mail: David Dusthimer

  • Associate Publisher
    Pearson
    800 East 96th Street
    Indianapolis, IN 46240 USA

  • Chapter 1. Introduction to Site Recovery Manager

    Before I embark on the book proper I want to outline some of the new features in SRM. This will be of particular interest to previous users, as well as to new adopters, as they can see how far the product has come since the previous release. I also want to talk about what life was like before SRM was developed. As with all forms of automation, it's sometimes difficult to see the benefits of a technology if you have not experienced what life was like before its onset. I also want at this stage to make it clear what SRM is capable of and what its technical remit is. It's not uncommon for VMware customers to look at other technologies such as vMotion and Fault Tolerance (FT) and attempt to construct a disaster recovery (DR) use case around them. While that is entirely plausible, care must be taken not to build solutions that use technologies in ways that have not been tested or are not supported by VMware.

    What's New in Site Recovery Manager 5.0

  • To begin, I would like to flag what's new in the SRM product. This will form the basis of the new content in this book. This information is especially relevant to people who purchased my previous book, as these changes are what made it worthwhile for me to update that book to be compatible with SRM 5.0. In the sections that follow I list what I feel are the major enhancements to the SRM product. I've chosen not to include a change-log-style list of every little modification. Instead, I look at new features that might sway a customer or organization into adopting SRM. These changes address flaws or limitations in the previous product that may have made adopting SRM difficult in the past.

    vSphere 5 Compatibility

    This might seem like a small matter, but when vSphere 5 was released some of the advanced management systems were quickly compatible with the new platform, a situation that didn't happen with vSphere 4. I think many people underestimate what a huge undertaking from a development perspective vSphere 5 actually is. VMware isn't as big as some of the ISVs it competes with, so it has to be strategic in where it spends its development resources. Saturating the market with product release after product release can alienate customers who feel overwhelmed by too much change too quickly. I would prefer that VMware take its time with product releases and properly QA the software rather than roll out new versions injudiciously. The same people who complained about any delay would complain that it was a rush job had the software been released sooner. Most of the people who seemed to complain the most viciously about the delays in vSphere 4 were contractors whose livelihoods depended on project sign-off; in short, they were often looking out for themselves, not their customers. Most of my big customers didn't have immediate plans for a rollout of vSphere 5 on the day of General Availability (GA), and we all know it takes time and planning to migrate from one version to another of any software. Nonetheless, it seems that's a shake-up in which VMware product management has been effective, with the new release of SRM 5.0 coming in on time at the station.

    vSphere Replication

    One of the most eagerly anticipated new features of SRM is vSphere Replication (VR). This enables customers to replicate VMs from one location to another using VMware as the primary engine, without the need for third-party storage-array-based replication. VR will be of interest to customers who run vSphere in many branch offices, and yet still need to offer protection to their VMs. I think the biggest target market may well be the SMB sector, for whom expensive storage arrays, and even more expensive array-based replication, is perhaps beyond their budget. I wouldn't be surprised to find that the Foundation SKUs reflect this fact and will enable these types of customers to consume SRM in a cost-effective way.

    Of course, if you're a large enterprise customer who already enjoys the benefits of EMC MirrorView or NetApp SnapMirror, this enhancement is unlikely to change the way you use SRM. But with that said, I think VR could be of interest to enterprise customers; it will depend on their needs and situations. After all, even in a large enterprise it's unlikely that all sites will be using exactly the same array vendor in both the Protected and Recovery Sites. So there is a use case for VR to enable protection to take place between dissimilar arrays. Additionally, in large environments it may take more time than is desirable for the storage team to enable replication on the right volumes/LUNs, now that VMware admins are empowered to protect their VMs when they see fit.

    It's worth saying that VR is protocol-neutral, and this will be highly attractive to customers migrating from one storage protocol to another, so VR should allow for replication between Fibre Channel and NFS, for example, just like customers can move a VM around with VMware's Storage vMotion regardless of storage protocol type. This is possible because, with VR, all that is seen is a datastore, and the virtual appliance behind VR doesn't interface directly with the storage protocols that the ESX host sees. Instead, the VR appliance communicates to the agent on the ESX host that then transfers data to the VR appliance.
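    The data path just described can be sketched in miniature. The sketch below is purely illustrative: the class and method names (HostAgent, VRAppliance, Datastore, and so on) are invented for this example and do not correspond to any real VMware API. Its only purpose is to show why VR ends up protocol-neutral, since the appliance only ever writes to a datastore abstraction, and the underlying storage protocol never enters the picture.

```python
# Illustrative model only -- all names here are hypothetical, not VMware APIs.
# Data path: ESX host agent -> VR appliance -> target datastore.

class Datastore:
    """Target storage as the VR appliance sees it. The underlying protocol
    (FC, iSCSI, NFS, or local disk) is invisible at this layer."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # offset -> data

    def write(self, offset, data):
        self.blocks[offset] = data


class VRAppliance:
    """Receives changed blocks from host agents and writes them to a datastore."""
    def __init__(self, target):
        self.target = target

    def receive(self, changed_blocks):
        for offset, data in changed_blocks.items():
            self.target.write(offset, data)


class HostAgent:
    """Runs on the ESX host: tracks VM writes and ships the deltas onward."""
    def __init__(self, appliance):
        self.appliance = appliance
        self.pending = {}

    def on_vm_write(self, offset, data):
        self.pending[offset] = data  # remember only the changed block

    def replicate(self):
        self.appliance.receive(self.pending)  # ship deltas, then clear
        self.pending = {}


recovery_ds = Datastore("recovery-site-ds")
agent = HostAgent(VRAppliance(recovery_ds))
agent.on_vm_write(0, b"boot sector")
agent.on_vm_write(4096, b"app data")
agent.replicate()
print(sorted(recovery_ds.blocks))  # [0, 4096]
```

    The point of the shape is that only Datastore would change if the storage protocol changed; the agent and appliance are indifferent to it.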
This should allow for the protection of VMs, evenif local storage is usedand again, this might be very

  • attractive to the SMB market where direct attached storageis more prevalent. Automated Failback and Reprotect

    When SRM was first released it did not come with a failback option. That's not to say failback wasn't possible; it just took a number of steps to complete the process. I've done innumerable failovers and failbacks with SRM 1.0 and 4.0, and once you have done a couple you soon get into the swing of them. Nonetheless, an automated failback process is a feature that SRM customers have had on their wish lists for some time. Instructions to manage the storage arrays are encoded in what VMware calls Storage Replication Adapters (SRAs). Previously, the SRA only automated the testing and running of SRM's Recovery Plans. But now the SRAs support the instructions required to carry out a failback routine. Prior to this, the administrator had to use the storage vendor's management tools to manage replication paths.

    Additionally, SRM 5.0 ships with a process that VMware is calling Reprotect Mode. Prior to the reprotect feature it was up to the administrator to clear out stale objects in the vCenter inventory and re-create objects such as Protection Groups and Recovery Plans. The new reprotect feature goes a long way toward speeding up the failback process. With this improvement you can see VMware is making the VM more portable than ever before.

    Most VMware customers are used to being able to move VMs from one physical server to another with vMotion within the site, and an increasing number would like to extend this portability to their remote locations. This is currently possible with long-distance live migration technologies from the likes of EMC and NetApp, but these require specialized technologies that are distance-limited and bandwidth-thirsty, and so are limited to top-end customers. With an effective planned migration from SRM and a reprotect process, customers would be able to move VMs around from site to site. Clearly, the direction VMware is taking is driven more toward managing the complete lifecycle of a VM, and that includes the fact that datacenter relocations are part of our daily lives.

VM Dependencies

    One of the annoyances of SRM 1.0 and 4.0 was the lack of a grouping mechanism for VMs. In previous releases all protected VMs were added to a list, and each one had to be moved by hand into one of a series of categories: High, Low, or Normal. There wasn't really a way to create objects that would show the relationships between VMs, or groupings. The new VM Dependencies feature will allow customers to more effectively show the relationships between VMs from a service perspective. In this respect we should be able to configure SRM in such a way that it reflects the way most enterprises categorize, by tiers, the applications and services they provide. In addition to the dependencies feature, SRM now has five levels of priority order rather than the previous High, Low, and Normal levels. You might find that, given the complexity of your requirements, these offer all the functionality you need.

Improved IP Customization

    Another great area of improvement comes in the management of IP addresses. In most cases you will find that two different sites have entirely different IP subnet ranges. According to VMware research, nearly 40% of SRM customers are forced to re-IP their VMs. Sadly, it's a minority of customers who have, or can get approval for, a stretched VLAN configuration where both sites believe they make up the same continuous network, despite being in entirely different geographies. One method of making sure that VMs with a 10.x.y.z address continue to function in a 192.168.1.x network is to adopt the use of Network Address Translation (NAT) technologies, such that VMs need not have their IP addresses changed at all.

    Of course, SRM has always offered a way to change the IP address of Windows and Linux guests using the Guest Customization feature within vCenter. Guest Customization is normally used in the deployment of new VMs to ensure that they have unique hostnames and IP addresses when they have been cloned from a template. In SRM 1.0 and 4.0, it was used merely to change the IP address of the VM. Early in SRM's life a command-line utility, dr-ip-exporter, was created to allow the administrator to create many guest customizations in bulk, using a .csv file to store the specific IP details. While this process worked, it wasn't easy to see that the original IP address was related to the recovery IP address. And, of course, when you came to carry out a failback process all the VMs would need to have their IP addresses changed back to the originals from the Protected Site. For Windows guests the process was particularly slow, as Microsoft Sysprep was used to trigger the re-IP process. With this new release of SRM we have a much better method of handling the whole re-IP process, which is neater and quicker and holds all the parameters within a single dialog box on the properties of the VM. Rather than using Microsoft Sysprep to change the IP address of the VM, much faster scripting technologies such as PowerShell, WMI, and VBScript can be used. In the longer term, VMware remains committed to investing in these technologies both internally and with its key partners; that could mean there will be no need to re-IP the guest operating system at all in the future.

A Brief History of Life before VMware SRM

    To really appreciate the impact of VMware's SRM, it's worth pausing for a moment to think about what life was like before virtualization and before VMware SRM was released. Until virtualization became popular, conventional DR meant dedicating physical equipment at the DR location on a one-to-one basis. So, for every business-critical server or service there was a duplicate at the DR location. By its nature, this was expensive and difficult to manage; the servers were only there as standbys waiting to be used if a disaster happened. For people who lacked those resources internally, it meant hiring out rack space at a commercial location, and if that included servers as well, it often meant the hardware being used was completely different from that at the physical location. Although DR is likely to remain a costly management headache, virtualization goes a long way toward reducing the financial and administrative penalties of DR planning. In the main, virtual machines are cheaper than physical machines. We can have many instances of software (Windows, for example) running on one piece of hardware, reducing the amount of rack space required for a DR location. We no longer need to worry about dissimilar hardware; as long as the hardware at the DR location supports VMware ESX, our precious time can be dedicated to getting the services we support up and running in the shortest time possible.

    One of the most common things I've heard in courses and at conferences from people who are new to virtualization is, among other things:

    "We're going to try virtualization in our DR location, before rolling it out into production."

    This is often the cautious approach of businesses that are adopting virtualization technologies for the first time. Whenever this is said to me I always tell the individual concerned to think about the consequences of what he's saying. In my view, once you go down the road of virtualizing your DR, it is almost inevitable that you will want to virtualize your production systems. This is the case for two main reasons. First, you will be so impressed and convinced by the merits of virtualization anyway that you will want to do it. Second, and more important in the context of this book, if your production environment is not already virtualized, how are you going to keep your DR locations synchronized with the primary location?

    There are currently a couple of ways to achieve this. You could rely solely on conventional backup and restore, but that won't be very slick or very quick. A better alternative might be to use some kind of physical-to-virtual conversion (P2V) technology. In recent years many of the P2V providers, such as Novell and Leostream, have repositioned their offerings as availability tools, the idea being that you use P2V software to keep the production environment synchronized with the DR location. These technologies do work, and there will be some merit in adopting this strategy, say, for services that must, for whatever reason, remain on a physical host at the primary location. But generally I am skeptical about this approach. I subscribe to the view that you should use the right tools for the right job; never use a wrench to do the work of a hammer. From its very inception and design you will discover flaws and problems, because you are using a tool for a purpose for which it was never designed. For me, P2V is P2V; it isn't about DR, although it can be reengineered to do this task. I guess the proof is in the quality of the reengineering. In the ideal VMware world, every workload would be virtualized. In 2010 we reached a tipping point where more new servers were virtual machines than physical machines. However, in terms of percentage it is still the case that, on average, only 30% of most people's infrastructure has been virtualized. So, at least for the mid-term, we will still need to think about how physical servers are incorporated into a virtualized DR plan.

    Another approach to this problem has been to virtualize production systems before you virtualize the DR location. By doing this you merely have to use your storage vendor's replication or snapshot technology to pipe the data files that make up a virtual machine (VMX, VMDK, NVRAM, log, snapshot, and/or swap files) to the DR location. Although this approach is much neater, it introduces a number of problems in itself, not least of which is getting up to speed with your storage vendor's replication technology and ensuring that enough bandwidth is available from the Protected Site to the Recovery Site to make it workable. Additionally, this introduces a management issue. In large corporations the guys who manage SRM may not necessarily be the guys who manage the storage layer. So a great deal of liaising, and sometimes cajoling, would have to take place to make these two teams speak and interact with each other effectively.

    But putting these very important storage considerations to one side for the moment, a lot of work would still need to be done at the virtualization layer to make this sing. The replicated virtual machines need to be registered on an ESX host at the Recovery Site, and associated with the correct folder, network, and resource pool at the destination. They must be contained within some kind of management system on which to be powered on, such as vCenter. And to power on the virtual machine, the metadata held within the VMX file might need to be modified by hand for each and every virtual machine. Once powered on (in the right order), their IP configuration might need modification. Although some of this could be scripted, it would take a great deal of time to create and verify those scripts. Additionally, as your production environment started to evolve, those scripts would need constant maintenance and revalidation. For organizations that create hundreds of virtual machines a month, this can quickly become unmanageable. It's worth saying that if your organization has already invested a lot of time in scripting this process and making a bespoke solution, you might find that SRM does not meet all your needs. This is a kind of truism: Any bespoke system created internally is always going to be more finely tuned to the business's requirements. The problem then becomes maintaining it, testing it, and proving to auditors that it works reliably.

    It was within this context that VMware engineers began working on the first release of SRM. They had a lofty goal: to create a push-button, automated DR system to greatly simplify the process. Personally, when I compare it to the alternatives that came before it, I'm convinced that of the plethora of management tools added to the VMware stable in recent years VMware SRM is the one with the clearest agenda and remit. People understand and appreciate its significance and importance. At last we can finally use the term "virtualizing DR" without it actually being a throwaway marketing term.

    If you want to learn more about this manual approach to DR, VMware has written a book about virtualizing DR called A Practical Guide to Business Continuity & Disaster Recovery with VMware Infrastructure. It is free and available online here:

    www.vmware.com/files/pdf/practical_guide_bcdr_vmb.pdf

    I recommend reading this guide, perhaps before reading this book. It has a much broader brief than mine, which is narrowly focused on the SRM product.

What Is Not a DR Technology?

    In my time of using VMware technologies, various features have come along which people often either confuse for, or try to engineer into being, a DR technology; in other words, they try to make a technology do something it wasn't originally designed to do. Personally, I'm in favor of using the right tools for the right job. Let's take each of these technologies in turn and try to make a case for their use in DR.

vMotion

    In my early days of using VMware I would often hear my clients say they intended to use vMotion as part of their DR plan. Most of them understood that such a statement could only be valid if the outage was in the category of a planned DR event, such as a power outage or the demolition of a nearby building. Increasingly, VMware and the network and storage vendors have been postulating the concept of long-distance vMotion for some time. In fact, one of the contributors to this book, Chad Sakac of EMC, had a session at VMworld San Francisco 2009 about this topic. Technically, it is possible to do vMotion across large distances, but the technical challenges are not to be underestimated or taken lightly, given the requirements of vMotion for shared storage and shared networking. We will no doubt get there in the end; it's the next logical step, especially if we want to see the move from an internal cloud to an external cloud become as easy as moving a VM from one ESX host in a blade enclosure to another. Currently, to do this you must shut down your VMs and cold-migrate them to your public cloud provider.

    But putting all this aside, I think it's important to say that VMware has never claimed that vMotion constitutes a DR technology, despite the FUD that emanates from its competitors. As an indication of how misunderstood both vMotion and the concept of what constitutes a DR location are, one of these clients said to me that he could carry out vMotion from his Protected Site to his Recovery Site. I asked him how far away the DR location was. He said it was a few hundred feet away. This kind of wonky thinking and misunderstanding will not get you very far down the road of an auditable and effective DR plan. The real usage of vMotion currently is being able to claim a maintenance window on an ESX host without affecting the uptime of the VMs within a site. Once coupled with VMware's Distributed Resource Scheduler (DRS) technology, vMotion also becomes an effective performance optimization technology. Going forward, it may indeed be easier to carry out a long-distance vMotion of VMs to avoid an impending disaster, but much will depend on the distance and scope of the disaster itself. Other things to consider are the number of VMs that must be moved, and the time it takes to complete that operation in an orderly and graceful manner.

VMware HA Clusters

    Occasionally, customers have asked me about the possibility of using VMware HA technology across two sites. Essentially, they are describing a stretched cluster concept. This is certainly possible, but it suffers from the same technical challenges that confront geo-based vMotion: access to shared storage and shared networking. There are certainly storage vendors that will be happy to assist you in achieving this configuration; examples include NetApp with its MetroCluster and EMC with its VPLEX technology. The operative word here is "metro." This type of clustering is often limited by distance (say, from one part of a city to another). So, as in my anecdote about my client, the distances involved may be too narrow to be regarded as a true DR location. When VMware designed HA, its goal was to be able to restart VMs on another ESX host. Its primary goal was merely to protect VMs from a failed ESX host, which is far from being a DR goal. HA was, in part, VMware's first attempt to address the "eggs in one basket" anxiety that came with many of the server consolidation projects we worked on in the early part of the past decade. Again, VMware has never made claims that HA clusters constitute a DR solution. Fundamentally, HA lacks the bits and pieces to make it work as a DR technology. For example, unlike SRM, there is really no way to order its power-on events or to halt a power-on event to allow manual operator intervention, and it doesn't contain a scripting component to allow you to automate residual reconfiguration when the VM gets started at the other site.

    The other concern I have is when customers try to combine technologies in a way that is not endorsed or QA'd by the vendor. For example, some folks think about overlaying a stretched VMware HA cluster on top of their SRM deployment. The theory is that they can get the best of both worlds. The trouble is that the requirements of stretched VMware HA and SRM are at odds with each other. In SRM the architecture demands two separate vCenters managing distinct ESX hosts. In contrast, VMware HA requires that the two or more hosts that make up an HA cluster be managed by just one vCenter. Now, I dare say that with a little bit of planning and forethought this configuration could be engineered. But remember, the real usage of VMware HA is to restart VMs when an ESX host fails within a site, something that most people would not regard as a DR event.

VMware Fault Tolerance

    VMware Fault Tolerance (FT) was a new feature of vSphere 4. It allows a primary VM on one host to be mirrored by a secondary VM on another ESX host. Everything that happens on the primary VM is replayed in "lockstep" on the secondary VM on the other ESX host. In the event of an ESX host outage, the secondary VM will immediately take over the primary's role. A modern CPU chipset is required to provide this functionality, together with two 1Gbps vmnics dedicated to the FT Logging network that is used to send the lockstep data to the secondary VM. FT scales to allow for up to four primary VMs and four secondary VMs per ESX host, and when it was first released it was limited to VMs with just one vCPU. VMware FT is really an extension of VMware HA (in fact, FT requires HA to be enabled on the cluster) that offers much better availability than HA, because there is no restart of the VM. As with HA, VMware FT has quite high requirements for shared networking and shared storage, along with additional requirements such as bandwidth and network redundancy. Critically, FT requires very low-latency links to maintain the lockstep functionality, and in most environments it will be cost-prohibitive to provide the bandwidth to protect the same number of VMs that SRM currently protects. The real usage of VMware FT is to provide a much better level of availability to a select number of VMs within a site than is currently offered by VMware HA.

Scalability for the Cloud

    As with all VMware products, each new release introduces increases in scalability. Quite often these enhancements are overlooked by industry analysts, which is rather disappointing. Early versions of SRM allowed you to protect a few hundred VMs, and SRM 4.0 allowed the administrator to protect up to 1,000 VMs per instance of SRM. That forced some large-scale customers to create "pods" of SRM configurations in order to protect the many thousands of VMs that they had. With SRM 5.0, the scalability numbers have jumped yet again. A single SRM 5.0 instance can protect up to 1,000 VMs, and can run up to 30 individual Recovery Plans at any one time. This compares very favorably to being able to run just three Recovery Plans at once in the previous release. Such advancements are absolutely critical to the long-term integration of SRM into cloud automation products, such as VMware's own vCloud Director. Without that scale it would be difficult to leverage the economies of scale that cloud computing brings, while still offering the protection that production and Tier 1 applications inevitably demand.

What Is VMware SRM?

    Currently, SRM is a DR automation tool. It automates the testing and invocation of disaster recovery (DR) or, as it is now called in the preferred parlance of the day, business continuity (BC), for virtual machines. Actually, it's more complicated than that. For many, DR is a procedural event: A disaster occurs and steps are required to get the business functional and up and running again. On the other hand, BC is more a strategic event, which is concerned with the long-term prospects of the business post-disaster, and it should include a plan for how the business might one day return to the primary site or carry on in another location entirely. Someone could write an entire book on this topic; indeed, books have been written along these lines, so I do not intend to ramble on about recovery time objectives (RTOs), recovery point objectives (RPOs), and maximum tolerable downtimes (MTDs); that's not really the subject of this book. In a nutshell, VMware SRM isn't a silver bullet for DR or BC, but a tool that facilitates those decision processes planned way before the disaster occurs. After all, your environment may only be 20% or 30% virtualized, and there will be important physical servers to consider as well. This book is about how to get up and running with VMware's SRM. I started this section with the word "currently." Whenever I do that, I'm giving you a hint that either the technology will change or I believe it will. Personally, I think VMware's long-term strategy will be to lose the R in SRM and for the product to evolve into a Site Management utility. This would enable people to move VMs from the internal/private cloud to an external/public cloud. It might also assist in datacenter moves from one geographical location to another, for example, because a lease on the datacenter might expire, and either it can't be renewed or it is too expensive to renew.

    With VMware SRM, if you lose your primary or Protected Site the goal is to be able to go to the secondary or Recovery Site: Click a button and find your VMs being powered on at the Recovery Site. To achieve this, your third-party storage vendor must provide an engine for replicating your VMs from the Protected Site to the Recovery Site, and your storage vendor will also provide a Storage Replication Adapter (SRA) which is installed on your SRM server.

    As replication or snapshots are an absolute requirement for SRM to work, I felt it was a good idea to begin by covering a couple of different storage arrays from the SRM perspective. This will give you a basic run-through on how to get the storage replication or snapshot piece working, especially if you are like me and would not classify yourself as a storage expert. This book does not constitute a replacement for good training and education in these technologies, ideally coming directly from the storage array vendor. If you are already confident with your particular vendor's storage array replication or snapshot features, you could decide to skip ahead to Chapter 7, Installing VMware SRM. Alternatively, if you're an SMB/SME or you are working in your own home lab, you may not have the luxury of access to array-based replication. If this is the case, I would heartily recommend that you skip ahead to Chapter 8, Configuring vSphere Replication (Optional).

    In terms of the initial setup, I will deliberately keep it simple, starting with a single LUN/volume replicated to another array. However, later on I will change the configuration so that I have multiple LUNs/volumes with virtual machines that have virtual disks on those LUNs. Clearly, managing replication frequency will be important. If we have multiple VMDK files on multiple LUNs/volumes, the parts of the VM could easily become unsynchronized or even missed altogether in the replication strategy, thus creating half-baked, half-complete VMs at the DR location. Additionally, at the VMware ESX host level, if you use VMFS extents but fail to include all the LUNs/volumes that make up those extents, the extent will be broken at the recovery location and the files making up the VM will be corrupted. So, how you use LUNs and where you store your VMs can be more complicated than this simple example will at first allow. This doesn't even take into account the fact that the different virtual disks that make up a VM can be located on different LUNs/volumes with radically divergent I/O capabilities. Our focus is on VMware SRM, not storage. With this said, however, a well-thought-out storage and replication structure is fundamental to an implementation of SRM.
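To make that risk concrete, here is a minimal, hypothetical sketch (not part of SRM or any vSphere API; all names are invented for illustration) of the kind of sanity check an administrator might script before enabling replication: given a mapping of each VM's virtual disks to datastores, and of datastores to replicated consistency groups, flag any VM whose disks span more than one group or sit on an unreplicated datastore, since such a VM could arrive at the Recovery Site half-complete.

```python
# Hypothetical inventory data: which datastore holds each of a VM's VMDKs,
# and which replication consistency group (if any) each datastore belongs to.
# All names are illustrative, not real SRM or vSphere objects.

def find_at_risk_vms(vm_disks, datastore_group):
    """Return VMs whose disks span consistency groups or sit on
    unreplicated datastores, mapped to the groups involved."""
    at_risk = {}
    for vm, datastores in vm_disks.items():
        # Collect the set of groups this VM's disks fall into;
        # None marks a datastore with no replication group at all.
        groups = {datastore_group.get(ds) for ds in datastores}
        if None in groups or len(groups) > 1:
            at_risk[vm] = sorted(str(g) for g in groups)
    return at_risk

if __name__ == "__main__":
    vm_disks = {
        "web01": ["lun_a", "lun_a"],    # both disks in one group: fine
        "db01": ["lun_a", "lun_b"],     # disks span two groups: risky
        "test01": ["local_ds"],         # unreplicated local datastore: risky
    }
    datastore_group = {"lun_a": "group1", "lun_b": "group2"}
    print(find_at_risk_vms(vm_disks, datastore_group))
```

In a real environment the two input mappings would come from the vSphere inventory and the array's management tools; the point is simply that the check is mechanical once that data is gathered.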

What about File Level Consistency?

    One question you will (and should) ask is what level of consistency the recovery will have. This is very easy to answer: the same level of consistency you would have had if you had not virtualized your DR. Through the storage layer we could be replicating the virtual machines from one site to another synchronously. This means the data held at both sites is going to be of a very high quality. However, what is not being synchronized is the memory state of your servers at the production location. This means that if a real disaster occurs, that memory state will be lost. So, whatever happens there will be some kind of data loss, unless your storage vendor has a way to quiesce the applications and services inside your virtual machine.

    So, although you may well be able to power on virtual machines in a recovery location, you may still need to use your application vendor's tools to repair these systems from this crash-consistent state; indeed, if these vendor tools fail you may be forced to repair the systems with something called a backup. With applications such as Microsoft SQL Server and Exchange this could potentially take a long time, depending on whether the data is inconsistent and on the quantity to be checked and then repaired. You should really factor this issue into your recovery time objectives. The first thing to ensure in your DR plan is that you have an effective backup and restore strategy to handle possible data corruption and virus attacks. If you rely totally on data replication you might find that you're bitten by the old IT adage of "Garbage in equals garbage out."

Principles of Storage Management and Replication

    In Chapter 2, Getting Started with Dell EqualLogic Replication, I will document a series of different storage systems in detail. Before I do that, I want to write very briefly and generically about how the vendors handle storage management, and how they commonly manage duplication of data from one location to another. By necessity, the following section will be very vanilla and not vendor-specific.

    When I started writing the first edition of this book I had some very ambitious (perhaps outlandish) hopes that I would be able to cover the basic configuration of every storage vendor and explain how to get VMware's SRM communicating with them. However, after a short time I recognized how unfeasible and unrealistic this ambition was! After all, this is a book about VMware's SRM. Still, storage and replication (not just storage) are an absolute requirement for VMware's SRM to function, so I would feel it remiss of me not to at least outline some basic concepts and caveats for those for whom storage is not their daily meat and drink.

Caveat #1: All Storage Management Systems Are the Same

    I know this is a very sweeping statement that my storage vendor friends would widely disagree with. But in essence, all storage management systems are the same; it's just that storage vendors confuse the hell out of everyone (and me in particular) by using their own vendor-specific terms. The storage vendors have never gotten together and agreed on terms. So, what some vendors call a "storage group," others call a "device group" and yet others call a "volume group." Likewise, for some a volume is a LUN, but for others volumes are collections of LUNs. Indeed, some storage vendors think LUN is some kind of dirty word, and storage teams will look at you like you are from Planet Zog if you use it. In short, download the documentation from your storage vendor, and immerse yourself in the company's terms and language so that they become almost second nature to you. This will stop you from feeling confused, and will reduce the number of times you put your foot in inappropriate places when discussing data replication concerns with your storage guys.

Caveat #2: All Storage Vendors Sell Replication

    All storage vendors sell replication. In fact, they may well support three different types, and a fourth legacy type that they inherited from a previous development or acquisition, and, oh, they will have their own unique trademarked product names! Some vendors will not implement or support all their types of replication with VMware SRM; therefore, you may have a license for replication type A, but your vendor only supports types B, C, and D. This may force you to upgrade your licenses, firmware, and management systems to support either type B, C, or D. Indeed, in some cases you may need a combination of features, forcing you to buy types B and C, or C and D. In fairness to the storage vendors, as SRM has matured you will find that many vendors support all the different types of replication, and this has mainly been triggered by responding to competitors that do as well.

    In a nutshell, it could cost you money to switch to the right type of replication. Alternatively, you might find that although the type of replication you have is supported, it isn't the most efficient from an I/O or storage capacity perspective. A good example of this situation is with EMC's CLARiiON systems. On the CLARiiON system you can use a replication technology called MirrorView. In 2008, MirrorView was supported by EMC with VMware's SRM, but only in an asynchronous mode, not in a synchronous mode. However, by the end of 2008 this support changed. This was significant to EMC customers because of the practical limits imposed by synchronous replication. Although synchronous replication is highly desirable, it is frequently limited by the distance between the Protected and Recovery Sites. In short, the Recovery Site is perhaps too close to the Protected Site to be regarded as a true DR location. At the upper level, synchronous replication's maximum distance is in the range of 400 to 450 kilometers (248.5 to 280 miles); however, in practice the real-world distances can be as small as 50 to 60 kilometers (31 to 37 miles). The upshot of this limitation is that without asynchronous replication it becomes increasingly difficult to class the Recovery Site as a genuine DR location. Distance is clearly relative; in the United States these limitations become especially significant, as the recent hurricanes have demonstrated, but in my postage-stamp-sized country they are perhaps less pressing!

    If you're looking for another example of these vendor-specific support differences, HP EVAs are supported with SRM; however, you must have licenses for HP's Business Copy feature and its Continuous Access technology for this feature and technology to function properly. The Business Copy license is only used when snapshots are created while testing an SRM Recovery Plan. The Continuous Access license enables the replication of what HP rather confusingly calls vdisks in the storage groups.

Caveat #3: Read the Manual

    Storage management systems have lots of containers which hold other containers and so on. This means the system can be managed very flexibly. You can think of this as being a bit like Microsoft's rich and varied group structure options in Active Directory. Beware that sometimes this means storage replication is limited to a particular type of container or level. This means you or your storage team has to determine very carefully how you will group your LUNs, to ensure that you only replicate what you need to and that your replication process doesn't, in itself, cause corruption by mismatched replication schedules. Critically, some storage vendors have very specific requirements about the relationships among these various containers when used with VMware SRM. Additionally, some storage vendors impose naming requirements for the names of these objects and snapshots. If you deviate from these recommendations you might find that you can't even get SRM to communicate with your storage correctly. In a nutshell, it's a combination of the right type of replication and the right management structures that will make it work, and you can only know that by consulting the documentation provided by your storage vendor. In short, RTFM!

    Now that we have these caveats in place, I want to map out the structures of how most storage vendors' systems work, and then outline some storage planning considerations. I will initially use non-vendor-specific terms. Figure 1.1 is a diagram of a storage array that contains many drives.

    Figure 1.1 A storage array with many groupings

    Here is an explanation of the callouts in the figure.

    A  This is the array you are using. Whether this is Fibre Channel, iSCSI, or NFS isn't dreadfully important in this case.

    B  This shows that even before allowing access, many storage vendors allow disks in the array to be grouped. For example, NetApp refers to this grouping as a disk aggregate, and this is often your first opportunity to set a default RAID level.

    C  This is another group, referred to by some vendors as a storage group, device group, or volume group.

    D  Within these groups we can have blocks of storage, and most vendors do call these LUNs. With some vendors they stop at this point, and replication is enabled at group type C, indicated by arrow E. In this case every LUN within this group is replicated to the other array, and if this was incorrectly planned you might find LUNs that did not need replicating were being unnecessarily duplicated to the recovery location, wasting valuable bandwidth and space.

    E  Many vendors allow LUNs/volumes to be replicated from one location to another, each with its own independent schedule. This offers complete flexibility, but there is the danger of inconsistencies occurring between the data sets.

    F  Some storage vendors allow for another subgroup. These are sometimes referred to as recovery groups, protected groups, contingency groups, or consistency groups. In this case only LUNs contained in group E are replicated to the other array. LUNs not included in subgroup E are not replicated. If you like, group C is the rule, but group E represents an exception to the rule.

    G  This is a group of ESX hosts that allow access to either group C or group E, depending on what the array vendor supports. These ESX hosts will be added to group G by either the Fibre Channel WWN, iSCSI IQN, or IP address or hostname. The vendors that develop their SRA (software that allows SRM to communicate to the storage layer) to work with VMware's SRM often have their own rules and regulations about the creation of these groupings; for instance, they may state that no group E can be a member of more than one group C at any time. This can result in the SRA failing to return all the LUNs expected back to the ESX hosts. Some vendors' SRAs automatically allow the hosts to access the replicated LUNs/volumes at the Recovery Site array and others do not, and you may have to allocate these units of storage to the ESX host prior to doing any testing.
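The rule-and-exception relationship between groups C and E can be sketched as a small model. This is purely illustrative; the group names, LUN names, and data structure are assumptions for the example and do not correspond to any vendor's actual API.

```python
# Hypothetical model: only LUNs inside a replicated consistency group
# (the "group E" exception described above) actually reach the Recovery Site.
REPLICATED_CONSISTENCY_GROUPS = {
    "cg-exchange": {"lun-exch-boot", "lun-exch-data"},
    "cg-sql": {"lun-sql-boot", "lun-sql-data"},
}

def unreplicated_luns(required_luns):
    """Return the subset of required_luns not covered by any replicated group."""
    covered = set().union(*REPLICATED_CONSISTENCY_GROUPS.values())
    return sorted(set(required_luns) - covered)

# A VM (or VMFS extent) spanning these LUNs would lose data at the
# Recovery Site, because one of its LUNs is never replicated:
print(unreplicated_luns({"lun-exch-boot", "lun-temp-01"}))  # ['lun-temp-01']
```

Running a check like this against your own inventory before enabling SRM protection is one way to catch a LUN that silently falls outside the replication rules.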

    This grouping structure can have some important consequences. A good example of this is when you place virtual machines on multiple LUNs. This is a recommendation by VMware generally for performance reasons, as it can allow different spindles and RAID levels to be adopted. If it is incorrectly planned, you could cause corruption of the virtual machines.

    In this case, I have created a simple model where there is just one storage array at one location. Of course, in a large corporate environment it is very likely that multiple arrays exist, offering multiple disk tiers with variable qualities of disk IOPS. Frequently, these multiple arrays themselves can be grouped together to create a collection of arrays that can be managed as one unit. A good example of this is the Dell EqualLogic Group Manager, in which the first array creates a group to which many arrays can be added. In the Dell EqualLogic SRA configuration the Group IP is used as opposed to a specific IP address of a particular array.

    In Figure 1.2 the two virtual disks that make up the virtual machine (SCSI 0:0 and SCSI 0:1) have been split across two LUNs in two different groups. The schedule for one group has a latency of 15 minutes, whereas the other has no latency at all. In this case, we could potentially get a corruption of log files, date stamps, and file creation, as the virtual machine's operating system would not be recovered at the same state as the file data.

    Figure 1.2 A VM with multiple virtual disks (SCSI 0:0 and SCSI 0:1) stored on multiple datastores, each with a different replication frequency
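The hazard in Figure 1.2 boils down to one question: do all of a VM's virtual disks sit on datastores replicated at the same interval? A minimal sketch of that check follows; the datastore names and schedule mapping are assumptions for illustration, not data that vSphere or SRM exposes in this form.

```python
# Hypothetical mapping of datastore -> replication interval in minutes
# (0 = synchronous). In practice you would build this from your
# storage team's replication schedules.
DATASTORE_SCHEDULE_MIN = {"ds-gold": 0, "ds-silver": 15}

def schedule_mismatch(vm_disks):
    """vm_disks maps virtual disk label -> datastore; flag mixed intervals."""
    intervals = {DATASTORE_SCHEDULE_MIN[ds] for ds in vm_disks.values()}
    return len(intervals) > 1

vm = {"SCSI 0:0": "ds-gold", "SCSI 0:1": "ds-silver"}
print(schedule_mismatch(vm))  # True: the disks would recover to different points in time
```

A VM that trips this check is exactly the Figure 1.2 scenario: its OS disk and data disk would be recovered in inconsistent states.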

    We can see another example of this in Figure 1.3 if you choose to use VMFS extents. As you may know, ESX has the capability to add space to a VMFS volume that is either running out of capacity or breaking through the 2TB limitation on the maximum size of a single VMFS volume. This is achieved by spanning a VMFS volume across multiple blocks of storage or LUNs. Although in ESX 5 the maximum size for a single VMFS version 5 volume has increased, you might still have extents created from previous installations of ESX.

    Figure 1.3 In this scenario the two virtual disks are held on a VMFS extent.

    In this case, the problem is being caused by not storing the virtual machine on two separate LUNs in two separate groups. The impression from the vSphere client would be that the virtual machine is being stored at one VMFS datastore. Unless you were looking very closely at the storage section of the vSphere client you might not notice that the virtual machine's files were being spanned across two LUNs in two different groups. This wouldn't just cause a problem with the virtual machine; more seriously, it would completely undermine the integrity of the VMFS extent. This being said, VMFS extents are generally frowned upon by the VMware community at large, but they are occasionally used as a temporary band-aid to fix a problem in the short term. I would ask you this question: How often in IT does a band-aid remain in place to fix something weeks, months, or years beyond the time frame we originally agreed? However, I do recognize that some folks are given such small volume sizes by their storage teams that they have no option but to use extents in this manner. This is often caused by quite harsh policies imposed by the storage team in an effort to save space. The reality is that if the storage admins only give you 50GB LUNs, you find yourself asking for ten of them, to create a 500GB extent! If you do, then fair enough, but please give due diligence to making sure all the LUNs that comprise a VMFS extent are being replicated. My only message is to proceed with caution; otherwise, catastrophic situations could occur. This lack of awareness could mean you create an extent which includes a LUN which isn't even being replicated. The result would be a corrupted VMFS volume at the destination. Of course, if you are using the new VR technology this issue is significantly diminished, and indeed the complexity around having to use extents could be mitigated by adopting VR in this scenario.

    Clearly, there will be times when you feel pulled in two directions. For ultimate flexibility, one group with one LUN allows you to control the replication cycles. First, if you intend to take this strategy, beware of virtual machine files spanned across multiple LUNs and VMFS extents, because different replication cycles would cause corruption. Beware also that the people using vSphere (say, your average server guy who only knows how to make a new virtual machine) may have little awareness of the replication structure underneath. Second, if you go for many LUNs being contained in a single group, beware that this offers less flexibility; if you're not careful you may include LUNs which do not need replicating or limit your capacity to replicate at the frequency you need.

    These storage management issues are going to be a tough nut to crack, because no one strategy will suit everyone. But I imagine some organizations could have three groups designed with replication in mind: One might use synchronous replication, and the other two might have intervals of 30 minutes and 60 minutes; the frequency depends greatly on your recovery point objectives. This organization would then create virtual machines on the right VMFS volumes that are being replicated with the right frequency suited for their recovery needs. I think enforcing this strategy would be tricky. How would our virtual machine administrators know the correct VMFS volumes on which to create the virtual machines? Fortunately, in vSphere we are able to create folders that contain volumes and set permissions on those. It is possible to guide the people who create VMFS volumes to store them in the correct locations.

    One method would be to create storage groups in the array management software that are mapped to different virtual machines and their functionality. The VMFS volume names would reflect their different purposes. Additionally, in VMware SRM we can create Protection Groups that could map directly to these VMFS volumes and their storage groups in the array. The simple diagram in Figure 1.4 illustrates this proposed approach.

    Figure 1.4 Protection Groups approach

    In this case, I could have two Protection Groups in VMware SRM: one for the boot/data VMFS volumes for Exchange, and one for the boot/data VMFS volumes for SQL. This would also allow for three types of SRM Recovery Plans: a Recovery Plan to fail over just Exchange, a Recovery Plan to fail over just SQL, and a Recovery Plan to fail over all the virtual machines.

    Summary

    Well, that's it for this brief introduction to SRM. Before we dive into SRM, I want to spend the next five chapters looking at the configuration of this very same storage layer, to make sure it is fit for use with the SRM product. I will cover each vendor alphabetically (EMC, HP, NetApp) to avoid being accused of vendor bias. In time I hope that other vendors will step forward to add additional PDFs to cover the configuration of their storage systems too. Please don't see these chapters as utterly definitive guides to these storage vendors' systems. This is an SRM book after all, and the emphasis is squarely on SRM. If you are comfortable with your particular storage vendor's replication technologies you could bypass the next few chapters and head directly to Chapter 6, Getting Started with NetApp SnapMirror. Alternatively, you could jump to the chapter that reflects your storage array and then head off to Chapter 7. I don't expect you to read the next four chapters unless you're a consultant who needs to be familiar with as many different types of replication as possible, or you're a masochist. (With that said, some folks say that being a consultant and being a masochist are much the same thing...)

    Chapter 2. Getting Started with Dell EqualLogic Replication

    In this chapter you will learn the basics of how to configure replication with Dell EqualLogic. The chapter is not intended to be the definitive last word on all the best practices or caveats associated with the procedures outlined. Instead, it's intended to be a quick-start guide outlining the minimum requirements for testing and running a Recovery Plan with SRM; for your particular requirements, you should at all times consult further documentation from Dell, and if you have them, your storage teams. Additionally, I've chosen to cover configuration of the VMware iSCSI initiator to more closely align the tasks carried out in the storage layer with the ESX host itself.

    Before I begin, I want to clearly describe my physical configuration. It's not my intention to endorse a particular configuration, but rather to provide some background that will clarify the step-by-step processes I'll be documenting. I have two EqualLogic systems in my rack, each in its own group. In the real world you would likely have many arrays in each group, and a group for each site where you have EqualLogic arrays. In my Protected Site, I have an EqualLogic PS6000XV which has sixteen 15K SAS drives, and a PS4000E which has sixteen SATA drives, and both are configured with the same management system: the EqualLogic Group Manager. Of course, these arrays are available in many different disk configurations to suit users' storage and IOPS needs, including SATA, SAS, and SSD. Apart from the different types of disks in the arrays, the controllers at the back are slightly different from each other, with the PS6000 offering more network ports. The PS6010/6510 is available with 10GB interfaces as well. From a networking perspective, the EqualLogic automatically load-balances I/O across the available NICs in the controller, and there is no need to configure any special NIC teaming or NIC bonds as is the case with other systems. On the whole, I find the EqualLogic systems very easy to set up and configure. Figure 2.1 shows the PS4000 at the top and the PS6000 at the bottom. If the figure were in color, you would see that the controls are shown in purple and in green to make the system easy to identify at the back of the rack, with each system offering a rich array of uplink ports and speeds.

    Figure 2.1 The rear of these two Dell EqualLogic systems is color-coded to make them easy to distinguish from the rear of the racks.

    In the screen grab of the Group Manager shown in Figure 2.2, you can see that I have two EqualLogic systems. Each member or array has been added into its own group, and of course it's possible to have many members in each group. I configured these group names during a discovery process when the arrays were first racked up and powered on, using a utility called the Remote Setup Wizard. The Remote Setup Wizard is part of the Host Integration Tools for Windows, and a CLI version is available as part of the Host Integration Tools for Linux. This setup wizard discovers the arrays on the network, and then allows you to set them up with a friendly name and IP address and add them to either a new group or an existing group.

    Figure 2.2 A single Dell EqualLogic array (or member) in a single group called New-York-Group

    In my configuration I have two members (new-york-eql01 and new-jersey-eql01) that have each been added to their own group (New-York-Group and New-Jersey-Group).

    Creating an EqualLogic iSCSI Volume

    Creating a new volume accessible to ESXi hosts is a very easy process in the EqualLogic Group Manager. The main step to remember is to enable multiple access so that more than one ESXi host can mount and access the volume in question. For some reason, in my work it's the one blindingly obvious step I sometimes forget to perform, perhaps because I'm so overly focused on making sure I input my IQNs correctly. You can kick off the wizard to create a new volume, and see the volume information in detail, in the Volumes pane in the Group Manager (see Figure 2.3).

    Figure 2.3 Selecting the Volumes node shows existing volumes; selecting Create volume lets you carve new chunks of storage to the ESX host.

    To create a volume, follow these steps.

    1. On the Step 1 Volume Settings page of the wizard (see Figure 2.4), enter a friendly name for the volume and select a storage pool.

    Figure 2.4 Naming a volume and selecting a storage pool

    For my volume name I chose virtualmachines, which perhaps isn't the best naming convention; you might prefer to label your volumes with a convention that allows you to indicate which ESX host cluster has access to the datastore. For the storage pool, in my case there is just one pool, called default, which contains all my drives, but it is possible to have many storage pools with different RAID levels. It is also possible to have multiple RAID levels within one pool, and this allows the administrator to designate a preference for a volume to reside on a RAID type that's best suited for that application's IOPS demands or resiliency. A recent firmware update from Dell enhanced the EqualLogic arrays' performance load-balancing algorithms to allow for the automatic placement and relocation of data in volumes within and between EqualLogic arrays. This storage load balancing (conceptually, a storage version of DRS) helps to avoid performance hits by redistributing very active data to less heavily used array resources.

    2. On the Step 2 Space page of the wizard (see Figure 2.5), set the size of the volume, and whether it will be fully allocated from the storage pool or thinly provisioned. Also, reserve an allocation of free space for any snapshots you choose to take.

    Figure 2.5 EqualLogic has, for some time, supported thin provisioning that can help with volume sizing questions.

    The allocation here is conservatively set as a default of 100%. Think of this value as just a starting point which you can change at any time. If you create a 500GB LUN, 500GB would be reserved for snapshot data; the configuration assumes that every block might change. You might wish to lower this percentage value to something based on the number of changes you envisage, such as a range from 10% to 30%. It is possible to accept the default at this stage, and change it later once you have a better handle on your storage consumption over time.

    Remember, this allocation of snapshot space will not influence your day-to-day usage of VMware SRM. This reservation of snapshot space comes from an allocation that is separate and distinct from that used by VMware SRM. So it's entirely possible to set this as 0% on volumes where you don't envisage yourself creating snapshots of your own, as would be the case with archive or test and development volumes.
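The reserve math above is worth making concrete. The sketch below simply restates the arithmetic of the snapshot reserve percentage; the function name and the 20% figure are illustrative choices, not EqualLogic defaults.

```python
# Worked example of the snapshot reserve arithmetic described above.
def snapshot_reserve_gb(volume_gb, reserve_pct):
    """Space set aside for snapshots, on top of the volume's own allocation."""
    return volume_gb * reserve_pct / 100

print(snapshot_reserve_gb(500, 100))  # 500.0 -- the default assumes every block changes
print(snapshot_reserve_gb(500, 20))   # 100.0 -- a tuned value based on observed change rate
```

In other words, a 500GB LUN at the 100% default consumes a full terabyte of pool capacity once the reserve is counted, which is why trimming the percentage once you know your change rate can be worthwhile.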

    3. On the Step 3 iSCSI Access page of the wizard (see Figure 2.6), set the access control for the volume.

    Figure 2.6 Setting the access control to prevent unauthorized servers from accidentally connecting to a volume intended for another server

    This prevents unauthorized servers from accidentally connecting to a volume intended for another server. You can set this using CHAP, IP address, or iSCSI IQN. In the Access type section of the page we are enabling simultaneous connections on the volume. At this point you can only add one entry at a time, but later on you can add more IQNs to represent your multiple ESXi hosts that will need to access the volume within the VMware HA/DRS cluster.

    4. Once the volume is created, use the volume's Access tab and the Add button to add more IQNs/IPs or CHAP usernames representing your ESXi hosts (see Figure 2.7).

    Figure 2.7 As I only have two ESX hosts, it was easy to cut and paste to add the IQNs, and merely adjust the alias value after the colon.
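The cut-and-paste step in Figure 2.7 can be sketched as follows. The base prefix shown is the usual default for the ESXi software initiator, but the per-host suffixes here are made-up examples; always copy the real IQNs from each host's storage adapter properties rather than generating them.

```python
# Building one access-list entry per host from a common IQN prefix.
# BASE_IQN matches the typical ESXi software initiator default; the
# host aliases/suffixes below are hypothetical placeholders.
BASE_IQN = "iqn.1998-01.com.vmware"

def access_list(host_aliases):
    """Return one iSCSI access-list entry per host alias."""
    return [f"{BASE_IQN}:{alias}" for alias in host_aliases]

for entry in access_list(["esx1-1a2b3c4d", "esx2-5e6f7a8b"]):
    print(entry)
```

This mirrors the manual process exactly: the prefix stays constant and only the alias after the colon changes per host.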

    Granting ESXi Host Access to the EqualLogic iSCSI Volume

    Now that we have created the iSCSI target it's a good idea to enable the software iSCSI initiator on the ESXi hosts. If you have a dedicated iSCSI hardware adapter you can configure your IP settings and IQN directly on the card. One advantage of this is that if you wipe your ESXi host, your iSCSI settings remain on the card; however, such cards are quite pricey. Many VMware customers prefer to use the ESXi host's iSCSI software initiator. The iSCSI stack in ESXi 5 has recently been overhauled, and it is now easier to set up and offers better performance. The following instructions explain how to set up the iSCSI stack to connect to the EqualLogic iSCSI volume we just created. Just as with the other storage vendors, it's possible to install EqualLogic's Multipathing Extension Module (MEM), which offers intelligent load balancing across your VMkernel ports for iSCSI.

    Before you enable the software initiator/adapter in the ESXi host, you will need to create a VMkernel port group with the correct IP data to communicate with the EqualLogic iSCSI volume. Figure 2.8 shows my configuration for esx1 and esx2; notice that the vSwitch has two NICs for fault tolerance. In the figure, I'm using ESXi Standard vSwitches (SvSwitches), but there's nothing to stop you from using Distributed vSwitches (DvSwitches) if you have access to them. Personally, I prefer to reserve the DvSwitch for virtual machine networking, and use the SvSwitch for any ESXi host-specific networking tasks. Remember, ESXi 5 introduced a new iSCSI port-binding policy feature that allows you to confirm that multipathing settings are correct within ESXi 5. In my case, I created a single SvSwitch with two VMkernel ports, each with its own unique IP configuration. On the properties of each port group (IP-Storage1 and IP-Storage2) I modified the NIC teaming policy such that IP-Storage1 had dedicated access to vmnic2 and IP-Storage2 had dedicated access to vmnic3. This configuration allows for true multipathing to the iSCSI volume for both load-balancing and redundancy purposes.

    Figure 2.8 In this configuration for esx1 and esx2, the vSwitch has two NICs for fault tolerance.

    Before configuring the VMware software initiator/adapter, you might wish to confirm that you can communicate with the EqualLogic array by running a simple test using ping and vmkping against the group IP address. Additionally, you might wish to confirm that there are no errors or warnings on the VMkernel port groups you intend to use in the iSCSI Port Binding setting of the Port Properties, as shown in Figure 2.9.

    Figure 2.9 iSCSI port binding is enabled due to the correct configuration of the ESX virtual switch.
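As a supplement to the ping/vmkping test above, you can also probe the iSCSI TCP port itself from a management station. This is only a coarse reachability check under the assumption that the group IP is routable from where you run it; vmkping from the ESXi host remains the authoritative test, since it exercises the actual VMkernel ports.

```python
# Quick probe: can we open a TCP connection to the group IP on the
# standard iSCSI port (3260)? Uses only the Python standard library.
import socket

def iscsi_port_open(group_ip, port=3260, timeout=2.0):
    """Return True if a TCP connection to the target succeeds within timeout."""
    try:
        with socket.create_connection((group_ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (the group IP shown is specific to your environment):
# print(iscsi_port_open("172.168.3.69"))
```

A False result here points at basic networking (VLANs, routing, firewalls) rather than anything in the ESXi iSCSI configuration itself.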

    In ESXi 5 you should not need to manually open the iSCSI software TCP port on the ESXi firewall. The port number used by iSCSI, which is TCP port 3260, should be automatically opened. However, in previous releases of ESX this sometimes was not done, so I recommend that you confirm that the port is opened, just in case.

    1. In vCenter, select the ESXi host and the Configuration tab.

    2. In the Software tab, select the Security Profile link.

    3. In the Firewall category, click the Properties link.

    4. In the Firewall Properties dialog box, open the TCP port (3260) for the Software iSCSI Client (see Figure 2.10).

    Figure 2.10 By default, ESXi 5 opens iSCSI port 3260 automatically.

    5. Add the iSCSI software adapter. In previous releases of ESXi the adapter would be generated by default, even if it wasn't needed. The new model for iSCSI on ESXi 5 allows for better control over its configuration. In the Hardware pane, click the Storage Adapters link and then click Add to create the iSCSI software adapter, as shown in Figure 2.11.

    Figure 2.11 In ESXi 5 you m