VMworld 2011 (BCO3276)

BCO3276 Disaster Recovery and Site Migration with Site Recovery Manager: Customer Experiences from Around the World
Gil Haberman, Product Marketing Manager, Business Continuity and Disaster Recovery, VMware, Inc.
Alan Baird, VMware, Inc.
Christopher Wells, TÜV Rheinland Japan Ltd.
Paul Schlosser, VMware, Inc.
Robert Busillo, Independence Blue Cross

description

In this session, we heard customers describe how they faced some of the biggest DR challenges ever. We heard how Site Recovery Manager was used in Japan after the Great East Japan Earthquake of March 2011 and in New Zealand after the Christchurch earthquake. We also learned about a case in which Site Recovery Manager was used for site migration.

Transcript of VMworld 2011 (BCO3276)

1. BCO3276 Disaster Recovery and Site Migration with Site Recovery Manager: Customer Experiences from Around the World

2. Disclaimer
This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.

3. Agenda
• SRM and vSphere for Simple and Reliable DR
• TÜV Rheinland, Japan
• Mainfreight, New Zealand
• Independence Blue Cross, USA

4. SRM and vSphere for Simple and Reliable DR

5. Disasters Happen. Do You Need Protection?
• 43% of companies experiencing disasters never re-open, and 29% close within two years. (McGladrey and Pullen)
• 93% of businesses that lost their data center for 10 days went bankrupt within one year. (National Archives & Records Administration)
• 40% of all companies that experience a major disaster will go out of business if they cannot gain access to their data within 24 hours. (Gartner)
• Top executives say recovery takes 10 hours; IT managers say up to 30 hours. (Harris Interactive)

6. vCenter Site Recovery Manager Ensures Simple, Reliable DR
Site Recovery Manager complements vSphere to provide the simplest and most reliable disaster protection and site migration for all applications.
[Diagram: Site A (Primary) and Site B (Recovery), each running VMware vCenter Server, Site Recovery Manager and VMware vSphere on servers, with replication between the sites]
• Provide cost-efficient replication of applications to the failover site
  - Built-in vSphere Replication
  - Broad support for storage-based replication
• Simplify management of recovery and migration plans
  - Replace manual runbooks with centralized recovery plans
  - From weeks to minutes to set up a new plan
• Automate failover and migration processes for reliable recovery
  - Enable frequent non-disruptive testing
  - Ensure fast, automated failover
  - Automate failback processes

7. SRM Momentum
• Introduced in Q2 2008
• 125,000+ units sold
• 5,000+ customers
• 50% annual growth in 2010
"If your organization is already taking advantage of virtualization, then adding Site Recovery Manager to handle disaster recovery is a no-brainer." (Jerry Wilkin, Senior Systems Administrator, Dayton Superior Corp.)

8. What's New in Site Recovery Manager 5.0?
Goals: expand DR coverage to Tier 2 apps and smaller sites; streamline planned migrations (for disaster avoidance, planned maintenance, …)
• vSphere Replication
  - Bundled with SRM at no additional cost
  - Provides simple, cost-efficient replication between vSphere clusters
• Automated failback
  - Bi-directional recovery plans
  - Automates failback to the original site
• Planned migration
  - New workflow that can be applied to any recovery plan
  - Ensures application-consistent migrations of virtual machines with no data loss
• Others
  - More granular control over VM startup order
  - Protection-side APIs
  - IPv6 support

9. Beyond DR: Disaster Avoidance and Planned Migrations
Three typical use cases for SRM:
• Disaster Failover
  - Recover from unexpected site failure (full or partial)
  - The most critical but least frequent use case
  - Unexpected site failures do not happen often; when they do, fast recovery is critical to the business
• Disaster Avoidance
  - Anticipate potential datacenter outages, for example hurricanes, floods, forced evacuation, etc.
  - Initiate preventive failover for smooth migration
  - Leverage SRM planned migration to ensure no data loss
  - Automated failback enables easy return to the original site
• Planned Migration
  - The most frequent SRM use case
  - Planned datacenter maintenance; global load balancing
  - Streamline routine migrations across sites
  - Test to minimize risk
  - Execute partial failovers
  - Leverage SRM planned migration to ensure no data loss
  - Automated failback enables bi-directional migrations

10. TÜV Rheinland

11. Background
• TÜV Rheinland was started in Germany in 1872 to perform safety testing of steam pressure vessels.
• Today TÜV Rheinland is active in 61 countries and 39 different business fields.
• Technical certification of a wide range of technology products and services, for example PV cells, X-ray machines, photocopiers, computer monitors, and computer mice/keyboards.
• Also performs Business Continuity Management, Data Protection Management, Information Security and ITIL services.

12. Justification
• Japan's high seismic activity.
• Already had infrastructure at more than one location.
• Services hosted for external customers required specific SLAs.
• Simplify the difficult process of disaster recovery.

13. Status Quo
• Before the earthquake, companies were using physical servers at their DR site, or had no DR site at all!
• Companies in Japan are now conscious of the need for DR and BCP solutions.
• Many Japanese VMware customers are only familiar with the vSphere base product, not complementary solutions.
• VMware is now more actively marketing the SRM products as a result of the recent earthquake.

14. History
• Prior to SRM, the DR process was manual.
• We had already implemented SAN replication, so running SRM was the next logical step.
• DR testing was non-existent due to the manual overhead involved.
• Leveraged VMware snapshots to reduce RTO during failback.

15. Implementation
• Met with VMware and a local reseller for guidance.
• Set up a POC and learned the product, especially with the help of official documentation and books by third-party authors.
• Performed tests of the recovery plan.
• Leveraged an IP address mapping CSV file.
• 3-4 months later, put the system into production.

16. Use Cases
• General use of VMware products helps conserve power (useful during power shortages).
• Shift workloads from areas under power consumption constraints/reductions to unaffected areas.
• Typical DR protection between the Eastern and Western Japan offices.
• Temporary failover to the remote site for planned power outages (once per year).

17. Disaster & Aftermath
• On March 11th at 2:46 PM JST, our disaster recovery plan went into motion.
• Immediately following the initial shock, systems were functional.
• Performed testing of the SRM recovery plans as an extra precaution.
• Rolling power outages were implemented by TEPCO, necessitating the failover process.
• Systems not covered by SRM (physical machines) had an RTO of more than 24 hours.

18. Lessons & Suggestions
• Planning for the initial disaster is not enough; you must also plan for energy and other supply shortages.
• Ensure there is a chain of command to kick off recovery, and ensure more than one person can initiate it.
• Make sure newly created VMs are configured in the recovery plan.
• Be sure to back up the SRM configuration (local files) and the database backend prior to an upgrade.
• Perform frequent disaster tests.
• Provide a more user-friendly way to map IP addresses.
• Alert administrators about unprotected or misconfigured VMs.

19. Pray for Japan!

20. Thanks!
For more information: www.tuv.com
Follow me:
Blog: http://www.vsamurai.com
Twitter: @wygtya
LinkedIn: http://jp.linkedin.com/in/wygtya
Facebook: http://www.facebook.com/wygtya

21. Mainfreight

22. New Zealand: We are here!
Confidential

23. Challenges we face
• Natural disasters
  - Earthquakes (3 major and 250 minor in the last 12 months)
  - Tsunami
  - Volcanic (2 active)
• Remote: a 3-hour flight to Australia
• Stability of power
  - 1998 Auckland power crisis
  - Reliance on hydroelectricity
• WAN considerations
  - Cost and bandwidth limitations

24. What was learnt from Christchurch
• Christchurch was considered low risk for earthquakes
• Servers and desktops
  - Unable to return to the office 6 months later
  - Servers were protected but desktops were lost
• Reliance on backup media
  - Slow and potentially unreliable
• The human factor
  - Other priorities
  - Civil unrest
• The value of virtualisation
  - DR with SRM becomes viable

25. SRM: Customer Experience from Around the Globe
David Hall, Mainfreight Group IT Infrastructure Manager

26. Who are we
A company with a 100-year vision
• Mainfreight is a global supply chain logistics provider
• Commenced business in 1978
• Today has a market capitalisation of $993 million
• Sales revenues in excess of $1.75 billion
• 4,600+ team members
• Unique culture & philosophy
• We have a quality focus and aim to delight our customers.

27. Where We Are
Ready, Fire, Aim!

28. Our Challenges
• Do more with less
• Hybrid model, mostly physical
• Cost of DR & BCP
• Previous DR process worked but was complex & time-consuming
• The recent Christchurch earthquakes reinforced for our business the reality of disasters and the importance of DR & BCP
• Costs of ~$10,000 for every hour the systems are down

29. When Disaster Strikes: Christchurch

30. About our environment
"Top performing organisations are those that have harnessed the true potential of today's cutting-edge technologies"
Hardware/Software:
• HP servers & storage
• Cisco network
• Microsoft, Citrix, vSphere/SRM 4.x
Active/Active data centres: South Auckland (Production) and Central Auckland (Recovery)
Applications protected with SRM:
• Maintrak: web-based consignment tracking system
• MIMs: inventory management system
• Cargowise: international freight forwarding system
• On Account: accounting system
• On Sale: CRM system

31. SRM Highlights
"DR is only as good as the last time it was tested"
• Reduced DR test times from ~15 hours to 4 hours
• Reduced the DR team from 4 people to 2
• Minimised downtime costs, estimated at $10k per hour
• Achieved 99.999% availability
• SRM has been proven and used in anger (SAN failure)
• Installation well planned and implemented
• Project completed on time and on budget
• Minimal external consultancy required
• Provided a platform to deliver DR for future business applications

32. Thank you
"VMware has provided us with a flexible, reliable IT platform to support the business and deliver IT services in more responsive and cost-effective ways." (Kevin Drinkwater, Global Chief Information Officer)

33. IBC

34. Company Background
VMware history: IBC started in 2004 to convert physical servers to VMs in a company-wide effort to consolidate hardware and drive down maintenance costs and datacenter space/utilities.
Servers virtualized: We currently manage about 800 VMs residing on 60-plus ESX hosts running ESX 4.1 & ESXi. Since 2005 we have converted over 300 physical servers to VMs.
Storage: EMC DMX-4 (Production and DR) & NetApp (Test, Dev and QA)
Uses for VMware: We run Windows Server 2003 and 2008 and Red Hat v5 (64- and 32-bit OSs). We have many Tier 1 applications running in our VM environment: SQL, SharePoint, Citrix, Hyperion/Informatics and our claims processing servers.

35. Business Needs
What was needed: We were moving our data center in the summer of 2009 from Philadelphia to Hershey, PA, and needed to migrate 300+ production VMs to our new location.
SRM review: VMware came onsite to present the SRM product for a future IBC project (DR insourcing). After the product presentation, we saw the potential in using this product for our data center move. Working with VMware Professional Services proved very beneficial for IBC.
Did it solve the problem? Yes. SRM made our data center move less stressful and more streamlined, and it also addressed our plans for DR insourcing and a redundant production environment.

36. Business Needs
Why a VMware solution: When we saw the SRM product and how it could help us move 300+ production VMs from our Center City Philadelphia data center to our new Hershey, PA data center, it was clear that this product would save us many man-hours that we needed elsewhere on our data center move weekend.
SRM characteristics: The SRM advantages that IBC leveraged were pre-move testing and the streamlining and automation of the overall data center move script, in which we could plan out the recovery sequence from Tier 1 production VMs to Tier 3 test VMs. The overall reliability of this product saved our company many admin man-hours, pre- and post-migration.

37. Business Needs
Time outages avoided: We saved hours of production server outage time by using SRM instead of a manual migration, and countless admin man-hours were saved, allowing our staff to be utilized in other areas of the move weekend.
What was needed:
• SRM plug-in for VirtualCenter
• EMC SRDF
• VMware Professional Services: the Professional Services contact was very knowledgeable about the SRM product and how to integrate it with our EMC storage.
SRM script and planning: setting up your server priority migration plan.

38. Data Center Migration
How much time until DC cutover? Professional Services came out a few months prior to the DC move and were onsite for 2 days to prepare the plan and gather information about the environment.
What was the setup and integration process? We worked with VMware to set up our migration script and verify that the EMC storage was replicating correctly.

39. Data Center Migration
Services needed: replication of data
• Our initial sync was about 50 LUNs and about 30 TB of data.
• We then set up daily replication of about 1 TB a day.
• Set up our server priority script (which servers to power down last and which servers to power up first).
• VMware came onsite for 1 more day to verify that all was well before the final move date.

40. Data Center Migration
What happened on Labor Day move weekend? VMware was on site Friday night when we kicked off SRM; there was about 1 TB of changes left to be synced. We then disconnected our EMC storage at the old data center and failed over to the new data center storage. Fewer than 10 VMs needed some attention to get back online.
I would highly recommend VMware Professional Services. They were on site a total of 4 days and walked us through the whole data center migration.

41. Today
How is SRM running today? We currently insource our Disaster Recovery drill at our DR/redundant production data center in Reading, PA, using SRM and VMware to get us through the drill with replication and failover. We currently run these tests 3-4 times a year.

42. Next Steps

43. Where Can I Learn More?
At VMworld:
• Visit us at the booth
• Multiple great sessions on SRM:
  - BCO 1269 SRM 5 technical: Tue 4:30 PM; Wed 1 PM
  - BCO 1562 SRM 5 technical: Tue 12 PM; Wed 10 AM
  - BCO 2527 SRM 5 technical: Tue 3 PM
  - BCO 3334 Cloud DR: Mon 10 AM; Wed 4 PM
  - BCO 3336 Cloud DR SP perspective: Mon 11:30 AM; Tue 12 PM
VMware.com:
• Product page: www.vmware.com/products/srm (overview, datasheet, webinars, docs, community links)
• Free 60-day evaluation: all you need to get started!
• Solutions from VMware: www.vmware.com/solutions/continuity

44. Questions?
© 2011 VMware Inc. All rights reserved