Business Continuity HA Backup DR
Business Continuity and Disaster Recovery with VMware Infrastructure 3
Larry Ellison, Recovery Expert, AccessFlow, Inc., August 7, 2007
VirtualDR Solutions
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
  Preventing downtime
  Protecting data and systems
  Rapid Disaster Recovery
Implementing Better Business Continuity
Business Continuity
Definition
BC Importance and Focus
Traditional Challenges
Defining business continuity
What is business continuity?
Business continuity is about protecting data, systems, and services:
Preventing data loss
Minimizing planned downtime
Preventing unplanned downtime
Ensuring rapid recovery
(Diagram: threats to business continuity, including power failure, server failure, data corruption, disk failure, fire, hurricane, user error, backup windows, server maintenance, OS fault, software failure, virus infection, and storage failure.)
Why So Much Focus on Business Continuity?
Standards for availability are rising
Faster pace of business; more critical change
Agility is a competitive advantage and demands the highest service levels
The number and severity of threats are increasing
• 1 out of 500 data centers will have a major outage each year (Disaster Recovery Journal)
• 43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen)
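The 1-in-500 figure is an annual rate. The sketch below turns it into a multi-year risk, assuming (our assumption, not the source's) that each year is independent and that the same rate applies to every data center; the function name is hypothetical.

```python
def prob_at_least_one_outage(annual_rate: float, years: int) -> float:
    """Probability of at least one major outage within `years` years.

    Assumes every year is independent with the same annual probability,
    so P(no outage in n years) = (1 - p) ** n.
    """
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be a probability in [0, 1]")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - annual_rate) ** years


if __name__ == "__main__":
    # 1 out of 500 data centers per year, as quoted on the slide.
    for years in (1, 5, 10):
        print(f"{years:2d} years: {prob_at_least_one_outage(1 / 500, years):.2%}")
```

Under these assumptions the risk grows to roughly 1% over five years and about 2% over ten.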
Circuit breakers wipe out the Web
PG&E’s faulty equipment reveals the Internet’s vulnerability to a disruption of its power source
Verne Kopytoff, Chronicle Staff Writer
Thursday, July 26, 2007
All it took to wipe out some of the Internet's biggest sites Tuesday was some faulty PG&E electrical breakers that caused a blackout in downtown San Francisco.
Some of the Web's hottest destinations - Craigslist, Yelp, Second Life - were suddenly inaccessible from San Mateo to Singapore after back-up generators failed at the facility housing their computer equipment.
Although mostly fixed within 12 hours, the incident shows how easy it is to send major swaths of the online world to the dark ages. Sites that millions of people rely on can be knocked offline by freak accidents, not to mention major catastrophes, and this event served as a wake-up call to the executives that operate them.
"If the data center was that vulnerable to a power outage, what if something really catastrophic happened like an earthquake?" asked Derek Gordon, marketing vice president for Technorati, a search engine of blogs that was brought down for a couple of hours Tuesday after the blackout. "What does that say about the vulnerability of the Internet in the Bay Area?"
The troubles started when 365 Main, a key data center in downtown San Francisco that touts its "state-of-the-art electrical system," failed to get its backup generators started immediately after the power outage hit around 1:45 p.m. A number of companies that house their computer servers in the facility were suddenly offline, setting off a mad scramble to get the Web sites up and running.
Shoppers at RedEnvelope, an online retailer, couldn't buy monogrammed pillow soaps. Hipsters on Yelp, the review site, had to take a break from sharing their reports of fabulous and not-so fabulous restaurants. Users of online classified service Craigslist were out of luck in finding a second-hand futon.
The backup generators were turned on 45 minutes after the blackout started, a delay that 365 Main said it was still investigating yesterday. But it took some of the facility's customers anywhere from another hour to 11 hours to get their servers safely rebooted and their Web sites operational.
What the episode exposed is that some companies operate entirely from one data center, a decision described by some security experts as risky. In emergencies, such companies can't shift traffic to an alternative facility where they keep additional servers.
"There's all kinds of things that can happen from a power outage to a tornado to a backhoe," said Jason Needham, director of product management at F5 Networks, a Seattle company that sells software and equipment for data centers. "All these things seem far-fetched until they happen."
However, Needham said the trend is for companies to put all their eggs in one basket, so to speak, in an effort to save money. In fact, just hours before Tuesday's power outage, 365 Main put out a press release trumpeting the fact that RedEnvelope had moved all its operations to its facility and closed an unneeded center in the Midwest.
Data centers are usually designed with redundant equipment to ensure power during outages, earthquakes and floods. Backup electricity is supposed to kick in within seconds after an outage through a complex system that keeps servers humming without interruption.
Gordon, from Technorati, called opening several data centers ruinously expensive for thinly funded Internet startups, of which there are hundreds in the Bay Area. Only profitable companies can afford such an extravagance, he said, though he acknowledged that Technorati, which isn't profitable, is in the process of moving into a second facility.
Tuesday's outage "added to the sense of urgency," Gordon said.
"The lesson here is despite all of your planning and all of your promises, you are vulnerable."
This article appeared on page C-1 of the San Francisco Chronicle
Challenges in Implementing Business Continuity
Cost
Additional hardware; identical 1:1 hardware duplication
Additional tools and training
Complexity
Management and provisioning
Application-specific business continuity needs
Reliability
Complex solutions are hard to test
Requires specialized training for personnel
(Diagram: after Site A fails, recovery at Site B requires configuring hardware, installing the OS, configuring the OS, installing the backup/restore agent, and only then starting “single-step automatic recovery.”)
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
  Properties of Virtual Machines
    Hardware Independence
    Encapsulation
    Isolation
    Partitioning
Better Business Continuity with VMware Infrastructure
  Preventing downtime
  Protecting data and systems
  Ensuring rapid recovery from failures
Implementing Better Business Continuity
Business Continuity: The Killer App for Virtualization!
2006 Customer Survey (n=2265)
…85% use VMware in production; 43% have set it as the default policy for production servers
Press: “Best Disaster Recovery Product of 2006” (TechTarget)
Customers: 55% of customers are using virtualization for BC/DR (n=2265)
VMware Virtualization Basics
Before Virtualization:
Software tied to hardware
Single OS image per machine
One application workload per OS
Inflexible, costly infrastructure
After Virtualization:
Hardware-independence of OS and applications
Virtual machines can be provisioned to any system
Manage OS and application as a single unit
Copyright © 2006 VMware, Inc. All rights reserved.
Virtualization Enablers for Business Continuity
Hardware Independence
Run a virtual machine on any server without modification
• Eliminate need for 1:1 hardware duplication for BC
• Eliminate risk of hardware “configuration drift”
• Re-use older servers for BC-DR
Encapsulation
Encapsulate entire systems as simple files
(Diagram: a physical server's system, apps, and data encapsulated as a single file.)
• System portability
• Simplify provisioning for recovery
• Simplify backup and replication
• Simplify copying and cloning of systems
Virtualization Enablers for Business Continuity
Isolation
Each VM is isolated from other VMs
• Easier testing of a BC-DR plan
• Stability and security
• Utilize DR hardware for other tasks
(Diagram: VMware Infrastructure running several OS/application VMs alongside a batch job and a DR test on the same hardware, raising % utilization.)
Partitioning
Safely run multiple VMs simultaneously on a single server
• Consolidate servers
• Boost utilization
• Provide significant cost savings
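Partitioning is what lets lightly used servers share hardware. As a rough illustration of how consolidation boosts utilization, the sketch below packs workloads onto hosts with first-fit decreasing bin packing. The function name, the workload figures, and the 80% per-host ceiling are all hypothetical, and this is not a VMware capacity-planning tool.

```python
def consolidate(workloads: dict[str, float], host_capacity: float) -> list[dict[str, float]]:
    """Pack workloads onto as few hosts as possible using first-fit decreasing.

    `workloads` maps a server name to the share of one host it needs
    (hypothetical units: percent of a host's capacity). Returns one dict per host.
    """
    hosts: list[dict[str, float]] = []
    # Place the largest workloads first; that usually leaves fewer gaps.
    for name, load in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{name} needs more than one host")
        for host in hosts:
            if sum(host.values()) + load <= host_capacity:
                host[name] = load  # first host with enough headroom
                break
        else:
            hosts.append({name: load})  # no room anywhere: open a new host
    return hosts


if __name__ == "__main__":
    # Hypothetical utilization of six lightly loaded physical servers.
    servers = {"web": 30, "mail": 20, "db": 50, "dns": 10, "app": 40, "file": 25}
    packed = consolidate(servers, host_capacity=80)  # keep 20% headroom per host
    print(f"{len(servers)} servers -> {len(packed)} hosts")
    for i, host in enumerate(packed):
        print(f"host{i}: {sorted(host)} at {sum(host.values())}% utilization")
```

With these made-up figures, six servers fit on three hosts, a 2:1 consolidation ratio.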
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
  Preventing downtime
    Elements of preventing downtime
      Eliminate planned downtime
      Reduce unplanned downtime with better fault tolerance
  Protecting data and systems
  Rapid Disaster Recovery
Implementing Better Business Continuity
Avoiding Planned Downtime Has the Biggest Impact on Business Continuity
Per studies from IBM & Sun, planned downtime is responsible for 80-90% of total system downtime
Eliminating planned downtime can increase system availability by a full order of magnitude
(Chart: share of total downtime that is planned vs. unplanned, per the Sun and IBM estimates: 80% planned / 20% unplanned and 90% planned / 10% unplanned.)
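The "order of magnitude" claim is simple arithmetic: if planned downtime is 90% of the total, removing it leaves one tenth of the downtime. The sketch below works through both estimates; the 100-hour yearly downtime is a hypothetical figure, and the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def availability(downtime_hours: float) -> float:
    """Fraction of the year a system is up, given its yearly downtime."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def downtime_without_planned(total_hours: float, planned_share: float) -> float:
    """Downtime left once planned downtime is eliminated entirely."""
    return total_hours * (1.0 - planned_share)


if __name__ == "__main__":
    total = 100.0  # hypothetical: 100 hours of downtime per year
    for share in (0.8, 0.9):  # the two planned-downtime estimates on the slide
        left = downtime_without_planned(total, share)
        print(
            f"planned share {share:.0%}: {total:.0f} h -> {left:.0f} h, "
            f"availability {availability(total):.3%} -> {availability(left):.3%}"
        )
```

At the 90% estimate, downtime falls tenfold (a full order of magnitude); at the 80% estimate it falls fivefold.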
Planned Downtime: Zero-downtime maintenance using VMware technology
Use VMware VMotion to evacuate hosts
• Move running applications to other servers without disruption
• Perform maintenance at any time of day
• Zero downtime for hardware maintenance
Automate with DRS maintenance mode
• Automates moving virtual machines to other hosts
• Automates rebalancing after maintenance is complete
1. Activate Maintenance Mode for physical host
2. DRS migrates running virtual machines to other hosts using VMotion
3. Shut down idle host and perform maintenance
4. Restart host; DRS automatically rebalances workloads
Copyright © 2005 VMware, Inc. All rights reserved.
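The four maintenance-mode steps can be modeled as a small planning problem: find a destination for every VM on the host, then move them. The sketch below is a hypothetical model (the `Host` class, memory-only sizing, and most-free-memory placement are our assumptions); it does not reproduce DRS's actual algorithm or any VMware API, and each "move" merely stands in for a VMotion migration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity_gb: int
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> memory (GB)
    in_maintenance: bool = False

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def enter_maintenance_mode(target: Host, cluster: list[Host]) -> list[tuple[str, str]]:
    """Plan and apply the evacuation of every VM on `target`.

    Largest VMs are placed first, each on the remaining host with the most free
    memory. The whole plan is checked before anything moves, so a host that
    cannot be fully evacuated is left untouched.
    """
    others = [h for h in cluster if h is not target and not h.in_maintenance]
    free = {h.name: h.free_gb for h in others}
    plan: list[tuple[str, str]] = []
    for vm, mem in sorted(target.vms.items(), key=lambda kv: kv[1], reverse=True):
        dest = max(free, key=free.get, default=None)
        if dest is None or free[dest] < mem:
            raise RuntimeError(f"not enough capacity to evacuate {vm}")
        free[dest] -= mem
        plan.append((vm, dest))
    by_name = {h.name: h for h in others}
    for vm, dest in plan:  # each step stands in for one live migration
        by_name[dest].vms[vm] = target.vms[vm]
    target.vms.clear()
    target.in_maintenance = True  # host is now idle and safe to shut down
    return plan


def exit_maintenance_mode(host: Host) -> None:
    """Return a repaired host to the pool; rebalancing is left to the scheduler."""
    host.in_maintenance = False
```

Checking the full plan before moving anything mirrors the idea that a host should only be reported as ready for maintenance once every VM has somewhere to go.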
Unplanned Downtime: Server Failure - VMware HA
Automatic restart of virtual machines in case of server failure
No need for dedicated stand-by hardware
None of the cost and complexity of clustering
Simple, cost-effective high availability for all servers
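The idea behind automatic restart without dedicated standby hardware can be sketched as heartbeat monitoring plus placement on whatever capacity the surviving hosts have. This is a hypothetical model, not VMware HA's implementation: the `ClusterHost` class, the shared clock, the 15-second timeout, and the most-free-memory placement rule are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterHost:
    name: str
    capacity_gb: int
    last_heartbeat: float  # seconds, on a clock shared by the cluster
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> memory (GB)

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def restart_failed_vms(hosts: list[ClusterHost], now: float, timeout_s: float = 15.0):
    """Treat hosts with stale heartbeats as failed and restart their VMs elsewhere.

    Returns (restarted, unplaced): restarted is a list of (vm, new_host) pairs,
    unplaced lists VMs that did not fit on any surviving host.
    """
    failed = [h for h in hosts if now - h.last_heartbeat > timeout_s]
    survivors = [h for h in hosts if now - h.last_heartbeat <= timeout_s]
    restarted: list[tuple[str, str]] = []
    unplaced: list[str] = []
    for host in failed:
        for vm, mem in sorted(host.vms.items(), key=lambda kv: kv[1], reverse=True):
            fits = [h for h in survivors if h.free_gb() >= mem]
            if not fits:
                unplaced.append(vm)
                continue
            dest = max(fits, key=lambda h: h.free_gb())
            dest.vms[vm] = mem  # "restart": the VM boots from shared storage on dest
            restarted.append((vm, dest.name))
        host.vms.clear()  # the failed host no longer runs anything
    return restarted, unplaced
```

Because any surviving host with spare capacity can take a restarted VM, no machine has to sit idle as a dedicated standby.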
Unplanned Application/OS Failure: Virtual Infrastructure Makes Clustering Easier
Use the same clustering software you use today but gain:
• More flexible options: cluster physical machines with virtual machines; cluster virtual machines with virtual machines
• Lower cost: cluster applications using fewer physical servers; test cluster configurations on a single physical server
Unplanned: Protecting from Hardware Failures
Tolerate network path failures
• Built-in NIC teaming
• Ability to share redundant components across workloads
Tolerate storage path failures
• Built-in storage multi-pathing
• Share redundant storage paths among multiple virtual machines
Provides, at a lower cost, fault-tolerance equivalent to that possible with physical systems
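NIC teaming and storage multi-pathing both rest on the same idea: several redundant paths, with traffic moving to a healthy one when the current one fails. The sketch below shows one simple policy (preferred path with automatic failback). ESX provides its own teaming and multipathing policies, which this does not reproduce; the class names and the `vmnic0`/`vmnic1` labels are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    healthy: bool = True


class RedundantPaths:
    """Ordered set of redundant paths (NICs in a team, or storage paths).

    One instance can be shared by many VMs. Traffic always uses the first
    healthy path in preference order, so a failed path is bypassed
    automatically and the preferred path is used again once it recovers.
    """

    def __init__(self, paths: list[Path]):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)

    def active(self) -> Path:
        for path in self.paths:
            if path.healthy:
                return path
        raise RuntimeError("all paths are down")

    def set_health(self, name: str, healthy: bool) -> None:
        for path in self.paths:
            if path.name == name:
                path.healthy = healthy
                return
        raise KeyError(name)
```

Sharing one group among all the VMs on a host is what lets a single pair of redundant components protect every workload at once.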
Unplanned Downtime: Preventing downtime due to resource bottlenecks
Physical Infrastructure
• Resource bottlenecks create outages
• Inflexible resources
• Lengthy, manual process to rebalance workloads
With VMware Infrastructure
• Prevent resource bottlenecks with DRS
• Automated load balancing across a pool of servers
• Ability to dynamically add resources to server pool
(Diagram: VMware Infrastructure resource cluster using VMotion.)
1. Overloaded host: automatic workload balancing
2. Dynamically add resources: DRS rebalances load
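Both numbered scenarios (an overloaded host, and a new host added to the pool) come down to moving VMs until load is roughly even. The sketch below uses a simple greedy rule as a stand-in; it is not DRS's real algorithm, and the load units, the 10-point threshold, and the function name are hypothetical.

```python
def rebalance(hosts: dict[str, dict[str, float]], threshold: float = 10.0,
              max_moves: int = 100) -> list[tuple[str, str, str]]:
    """Greedy load balancing across a pool of hosts.

    `hosts` maps host name -> {vm name: load}. While the busiest and idlest
    hosts differ by more than `threshold`, move the single VM whose migration
    best evens out that pair. Returns the moves as (vm, from_host, to_host).
    """
    moves: list[tuple[str, str, str]] = []
    for _ in range(max_moves):
        load = {h: sum(vms.values()) for h, vms in hosts.items()}
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        if gap <= threshold:
            break
        # Moving a VM of load x changes the pair's gap to |gap - 2x|,
        # which is an improvement only when 0 < x < gap.
        candidates = [(vm, x) for vm, x in hosts[busiest].items() if 0 < x < gap]
        if not candidates:
            break
        vm, x = min(candidates, key=lambda c: abs(gap - 2 * c[1]))
        del hosts[busiest][vm]
        hosts[idlest][vm] = x
        moves.append((vm, busiest, idlest))
    return moves
```

Adding resources is modeled by inserting an empty host (for example `pool["esx3"] = {}`) and calling `rebalance` again.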
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
  Preventing downtime
  Protecting data and systems
    Keys to protecting data and systems
      Minimize complexity
      Minimize impact on services
      Ensure comprehensive protection
  Rapid Disaster Recovery
Implementing Better Business Continuity
Protecting data and systems with VMware Infrastructure
Virtual machines store system and data state
• Entire system encapsulated in files: hardware configuration, operating system, applications, data
(Diagram: a physical server becomes a virtual machine stored as .vmx, .vmdk, and .nvram files.)
Impact
• Systems are data
• Protect systems using the same tools and processes used to protect data
• Virtual machines are the simplest, most portable way to store a system
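"Systems are data" means an ordinary file-copy routine can protect a whole machine. The sketch below copies a VM's files and verifies each copy with a SHA-256 checksum. It assumes the VM is powered off (copying the disks of a running VM would not give a consistent image), that its files sit in one local directory, and that the directory paths are hypothetical; it is not VMware Consolidated Backup.

```python
import hashlib
import shutil
from pathlib import Path

# File types that together make up a VM: .vmx (configuration),
# .vmdk (virtual disks), .nvram (virtual BIOS settings).
VM_FILE_SUFFIXES = {".vmx", ".vmdk", ".nvram"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_vm_dir(src: Path, dst: Path) -> dict[str, str]:
    """Copy a powered-off VM's files to `dst`, verifying each copy by checksum.

    Returns a manifest mapping file name -> SHA-256 hex digest.
    """
    dst.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for f in sorted(src.iterdir()):
        if not (f.is_file() and f.suffix in VM_FILE_SUFFIXES):
            continue  # skip logs, lock files, etc.
        copy = dst / f.name
        shutil.copy2(f, copy)
        digest = sha256_of(copy)
        if digest != sha256_of(f):
            raise OSError(f"checksum mismatch after copying {f.name}")
        manifest[f.name] = digest
    return manifest
```

The manifest can be stored alongside the backup so a later restore can re-check every file before the VM is powered on.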
Backup Options with VMware – Reduce Backup Windows
Agent in Service Console (In-Console)
• Simplified backup of full-disk images
• Any storage
Agent in each VM (In-VM)
• Same architecture as physical system backup
• File-level incremental backup possible
• Any storage
Consolidated Backup (VCB) – Agent on Proxy Server
• Move backup out of VM
• Provide LAN-free backup
• Eliminate backup windows
• Requires FC SAN
• Pre-integrated with 3rd party backup products
(Diagram: In-VM, In-Console, and VCB backup architectures, showing where the backup agent runs relative to the VMs, the Service Console, the backup server, and tape.)
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
  Preventing downtime
  Protecting data and systems
  Rapid Disaster Recovery
    Failure Types
    DR Challenges
    Physical-Virtual; Virtual-Virtual
    Replication Technologies
    BC Budgeting
Implementing Better Business Continuity
DR Challenges Today
OS & applications have 1:1 dependencies on hardware configuration
Complex to physically recover OS, applications & data
Separate processes for system and application data
Tier 2 & 3 applications left unprotected, adding to Tier 1 RTO risk
Slow and Unreliable Process, Expensive Infrastructure
(Diagram: the production and DR sites each run an application and OS on x86 hardware with OS files on local storage; data is copied over the WAN to storage, and DR systems are rebuilt from CD, tape, or ghost image in a “boot & pray” recovery.)
DR Challenges: Infrastructure and Recovery
Bound to hardware; 5-10% utilized
Recovery Process in a Virtualized Environment
RTO of minutes to a few hours, not days to weeks!
Example recovery process comparison
P-P (40+ hrs): configure hardware; install OS; configure OS; install backup agent; start “single-step automatic recovery”
V-V (< 4 hrs): restore VM; power on VM
Physical to Virtual (P-V) Recovery
How:
• VMware Converter creates virtual machines matching physical machines
• Copy virtual machines to recovery site
If Rapid Recovery is required:
• Boot virtual machines on any hardware
• Start data recovery of application data if necessary
P-V is a viable option where there are server ownership issues or servers are locked down
(Diagram: Web, App, and DNS servers at the primary site are imaged and converted (P2V), then replicated over the WAN to the secondary site.)
Host-Based Replication: How It Works
(Diagram: writes to virtual machine disks on source storage are replicated across the WAN to target storage; after a site failure, the VMs are booted from the replicated disks.)
• Replicate data to DR site
• Failure: boot VM using replicated data
• Done!
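The three bullets above can be modeled as a write queue: writes land on source storage and are shipped, in order, to the DR copy, which is what the VM boots from after a site failure. This is a toy model under our own assumptions (asynchronous replication, block-level writes, a single disk); the class and method names are hypothetical and do not describe any particular replication product.

```python
from collections import deque
from typing import Optional


class ReplicatedDisk:
    """Toy model of asynchronous host-based replication of one VM disk.

    Writes land on source storage immediately and are queued, in order, for
    the DR site. Anything still queued when the primary site fails is lost,
    which is what sets the recovery point.
    """

    def __init__(self) -> None:
        self.source: dict[int, bytes] = {}  # block number -> data at primary site
        self.target: dict[int, bytes] = {}  # replica at DR site
        self._pending: deque = deque()      # (block, data) not yet shipped

    def write(self, block: int, data: bytes) -> None:
        self.source[block] = data
        self._pending.append((block, data))

    def replicate(self, max_writes: Optional[int] = None) -> int:
        """Ship queued writes across the WAN in write order; return how many."""
        sent = 0
        while self._pending and (max_writes is None or sent < max_writes):
            block, data = self._pending.popleft()
            self.target[block] = data
            sent += 1
        return sent

    def fail_over(self) -> tuple[dict[int, bytes], int]:
        """Primary site lost: return the DR copy to boot from and the writes lost."""
        return dict(self.target), len(self._pending)
```

Applying writes strictly in order keeps the DR copy consistent with some earlier moment on the source, even when the most recent writes have not arrived yet.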
Replication with VMware: Array-Based Replication
(Diagram: array-based replication copies the source VMFS at the primary site to the target VMFS at the DR site over WAN or dark fiber, protecting against site failure.)
Customer Example
Result: <17 minutes to failover!
Real VMware Customer Results
Server Utilization: 4X-5X increase
Consolidation Ratio: from 2:1 up to 30:1
Server Provisioning Time: > 60% reduction
Planned Downtime: > 95% reduction
Unplanned Downtime: > 30% reduction
Time to Recovery: down to minutes
Payback (Break-Even): < 6 months
TCO: 30-70% reduction
Source: VMware customers surveyed post-use of VMware products.
Business Continuity Budgeting without VMware Software
(Chart: total cost rises with the number of applications protected until it reaches the budget line.)
Business continuity implementation limited by budget
Business Continuity Budgeting with VMware Software
(Chart: total cost with virtualization stays under the same budget line for a larger number of applications protected.)
More applications (Tier 0, 1, 2) protected with the same budget
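The budgeting charts describe a linear cost model: each protected application adds cost until the total reaches the budget. The sketch below computes how many applications fit. The slides give no figures, so every number in it is invented purely for illustration, and the function name is hypothetical.

```python
def max_protected_apps(budget: float, fixed_cost: float, cost_per_app: float) -> int:
    """How many applications a BC budget covers under a linear cost model:
    total cost = fixed_cost + cost_per_app * number_of_apps."""
    if cost_per_app <= 0:
        raise ValueError("cost_per_app must be positive")
    if budget < fixed_cost:
        return 0
    return int((budget - fixed_cost) // cost_per_app)


if __name__ == "__main__":
    budget = 500_000  # hypothetical BC budget
    # Hypothetical figures only: dedicated 1:1 standby hardware per app vs. a
    # shared virtual DR pool with a setup cost and a lower cost per app.
    physical = max_protected_apps(budget, fixed_cost=0, cost_per_app=50_000)
    virtual = max_protected_apps(budget, fixed_cost=100_000, cost_per_app=15_000)
    print(f"physical DR: {physical} apps, virtual DR: {virtual} apps")
```

A lower slope (cost per application) is what lets the same budget cover more tiers, even after paying a fixed setup cost.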
Customer Results
“Our virtual IT infrastructure will help us provide greater availability than ever before for our most critical applications.”
-- Paul Poppleton, IT Manager, QUALCOMM
“Using VMware virtual infrastructure, we can offer the same levels of service and more flexibility for up to 40 percent lower server and operating system costs.”
-- Rob Jones, Director of Technology, Northern Europe, ALSTOM
“We can move a virtual machine to another physical server, apply a patch, and move it back without any service interruption.”
-- Jamey Vester, Member of Professional Staff, Subaru of Indiana
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
  Preventing downtime
  Protecting data and systems
  Rapid Disaster Recovery
Implementing Better Business Continuity
  The VMware Difference
  Rapid, Reliable, Affordable Business Continuity
  Products
VMware® Infrastructure 3
VMware® Virtual SMP
Enables a single VM to use up to 4 physical processors simultaneously
VMware® Consolidated Backup
Centralized agentless backup for VMs
Virtual Machine File System (VMFS)
High-performance cluster file system that allows multiple ESX Servers to access the same VM storage concurrently
VMware® High Availability
Cost-effective automatic restart of virtual machines in case of server failure
VMware® VirtualCenter
Centralized management of VM infrastructure
VMware® Distributed Resource Scheduler
Dynamic and intelligent balancing of computing resources across resource pools based on predefined rules
VMware® Converter
Automates conversion of physical machines to virtual machines (physical-virtual)
VMware® VMotion™
Moves live, running VMs from one host to another while maintaining continuous service availability
VMware® ESX Server 3.0
Production-proven, bare-metal virtualization layer that abstracts server resources into multiple virtual machines (VMs)
Business Continuity: The VMware Difference
Rapid
• Hardware-independent failover and recovery for HA, DR
• Eliminate backup windows with LAN-free backup
• Rapid provisioning of systems/data; backup and replication
Affordable
• Realize early savings from consolidation
• Increase HA and DR coverage for more applications
• Fund your BC plan with hardware and operational savings
Reliable
• Zero-downtime planned maintenance
• Automatic restarts for unplanned server failure
• Frequent, non-disruptive DR testing with dual use of DR site
AccessFlow can be engaged to assess your business continuity needs and design an appropriate roadmap to implement a robust DR solution
http://www.accessflow.com
Get Started Today
Learn
Plan a PoC
Try
www.vmware.com/solutions/continuity/
Free evaluation download: www.vmware.com/download/vi/eval.html