Disaster Recovery by Stretching Hyper-V Clusters Across Sites
Symon Perriman
Program Manager II, Clustering & High-Availability
Microsoft Corporation
SESSION CODE: VIR303
Session Objectives And Takeaways
Session Objective(s):
Understanding the need and benefits of multi-site clusters
What to consider as you plan, design, and deploy your first multi-site cluster
Windows Server Failover Clustering with Hyper-V is a great solution for not only high availability, but also disaster recovery
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Defining High-Availability
High-Availability (HA) allows applications or VMs to maintain service availability by moving them between nodes in a cluster
But what if there is a catastrophic event and you lose the entire datacenter?
[Diagram: clustered nodes within Site A]
Defining Disaster Recovery
Disaster Recovery (DR) allows applications or VMs to maintain service availability by moving them to a cluster node in a different physical location
[Diagram: Site A and Site B, each with SAN storage; one node is located at a physically separate site]
Benefits of a Multi-Site Cluster
Protects against loss of an entire location: power outage, fires, hurricanes, floods, earthquakes, terrorism
Automates failover: reduced downtime and a lower-complexity disaster recovery plan
Reduces administrative overhead: automatically synchronizes application and cluster changes, making nodes easier to keep consistent than standalone servers
What is the primary reason why DR solutions fail?
Dependence on People
Flexible Hardware
Two simple requirements for support:
1. All components must be logoed
http://www.microsoft.com/whdc/winlogo/default.mspx
2. The complete solution must pass the Cluster Validation Test
http://technet.microsoft.com/en-us/library/cc732035.aspx
The same Windows Server 2008 hardware will work; no reason not to move to R2!
CSV has the same storage requirements: iSCSI, Fibre Channel, or Serial-Attached SCSI
Support Policy: KB 943984
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Stretching the Network
Longer distance traditionally means greater network latency
Missed inter-node health checks can cause false failover
Cluster heartbeating is fully configurable
SameSubnetDelay (default = 1 second): frequency at which heartbeats are sent
SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
CrossSubnetDelay (default = 1 second): frequency at which heartbeats are sent to nodes on dissimilar subnets
CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down to nodes on dissimilar subnets
Command line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
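As a sketch, the heartbeat properties above can be inspected and tuned from the R2 FailoverClusters module. The values shown here (a 2-second delay, 10 missed heartbeats across subnets) are illustrative assumptions, not recommendations; tune against your actual WAN latency.

```powershell
# Windows Server 2008 R2 Failover Clustering module
Import-Module FailoverClusters

# View the current heartbeat settings
Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold, `
    CrossSubnetDelay, CrossSubnetThreshold

# Illustrative only: send cross-subnet heartbeats less often and
# tolerate more misses before declaring an interface down
(Get-Cluster).CrossSubnetDelay = 2000       # milliseconds between heartbeats
(Get-Cluster).CrossSubnetThreshold = 10     # missed heartbeats before "down"
```

The same properties can be set from the command line with Cluster.exe, e.g. `cluster /prop CrossSubnetDelay=2000`.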
Security over the WAN
Encrypt inter-node communication:
0 = clear text
1 = signed (default)
2 = encrypted
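The 0/1/2 levels above correspond to the cluster's SecurityLevel common property; a minimal sketch of raising it to encrypted for WAN links:

```powershell
Import-Module FailoverClusters

# 0 = clear text, 1 = signed (default), 2 = encrypted
(Get-Cluster).SecurityLevel = 2   # encrypt inter-node traffic over the WAN
```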
[Diagram: WAN traffic between Site A nodes (10.10.10.1, 20.20.20.1) and Site B nodes (30.30.30.1, 40.40.40.1)]
Network Considerations
Network deployment options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets
[Diagram: public and redundant networks connecting Site A nodes (10.10.10.1, 20.20.20.1) with Site B nodes (30.30.30.1, 40.40.40.1)]
DNS Considerations
With nodes in dissimilar subnets, the VM obtains a new IP address on failover, and clients need that new IP address from DNS to reconnect
[Diagram: DNS Server 1 in Site A creates the record VM = 10.10.10.111; after failover to Site B the record is updated to VM = 20.20.20.222, replicated to DNS Server 2, and then obtained by clients]
Faster Failover for Multi-Subnet Clusters
RegisterAllProvidersIP (default = 0 for FALSE): determines whether all IP addresses for a Network Name are registered in DNS. When TRUE (1), IP addresses are registered whether online or offline; ensure the application is set to try all IP addresses so clients can come online more quickly.
HostRecordTTL (default = 1200 seconds): controls how long the DNS record for a cluster network name lives in the client cache. A shorter TTL means client DNS records are updated sooner; Exchange Server 2007 recommends a value of five minutes (300 seconds).
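Both settings are private properties of the Network Name resource and can be set with Set-ClusterParameter. The resource name "VM1 Network Name" below is a placeholder; substitute your own clustered network name, and note that the resource typically needs to be taken offline and back online for new values to take effect.

```powershell
Import-Module FailoverClusters

# Lower the client-side TTL to 5 minutes (Exchange 2007 guidance)
Get-ClusterResource "VM1 Network Name" |
    Set-ClusterParameter HostRecordTTL 300

# Register every provider IP so clients can try all addresses
Get-ClusterResource "VM1 Network Name" |
    Set-ClusterParameter RegisterAllProvidersIP 1
```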
Solution #1: Local Failover First
Configure local failover first for high availability:
No change in IP addresses
No DNS replication issues
No data going over the WAN
Cross-site failover for disaster recovery
[Diagram: VM = 10.10.10.111 registered on DNS Server 1; failover within Site A keeps 10.10.10.111, while cross-site failover to Site B moves the VM to 20.20.20.222]
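One way to sketch local-failover-first is an ordered preferred-owners list on the VM's cluster group, listing the local-site nodes before the DR-site nodes. The group and node names below are placeholders.

```powershell
Import-Module FailoverClusters

# Site A nodes first (local failover preferred), Site B nodes as DR fallback
Set-ClusterOwnerNode -Group "VM1" -Owners NodeA1,NodeA2,NodeB1,NodeB2

# Verify the preferred-owner order
Get-ClusterOwnerNode -Group "VM1"
```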
Solution #2: Stretch VLANs
Deploying a VLAN minimizes client reconnection times
The IP of the VM never changes
[Diagram: stretched VLAN across Site A and Site B; the VM keeps 10.10.10.111, and both DNS servers hold the record FS = 10.10.10.111]
Solution #3: Abstraction in Networking Device
A networking device uses an independent 3rd IP address
The 3rd IP address is registered in DNS & used by clients
[Diagram: clients resolve VM = 30.30.30.30 from DNS; the networking device maps it to 10.10.10.111 in Site A or 20.20.20.222 in Site B]
Cluster Shared Volumes Networking Considerations
CSV does not support having nodes in dissimilar subnets
Use VLANs if you want to use CSV with multi-site clusters
Note: CSV and live migration are independent, but complementary, technologies
[Diagram: CSV network stretched between Site A and Site B over a VLAN]
Updating a VM's IP Address on Cross-Subnet Failover
On cross-subnet failover, if the guest uses:
DHCP: the IP is updated automatically
Static IP: an admin needs to configure the new IP (can be scripted)
Best to use DHCP in the guest OS for cross-subnet failover
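The "can be scripted" static-IP case can be sketched with netsh run inside the guest after failover. The adapter name and Site B addressing below are placeholders; adjust to your guest's configuration.

```powershell
# Run inside the guest OS after a cross-subnet failover to Site B.
# "Local Area Connection" and the 20.20.20.x addresses are examples only.
netsh interface ip set address "Local Area Connection" static 20.20.20.222 255.255.255.0 20.20.20.1
netsh interface ip set dns "Local Area Connection" static 20.20.20.53
```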
Live Migrating Across Sites
Live migration moves a running VM between cluster nodes; TCP reconnects make the move unnoticeable to clients
Use VLANs to achieve live migrations between sites: the IP the client is connected to will not change
Network bandwidth planning: live migration may require significant network bandwidth, based on the amount of memory allocated to the VM; live migration times will be longer over high-latency or low-bandwidth WAN connections
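As a rough sanity check for planning, minimum transfer time is roughly VM memory divided by effective bandwidth. The numbers below are assumptions for illustration, and real migrations run longer because pages dirtied during the copy must be re-sent.

```powershell
# Illustrative numbers only: an 8 GB VM over a 1 Gbps WAN link
$vmMemoryGB    = 8
$bandwidthGbps = 1
$efficiency    = 0.7   # assumed protocol/WAN overhead factor

# 8 bits per byte; result is a lower bound, ignoring dirty-page re-sends
$seconds = ($vmMemoryGB * 8) / ($bandwidthGbps * $efficiency)
"Estimated minimum transfer time: {0:N0} seconds" -f $seconds
```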
Multi-Subnet vs. VLAN Recap
VLAN: Live Migration (seamless), fast failover, Cluster Shared Volumes, static IPs in the guest; the cost is added networking complexity
Multi-Subnet: Quick Migration and greater deployment flexibility
Choosing the right networking model for you depends on your business requirements
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Storage in Multi-Site Clusters
Different from local clusters:
Multiple storage arrays, independent per site
Nodes commonly access their own site's storage
No ‘true’ shared disk visible to all nodes
[Diagram: Site A and Site B, each with an independent SAN]
Storage Considerations
DR requires a data replication mechanism between sites
Changes are made on Site A and replicated to Site B
[Diagram: the Site A SAN replicates to a replica LUN on the Site B SAN]
Replication Partners
Hardware storage-based replication: block-level
Software host-based replication: file-level
Appliance replication: file-level
Synchronous Replication
Host receives “write complete” response from the storage after the data is successfully written on both storage devices
[Diagram: write request from host → primary storage → replication to secondary storage → acknowledgement → write complete returned to host]
Asynchronous Replication
Host receives the “write complete” response from the storage after the data is successfully written to just the primary storage device; replication to the secondary happens afterward
[Diagram: write request from host → primary storage → write complete returned to host immediately; replication to secondary storage follows]
Synchronous versus Asynchronous
Synchronous: no data loss; requires a high-bandwidth/low-latency connection; stretches over shorter distances; write latencies impact application performance
Asynchronous: potential data loss on hard failures; needs enough bandwidth to keep up with data replication; stretches over longer distances; no significant impact on application performance
Cluster Validation with Replicated Storage
Multi-Site clusters are not required to pass the Storage tests to be supported
Validation Guide and Policyhttp://go.microsoft.com/fwlink/?LinkID=119949
What about DFS-Replication?
Using the file server DFS-R feature to replicate VM data on a multi-site failover cluster is not supported
DFS-R performs replication on file close: it works well for Office documents, but is not designed for application workloads where the file is held open, like VHDs or databases
Cluster Shared Volume Overview
Cluster Shared Volumes (CSV): a distributed file access solution for Hyper-V
Enables multiple nodes to concurrently access a single ‘truly’ shared volume
Provides VMs complete transparency with respect to which node actually owns a LUN
Guest VMs can be moved without requiring any disk ownership changes
No dismounting and remounting of volumes is required
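The CSV behavior described above is a one-time, per-cluster opt-in on R2, after which any cluster disk can be promoted to a shared volume. The disk name below is a placeholder for an existing cluster disk.

```powershell
Import-Module FailoverClusters

# One-time opt-in: enable Cluster Shared Volumes on the cluster (R2)
(Get-Cluster).EnableSharedVolumes = "Enabled"

# Promote an existing cluster disk (placeholder name) to a CSV;
# it then appears on every node under C:\ClusterStorage\
Add-ClusterSharedVolume -Name "Cluster Disk 5"
Get-ClusterSharedVolume
```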
[Diagram: Disk5 on the SAN presented as a single volume holding multiple VHDs, with concurrent access to a single file system from nodes in Site A and Site B]
CSV with Replicated Storage
Traditional architectural assumptions do not hold true: traditional replication solutions assume only one array is accessed at a time, while CSV assumes all nodes can concurrently access a LUN
CSV is supported by many replication vendors; talk to your storage vendor to understand their support story
[Diagram: a VM in Site B attempts to access the read-only replica VHD while the Site A copy remains read/write]
Storage Virtualization Abstraction
Some replication solutions provide complete abstraction in the storage array: servers are unaware of the accessible disk location
Fully compatible with Cluster Shared Volumes (CSV)
[Diagram: virtualized storage presents a logical LUN; servers are abstracted from the storage]
Choosing a Multi-Site Storage Model
Models compared: Traditional Cluster Storage vs. Cluster Shared Volumes, across Live Migration, hardware replication, software replication, and appliance replication; for hardware and appliance replication, consult your vendor on support
Choosing the right storage model for you depends on your business requirements
EMC for Windows Server Failover Clustering
Txomin Barturen
Senior Manager, Symmetrix and Virtualization
EMC Corporation
PARTNER
What’s Storage Got To Do With It?
Storage controllers can be powerful compute and replication resources
Provide multiple forms of replication styles:
Synchronous (Metro configurations), Asynchronous (Continental configurations), and various combinations of those
Arrays/appliances are able to provide consistency technology to replication:
Bind database and transaction logs together as an atomic unit; required for disaster recovery scenarios
A single consolidated solution for all environments, as opposed to per-application solutions: operational ease and automated operations
Geographical Windows Clustering
Long history of geographical Windows solutions:
The original “GeoSpan” was introduced in the 1990s; the current product is called “Cluster Enabler”
Support for multiple storage replication mechanisms
Symmetrix Remote Data Facility (SRDF)CLARiiON MirrorviewEMC RecoverPoint (Appliance)
Support for multiple replication implementations
Synchronous (SRDF/S, MV/S, RP)Asynchronous (SRDF/A, MV/A, RP)
Select the best replication fit for SLA
Cluster Enabler: Integration with Failover Clustering
Cluster Enabler is implemented as a cluster group resource
A DLL manages disk state when necessary: disasters or site move requests
Custom MMC for administration: provides insight into relationships and allows for management of storage resources (add/remove storage devices)
All cluster functions managed through Failover Cluster Manager: simplified management
Unique Cluster Configuration Support
Concurrent Replication
Cascaded Replication
Heterogeneous Replication
Challenges of Block Storage Replication
Storage block-level replication is typically uni-directional (per LUN):
Changed blocks flow from the source site to the remote site
It is possible to have different LUNs replicating in different directions
Storage cannot enforce block-level collision resolution; the application must determine resolution, or be coordinated
Applications today implement a shared-nothing model:
Surfacing storage as R/W at multiple sites is only useful if the application can handle a distributed-access device
Few applications implement the necessary support; the obvious exception is CSV
ANNOUNCING: EMC VPLEX METRO support for Hyper-V and Cluster Shared Volumes
Federated Storage Infrastructure
Federated storage: a new HW and SW platform that extends storage beyond the boundaries of the data center
Located in the SAN to present hosts with a federated view of EMC and heterogeneous storage
VPLEX Local and VPLEX Metro configurations
Unique value: distributed coherent cache (AccessAnywhere™), N+1 scale-out cluster, data at a distance architected for global apps
The workload “travels” with the application
Sample VPLEX METRO Configuration
[Diagram: nodes NewYork-01 through NewYork-04 and NewJersey-01 through NewJersey-04 attached to VPLEXCluster-1 and VPLEXCluster-2, hosting CSV Volumes 1-4 for OS VHDs and SQL VHDs]
DEMO: EMC VPLEX Metro with Cluster Shared Volumes
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Quorum Overview
4 Quorum Types:
1. Node Majority
2. Node and Disk Majority
3. Node and File Share Majority
4. Disk Only (not recommended)
Majority is greater than 50%. Possible voters: nodes (1 vote each) + 1 witness (Disk or File Share)
[Diagram: five voters, one vote each]
Replicated Disk Witness
A witness is a tie-breaker when nodes lose network connectivity
The witness disk must be a single decision maker, or problems can occur
Do not use a Disk Witness in multi-site clusters unless directed by your vendor
[Diagram: three node votes plus a replicated disk witness whose vote is ambiguous]
Node Majority
Cross-site network connectivity broken! Each node asks: “Can I communicate with a majority of the nodes in the cluster?”
Yes: stay up. No: drop out of cluster membership
5-node cluster: majority = 3; keep the majority of nodes in the primary site
[Diagram: Site A with three nodes stays up; Site B with two nodes drops out]
Node Majority: Disaster at Site A
Each surviving node asks: “Can I communicate with a majority of the nodes in the cluster?” No: drop out of cluster membership
5-node cluster: majority = 3, with the majority in the primary site
With Site A down, no majority exists; quorum must be forced manually at Site B
Forcing Quorum
Forcing quorum is a way to manually override and start a node even if the cluster does not have quorum
Important: understand why quorum was lost
The cluster starts in a special “forced” state; once majority is achieved, it drops out of the “forced” state
Command line: net start clussvc /fixquorum (or /fq)
PowerShell (R2): Start-ClusterNode –FixQuorum (or –fq)
Multi-Site with File Share Witness
Complete resiliency and automatic recovery from the loss of any one site
[Diagram: Site A and Site B connected over the WAN, with a File Share Witness (\\Foo\Share) at Site C (branch office)]
Multi-Site with File Share Witness
Complete resiliency and automatic recovery from the loss of the connection between sites
The site that can communicate with a majority of the nodes (+FSW) stays up; the site whose lock on \\Foo\Share failed cannot reach a majority and drops out of cluster membership
[Diagram: WAN link broken between Site A and Site B; the File Share Witness at Site C (branch office) arbitrates]
File Share Witness (FSW) Considerations
Simple Windows file server: a single file server can serve as a witness for multiple clusters
Each cluster requires its own share
The witness can be made highly available on a separate cluster
Recommended to be at a 3rd, separate site for DR
The FSW cannot be on a node in the same cluster, and should not be in a VM running on that cluster
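Configuring the File Share Witness model can be sketched with the R2 quorum cmdlets; \\Foo\Share matches the share shown on the slides and stands in for your own witness share at the 3rd site.

```powershell
Import-Module FailoverClusters

# Switch the cluster to Node and File Share Majority,
# using a witness share hosted at a 3rd site
Set-ClusterQuorum -NodeAndFileShareMajority \\Foo\Share

# Confirm the resulting quorum configuration
Get-ClusterQuorum
```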
Quorum Model Recap
Node and File Share Majority: even number of nodes; the highest-availability solution has the FSW in a 3rd site
Node Majority: odd number of nodes; more nodes in the primary site
Node and Disk Majority: use as directed by vendor
No Majority (Disk Only): not recommended; use as directed by vendor
Multi-Site Clustering Content
Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx
Session Summary
Multi-site failover clusters have many benefits: you can achieve high availability and disaster recovery in a single solution using Windows Server Failover Clustering
Multi-site clusters have additional considerations:
Determine the network topology across sites
Choose a storage replication solution
Plan the quorum model & nodes
Passion for High Availability?
Are You Up For a Challenge?
Become a Cluster MVP!
Contact: [email protected]
Related Content
Breakout Sessions
WSV313 | Failover Clustering Deployment Success
WSV314 | Failover Clustering Pro Troubleshooting with Windows Server 2008 R2
VIR303 | Disaster Recovery by Stretching Hyper-V Clusters across Sites
ARC308 | High Availability: A Contrarian View
DAT207 | SQL Server High Availability: Overview, Considerations, and Solution Guidance
DAT303 | Architecting and Using Microsoft SQL Server Availability Technologies in a Virtualized World
DAT305 | See the Largest Mission Critical Deployment of Microsoft SQL Server around the World
DAT401 | High Availability and Disaster Recovery: Best Practices for Customer Deployments
DAT407 | Windows Server 2008 R2 and Microsoft SQL Server 2008: Failover Clustering Implementations
UNC304 | Microsoft Exchange Server 2010: High Availability Deep Dive
UNC305 | Microsoft Exchange Server 2010 High Availability Design Considerations
Interactive Sessions
VIR06-INT | Failover Clustering with Hyper-V Unleashed with Windows Server 2008 R2
UNC01-INT | Real-World Database Availability Group (DAG) Design
VIR02-INT | Hyper-V Live Migration over Distance: A Multi-Datacenter Approach
BOF34-IT | Microsoft Exchange Server High Availability and Disaster Recovery: Are You Prepared?
Hands-on Labs
WSV01-HOL | Failover Clustering in Windows Server 2008 R2
DAT01-HOL | Create a Two-Node Windows Server 2008 R2 Failover Cluster
DAT02-HOL | Create a Windows Server 2008 R2 MSDTC Cluster
DAT09-HOL | Installing a Microsoft SQL Server 2008 + SP1 Clustered Instance
DAT12-HOL | Maintaining a Microsoft SQL Server 2008 Failover Cluster
UNC02-HOL | Microsoft Exchange Server 2010 High Availability and Storage Scenarios
VIR06-HOL | Implementing High Availability and Live Migration with Windows Server 2008 R2 Hyper-V
Visit the Cluster Team in the TLC
Failover Clustering Booth
WSV-7
Failover Clustering Resources
Cluster Team Blog: http://blogs.msdn.com/clustering/
Cluster Resources: http://blogs.msdn.com/clustering/archive/2009/08/21/9878286.aspx
Cluster Information Portal: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
Clustering Technical Resources: http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
Clustering Forum (2008): http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
R2 Cluster Features: http://technet.microsoft.com/en-us/library/dd443539.aspx
Multi-Site Clustering Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
Multi-Site Clustering Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx
Hyper-V Business Continuity portal: http://www.microsoft.com/virtualization/en/us/solution-continuity.aspx
Microsoft Cross-Site Disaster Recovery Solutions whitepaper: http://download.microsoft.com/download/3/6/1/36117F2E-499F-42D7-9ADD-A838E9E0C197/SiteRecoveryWhitepaper_final_120309.pdf
Virtualization Track Resources
Stay tuned into virtualization at TechEd NA 2010 by visiting our event website, Facebook, and Twitter pages. Don’t forget to visit the Virtualization TLC area (orange section) to see product demos, speak with experts, and sign up for promotional giveaways.
Microsoft.com/Virtualization/Events
Facebook.com/Microsoft.Virtualization
Twitter.com/MS_Virt
Like this session? Write a blog on 2 key learnings from this session and send it to #TE_VIR and you could win a Lenovo IdeaPad™ S10-3 with Windows 7 Netbook! Review the rules on our event website: Microsoft.com/Virtualization/Events
Resources
Sessions On-Demand & Community: www.microsoft.com/teched
Microsoft Certification & Training Resources (Learning): www.microsoft.com/learning
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Complete an evaluation on CommNet and enter to win!
Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st
http://northamerica.msteched.com/registration
You can also register at the North America 2011 kiosk located at registration
Join us in Atlanta next year!
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
JUNE 7-10, 2010 | NEW ORLEANS, LA