Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Clustering for Hyper-V Disaster Recovery. Greg Shields, MVP, vExpert, Senior Partner, Concentrated Technology. www.ConcentratedTech.com, @ConcentratdGreg


Transcript of Multi-Site Clustering for Hyper-V Disaster Recovery

Page 1: Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Clustering for Hyper-V Disaster Recovery

Greg Shields, MVP, vExpert
Senior Partner, Concentrated Technology

www.ConcentratedTech.com
@ConcentratdGreg

Page 2: Multi-Site Clustering for Hyper-V Disaster Recovery

About the speaker

Over 15 years of Windows experience

Administrator – Managed environments ranging from a few dozen to many thousands of users…

Consultant – Hands-on and Strategic…

Speaker – TechMentor, Tech Ed, Windows Connections, MMS, VMworld, ISACA, others…

Analyst/Author – Fourteen books and counting…

Columnist – TechNet Magazine, Redmond Magazine, Windows IT Pro Magazine, TechTarget Online, others…

All-around good guy…

Page 3: Multi-Site Clustering for Hyper-V Disaster Recovery

What Makes a Disaster?

Which of the following would you consider a disaster?

● It causes a server or an entire rack of servers to inadvertently and rapidly power down
● It interrupts the functionality of your datacenter for an extended period of time
● It immediately ceases all processing on that server
● It impacts your datacenter and causes damage, and that damage causes the entire processing of that datacenter to cease
● It causes problems with a service, shutting down that service and preventing some action from occurring on the server

Page 4: Multi-Site Clustering for Hyper-V Disaster Recovery

What Makes a Disaster?

Which of the following would you consider a disaster?

● It causes a server or an entire rack of servers to inadvertently and rapidly power down
● It immediately ceases all processing on that server
● It causes problems with a service, shutting down that service and preventing some action from occurring on the server

Just a bad day…

Page 5: Multi-Site Clustering for Hyper-V Disaster Recovery

What Makes a Disaster?

Your decision to “declare a disaster” and move to “disaster ops” is a major one

The technologies used for disaster protection are different from those used for high availability
• More complex
• More expensive

Failover and failback processes involve more thought
• You might not be able to just “fail back” with the click of a button

Page 6: Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Hyper-V == Single-Site Hyper-V

Multi-site Hyper-V looks very much the same as single-site Hyper-V
• Microsoft has not done a good job of explaining this fact!
• Some Hyper-V hosts
• Some networking and storage
• Virtual machines that Live Migrate around

But there are some major differences too…
• VMs can Live Migrate across sites
• Sites typically have different subnet arrangements
• Data in the primary site must be replicated to the DR site
• Clients need to know where your servers go!

Page 7: Multi-Site Clustering for Hyper-V Disaster Recovery

Constructing Site-Proof Hyper-V: Three Things

At a very high level, Hyper-V disaster recovery is three things

1. Storage mechanism
2. Replication mechanism
3. Target servers and cluster

Once you have these three things, layering Hyper-V atop is easy.

Page 8: Multi-Site Clustering for Hyper-V Disaster Recovery

Constructing Site-Proof Hyper-V: Three Things

[Diagram: primary Hyper-V servers with a storage device, linked by a replication mechanism to a storage device at the backup site, where backup Hyper-V servers act as the target servers.]

Page 9: Multi-Site Clustering for Hyper-V Disaster Recovery

Thing 1: A Storage Mechanism

Typically, two SANs in two different locations
● Fibre Channel, iSCSI, FCoE, heck, JBOD
● Similar model or manufacturer
● Similarity: proper replication

Backup SAN doesn’t necessarily need to be of the same size or speed as the primary SAN
● Replicated ≠ full data (not always)
● DR – not for everything!

DR Environments: Where Old SANs Go To Die!

Page 10: Multi-Site Clustering for Hyper-V Disaster Recovery

Thing 2: A Replication Mechanism

Replication between SANs must occur either

1. Synchronously
• Changes are made on one node at a time
• Subsequent changes on primary SAN must wait for ACK from backup SAN

2. Asynchronously
• Changes on backup SAN will eventually be written
• Changes are queued at primary SAN to be transferred at intervals

Page 11: Multi-Site Clustering for Hyper-V Disaster Recovery

Thing 2: A Replication Mechanism

1. Synchronously

● Changes are made on one node at a time. Subsequent changes on primary SAN must wait for ACK from backup SAN.

[Diagram: a storage device at the primary site and a storage device at the backup site, with the following sequence]

a. Change Committed at Primary Site
b. Change Replicated to Secondary Site
c. Change Committed at Secondary Site
d. Acknowledgement of Change Returned to Primary Site
e. Change Complete
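The synchronous sequence can be walked through in a few lines of Python. This is only a sketch of the ordering on the slide, not how any particular SAN implements it; the StorageDevice class and the synchronous_write function are hypothetical names made up for illustration.

```python
class StorageDevice:
    """Hypothetical stand-in for a SAN: an ordered log of committed changes."""

    def __init__(self, name):
        self.name = name
        self.committed = []

    def commit(self, change):
        self.committed.append(change)
        return True  # acknowledgement that the change has been written


def synchronous_write(primary, backup, change):
    """Perform one synchronously replicated write and return the steps taken."""
    steps = []
    primary.commit(change)
    steps.append("committed at primary")
    steps.append("replicated to backup")
    acked = backup.commit(change)
    steps.append("committed at backup")
    if not acked:
        # Without the ACK the write never completes, so nothing else proceeds.
        raise RuntimeError("no ACK from backup; write cannot complete")
    steps.append("ack returned to primary")
    steps.append("change complete")
    return steps


primary = StorageDevice("primary-san")
backup = StorageDevice("backup-san")
for change in ["change-1", "change-2"]:
    # The next change does not start until the previous one is complete.
    synchronous_write(primary, backup, change)

print(primary.committed == backup.committed)
```

Because every write waits for the backup's acknowledgement, both devices always hold the same list of changes, which is where the "no loss of data" property comes from. The price is that each write pays the round trip between sites.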

Page 12: Multi-Site Clustering for Hyper-V Disaster Recovery

Thing 2: A Replication Mechanism

2. Asynchronously

● Changes on backup SAN will eventually be written. Changes are queued at primary SAN to be transferred at intervals.

[Diagram: a storage device at the primary site and a storage device at the backup site, with the following sequence]

a. Change 1 Committed at Primary Site
b. Change 2 Committed at Primary Site
c. Change 3 Committed at Primary Site
d. Changes Replicated to Secondary Site
e. Change 4 Committed at Primary Site
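The asynchronous sequence can be sketched the same way. Again, this only illustrates the queue-then-transfer idea from the slide; the AsyncReplicator class and its method names are hypothetical, and a real product would trigger the transfer on a timer or a change threshold rather than by hand.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical model: writes commit locally and queue for later transfer."""

    def __init__(self):
        self.primary = []     # changes committed at the primary site
        self.backup = []      # changes committed at the backup site
        self.queue = deque()  # committed at primary, not yet replicated

    def write(self, change):
        # The write completes immediately; no wait for the backup site.
        self.primary.append(change)
        self.queue.append(change)

    def transfer(self):
        """Ship every queued change to the backup site (run at intervals)."""
        while self.queue:
            self.backup.append(self.queue.popleft())

    def at_risk(self):
        """Changes that would be lost if the primary site failed right now."""
        return list(self.queue)


rep = AsyncReplicator()
for n in (1, 2, 3):
    rep.write(f"change-{n}")
rep.transfer()          # changes 1 to 3 reach the backup site together
rep.write("change-4")   # committed at primary, still waiting in the queue

print(rep.backup)
print(rep.at_risk())
```

Whatever sits in the queue at the moment of a failure is the data you lose, which is why the transfer interval matters so much.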

Page 13: Multi-Site Clustering for Hyper-V Disaster Recovery

Food for Thought

Synchronous

● Assures no loss of data
● Requires a high-bandwidth and low-latency connection
● Write and acknowledgement latencies impact performance
● Requires shorter distances between storage devices

Asynchronous

● Potential for loss of data during a failure
● Leverages smaller-bandwidth connections, more tolerant of latency
● No performance impact
● Potential to stretch across longer distances

Which would you choose? Why?

Your Recovery Point Objective makes this decision…
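One rough way to see how the Recovery Point Objective drives the choice is the sketch below. It assumes, as a simplification, that the worst-case data loss under asynchronous replication is about one transfer interval. The function name and the example numbers are hypothetical.

```python
def choose_replication(rpo_seconds, async_interval_seconds):
    """Suggest a replication mode from an RPO and an async transfer interval.

    Simplifying assumption: asynchronous replication can lose roughly one
    transfer interval of changes; synchronous replication loses none.
    """
    if rpo_seconds == 0:
        return "synchronous"
    if async_interval_seconds <= rpo_seconds:
        return "asynchronous"
    return "asynchronous with a shorter interval, or synchronous"


print(choose_replication(0, 300))     # no data loss tolerated
print(choose_replication(900, 300))   # 15-minute RPO, 5-minute interval
print(choose_replication(60, 300))    # 1-minute RPO, 5-minute interval
```

In practice the transfer itself takes time as well, so the real exposure is somewhat larger than the interval alone.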

Page 14: Multi-Site Clustering for Hyper-V Disaster Recovery

Thing 2½: Replication Processing Location

There are also two locations for replication processing…

1. Storage Layer

● Replication processing is handled by the SAN itself
● Agents are often installed to virtual hosts or machines to ensure crash consistency
● Easier to set up, fewer moving parts. More scalable
● Concerns about crash consistency

2. OS / Application Layer

● Replication processing is handled by software in the VM OS
● This software also operates as the agent
● More challenging to set up, more moving parts. More installations to manage/monitor. Scalability and cost are linear
● Fewer concerns about crash consistency

Page 15: Multi-Site Clustering for Hyper-V Disaster Recovery

Thing 3: Target Servers and a Cluster

Finally, you need target servers and a cluster in the backup site.

[Diagram: Hyper-V servers, storage, and network switches, with the backup site labeled.]

Page 16: Multi-Site Clustering for Hyper-V Disaster Recovery

Clustering’s Sordid History

Windows NT 4.0
- Microsoft Cluster Service “Wolfpack”
- “As the corporate expert in Windows clustering, I recommend you don’t use Windows clustering”

Windows 2000
- Greater availability, scalability. Still painful

Windows 2003
- Added iSCSI storage to traditional Fibre Channel
- SCSI Resets still used as method of last resort (painful)

Windows 2008
- Eliminated use of SCSI Resets
- Eliminated full-solution HCL requirement
- Added Cluster Validation Wizard and pre-cluster tests
- Clusters can now span subnets (ta-da!)

Windows 2008 R2
- Improvements to Cluster Validation Wizard and Migration Wizard
- Additional cluster services
- Cluster Shared Volumes (!) and Live Migration (!)

Page 17: Multi-Site Clustering for Hyper-V Disaster Recovery

So, What IS a Cluster?

Page 18: Multi-Site Clustering for Hyper-V Disaster Recovery

So, What IS a Cluster?

Quorum Drive & Storage for Hyper-V VMs

Page 19: Multi-Site Clustering for Hyper-V Disaster Recovery

So, What IS a Multi-Site Cluster?

[Diagram: Hyper-V servers with iSCSI storage and network switches, a backup site, and a witness server in a separate witness site.]

Page 20: Multi-Site Clustering for Hyper-V Disaster Recovery

Quorum: Clustering’s Most Confusing Configuration

Ever been to a Kiwanis meeting…?
● Different clubs – different rules
● Different clusters – different rules

A cluster “exists” because it has quorum between its members. Quorum is achieved via a voting process
● Different than resource failover

If a cluster “loses quorum”, the entire cluster shuts down and ceases to exist. This happens until quorum is regained

Multiple quorum models exist

Page 21: Multi-Site Clustering for Hyper-V Disaster Recovery

Four Options for Quorum

1. Node and Disk Majority

2. Node Majority

3. Node and File Share Majority

4. No Majority: Disk Only
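The four models differ mainly in who gets a vote. The sketch below is a simplified model of the vote counting, not the Failover Clustering API; the has_quorum function and its parameters are hypothetical names, and "witness" stands for the disk or file share depending on the model.

```python
def has_quorum(model, nodes_up, nodes_total, witness_up=True):
    """Return True if the members still online can keep the cluster running."""
    if model == "No Majority: Disk Only":
        # Only the disk matters: any node that can reach it keeps the cluster up.
        return witness_up and nodes_up >= 1
    if model == "Node Majority":
        votes_up, votes_total = nodes_up, nodes_total
    elif model in ("Node and Disk Majority", "Node and File Share Majority"):
        # The disk or file share witness contributes one extra vote.
        votes_up = nodes_up + (1 if witness_up else 0)
        votes_total = nodes_total + 1
    else:
        raise ValueError(f"unknown quorum model: {model}")
    # Quorum means strictly more than half of all possible votes.
    return votes_up * 2 > votes_total


# Four-node cluster that has lost two nodes:
print(has_quorum("Node Majority", 2, 4))                 # 2 of 4 votes
print(has_quorum("Node and File Share Majority", 2, 4))  # 3 of 5 votes
print(has_quorum("No Majority: Disk Only", 1, 4, witness_up=False))
```

With an even number of nodes, the witness vote is what breaks the tie. That is the reason the witness's location becomes so important in a multi-site design.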

Page 22: Multi-Site Clustering for Hyper-V Disaster Recovery

Quorum in Multi-Site Clusters

● Node and Disk Majority
● Node Majority
● Node and File Share Majority
● No Majority: Disk Only

Microsoft recommends using the Node and File Share Majority model for multi-site clusters
● This model provides the best protection for a full-site outage
● Full-site outage requires a file share witness in a third geographic location

Page 23: Multi-Site Clustering for Hyper-V Disaster Recovery

Quorum in Multi-Site Clusters

Use the Node and File Share Quorum
● Prevents entire-site outage from impacting quorum.
● Enables creation of multiple clusters if necessary.

[Diagram: Hyper-V servers with iSCSI storage and network switches, a backup site, and a witness server in a third site (the witness site).]

Page 24: Multi-Site Clustering for Hyper-V Disaster Recovery

I Need a Third Site? Seriously?

Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated…

What happens if you put the quorum’s file share in the primary site?
● The secondary site might not automatically come online after a primary site failure
● Votes in secondary site < votes in primary site

Page 25: Multi-Site Clustering for Hyper-V Disaster Recovery

I Need a Third Site? Seriously?

Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated…

What happens if you put the quorum’s file share in the secondary site?
● A failure in the secondary site could cause the primary site to go down.
● Votes in secondary site > votes in primary site.

This problem gets even weirder as time passes and the number of servers changes in each site
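The vote arithmetic behind the third-site argument is easy to check. The sketch below assumes a hypothetical cluster with two nodes per site plus one file share witness, and asks whether the surviving site keeps a majority when the other site fails. The function name and site names are made up for illustration.

```python
def surviving_site_keeps_quorum(nodes_per_site, witness_site, failed_site):
    """True if the votes still online are a strict majority of all votes."""
    total_votes = sum(nodes_per_site.values()) + 1  # +1 for the file share witness
    votes_up = sum(n for site, n in nodes_per_site.items() if site != failed_site)
    if witness_site != failed_site:
        votes_up += 1  # the witness survives unless it lived in the failed site
    return votes_up * 2 > total_votes


nodes = {"primary": 2, "secondary": 2}  # assumed layout: two hosts per site
for witness_site in ("primary", "secondary", "third"):
    for failed_site in ("primary", "secondary"):
        ok = surviving_site_keeps_quorum(nodes, witness_site, failed_site)
        print(f"witness in {witness_site}, {failed_site} site fails: quorum kept = {ok}")
```

Only the third-site placement keeps quorum in both failure cases. With the witness in either production site, losing that site takes three of the five votes with it.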

Page 26: Multi-Site Clustering for Hyper-V Disaster Recovery

I Need a Third Site? Seriously?

[Diagram: Hyper-V servers with iSCSI storage and network switches, a backup site, and a witness server in a third site (the witness site).]

Page 27: Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Cluster Tips/Tricks

Manage Preferred Owners & Persistent Mode options

● Make sure your servers fail over to servers in the same site first
● But also make sure they have options on failing over elsewhere
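The "same site first, elsewhere as a fallback" idea amounts to an ordered list of owners. The sketch below shows that ordering logic only. It is not the cluster's own preferred-owner mechanism, and the host names, site names, and both functions are hypothetical.

```python
def preferred_owner_order(vm_site, nodes):
    """Order candidate hosts: same-site nodes first, other sites afterwards.

    nodes is a list of (node_name, site_name) tuples.
    """
    same_site = [name for name, site in nodes if site == vm_site]
    other_sites = [name for name, site in nodes if site != vm_site]
    return same_site + other_sites


def pick_failover_target(order, failed):
    """Return the first node in the preferred order that is still healthy."""
    for node in order:
        if node not in failed:
            return node
    return None  # nowhere left to go


nodes = [("hv1", "primary"), ("hv3", "backup"), ("hv2", "primary"), ("hv4", "backup")]
order = preferred_owner_order("primary", nodes)

print(order)
print(pick_failover_target(order, failed={"hv1"}))         # stays in the primary site
print(pick_failover_target(order, failed={"hv1", "hv2"}))  # crosses to the backup site
```

Listing the remote hosts last, rather than leaving them out, is what gives the VM somewhere to go when the whole primary site is lost.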

Page 28: Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Cluster Tips/Tricks

Consider carefully the effects of Failback

● Failback is a great solution for resetting after a failure
● But Failback can be a massive problem-causer as well
● Its effects are particularly pronounced in Multi-Site Clusters
● Recommendation: Turn it off (until you’re ready)

Page 29: Multi-Site Clustering for Hyper-V Disaster Recovery

More Multi-Site Cluster Tips/Tricks

Resist creating clusters that support other services
● A Hyper-V cluster is a Hyper-V cluster is a Hyper-V cluster

Use disk “dependencies” as Affinity/Anti-Affinity rules
● Hyper-V all by itself doesn’t have an elegant way to affinitize
● Setting disk dependencies against each other is a work-around

Add Servers in Pairs
● Ensures that a server loss won’t cause site split brain
● This is less a problem with the File Share Witness configuration

Page 30: Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Cluster Tips/Tricks

Segregate traffic!!!

Page 31: Multi-Site Clustering for Hyper-V Disaster Recovery

Most Important!

Ensure that networking remains available when VMs migrate from primary to backup site

Clustering can span subnets!
- This is good, but only if you plan for it…

● Crossing subnets also means: changing IP address, subnet mask, gateway, etc., at new site
● Automatically done by using DHCP and dynamic DNS OR must be manually updated
● DNS replication is also a problem. Clients will require time to update their local cache
● Consider reducing DNS TTL or clearing client cache
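A back-of-the-envelope estimate shows why the DNS TTL matters after a cross-subnet failover. The sketch assumes the worst case: a client cached the old address just before the updated record reached its DNS server, and it honors the full TTL. The function name and all the numbers are hypothetical.

```python
def worst_case_client_staleness(dns_replication_seconds, record_ttl_seconds):
    """Upper-bound estimate of how long a client may keep using the old IP.

    Assumes the client re-cached the old record right up until the change
    replicated to its DNS server, then kept it for the full TTL.
    """
    return dns_replication_seconds + record_ttl_seconds


replication_delay = 900  # assumed 15 minutes for DNS changes to replicate

for ttl in (1200, 300):
    stale = worst_case_client_staleness(replication_delay, ttl)
    print(f"TTL {ttl}s: clients may point at the old address for up to {stale}s")
```

Lowering the TTL shrinks only the second term. The replication delay still has to be dealt with separately, and clearing client caches is the remaining option.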

