Transcript of: Failover Cluster Networking Essentials

Page 1:
Page 2:

Failover Cluster Networking Essentials
Amitabh P Tamhane
Senior Program Manager, Windows Server Clustering

MDC-B337

Page 3:

Session Overview

• Cluster Network Basics
• Cluster Network Design Planning
• Cluster Network Architecture
• Cluster Network Configuration Options
• Multi-Subnet Cluster Networking Considerations

Page 4:

Cluster Networking Basics

Page 5:

Cluster Nodes Connectivity
• All nodes communicate with all other nodes (full mesh)
• Cluster state is replicated to all nodes
• Unicast in nature, using a request-reply protocol
• Communication over port 3343 (a quick connectivity check is sketched below)

[Diagram: four cluster nodes (1-4) in a full-mesh topology]
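Since all of this traffic rides on port 3343, a quick way to confirm that nodes can reach each other on that port is a simple PowerShell check. This is only a sketch; Test-NetConnection ships with Windows Server 2012 R2 and later, and the node name is a placeholder:

# Check that a peer node is reachable on port 3343 (TCP test only; node name is an example)
Test-NetConnection -ComputerName "Node2" -Port 3343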

Page 6:

Types of Intra-Cluster Communication
• Cluster Heartbeats: node health monitoring
• CSV I/O: built-in resiliency for storage volume access
• Intra-Cluster Synchronization: replicated state across nodes

Page 7:

Cluster Heartbeats: Overview
• Failover Clustering conducts health monitoring between nodes to detect when servers are no longer available
• Health monitoring (configurable):
  • Nodes exchange heartbeats every 1 second
  • Nodes are considered down if they do not respond to 5 heartbeats
• Nodes are removed from cluster membership if they exceed the thresholds
• Heartbeats are sent on all cluster-enabled networks (the current settings can be inspected as sketched below)

[Diagram: two nodes exchanging "You there?" / "Yes" heartbeat messages]
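The heartbeat interval and miss thresholds mentioned above are cluster common properties. A minimal sketch for reading them, assuming the FailoverClusters PowerShell module is installed:

Import-Module FailoverClusters

# Inspect the current heartbeat interval (delay) and miss thresholds
Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold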

Page 8:

Cluster Heartbeats: Example

[Diagram: heartbeats are sent over both Cluster Network 1 (private network) and Cluster Network 2 (public network)]

Page 9:

Cluster Heartbeats: Network Usage
• Lightweight traffic (only 134 bytes)
• Sensitive to latency / packet loss
  • A saturated NIC blocking cluster heartbeats could cause nodes to be removed from active cluster membership
  • A network experiencing significant packet loss may cause heartbeats to be missed
• Bandwidth is not as important, but Quality of Service is…

Page 10:

Intra-Cluster Synchronization: Overview
• Cluster configuration updates to all nodes
  • Cluster database changes are replicated to all nodes
• Cluster membership changes to all nodes
  • Removing nodes in case of connectivity loss
• Role state changes stored in the cluster database
  • Private properties updated by resource DLLs
  • A role's failover or move to a different owner node
• Sent over a single cluster-enabled network

Page 11:

Intra-Cluster Synchronization: Example

[Diagram: a configuration change is replicated to all nodes over a single cluster network, using Cluster Network 1 (private) or Cluster Network 2 (public)]

Page 12:

Intra-Cluster Synchronization: Network Usage
• Lightweight node-to-node messaging
  • The number of bytes on the wire is fairly small
• Frequency of messaging depends on workload
  • In general infrequent on stable, running File / Hyper-V clusters
  • Heavier frequency of updates on SQL / Exchange clusters
• Sensitive to latency
  • Cluster performance is directly affected by high latency
  • The cluster continues functioning under high latency
• Bandwidth is not as important, but Quality of Service is…

Page 13:

CSV I/O Redirection: Overview
• Metadata I/O is sent to the CSV coordinator node
  • Metadata updates to files
  • Very infrequent for VM workloads (such as starting a VM or live migration)
• All I/O is redirected over the network in failure scenarios
  • No storage connectivity (disk access disruption)
  • Asymmetric storage configurations (disk access not present)
  • Replicated disk (disk read-only access)
• Sent over the same network as intra-cluster synchronization
• Can leverage SMB Multi-Channel to stream over multiple interfaces

Page 14:

CSV I/O Metadata Updates: Example

[Diagram: metadata update example across nodes 1, 2 and 3, with metadata I/O forwarded to the CSV coordinator node]

Page 15:

CSV I/O Redirection: Example

[Diagram: CSV I/O redirection example across nodes 1 and 2]

Page 16:

CSV I/O Redirection: Network Usage

Metadata:
• Lightweight and infrequent
• Network latency slows metadata I/O performance
• Bandwidth is not as important, but Quality of Service is…

Failure Scenarios / Asymmetric Storage Connectivity:
• Significant bandwidth usage
  • All I/O is forwarded via SMB over the network
• Insufficient bandwidth may cause other important I/O to not go through
• Bandwidth is very important, as is Quality of Service…

Page 17:

Cluster Networking Considerations

Page 18:

Traditional Network Configuration Guidance
• What we have recommended over the past decade…
  • At least 2 independent networks
• Network 1: Public Client Access
  • IPv4 (static or DHCP) or IPv6 Stateless Address Auto-Configuration (SLAAC)
  • Default gateway configured (routable for public access)
  • Client Access Points (the clustered role's IP address) configured on this network
• Network 2: Private Infrastructure Access
  • IPv6 (preferred) or IPv4
  • IPv6 link-local (fe80) works great…
  • Default gateway not configured (non-routable for public access)

More deployment options are available with Windows Server 2012 for converged networking.

Page 19:

Might I need even more than two networks?
• Various network traffic to consider:
  1. Host Management: isolated network for the host partition; increased security isolation (could use VLANs to isolate from the client network)
  2. Virtual Machines: public network for client access to VMs
  3. Live Migration: short-duration, heavy burst traffic
  4. Intra-cluster communication / CSV: intra-cluster communication is lightweight but sensitive to latency; metadata updates are infrequent and light, but failure conditions could be heavy
  5. iSCSI: dedicated storage network; disable for cluster use
• Yes, this gets a little unrealistic…
  • Especially when you start adding NIC Teaming

Key Takeaway: It is really about providing quality of service guarantees!!

Page 20:

Are Separate Networks Really Needed?

Required?
• No – It is not required to have 2 separate networks
• Clustering does support a converged networking model
• Validation will generate a warning to alert you of a potential single point of failure
• Validation is not NIC Teaming aware

Recommended?
• Yes – It is recommended to have redundant network communication between nodes
• Sort of… let's talk about what really matters and converged networking (next slide)

Page 21:

Converged Network Considerations

Resiliency
• In a highly available system you want to avoid any single points of failure
• Many ways to accomplish network redundancy:
  • Multiple independent networks
  • NIC Teaming

Quality of Service
• Cluster heartbeats are lightweight, but sensitive to latency
  • If cluster heartbeats can't get through, this can be falsely interpreted as nodes being down
• Many ways to accomplish network quality of service:
  • Multiple network cards
  • QoS policies
  • VLANs

Page 22:

Reducing Single Points of Failure
• Use multiple physical NICs
  • A single multi-port NIC is itself a single point of failure
• Connect NICs to different switches
  • Carving up VLANs all to the same switch is a single point of failure
• Using different types of NICs removes common drivers
  • Eliminates a driver bug from affecting connectivity across all NICs
• Ensure upstream network resiliency
  • Eliminate single points of failure between multiple networks

Page 23:

Achieving Network Redundancy
• Use NIC Teaming to aggregate multiple NICs into a single logical NIC
  • The cluster configures a single Cluster Network using the logical NIC for communication
  • Clustering fully supports in-box NIC Teaming as well as 3rd-party solutions
• Multiple NICs on different subnets
  • The cluster creates a separate Cluster Network for each NIC
  • NetFt provides fault tolerance across the multiple Cluster Networks
• SMB Multi-Channel uses multiple private networks
  • Multiple NICs on different subnets with no gateways configured
  • The cluster configures SMB Multi-Channel to use private Cluster Networks for CSV
• Live Migration of VMs will take advantage of NetFt routing
  • Live migration will use a different cluster network if a network goes down

Page 24:

CSV Networking Considerations
• CSV requirements:
  • SMB Server / Workstation services
  • NTLM
• CSV uses SMB client/server for node-to-node communication
  • SMB is aware of Cluster Networks
  • SMB Multi-Channel and RDMA support will optimize CSV traffic
• CSV supports nodes in dissimilar subnets
  • Added in Windows Server 2012
• Tuning CSV performance (a sketch of turning off NetBIOS on an IP Address resource follows):
  • Disabling NetBIOS has shown increased performance
  • CSV testing has seen little advantage with jumbo frames (however, not discouraged…)
  • TCP Offload: CSV traffic over SMB will be optimized
  • Receive Side Scaling: CSV traffic over SMB will be optimized
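As one example of the NetBIOS tuning point above, NetBIOS can be turned off on a cluster IP Address resource through its EnableNetBIOS private property (listed in the appendix). A minimal sketch; the resource name is a placeholder and the resource must be recycled for the change to apply:

Import-Module FailoverClusters

# Turn off NetBIOS publishing for this IP Address resource
Get-ClusterResource -Name "Cluster IP Address" | Set-ClusterParameter -Name EnableNetBIOS -Value 0

# Recycle the resource so the new setting takes effect
Stop-ClusterResource -Name "Cluster IP Address"
Start-ClusterResource -Name "Cluster IP Address"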

Page 25:

Cluster Networking Under The Hood

Page 26:

NetFt: Failover Cluster Virtual Adapter
• NetFt is a virtual network adapter
  • All intra-cluster communication between nodes goes over NetFt
  • Provides seamless inter-node communication
• Mechanism by which the cluster uses multiple cluster-enabled adapters to communicate
  • Builds fault-tolerant connections between nodes in the cluster across all available interfaces
• NetFt initiates and receives heartbeats to/from other nodes in the cluster
  • Heartbeats are sent over all the cluster-enabled networks by NetFt
  • In case of missed heartbeats, NetFt notifies the cluster service of communication loss to a node
• Similar to an internal NIC team for clustering
  • Route prioritization and routing across dissimilar subnets
  • Provides resiliency to the loss of a network path between nodes
  • Dynamically switches intra-cluster traffic to another available cluster network
• Compatible with NIC Teaming

Page 27:

Viewing NetFt Virtual Adapter
• Visible in Device Manager and with ipconfig /all
  • Select "Show hidden devices" in Device Manager
  • NDIS 6.2 miniport virtual adapter
• Completely self-configuring
  • MAC address is self-generated (based on a hash of the MAC address of the 1st physical NIC)
  • MAC address conflict detection and resolution in Windows Server 2012
• NetFt self-configures an APIPA (Automatic Private Internet Protocol Addressing) address
  • IPv4: 169.254.*
  • IPv6: fe80::*
• No manual configuration necessary (the adapter can also be listed from PowerShell, as sketched below)
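Besides Device Manager and ipconfig /all, the hidden adapter can be listed from PowerShell. A minimal sketch, assuming Windows Server 2012 or later and the usual "Microsoft Failover Cluster Virtual Adapter" interface description:

# List hidden adapters and filter down to the cluster virtual adapter
Get-NetAdapter -IncludeHidden |
    Where-Object InterfaceDescription -like "*Failover Cluster Virtual Adapter*" |
    Format-List Name, InterfaceDescription, MacAddress, Status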

Page 28:

Cluster Network Topology Manager
• Discovers multiple communication paths between nodes
  • Identifies available network adapters, IP addresses, subnets
  • Determines the list of same-subnet, cross-subnet, and public/private Cluster Networks
• Enables NetFt routing of cluster networks
  • Plumbs cluster network routes on NetFt
  • NetFt starts sending heartbeats on the enabled cluster network routes
• Maintains cluster network route health
  • Receives notifications from NetFt about route health
  • Determines the need to use cross-subnet routes
• Part of the Cluster Service
  • Loaded upon the start of the Cluster Service

Page 29:

NetFt: Architecture
• The Cluster Service plumbs network routes over NIC1 and NIC2 on NetFt
• The Cluster Service establishes a TCP connection over the NetFt adapter using the private NetFt IP address (source port 3343)
• NetFt wraps the TCP connection inside a UDP packet (source port 3343)
• NetFt sends this UDP packet over one of the cluster-enabled physical NIC adapters to the destination node, targeted at the destination node's NetFt adapter
• The destination node's NetFt adapter receives the UDP packet and then delivers the TCP traffic to the destination node's Cluster Service

[Diagram: on each node, ClusSvc sits on TCP over the NetFt adapter, which tunnels through UDP/IP/NDIS out of physical NIC1 or NIC2 to the peer node]

Page 30:

NetFt: Virtual Adapter Performance Filter
• Improves cluster network performance
  • Including CSV I/O redirection performance
• Packet rerouting to increase performance
  • Inspects traffic inbound on the physical NIC
  • Delivers traffic addressed to NetFt directly to the NetFt driver
  • Bypasses the physical NIC's UDP/IP stack
  • Traffic only traverses the TCP/IP stack once
• Enabled by default

Page 31:

Cluster Network Discovery
• The cluster uses exactly one IP per subnet per NIC
  • The cluster ignores other IPs from the same subnet configured on the NIC
  • The cluster ignores other NICs and associated IPs from the same subnet
• Each NIC per node will be part of exactly one Cluster Network
  • The cluster uses prefix matching to determine the set of Cluster Networks
  • The cluster has built-in resilience to use IPv4 or IPv6 per NIC (the prefix must match)
• The cluster will use an IP from a different subnet on another NIC
  • The cluster will ignore other IPs configured on that NIC for subnets already discovered

Page 32:

Cluster Network Discovery: Example

[Diagram: NIC 1 on each node shares the same subnet and forms Cluster Network 1; NIC 2, on that same subnet, is ignored by the cluster]

Page 33:

Cluster Network Roles
• Cluster networks are created for all logical subnets connected to all nodes in the cluster
  • Each NIC connected to that common subnet will be listed
• Cluster networks can be configured for different cluster use (a PowerShell sketch follows the table):

Name | Value | Description
Disabled for Cluster Communication | 0 | No cluster communication of any kind is sent over this network
Enabled for Cluster Communication only | 1 | Internal cluster communication and CSV traffic can be sent over this network
Enabled for client and cluster communication | 3 | Cluster IP Address resources can be created on this network for clients to connect to; internal and CSV traffic can be sent over this network
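A minimal PowerShell sketch for inspecting and changing these roles (the network name is an example; 0, 1, and 3 are the values from the table above):

Import-Module FailoverClusters

# List each cluster network with its current role
Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask

# Enable a network for client and cluster communication (Role = 3)
(Get-ClusterNetwork -Name "Cluster Network 1").Role = 3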

Page 34:

Cluster Network Prioritization
• Determines priority for NetFt traffic (affects both CSV and intra-cluster traffic)
• Networks are given a "cost" (Metric) to define priority (current values can be listed as sketched below)
  • Lower metric value = higher priority (private)
  • Higher metric value = lower priority (public)
• Automatically configured based on the cluster network role setting
  • Cluster Network Role of 1 = 40,000 starting value
  • Cluster Network Role of 3 = 80,000 starting value
• Link speed, RDMA, and RSS capabilities will reduce the metric value
  • NetFt will load balance across networks that are <16 metric values apart
• Load balancing is cluster-wide
  • All communication between a pair of nodes will be on the same interface
  • A node may communicate with different nodes on different networks if they are of similar metric weight
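A minimal sketch for viewing the metrics the cluster has assigned (AutoMetric and Metric are the same properties used on the configuration slide later in this deck):

# Show cluster networks ordered from highest to lowest priority (lowest metric first)
Get-ClusterNetwork | Sort-Object Metric | Format-Table Name, Role, AutoMetric, Metric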

Page 35:

CSV with SMB Multi-Channel
• CSV traffic will default to SMB Multi-Channel
  • CSV traffic over SMB always uses Multi-Channel by default
  • If SMB Multi-Channel is not available, CSV traffic uses NetFt route selection
• Cluster network prioritization affects only NetFt
  • Cluster network prioritization does not affect SMB Multi-Channel
• Disabling SMB Multi-Channel:
  • Set-SmbServerConfiguration -EnableMultiChannel $false

[Diagram: CSV streaming I/O across multiple networks (10.10.10.x and 20.20.20.x)]

Page 36:

Understanding which network CSV will use:
• SMB Multi-Channel:
  1. Only select networks which are:
     a. Enabled for cluster use
     b. Set to a cluster role of 1 (internal cluster communication)
     • This logic can be overridden with the UseClientAccessNetworksForSharedVolumes cluster property (see the sketch below)
  2. Select the NICs with the best features:
     a. Select RDMA-capable NICs
     b. If none, select RSS-capable and/or teamed NICs
     c. If none, select all others
  3. If multiple NICs are selected, pick the NIC with the highest speed
  4. If multiple NICs are equal on the above criteria, stream over multiple NICs
• NetFt:
  1. If no SMB Multi-Channel enabled network is found, CSV uses NetFt logic for network selection
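A minimal sketch of the override mentioned in step 1. UseClientAccessNetworksForSharedVolumes is the cluster common property named above; the 0/1 values are assumed here (0 = cluster-only networks, 1 = also allow client-access networks):

# Allow CSV traffic to also use client-access (Role 3) networks
(Get-Cluster).UseClientAccessNetworksForSharedVolumes = 1

# Revert to the default (cluster-only networks)
(Get-Cluster).UseClientAccessNetworksForSharedVolumes = 0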

Page 37:

Network Validation Improvements

• Network validation improved from simple PING to using NetFt

• Verifies port 3343 and full cluster network connectivity requirements

• Provides better diagnosability and pre-identifies cluster configuration problems
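The network tests can also be run on their own from PowerShell. A minimal sketch; the node names are placeholders and "Network" is assumed to be the validation category name:

# Run only the network validation tests against two nodes
Test-Cluster -Node "Node1","Node2" -Include "Network"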

Page 38:

Add Node to Cluster: Networking Errors

• The cluster requires full mesh connectivity between nodes
  • A joining node is expected to have full mesh connectivity with the existing nodes
• The joining node attempts to establish communication with all active nodes
  • The Cluster Service on the joining node will stop if 1 or more active nodes is not reachable
  • The Cluster Service is restarted by SCM
• The cluster favors a node already in the active membership over the joining node
• The Add Node UI wizard and PowerShell will time out after 3 minutes
  • The UI wizard and PowerShell will evict the newly added node
  • Run Network Validation to identify network communication issues

Page 39:

Cluster Networking Configurations

Page 40:

Firewall Exceptions
• Failover Clustering requires that exceptions be defined to allow cluster nodes to communicate
  • Failover Clustering primarily uses port 3343
• When you install the Failover Clustering feature, it automatically enables all the necessary Windows Firewall exceptions, both inbound and outbound rules (they can be verified from PowerShell, as sketched below)
• The Security Configuration Wizard is cluster-aware and will detect and enable the appropriate exceptions
• Consideration: exceptions need to be manually enabled if using a 3rd-party firewall solution
  • A common support call generator (for example, Symantec Endpoint Protection)
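A minimal sketch for confirming that the clustering firewall rules are in place, assuming the built-in rule group is named "Failover Clusters" (verify the group name on your build):

Get-NetFirewallRule -DisplayGroup "Failover Clusters" |
    Format-Table DisplayName, Direction, Enabled, Profile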

Page 41:

Tuning Network Thresholds
• Failover Clustering by default is configured to deliver the highest levels of availability
  • Which means quickly detecting and reacting to failures
  • This can sometimes result in premature failovers
• Some customers may wish to have reduced sensitivity
  • Enables greater tolerance of unreliable networks
  • Results in greater downtime when things do go wrong

Clustering is fully configurable to achieve either end of the spectrum: highest availability or tolerance of transient failures.

Page 42:

Configuring Cluster Heartbeating
• Cluster intra-node heartbeating is fully configurable
  • Configurable via cluster common properties
  • Thresholds for heartbeats across subnets are independently configurable
• PowerShell:
  • (Get-Cluster).SameSubnetDelay = 2

Increasing heartbeat thresholds does not fix network problems, it only masks them!

Property | Default | Maximum | Description
SameSubnetDelay | 1 second | 2 seconds | Frequency at which heartbeats are sent
SameSubnetThreshold | 5 heartbeats | 120 heartbeats | Missed heartbeats before an interface is considered down
CrossSubnetDelay | 1 second | 4 seconds | Frequency at which heartbeats are sent to nodes on dissimilar subnets
CrossSubnetThreshold | 5 heartbeats | 120 heartbeats | Missed heartbeats before an interface is considered down to nodes on dissimilar subnets
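A minimal sketch that relaxes the same-subnet and cross-subnet settings within the documented maximums (the specific values are examples, not recommendations):

$cluster = Get-Cluster
$cluster.SameSubnetDelay = 2          # send heartbeats every 2 seconds
$cluster.SameSubnetThreshold = 10     # tolerate 10 missed heartbeats before the interface is considered down
$cluster.CrossSubnetDelay = 2
$cluster.CrossSubnetThreshold = 20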

Page 43:

Windows Server 2012 R2 Heartbeat Changes
• For a Hyper-V deployment, slightly more relaxed settings may make sense
  • Traditionally the definition of "down" is when clients cannot connect to an app in the VM
  • In general, TCP defines recoverable network errors for applications
  • Recommended that cluster heartbeat timeouts not exceed 20 seconds
• Greater resiliency to transient network failures with Windows Server 2012 R2
  • Heartbeat thresholds increased by default for Hyper-V clusters
  • Defaults change when the first VM is clustered
• Cluster heartbeating improved for increased resiliency to packet loss

Cluster Property | Default | Hyper-V Default
SameSubnetThreshold | 5 | 10
CrossSubnetThreshold | 5 | 20

Page 44:

Configuring NetFt Heartbeat Logging
• NetFt adds additional information to the Cluster.log in Windows Server 2012 to help identify the root cause of cluster network failures
• Logs the last X heartbeats to the Cluster.log on failure
  • Reduces the need to turn on NetFt tracing and provides data in a single log file
• Configurable via the RouteHistoryLength cluster common property
  • Captures the last 10 heartbeats by default (double the default failure threshold)
  • Change this value to adjust for changes in heartbeat thresholds
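A minimal sketch: after raising SameSubnetThreshold to 10, keeping RouteHistoryLength at double the threshold preserves the 2x relationship described above; Get-ClusterLog then gathers the Cluster.log files for review:

(Get-Cluster).RouteHistoryLength = 20

# Generate Cluster.log files from every node into a local folder for analysis
Get-ClusterLog -Destination C:\Temp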

Page 45:

Configuring Cluster Network Roles

Automatic Configuration
• The network role is automatically configured during cluster creation
• Logic for role assignment:
  • If enabled for the iSCSI Software Initiator (new Windows Server 2012 logic): Disabled for Cluster Communication
  • If no default gateway is present: Enabled for Cluster Communication only
  • If a default gateway is present: Enabled for client and cluster communication

Manual Configuration
• Failover Cluster Manager
• PowerShell:
  • (Get-ClusterNetwork "Cluster Network 1").Role = 3

Page 46:

Configuring Cluster Network Prioritization
• Clustering has intelligence to automatically detect and set preference on which network to use
  • Metrics can be manually configured as well
• Cluster automatically assigned mode (default):
  • (Get-ClusterNetwork "Cluster Network 1").AutoMetric = $true
• Manually set:
  • (Get-ClusterNetwork "Cluster Network 2").Metric = 40000
• Consideration: NetFt load balancing and SMB Multi-Channel will send cluster communication over multiple paths
  • Recommended to leave at the default (cluster controlled)
  • Restricting traffic works differently in Windows Server 2012

Page 47:

Configuring Live Migration Network Priority
• Intra-cluster communication / CSV traffic will prefer the highest priority network
• Live migration will prefer the 2nd highest priority network
  • Live migration uses the cluster network topology to discover available networks and establish priority
• Live migration preference can also be manually configured in Failover Cluster Manager
  • Manual configuration takes preference over the cluster settings
• Also configurable via PowerShell:
  • http://blogs.msdn.com/b/virtual_pc_guy/archive/2013/05/02/using-powershell-to-configure-live-migration-networks-in-a-hyper-v-cluster.aspx
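A sketch of the PowerShell approach from the blog post above. The MigrationExcludeNetworks private property on the "Virtual Machine" resource type is taken from that post rather than from this deck, so verify it on your cluster before relying on it; the GUID is a placeholder:

Import-Module FailoverClusters

# Find the ID of the cluster network to exclude from live migration
Get-ClusterNetwork | Format-Table Name, Id

# Exclude that network from live migration
Get-ClusterResourceType -Name "Virtual Machine" |
    Set-ClusterParameter -Name MigrationExcludeNetworks -Value "<network GUID>"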

Page 48:

Network Binding Order
• Cluster nodes are multi-homed systems
  • Network priority affects the DNS client for outbound network connectivity
• The client-connecting adapter should be bound first
  • Non-routed networks get lower priority
• Configuring binding order:
  • GUI: Control Panel -> Network Connections, press "Alt" to get the menu bar, then Advanced -> Advanced Settings
  • Command line: nvspbind.exe (web download)
• NetFt is automatically placed at the bottom of the binding order during install
  • New in Windows Server 2012

Page 49:

Configuring Quality of Service Policies
• QoS policy features in Windows Server 2012

Prioritization
• Recommendation: configure on all cluster deployments
• Heartbeats and intra-cluster communication are sensitive to latency; configuring a QoS Priority Flow Control policy will ensure they are sent first

Bandwidth Allocation
• Recommendation: configure on CSV deployments
• CSV may send large amounts of data, so ensure it has sufficient bandwidth
• A Relative Minimum Bandwidth SMB policy is recommended (worked examples appear in the appendix)

Page 50:

iSCSI Cluster Planning
• Recommended that the iSCSI storage fabric have a dedicated and isolated network
• Disable iSCSI networks for cluster use (a sketch follows)
  • Prevents intra-cluster communication as well as CSV traffic from flowing over the same network
  • The cluster will automatically disable iSCSI interfaces for cluster use in Windows Server 2012
• Set iSCSI networks to be lowest in the binding order
• Configure iSCSI network redundancy with MPIO
  • NIC Teaming is now supported with iSCSI in Windows Server 2012
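A minimal sketch of the first bullet, using the role values from the earlier table (the network name is an example; substitute the cluster network created for your iSCSI subnet):

# Role 0 = disabled for cluster communication
(Get-ClusterNetwork -Name "Cluster Network 3").Role = 0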

Page 51:

Spanning Clusters Across Subnets

Page 52:

Multi-Subnet Clusters
• Failover Clustering supports having nodes reside in different IP subnets
  • Most commonly used for multi-site clusters that are stretched across datacenters
  • Longer distance traditionally means greater network latency
• CSV now supports nodes in different IP subnets in Windows Server 2012
• SQL Server 2012 now supports multi-subnet clusters

[Diagram: Site A node at 10.10.10.111 and Site B node at 20.20.20.222]

Page 53:

Controlling Full Mesh Heartbeating
• NetFt behavior for building cross-subnet routes is configurable
  • Exposed via the PlumbAllCrossSubnetRoutes cluster property
  • (Get-Cluster).PlumbAllCrossSubnetRoutes = 1

Value | Description
0 (default) | Do not attempt to find cross-subnet routes if local routes are found
1 | Always attempt to find routes that cross subnets
2 (new) | Disable the cluster service from attempting to discover cross-subnet routes after a node successfully joins

Page 54:

File Share Witness (FSW)
• FSW is a type of quorum witness that uses an SMB share for arbitration
  • Partitioned nodes arbitrate for the quorum witness to achieve majority
• FSW is configured primarily in multi-site stretch cluster scenarios
  • Placed on a 3rd, separate site for automatic failover
  • Can be stretched to any distance (Azure??)
• FSW network traffic is minimal
  • Time-stamp updated only when nodes join or leave cluster membership
  • Lightweight arbitration protocol
  • Latency delays need to be really significant to impact the functionality
  • Arbitration takes at most 90 seconds

Page 55:

Cluster Network Security over the WAN
• Encrypt intra-node communication
  • Example: (Get-Cluster).SecurityLevel = 2

[Diagram: nodes in Site A and Site B (10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1) communicating over the WAN]

Value | Description
0 | Clear text
1 | Signed (default)
2 | Encrypted

Disclaimer: encryption incurs some performance overhead

Page 56:

Multi-Subnet Resource Configuration
• Multi-subnet clusters leverage a single NetName with multiple IPs
• The Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up
• DNS registration behavior is configurable via the NetName private property RegisterAllProvidersIP

[Diagram: Network Name Resource with an OR dependency on IP Address Resource A and IP Address Resource B]

Page 57:

Client Reconnect Considerations
• Nodes in dissimilar subnets
• The Client Access Point fails over across subnets
• Clients need the new IP address from DNS to reconnect

[Diagram: after failover from Site A (10.10.10.111) to Site B (20.20.20.222), the DNS record is updated on one DNS server, replicated to the other, and then obtained by the client]

Page 58:

Solution #1: Prefer Local Failover
• Scale up for local failover for higher availability
  • No change in IP addresses for HA
  • Means not going over the WAN, and is still usually preferred
• Cross-site failover for disaster recovery

[Diagram: the VM keeps its 10.10.10.111 address when failing over within Site A; Site B (20.20.20.222) is reserved for disaster recovery]

Page 59:

Solution #2: Stretch VLANs
• Deploying a stretched VLAN minimizes client reconnection times
• The IP of the VM never changes

[Diagram: a VLAN stretched across Site A and Site B; the file server keeps the address 10.10.10.111 in both sites]

Page 60:

Solution #3: Abstraction in the Network Device
• The network device uses a 3rd IP
• The 3rd IP is the one registered in DNS and used by the client
• Example: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/App_Networking/extmsftw2k8vistacisco.pdf

[Diagram: clients connect to the VM via 30.30.30.30, which the network device maps to 10.10.10.111 in Site A or 20.20.20.222 in Site B]

Page 61:

Solution #4: Configure Network Name Settings
• RegisterAllProvidersIP (default = 0 for FALSE)
  • Determines whether all IP addresses for a Network Name will be registered in DNS
  • TRUE (1): IP addresses can be online or offline and will still be registered
  • Ensure the application is set to try all IP addresses, so clients can connect more quickly
  • Not supported by all applications; check with the application vendor
  • Supported by SQL Server 2012
• HostRecordTTL (default = 1200 seconds)
  • Controls how long the DNS record lives in the client cache for a cluster network name
  • Shorter TTL: DNS records for clients are updated sooner
  • Disclaimer: this does not speed up DNS replication
• Both are private properties of the Network Name resource (see the appendix) and can be set with PowerShell (a sketch follows)
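A minimal sketch for changing both settings with Set-ClusterParameter; the resource name and the values are examples, and the resource must be recycled for the change to apply:

Import-Module FailoverClusters

Get-ClusterResource -Name "MyRole-NetName" | Set-ClusterParameter -Name RegisterAllProvidersIP -Value 1
Get-ClusterResource -Name "MyRole-NetName" | Set-ClusterParameter -Name HostRecordTTL -Value 300

# Recycle the Network Name resource so the new settings take effect
Stop-ClusterResource -Name "MyRole-NetName"
Start-ClusterResource -Name "MyRole-NetName"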

Page 62:

Solution #5: Network Virtualization
• When deploying Hyper-V with Windows Server 2012, the new Network Virtualization feature can abstract a VM's logical subnet boundaries
• Enables VMs to run on nodes in different subnets without reconfiguring the IP address in the guest OS
• Requires SCVMM 2012 SP1

Page 63:

In Review: Session Objectives and Takeaways
• Design your cluster networks to provide:
  1. Resiliency
     • With multiple networks or through NIC Teaming
  2. Quality of service
     • Dedicated cluster networks are old-school… use QoS policies in Windows Server 2012
     • CSV brings bandwidth considerations
• In general, clustering will "just work" out of the box
  • IT Generalists: no need to worry
  • IT Specialists: highly flexible and configurable

Page 64:

Appendix

Page 65:

Network Name Properties

Property | Description
ResourceData | Resource internal use (read only)
StatusNetBIOS | Status (error) code for NetBIOS – 0 means no error (read only)
StatusDNS | Status (error) code for DNS – 0 means no error (read only)
StatusKerberos | Status (error) code for Kerberos – 0 means no error (read only)
CreatingDC | Domain controller on which this netname's AD object was initially created (read only)
LastDNSUpdateTime | Time at which DNS was last updated (read only)
ObjectGUID | (read only)
Name | The name published in NetBIOS and the SAM account name in AD
DnsName | The name published in DNS
RemapPipeNames | Legacy setting for SMB
HostRecordTTL | TTL in seconds of the DNS record; this controls how long caches will retain the record
RegisterAllProvidersIP | 0 (false), 1 (true) – when enabled, all IP addresses this netname depends on will be published to DNS no matter what their provider state is
PublishPTRRecords | 0 (false), 1 (true) – create reverse DNS records
TimerCallbackAdditionalThreshold | Unused

Page 66:

IP Address (v4) Properties

Property | Description
LeaseObtainedTime | Time when the DHCP lease was acquired (read only)
LeaseExpiresTime | Time when the DHCP lease runs out (read only)
DhcpServer | The DHCP server that issued the lease (read only)
DhcpAddress | Address assigned by DHCP (read only)
DhcpSubnetMask | Subnet mask assigned by DHCP (read only)
Network | The network this IP address is on (such as "Cluster Network 2")
Address | The IP address assigned to this IP resource
SubnetMask | Subnet mask
EnableNetBIOS | 0 (false), 1 (true) – controls whether this address is published by NetBIOS
OverrideAddressMatch | This setting is unused
EnableDhcp | 0 (false), 1 (true) – determines whether this IP Address resource obtains its address via DHCP

Page 67:

Relative Minimum Bandwidth Policy Example
Example of setting a minimum bandwidth policy of 30% for cluster traffic, 20% for live migration, and 50% for SMB traffic out of the total bandwidth:

New-NetQosPolicy "Cluster" -Cluster -MinBandwidthWeightAction 30
New-NetQosPolicy "Live Migration" -LiveMigration -MinBandwidthWeightAction 20
New-NetQosPolicy "SMB" -SMB -MinBandwidthWeightAction 50

Page 68:

Priority Flow Control (PFC) Example

Example of setting cluster heartbeating and intra-node synchronization to be the highest-priority traffic:

New-NetQosPolicy "Cluster" -Cluster -Priority 6
New-NetQosPolicy "SMB" -SMB -Priority 5
New-NetQosPolicy "Live Migration" -LiveMigration -Priority 3

Note: Available values are 0 – 6. Priority Flow Control must be enabled on both the nodes in the cluster and the physical network switch. Undefined traffic is priority 0.

Page 69:

Client Access & SMB Multi-Channel
• The cluster is integrated with SMB Multi-Channel for client access
  • Allows streaming SMB client traffic across multiple networks
  • Delivers improved I/O performance when accessing an SMB share hosted on a cluster
  • The cluster configures SMB Multi-Channel to use public Cluster Networks for client traffic
• Clients accessing a clustered file share will default to SMB Multi-Channel
  • The RDMA-capable adapter's IP address is registered with the DNS server
  • SMB Multi-Channel prefers RDMA networks
  • If SMB Multi-Channel is not available, CSV traffic uses NetFt route selection
• The cluster configures SMB Multi-Channel:
  • By default, all public networks are configured
  • To disallow a public network for SMB Multi-Channel, configure the Distributed Network Name resource's private property "ExcludeNetworks", a list of network IDs to exclude (a sketch follows)

[Diagram: a client accessing \\server1\share1 streams SMB traffic across multiple networks (10.10.10.x and 20.20.20.x)]
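A sketch of excluding a public network from SMB Multi-Channel client access via the ExcludeNetworks private property named above. The resource name, the network GUID, and the exact value format are assumptions to verify against your environment:

# Find the ID of the cluster network to exclude
Get-ClusterNetwork | Format-Table Name, Id

# Set the exclusion list on the Distributed Network Name resource (name is an example)
Get-ClusterResource -Name "SOFS" |
    Set-ClusterParameter -Name ExcludeNetworks -Value "<network GUID>"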

Page 70:

Related content

Breakout Sessions
MDC-B305 Continuous Availability: Deploying and Managing Clusters Using Windows Server 2012 R2
MDC-B311 Application Availability Strategies for the Private Cloud
MDC-B331 Upgrading Your Private Cloud with Windows Server 2012 R2
MDC-B333 Storage and Availability Improvements in Windows Server 2012 R2
MDC-B336 Cluster in a Box 2013: How Real Customers Are Making Their Business Highly Available…
MDC-B337 Failover Cluster Networking Essentials
MDC-B375 Microsoft Private Cloud Fast Track v3: Private Cloud Reference Architecture…
MDC-B403 Failover Clustering: Quorum Model Design for Your Private Cloud

Hands-on Labs
MDC-H303 Configuring Hyper-V over Highly Available SMB Storage

Find Me Later at the Storage Booth

Page 71:

Track resources

How to Configure a Clustered Storage Space in Windows Server 2012
http://blogs.msdn.com/b/clustering/archive/2012/06/02/10314262.aspx

Virtualizing storage for scale, resiliency, and efficiency
http://blogs.msdn.com/b/b8/archive/2012/01/05/virtualizing-storage-for-scale-resiliency-and-efficiency.aspx

Updated Links on Windows Server 2012 File Server and SMB 3.0
http://blogs.technet.com/b/josebda/archive/2013/05/05/updated-links-on-windows-server-2012-file-server-and-smb-3-0.aspx

Page 72:

Track resources

Learn more about Windows Server 2012 R2 Preview, and download the datasheet and evaluation bits, at http://aka.ms/WS2012R2
Learn more about System Center 2012 R2 Preview, and download the datasheet and evaluation bits, at http://aka.ms/SC2012R2

Page 73:

MSDN: Resources for Developers
http://microsoft.com/msdn

Learning: Microsoft Certification & Training Resources
http://www.microsoft.com/learning

Sessions on Demand
http://channel9.msdn.com/Events/TechEd

TechNet: Resources for IT Professionals
http://microsoft.com/technet

Page 74:

Evaluate this session

Scan this QR code to evaluate this session.

Page 75:

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.