AlwaysOn High Availability

October 15-18, 2013 | Charlotte, NC SQL Server AlwaysOn PRODUCTION OF DATATECH IT INC.

Transcript of AlwaysOn High Availability

Page 1: AlwaysOn High Availability

October 15-18, 2013 | Charlotte, NC

SQL Server AlwaysOn

PRODUCTION OF DATATECH IT INC.

Page 2: AlwaysOn High Availability

OVERVIEW

What are AlwaysOn Availability Groups

What AlwaysOn Availability Groups are not

AlwaysOn Architecture

Solid Foundation

Monitoring Availability Group Health

Diagnosing Availability Group Health

Page 3: AlwaysOn High Availability

What Are AlwaysOn Availability Groups

AlwaysOn - SQL Server 2012 High Availability solution
• New integrated high availability and disaster recovery solution
• Simplified deployment
• Simplified management

AlwaysOn Flavors

AlwaysOn Failover Clustered Instance (SQLFCI)
• Clustered Resource = SQL Server instance

AlwaysOn Availability Group (new clustered resource)
• Clustered Resource = SQL Server database(s)

• Utilize Windows Cluster health detection and automatic failover
• Require, and are hosted in, a single Windows cluster
• No interdependency; an availability group can be hosted on standalone SQL Server

Page 4: AlwaysOn High Availability

Availability Group

[Diagram: Windows cluster AGCluster. Node N1 hosts SQLINST1 (primary replica) and node N2 hosts SQLINST2 (secondary replica); each holds AG1 = DB1, and the replicas synchronize.]

• Availability Group - new clustered resource
• Availability Databases - resource contents
• Availability Replicas - resource location
• A group of databases is highly available for the application
• Standalone SQL Server uses local storage
• Log blocks replicated

Page 5: AlwaysOn High Availability

Scaling Availability Groups

[Diagram: AGCluster with three nodes. N1/SQLINST1 and N2/SQLINST2 each host AG1 = DB1, DB2 and AG2 = DB3, DB4, DB5; N3/SQLINST3 hosts AG2 = DB3, DB4, DB5. AG1 synchronizes between SQLINST1 and SQLINST2; AG2 synchronizes across all three instances.]

Page 6: AlwaysOn High Availability

Scaling Availability Groups Limits

[Diagram: AGCluster with five nodes, N1/SQLINST1 through N5/SQLINST5, each hosting a replica of AG1 = DB1, DB2.]

Availability Group Maximums

Page 7: AlwaysOn High Availability

Availability Group Listener

Provide client connectivity to availability group databases

Client Access Point Clustered Resource
• DNS name
• One or more IP addresses
• Listener port

Create listener for availability group in SQL Server
• Availability group resource dependency on listener resource

Listener Advantages
• Connect to the primary replica (read/write)
• Read-intent connection routed to secondary
• Multiple-subnet listener (stretch cluster, multi-site, remote site)

Page 8: AlwaysOn High Availability

Application Reconnect on Failover

[Diagram: application HRDB connects through listener AG_LISTEN to an availability group (AG = DB1, DB2) with one primary and two secondary replicas on ServerA, ServerB and ServerC.]

• Health problem on SQLINST1 detected
• Availability group and listener fail over to SQLINST2 (ServerB)
• Application retries the connection
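
The "application retry connect" step can be sketched as a small retry loop. This is a minimal sketch and not something the deck prescribes: `connect` is a hypothetical callable standing in for whatever driver call opens a connection through the listener, and the attempt count and delay are illustrative values.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is a hypothetical zero-argument callable that opens a connection
    through the listener and raises ConnectionError while a failover is in
    progress. attempts and delay_seconds are illustrative values only.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # wait for the listener to come online
    raise last_error
```

Because the listener name stays the same after failover, the application only needs to retry against the same name; it does not need to know which replica became primary.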

Page 9: AlwaysOn High Availability

Availability Group on Standalone SQL Server

[Diagram: AGCluster. N1/SQLINST1 is the primary replica and N2/SQLINST2 the secondary replica, and the two synchronize. The pair is shown once with AG1 = DB1, DB2 and once with AG1 = DB1.]

Minimal Configuration
• Two node Windows cluster
• SQL Server standalone instances
• Default instances (SQLINST1, SQLINST2)

Page 10: AlwaysOn High Availability

Availability Groups Hosted on SQL FCI

[Diagram: AGCluster. SQLFCI1 runs on nodes N1 and N2 and is the primary replica; SQLFCI2 runs on nodes N3 and N4 and is the secondary replica. AG1 = DB1, DB2 synchronizes between SQLFCI1 and SQLFCI2.]

Page 11: AlwaysOn High Availability

Availability Group on SQLFCI and Standalone SQL

[Diagram: AGCluster. SQLFCI1 runs on nodes N1 and N2 and is the primary replica; standalone SQLINST1 on node N3 is the REMOTE secondary replica. AG1 = DB1, DB2 synchronizes between SQLFCI1 and SQLINST1.]

Page 12: AlwaysOn High Availability

AlwaysOn Availability Groups

Flexible

Multi-database failover

Multiple secondaries

Synchronous and asynchronous data movement

Automatic and manual failover

Multi-site clustering

Flexible failover policy

Cloud, Hybrid opportunities

Page 13: AlwaysOn High Availability

AlwaysOn Availability Groups

Efficient

Close to real-time data

Active Secondary

Readable Secondary

Backup from Secondary

DBCC CHECKDB

Automation using PowerShell

Fast Failover

Compression/Encryption

Automatic Page Repair

Rolling Upgrades

Page 14: AlwaysOn High Availability

AlwaysOn Availability Groups

Integrated

Application failover using virtual name

Configuration Wizard

PowerShell

Dashboard

System Center Integration (monitor pack)

Rich diagnostic infrastructure

File-stream replication

Replication publisher failover

Page 15: AlwaysOn High Availability

Comparing High Availability Offerings

SQL Failover Clustered Instance

Log Shipping

Mirroring

Replication

Page 16: AlwaysOn High Availability

Availability Groups vs SQL Server Clustering

SQL Server Clustered
• Multi-site Cluster
• Shared Storage
• Automatic Failover SQL Service

Availability Group
• Local Storage (multiple copies)
• Readable Secondaries
• Application Failover
• Faster failover

Page 17: AlwaysOn High Availability

Availability Groups vs Log Shipping

Log Shipping
• Disaster Recovery Site
• Easy to Deploy

Availability Group
• Multi-Site Cluster
• Availability Group wizard
• Automatic Failover
• Client Redirection (Listener)
• Readable Secondaries

Page 18: AlwaysOn High Availability

Availability Groups vs Replication

Replication
• Wizard setup
• Multiple Secondary Readable secondary
• Dashboard
• Create Indexes at secondary

Availability Group
• Automatic Failover
• Client Redirection (Listener)
• Minimum latency
• Readable Secondary Statistics
• Compression

Page 19: AlwaysOn High Availability

Availability Groups vs Database Mirroring

Database Mirroring
• Disaster Recovery database
• Wizard setup per database
• Synchronous, Asynchronous Mirroring
• Automatic Failover
• Automatic Page Repair
• Log Stream Compression

Availability Group
• Multi-Site solution
• Failover Resource >=1 database(s)
• Multiple mirrors
• Readable secondary
• Listener

Page 20: AlwaysOn High Availability

What AlwaysOn Availability Groups Are Not

Page 21: AlwaysOn High Availability

What AlwaysOn Availability Groups Are Not

<> Database Health Detection

AlwaysOn utilizes Flexible Failover Policy
• SQL Failover Clustered Instance
• Availability Groups

Page 22: AlwaysOn High Availability

What AlwaysOn Availability Groups Are Not

Availability Group Replica <> SQL Server Failover Clustered Instance
• Two Node Active Active SQL Server

[Diagram: AGCluster with nodes N1 and N2. SQLFCI1 (Active-Active) and SQLFCI2 (Active-Active) can each run on either node.]

Page 23: AlwaysOn High Availability

What AlwaysOn Availability Groups Are Not

Availability Group Replica <> SQL Server Failover Clustered Instance
• Two Node Active Active SQL Server

[Diagram: AGCluster with nodes N1 and N2. Availability group AG1 (DB1, DB2) is drawn with SQLFCI1 as primary replica and SQLFCI2 as secondary replica, although both instances can run on the same two nodes.]

You cannot host two replicas from the same availability group on the same Windows cluster node!
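
The placement rule above can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name and the dictionary layout are hypothetical, and the node lists are taken from the diagrams in this deck rather than from any cluster API.

```python
def shared_nodes(possible_owners_by_replica):
    """Return the nodes that could host more than one replica of the same AG.

    possible_owners_by_replica maps a replica (instance) name to the set of
    Windows cluster nodes able to host that instance. The layout is an
    assumption made for this illustration.
    """
    first_owner = {}
    conflicts = set()
    for replica, nodes in possible_owners_by_replica.items():
        for node in nodes:
            if node in first_owner and first_owner[node] != replica:
                conflicts.add(node)
            first_owner.setdefault(node, replica)
    return conflicts


# Two-node active-active layout: both FCIs can run on N1 and N2.
print(sorted(shared_nodes({"SQLFCI1": {"N1", "N2"}, "SQLFCI2": {"N1", "N2"}})))

# Four-node layout: SQLFCI1 on N1/N2, SQLFCI2 on N3/N4, no overlap.
print(sorted(shared_nodes({"SQLFCI1": {"N1", "N2"}, "SQLFCI2": {"N3", "N4"}})))
```

An empty result means the replicas never compete for the same node, which is the situation shown on the "Availability Groups Hosted on SQL FCI" slide.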

Page 24: AlwaysOn High Availability

AlwaysOn Availability Group Architecture

Page 25: AlwaysOn High Availability

AlwaysOn Availability Group Architecture

MultiSubnet Listener

Availability Group Role/States Transition

Synchronous and Asynchronous Commit

AlwaysOn Health Detection

Flexible Failover Policy

Page 26: AlwaysOn High Availability

Availability Group Listener in Multiple Subnets

[Diagram: sites LA and NY; Primary 10.10.29.196, Secondary 10.10.45.130.]

Availability group spans multiple networks
• Very popular with DR

Define Listener in multisubnet
• One IP address for each network
• All IP addresses registered in DNS
• One IP address online (primary replica)
• At failover:
  • IP in new primary network brought online
  • IP in old primary network taken offline

Page 27: AlwaysOn High Availability

MultiSubnet Listener

[Diagram: primary and secondary replicas; one listener IP address is online and the other is offline.]

Requirements
• MultiSubnetFailover=True
• SQL Server Native Client 11
• SQL Client (.NET 4)

MultiSubnetFailover=True
• Simultaneous IP connect attempt

MultiSubnetFailover=False
• Sequential IP connect attempt
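
As an illustration of where the `MultiSubnetFailover=True` keyword goes, here is a minimal sketch that assembles an ADO.NET-style connection string for a listener. The helper name is hypothetical, and the port (1433), the database name and integrated security are assumptions made for the example; only the listener name `AG_LISTEN` and the `MultiSubnetFailover` keyword come from the slides.

```python
def listener_connection_string(listener, port, database,
                               multi_subnet_failover=True, read_only=False):
    """Build an ADO.NET-style connection string that targets an AG listener."""
    parts = [
        f"Server=tcp:{listener},{port}",
        f"Database={database}",
        "Integrated Security=SSPI",  # assumption: Windows authentication
        f"MultiSubnetFailover={'True' if multi_subnet_failover else 'False'}",
    ]
    if read_only:
        # Read-intent connections can be routed to a readable secondary.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


print(listener_connection_string("AG_LISTEN", 1433, "DB1"))
```

Keeping the keyword in one helper makes it harder for an individual application to omit it and fall back to sequential connect attempts.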

Page 28: AlwaysOn High Availability

MultiSubnet Listener Connection

[Diagram: ServerA, ServerB and ServerC host one primary and two secondary replicas. N1/SQLINST1 is PRIMARY and N2/SQLINST2 is SECONDARY for AG = DB1, DB2. Listener AG_LISTEN has IP 10.10.10.11 (online, ServerA) and IP 11.11.11.12 (offline).]

• Application connects through the listener using MultiSubnetFailover=True
• Client provider opens sockets to both IP addresses
• Application connects to the online IP address successfully

Page 29: AlwaysOn High Availability

MultiSubnet Listener Connection

[Diagram: ServerA, ServerB and ServerC host one primary and two secondary replicas of AG = DB1, DB2. Listener AG_LISTEN has IP 10.10.10.11 (online, ServerA) and IP 11.11.11.12 (offline).]

• Application attempts to connect using the listener without the MultiSubnetFailover parameter
• Client provider opens a socket to the offline IP address
• Connection times out
• Increase the connection timeout and connect again
• Offline and then online IP address attempted
• Connection successful
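
The difference between the two connection slides can be put into a rough timing model. This is a simplified sketch, not a description of any driver's internals: the function name is hypothetical, the per-IP timeout is an input you must supply, and network latency to the online IP is ignored.

```python
def seconds_until_connected(ip_online_flags, per_ip_timeout, simultaneous):
    """Model the wait before the client reaches the online listener IP.

    ip_online_flags: booleans in the order the IP addresses are tried.
    per_ip_timeout:  seconds spent on an offline IP before moving on
                     (an assumed input, not a documented default).
    simultaneous:    True models MultiSubnetFailover=True, where all IP
                     addresses are attempted at the same time.
    """
    if True not in ip_online_flags:
        return None  # no IP is online, so the connection never succeeds
    if simultaneous:
        return 0.0  # the online IP answers without waiting on offline ones
    offline_tried_first = ip_online_flags.index(True)
    return offline_tried_first * per_ip_timeout


# Offline IP tried first, online IP second, 21 seconds assumed per offline IP.
print(seconds_until_connected([False, True], 21.0, simultaneous=False))
print(seconds_until_connected([False, True], 21.0, simultaneous=True))
```

If the sequential result exceeds the client's connection timeout, the attempt fails before the online IP is ever tried, which is why the slide suggests increasing the connection timeout when `MultiSubnetFailover` is not set.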

Page 30: AlwaysOn High Availability

AlwaysOn Architecture – Availability Replica Roles

Availability Replica Roles
• Primary
• Secondary
• Resolving

What is Resolving?
• Primary and Secondary are the normal operating roles
• Resolving is the transitional role between the primary and secondary roles

What does Resolving mean to an availability database?
• SQL Server sets databases offline

Page 31: AlwaysOn High Availability

AlwaysOn Architecture – Availability Replica Roles

Availability Replica State Transitions

Primary_normal
• Normal operating role

Primary_pending
• Online request from SQL resource DLL
• Retrieve configuration from cluster store

Resolving_pending_failover
• Manual or forced failover

Not_available
• Unknown or off state

Resolving_normal
• Gate role moving replica to normal operational roles

Secondary_normal
• Normal operating role

Page 32: AlwaysOn High Availability

Role Transitions – SQL Server Startup on Primary

[Diagram: AGCluster, N1/SQLINST1 with AG1 = DB1, DB2. At SQL Server startup the replica moves through NOT_AVAILABLE, RESOLVING, PRIMARY_PENDING and PRIMARY_NORMAL, in that order.]
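
The startup sequence on this slide can be written down as an ordered list and stepped through. This is only an illustrative sketch: the tuple and function names are hypothetical, and the sequence covers just the states drawn on the slide, not every transition SQL Server can make.

```python
# Startup sequence as drawn on the slide for the primary replica.
STARTUP_SEQUENCE = ("NOT_AVAILABLE", "RESOLVING", "PRIMARY_PENDING", "PRIMARY_NORMAL")


def next_state(current, sequence=STARTUP_SEQUENCE):
    """Return the state after current, or None once the final state is reached."""
    position = sequence.index(current)  # ValueError for a state not in the sequence
    if position + 1 < len(sequence):
        return sequence[position + 1]
    return None


state = STARTUP_SEQUENCE[0]
while state is not None:
    print(state)
    state = next_state(state)
```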

Page 33: AlwaysOn High Availability

Role Transitions

[Diagram: AGCluster with N1/SQLINST1 and N2/SQLINST2, AG1 = DB1, DB2. Events shown: Health Check Detects Failure, Availability Group RESOLVING, Restart resource, Health Check Detects Health, Restart Fails, Automatic Failover. Role pairs shown for the two replicas: PRIMARY/SECONDARY, RESOLVING/RESOLVING, and SECONDARY/PRIMARY after automatic failover.]

Page 34: AlwaysOn High Availability

AlwaysOn Architecture – Synchronous and Asynchronous Commit

[Diagram: AGCluster. N1/SQLINST1 (primary replica) and N2/SQLINST2 (secondary replica) both host AG1 = DB1, DB2. A transaction log block is hardened on the primary and sent to the secondary, where it is hardened and then redone. Each replica tracks its Last Hardened LSN and Last Redone LSN.]

Page 35: AlwaysOn High Availability

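Synchronous Commit Mode

[Diagram: AGCluster, AG1 = DB1, DB2 on N1/SQLINST1 and N2/SQLINST2. In the SQLINST1 buffer pool, table Employee (ID, Name) holds the row (1, Curt). Transaction T1 is written to the log cache as a log block; the log writer hardens it to the transaction log (Tlog) on SQLINST1, and the log block is also hardened to the Tlog on SQLINST2.]

Transaction T1:
    Begin Tran
    Insert into Employee values (2, 'Shon')
    Commit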

Page 36: AlwaysOn High Availability

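Asynchronous Commit Mode

[Diagram: AGCluster, AG1 = DB1, DB2 on N1/SQLINST1 and N2/SQLINST2. In the SQLINST1 buffer pool, table Employee (ID, Name) holds the row (1, Curt). Transaction T1 is written to the log cache as a log block; the log writer hardens it to the transaction log (Tlog) on SQLINST1, and the log block is also hardened to the Tlog on SQLINST2.]

Transaction T1:
    Begin Tran
    Insert into Employee values (2, 'Shon')
    Commit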

Page 37: AlwaysOn High Availability

AlwaysOn Architecture – Backup on Secondary

[Diagram: the primary replica serves the R/W workload; backups are shown on the primary and on both secondaries.]

• Backups can be done on any replica of a database
• Secondary replica may be synchronous or asynchronous
• Log backups done on all replicas form a single log chain
• Advisable for backups to be stored centrally
• Recovery Advisor makes restores simple

Page 38: AlwaysOn High Availability

Log backups form a single log chain

[Diagram: log backups taken across the Primary, Secondary 1 and Secondary 2 cover LSN 1 - 10, LSN 11 - 20, LSN 21 - 30 and LSN 31 - 40, forming one continuous chain.]
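
The single-chain idea can be checked with a few lines of code. This is a minimal sketch using the slide's simplified numbering, where each backup starts one LSN after the previous one ends; real SQL Server LSN values follow different rules. The function name is hypothetical, and which replica took which backup is an assumption made for the example.

```python
def is_single_log_chain(backups):
    """Check that log backups from any replica form one unbroken LSN chain.

    backups: iterable of (replica_name, first_lsn, last_lsn) tuples, using
    the slide's simplified ranges (next first_lsn == previous last_lsn + 1).
    """
    ordered = sorted(backups, key=lambda backup: backup[1])
    for previous, current in zip(ordered, ordered[1:]):
        if current[1] != previous[2] + 1:
            return False  # gap or overlap between consecutive backups
    return True


backups = [
    ("Primary", 1, 10),
    ("Secondary 1", 11, 20),
    ("Secondary 2", 21, 30),
    ("Secondary 1", 31, 40),
]
print(is_single_log_chain(backups))
```

Storing the backups centrally, as the previous slide advises, is what makes a check like this practical: all pieces of the chain are in one place regardless of which replica produced them.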

Page 39: AlwaysOn High Availability

AlwaysOn Architecture

AlwaysOn Flexible Failover Policy

Used for SQL Server Failover Clustered Instance

Used for availability groups

Page 40: AlwaysOn High Availability

Achieving Availability

Number of 9’s   Availability Percentage   Total Annual Downtime
2               99%                       3 days, 15 hours
3               99.9%                     8 hours, 45 minutes
4               99.99%                    52 minutes, 34 seconds
5               99.999%                   5 minutes, 15 seconds

High Availability Starts with the Foundation
• The principal goal of a high availability solution is to minimize or mitigate the impact of downtime.

High Availability != Constant Availability

The availability of a system can be expressed as this calculation:
• Actual Uptime / Expected Uptime * 100
• The resulting value is often expressed by industry in terms of the number of 9’s that the solution provides
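
The calculation and the downtime figures can be reproduced with a short script. This is a minimal sketch that assumes a 365-day year and rounds to the nearest second; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def availability_percent(actual_uptime, expected_uptime):
    """Actual Uptime / Expected Uptime * 100, as on the slide."""
    return actual_uptime / expected_uptime * 100


def annual_downtime(nines):
    """Return (days, hours, minutes, seconds) of downtime allowed per year."""
    downtime = round(SECONDS_PER_YEAR / 10 ** nines)
    days, rest = divmod(downtime, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


for nines in range(2, 6):
    print(nines, annual_downtime(nines))
```

For four nines this gives 52 minutes and 34 seconds, and for five nines 5 minutes and 15 seconds, in line with the figures quoted for this slide.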

Page 41: AlwaysOn High Availability

Starting with Windows

SQL Server relies upon the Windows platform to provide foundational infrastructure
• Services for networking
• Storage
• Security
• Patching
• Monitoring

A solid foundation can significantly reduce planned downtime
• Most planned downtime is caused by operating system patching, application patching, hardware maintenance, etc.
• This can constitute almost 80 percent of the outages in an IT environment.

Page 42: AlwaysOn High Availability

SQL 2012 Setup GUI on Server Core

/UIMODE=EnableUIOnServerCore
• Provides GUI Setup on Server Core installations
• Makes installation easier
• Hyperlinks will not work

Page 43: AlwaysOn High Availability

Rolling Upgrade and Patching

AlwaysOn features facilitate rolling upgrades and patching of instances
• Helps significantly to reduce application downtime.

Windows Failover Cluster
• Upgrade passive nodes without impact

SQL Server on Hyper-V
• SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime.
• Administrators can perform maintenance operations on the host without impacting applications.

Page 44: AlwaysOn High Availability

Mechanisms to Monitor AlwaysOn Health

• AlwaysOn Dashboard
• AlwaysOn PowerShell cmdlets
• Performance Monitor
• AlwaysOn Policy Based Management

Why
• Above and beyond AlwaysOn in-box health monitoring

What
• Monitor Quorum and Cluster Health
  • Quorum state?
  • Are cluster members detected?
• Monitor Availability Group, Replica and Database Health
  • Are availability groups, replicas, databases healthy?
• Monitor Availability Database Performance Health
  • Use policies to ensure RPO, RTO

Page 45: AlwaysOn High Availability

Dashboard

Monitoring Availability Groups

Health Snapshot
• Windows Cluster
• Availability Group
• Availability Replica
• Availability Database

Page 46: AlwaysOn High Availability

Availability Group Dynamic Management Views

Mechanisms to Monitor AlwaysOn Health

Cluster and Cluster State
• sys.dm_hadr_cluster
• sys.dm_hadr_cluster_members

Availability Group, Replica, Database
• select * from sys.availability_groups
• select * from sys.availability_replicas
• select * from sys.availability_databases_cluster

State Information
• sys.dm_hadr_availability_group_states
• sys.dm_hadr_availability_replica_states
• sys.dm_hadr_database_replica_states
• sys.dm_hadr_database_replica_cluster_states
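
The views listed above are typically joined rather than queried one at a time. The following is a minimal sketch, not the presenters' script: it joins three of the listed views on `group_id` and `replica_id` and filters on `synchronization_health_desc`. The function name is hypothetical, and `cursor` is assumed to be any Python DB-API cursor already connected to the instance.

```python
REPLICA_HEALTH_QUERY = """
SELECT ag.name AS group_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON ars.replica_id = ar.replica_id
"""


def unhealthy_replicas(cursor):
    """Run the query on a DB-API cursor and return rows that are not HEALTHY."""
    cursor.execute(REPLICA_HEALTH_QUERY)
    # Column 3 is synchronization_health_desc in the SELECT list above.
    return [row for row in cursor.fetchall() if row[3] != "HEALTHY"]
```

Run against the primary replica, a query like this returns state for every replica in the group; on a secondary, the state view generally reports only the local replica.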

Page 47: AlwaysOn High Availability

Availability Database Performance Health

[Diagram: N1/SQLINST1 and N2/SQLINST2 host AG1 = DB1, DB2 and synchronize. A Log Send Queue and a Redo Queue sit between the replicas; the last commit times shown are LCT=T-10 and LCT=T.]

Maintain high availability
• Monitoring Availability Group ‘healthy’
• Accumulating Latency

Recovery Point Objective (RPO)

Recovery Time Objective (RTO)
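
The gap between the two last commit times (LCT) on the diagram is one way to estimate exposure to data loss. The sketch below is illustrative only: it assumes the primary is at LCT=T and the secondary at LCT=T-10, that the unit is seconds, and that the timestamps were already read from each replica; the function names are hypothetical.

```python
from datetime import datetime


def data_loss_exposure_seconds(primary_last_commit, secondary_last_commit):
    """Seconds of committed work the secondary has not yet received."""
    return (primary_last_commit - secondary_last_commit).total_seconds()


def meets_rpo(primary_last_commit, secondary_last_commit, rpo_seconds):
    """True when the current commit-time gap is within the RPO target."""
    gap = data_loss_exposure_seconds(primary_last_commit, secondary_last_commit)
    return gap <= rpo_seconds


primary_lct = datetime(2013, 10, 15, 12, 0, 10)   # LCT = T (assumed primary)
secondary_lct = datetime(2013, 10, 15, 12, 0, 0)  # LCT = T-10 (assumed secondary)
print(data_loss_exposure_seconds(primary_lct, secondary_lct))
print(meets_rpo(primary_lct, secondary_lct, rpo_seconds=30))
```

A gap that keeps growing is the "accumulating latency" the slide warns about, even while the availability group still reports as healthy.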

Page 48: AlwaysOn High Availability

Availability Database Performance Health

[Diagram: N1/SQLINST1 and N2/SQLINST2 host AG1 = DB1, DB2 and synchronize. Timeline: AlwaysOn health failure detected, offline overhead, redo, online overhead.]

RTO = Detection + Redo + Overhead

Detection
• health_check_timeout (30 seconds)
• sp_server_diagnostics (1/3 HCT)
• lease expiration (20 seconds)

Redo Queue Outstanding

Overhead
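
The RTO formula on this slide can be turned into a rough estimate. This is a sketch under stated assumptions: redo time is approximated as the outstanding redo queue divided by the redo rate, the units (KB and KB per second) are chosen for the example, and the overhead value is whatever you measure in your own environment. The function name is hypothetical.

```python
def estimated_rto_seconds(detection_seconds, redo_queue_kb,
                          redo_rate_kb_per_sec, overhead_seconds):
    """RTO = Detection + Redo + Overhead, with redo time = queue / rate."""
    if redo_rate_kb_per_sec <= 0:
        raise ValueError("redo rate must be positive")
    redo_seconds = redo_queue_kb / redo_rate_kb_per_sec
    return detection_seconds + redo_seconds + overhead_seconds


# 30 s detection (the health_check_timeout figure from the slide),
# 1000 KB of redo outstanding at 100 KB/s, and 5 s of assumed overhead.
print(estimated_rto_seconds(30, 1000, 100, 5))
```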

Page 49: AlwaysOn High Availability

Diagnose – Minimize Data Loss

[Diagram: SCENARIO. Primary, Secondary 1 and Secondary 2. States shown: RESOLVING (twice), SYNCHRONIZING, and SYNCHRONIZING or NOT SYNCHRONIZED.]

Page 50: AlwaysOn High Availability

Diagnose Recovery Pending Availability Database

Diagnose Availability Groups

Availability Database Suspect or Recovery Pending