AlwaysOn High Availability
Uploaded by mohammed-alam
Transcript of AlwaysOn High Availability
October 15-18, 2013 | Charlotte, NC
SQL Server AlwaysOn
PRODUCTION OF DATATECH IT INC.
OVERVIEW
What are AlwaysOn Availability Groups
What AlwaysOn Availability Groups are not
AlwaysOn Architecture
Solid Foundation
Monitoring Availability Group Health
Diagnosing Availability Group Health
What Are AlwaysOn Availability Groups
AlwaysOn - SQL Server 2012 High Availability solution
• New integrated high availability and disaster recovery solution
• Simplified deployment
• Simplified management
AlwaysOn Flavors
AlwaysOn Failover Clustered Instance (SQLFCI)
• Clustered Resource = SQL Server instance
AlwaysOn Availability Group (new clustered resource)
• Clustered Resource = SQL Server database(s)
Both flavors
• Utilize Windows Cluster Health Detection and Automatic Failover
• Require, and are hosted in, a single Windows Cluster
• Have no interdependency; an availability group can be hosted on standalone SQL Server instances
Availability Group
[Diagram: Windows cluster AGCluster. Node N1 hosts SQLINST1 (Primary Replica) and node N2 hosts SQLINST2 (Secondary Replica); each holds AG1 = DB1, and the replicas synchronize.]
• Availability Group - New Clustered Resource
• Availability Databases - Resource Contents
• Availability Replicas - Resource Location
• A group of databases is made highly available for the application
• Standalone SQL Server uses local storage
• Log blocks are replicated
Scaling Availability Groups
[Diagram: AGCluster with three nodes. N1 (SQLINST1) and N2 (SQLINST2) each host AG1 = DB1, DB2 and AG2 = DB3, DB4, DB5; N3 (SQLINST3) hosts AG2 = DB3, DB4, DB5 only. AG1 synchronizes between N1 and N2; AG2 synchronizes to N2 and N3.]
Scaling Availability Groups Limits
[Diagram: AGCluster with five nodes, N1 to N5, hosting SQLINST1 to SQLINST5; each instance hosts a replica of AG1 = DB1, DB2.]
Availability Group Maximums
Availability Group Listener
Provides client connectivity to availability group databases
Client Access Point Clustered Resource
• DNS name
• One or more IP addresses
• Listener port
Create listener for availability group in SQL Server
• Availability group resource depends on the listener resource
Listener Advantages
• Connect to the primary replica (read/write)
• Read-intent connections routed to a secondary
• Multiple-subnet listener (stretch cluster, multi-site, remote site)
Application Reconnect on Failover
[Diagram, labeled HRDB: availability group AG = DB1, DB2 with listener AG_LISTEN, hosted on SQLINST1 (primary) and SQLINST2 (secondary) across ServerA, ServerB and ServerC.]
• Health problem on SQLINST1 detected
• Availability group and listener fail over
• Application retries its connection
Availability Group on Standalone SQL Server
[Diagram: AGCluster. N1 hosts SQLINST1 (Primary Replica) and N2 hosts SQLINST2 (Secondary Replica); each holds AG1 = DB1, DB2, and the replicas synchronize.]
Minimal Configuration
• Two-node Windows cluster
• SQL Server standalone instances
• Default instances (SQLINST1, SQLINST2)
Availability Groups Hosted on SQL FCI
[Diagram: AGCluster with four nodes. SQLFCI1 (Primary Replica) is installed on N1 and N2; SQLFCI2 (Secondary Replica) is installed on N3 and N4. Both hold AG1 = DB1, DB2 and synchronize.]
Availability Group on SQLFCI and Standalone SQL
[Diagram: AGCluster. SQLFCI1 (Primary Replica) is installed on N1 and N2; standalone SQLINST1 on the remote node N3 is the Secondary Replica. Both hold AG1 = DB1, DB2 and synchronize.]
AlwaysOn Availability Groups
Flexible
• Multi-database failover
• Multiple secondaries
• Synchronous and asynchronous data movement
• Automatic and manual failover
• Multi-site clustering
• Flexible failover policy
• Cloud, Hybrid opportunities
AlwaysOn Availability Groups
Efficient
• Close to real-time data
• Active Secondary
• Readable Secondary
• Backup from Secondary
• DBCC CHECKDB
• Automation using PowerShell
• Fast Failover
• Compression/Encryption
• Automatic Page Repair
• Rolling Upgrades
AlwaysOn Availability Groups
Integrated
• Application failover using virtual name
• Configuration Wizard
• PowerShell
• Dashboard
• System Center Integration (monitor pack)
• Rich diagnostic infrastructure
• FILESTREAM replication
• Replication publisher failover
Comparing High Availability Offerings
SQL Failover Clustered Instance
Log Shipping
Mirroring
Replication
Availability Groups vs SQL Server Clustering
SQL Server Clustered
• Multi-site Cluster
• Shared Storage
• Automatic Failover of the SQL Service
Availability Group
• Local Storage (multiple copies)
• Readable Secondaries
• Application Failover
• Faster failover
Availability Groups vs Log Shipping
Log Shipping
• Disaster Recovery Site
• Easy to Deploy
Availability Group
• Multi-Site Cluster
• Availability Group wizard
• Automatic Failover
• Client Redirection (Listener)
• Readable Secondaries
Availability Groups vs Replication
Replication
• Wizard setup
• Multiple secondaries, readable secondary
• Dashboard
• Create Indexes at secondary
Availability Group
• Automatic Failover
• Client Redirection (Listener)
• Minimum latency
• Readable Secondary Statistics
• Compression
Availability Groups vs Database Mirroring
Database Mirroring
• Disaster Recovery database
• Wizard setup per database
• Synchronous, Asynchronous Mirroring
• Automatic Failover
• Automatic Page Repair
• Log Stream Compression
Availability Group
• Multi-Site solution
• Failover Resource >= 1 database(s)
• Multiple mirrors
• Readable secondary
• Listener
What AlwaysOn Availability Groups Are Not
What AlwaysOn Availability Groups Are Not
AlwaysOn <> Database Health Detection
AlwaysOn utilizes Flexible Failover Policy
• SQL Failover Clustered Instance
• Availability Groups
What AlwaysOn Availability Groups Are Not
Availability Group Replica <> SQL Server Failover Clustered Instance
• Two Node Active Active SQL Server
[Diagram: AGCluster with nodes N1 and N2. SQLFCI1 (Active-Active) and SQLFCI2 (Active-Active) are both installed across the two nodes.]
What AlwaysOn Availability Groups Are Not
Availability Group Replica <> SQL Server Failover Clustered Instance
• Two Node Active Active SQL Server
[Diagram: AGCluster with nodes N1 and N2. SQLFCI1 is the Primary Replica and SQLFCI2 the Secondary Replica of Availability Group AG1 (DB1, DB2), with both instances installed across the same two nodes.]
You cannot host two replicas from the same availability group on the same Windows cluster node!
AlwaysOn Availability Group Architecture
AlwaysOn Availability Group Architecture
MultiSubnet Listener
Availability Group Role/States Transition
Synchronous and Asynchronous Commit
AlwaysOn Health Detection
Flexible Failover Policy
Availability Group Listener in Multiple Subnets
[Diagram: Primary in LA (10.10.29.196), Secondary in NY (10.10.45.130).]
Availability group spans multiple networks
• Very popular with DR
Define Listener in multiple subnets
• One IP address for each network
• All IP addresses registered in DNS
• One IP address online (primary replica)
• At failover:
  • IP in the new primary network is brought online
  • IP in the old primary network is taken offline
MultiSubnet Listener
[Diagram: listener IP in the primary's subnet is Online; listener IP in the secondary's subnet is Offline.]
Requirements
• MultiSubnetFailover=True
• SQL Server Native Client 11
• SQL Client (.NET 4)
MultiSubnetFailover=True
• Simultaneous IP Connect Attempt
MultiSubnetFailover=False
• Sequential IP Connect Attempt
MultiSubnet Listener Connection
Application connects through the listener using MultiSubnetFailover=True
[Diagram: SQLINST1 on N1 is PRIMARY and SQLINST2 on N2 is SECONDARY for AG = DB1, DB2. Listener AG_LISTEN has IP 10.10.10.11 (Online) and IP 11.11.11.12 (Offline).]
• Client provider opens sockets to both IP addresses
• Application connects to the online IP address successfully
MultiSubnet Listener Connection
Application attempts to connect through the listener without the MultiSubnetFailover parameter
[Diagram: same topology. Listener AG_LISTEN has IP 10.10.10.11 (Online) and IP 11.11.11.12 (Offline).]
• Client provider opens a socket to the offline IP address
• Connection times out
• Increase the connection timeout and connect again
• The offline and then the online IP address are attempted
• Connection successful
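The difference between simultaneous and sequential connect attempts can be illustrated with a small timing model. This is only a sketch: the function name, the millisecond values and the idea of a fixed cost per failed attempt are assumptions made for illustration, not the behavior of any particular client provider.

```python
def time_to_connect_ms(ips_online, failed_attempt_ms, connect_ms, parallel):
    """Return milliseconds until the first successful connect, or None.

    ips_online        -- booleans in the order the client tries the IPs
                         (True means the listener IP is online)
    failed_attempt_ms -- assumed time lost on one attempt to an offline IP
    connect_ms        -- assumed time for a successful connect
    parallel          -- True models MultiSubnetFailover=True (all IPs tried
                         at once); False models one IP after another
    """
    if not any(ips_online):
        return None  # no listener IP is online, nothing can succeed
    if parallel:
        # All sockets are opened together, so the online IP answers first.
        return connect_ms
    elapsed = 0
    for online in ips_online:
        if online:
            return elapsed + connect_ms
        elapsed += failed_attempt_ms  # wait out the offline IP, then move on
    return None


# Hypothetical numbers: the offline IP happens to be tried first.
sequential = time_to_connect_ms([False, True], 21000, 100, parallel=False)
simultaneous = time_to_connect_ms([False, True], 21000, 100, parallel=True)
```

With these assumed values the sequential attempt needs 21100 ms, which is why a short connection timeout can expire before the online IP address is ever tried.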
AlwaysOn Architecture – Availability Replica Roles
Availability Replica Roles
• Primary
• Secondary
• Resolving
What is Resolving?
• Primary and Secondary are the normal operating roles
• Resolving is the transitional role between the primary and secondary roles
What does Resolving mean to an availability database?
• SQL Server sets the databases offline
AlwaysOn Architecture – Availability Replica Roles
Availability Replica State Transitions
Primary_normal
• Normal operating role
Primary_pending
• Online request from SQL resource DLL
• Retrieve configuration from Cluster store
Resolving_pending_failover
• Manual or Forced Failover
Not_available
• Unknown or off state
Resolving_normal
• Gate role moving replica to normal operational roles
Secondary_normal
• Normal operating role
Role Transitions – SQL Server Startup on Primary
[Diagram: on AGCluster node N1, SQLINST1 (AG1 = DB1, DB2) moves through these states at SQL Server startup:]
NOT_AVAILABLE -> RESOLVING -> PRIMARY_PENDING -> PRIMARY_NORMAL
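The startup sequence can be sketched as a lookup table of permitted transitions. The table is an assumption assembled from the state names on these slides (the startup diagram's RESOLVING is mapped to RESOLVING_NORMAL, and the transition out of RESOLVING_PENDING_FAILOVER is a guess); it is not SQL Server's internal state machine.

```python
# Assumed transition table, read off the slides. Anything not listed
# here is treated as invalid by this sketch.
ALLOWED = {
    "NOT_AVAILABLE": {"RESOLVING_NORMAL"},
    "RESOLVING_NORMAL": {"PRIMARY_PENDING", "SECONDARY_NORMAL"},
    "PRIMARY_PENDING": {"PRIMARY_NORMAL"},
    "PRIMARY_NORMAL": {"RESOLVING_NORMAL"},
    "SECONDARY_NORMAL": {"RESOLVING_NORMAL", "RESOLVING_PENDING_FAILOVER"},
    # Assumption: a manual or forced failover continues toward primary.
    "RESOLVING_PENDING_FAILOVER": {"PRIMARY_PENDING"},
}


def is_valid_path(states):
    """Return True if every consecutive pair of states is an allowed step."""
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


# The startup path shown on the slide.
startup = ["NOT_AVAILABLE", "RESOLVING_NORMAL", "PRIMARY_PENDING", "PRIMARY_NORMAL"]
startup_ok = is_valid_path(startup)
```

A table like this makes it easy to flag an observed sequence that skips the resolving gate, for example a replica reported as going straight from NOT_AVAILABLE to PRIMARY_NORMAL.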
Role Transitions
[Diagram: AGCluster with N1 (SQLINST1) and N2 (SQLINST2), both hosting AG1 = DB1, DB2, starting as PRIMARY and SECONDARY. Labeled steps: Health Check Detects Failure; Availability Group RESOLVING (both replicas RESOLVING); Restart resource; Health Check Detects Health (roles return to PRIMARY and SECONDARY); Restart Fails; Automatic Failover (roles become SECONDARY and PRIMARY).]
AlwaysOn Architecture – Synchronous Asynchronous Commit
[Diagram: AGCluster. SQLINST1 on N1 is the Primary Replica and SQLINST2 on N2 is the Secondary Replica for AG1 = DB1, DB2. A transaction log block from the SQLINST1 buffer pool is sent to the secondary, where it is hardened and then redone from the log. Each replica tracks its Last Hardened LSN and Last Redone LSN.]
Synchronous Commit Mode
[Diagram: AGCluster with SQLINST1 on N1 and SQLINST2 on N2, both hosting AG1 = DB1, DB2. Table Employee holds one row (ID 1, Name Curt). Transaction T1 runs on SQLINST1:
Begin Tran
Insert into Employee values (2, 'Shon')
Commit
Log block 1 moves from the log cache through the log writer and is hardened to the transaction log (Tlog) on both replicas.]
Asynchronous Commit Mode
[Diagram: same topology and transaction as the synchronous commit slide. Transaction T1 inserts (2, 'Shon') into Employee on SQLINST1; log block 1 moves from the log cache through the log writer and is hardened to the transaction log (Tlog) on both replicas.]
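The practical difference between the two commit modes is when the primary acknowledges the commit: in synchronous commit it waits until the secondary has hardened the log block, in asynchronous commit it does not. A minimal latency model, assuming the local log flush and the send to the secondary start at the same time (all names and numbers here are hypothetical):

```python
def commit_latency_ms(local_harden_ms, send_ms, remote_harden_ms, ack_ms,
                      synchronous):
    """Estimate how long a commit waits before it is acknowledged.

    Simplified model: the primary hardens the log block locally while the
    same block travels to the secondary.
    """
    if not synchronous:
        # Asynchronous commit: only the local harden is on the commit path.
        return local_harden_ms
    # Synchronous commit: wait for the slower of the local harden and the
    # round trip (send, remote harden, acknowledgement).
    remote_path_ms = send_ms + remote_harden_ms + ack_ms
    return max(local_harden_ms, remote_path_ms)


sync_wait = commit_latency_ms(2, 1, 2, 1, synchronous=True)
async_wait = commit_latency_ms(2, 1, 2, 1, synchronous=False)
```

In this model the extra wait in synchronous mode comes entirely from the network and the secondary's log disk, which is why the secondary's placement and storage matter for commit latency.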
AlwaysOn Architecture – Backup on Secondary
[Diagram: the Primary carries the R/W workload; backups are taken on the Primary and on each Secondary.]
• Backups can be done on any replica of a database
• Secondary replica may be synchronous or asynchronous
• Log backups done on all replicas form a single log chain
• Advisable for backups to be stored centrally
• Recovery Advisor makes restores simple
Log Backups Form a Single Log Chain
[Diagram: log backups covering LSN 1 - 10, LSN 11 - 20, LSN 21 - 30 and LSN 31 - 40 are taken on different replicas (Primary, Secondary 1, Secondary 2) and together form one continuous chain.]
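To make the single-chain idea concrete, the sketch below gathers log backups taken on different replicas and checks that their LSN ranges line up. It uses the slide's simplified integer ranges (each backup starts one past the previous one); real LSNs are much larger numeric values and real backup metadata may express continuity differently. The replica assigned to each backup is hypothetical.

```python
from typing import NamedTuple


class LogBackup(NamedTuple):
    replica: str    # where the backup was taken (hypothetical assignment)
    first_lsn: int  # simplified integer LSNs, as drawn on the slide
    last_lsn: int


def build_chain(backups):
    """Order backups by LSN and verify there is no gap or overlap."""
    ordered = sorted(backups, key=lambda b: b.first_lsn)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.first_lsn != prev.last_lsn + 1:
            raise ValueError(
                f"chain broken between LSN {prev.last_lsn} and {cur.first_lsn}"
            )
    return ordered


# Backups collected centrally, in no particular order.
collected = [
    LogBackup("Secondary 2", 21, 30),
    LogBackup("Primary", 1, 10),
    LogBackup("Secondary 1", 31, 40),
    LogBackup("Secondary 1", 11, 20),
]
chain = build_chain(collected)
```

Storing the backups centrally, as the slide advises, is what makes a check like this possible, since no single replica holds every file in the chain.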
AlwaysOn Architecture
AlwaysOn Flexible Failover Policy
Used for SQL Server Failover Clustered Instance
Used for availability groups
Achieving Availability
High Availability Starts with the Foundation
• The principal goal of a high availability solution is to minimize or mitigate the impact of downtime.
High Availability != Constant Availability
The availability of a system can be expressed as this calculation:
• Actual Uptime / Expected Uptime * 100
• The resulting value is often expressed by industry in terms of the number of 9’s that the solution provides

Number of 9’s    Availability Percentage    Total Annual Downtime
2                99%                        3 days, 15 hours
3                99.9%                      8 hours, 45 minutes
4                99.99%                     52 minutes, 34 seconds
5                99.999%                    5 minutes, 15 seconds
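The downtime figures for each number of 9’s follow directly from the availability calculation. A short sketch, assuming a 365-day year and rounding to whole seconds (the function names are illustrative):

```python
def availability_percent(actual_uptime, expected_uptime):
    """Actual Uptime / Expected Uptime * 100, in any consistent time unit."""
    return actual_uptime / expected_uptime * 100


def annual_downtime_seconds(percent, days_per_year=365):
    """Downtime per year permitted by a given availability percentage."""
    seconds_per_year = days_per_year * 24 * 60 * 60
    return seconds_per_year * (100 - percent) / 100


def breakdown(percent):
    """Return annual downtime as (days, hours, minutes, seconds)."""
    total = round(annual_downtime_seconds(percent))
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


four_nines = breakdown(99.99)   # 52 minutes, 34 seconds
five_nines = breakdown(99.999)  # 5 minutes, 15 seconds
```

Under these assumptions 99% works out to 3 days, 15 hours and 36 minutes, so the slide's "3 days, 15 hours" is the same figure truncated to whole hours.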
Starting with Windows
SQL Server relies upon the Windows platform to provide foundational infrastructure
• Services for networking
• Storage
• Security
• Patching
• Monitoring
A solid foundation can significantly reduce planned downtime
• Most planned downtime is caused by operating system patching, application patching, hardware maintenance, etc.
• This can constitute almost 80 percent of the outages in an IT environment.
SQL 2012 Setup GUI on Server Core
/UIMODE=EnableUIOnServerCore
• Provides GUI Setup on Server Core installations
• Makes installation easier
• Hyperlinks will not work
Rolling Upgrade and Patching
AlwaysOn features facilitate rolling upgrades and patching of instances
• Helps significantly to reduce application downtime.
Windows Failover Cluster
• Upgrade passive nodes without impact
SQL Server on Hyper-V
• SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime.
• Administrators can perform maintenance operations on the host without impacting applications.
Mechanisms to Monitor AlwaysOn Health
• AlwaysOn Dashboard
• AlwaysOn PowerShell cmdlets
• Performance Monitor
• AlwaysOn Policy Based Management
Why
• Above and beyond AlwaysOn in-box health monitoring
What
• Monitor Quorum and Cluster Health
  • Quorum state?
  • Are cluster members detected?
• Monitor Availability Group, Replica and Database Health
  • Are availability groups, replicas, databases healthy?
• Monitor Availability Database Performance Health
  • Use policies to ensure RPO, RTO
Dashboard
Monitoring Availability Groups
Health Snapshot
• Windows Cluster
• Availability Group
• Availability Replica
• Availability Database
Availability Group Dynamic Management Views
Mechanisms to Monitor AlwaysOn Health
Cluster and Cluster State
• sys.dm_hadr_cluster
• sys.dm_hadr_cluster_members
Availability Group, Replica, Database
• select * from sys.availability_groups
• select * from sys.availability_replicas
• select * from sys.availability_databases_cluster
State Information
• sys.dm_hadr_availability_group_states
• sys.dm_hadr_availability_replica_states
• sys.dm_hadr_database_replica_states
• sys.dm_hadr_database_replica_cluster_states
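These views are usually joined rather than queried one at a time. The sketch below holds one possible join as a query string and summarizes its result rows; executing the query needs a database driver, which is not shown, and the sample rows are hypothetical.

```python
# One possible join across the catalog views and the replica state DMV.
REPLICA_HEALTH_QUERY = """
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       rs.role_desc,
       rs.connected_state_desc,
       rs.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS rs
  ON rs.replica_id = ar.replica_id
"""


def unhealthy_replicas(rows):
    """Return the server names of replicas that need attention.

    rows -- iterable of dicts keyed by the column names in the query above.
    """
    return [
        row["replica_server_name"]
        for row in rows
        if row["synchronization_health_desc"] != "HEALTHY"
        or row["connected_state_desc"] != "CONNECTED"
    ]


# Hypothetical result rows, shaped like the query output.
sample_rows = [
    {"ag_name": "AG1", "replica_server_name": "SQLINST1", "role_desc": "PRIMARY",
     "connected_state_desc": "CONNECTED", "synchronization_health_desc": "HEALTHY"},
    {"ag_name": "AG1", "replica_server_name": "SQLINST2", "role_desc": "SECONDARY",
     "connected_state_desc": "DISCONNECTED", "synchronization_health_desc": "NOT_HEALTHY"},
]
needs_attention = unhealthy_replicas(sample_rows)
```

Keeping the evaluation separate from the query makes the health rule easy to adjust, for example to tolerate PARTIALLY_HEALTHY during planned maintenance.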
Availability Database Performance Health
[Diagram: N1 (SQLINST1) syncs AG1 = DB1, DB2 to N2 (SQLINST2). A Log Send Queue and a Redo Queue sit between the replicas; the diagram contrasts LCT=T with LCT=T-10 on the lagging replica.]
Maintain high availability
• Monitoring Availability Group ‘healthy’
• Accumulating Latency
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
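Accumulating latency can be turned into a simple RPO check by comparing the commit times on the two replicas. This sketch reads LCT as "last commit time", which is an interpretation of the diagram label; the timestamps and the RPO values are hypothetical.

```python
from datetime import datetime, timedelta


def commit_lag(primary_lct, secondary_lct):
    """How far the secondary's last commit trails the primary's."""
    return max(primary_lct - secondary_lct, timedelta(0))


def violates_rpo(primary_lct, secondary_lct, rpo):
    """True if failing over now could lose more data than the RPO allows."""
    return commit_lag(primary_lct, secondary_lct) > rpo


# The diagram's case: primary at T, secondary at T-10 (taken as seconds).
t = datetime(2013, 10, 15, 12, 0, 0)
lag = commit_lag(t, t - timedelta(seconds=10))
breach = violates_rpo(t, t - timedelta(seconds=10), rpo=timedelta(seconds=5))
```

A policy built on a check like this raises an alert while the availability group still reports healthy, which is the point of monitoring beyond the in-box health state.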
Availability Database Performance Health
[Diagram: N1 (SQLINST1) syncs AG1 = DB1, DB2 to N2 (SQLINST2). Timeline: AlwaysOn health failure detected, offline overhead, redo, online overhead.]
RTO = Detection + Redo + Overhead
Detection
• health_check_timeout (30 seconds)
• sp_server_diagnostics (1/3 HCT)
• lease expiration (20 seconds)
Redo
• Redo Queue Outstanding
Overhead
• Offline Overhead
• Online Overhead
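The RTO formula can be worked through with a few numbers. This is a rough estimate under stated assumptions: redo time is taken as the outstanding redo queue divided by the redo rate (sys.dm_hadr_database_replica_states exposes redo_queue_size in KB and redo_rate in KB per second), and the queue size, rate and overhead used below are hypothetical.

```python
def diagnostics_interval_s(health_check_timeout_s):
    """sp_server_diagnostics reports every 1/3 of the health check timeout."""
    return health_check_timeout_s / 3


def redo_seconds(redo_queue_kb, redo_rate_kb_per_s):
    """Time for the secondary to redo the log it has not yet applied."""
    if redo_rate_kb_per_s <= 0:
        raise ValueError("redo rate must be positive")
    return redo_queue_kb / redo_rate_kb_per_s


def estimate_rto_s(detection_s, redo_queue_kb, redo_rate_kb_per_s, overhead_s):
    """RTO = Detection + Redo + Overhead."""
    return detection_s + redo_seconds(redo_queue_kb, redo_rate_kb_per_s) + overhead_s


# Worst-case detection of 30 s (the health_check_timeout from the slide),
# an assumed 50,000 KB redo queue at 5,000 KB/s, and 5 s of overhead.
rto = estimate_rto_s(30, 50000, 5000, 5)
interval = diagnostics_interval_s(30)
```

With these inputs the estimate is 45 seconds, of which 10 seconds is redo; a growing redo queue therefore lengthens failover even when detection is fast.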
Diagnose – Minimize Data Loss
[Diagram: scenario with a Primary, Secondary 1 and Secondary 2. States shown on the replicas include SYNCHRONIZING, NOT SYNCHRONIZED and RESOLVING.]
Diagnose Recovery Pending Availability Database
Diagnose Availability Groups
Availability Database Suspect or Recovery Pending