High Availability Basics

High Availability Options: Data Guard and RAC

Julian Dyke

Independent Consultant

Back to Basics February 2008

juliandyke.com © 2008 Julian Dyke

Agenda

• Data Guard
  • The Theory
  • The Reality
• Real Application Clusters
  • The Theory
  • The Reality


Data Guard: The Theory


Data Guard: Reasons for Deployment

• Site Failures
  • Power failure
  • Air conditioning failure
  • Flooding
  • Fire
  • Storm damage
  • Hurricane
  • Earthquake
  • Terrorism
  • Sabotage
  • Plane crash
• Planned Maintenance


Data Guard: Standby Database

[Diagram: the primary instance and database at Site 1 send redo to the standby instance and database at Site 2.]


Data Guard: Physical Standby

• Physical Standby
  • Technology introduced in Oracle 7.2
  • Marketed as Data Guard in Oracle 8.1.7 and above
• Standby is identical copy of primary database
• Redo changes
  • transported from primary to standby
  • applied on standby (Redo Apply)
• Can switch operations to standby
  • Planned (switchover / switchback)
  • Unplanned (failover)
• Failover time dependent on various factors
  • Rate of redo generation / size of redo logs
  • Redo transport / apply configuration


Data Guard: Logical Standby

• Introduced in Oracle 9.2
• Subset of database objects
• Redo copied from primary to standby
• Changes converted into logical change records (LCR)
• Logical change records applied on standby (SQL Apply)
• Standby database can be opened for updates
  • Can modify propagated objects
  • Can create new indexes for propagated objects
• May need larger system for logical standby
  • LCR apply can be less efficient than redo apply
  • Array updates on primary become single row updates on standby


Data Guard: Protection Modes

• Three protection modes:
  • Maximum protection - zero data loss
    • Redo synchronously transported to standby database
    • Redo must be written to the standby redo log of at least one standby before transactions on primary can be committed
    • Processing on primary is suspended if no standby is available
  • Maximum availability - minimal data loss
    • Similar to maximum protection mode
    • If no standby database is available processing continues on primary
  • Maximum performance (default)
    • Redo asynchronously shipped to standby database
    • If no standby database is available processing continues on primary
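
The three modes differ mainly in what happens to a commit on the primary when a standby can or cannot be reached. The following is a minimal sketch of that decision as described on this slide. It is a simplified model, not Oracle code, and the class name, function name and return strings are illustrative assumptions.

```python
from enum import Enum


class ProtectionMode(Enum):
    MAXIMUM_PROTECTION = "maximum protection"
    MAXIMUM_AVAILABILITY = "maximum availability"
    MAXIMUM_PERFORMANCE = "maximum performance"


def commit_outcome(mode: ProtectionMode, standby_reachable: bool) -> str:
    """Describe what happens to a commit on the primary (simplified model)."""
    if mode is ProtectionMode.MAXIMUM_PERFORMANCE:
        # Redo is shipped asynchronously, so the commit never waits for a standby.
        return "commit; redo shipped asynchronously"
    if standby_reachable:
        # Both synchronous modes wait for a standby to acknowledge the redo.
        return "commit after standby acknowledges redo"
    if mode is ProtectionMode.MAXIMUM_PROTECTION:
        # Zero data loss is preferred over availability of the primary.
        return "primary suspended; no commit"
    # Maximum availability: processing continues on the primary.
    return "commit; processing continues without standby"
```

For example, `commit_outcome(ProtectionMode.MAXIMUM_PROTECTION, False)` reports that the primary is suspended, while the same call with `MAXIMUM_AVAILABILITY` lets the commit proceed.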


Data Guard: Redo Log Shipping

• ARCH background process
  • Copies completed redo log files to standby
• LGWR background process - modes are:
  • ASYNC - asynchronous
    • Oracle 10.1 and below
      • redo written by LGWR to dedicated area in SGA
      • read from SGA by LNSn background process
    • Oracle 10.2 and above
      • redo written by LGWR to local disk
      • read from disk by LNSn background process
  • SYNC - synchronous
    • Redo written to standby by LGWR - modes are:
      • AFFIRM - wait for confirmation redo written to disk
      • NOAFFIRM - do not wait
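
These transport options are selected through attributes of a LOG_ARCHIVE_DEST_n parameter on the primary. As a rough sketch, the helper below assembles such an attribute string from the choices listed above. The function name and the service name `standby_db` are hypothetical, and the sketch only emits AFFIRM / NOAFFIRM for SYNC because that is how the slide presents them; check the documentation for your release before using any generated value.

```python
def standby_destination(service: str, transport: str = "ARCH", affirm: bool = False) -> str:
    """Build a LOG_ARCHIVE_DEST_n value for a standby (10g-style attributes)."""
    attributes = {
        "ARCH": ["ARCH"],
        "LGWR ASYNC": ["LGWR", "ASYNC"],
        "LGWR SYNC": ["LGWR", "SYNC"],
    }
    if transport not in attributes:
        raise ValueError(f"unknown transport: {transport}")
    parts = [f"SERVICE={service}"] + attributes[transport]
    # Assumption: follow the slide and qualify only SYNC with AFFIRM / NOAFFIRM.
    if transport == "LGWR SYNC":
        parts.append("AFFIRM" if affirm else "NOAFFIRM")
    return " ".join(parts)


# Hypothetical service name, for illustration only.
print(standby_destination("standby_db", "LGWR SYNC", affirm=True))
```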


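Data Guard: ARCH Redo Transmission

[Diagram: on the primary database, LGWR writes the online redo log, which is archived by ARC0 to local archived redo logs (LOG_ARCHIVE_DEST_1) and shipped by ARC1 to the standby (LOG_ARCHIVE_DEST_2). On the standby, RFS receives the redo into the standby redo log, ARCn writes archived redo logs, and MRP / LSP applies them to the standby database.]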


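Data Guard: LGWR (ASYNC) Redo Transmission

[Diagram: on the primary database, LGWR writes the online redo log, ARCn archives it to local archived redo logs (LOG_ARCHIVE_DEST_1), and LNSn ships redo to the standby. On the standby, RFS receives the redo into the standby redo log, ARCn writes archived redo logs, and MRP / LSP applies them to the standby database.]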


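Data Guard: LGWR (SYNC) Redo Transmission

[Diagram: on the primary database, LGWR writes the online redo log and passes redo to LNSn, while ARCn archives to local archived redo logs (LOG_ARCHIVE_DEST_1). On the standby, RFS receives the redo into the standby redo log, ARCn writes archived redo logs, and MRP / LSP applies them to the standby database.]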


Data Guard: Role Transitions

• There are two types of role transition
  • Switchover
    • Planned transition to standby database
    • Original primary becomes new standby
    • Original standby becomes new primary
    • No data loss
    • Can switchback at any time
  • Failover
    • Unplanned failover to standby database
    • Old primary may need to be rebuilt
    • Old standby becomes new primary
    • Possible data loss
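
The difference between the two transitions can be sketched as a small state model. This is an illustrative toy, not Data Guard code: the `Site` class, the role strings and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    role: str  # "primary", "standby" or "unavailable"


def switchover(primary: Site, standby: Site) -> None:
    """Planned transition: the two databases exchange roles."""
    if (primary.role, standby.role) != ("primary", "standby"):
        raise ValueError("switchover needs a working primary and a standby")
    primary.role, standby.role = "standby", "primary"


def failover(primary: Site, standby: Site) -> None:
    """Unplanned transition: the standby takes over from a lost primary."""
    if standby.role != "standby":
        raise ValueError("failover target must be a standby")
    primary.role = "unavailable"  # may need to be rebuilt before it can rejoin
    standby.role = "primary"


site1, site2 = Site("Site1", "primary"), Site("Site2", "standby")
switchover(site1, site2)   # Site2 is now primary, Site1 is standby
switchover(site2, site1)   # switchback: roles return to the original layout
failover(site1, site2)     # Site1 lost: Site2 becomes primary
```

A switchover is symmetric and can be reversed at any time, whereas after a failover the old primary stays out of service until it has been rebuilt or reinstated.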


Data Guard: Switchover

[Diagram: before switchover, the primary instance and database at Site1 send redo to the physical standby instance and database at Site2. After switchover, the roles are reversed: the primary is at Site2 and the physical standby is at Site1.]


Data Guard: Failover

[Diagram: before failover, the primary instance and database at Site1 send redo to the physical standby instance and database at Site2. After failover, Site1 is unavailable and the database at Site2 runs as the primary.]


Data Guard: Read-Only Mode

• Physical standby database can be opened in read-only mode
  • (Managed) Recovery must be suspended
  • Reports can use temporary tablespaces
    • Sorts
    • Temporary tables
  • Reports cannot modify permanent objects
  • Failover times may be affected
    • Suspended redo must be applied


Data Guard: Delayed Redo Application

• Delay in redo application can be configured
  • Redo is transported immediately
    • Provides protection against site failure
  • Redo is not applied immediately
    • Provides protection against human error
• Increases potential failover times
• In Oracle 10.1 and above flashback database can be used as an alternative to delayed redo application
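
The protection against human error is a time window: an erroneous change on the primary is only harmless on the standby while the redo that carries it is still waiting to be applied. A minimal sketch of that check follows. The function name, the four-hour delay and the timestamps are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def standby_still_clean(error_time: datetime, now: datetime, apply_delay: timedelta) -> bool:
    """True if redo generated at error_time has not yet been applied on the standby."""
    return now < error_time + apply_delay


# Hypothetical values: a 4-hour apply delay and a mistake made at 09:00.
delay = timedelta(hours=4)
mistake = datetime(2008, 2, 1, 9, 0)
print(standby_still_clean(mistake, datetime(2008, 2, 1, 11, 30), delay))  # noticed in time
print(standby_still_clean(mistake, datetime(2008, 2, 1, 14, 0), delay))   # noticed too late
```

The same delay is also the backlog of redo that must be applied before a failover can complete, which is why a longer delay increases potential failover times.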


Data Guard: Data Guard Broker

• Stable in Oracle 10.2 and above
• Managed using DGMGRL utility
• Contains Data Guard configuration
• Additional layer of complexity
• Used by Enterprise Manager to manage standby
• Mandatory for some new functionality e.g.
  • Fast Start Failover


Data Guard: Fast Start Failover

[Diagram: the primary (Node 1) and its database at Site1, the standby (Node 2) and its database at Site2, and the observer at Site3.]

• Detects failure of primary database
• Automatically fails over to nominated standby database
• Requirements include
  • Flashback logging must be configured
  • DGMGRL must be used
  • Observer process running in third independent site
    • Highly available in Oracle 11.1 and above
  • MAXIMUM AVAILABILITY protection mode
    • Standby database archive log destination must be configured as LGWR SYNC
  • MAXIMUM PERFORMANCE protection mode
• Primary database can potentially be reinstated automatically
  • Using flashback logs



Data Guard: Advantages and Disadvantages

• Advantages
  • No interconnect network required between sites
  • No storage network required between sites
  • RAC licences not required if each site is a single-instance
• Disadvantages
  • Active / Passive
  • Requires Enterprise Edition licence
  • Remaining infrastructure must also failover
    • Network
    • Application tier
    • Clients


Data Guard: Oracle 11g New Features

• Snapshot Standby
  • Standby can be converted to snapshot standby
    • Can be opened in read-write mode (for testing)
  • Redo transport continues
  • Redo apply delayed
  • Standby can subsequently be converted back to physical standby
• Active Data Guard
  • Separately licensed option
  • Updates applied to primary
  • Changes can be read immediately on standby databases
  • Standby database can be opened in read-only mode
    • Redo can continue to be applied


Data Guard: Licensing

• Standby database nodes must be fully licensed
  • Same metric as primary (named user, CPU etc)
• Standard Edition
  • Cannot use Data Guard
  • Use user-defined scripts to transport redo
  • Use Automatic Recovery to apply redo
  • Manually resolve archive log gaps
• Enterprise Edition
  • Use Managed Recovery to apply redo
  • Use Fetch Archive Logging to resolve archive log gaps
  • Additional licenses required for Active Data Guard


Data Guard: Alternatives

• Manual log shipping using scripts
• SAN level Replication technologies
  • NetApp SnapMirror, MetroCluster
  • EMC SRDF, MirrorView
  • HP StorageWorks
• Redo log replication technologies
  • Quest SharePlex


Data Guard: The Reality

• Many sites run physical standbys
  • Well proven technology
  • Spare capacity on standby often used for development or testing during normal operations
• Relatively few sites run a logical standby
  • Streams is much more popular
• Many sites enable flashback logging
  • In both development and production environments
• Very few using Automatic Failover
• Very few sites working with Oracle 11g yet
  • Consequently none using Active Data Guard



• Failover times
  • Normally dependent on management decisions
    • Usually some investigation before failover
  • Time to failover database is minimal (5-10 minutes)
  • Time to failover infrastructure can be hours
    • Network configuration
    • DNS
    • Application / web servers
    • Clients
  • Failover SLAs often up to 48 hours
• Rebuild times
  • Can take minutes using flashback logging
  • Can take much longer depending on reason for failover


RAC: The Theory


RAC: Redundancy

• Single Point of Failure
  • If component fails, system will be inaccessible
• Redundancy
  • Duplicate components
  • If component fails another can be used
  • Active-Active or Active-Passive
• Examples include
  • Power Supplies
  • RAID
  • Bonded Networks
  • IO Multipathing
  • Oracle RAC


RAC: 4-node cluster

[Diagram: four nodes (Node 1 to Node 4), each running one instance (Instance 1 to Instance 4), attached to the public network, to the private network (interconnect) and, through the storage network, to shared storage.]


RAC: Cache Coherency

• RAC must ensure changes made by any instance
  • Are not overwritten by another instance
  • Maintain ACID properties
• Current Blocks
  • Blocks can be updated by any instance
  • Only current version of a block can be updated
  • Only one current version of a block can exist across all instances
• Consistent Read Blocks
  • Can have theoretically unlimited number of consistent versions of a block
    • in each instance
    • across all instances
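
The two rules above (one current version cluster-wide, any number of consistent read versions) can be sketched as a toy directory. This is an illustration of the invariant only and does not describe how RAC implements it; the class, its methods and the use of an SCN to label read-consistent versions are assumptions made for the example.

```python
class BlockDirectory:
    """Toy model: one current copy per block, many consistent read copies."""

    def __init__(self):
        self.current_holder = {}  # block number -> instance holding the current version
        self.cr_versions = {}     # (block number, instance) -> list of SCNs

    def acquire_current(self, block, instance):
        """Transfer the single current version to the requesting instance."""
        self.current_holder[block] = instance

    def update(self, block, instance):
        """Only the holder of the current version may change the block."""
        if self.current_holder.get(block) != instance:
            raise PermissionError(
                f"{instance} does not hold the current version of block {block}"
            )

    def add_consistent_read(self, block, instance, scn):
        """Any instance may keep any number of read-consistent versions."""
        self.cr_versions.setdefault((block, instance), []).append(scn)


directory = BlockDirectory()
directory.acquire_current(42, "PROD1")
directory.update(42, "PROD1")            # allowed: PROD1 holds the current version
directory.acquire_current(42, "PROD2")   # current version moves to PROD2
directory.add_consistent_read(42, "PROD1", scn=100)  # PROD1 can still read an older version
```

After the transfer, an attempt by PROD1 to update block 42 raises `PermissionError` until it acquires the current version again.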


RAC: Cluster Manager

• All clusters must have cluster management software
  • Manages node membership and evictions
• Oracle Clusterware
  • Mandatory for RAC in Oracle 10.1 and above
  • Known as Cluster Ready Services (CRS) in Oracle 10.1 only
  • Can be combined with vendor clusterware
    • IBM HACMP
    • HP ServiceGuard
    • Sun Cluster
  • Must be running before ASM/RDBMS instances can be started on a node
  • Can be used with non-RAC databases and applications



RAC: Interconnect

• Used for inter-node communication by:
  • ASM Instances
  • RDBMS Instances
• Optimally high bandwidth / low latency
• Typically 1Gb Ethernet
  • Uses TCP / UDP protocols
  • NIC interfaces often bonded for availability
• Other physical networks supported e.g. InfiniBand


RAC: Shared Storage

• Required for
  • Oracle Clusterware Files
    • Oracle Cluster Registry (OCR)
    • Voting Disk
  • Database Files
    • Control Files
    • Database
    • Online Redo Logs
    • Server Parameter File
• Strongly recommended for
  • Archived redo logs
  • Backup copies


RAC: Shared Storage

• Can use
  • Storage Area Network (SAN) e.g.:
    • EMC Clariion / Symmetrix
    • HP MSA / EVA / XP series
    • Hitachi
    • Fujitsu
  • Network Attached Storage (NAS) e.g.:
    • Network Appliance
    • Pillar Data Systems
    • Sun StorageTek
    • EMC Celerra
  • JBOD (with ASM)


RAC: Shared Storage

• Fibre Channel
  • SCSI protocol - block based
  • Normally 2Gb or 4Gb
  • Requires one or more Host Bus Adapters (HBA) per node
  • Requires fabric switches
• iSCSI
  • SCSI protocol - block based
  • Packets sent over dedicated IP network
  • Can use standard network components
  • Processing often offloaded to NIC firmware
• NFS
  • File-based
  • Uses standard network components


RAC: Shared Storage

• Cluster-aware File Systems:
  • Automatic Storage Management
  • Cluster File Systems
    • Oracle Cluster File System (OCFS/OCFS2)
    • Red Hat GFS
    • IBM GPFS
    • Sun StorEdge QFS
    • Veritas CFS
  • Network File System
    • On supported Network Attached Storage only


RAC: Automatic Storage Management (ASM)

• Additional functionality in 10.2 and 11.1
• Generic code (all supported platforms)
• Available for both single-instance and RAC databases
• Provides shared storage for RAC
• Can optionally provide mirroring:
  • Normal Redundancy (mirrored)
  • High Redundancy (triple mirroring)
  • Useful with JBOD or extended clusters
• Mandatory for Oracle 10g Standard Edition RAC
• Presents storage as disk groups containing
  • Physical disks
  • Logical files
• Requires additional ASM instance on each node
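
Mirroring has a direct cost in usable space: each extra copy divides the raw capacity of the disk group. The arithmetic is sketched below as a rough estimate. The function name is illustrative, the "external" entry (no ASM mirroring, relying on the storage array) is added here for comparison and is not on the slide, and the calculation ignores metadata and the free space needed to re-mirror after a disk failure.

```python
# Number of copies ASM keeps of each piece of data at each redundancy level.
COPIES = {
    "external": 1,  # assumption: no ASM mirroring, array provides protection
    "normal": 2,    # mirrored
    "high": 3,      # triple mirroring
}


def usable_capacity_gb(raw_gb: float, redundancy: str) -> float:
    """Approximate usable space in a disk group for a given redundancy level."""
    return raw_gb / COPIES[redundancy]


# Hypothetical disk group built from 1200 GB of raw disk.
for level in ("external", "normal", "high"):
    print(level, usable_capacity_gb(1200, level))
```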


RAC: Licensing

• Standard Edition
  • RAC option free
  • Maximum two nodes
  • Maximum four CPUs
  • Must use Oracle Clusterware
  • Must use Automatic Storage Management (ASM)
  • No extended clusters
• Enterprise Edition
  • RAC option 50% extra (per EE license)
  • No limit on number of nodes
  • No limit on number of CPUs
  • Can use any shared storage (ASM, CFS or NFS)
  • Can use Enterprise Manager Packs (Diagnostics, Tuning..)


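RAC: Process Architecture

[Diagram: Node 1 and Node 2 each run Oracle Clusterware (OPROCD, OCSSD, CRSD, EVMD), an ASM instance (+ASM1, +ASM2) and a database instance (PROD1, PROD2). Background processes shown for an instance: PMON, SMON, LGWR, DBWn, ARCH, LMON, LCK0, LMD0, LMSn, DIAG.]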




RAC: Reasons For Deployment

• Availability
  • Node failure
  • Instance failure
• Scalability
  • Distribute workload across multiple instances
  • Scale out
• Manageability
  • Economies of scale
  • Administration / Monitoring / Backups / Standby
• Reduction in total cost of ownership
  • Database consolidation
  • Commodity hardware


RAC: Availability

• Ensure continued availability of database in event of node or instance failure
• Automatic failover
  • No human intervention required
• In the event of node or instance failure:
  • All sessions connected to failed node are terminated
  • Sessions connected to remaining nodes
    • are temporarily suspended while resources are re-mastered
    • resume after brown-out period
  • New sessions will be connected to remaining nodes only
• Ensuring availability requires spare capacity during normal operations
  • Either additional node
  • Or reduction in service level
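
The spare capacity requirement is simple arithmetic: when a node fails, its work is spread over the survivors. The sketch below assumes an evenly balanced workload that redistributes perfectly, which real applications rarely achieve; the function names are illustrative.

```python
def load_after_failure(nodes: int, utilisation: float, failed: int = 1) -> float:
    """Per-node utilisation once the failed nodes' work moves to the survivors."""
    survivors = nodes - failed
    if survivors < 1:
        raise ValueError("at least one node must survive")
    return utilisation * nodes / survivors


def max_safe_utilisation(nodes: int, failed: int = 1) -> float:
    """Highest normal utilisation that still fits on the survivors at 100%."""
    return (nodes - failed) / nodes


# A 2-node cluster can only run at 50% per node if one node must absorb the other.
print(max_safe_utilisation(2))
# A 4-node cluster can run at 75% per node and still survive one failure.
print(max_safe_utilisation(4))
```

Running above these levels does not prevent failover, but it means accepting a reduced service level while a node is down.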


RAC: Availability

[Diagram: the 4-node cluster (Node 1 to Node 4, Instance 1 to Instance 4) with public network, private network (interconnect), storage network and shared storage.]


RAC: Scalability

• Workload can be distributed across multiple nodes
• Workload can be balanced across all nodes using connection management
  • Client-side using Oracle Net
  • Server-side using listener processes
• Workload can be directed to specific nodes using services
• Level of scalability dependent on application

[Charts: two plots of Resources against Throughput.]


RAC: Scalability

• Factors that can degrade scalability
  • Excessive parsing
  • Consistent reads
  • SELECT FOR UPDATE / user defined locking
  • DDL
  • Object-oriented code
• Features that can improve scalability
  • Services
  • Automatic Segment Space Management
  • Partitioning
  • Sequences
  • Reverse key indexes


RAC: Manageability

• Advantages
  • Consolidation
  • Economies of scale
    • Administration
    • Monitoring
    • Backup and recovery
    • Standby database
• Disadvantages
  • Increased Planned downtime
  • Complexity
  • Dependencies
  • Skills


RAC: Total Cost of Ownership

• Benefits
  • Lower hardware costs - commodity hardware
  • Lower support costs
  • Management economies of scale
• Costs
  • Redundant hardware
    • Servers, Storage, NIC, HBA, Switches, Fabric
  • Oracle licenses
  • Experienced staff
  • Application modifications


RAC: Applications

• Most applications should run on RAC without modification
• Performance is not guaranteed
  • Applications that perform well in single-instance have best chance of scaling in RAC
  • Applications performing badly in single-instance will perform worse in RAC
• Some features do not port easily to RAC e.g.:
  • DBMS_ALERT, DBMS_PIPE, External files
• Applications that can be logically partitioned tend to scale best
  • Minimize use of interconnect
  • Maximize use of buffer caches
• Implementation more likely to succeed if you have direct or indirect access to source code


RAC: Database Services

• Allow sessions with similar workload characteristics to be logically grouped and managed
• Services can be assigned to
  • set of preferred instances - used if available
  • set of available instances - used if preferred instances not available
  • failover to available instances is automatic
  • failback to preferred instances is manual
• Services can be configured to maximize instance affinity
• Limited statistics reported at service level
  • Can also be reported at service / module / action level
• Trace can be enabled at service level
  • Can also be enabled at service / module / action level
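
The preferred / available behaviour, including the asymmetry between automatic failover and manual failback, can be sketched as a small model. This is illustrative only: the `Service` class and its methods are hypothetical, the instance names follow the PROD1 / PROD2 naming used elsewhere in this deck, and the model assumes every available instance is running.

```python
class Service:
    """Toy model of a service with preferred and available instances."""

    def __init__(self, name, preferred, available):
        self.name = name
        self.preferred = list(preferred)
        self.available = list(available)
        self.running_on = list(preferred)  # starts on its preferred instances

    def instance_failed(self, instance):
        """Automatic failover: move the service to an unused available instance."""
        if instance not in self.running_on:
            return
        self.running_on.remove(instance)
        for candidate in self.available:
            if candidate not in self.running_on:
                self.running_on.append(candidate)
                break

    def instance_restarted(self, instance):
        """Nothing moves: failback to a preferred instance is manual."""

    def relocate(self, source, target):
        """Manual failback (or any other relocation) by an administrator."""
        self.running_on.remove(source)
        self.running_on.append(target)


service = Service("SERVICE1", preferred=["PROD1"], available=["PROD2"])
service.instance_failed("PROD1")     # service now runs on PROD2
service.instance_restarted("PROD1")  # still on PROD2
service.relocate("PROD2", "PROD1")   # administrator moves it back
```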


RAC: Database Services

[Diagram: SERVICE1 is configured with PROD1 as its preferred instance and PROD2 as its available instance. Before and After panels show Listener1 / PROD1 and Listener2 / PROD2, with SERVICE1 moving between the instances.]


RAC: Extended Clusters

• Currently the Holy Grail of high availability
• RAC nodes located at physically separate sites
  • Implicit disaster recovery
  • Requires Enterprise Edition licences + RAC option
• In the event of a site failure, database is still available
• Storage is duplicated at each site
  • Can use ASM or vendor-supplied storage technology
• Active / Active configuration
  • Users can access database via either site
• Configuration and performance tuning are complex
  • Cache fusion traffic between sites


RAC: Extended Clusters

[Diagram: Node 1 (Instance 1) at Site1 and Node 2 (Instance 2) at Site2, each with its own storage network and copy of the database, joined by the public network and the private network; a quorum is located at Site3.]


RAC: Disaster Recovery

• Data Guard and RAC are fully compatible
• Can configure any permutation e.g.

  Primary            Standby
  Single instance    Single instance
  Single instance    RAC
  RAC                Single instance
  RAC                RAC

• All instances can participate in redo log shipping
• Only one instance can perform managed recovery
  • Standby database might be a potential bottleneck


RAC: Alternatives

• Single Instance Databases
  • No RAC overhead
  • Simpler to install / configure / manage
  • Single point of failure
• Oracle Products
  • Oracle Streams
• Proprietary Clustering Solutions
  • HP ServiceGuard
  • IBM HACMP
  • Sun Cluster


RAC: The Reality


RAC: The Reality

• Many sites running RAC
  • Mostly Oracle 10.2
  • A few still running Oracle 10.1
  • Still some Oracle 9.2
• Most RAC users develop their own applications or use bespoke applications developed by a third-party
• Probably around 20 extended clusters in production across Europe
• Many Oracle 10.2 sites run ASM
  • Very few run OCFS or raw devices
  • Very few use third-party cluster file systems
• Most sites using SAN - fewer using NAS
• In UK most users currently deploy on Linux x86-64
  • Solaris very popular in other regions


RAC: The Reality

• Few Oracle 10g users run vendor clusterware
• Most RAC deployments for availability
  • Decreased unplanned downtime
  • Increased planned downtime
• Increasing number of deployments for scalability
  • Workload balancing
  • Services
• Manageability benefits very doubtful
  • Economies of Scale versus Additional complexity
• TCO reductions possible in some circumstances
  • Replace large SMP boxes
  • Replace legacy active-passive clusters


RAC: The Reality

• Most users run 2-node clusters
  • Some have 3-node or 4-node clusters
  • A handful run five nodes or more
• Most users only have one database per cluster
  • Few grids
• Oracle Clusterware scales well
  • Number of nodes does not impact performance
• Oracle RAC databases might scale well
  • Dependent on application
  • Additional nodes may improve or degrade performance


RAC: The Reality

• ASM currently the most popular RAC storage technology
  • Deployed in numerous Oracle 10.2 RAC production systems
• No operating system utilities
  • ASMCMD in Oracle 10.2 and above
• Generally disliked by storage administrators
  • Too much control to DBAs
• Acceptable performance
  • ASM instance provides metadata
  • RDBMS instances read and write blocks directly from files


Thank you for listening

• Acknowledgements
  • Dave Burnham (Scalabilities)
  • Dev Nayak (DSP Global)
  • Jason Lester (DSP Global)
  • Phil Grice (Joraph Consulting)
  • Larry Carpenter (Oracle)
• References
  • http://www.juliandyke.com/References/References.html
• Questions
  • [email protected]
