Daniela Milanova - BGOUG · RMAN_DataGuard_10g_wp.pdf. Data Guard or Remote Mirroring?


Daniela Milanova, Senior Sales Consultant

Oracle Disaster Recovery Solution

What is Data Guard?

• Management, monitoring and automation software infrastructure that protects data against failure, errors, and corruptions of the database

• Automates the process of maintaining a copy of an Oracle production database (standby database)

[Diagram: clients connect to the primary database at the primary site; data changes are propagated to the standby database at the standby site.]

Data Guard Architecture

Service types:
• Log transport services
• Log apply services
• Role-management services

Software Data Guard Requirements

• Same release of Oracle Database Enterprise Edition must be installed for all databases

• When ASM or OMF is used, all databases should use the same combination

Hardware and OS Data Guard Requirements

• The hardware can be different for the primary and standby database

• The operating system and platform architecture for the primary and standby databases must be the same

• The operating system version for the primary and standby databases can be different

• If all databases are on the same system, the OS must allow mounting more than one database with the same name.

Data Guard At the Highest Level

• Data Guard comprises two parts
  • REDO APPLY
    • Maintains a physical, block-for-block copy of the Production (also called Primary) database.
    • Can be opened in Read Only mode for short periods of reporting
  • SQL APPLY
    • Maintains a logical, transaction-for-transaction copy of the Production database.
    • Can be open in Read Write mode for reporting purposes and cloning activities

REDO Apply Architecture

• Maintains a ‘Physical’ block for block copy of the Primary Database

[Diagram: the primary database ships redo over the network, asynchronously or synchronously, to the physical standby database, where the MRP process performs Redo Apply; backups can be taken from the standby.]

SQL Apply Architecture

• Maintains a ‘Logical’ transactional copy of the Primary Database

[Diagram: the primary database ships redo over the network, asynchronously or synchronously, to the logical standby database, where SQL Apply transforms redo to SQL. The standby is continuously open for reports and can hold additional indexes and materialized views.]

Data Protection & Disaster Recovery Solution with Reporting Capability

[Diagram: Data Guard propagates data changes from the primary database at the primary site to a logical standby database, which serves reporting clients, and to a physical standby database at the standby site.]

Data Guard Data Protection Modes

• Maximum protection
  • No data loss
  • If a remote write fails, the primary database is shut down
• Maximum availability
  • No data loss
  • If a remote write fails, the primary database continues to work in maximum performance mode
• Maximum performance
  • Highest possible level of data protection
  • No impact on the performance of the primary database
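The three modes differ mainly in what the primary does when it cannot write redo to the standby. The sketch below only restates the bullets above as a lookup. The enum and function names are hypothetical and are not part of any Oracle API.

```python
from enum import Enum


class ProtectionMode(Enum):
    """The three Data Guard protection modes named on the slide."""
    MAXIMUM_PROTECTION = "maximum protection"
    MAXIMUM_AVAILABILITY = "maximum availability"
    MAXIMUM_PERFORMANCE = "maximum performance"


def on_standby_write_failure(mode):
    """Return the primary's reaction when a remote redo write fails (per the slide)."""
    if mode is ProtectionMode.MAXIMUM_PROTECTION:
        return "primary shuts down"
    if mode is ProtectionMode.MAXIMUM_AVAILABILITY:
        return "primary continues as in maximum performance"
    # Maximum performance never waits on the standby in the first place.
    return "primary continues unaffected"


def slide_claims_no_data_loss(mode):
    """True for the modes the slide lists with 'No data loss'."""
    return mode in (ProtectionMode.MAXIMUM_PROTECTION,
                    ProtectionMode.MAXIMUM_AVAILABILITY)


for mode in ProtectionMode:
    print(mode.value, "->", on_standby_write_failure(mode))
```

The lookup makes the trade-off explicit: the stricter the data-loss guarantee, the more the primary's availability depends on the standby being reachable.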

Data Guard Role Transition

• Oracle Data Guard supports two role-transition operations
  • Switchover
    • Planned role reversal
    • Used for OS or hardware maintenance
    • No data loss
  • Failover
    • Unplanned role reversal
    • Use in Emergency
    • Zero or minimal data loss depending on choice of data protection mode

Existing Site Recovery Tradeoffs

• Log apply may be delayed to protect from user errors, but:
  • Switchover/Failover gets delayed
  • Reports run on old data
• After failing over to standby, production DB must be rebuilt

[Diagram: redo is shipped from the primary database to the standby database, where apply is delayed; reporting runs on delayed data.]

Enhanced DR with Flashback Database

• Flashback DB removes the need to delay application of logs
• Flashback DB removes the need to reinstantiate primary after failover
• Real-time apply enables real-time reporting on standby

[Diagram: redo is shipped from the primary database to the standby database and applied in real time with no delay, enabling real-time reporting. Both databases keep a flashback log. Primary: no reinstantiation after failover!]

Rolling Database Upgrades

• In Oracle Database, SQL Apply provides the starting point for performing rolling upgrades of the Oracle RDBMS software and database with minimal interruption of service.

• By utilizing a Logical standby database customers can upgrade one database while running on the original production database and then run in a mixed version environment before returning to the original, but upgraded, configuration!

SQL Apply – Rolling Database Upgrades

• Major Release Upgrades
• Patch Set Upgrades
• Cluster Software & Hardware Upgrades

[Diagram: four steps with databases A and B]
1. Initial SQL Apply configuration: clients connect to A, redo is shipped to B, both run version X
2. Upgrade node B to X+1: A stays at version X and logs queue while B is upgraded
3. Run in mixed mode to test: A runs version X, B runs X+1, redo is shipped again
4. Switchover to B, upgrade A: both end up at X+1
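The four steps of the rolling upgrade can be written down as a sequence of states for databases A and B. This is a minimal illustrative sketch: the `UpgradeStep` type, the function name, and integer version numbers are assumptions made here, and the assignment of versions in steps 2 and 3 follows the reading of the diagram in which B is upgraded first.

```python
from typing import List, NamedTuple


class UpgradeStep(NamedTuple):
    number: int
    description: str
    version_a: int   # software version running on database A
    version_b: int   # software version running on database B
    primary: str     # which database currently serves clients


def rolling_upgrade_steps(x: int) -> List[UpgradeStep]:
    """Return the four rolling-upgrade states, starting from version x on both sides."""
    return [
        UpgradeStep(1, "Initial SQL Apply configuration", x, x, "A"),
        UpgradeStep(2, "Upgrade node B while logs queue on A", x, x + 1, "A"),
        UpgradeStep(3, "Run in mixed mode to test", x, x + 1, "A"),
        UpgradeStep(4, "Switchover to B, upgrade A", x + 1, x + 1, "B"),
    ]


for step in rolling_upgrade_steps(10):
    print(step.number, step.description,
          f"A={step.version_a} B={step.version_b} primary={step.primary}")
```

At no point are both databases down at once, which is where the "minimal interruption of service" comes from: clients only move during the switchover in step 4.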

Benefits of Oracle Disaster Recovery Solution

• Disaster recovery and high availability
• Complete data protection
• Efficient utilization of system resources
• Flexibility in data protection to balance availability against performance requirements
• Automatic gap detection and resolution
• Centralized and simple management
• Integrated with Oracle database

Ease of Use

• New and Improved Data Guard Manager!
• Monitoring SQL Apply
  • Unsupported Storage Attributes
  • Applied Logs and Apply Progress
• Managing the Logical Standby
  • Bypassing the Guard
  • Skipping Table Redo
  • Skipping Failed (and subsequently fixed) Transactions

New Data Guard Feature: Fast-Start Failover

Automatic and fast
• Physical and Logical standby each complete failover in less than 20 seconds
• Old primary is reinstated automatically once connectivity is re-established between Observer and primary database

Data Guard Best Practices: Switchover for Planned Maintenance

For fastest switchover (< 1 minute)
• Prior to switchover
  • A physical standby transitioning from read-only back to Redo Apply should be restarted
  • Disconnect all sessions and stop job processing
  • Shutdown abort for all secondary RAC instances on both primary and standby databases
  • Enable real-time apply on the standby database and ensure the standby is synchronized with the primary database
• For switchovers using SQL or command line interface, open the new primary directly from the mount state
• Or, simulate a Fast-Start Failover: complete transactions and shutdown abort all primary instances

Data Guard Best Practices: Faster Redo Transport

• Set SDU=32K
• Tune network parameters that affect network buffer sizes and queue lengths
• Ensure sufficient network bandwidth for peak database redo generation rate + other activities

http://www.oracle.com/technology/deploy/availability/pdf/MAA_DG_NetBestPrac.pdf

Data Guard Best Practices: Tune Network Parameters

• Send and receive buffer size = 3 x bandwidth delay product (BDP)
  BDP = the product of the estimated minimum bandwidth and the round trip time between the primary and standby server
  BDP = 1,000 Mbps * 25 ms (.025 secs)
      = 1,000,000,000 bits/sec * .025 sec
      = 25,000,000 bits / 8 = 3,125,000 bytes
• Tune network device queues to eliminate packet losses and waits. Set device queues to a minimum of 10,000 (default 100)
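The buffer sizing rule above is simple arithmetic and can be checked with a few lines of Python. The function names are illustrative, not an Oracle or OS API, and integer arithmetic with the round trip time in milliseconds is used to avoid floating-point rounding.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth delay product in bytes: bandwidth * round trip time / 8."""
    # bits/sec * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    return bandwidth_bits_per_sec * rtt_ms // 8000


def socket_buffer_bytes(bandwidth_bits_per_sec: int, rtt_ms: int, factor: int = 3) -> int:
    """Send/receive buffer size as a multiple of the BDP (the slide uses 3 x BDP)."""
    return factor * bdp_bytes(bandwidth_bits_per_sec, rtt_ms)


# The slide's example: 1,000 Mbps link with a 25 ms round trip time.
print(bdp_bytes(1_000_000_000, 25))            # 3125000
print(socket_buffer_bytes(1_000_000_000, 25))  # 9375000
```

For the slide's example the recommended send and receive buffers therefore come to 9,375,000 bytes, roughly 9 MB.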

Impact of Network Tuning

[Charts: Test Results - Oracle Database 10g Release 1 & 2]

Data Guard or Remote Mirroring

• Remote Mirroring (host-based and storage-based) is another way to protect enterprise data
• However:
  • What about Data Reliability?
  • What about Data Recoverability?
  • What about Data Availability?
  • What about Cost?
• A well-designed Business Continuity Plan must consider these critical issues in addition to simple data protection

Data Guard is the Preferred Solution

1. Better Network Efficiency
  – Data Guard transmits only redo data
  – Remote mirroring solutions: datafiles, archivelog files, redolog files must be mirrored
2. Better suited for WANs
  – Fibre/ESCON-based mirroring solutions have an intrinsic distance limitation
  – Protocol converters needed, which adds to the cost, complexity and latency
  – Data Guard is based on standard TCP/IP
  – Data Guard doesn’t have to deal with protocol converters, extra cost and latency issues
3. Better Data Protection
  – Data Guard enables zero data loss
  – Preserves write-order consistency
  – Avoids logical and physical corruptions
  – Both SQL Apply and Redo Apply validate redo data before applying
4. Higher Flexibility
  – Data Guard is based on commodity hardware
  – Does not force lock-in with storage vendors
  – Remote mirroring solutions typically need identically configured storage from the same vendor
5. Better Functionality
  – Data Guard is a comprehensive DR solution:
    • Redo Apply/SQL Apply
    • Flexible protection modes
    • Push-button switchover/failover
    • Graceful handling of network connectivity problems
6. Higher ROI
  – Provides more value for DR investment
    • Standby database can be opened read-only or read-write
    • Allows backups to be offloaded to the standby database
    • Allows reporting/queries using the standby database
  – Integrated natively with other HA features (RAC, RMAN, etc.)
  – No extra cost

Data Guard and Remote Mirroring - Summary

• For protecting Oracle data, Oracle Data Guard’s integrated disaster recovery solution involving standby databases is preferred to remote disk mirroring:
  • For technical reasons
  • For business reasons
• Remote mirroring may be used to protect non-Oracle database data that are changing frequently:
  • File system data
  • Data in databases that are not Oracle

Competitive Strengths vs. SharePlex

• SharePlex
  • Redo log-based replication tool from Quest Software
  • Heavy front-end processing to extract transaction information from the primary redo logs
  • Somewhat similar to Data Guard SQL Apply
• It doesn’t make sense for customers to use SharePlex:
  • Zero Data Loss
    • Data Guard: Supported
    • SharePlex: No support because of architecture limitations
  • Primary system overhead
    • Data Guard: Minimal
    • SharePlex: Much more
  • Integration with HA features
    • Data Guard: Integrated with RAC, RMAN, Flashback, …
    • SharePlex: Limited integration
  • DR
    • Data Guard: Comprehensive and integrated DR solution
    • SharePlex: At best a replication solution
  • Feature support
    • Data Guard: Native feature of the database
    • SharePlex: Based on unpublished and unsupported interface (1)
  • Cost
    • Data Guard: Free
    • SharePlex: Expensive

(1) See MetaLink Note 97080.1

10g New Features and Best Practices

Data Guard Release 10.2 Redo Transport Improvements

• Increased network write sizes to 10 MB to better utilize network capacity for both ARCH and LNS
  • LNS can potentially write 10MB or less
• Full decoupling of LGWR and LNS processes
  • No more waits during log switches
  • No more waits when LNS buffer is full
• Intra-file parallelism support for ARCH
  • Up to 29 parallel remote archive processes

[Chart labels: 1GB/100Mbps/25msRTT]

Data Guard Best Practices: Gap Resolution and Data Loss

• For fastest gap resolution
  • Leverage intra-file archive parallelism (MAX_CONNECTIONS attr)
  • Follow tips for tuning redo transport to improve network utilization
• To minimize data loss
  • For a low latency, high bandwidth network, use SYNC transport
  • For high latency or low bandwidth networks, use ASYNC to minimize primary database performance impact
  • Follow tips for tuning redo transport
  • Example: Less than 7 seconds of data loss exposure for high redo rates of 2-12 MB/sec with <=25 ms latency in our tests
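To put the "7 seconds of exposure" figure into volume terms, the redo that could be lost is bounded by redo rate times exposure time. The sketch below is plain arithmetic on the slide's numbers. The function name is hypothetical, and the result is an upper bound because the slide reports "less than" 7 seconds.

```python
def redo_at_risk_mb(redo_rate_mb_per_sec: int, exposure_seconds: int) -> int:
    """Upper bound on unshipped redo (MB) for a given rate and exposure window."""
    return redo_rate_mb_per_sec * exposure_seconds


# Slide figures: redo rates of 2 and 12 MB/sec, exposure below 7 seconds.
for rate in (2, 12):
    print(f"{rate} MB/sec -> at most {redo_at_risk_mb(rate, 7)} MB of redo at risk")
```

Under these assumptions the exposure ranges from at most 14 MB at 2 MB/sec to at most 84 MB at 12 MB/sec.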

Data Guard Best Practices: Reduce Overhead on Primary

Performance Gains with 10g Release 2 ASYNC Transport
• For redo rates less than 2 MB/sec, there is less than 5% impact on the primary database across different latencies
• For very high redo rates of 20 MB/sec, less than 10% impact on primary database even with latencies of 50 and 100 ms
• Primary database performance impact was 2-3 times less with the new ASYNC transport compared to previous releases

Best Practice
• Allocate additional I/O bandwidth for Online Redo Log Files

Data Guard Best Practices: Using Standby for Backups

Offload Backups to Physical Standby Database
• Eliminate backup overhead on primary database
• RMAN allows for backup operations while Redo Apply is in progress

Best Practices
• For simplicity, use identical directory structures on the primary and standby databases
• Use RMAN Recovery Catalog so that backups taken on one database server can be restored on another
• Use a catalog server physically separate from primary and standby sites
• Reference MAA RMAN/Data Guard best practices paper: http://www.oracle.com/technology/deploy/availability/pdf/RMAN_DataGuard_10g_wp.pdf

Data Guard or Remote Mirroring?

• Data Guard SYNC transport has less overhead on the primary database

[Chart: load 200 txns/sec, redo rate 1.1 MB/sec]

Data Guard Advantage Because …
• Data Guard only transmits redo. A remote mirroring solution must transmit all database writes
  • A remote mirroring solution needs to transmit the following writes: LGWR (log writer), DBWR (database writer), ARCH (archiver), RVWR (flashback log writer), and foreground direct writes
• Both DBWR and LGWR are affected by network latency in a remote mirroring solution. In contrast, only LGWR is impacted by network latency in a Data Guard solution
  • Higher wait times for DBWR can be very detrimental to performance, causing contention for free buffers and an increase in buffer busy waits

Some customer references …

First American Real Estate Solutions

• Nation’s largest source of Real Estate data
• 100 million properties
• Online services for 50,000 clients
  • Lenders, Information Resellers, Government, Utilities, Corporations, Appraisers, Agents & Title Companies
• Thousands of concurrent online users at peak
• www.firstamres.com

HA/DR Requirements

• High Availability: 24x7 - 365 days/year
  • Limited instances of planned downtime once/quarter
• Recovery Point Objective (RPO) - maximum data loss
  • Oracle9i: 10MB for computer failure, 200MB for site failure
• Recovery Time Objective (RTO) for Oracle Database
  • Oracle9i: 10 minutes for computer failure, 1 hour for site failure
• Oracle Database 10g goals
  • RPO: zero data loss for computer failure, 10MB for site failure
  • RTO: zero downtime for computer failure, 10 minutes for site failure

First American Oracle9i HA/DR Architecture

[Diagram: the primary database at the primary production site feeds three standby databases]
• Local Standby #1: Data Guard, LGWR asynchronous redo shipping
• Local Standby #2: Data Guard, delayed apply (30 minutes), LGWR asynchronous redo shipping
• Remote Standby #3 at the remote disaster recovery site, 1500 miles away: Data Guard, archive log shipping (ARCH)

Looking Ahead to Oracle Database 10g

• Real Application Clusters
  • Transparent failover on node failure, zero data loss
• Flashback Technologies
  • Flashback Database & Flashback Table
  • Protect/repair for logical corruptions
• Enhanced LGWR ASYNC redo transport
  • Improve RPO for remote DR site
• Real Time Apply
  • Improve RTO

First American Oracle Database 10g Architecture - Plan

[Diagram: the primary database, a Real Application Cluster at the primary production site, ships redo via Data Guard LGWR asynchronous redo shipping to the Data Guard standby database at the remote disaster recovery site, 1500 miles away.]

First American Oracle Database 10g Benefits

• Higher Availability – transparent node failover
  • RAC for HA, Data Guard for DR
• Better remote data protection
  • ASYNC enhancements = less compromise on WAN
• Better protection against logical corruption
  • Fewer databases, surgical repair vs. full point-in-time recovery
• Less downtime
  • Faster failover, quicker repair of logical corruptions

Oracle Corporation Global Single Instance (GSI)

• A key enabler in Oracle saving $1 billion annually
• Consolidation: 1 is the magic number
  • Versus 75 separate implementations of Oracle Apps
  • Versus 100’s of Oracle databases worldwide
• Oracle E-Business Suite
  • 7,000 concurrent users
  • 5.5TB Oracle database
• www.oracle.com

Oracle Global Single Instance HA/DR Requirements

• HA requirement
  • Continuous operation regardless of component failure
• DR requirement
  • Protect against site failure, physical & logical corruption
  • RPO – 5 minutes of transactions
  • RTO – database failover in less than 1 hour
• High workload – OLTP system
  • 8.2MB/sec redo generation at peak, 2.5MB/sec sustained
• WAN, dual OC12
  • 1,000 miles of separation, 25-35ms RTT network latency
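As a rough plausibility check of these numbers, the peak redo rate can be converted to megabits per second and compared with the capacity of a single OC-12 link. This is a sketch under stated assumptions: the 622.08 Mbps figure is the nominal SONET OC-12 line rate (usable payload is lower), 1 MB is taken as 10^6 bytes, and the function names are hypothetical.

```python
OC12_LINE_RATE_MBPS = 622.08  # nominal OC-12 line rate; an assumption, payload is lower


def redo_rate_mbps(redo_mb_per_sec: float) -> float:
    """Convert a redo rate in MB/sec to Mbit/sec (assuming 1 MB = 10**6 bytes)."""
    return redo_mb_per_sec * 8


def link_share(redo_mb_per_sec: float, link_mbps: float = OC12_LINE_RATE_MBPS) -> float:
    """Fraction of one link's nominal capacity consumed by redo shipping."""
    return redo_rate_mbps(redo_mb_per_sec) / link_mbps


for label, rate in (("peak", 8.2), ("sustained", 2.5)):
    print(f"{label}: {redo_rate_mbps(rate):.1f} Mbit/sec, "
          f"{link_share(rate):.1%} of one OC-12")
```

Under these assumptions even the peak redo rate uses only about a tenth of one link's nominal rate, which leaves headroom for other traffic and for gap resolution.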

Oracle Global Single Instance HA/DR Architecture

[Diagram: the primary database at the GSI production site ((4) SUN F12Ks, 36 CPUs each) ships redo via Data Guard LGWR asynchronous redo shipping to the standby database (4 hour delayed apply) at the disaster recovery site 1,000 miles away ((4) SUN F12Ks; DR domain: 8 CPUs each; Development & Test domain: 28 CPUs each).]

Utilization of Standby Resources

• Four node Standby Cluster
  • 2 domains: DR, Development & Test
  • DR domain has sufficient capacity to maintain standby database and execute failover
• At Failover time:
  • Failover is executed, standby assumes primary role
  • Development & Test is stopped
  • CPUs are re-allocated to the new production domain
  • Nodes are upgraded in a rolling fashion with no application downtime

Delayed Apply – Downtime Avoided

• Human error caused logical corruption on primary
  • 160,000 row table updated by mistake
• Standby database configured with 4 hour delayed apply
• Instead of 10 hours of downtime, just 30 minutes
  • Cancel recovery on standby and open read only
  • Stop the affected application on primary
  • Export data from standby
  • Recreate table on primary, import data to primary db after disabling triggers
  • Restart application on primary
  • Restart recovery on standby

Oracle Global Single Instance Oracle Database 10g Feature Adoption

• Flashback Technologies
  • Flashback Table
  • Flashback Database
• Data Guard 10g
  • Real Time Apply
  • Asynchronous Redo Transport enhancements
  • Redo Apply performance enhancements
• Benefits
  • Faster failover, better data protection

Ohio Savings Bank

• Founded in 1899
• In Top 20 of all US Mortgage Lenders
• Provide mortgage services to independent brokers nationwide via Web
• $13 billion in assets
• Reputation for Innovation
  • 2002 Web Site of the Year (Mortgage Technology Magazine)
• www.ohiosavings.com

HA/DR Requirements

• 24 x 7 - 365 days/year

• Recovery Point Objective: zero data loss

• Recovery Time Objective: 30 minutes

• Planned maintenance windows Sunday mornings

Ohio Savings Bank Oracle9i Architecture

[Diagram: the primary database at the primary production site (2-node RAC cluster, HP N-Class PA-RISC, EMC Symmetrix, SAN attached, HP-UX v11.0) is protected at the remote DR site 15 miles away (HP N-Class PA-RISC, EMC Symmetrix, SAN attached, HP-UX v11.0) by Data Guard archive log shipping (ARCH) plus 3rd party storage-based synchronous disk mirroring for online logs.]

Ohio Savings Bank Oracle Database 10g Architecture

[Diagram: Online Mortgage Services and the Customer Call Center use the primary database at the primary production site (3-node RAC cluster, HP DL-380, 2 Xeon CPUs/node, EMC Symmetrix & Clariion, SAN attached, Red Hat Linux). Data Guard Maximum Availability synchronous redo shipping provides zero data loss to the standby database at the remote DR site 15 miles away (3-node RAC cluster, HP DL-380, 2 Xeon CPUs/node, EMC Symmetrix & Clariion, SAN attached, Red Hat Linux).]

Ohio Savings Bank Oracle Database 10g Features Deployed

• Automatic Storage Management
  • Reduces time spent managing storage
• RMAN Flash Recovery Area
  • Fully automates disk-based backup & recovery
• Oracle Data Guard
  • Zero Data Loss
  • Replaces 3rd party remote mirroring
  • Standby DB also used for daily exports

Ohio Savings Bank Automatic Storage Management

• Automatically spreads database files across all available storage

• Automatic rebalancing of used disk space when disks are added or removed

• Increases I/O distribution beyond disk array striping

• Reduces DBA workload

Ohio Savings Bank, Future Plans: GRID – from concept to reality

• Add nodes to the existing RAC 10g cluster

• Manage cluster via a single system view

• Add mortgage database, and potentially the OSB Data Warehouse to same RAC 10g cluster

• Define application workloads as services

• Establish rules to dynamically allocate processing resources to services

• Maximize the utilization of resources while meeting changing business needs

Oracle Disaster Recovery Solution

Oracle products included:
• Oracle Database Enterprise Edition on both sites

Oracle Maximum Availability Architecture

[Diagram: clients at the primary and secondary sites reach the application servers at each site through a WAN traffic manager. Each site runs a RAC-based database with Instance1 and Instance2 joined by links labeled "hb"; Data Guard connects the two databases over a dedicated network.]

Resources

• Maximum Availability Architecture white papers: http://otn.oracle.com/deploy/availability/htdocs/maa.html

• New SQL Apply Best Practices Paper now available!

• HA Portal on OTN: http://otn.oracle.com/deploy/availability

• Data Guard home page on OTN: http://otn.oracle.com/deploy/availability/htdocs/odg_overview.html