Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format...

73
Database Tables to Storage Bits: Data Protection Best Practices for Oracle Database Ashish Ray, Senior Director, Product Management, Oracle Gurmeet Goindi, Principal Product Manager, Oracle Gagan Singh, Senior DBA, Intel

Transcript of Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format...

Page 1: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 1

Database Tables to Storage Bits: Data Protection Best Practices for Oracle Database Ashish Ray, Senior Director, Product Management, Oracle Gurmeet Goindi, Principal Product Manager, Oracle Gagan Singh, Senior DBA, Intel

Page 2: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 2

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 3

Database & Storage Architecture

Data Protection with Storage Technologies

Storage-based Data Protection – Database Implications

Database-Integrated Data Protection

Customer Viewpoint – Intel

Summary

Agenda

Page 4: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 4

Quick Show of Hands

How many Database administrators? How many Storage administrators?

Neither? Example, Managers?

Page 5: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 5

Oracle Database Architecture Logical View Physical View

Memory

Storage

Page 6: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 6

Logical and Physical Database Structures

Tablespace

Datafile 1

8k 8k 8k 8k

8k 8k

Datafile 2

8k 8k 8k 8k

8k 8k

Extent Extent Segment

Tablespace

Table B

Segment

Table A

Segment

Page 7: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 7

Summary

Segments Extents Oracle Data Blocks

Disk Blocks

Compute Storage

Logical and Physical Database Structures

Page 8: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 8

The Anatomy of Storage Infrastructure

NAS/SAN

ORACLE RAC

Storage controllers/heads are specialized servers Running hundreds of

thousands of lines of code Most setups require

redundant configuration of controllers, disk shelves, switches, etc.

FC

Client IO

CPU Complex

Storage IO

Ethernet Fibre Channel

PCI Express

PCI Express

SAS Exposed to similar failures as your server hardware and software

Page 9: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 9

The Anatomy of a Storage Controller

FC

Client IO

CPU Complex

Storage IO

Ethernet Fibre Channel

PCI Express

PCI Express

SAS

Client traffic comes in at 10GbE or 4-8Gb FC Storage backend is usually slower Further, one client request usually results in

multiple IO requests to drives A write is only acknowledged once written to

disk or some form of Non Volatile Memory Data flows through many interfaces, each

interface can have bugs, add corruptions, etc. For performance:

– Add more drives and/or add faster drives – Introduce Flash in the architecture

Page 10: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 10

So … What About Flash in Storage Architecture

What’s special about Flash – Performance – No moving parts – Expensive, but getting cheaper

Server Attached Flash – PCI Express Attached IO cards – Extremely fast, improves performance for currently active data

Flash in Storage Controller – Flash is faster than Hard Disk Drives – Flash in server is faster than Flash in the storage controller

NAS/SAN

Page 11: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 11

Server Attached Flash

Typical deployment: – Standalone x86 server(s) – Single instance (non-RAC) databases – 1-10 TB local Flash – Database stored on Flash – Ex: HP DL580 G7 + Fusion-io

ANY component failure = loss of database access – CPU, memory, O/S, Flash, etc. – Local second server with identical Flash recommended ($$$ x 2)

Standalone Servers with Local Flash Cards

Good for performance,

Not so good for HA

Page 12: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 12

Database & Storage Architecture

Data Protection with Storage Technologies

Storage-based Data Protection – Database Implications

Database-Integrated Data Protection

Customer Viewpoint – Intel

Summary

Agenda

Page 13: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 13

Storage Solutions for Data Protection Traditional Approach to Datacenter Disruptions

Backups & Snapshots Human Errors

Storage Failures Data Corruptions

Storage Replication Systemwide Failures

Site or Network Failures Natural Disasters

Page 14: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 14

Storage Solutions for Data Protection Point-in-time, read-only logical copy of

your data Space efficient because only changed

blocks are stored Multiple implementations:

Snapshots Overview Copy on write

Redirect on write

Changed Block

Original Block

Copy on write Redirect on write Array copies the original block before changing it

Writes are directed to new blocks

High write overhead Low write overhead

EMC BCV NetApp Snapshots

Page 15: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 15

Storage Solutions for Data Protection

Snapshots are application data-format unaware and cannot validate application data Snapshots reside on the same array as the source data, so they are

vulnerable to failures that affect the storage array Snapshots are rendered useless in a data loss or corruption scenario A corrupted block can potentially affect a series of Snapshots Reconstruction has the same performance penalty as that of a full copy Restoring a Snapshot can invalidate all Snapshots taken after it

Snapshots Limitations

Here is the rub: Snapshots are NOT Backups !!

Page 16: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 16

Fibre Channel

Storage Solutions for Data Protection Synchronous Mirroring

Host

Storage Array

Storage Array

Host

Primary Site

Remote/DR Site Update made to a data volume on primary site is

synchronously replicated to a data volume on secondary site The data volume on the secondary site is not mountable

for most implementations The distance between the sites is typically less than

100km, though the technology limits it to 200km Deploying FC links is expensive and complicated.

Bandwidth is usually shared by multiplexing Multiple FC channels using a DWDM switch($$)

Page 17: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 17

Storage Solutions for Data Protection Synchronous Mirroring: Limitations and Challenges

Physical Limitations – Cable degrades the signal, the signal can only be transmitted in the range of ~200km

without regenerating – Latency for light travelling in a fiber cable is 1 ms for every 100 km – FC is an acknowledgement-based protocol, hence latency if transmitting each frame

will have at least 2 ms latency – An IO request from an application will involve transmitting multiple such frames

Protocol Limitations – FC uses credit based algorithm for flow control (buffer credits) – A transmit port can send only as many frames as the number of available buffer credits – Ineffective management of buffer credits will affect availability and performance

Page 18: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 18

Storage Solutions for Data Protection Asynchronous Storage Replication

Host

Storage Array

Storage Array

Host

Primary Site

Remote/DR Site

An Update made to a data volume on the primary site is transmitted to a data volume on the secondary site at a later point in time The data volume on the secondary site might be

mountable Can easily support distances greater than 500 km Not the same level of protection as synchronous mode The link between the two sides is typically an IP based

WAN link (FCIP) In case of FCIP the networking gear must handle FC

over IP, and also have advanced QoS features

WAN

Page 19: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 19

In the Event of a Disaster Remote system will detect an outage The storage resources at the remote site will have to be

enabled for writes The Remote host will perform the required procedure to

access the storage. For example a UNIX host will need to do the following:

Import the Volume group Activate the logical Volumes Sanity check the Files System (fsck) Mount the Volumes Restart the application

Host

Storage Array

Storage Array

Host

Primary Site

Remote/DR Site

Restoring to Normal Operations can take

hours

Page 20: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 20

Database & Storage Architecture

Data Protection with Storage Technologies

Storage-based Data Protection – Database Implications

Database-Integrated Data Protection

Customer Viewpoint – Intel

Summary

Agenda

Page 21: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 21

NAS SAN

Storage

Standby

Compute Server

NAS SAN

Storage

Storage Level Data Protection for Database

STORAGE MIRRORING

Primary

Compute Server

Oracle Instance

Compute Server

Oracle Instance

Compute Server

Oracle Instance

NO Oracle

Instance

Mirrors bits, not information

Page 22: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 22

Database Recovery in Event of a Disaster

Compute Server

NAS SAN

Storage

Standby

Activating Storage Replica – Failure Detection – Follow Vendor specific process to

bring DR storage online (can take hours)

Open database for applications

Oracle Instance

Bring database to a consistent state – Roll Forward (redo) – Transaction Recovery

Page 23: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 23

Storage Remote-Mirroring: Fundamental Flaw Zero Oracle Awareness, Mirrors Every Write for Real-Time Protection

Recovery files

Recovery files

Remote mirror all changed blocks

SYNC or ASYNC

Primary Database Remote Volumes Oracle Instance (in memory)

Oracle data files

Oracle data files

Page 24: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 24

Protection From Localized Failures Silent corruptions and disk failures

Compute Server

NAS SAN

Storage

Standby Oracle Instance

Bit Error Rates for memory, media and network elements have remained almost constant over last few years The density of bits packed in those media has

increased. The rate of processing these bits has also increased Hence the probability of corrupting these bits

keeps increasing. HBA and disk drives all have firmware and

firmware have bugs Disk Drive firmware sometimes misrepresent the

state of a write – Lost Writes

Page 25: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 25

Corruption Propagation What is my exposure ??

Only Application aware data consistency check can guarantee end to end data integrity.

Eg: Oracle Data Guard

Storage Remote Mirroring… blocks are just bits on a disk

Checksum is the only validation method Far superior than storage level checksum

Page 26: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 26

Storage Based Data Protection for Databases

Attributes Storage Based Solution Transactional Integrity Not Guaranteed Time to recover to normal operations In hours Complexity High Use of Network Resources Inefficient Idle DR Resources Mostly Cost of the Solution Expensive Management Overhead High Risk of a Resume Generating Event Very High

Summary – Risks and Limitations

Page 27: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 27

Database & Storage Architecture

Data Protection with Storage Technologies

Storage-based Data Protection – Database Implications

Database-Integrated Data Protection

Customer Viewpoint – Intel

Summary

Agenda

Page 28: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 28

Oracle Integrated Data Protection

Oracle recommendation: Maximum Availability Architecture (MAA) – Vastly more intelligent data protection than offered by storage layer – Architectural principle: Fault Isolation

Examples of MAA’s unique value-proposition – Database-optimized Physical Replication – Surgical Protection from Corruptions – e.g. Lost Writes – Zero Downtime Auto-repair of Corrupted Blocks – Efficient Correction of Human Errors

Page 29: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 29

RAC – Scalability – Server HA

ASM – Volume Management Online Redefinition,

Edition-based Redefinition, Data Guard, GoldenGate – Minimal downtime maintenance, upgrades, and migrations

Production Site

Maximum Availability Architecture (MAA) Low-Cost, Integrated, Fully Active, High ROI

Active Data Guard – Data Protection, DR – Query Offload

GoldenGate – Active-active – Heterogeneous

Oracle Secure Backup – Backup to tape / cloud

Active Replica

Flashback – Human error

correction

RMAN & Fast Recovery Area – On-disk backups

Page 30: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 30

Online Redefinition, Edition-based Redefinition, Data Guard, GoldenGate – Minimal downtime maintenance, upgrades, and migrations

Production Site

RAC – Scalability – Server HA

Flashback – Human error

correction

Active Data Guard – Data Protection, DR – Query Offload

GoldenGate – Active-active – Heterogeneous

Active Replica

Oracle Secure Backup – Backup to tape / cloud

ASM – Volume Management

RMAN & Fast Recovery Area – On-disk backups

Maximum Availability Architecture (MAA) Low-Cost, Integrated, Fully Active, High ROI

Page 31: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 31

Oracle Integrated Data Protection

Oracle recommendation: Maximum Availability Architecture (MAA) – Vastly more intelligent data protection than offered by storage layer – Architectural principle: Fault Isolation

Examples of MAA’s unique value-proposition – Database-optimized Physical Replication – Surgical Protection from Corruptions – e.g. Lost Writes – Zero Downtime Auto-repair of Corrupted Blocks – Efficient Correction of Human Errors

Page 32: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 32

Database Integrated Physical Replication Data Guard: Why A Big Deal

With Data Guard, redo blocks are transmitted directly from SGA: like a memcpy over the network

System Memory (SGA)

Oracle Database

Architecture

To Standby Databases

TCP/IP Better performance since no disk I/O Better isolation from lower layer faults Better network utilization: only redo

blocks sent over the network Transactional consistency always

maintained Upon database failover, apps simply

reconnect to new primary database

CPU / Memory

Storage

Page 33: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 33

Data Guard, Compared to Storage Mirroring Oracle Aware – Simple, Efficient, Physical Replication

Oracle data files

Oracle Instance (in memory)

Primary Database

Oracle data files

Standby Database Oracle Instance (in memory)

Recovery files

Recovery files

SYNC or ASYNC database redo

96% less network I/O & 85% less network volume Knowledge of Oracle redo and block structure Fault-isolation principles: no corruption propagation

Page 34: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 34

Oracle Integrated Data Protection

Oracle recommendation: Maximum Availability Architecture (MAA) – Vastly more intelligent data protection than offered by storage layer – Architectural principle: Fault Isolation

Examples of MAA’s unique value-proposition – Database-optimized Physical Replication – Surgical Protection from Corruptions – e.g. Lost Writes – Zero Downtime Auto-repair of Corrupted Blocks – Efficient Correction of Human Errors

Page 35: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 35

Outage at Financial Services Company

Production database alert.log: – Errors in file /opt/app/oracle/admin/dg/bdump/dg1.trc:

– ORA-01186 : file 93 failed verification tests

– ORA-01251 : Unknown File Header Version read for file number 93

– ORA-01251 - Corrupted file header. This could be caused due to missed read or write or hardware problem or process external to oracle overwriting the information in file header.

Database crashed – customer-facing applications down – Trade confirmation, new accounts, customer account information

Page 36: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 36

Data Corruption Protection

Only the database system has intrinsic knowledge of its block structure Only the database can perform true

end-to-end validation as block traverses the system stack Validation checks can be scaled

across related processes such as transaction management, backup & recovery, mirroring, etc.

Integrated with the Database

Disk

FC / TCP/IP

Disk Firmware

HBA / NIC

SAN/NAS

Device Driver

OS

CPU/Memory

I/O P

ATH

Page 37: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 37

Oracle-aware Corruption Protection

Database can check data, detect and repair corruptions – Checksum validation to detect data and redo block corruption – Checks semantic integrity of data blocks (Oracle knows Oracle) – Detects writes acknowledged but really lost by the I/O subsystem – Administrator can configure the level of checking – Can be configured for data blocks / data + index blocks

Specific technologies provide additional validation – RMAN validates while doing backup and recovery – ASM validates using mirroring copies – Data Guard validates with standby database

Built-in Data Validation

Page 38: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 38

The Problem of Lost Writes

Lost Writes – because of storage layer bug / malfunction, a write operation has really failed, but acknowledged as successful Database now has stale blocks

– Subsequent transactions may access the stale blocks – May update the same block / other blocks based on stale contents, with

serious business implications Forwarding confidential information to a terminated employee? Investment on a stock with multiple sell orders? Issuing an incorrect press release?

Database may continue running for days, till various ORA-600s

One of the Nasty Ones!

Page 39: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 39

Lost Write Real Life Example From Support Database: Lengthy Outage for Multi-TB Database

Problems first appear at the standby, but standby data is safe ORA-00600: internal error code, arguments: [3020] , [648], [1182463], [2719091455], [] ORA-10567 : Redo is inconsistent with data block (file# 648, block# 1182463) Recovery interrupted!

Noticed odd query results on production Noticed ORA-600 errors on production this morning for which SGA Heapdump was uploaded. New info : I was rebuilding an index. After a few minutes, the database took an unexpected crash. ***please help. it's very urgent, production is down.

4 days later: Production Database still down

Page 40: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 40

The Problem of Lost Writes

Remedy – A basis of comparison with valid blocks

Oracle solution – Data Guard Physical Standby

Controlled by DB_LOST_WRITE_PROTECT – TYPICAL: buffer cache read operations logged in redo, for read-write tablespaces – FULL: buffer cache read operations logged in redo also for read-only tablespaces

When lost write protection enabled, SCNs of incoming redo blocks from primary database compared to SCNs of blocks on physical standby

Solution!

Page 41: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 41

The Problem of Lost Writes

Primary(SCN) lower than Standby(SCN) implies lost-write error on primary database: – ORA-00752: recovery detected a lost write of a data block – ORA-10567: Redo is inconsistent with data block (file# 7, block# 26) – ORA-10564: tablespace TBS_2 – ORA-01110: data file 7: '/oracle/dbs/btbs_21.f' – ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 57503

In such a case, best option: failover to the standby database – SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;

Additional information: – MOS Note 1265884.1 - Resolving ORA-752 or ORA-600 [3020] During Standby Recovery

Detection by Data Guard

Page 42: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 42

Lost Write Protection By Data Guard

http://www.oracle.com/technetwork/database/features/availability/demonstrations-092317.html

Demonstration Lost Write Protection

Page 43: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 43

Oracle Integrated Data Protection

Oracle recommendation: Maximum Availability Architecture (MAA) – Vastly more intelligent data protection than offered by storage layer – Architectural principle: Fault Isolation

Examples of MAA’s unique value-proposition – Database-optimized Physical Replication – Surgical Protection from Corruptions – e.g. Lost Writes – Zero Downtime Auto-repair of Corrupted Blocks – Efficient Correction of Human Errors

Page 44: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 44

Active Data Guard: Auto-Block Repair

Automatic Block Repair – When Oracle detects corrupt blocks at primary database, it repairs online

by copying good version from an active standby database (& vice versa) – Transparent to the user and application

High Availability by Repairing Corruptions Online

Active Standby Database

Primary Database

Read/Write Workload

Continuous redo shipping, validation & apply

Real-time Reporting

Page 45: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 45

Active Data Guard Auto Block Repair

http://www.oracle.com/technetwork/database/features/availability/demonstrations-092317.html

Demonstration Auto-Corruption Fix

Page 46: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 46

Auto Block Repair … In Real Life

As you know, we have put one of our physical standby database in "open read only" mode to make use of "ABMR" (Automatic Block Media Recovery) feature. One incident happened last night … I thought I will share this information with you. Physical standby Database which we opened in read only mode reported below messages in alert.log. Tue Nov 15 22:00:04 2011 Automatic block media recovery requested for (file# 99, block# 369368) Automatic block media recovery successful for (file# 99, block# 369368) Errors in file /home/oracle/admin/physdby_ora_10751_ORAOOP.trc (incident=560569): ORA-01578: ORACLE data block corrupted (file # 99, block # 369368) ORA-01110: data file 99: '+DATA/data2/data2.dbf’

From operational perspective we see that, Physical standby database which was in open read only mode has seen block corruption, and ABMR helped us to recover this block from primary. Thrilled to see that ABMR works!!

Email From Customer For A Tier-1 Deployment

Page 47: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 47

Auto Block Repair … Insights

For your production databases, deploying an Active Data Guard standby is a good thing! You may not need to run real-time reports, but …

– Active Data Guard standby, in real-time, protects your production database from block corruption, in an app transparent manner

– As we saw earlier, this also works vice-versa Does not matter whether redo transport mode is SYNC or ASYNC –

as long as corresponding block can be applied within timeout threshold (default=60 secs), Auto Block Repair will be successful

Don’t Leave Home Without It!

Page 48: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 48

Oracle Integrated Data Protection

Oracle recommendation: Maximum Availability Architecture (MAA) – Vastly more intelligent data protection than offered by storage layer – Architectural principle: Fault Isolation

Examples of MAA’s unique value-proposition – Database-optimized Physical Replication – Surgical Protection from Corruptions – e.g. Lost Writes – Zero Downtime Auto-repair of Corrupted Blocks – Efficient Correction of Human Errors

Page 49: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 49

IOUG Survey: Causes for Unplanned Downtime To Err Is Human

Page 50: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 50

Flashback Technologies

Flashback revolutionizes error correction: – View ‘good’ data as of a point in time before error

Time/work to rewind data depends on the work done since the error happened, instead of the database size! Simple: SQL> flashback database to <timestamp>; Flexible: Flashback Query, Table, Transaction, Database, Drop

Fast, Granular Error-Correction

0

20

40

60

Correction Time = Error Time + f(DB_SIZE)

Page 51: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 51

Oracle Recovery Manager (RMAN) Oracle-Integrated Backup & Recovery Engine

Oracle Enterprise Manager

RMAN

Database

Fast Recovery Area

Tape

Oracle Secure Backup*

Intrinsic knowledge of database file formats and recovery procedures Block validation Online block-level recovery Tablespace/data file recovery Online, multi-streamed backup Unused block compression Native encryption

Integrated disk, tape & cloud backup leveraging the Fast Recovery Area (FRA) and Oracle Secure Backup

Cloud *RMAN also supports leading 3rd party media managers

Page 52: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 52

Database Integrated Data Protection

Attributes Storage Based Solution

Database Integrated Solution

Transactional Integrity Not Guaranteed Guaranteed Time to recover to normal operations In hours Seconds Complexity High Minimal Use of Network Resources Inefficient Efficient Idle DR Resources Mostly Never Cost of the Solution Expensive Low Management Overhead High Low Risk of a Resume Generating Event Very High Very Low

Contrasting with a Storage based Data Protection Solution

Page 53: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 53

Database & Storage Architecture

Data Protection with Storage Technologies

Storage-based Data Protection – Database Implications

Database-Integrated Data Protection

Customer Viewpoint – Intel

Summary

Agenda

Page 54: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

54

Data Protection and High Availability Practices at High Tech Manufacturing

Gagan Singh Sr. Database Administrator Technology and Manufacturing Group Intel Corporation

Page 55: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

55

Agenda

• Intel – Database Setup Overview

• Enterprise Backup & Recovery (eBaR) setup – Past

• Enterprise Backup & Recovery (eBaR) HA2.0 – Present Architecture

• Production Scenarios

• Key Takeaways

Page 56: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

56

Database Setup Overview

• DSS and OLTP Setup

• Zero data loss and high availability is priority

• DB sizes range from few GB’s to ~ 50 TB

• Monitoring and auditing are key

• Robust Backup and Recovery procedures

• Application tier includes third party vendor products and in-house apps

• 24x7 uptime

Page 57: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

57

eBaR setup - Past

• Perl wrappers (8.x-10.2.x)

• Manageability challenges with new OS/DB versions

• Decentralized Recovery Catalog (RCAT)

• Complex to add/modify RMAN parameters

• Troubleshooting challenges

• Operations Nightmare - Monitoring and Auditing.

Page 58: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

58

Grid Control Node Grid Control Node

MSCS Failsafe Cluster

OMS+RCAT

Data Guard Redo Shipping

+DATA +FRA(Backup+ Flashback+ Archivelog)

Backup Network

Symantec 6.5 Symantec 6.5

+FRA(Backup+ Flashback+ Archivelog)

+DATA

VTL Servers

11.2.0.2-Grid/D

B

11.2.0.2-Grid/D

B

Backup Network

Observer

Data C

entre A

Data C

entre B

eBaR setup – Present Architecture

Page 59: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

59

The Setup

Weekly Incremental D2D D2T

Weekend L0 D2D D2T (Monthly Offsite)

Global Scripts

Block Change Tracking is Enabled

Basic Compression

Backup type “Backupset”

Flash Back Enabled (24Hr retention)

Page 60: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

60

Back up Flow

a • EM Grid Control OMS initiates scheduled backup Job

b • EM Grid Agent invokes RMAN on target DB

c • RMAN Global Script is executed

d • Disk To Disk (+FRA)

e • Purge Archivelogs (follow retention of 7 days)

f • D2T (FRA to VTL)

g • EM Grid Control Sends Success/Failure Email

Page 61: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

61

eBaR Eco-System and Highlights

• Online Data Duplication for STDBY creation

•Roll Forward STDBY with RMAN incremental

•RMAN Global Scripts •Substitution Variables for TAGS

•Archive log deletion policies

•Block Change Tracking •Multiple Channels (4) •Queue Depth •MAXPIECESIZE(32G) • FILESPERSET (2 data files, 20 archivelogs)

•Centralized RCAT •RCAT HA •Centralized Monitoring •Centralized Scheduling •Log History

Grid Control Integration Performance

STANDBY Database Cloning

Manageability

Page 62: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

62

Production Scenarios – Unplanned Outage

STDBY reinstated automatically using Oracle Flashback Database & Observer

No impact to App (Designed for retries)

Database Failover to STDBY < 60 seconds

Fast Start Fail Over (FSFO) initiated

DB Outage detected by Data Guard FSFO Observer

Failure Detected on Storage Layer

Page 63: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

63

Planned Outage

Enabled FSFO

Patch the new STDBY

No impact to App (Designed for retries)

Switched roles using Data Guard Broker

Patched the STDBY Stack

Disabled FSFO

Page 64: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

64

Key Takeaways

Summary: • No downtime to factory operations during planned outage compared to

~40min in the past • Zero Impact on Role Changes (Planned/Unplanned) • Auto Reinstatement via Flashback Database saves hours of STDBY rebuild

time

Learning’s Intel created in-house scripts to • Automate Backup job swaps in Grid Control upon role change • Automatic relocation of Observer on role changes based on DG guidelines • Force Level 0 backups on role change • Pre-checks to identify issues that prevent switchover need to be integrated

Page 65: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

65

Thanks to the Intel TMG(ATTD) Engineering Team

Contact Info:

[email protected]

Page 66: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the
Page 67: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

67

Reference

My Oracle Support:

Master Note for Data Guard [ID 1101938.1]

RMAN Backup Performance [ID 360443.1]

RMAN Performance Troubleshooting [ID 1326686.1]

OTN

http://www.oracle.com/technetwork/database/features/availability/oracle-database-maa-best-practices-155386.html

Page 68: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 68

Database & Storage Architecture

Data Protection with Storage Technologies

Storage-based Data Protection – Database Implications

Database-Integrated Data Protection

Customer Viewpoint – Intel

Summary

Agenda

Page 69: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 69

HA and Data Protection are in Oracle’s DNA

High Availability is ingrained in Oracle’s software development process

Bottom’s Up approach – the solution is highly available since each component is highly available

Each new feature adheres to Maximum Availability guidelines

Oracle’s Design Approach

Page 70: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 70

Resources OTN HA Portal:

http://www.oracle.com/goto/availability

Maximum Availability Architecture (MAA): http://www.oracle.com/goto/maa

MAA Blogs: http://blogs.oracle.com/maa

Exadata on OTN: http://www.oracle.com/technetwork/database/exadata/index.html

Oracle HA Customer Success Stories on OTN: http://www.oracle.com/technetwork/database/features/ha-casestudies-098033.html

Page 71: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 71 After OpenWorld, visit oracle.com/goto/availability

Key HA Sessions and Demos by Oracle Development Monday, 1 October – Moscone South

12:30p Oracle Data Guard Zero-Data-Loss Protection at Any Distance, 300 12:30p Future of Exadata: OLTP, Warehousing, and Consolidation, 104 1:45p Automating ILM with the Latest Database Technology, 300 1:45p Extracting Data in Oracle GoldenGate Integrated Capture Mode, 102 3:15p Maximize Availability with the Latest Database Technology, 303 3:15p Maximize Enterprise Availability with the Latest DB Technology, 303 4:45p Mission-Critical Oracle Exadata OLTP Deployment at PayPal, 300 4:45p Temporal Database Capabilities with the Latest DB Technology, 300 Tuesday, 2 October – Moscone South 10:15a Database Tables to Storage Bits: Data Protection Best Practices, 300 10:15a GoldenGate & Data Guard: Working Together Seamlessly, 305 11:45a Active Data Guard Zero-Downtime Database Maintenance, 300 11:45a Using Automatic Storage Mgmt with the Latest DB Technology, 301 1:15p The Four Ts of RMAN: Tips, Tuning, Troubleshooting, and … ?, 102 5:00p Maximum Availability Architecture Best Practices for Exadata, 303

Wednesday, 3 October – Moscone South 10:15a Operational Best Practices for Oracle Exadata, 102 10:15a Maximize Availability by Minimizing Disruption for End Users and Application, 301 11:45a What’s New in the Latest Generation of Oracle RAC, 301 11:45a Best Practices for HA w/ GoldenGate on Oracle Exadata, 102 1:15p Oracle Secure Backup: Integration Best Practices with Engineered Systems, 300 1:15p Application MAA Best Practices on Oracle Private Clouds, 200 5:00p Tuning &Troubleshooting Oracle GoldenGate on Oracle, 102 Thursday, 4 October – Moscone South 11:15a Integrate Your Globally Distributed IT for Key Cloud Computing Benefits, 300 12:45p Backup and Recovery of Oracle Exadata: Experiences and Best Practices, 300

Demos – Mon 10:00a-6:00p - Tue 9:45a-6:00p - Wed 9:45a-4:00p Oracle Maximum Availability Architecture, S-011 Oracle GoldenGate 11gR2 New Features, S-239 Oracle Database 12c: Global Data Services, S-010 Oracle Database 12c Application Continuity - S-009

Oracle Secure Backup, S-014 Oracle Active Data Guard, S-007 Oracle Recovery Manager and Oracle Flashback Technologies, S-019 Oracle Real Application Clusters and Oracle RAC One Node - S-008 Oracle Database 12c Xstream, Streams, Advanced Queing, S-018

Page 72: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 72

Graphic Section Divider

Page 73: Database Tables to Storage Bits: Data Protection …...Snapshots are application data- format unaware and cannot validate application data Snapshots reside on the same array as the

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 73