Experiences with Real-Time Data Warehousing Using Oracle Database 10g

Mike Schmitz
High Performance Data Warehousing
mike.schmitz@databaseperformance.com

Michael Brey
Principal Member Technical Staff
ST/NEDC Oracle Engineering
Oracle Corporation

Agenda

- The meaning of real-time in data warehousing
- Customer business scenario
  - Customer environment
  - “Real-Time” requirement
- Our real-time solution
  - Real-time data architecture
  - Incremental operational source change capture
  - Transformation and population into DW target
- Simplified functional demonstration
- Asynchronous Change Data Capture (Oracle)
- Performance characteristics and considerations

My Background

I am an independent data warehousing consultant specializing in the dimensional approach to data warehouse / data mart design and implementation, with in-depth experience utilizing efficient, scalable techniques whether dealing with large-scale data warehouses or small-scale, platform-constrained data mart implementations. I deliver dimensional design and implementation as well as ETL workshops in the U.S. and Europe.

I have helped implement data warehouses using Redbrick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries including such customers as Hewlett Packard, American Express, General Mills, AT&T, Bell South, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati, and the French Railroad SNCF.

Real-Time in Data Warehousing

- Data warehousing systems are complex environments
  - Business rules
  - Various data process flows and dependencies
- Almost never pure real-time
  - Some latency is a given
- What do you need?
  - Real-time
  - Near real-time
  - Just in time for the business

Customer Business Scenario

- Client provides software solutions for utility companies
- Utility companies have plants generating energy supply
  - Recommended maximum output capacity
  - Reserve capacity
  - Buy supplemental energy as needed
- Peak demand periods are somewhat predictable
  - Each day is pre-planned on historical behavior
  - Cheaper to buy energy ahead
  - Expensive to have unused capacity
- Existing data warehouse supports the planning function
  - Reduced option expenses
  - Cut down on supplemental energy costs

Customer “Real-Time” Requirement

- Getting more in-time accuracy enhances operational business
  - Compare today's plant output volumes to yesterday's or last week's average
  - Know when to purchase additional options or supplies
- Customer target
  - Actual data within a 5-minute lag
  - Use a single query
  - Use a single tool

Sample Analysis Graph

[Figure: Plant A output from 8am to 10am, vertical axis 0 to 100,000; series: Today, Yesterday, Last Week Avg, Max]

Our Real-Time Solution: Overview

Three-step approach:
1. Implement a real-time DW data architecture
2. Near real-time incremental change capture from the operational system
3. Transformation and propagation (population) of change data to the DW

Our Real-Time Solution: Real-Time DW Data Architecture

- Add a real-time “partition” to our Plant Output fact table for current day activity
  - Separate physical table
  - No indexes or referential integrity (RI) constraints during daily activity (data coming in will already have RI enforced)
  - Combined with the Plant Output fact table through a UNION ALL view

Our Real-Time Solution: Change Capture and Population

1. Incremental change capture from the operational site
   - Synchronous or asynchronous
2. Transformation and propagation (population) of change data to the DW
   - Continuous trickle feed or periodic batch

[Diagram: Operations, Staging, and DW; labels: Synch CDC, Asynch CDC, Trigger, Batch]

Our Real-Time Solution: Incremental Change Capture

- Done with Oracle's Change Data Capture (CDC) functionality
  - Synchronous CDC available with Oracle9i
  - Asynchronous CDC with Oracle10g
- Asynchronous CDC is the preferred mechanism
  - Decoupling of change capture from the operational transaction

Asynchronous CDC

- SQL interface to change data
- Publish/subscribe paradigm
- Parallel access to log files, leveraging Oracle Streams
- Parallel transformation of data

[Diagram: OLTP DB, redo log files, logical change data (based on LogMiner), transform (SQL, PL/SQL, Java), Oracle10g DW tables]

Our Real-Time Solution: Population of Change Data into DW

- Continuous
  - Change table owner creates a trigger to populate the warehouse real-time partition
- Periodic batch
  - Utilize the subscribe interface
    - Subscribe to specific table and column changes through a view
    - Set a window and extract the changes at the required period
    - Purge the view and move the window

Our Real-Time Solution: The Daily Process

- Integrate daily changes into the historical fact table
- At the end of the day
  - Index the current day table and apply constraints (no validate)
  - Create new fact table partition
  - Exchange current day table with new partition
  - Create next day's “Real-Time Partition” table

Simplified Functional Demo: Schema Owners

- AO_CDC_OP
  - Owns the operational schema
- AO_CDC
  - Owns the CDC change sets and change tables (needs special CDC privileges: CDC publish role)
- AO_CDC_DW
  - Owns the data warehouse schema (also needs special CDC privileges: CDC subscribe role)

Simplified Functional Demo: Operational Schema

[Figure: operational schema diagram]

Simplified Functional Demo: Data Warehouse Schema

D_GENERATING_PLANT
  GENERATING_PLANT_KEY: NUMBER(4)
  PLANT_ID: VARCHAR2(24)
  PLANT_NAME: VARCHAR2(32)
  PLANT_STATUS: VARCHAR2(15)
  PLANT_TARGET_MAX_CAPACITY_KWH: NUMBER(15)
  PLANT_ABSOL_MAX_CAPACITY_KWH: NUMBER(15)
  UPDATE_TS: TIMESTAMP(6)

D_OUTPUT_MINUTE
  OUTPUT_MINUTE_KEY: NUMBER

D_OUTPUT_DAY
  OUTPUT_DAY_KEY: NUMBER

F_CURRENT_DAY_PLANT_OUTPUT
  OUTPUT_DAY_KEY: NUMBER(7)
  OUTPUT_MINUTE_KEY: NUMBER(4)
  GENERATING_PLANT_KEY: NUMBER(4)
  OUTPUT_ACTUAL_QTY_IN_KWH: NUMBER(15)

F_PLANT_OUTPUT
  OUTPUT_DAY_KEY: NUMBER
  OUTPUT_MINUTE_KEY: NUMBER
  GENERATING_PLANT_KEY: NUMBER(4)
  OUTPUT_ACTUAL_QTY_IN_KWH: NUMBER(15)

What do we have?

- Operational transaction table: AO_CDC_OP.PLANT_OUTPUT
- DW historical partitioned fact table: AO_CDC_DW.F_PLANT_OUTPUT
- DW current day table (“Real-Time Partition”): AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT
- Data warehouse UNION ALL view: AO_CDC_DW.V_PLANT_OUTPUT
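
The UNION ALL view that ties the historical fact table to the current day table could be defined roughly as follows. This is a sketch, not the prototype's actual DDL. The fact columns come from the data warehouse schema slide; the `output_day` column on D_OUTPUT_DAY and the `output_time_24hr_nbr` column on D_OUTPUT_MINUTE are inferred from the join conditions in the deck's extraction statement; the plant ID 'PLANT_A' and the hourly roll-up are purely illustrative.

```sql
-- Sketch: a single view over history plus the current day "partition" table.
CREATE OR REPLACE VIEW ao_cdc_dw.v_plant_output AS
SELECT output_day_key, output_minute_key, generating_plant_key,
       output_actual_qty_in_kwh
FROM   ao_cdc_dw.f_plant_output
UNION ALL
SELECT output_day_key, output_minute_key, generating_plant_key,
       output_actual_qty_in_kwh
FROM   ao_cdc_dw.f_current_day_plant_output;

-- Hypothetical "single query": hourly output today vs. yesterday for one plant.
-- Assumes output_time_24hr_nbr stores the time as HHMI (14:05 is stored as 1405).
SELECT TRUNC(m.output_time_24hr_nbr / 100) AS hour_of_day,
       SUM(CASE WHEN d.output_day = TRUNC(SYSDATE)
                THEN f.output_actual_qty_in_kwh END) AS today_kwh,
       SUM(CASE WHEN d.output_day = TRUNC(SYSDATE) - 1
                THEN f.output_actual_qty_in_kwh END) AS yesterday_kwh
FROM   ao_cdc_dw.v_plant_output f
       JOIN ao_cdc_dw.d_output_day d
         ON d.output_day_key = f.output_day_key
       JOIN ao_cdc_dw.d_output_minute m
         ON m.output_minute_key = f.output_minute_key
       JOIN ao_cdc_dw.d_generating_plant p
         ON p.generating_plant_key = f.generating_plant_key
WHERE  p.plant_id = 'PLANT_A'               -- hypothetical plant ID
AND    d.output_day >= TRUNC(SYSDATE) - 1   -- yesterday and today only
GROUP  BY TRUNC(m.output_time_24hr_nbr / 100)
ORDER  BY hour_of_day;
```

Because the view exposes both tables under one name, the reporting tool needs no knowledge of where the current day's rows physically live.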

First

- The CDC user publishes
  - Create a change set (CDC_DW)
  - Add supplemental logging for the operational table
  - Create a change table for the operational table (CT_PLANT_OUTPUT)
  - Force database logging on the tablespace to catch any bulk insert /*+ APPEND */ (non-logged) activity
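
The publish steps might look roughly like the following in Oracle Database 10g. Treat this as a sketch under stated assumptions rather than the prototype's script: the tablespace name USERS, the log group name, and the source column data types are placeholders, and the parameter lists follow the documented 10g `DBMS_CDC_PUBLISH` package as best recalled, so verify them against the release in use.

```sql
-- As a DBA: force logging so direct-path (APPEND) loads still reach the redo log.
-- USERS is a placeholder tablespace name.
ALTER TABLESPACE users FORCE LOGGING;

-- Supplemental logging for the source columns CDC should capture.
-- The log group name is hypothetical; column names come from the deck's extraction SQL.
ALTER TABLE ao_cdc_op.plant_output
  ADD SUPPLEMENTAL LOG GROUP plant_output_lg
  (plant_id, output_ts, output_in_kwh) ALWAYS;

-- As the publisher (AO_CDC):
BEGIN
  -- HOTLOG_SOURCE is the predefined change source for asynchronous HotLog CDC.
  DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
    change_set_name    => 'CDC_DW',
    description        => 'Plant output changes for the DW',
    change_source_name => 'HOTLOG_SOURCE',
    stop_on_ddl        => 'y');

  -- Column data types below are assumptions, not taken from the deck.
  DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
    owner             => 'AO_CDC',
    change_table_name => 'CT_PLANT_OUTPUT',
    change_set_name   => 'CDC_DW',
    source_schema     => 'AO_CDC_OP',
    source_table      => 'PLANT_OUTPUT',
    column_type_list  => 'PLANT_ID VARCHAR2(24), OUTPUT_TS TIMESTAMP(6), OUTPUT_IN_KWH NUMBER(15)',
    capture_values    => 'new',
    rs_id             => 'y',
    row_id            => 'n',
    user_id           => 'n',
    timestamp         => 'n',
    object_id         => 'n',
    source_colmap     => 'n',
    target_colmap     => 'y',
    options_string    => NULL);

  -- Asynchronous change sets start out disabled; enable capture explicitly.
  DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
    change_set_name => 'CDC_DW',
    enable_capture  => 'y');
END;
/
```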

Next: Transform and Populate

One of two ways:
- Continuous feed
  - Logged insert activity
  - Permits nearer real-time
  - Constant system load
- Periodic batch feed
  - Permits non-logged bulk operations
  - You set the lag time: how often do you run the batch process?
    - Hourly
    - Every five minutes
  - Less system load overall


The Continuous Feed

Put an insert trigger on the change table which joins to the dimension tables picking up the dimension keys and does any necessary transformations
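
A trigger of this kind could be sketched as below. The trigger name is hypothetical, the join logic mirrors the deck's batch extraction statement, and the sketch assumes that AO_CDC has been granted direct INSERT and SELECT privileges on the warehouse tables. It also ignores the change table's control columns, so a real implementation would likely need to filter on the operation type. Depending on how the change table is populated, the trigger's firing property may need adjusting before it fires for rows applied by the capture mechanism; that is an assumption to verify, not something the slides state.

```sql
-- Sketch: row-level trigger that looks up the dimension keys and writes
-- each captured change straight into the current day table.
CREATE OR REPLACE TRIGGER ao_cdc.trg_ct_plant_output_feed   -- hypothetical name
AFTER INSERT ON ao_cdc.ct_plant_output
FOR EACH ROW
BEGIN
  -- Conventional (logged) insert, one row per captured change.
  INSERT INTO ao_cdc_dw.f_current_day_plant_output
    (generating_plant_key, output_day_key, output_minute_key,
     output_actual_qty_in_kwh)
  SELECT p.generating_plant_key,
         d.output_day_key,
         m.output_minute_key,
         :new.output_in_kwh
  FROM   ao_cdc_dw.d_generating_plant p,
         ao_cdc_dw.d_output_day d,
         ao_cdc_dw.d_output_minute m
  WHERE  p.plant_id = :new.plant_id
  AND    d.output_day = TRUNC(:new.output_ts)
  -- 24-hour hour and minute as a number, e.g. 14:05 becomes 1405
  AND    m.output_time_24hr_nbr =
         TO_NUMBER(TO_CHAR(:new.output_ts, 'HH24MI'));
END;
/
```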

The Batch Feed

- The CDC schema owner
  - Authorizes AO_CDC_DW to select from the change table (the select will be accomplished via a generated view)
- The DW schema owner
  - Subscribes to the change table and the columns he needs (with a centralized EDW approach this would usually be the whole change table) with a subscription and view name
  - Activates the subscription
- Extract
  - Extend the window
  - Extract changed data via the view (same code as trigger)
  - Purge the window (logical delete; physical deletion is handled by the CDC schema owner)
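
The subscribe and extract cycle could be sketched as follows using the 10g `DBMS_CDC_SUBSCRIBE` package. The subscription name PO_ACTIVITY_SUB is hypothetical; the view name PO_ACTIVITY_VIEW and the source column names are taken from the deck's extraction statement. Parameter lists are given as best recalled from the 10g documentation and should be checked against the release in use.

```sql
-- As AO_CDC (publisher): let the DW schema read the change table.
GRANT SELECT ON ao_cdc.ct_plant_output TO ao_cdc_dw;

-- As AO_CDC_DW (subscriber): one-time setup.
BEGIN
  DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
    change_set_name   => 'CDC_DW',
    description       => 'Plant output changes for the DW',
    subscription_name => 'PO_ACTIVITY_SUB');        -- hypothetical name

  DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
    subscription_name => 'PO_ACTIVITY_SUB',
    source_schema     => 'AO_CDC_OP',
    source_table      => 'PLANT_OUTPUT',
    column_list       => 'PLANT_ID, OUTPUT_TS, OUTPUT_IN_KWH',
    subscriber_view   => 'PO_ACTIVITY_VIEW');

  DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(
    subscription_name => 'PO_ACTIVITY_SUB');
END;
/

-- Each batch cycle (for example, every five minutes):
BEGIN
  -- Make all changes captured since the last cycle visible through the view.
  DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'PO_ACTIVITY_SUB');
END;
/

-- ... run the INSERT ... SELECT from ao_cdc_dw.PO_ACTIVITY_VIEW and COMMIT ...

BEGIN
  -- Logically discard the rows just processed so the next window starts clean.
  DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'PO_ACTIVITY_SUB');
END;
/
```

The interval at which this cycle is scheduled is what sets the lag time for the batch approach.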

Extraction from Change Table View

insert /*+ APPEND */ into ao_cdc_dw.F_CURRENT_DAY_PLANT_OUTPUT
  (generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh)
select p.generating_plant_key
      ,d.output_day_key
      ,m.output_minute_key
      ,new.output_in_kwh
from ao_cdc_dw.PO_ACTIVITY_VIEW new
  inner join ao_cdc_dw.d_generating_plant p
    on new.plant_id = p.plant_id
  inner join ao_cdc_dw.d_output_day d
    on trunc(new.output_ts) = d.output_day
  inner join ao_cdc_dw.d_output_minute m
    -- 24-hour hour and minute as a number, e.g. 14:05 becomes 1405
    on to_number(to_char(new.output_ts, 'HH24MI')) = m.output_time_24hr_nbr;

Next Step

- Add the current day's activity (the contents of the current day fact table) to the historical fact table as a new partition
  - Index and apply constraints to the current day fact table
  - Add a new empty partition to the fact table
  - Exchange the current day fact table with the partition
  - Create the new current day fact table
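
The end-of-day steps above could be sketched as follows. Everything specific here is an assumption made for illustration: that F_PLANT_OUTPUT is range-partitioned on OUTPUT_DAY_KEY, that it carries a matching local bitmap index and foreign key, that D_GENERATING_PLANT has a primary key on GENERATING_PLANT_KEY, and the index, constraint, and partition names and the day key value 1001.

```sql
-- 1. Index and constrain the current day table so it matches the fact table.
CREATE BITMAP INDEX ao_cdc_dw.cd_plant_bix            -- hypothetical index name
  ON ao_cdc_dw.f_current_day_plant_output (generating_plant_key);

ALTER TABLE ao_cdc_dw.f_current_day_plant_output
  ADD CONSTRAINT cd_plant_fk                          -- hypothetical constraint name
  FOREIGN KEY (generating_plant_key)
  REFERENCES ao_cdc_dw.d_generating_plant (generating_plant_key)
  ENABLE NOVALIDATE;                                  -- incoming data already had RI enforced

-- 2. Add a new, empty partition for the day just finished (day key 1001 assumed).
ALTER TABLE ao_cdc_dw.f_plant_output
  ADD PARTITION p_day_1001 VALUES LESS THAN (1002);

-- 3. Swap the loaded table and the empty partition; only metadata changes.
ALTER TABLE ao_cdc_dw.f_plant_output
  EXCHANGE PARTITION p_day_1001
  WITH TABLE ao_cdc_dw.f_current_day_plant_output
  INCLUDING INDEXES
  WITHOUT VALIDATION;

-- 4. The standalone table now holds the empty segment; strip the index and
--    constraint so the next day's feed runs without them.
ALTER TABLE ao_cdc_dw.f_current_day_plant_output
  DROP CONSTRAINT cd_plant_fk;
DROP INDEX ao_cdc_dw.cd_plant_bix;
```

This sketch reuses the emptied table for the next day instead of dropping and recreating it; the slides describe creating a new current day table, which achieves the same result.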


Let’s step thru this live

Summary

- We created a real-time partition for current day activity
- We put CDC on the operational table and created a change table populated by an asynchronous process (reads redo log)
- We demonstrated continuous feed to the DW by using a trigger-based approach
- We demonstrated a batch DW feed by using the CDC subscribe process
- We showed how to add the current day table to the fact table and set up the next day's table

An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz@databaseperformance.com

Michael Brey
Principal Member Technical Staff
ST/NEDC Oracle Engineering
Oracle Corporation

Overview

- Benchmark description
- System description
- Database parameters
- Performance data

The Benchmark

- Customer OLTP benchmark run internally at Oracle
- Insurance application handling customer inquiries and quotes over the phone
- N users perform M quotes
- Quote = actual work performed during a call with a customer
  - Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks/commits, savepoints
- Compute average time for all quotes across users

System Info

- SunFire 4800
  - A standard Shared Memory Processor (SMP)
  - 8 900-MHz CPUs
  - 16 GB physical memory
  - Solaris 5.8
- Database storage: striped across 8 Sun StorEdge T3 arrays (9 x 36.4 GB each)

Database Parameters

- Parallel_max_servers: 20
- Streams_pool_size: 400M (default 10% shared pool)
- Shared_pool_size: 600M
- Buffer cache: 128M
- Redo buffers: 4M
- Processes: 600

Change Data Capture (CDC)

                      Sync               Async HotLog       Async AutoLog
Available             Oracle 9i          Oracle 10g         Oracle 10g
Source system cost    System resources   System resources   Minimal
Part of txn           YES                NO                 NO
Changes seen          Real time          Near real time     Variable
Systems               1                  1                  2

Tests

- Conducted tests with Asynchronous HotLog CDC enabled and disabled and with Sync CDC
- Asynchronous HotLog CDC tests conducted at different log usage levels
  - Approx. 10, 50, and 100% of all OLTP tables with DML operations were included in CDC
- Tests run with:
  - 250 concurrent users
  - Continuous peak workload after ramp-up
  - 175 transactions per second

Impact on Transaction Time

[Figure: transaction time (axis 0.9 to 1.5) for no CDC, no CDC with supplemental logging, Async 10%, Async 50%, Async 100%, and Sync 100%]

CPU Consumption: Supplemental Logging

[Figure: USR + SYS time; CPU usage (# CPUs, axis 0 to 6) over time (s, 5 to 985); series: no CDC, no CDC w/ suppl]

CPU Consumption: 10% DML Change Tracking

[Figure: USR + SYS time; CPU usage (# CPUs, axis 0 to 6) over time (s, 5 to 980); series: no CDC w/ suppl, CDC 10%]

CPU Consumption: 50% DML Change Tracking

[Figure: USR + SYS time; CPU usage (# CPUs, axis 0 to 6) over time (s, 5 to 985); series: no CDC w/ suppl, CDC 50%]

CPU Consumption: 10%, 100% DML Change Tracking

[Figure: USR + SYS time; CPU usage (# CPUs, axis 0 to 8) over time (s, 5 to 985); series: no CDC w/ suppl, CDC 10%, CDC 100%]

Latency of Change Tracking

- Latency is defined as the time between the actual change and its reflection in the change capture table
  - Latency = time[change record insert] - time[redo log insert]
- Latency measurements were made for the 100% Asynchronous HotLog CDC run
  - 99.7% of records arrived in less than 2 secs
  - 53.5% of records arrived in less than 1 sec
  - Remaining records arrived in less than 3 secs
- Asynchronous CDC kept up with the constant high OLTP workload all the time

Summary

- Change Data Capture enables enterprise-ready near real-time capturing of change data
  - No fallback for constant high-load OLTP environments
  - Minimal impact on origin OLTP transactions
  - Predictable additional resource requirements, solely driven by the amount of change tracking
- Oracle provides the flexibility to meet your “on-time” business needs

Q&A

Next Steps: Data Warehousing DB Sessions

Monday
- 11:00 AM, #40153, Room 304: Oracle Warehouse Builder: New Oracle Database 10g Release
- 3:30 PM, #40176, Room 303: Security and the Data Warehouse
- 4:00 PM, #40166, Room 130: Oracle Database 10g SQL Model Clause

Tuesday
- 8:30 AM, #40125, Room 130: Oracle Database 10g: A Spatial VLDB Case Study
- 3:30 PM, #40177, Room 303: Building a Terabyte Data Warehouse, Using Linux and RAC
- 5:00 PM, #40043, Room 104: Data Pump in Oracle Database 10g: Foundation for Ultrahigh-Speed Data Movement

For more info on Oracle BI/DW, go to http://otn.oracle.com/products/bi/db/dbbi.html

Next Steps: Data Warehousing DB Sessions

Thursday
- 8:30 AM, #40179, Room 304: Oracle Database 10g Data Warehouse Backup and Recovery
- 11:00 AM, #36782, Room 304: Experiences with Real-Time Data Warehousing using Oracle 10g
- 1:00 PM, #40150, Room 102: Turbocharge your Database, Using the Oracle Database 10g SQLAccess Advisor

Business Intelligence and Data Warehousing demos, all four days in the Oracle Demo Campground:
- Oracle Database 10g
- Oracle OLAP
- Oracle Data Mining
- Oracle Warehouse Builder
- Oracle Application Server 10

For more info on Oracle BI/DW, go to http://otn.oracle.com/products/bi/db/dbbi.html


Reminder – please complete the OracleWorld online session survey

Thank you.