Scaling Data

Post on 02-Dec-2014

Copyright © 2003, SAS Institute Inc. All rights reserved. SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Scaling SAS® Data Access to Oracle® RDBMS

Howard Plemmons, SAS Institute Inc.

Andrew Holdsworth, Oracle Corporation

Scaling

What is Scaling?

Scaling

“To remove the scales of a fish”

“To climb up by means of a scaling ladder”

“To reach the highest point”

Data

Scaling Data

Why Scale to Data

Scaling Data

SAS tools, SAS/ACCESS®

SAS Procedures and Processes

Oracle tools

Oracle Procedures and Processes

Intelligence Value Chain

Intelligence Value Chain: Silver into Gold

SAS System 9

SAS V8 vs. SAS System 9

FEATURE               SAS V8    SAS System 9
Libname Engine        x         x
Procedure Interface   x         x
Fast Load             x         x
Threaded Interface              x

SAS V8 I/O Model

Threaded Interface SAS 9

SAS Procedures

proc sort

proc summary

proc dmine

proc reg; proc dmreg

proc means

proc loess; proc dmdb

proc glm

proc robustreg

SAS/ACCESS® Engines

ORACLE

DB2

Informix

ODBC

Sybase

Teradata

Libname and SAS Procedure Controls

dbslice ("where","where",…)

dbsliceparm (ALL,…)

defaults (THREADED_APPS,2)

options sastrace=',,t';

procedure controls – CPU count

Options In Action - DBSLICEPARM

-dbsliceparm none

option dbsliceparm=

libname x oracle user=scott pass=tiger dbsliceparm=(threaded_apps,2);

proc print data=x.oratab (dbsliceparm=(all,4)); run;
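With DBSLICEPARM the slices are generated for the user, and the slides do not show how. The Python sketch below is only an illustration of one way non-overlapping WHERE clauses could be generated automatically: take an integer column modulo the thread count. The function name `mod_slices`, the column name `x`, and the use of MOD are all assumptions made for illustration, not SAS's documented behavior.

```python
def mod_slices(column, threads):
    """Return one WHERE-clause string per thread.

    Assumes `column` holds integers, so every non-NULL row matches
    exactly one remainder in 0..threads-1.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    clauses = []
    for remainder in range(threads):
        clause = f"ABS(MOD({column}, {threads})) = {remainder}"
        if remainder == 0:
            # NULL never satisfies an equality test, so NULL rows are
            # routed explicitly to the first slice.
            clause = f"({clause} OR {column} IS NULL)"
        clauses.append(clause)
    return clauses


if __name__ == "__main__":
    for clause in mod_slices("x", 4):
        print(clause)
```

Each generated clause could then be handed to a separate reader thread, which is the same pattern that DBSLICE exposes with hand-written clauses.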

Options In Action - DBSLICE

libname x oracle user=scott pass=tiger;

proc print data=x.oratab (dbslice=("where x<100", "where x >= 100"));
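The idea behind DBSLICE can be sketched outside SAS. The Python below is not SAS/ACCESS code: a temporary SQLite file stands in for the Oracle table, and the helper names (`build_demo_table`, `fetch_slice`, `sliced_read`, `run_demo`) are hypothetical. It shows each user-supplied WHERE clause being fetched on its own connection and thread, with the partial results combined afterwards.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_demo_table(path, n_rows=1000):
    """Create a stand-in table named oratab with one integer column x."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE oratab (x INTEGER)")
    conn.executemany("INSERT INTO oratab VALUES (?)", [(i,) for i in range(n_rows)])
    conn.commit()
    conn.close()


def fetch_slice(path, where_clause):
    """Read one slice on its own connection, as one reader thread would."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(f"SELECT x FROM oratab WHERE {where_clause}").fetchall()
    finally:
        conn.close()


def sliced_read(path, where_clauses):
    """Fetch every slice concurrently and concatenate the results."""
    with ThreadPoolExecutor(max_workers=len(where_clauses)) as pool:
        parts = list(pool.map(lambda clause: fetch_slice(path, clause), where_clauses))
    return [row for part in parts for row in part]


def run_demo():
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        build_demo_table(path)
        # Same two slices as the slide: x < 100 and x >= 100.
        return sliced_read(path, ["x < 100", "x >= 100"])
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(len(run_demo()))
```

The clauses must not overlap and must together cover the whole table; otherwise rows are duplicated or silently dropped.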

Options In Action – CPUCOUNT, THREADS

CPUCOUNT=

THREADS | NOTHREADS

Process

Libname controls

Procedure controls

Execution

Scalability – SAS 9 Threaded speedup in PROC REG

[Chart series: Linear Scalability, Achieved Speedup]

Run on 12-way Unix Box

Scalability – SAS 9 Threaded speedup in PROC SORT

Run on 8-way Unix Box

Tests run in memory cache

What Does This Mean - access

393000 Rows

No Threads - baseline

Two Threads (DBSLICE) – 31%

Six Threads (DBSLICEPARM) – 54%

Run on 10-way Unix Box

Tests run in memory cache
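The slide does not say what the percentages measure. Assuming (and this is only an assumption) that each figure is the reduction in elapsed time relative to the unthreaded baseline, the implied speedup and per-thread efficiency can be worked out with a short Python sketch. The function names are hypothetical.

```python
def speedup_from_time_reduction(reduction):
    """Speedup implied by a fractional cut in elapsed time (0.31 means 31% less time)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / threads


# Figures from the slide, read as elapsed-time reductions (an assumption).
RESULTS = {2: 0.31, 6: 0.54}


def summarize(results):
    """Map thread count to (speedup, efficiency), both rounded to two decimals."""
    summary = {}
    for threads, reduction in results.items():
        speedup = speedup_from_time_reduction(reduction)
        summary[threads] = (
            round(speedup, 2),
            round(parallel_efficiency(speedup, threads), 2),
        )
    return summary


if __name__ == "__main__":
    print(summarize(RESULTS))
```

Under that reading, two threads give roughly a 1.45x speedup and six threads roughly 2.17x, so the gain per added thread falls well short of linear for data access.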

Scaling Data

Data Volumes

Data ACCESS

Data Organization

Scaling using Oracle - Andrew

Scaling with

The Star Query

Use of Parallelism

Use of the Direct Path

Use of Specialist Indexes

Use of Analytical Functions

Use of Materialized Views

Use of The Oracle9i Optimizer

The Star Query

Fact

Product

Time

Geography

Customer

Star Queries

The star query is a very common data warehouse (DW) technique. It is highly optimized in Oracle and can be tuned to the type of queries being run. In summary, the more that is known about the query composition, the higher the level of optimization possible.

Star Query Optimization

The optimization is a three-step process:

1. Apply the query predicates to the dimension tables to generate lists of foreign keys into the fact table.

2. Query the fact table using a series of single-column bitmap indexes on the foreign keys.

3. Having resolved the query within the fact table, complete the query by joining back to the dimension tables where needed and rolling the query up.
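The three steps can be illustrated with a toy model in Python. This is not how Oracle implements the star transformation internally: the tables, the column layout, and the function names are hypothetical, and a Python integer stands in for each bitmap (bit i is set when fact row i carries that foreign key).

```python
from collections import defaultdict

# Hypothetical dimension tables: key -> attribute value.
product = {1: "widget", 2: "gadget", 3: "widget"}  # product_id -> category
geography = {10: "NC", 20: "CA"}                   # geo_id -> state

# Hypothetical fact table rows: (product_id, geo_id, sales).
fact = [
    (1, 10, 5.0),
    (2, 10, 7.0),
    (3, 20, 2.0),
    (1, 20, 4.0),
    (3, 10, 1.0),
]


def build_bitmap_index(rows, column):
    """Single-column bitmap index: one integer bitmap per distinct key."""
    index = defaultdict(int)
    for row_number, row in enumerate(rows):
        index[row[column]] |= 1 << row_number
    return index


def keys_matching(dimension, predicate):
    """Step 1: apply a predicate to a dimension table, returning its keys."""
    return [key for key, attr in dimension.items() if predicate(attr)]


def or_bitmaps(index, keys):
    """Combine the bitmaps of every qualifying key for one dimension."""
    bitmap = 0
    for key in keys:
        bitmap |= index.get(key, 0)
    return bitmap


def star_query(category, state):
    product_index = build_bitmap_index(fact, 0)
    geo_index = build_bitmap_index(fact, 1)

    # Step 1: dimension predicates -> lists of foreign keys.
    product_keys = keys_matching(product, lambda c: c == category)
    geo_keys = keys_matching(geography, lambda s: s == state)

    # Step 2: AND the per-dimension bitmaps to locate the fact rows.
    hits = or_bitmaps(product_index, product_keys) & or_bitmaps(geo_index, geo_keys)
    rows = [fact[i] for i in range(len(fact)) if (hits >> i) & 1]

    # Step 3: join back to the dimensions and roll up the sales.
    totals = defaultdict(float)
    for product_id, geo_id, sales in rows:
        totals[(product[product_id], geography[geo_id])] += sales
    return dict(totals)


if __name__ == "__main__":
    print(star_query("widget", "NC"))
```

Only the fact rows whose bits survive the AND are ever touched, which is why selective dimension predicates make the fact-table access cheap.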

Star Queries

To enable star queries, the DBA should do the following:

1. Build single-column bitmap indexes on each foreign key in the fact table.

2. Build indexes on the dimension tables for the query predicates.

3. Build indexes on the dimension tables to assist in the join-back and roll-up process.

4. Generate statistics for the schema.

5. Set the parameter STAR_TRANSFORMATION_ENABLED=TRUE.

Use of Parallelism

Use multiple CPUs to execute a single query as well as multiple concurrent queries

Execute Table scans, Index probes and scans in parallel

Execute Joins and Sorts in parallel

Execute DML in parallel

Parallelism can be configured manually or automatically

Use of Partitioning

Partitioning was originally designed to allow management of large database objects; however, partitioning the data can also yield performance gains through the following:

• Partition pruning

• Join optimizations

Partitioning can be done by the following methods:

• Range, e.g. data or key ranges

• List, e.g. discrete values such as State

• Hash, to achieve equal-size partitions

Two types of partitioning can be applied
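Partition pruning can be illustrated with a small Python sketch (not Oracle code). The class `RangePartitionedTable` and its methods are hypothetical names; each partition holds keys below an exclusive upper bound, loosely analogous to range partitioning. A range query first works out which partitions could contain matching keys and scans only those.

```python
from bisect import bisect_right


class RangePartitionedTable:
    def __init__(self, upper_bounds):
        # Sorted exclusive upper bounds, one per partition.
        # [100, 200, 300] gives partitions for keys <100, 100-199, 200-299.
        self.upper_bounds = list(upper_bounds)
        self.partitions = [[] for _ in self.upper_bounds]

    def partition_for(self, key):
        """Index of the partition that owns `key`."""
        idx = bisect_right(self.upper_bounds, key)
        if idx == len(self.upper_bounds):
            raise ValueError(f"key {key} is beyond the last partition bound")
        return idx

    def insert(self, key, value):
        self.partitions[self.partition_for(key)].append((key, value))

    def query_range(self, low, high):
        """Return (rows with low <= key <= high, indexes of partitions scanned)."""
        first = bisect_right(self.upper_bounds, low)
        last = min(bisect_right(self.upper_bounds, high), len(self.upper_bounds) - 1)
        scanned = list(range(first, last + 1))  # pruning: all others are skipped
        rows = [
            (key, value)
            for p in scanned
            for (key, value) in self.partitions[p]
            if low <= key <= high
        ]
        return rows, scanned


if __name__ == "__main__":
    table = RangePartitionedTable([100, 200, 300])
    for k in range(0, 300, 10):
        table.insert(k, str(k))
    rows, scanned = table.query_range(120, 150)
    print(scanned, [k for k, _ in rows])
```

A query confined to one key range reads a single partition, however large the table as a whole has grown.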

Use of The Direct Path

Bypass the conventional transaction layer to insert and copy data within the database

SQL*Loader is currently used by SAS

Other options include:

• INSERT with the /*+ APPEND */ hint

• CREATE TABLE AS SELECT with NOLOGGING

These constructs can be used to transform vast amounts of data rapidly in parallel

Specialist Indexes

B-Tree Indexes

Bitmap Indexes, including bitmap join indexes

Function-Based Indexes

Analytical Functions

Oracle has embraced the ANSI OLAP extensions to SQL

These permit faster response times on queries that would require multiple passes of the data with conventional SQL

This allows grouped results and functionality such as moving averages
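The single-pass nature of a moving average can be shown with a short Python sketch. It is an analogy rather than Oracle code: the function name `moving_average` is hypothetical, and it mimics what a windowed average over the current row and the preceding `window - 1` rows would return, without rereading earlier rows.

```python
from collections import deque


def moving_average(values, window):
    """Average of each value and up to window-1 preceding values, in one pass."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()       # values currently inside the window
    running_total = 0.0
    result = []
    for value in values:
        recent.append(value)
        running_total += value
        if len(recent) > window:
            # Drop the value that has just slid out of the window.
            running_total -= recent.popleft()
        result.append(running_total / len(recent))
    return result


if __name__ == "__main__":
    print(moving_average([10, 20, 30, 40], 3))
```

Expressed in conventional SQL without analytic functions, the same result typically needs a self-join or correlated subquery, which is where the multiple passes over the data come from.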

Materialized Views

Materialized views allow automatic use of summary tables without a user having to rewrite the query

Well-designed materialized views are small in size and can increase performance by orders of magnitude.

Materialized views are in fact Oracle tables and can use all other features to improve performance

Oracle9i Optimizer

When upgrading between Oracle releases, the optimizer's behavior will change

The optimizer is tested with over 400,000 SQL statements

• Where plans change between releases, the actual query is run to test for degradation

• Slower plans are corrected

It is still important to have good, representative statistics

The DBMS_STATS package allows parallel generation and migration of schema statistics

Oracle9i Optimizer

Some common Optimizer problems seen with Oracle9i

• Bad or incomplete statistics

• Init.ora parameters influencing optimizer

• SQL written for the rule-based optimizer (RBO)

Summary

Oracle and SAS provide techniques for scaling to larger databases by optimizing both query performance and fetch performance.

These techniques are simple to adopt and allow huge productivity improvements.

We have identified some core technologies here; however, this is only a partial picture of what SAS and Oracle can do together.

About the Speakers

Howard Plemmons
Senior Software Manager
SAS Institute Inc.
SAS Circle
Cary, NC
Phone: 919-531-7779
E-mail: Howard.Plemmons@sas.com

Andrew Holdsworth
Director
Oracle Corp.
500 Oracle Pkwy,
Redwood Shores, CA 94065
Phone: 650-506-2938
E-mail: Andrew.Holdsworth@oracle.com

Other SUGI Papers/Presentations

• PC File Data Objects Directly from UNIX – 8:00am Tuesday

• SAS/ACCESS and use of Metadata – Rm 619 @ 2:30

• Lessons in Scalability – SAS Presents – 3:20 Tuesday

• Data Warehousing section - performance

Scaling SAS Data ACCESS to ORACLE RDBMS