ORACLE Data Warehousing Guarino

  • 8/6/2019 ORACLE Data Warehousing Guarino

    1/62

    Copyright 2008 - Oracle Corporation

    2/62

    Oracle Data Warehousing

    Vincenzo Guarino, Technology Sales Consultant

    3/62

    Agenda

    Scenario
    Oracle Optimized Warehouse Initiative
    Data Warehousing with Oracle Database
    Oracle Data Warehouse Platform

    4/62

    Scenario

    5/62

    Oracle's BI & DW Product Strategy

    Integrated Data Warehouse Database
      Scalability, Availability, Manageability
      Advanced analytic content, Data Quality and ETL/EL-T, Data Mining services, Spatial data

    Integrated Business Intelligence Tools
      Next Generation Business Intelligence Technology Platform

    Integrated Analytic Applications
      Enterprise Wide, Industry Specific Analytic and Corporate Performance Management Applications

    Exploits Any Information

    6/62

    The Database Market

    Source: IDC

    Oracle      40.9%
    IBM         22.8%
    Microsoft   15.0%
    Teradata     9.6%
    Other       11.7%

    http://www.oracle.com/database/number-one-database.html
    http://www.oracle.com/solutions/business_intelligence/feature_dw_leadership.html

    7/62

    Introducing Oracle Optimized Warehouse Initiative

    8/62

    9/62

    A bit of methodology

    Data Sources (Operational Systems)
      Applications, transactions, external data

    Staging Area
      Reconciling, transforming and integrating source data
      Data quality checks and corrections, recycling data loading errors

    Atomic Data Layer: Centralized Repository (1st level Data Warehouse)
      3rd Normal Form data model
      Optimization for large volumes of data
      Enterprise reporting and general queries
      Data and metadata integration

    Performance Data Layer: Dependent Data Marts (2nd level Data Warehouse)
      Dimensional schemas and views
      Multidimensional objects
      Complex analytical queries
      High query performance

    10/62

    Problem

    This chain is composed of complex processes for transforming source data into analytical information:
      Source applications, ETL/EL-T processes, dimensional data modeling, metadata management and integration

    and of software products used to build and use the information:
      Database, ETL/EL-T tool, front-end tool, data hub, multidimensional engine, reporting environment

    But hardware components are also involved in this context:
      Servers, CPU, memory, disks, network, devices

    Some percentage of customers inevitably end up with poorly configured data warehouses

    Performance is the essence of the Oracle Optimized Warehouse program

    11/62

    But where to start?

    12/62

    Full Range of DW Solution Options

    Custom
      Flexibility for the most demanding data warehouse
      Benefits:
        High performance
        Unlimited scalability
        Completely customizable
        Industry-leading database and hardware
      Database Options, Management Packs, Partitioning, RAC

    Optimized Warehouse
      Scalable systems pre-installed and pre-configured: ready to run out-of-the-box
      Benefits:
        High performance
        Simple to buy
        Fast to implement
        Easy to maintain
        Competitively priced
      Database Options, Management Packs

    Reference Configuration
      Documented best-practice configurations for data warehousing
      Benefits:
        High performance
        Simple to scale; modular building blocks
        Industry-leading database and hardware
        Available today with HP, IBM, Sun, EMC/Dell

    Flexibility
    Pre-configured, Pre-installed, Validated

    13/62

    Oracle Optimized Warehouse Initiative

    Accelerate implementations and lower risk
    Faster deployment, lower risk

    Build from Scratch with Components (weeks to months)
      Pre-implementation system sizing
      Acquisition of components
      Installation and configuration
      Testing and validation

    Reference Configurations (weeks to months)
      Acquisition of components
      Installation and configuration
      Testing and validation

    Oracle Optimized Warehouse (< 1 - 2 weeks)
      Take delivery of Oracle Optimized Warehouse

    14/62

    Oracle Optimized Warehouse Initiative - OWI

    Goals for Oracle data warehouse solutions:
      Provide superior system performance
      Provide a superior customer experience

    One product for data warehouse
      Database and options software, servers, storage
      Pre-installed, pre-configured
      Validated performance
      Sold as a single product
      Supported as a single product

    15/62

    OWI availability

    [Table: Partner / Reference Configurations / Optimized Warehouses, with some entries marked "Soon"; the partner rows are not recoverable]

    16/62

    Oracle Optimized Warehouse Initiative - OWI

    O/S      Solaris   AIX            Linux
    Server   E20K      P570 Power 6   PE2950
    Size     10 TB     5-20 TB        1-4 TB

    17/62

    OWI Building Block Scale-Out

    Validation and testing of incremental growth path

    18/62

    Oracle Optimized Warehouse Reference Configurations

    What is it?
      Documented balanced system configurations for pre-defined DW/BI environments
      Starting point for sizing a system
      Balanced system consists of CPU, memory, I/O, and cabling
      Leverages scalable, modular components
      Enables incremental growth (scale-in, scale-out)
      Mitigates implementation risks
      Available on HP, Sun, IBM, and Dell/EMC

    Example Reference Configuration, with HP

    19/62

    http://www.oracle.com/solutions/business_intelligence/optimized-warehouse-initiative.html

    20/62

    Data Warehousing with Oracle Database

    21/62

    Oracle for Data Warehousing
    Continuous innovation

    [Chart: axes labeled "Performance" and "DWH & Analytical features"]

    22/62

    Oracle Database Enterprise Editions and Options

    Total Security
    Storing and managing each type of data
    ETL functionalities, advanced analytic content, Data Mining built into the Database Kernel
    High performance on large volumes of data (VLDB and VLDW)

    23/62

    Oracle Data Warehouse Platform

    24/62

    Oracle Data Warehouse Platform

    ELT & Data Quality
      Data Integration, Data Modeling, Metadata Management, Data Profiling, SOA

    Analytic Platform
      Multi-Dimensional Calculations, Time Series Forecasting, Statistics, Data Mining

    Scalable Data Management
      Automatic Storage Management, Partitioning, Parallel Operations, Aggregation Management, Real Application Clusters

    25/62

    Oracle Data Warehouse Platform

    26/62

    Oracle Enterprise Grid

    Computing as a utility
      A network of clients and service providers
    Client-side: Simplicity
      Request computation or information and receive it
    Server-side: Sophistication
      Availability, load balancing, utilization
      Information sharing, data management
    Virtualization
      Clients see a large virtual server
      Underlying infrastructure hidden
    Manageability
      Easy and automated management from a single Web console
    High availability & Scalability

    [Diagram: Storage, Database, Middleware and Applications tiers managed through Grid Control]

    27/62

    Oracle Real Application Clusters - RAC
    Capacity on demand for the Grid

    Database clustering with shared disk
    Low cost, highest quality of service
    Scalability & availability
      Add/drop servers as needs change
      Automatically balance load across servers
      On-line configuration of services and priorities
    Proven
      Hundreds of customers running enterprise applications

    [Diagram: Transaction Services and Batch Job Services running on instances RAC01 to RAC04, hosted on nodes CLUSNODE-1 to CLUSNODE-4]

    Service             Priority   Warning/Critical Threshold
    ERP (transaction)   High       0.5/0.75 ms.
    CRM (transaction)   Med        0.5/1.00 ms.
    SS (transaction)    Low        1.0/1.5 ms.
    HOT (batch job)     High       1.0/1.5 ms.
    STD (batch job)     Low        3.0/5.0 ms.

    28/62

    Extend and grow as needed

    [Chart: workload (100%, 200%, 300%) over time (3 to 24 months)]

    29/62

    RAC for Data Warehousing
    Manageability

    [Diagram: allocation of the ETL, OLAP and Reports services across the cluster nodes in four situations]

    During peak working hours of users' queries and analysis
    During intervals when the DW is loaded with new and modified data
    After having loaded the data
    Without response time requirements, all types of workload can run on all nodes

    30/62

    Partitioning

    Partitioning addresses key issues in supporting very large tables and indexes
      Decompose them into smaller and more manageable pieces called partitions
      SQL queries and DML statements do not need to be modified in order to access partitioned tables
      DDL statements can access and manipulate individual partitions rather than entire tables or indexes
      Add a new partition, organize an existing partition, or drop a partition with minimal to zero interruption to a read-only application
    Partitioning is entirely transparent to applications

    [Diagram: an Application issues SQL against the Sales table, which is partitioned into Jan, Feb and Mar]

    31/62

    Partitioning

    Partitioning can help tune SQL statements to avoid unnecessary index and table scans (using partition pruning)
    Improve the performance of massive join operations when large amounts of data (for example, several million rows) are joined together, by using partition-wise joins
    Partitioning data greatly improves manageability of very large databases and dramatically reduces the time required for administrative tasks such as backup and restore

    [Diagram: Sales table partitioned by month, Jan 2007 to Dec 2007; the Mar 2007, Jun 2007 and Sep 2007 partitions are highlighted for the query below]

    SELECT sum(revenue)
    FROM Sales
    WHERE sales_date IN
      (to_date('MAR-15-2007', 'MON-DD-YYYY'),
       to_date('JUN-10-2007', 'MON-DD-YYYY'),
       to_date('SEP-28-2007', 'MON-DD-YYYY'));
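The pruning idea behind the query on this slide can be sketched outside the database. The snippet below is only an illustrative model in Python, not how Oracle implements it: the `partitions` dictionary, its row values and the `sum_revenue` helper are all hypothetical stand-ins for a Sales table partitioned by month. It shows that only the partitions that can contain the requested dates are scanned.

```python
from datetime import date

# Hypothetical in-memory stand-in for a Sales table range-partitioned by month:
# each partition key (year, month) maps to a list of (sales_date, revenue) rows.
partitions = {
    (2007, m): [(date(2007, m, d), 100.0 * m) for d in (10, 15, 28)]
    for m in range(1, 13)
}

def sum_revenue(partitions, wanted_dates):
    """Sum revenue for the wanted dates, scanning only partitions that can hold them."""
    wanted = set(wanted_dates)
    # Pruning step: derive the partition keys from the predicate values.
    needed_keys = {(d.year, d.month) for d in wanted}
    scanned = []
    total = 0.0
    for key in sorted(needed_keys):
        rows = partitions.get(key, [])
        scanned.append(key)
        total += sum(rev for day, rev in rows if day in wanted)
    return total, scanned

total, scanned = sum_revenue(
    partitions,
    [date(2007, 3, 15), date(2007, 6, 10), date(2007, 9, 28)],
)
print(total, scanned)
```

Three of the twelve monthly partitions are read; the other nine are never touched, which is the effect the slide attributes to partition pruning.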

    32/62

    33/62

    Rolling Window Operations

    Order Table (partitioned by quarter): Q4 06, Q1 07, Q2 07, Q3 07
    Drop: Q4 06
    Add: Q4 07
    Other data & queries not affected
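A rolling window can be modeled as an ordered collection of partitions where the oldest is dropped and a new one is appended. The sketch below is a hypothetical Python model (the `orders` contents and the `roll_window` helper are invented for illustration); in the database the same steps would be partition maintenance DDL on the Order table.

```python
from collections import OrderedDict

def roll_window(table, new_key, new_rows):
    """Drop the oldest partition and add a new one; other partitions are untouched."""
    oldest_key, _ = table.popitem(last=False)   # "Drop" the oldest partition
    table[new_key] = new_rows                   # "Add" the newest partition
    return oldest_key

# Hypothetical Order table partitioned by quarter (row contents are placeholders).
orders = OrderedDict(
    (q, [f"order-{q}-{i}" for i in range(3)])
    for q in ["Q4 06", "Q1 07", "Q2 07", "Q3 07"]
)
untouched_before = orders["Q2 07"]
dropped = roll_window(orders, "Q4 07", ["order-Q4 07-0"])
print(dropped, list(orders))
```

The partitions in the middle of the window are the same objects before and after the operation, mirroring the slide's point that other data and queries are not affected.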

    34/62

    Information Lifecycle Management
    Optimize storage cost and performance

    Digital data storage tiers:
      Active: High Performance Storage Tier
      Less Active: Low Cost Storage Tier
      Historical: Online Archive Storage Tier
      Archive: Offline Archive Storage Tier

    Use Flashback Data Archive for long-term storage of old data
    Use table, index partitioning to separate data into different tiers
    Use new ILM assistant to establish policies, create scripts

    35/62

    Oracle Data Warehouse Platform

    36/62

    37/62

    Key points

    Declarative, graphical and wizard-driven development
    The transformation engine is the target Oracle Database
    Configurable ETL and/or EL-T mechanism
    The transformation language is PL/SQL, automatically generated and optimized depending on the Database release
    Open and standard (CWM) Metadata Repository

    38/62

    Oracle Warehouse Builder

    Licensing Option Information: http://download.oracle.com/docs/cd/B28359_01/license.111/b28287/toc.htm

    39/62

    Oracle Warehouse Builder

    OWB Core
      Oracle Warehouse Builder core ETL features
      Included in any edition of Oracle Database
      Advanced relational and OLAP modeling
      Design Experts to automate complex tasks

    OWB Data Quality Option *
      Advanced data profiling
      Auto-derived or custom data rules and mappings
      Data Auditors in the context of ETL process flows
      6-Sigma quality indices

    OWB Enterprise ETL Option *
      Advanced ETL features
      Good for large-scale, complex ETL deployments
      Slowly Changing Dimensions, Pluggable Mappings
      Guided change propagation, complex process flows

    OWB Connectors Option *
      Enterprise ETL connectors for Oracle E-Business Suite, PeopleSoft, Siebel, SAP R/3

    * To be licensed separately

    40/62

    Oracle Data Warehouse Platform

    41/62

    Unparalleled Analytic Power
    Bring the algorithms to the data, not the data to the algorithms

    Analytic computations done by the database
      Statistics
      OLAP
      Data Mining

    Scalability
    Security
    Simplicity
    Single source of truth
    Low information latency

    42/62

    SQL Analytic and Statistic functions

    Window Aggregate functions (moving and cumulative)
      avg, sum, min, max, count, variance, stddev, first_value, last_value
    Ranking functions
      rank, dense_rank, cume_dist, percent_rank, ntile
    LAG/LEAD functions
      Direct inter-row reference using offsets
    Reporting Aggregate functions
      sum, avg, min, max, variance, stddev, count, ratio_to_report
    Statistical Aggregates
      Correlation, linear regression family, covariance
    Linear regression
      Fitting of an ordinary-least-squares regression line to a set of number pairs
      Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions

    Descriptive Statistics
      average, standard deviation, variance, min, max, median (via percentile_cont), mode, group-by & roll-up
      DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
    Correlations
      Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
    Cross Tabs
      Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
    Hypothesis Testing
      Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
    Distribution Fitting
      Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
    Pareto Analysis
      80:20 rule, cumulative results table
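To make the window, ranking and LAG semantics listed on this slide concrete, here is a small Python sketch that reproduces what a few of these SQL functions compute over an ordered set of rows. It is a plain re-implementation for illustration, not Oracle code; the `revenue` figures and helper names are hypothetical, and each docstring names the SQL construct being imitated.

```python
from itertools import accumulate

# Hypothetical monthly revenue figures, already ordered by month.
revenue = [100, 120, 90, 150, 150]

def moving_avg(values, width):
    """AVG(...) OVER (ORDER BY ... ROWS BETWEEN width-1 PRECEDING AND CURRENT ROW)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def lag(values, offset=1, default=None):
    """LAG(value, offset, default): the value from `offset` rows earlier."""
    return [values[i - offset] if i >= offset else default
            for i in range(len(values))]

def dense_rank_desc(values):
    """DENSE_RANK() OVER (ORDER BY value DESC): ties share a rank, no gaps."""
    ranks = {v: r for r, v in enumerate(sorted(set(values), reverse=True), start=1)}
    return [ranks[v] for v in values]

# Cumulative total, as SUM(...) OVER (ORDER BY month) would return per row.
cumulative = list(accumulate(revenue))

print(moving_avg(revenue, 3))
print(lag(revenue))
print(dense_rank_desc(revenue))
print(cumulative)
```

Each function returns one value per input row, which is the defining property of analytic functions as opposed to ordinary GROUP BY aggregates.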

    43/62

    Oracle OLAP 11g

    Enhance the Oracle Data Warehouse and improve business intelligence applications by:

    Delivering rich analytic content
      Advanced analytic calculations using simple SQL
      Any combination of totals and sub-totals available
    Building and managing multi-dimensional structures inside the database
      Pre-calculated indicators always available
      Time-series, non-additive measures across dimensions, wide range of functions used with respect to the Time dimension
      Every drill operation delivers coherent level data detail
    Accelerating query performance
      Completely transparent to the application

    44/62

    45/62

    Calculated Measures

    46/62

    OLAP-based Materialized Views
    Breakthrough Performance

    A single cube provides the equivalent of thousands of MVs
    Efficiently computed, compressed, maintained
    The 11g SQL Query Optimizer treats OLAP cubes as MVs and rewrites queries to access cubes transparently

    [Diagram: a SQL Query against the tables of a Relational Star Schema is redirected by Query Rewrite to the OLAP Cube]
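The query rewrite idea can be illustrated with a toy model: the same question is answered either by scanning detail rows or by reading a pre-aggregated summary, and the caller gets the same result either way. This Python sketch is only an analogy under stated assumptions; the `facts` rows and both helper functions are hypothetical, and the real optimizer's rewrite logic is far more involved.

```python
from collections import defaultdict

# Hypothetical fact rows of a star schema: (month, product, revenue).
facts = [
    ("2007-01", "A", 10.0), ("2007-01", "B", 5.0),
    ("2007-02", "A", 7.0), ("2007-02", "A", 3.0),
]

def build_summary(rows):
    """Pre-aggregate revenue by (month, product), as a materialized view or cube would."""
    summary = defaultdict(float)
    for month, product, revenue in rows:
        summary[(month, product)] += revenue
    return dict(summary)

def revenue_by_month(month, rows, summary=None):
    """Answer from the summary when one exists, otherwise scan the detail rows."""
    if summary is not None:
        # "Rewritten" path: read pre-aggregated cells only.
        return sum(v for (m, _), v in summary.items() if m == month)
    # Base-table path: scan every detail row.
    return sum(r for m, _, r in rows if m == month)

summary = build_summary(facts)
print(revenue_by_month("2007-02", facts), revenue_by_month("2007-02", facts, summary))
```

The caller's request is identical in both cases; only the access path changes, which is what "transparently" means on the slide.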

    47/62

    OLAP-based Materialized Views
    Breakthrough Manageability

    Like 10g MVs, provides fast incremental refresh of the cube as underlying data changes
    A single object to maintain rather than thousands
    Simple: cube refresh syntax is identical to MV refresh syntax

    [Diagram: Cube Refresh from the tables of the Relational Star Schema into the OLAP cube]

    48/62

    Analytic Workspace Manager

    Graphical interface for designing, creating and managing multidimensional structures

    49/62

    Oracle In-Database Data Mining
    A Disruptive Technology

    Provides a rich set of predictive algorithms
      High-efficiency predictive analysis
      Multiple state-of-the-art supported algorithms
    Easy to integrate and deploy
      SQL functions, Java API (JSR-73), PL/SQL API
      Graphical easy-to-use interface
      Third-party products that extend the coverage
    Scalable
    Secure and Reliable
    Changes the economics of analytics

    50/62

    Oracle Data Mining
    Algorithms & Example Applications

    Regression
      Algorithms: Support Vector Machine, Generalized Linear Models (multivariate linear regression, logistic regression)
      Predict a numeric value
      Predict a purchase amount or cost
      Predict the value of a home

    Classification and Prediction
      Algorithms: Decision Trees, Naive Bayes, Support Vector Machine, Adaptive Bayes Network*
      Predict customers most likely to respond to a campaign or offer, incur the highest costs, etc.
      Target your best customers
      Develop customer profiles

    Attribute Importance
      Algorithm: Minimum Description Length
      Identify most influential attributes for a target attribute

    *Deprecated

    [Figures: attribute ranking A1 to A7; decision tree with splits on Income (>$50K), Gender, Status, HH Size (4) and Age, and leaves Buy = 0 / Buy = 1]

    51/62

    Oracle Data Mining
    Algorithms & Example Applications

    Feature Extraction
      Algorithm: Non-Negative Matrix Factorization
      Reduce a large dataset into representative new attributes
      Useful for clustering and text mining

    Association Rules
      Algorithm: Apriori
      Find co-occurring items in a market basket
      Suggest product combinations
      Design better item placement on shelves

    Clustering
      Enhanced k-means, Orthogonal Partitioning, Anomaly Detection
      Find naturally occurring groups
      Market segmentation
      Find disease subgroups
      Identify frauds, anomalies

    [Figure: features F1 to F4]
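The market-basket idea behind association rules ("find co-occurring items") can be shown with a few lines of Python. This is not the Apriori algorithm used by Oracle Data Mining, only the pair-support counting that such algorithms build on; the `baskets` data and the `frequent_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets holding both) is >= min_support."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

Pairs that clear the support threshold are the candidates for "suggest product combinations" style rules; a full implementation would also extend to larger itemsets and compute rule confidence.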

    52/62

    Oracle Data Mining
    Algorithms & Example Applications

    Text Mining
      Combine data and text for better models
      Add unstructured text (e.g. physician's notes) to structured data (e.g. age, weight, height, etc.) to predict outcomes
      Classify and cluster documents
      Combined with Oracle Text to develop advanced text mining applications

    53/62

    SQL Data Mining

    Given a previously built response model (classification), predict who will respond to the campaign, and why

    select cust_name,
           prediction(campaign_model using *) as responder,
           prediction_details(campaign_model using *) as reason
    from customers;

    54/62

    Real-time Prediction

    On-the-fly, single record apply with new data

    with records as (
      select 178255 ANNUAL_INCOME,
             0 CAPITAL_GAIN,
             83 SAVINGS_BALANCE,
             246 AVE_CHECKING_BALANCE,
             30 AGE,
             'HIGH' EDUCATION,
             'Mngr' WORKCLASS,
             'Married' MARITAL_STATUS,
             'Sales' OCCUPATION,
             'Husband' RELATIONSHIP,
             'White' RACE,
             'Male' SEX,
             70 HOURS_PER_WEEK,
             '?' NATIVE_COUNTRY,
             98 PAYROLL_DEDUCTION
      from dual)
    select s.prediction prediction, s.probability probability
    from (
      select PREDICTION_SET(CD_BUYERS76485_DT, 1 USING *) pset
      from records) t, TABLE(t.pset) s;

    55/62

    Oracle Data Miner GUI interface

    Oracle Data Miner's Activity Guides simplify & automate data mining for business users
    Oracle Data Miner provides model performance and evaluation viewers

    56/62

    Technology Partnership

    SPSS Clementine
      Combine SPSS Clementine ease of use with ODM in-Database functionality & scalability
      Build, store, browse and score models in the Database for optimal performance

    InforSense
      A Single Optimized Environment for Real Time Predictive Analytics within the Database

    57/62

    SQL-free analytics: drag-and-drop application build
    Visual analytics: interactive visualisation
    Integrative analytics: unified analytical environment
    Automated analytics: deploy to Oracle Portal and BPEL

    Oracle functionalities: Oracle Data Sources, Data Mining, Preprocess, Statistics, Text, OLAP, Scheduler

    Interact with (visualize) data at any step in the workflow
    Deploy the analytic workflow as a Web Service
    Deploy the analytic workflow as a service embedded in BPEL, SFA, CRM

    [Diagram labels: Oracle Decision Tree Model, InforSense Service, Deployment]

    58/62

    And last but not least

    Oracle Database value innovation

    59/62

    Availability
      Oracle Data Guard, Oracle Clusterware, Online Operations, Flashback Operations, Rolling Upgrades, Advanced Backup/Recovery, Streams, Replication, Oracle Real Application Clusters, Oracle Secure Backup

    Manageability
      Automatic Workload Repository, Automatic Memory Management, Automatic Database Diagnostic Monitor, Parallel Operations, Database Control & Grid Control, Tuning Pack, Diagnostic Pack, Change Management Pack, Configuration Management Pack, Provisioning Pack

    Storage
      Automatic Storage Management, Recovery Manager, Oracle Cluster File System, Information Lifecycle Management, Transportable Tablespaces, External Tables, Compression, Partitioning

    Security
      Virtual Private Database, Label Security (Fine Grained Audit), Identity Management, Secure Application Roles, Transparent Data Encryption, Database Vault, Audit Vault

    60/62

    The Data Warehousing Books

    http://www.oracle.com/technology/documentation/database11gR1.html

    61/62

    62/62