02 IBM Netezza Overview

download 02 IBM Netezza Overview

of 38

description

Netezza doc

Transcript of 02 IBM Netezza Overview

  • 2011 IBM Corporation

    Information Management

    IBM Netezza Overview

    Information Management Technology Ecosystems

    Spring 2011

  • 2011 IBM Corporation2

    Information Management

    Agenda

    What is Netezza?

    Value Proposition

    iClass Advanced Analytics

    Netezza Product Family

  • 2011 IBM Corporation3

    Information Management

    True Appliances

    Dedicated device

    Optimized for purpose

    Complete solution

    Fast installation

    Very easy operation

    Standard interfaces

    Low cost

  • 2011 IBM Corporation4

    Information Management

    A purpose built analytics engine

    Integrated database, server, data storage

    Standard interface

    Low cost of ownership

    Speed: 10-100x faster than traditional systems

    Simplicity: Minimal administration and tuning

    Scalability: Peta-scale user data capacity

    Smart: High-performance advanced analytics

    What is Netezza Appliance?

  • 2011 IBM Corporation5

    Information Management

    Operations

    Theres nothing to do its an appliance

    BI Developers & DBAs faster delivery times

    No configuration

    No physical modeling

    No indexes

    No tuning out of the box performance

    Data model agnostic

    ETL Developers

    No aggregate tables needed simpler ETL logic

    Faster load and transformation times

    Business Analysts

    Train of thought analysis 10 to 100x faster

    True ad hoc queries no tuning, no indexes

    Ask complex queries against large datasets

    Lower latency load & query simultaneously

    OnStream processing by 100s of nodes

    Performance, Value, Simplicity

  • 2011 IBM Corporation6

    Information Management

    Transactional &

    Collaborative

    Applications

    Content

    Structured

    Data

    AnalyzeIntegrate

    Govern

    Data

    Manage

    Streaming

    Information

    Business Analytic

    Applications

    Streams

    Big Data

    Data

    Warehouses

    External

    Informatio

    n Sources

    www

    Quality Lifecycle

    Management

    Security &Privacy

    Data

    Warehouse

    Appliances

    Master Data

    Netezza as part of IBM Information Management Portfolio

  • 2011 IBM Corporation7

    Information Management

    Sample Netezza client base

    Page 7

    Digital Media

    Financial

    Services

    Government

    Health & Life

    Sciences

    Retail /

    Consumer

    Products

    Telecom

    Other

  • 2011 IBM Corporation8

    Information Management

    88

    Speed Scalability

    SmartSimplicity

    IBM Netezza TwinFin Client Derive Value Across Industries

    1 PB on Netezza7 years of historical data100-200% annual data growth

    NYSE has replaced an Oracle IO

    relational database with a data

    warehousing appliance from Netezza,

    allowing it to conduct rapid searches

    of 650 terabytes of data.

    ComputerWeekly.com

    when something took 24 hours I could only do so much with it, but when something takes 10 seconds, I may be able to completely rethink the business process

    - SVP Application Development, Nielsen

    15,000 users running 800,000+ queries per day 50X faster than before

  • 2011 IBM Corporation9

    Information Management

    Traditional Data Warehouse Complexity

  • 2011 IBM Corporation10

    Information Management

    Data Warehousing Simplified

  • 2011 IBM Corporation11

    Information Management

    Why Netezza over Conventional DW?

    Larger budget allocation for application & asset development Budget shift to strategic, value added activities More visibility within the organization Increased application services with better rates Reduced low end IT oriented services

    Why Netezza? Performance mattersOn Site POCs matter (with 1 DBA vs. 15 DBAs)TCO and ROI matterResults matterThe Partner asset matters most

    Typical Budget Outlay for BI Project

    Real $$ Saved

    Budget Allocation with Netezza architecture

    Application Admin Infrastructure

    Application Admin Infrastructure

  • 2011 IBM Corporation12

    Information Management

    Netezza Architecture in a Nutshell

    Advanced Analytics

    Advanced Analytics

    LoadersLoaders

    ETLETL

    BIBI

    Applications

    FPGA

    Memory

    CPU

    FPGA

    Memory

    CPU

    FPGA

    Memory

    CPU

    Hosts

    LiteHost

    (IBM xSeries,Red Hat Linux)

    Disk Enclosures S-Blades

    NetworkFabric

    Netezza Appliance

  • 2011 IBM Corporation13

    Information Management

    Data Stream Processing

    Page 13

    FPGA Core CPU Core

    Uncompress ProjectRestrictVisibility

    Complex Joins, Aggs, etc.

    NOTE: There are 96 Disk Drives, 96 FPGA Cores and 96 CPU Cores per rack

  • 2011 IBM Corporation14

    Information Management

    Inside the Netezza TwinFin Appliance

    High-performance database

    engine streaming joins,

    aggregations, sorts, etc.

    SQL Compiler

    Query Plan

    Optimize

    Admin

    Processor &

    streaming DB logic

    Slice of User Data

    Swap and Mirror partitions

    High speed data streaming

    SMP Hosts

    S-Blades(with FPGA-based Database Accelerator)

    Disk Enclosures

  • 2011 IBM Corporation15

    Information Management

    Extract / Load

    Data In / Out

    Data Out

    S

    Q

    L

    O

    D

    B

    C

    J

    D

    B

    C

    O

    L

    E

    -

    D

    B

    Netezza Integration with 3rd party Tools (ETL, BI)

    Analytics

    StandardInterfaces

  • 2011 IBM Corporation16

    Information Management

    Agenda

    What is Netezza?

    Value Proposition

    iClass Advanced Analytics

    Netezza Product Family

  • 2011 IBM Corporation17

    Information Management

    No dbspace/tablespace sizing and configuration

    No redo/physical log sizing and configuration

    No journaling/logical log sizing and configuration

    No page/block sizing and configuration for tables

    No extent sizing and configuration for tables

    No Temp space allocation and monitoring

    No RAID level decisions for dbspaces

    No logical volume creations of files

    No integration of OS kernel recommendations

    No maintenance of OS recommended patch levels

    No JAD sessions to configure host/network/storage

    No software to install

    One simple partitioning strategy: HASH

    Simple:: Performance Tuning Dramatically Simplified

    Instead of spending time and effort on tedious DBA tasks, I can use the time for higher BUSINESS VALUE tasks.

    For example, I can Bring on new applications and groups Quickly build out new data marts Provide more functionality to end users

    DBA

  • 2011 IBM Corporation18

    Information Management

    August 11 | Confidential

    Netezza Simplicity Example

    IndicesMore Tuning Modifications

    PRIMARY INDEX XPKFRT_BILL_EVNT ( Frt_Bill_Id )INDEX ( Proc_Dt )INDEX ( Ship_Dt )INDEX ( Pro_Nbr )INDEX ( Prnt_Cust_Id )INDEX ( Prnt_Cust_Id ,Expr_Dt )INDEX ( Evnt_Typ_Cd ,Prnt_Cust_Id )INDEX ( Evnt_Typ_Cd ,Prnt_Cust_Id ,Expr_Dt )INDEX ( Src_Data_Cd )INDEX XSI_PROC_VALUE ( Proc_Dt ) ORDER BY VALUES ( Proc_Dt )INDEX XSI_SHIP_VALUE ( Ship_Dt ) ORDER BY VALUES ( Ship_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Rjct_Rsn_Cd ,Expr_Dt )INDEX ( Frt_Bill_Id ,Seq_Nbr ,Eff_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Ship_Dt ,Prnt_Cust_Id )INDEX ( Cust_Id )INDEX ( Pymt_Dt ,Pymt_Crnc_Cd ,Chck_Nbr )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Clt_Inv_Dt ,Carr_Id ,Prnt_Cust_Id ,Acct_Nbr ,Trnp_Mode_Cd ,Orig_Id ,Orig_Addr_Id ,Dest_Id ,Dest_Addr_Id ,Chck_Nbr )INDEX ( Evnt_Typ_Cd ,Proc_Dt ,Carr_Id ,Prnt_Cust_Id ,Chck_Nbr )INDEX ( Carr_Id ,SCAC ,Prnt_Cust_Id ,Plnt_NCS ,Crnc_Cd )INDEX ( Frt_Bill_Id ,Eff_Dt )INDEX XSI_CLT_INV_VALUE ( Clt_Inv_Dt ) ORDER BY VALUES ( Clt_Inv_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Clt_Inv_Dt ,Prnt_Cust_Id )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Clt_Inv_Dt ,Ship_Dt ,SCAC ,Trnp_Mode_Cd ,Inbd_Outb_Cd ,Expr_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Rcv_Dt ,Clt_Inv_Dt ,Carr_Id ,Prnt_Cust_Id ,Orig_Id ,Orig_Addr_Id ,Dest_Id ,Dest_Addr_Id )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Rjct_Rsn_Cd ,Seq_Nbr ,Eff_Dt ,Expr_Dt )INDEX ( BOL_Nbr )INDEX ( Evnt_Typ_Cd ,Proc_Dt ,Prnt_Cust_Id )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Carr_Id ,Prnt_Cust_Id ,Crnc_Cd )INDEX ( Cust_Id ,Plnt_NCS ,Div_Cd )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt )INDEX ( Wgt )INDEX ( Plnt_NCS )INDEX ( Acct_Nbr );

    Aggregation TablesTuning Considerations

    CREATE SET TABLE AGG_TISTBL.FRT_BILL_EVNT_WEEK ,NO FALLBACK ,

    NO BEFORE JOURNAL,NO AFTER JOURNAL(Ct_Frt_Bill_Id_wk INTEGER NOT NULL,

    Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,Proc_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Process Week',Adt_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Audit Week',Rcv_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Receive Week',Inv_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Carrier Invoice Week,Clt_Inv_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Client Invoice Week',Ship_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Ship Week',Rjct_Rsn_Cd CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC;

    CREATE SET TABLE AGG_TISTBL.FRT_BILL_EVNT_MONTH ,NO FALLBACK ,

    NO BEFORE JOURNAL,NO AFTER JOURNAL(Ct_Frt_Bill_Id_Mth INTEGER NOT NULL,

    Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,Proc_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Process Month',Adt_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Audit Month',Rcv_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Receive Month',Inv_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Carrier Invoice Month,Clt_Inv_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Client Invoice Month',Ship_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Ship Month',

    Rjct_Rsn_Cd CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,

    Legacy DDL Model ConsiderationsCreate Table - Logical Model

    CREATE SET TABLE TISTBL.FRT_BILL_EVNT ,NO FALLBACK ,

    NO BEFORE JOURNAL,NO AFTER JOURNAL(Frt_Bill_Id INTEGER NOT NULL,Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) CHARACTER SET LATIN NOT

    CASESPECIFIC NOT NULL,Proc_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill

    Process Date',Adt_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill

    Audit Date',Rcv_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill

    Receive Date',Inv_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Carrier

    Invoice Date',Clt_Inv_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Client

    Invoice Date',Ship_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill

    Ship Date',Rjct_Rsn_Cd CHAR(2) CHARACTER SET LATIN NOT

    CASESPECIFIC,

    Netezza DDL

    Create Table - Logical Model

    CREATE TABLE frt_bill_evnt( Frt_Bill_Id INTEGER NOT NULL,

    Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) NOT NULL,Proc_Dt DATE,Adt_Dt DATE,Rcv_Dt DATE,Inv_Dt DATE,Clt_Inv_Dt DATE,Ship_Dt DATE,Rjct_Rsn_Cd CHAR(2),..Dlvy_Tm INTEGER,Actl_Cmdt_Ds CHAR(30),Carr_Rfrc_Nbr VARCHAR(25))distribute on hash ( Frt_Bill_Id );

    No indices and no tuning!

    Reduced

  • 2011 IBM Corporation19

    Information Management

    Simple:: Ramifications on TCO

    Telecom Call Detail Record FACT (6 billion rows)

    Oracle Object Count *

    Netezza Object Count

    Tables 1 1

    Indexes 12

    Table Partitions 47

    Index Partitions 564

    Table Partitions tablespaces 47

    Index Partitions tablespaces 47

    Table Data Files 170

    Index Data Files 122

    TOTAL 1,010 1

    Look at all the weeks/months worth of effort, DBA design and maintenance that we don't have with Netezza. The appliance claims are true.

    *: Oracle data does not account for ADDITIONAL effort required in configuring and engineering the file system design to accommodate this index management scheme.

  • 2011 IBM Corporation20

    Information Management

    Time of deployment reducedApproach: Load n Go for a unique sale

    Migrate data model (Netezza is model agnostic) Export legacy DDL to text file Keep current logical model intact and simple

    Keep table names and column definitions Remove all physical database design (extents, blocks, pages, materialized views, locks,

    indices) Users will keep consistent view and navigation of data

    Migrate data (500 GB to 1 TB /hour) Export legacy data to delimited ASCII file or pipe Load legacy data into the NPS system

    Provide secondary view of data Install Netezza ODBC/JDBC driver on users desktop

    Speed and Simplicity = New Analytics

  • 2011 IBM Corporation21

    Information Management

    Agenda

    What is Netezza?

    Value Proposition

    iClass Advanced Analytics

    Netezza Product Family

  • 2011 IBM Corporation22

    Information Management

    The Analytic Enterprise

    BI Reporting and Ad-Hoc Analysis

    What happened? When and where? How much?

    Predictive Analytics

    What will happen? What will the impact be?

    Optimization

    What is thebest choice?

  • 2011 IBM Corporation23

    Information Management

    Advanced Analytics Challenges

    Inefficient process

    Limited analytic complexity

    Inability to react to market conditions

    Expensive

    Time consuming

    Processing on limited and stale data

    Inability to experiment

  • 2011 IBM Corporation24

    Information Management

    Big Data Meets Big Math

    Analytics Without Constraints

  • 2011 IBM Corporation25

    Information Management

    Advanced Analytics the Traditional Way

    Fraud Detection

    Fraud Detection

    Demand Forecasting

    Demand Forecasting

    SAS

    R, S+

    AnalyticsGrid

    DataWarehouse

    C/C++, Java, Python, Fortran,

    Data

    SQL

    SQL

    ETL

    SQL

    ETL

    ETL

  • 2011 IBM Corporation26

    Information Management

    Advanced Analytics with TwinFin

    Fraud Detection

    Fraud Detection

    Demand Forecasting

    Demand Forecasting

    SAS

    R, S+

    AnalyticsGrid

    DataWarehouse

    Data

    SQL

    SQL

    ETL

    SQL

    ETL

    ETL

    C/C++, Java, Python, Fortran,

  • 2011 IBM Corporation27

    Information Management

    Advanced Analytics with TwinFin

    SAS

    R, S+

    SQL

    SQL

    Fraud Detection

    Fraud Detection

    Demand Forecasting

    Demand Forecasting

  • 2011 IBM Corporation28

    Information Management

    Advanced Analytics Framework

    TwinFin(i) Appliance

    Partner / Customer Analytics

    Partner / Customer Analytics

    Advanced Analytic Extensions Development & Modeling Tools

    Advanced Analytics Platform

    Data Warehouse Appliance Platform

    Applications

    Open Source Analytics(CRAN, GNU)

    Open Source Analytics(CRAN, GNU)

    Partner ADE or IDE

    Partner ADE or IDE

    R CLI/GUI R CLI/GUI EclipseEclipse

    UserDefined Extensions

    UserDefined Extensions

    AnalyticExecutables

    AnalyticExecutables

    nzMatrixnzMatrix C/C++C/C++ JavaJava PythonPython RR FortranFortran MapReduce/Hadoop

    MapReduce/Hadoop

    Open Language APIOpen Language APIMassively Parallel Analytic EnginesMassively Parallel Analytic Engines Open Framework APIOpen Framework API

    AMPP Architecture + Netezza Performance SoftwareAMPP Architecture + Netezza Performance Software

    Customer Analytics Application

    Customer Analytics Application

    Partner Analytics Application

    Partner Analytics Application

    Partner Visualization

    Partner Visualization

    Partner Data Integration

    Partner Data Integration

    Partner Business Intelligence

    Partner Business Intelligence

    Parallelized Analytics

    Parallelized Analytics

  • 2011 IBM Corporation29

    Information Management

    Agenda

    What is Netezza?

    Value Proposition

    iClass Advanced Analytics

    Netezza Product Family

  • 2011 IBM Corporation30

    Information Management

    Netezzas Market-Leading Evolution

    Worlds First

    Data Warehouse

    Appliance

    Worlds First

    100 TB Data

    Warehouse

    Appliance

    Worlds First

    Petabyte Data

    Warehouse

    Appliance

    Worlds First

    Analytic Data

    Warehouse

    Appliance

    NPS

    8000 Series

    TwinFin with i-Class

    Advanced Analytics(300X Performance)

    NPS

    10000 Series (50X Performance)

    TwinFin

    (150X Performance)

    2003 2006 2009 2010

    I

    m

    p

    a

    c

    t

    2011

  • 2011 IBM Corporation31

    Information Management

    The Evolving IBM Netezza Appliance Family

  • 2011 IBM Corporation32

    Information Management

    Appliance Family

    500 TB to 10+ PB

    Analytics on Massive Data

    Queryable Archiving

    High Capacity

    TBD

    Performance is 10X todays Netezza

    appliance

    Ultra Performance

    1 TB to 1.25 PB

    Data WarehouseHigh Performance

    Analytics

    NetezzaNetezza

    Development and Test System

    1 TB to 10 TB

    Just Arriv

    ed!Com

    ing Soon!

  • 2011 IBM Corporation33

    Information Management

    High Capacity Appliance

    Lowest price/TB in the market

    Scalability to more than 10 Petabytes

    Appliance simplicity

    SQL and analytics on massive data

    Fast data loading (5.5 TB/hour)

    Low environmental footprint

    Unparalleled Scalability and Value!

    NEW!

  • 2011 IBM Corporation34

    Information Management

    Netezza Models

    TF3 TF6 TF12 TF24 TF48 TF120

    Snippet Processors 24 48 96 192 384 960

    Capacity(TB)

    8 16 32 64 128 320

    Effective Capacity (TB)*

    32 64 128 256 512 1280

    .........

    1 10

    ... ...

    Predictable, Linear Scalability throughout entire family

    Capacity = User Data spaceEffective Capacity = User Data Space with compression

    *: 4X compression assumed

  • 2011 IBM Corporation35

    Information Management

    Replication: Vision and Direction

    Distributed global data warehouse

    Automated local high-availability & DR

    Concurrent query execution

    Workload balancing among nodes

    Currently in Phase 1: Async Replication for DR and Concurrent Access

  • 2011 IBM Corporation36

    Information Management

    Data Center

    NZ Appliance Family

    The Big Picture:: The Vision of a Seamless Platform

    Cloud Deployment

    Department/Region

    DataCapture

    Department/Region

    DataCapture

    Multisystem Data Integration / Federation

    Data Center

    NZ Appliance Family

    Data Governance

    Data Replication

    Multiple appliances specialized to its task

    Globally scalable grid

    Appears as a cloud to users

    Flexible and elastic

    Fast data loading and analysis

  • 2011 IBM Corporation37

    Information Management

    Bottom LineWhy Netezza?

    Provides What Users Want Opens Access to Detail and Historical Data

    Enables More/New Sophisticated ad hoc and complex queries

    Faster Response Times

    Best Performance

    10-100 times faster on the largest, most complex queries

    Lowest Price Significantly less expensive

    Reduced TCO Reduce maintenance, DBA and system admin expense Installs in hours not days/weeks/months

    Next Generation Analytic Applications Deployed Redirect Analytic resources to Applications from Infrastructure

  • 2011 IBM Corporation

    Information Management

    Questions?

    Information Management Technology Ecosystems

    Spring 2011