02 IBM Netezza Overview
-
Upload
lokesh-ceeba -
Category
Documents
-
view
145 -
download
0
description
Transcript of 02 IBM Netezza Overview
-
2011 IBM Corporation
Information Management
IBM Netezza Overview
Information Management Technology Ecosystems
Spring 2011
-
2011 IBM Corporation2
Information Management
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
-
2011 IBM Corporation3
Information Management
True Appliances
Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost
-
2011 IBM Corporation4
Information Management
A purpose built analytics engine
Integrated database, server, data storage
Standard interface
Low cost of ownership
Speed: 10-100x faster than traditional systems
Simplicity: Minimal administration and tuning
Scalability: Peta-scale user data capacity
Smart: High-performance advanced analytics
What is Netezza Appliance?
-
2011 IBM Corporation5
Information Management
Operations
Theres nothing to do its an appliance
BI Developers & DBAs faster delivery times
No configuration
No physical modeling
No indexes
No tuning out of the box performance
Data model agnostic
ETL Developers
No aggregate tables needed simpler ETL logic
Faster load and transformation times
Business Analysts
Train of thought analysis 10 to 100x faster
True ad hoc queries no tuning, no indexes
Ask complex queries against large datasets
Lower latency load & query simultaneously
OnStream processing by 100s of nodes
Performance, Value, Simplicity
-
2011 IBM Corporation6
Information Management
Transactional &
Collaborative
Applications
Content
Structured
Data
AnalyzeIntegrate
Govern
Data
Manage
Streaming
Information
Business Analytic
Applications
Streams
Big Data
Data
Warehouses
External
Informatio
n Sources
www
Quality Lifecycle
Management
Security &Privacy
Data
Warehouse
Appliances
Master Data
Netezza as part of IBM Information Management Portfolio
-
2011 IBM Corporation7
Information Management
Sample Netezza client base
Page 7
Digital Media
Financial
Services
Government
Health & Life
Sciences
Retail /
Consumer
Products
Telecom
Other
-
2011 IBM Corporation8
Information Management
88
Speed Scalability
SmartSimplicity
IBM Netezza TwinFin Client Derive Value Across Industries
1 PB on Netezza7 years of historical data100-200% annual data growth
NYSE has replaced an Oracle IO
relational database with a data
warehousing appliance from Netezza,
allowing it to conduct rapid searches
of 650 terabytes of data.
ComputerWeekly.com
when something took 24 hours I could only do so much with it, but when something takes 10 seconds, I may be able to completely rethink the business process
- SVP Application Development, Nielsen
15,000 users running 800,000+ queries per day 50X faster than before
-
2011 IBM Corporation9
Information Management
Traditional Data Warehouse Complexity
-
2011 IBM Corporation10
Information Management
Data Warehousing Simplified
-
2011 IBM Corporation11
Information Management
Why Netezza over Conventional DW?
Larger budget allocation for application & asset development Budget shift to strategic, value added activities More visibility within the organization Increased application services with better rates Reduced low end IT oriented services
Why Netezza? Performance mattersOn Site POCs matter (with 1 DBA vs. 15 DBAs)TCO and ROI matterResults matterThe Partner asset matters most
Typical Budget Outlay for BI Project
Real $$ Saved
Budget Allocation with Netezza architecture
Application Admin Infrastructure
Application Admin Infrastructure
-
2011 IBM Corporation12
Information Management
Netezza Architecture in a Nutshell
Advanced Analytics
Advanced Analytics
LoadersLoaders
ETLETL
BIBI
Applications
FPGA
Memory
CPU
FPGA
Memory
CPU
FPGA
Memory
CPU
Hosts
LiteHost
(IBM xSeries,Red Hat Linux)
Disk Enclosures S-Blades
NetworkFabric
Netezza Appliance
-
2011 IBM Corporation13
Information Management
Data Stream Processing
Page 13
FPGA Core CPU Core
Uncompress ProjectRestrictVisibility
Complex Joins, Aggs, etc.
NOTE: There are 96 Disk Drives, 96 FPGA Cores and 96 CPU Cores per rack
-
2011 IBM Corporation14
Information Management
Inside the Netezza TwinFin Appliance
High-performance database
engine streaming joins,
aggregations, sorts, etc.
SQL Compiler
Query Plan
Optimize
Admin
Processor &
streaming DB logic
Slice of User Data
Swap and Mirror partitions
High speed data streaming
SMP Hosts
S-Blades(with FPGA-based Database Accelerator)
Disk Enclosures
-
2011 IBM Corporation15
Information Management
Extract / Load
Data In / Out
Data Out
S
Q
L
O
D
B
C
J
D
B
C
O
L
E
-
D
B
Netezza Integration with 3rd party Tools (ETL, BI)
Analytics
StandardInterfaces
-
2011 IBM Corporation16
Information Management
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
-
2011 IBM Corporation17
Information Management
No dbspace/tablespace sizing and configuration
No redo/physical log sizing and configuration
No journaling/logical log sizing and configuration
No page/block sizing and configuration for tables
No extent sizing and configuration for tables
No Temp space allocation and monitoring
No RAID level decisions for dbspaces
No logical volume creations of files
No integration of OS kernel recommendations
No maintenance of OS recommended patch levels
No JAD sessions to configure host/network/storage
No software to install
One simple partitioning strategy: HASH
Simple:: Performance Tuning Dramatically Simplified
Instead of spending time and effort on tedious DBA tasks, I can use the time for higher BUSINESS VALUE tasks.
For example, I can Bring on new applications and groups Quickly build out new data marts Provide more functionality to end users
DBA
-
2011 IBM Corporation18
Information Management
August 11 | Confidential
Netezza Simplicity Example
IndicesMore Tuning Modifications
PRIMARY INDEX XPKFRT_BILL_EVNT ( Frt_Bill_Id )INDEX ( Proc_Dt )INDEX ( Ship_Dt )INDEX ( Pro_Nbr )INDEX ( Prnt_Cust_Id )INDEX ( Prnt_Cust_Id ,Expr_Dt )INDEX ( Evnt_Typ_Cd ,Prnt_Cust_Id )INDEX ( Evnt_Typ_Cd ,Prnt_Cust_Id ,Expr_Dt )INDEX ( Src_Data_Cd )INDEX XSI_PROC_VALUE ( Proc_Dt ) ORDER BY VALUES ( Proc_Dt )INDEX XSI_SHIP_VALUE ( Ship_Dt ) ORDER BY VALUES ( Ship_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Rjct_Rsn_Cd ,Expr_Dt )INDEX ( Frt_Bill_Id ,Seq_Nbr ,Eff_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Ship_Dt ,Prnt_Cust_Id )INDEX ( Cust_Id )INDEX ( Pymt_Dt ,Pymt_Crnc_Cd ,Chck_Nbr )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Clt_Inv_Dt ,Carr_Id ,Prnt_Cust_Id ,Acct_Nbr ,Trnp_Mode_Cd ,Orig_Id ,Orig_Addr_Id ,Dest_Id ,Dest_Addr_Id ,Chck_Nbr )INDEX ( Evnt_Typ_Cd ,Proc_Dt ,Carr_Id ,Prnt_Cust_Id ,Chck_Nbr )INDEX ( Carr_Id ,SCAC ,Prnt_Cust_Id ,Plnt_NCS ,Crnc_Cd )INDEX ( Frt_Bill_Id ,Eff_Dt )INDEX XSI_CLT_INV_VALUE ( Clt_Inv_Dt ) ORDER BY VALUES ( Clt_Inv_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Clt_Inv_Dt ,Prnt_Cust_Id )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Clt_Inv_Dt ,Ship_Dt ,SCAC ,Trnp_Mode_Cd ,Inbd_Outb_Cd ,Expr_Dt )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Rcv_Dt ,Clt_Inv_Dt ,Carr_Id ,Prnt_Cust_Id ,Orig_Id ,Orig_Addr_Id ,Dest_Id ,Dest_Addr_Id )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt ,Rjct_Rsn_Cd ,Seq_Nbr ,Eff_Dt ,Expr_Dt )INDEX ( BOL_Nbr )INDEX ( Evnt_Typ_Cd ,Proc_Dt ,Prnt_Cust_Id )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Carr_Id ,Prnt_Cust_Id ,Crnc_Cd )INDEX ( Cust_Id ,Plnt_NCS ,Div_Cd )INDEX ( Frt_Bill_Id ,Evnt_Typ_Cd ,Proc_Dt )INDEX ( Wgt )INDEX ( Plnt_NCS )INDEX ( Acct_Nbr );
Aggregation TablesTuning Considerations
CREATE SET TABLE AGG_TISTBL.FRT_BILL_EVNT_WEEK ,NO FALLBACK ,
NO BEFORE JOURNAL,NO AFTER JOURNAL(Ct_Frt_Bill_Id_wk INTEGER NOT NULL,
Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,Proc_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Process Week',Adt_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Audit Week',Rcv_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Receive Week',Inv_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Carrier Invoice Week,Clt_Inv_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Client Invoice Week',Ship_Wk DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Ship Week',Rjct_Rsn_Cd CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC;
CREATE SET TABLE AGG_TISTBL.FRT_BILL_EVNT_MONTH ,NO FALLBACK ,
NO BEFORE JOURNAL,NO AFTER JOURNAL(Ct_Frt_Bill_Id_Mth INTEGER NOT NULL,
Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,Proc_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Process Month',Adt_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Audit Month',Rcv_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Receive Month',Inv_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Carrier Invoice Month,Clt_Inv_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Client Invoice Month',Ship_Mth DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill Ship Month',
Rjct_Rsn_Cd CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,
Legacy DDL Model ConsiderationsCreate Table - Logical Model
CREATE SET TABLE TISTBL.FRT_BILL_EVNT ,NO FALLBACK ,
NO BEFORE JOURNAL,NO AFTER JOURNAL(Frt_Bill_Id INTEGER NOT NULL,Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) CHARACTER SET LATIN NOT
CASESPECIFIC NOT NULL,Proc_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill
Process Date',Adt_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill
Audit Date',Rcv_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill
Receive Date',Inv_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Carrier
Invoice Date',Clt_Inv_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Client
Invoice Date',Ship_Dt DATE FORMAT 'YYYY-MM-DD' TITLE 'Freight Bill
Ship Date',Rjct_Rsn_Cd CHAR(2) CHARACTER SET LATIN NOT
CASESPECIFIC,
Netezza DDL
Create Table - Logical Model
CREATE TABLE frt_bill_evnt( Frt_Bill_Id INTEGER NOT NULL,
Load_Id INTEGER,Evnt_Typ_Cd CHAR(3) NOT NULL,Proc_Dt DATE,Adt_Dt DATE,Rcv_Dt DATE,Inv_Dt DATE,Clt_Inv_Dt DATE,Ship_Dt DATE,Rjct_Rsn_Cd CHAR(2),..Dlvy_Tm INTEGER,Actl_Cmdt_Ds CHAR(30),Carr_Rfrc_Nbr VARCHAR(25))distribute on hash ( Frt_Bill_Id );
No indices and no tuning!
Reduced
-
2011 IBM Corporation19
Information Management
Simple:: Ramifications on TCO
Telecom Call Detail Record FACT (6 billion rows)
Oracle Object Count *
Netezza Object Count
Tables 1 1
Indexes 12
Table Partitions 47
Index Partitions 564
Table Partitions tablespaces 47
Index Partitions tablespaces 47
Table Data Files 170
Index Data Files 122
TOTAL 1,010 1
Look at all the weeks/months worth of effort, DBA design and maintenance that we don't have with Netezza. The appliance claims are true.
*: Oracle data does not account for ADDITIONAL effort required in configuring and engineering the file system design to accommodate this index management scheme.
-
2011 IBM Corporation20
Information Management
Time of deployment reducedApproach: Load n Go for a unique sale
Migrate data model (Netezza is model agnostic) Export legacy DDL to text file Keep current logical model intact and simple
Keep table names and column definitions Remove all physical database design (extents, blocks, pages, materialized views, locks,
indices) Users will keep consistent view and navigation of data
Migrate data (500 GB to 1 TB /hour) Export legacy data to delimited ASCII file or pipe Load legacy data into the NPS system
Provide secondary view of data Install Netezza ODBC/JDBC driver on users desktop
Speed and Simplicity = New Analytics
-
2011 IBM Corporation21
Information Management
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
-
2011 IBM Corporation22
Information Management
The Analytic Enterprise
BI Reporting and Ad-Hoc Analysis
What happened? When and where? How much?
Predictive Analytics
What will happen? What will the impact be?
Optimization
What is thebest choice?
-
2011 IBM Corporation23
Information Management
Advanced Analytics Challenges
Inefficient process
Limited analytic complexity
Inability to react to market conditions
Expensive
Time consuming
Processing on limited and stale data
Inability to experiment
-
2011 IBM Corporation24
Information Management
Big Data Meets Big Math
Analytics Without Constraints
-
2011 IBM Corporation25
Information Management
Advanced Analytics the Traditional Way
Fraud Detection
Fraud Detection
Demand Forecasting
Demand Forecasting
SAS
R, S+
AnalyticsGrid
DataWarehouse
C/C++, Java, Python, Fortran,
Data
SQL
SQL
ETL
SQL
ETL
ETL
-
2011 IBM Corporation26
Information Management
Advanced Analytics with TwinFin
Fraud Detection
Fraud Detection
Demand Forecasting
Demand Forecasting
SAS
R, S+
AnalyticsGrid
DataWarehouse
Data
SQL
SQL
ETL
SQL
ETL
ETL
C/C++, Java, Python, Fortran,
-
2011 IBM Corporation27
Information Management
Advanced Analytics with TwinFin
SAS
R, S+
SQL
SQL
Fraud Detection
Fraud Detection
Demand Forecasting
Demand Forecasting
-
2011 IBM Corporation28
Information Management
Advanced Analytics Framework
TwinFin(i) Appliance
Partner / Customer Analytics
Partner / Customer Analytics
Advanced Analytic Extensions Development & Modeling Tools
Advanced Analytics Platform
Data Warehouse Appliance Platform
Applications
Open Source Analytics(CRAN, GNU)
Open Source Analytics(CRAN, GNU)
Partner ADE or IDE
Partner ADE or IDE
R CLI/GUI R CLI/GUI EclipseEclipse
UserDefined Extensions
UserDefined Extensions
AnalyticExecutables
AnalyticExecutables
nzMatrixnzMatrix C/C++C/C++ JavaJava PythonPython RR FortranFortran MapReduce/Hadoop
MapReduce/Hadoop
Open Language APIOpen Language APIMassively Parallel Analytic EnginesMassively Parallel Analytic Engines Open Framework APIOpen Framework API
AMPP Architecture + Netezza Performance SoftwareAMPP Architecture + Netezza Performance Software
Customer Analytics Application
Customer Analytics Application
Partner Analytics Application
Partner Analytics Application
Partner Visualization
Partner Visualization
Partner Data Integration
Partner Data Integration
Partner Business Intelligence
Partner Business Intelligence
Parallelized Analytics
Parallelized Analytics
-
2011 IBM Corporation29
Information Management
Agenda
What is Netezza?
Value Proposition
iClass Advanced Analytics
Netezza Product Family
-
2011 IBM Corporation30
Information Management
Netezzas Market-Leading Evolution
Worlds First
Data Warehouse
Appliance
Worlds First
100 TB Data
Warehouse
Appliance
Worlds First
Petabyte Data
Warehouse
Appliance
Worlds First
Analytic Data
Warehouse
Appliance
NPS
8000 Series
TwinFin with i-Class
Advanced Analytics(300X Performance)
NPS
10000 Series (50X Performance)
TwinFin
(150X Performance)
2003 2006 2009 2010
I
m
p
a
c
t
2011
-
2011 IBM Corporation31
Information Management
The Evolving IBM Netezza Appliance Family
-
2011 IBM Corporation32
Information Management
Appliance Family
500 TB to 10+ PB
Analytics on Massive Data
Queryable Archiving
High Capacity
TBD
Performance is 10X todays Netezza
appliance
Ultra Performance
1 TB to 1.25 PB
Data WarehouseHigh Performance
Analytics
NetezzaNetezza
Development and Test System
1 TB to 10 TB
Just Arriv
ed!Com
ing Soon!
-
2011 IBM Corporation33
Information Management
High Capacity Appliance
Lowest price/TB in the market
Scalability to more than 10 Petabytes
Appliance simplicity
SQL and analytics on massive data
Fast data loading (5.5 TB/hour)
Low environmental footprint
Unparalleled Scalability and Value!
NEW!
-
2011 IBM Corporation34
Information Management
Netezza Models
TF3 TF6 TF12 TF24 TF48 TF120
Snippet Processors 24 48 96 192 384 960
Capacity(TB)
8 16 32 64 128 320
Effective Capacity (TB)*
32 64 128 256 512 1280
.........
1 10
... ...
Predictable, Linear Scalability throughout entire family
Capacity = User Data spaceEffective Capacity = User Data Space with compression
*: 4X compression assumed
-
2011 IBM Corporation35
Information Management
Replication: Vision and Direction
Distributed global data warehouse
Automated local high-availability & DR
Concurrent query execution
Workload balancing among nodes
Currently in Phase 1: Async Replication for DR and Concurrent Access
-
2011 IBM Corporation36
Information Management
Data Center
NZ Appliance Family
The Big Picture:: The Vision of a Seamless Platform
Cloud Deployment
Department/Region
DataCapture
Department/Region
DataCapture
Multisystem Data Integration / Federation
Data Center
NZ Appliance Family
Data Governance
Data Replication
Multiple appliances specialized to its task
Globally scalable grid
Appears as a cloud to users
Flexible and elastic
Fast data loading and analysis
-
2011 IBM Corporation37
Information Management
Bottom LineWhy Netezza?
Provides What Users Want Opens Access to Detail and Historical Data
Enables More/New Sophisticated ad hoc and complex queries
Faster Response Times
Best Performance
10-100 times faster on the largest, most complex queries
Lowest Price Significantly less expensive
Reduced TCO Reduce maintenance, DBA and system admin expense Installs in hours not days/weeks/months
Next Generation Analytic Applications Deployed Redirect Analytic resources to Applications from Infrastructure
-
2011 IBM Corporation
Information Management
Questions?
Information Management Technology Ecosystems
Spring 2011