s002 Smart Analytics Power Tech

© 2010 IBM Corporation
IBM Power Systems Technical University
October 18–22, 2010, Las Vegas, NV
Session Title: IBM Smart Analytics System
Speaker Name: Doug Mack
Session: s002




Fast, Flexible, Affordable

“Our prices are lower than others. Is this sustainable given our costs, or a future threat?”

“…What is our risk this morning?”

“…Are we using our stimulus funding effectively?”

“…Do we have product issues or fraudulent claims from service?”

“…How & when should we adjust plans to reduce churn & expand share?”

“…Which treatments are ineffective and should be eliminated to lower costs?”

Analytics are cross industry

4000+ GBS Consultants with deep industry knowledge

Industry Specific Warehouse Packs for quick starts


AIIM & Accenture Surveys, 2007

47% of users don’t have confidence in their information

42% of managers use wrong information at least once a week

59% of managers miss information they should have used

Unlocking the Business Value of Information


[Figure: record layouts from an operational database. The tables carry cryptic names (T00032P, T01045P, T01046P, R02126P, T03140P, T05001P) and the columns carry equally cryptic names with packed or character type codes, for example DFFTCA 3P 0, DFRTBB 5A, DSYT1LO 50A, KSYT3LE 6P 0, AGAC3EA 50A.]

Getting INFORMATION out of operational databases can be problematic

Transaction Schema

Databases designed for TRANSACTION processing

Data elements designed by developers – not meaningful to analysts

Table/File relationships not well understood – need an expert

Transactions characterized by

 – Many users

 – Short CPU time required per use

 – Small number of records accessed/updated per transaction

Analytical Queries characterized by

 – Smaller number of users

 – Intensive resource utilization (I/O)

 – Large number of database records accessed

 – Data loads and queries occurring at the same time

Data may not be consolidated

Dirty Data


The Foundation for Analytics includes:

A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. An expanded definition for data warehousing includes tools to extract, transform, and load (ETL) data into the repository, and tools to manage and retrieve metadata.

Business Intelligence is the name for the ability to act on information that may come from a data warehouse.

Data mining is a business process based on advanced technology, which finds unknown and complex relationships in data, producing insight into business issues and predictions to improve business decisions.


IBM Smart Analytics System Schematic

[Figure: schematic with these labeled components: Operational Source Systems (Structured/Unstructured Data), ETL, Data Warehouse, InfoSphere Warehouse, Cubing Services, Data/Text Mining, Cognos 8 BI]

ETL – Extract, Transform, and Load


Data Warehouse Schema

INVOICE_LINES
  INVOICE_NUMBER        7P 0
  INVOICE_LINE_NUMBER   3P 0
  PRODUCT_NUMBER        5P 0
  CUSTOMER_NUMBER       10A
  SELLING_COMPANY       5A
  SUPPLY_WAREHOUSE      5A
  QUANTITY_ORDERED      11P 0
  QUANTITY_SHIPPED      11P 0
  TOTAL_DISCOUNT        9P 2
  NET_PRICE             9P 2
  BASE_PRICE            9P 2
  UNIT_COST             9P 2
  EXTENDED_COST         11P 2
  EXTENDED_PRICE        11P 2
  MARGIN                11P 2
  SALES_REP             5A
  COMMISSION_VALUE      7P 2
  INVOICE_DATE          DATE
  SHIP_DATE             DATE
  DELIVERY_DATE         DATE
  INVOICE_TIME          TIME
  MONTH_NUMBER          2P 0
  WEEK_NUMBER           2P 0
  LOAD_DATE             DATE

CUSTOMERS
  CUSTOMER_NUMBER       10A
  CUSTOMER_NAME         35A
  ADDRESS_LINE_1        35A
  ADDRESS_LINE_2        35A
  CITY                  35A
  STATE_CODE            2A
  ZIP_CODE              10A
  CONTACT_NAME          35A
  TELEPHONE             15A
  SALES_REP_DEFAULT     5A
  CUSTOMER_CATEGORY     5A
  CUSTOMER_CLASS        5A
  REGION_CODE           5A
  LOAD_DATE             DATE
  LAST_CHANGE_TIME      TMSTP
  STATUS_FLAG           1A

PRODUCTS
  PRODUCT_NUMBER        5P 0
  PRODUCT_DESCRIPTION   42A
  BRAND_CODE            5A
  BRAND_DESCRIPTION     20A
  ORIGIN_CODE           5A
  ORIGIN_DESCRIPTION    20A
  FAMILY_CODE           5A
  FAMILY_DESCRIPTION    20A
  COST                  9P 2
  BASE_PRICE            9P 2
  PRODUCT_WEIGHT        9P 4
  PRODUCT_VOLUME        9P 4
  LOAD_DATE             DATE
  LAST_CHANGE_TIME      TSTP
  STATUS_FLAG           1A

Only includes the columns we care about

Dates are true date columns

Meaningful table and column names

De-normalized design reduced to only 3 tables

Complex calculations already done

Very easy to work with
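To illustrate why a three-table, de-normalized design is easy to query, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the slide (only a subset of columns is used); the sample rows are invented, and the SQLite types are an assumption standing in for the packed-decimal and character definitions shown above. It is not the presenter's example.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory copy of a subset of the slide's three tables (sample data is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CUSTOMERS (
            CUSTOMER_NUMBER TEXT PRIMARY KEY,
            CUSTOMER_NAME   TEXT,
            REGION_CODE     TEXT);
        CREATE TABLE PRODUCTS (
            PRODUCT_NUMBER      INTEGER PRIMARY KEY,
            PRODUCT_DESCRIPTION TEXT,
            BRAND_CODE          TEXT);
        CREATE TABLE INVOICE_LINES (
            INVOICE_NUMBER      INTEGER,
            INVOICE_LINE_NUMBER INTEGER,
            PRODUCT_NUMBER      INTEGER,
            CUSTOMER_NUMBER     TEXT,
            EXTENDED_PRICE      REAL,
            MARGIN              REAL,
            INVOICE_DATE        TEXT);
    """)
    conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
                     [("C1", "Acme", "EAST"), ("C2", "Globex", "WEST")])
    conn.executemany("INSERT INTO PRODUCTS VALUES (?, ?, ?)",
                     [(100, "Widget", "BR1"), (200, "Gadget", "BR2")])
    conn.executemany("INSERT INTO INVOICE_LINES VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, 100, "C1", 500.0, 120.0, "2010-01-15"),
        (1, 2, 200, "C1", 300.0, 60.0, "2010-01-15"),
        (2, 1, 100, "C2", 250.0, 50.0, "2010-02-03"),
    ])
    return conn


def margin_by_region(conn):
    """One join and one GROUP BY answer the question; MARGIN is already computed at load time."""
    rows = conn.execute("""
        SELECT c.REGION_CODE, SUM(i.MARGIN)
        FROM INVOICE_LINES i
        JOIN CUSTOMERS c ON c.CUSTOMER_NUMBER = i.CUSTOMER_NUMBER
        GROUP BY c.REGION_CODE
        ORDER BY c.REGION_CODE
    """).fetchall()
    return dict(rows)
```

Because the margin calculation and the date conversion were done when the warehouse was loaded, the analyst's query needs only readable column names and a single join.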


Traditional approach to Data Warehousing and Business Intelligence

Evaluate Business Intelligence Tools from several vendors

Evaluate Database Management Systems from several vendors

Evaluate Extract/Transform/Load (ETL) tools from several vendors

Select Vendors (come to contract terms)

Size System

Develop Systems Integration Plan

Develop Operational Skills/Procedures

Build Multi-vendor Support Structure

Install, Configure, Tune

Tune again and again and again


Compared to 7600

4x more cores per module

50% less space and energy requirements

2x storage capacity per data module

NEW: IBM Smart Analytics System 7700 

Complete End to End Analytical Solution shipped in a matter of weeks versus months, reducing risk and improving time to value. Completely integrated solution based on powerful warehouse infrastructure with a single point of support

Start small and grow big with proven, flexible modular design that preserves your investment and maintains optimized design as you grow

Exploit the latest POWER7 architecture

 – Optimized for Analytics with Massively Parallel Processing design

 – New IBM System Storage with Solid State Drive (SSD) options

 – InfoSphere Warehouse 9.7 adds Oracle compatibility

2x Performance at ½ the cost¹

¹ Performance per data module, and cost comparing equivalently sized systems in Raw Data Terabytes


Cognos Module

2x performance through optimization at build time

26-40% performance improvement over POWER6

High availability for BI built into system

POWER7 Cognos Module and Control Console

Optional Cognos module is refreshed with Power 740 8 core servers, improving performance and supporting a higher number of concurrent users per module. Design of the Cognos module includes an active-active, workload balancing architecture that is preconfigured and optimized by IBM before shipping.

The 7700 control console provides another layer above the hardware and software that allows administration over the system as a whole, rather than each component individually. This console allows the user to manage and maintain software for coordinated stack updates (operating system, driver, firmware, and other components).

20-40% Performance Improvements¹

¹ Based on complexity of queries comparing on a core to core basis


Faster Results, Less Risk

[Figure: timeline (axis labeled Jan. and June) comparing a build “from scratch” with a pre-built system across four phases: Pre-implementation System Sizing, Acquire Components, Installation & Configuration, Testing & Validation]

Months versus weeks


IBM Smart Analytics System 7700: What’s in the box?

Data Warehouse Software
 – InfoSphere Warehouse
 – InfoSphere Warehouse Advanced Workload Management
 – Tivoli System Automation

Analytics Software Options
 – Business Intelligence Module (Cognos 8 BI)
 – InfoSphere Warehouse Cubing Services
 – InfoSphere Warehouse Text Analytics & Data Mining

Hardware/OS
 – IBM Power 740 Servers (16 core modules)
 – IBM System Storage designed for the IBM Smart Analytics System, including SSD acceleration
 – AIX 6.1


IBM Smart Analytics System 7700: Transparent Modular Architecture

Modular design

Foundation Structure
 – Foundation Module: 1 Module
 – Data Module: 1 to x Nodes
 – User Module: 0 to y Nodes
 – Failover Module: 0 or (x+y+1)/5 Nodes

Add-On Modules
 – Analytics Module
 – BI Module
 – 3rd Party Modules
 – Warehouse Packs
 – ...

+ SSD


Scaling out to support more data or more users: Shared Nothing, Massively Parallel Design

[Figure: a Foundation Module, User Modules 2 and 3, and Data Modules 1 to 5, each drawn with its own CPUs and memory. Legend: Foundation Module, User Module, Data Module. Callout: Expand by adding additional user or data modules.]

Balanced system design

 – System modules with optimal processor, memory, and I/O specifications

Scale-out by adding additional system modules

 – Which always include balanced I/O

Proven “best practice” for large scale data warehousing
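The shared-nothing idea on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DB2's actual distribution algorithm: the CRC32 hash, the function names, and the row layout are all hypothetical. Each data module owns a disjoint slice of the rows, aggregates only its own slice, and a coordinator merges the partial results.

```python
import zlib
from collections import defaultdict


def partition_for(key, n_modules):
    """Map a distribution key to a module with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_modules


def distribute(rows, key_field, n_modules):
    """Assign every row to exactly one data module; no module shares rows with another."""
    modules = [[] for _ in range(n_modules)]
    for row in rows:
        modules[partition_for(row[key_field], n_modules)].append(row)
    return modules


def parallel_sum(modules, group_field, value_field):
    """Aggregate locally on each module, then merge the partial sums on a coordinator."""
    partials = []
    for local_rows in modules:          # in a real system these loops run concurrently
        local = defaultdict(float)
        for row in local_rows:
            local[row[group_field]] += row[value_field]
        partials.append(local)

    total = defaultdict(float)
    for partial in partials:            # coordinator step: combine small partial results
        for group, value in partial.items():
            total[group] += value
    return dict(total)
```

Adding a data module in this sketch means calling `distribute` with a larger `n_modules`: each module then holds fewer rows and does less work, which is the scale-out behavior the slide describes.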


InfoSphere Warehouse is powered by DB2

DB2 offers many unique, industry leading capabilities that are advantageous to data warehousing environments

 – “Shared Nothing” Architecture

 – Advanced Cost Based Parallel Query Optimizer

 – Flexible Partitioning

 – Multi-dimensional Clustering (MDC)

 – Materialized Query Tables (MQT)

 – Industry leading compression

 – Workload management


InfoSphere Warehouse

Cubing Services is a multidimensional analysis server that enables OLAP applications to access large data volumes stored inside a DB2 database

Benefits

Empowers users with ad hoc access to business information.

 – What is the profitability for Product A across the Branches X,Y,Z?

Speed of thought access to OLAP data managed by DB2

OLAP and SQL shared access to the same information

Single point of management, maintenance, and performance tuning

Accessible via Cognos 8 BI, Microsoft Excel, Alphablox, IBM DataQuant, and Cubeware Cockpit

Online Analytical Processing (OLAP): Cubing Services
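The kind of question quoted on this slide (“What is the profitability for Product A across the Branches X,Y,Z?”) is what an OLAP cube answers by pre-aggregating a measure along every combination of dimensions. The following is a toy in-memory sketch of that idea only; it says nothing about how Cubing Services is implemented, and the fact rows and the `build_cube` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(facts, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions` (a tiny in-memory cube)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cells = defaultdict(float)
            for fact in facts:
                cells[tuple(fact[d] for d in dims)] += fact[measure]
            cube[dims] = dict(cells)
    return cube


# Invented sample facts: profit by product and branch.
facts = [
    {"product": "A", "branch": "X", "profit": 10.0},
    {"product": "A", "branch": "Y", "profit": 5.0},
    {"product": "A", "branch": "Z", "profit": -2.0},
    {"product": "B", "branch": "X", "profit": 7.0},
]
cube = build_cube(facts, ("product", "branch"), "profit")

# Drill-down: Product A in each branch, then rolled up across all branches.
profit_a_by_branch = {b: cube[("product", "branch")][("A", b)] for b in ("X", "Y", "Z")}
profit_a_total = cube[("product",)][("A",)]
```

Once the cube is built, both the per-branch view and the rolled-up total are dictionary lookups, which is the intuition behind “speed of thought” access.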


Data Mining & Visualization

InfoSphere Warehouse provides real time data mining embedded in your warehouse, accessible from any application supporting SQL.

Benefits

 – Discover information on customer and product associations and behaviors captured by your application

Examples

• Market basket analysis

• Customer segmentation

• Click stream analysis

• Disease and treatment analysis

• Fraud detection


Modeling & Design: InfoSphere Warehouse Design Studio


Embedded Data Movement & Transformation

The SQL Warehousing Tool (SQW)

Easy to Use & Package

Graphically build complex transformations within DB2

Advanced Workflow Control and Scheduling

Build your Warehouse Source to Target as one Application

Integration

Automate Text Analytics & Data Mining workloads

Ability to Natively source data from non-DB2 RDBMS

Scale up and integrate with Information Server/DataStage

Compliance

Ability to add version management

Job Monitoring

InfoSphere Warehouse provides extract, load and transform (ELT) capabilities based on the DB2 server engine. SQL Warehousing Tool applications are developed in a fully integrated graphical interface within the InfoSphere Warehouse Design Studio.
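The difference between ELT and classic ETL is where the transformation runs: the source rows are loaded unchanged, and the database engine then transforms them with SQL. Below is a minimal sketch of that pattern, using SQLite as a stand-in for the DB2 engine. The staging table, its terse column names, and the `run_elt` function are hypothetical; SQW itself generates its flows graphically, which this does not attempt to reproduce.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows into a staging table, then transform them inside the database with SQL."""
    conn = sqlite3.connect(":memory:")

    # Extract + Load: land the operational rows as-is (dates arrive as YYYYMMDD text).
    conn.execute(
        "CREATE TABLE STAGE_INVOICES (INVNO INTEGER, QTY INTEGER, PRICE REAL, COST REAL, INVDT TEXT)")
    conn.executemany("INSERT INTO STAGE_INVOICES VALUES (?, ?, ?, ?, ?)", raw_rows)

    # Transform: the engine renames columns, precomputes measures, and builds ISO dates.
    conn.execute("""
        CREATE TABLE INVOICE_LINES AS
        SELECT INVNO                AS INVOICE_NUMBER,
               QTY * PRICE          AS EXTENDED_PRICE,
               QTY * COST           AS EXTENDED_COST,
               QTY * (PRICE - COST) AS MARGIN,
               substr(INVDT, 1, 4) || '-' || substr(INVDT, 5, 2) || '-' || substr(INVDT, 7, 2)
                                    AS INVOICE_DATE
        FROM STAGE_INVOICES
    """)
    return conn.execute("""
        SELECT INVOICE_NUMBER, EXTENDED_PRICE, EXTENDED_COST, MARGIN, INVOICE_DATE
        FROM INVOICE_LINES
        ORDER BY INVOICE_NUMBER
    """).fetchall()
```

Keeping the transformation inside the engine means the data never leaves the database between load and publish, which is the design point of building on the DB2 server engine.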


Oracle/Sun Database Machine (Exadata)

Exadata positioned as “it can do it all” – transaction processing and data warehousing – contrary to IBM’s “workload optimized systems”

IBM Internal Use Only

Designed with Oracle RAC shared disk clusters

Provides storage servers to Oracle database

 – Linux on x86 running Oracle Enterprise Linux

For every Oracle database server there are two additional “Exadata” servers to perform the I/O, each with 12 disk drives and new Exadata software

 – You have to license each disk drive in the Exadata servers @ $10,000 each for a total of $1,680,000 for a Full Rack plus 22% annual maintenance

Oracle Database Machine    ¼ Rack   ½ Rack   Full Rack
Database Servers           2        4        8
Exadata Storage Servers    3        7        14

[Figure: Oracle RAC Scalability. X-axis: Data Base Nodes in Cluster (1 to 12); y-axis: Effective Nodes (1 to 12). Series: Perfect Linear and Performance (Effective Nodes), with data labels 1.69 and 2.44. Regions: Productive Resources, Wasted Resources.]

Oracle RAC characteristics as shown in Dell RAC InfiniBand Study: http://www.dell.com/downloads/global/power/ps2q07-20070279-Mahmood.pdf
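The storage licensing total quoted on this slide can be reproduced with simple arithmetic. The sketch below uses only the slide's own figures ($10,000 per disk drive, 12 drives per Exadata storage server, 22% annual maintenance) and assumes the Full Rack has 14 storage servers, which is how the flattened rack table on this slide reads. The function name is hypothetical.

```python
def exadata_storage_license(storage_servers, disks_per_server=12,
                            price_per_disk=10_000, maintenance_percent=22):
    """Return (disk count, license cost in USD, annual maintenance in USD)."""
    disks = storage_servers * disks_per_server
    license_cost = disks * price_per_disk
    # Integer arithmetic keeps the dollar amounts exact.
    annual_maintenance = license_cost * maintenance_percent // 100
    return disks, license_cost, annual_maintenance


full_rack = exadata_storage_license(14)
```

For 14 storage servers this gives 168 drives and a license cost of $1,680,000, matching the slide, plus $369,600 per year in maintenance at the 22% rate.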


Smart Analytics Systems Compared to Exadata

List       Smart Analytics 7700     Oracle Database Machine   List
$3.67M³    1 data module (12TB)     ½ Rack (14TB¹)            $5.15M²
$5.07M³    2 data modules (24TB)    Full Rack (28TB)          $10.13M

¹ Data Volumes represent uncompressed user space

² Oracle prices include RAC, Partitioning Option, Advanced Compression, Tuning and Diagnostic packs and include 3 years maintenance and support

³ IBM prices based on preliminary 7700 list pricing and include 3 years SW and HW maintenance and support.

⁴ http://imcomp.torolab.ibm.com/wiki/images/f/f2/FullSolitaire2008report.pdf

Oracle prices do NOT include

 – Installation and migration services

 – Additional software, servers and storage required to add analytical applications that are NOT included with Exadata

• No OLAP, Data or Text Mining

DB2 Compression could increase usable data space 3-5x

 – DB2 Compression Option is included in the price of IBM Smart Analytics

DB2 requires 60% of the staffing of Oracle

 – Study of 400 Power Systems clients⁴

50% LESS
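The “50% LESS” callout can be checked against the list prices on this slide. The sketch below assumes the pairing that the flattened table suggests ($3.67M for 1 data module against $5.15M for the ½ Rack, and $5.07M for 2 data modules against $10.13M for the Full Rack); the function name is hypothetical.

```python
def percent_less(ibm_millions, oracle_millions):
    """How much lower the IBM list price is, as a percentage of the Oracle list price."""
    return round((1 - ibm_millions / oracle_millions) * 100, 1)


large_config = percent_less(5.07, 10.13)   # 2 data modules vs. Full Rack
small_config = percent_less(3.67, 5.15)    # 1 data module vs. ½ Rack
```

Under that pairing, the larger configuration comes out at 50.0% less, in line with the callout, while the smaller configuration comes out at 28.7% less. The configurations are not identical in raw capacity (24TB against 28TB, and 12TB against 14TB), so the percentages compare list prices only.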
