Petascale Analytics - The World of Big Data Requires Big Analytics

© 2011 IBM Corporation

October 2011

H. J. Schick

IBM Germany Research & Development GmbH

Source: The Evolution of Live in 60 Seconds


Quiz: What comes after zettabyte?

1 yottabyte = 1,000,000,000,000,000,000,000,000 bytes

Experiment:


Google’s Server Design

Source: cnet News, Google Uncloaks Once-Secret Server, April 2009


The Digital Universe is a Perpetual Tsunami

1. How will we find the information we need when we need it?

2. How will we know what information we need to keep, and how will we keep it?

3. How will we follow the growing number of government and industry rules about retaining records, tracking transactions, and ensuring information privacy?

4. How will we protect the information we need to protect?

» Solution:

• New search and discovery tools
• Ways to add structure to unstructured data
• New storage and information management techniques
• More compliance tools
• Better security

The Evolution of Science

Figure: timeline of the evolution of science over time.

Eras shown: Era of Natural Philosophy, Era of Modern Science, Industrial Revolution, Learning Systems (XXI Century)

Milestones shown:
• Astronomy (Babylon, 1900 BC)
• Platonic Academy (387 BC)
• Mathematics (India, 499 BC)
• Scientific Revolution (1543 AD)
• Newton’s Laws (1687 AD)
• Relativity (1905 AD)
• Quantum Physics (1925 AD)
• Computing (1946 AD)
• DNA (1953 AD)


Today’s Systems – The Calculating Paradigm

• Algorithms and Applications: static programming
• Archives: structured data and text
• People hypothesize, determine “what it means”, run other applications…


Future Systems – The Learning Paradigm

• Training and Learning Engines: to build models and define insight
• Hypothesis Engines: to understand and plan actions
• Policy Engine: business, legal and ethical rules
• Verification Engines (e.g. simulations)
• Active Learning (natural interfaces)
• Outcome Engine: actuation and validation

Society, Nature, Institutions, Archives


New “Big Data” Brings New Opportunities, Requires New Analytics

Figure: data scale versus decision frequency.
• Data Scale axis: Kilo, Mega, Giga, Tera, Peta, Exa
• Decision Frequency axis: Occasional, Frequent, Real-time (yr, mo, wk, day, hr, min, sec, … ms, µs)
• Labels: Traditional Data Warehouse and Business Intelligence; Data at Rest; Data in Motion; Up to 10,000 times larger; Up to 10,000 times faster

Example workloads:
• Telco Promotions: 100,000 records/sec, 6B/day; 10 ms/decision; 270TB for Deep Analytics
• DeepQA: 100s GB for Deep Analytics; 3 sec/decision
• Smart Traffic: 250K GPS probes/sec, 630K segments/sec; 2 ms/decision; 4K vehicles
• Homeland Security: 600,000 records/sec, 50B/day; 1-2 ms/decision; 320TB for Deep Analytics
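The record rates and per-decision latencies quoted for these workloads imply how many decisions must be in flight at the same time. A rough sketch using Little's law (average number in flight = arrival rate × time per item) follows. It assumes that every incoming record triggers exactly one decision, which the slide does not state, and the function and variable names are made up for illustration.

```python
def in_flight_decisions(records_per_sec: int, ms_per_decision: int) -> float:
    """Little's law: average items in flight = arrival rate * time per item."""
    return records_per_sec * ms_per_decision / 1000


# (records per second, milliseconds per decision), taken from the slide.
# Assumption: one decision per record.
workloads = {
    "Telco Promotions": (100_000, 10),
    "Smart Traffic": (250_000, 2),
    "Homeland Security": (600_000, 2),  # upper end of the quoted 1-2 ms
}

concurrency = {
    name: in_flight_decisions(rate, latency_ms)
    for name, (rate, latency_ms) in workloads.items()
}

for name, n in concurrency.items():
    print(f"{name}: about {n:.0f} decisions in flight")
```

Under that assumption, each workload needs on the order of hundreds to a thousand concurrent decisions, which is one way to read the slide's point that such systems need massive parallelism.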


Enabling Multiple Solutions & Appliances to Achieve a Smarter Planet

• Peta2 Analytics Appliance
• Reactive + Deep Analytics Platform
• Peta2 Data-centric System
• Big Analytics Ecosystem: Algorithms, Big Data, Skills

Solutions:
• DeepEyes: Webcam Fusion
• DeepCurrent: Power Delivery
• DeepSafety: Police/Security
• DeepTraffic: Area Traffic Prediction
• DeepFriends: Social Network Monitor
• DeepWater: Water Management
• DeepBasket: Food Market Prediction
• DeepBreath: Air Quality Control
• DeepPulse: Political Polling
• DeepResponse: Emergency Coordination
• DeepThunder: Local Weather Prediction
• DeepSoil: Farm Prediction


Watson Today: Processes Unstructured Text & 200 Hypotheses/3 seconds

Answer: A large country in the Western Hemisphere whose capital has a similar name.

1. Hypotheses generated from the “Answer”: guess questions Q1, Q2 … Qi
2. Statistical ensemble of 600 to 800 scoring engines (S1, S2, S3, … SN), working against a static data corpus
3. ~30 machine learning models weigh the scores and produce a confidence for each question, 0<P<1
4. Evidence-based decision support system: the hypothetical question with the greatest confidence is chosen

Question: What is Brazil?

Element                    Refresh Time
Data Corpus                2 Weeks
Hypothesis Engines         Weeks to Months
Scoring Engines            Weeks to Months
Decision Support Engine    4 Days

Watson: 3,000 cores; 100 TFlops; 2 TB memory; ~200 KW
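The selection step described on this slide (scoring engines produce scores, models weigh them into a confidence 0<P<1, and the candidate with the greatest confidence wins) can be illustrated with a toy sketch. Everything concrete here is an assumption for illustration: the three scores per candidate, the weights, and the use of a single logistic function as the combining model. The slide does not say how Watson's models actually combine scores.

```python
import math


def confidence(scores, weights, bias=0.0):
    """Combine engine scores into a confidence strictly between 0 and 1.

    Assumption: a single logistic model stands in for the ~30 models
    mentioned on the slide.
    """
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def choose(hypotheses, weights):
    """Return (candidate question, confidence) with the greatest confidence."""
    scored = {q: confidence(s, weights) for q, s in hypotheses.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical candidate questions with made-up scores from three engines.
hypotheses = {
    "What is Brazil?": [0.9, 0.8, 0.7],
    "What is Canada?": [0.6, 0.1, 0.2],
    "What is Mexico?": [0.5, 0.9, 0.1],
}
weights = [2.0, 1.0, 1.5]  # hypothetical learned weights

best, p = choose(hypotheses, weights)
print(best, round(p, 3))
```

A real ensemble of 600 to 800 scoring engines would have one weight per engine (or per feature), learned from training data rather than set by hand.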


Exascale Research and Development

Source: Exascale Research and Development – Request for Information, July 2011


Big Data Systems Require a Data-centric Architecture for Performance

Old Compute-centric Model
• Data lives on disk and tape
• Move data to CPU as needed
• Deep Storage Hierarchy

New Data-centric Model
• Data lives in persistent memory
• Many CPUs surround and use it
• Shallow/Flat Storage Hierarchy

Massive Parallelism: Manycore, FPGA
Persistent Memory: Flash, Phase Change

Figure labels: input, output

• Largest change in system architecture since the System 360
• Huge impact on hardware, systems software, and application design


Scale-in is the New Systems Battlefield

Figure: System Capacity (capability) versus System Density (1/Latency end-to-end).
• Scale labels: 10, 100, 1K, 10K, 100K; Low, Med, High, Extreme; Single Device, Device Clusters
• Other labels: Exascale, Peta2, Physical Limits, Cloud Computing

Scaling strategies:
• Scale-down: maximize feature density (Atom Transistor, Atom Storage)
• Scale-up: maximize device capacity (Terabyte HDD, POWER 7)
• Scale-out: maximize system capacity (NAS, Blade Server)
• Scale-in: maximize system density, minimize end-to-end latency (FLASH SSD, 3D Chips, FPGA, Manycore, BPRAM/SCM, Interconnect, In-mem DB, DAS)


Storage Class Memory - The Tipping Point for Data-centric Systems

• HDD cost advantage continues, 1/10 SCM cost, but
• SCM dominates in performance, 10,000x faster than HDD

          Relative Cost    Relative Latency
DRAM      100              1
SCM       1                10
FLASH     15               1000
HDD       0.1              100000

Source: Chung Lam, IBM

SCM in 2015: $0.05 per GB, $50K per PB

Figure labels: FLASH, (Phase Change), $0.10 / GB, $0.01 / GB
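The two headline claims on this slide follow directly from the relative cost and latency figures, and the per-PB price follows from the per-GB price. A small arithmetic check, assuming decimal units (1 PB = 1,000,000 GB); the variable names are illustrative only:

```python
# (relative cost, relative latency) as given on the slide
relative = {
    "DRAM": (100, 1),
    "SCM": (1, 10),
    "FLASH": (15, 1000),
    "HDD": (0.1, 100000),
}

# How much slower is HDD than SCM, and how much cheaper?
hdd_vs_scm_latency = relative["HDD"][1] / relative["SCM"][1]
hdd_vs_scm_cost = relative["HDD"][0] / relative["SCM"][0]

# Projected SCM price: $0.05 per GB, kept in integer cents to avoid rounding.
GB_PER_PB = 1_000_000  # assumption: decimal petabyte
scm_cents_per_gb = 5
scm_dollars_per_pb = scm_cents_per_gb * GB_PER_PB // 100

print(hdd_vs_scm_latency, hdd_vs_scm_cost, scm_dollars_per_pb)
```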


Background: 3 Styles of Massively Parallel Systems

Figure legend: = compute node

• Simulation (BlueGene): Generative Modeling, Extreme Physics
  Long Running, Small Input, Massive Output
  C/C++, Fortran, MPI, OpenMP

• Streaming (Streams): Reactive Analytics, Extreme Ingestion
  Data in Motion: High Velocity, Mixed Variety, High Volume* (*over time)
  SPL, C, Java

• Hadoop/MapReduce (BigInsights): Deep Analytics, Extreme Scale-out
  Data at Rest*: High Volume, Mixed Variety, Low Velocity (*pre-partitioned)
  JAQL, Java
  Input Data (on disk), Mappers, Reducers, Output Data


Fault-tolerant Hadoop Distributed File System (HDFS)

Source: Hadoop Overview, http://www.cloudera.com


Map Reduce Logical Data Flow

Source: O’Reilly, Hadoop – The Definitive Guide

1. Input records:

0067011990999991950051507004...9999999N9+00001+99999999999...
0043011990999991950051512004...9999999N9+00221+99999999999...
0043011990999991950051518004...9999999N9-00111+99999999999...
0043012650999991949032412004...0500001N9+01111+99999999999...
0043012650999991949032418004...0500001N9+00781+99999999999...

2. Key-value pairs presented to the map function (line offset, line):

(0, 0067011990999991950051507004...9999999N9+00001+99999999999...)
(106, 0043011990999991950051512004...9999999N9+00221+99999999999...)
(212, 0043011990999991950051518004...9999999N9-00111+99999999999...)
(318, 0043012650999991949032412004...0500001N9+01111+99999999999...)
(424, 0043012650999991949032418004...0500001N9+00781+99999999999...)

3. Map output (year, temperature):

(1950, 0)
(1950, 22)
(1950, −11)
(1949, 111)
(1949, 78)

4. Map output after sort and shuffle (year, [temperatures]):

(1949, [111, 78])
(1950, [0, 22, −11])

5. Reduce output (year, maximum temperature):

(1949, 111)
(1950, 22)
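The last stages of this data flow (grouping the map output by key, then taking the maximum per key) can be sketched in plain Python. This is a single-process illustration of the logic only, not Hadoop code: the function names are made up, and the parsing done by the map function is skipped because the records on the slide are truncated. The input is the list of (year, temperature) pairs shown on the slide.

```python
from collections import defaultdict

# Map output as shown on the slide: (year, temperature) pairs.
map_output = [(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)]


def shuffle(pairs):
    """Group values by key and sort by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_max(year, temperatures):
    """Reduce function: keep the maximum temperature seen for a year."""
    return year, max(temperatures)


grouped = shuffle(map_output)
result = [reduce_max(year, temps) for year, temps in grouped]

print(grouped)  # [(1949, [111, 78]), (1950, [0, 22, -11])]
print(result)   # [(1949, 111), (1950, 22)]
```

In a real cluster the grouping is distributed: each reducer receives all values for the keys assigned to it, so the reduce function can run independently per key.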


The Blue Gene/Q ASIC

Source: EDN News, Hot Chips: The Puzzle of Many Cores


The Blue Gene/Q Packaging Hierarchy

Source: The Register, IBM’s Blue Gene/Q Super Chip Grows 18th Core


Opportunity: Blue Gene Active Storage

BGQ Flash Card
Flash Capacity    320 GB
I/O Bandwidth     1.5 GB/s
IOPS              207,000+

Blue Gene/Q Active Storage Rack (512 BGQ Flash Cards)
Nodes             512
Storage Cap       640 TB
I/O Bandwidth     768 GB/s
Random IOPS       100 Million
Compute           104 TF
Bi-Section BW     512 GB/s

… scale it like BG/Q.

BG/Q + Flash Memory => Blue Gene Active Storage (BGAS)

BGAS Capabilities Per Rack

• 104 TeraFLOPS – 512 nodes, 8192 cores -- 50% of Standard BG/Q System

• 512 GB/s Bi-Section Bandwidth - All-to-All Throughput of 2GB/s per Node

• 768 GB/s I/O bandwidth – 100TB Sort in ~330 sec (vs 10,000 sec today)

• 100 Million IOPS – Equivalent to order 1 Million Disks
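The per-rack I/O figures are consistent with multiplying the per-card figures by 512. A small check follows, assuming one flash card per node and decimal units (100 TB = 100,000 GB); the names are illustrative. The last value is only the time for a single sequential pass over 100 TB at full rack bandwidth, a lower bound rather than a sort time.

```python
NODES = 512              # one flash card per node (assumption)
CARD_BW_GB_PER_S = 1.5   # per-card I/O bandwidth from the slide
CARD_IOPS = 207_000      # per-card IOPS from the slide ("207,000+")

rack_bw_gb_per_s = NODES * CARD_BW_GB_PER_S  # 768.0 GB/s
rack_iops = NODES * CARD_IOPS                # 105,984,000, i.e. roughly 100 million

# Time to stream 100 TB once at full rack bandwidth.
DATASET_GB = 100_000
single_pass_seconds = DATASET_GB / rack_bw_gb_per_s

print(rack_bw_gb_per_s, rack_iops, round(single_pass_seconds, 1))
```

The slide's ~330 sec for a 100TB sort is a few times this single-pass bound, which would fit a sort that has to read, exchange and write the data, although the source does not give that breakdown.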

Research and Development Challenges:

• Packaging: integrate Flash today, tomorrow PCM, Memristor, Racetrack, etc.

• System Software: Persistent Memory Management, k-v Store on BGAS

• Resilience: Single Path to Storage, BG/Q Network for General Workloads

• Integration: System Management, Middleware and Frameworks, Applications


NAND Flash Challenges

1. Need to erase before writing

2. Data retention errors

3. Limited number of writes

4. Management of initial and runtime bad blocks

5. Data errors caused by read and write disturb

» Factors that influence reliability, performance, write endurance:

• Use of Single Level Cell (SLC) and Multi Level Cell (MLC) NAND technology
• Wear out mechanism that limits service life: Wear-leveling algorithm
• Ensuring data integrity through bad block management techniques
• Use of error detection and correction algorithms
• Write amplification
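Three of these factors (wear leveling, bad block management, and write amplification) can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not how any real flash controller works: every write erases a whole block first, a block is retired as bad after a fixed number of erases, and the class and function names are invented for illustration.

```python
class WearLevelingAllocator:
    """Toy allocator: always erase and write the least-worn good block."""

    def __init__(self, num_blocks: int, max_erases: int):
        self.erase_counts = [0] * num_blocks
        self.bad_blocks = set()
        self.max_erases = max_erases  # assumed endurance limit per block

    def write_block(self) -> int:
        good = [b for b in range(len(self.erase_counts))
                if b not in self.bad_blocks]
        if not good:
            raise RuntimeError("no usable blocks left")
        # Wear leveling: pick the good block with the fewest erases so far.
        block = min(good, key=lambda b: self.erase_counts[b])
        self.erase_counts[block] += 1  # NAND must be erased before writing
        if self.erase_counts[block] >= self.max_erases:
            self.bad_blocks.add(block)  # runtime bad block: retire it
        return block


def write_amplification(flash_bytes_written: int, host_bytes_written: int) -> float:
    """Write amplification = bytes written to flash / bytes written by the host."""
    return flash_bytes_written / host_bytes_written


allocator = WearLevelingAllocator(num_blocks=4, max_erases=3)
order = [allocator.write_block() for _ in range(8)]
print(order)                    # writes rotate evenly across the four blocks
print(allocator.erase_counts)   # every block has been erased twice
print(write_amplification(3000, 1000))
```

Spreading erases evenly delays the point at which the first block wears out, and a write amplification above 1 means the device consumes its limited erase budget faster than the host's write volume alone would suggest.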


Gartner’s Hype Cycle


Thank you very much for your attention.
