Intridea ajn-rttos OA NYC Summit

Anthony Nyström Fellow, Managing Director of Engineering Tuesday, June 18, 13


Page 10: Intridea ajn-rttos OA NYC Summit

What is Intridea?

• We design and develop apps: Web, Mobile and Data

• Founded in Washington, DC

• We work with cool clients – really!

• 40+ Intrideans: Designers/Developers/Scientists + Smart biz folks

• We work from anywhere!

• We are growing

• We hire the best and the smartest


Page 12: Intridea ajn-rttos OA NYC Summit

The guy on stage

Intridean: Anthony Nyström, Fellow, Managing Director of Engineering


Page 14: Intridea ajn-rttos OA NYC Summit

Data Science in the NOW!

It takes an army of TOOLS


Page 17: Intridea ajn-rttos OA NYC Summit

An Army of Tools you say?

• I am going to talk about what NOW means in Data Science

• Databases, Streaming Engines, Query Engines and Interfaces

• We are going to look at many of them and single out a few

• Each has its own, and in some cases competing, set of features


Page 22: Intridea ajn-rttos OA NYC Summit

Why is NOW in data Special?

Actionable Intelligence & Knowledge

NOW has innate context

TIME is THE natural facet for our minds & life!


Page 31: Intridea ajn-rttos OA NYC Summit

Why is NOW in data Special?

Trends | Patterns | Extraction

• Data Centric Trends: Not user input data like Google, Yahoo etc.

• Pattern Extraction (ML/NLP): “I am looking for data that conforms to a learned or known pattern”

• Signature Extraction (Binary, Encoded): “I am looking for data that matches a predefined signature”
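The two quoted queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature table, the choice of an IPv4-shaped regular expression as the "known pattern", and both function names are hypothetical and do not come from the talk.

```python
import re

# Hypothetical signature table: leading "magic" bytes that identify a payload type.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\x1f\x8b": "gzip",
}

# Hypothetical "known pattern": text shaped like an IPv4 address.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def match_signature(payload: bytes):
    """Return the label of the first signature the payload starts with, else None."""
    for magic, label in SIGNATURES.items():
        if payload.startswith(magic):
            return label
    return None


def extract_pattern(text: str):
    """Return every substring of text that conforms to the known pattern."""
    return IPV4_PATTERN.findall(text)
```

Signature matching is an exact comparison against predefined bytes, while pattern extraction accepts anything that fits a shape; a learned pattern would replace the regular expression with a trained model.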


Page 38: Intridea ajn-rttos OA NYC Summit

Why is NOW in data Special?

Routing | Transformation | Computation

• Intelligent Routing: “I need to replicate/fork the portions of this data stream that meet criteria x”

• Transformation & Computation: “I need to transform certain fields” or “I need to compute some value on certain fields”
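As a rough sketch of routing, transformation and computation on a stream, assuming records are plain dictionaries: the field names (`symbol`, `price`, `qty`, `notional`) and the two functions are hypothetical, chosen only to make the example concrete.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def route(stream: Iterable[Record],
          routes: Dict[str, Callable[[Record], bool]]) -> Dict[str, List[Record]]:
    """Fork each record to every named output whose criterion it meets."""
    outputs: Dict[str, List[Record]] = {name: [] for name in routes}
    for record in stream:
        for name, criterion in routes.items():
            if criterion(record):
                outputs[name].append(record)
    return outputs


def transform(record: Record) -> Record:
    """Transform one field and compute a new value from two others."""
    out = dict(record)                                 # leave the input untouched
    out["symbol"] = str(record["symbol"]).upper()      # field transformation
    out["notional"] = record["price"] * record["qty"]  # computed field
    return out
```

A record can land in several outputs at once, which is the "replicate/fork" case; a real streaming engine would push to queues rather than collect lists.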


Page 47: Intridea ajn-rttos OA NYC Summit

Why is NOW in data Special?

Algorithmic Speciality

• Concepts: What does a value represent or imply? (NLP/ML/k-NN)

• Regression: How is a value related to another value, and how can we predict such relations?

• Relationships: Topological, Ontological, Forest (Evolutionary/Random) (NLP)


Page 51: Intridea ajn-rttos OA NYC Summit

Point of Sale System

• Terminal

• Admin

• Tablet


Page 53: Intridea ajn-rttos OA NYC Summit

Merck

• RT Persona

• RT Data

• Browser


Page 56: Intridea ajn-rttos OA NYC Summit

Where is NOW in data?

Data Creation Time | Data Consumption Time


Page 63: Intridea ajn-rttos OA NYC Summit

Latency

Data Creation Time | Data Consumption Time

• Standard - NOPE!

• Depends upon the Medium - YEP!

• Depends upon the Consumer - YEP!

• Depends upon Technology - YEP!


Page 71: Intridea ajn-rttos OA NYC Summit

NOW and Latency

• Real-Time: Data that is consumed immediately after creation

• Near Real-Time: Data that is consumed within seconds/minutes

• Some-Time: Data that is consumed when requested and is neither RT nor NRT
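One way to make the three categories concrete is to measure the gap between creation time and consumption time and bucket it. The cut-offs below (1 second and 5 minutes) are assumptions for illustration only; the slides say "immediately" and "seconds/minutes" and give no exact thresholds.

```python
from datetime import datetime, timedelta

# Assumed cut-offs; the talk gives no exact numbers.
REAL_TIME_LIMIT = timedelta(seconds=1)
NEAR_REAL_TIME_LIMIT = timedelta(minutes=5)


def latency(created_at: datetime, consumed_at: datetime) -> timedelta:
    """Latency is the gap between data creation time and data consumption time."""
    return consumed_at - created_at


def classify(gap: timedelta) -> str:
    """Map a latency to one of the three categories named on the slide."""
    if gap <= REAL_TIME_LIMIT:
        return "Real-Time"
    if gap <= NEAR_REAL_TIME_LIMIT:
        return "Near Real-Time"
    return "Some-Time"
```

Since the talk argues that latency depends on the medium, the consumer and the technology, the two limits would in practice be chosen per use case rather than fixed.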


Page 78: Intridea ajn-rttos OA NYC Summit

Physiological Latency

Perception:

Research suggests that the human retina transmits data to the brain at a rate of about 10 million bits per second, which is close to that of a 10BASE-T Ethernet connection!

We can perceive changes in reality at ~ 13-15 frames per second (fps, or Hz). Our perception of reality fully refreshes itself ~ once every 77

• Stock Exchange ~ 5-100 milliseconds (ms)

• Web Sites ~ 50-400 milliseconds (ms)

• Games (FPS) ~ 10-150 milliseconds (ms)

• Social/Games ~ 200 ms - 1 second
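The perception figures can be checked with simple arithmetic. The helper below is a hypothetical illustration: it converts a frame rate into the time between frames. At 13 fps that interval is about 77 ms, which is consistent with the "77" quoted on the slide if the missing unit is milliseconds (an assumption, since the source text is cut off).

```python
def frame_interval_ms(frames_per_second: float) -> float:
    """Time between successive frames, in milliseconds."""
    return 1000.0 / frames_per_second


def within_one_refresh(latency_ms: float, frames_per_second: float = 13.0) -> bool:
    """True if a delay is shorter than one perceptual refresh at the given rate."""
    return latency_ms < frame_interval_ms(frames_per_second)
```

Read this way, the lower end of each range on the slide fits inside one refresh at 13 fps, while a 200 ms social update does not.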


Page 83: Intridea ajn-rttos OA NYC Summit

Real-Time (DBs, Indexes, FSs)

No particular order

• MySQL

• SQL Server


Page 103: Intridea ajn-rttos OA NYC Summit

HBase

• Regions and HDFS: Data files for “regions” are stored in HDFS and replicated to multiple nodes in the cluster. Allocation of regions across the cluster is largely automatic.

• Scaling: Fault Tolerance, Commodity Machines

• Hadoop: Runs on top of Hadoop, MapReduce Integration
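HBase divides a table into regions, each covering a contiguous range of row keys. The toy model below only illustrates that lookup idea; it is not the HBase client API, and the class name, split keys and server names are all hypothetical.

```python
import bisect


class RegionTable:
    """Toy model: each region owns a contiguous, sorted range of row keys."""

    def __init__(self, split_keys, servers):
        # n split keys define n + 1 regions, one server per region here.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.split_keys = sorted(split_keys)
        self.servers = list(servers)

    def locate(self, row_key: str) -> str:
        """Return the server hosting the region whose key range contains row_key."""
        # A region's start key is inclusive and its end key exclusive.
        index = bisect.bisect_right(self.split_keys, row_key)
        return self.servers[index]
```

For example, `RegionTable(["g", "p"], ["rs1", "rs2", "rs3"]).locate("melon")` lands in the middle region. In the real system the assignment of regions to servers is handled for you, which is the "largely automatic" part.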


Page 111: Intridea ajn-rttos OA NYC Summit

Cassandra

• Always Writable: Writes are accepted even when a write fails internally. However, the data will eventually become consistent (tunable).

• Scaling: Can span data centers. Peer-to-peer communication between nodes (Gossip).

• More...: Supports MapReduce. Supports Range Queries.
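"Tunable" consistency is commonly explained with the rule that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor (R + W > N). The sketch below just encodes that rule; the function names are hypothetical and it does not talk to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when every read set must overlap the latest write set (R + W > N)."""
    return read_acks + write_acks > replication_factor
```

With three replicas, quorum reads and quorum writes (2 + 2 > 3) overlap, while writing to one replica and reading from one (1 + 1) leaves the data only eventually consistent.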


Page 119: Intridea ajn-rttos OA NYC Summit

Redis

• Transactions: Atomic operations (MULTI/EXEC/DISCARD). Queue your operations after MULTI and run them as one transaction with EXEC, or abandon the queue with DISCARD. Redis does not roll back commands that have already run.

• An evolutionary Key-Value Store: Supports complex types that are closely related to fundamental data structures. No need for an abstraction layer.

• Pub-Sub: Publish - Push messages to a channel. Subscribe - Listen to a channel.
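The queueing behaviour of MULTI/EXEC/DISCARD can be mimicked with a small in-memory class. This is a toy stand-in, not a Redis client: the class and method names are hypothetical, and it models only the idea that commands are queued after MULTI, applied together on EXEC, and dropped on DISCARD.

```python
class ToyTransactionalStore:
    """In-memory stand-in that mimics MULTI/EXEC/DISCARD queueing."""

    def __init__(self):
        self.data = {}
        self._queue = None  # None means no transaction is open

    def multi(self):
        """Open a transaction: later writes are queued instead of applied."""
        self._queue = []

    def set(self, key, value):
        if self._queue is not None:
            self._queue.append((key, value))  # queued, not yet visible
        else:
            self.data[key] = value

    def execute(self):
        """Apply every queued write in order and close the transaction."""
        if self._queue is None:
            raise RuntimeError("EXEC without MULTI")
        queued, self._queue = self._queue, None
        for key, value in queued:
            self.data[key] = value
        return len(queued)

    def discard(self):
        """Drop the queued writes without applying any of them."""
        if self._queue is None:
            raise RuntimeError("DISCARD without MULTI")
        self._queue = None
```

Nothing is written until `execute()` runs, so `discard()` has nothing to undo; it simply abandons work that never happened.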


Page 132: Intridea ajn-rttos OA NYC Summit

Near Real-Time & Real-Time

Queries and Streams

• Storm

• Kafka

• Drill/Dremel

• Hadoop

• MapReduce

• MapReduce v2 (YARN)

• Pig

• Hive

• Cascalog

• DataTurbine


Page 140: Intridea ajn-rttos OA NYC Summit

MapReduce/Hadoop

• Scale: 100s to 1000s of server nodes. Extreme and cheap. Simple programming model.

• Development: Java, Python, Grep & Others...

• Batch: Complex Multi-Step Processing
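The "simple programming model" is easiest to see in the classic word count. The sketch below runs the map, shuffle and reduce steps in a single Python process purely for illustration; it does not use Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a cluster the map and reduce calls run on many nodes in parallel and the shuffle moves data between them, which is where both the scale and the batch-style latency come from.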


Page 148: Intridea ajn-rttos OA NYC Summit

Storm

• FAST: Over a million tuples processed per second per node

• Integration: Integrates with any queueing system and any database system. Handles the parallelization, partitioning, and retrying on failures when necessary.

• Assurance: Scalable, Fault-Tolerant, Guarantees your data will be processed!


Page 156: Intridea ajn-rttos OA NYC Summit

CQL/StreamQL/SparQL/QL-RTDB/

• Languages: Human Readable

• Scalable: n simultaneous queries upon both stream data and static data

• SQL Idioms: All support, to a large degree, what you would expect from SQL


Page 164: Intridea ajn-rttos OA NYC Summit

PIG

• Language: High Level and easy to understand (Pig Latin)

• Parallelization: It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks

• Underneath: Essentially a MapReduce sequence compiler


Page 167: Intridea ajn-rttos OA NYC Summit

PIG

Example Pig Script


Page 171: Intridea ajn-rttos OA NYC Summit

PIG

That same example using MR Java code


Page 179: Intridea ajn-rttos OA NYC Summit

The perfect Army!

• In Memory: Keep as much as you can IN MEMORY! Think Redis...

• Identify and Plan: What data can be batch processed and what can’t! Think Hadoop and Storm (for stream) and HBase (for ad hoc)

• Consumer: Who is the data consumer? Person or Process? Think Pig or xQLs for both!


Page 180: Intridea ajn-rttos OA NYC Summit

www.intridea.com

Anthony Nyström, Fellow, Managing Director of Engineering

[email protected] | @AnthonyNystrom

Thank You | Gracias | Merci | Danke
