Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning...

32
Data Intelligence for All Visual, Interactive, Predictive Analytics for Big Data Christopher Nguyen, PhD Co-Founder & CEO Presented on January 10, 2014

Transcript of Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning...

Page 1: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

Data Intelligence for All Visual, Interactive, Predictive Analytics for Big Data !

Christopher Nguyen, PhD Co-Founder & CEO

Presented on January 10, 2014

Page 2: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Hadoop distributed/streaming analytics, Yahoo Hadoop Eng, UIUC PhD

Machine learning & machine vision, US Army Research Lab, Johns Hopkins PhD

Big-Data Compute Engines, Google Apps Engineering Director, Google Founders’ Award, HKUST Prof, 2 successful enterprise exits, Stanford PhD

Deep engineering & business experience from Google, Yahoo et al. PhD’s in DM & ML from UIUC, Georgia Tech, Stanford, Berkeley, ...

Page 3: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

BIG DATA PROBLEMS

Huge Volume

High Velocity

Great Variety

2 0 0 5

Page 4: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

HADOOPHIV E

MAPREDUCE

2 0 1 0

helped solve big data problems

Page 5: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

BIG DATA + BIG COMPUTE = OPPORTUNITIES

Machine Learning

Predictive Analytics

Natural Language

automatic customer segmentation

2 0 1 3

BIG DATA= PROBLEMS

Page 6: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Hadoop has a Big Problem. It’s too slooow…

Page 7: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Users need interactive visualization & advanced analytics

Page 8: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

But no tools available on the Hadoop stack offer this

Page 9: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Until now.

DATA INTELLIGENCE FOR ALL

Page 10: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.comHDFS

BATCH ETL MapReduce

SQL Queries Hive

Basic Analytics (SQL Queries)

!Impala, Stinger, Presto,

Platfora, Hadapt

!Advanced Analytics

In-Memory, Fast Interactive Visualization,

Data Mining and Machine Learning

ADATAO SOLUTION IN HADOOP LANDSCAPE

Hybrid Export-then-

Analyze

RDBMS

DATA INTELLIGENCE FOR ALL

Interactive, FastBatchHybrid

Page 11: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Powerful In-Memory Data Mining

Machine Learning Big Analytics Platform BIG

COMPUTE

(Hadoop HDFS, Cassandra, SQL DMBS, Streaming Data)BIG

DATA

Business Users Data Scientists Data Engineers

Visually Beautiful

Interactive DataExploration

Narrative Web App

BIG INSIGHTS

01100011

0110001

01100011

10001100

01100011

0110001

01100011

10001100

ONE Integrated Solution for Business & Data Science & Engineering

Page 12: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

DATA INTELLIGENCE FOR ALL

is the first & still only solution for this problem

Page 13: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

CLIENT WORKER WORKER WORKERWORKERMASTER

Live Demo Deployment Diagram

Page 14: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

It’s Big & Interactive1Magic

Page 15: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

It’s Web-Based2Magic

Page 16: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

It’s Multi-Lingual3Magic

Page 17: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

It’s Dashboards!4Magic

Page 18: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

It’s Machine Learning5Magic

Page 19: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

for Business Users

A Beautiful New Way to Create & Share Visual Narratives of Your Analysis

Perform Ad Hoc Queries in Plain English

Publish Streaming, Interactive Dashboards

Collaborate With Others In Real Time

Query Terabytes in Seconds.

Predictive Decision Making

Page 20: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

for Data Scientists & Engineers

Powerful In-Memory Data Mining & Machine Learning—Model Terabytes in Seconds

Interactive, Cluster-Scale Data Munging & Modeling with Native R, R-Studio, Python, SQL, and Java Front-ends

Real-Time Scoring Directly From Trained Models

Share reproducible, live data analysis documents

Hadoop, Cassandra, RDBMS, Streaming Data

01100011

0110001

01100011

10001100

01100011

0110001

01100011

10001100 Big Data Mining & Machine Learning

Page 21: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Use Cases

Page 22: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Internet Service Provider

Interactive, Ad Hoc Business Query

Insight Discovery on Aggregated Operational Data

Finance

Engineering

Marketing

Sales

Page 23: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Customer Service Provider

Product Recommendation

Cross-channel User Experience Optimization

Page 24: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Sensor Network Analyticsfor Predictive Maintenance

Heavy Equipment Manufacturer

Page 25: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Mobile Ad Platform

Ad Targeting

CTR Prediction

Page 26: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Scaling Performance

Page 27: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Algorithm Run time (sec) for 50GB (800M rows)

Per-Core Throughput (MB/sec-core)

Per-Machine Throughput (MB/s)

adatao.lm (ridge) 3.04 130 1,040

adatao.lm 4.05 102 816

adatao.lm.gd 12.2 32 256

adatao.glm.gd 24.5 16 128

adatao.glm 36.1 11 88

adatao.kmeans 335 1.2 9.6

pAnalytics performance on building machine learning models with cluster Adatao16 (m3.2xlarge) on a 50GB data set of 5 features and 800 million rows. (Gradient descent algorithms are over 5 iterations)

Page 28: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Algorithm Run time (sec) for 1.1 TB dataset (1.6B rows)

Per-Core Throughput (MB/sec-core)

Per-Machine Throughput (MB/s)

adatao.lm (ridge) 70.9 130 1,040

adatao.lm 74.9 123 984

adatao.lm.gd 127 72.8 582

adatao.glm.gd 145 63.6 509

pAnalytics performance on building machine learning models with cluster Adatao40 (m3.2xlarge) on a 1.1 TB data set of 40 features and 1.6 billion rows. (Gradient descent algorithms are over 5 iterations)

Page 29: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Page 30: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Business Users

Fast & Easy Business Analytics !Natural Language !Beautiful Web UI

Big & Fast Data Science !R, Python, REST API !Data Mining & ML

Data Intelligence for All

Data Scientists & Engineers01100011

0110001

01100011

10001100

01100011

0110001

01100011

10001100

Page 31: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com

Thanks! !

Data Scientist? http://adatao.com/forms/ds.html

Page 32: Adatao: Interactive, Visual, Predictive Analytics for Big Data @ Silicon Valley Machine Learning Meetup

http://adatao.com