Dataiku - Big Data Paris 2015 - A Hybrid Platform, a Hybrid Team

BIG DATA IN HYBRID WORLDS The Story of M


I’m Florian, CEO of Dataiku, maker of Data Science Studio, the “Photoshop for Data Science”

COMMUNITY EDITION (it’s FREE): http://www.dataiku.com/dss/trynow/

Hi!

React on Twitter: @fdouetteau #BigDataParis

Big or Small: Startup or Big Firm

HOW DO PEOPLE TAKE DECISIONS?

BUYING DECISIONS

Should I buy it?

SOCIAL DECISIONS

Should I talk to him?

M LIKE MEETING

Business Decisions

Business Intelligence

In 2001, man (actually the META Group, now part of Gartner) invented big data:

Volume, Variety, Velocity

WHAT IF THE META GROUP HAD CHOSEN ANOTHER LETTER?

Capacity, Complexity, Celerity

Size, Serendipity, Speed

Big, Blur, Blazing

Or combine them:

Com….. Bu.. Sh..

BIG DATA RELIGION?

M LIKE METRICS


How much does it cost to produce and maintain a metric?

How many metrics do I need?

Do I follow the right metrics?

Do I have enough data?

• Self-Service: Build your own metrics

• Analytical Capabilities: Find your patterns

• Large Volume: Store it all

More Metrics Means More Means

DATA MINING

More Metrics Means More Applications

[Figure: matrix of data use cases. Axes: Mission Critical vs Sheer Curiosity; Small Structured vs Large Diverse. Zone labels: Classic BI, Large Production Platform, Data Exploration, Data Mining, Optimization. Example use cases: Reporting for Finance in Any Industry; Analyze Each Tweet; Web Navigation for E-Merchant; Ticket Data for Discounts in Retail; Phone Call Logs for Security; RTB Data for Advertising; Customer Consumption for Anti-Churn in Utilities; Filings for Fraud in Insurance.]

TODAY, EACH ONE HAS ITS OWN STORE

[Figure: the use-case matrix (axes: Mission Critical vs Sheer Curiosity; Small Structured vs Large Diverse; zones: Classic BI, Large Production Platform, Data Exploration, Optimization), with a data store for each zone: Data Warehousing, Data Mining Repositories, Data Lake, Google-like Platform.]

It’s not just about the metrics

DATA-DRIVEN BUSINESS

The problem is the human

• Cannot take decisions in seconds
• Limited sight (100 rows)
• Limited short-term memory (10k rows)

M LIKE MACHINE

Rise of AI

1974-1993: AI Winters
1997: Deep Blue
2005: Autonomous Vehicle
2011: Watson’s Jeopardy
2012: Google Cat

www.dataiku.com

APPLICATIONS OF MACHINE LEARNING TO BUSINESS PROBLEMS

Churn, Volume Forecast, Recommender, Segmentation, Lifetime Value, Risk Score, Hot Location, Pricing, Ranking, Fraud, Event Paths

PREDICTIVE MAIN COMFORT ZONE

[Figure: the use-case matrix again (axes: Mission Critical vs Sheer Curiosity; Small Structured vs Large Diverse), with the same eight example use cases and the Optimization label.]

Not enough data to learn from?

Not enough “hard” examples to learn from?

Dataiku - Pig, Hive and Cascading

Welcome to Technoslavia

[Figure: fictional map of the big data technology landscape. Regions: Machine Learning Mystery Land, Scalability Central, SQL Columnar Republic, Visualization County, Data Clean Wasteland, Statistician Old House, Real-time Island, NoSQL Nihiland. Technologies shown: Hadoop, Ceph, Sphere, Cassandra, Kafka, Flume, Spark, Storm, Scikit-Learn, GraphLab, prediction.io, Jubatus, Mahout, WEKA, MLBase, LibSVM, RapidMiner, Pandas, R, Kibana, InfiniDB, Drill, Spark SQL, Hive, Impala, Elasticsearch, SOLR, MongoDB, Riak, Membase, Pig, Cascading, Talend.]

Embrace Many Skills Many-Sets

Data Plumber

BI Manager

Data Scientist

Data Waiter

Data Cleaner

Business Analyst

REAL JOB

DREAM JOB

“PREDICTIVE CONTENT MANAGEMENT” FROM PAGES JAUNES

HOW CAN WE IMPROVE THE RELEVANCE OF OUR RESULTS THROUGH USER BEHAVIOR ANALYSIS?

• Search reformulation
• No results
• Click on a business listing
• Top search
• Navigation or filter click

[Figure: > 200M searches; 20 M; 1.4M queries with > 10 occurrences; 0.5M prioritized queries; ✗ / ✓; Analysis & corrections; Automation.]

[Architecture diagram: pagesjaunes.fr directory; crawl and other reference data sources; Hadoop (Pig + Hive); scikit-learn; Machine; Management; Exploration; export and indexing; interpretation engine.]
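The slides do not show how the 0.5M queries were prioritized. As a minimal sketch, assuming a search log of (query, signal) pairs and treating search reformulation and "no results" as failure signals (an assumption, not stated on the slide), the frequency filter (> 10 occurrences) and a prioritization step could look like this in Python. The function and signal names are hypothetical.

```python
from collections import Counter

# Hypothetical signal names. Assumption: reformulating the search or getting
# no results counts as a failure; the other signals on the slide (click on a
# listing, top search, navigation or filter click) count as successes.
FAILURE_SIGNALS = {"reformulation", "no_results"}


def prioritize_queries(events, min_occurrences=10, top_n=None):
    """Rank frequent queries by how often users signal a failed search.

    events: iterable of (query, signal) pairs, one per search.
    Returns (query, occurrences, failure_rate) tuples, most failures first.
    """
    occurrences = Counter()
    failures = Counter()
    for query, signal in events:
        q = query.strip().lower()  # naive normalization (assumption)
        occurrences[q] += 1
        if signal in FAILURE_SIGNALS:
            failures[q] += 1

    # Frequency filter: keep queries seen more than `min_occurrences` times
    # and with at least one failure signal.
    candidates = [
        (q, n, failures[q] / n)
        for q, n in occurrences.items()
        if n > min_occurrences and failures[q] > 0
    ]
    # Prioritize by absolute number of failures, then by failure rate.
    candidates.sort(key=lambda row: (failures[row[0]], row[2]), reverse=True)
    return candidates if top_n is None else candidates[:top_n]
```

Sorting by absolute failures first favors high-traffic queries, where a correction helps the most users; sorting by failure rate alone would push rare but always-failing queries to the top.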

Optimizing the Last Mile with Data Science Studio

Data Science Studio

Historical delivery and retrieval data

Modeling of a score for each delivery

Cleaning and temporal enrichment of data

Data aggregation by geographic location

Incorporation of new deliveries to the existing model

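The last-mile steps listed on this slide can be sketched as a small pipeline. This is a hypothetical illustration, not the Data Science Studio implementation: it assumes each record is (ISO timestamp, latitude, longitude, delivered on first attempt), and it swaps in a simple smoothed success rate per location cell and time slot for whatever scoring model was actually used. All names and parameters are assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record shape: (ISO timestamp, latitude, longitude, delivered_first_try).
# Assumption: the score is a smoothed first-attempt success rate per
# location cell and time slot; the original model is not specified.


def clean(records):
    """Cleaning: drop records with missing or unparsable fields; parse timestamps."""
    cleaned = []
    for ts, lat, lon, ok in records:
        if None in (ts, lat, lon, ok):
            continue
        try:
            when = datetime.fromisoformat(ts)
        except (TypeError, ValueError):
            continue
        cleaned.append((when, float(lat), float(lon), bool(ok)))
    return cleaned


def enrich(when):
    """Temporal enrichment: weekday (0 = Monday) and a coarse time slot."""
    if when.hour < 12:
        slot = "morning"
    elif when.hour < 18:
        slot = "afternoon"
    else:
        slot = "evening"
    return when.weekday(), slot


def cell(lat, lon, precision=2):
    """Geographic aggregation key: rounded coordinates (about 1 km cells)."""
    return round(lat, precision), round(lon, precision)


class DeliveryScorer:
    def __init__(self, prior=0.8, prior_weight=5.0):
        # Smoothing toward a global prior so sparse cells get moderate scores.
        self.prior = prior
        self.prior_weight = prior_weight
        self.success = defaultdict(float)
        self.total = defaultdict(float)

    def update(self, records):
        """Incorporate new deliveries into the existing aggregates."""
        for when, lat, lon, ok in clean(records):
            key = (cell(lat, lon),) + enrich(when)
            self.total[key] += 1
            self.success[key] += ok
        return self

    def score(self, ts, lat, lon):
        """Smoothed first-attempt success rate for a planned delivery."""
        key = (cell(lat, lon),) + enrich(datetime.fromisoformat(ts))
        s = self.success.get(key, 0.0)
        n = self.total.get(key, 0.0)
        return (s + self.prior * self.prior_weight) / (n + self.prior_weight)
```

Calling update again with fresh records plays the role of the slide's last step, incorporating new deliveries without rebuilding the aggregates from scratch.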

EXPLORE NEW WORLDS

[Figure: the use-case matrix (axes: Mission Critical vs Sheer Curiosity; Small Structured vs Large Diverse; Optimization label) with recommendations: Optimize existing BI capabilities; Build mandatory large-volume capabilities; Explore potential; Not being relevant: danger zone. Capability labels: Analytics, Predictive, Self-Service, Cluster.]
