Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)

Transcript of Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)

Page 1: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)

TOYOTA Customer 360° on Apache Spark™

DATA DRIVEN

Brian Kursar, Sr. Data Scientist, Toyota Motor Sales IT Research and Development

Page 2:

TOYOTA Big Data History

2015: C360 - Next Gen Insights Platform (Over 6B Records)

2014: C360 - Customer Experience Analytics (Over 700M Records)

2013: C360 - Toyota Social Media Intelligence Center (Over 500M Records)

2012: Product Quality Analytics v2 (Over 120M Records)

2011: Marketing and Incentives Analytics (70M Records)

2010: Product Quality Analytics (Over 60M Records)

Page 3:

SPARK SUMMIT 2014

TEAM TOYOTA

R&D

Data Engineering

Infrastructure

Enterprise Architecture

Page 4:

Genchi Genbutsu “Go Look, Go See”

Page 5:

•  Compute

•  Streaming

•  Machine Learning

ACTIONABLE INSIGHTS

Page 6:

Customer Experience original Batch Job

160 hours (6.6 days)

Same job re-written using Apache Spark …

4 hours

Data Engineering

Page 7:

Existing Tools

Page 8:

Existing Tools

Jan – Feb Feb - Mar

+1% +1%

+1% +2%

Toyota Social Opinion 2013

Page 9:

40% Retailers Selling Toyota Vehicles

11% Opinions on Marketing Campaigns

10% Feedback on Dealer Sales and Service Experiences

9% Opinions on Product Styling and Features

8% People In Market for a Toyota

8% Incident Reports Involving a Toyota Vehicle

7% Feedback on Product Quality

5% Customers Advocating for the Brand

2% Completely Irrelevant

Toyota Online Conversations by the Numbers

2014 Study

Page 10:

Toyota Online Conversations by the Numbers

40% Retailers Selling Toyota Vehicles

11% Opinions on Marketing Campaigns

10% Feedback on Dealer Sales and Service Experiences

9% Opinions on Product Styling and Features

8% People In Market for a Toyota

8% Incident Reports Involving a Toyota Vehicle

7% Feedback on Product Quality

5% Customers Advocating for the Brand

2% Completely Irrelevant

2014 Study

50% Noise

Page 11:

Categorize and Prioritize incoming Social Media interactions in Real-Time using Spark MLlib

Campaign Opinions

Customer Feedback

Product Feedback

Noise

Page 12:
Page 13:

First Spark MLlib Experiment

•  Seat Cover Wrinkles/Cracking

•  Brake Noise

•  Shift Quality

•  Oil Leaks

•  HVAC Odor

•  Dead Battery

•  Rodent Wire Harness Damage

•  Paint Chips

Time-box project to 12 weeks

Classify at minimum 80% accuracy

Page 14:

Describe this issue…

Page 15:

Noise Brakes

Page 16:

When I’m backing up in my 2012 Prius, it sounds like something hanging up or scraping as it rotates and only happens in the morning.

I hear a squeak coming from the back wheel of my Prius as I pull out from my driveway in the morning.

Page 17:

Extract features from the Verbatim Text

Identify Training Data for Brake Noise Model

Extract Category (Label) matching Model Objective

Page 18:

Social ML Pipeline

•  Labeled Data (Surveys, Hand-Scored Social)

•  Positive & Negative Training Sets

•  Training Selection Filters

•  Natural Language Processing Options: Cleaning (#, ’, /, @), Stopwords, Stemming, N-grams Extraction

•  Extracted N-grams

•  Feature Engineering Options: Statistical Selection (Chi-Square): All, Top, Top Random, Random, Popular

•  Vectorization Options: TF*IDF

•  TF Options: Natural, Boolean, Log, Augmented

•  IDF Options: Unary, Inverse, Inverse Smooth, ProbDF

•  Machine Learning Model: Support Vector Machine (MLlib)

•  Multiple Iteration: Optimal SVM Parameter

•  Cross Validation: Repeat 10 Times, Training Set (9 Folds), Validation Set (1 Fold)
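The cleaning, stopword, and n-gram steps named in the pipeline can be sketched in plain Python as below. This is a minimal illustration, not the code used in the talk: the stopword list and the function names are hypothetical, and stemming is left out.

```python
import re

# Hypothetical stopword list for illustration; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "from", "i", "in", "my", "of", "the", "to"}

def clean(text):
    """Lowercase the text and replace the characters named on the slide (#, ', /, @) with spaces."""
    return re.sub(r"[#'’/@]", " ", text.lower())

def tokenize(text):
    """Split cleaned text into alphanumeric tokens and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", clean(text))
    return [t for t in tokens if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

verbatim = "I hear a squeak coming from the back wheel of my Prius"
tokens = tokenize(verbatim)
# Candidate features: unigrams plus bigrams.
features = tokens + ngrams(tokens, 2)
```

Keeping bigrams such as "back wheel" next to the unigrams lets a later selection step decide which phrase lengths carry signal.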

Page 19:

Extract Text Features
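The transcript does not preserve what this slide showed. As a rough sketch of the TF*IDF vectorization named in the pipeline, the plain-Python example below assumes the "Natural" TF option means the raw term count and the "Inverse" IDF option means log(N / df). The function name and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term by natural TF (raw count) times inverse IDF, log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors

# Hypothetical tokenized verbatims.
docs = [
    ["brake", "squeak", "morning"],
    ["brake", "pads", "thin"],
    ["seat", "cover", "wrinkles"],
]
vectors = tf_idf(docs)
```

A term that appears in every document gets weight zero under this scheme, which is why rarer terms such as "squeak" outweigh common ones such as "brake".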

Page 20:

Train Predictive Model
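The pipeline trains an MLlib Support Vector Machine under 10-fold cross validation (9 folds for training, 1 for validation). The sketch below shows only the cross-validation loop in plain Python and does not call MLlib. The `train` interface and the majority-class placeholder learner are assumptions for illustration; a real run would plug in the SVM trainer.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(examples, train, k=10):
    """Average validation accuracy over k rounds; each fold is held out once.

    `examples` is a list of (features, label) pairs and `train` is any
    function that takes training examples and returns a predict(features)
    function. Both are hypothetical interfaces for this sketch.
    """
    folds = k_fold_indices(len(examples), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        training = [ex for i, ex in enumerate(examples) if i not in held]
        validation = [examples[i] for i in held_out]
        predict = train(training)
        correct = sum(1 for x, y in validation if predict(x) == y)
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)

def train_majority(training):
    """Placeholder learner: always predicts the most common training label."""
    labels = [y for _, y in training]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

# Hypothetical labeled data: 30 positive and 70 negative examples.
examples = [({"squeak": 1}, 1)] * 30 + [({"seat": 1}, 0)] * 70
accuracy = cross_validate(examples, train_majority, k=10)
```

The majority-class baseline is a useful floor: a trained classifier that cannot beat it has learned nothing from the features.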

Page 21:
Page 22:

Ver 1

56% Accuracy

Page 23:

Ver 3

36% Accuracy

Page 24:

Ver 8

35% Accuracy

Page 25:

False Positives

I just had my friend at the toyota dealer rotate my tires and he said … that the brake pads are getting thin really fast. So what should I do when they get too thin in the future and start to squeak?

Page 26:

False Positives

i cut the iac hose as shown in figure 20 in the manual but when i start the car, it started gasping for air... choking... sounds like it's about to die out.

i bought the power brake check valve (80190 part for kragen)... but either i'm not installing it right or it's the wrong size... i have no idea.

Page 27:

Solution

Page 28:

Explicit Semantic Analysis

Page 29:

Noise Brakes

Page 30:

Noise: squeak, grind, groan, squeal

Brakes: caliper, pads, rotor, wheel, drum

Page 31:

Distance Similarity Between Concepts

Noise: squeak, grind, groan, squeal

Brakes: pads, caliper, rotor, wheel, drum

Distance is calculated between concepts based on the Minkowski distance formula.
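The Minkowski distance of order p between two vectors is the p-th root of the sum of the absolute coordinate differences, each raised to the power p. Setting p = 1 gives the Manhattan (L1) distance and p = 2 gives the Euclidean (L2) distance. The sketch below is a plain-Python illustration; the two concept vectors are hypothetical, since the slides do not show how concepts are represented numerically.

```python
def minkowski_distance(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

# Hypothetical concept vectors; the slides do not show how concepts are embedded.
noise = [3.0, 0.0]
brakes = [0.0, 4.0]

manhattan = minkowski_distance(noise, brakes, 1)  # p = 1
euclidean = minkowski_distance(noise, brakes, 2)  # p = 2
```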

Page 32:

Ver 9

82% Accuracy

Page 33:
Page 34:

Kaizen

= Continuous Improvement

Page 35:

Kaizen

Train

Test Evaluate

Refine

Page 36:

Social ML Pipeline

•  Labeled Data (Surveys, Hand-Scored Social)

•  Positive & Negative Training Sets

•  Training Selection Filters

•  Natural Language Processing Options: Cleaning (#, ’, /, @), Stopwords, Stemming, N-grams Extraction

•  Extracted N-grams

•  Synonyms

•  POS

•  Feature Manager

•  Negation Evaluator

•  Concept Manager

•  Concept Interpreter

•  Features

•  Distance Similarity: Manhattan Distance (L1): |x| + |y|; Euclidean Distance (L2): √(|x|^2 + |y|^2)

•  Feature Engineering Options: Statistical Selection (Chi-Square): All, Top, Top Random, Random, Popular

•  Vectorization Options: TF*IDF

•  TF Options: Natural, Boolean, Log, Augmented

•  IDF Options: Unary, Inverse, Inverse Smooth, ProbDF

•  Machine Learning Model: Support Vector Machine (MLlib)

•  Multiple Iteration: Optimal SVM Parameter

•  Cross Validation: Repeat 10 Times, Training Set (9 Folds), Validation Set (1 Fold)
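The chi-square statistical selection step in the pipeline can be sketched as follows. This assumes the "Top" option means keeping the highest-scoring terms; the function names and the document counts are hypothetical. The statistic is the standard chi-square for a 2x2 table of term presence against class label.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: positive documents containing the term
    n10: negative documents containing the term
    n01: positive documents without the term
    n00: negative documents without the term
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denominator == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator

def top_features(counts, k):
    """Return the k terms with the highest chi-square scores."""
    scored = {term: chi_square(*cells) for term, cells in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical counts over 100 positive and 100 negative training documents.
counts = {
    "squeak": (60, 5, 40, 95),
    "toyota": (50, 50, 50, 50),
    "dealer": (20, 30, 80, 70),
}
selected = top_features(counts, 2)
```

A term spread evenly across both classes, like "toyota" here, scores zero and is dropped first.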

Page 37:

TODAY

Page 38:

FUTURE

Manufacturing Data

Connected Vehicle Data

Consumer Data

Page 39:

TEAM TOYOTA Spark Tips

•  Education and Inclusion

•  Pace Yourself

•  Design and Plan a Transitional Architecture to Incrementally Introduce Spark elements into your Applications

•  Use Joda Time for Date Comparisons

Page 40:

TEAM TOYOTA Lessons Learned

•  Be mindful of Akka versions when trying to build a new Spark release against a packaged Hadoop distribution

•  Use Spark SQL rather than DSLs for joins

•  Remember to configure Memory Fraction based on the size of your data.
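The slide does not say which memory fraction setting it means. In Spark 1.x, the likely candidates are `spark.storage.memoryFraction` and `spark.shuffle.memoryFraction`, which were superseded by unified memory management in Spark 1.6. The sketch below renders such settings as `--conf` arguments for `spark-submit`; the values and the helper name are illustrative assumptions, not recommendations from the talk.

```python
# Illustrative values only; the slide does not say which property or value was used.
# Both property names are Spark 1.x settings.
spark_conf = {
    "spark.storage.memoryFraction": "0.4",  # share of executor heap for cached RDDs
    "spark.shuffle.memoryFraction": "0.3",  # share of executor heap for shuffle buffers
}

def to_submit_args(conf):
    """Render a settings dict as repeated --conf key=value arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", "{}={}".format(key, conf[key])])
    return args

submit_args = to_submit_args(spark_conf)
```

A job that caches little but shuffles heavily would lower the storage fraction and raise the shuffle fraction, and the reverse for cache-heavy jobs.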

Page 41:

Brian Kursar – Sr Data Scientist Toyota Motor Sales IT Research and Development @briankursar

Visit us at the TOYOTA Booth here at the Spark Summit Today.