Predictive analytics and big data tutorial

Ben Taylor @bentaylordata Predictive Analytics / Data Science

description

This presentation covers data science buzzwords, an introduction to big data, predictive analytics, and model-building methods, including structured vs unstructured data and supervised vs unsupervised learning.

Transcript of Predictive analytics and big data tutorial

Page 1: Predictive analytics and big data tutorial

Ben Taylor @bentaylordata

Predictive Analytics / Data Science

Page 2: Predictive analytics and big data tutorial

Presentation Objectives

• Enable you to be smarter than your prospect (data history / lingo)

• Motivate you to be unstoppable and hyper-confident

• Motivate you to begin looking for data driven opportunities

• Motivate you to become a data scientist

Page 3: Predictive analytics and big data tutorial

"What the hell is cloud computing?" - Larry Ellison, CEO, Oracle

Page 4: Predictive analytics and big data tutorial

What is cloud computing?

?

Page 5: Predictive analytics and big data tutorial

What is big data?

Big data refers to datasets or problems that exceed the capacity of a single computer and require a distributed data access system.

What counts as "big" is relative to conventional systems and technology, and it will change with advances in memory and storage.

http://www.pcmag.com/article2/0,2817,2453838,00.asp

Page 6: Predictive analytics and big data tutorial

Big data trends

Page 7: Predictive analytics and big data tutorial

What is a data scientist?

Page 8: Predictive analytics and big data tutorial

What is a data scientist?

Engineering, Finance, Economics, Mathematics, Computer Science, Physics

Data Science 6-10 yrs

Python Bootcamp $8,000 (3 months)

$16,000-$4,000 (3 months)

$115K avg

Page 9: Predictive analytics and big data tutorial

What is a data scientist?

Page 10: Predictive analytics and big data tutorial

What is a data scientist?

Master Builder

Page 11: Predictive analytics and big data tutorial

What is a data scientist?

Reality distortion: Hyper-confidence

Page 12: Predictive analytics and big data tutorial
Page 13: Predictive analytics and big data tutorial

Data Scientist = Peacock

Page 14: Predictive analytics and big data tutorial

Humans Algorithms

VS

Page 15: Predictive analytics and big data tutorial

Smartest pirate

Page 16: Predictive analytics and big data tutorial

Humans Algorithms

VS

NA

Page 17: Predictive analytics and big data tutorial

Humans Algorithms

VS

German (1795), French (1806)

Page 18: Predictive analytics and big data tutorial

Humans Algorithms

VS

1997, IBM Deep Blue

Kasparov

Page 19: Predictive analytics and big data tutorial

Humans Algorithms

VS

2011, IBM Watson

Ken Jennings & Brad Rutter

Page 20: Predictive analytics and big data tutorial

Humans Algorithms

VS

2014, HireVue Iris

Hiring Panel

Page 21: Predictive analytics and big data tutorial

Prediction process

Raw data

Data munging

Training

Model
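
The four stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenter's own method: the sample rows, the helper names (`munge`, `train`), and the choice of a one-variable least-squares line as the model are all made up for the example.

```python
# Hypothetical raw data: (years_experience, salary_in_thousands) as strings,
# including one unusable row. All values are invented for illustration.
raw_data = [("1", "40"), ("2", "50"), ("n/a", "55"), ("3", "60"), ("4", "70")]


def munge(rows):
    """Data munging: convert strings to floats, dropping rows that fail to parse."""
    clean = []
    for x, y in rows:
        try:
            clean.append((float(x), float(y)))
        except ValueError:
            continue  # skip rows with unparseable values
    return clean


def train(rows):
    """Training: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The "model" is simply a function that maps a new input to a prediction.
    return lambda x: slope * x + intercept


clean_data = munge(raw_data)   # raw data -> clean data
model = train(clean_data)      # clean data -> model
prediction = model(5.0)        # model -> prediction for an unseen input
print(prediction)
```

The point of the sketch is the shape of the flow: the model never sees the raw rows, only what survives munging.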

Page 22: Predictive analytics and big data tutorial

Data munging

Prediction process

Raw data

Feature selection

Training

Model

Data cleaning

Clean data

Page 23: Predictive analytics and big data tutorial

Numeric Excel example

Page 24: Predictive analytics and big data tutorial

Data munging

Prediction process

Raw data

Feature selection

Training

Model

Data cleaning

LSR, SVM, RANDOM FOREST, NAÏVE BAYESIAN, NEURAL NET

Page 25: Predictive analytics and big data tutorial

Missing values + categorical
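
A minimal sketch of one common way to handle both problems, assuming mean imputation for missing numbers and one-hot encoding for categories. The slide does not say which techniques were demonstrated, and the records and field names (`years`, `dept`) are hypothetical.

```python
# Hypothetical records: "years" has a missing value, "dept" is categorical.
records = [
    {"years": 2.0, "dept": "Retail"},
    {"years": None, "dept": "Engineering"},
    {"years": 6.0, "dept": "Retail"},
]

# Mean imputation: replace missing numbers with the mean of the observed ones.
observed = [r["years"] for r in records if r["years"] is not None]
mean_years = sum(observed) / len(observed)

# One-hot encoding: one 0/1 column per category, in a fixed (sorted) order.
categories = sorted({r["dept"] for r in records})

encoded = []
for r in records:
    years = r["years"] if r["years"] is not None else mean_years
    one_hot = [1 if r["dept"] == c else 0 for c in categories]
    encoded.append([years] + one_hot)

print(categories)
for row in encoded:
    print(row)
```

After this step every row is purely numeric, which is what most training algorithms expect.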

Page 26: Predictive analytics and big data tutorial
Page 27: Predictive analytics and big data tutorial

Data munging

Prediction process

Raw data

Feature selection

Training

Model

Data cleaning

LSR, SVM, RANDOM FOREST, NAÏVE BAYESIAN, NEURAL NET

Retail > 15, Engineering > 95

> 5.67

Page 28: Predictive analytics and big data tutorial

Resume model

Page 29: Predictive analytics and big data tutorial

Resume model

Page 30: Predictive analytics and big data tutorial

Data munging

Prediction process

Raw data

Feature selection

Training

Model

Data cleaning

LSR, SVM, RANDOM FOREST, NAÏVE BAYESIAN, NEURAL NET

Retail > 15, Engineering > 95

GPA, Colleges, Hobbies

> 5.67

Page 31: Predictive analytics and big data tutorial

Text deeper dive

Page 32: Predictive analytics and big data tutorial

Sentiment example

Page 33: Predictive analytics and big data tutorial

Sentiment example

Page 34: Predictive analytics and big data tutorial

Sentiment
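
The slides do not record how the sentiment example was built. As a rough sketch of the idea, assuming a simple word-list (lexicon) approach with tiny hypothetical word lists, scoring text could look like this:

```python
# Hypothetical word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ["I love this great product!",
                 "Terrible service, bad food.",
                 "It arrived on Tuesday."]:
    print(sentence, "->", sentiment_label(sentence))
```

A trained classifier would normally replace the hand-written word lists, but the input and output have the same shape: raw text in, a sentiment label out.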

Page 35: Predictive analytics and big data tutorial

Given data, find cat? dog?

Page 36: Predictive analytics and big data tutorial

Talk like a data nerd

Page 37: Predictive analytics and big data tutorial

Confidence & Over-fitting

Page 38: Predictive analytics and big data tutorial

Confidence & Over-fitting
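
One way to see over-fitting is to compare a model that memorizes its training data with a simpler one, and score both on data they have not seen. The sketch below is an assumption-laden illustration, not taken from the slides: the data is invented (true relationship y = 2x, with noise added to the training targets), and the two models (a nearest-point memorizer and a least-squares line) are chosen only for contrast.

```python
# Invented data: the true relationship is y = 2x; training targets carry noise.
train_set = [(1, 3), (2, 3), (3, 7), (4, 7), (5, 11), (6, 11)]
test_set = [(1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8), (5.4, 10.8)]


def fit_memorizer(rows):
    """Over-fit model: answer with the target of the closest training point."""
    def predict(x):
        nearest = min(rows, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


def fit_line(rows):
    """Simpler model: a one-variable least-squares line."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, rows):
    """Mean squared error of a model on a set of (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


memorizer = fit_memorizer(train_set)
line = fit_line(train_set)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name, "train MSE:", mse(model, train_set), "test MSE:", mse(model, test_set))
```

The memorizer looks perfect on the training set because it has learned the noise, and it does worse than the line on the held-out set. Confidence in a model should come from the held-out score, not the training score.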

Page 39: Predictive analytics and big data tutorial

Data Lingo

Supervised vs unsupervised learning

Supervised: A labeled training set is provided.

Unsupervised: No labels are provided; a typical task is clustering records based on similar attributes.
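
The difference can be sketched with the same six numbers handled two ways. Everything here is an assumption for illustration: the values, the labels, and the choice of a nearest-centroid classifier for the supervised case and 2-means clustering for the unsupervised case.

```python
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]           # one numeric feature per record
labels = ["no", "no", "no", "yes", "yes", "yes"]  # used only by the supervised part


def train_supervised(xs, ys):
    """Supervised: learn one centroid per label from the labeled training set."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(v) / len(v) for y, v in groups.items()}
    # Predict the label whose centroid is closest to the new value.
    return lambda x: min(centroids, key=lambda y: abs(centroids[y] - x))


def cluster_unsupervised(xs, iterations=10):
    """Unsupervised: 2-means clustering on the values alone, no labels used."""
    centers = [min(xs), max(xs)]  # simple deterministic starting points
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]


classifier = train_supervised(values, labels)
print(classifier(4.5))               # a named label learned from the training set
print(cluster_unsupervised(values))  # anonymous cluster ids, one per record
```

The supervised model can answer with a meaningful name ("yes" or "no") because it was given names to learn from. The clustering can only say which records belong together; deciding what each group means is left to a person.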

Page 40: Predictive analytics and big data tutorial

Data Lingo

Analytic Layers

Descriptive Analytics: Telling a data story, plotting, or visualization.

Predictive Analytics: Predicting future outcomes, usually with a model trained on a historical training set.

Prescriptive Analytics: Using the insight from your predictive model to proactively change something.

Interview/Interaction Analytics: Any analytics surrounding the interview or interaction.

Page 41: Predictive analytics and big data tutorial

Data Lingo

Prediction methods

Regression: Predicting a continuous output (e.g. a stock price).

Classification: Predicting discrete category outputs, e.g. Yes/Maybe/No.
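
A small sketch of the two output types, assuming a k-nearest-neighbours approach so that the same lookup can serve both. The training rows, the field meanings, and the choice of k = 2 are hypothetical.

```python
from collections import Counter

# Hypothetical training rows: (interview_score, salary_offer_k, hiring_decision)
rows = [
    (2.0, 40.0, "No"),
    (3.0, 45.0, "No"),
    (5.0, 60.0, "Maybe"),
    (6.0, 65.0, "Maybe"),
    (8.0, 85.0, "Yes"),
    (9.0, 90.0, "Yes"),
]


def nearest(score, k=2):
    """Return the k training rows whose score is closest to the query."""
    return sorted(rows, key=lambda r: abs(r[0] - score))[:k]


def predict_regression(score):
    """Regression: average the neighbours' numeric target (continuous output)."""
    neighbours = nearest(score)
    return sum(r[1] for r in neighbours) / len(neighbours)


def predict_classification(score):
    """Classification: majority vote over the neighbours' category labels."""
    votes = Counter(r[2] for r in nearest(score))
    return votes.most_common(1)[0][0]


print(predict_regression(8.4))      # a number on a continuous scale
print(predict_classification(8.4))  # one of "Yes", "Maybe", "No"
```

The inputs are identical; only the kind of target differs, and that is what decides whether the problem is regression or classification.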

Page 42: Predictive analytics and big data tutorial

Data Lingo

Data Types

Structured: Does it play well in Excel?

Unstructured: Raw text (Twitter), audio, video, photos, resumes, etc…
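
Unstructured text usually has to be turned into something structured before a model can use it. A minimal sketch, assuming a bag-of-words representation (word counts per document) and two invented example sentences:

```python
import re
from collections import Counter

# Hypothetical unstructured inputs (e.g. short social-media posts).
documents = ["Big data is big", "Data science is fun"]


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# The vocabulary becomes the column headers of the structured table.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})

# One row of word counts per document.
table = []
for doc in documents:
    counts = Counter(tokenize(doc))
    table.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in table:
    print(row)
```

The result is a grid of rows and columns that would "play well in Excel", at the cost of discarding word order.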