Cloudera Wrangle 2015

Condense Fact from the Vapor of Nuance
Mike Conover, Ph.D., Staff Data Scientist, LinkedIn

Good Problems

Unambiguous: Concrete Response Variables

Ad Clicks, Real Estate Pricing

Flower Species

Qualitative, Multidimensional

Non-Euclidean

Expertise

Nebulous

Trust, Serendipity, Sentiment

High-Fidelity Proxy Variables, Crowdsourcing / In-House Evaluation

Tip of the Spear

Operationalize

“All models are wrong, but some are useful.” George Box

Execution

Quality Control

Scale is the Premise

Spectrum

Qualitative Evaluation

Heteroscedasticity, Stratified Sampling
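
The slide pairs heteroscedasticity with stratified sampling without further detail. One possible reading, which is an assumption and not something the deck states, is that model error is not uniform across the score range, so a qualitative review set should draw examples from every score band instead of relying on a single uniform random draw. The sketch below illustrates that idea with the Python standard library only. The function name `stratified_sample`, the `score` field, the assumption that scores lie in [0, 1], and the bin and sample sizes are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, score_of, n_bins=5, per_bin=3, seed=0):
    """Draw up to `per_bin` records from each equal-width score bin.

    Assumes (hypothetically) that every score lies in [0, 1].
    Returns a dict mapping bin index -> list of sampled records.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for rec in records:
        score = score_of(rec)
        # Clamp so that a score of exactly 1.0 falls in the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append(rec)

    sample = {}
    for idx in sorted(bins):
        members = bins[idx]
        # Sparse bins contribute everything they have.
        k = min(per_bin, len(members))
        sample[idx] = rng.sample(members, k)
    return sample


# Hypothetical scored items standing in for model output.
_rng = random.Random(42)
items = [{"id": i, "score": _rng.random()} for i in range(1000)]

review_set = stratified_sample(items, lambda r: r["score"])
for bin_idx, recs in review_set.items():
    print(bin_idx, [r["id"] for r in recs])
```

Equal-width bins are the simplest choice; quantile bins would be an alternative when scores cluster heavily in one part of the range.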

Temporal Factors

Confirmation Bias

Fit to Print

Lightweight: Narrative Frame, Toy Examples, Propaganda

“Given enough eyeballs, all bugs are shallow.” Eric S. Raymond

Infrastructure: Notebooks

Model Viewers, Soup to Nuts

Ship It!

Devil’s Bargain: Feature Transformations, Transcription Errors, Model Specification

Power Tools: Singularity, Something’s Wrong. Go.

Spark & MLLib + Notebooks

Biomorphs: Iterating on Response Variables