Lessons from 2MM machine learning models

Transcript of Lessons from 2MM machine learning models

Page 1: Lessons from 2MM machine learning models

Kaggle

The home of data science

Page 2: Lessons from 2MM machine learning models

GE Flight Quest 2: Optimize flight routes based on weather & traffic

$250,000, 122 teams

Hewlett Foundation: Automated Essay Scoring: Develop an automated scoring algorithm for student-written essays

$100,000, 155 teams

Allstate Purchase Prediction Challenge: Predict a purchased policy based on transaction history

$50,000, 1,570 teams

Merck Molecular Activity Challenge: Help develop safe and effective medicines by predicting molecular activity

$40,000, 236 teams

Higgs Boson Machine Learning Challenge: Use the ATLAS experiment to identify the Higgs boson

$13,000, 1,302 teams

Page 3: Lessons from 2MM machine learning models

Training Data

Age  Income   Default
58   $95,824  True
73   $20,708  False
59   $82,152  False
66   $25,334  True

Test Data

Age  Income   Default
73   $53,445
61   $36,679
47   $90,422
44   $79,040

The Kaggle Approach
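
The training and test tables on this slide can be turned into a small sketch of the competition loop: fit on labeled rows, predict the unlabeled rows, and let the host score the submission against labels the competitor never sees. The nearest-neighbor model and the function names `fit_scaler`, `predict_1nn`, and `accuracy` are assumptions chosen for illustration; the slide does not name a model or a metric.

```python
import math

# Rows copied from the slide. Training rows carry the Default label;
# test rows do not, because the host withholds those labels for scoring.
TRAIN = [(58, 95824, True), (73, 20708, False), (59, 82152, False), (66, 25334, True)]
TEST = [(73, 53445), (61, 36679), (47, 90422), (44, 79040)]


def fit_scaler(rows):
    """Return (min, range) per column so age and income share a 0-1 scale."""
    columns = list(zip(*rows))
    return [(min(c), (max(c) - min(c)) or 1) for c in columns]


def predict_1nn(train, test):
    """Give each test row the label of its nearest training row."""
    scaler = fit_scaler([row[:2] for row in train] + list(test))

    def scaled(features):
        return [(v - lo) / span for v, (lo, span) in zip(features, scaler)]

    predictions = []
    for features in test:
        nearest = min(
            train,
            key=lambda row: math.dist(scaled(row[:2]), scaled(features)),
        )
        predictions.append(nearest[2])
    return predictions


def accuracy(predictions, hidden_labels):
    """Score a submission against labels that only the host can see."""
    hits = sum(p == y for p, y in zip(predictions, hidden_labels))
    return hits / len(hidden_labels)


# One predicted Default value per test row; this list is the "submission".
submission = predict_1nn(TRAIN, TEST)
```

Four training rows are far too few to learn anything real. The point is only the split of roles: competitors see `TRAIN` and the features in `TEST`, while `accuracy` runs on the host side.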

Page 4: Lessons from 2MM machine learning models
Page 5: Lessons from 2MM machine learning models

Mapping Dark Matter

Competition Progress

[Chart: accuracy (lower is better) from Week 1 to End, with axis ticks at .0150 and .0170. Labeled entrant: Martin O’Leary, PhD student in Glaciology, Cambridge U]

Page 6: Lessons from 2MM machine learning models

“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”

“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”

Page 7: Lessons from 2MM machine learning models

Mapping Dark Matter

Competition Progress

[Chart: accuracy (lower is better) from Week 1 to End, with axis ticks at .0150 and .0170. Labeled entrants are listed below.]

Martin O’Leary: PhD student in Glaciology, Cambridge U

Marius Cobzarenco: Grad student in computer vision, UC London

Ali Haissaine & Eu Jin Loc: Signature Verification, Qatar U & Grad Student @ Deloitte

deepZot (David Kirkby & Daniel Margala): Particle Physicist & Cosmologist

Other

Page 8: Lessons from 2MM machine learning models

We’ve worked with many of the world’s largest companies

Healthcare & Pharma

Consumer Internet

Finance

Industrial

Consumer Marketing

Oil & Gas

$50b+ Beverage Co.

Global Bank

Top Credit Card Issuer

Top 5 E&P

Top 20 E&P

Page 9: Lessons from 2MM machine learning models
Page 10: Lessons from 2MM machine learning models

That submit over 100K machine learning models per month

[Chart: Monthly Submissions to Kaggle Competitions, May-10 through May-15, y-axis from 0 to 160,000]

Page 11: Lessons from 2MM machine learning models

There’s a cookbook for winning competitions on structured data. It starts with exploring the data.
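
A first pass at exploring a structured dataset might look like the sketch below: per-column missing counts, numeric ranges, and value counts for everything else. The rows reuse the Age/Income/Default example from the earlier slide, and `summarize` is a hypothetical helper, not something the talk prescribes.

```python
from collections import Counter
from statistics import mean, median

# Example rows taken from the training table shown earlier in the deck.
ROWS = [
    {"age": 58, "income": 95824, "default": True},
    {"age": 73, "income": 20708, "default": False},
    {"age": 59, "income": 82152, "default": False},
    {"age": 66, "income": 25334, "default": True},
]


def is_number(value):
    """Treat ints and floats as numeric, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize(rows):
    """Return a per-column summary: missing count plus basic statistics."""
    summary = {}
    for column in rows[0]:
        values = [r[column] for r in rows if r.get(column) is not None]
        missing = len(rows) - len(values)
        if values and all(is_number(v) for v in values):
            summary[column] = {
                "missing": missing,
                "min": min(values),
                "max": max(values),
                "mean": mean(values),
                "median": median(values),
            }
        else:
            # Non-numeric columns (including the target) get value counts.
            summary[column] = {"missing": missing, "counts": dict(Counter(values))}
    return summary


for name, stats in summarize(ROWS).items():
    print(name, stats)
```

The value counts on the target column double as a class-balance check, which is usually worth knowing before choosing a metric or a validation split.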

Page 12: Lessons from 2MM machine learning models

2. Create and select features
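
One way to picture this step is to derive new columns from the raw ones and then rank every candidate by how strongly it tracks the target. The derived features (`log_income`, `income_per_year_of_age`) and the correlation-based ranking are assumptions picked for illustration; the slide does not say which features or selection method winners use.

```python
import math

# Example rows taken from the training table shown earlier in the deck.
ROWS = [
    {"age": 58, "income": 95824, "default": True},
    {"age": 73, "income": 20708, "default": False},
    {"age": 59, "income": 82152, "default": False},
    {"age": 66, "income": 25334, "default": True},
]


def add_features(row):
    """Create hypothetical derived features from the raw columns."""
    out = dict(row)
    out["log_income"] = math.log(row["income"])
    out["income_per_year_of_age"] = row["income"] / row["age"]
    return out


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_features(rows, target):
    """Sort features by absolute correlation with the target, strongest first."""
    ys = [float(r[target]) for r in rows]
    scores = {
        col: abs(pearson([float(r[col]) for r in rows], ys))
        for col in rows[0]
        if col != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_features([add_features(r) for r in ROWS], "default")
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With only four rows the scores carry no statistical weight. On a real dataset the same ranking would be computed on training folds only, so that selection does not leak information from the validation data.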

Page 13: Lessons from 2MM machine learning models

3. Parameter tuning and ensembling
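
As a rough sketch of this step, the code below tunes one parameter by grid search on a validation set and then averages the predictions of several models. The k-nearest-neighbor model, the toy one-dimensional data with a deliberately flipped label, and the names `make_knn`, `grid_search`, and `average_ensemble` are all hypothetical; the slide names the step but not a method.

```python
def make_knn(k, train):
    """Return a predictor giving the share of positive labels among k neighbors."""
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        return sum(label for _, label in nearest) / k
    return predict


def accuracy(probabilities, labels):
    """Fraction of rows where thresholding at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)


def grid_search(grid, train, valid):
    """Try every k in the grid; return (best_k, best_validation_accuracy)."""
    labels = [y for _, y in valid]
    best = None
    for k in grid:
        model = make_knn(k, train)
        score = accuracy([model(x) for x, _ in valid], labels)
        if best is None or score > best[1]:
            best = (k, score)
    return best


def average_ensemble(models):
    """Combine models by averaging their predicted probabilities."""
    def predict(x):
        return sum(m(x) for m in models) / len(models)
    return predict


# Hypothetical toy data: the label is 1 when x >= 5, with one noisy label at x = 2.
train = [(x, int(x >= 5)) for x in range(10)]
train[2] = (2, 1)
valid = [(x + 0.5, int(x + 0.5 >= 5)) for x in range(10)]

best_k, best_score = grid_search((1, 3, 5), train, valid)
ensemble = average_ensemble([make_knn(k, train) for k in (1, 3, 5)])
print(best_k, best_score, ensemble(7.5))
```

Averaging is only the simplest way to combine models, and it does not guarantee an improvement over the best single model. Whether an ensemble helps is itself something to check on validation data.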

Page 14: Lessons from 2MM machine learning models

A second cookbook is emerging on computer vision and speech problems. It involves using convolutional neural networks.
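
For readers unfamiliar with the building block, here is a minimal sketch of what a convolutional layer computes: a small kernel slides over the image, a ReLU keeps the positive responses, and max pooling shrinks the result. It is written in plain Python for clarity and assumes a hand-picked kernel; a real CNN learns its kernels during training and would use a dedicated library.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative responses."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# A 4x4 image with a vertical edge, and a hand-picked vertical-edge kernel.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]

features = relu(conv2d(image, kernel))
pooled = max_pool(features)
print(features)
print(pooled)
```

The kernel responds only where pixel values rise from left to right, which is why the middle column of `features` lights up.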

Page 15: Lessons from 2MM machine learning models

The vast majority of time is spent training algorithms when CNNs are applied.

Page 16: Lessons from 2MM machine learning models

There are the problems that land in the middle…

Page 17: Lessons from 2MM machine learning models

Anthony [email protected] 283 9781