Big Data for Social Good

Big Data for Social Good Using Kaggle for Business and Social Impact Peter Prettenhofer @datarobot Tobias Pfaff @datalook Emmanuel Letouzé @datapopalliance

Transcript of Big Data for Social Good

Page 1: Big Data for Social Good

Big Data for Social Good Using Kaggle for Business and Social Impact

Peter Prettenhofer @datarobot

Tobias Pfaff @datalook

Emmanuel Letouzé @datapopalliance

Page 2: Big Data for Social Good

Agenda

Big Data for Social Good

> The Macro Perspective

> The Micro Perspective

…and Kaggle

> Short Intro

> Interactive Crash Course

Interactive part:

1. Load bit.ly/bitkom15 and create account

2. Create account on kaggle.com

Page 3: Big Data for Social Good

Big Data and Development

> The Macro Perspective

Emmanuel Letouzé

Page 4: Big Data for Social Good

Emmanuel Letouzé

Page 5: Big Data for Social Good

Emmanuel Letouzé

“The bottom half of the world’s population owns the same as the richest 85 people in the world”*

* Source: Oxfam International, citing Credit Suisse, Jan. 2014

Page 6: Big Data for Social Good

Emmanuel Letouzé

Big Data as / is a new ecosystem: from the 3 Vs to the 3 Cs of Big Data

Crumbs

i. Exhaust

ii. Web

iii. Sensing

Capacities

Community

Page 7: Big Data for Social Good

Emmanuel Letouzé

Applications: Taxonomy

1. Descriptive: e.g. maps, clouds

2. Predictive: forecasting, inference

3. Prescriptive: causal inference

Page 8: Big Data for Social Good

Emmanuel Letouzé

Page 9: Big Data for Social Good

Emmanuel Letouzé

Page 10: Big Data for Social Good

Emmanuel Letouzé

Page 11: Big Data for Social Good

Emmanuel Letouzé

Page 12: Big Data for Social Good

Emmanuel Letouzé

Applications: ongoing

Page 13: Big Data for Social Good

Emmanuel Letouzé

Applications: Examples

Page 14: Big Data for Social Good

Emmanuel Letouzé

Implications: Ethics

Page 15: Big Data for Social Good

Emmanuel Letouzé

Implications: Power

Jonathan Glennie, The Guardian, Oct 3, 2013

Page 16: Big Data for Social Good

Emmanuel Letouzé

Page 17: Big Data for Social Good

> The Micro Perspective

Tobias Pfaff

Page 18: Big Data for Social Good

Bring the superpowers of data science to nonprofit organizations and to the local administration.

> The Micro Perspective

Tobias Pfaff

Page 19: Big Data for Social Good

Saving lives with predictive analytics in New York City

> The Micro Perspective

Tobias Pfaff

Page 20: Big Data for Social Good
Page 21: Big Data for Social Good

[Slide from Drew Conway, http://bit.ly/1BjFpvW] Tobias Pfaff

Time that buildings are at risk of severe fire significantly reduced.

Page 22: Big Data for Social Good

Which social problems can be solved with (Big) Data?

Tobias Pfaff

> The Micro Perspective

Page 23: Big Data for Social Good

Who are the other players?

Tobias Pfaff

> The Micro Perspective

Page 24: Big Data for Social Good

…and how can my company get involved?

Tobias Pfaff

> The Micro Perspective

Page 25: Big Data for Social Good

[http://bit.ly/1w6fccS] Tobias Pfaff

Page 26: Big Data for Social Good

> Short Intro to Data Science Competitions with

Peter Prettenhofer

Page 27: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Return%  ProductID  Dept       Price    MFR
1.94     54323      Household  54.95    USA
0.023    92356      Household  9.95     USA
0.8      78023      Computer   4.5      China
0.01     12340      Audio      109.99   China
0.41     31240      Audio      29.99    Taiwan
0.97     12351      Hardware   54.95    Mexico
0.0115   90141      Hardware   4.99     USA
0.4      81240      Hardware   6.55     Taiwan
0.03     14896      Computer   211.99   Korea
0.205    62132      Computer   1100     USA
1.6878   54323      Audio      34.99    USA
0.0345   92356      Audio      7.99     USA
0.64     78023      Household  229.9    Brazil
0.72     12340      Audio      19.95    Mexico
0.41     31240      Computer   6.99     Taiwan
1.94     54323      Hardware   11.99    Taiwan
0.023    92356      Household  2.05     USA
0.08     78023      Computer   99.99    USA
2.09     12340      Computer   129.99   China
1.1      31240      Audio      18.99    China

Target: Return%

Features: ProductID, Dept, Price, MFR
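The target/features labelling on this slide can be illustrated with a minimal Python sketch. It assumes the first column (Return%) is the target and the remaining columns are the features; the helper name `split_target_features` is hypothetical, and only the first three rows of the slide are copied in.

```python
# Column names as shown on the slide.
COLUMNS = ["Return%", "ProductID", "Dept", "Price", "MFR"]

# First three rows of the slide's table.
ROWS = [
    (1.94, 54323, "Household", 54.95, "USA"),
    (0.023, 92356, "Household", 9.95, "USA"),
    (0.8, 78023, "Computer", 4.5, "China"),
]


def split_target_features(rows, target_index=0):
    """Return (targets, features): the target column and all other columns."""
    targets = [row[target_index] for row in rows]
    features = [row[:target_index] + row[target_index + 1:] for row in rows]
    return targets, features


# Assumption: Return% (column 0) is the quantity to be predicted.
y, X = split_target_features(ROWS)
print("target values:", y)
print("features of first row:", X[0])
```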

Page 28: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Training

Return%  ProductID  Dept       Price    MFR
1.94     54323      Household  54.95    USA
0.023    92356      Household  9.95     USA
0.8      78023      Computer   4.5      China
0.01     12340      Audio      109.99   China
0.41     31240      Audio      29.99    Taiwan
0.97     12351      Hardware   54.95    Mexico
0.0115   90141      Hardware   4.99     USA
0.4      81240      Hardware   6.55     Taiwan
0.03     14896      Computer   211.99   Korea
0.205    62132      Computer   1100     USA
1.6878   54323      Audio      34.99    USA
0.0345   92356      Audio      7.99     USA

Testing (ground truth)

Return%  ProductID  Dept       Price    MFR
0.64     78023      Household  229.9    Brazil
0.72     12340      Audio      19.95    Mexico
0.41     31240      Computer   6.99     Taiwan
1.94     54323      Hardware   11.99    Taiwan
0.023    92356      Household  2.05     USA
0.08     78023      Computer   99.99    USA
2.09     12340      Computer   129.99   China
1.1      31240      Audio      18.99    China

Page 29: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Training

Return%  ProductID  Dept       Price    MFR
1.94     54323      Household  54.95    USA
0.023    92356      Household  9.95     USA
0.8      78023      Computer   4.5      China
0.01     12340      Audio      109.99   China
0.41     31240      Audio      29.99    Taiwan
0.97     12351      Hardware   54.95    Mexico
0.0115   90141      Hardware   4.99     USA
0.4      81240      Hardware   6.55     Taiwan
0.03     14896      Computer   211.99   Korea
0.205    62132      Computer   1100     USA
1.6878   54323      Audio      34.99    USA
0.0345   92356      Audio      7.99     USA

Testing (blanks)

Return%  ProductID  Dept       Price    MFR
?        78023      Household  229.9    Brazil
?        12340      Audio      19.95    Mexico
?        31240      Computer   6.99     Taiwan
?        54323      Hardware   11.99    Taiwan
?        92356      Household  2.05     USA
?        78023      Computer   99.99    USA
?        12340      Computer   129.99   China
?        31240      Audio      18.99    China
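Filling in the blanks means fitting a model on the training rows and predicting Return% for the test rows. The Python sketch below uses a deliberately simple baseline, the mean Return% per department, which is an assumption chosen for illustration and not the method shown in the talk. The function names `fit_dept_means` and `predict` are hypothetical.

```python
from collections import defaultdict

# Training rows from the slide: (Return%, ProductID, Dept, Price, MFR).
TRAIN = [
    (1.94, 54323, "Household", 54.95, "USA"),
    (0.023, 92356, "Household", 9.95, "USA"),
    (0.8, 78023, "Computer", 4.5, "China"),
    (0.01, 12340, "Audio", 109.99, "China"),
    (0.41, 31240, "Audio", 29.99, "Taiwan"),
    (0.97, 12351, "Hardware", 54.95, "Mexico"),
    (0.0115, 90141, "Hardware", 4.99, "USA"),
    (0.4, 81240, "Hardware", 6.55, "Taiwan"),
    (0.03, 14896, "Computer", 211.99, "Korea"),
    (0.205, 62132, "Computer", 1100, "USA"),
    (1.6878, 54323, "Audio", 34.99, "USA"),
    (0.0345, 92356, "Audio", 7.99, "USA"),
]

# Test rows from the slide; the target is blank (None).
TEST = [
    (None, 78023, "Household", 229.9, "Brazil"),
    (None, 12340, "Audio", 19.95, "Mexico"),
    (None, 31240, "Computer", 6.99, "Taiwan"),
    (None, 54323, "Hardware", 11.99, "Taiwan"),
    (None, 92356, "Household", 2.05, "USA"),
    (None, 78023, "Computer", 99.99, "USA"),
    (None, 12340, "Computer", 129.99, "China"),
    (None, 31240, "Audio", 18.99, "China"),
]


def fit_dept_means(train_rows):
    """Learn the mean Return% for each department (illustrative baseline)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ret, _pid, dept, _price, _mfr in train_rows:
        sums[dept] += ret
        counts[dept] += 1
    return {dept: sums[dept] / counts[dept] for dept in sums}


def predict(model, test_rows, default):
    """Predict Return% per test row; unseen departments get the default."""
    return [model.get(dept, default) for _ret, _pid, dept, _price, _mfr in test_rows]


model = fit_dept_means(TRAIN)
overall_mean = sum(row[0] for row in TRAIN) / len(TRAIN)
submission = predict(model, TEST, overall_mean)
for row, value in zip(TEST, submission):
    print(row[1], row[2], round(value, 4))
```

A real competition entry would normally use the other features (Price, MFR) as well; the baseline only shows the fit-then-predict workflow.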

Page 30: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Training

Return%  ProductID  Dept       Price    MFR
1.94     54323      Household  54.95    USA
0.023    92356      Household  9.95     USA
0.8      78023      Computer   4.5      China
0.01     12340      Audio      109.99   China
0.41     31240      Audio      29.99    Taiwan
0.97     12351      Hardware   54.95    Mexico
0.0115   90141      Hardware   4.99     USA
0.4      81240      Hardware   6.55     Taiwan
0.03     14896      Computer   211.99   Korea
0.205    62132      Computer   1100     USA
1.6878   54323      Audio      34.99    USA
0.0345   92356      Audio      7.99     USA

Testing (submissions)

Return%  ProductID  Dept       Price    MFR
0.83     78023      Household  229.9    Brazil
0.65     12340      Audio      19.95    Mexico
0.52     31240      Computer   6.99     Taiwan
1.74     54323      Hardware   11.99    Taiwan
0.1      92356      Household  2.05     USA
0.02     78023      Computer   99.99    USA
2.9      12340      Computer   129.99   China
0.83     31240      Audio      18.99    China

Page 31: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Page 32: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Training

Return%  ProductID  Dept       Price    MFR
1.94     54323      Household  54.95    USA
0.023    92356      Household  9.95     USA
0.8      78023      Computer   4.5      China
0.01     12340      Audio      109.99   China
0.41     31240      Audio      29.99    Taiwan
0.97     12351      Hardware   54.95    Mexico
0.0115   90141      Hardware   4.99     USA
0.4      81240      Hardware   6.55     Taiwan
0.03     14896      Computer   211.99   Korea
0.205    62132      Computer   1100     USA
1.6878   54323      Audio      34.99    USA
0.0345   92356      Audio      7.99     USA

Testing (public/private leaderboard)

Return%  ProductID  Dept       Price    MFR
0.83     78023      Household  229.9    Brazil
0.65     12340      Audio      19.95    Mexico
0.52     31240      Computer   6.99     Taiwan
1.74     54323      Hardware   11.99    Taiwan
0.1      92356      Household  2.05     USA
0.02     78023      Computer   99.99    USA
2.9      12340      Computer   129.99   China
0.83     31240      Audio      18.99    China
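Leaderboard scoring compares the submitted values with the withheld ground truth, using one part of the test set for the public score and the rest for the private score. The Python sketch below takes the ground-truth and submission values from the slides; the metric (root mean squared error) and the split (first four rows public, last four private) are assumptions for illustration, since the slides name neither.

```python
import math

# Withheld ground truth for the eight test rows (from the earlier slide).
GROUND_TRUTH = [0.64, 0.72, 0.41, 1.94, 0.023, 0.08, 2.09, 1.1]

# Submitted predictions for the same rows (from the submissions slide).
SUBMISSION = [0.83, 0.65, 0.52, 1.74, 0.1, 0.02, 2.9, 0.83]

# Assumption: rows 0-3 feed the public leaderboard, rows 4-7 the private one.
PUBLIC_IDX = [0, 1, 2, 3]
PRIVATE_IDX = [4, 5, 6, 7]


def rmse(truth, predicted, indices):
    """Root mean squared error over the selected row indices."""
    squared_errors = [(truth[i] - predicted[i]) ** 2 for i in indices]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


public_score = rmse(GROUND_TRUTH, SUBMISSION, PUBLIC_IDX)
private_score = rmse(GROUND_TRUTH, SUBMISSION, PRIVATE_IDX)
print("public RMSE: ", round(public_score, 4))
print("private RMSE:", round(private_score, 4))
```

Under these assumptions the private score comes out worse than the public one. Because participants see only the public score during a competition, tuning a model against it can give a misleading picture of final performance.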

Page 33: Big Data for Social Good

Peter Prettenhofer

How Data Science Competitions Work

Page 34: Big Data for Social Good

> Interactive Kaggle Crash Course

Peter Prettenhofer

Interactive part:

1. Load bit.ly/bitkom15 and create account

2. Create account on kaggle.com

Slides and interactive material available at bit.ly/bitkom16

Page 35: Big Data for Social Good

Thank you.

Peter Prettenhofer

Tobias Pfaff

Emmanuel Letouzé

Questions?