Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data...
Transcript of Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data...
![Page 1: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/1.jpg)
Big Data is Not About the Data!
Gary King
Institute for Quantitative Social ScienceHarvard University
(Talk at the Golden Seeds Innovation Summit, New York City 1/30/2013)
Gary King (Harvard) Big Analytics 1 / 10
![Page 2: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/2.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 3: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/3.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 4: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/4.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs
3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 5: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/5.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras
4 Health information: digital medical records, hospital admittances,accelerometers & other devices in cell phones
5 Biological sciences: genomics, proteomics, metabolomics, brainimaging producing huge numbers of person-level variables
6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 6: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/6.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones
5 Biological sciences: genomics, proteomics, metabolomics, brainimaging producing huge numbers of person-level variables
6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 7: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/7.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables
6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 8: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/8.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.
7 Electoral activity: ballot images, precinct-level results, individual-levelregistration, primary participation, campaign contributions
8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,multiplayer games, virtual worlds
9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 9: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/9.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions
8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,multiplayer games, virtual worlds
9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 10: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/10.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds
9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 11: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/11.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year
10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 12: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/12.jpg)
The Data in Big Data
1 Unstructured text: emails, speeches, reports, social media updates,web pages, newspapers, scholarly literature, product reviews
2 Commerce: credit cards, sales, real estate transactions, RFIDs3 Geographic location: cell phones, Fastlane, garage cameras4 Health information: digital medical records, hospital admittances,
accelerometers & other devices in cell phones5 Biological sciences: genomics, proteomics, metabolomics, brain
imaging producing huge numbers of person-level variables6 Satellite imagery: increasing in scope, resolution, and availability.7 Electoral activity: ballot images, precinct-level results, individual-level
registration, primary participation, campaign contributions8 Web surfing artifacts: clicks, searches, and advertising clickthroughs,
multiplayer games, virtual worlds9 > 90% of all data ever created was created last year10 Popular versions: MoneyBall, SuperCrunchers, The Numerati
Gary King (Harvard) Big Analytics 2 / 10
![Page 13: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/13.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 14: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/14.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditized
easy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 15: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/15.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvements
Ignore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 16: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/16.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every year
With a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 17: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/17.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 18: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/18.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 19: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/19.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customized
Moore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 20: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/20.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm
$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 21: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/21.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm design
Low cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 22: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/22.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital needed
Innovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 23: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/23.jpg)
The Value in Big Data: the Analytics
Data:
becoming commoditizedeasy to come by; often a free byproduct of IT improvementsIgnore it & your company will still have more every yearWith a bit of effort: huge data production increases
Where the Value is: the Analytics
Output can be highly customizedMoore’s law (doubling speed/power every 18 months) v. 1000xincrease with one algorithm$2M computer v. 2 hours of algorithm designLow cost; little infrastructure; mostly human capital neededInnovative analytics: enormously better than off-the-shelf approaches
Gary King (Harvard) Big Analytics 3 / 10
![Page 24: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/24.jpg)
Examples of what’s now possible
Opinions of activists:
A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise:
A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 25: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/25.jpg)
Examples of what’s now possible
Opinions of activists:
A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise:
A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 26: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/26.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews
billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise:
A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 27: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/27.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise:
A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 28: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/28.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise:
A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 29: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/29.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week?
500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 30: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/30.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 31: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/31.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts:
A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 32: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/32.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends”
continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 33: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/33.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 34: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/34.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries:
Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 35: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/35.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries: Dubious ornonexistent governmental statistics
satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 36: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/36.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries: Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 37: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/37.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries: Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 38: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/38.jpg)
Examples of what’s now possible
Opinions of activists: A few thousand interviews billions ofpolitical opinions in social media posts (1B every 3 Days)
Exercise: A survey: “How many times did you exercise last week? 500K people carrying cell phones with accelerometers
Social contacts: A survey: “Please tell me your 5 best friends” continuous record of phone calls, emails, text messages, bluetooth,social media connections, address books
Economic development in developing countries: Dubious ornonexistent governmental statistics satellite images ofhuman-generated light at night, road networks, other infrastructure
Expert-vs-analytics contests: Whenever enough information isquantified, a right answer exists, and good analytics are applied:analytics wins
In each: without new analytics, the data are useless
Gary King (Harvard) Big Analytics 4 / 10
![Page 39: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/39.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 40: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/40.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysis
Sentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 41: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/41.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 42: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/42.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 43: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/43.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)
Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 44: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/44.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 45: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/45.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 46: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/46.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 47: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/47.jpg)
How to Read a Billion Blog Posts& Classify Deaths w/o Physicians
Examples of Bad Analytics:
Physicians’ “Verbal Autopsy” analysisSentiment analysis via word counts
Different problems, Same Analytics Solution:
Key to both methods: classifying (deaths, social media posts)Key to both goals: estimating %’s
Modern Data Analytics: New method for estimating %’s led to:
1
2 Worldwide cause-of-death estimates for
Gary King (Harvard) Big Analytics 5 / 10
![Page 48: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/48.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts:
If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 49: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/49.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts:
If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 50: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/50.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 51: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/51.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 52: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/52.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 53: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/53.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 years
Ignore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 54: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/54.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)
Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 55: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/55.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)
Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 56: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/56.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 57: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/57.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 58: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/58.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)
More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 59: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/59.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts
Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 60: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/60.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thought
Other applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 61: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/61.jpg)
The Solvency of Social Security
Successful: single largest government program; lifted a wholegeneration out of poverty; extremely popular
Solvency: depends on mortality forecasts: If retirees receive benefitslonger than expected, the Trust Fund runs out
SSA data: little change other than updates for 75 years
SSA analytics:
Few statistical improvements for 75 yearsIgnore risk factors (smoking, obesity)Mostly informal (subject to error & political influence)Forecasts: inaccurate, inconsistent, overly optimistic
New customized analytics we developed:
Logical consistency (e.g., older people have higher mortality)More accurate forecasts Trust fund needs ≈ $1 trillion more than SSA thoughtOther applications to insurance industry, public health, etc.
Gary King (Harvard) Big Analytics 6 / 10
![Page 62: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/62.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 63: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/63.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paper
Now: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 64: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/64.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 65: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/65.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 66: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/66.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to cover
Now:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 67: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/67.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 68: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/68.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?
We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 69: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/69.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them all
Goal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 70: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/70.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)
More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 71: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/71.jpg)
Reading and Writing Technology
Writing Technology: Big changes
Then: Quill tip pen & expensive paperNow: Microsoft Word, Google docs, etc
Reading Technology: Little change (ripe for disruption)
Then: 50, 100, 300 years ago: Get book; read cover to coverNow:
How often do you read a book cover-to-cover for work?We collect 100s of documents, read a few, delude ourselves intothinking we understand them allGoal: understanding from unstructured data (hardest part of big data)More data isn’t helpful! Novel analytics needed.
Gary King (Harvard) Big Analytics 7 / 10
![Page 72: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/72.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 73: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/73.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 74: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/74.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 75: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/75.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 76: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/76.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!
Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 77: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/77.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 78: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/78.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 79: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/79.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with help
Invert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 80: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/80.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizes
Insights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 81: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/81.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better
(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 82: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/82.jpg)
Computer-Assisted Reading (Consilience)
To understand many documents, humans create categories torepresent conceptualization, insight, etc.
Most firms: impose fixed categorizations to tally customercomplaints, sort reports, retrieve information
Bad Analytics:
Unassisted Human Categorization: time consuming; huge efforts tryingnot to innovate!Fully Automated “Cluster Analysis”: Many widely available, but nonework (computers don’t know what you want!)
Our alternative: Computer-assisted Categorization
You decide what’s important, but with helpInvert effort: you innovate; the computer categorizesInsights: easier, faster, better(Lots of technology, but it’s behind the scenes)
Gary King (Harvard) Big Analytics 8 / 10
![Page 83: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/83.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 84: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/84.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 85: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/85.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releases
Categorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 86: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/86.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claiming
New Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 87: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/87.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 88: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/88.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”
“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 89: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/89.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 90: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/90.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 91: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/91.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 92: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/92.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken down
Data: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 93: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/93.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor them
We analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 94: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/94.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censored
Previous understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 95: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/95.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the government
Results:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 96: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/96.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 97: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/97.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the government
Censored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10
![Page 98: Big Data is Not About the Data! - Gary Kinggking.harvard.edu/files/gking/files/evbase-gs.pdfBig Data is Not About the Data! Gary King Institute for Quantitative Social Science Harvard](https://reader034.fdocuments.net/reader034/viewer/2022042219/5ec5891a7a54df7331779d91/html5/thumbnails/98.jpg)
Example Insights from Computer-Assisted Reading
1 What Members of Congress Do
Data: 64,000 Senators’ press releasesCategorization: (1) advertising, (2) position taking, (3) credit claimingNew Insight: partisan taunting
Joe Wilson during Obama’s State of the Union: “You lie!”“Senator Lautenberg Blasts Republicans as ‘Chicken Hawks’ ”
How common is it? 27% of all Senatorial press releases!
2 What is the Chinese Government Censoring?
Previous approach: manual effort to see what is taken downData: Crimson Hexagon gets posts before the Chinese censor themWe analyzed 11 million posts, about 13% censoredPrevious understanding: they censor criticisms of the governmentResults:
Uncensored: criticism of the governmentCensored: attempts at collective action
Gary King (Harvard) Big Analytics 9 / 10