Big Data and the Social Sciences
![Page 1: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/1.jpg)
Big Data Technology and the Social Sciences:
A Lecture at Mannheim University
Abe Usher, CCHP, CISSP
Chief Technology Officer, HumanGeo
![Page 2: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/2.jpg)
What’s In It For You?
Theory
•Definitions and overview
•Where data are being generated
Practice
•Google’s three secret techniques* for unlocking insights from data
•The kitchen model
•Recommended resources to build data science skills
Presentation slides: http://www.slideshare.net/abeusher/big-data-and-the-social-sciences
*Not specifically endorsed by Google. Also, not really a secret.
![Page 3: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/3.jpg)
Background
HumanGeo is focused on digital Human Geography:
Understanding the location attributes of individuals and groups
And the social attributes of locations
Through ‘Big Data’ analysis of billions of geolocated data elements
![Page 4: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/4.jpg)
Big Data Wake-Up Call
Berkeley University Research http://goo.gl/zjSUr1
By 2016 the rate of data growth surpasses the rate of Moore’s Law
![Page 5: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/5.jpg)
Defining Big Data
http://knowyourmeme.com/memes/you-keep-using-that-word-i-do-not-think-it-means-what-you-think-it-means
![Page 6: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/6.jpg)
Big Data Definition
Boring Traditional definition
“High volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
![Page 7: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/7.jpg)
Big Data Definition
Abe’s definition:
![Page 8: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/8.jpg)
The Original “Big Data”
1880 US Census
•50 million people
•Data included: age, gender, number of insane people in household*
•Took 7 years to tabulate
•1890 Census estimated at 13 years to complete
*Credit to Ken Krugler for this factoid: http://www.censusrecords.com/content/1880_census
![Page 9: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/9.jpg)
The Original “Big Data”
1890
•63 million people
•Additional data: citizenship and military service
•New technology: Hollerith Tabulating System
•Took 6 weeks to tabulate (76x faster)
Takeaway
• Better technology and methodology led to 76x speedup
![Page 10: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/10.jpg)
Data Generation
Where are data created?
•Website interaction logs
•Social Media
•Cyber events
•Smartphones
What is the volume?
•3B phone calls in USA
•700M Facebook posts
•500M tweets per day
•50B WhatsApp messages per day
Takeaway
• Social media, telecommunication, and instant messaging generate an increasingly high volume of data
![Page 11: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/11.jpg)
Traditional Model of Interpreting Observations
Tracy Morrow (aka “Ice T”)
How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)?
http://www.npr.org/2005/08/30/4824690/original-gangster-rapper-and-actor-ice-t
![Page 12: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/12.jpg)
Tracy Morrow (aka “Ice T”)
How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)?
“Game knows game, baby.”
Traditional Model of Interpreting Observations
![Page 13: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/13.jpg)
Tracy Morrow (aka “Ice T”)
How can you identify a legitimate hip-hop artist (versus someone who just gets up and rhymes)?
“If you have expert knowledge, then you are capable of answering complex questions by interpreting domain specific information.” [paraphrased]
Traditional Model of Interpreting Observations
![Page 14: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/14.jpg)
Trust Models for complex data
• August Gorman carried out a plot to grab fractions of a penny from a corporate payroll system. http://goo.gl/vAScel
IMDB: 4.9/10
Rotten Tomatoes: 26/100
![Page 15: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/15.jpg)
Trust Models for complex data
• Peter Gibbons hatches a plot to write a computer virus that grabs fractions of a penny from a corporate retirement account. http://goo.gl/rDg1U
• Known in security circles as a salami attack.
IMDB: 7.9/10
Rotten Tomatoes: 79/100
Takeaway point: Little bits of value (information) provide deep insights in the aggregate
![Page 16: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/16.jpg)
1. Aggregation
2. Visualization
3. Correlation
New Models of Interpreting (Big) Data
Takeaways
• Expert-based knowledge is no longer sufficient.
• Simple mathematical methods create value from captured data.
![Page 17: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/17.jpg)
Aggregation (Counting)
William Thomson, 1st Baron Kelvin
“When you can measure what you are speaking about, and express it in numbers, you know something about it.”
Takeaway
• Aggregation via counting things is the most common way to exploit Big Data
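Counting can be sketched in a few lines of Python. The example below is only an illustration: the `mentions` list and the `count_items` helper are hypothetical and are not taken from the lecture materials.

```python
from collections import Counter

# Hypothetical observations: one product mention per record.
# These values are made up for illustration only.
mentions = ["diet coke", "diet pepsi", "diet coke", "diet coke", "diet pepsi"]


def count_items(items):
    """Aggregate raw observations into per-item counts."""
    return Counter(items)


counts = count_items(mentions)

# List the items from most to least frequently observed.
for item, n in counts.most_common():
    print(f"{item}: {n}")
```

The same pattern scales from a short list to a stream of log lines, since `Counter` only needs one pass over the data.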
![Page 18: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/18.jpg)
The book “Fearless” is much more popular than the 80s movie “Navy Seals.” It also has a more favorable distribution of reviews.
Aggregation: A Tale of Two Products
![Page 19: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/19.jpg)
The distribution we’re looking for looks like the #1 hand: responses concentrated in the most positive category, with very few responses that were unfavorable.
Aggregation: A Tale of Two Products
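One way to compare two review distributions is to turn raw star ratings into shares per star level. This is a minimal sketch under stated assumptions: the two rating lists are invented, `rating_shares` is a hypothetical helper, and neither represents the actual products shown on the slide.

```python
from collections import Counter


def rating_shares(ratings):
    """Return the fraction of reviews at each star level (1 to 5)."""
    counts = Counter(ratings)
    total = len(ratings)
    return {stars: counts.get(stars, 0) / total for stars in range(1, 6)}


# Hypothetical review data for two unnamed products (not real figures).
product_a = [5, 5, 5, 4, 5, 5, 3, 5, 4, 5]
product_b = [5, 1, 3, 2, 4, 1, 5, 3, 2, 1]

shares_a = rating_shares(product_a)
shares_b = rating_shares(product_b)

# Share of clearly unfavorable reviews (1 or 2 stars) for each product.
unfavorable_a = shares_a[1] + shares_a[2]
unfavorable_b = shares_b[1] + shares_b[2]
```

A product matching the "#1 hand" shape has most of its share at five stars and almost none at one or two stars, which is the case for `product_a` in this made-up data.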
![Page 20: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/20.jpg)
Aggregation & Visualization: Counting with Google Trends
![Page 21: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/21.jpg)
Aggregation & Visualization: Bing Search vs. Google Search
![Page 22: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/22.jpg)
Aggregation: Diet Pepsi vs. Diet Coke
![Page 23: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/23.jpg)
Aggregation & Visualization: Big Data vs. Britney Spears
![Page 24: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/24.jpg)
Geospatial Visualization Example: Social Drift in DC
Takeaway
• Visualization provides a powerful mechanism for Exploratory Data Analysis
![Page 25: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/25.jpg)
Correlation: Canadian Flu Research
Gunther Eysenbach
•Professor @ University of Toronto
•Focused on eHealth
•Google Ads user
Infodemiology
•2004-2005 tracked flu-related searches
•54,507 Ad impressions in Canada
•High R^2 correlation to actual flu activity
http://gunther-eysenbach.blogspot.com/
Infodemiology paper: http://goo.gl/aeUZtA
Takeaway
• Human behavior in response to Google Ads related to the flu was highly correlated with “officially reported” cases of the flu.
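To make "high R^2 correlation" concrete, the sketch below computes the Pearson correlation coefficient and its square for two weekly series. The numbers are invented for illustration and are not the data from the Infodemiology paper; `pearson_r`, `ad_activity`, and `reported_cases` are hypothetical names.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly counts (made up, not from the study).
ad_activity = [12, 18, 25, 40, 33, 20]      # e.g. flu-related ad interactions
reported_cases = [30, 41, 58, 95, 80, 47]   # e.g. officially reported cases

r = pearson_r(ad_activity, reported_cases)
r_squared = r ** 2  # share of variance explained by a linear fit
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

A value of R^2 near 1 means the two series rise and fall together; it does not by itself show that one causes the other.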
![Page 26: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/26.jpg)
Correlation: Google Flu Trends
“Google Flu Trends provides near real-time estimates of flu activity for a number of countries and regions around the world based on aggregated search queries.”
Process
•Map searches to regions
•Quantify “normal”
•Detect “anomalies”
NPR: http://goo.gl/Iv7A87
NYT: http://goo.gl/mNyAi7
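The three process steps above can be sketched with a simple z-score rule. This is not Google's actual method, which the slide does not describe; it assumes searches have already been mapped to regions, and the region names, counts, threshold, and `find_anomalies` helper are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(history, current, threshold=2.0):
    """Flag regions whose current count is far above their own baseline.

    history: region -> list of past weekly search counts ("normal")
    current: region -> this week's search count
    threshold: how many standard deviations above the mean counts as unusual
    """
    flagged = {}
    for region, past_counts in history.items():
        baseline = mean(past_counts)   # quantify "normal"
        spread = stdev(past_counts)
        if spread == 0:
            continue                   # no variation, so no z-score
        z = (current[region] - baseline) / spread
        if z > threshold:              # detect "anomalies"
            flagged[region] = round(z, 2)
    return flagged


# Hypothetical per-region weekly counts of flu-related searches.
history = {
    "north": [100, 110, 95, 105, 90],
    "south": [200, 210, 190, 205, 195],
}
current = {"north": 180, "south": 204}

anomalies = find_anomalies(history, current)
print(anomalies)
```

Each region is compared against its own history, so a large region with many searches is not flagged merely for being large.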
![Page 27: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/27.jpg)
Correlation: Box Office Hit Prediction
“Use of socially generated ‘big data’ to access information about collective states of the minds in human societies has become a new paradigm in the emerging field of computational social science.”
Simple factors
•number of total page views
•number of total edits made
•number of users editing
•number of revisions in the article's revision history
Early Prediction of Movie Box Office Success: http://goo.gl/BWf7H1
Counts of Wikipedia factors correlate to Box Office sales
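As a rough illustration of how a count can feed a prediction, the sketch below fits a straight line by ordinary least squares using a single factor. It is a one-variable simplification, not the model from the linked paper, and the page-view and revenue figures are invented; `fit_line` and `predict` are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical data: page views (thousands) before release and
# opening revenue (millions). These are not real figures.
page_views_k = [10, 25, 40, 80]
revenue_m = [5, 11, 19, 42]

slope, intercept = fit_line(page_views_k, revenue_m)
estimate = predict(slope, intercept, 50)
print(f"estimated revenue for 50k page views: {estimate:.1f}M")
```

The other factors on the slide (edits, editors, revisions) could be added as further predictors with a multivariate regression.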
![Page 28: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/28.jpg)
Big Data: Significance for Social Sciences
1. Proxy variables. Digital exhaust collected for purposes other than surveys often creates ‘proxy variables’ that provide complementary insights.
2. Aggregation Insights. Combining many small observations leads to insights that we can trust.
3. Data Linking. It is possible to ‘link’ or synchronize records between digital exhaust and instrumented surveys by selecting a common dimension (e.g. location).
The future of social science will involve combining “fuzzy Big Data insights” with instrumented survey results
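Data linking on a common dimension can be sketched as a join keyed on location. The records, field names, and `link_by_location` helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def link_by_location(survey_rows, exhaust_rows):
    """Attach a per-location count of digital-exhaust records to survey rows."""
    counts = defaultdict(int)
    for row in exhaust_rows:
        counts[row["location"]] += 1
    # Copy each survey row and add the matching count (0 if no match).
    return [
        {**row, "exhaust_count": counts.get(row["location"], 0)}
        for row in survey_rows
    ]


# Hypothetical instrumented survey results, one row per location.
survey = [
    {"location": "Mannheim", "trust_score": 3.8},
    {"location": "Heidelberg", "trust_score": 4.1},
]

# Hypothetical digital exhaust, e.g. geolocated posts.
exhaust = [
    {"location": "Mannheim", "text": "post one"},
    {"location": "Mannheim", "text": "post two"},
    {"location": "Frankfurt", "text": "post three"},
]

linked = link_by_location(survey, exhaust)
for row in linked:
    print(row)
```

In practice the common dimension needs to be normalized first (for example, agreeing on one spelling or one grid size for locations) before records will match.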
![Page 30: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/30.jpg)
The kitchen model of value creation
•Chef: Your Staff
•Ingredients: Your Data
•Utensils: Technology
•Recipes: Techniques
![Page 31: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/31.jpg)
Take Action: Experiment yourself
Exploratory Data Analysis lifecycle:
• collect - Twitter API, Datasift.com
• clean - OpenRefine
• analyze - Python or R
• visualize - Google Earth
Related data: https://s3.amazonaws.com/devbackup/germany.txt.gz
Related code: https://github.com/abeusher
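The lifecycle can be tried end to end on a toy scale. The sketch below is an assumption-laden illustration: the `raw` records stand in for collected data (it does not read the linked file, whose format is not described here), and `clean`, `analyze`, and `to_kml` are hypothetical helpers. The output is a minimal KML string that Google Earth can open once saved to a `.kml` file.

```python
from collections import Counter
from xml.sax.saxutils import escape

# collect: hypothetical records standing in for API output (made up).
raw = [
    {"text": " Guten Morgen  ", "lat": "49.4875", "lon": "8.4660"},
    {"text": "Big data lecture", "lat": "", "lon": ""},
    {"text": "Hallo <Mannheim>", "lat": "49.4890", "lon": "8.4670"},
]


def clean(records):
    """Drop records without usable coordinates; trim text; parse floats."""
    cleaned = []
    for rec in records:
        try:
            lat, lon = float(rec["lat"]), float(rec["lon"])
        except ValueError:
            continue  # skip rows whose coordinates are missing or malformed
        cleaned.append({"text": rec["text"].strip(), "lat": lat, "lon": lon})
    return cleaned


def analyze(records):
    """Count records per grid cell (coordinates rounded to two decimals)."""
    return Counter((round(r["lat"], 2), round(r["lon"], 2)) for r in records)


def to_kml(records):
    """Render records as KML placemarks (KML order is longitude,latitude)."""
    placemarks = "".join(
        f"<Placemark><name>{escape(r['text'])}</name>"
        f"<Point><coordinates>{r['lon']},{r['lat']}</coordinates></Point>"
        "</Placemark>"
        for r in records
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        f"{placemarks}</Document></kml>"
    )


cleaned = clean(raw)
cells = analyze(cleaned)
kml = to_kml(cleaned)
```

Writing `kml` to a file, for example with `open("points.kml", "w", encoding="utf-8")`, gives something to load in the visualize step.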
![Page 32: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/32.jpg)
Take Action: Explore
Google Trends http://goo.gl/8eJZg
Google Ngram http://goo.gl/4U09fa
Google Correlate http://goo.gl/nEhe8D
Bing Keyword Research http://goo.gl/q2V88g
![Page 33: Big Data and the Social Sciences](https://reader033.fdocuments.net/reader033/viewer/2022051013/547e43f9b4795989508b4b37/html5/thumbnails/33.jpg)
Contact information
Abe Usher
Email: [email protected]
Twitter: @abeusher
LinkedIn: http://goo.gl/DUxZOP
Presentations: http://goo.gl/bCa3Qt