#DataVizInSixWeeks, Wk 5 - Data
#DataVizInSixWeeks Copyright Anne Stevens
Week One
What is data visualization? Historical context
Week Two
Visualization types
Week Three
Perception and cognition
Week Four
Design issues & best practices
Week Five
Big data, data management
Week Six
Synthesis
Data Viz In Six Weeks
An Introduction to Visual Analytics course taught at OCAD University, Toronto
By Anne Stevens
Social media: early warning
Source: MIT, Health Map, healthmap.org
Social media: early warning
Source: BioDiaspora, biodiaspora.com/
Data activism - Ushahidi
Source: Kenyan elections, 2010 – Ushahidi, ilissafrica.wordpress.com/2011/03/23/crowdsourcing-with-ushahidi/
Crowdsourced data - Ushahidi
Source: Haiti earthquake crisis map – Ushahidi, theextremecentrist.wordpress.com/
Data driven journalism
www.theguardian.com/news/datablog/2011/jul/28/data-journalism
Open data movement
www.twoviewsbeyond.com/safi/charlie-hebdo-right-say-dumb-stuff-four-interesting-dumb-commentaries/
The 3 V’s: Volume, Variety, Velocity
Source: datasciencecentral.com/forum/topics/the-3vs-that-define-big-data
Finding patterns in all the noise
[Pole] ran test after test, analyzing the data, and before long some useful patterns emerged. Lotions, for example. Lots of people buy lotion, but one of Pole’s colleagues noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.
Source: forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
Big Data challenges
Very informal data
Very messy data
Echo chamber effect
Not a representative sample of society
Volatility
Curation, not content, creates value
Get Data
Clean it
Combine it
Explore
Visualize/Analyse
Maintain
Open Data Portals, e.g. City of Toronto
Scrape social media
APIs
Scrape websites
XPath code + Scrape Similar Chrome Extension + Google Docs
RSS Feeds
Create your own data (e.g. Survey Monkey)
Sensor data, GPS, mobile phone data (not just names and numbers)
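As a rough illustration of the "Scrape websites" item, the sketch below pulls rows out of a page with XPath-style paths, using the limited XPath subset in Python's standard `xml.etree.ElementTree`. The page fragment, its class names, and its figures are made up for the example; real pages are rarely well-formed XML and usually need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a scraped web page.
PAGE = """
<html>
  <body>
    <table class="stats">
      <tr><td class="city">Toronto</td><td class="pop">2800000</td></tr>
      <tr><td class="city">Ottawa</td><td class="pop">1000000</td></tr>
    </table>
    <p class="note">Figures are illustrative only.</p>
  </body>
</html>
"""


def scrape_rows(page_source):
    """Return (city, population) pairs selected with XPath-style paths."""
    root = ET.fromstring(page_source)
    rows = []
    # ".//table[@class='stats']/tr" selects every row of the stats table.
    for tr in root.findall(".//table[@class='stats']/tr"):
        city = tr.find("td[@class='city']").text
        pop = int(tr.find("td[@class='pop']").text)
        rows.append((city, pop))
    return rows


rows = scrape_rows(PAGE)
print(rows)
```

The same path expressions are the kind of thing the Scrape Similar extension or a Google Docs import formula would take as input.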
Big Data is messy data
Open data is messy
Excel tools
Google Refine
Structure unstructured data
Restructure the data set
Data Wrangler
Google Refine
Excel Pivot Tables
Tableau Reshaper
Excel cleaning tools
Text to Columns (a split function)
Remove Duplicates
=SUBSTITUTE(cell ref, "to be replaced", "replaced with this")
=FIND("character to find position of", cell ref)
=LEFT(cell ref, number of characters to grab from left side)
=RIGHT(cell ref, number of characters to grab from right side)
=LEN(cell ref)
=CONCATENATE(1st thing, 2nd thing, 3rd thing, …)
Paste Special -> Values
=TRIM(cell ref)
=VALUE(cell ref)
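For anyone cleaning outside Excel, the functions above have close Python counterparts. This is a minimal sketch, not a full emulation: the helper names are invented for the example, `value` only handles plain numbers, and the sample string is made up.

```python
def text_to_columns(text, delimiter):
    """Like Text to Columns: split one cell into several."""
    return text.split(delimiter)


def remove_duplicates(items):
    """Like Remove Duplicates: keep the first occurrence, preserve order."""
    return list(dict.fromkeys(items))


def substitute(text, old, new):
    """Like =SUBSTITUTE: replace every occurrence of old with new."""
    return text.replace(old, new)


def find(needle, text):
    """Like =FIND: 1-based position; raises ValueError if not found."""
    return text.index(needle) + 1


def left(text, n):
    """Like =LEFT: first n characters."""
    return text[:n]


def right(text, n):
    """Like =RIGHT: last n characters."""
    return text[-n:] if n > 0 else ""


def concatenate(*parts):
    """Like =CONCATENATE: join the parts into one string."""
    return "".join(str(p) for p in parts)


def trim(text):
    """Like =TRIM: drop leading/trailing spaces, collapse runs of spaces."""
    return " ".join(part for part in text.split(" ") if part)


def value(text):
    """Like =VALUE for plain numbers only (a simplification)."""
    return float(text)


# Made-up messy cell: place and a number with thousands separators.
raw = "  Toronto,  ON ; 2,800,000 "
cleaned = trim(raw)
place, number = text_to_columns(cleaned, " ; ")
city = left(place, find(",", place) - 1)
province = right(place, 2)
population = value(substitute(number, ",", ""))
print(city, province, population, len(place))
```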
Restructure the data set
Make data as RAW as possible
One row of headers
Convert section headers to columns
Eliminate empty cells & rows
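The restructuring steps above can be sketched in a few lines. The sheet below is invented for illustration: it has section-header rows (a province name alone on a row) and a blank row, and the function turns the section header into a column under a single header row. The column names are assumptions, not part of the course material.

```python
# Hypothetical spreadsheet export: section headers and a blank row mixed in.
HEADER = ["city", "visits"]
SHEET = [
    ["Ontario", ""],
    ["Toronto", "120"],
    ["Ottawa", "45"],
    ["", ""],
    ["Quebec", ""],
    ["Montreal", "98"],
]


def flatten(sheet, header, section_name="province"):
    """Return rows with one header row and the section as its own column."""
    tidy = [[section_name] + header]
    section = None
    for row in sheet:
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # eliminate empty rows
        if cells[0] and not any(cells[1:]):
            section = cells[0]  # a section header row: remember it
            continue
        tidy.append([section] + cells)
    return tidy


tidy = flatten(SHEET, HEADER)
for row in tidy:
    print(row)
```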
Source: Data Wrangler, http://vis.stanford.edu/wrangler/
Connect data from different sources
Make structure & syntax consistent
Structure unstructured data
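A small sketch of what "make structure & syntax consistent" can look like in practice: two made-up sources spell city names differently, so the keys are normalized before joining. All names and figures are hypothetical, and the normalization rule is only one possible choice.

```python
def normalize_key(name):
    """Lower-case, drop periods, and collapse spaces so keys can match."""
    return " ".join(name.lower().replace(".", "").split())


# Hypothetical source A: population keyed by inconsistently typed city names.
population = {"Toronto ": 2800000, "OTTAWA": 1000000}
# Hypothetical source B: (city, library count) pairs from another provider.
libraries = [("toronto", 100), (" Ottawa", 33), ("Hamilton", 22)]


def combine(population, libraries):
    """Join the two sources on the normalized city name."""
    pop_by_key = {normalize_key(k): v for k, v in population.items()}
    merged, unmatched = [], []
    for city, count in libraries:
        key = normalize_key(city)
        if key in pop_by_key:
            merged.append(
                {"city": key, "population": pop_by_key[key], "libraries": count}
            )
        else:
            unmatched.append(city)  # keep track of rows that failed to join
    return merged, unmatched


merged, unmatched = combine(population, libraries)
print(merged)
print(unmatched)
```

Keeping the unmatched rows visible, rather than silently dropping them, makes it easier to spot vocabulary mismatches between sources.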
Combining data sets
Provides context that can lead to new insight
Presents a lot of challenges
Formal vs informal data
Structured vs unstructured data
Social media is typically informal and unstructured
Data viz needs structured data
Combining data: challenges
Combining structured with unstructured data
Non-standard vocabulary, units, accuracy
$ values from different years have to be adjusted for inflation
Don’t mix weighted & unweighted data
Don’t mix raw and normalized data
MAUP (modifiable areal unit problem)
Licensing issues
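The inflation point in the list above comes down to one ratio: multiply the original amount by the price index of the target year divided by the index of the source year. The sketch below assumes a made-up price index; real work would use a published consumer price index.

```python
# Hypothetical price index (base year 2000 = 100); not real CPI figures.
PRICE_INDEX = {2000: 100.0, 2005: 110.0, 2010: 125.0}


def adjust_for_inflation(amount, from_year, to_year, index=PRICE_INDEX):
    """Express an amount from from_year in to_year dollars."""
    return amount * index[to_year] / index[from_year]


# $200 spent in 2000, expressed in 2010 dollars under the assumed index.
adjusted = adjust_for_inflation(200.0, 2000, 2010)
print(adjusted)
```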
Probe into data
Histograms for variable distribution
Log vs. linear axes scales
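To see why the choice between log and linear scales matters, the sketch below bins the same made-up, heavily skewed values two ways: fixed-width bins and power-of-ten bins. The values and the bin width are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical skewed sample: most values are small, a few are very large.
VALUES = [3, 7, 12, 45, 80, 150, 900, 4000, 25000]


def linear_bins(values, width):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    return Counter((v // width) * width for v in values)


def log_bins(values):
    """Count positive values per power-of-ten bin, keyed by the exponent."""
    return Counter(math.floor(math.log10(v)) for v in values)


linear = linear_bins(VALUES, 10000)
logged = log_bins(VALUES)

print("Linear bins (width 10000):")
for edge in sorted(linear):
    print(f"{edge:>6}+ {'#' * linear[edge]}")

print("Log bins (powers of ten):")
for exponent in sorted(logged):
    print(f"10^{exponent} {'#' * logged[exponent]}")
```

With linear bins nearly everything lands in the first bar; the log bins spread the same values out so the shape of the distribution is visible.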
Use existing chart libraries (Tableau etc.)
Create original visualizations (D3.js, Processing etc.)
Test with sample data sets
Update
Maintain
Check
Resources
XPath tutorials
annielytics.com/blog/google-docs/how-to-scrape-the-web-using-google-docs/
w3schools.com/xpath/default.asp
distilled.net/blog/distilled/guide-to-google-docs-importxml/
Google Refine: OpenRefine
openrefine.org
Tutorial: http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial
Scraper / Scrape Similar
chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd
mnmldave.github.io/scraper/
Data cleaning
schoolofdata.org/courses
Data Viz In Six Weeks
An Introduction to Visual Analytics course taught at OCAD University, Toronto
By Anne Stevens
stevensanne.com
stevensanne.com/blog/
@3_ring_binder