Data Journalism 2: cleaning, combining, communicating

Monday, 5 March 2012

Transcript of Data Journalism 2: cleaning, combining, communicating

Page 1: Data Journalism 2: cleaning, combining, communicating

Page 2: Data Journalism 2: cleaning, combining, communicating

Watch: West Wing on maps www.youtube.com/watch?v=n8zBC2dvERM

Page 3: Data Journalism 2: cleaning, combining, communicating

Online Journalism
City University
Paul Bradshaw

Data 2: clean, combine, communicate

Page 4: Data Journalism 2: cleaning, combining, communicating

5 things you need to know about each
Data journalism in action
Walkthrough

Themes

Page 5: Data Journalism 2: cleaning, combining, communicating

How clean is the data?

Page 8: Data Journalism 2: cleaning, combining, communicating

“With the help of just Benford’s law and data sets to compare he’s able to demonstrate how the police are systematically hiding over a thousand murders a year in a single state, and that’s just in one small part of the article”

- Pete Warden

Page 9: Data Journalism 2: cleaning, combining, communicating

http://delicious.com/paulb/benfordslaw
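
A minimal Python sketch of the kind of first-digit check that Benford's law makes possible. The helper names and the sample values (powers of 2, which are known to follow the law closely) are illustrative assumptions, not the analysis the quote refers to.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of values whose first digit is `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)


def first_digit(value):
    """Return the leading non-zero digit of a non-zero number."""
    digits = str(abs(value)).lstrip("0.")
    return int(digits[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1-9 in `values` (zeros are skipped)."""
    counts = Counter(first_digit(v) for v in values if v)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative sample only: the first hundred powers of 2
sample = [2 ** n for n in range(1, 101)]
observed = first_digit_shares(sample)

# Print digit, observed share, expected share side by side
for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_expected(d), 3))
```

A large gap between the observed and expected columns is a prompt to ask questions of the source, not proof of manipulation: many legitimate datasets do not follow Benford's law.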

Page 10: Data Journalism 2: cleaning, combining, communicating

1. Data always needs cleaning up
2. Treat the ‘source’ like a source
3. Use the right ‘average’ and percentage
4. Watch for changing context: inflation, boundaries, classification
5. Always work on copies of raw data

5 things you need to know about cleaning data
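
Points 3 and 4 can be sketched in a few lines of Python. All figures below are hypothetical, and the function names are assumptions made for illustration.

```python
from statistics import mean, median

# Hypothetical salaries: one very high earner drags the mean upwards
salaries = [18000, 21000, 22000, 24000, 25000, 250000]
mean_salary = mean(salaries)      # sensitive to the outlier
median_salary = median(salaries)  # the middle of the distribution


def percentage_change(old, new):
    """Relative change from `old` to `new`, as a percentage of `old`."""
    return (new - old) * 100 / old


def percentage_point_change(old_rate, new_rate):
    """Absolute difference between two rates that are already percentages."""
    return new_rate - old_rate


def in_todays_money(amount, index_then, index_now):
    """Adjust an old amount for inflation using a price index (hypothetical values)."""
    return amount * index_now / index_then


print(mean_salary, median_salary)
# A rate rising from 10% to 12% is 2 percentage points, but a 20% increase
print(percentage_point_change(10, 12), percentage_change(10, 12))
print(in_todays_money(1000, index_then=100, index_now=125))
```

Which 'average' is right depends on the question: the median describes the typical case, while the mean is what you need for totals.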

Page 11: Data Journalism 2: cleaning, combining, communicating

Page 12: Data Journalism 2: cleaning, combining, communicating

“What the Independent have done is confuse the UK’s deficit with our debt [making] the debt problem look around eight times worse than it is. And it used the whole of its front page to do so.”

- James Ball

Page 14: Data Journalism 2: cleaning, combining, communicating

A town has two hospitals. Hospital A is bigger than hospital B. At one of them, 60% of the babies born are boys. Which hospital is it more likely to be?

Question?

Page 15: Data Journalism 2: cleaning, combining, communicating

The smaller hospital is more likely to record 60% boys: small samples vary more, while larger samples stay closer to the expected rate of roughly 50%.

Question?
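
The answer can be checked with an exact binomial calculation. This is a sketch under stated assumptions: the hospital sizes (15 and 45 births) and a 50% chance of a boy are hypothetical values chosen for illustration, not figures from the slides.

```python
from math import comb


def prob_at_least_percent_boys(births, percent=60, p_boy=0.5):
    """Exact probability that at least `percent`% of `births` are boys."""
    # Smallest whole number of boys that reaches the threshold (integer ceiling)
    threshold = (births * percent + 99) // 100
    return sum(
        comb(births, k) * p_boy ** k * (1 - p_boy) ** (births - k)
        for k in range(threshold, births + 1)
    )


small = prob_at_least_percent_boys(15)  # hypothetical smaller hospital
large = prob_at_least_percent_boys(45)  # hypothetical larger hospital
print(round(small, 3), round(large, 3))
```

The smaller sample reaches the 60% mark far more often, which is why extreme rates from small areas or short time periods deserve extra caution.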

Page 16: Data Journalism 2: cleaning, combining, communicating

http://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/

Page 17: Data Journalism 2: cleaning, combining, communicating

Measurement doesn't answer anything if there's only one variable
Statistical significance
Sample size and selection
Controls and the placebo effect
Regression to the mean
Read up.

What is the data worth?

Page 18: Data Journalism 2: cleaning, combining, communicating

Data > Text to columns or =SPLIT
Find & replace
=IF(condition, if met, if not)
=TRIM, =CONCATENATE
=RIGHT, =LEFT, =MID
=REPLACE, =SUBSTITUTE
=LEN

Getting data ready to answer questions
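
For comparison, rough Python equivalents of the spreadsheet functions listed on this slide. This is a sketch only: the helper names and sample values are assumptions, and the spreadsheet versions differ in small details (for example, `split()` here collapses any whitespace, not just spaces).

```python
def trim(text):
    """Roughly =TRIM: drop leading/trailing whitespace and collapse repeats."""
    return " ".join(text.split())


def left(text, n):
    """Roughly =LEFT: the first n characters."""
    return text[:n]


def right(text, n):
    """Roughly =RIGHT: the last n characters."""
    return text[max(len(text) - n, 0):]


def mid(text, start, length):
    """Roughly =MID: `length` characters from 1-based position `start`."""
    return text[start - 1:start - 1 + length]


# Sample values for illustration only
raw_name = "  Bradshaw,   Paul "
raw_amount = "£1,200"
postcode = "EC1V 0HB"

name = trim(raw_name)                  # tidy the spacing
surname, forename = name.split(", ")   # text to columns / =SPLIT
full_name = forename + " " + surname   # =CONCATENATE
amount = int(raw_amount.replace("£", "").replace(",", ""))  # =SUBSTITUTE
size = "large" if amount > 1000 else "small"                # =IF
print(full_name, amount, size, left(postcode, 4), right(postcode, 3), len(name))
```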

Page 19: Data Journalism 2: cleaning, combining, communicating

Edit cells > common transforms
Edit cells > split multi-valued cells
Facet > text facet
Export...

Walkthrough: cleaning data in Google Refine
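
To show the idea behind a text facet, here is a Python sketch that counts each distinct spelling and then groups likely variants under a normalised key. It is a simplified illustration of key-based grouping, not Google Refine's own implementation; the `fingerprint` helper and the sample names are assumptions.

```python
import string
from collections import Counter, defaultdict


def fingerprint(value):
    """Normalise a value: trim, lowercase, strip punctuation, sort unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


# Hypothetical column of messy organisation names
names = [
    "City University",
    "city university ",
    "University, City",
    "City  University.",
    "Birmingham City University",
]

# The facet view: every distinct raw spelling with its count
facet = Counter(names)

# Group raw spellings that share a fingerprint so they can be reviewed together
clusters = defaultdict(list)
for raw in names:
    clusters[fingerprint(raw)].append(raw)

for key, variants in clusters.items():
    print(key, "<-", variants)
```

Grouping only suggests candidates: a person still needs to confirm that the variants really refer to the same thing before merging them.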

Page 20: Data Journalism 2: cleaning, combining, communicating

Communicating data stories

Page 21: Data Journalism 2: cleaning, combining, communicating

Page 22: Data Journalism 2: cleaning, combining, communicating

1. Choose the chart for the purpose
2. For answers or for story?
3. Good design is when there’s nothing more to take away
4. It should be self-contained & have refs
5. Be careful with scales and classes

5 things you need to know about visualising data
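
One way to see why point 5 matters is to work out how a truncated axis changes what the reader sees. A small Python sketch, with hypothetical values and a hypothetical helper name:

```python
def drawn_ratio(first, second, baseline=0):
    """How many times taller the second bar looks when the axis starts at `baseline`."""
    return (second - baseline) / (first - baseline)


# Hypothetical values: a rise from 50 to 55, which is a 10% increase
honest = drawn_ratio(50, 55)                  # axis starts at zero
truncated = drawn_ratio(50, 55, baseline=45)  # axis starts at 45
print(round(honest, 2), round(truncated, 2))
```

With the axis starting at 45, the second bar is drawn twice as tall as the first even though the underlying value rose by only a tenth.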

Page 23: Data Journalism 2: cleaning, combining, communicating

or http://chartchooser.juiceanalytics.com/

Page 24: Data Journalism 2: cleaning, combining, communicating

http://junkcharts.typepad.com/junk_charts/trifecta-checkup/

Page 25: Data Journalism 2: cleaning, combining, communicating

What is wrong with this picture?

Page 26: Data Journalism 2: cleaning, combining, communicating

Page 27: Data Journalism 2: cleaning, combining, communicating

http://simplecomplexity.net/statistics-without-context/

Page 28: Data Journalism 2: cleaning, combining, communicating

Page 29: Data Journalism 2: cleaning, combining, communicating

ManyEyes, Tableau, Number Picture
Wordle, Tagxedo
BatchGeo, FusionTables
Gephi
Delicious.com/paulb/vis+tools

Visualisation tools

Page 30: Data Journalism 2: cleaning, combining, communicating

Publish embed code & link to data
Have or join a Flickr group for visualisations, comment on others
Tumblr blog
Digg, Reddit, Stumbleupon
Buzzdata

Distribution: getting social

Page 31: Data Journalism 2: cleaning, combining, communicating

Mashing data

Page 32: Data Journalism 2: cleaning, combining, communicating

1. It is what a journalist does best
2. Look for a point of connection: place? Person? Company? Date? Code?
3. Mashups can be live, updated or static
4. What an API can do
5. What APIs there are

5 things you need to know about mashing data
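
Point 2 is essentially a join on a shared key. A minimal Python sketch, assuming two hypothetical datasets that share an area code; the codes, figures and the `join_on` helper are invented for illustration.

```python
# Hypothetical datasets whose point of connection is an area code
spending = [
    {"code": "A01", "spend": 1500},
    {"code": "A02", "spend": 900},
    {"code": "A04", "spend": 500},
]
population = [
    {"code": "A01", "population": 400},
    {"code": "A02", "population": 300},
    {"code": "A03", "population": 250},
]


def join_on(left_rows, right_rows, key):
    """Combine rows sharing `key`; also return left rows with no match."""
    index = {row[key]: row for row in right_rows}
    matched, unmatched = [], []
    for row in left_rows:
        other = index.get(row[key])
        if other is None:
            unmatched.append(row)
        else:
            matched.append({**other, **row})
    return matched, unmatched


matched, unmatched = join_on(spending, population, "code")
for row in matched:
    # Combining the two sources allows a new figure: spend per head
    row["spend_per_head"] = row["spend"] / row["population"]
print(matched)
print("No match:", unmatched)
```

The unmatched rows are worth as much attention as the matched ones: they often reveal boundary changes, typos or differing code schemes.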

Page 33: Data Journalism 2: cleaning, combining, communicating

Page 34: Data Journalism 2: cleaning, combining, communicating

Yahoo! Pipes, xFruits
OpenHeatMap
Mapalist, Maptube, FusionTables
Scraperwiki
Google Refine

Mashup tools

Page 35: Data Journalism 2: cleaning, combining, communicating

Edit column > Add column by fetching URLs
Use GREL (Google Refine Expression Language)
Search web for help & examples

Walkthrough: grabbing geo data with Google Refine
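
The same two steps, building one URL per row and then pulling values out of the response, can be sketched in Python. Everything specific here is an assumption: the endpoint is a placeholder, and the JSON layout and coordinates are invented, so check the documentation of whichever geocoding service you actually use.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint, not a real service
BASE_URL = "https://geocoder.example.com/lookup"


def build_url(address):
    """Build the request URL for one row's address (the 'fetching URLs' step)."""
    return BASE_URL + "?" + urlencode({"q": address, "format": "json"})


def extract_lat_lng(response_text):
    """Pull the first result's coordinates out of a JSON response (assumed layout)."""
    data = json.loads(response_text)
    results = data.get("results", [])
    if not results:
        return None
    location = results[0]["location"]
    return location["lat"], location["lng"]


# Invented response in the assumed layout, so the sketch runs without a network call
sample_response = '{"results": [{"location": {"lat": 51.5, "lng": -0.1}}]}'

print(build_url("Northampton Square, London"))
print(extract_lat_lng(sample_response))
print(extract_lat_lng('{"results": []}'))
```

Handling the empty-result case explicitly matters in practice, because some addresses in any real dataset will fail to geocode.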

Page 36: Data Journalism 2: cleaning, combining, communicating

Questions?

Page 37: Data Journalism 2: cleaning, combining, communicating

Links

OnlineJournalismClasses.tumblr.com
Delicious.com/paulb/cityoj09
Delicious.com/paulb/datajournalism
Delicious.com/paulb/visualisation
Delicious.com/paulb/statistics
Delicious.com/paulb/mashups

Page 38: Data Journalism 2: cleaning, combining, communicating

Before the lab: play with these techniques yourself, have problems, find solutions, raise questions.
Install Google Refine and Tableau on your laptop to use.
- Visualise, interrogate or mash data

Lab

Page 39: Data Journalism 2: cleaning, combining, communicating

Books

Kaiser Fung - Numbers Rule Your World
Ben Goldacre - Bad Science
Donna Wong - The WSJ Guide to Information Graphics
Brian Suda - A Practical Guide to Designing with Data
