Data Journalism: How to get data and process them?
Workshop on Data Journalism
February 17, 2014, Ghent
How to get the data and how to process them?
Lorenzo Pellizzari
About me …
Get the data
Receive it
Advanced search techniques
Scrape it
How to get the data?
Receive it
Analyzing the War Logs (Associated Press)
Advanced search techniques: Google
79,300,000 results
5 results
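The gap between the two result counts above presumably comes from narrowing a plain search with Google's search operators. As a minimal sketch, the snippet below assembles such a query using the `site:` and `filetype:` operators and an exact phrase in quotes. The search terms and the domain are hypothetical examples and do not come from the slides.

```python
from urllib.parse import urlencode


def build_google_query(terms, site=None, filetype=None, exact_phrase=None):
    """Combine plain terms with Google search operators into a search URL."""
    parts = []
    if exact_phrase:
        # Quotation marks ask for the exact phrase.
        parts.append(f'"{exact_phrase}"')
    parts.extend(terms)
    if site:
        # site: restricts results to one domain.
        parts.append(f"site:{site}")
    if filetype:
        # filetype: restricts results to one file format, e.g. spreadsheets.
        parts.append(f"filetype:{filetype}")
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


# Hypothetical example: spreadsheets about public spending on one domain.
url = build_google_query(
    ["budget"], site="europa.eu", filetype="xls", exact_phrase="public spending"
)
print(url)
```

Restricting by domain and file type is often the quickest way to find raw tables rather than articles about them.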
Advanced search techniques: SPARQL
http://dbpedia.org/sparql
http://latemar.science.unitn.it/spacetime/spacetime.html
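A minimal sketch of how the DBpedia endpoint listed above (http://dbpedia.org/sparql) could be queried from Python with the standard library only. The query itself is a hypothetical example (the ten most populous Belgian cities known to DBpedia) and is not taken from the slides. The helper names `build_request`, `parse_bindings` and `run_query` are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Hypothetical example query, not from the slides.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Belgium ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten the SPARQL JSON results format into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return the parsed rows (needs network access)."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(json.load(response))


# Usage (commented out because it needs network access):
# for row in run_query(ENDPOINT, QUERY):
#     print(row["city"], row["population"])
```

What the query returns depends on the current state of DBpedia, so the rows should be checked against another source before publication.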
Freedom of Information laws
Scrape your data
“Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.” (Wikipedia)
http://www-news.iaea.org/
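A minimal sketch of the extraction step, using only Python's standard library. The markup of the site above is not shown in the slides, so the example parses a made-up HTML table (`SAMPLE_HTML`) instead. The class name `TableScraper` and the sample rows are hypothetical.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample page; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>2014-01-01</td><td>Example event A</td></tr>
  <tr><td>2014-01-02</td><td>Example event B</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table contents as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
print(rows)
```

In practice the HTML would come from a download (for example with `urllib.request.urlopen`), and the site's terms of use should be checked before scraping it.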
Process the data
Poll: What analytics, data mining, or Big Data software have you used in the past 12 months for a real project (not just evaluation)? [798 voters]
http://www.kdnuggets.com/
The software for data analysis
Share of R- or SAS-related posts to Stack Overflow by week.
http://r4stats.com/articles/popularity/
Example: ABC News
Scraping: Main data coming from governmental websites
Variety of reports: Data on salt and water
FOI: Data on chemical releases
Interactive map of gas wells and leases in Australia
http://datajournalismhandbook.org/
Example: ABC News
• A web developer and designer
• A lead journalist
• A part-time researcher with expertise in data extraction, Excel spreadsheets and data cleaning
• A part-time junior journalist
• A consultant executive producer
• An academic consultant with expertise in data mining, graphic visualization and advanced research skills
• The services of a project manager and the administrative assistance of the ABC’s multi-platform unit
• Importantly, we also had a reference group of journalists and others whom we consulted on a needs basis
http://datajournalismhandbook.org/