Post on 13-Feb-2017
Big Data, Little Data, No Data – Who is in Charge of Data Quality?
World Data Systems Webinar #99 May 2016
Christine L. Borgman
Distinguished Professor & Presidential Chair in Information Studies
University of California, Los Angeles
Center for Knowledge Infrastructures
https://knowledgeinfrastructures.gseis.ucla.edu/
Andrea Scharnhorst, Head of Research
Data Archiving and Networked Services (DANS)
Netherlands
Long tail of data
[Figure: volume of data versus number of researchers]
Slide: The Institute for Empowering Long Tail Research
Open Data: OECD criteria
• Openness
• Flexibility
• Transparency
• Legal conformity
• Protection of intellectual property
• Formal responsibility
• Professionalism
• Interoperability
• Quality
• Security
• Efficiency
• Accountability
• Sustainability
Organization for Economic Cooperation and Development (2007)
http://www.oecd.org/science/sci-tech/38500813.pdf
Why sustain access to research data?
• Purposes
– Record of observations
– Reference
– Reproducibility of research
– Aggregate multiple sources
• Users
– Investigator
– Collaborators
– Unaffiliated or unknown others
• Time frame
– Months
– Years
– Decades
– Centuries
http://chandra.harvard.edu/photo/2013/kepler/kepler_525.jpg
Big Science <–> Little Science
Big Science
• Large instruments
• High cost
• Long duration
• Many collaborators
• Distributed work
• Centralized data collection
Little Science
• Small instruments
• Low cost
• Short duration
• Small teams
• Local work
• Decentralized data collection
Images: Sloan Digital Sky Survey; Sensor networks for science
C.L. Borgman (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. MIT Press
http://www.genome.gov/dmd/img.cfm?node=Photos/Graphics&id=85327
Data are representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship.
How to sustain data?
• Identify the form and content
• Identify related objects
• Interpret
• Evaluate
• Open
• Read
• Compute upon
• Reuse
• Combine
• Describe
• Annotate…
Image from Soumitri Varadarajan blog. Iceberg image © Ralph A. Clevenger. Flickr photo
Whose value in data?
The Stewardship Gap http://bit.ly/stewardshipgap
• Community norms and goals
• Who takes responsibility for data
• Resources available for stewardship
• Who takes what actions
• Knowledge about stewardship
• Who makes long-term commitments
Envisioning the Digital Data Archive of the Future: A Case Study of DANS Users
• Data Archiving and Networked Services
Andrea Scharnhorst, Henk van den Berg, Peter Doorn
• University of California, Los Angeles
Christine Borgman, Milena Golshan, Ashley Sands
• DANS visiting scholars
Herbert van de Sompel, Los Alamos National Lab
Andrew Treloar, Australian National Data Service
Research Questions
• Who contributes data to DANS, when, why, how, and to what effects?
• Who acquires data from DANS, when, why, how, and to what effects?
• What roles do DANS archivists play in acquiring, curating, and disseminating data?
Knowledge Infrastructures for Data Archiving and Data Sharing
Contributed data sets
Data Archive
Data sets selected for reuse
Economics of the Knowledge Commons
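Goods classified by exclusion (difficult or easy) and subtractability / rivalry (low or high):
• Exclusion difficult, subtractability low: Public Goods
– General knowledge
– Public domain data
• Exclusion difficult, subtractability high: Common-pool resources
– Libraries
– Data archives
• Exclusion easy, subtractability low: Toll or Club Goods
– Subscription journals
– Subscription data
• Exclusion easy, subtractability high: Private Goods
– Printed books
– Raw or competitive data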
Adapted from C. Hess & E. Ostrom (Eds.), Understanding knowledge as a commons: From theory to practice. MIT Press.
http://www.census.gov/population/cen2000/map02.gif
No Data
ncl.ucar.edu
http://onlineqda.hud.ac.uk/Intro_QDA/Examples_of_Qualitative_Data.php
Marie Curie’s notebook aip.org
hudsonalpha.org
Pisa Griffin
• Data not available
• Data not released
• Data not usable
Big Data, Little Data, No Data: Scholarship in the Networked World
• Part I: Data and Scholarship
– Ch 1: Provocations
– Ch 2: What Are Data?
– Ch 3: Data Scholarship
– Ch 4: Data Diversity
• Part II: Case Studies in Data Scholarship
– Ch 5: Data Scholarship in the Sciences
– Ch 6: Data Scholarship in the Social Sciences
– Ch 7: Data Scholarship in the Humanities
• Part III: Data Policy and Practice
– Ch 8: Releasing, Sharing, and Reusing Data
– Ch 9: Credit, Attribution, and Discovery
– Ch 10: What to Keep and Why
C.L. Borgman, MIT Press, 2015