Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in...
Transcript of Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in...
Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
Infrastructure, Curation, and Metadata
Mark A. Parsons
NSF Workshop on Cyberinfrastructure for Polar Sciences10 September 2013
From Interregional Highways: Message from the President of the United States Transmitting a Report of the National Interregional Highway Committee, Outlining and Recommending a National System of Interregional Highways, 12 Jan. 1944.CC-BY Eric Fischer http://www.flickr.com/photos/walkingsf/8270270785/
http://www.shockblast.net/aerial-photographs/urban-sprawl-by-christoph-gielen-arizona/
Infrastructure is
Relationships, interactions, and connectionsbetween humans, technologies, and institutions
Interchangecc-by-sa Steven Vance http://www.flickr.com/photos/jamesbondsv/8475376363/
Ranch ExitCC-BY-SA Ken Lund http://www.flickr.com/photos/kenlund/2381991900/
Understanding Infrastructure: Dynamics, Tensions, and Design
2007Paul Edwards, Steven Jackson, Geoffrey Bowker,
and Cory Knobelhttp://hdl.handle.net/2027.42/49353
datawrangler
Data Management for CLPX—AGU 11 December 2003
Example of Snow Pit DataWe start with this:
Data Management for CLPX—AGU 11 December 2003
Example of Snow Pit Data
… …
Summary Table
Density Table
Stratigraphy Table
Temperature Table
And create this:
Curation is
continual and conscious improvement of data.
Curation matters
• Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data sheets had issues corrected in the field that would not have been correctable later).
• Reveals and documents tacit knowledge. Can even uncover science questions.
• Integrated much more than the field data with common grids and formats
• “Standard” formats often do not exist and need to be created by the community. Sometimes you need multiple formats.
• A mediated relationship between user and collector.
• Need to include curators in data collection even if they are not in the field.
• Get to know your local curator.
Metadata are
Everything necessary to make data independently understandable by a designated community.
Infrastructure is comprised of relationships.
Curators create and enhance relationships.
Metadata is often the information shared (often tacitly) in those relationships.
A standard sensor format?
• Temporally varying sensors (e.g. a borehole)
• Spatially varying sensors (e.g. a transect)
• Temporally and spatially varying sensors (e.g. a drifting buoy)