Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in...

27
Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License Infrastructure, Curation, and Metadata Mark A. Parsons NSF Workshop on Cyberinfrastructure for Polar Sciences 10 September 2013

Transcript of Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in...

Page 1: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License

Infrastructure, Curation, and Metadata

Mark A. Parsons

NSF Workshop on Cyberinfrastructure for Polar Sciences10 September 2013

Page 2: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

From Interregional Highways: Message from the President of the United States Transmitting a Report of the National Interregional Highway Committee, Outlining and Recommending a National System of Interregional Highways, 12 Jan. 1944.CC-BY Eric Fischer http://www.flickr.com/photos/walkingsf/8270270785/

Page 4: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Infrastructure is

Relationships, interactions, and connectionsbetween humans, technologies, and institutions

Page 5: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Interchangecc-by-sa Steven Vance http://www.flickr.com/photos/jamesbondsv/8475376363/

Page 6: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Ranch ExitCC-BY-SA Ken Lund http://www.flickr.com/photos/kenlund/2381991900/

Page 7: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Understanding Infrastructure: Dynamics, Tensions, and Design

2007Paul Edwards, Steven Jackson, Geoffrey Bowker,

and Cory Knobelhttp://hdl.handle.net/2027.42/49353

Page 8: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 9: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 10: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 11: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 12: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 13: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 14: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 15: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 16: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 17: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 18: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data
Page 19: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

datawrangler

Page 20: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Data Management for CLPX—AGU 11 December 2003

Example of Snow Pit DataWe start with this:

Page 21: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Data Management for CLPX—AGU 11 December 2003

Example of Snow Pit Data

… …

Summary Table

Density Table

Stratigraphy Table

Temperature Table

And create this:

Page 22: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Curation is

continual and conscious improvement of data.

Page 23: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Curation matters

• Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data sheets had issues corrected in the field that would not have been correctable later).

• Reveals and documents tacit knowledge. Can even uncover science questions.

• Integrated much more than the field data with common grids and formats

• “Standard” formats often do not exist and need to be created by the community. Sometimes you need multiple formats.

• A mediated relationship between user and collector.

• Need to include curators in data collection even if they are not in the field.

• Get to know your local curator.

Page 24: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Metadata are

Everything necessary to make data independently understandable by a designated community.

Page 25: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

Infrastructure is comprised of relationships.

Curators create and enhance relationships.

Metadata is often the information shared (often tacitly) in those relationships.

Page 26: Infrastructure, Curation, and Metadata · Curation matters • Embedding “data wranglers” in the field significantly improved data quality and completeness. (~20% of the data

A standard sensor format?

• Temporally varying sensors (e.g. a borehole)

• Spatially varying sensors (e.g. a transect)

• Temporally and spatially varying sensors (e.g. a drifting buoy)