IEDA Overview & Updates, March 2014


IEDA Overview presentation given by Kerstin Lehnert at the March 2014 IEDA Policy Committee Meeting. Location: NSF, Arlington, VA.

Transcript of IEDA Overview & Updates, March 2014

Page 1: IEDA Overview & Updates, March 2014

IEDA: OVERVIEW & UPDATES

Page 2: IEDA Overview & Updates, March 2014

IEDA Supports the Full Data Life Cycle


Page 3: IEDA Overview & Updates, March 2014

Domain-Specific Data Stewardship

• Domain-specific guidelines, templates, software tools, and user support/training that facilitate data submission
  • including domain-specific tools for data management planning and compliance reporting
• Development, maintenance, and promotion of domain-specific, community-based standards for data and metadata
  • Provenance documentation, uncertainties, semantics (vocabularies, taxonomy), formats
• User interfaces optimized for science questions
• Harmonization & integration of data for advanced mining & analysis
• Access to external data in relevant other
• Mapping of data to standards-based interfaces for interoperability

Page 4: IEDA Overview & Updates, March 2014

Central Role of Discipline-specific Repositories

[Diagram: the Domain-specific Repository shown in relation to the following groups]
• Science Community
• Libraries, Archives
• Computer Science
• Publishers, editors
• Funding Agencies
• Data Facilities
• Registries

Functions labeled in the diagram:
• Metadata registration
• Software (tool) development
• Interoperability
• Data policies
• Persistent access
• Bibliometrics
• Data Curation
• Data access & discovery
• Data products
• Data harmonization (standards)
• User Support

Page 5: IEDA Overview & Updates, March 2014

IEDA Foci

Data Discovery & Access

Data Preservation & Curation

Data Analysis

Investigator Support

• QA/QC, documentation
• Persistent identification (DOI)
• Long-term archiving

Page 6: IEDA Overview & Updates, March 2014

Marine Geoscience Data System

Data Collections and Custom Data Access:

• GeoPRISMS
• Ridge2000
• MARGINS
• Academic Seismic Portal (ASP)
• Antarctic and Southern Ocean (ASODS)

• Metadata Catalog and File repository

• Catalog inventory > 0.5 million files, 47 TB, 2,500 programs

Page 7: IEDA Overview & Updates, March 2014

EarthChem Library

• Repository for geochemical data
  • analytical data sets
  • syntheses
  • models
  • reports

• Online data submission
• Templates for data annotations
• Quality control following the Editors Roundtable best practices

Page 8: IEDA Overview & Updates, March 2014

IGSN / SESAR

• IGSN: Unique, persistent, resolvable identifiers
• SESAR: registry of samples in the Earth Sciences
• Searchable catalog of samples across the Earth Sciences
• Preservation and persistent access of sample metadata

• Used across all Earth Science communities that deal with samples
• User services for sample metadata management
  • submission, editing, transfer of ownership, tracking of subsamples, etc.

• International governance by the IGSN e.V.
  • non-profit organization, founded in 2011, registered in Germany
  • currently 13 members (4 new members in 2013)

Page 9: IEDA Overview & Updates, March 2014

IEDA Foci

Data Discovery & Access

Data Preservation & Curation

Data Analysis

Investigator Support

• Web-based user interfaces (specialists & non-specialists)
• Programmatic access interfaces (interoperability)
• GeoMapApp, GoogleEarth, etc.
• Links to the literature

Page 10: IEDA Overview & Updates, March 2014


New in 2013: IEDA Data Browser

Page 11: IEDA Overview & Updates, March 2014

IEDA Foci

Data Discovery & Access

Data Preservation & Curation

Data Analysis

Investigator Support

• Visualization tools (GeoMapApp, Virtual Ocean, Earth Observer)

• Syntheses & Products


Page 12: IEDA Overview & Updates, March 2014

Global Multi-Resolution Topography Synthesis


Compilation of multi-beam sonar data collected by scientists and institutions worldwide, edited and merged into a single continuously updated compilation of high-resolution seafloor topography.

Page 13: IEDA Overview & Updates, March 2014

Global Synthesis of rock compositions (EarthChem, PetDB)

• Map of basalt samples from mid-ocean ridges
• Color scaled to the ⁸⁷Sr/⁸⁶Sr ratio measured on these samples

Page 14: IEDA Overview & Updates, March 2014

IEDA Foci

Data Discovery & Access

Data Preservation & Curation

Data Analysis

Investigator Support

• Web-based data submission
• Data Management Plan tool
• Data Compliance Report tool
• Community

Page 15: IEDA Overview & Updates, March 2014

Use of DMP Tool

Target Program   Count
other               27
OIA                  3
OCE                134
OPP                 11
SBE                  1
BIO                  5
EAR                 94
Total              275
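
As a quick check on the DMP Tool counts above, the per-program numbers can be summed and expressed as shares of the total. This is a minimal sketch in Python: the counts are transcribed from the slide, while the variable names are illustrative and not part of the presentation.

```python
# Counts of DMP Tool use by target program, transcribed from the slide.
dmp_counts = {
    "other": 27,
    "OIA": 3,
    "OCE": 134,
    "OPP": 11,
    "SBE": 1,
    "BIO": 5,
    "EAR": 94,
}

# The slide reports a total of 275; recompute it from the rows.
total = sum(dmp_counts.values())

# Share of each program as a percentage of the total, rounded to one decimal.
shares = {program: round(100.0 * count / total, 1)
          for program, count in dmp_counts.items()}

# Print the programs from most to least used.
for program, count in sorted(dmp_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{program:<6}{count:>5}{shares[program]:>7.1f}%")
print(f"{'Total':<6}{total:>5}")
```

The recomputed total matches the 275 given on the slide, and OCE and EAR together account for most of the recorded use.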

Page 16: IEDA Overview & Updates, March 2014

IEDA Infrastructure

• Cooperative Agreement with NSF
• Sustainable funding
• Formal community governance & guidance

• Professional data management policies & procedures
• Persistent identification of data & samples (DOI, IGSN)
• Standards-compliant metadata catalog
• Long-term archiving agreements with National Geophysical Data Center & Columbia University Libraries
• Risk management

• “Accreditation” as member of the World Data System
• Disciplinary expertise

Page 17: IEDA Overview & Updates, March 2014

System Usage

• Number of unique visitors to the IEDA web site increased by 251%
  • 7,998 unique visitors between Oct 2012 and Sept 2013
  • primary pages accessed: Data Management Plan & IEDA collections

Results of the user survey of the project “Stakeholder Alignment in the Geosciences: Assessing the Potential Impacts of EarthCube”, showing that IEDA ranks among the top 5-8 most cited data sources in the Earth Sciences (J. Cutcher-Gershenfeld, presentation at the EarthCube Domain End-user workshop for Paleogeoscience, February 2013)

Page 18: IEDA Overview & Updates, March 2014

Downloads from IEDA Systems

Data Collection      Year 2    Year 3
PetDB                 2,166     2,326
SedDB                    52       200
EarthChem Library        95       401
EarthChem Portal     (1,153)      567
MGDS                  5,049     4,331
GMRT                  7,200    10,177
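
The change in downloads from Year 2 to Year 3 can be worked out from the figures above. The following is a minimal sketch in Python, with the numbers transcribed from the slide and all names illustrative. EarthChem Portal is left out because its Year 2 value appears in parentheses on the slide and the slide does not say what that means.

```python
# Download counts (Year 2, Year 3), transcribed from the slide.
# EarthChem Portal is omitted: its Year 2 figure is parenthesized on the
# slide and the meaning of the parentheses is not stated.
downloads = {
    "PetDB": (2166, 2326),
    "SedDB": (52, 200),
    "EarthChem Library": (95, 401),
    "MGDS": (5049, 4331),
    "GMRT": (7200, 10177),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Year-over-year change per collection, rounded to one decimal place.
changes = {name: round(percent_change(year2, year3), 1)
           for name, (year2, year3) in downloads.items()}

for name, pct in changes.items():
    print(f"{name:<18}{pct:+8.1f}%")
```

On these figures every listed collection except MGDS saw more downloads in Year 3 than in Year 2.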

Page 19: IEDA Overview & Updates, March 2014

Citations of IEDA Systems

[Chart: citations by year, 2003 to 2013, vertical axis from 0 to 600. Series: PetDB + EarthChem; MGDS + GMRT + GMA]

Page 20: IEDA Overview & Updates, March 2014


IEDA Data Publication

Page 21: IEDA Overview & Updates, March 2014

IEDA Data Publication: “Best Practice”

[Diagram: an Article in a Journal and a Data File in a Trusted Data Repository, connected by reciprocal citation by DOI]

Page 22: IEDA Overview & Updates, March 2014

Data Publication Flow

[Diagram: three stages, Submission, Publication, and Integration. Labels: Data, Manuscript, Review, IEDA Data Managers, Editors, IEDA Metadata Catalog, Journal, DOI linking, Portal, Synthesis databases]

Page 23: IEDA Overview & Updates, March 2014

Links to Journals


Page 24: IEDA Overview & Updates, March 2014

Links to Journals


Page 25: IEDA Overview & Updates, March 2014


New: Linking with Data Journals

Page 26: IEDA Overview & Updates, March 2014


New: Linking Samples, Data, & Publications

Page 27: IEDA Overview & Updates, March 2014


Future Capabilities

Mockups by Elsevier Developer Beate Specker

Page 28: IEDA Overview & Updates, March 2014

Editors Roundtable

• Based on the Editors Roundtable in Geochemistry (2007/8)
  • policy recommendations for reporting of geochemical data

• Goal: Establish an ongoing forum for information exchange between editors, publishers, professional societies, and data facilities
  • regular meetings at major conferences
  • wiki (knowledge hub) for best practices, guidelines, capabilities for data publication and data citation
  • focus on domain-specific requirements, practices, data facilities, etc.

• Will be international and independent of a specific institution or society (ESIP?)
• Could serve as a role model for other disciplines

Page 29: IEDA Overview & Updates, March 2014

IEDA Data Rescue Initiative

• preserve valuable legacy data sets that are in danger because of impending retirement or degradation
• augment data collections maintained by IEDA
• improve procedures and tools for user contributions

• 2013 International Data Rescue Award in the Geosciences
• IEDA Data Rescue Mini-Awards
• Data Rescue Process Study (collaboration with Elsevier Research Data Services)

Page 30: IEDA Overview & Updates, March 2014

IEDA Data Rescue Mini-Awards

Delano J, Hauri E, Saal A, Shearer C: “Geochemistry of Lunar Glasses”

Gill J: “Geochemical & geochronological data from Fiji, IBM, and Endeavor segments”

Tivey M: “Near-bottom Magnetic Data Rescue”

Page 31: IEDA Overview & Updates, March 2014

Lessons Learned

• Investigator Lessons
  • Take ownership of your own legacy
    • Data curation by others may not be complete or correct
  • Data rescue of an entire career does not need to be overwhelming
    • Start with small steps
    • Disciplinary repositories will help and guide you to what is needed
  • Despite the time investment, data rescue is worth it
    • Others will now be able to re-use the data
    • Notes taken years ago actually explain anomalies

• Repository Lessons
  • For Long Tail Data, every project is different
  • A small incentive will motivate investigators
  • Data Rescue missions help the repository determine next steps for development of tools and services

Page 32: IEDA Overview & Updates, March 2014

• $5,000 award (sponsored by Elsevier) plus trophy
• International jury
• 16 submissions

Page 33: IEDA Overview & Updates, March 2014


Award Ceremony at AGU 2013

Page 34: IEDA Overview & Updates, March 2014


Page 35: IEDA Overview & Updates, March 2014

Collaborations

• New subawards
  • to UTIG for ASP@UTIG
  • to M. Ghiorso (OFM-Research) to migrate LEPR data system into IEDA infrastructure (includes Trace KD database developed by Roger Nielsen)

• Industry collaborations
  • Elsevier funds Data Rescue Process Study
  • ESRI will help with GeoMapApp

• EarthCube projects

Page 36: IEDA Overview & Updates, March 2014

EarthCube Projects

• “Deploying Web Services Across Multiple Geoscience Domains”
  • BB; lead: T. Ahern, IRIS; IEDA co-PI: Carbotte. The project is focused on developing web services for broadening access to data collections of IRIS, IEDA, UNAVCO, UCAR, Caltech, and SDSC by other disciplines. (main)

• “Community Inventory of EarthCube Resources for Geosciences Interoperability (CINERGI)”
  • BB; lead: I. Zaslavsky, SDSC/UCSD; IEDA co-PI: Lehnert. The project focuses on developing an inventory of EarthCube resources, including data systems, standards, services, etc.

• “Leveraging Semantics and Crowdsourcing in Data Sharing and Discovery”
  • BB; lead: T. Narock, University of Maryland; IEDA co-PI: R. Arko. The project focuses on applying Semantic Web technologies, including Linked Data, to support sharing and integration of ocean science data sets.

• “C4P: Collaboration and Cyberinfrastructure for Paleobiosciences”
  • RCN; lead: K. Lehnert, IEDA. The project focuses on advancing cyberinfrastructure for paleobiosciences.

• “Building a Sediment Experimentalist Network (SEN)”
  • RCN; lead: W. Kim, UT Austin; IEDA co-PI: Hsu

• “EarthCube Test Enterprise Governance: An Agile Approach”
  • Test Enterprise Governance; lead: L. Allison, University of Arizona; IEDA sub-awardee: Lehnert

Page 37: IEDA Overview & Updates, March 2014

Council of Data Facilities

“The mission of the Council of Data Facilities is to serve in a coordinating and facilitating role”

• Provide a collective voice on behalf of the member data facilities to the NSF and other foundations and associations, as appropriate.
• Identify, endorse, and promote standards and best or exemplary practices in the organization and operation of a data facility.
• Identify and support the development and utilization of shared infrastructure services, including computing services, professional staff development and training services, and related activities.
• Foster innovation through collaborative projects.
• Collaborate with standard-setting bodies with respect to standards for data sharing and interoperability, metadata, and related matters.

Page 38: IEDA Overview & Updates, March 2014

Council of Data Facilities

• Definition: “A data facility is eligible for membership in the Council if it acquires, curates, preserves, and/or disseminates data, software, models and data services for one or more defined communities in the geosciences.”
  • Category A: NSF-funded not-for-profit or academic data facilities
  • Category B: Federally Funded Research and Development Centers (FFRDCs) and other federal, state, and local data facilities.
  • Category C: International, private, and other not-for-profit or academic data facilities.
  • Category D: Associate members

• Membership categories A, B, and C are all voting members of the Council, with each member sending one designated representative to the General Assembly.

Page 39: IEDA Overview & Updates, March 2014

Council of Data Facilities

• provide advice and guidance to the NSF via the Council’s Executive Committee on matters pertaining
• identify and develop opportunities for collaboration (shared infrastructure, professional development of staff, etc.)
• contribute to the development of geoscience cyberinfrastructure standards and identified best practices and their implementation or adoption, and help ensure compliance and integration into architectures and workflows in their respective facilities
• educate other members of the Council on new developments relevant to data centers in their respective fields, disciplines, and domains (international, private foundation, etc.).

Page 40: IEDA Overview & Updates, March 2014

IEDA: A Multi-Disciplinary Microcosm

www.iedadata.org

• geochemistry, marine geophysics, marine geology, geochronology, and more
• sensor data versus sample-based observations & experiments
• raw data (e.g. multi-beam), field data, lab data, derived data, samples
• gridded data, point data, time-series data, maps, photos, and more
• file sizes vary from a few kilobytes to terabytes