IEDA Overview & Updates, March 2014
IEDA: OVERVIEW & UPDATES
IEDA Supports the Full Data Life Cycle
Domain-Specific Data Stewardship
• Domain-specific guidelines, templates, software tools, and user support/training that facilitate data submission
  • including domain-specific tools for data management planning and compliance reporting
• Development, maintenance, and promotion of domain-specific, community-based standards for data and metadata
• Provenance documentation, uncertainties, semantics (vocabularies, taxonomy), formats
• User interfaces optimized for science questions
• Harmonization & integration of data for advanced mining & analysis
• Access to external data in relevant other
• Mapping of data to standards-based interfaces for interoperability
Central Role of Discipline-specific Repositories
[Diagram. Parties shown: Domain-specific Repository, Science Community, Libraries, Archives, Computer Science, Publishers and editors, Funding Agencies, Data Facilities, Registries. Functions shown: metadata registration, software (tool) development, interoperability, data policies, persistent access, bibliometrics, data curation, data access & discovery, data products, data harmonization (standards), user support.]
IEDA Foci
Data Discovery & Access
Data Preservation & Curation
Data Analysis
Investigator Support
Data Preservation & Curation:
• QA/QC, documentation
• Persistent identification (DOI)
• Long-term archiving
Marine Geoscience Data System
Data Collections and Custom Data Access:
• GeoPRISMS
• Ridge2000
• MARGINS
• Academic Seismic Portal (ASP)
• Antarctic and Southern Ocean (ASODS)
• Metadata Catalog and File repository
• Catalog inventory > 0.5 million files, 47 TB, 2,500 programs
EarthChem Library
• Repository for geochemical data
  • analytical data sets
  • syntheses
  • models
  • reports
• Online data submission
• Templates for data annotations
• Quality control following the Editors Roundtable best practices
IGSN / SESAR
• IGSN: Unique, persistent, resolvable identifiers
• SESAR: registry of samples in the Earth Sciences
• Searchable catalog of samples across the Earth Sciences
• Preservation and persistent access of sample metadata
• Used across all Earth Science communities that deal with samples
• User services for sample metadata management
  • submission, editing, transfer of ownership, tracking of subsamples, etc.
• International governance by the IGSN e.V.
  • non-profit organization, founded in 2011, registered in Germany
  • currently 13 members (4 new members in 2013)
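The sample-metadata services named on this slide (submission, editing, transfer of ownership, tracking of subsamples) can be pictured with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `SampleRegistry` class, its method names, and the placeholder identifiers are hypothetical and do not represent the SESAR interface or real IGSNs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SampleRecord:
    """Metadata for one registered sample (fields are illustrative only)."""
    identifier: str               # placeholder ID, not a real IGSN
    name: str
    owner: str
    parent: Optional[str] = None  # identifier of the parent sample, if any
    children: List[str] = field(default_factory=list)


class SampleRegistry:
    """Hypothetical registry; a sketch, not SESAR's implementation."""

    def __init__(self) -> None:
        self._records: Dict[str, SampleRecord] = {}

    def submit(self, identifier: str, name: str, owner: str,
               parent: Optional[str] = None) -> SampleRecord:
        """Register a new sample, optionally as a subsample of `parent`."""
        if identifier in self._records:
            raise ValueError(f"{identifier} is already registered")
        if parent is not None and parent not in self._records:
            raise KeyError(f"unknown parent sample {parent}")
        record = SampleRecord(identifier, name, owner, parent)
        self._records[identifier] = record
        if parent is not None:
            self._records[parent].children.append(identifier)
        return record

    def get(self, identifier: str) -> SampleRecord:
        return self._records[identifier]

    def edit_name(self, identifier: str, new_name: str) -> None:
        self._records[identifier].name = new_name

    def transfer_ownership(self, identifier: str, new_owner: str) -> None:
        self._records[identifier].owner = new_owner

    def lineage(self, identifier: str) -> List[str]:
        """Return identifiers from the given sample up to its root parent."""
        chain: List[str] = []
        current: Optional[str] = identifier
        while current is not None:
            chain.append(current)
            current = self._records[current].parent
        return chain


registry = SampleRegistry()
registry.submit("EX0000001", "dredge haul", owner="alice")
registry.submit("EX0000002", "basalt chip", owner="alice", parent="EX0000001")
registry.submit("EX0000003", "glass separate", owner="alice", parent="EX0000002")
registry.transfer_ownership("EX0000003", "bob")
print(registry.lineage("EX0000003"))
```

Storing only a parent pointer per record keeps subsample tracking cheap: the full chain back to the original sample is recovered by walking the pointers.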
IEDA Foci: Data Discovery & Access
• Web-based user interfaces (specialists & non-specialists)
• Programmatic access interfaces (interoperability)
• GeoMapApp, Google Earth, etc.
• Links to the literature
New in 2013: IEDA Data Browser
IEDA Foci: Data Analysis
• Visualization tools (GeoMapApp, Virtual Ocean, Earth Observer)
• Syntheses & Products
Global Multi-Resolution Topography Synthesis
Compilation of multi-beam sonar data collected by scientists and institutions worldwide, edited and merged into a single continuously updated compilation of high-resolution seafloor topography.
Global Synthesis of rock compositions (EarthChem, PetDB)
• Map of basalt samples from mid-ocean ridges
• Color scaled to the 87Sr/86Sr ratio measured on these samples
IEDA Foci: Investigator Support
• Web-based data submission
• Data Management Plan tool
• Data Compliance Report tool
• Community
Use of DMP Tool
Target Program    Count
other                27
OIA                   3
OCE                 134
OPP                  11
SBE                   1
BIO                   5
EAR                  94
Total               275
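As a quick consistency check on the DMP Tool counts, the short sketch below sums the rows and derives each target program's share of the total. The counts are copied from the slide; the variable names are illustrative.

```python
# Counts copied from the "Use of DMP Tool" table.
dmp_tool_use = {
    "other": 27,
    "OIA": 3,
    "OCE": 134,
    "OPP": 11,
    "SBE": 1,
    "BIO": 5,
    "EAR": 94,
}
reported_total = 275  # "Total" row on the slide

computed_total = sum(dmp_tool_use.values())
shares = {program: 100.0 * count / computed_total
          for program, count in dmp_tool_use.items()}

print("rows sum to reported total:", computed_total == reported_total)
for program, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{program:>5}: {share:5.1f}%")
```

The rows do add up to the reported 275, and OCE and EAR together account for 228 of those uses, or about 83%.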
IEDA Infrastructure
• Cooperative Agreement with NSF
• Sustainable funding
• Formal community governance & guidance
• Professional data management policies & procedures
• Persistent identification of data & samples (DOI, IGSN)
• Standards-compliant metadata catalog
• Long-term archiving agreements with National Geophysical Data Center & Columbia University Libraries
• Risk management
• “Accreditation” as member of the World Data System
• Disciplinary expertise
System Usage
• Number of unique visitors to the IEDA web site increased by 251%
• 7998 unique visitors between Oct 2012 and Sept 2013
• Primary pages accessed: Data Management Plan & IEDA collections
Results of the user survey of the project “Stakeholder Alignment in the Geosciences: Assessing the Potential Impacts of EarthCube” show that IEDA ranks among the top 5-8 most cited data sources in the Earth Sciences (J. Cutcher-Gershenfeld, presentation at the EarthCube Domain End-user workshop for Paleogeoscience, February 2013).
Downloads from IEDA Systems
Data Collection      Year 2    Year 3
PetDB                 2,166     2,326
SedDB                    52       200
EarthChem Library        95       401
EarthChem Portal    (1,153)       567
MGDS                  5,049     4,331
GMRT                  7,200    10,177
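To make the year-over-year trend in the download counts easier to read, the sketch below computes the relative change from Year 2 to Year 3 for each collection. The numbers are copied from the slide. The EarthChem Portal row is left out, as an assumption of caution, because its Year 2 value is shown in parentheses and the slide does not explain why.

```python
# Download counts copied from the table (Year 2, Year 3).
# The EarthChem Portal row is left out because its Year 2 value appears in
# parentheses in the source and its basis is not explained there.
downloads = {
    "PetDB": (2166, 2326),
    "SedDB": (52, 200),
    "EarthChem Library": (95, 401),
    "MGDS": (5049, 4331),
    "GMRT": (7200, 10177),
}


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


changes = {name: percent_change(y2, y3) for name, (y2, y3) in downloads.items()}

for name, change in changes.items():
    print(f"{name:<18} {change:+7.1f}%")
```

On these figures every listed collection except MGDS saw more downloads in Year 3, with the largest relative growth in SedDB and the EarthChem Library, both of which started from small counts.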
Citations of IEDA Systems
[Chart: citation counts (vertical axis, 0 to 600) by year (2003 to 2013) for two series: PetDB + EarthChem, and MGDS + GMRT + GMA.]
IEDA Data Publication
IEDA Data Publication: “Best Practice”
[Diagram: an Article in a Journal and a Data File in a Trusted Data Repository, connected by reciprocal citation by DOI.]
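Reciprocal citation by DOI, as named on this slide, means that the article cites the data set's DOI and the repository record for the data set cites the article's DOI. A minimal sketch of that check follows. The `Publication` class, the `is_reciprocal` helper, and the DOIs are made-up placeholders for illustration, not IEDA's or any publisher's data model.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Publication:
    """An article or a data set, identified by DOI (placeholder values only)."""
    doi: str
    kind: str                                      # "article" or "dataset"
    cites: Set[str] = field(default_factory=set)   # DOIs this record cites


def is_reciprocal(a: Publication, b: Publication) -> bool:
    """True when each record cites the other's DOI."""
    return b.doi in a.cites and a.doi in b.cites


# Made-up placeholder DOIs for illustration.
article = Publication(doi="10.0000/example-article", kind="article")
dataset = Publication(doi="10.0000/example-dataset", kind="dataset")

article.cites.add(dataset.doi)       # the article cites the data
one_way = is_reciprocal(article, dataset)

dataset.cites.add(article.doi)       # the repository record links back
two_way = is_reciprocal(article, dataset)

print(one_way, two_way)
```

The link only counts as reciprocal once both sides carry the other's DOI, which is why the check fails after the first step and passes after the second.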
Data Publication Flow
[Diagram. Stages: Submission, Publication, Integration. Other labels: Manuscript, Data, Journal, Editors, Review, IEDA Data Managers, DOI linking, IEDA Metadata Catalog, Portal, Synthesis databases.]
Links to Journals
New: Linking with Data Journals
New: Linking Samples, Data, & Publications
Future Capabilities
Mockups by Elsevier Developer Beate Specker
Editors Roundtable
• Based on the Editors Roundtable in Geochemistry (2007/8)
  • policy recommendations for reporting of geochemical data
• Goal: Establish an ongoing forum for information exchange between editors, publishers, professional societies, and data facilities
  • regular meetings at major conferences
  • wiki (knowledge hub) for best practices, guidelines, capabilities for data publication and data citation
  • focus on domain-specific requirements, practices, data facilities, etc.
• Will be international and independent of a specific institution or society (ESIP?)
• Could serve as a role model for other disciplines
IEDA Data Rescue Initiative
• preserve valuable legacy data sets that are in danger because of impending retirement or degradation
• augment data collections maintained by IEDA
• improve procedures and tools for user contributions
• 2013 International Data Rescue Award in the Geosciences
• IEDA Data Rescue Mini-Awards
• Data Rescue Process Study (collaboration with Elsevier Research Data Services)
IEDA Data Rescue Mini-awards
Delano J, Hauri E, Saal A, Shearer C: “Geochemistry of Lunar Glasses”
Gill J: “Geochemical & geochronological data from Fiji, IBM, and Endeavor segments”
Tivey M: “Near-bottom Magnetic Data Rescue”
Lessons Learned
• Investigator Lessons
  • Take ownership of your own legacy
  • Data curation by others may not be complete or correct
  • Data rescue of an entire career does not need to be overwhelming
  • Start with small steps
  • Disciplinary repositories will help and guide you to what is needed
  • Despite the time investment, data rescue is worth it
  • Others will now be able to re-use the data
  • Notes taken years ago actually explain anomalies
• Repository Lessons
  • For Long Tail Data, every project is different
  • A small incentive will motivate investigators
  • Data Rescue missions help the repository determine next steps for development of tools and services
• $5,000 award (sponsored by Elsevier) plus trophy
• International jury
• 16 submissions
Award Ceremony at AGU 2013
Collaborations
• New subawards
  • to UTIG for ASP@UTIG
  • to M. Ghiorso (OFM-Research) to migrate LEPR data system into IEDA infrastructure (includes Trace KD database developed by Roger Nielsen)
• Industry collaborations
• Elsevier funds Data Rescue Process Study
• ESRI will help with GeoMapApp
• EarthCube projects
EarthCube Projects
• “Deploying Web Services Across Multiple Geoscience Domains”
  • (BB, lead: T. Ahern, IRIS; IEDA co-PI: Carbotte): The project is focused on developing web services for broadening access to data collections of IRIS, IEDA, UNAVCO, UCAR, Caltech, and SDSC by other disciplines. (main)
• “Community Inventory of EarthCube Resources for Geosciences Interoperability (CINERGI)”
  • (BB, lead: I. Zaslavsky, SDSC/UCSD; IEDA co-PI: Lehnert): The project focuses on developing an inventory of EarthCube resources, including data systems, standards, services, etc.
• “Leveraging Semantics and Crowdsourcing in Data Sharing and Discovery”
  • (BB, lead: T. Narock, University of Maryland; IEDA co-PI: R. Arko): The project focuses on applying Semantic Web technologies, including Linked Data, to support sharing and integration of ocean science data sets.
• “C4P: Collaboration and Cyberinfrastructure for Paleobiosciences”
  • (RCN, lead: K. Lehnert, IEDA): The project focuses on advancing cyberinfrastructure for paleobiosciences.
• “Building a Sediment Experimentalist Network (SEN)”
  • (RCN, lead: W. Kim, UT Austin; IEDA co-PI: Hsu)
• “EarthCube Test Enterprise Governance: An Agile Approach”
  • (Test Enterprise Governance, lead: L. Allison, University of Arizona; IEDA sub-awardee: Lehnert)
Council of Data Facilities
“The mission of the Council of Data Facilities is to serve in a coordinating and facilitating role”
• Provide a collective voice on behalf of the member data facilities to the NSF and other foundations and associations, as appropriate.
• Identify, endorse, and promote standards and best or exemplary practices in the organization and operation of a data facility.
• Identify and support the development and utilization of shared infrastructure services, including computing services, professional staff development and training services, and related activities.
• Foster innovation through collaborative projects.
• Collaborate with standard-setting bodies with respect to standards for data sharing and interoperability, metadata, and related matters.
Council of Data Facilities
• Definition: “A data facility is eligible for membership in the Council if it acquires, curates, preserves, and/or disseminates data, software, models and data services for one or more defined communities in the geosciences.”
• Category A: NSF-funded not-for-profit or academic data facilities
• Category B: Federally Funded Research and Development Centers (FFRDCs) and other federal, state, and local data facilities
• Category C: International, private, and other not-for-profit or academic data facilities
• Category D: Associate members
• Members in categories A, B, and C are all voting members of the Council, with each member sending one designated representative to the General Assembly.
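The membership rule on this slide (four categories, of which A, B, and C vote, each voting member sending one designated representative) can be expressed as a small data structure. This is only a sketch: the `Member` class and the facility names are hypothetical, and treating Category D as non-voting is an assumption drawn from the slide listing only A, B, and C as voting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Category(Enum):
    A = "NSF-funded not-for-profit or academic data facility"
    B = "FFRDC or other federal, state, or local data facility"
    C = "International, private, or other not-for-profit or academic data facility"
    D = "Associate member"


# Per the slide, categories A, B, and C vote. Treating D as non-voting is an
# assumption; the slide does not say so explicitly.
VOTING_CATEGORIES = {Category.A, Category.B, Category.C}


@dataclass(frozen=True)
class Member:
    name: str            # hypothetical facility name
    category: Category

    @property
    def votes(self) -> bool:
        return self.category in VOTING_CATEGORIES


def voting_representatives(members: List[Member]) -> int:
    """One designated representative per voting member."""
    return sum(1 for m in members if m.votes)


members = [
    Member("Facility 1", Category.A),
    Member("Facility 2", Category.B),
    Member("Facility 3", Category.C),
    Member("Facility 4", Category.D),
]
print(voting_representatives(members))
```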
Council of Data Facilities
• provide advice and guidance to the NSF via the Council’s Executive Committee on matters pertaining
• identify and develop opportunities for collaboration (shared infrastructure, professional development of staff, etc.)
• contribute to the development of geoscience cyberinfrastructure standards and identified best practices and their implementation or adoption, and help ensure compliance and integration into architectures and workflows in their respective facilities
• educate other members of the Council on new developments relevant to data centers in their respective fields, disciplines, and domains (international, private foundation, etc.).
IEDA: A Multi-Disciplinary Microcosm
www.iedadata.org
• geochemistry, marine geophysics, marine geology, geochronology, and more
• sensor data versus sample-based observations & experiments
• raw data (e.g. multi-beam), field data, lab data, derived data, samples
• gridded data, point data, time-series data, maps, photos, and more
• file sizes vary from a few kilobytes to terabytes