Summary Report from Thursday, 3 March 2011, Pine Room
Data Integration Breakout Group

Geo-Data Informatics (GDI) Workshop: Exploring the Life Cycle, Citation and Integration of Geo-Data

Discussion Prompt

In your view/experience, what parts of data integration implementations/applications or frameworks are well established (or not) in your discipline(s), and what are the common gaps?

Moderator: Cyndy Chandler (WHOI, BCO-DMO)
Rapporteur: Chris Mattmann (NASA JPL, USC)
Discussion notes kept at the TWC-hosted TitanPad site


Participants
• Bob Arko (Lamont-Doherty Earth Observatory)
• Joanne Luciano (TWC, RPI)
• Anna Milan (National Geophysical Data Center)
• Bob Simons (NOAA)
• Brian Wee (NEON, Inc.)
• Leslie Hsu (LDEO)
• Roland Viger (USGS)
• James Wilson (James Madison University)
• Tom Narock (NASA/GSFC)
• Cathy Constable (SIO, UCSD)
• Ruth Duerr (NSIDC)
• Yoori Choi (CUAHSI)
• Lee Allison (Arizona Geological Survey)
• Erin Robinson (ESIP)
• Kavitha Chandrasekar (Indiana University)
• Bob Detrick (NSF)
• Clifford Jacobs (NSF)
• Leonard Jonson (NSF)


Data Integration

• What does that mean?
– Combining more than one data source into a single data object. This is different from displaying multiple data sources in a single view.
– Example: a database join.
– Time series data sets made up of data from a variety of sources often require data integration.
– Data aggregation and interoperability are related concepts.
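As a minimal illustration of the "database join" sense of integration, the Python sketch below combines two time series into one new data object, rather than merely plotting them side by side. The source names, readings, and the `inner_join` helper are hypothetical, invented only for this example. Like a SQL inner join on a time column, it keeps only timestamps present in both sources; real geo-data integration would also have to reconcile units, sampling intervals, and clock offsets, which this sketch ignores.

```python
from datetime import datetime

# Hypothetical readings from two independent sources, keyed by timestamp.
ctd_temperature = {
    datetime(2011, 3, 3, 0, 0): 12.4,
    datetime(2011, 3, 3, 1, 0): 12.1,
    datetime(2011, 3, 3, 2, 0): 11.9,
}
buoy_salinity = {
    datetime(2011, 3, 3, 1, 0): 34.8,
    datetime(2011, 3, 3, 2, 0): 34.9,
    datetime(2011, 3, 3, 3, 0): 35.0,
}


def inner_join(left, right, left_name, right_name):
    """Combine two time-keyed series into a single list of records.

    Only timestamps present in both sources appear in the output,
    mirroring an inner join on the time column.
    """
    shared_times = sorted(left.keys() & right.keys())
    return [
        {"time": t, left_name: left[t], right_name: right[t]}
        for t in shared_times
    ]


integrated = inner_join(ctd_temperature, buoy_salinity, "temperature", "salinity")
for record in integrated:
    print(record["time"].isoformat(), record["temperature"], record["salinity"])
```

The 00:00 temperature and the 03:00 salinity reading are dropped because each appears in only one source; an outer join would keep them with missing values instead.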

The group did not come to a consensus.


Geo Disciplines Represented

• Geology
• Hydrology
• Oceanography
• Geophysics
• Geography
• Marine geology and geophysics
• Space science
• Air quality
• Computational neuroscience
• Multi-disciplinary or discipline-agnostic: data management, computer science and archives


Geo-Data Integration

• Which aspects are well established, and which are not?
• What are the common gaps?


• For many projects, two common themes emerged as being associated with some level of success in the ability to do data integration:
– ‘Long-term’ commitment of funding support
– Active engagement of funding managers

Examples:
– Unidata (Atmospheric Sciences)
– CUAHSI (Hydrology)
– IRIS (Earthquake)
– US JGOFS, US GLOBEC, US WOCE (Ocean Sciences)
– ODP (Ocean Drilling)
– NEON


Support for Data Integration

• Development of a community of practice
– Infrastructure to foster communication (workshops)
– Mentoring of students and early-career PIs
– Development of tools (e.g., Unidata developed NetCDF, which has been adopted by many communities)
– Education and training
– The persistence and recognition of a ‘named’ community can enable funds to flow from some agencies to researchers


Support for Data Integration

• Some communities agreed on common data formats that facilitated data integration

• Pressures from funding agencies or community needs resulted in common software tools

• Some communities identified ‘primary’ or ‘core’ variables (e.g. common, essential measurements)


Summary

• ‘Long-term’ funding support enables the development of a community of practice that fosters communication, education and training, the development and adoption of common tools, and the identification of core measurements. Communities of practice can divide up the labor and work collaboratively to address shared challenges (economy of scale).


Additional Observations

• There is tension between local and global scales (from single PI, to coordinated project, to national and international efforts). Awareness of how data may be used globally could help with subsequent data integration.

• Early planning and specifications for data management are important, but funding for them has traditionally been difficult to obtain.


Gaps

• Lack of awareness/understanding that keeping data ‘alive’ (usable) is not free

• Many people think data stewardship and data preservation are “solved problems” (they are not).

• “Bit-level preservation” has been solved, but what is the useful lifespan of those files? What effort is required to keep the archived data compatible with the latest tools and technology? The ability to use a dataset declines over time without ongoing attention to ensure that it still meets current access requirements.


Gaps

• Historical or legacy data (originating PI is no longer active in the research community)

• No national policy for scientific data preservation
• Different disciplines have different interpretations of features in a dataset
• Lack of best-practice guidelines for the metadata required to document model results
– software, methodology, inputs, outputs, etc.


Gaps

• Misconception that you create metadata once and it stays good forever
– Not a true statement
– Metadata needs to be updated somehow
– Systems and infrastructure need to support this
– Metadata needs to evolve over time


Suggestion

The group agreed that ESIP would be an appropriate community in which to continue these discussions and to begin the much-needed planning and development of cross-disciplinary solutions to address the gaps and improve infrastructure for geo-data integration.


Additional Comments

• NRC study, done 7–8 years ago, on the loss of data and samples in the geosciences:
Geoscience Data and Collections: National Resources in Peril
http://www.nap.edu/openbook.php?record_id=10348&page=R1


Additional Comments

• Marine Metadata Interoperability (MMI) http://marinemetadata.org/

Collection of ‘Guides’ on topics including Semantic Web technologies, controlled vocabularies, ontologies, standards, metadata best practices, and much more.

• MMI Ontology Registry and Repository (ORR) is a web application through which you can create, update, access, and map ontologies and their terms. http://mmisw.org/orr/#b


Additional Comments

• CUAHSI: Hydrologic Ontology System (funded by NSF)
http://his.cuahsi.org/ontologyfiles.html
http://water.sdsc.edu/hiscentral/startree.aspx

• A “Data Management Plan” template is available from CUAHSI (February 2011) at http://www.cuahsi.org/his-dmp.html; it covers data inventory, data and metadata standards, the data management life cycle, etc.


Additional Comments

• ELIXIR http://www.bbsrc.ac.uk/science/international/elixir.aspx: European life science infrastructure for biological information.

• Its Mission: To construct and operate a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries and society.