ISO 19115 Experiences in NASA’s Earth Observing System (EOS) ClearingHOuse (ECHO)
Matthew Cechini, Raytheon - EED
ID: IN31C-07
Agenda
• ECHO Metadata Overview
  – Introduction
  – Problem Space
  – Solutions
• ISO 19115 Lessons Learned
  – Perceived Issues
  – Gotchas
  – Kudos
• Conclusion
Introduction
Earth Observing System (EOS) ClearingHOuse (ECHO): an integral component of metadata management within NASA’s Earth Observing System Data and Information System (EOSDIS), acting as the core metadata repository and providing a centralized mechanism for metadata and data discovery and retrieval.
How metadata is used by ECHO:
• Discovery
• Presentation/Documentation
• Interoperability
• Validation
Metadata Format Landscape
• The existing catalog uses the ECHO format (based upon the EOSDIS Core System (ECS) data model).
• Future science missions are projected to provide ISO 19115 metadata.
Problem Space
Data discovery and retrieval tenets:
1. There exists a set of users who will require the entire metadata record for advanced analysis.
2. There exists a set of ‘core’ metadata fields recommended for data discovery.
3. There exists a set of users who will require a ‘core’ set of metadata fields for discovery only.
4. There will never be a cessation of new formats or a total retirement of all old formats.
5. Users should be presented metadata in a consistent format of their choosing.
Solutions
ECHO’s metadata processing solution:
1. Identify a cross-format set of ‘core’ metadata fields for discovery.
2. Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability.
3. Archive the original metadata in its entirety for presentation to users requiring the full record.
4. Provide on-demand translation of ‘core’ metadata to any supported result format or standard.
ECHO’s usage of ISO 19115/19139:
1. Archive original metadata for documentation and advanced usage.
2. Extract ‘core’ metadata fields for data discovery.
3. Provide format translations from ISO to/from supported formats.
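The indexing step of the solution above (a format-specific indexer pulling ‘core’ fields out of a full record that is archived unchanged) can be sketched for ISO 19139 in Python. This is a minimal illustration, not ECHO’s implementation: the CoreRecord fields, index_iso19139, and the INDEXERS registry are hypothetical names, and only the gmd/gco namespaces and element names are taken from ISO 19139.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Standard ISO 19139 namespaces: gmd (metadata elements) and gco (basic types).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

BBOX_FIELDS = ("westBoundLongitude", "southBoundLatitude",
               "eastBoundLongitude", "northBoundLatitude")


@dataclass
class CoreRecord:
    """Hypothetical cross-format 'core' discovery record (fields chosen for illustration)."""
    title: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    bbox: Optional[Tuple[float, float, float, float]] = None  # (W, S, E, N) degrees


def _text(node: Optional[ET.Element]) -> Optional[str]:
    """Return the stripped text of an element, or None if it is missing or empty."""
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def index_iso19139(xml_text: str) -> CoreRecord:
    """Extract core fields from an ISO 19139 record; the full XML is archived separately."""
    root = ET.fromstring(xml_text)
    ident = ".//gmd:identificationInfo"
    title = _text(root.find(
        ident + "//gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString", NS))
    keywords = [kw for kw in (
        _text(k) for k in root.findall(ident + "//gmd:keyword/gco:CharacterString", NS)
    ) if kw]
    record = CoreRecord(title=title, keywords=keywords)
    box = root.find(ident + "//gmd:EX_GeographicBoundingBox", NS)
    if box is not None:
        values = [_text(box.find(f"gmd:{name}/gco:Decimal", NS)) for name in BBOX_FIELDS]
        if all(v is not None for v in values):
            record.bbox = tuple(float(v) for v in values)
    return record


# One indexer per native format; indexers for other formats would register here too.
INDEXERS: Dict[str, Callable[[str], CoreRecord]] = {"iso19139": index_iso19139}
```

Keeping the indexers behind a registry keyed by format reflects tenet 4: a new format means adding one indexer, while the query side only ever sees CoreRecord.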
Online Resources
MimeType
▪ The existing MIME type standard could be included, similar to how GML is incorporated, though maintained separately.
▪ MimeType values facilitate automated access where different file types result in different workflows (e.g., displaying native JPEG images or extracting data from HDF files). File extensions are not always indicative of the file type.
Type
▪ Code list values promote interoperability but potentially reduce the ability for intra-community customization.
▪ A type attribute allows more detailed identification for automated access (e.g., specific service protocols such as http://xml.opendap.org/ns/DAP/3.3#).
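A short Python sketch of why a declared MimeType and type attribute help automated access: the client picks a workflow from declared metadata instead of guessing from the URL. The OnlineResource class and its mime_type field are hypothetical (they model the proposed addition, not an existing ISO 19115 element), the workflow names are invented, and "application/x-hdf" is assumed as the HDF media type.

```python
from dataclasses import dataclass
from typing import Optional

# Service protocol URI quoted on the slide.
OPENDAP_DAP_3_3 = "http://xml.opendap.org/ns/DAP/3.3#"


@dataclass
class OnlineResource:
    """Hypothetical online resource; mime_type models the proposed MimeType field."""
    url: str
    mime_type: Optional[str] = None
    protocol: Optional[str] = None  # the 'type' attribute, e.g. a service protocol URI


def choose_workflow(resource: OnlineResource) -> str:
    """Pick a (hypothetical) workflow from declared metadata, never from the URL's extension."""
    if resource.protocol == OPENDAP_DAP_3_3:
        return "opendap-access"
    if resource.mime_type is None:
        return "manual"  # nothing declared: an automated client cannot decide safely
    if resource.mime_type.startswith("image/"):
        return "display-image"
    if resource.mime_type == "application/x-hdf":  # assumed HDF media type
        return "extract-hdf"
    return "download"
```

Note that a URL ending in ".jpg" with no declared type falls through to "manual"; the sketch deliberately does not trust file extensions, which is the point the slide makes.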
ISO 19115 - Perceived Issue
Service Resources
Data Discovery
▪ How are links to discovery services made available (e.g., datacasting feeds or search endpoints)?
▪ Endpoints may support multiple response formats; how would that be included?
Data Processing
▪ Data processing links do not appear to be supported.
▪ Both series and dataset level metadata may have URLs to services that expose subsetting, reprojection, and other processing capabilities.
▪ Some service-specific information may be required and will need to be included in the metadata.
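One way to picture the missing information is a service link record that carries its purpose, the response formats it can return, and its service-specific parameters. The Python sketch below is purely hypothetical: ServiceLink, find_link, the purpose strings, and the example URLs are illustrative, not ISO 19115 elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceLink:
    """Hypothetical service link; these fields are not assumed to exist in ISO 19115."""
    url: str
    purpose: str  # e.g. "search", "datacasting", "subset", "reproject"
    response_formats: List[str] = field(default_factory=list)  # MIME types returned
    parameters: List[str] = field(default_factory=list)        # service-specific inputs


def find_link(links: List[ServiceLink], purpose: str,
              response_format: str) -> Optional[ServiceLink]:
    """Return the first link with the given purpose that can answer in the requested format."""
    for link in links:
        if link.purpose == purpose and response_format in link.response_formats:
            return link
    return None
```

A single search endpoint listing several response formats answers the "multiple response formats" question without duplicating the link, and the parameters list is one place the service-specific information could live.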
Hierarchical Keyword Structure
Representation
Non-Standard Delimiters
▪ A self-defining hierarchy could be introduced within the keyword structure, allowing for customized keyword lists.
Automated Usage
Optional Fields
▪ A flat representation of keyword structures that have optional levels may cause issues for automated keyword parsing.
▪ Translation into a metadata format where hierarchy is expected may not be possible.
<gmd:keyword><gco:CharacterString>Earth Science > Oceans > Ocean Temperature > Sub-skin Sea Surface Temperature</gco:CharacterString></gmd:keyword>
<gmd:keyword><gco:CharacterString>Earth Science | Oceans | Ocean Temperature | Sub-skin Sea Surface Temperature</gco:CharacterString></gmd:keyword>
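The two keyword encodings above illustrate the parsing problem. A minimal Python sketch follows, assuming hypothetical level names (Category, Topic, Term, Variable) and only the two delimiters shown. A parser must guess the delimiter, and it can only map tokens to levels by position.

```python
import re
from typing import Dict, List

# Hypothetical level names for illustration; a real keyword vocabulary
# may define more levels, some of them optional.
LEVELS = ["Category", "Topic", "Term", "Variable"]

# Both delimiters from the examples; the flat gco:CharacterString itself
# does not declare which delimiter it uses.
DELIMITERS = re.compile(r"\s*[>|]\s*")


def split_flat_keyword(text: str) -> List[str]:
    """Split a flat keyword string on any known delimiter, dropping empty tokens."""
    return [part for part in DELIMITERS.split(text.strip()) if part]


def assign_levels(parts: List[str]) -> Dict[str, str]:
    """Map tokens to level names by position.

    Correct only if no optional level was omitted: a flat string cannot
    signal a gap, so a skipped level silently shifts later values up.
    """
    if len(parts) > len(LEVELS):
        raise ValueError(f"{len(parts)} tokens but only {len(LEVELS)} known levels")
    return dict(zip(LEVELS, parts))
```

Both example strings split into the same four tokens. However, "Earth Science > Oceans > Sub-skin Sea Surface Temperature" (with the Term level omitted) places the variable under "Term" without any error. This is the automated-usage problem noted above, and it is what a self-defining hierarchy with named levels would avoid.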
Spatial Representations
Coordinate Systems
Cartesian vs. Geodetic
▪ EX_GeographicBoundingBox does not specify a coordinate system.
Two-D Coordinate Systems
▪ Unable to find where coordinate reference systems like WRS-2 and MODIS H/V tiling are a) defined and b) utilized.
Orbit Metadata
Series Level
▪ Unable to find where series level orbit metadata is represented (e.g., swath width, period, inclination angle, etc.).
▪ This information may be required for data discovery.
Dataset Level
▪ Similar concern regarding placement of orbit metadata, again used for discovery (e.g., orbit number, crossing longitude, etc.).
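The Cartesian vs. Geodetic point matters because the same pair of corner points describes different areas depending on how an edge between them is interpreted. The Python sketch below compares the midpoint of an edge along 60°N from 0° to 90°E under both readings. It treats the Earth as a unit sphere for the geodetic case, which is a simplifying assumption, and it is undefined for antipodal endpoints.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def cartesian_midpoint(a: LatLon, b: LatLon) -> LatLon:
    """Midpoint of a straight edge in latitude/longitude space (Cartesian reading)."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def great_circle_midpoint(a: LatLon, b: LatLon) -> LatLon:
    """Midpoint of the great-circle arc between a and b on a sphere (geodetic reading)."""
    def to_xyz(p: LatLon) -> Tuple[float, float, float]:
        lat, lon = math.radians(p[0]), math.radians(p[1])
        return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

    ax, ay, az = to_xyz(a)
    bx, by, bz = to_xyz(b)
    # Sum of unit vectors points at the arc midpoint; atan2 makes normalizing unnecessary.
    x, y, z = ax + bx, ay + by, az + bz
    return (math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x)))
```

For the corners (60, 0) and (60, 90), the Cartesian midpoint stays at latitude 60, but the great-circle midpoint is at about 67.8°N. The geodetic edge bulges almost 8° poleward, so a bounding box that does not declare its coordinate system can give different search results in different clients.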
Gotchas
Terminology
▪ Natural difficulties reconciling terminology between communities:
  – Dataset & Granules vs. Series & Dataset
  – Archive Center vs. Custodian
▪ Codelists are a double-edged sword, providing consistency but removing specificity and community vernacular.
Citation Overload
▪ Contact information can be represented in numerous locations.
▪ Potentially stale contact information may be difficult to track down.
Combined Series & Dataset Metadata
▪ Good Idea… combining series and dataset metadata during presentation.
▪ Bad Idea… combining series and dataset metadata during archival.
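A Python sketch of the "combine at presentation, not at archival" recommendation: the series and dataset records stay separate in storage, and a read-only merged view is built only when a record is displayed. The function name and the example field names are hypothetical. The sketch assumes that dataset-level values should override inherited series-level values.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


def presentation_view(series: Mapping[str, Any], dataset: Mapping[str, Any]) -> Mapping[str, Any]:
    """Build a read-only merged view for display.

    Dataset-level values override inherited series-level ones. Neither
    stored record is modified, so the archive keeps series and dataset
    metadata separate.
    """
    merged: Dict[str, Any] = dict(series)  # copy, so the archived series record is untouched
    merged.update(dataset)                 # dataset-specific values win
    return MappingProxyType(merged)        # read-only: the view cannot be written back
```

Because the merge only happens at read time, a correction to the series record (for example, an updated contact) appears automatically in every dataset view. If the records were merged at archival, that correction would have to be propagated into each stored copy.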
Kudos
Citations
▪ Thorough support for providing citations within the metadata.
Metadata Lineage
▪ ISO lineage provides an excellent means to capture repeatable processing history information.
Distribution Information
▪ Thorough support for online and offline access options, including support for ordering.
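As a sketch of how lineage captures processing history, the Python code below uses ElementTree to emit a gmd:LI_Lineage element with one gmd:processStep per step. The element names follow ISO 19139: statement, then process steps, each with a description and dateTime. In a full record, this element sits under gmd:DQ_DataQuality/gmd:lineage. The build_lineage helper and the step values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)  # serialize with the conventional prefixes
ET.register_namespace("gco", GCO)


def _child(parent: ET.Element, ns: str, tag: str) -> ET.Element:
    """Append a namespaced child element and return it."""
    return ET.SubElement(parent, f"{{{ns}}}{tag}")


def build_lineage(statement: str, steps: List[Tuple[str, str]]) -> ET.Element:
    """Build gmd:LI_Lineage with one gmd:processStep per (description, ISO 8601 datetime) pair."""
    lineage = ET.Element(f"{{{GMD}}}LI_Lineage")
    _child(_child(lineage, GMD, "statement"), GCO, "CharacterString").text = statement
    for description, when in steps:
        step = _child(_child(lineage, GMD, "processStep"), GMD, "LI_ProcessStep")
        _child(_child(step, GMD, "description"), GCO, "CharacterString").text = description
        _child(_child(step, GMD, "dateTime"), GCO, "DateTime").text = when
    return lineage
```

Each processing run appends an ordered step with its timestamp. Because the steps are recorded in sequence, a later user can see what was done and in what order, which is what makes the history repeatable.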
Conclusion
• ISO 19115 is on its way to becoming a viable standard for metadata as a means of documentation.
• ISO 19115 is a bit verbose for the pragmatic requirements of data discovery (specifically at the dataset level).
• ISO 19115 lacks support for the growing presence of data processing services.
• All metadata standards are expected to have issues and will improve over time.