Introduction to SeaDataNet Metadata


Transcript of Introduction to SeaDataNet Metadata

Page 1: Introduction to SeaDataNet Metadata

Introduction to SeaDataNet Metadata

Roy Lowry

British Oceanographic Data Centre

SeaDataNet Training Course

Page 2: Introduction to SeaDataNet Metadata

Overview

• An introduction to the SeaDataNet metadata formats covering

Purpose

Entity definition

History

Population

Strengths

Weaknesses

Page 3: Introduction to SeaDataNet Metadata

Overview

• SeaDataNet metadata formats

European Directory of Marine Organisations (EDMO)

Cruise Summary Report (formerly ROSCOP)

European Directory of Marine Environmental Datasets (EDMED)

European Directory of the Ocean Observing System (EDIOS)

SeaDataNet Common Data Index (CDI)

European Directory of Marine Environmental Research Projects (EDMERP)

Page 4: Introduction to SeaDataNet Metadata

EDMO

• Purpose

Provides SeaDataNet with an address book of organisations associated with marine data

Provides descriptions of these organisations

• Entity definition

Any group of people sharing a common postal address engaged in activities associated with marine data acquisition and use

• History

Developed by Maris during SEA-SEARCH in response to a need to improve address metadata management across the project

Page 5: Introduction to SeaDataNet Metadata

EDMO

• Population

On-line Content Management System fronted by a web form (http://www.sea-search.net/organisations/)

Partners are responsible for maintenance of their national record set

Management supported by a reasonably sophisticated access control system that authenticates users and grants access to the appropriate database subset

Page 6: Introduction to SeaDataNet Metadata

EDMO

• Strengths

The maintenance tool. Please use it to look after the entries for your country

Provides a single point of entry for SeaDataNet metadata documents associated with a given organisation

Centralisation of metadata common to other catalogues, replacing four independently maintained address metadata repositories

Rich information content, including descriptions, logos and spatial location information

Page 7: Introduction to SeaDataNet Metadata

EDMO

• Weaknesses

Simple data model is poorly equipped for the management of organisational evolution

Organisations merge, fragment, rename and move

All we can do in EDMO is document this using plain language fields

Text fields contain embedded markup

These look very nice when displayed through the search interface

However, the markup causes problems generating XML documents for record transport between systems

Examples including graphics and relative URLs break when transported by copy/paste
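
The markup problem can be illustrated with a minimal Python sketch. The element names (`organisation`, `description`) and the sample field content are hypothetical and are not taken from the EDMO schema. The sketch only shows that embedded HTML has to be escaped before it is placed in an XML document, and that a relative URL travels as text but no longer points anywhere useful.

```python
import xml.etree.ElementTree as ET

def wrap_description(text: str) -> str:
    """Serialise a free-text field into a small XML document.

    ElementTree escapes '<', '>' and '&' in element text, so embedded
    HTML travels as literal characters instead of corrupting the XML.
    """
    root = ET.Element("organisation")               # hypothetical element name
    ET.SubElement(root, "description").text = text  # hypothetical element name
    return ET.tostring(root, encoding="unicode")

# Hypothetical field content with embedded markup and a relative URL.
raw = 'Data centre <b>founded 1969</b> <img src="logos/centre.gif"> R&D'

# Escaped route: the document stays well-formed.
xml_doc = wrap_description(raw)

# Naive route: pasting the raw field into a template is not well-formed XML
# (unclosed <img> tag, bare ampersand).
naive = "<organisation><description>" + raw + "</description></organisation>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# Reading the escaped document back returns the original text unchanged.
round_trip = ET.fromstring(xml_doc).find("description").text
```

Escaping keeps the record transportable, but the receiving system still has to decide whether to render or strip the markup, and `logos/centre.gif` only resolves relative to the original site.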

Page 8: Introduction to SeaDataNet Metadata

CSR

• Purpose

To document the operational and data generation activities of an oceanographic research cruise

• Entity definition

A subject of some controversy

I am a metadata purist and support the definition of a ‘cruise’ as the interval of time between leaving port and returning to port

Thus for a 3-leg cruise I would generate 3 CSR records whilst others would generate just one. I do this because:

Combining records is easier than splitting them

Cruise ‘legs’ for some ships can be VERY different (e.g. 3 legs of a Meteor cruise: one JGOFS, one OMEX, one WOCE)

Merging ‘legs’ is a slippery slope: I’ve even encountered a single record covering the activities of two ships three months apart
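
The argument that combining records is easier than splitting them can be sketched in Python. The `LegRecord` fields and the dates below are hypothetical and do not reflect the CSR schema; only the ship and project names come from the example above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class LegRecord:
    """Hypothetical per-leg summary: one record per port-to-port interval."""
    ship: str
    start: date
    end: date
    projects: frozenset

def combine(legs):
    """Merge per-leg records into a single cruise-level record."""
    if not legs:
        raise ValueError("no legs to combine")
    ships = {leg.ship for leg in legs}
    if len(ships) != 1:
        raise ValueError("legs from different ships should not be merged")
    return LegRecord(
        ship=ships.pop(),
        start=min(leg.start for leg in legs),
        end=max(leg.end for leg in legs),
        projects=frozenset().union(*(leg.projects for leg in legs)),
    )

# Illustrative dates only.
legs = [
    LegRecord("Meteor", date(1994, 5, 1), date(1994, 5, 20), frozenset({"JGOFS"})),
    LegRecord("Meteor", date(1994, 5, 22), date(1994, 6, 10), frozenset({"OMEX"})),
    LegRecord("Meteor", date(1994, 6, 12), date(1994, 7, 1), frozenset({"WOCE"})),
]
cruise = combine(legs)
```

The combined record is a simple aggregate, but it no longer says which project belonged to which leg or where the port calls fell, so the three original records cannot be recovered from it.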

Page 9: Introduction to SeaDataNet Metadata

CSR

• Entity definition (continued)

Problem with my definition is that the real world creates grey areas. For example, does a personnel change by pilot boat in an estuary count as ‘docking’?

Others extend the definition to cover any activity collecting oceanographic data (shoehorning)

I believe this is a very bad thing to do

The activity super-class and other activity sub-classes are much better described by other metadata standards (e.g. in OGC Observations and Measurements)

Later on in SeaDataNet we could consider incorporating some of these to further enrich our metadata portfolio

In the meantime remember that it is NOT necessary to have every measurement covered by a CSR. If it isn’t appropriate, don’t create one.

Page 10: Introduction to SeaDataNet Metadata

CSR

• History

Originally a paper form developed by IOC called a ROSCOP

Replaced in 1990 by the Cruise Summary Report with richer content (but the name ROSCOP stuck)

Numerous on-line databases developed during the 1990s

Primary repositories now DOD for SeaDataNet partners and ICES for non-SeaDataNet

Page 11: Introduction to SeaDataNet Metadata

CSR

• Population

On-line web-form (http://www.sea-search.net/roscop/welcome.html)

XML schema available for bulk transfers

• Strengths

Flexible population mechanisms

Long history with a massive legacy population

Cruise is (or should be) a well defined concept to oceanographers

Page 12: Introduction to SeaDataNet Metadata

CSR

• Weaknesses

“Parameter” vocabulary

Really a vocabulary describing shipborne activities

No clear equivalent elsewhere for interoperability, but ontological mapping to multiple vocabularies might provide a solution

On-line systems developed using plaintext fields when controlled vocabularies would have made interoperability between repositories more straightforward

Spatial coverage limitations

Coarse-grained

Described using Marsden Squares but BODC has deployed a Web Service to convert these to ISO19115/DIF standard bounding boxes
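
The step from coarse 10-degree squares to a bounding box can be sketched as follows. This is not the BODC Web Service: it assumes each square has already been resolved to its south-west corner (the lookup from Marsden Square number to corner is omitted), and it ignores coverage that crosses the 180° meridian.

```python
def bounding_box(sw_corners, size=10.0):
    """Return the box enclosing a set of square cells.

    sw_corners: iterable of (latitude, longitude) south-west corners,
                in decimal degrees, one per 10-degree square (assumed input).
    size:       edge length of each square in degrees.
    """
    corners = list(sw_corners)
    if not corners:
        raise ValueError("at least one square is required")
    lats = [lat for lat, _ in corners]
    lons = [lon for _, lon in corners]
    return {
        "south": min(lats),
        "west": min(lons),
        "north": max(lats) + size,
        "east": max(lons) + size,
    }

# Three hypothetical squares in the north-east Atlantic.
squares = [(50.0, -20.0), (50.0, -10.0), (40.0, -10.0)]
box = bounding_box(squares)
```

The result shows the coarseness problem directly: three squares produce a 20° by 20° box, which also covers a fourth square in which the cruise never worked.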

Page 13: Introduction to SeaDataNet Metadata

EDMED

• Purpose

To describe marine environmental datasets to promote their discovery

• Entity definition

A dataset, but what is a dataset?

ISO19101 defines a dataset as ‘an identifiable collection of data’, which covers everything from the parameters measured on a single water sample to the 7,500,000 CTDs in the USNODC World Ocean Database

Sound judgement is needed to decide upon appropriate granularity

Best approach is to establish objective criteria

Worth remembering that a measurement may be included in more than one dataset

Posing this question to metadata specialists can provide good sport!

Page 14: Introduction to SeaDataNet Metadata

EDMED

• History

Developed by BODC in late 80s

Adopted by EU MAST Data Committee, then SEA-SEARCH and now SeaDataNet

• Population

Form interface to stand-alone Access database that is submitted to BODC for ingestion

XML schema available for bulk transfers

• Strengths

Content quality controlled on ingestion, therefore standards are high

Rich content developed during SEA-SEARCH

Page 15: Introduction to SeaDataNet Metadata

EDMED

• Weaknesses

Developed in splendid isolation, including vocabularies, therefore interoperability with other systems is difficult

Heavy dependence on plaintext fields: a problem that should be addressed during SeaDataNet

Page 16: Introduction to SeaDataNet Metadata

EDIOS

• Purpose

To describe marine environmental datasets comprising data that are collected repeatedly, regularly and routinely in order to promote their discovery (initially for operational planning purposes)

• Entity definition

A dataset comprised of data that are collected repeatedly, regularly and routinely, but what is a dataset (c.f. EDMED)?

• History

Developed as an EU project led by EuroGOOS

Inherited by SeaDataNet

Page 17: Introduction to SeaDataNet Metadata

EDIOS

• Population

Currently an issue

There is a Word-based form (the MIF)

– Developed in parallel to the data model and database with no evidence of communication

– Completed MIFs entered into the database at BODC, requiring significant interpretation and information rehashing (long and painful process)

SeaDataNet work in progress

– IFREMER/BODC working to produce an XML schema to facilitate large-scale transfer

– Maris/BODC developing a web-form based content management system along the lines of EDMO

Page 18: Introduction to SeaDataNet Metadata

EDIOS

• Strengths

Rich data model based on structured fields with minimal plaintext

Data model includes hierarchical relationships between entities (project one-to-many observing programmes one-to-many measurement series)

Data model includes support for complex spatial objects (polygons not boxes)

Data model is particularly well suited to the description of operational oceanographic systems
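
The hierarchy described above (project, one-to-many observing programmes, one-to-many measurement series) can be sketched with Python dataclasses. Class names, field names and the sample content are hypothetical; the real EDIOS relational schema is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A polygon as a list of (longitude, latitude) vertices, rather than a box.
Polygon = List[Tuple[float, float]]

@dataclass
class MeasurementSeries:
    name: str
    footprint: Polygon

@dataclass
class ObservingProgramme:
    name: str
    series: List[MeasurementSeries] = field(default_factory=list)

@dataclass
class Project:
    name: str
    programmes: List[ObservingProgramme] = field(default_factory=list)

def all_series(project: Project) -> List[MeasurementSeries]:
    """Walk the two one-to-many links and list every measurement series."""
    return [s for prog in project.programmes for s in prog.series]

# Hypothetical content.
triangle: Polygon = [(-5.0, 50.0), (-3.0, 50.0), (-4.0, 52.0)]
project = Project(
    name="Example coastal observatory",
    programmes=[
        ObservingProgramme("Tide gauges", [MeasurementSeries("Gauge A", triangle),
                                           MeasurementSeries("Gauge B", triangle)]),
        ObservingProgramme("Ferry box", [MeasurementSeries("Route 1", triangle)]),
    ],
)
series_names = [s.name for s in all_series(project)]
```

Because every level is a structured object, questions such as "which series belong to this project?" become simple traversals instead of searches through plaintext.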

Page 19: Introduction to SeaDataNet Metadata

EDIOS

• Weaknesses

At the start of SeaDataNet EDIOS had 17 local vocabularies

Extremely poor content governance

Undergoing replacement with managed SeaDataNet standard vocabularies (6 down, 11 to go)

Legacy content has not been systematically quality controlled

Page 20: Introduction to SeaDataNet Metadata

EDIOS

• How is EDIOS different from EDMED?

Both are content standards designed to describe datasets

Any dataset described by an EDMED document could be described by an EDIOS document and vice versa

Once vocabularies have been harmonised and some mappings set up it should be possible to generate an EDMED document from an EDIOS document

Generation of an EDIOS document from an EDMED document will never be possible
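
One plausible reading of this asymmetry, given that EDIOS is built on structured fields while EDMED leans heavily on plaintext, is that structured content can always be flattened into text but not reliably rebuilt from it. The sketch below illustrates only that idea; the record keys and values are hypothetical and neither schema is reproduced.

```python
def to_plaintext_summary(record: dict) -> str:
    """Flatten a structured description into a single free-text summary."""
    parameters = ", ".join(sorted(record["parameters"]))
    return (
        f"{record['title']}. "
        f"Parameters: {parameters}. "
        f"Sampling interval (hours): {record['interval_hours']}; "
        f"stations: {len(record['stations'])}."
    )

# Hypothetical structured record.
structured = {
    "title": "Tide gauge network",
    "parameters": {"water temperature", "sea level"},
    "interval_hours": 1,
    "stations": ["A", "B", "C"],
}
summary = to_plaintext_summary(structured)
```

The summary reads well, but the station identifiers are gone and the remaining values are embedded in prose, so a program going the other way would have to guess at the structure.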

Page 21: Introduction to SeaDataNet Metadata

EDIOS

• How is EDIOS different from EDMED?

SeaDataNet convention is to use EDIOS for ‘qualifying’ datasets and EDMED for everything else

EDMED currently has a working population mechanism, but EDIOS does not

Advice to partners

Identify datasets to be described by EDIOS documents, map them to the EDIOS data model (relational schema and Access prototype on BSCW) and gather together the necessary information

Prepare EDMED documents for all other data sets and get them into BODC

Submit EDIOS entries to BODC once the necessary systems are operational

Page 22: Introduction to SeaDataNet Metadata

CDI

• Purpose

To provide an ultra-light discovery metadata description of accessible SeaDataNet data objects

Used to build a manageable fine-grained index of discrete data objects (millions of entries)

• Entity definition

The fundamental SeaDataNet data delivery unit such as a current meter record or a CTD profile

• History

Developed by SEA-SEARCH as a pilot for SeaDataNet

Page 23: Introduction to SeaDataNet Metadata

CDI

• Population

XML schema describing files that should be generated automatically from existing digital indexes

• Strengths

Light content makes efficient handling of large numbers of records possible

• Weaknesses

Light content restricts available information
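
Automatic generation from an existing digital index might look like the following Python sketch. The CSV columns, identifiers and XML element names are all hypothetical; the actual CDI XML schema is not reproduced here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extract from a data centre's existing index.
INDEX_CSV = """id,type,latitude,longitude,start
OBJ-0001,CTD profile,54.5,-4.25,1995-06-01
OBJ-0002,current meter record,53.9,-3.8,1995-06-03
"""

def index_to_xml(csv_text: str) -> ET.Element:
    """Build one light XML record per index row (element names are invented)."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = ET.SubElement(root, "record", id=row["id"])
        for key in ("type", "latitude", "longitude", "start"):
            ET.SubElement(rec, key).text = row[key]
    return root

root = index_to_xml(INDEX_CSV)
xml_text = ET.tostring(root, encoding="unicode")
```

Keeping each record to a handful of fields is what makes an index of millions of entries manageable, at the cost of the limited information noted under Weaknesses.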

Page 24: Introduction to SeaDataNet Metadata

EDMERP

• Purpose

Description of European marine research projects and programmes

• Entity definition

A co-ordinated collection of marine data acquisition activities in Europe

• History

Developed by Maris during SEA-SEARCH

Page 25: Introduction to SeaDataNet Metadata

EDMERP

• Population

Access form: resulting mdb file submitted to Maris

On-line content management system planned

• Strengths

Provides centralised project metadata

• Weaknesses

Local vocabularies and plaintext

Page 26: Introduction to SeaDataNet Metadata

That’s All Folks!

Questions or Geoff?