Data mediators experience with metadata
– A national data centre view
Peter Burnhill (Director) & Tony Mathys
EDINA National Data Centre
University of Edinburgh
With contributions from David Medyckyj-Scott
http://edina.ac.uk/
Activating metadata: the role of metadata in effective spatial data exploitation,
Cambridge, 6–7th July 2005
NIEeS Metadata Workshop
Overview
• EDINA national data centre
• Acting as a mediator
• Internal use of metadata
• Issues and challenges
• Dataset publishing
EDINA
• A National Data Centre for Tertiary Education since 1995
– based at the University of Edinburgh Data Library
• Our mission... to enhance the productivity of research, learning and teaching in UK higher and further education
• Focus is service but also undertake R&D projects to services
• Major content provider within academia
• Strategic move toward interoperability & shared services role
• Substantial experience in handling and delivering key geospatial data and geo-referenced information
Existing Geo-data Services
Services
Our interest in metadata has a long history…
• Beginning in the 1980s, more than 25 years' experience with geospatial metadata initiatives, policies, projects and services e.g.
– ESRC Computer files cataloguing group (1980s)
– Register of spatially referenced data for Scotland (1991)
– “Metadata in the Geosciences” (published 1991)
– Global Environmental Network for Information Exchange (GENIE) 1990s
– Rawa Taio – environmental metadata service (NZ, 1996)
– MetroGIS, Minneapolis/Saint Paul Metropolitan Organisation for promoting spatial data sharing (1998)
– State representative on ANZLIC metadata WG
– Geo-data browser – Go-Geo! portal (2000+)
– Advisors to AskGiraffe and now hosting GIGateway service
– UK GEMINI (Geo-spatial Metadata Interoperability Initiative)
Simplified workflow
Discover
Locate
Access
Use
Publish
Fit for purpose?
Preserve
Metadata provided by EDINA
Discover
Locate
Access
Use
Support information e.g.
• OS user guides
• Map sheet metadata e.g. survey date
• Legend files
• Format descriptions
• Explanations of key concepts
Metadata records for OS products, DBDs and agcensus (114 metadata records created by EDINA and published on Go-Geo! – another 100+ still to produce)
EuroGlobalMap metadata records supplied by National Mapping Agencies
What we are supplied with
• No metadata at all
• or partial
– e.g. sheet/tile level and not ‘collection’ level
• or incomplete. It lacks
– a product specification
– lineage information (history, differences between ‘editions’, why changed)
– quality statement
– descriptions of processing
– information on file formats
– coding book (definitions of attributes)
and so on…
• Not machine readable
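A machine-readable record makes gaps like those listed above easy to detect automatically. The following is a minimal sketch only: the element names and the sample record are hypothetical, chosen to mirror the list on this slide, and are not taken from any metadata standard or from EDINA's own tooling.

```python
# Hypothetical element names, taken from the gaps listed above;
# they are not drawn from any particular metadata standard.
REQUIRED_ELEMENTS = (
    "product_specification",
    "lineage",
    "quality_statement",
    "processing_description",
    "file_format",
    "code_book",
)


def missing_elements(record):
    """Return the required elements that are absent or empty in a record."""
    return [
        name
        for name in REQUIRED_ELEMENTS
        if not str(record.get(name) or "").strip()
    ]


def classify(record):
    """Label a supplied record as 'none', 'incomplete' or 'complete'."""
    missing = missing_elements(record)
    if len(missing) == len(REQUIRED_ELEMENTS):
        return "none"
    return "incomplete" if missing else "complete"


# Illustrative record, invented for this sketch.
supplied = {"file_format": "NTF", "lineage": "  ", "quality_statement": None}
print(classify(supplied), missing_elements(supplied))
```

A check of this kind only tells you that an element is present, not that its content is adequate; judging adequacy still needs a person.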
Internal metadata activity
• Organisational memory is important to EDINA
– “stored information from an organisation’s history that can be brought to bear on present decisions”
– distributed across different retention facilities and often informal, i.e. it’s in people’s heads
– now trying to formalise it – what, when, where, how, who and why
• Activities
– creating discovery level records
– documenting processing steps occurring through the life cycle of a dataset
– data quality statements which describe the completeness, consistency and accuracy of the dataset
– created an ISO 19115 data quality extension
– how do we code processing steps?
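One possible answer to the question of how to code processing steps is to record each step as a small structured entry covering the what, when, where, how, who and why mentioned above. This is a sketch under assumptions, not EDINA's method: the class, field names and example values are all hypothetical, and the structure is not the ISO 19115 schema or the data quality extension referred to on this slide.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass(frozen=True)
class ProcessingStep:
    """One step in a dataset's life cycle (hypothetical field names)."""
    what: str    # the operation carried out
    when: date   # date the step was performed
    where: str   # system or location where it ran
    how: str     # tool or method used
    who: str     # person or team responsible
    why: str     # reason for the step


def lineage_to_json(steps):
    """Serialise steps in date order so the history is machine readable."""
    ordered = sorted(steps, key=lambda s: s.when)
    return json.dumps(
        [{**asdict(s), "when": s.when.isoformat()} for s in ordered],
        indent=2,
    )


# Invented example values, for illustration only.
steps = [
    ProcessingStep("reprojected tiles", date(2005, 3, 2), "staging server",
                   "batch script", "data team", "match service grid"),
    ProcessingStep("loaded supplier update", date(2005, 2, 14), "staging server",
                   "bulk loader", "data team", "new edition received"),
]
print(lineage_to_json(steps))
```

Recording steps as data rather than free text moves the history out of people's heads, and it can later be mapped onto whatever lineage elements a chosen standard provides.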
Issues and challenges
• Motivating people to document datasets is a key challenge
– seen as an onerous task and left undone
– we were saying this in the ’80s and the situation is no better now
• Difficult to fully automate – requires human interpretation
• If we don’t do it, risk of data loss or expensive re-acquisition
• Greater ROI from re-use
• It’s a people and organisational problem
– but also concerns about IPR, copyright and mechanisms for sharing
Dataset publishing
• Reintroduce the concept of Dataset Publishing (Callahan, Johnson, and Shelley 1996)
– analogous to publishing papers
– rewards people for publishing datasets (e.g. promotion, RAE)
– involves establishment of procedures (e.g. standards to use, peer review) & resources to manage procedures
* Should minimise time and effort required
– a dataset description is the equivalent of the bibliographic record
– need tools to assist in creation, maintenance and dissemination of dataset descriptions
• EDINA involved in two related activities
– Go-Geo! Portal Phase 4b
– GRADE (Geospatial Repository for Academic Deposit and Extraction)
EDINA data publishing support projects
Go-Geo! Portal – phase 4b
• JISC funded, 18 month project
• Go-Geo! portal serves as a discovery tool, now extending to become a publication tool
• Promote and encourage geospatial metadata creation within UK tertiary education
• A pilot study with 4 universities to establish a business model for metadata creation and maintenance based on the use of Go-Geo! resources as local data management tools
GRADE
• JISC funded project, 18 months
• Looking at utility of geospatial data repositories for storing and sharing of geospatial data
• Comparing thematic v. institutional v. informal
• Compendium of use cases of intended data sharing
• Assess interoperability aspects of geospatial data repositories
• www.gogeo.ac.uk
• www.gogeo.ac.uk/Phase4b.html
• edina.ac.uk/projects/grade/
Comments and observations
• We need to understand better the life cycles of data and metadata as they are disseminated across the academic community
– Authorship of data and metadata as data are merged, generalised, augmented, new data derived, new editions published
– Tracking and recording digital rights as this happens
• Are we documenting what users really want to know?
– Subject and content
• On the annotation of datasets and metadata
• Thesauri v. controlled terms v. ontologies
• Making metadata actionable
Conclusions
• Metadata creation should happen close to data creation
• Metadata population and maintenance must be viewed as an on-going long term process
• Need to think more about what happens once metadata and data are published
• Can we really call ourselves spatial data management professionals?
Contact details
Peter Burnhill
Director, EDINA National Data Centre
Tony Mathys
Go-Geo Project
Tel.: +44 (0)131 650 3302
Fax: +44 (0)131 650 3308
EDINA web site: http://edina.ac.uk
Go-Geo!: www.gogeo.ac.uk