
European Space Weather Week 3, Brussels, November 13-17, 2006

Atmospheric Data Management

- A Challenge -

Anne De Rudder and Sue Latham

Rutherford Appleton Laboratory, UK

In 2 or 3 decades, the universe of data has gone from … to … ?

The BADC

http://badc.nerc.ac.uk/

• One of the NERC designated Data Centres and a component of the NCAS

• Documented long-term data archive (currently about 130 catalogued datasets)

• About 8,000 registered users worldwide, of whom 3,000 have applied for access to specific datasets and 2,000 have downloaded data in the past year

• Data management in support of NERC research programmes, grants and facilities and, occasionally, of some international research projects

• Data are distributed via the web

• Assistance to users regarding atmospheric data issues (trajectories, online help desk, visualisation facilities, software, links, …)

Contents

• Data policies – their purpose and implementation

• Model versus observation

• Metadata

• Citation and publication

• Data access networks (grids)

• Speaking the same language

• A few traps to beware of

Data policies

Aims

• Ensuring the swift exchange of knowledge within a research project.

• Ensuring that the newly acquired knowledge, or at least the material on which it relies, is kept for possible future reference, improvement and use and is made available to the community.

• Ensuring that the data is documented in a way that will allow long-term access to it and understanding of it.

• Ensuring that researchers’ rights are not infringed.

Data management plans

• To implement the principles outlined in the data policy

• To plan how and when data will be generated, shared and stored within a project

• DMPs also include arrangements for the provision of supporting third-party data (e.g. met data from the UK Met Office, provision of NRT data or forecasts to support field campaigns)

Data policies

To ease the exchange of knowledge within the project:

• Submission schedule and deadlines taking into account the synergy between the different groups taking part in the project

• Common format (often seen as a devilish obstacle in our Excel times…)

• Provision of a workspace (e.g. BSCW) to be used as

o a discussion forum

o a way to work on common documents

o a way to validate and format preliminary data

To provide a long-term archive to the community:

• Advertisement of the dataset (dataset catalogue, dataset “publication”)

• Regular backups on at least two storage media and in two places
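The backup requirement on this slide (at least two copies, in two places) is only useful if the copies are known to be identical. The following is a minimal sketch of such a check, assuming copies are compared by SHA-256 checksum; the function names, file names and the two temporary directories standing in for two sites are hypothetical and are not taken from the BADC's actual procedures.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary: Path, copies: list) -> bool:
    """True if at least one backup exists and every backup equals the primary."""
    if not copies:
        return False  # assumed policy: the primary plus at least one backup
    reference = sha256_of(primary)
    return all(sha256_of(copy) == reference for copy in copies)


def demo() -> tuple:
    """Write a file and a backup in two temp dirs, then corrupt the backup."""
    with tempfile.TemporaryDirectory() as site_a, \
            tempfile.TemporaryDirectory() as site_b:
        primary = Path(site_a) / "dataset.dat"   # hypothetical file name
        backup = Path(site_b) / "dataset.dat"
        primary.write_bytes(b"ozone profile, version 1\n")
        backup.write_bytes(primary.read_bytes())
        intact = copies_match(primary, [backup])
        backup.write_bytes(b"corrupted\n")
        damaged = copies_match(primary, [backup])
    return intact, damaged
```

Calling `demo()` reports whether the copies agreed before and after the simulated corruption. In a real archive the two copies would sit on different media at different sites rather than in temporary directories.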

Data policies

To ensure that this long-term archive can be read, interpreted and used:

• Use of a worldwide metadata standard (CF Convention)

• Use of formats that allow the metadata to be attached to the data inseparably

• Documentation (metadata) should be as specific, accurate, explicit and complete as possible

Metadata

• To associate with a dataset the key terms that will allow its discovery.

• To give all the information needed to read, understand and interpret the data.

Metadata standards

They integrate a terminology, recommendations on the metadata content and some format considerations.

The Climate and Forecast (CF) Metadata Convention was developed for NetCDF but is largely applicable to information provided with any atmospheric data regardless of its format.

Providing (good) metadata and conforming to metadata standards is a habit that still needs to be acquired…
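As an illustration of what "conforming to a metadata standard" can mean in practice, here is a minimal sketch of a checker for a small, simplified subset of CF-style attributes: a global `Conventions` attribute, and `units` plus a `standard_name` or `long_name` on each variable. The function name and the example variables are hypothetical, the metadata is held in plain dictionaries rather than read from a NetCDF file, and the check covers far less than the full convention.

```python
def check_metadata(global_attrs: dict, variables: dict) -> list:
    """Return a list of human-readable problems (empty if none were found).

    Simplified subset of CF-style rules, for illustration only.
    """
    problems = []
    conventions = str(global_attrs.get("Conventions", ""))
    if not conventions.startswith("CF-"):
        problems.append("global attribute 'Conventions' should name a CF version")
    for name, attrs in variables.items():
        if "units" not in attrs:
            problems.append(f"{name}: missing 'units'")
        if "standard_name" not in attrs and "long_name" not in attrs:
            problems.append(f"{name}: needs 'standard_name' or 'long_name'")
    return problems


# Hypothetical examples: one well-described variable, one undocumented one.
good_globals = {"Conventions": "CF-1.0", "title": "Example sounding"}
good_vars = {"temp": {"standard_name": "air_temperature", "units": "K"}}
bad_globals = {}
bad_vars = {"o3": {}}

good_report = check_metadata(good_globals, good_vars)
bad_report = check_metadata(bad_globals, bad_vars)
```

Running such a check at submission time, rather than years later, is one way to make good metadata a habit: the data provider is still available to fill in what is missing.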

Data policies

Protecting researchers’ work and rights: Temporary restriction of access

• In order to allow the researchers to be the first ones to analyse and publish their data, while at the same time ensuring some synergy between the different groups participating in the project

• During the project or for a certain period of time after its end, access is restricted to the project participants…

• With exceptions for close collaborators or participants in associated projects

• This retention period ranges from 1 to … 10 years!

• Password-protected system

• Modalities of application and of access granting vary (e.g. consultation of PI, list of authorised users, etc.)

… after which the data is released to the public domain.

Access to restricted data – Authorised Users

Project participants

• Immediate availability

• On application

External Collaborators (during retention period)

• Must apply for access

• Applications channelled through Project PI(s)

Public

• Discovery metadata immediately visible

• Free access to the data after the retention period (sometimes, Conditions of Use continue to apply)
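The access rules on this slide can be read as a small decision procedure. The sketch below is one possible encoding under simplifying assumptions: project participants are treated as having access throughout, external collaborators need an application approved via the PI, and everyone gets access once the retention period has ended. The class, field and function names are hypothetical, and real modalities vary from project to project.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class UserKind(Enum):
    PARTICIPANT = "project participant"
    COLLABORATOR = "external collaborator"
    PUBLIC = "public"


@dataclass(frozen=True)
class Dataset:
    name: str
    release_date: date  # hypothetical field: end of the retention period


def may_see_discovery_metadata(kind: UserKind) -> bool:
    """Discovery metadata is visible to everyone from the start."""
    return True


def may_download(kind: UserKind, dataset: Dataset, today: date,
                 approved_by_pi: bool = False) -> bool:
    """Decide whether a user may download the data on a given day."""
    if today >= dataset.release_date:
        return True  # retention period over: data released
    if kind is UserKind.PARTICIPANT:
        return True  # simplification: participants always have access
    if kind is UserKind.COLLABORATOR:
        return approved_by_pi  # application channelled through the PI
    return False  # public must wait for the end of the retention period


# Hypothetical dataset with a retention period ending on 1 January 2010.
campaign = Dataset("example_campaign", date(2010, 1, 1))
during = date(2007, 6, 1)
after = date(2010, 6, 1)
```

Conditions of Use that continue to apply after release are not modelled here; they constrain what a user may do with the data, not whether the data can be downloaded.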

Data policies

Protecting researchers’ work and rights: Conditions of use and publication

• Applying during the project and sometimes after it has ended

• Sometimes included in the data files, as a stamp

• Committing the user to respect rules such as

o Restricting the use of the data to the research topic stated at the time of application

o Not disclosing the data to other parties

o Contacting the data provider

o Acknowledging the data provider

o Offering co-authorship to the data provider

Data policies

Research facility

National programme

International project

Intercontinental initiative

Model versus observation

Nobody believes a modelling paper except the author. Everybody believes an observational paper… except the author.

(Quoted by David Stevenson, University of Edinburgh, at a UTLS Ozone Science Meeting)

Is there such a clear difference between the two things?

Is processed or derived data observation or modelling?

Is a programme “model data”?

For the purpose of data management, model data =

• any output of model computation (e.g. simulations),

• datasets resulting from some kind of data assimilation technique,

• compilations of observations from different sources (synthesized datasets)

… which have in common that they are more likely to be superseded by newer versions, and more quickly, than observations are.

They are also usually the end-product of a project, while observations are a starting point for further analyses and studies.

Model versus observation

BADC Guidelines for the Archival of Simulated Data

• Codes archived only as metadata to support model output

• Datasets peer-reviewed at regular intervals (a few years)

• Criteria to select model runs to be archived for the long term

o Likely future existence of a community of potential users.

o Historical, legal or scientific importance likely to persist.

o The results will be used in an intercomparison exercise.

o Integration of observation data in a way that adds value to the observations.

o The results have been the basis of a publication.

o The results have confirmed or led to some outstanding discovery.

Citation and publication

Some projects bring together the worlds of librarians and data scientists, e.g.

CLADDIER

To investigate how datasets can be (better)

• versioned

• catalogued

• peer-reviewed

• referenced in papers

• published

E-grids

Networks linking several organisations with similar or complementary competences in such a way as to ensure their interoperability.

E.g. network of data repositories, models and computers allowing the user to search and use these resources simultaneously and transparently.

Issues:

• Transfer of information (balance between redundant storage and speed of transfer)

• Authentication (security and access)

• Format conversion

• Vocabulary (metadata standards)

E-grids

The NERC Data Grid (NDG) Project

• Infrastructure system to enable the discovery and retrieval of data held at distributed data centres via one single portal

• Partners: BADC, BODC, PCMDI (LLNL)

• Security issues tackled through “role mapping”, i.e. definition of equivalent authorisations (sparing the user the need to register with each organisation)

• A discovery metadatabase already exists, based on MOLES = Metadata Objects for Links in Environmental Science

• Furthermore, we intend to make the connection between data held in managed archives and data held by individual research groups seamless, so that the same tools can be used to compare and manipulate data from both sources.

• What will be completely new will be the ability to compare and contrast data from an extensive range of (US, European, UK, NERC) datasets from within one specific context.
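The idea of "role mapping" mentioned for the NDG can be illustrated with a small lookup table of equivalent authorisations. This is only a sketch of the concept: the role names, the table and the functions are hypothetical and do not describe the actual NDG security implementation.

```python
# Hypothetical equivalences: a role held at one data centre is accepted
# as equivalent to a role at another, so the user registers only once.
ROLE_EQUIVALENCES = {
    ("BADC", "campaign_member"): {("BODC", "campaign_reader")},
    ("BODC", "campaign_reader"): {("BADC", "campaign_member")},
}


def effective_roles(user_roles: set) -> set:
    """Return the user's own roles plus their directly mapped equivalents.

    Only one mapping step is applied; chains of mappings are not followed.
    """
    roles = set(user_roles)
    for role in user_roles:
        roles |= ROLE_EQUIVALENCES.get(role, set())
    return roles


def is_authorised(user_roles: set, organisation: str, required_role: str) -> bool:
    """True if the user holds the required role directly or via role mapping."""
    return (organisation, required_role) in effective_roles(user_roles)


# A user registered only with the BADC.
badc_user = {("BADC", "campaign_member")}
```

With this table, `is_authorised(badc_user, "BODC", "campaign_reader")` succeeds although the user never registered with the BODC, while a role that has no mapping is still refused.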

Speaking the same language

Standard terminologies

• Sets of terms of reference with, sometimes, unique identifiers (key values), definitions and version numbers

• System of relationships between terms (synonyms, inclusion, related terms)

• Underpin catalogues and search engines

• Examples: GCMD, CF, SeaDataNet

MOLES (Metadata Objects for Links in Environmental Science):

• The metadata scheme underpinning the NDG discovery tool (based on a set of XML records) and the next BADC catalogue (relational metadatabase)

• Developed in-house

• Integrates tentative mappings between GCMD, CF, SeaDataNet
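The ingredients of a standard terminology listed on this slide (unique identifiers, synonyms, inclusion relationships, mappings between vocabularies) can be sketched as a small data structure. The example below is illustrative only: the class names, the term keys and the crosswalk entry are hypothetical and are not taken from MOLES, GCMD or SeaDataNet; `air_temperature` is used as an example of a CF standard name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    key: str                       # unique identifier within the vocabulary
    label: str                     # preferred term
    synonyms: tuple = ()           # alternative spellings or names
    broader: Optional[str] = None  # key of the including (parent) term


class Vocabulary:
    def __init__(self, name: str, terms: list):
        self.name = name
        self.terms = {term.key: term for term in terms}
        # Case-insensitive index over preferred labels and synonyms.
        self._index = {}
        for term in terms:
            for text in (term.label, *term.synonyms):
                self._index[text.lower()] = term.key

    def find(self, text: str) -> Optional[Term]:
        """Look a term up by label or synonym, ignoring case."""
        key = self._index.get(text.strip().lower())
        return self.terms[key] if key is not None else None

    def ancestors(self, key: str) -> list:
        """Follow 'broader' links from a term up to the top of the hierarchy."""
        chain = []
        current = self.terms[key].broader
        while current is not None:
            chain.append(current)
            current = self.terms[current].broader
        return chain


# Hypothetical in-house vocabulary with two terms.
local = Vocabulary("local", [
    Term("A001", "atmosphere"),
    Term("T001", "air temperature", ("atmospheric temperature",), "A001"),
])

# Tentative mapping from the local vocabulary to another one.
CROSSWALK = {("local", "T001"): ("CF", "air_temperature")}
```

A search engine built on such a structure can accept "Atmospheric Temperature" from a user, resolve it to `T001`, widen the search to the broader term `A001`, or translate it through the crosswalk to query a catalogue that uses a different vocabulary.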

Lessons learnt and traps to avoid

• Plan the data policy at an early stage of the project proposal, taking into account already running projects that may become associated or involved.

• Design and develop an open standard terminology with direct input from the researchers and carefully thought-out relationships between terms.

• Do not try to build a terminology that covers everything, but focus on the vocabulary needed in your community.

• Resist the temptation to replace tools (software, applications, conceptual tools) every time a shiny new one is launched on the market.