Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint

Authors: Frank Toussaint, N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann
World Data Centre for Climate at the German Climate Computing Centre (DKRZ), Hamburg, Germany


Transcript of Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint

Page 1: Preservation and Long Term Access of Data at the World Data Centre for Climate

Frank Toussaint

N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann

World Data Centre for Climate at the German Climate Computing Centre (DKRZ)

Hamburg, Germany

Page 2: Overview

• The WDC for Climate in several collaborations
• Data Storage: Technology – Tapes and Disks
• Data Storage: LObStER – the Tape Storage Tool
• Storage Policy
• Long Term Archiving
• DOI - Digital Object Identifier

Page 3: The World Data Centre for Climate

The German Climate Computing Centre (DKRZ) is held by…
• Max Planck Society, University of Hamburg, and others.
• Mission: Provide high-performance (HP) computing power and storage for the German Earth Science community

Page 4: The WDCC as WIS Data Collection & Production Centre

WMO Information System (WIS)
• National Centres
• Global Information System Centres
• Data Collection and Production Centres

Page 5: The WDCC in the ICSU World Data System

• International Council for Science (ICSU)
  World Data System (WDS)
  World Data Centres (WDC)
• WDC Cluster Earth System Research: WDC-Mare, WDC-RSAT, WDC-Climate

Page 6: CMIP5/IPCC Data Federation

CMIP5/IPCC-AR5
• PCMDI, BADC, & WDCC form a data federation
• About 1 PB of data are replicated

[Diagram labels: CMIP5 Data Nodes; Replicated model output; UK: BADC, ~1 PByte HD; DE: WDCC, ~1 PByte HD; US: PCMDI, ~1 PByte HD]

Page 7: Evolution of Data Quantities

Climate Model Data: Relatively homogeneous but huge amounts!

Needed: Tape access (nearline)

Page 8: Data Flows

[Diagram labels: Archive: files; Container: Blobs; Appl. Server; Storage@DKRZ; TDS; LobServer; HPSS, 9 PB; CERA; Midtier]

DB Layer
• What
• Where
• Who
• When
• How

Page 9: LObStER: Large Object Storage and Efficient Retrieval

• Huge amounts of data in each container file
• Very different sizes of records: 64b .. 2 Gb
• Efficient administration of all records
• Irregular access patterns (access latency independent of the record position)
• Transactional behaviour for read/write
• Fault tolerance for HD, controller, tapes, etc.

Page 10

[Diagram labels: Lobster configuration manager; generic JDBC-driver; specific JDBC-drivers loaded; Application (shown twice); Intranet; Internet; LObStER]

Page 11

[Diagram labels: show-container; read-record; fetch-records; Lobster object manager; Cache; Oracle RDB (or other); LObStER]

Page 12: LObStER: The Data Containers

• Container files with blocked format
• 64-bit files and 64-bit internal position referencing
• Max file size: 16384 PBytes
• Entries stored in ≥1 blocks
• Block sizes 2^k, k ∈ { 8, 9, 10, …, 62 }
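The figures on this slide can be checked with a little arithmetic. The sketch below assumes binary prefixes (1 PByte = 2^50 bytes) and sizes in bytes; the helper `blocks_needed` is hypothetical and ignores any header or pointer-block overhead, which the slide does not quantify.

```python
# Assumption: binary prefixes, i.e. 1 PByte = 2**50 bytes.
PBYTE = 2**50

# 64-bit position referencing can address 2**64 distinct byte offsets.
max_file_size = 2**64
max_file_size_pbytes = max_file_size // PBYTE

# Block sizes 2**k for k = 8, 9, ..., 62.
block_sizes = [2**k for k in range(8, 63)]


def blocks_needed(record_size: int, k: int) -> int:
    """Number of 2**k-byte data blocks needed for a record (hypothetical helper).

    Every entry occupies at least one block; overhead blocks are not counted.
    """
    if not 8 <= k <= 62:
        raise ValueError("k must be in 8..62")
    block = 2**k
    return max(1, -(-record_size // block))  # ceiling division


print(max_file_size_pbytes)            # 16384
print(block_sizes[0])                  # 256
print(blocks_needed(2 * 2**30, 20))    # 2048
```

With 2^64 addressable bytes, the maximum container size works out to 2^14 = 16384 PBytes, matching the slide.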

Page 13: LObStER: The Data Containers

[Diagram labels: header-blocks; indirect-pointer-block; direct-pointer-blocks; data-blocks]
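The slide names only the block types, not how they are linked. As a rough illustration, the sketch below assumes a two-level scheme similar to a classic Unix inode: an indirect-pointer-block lists direct-pointer-blocks, which in turn list the positions of data-blocks in the container file. All class names, fields, and the `locate` function are hypothetical and not taken from LObStER.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DirectPointerBlock:
    # Assumed: 64-bit positions of data-blocks within the container file.
    data_block_positions: List[int] = field(default_factory=list)


@dataclass
class IndirectPointerBlock:
    # Assumed: one indirect block references several direct-pointer-blocks.
    direct_blocks: List[DirectPointerBlock] = field(default_factory=list)


def locate(indirect: IndirectPointerBlock, offset: int,
           block_size: int, pointers_per_block: int) -> Tuple[int, int]:
    """Map a byte offset inside an entry to (data-block position, offset in block)."""
    block_index, within = divmod(offset, block_size)
    direct_index, slot = divmod(block_index, pointers_per_block)
    position = indirect.direct_blocks[direct_index].data_block_positions[slot]
    return position, within


# Example: 256-byte blocks, two pointers per direct-pointer-block.
entry = IndirectPointerBlock([
    DirectPointerBlock([1024, 4096]),
    DirectPointerBlock([8192]),
])
print(locate(entry, 600, block_size=256, pointers_per_block=2))  # (8192, 88)
```

Under these assumptions a lookup costs a fixed number of pointer hops regardless of where the record sits, which is one way to obtain access latency that does not depend on record position.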

Page 14: Long Term Archiving

• Several steps:
  o specification & concept
  o filling of metadata & data
  o quality checks & DOI
• LTA for, e.g., EUCLIPSE, MedCLIVAR, COMBINE

Page 15: LTA

Costs depend on complexity and efforts at our site:
• metadata
• reformatting
• etc.

Page 16: Long Term Archiving

• Quality Checks on three levels
  QC L1: conformity to general standards (format, ...)
  QC L2: coarse automated content checks
  QC L3: detailed spot checks:
    TQA – Technical Quality Assurance
    SQA – Scientific Quality Assurance
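A minimal sketch of how the three QC levels could be chained is shown below. The concrete checks are assumptions made for illustration only: the required keys in `qc_l1`, the value range in `qc_l2`, and the sampling fraction in `qc_l3_sample` are all hypothetical, and the actual TQA/SQA spot checks are not modelled.

```python
import random
from typing import Any, Dict, List

Record = Dict[str, Any]


def qc_l1(record: Record) -> bool:
    """L1: conformity to general standards (here: required keys, hypothetical)."""
    return {"variable", "units", "values"} <= record.keys()


def qc_l2(record: Record, low: float = -1e6, high: float = 1e6) -> bool:
    """L2: coarse automated content check (here: a plausible value range)."""
    return all(low <= v <= high for v in record["values"])


def qc_l3_sample(records: List[Record], fraction: float = 0.1,
                 seed: int = 0) -> List[Record]:
    """L3: select records for detailed spot checks (TQA/SQA happen elsewhere)."""
    rng = random.Random(seed)
    n = max(1, round(len(records) * fraction))
    return rng.sample(records, n)


def run_qc(records: List[Record]) -> Dict[str, int]:
    """Apply L1, then L2 to the survivors, then draw the L3 spot-check sample."""
    passed_l1 = [r for r in records if qc_l1(r)]
    passed_l2 = [r for r in passed_l1 if qc_l2(r)]
    spot = qc_l3_sample(passed_l2) if passed_l2 else []
    return {"l1": len(passed_l1), "l2": len(passed_l2), "l3_spot_checks": len(spot)}


sample_records = [
    {"variable": "tas", "units": "K", "values": [280.0, 290.0]},
    {"variable": "tas", "values": [1.0]},                      # fails L1
    {"variable": "pr", "units": "kg m-2 s-1", "values": [1e9]},  # fails L2
]
print(run_qc(sample_records))
```

Each level only sees what passed the previous one, so the expensive spot checks run on the smallest set.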

Page 17: LTA: CMIP5 as an Example of a Federated Activity

[Diagram labels: QC services; QC Service Layer; Distributed QC Level2 Checks at Multiple Sites; QC L2 Tool; Central QC Repository; Central QC Level3 Checks; QC L3 Tools; SQA GUI; Project QC Metadata Repository; DOI Publication Agency; Long-Term Archive]

Page 18: LTA: CMIP5 as an Example

[Diagram labels: Data Nodes; Data from nodes; Data; Quality Control; TQA; SQA by Author; Project MD Repository; MD Input; MD on data; MD on quality; MD on model & simulation; MD harvest during project; MD harvest after archiving; MD export; MD LTA; Data Catalogue; DOI Catalogue; DOI Target Page; DOI access; Registration; IDF; Data Long Term Archive (LTA); DOI Publication Agency with Long Term Archive]

Page 19: WDC-Climate as Publishing Agency of the IDF

• International DOI Foundation (doi.org)
• Registration Agencies: DataCite (DataCite.org)
• National Organizations: TIB, BL, … (tib-hannover.de)
• Publisher: WDCC, … (wdc-climate.de)

Page 20: Visibility of LTA Data in Public Catalogues

• DOI is given
• Catalogue metadata is sent to the Registration Agency via the national organization

Page 21: The Data Life Cycle Management

[Diagram labels: Virtual Research Environment; Data Production; Data Evaluation; Data Dissemination; Long Term Archive]

Page 22

THANK YOU, QUESTIONS?