Preservation and Long Term Access of Data at the World Data Centre for Climate
Frank Toussaint
N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann
World Data Centre for Climate at the German Climate Computing Centre (DKRZ)
Hamburg, Germany
Overview
• The WDC for Climate in several collaborations
• Data Storage: Technology – Tapes and Disks
• Data Storage: LObStER – the Tape Storage Tool
• Storage Policy
• Long Term Archiving
• DOI - Digital Object Identifier
The German Climate Computing Centre (DKRZ) is held by…
• Max Planck Society, University of Hamburg, and others.
• Mission: Provide high-performance computing power and storage for the German Earth Science community
The World Data Centre for Climate
WMO Information System (WIS)
• National Centres
• Global Information System Centres
• Data Collection and Production Centres
The WDCC as WIS Data Collection & Production Centre
The WDCC in the ICSU World Data System
• International Council for Science (ICSU)
  World Data System (WDS)
  World Data Centres (WDC)
• WDC Cluster Earth System Research: WDC-Mare, WDC-RSAT, WDC-Climate
CMIP5/IPCC Data Federation
UK: BADC ~1 PByte HD
DE: WDCC ~1 PByte HD
US: PCMDI ~1 PByte HD
Replicated model output
CMIP5 Data Nodes
CMIP5/IPCC-AR5
PCMDI, BADC, & WDCC form a data federation
About 1 PB of data is replicated
Evolution of Data Quantities
Climate Model Data: Relatively homogeneous but huge amounts!
Needed: Tape access (nearline)
Data Flows
[Diagram labels: Archive: files; Container: Blobs; Appl. Server; Storage@DKRZ; TDS; LobServer; HPSS (9 PB); CERA; Midtier; DB Layer: What, Where, Who, When, How]
LObStER: Large Object Storage and Efficient Retrieval
• Huge amounts of data in each container file
• Very different sizes of records: 64b .. 2 Gb
• Efficient administration of all records
• Irregular access patterns (access latency independent of the record position)
• Transactional behaviour for read/write
• Fault tolerance for HD, controller, tapes, etc.
LObStER
[Diagram labels: Application (Intranet); Application (Internet); generic JDBC-driver; specific JDBC-drivers loaded; Lobster configuration manager; Lobster object manager; Cache; Oracle RDB (or other); operations: show-container, read-record, fetch-records]
LObStER: The Data Containers
• Container files with blocked format
• 64-bit files and 64-bit internal position referencing
• Max file size: 16384 PBytes
• Entries stored in ≥1 blocks
• Block sizes 2^k, k ∈ { 8, 9, 10, …, 62 }
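The figures on this slide fit together arithmetically, which a short sketch can make explicit. It assumes binary units (1 PByte = 2^50 bytes), under which 64-bit position referencing gives exactly the stated 16384 PBytes. The function names are illustrative only and are not part of LObStER.

```python
# Block arithmetic for a container format with 64-bit positions.
# All names here are hypothetical helpers, not the LObStER API.

ADDRESS_BITS = 64                # 64-bit internal position referencing
BLOCK_EXPONENTS = range(8, 63)   # k in {8, 9, ..., 62}
PBYTE = 2 ** 50                  # assumption: binary petabyte


def max_file_size_pbytes() -> int:
    """Largest addressable container size in PBytes."""
    return 2 ** ADDRESS_BITS // PBYTE


def block_size(k: int) -> int:
    """Block size in bytes for exponent k, restricted to the allowed range."""
    if k not in BLOCK_EXPONENTS:
        raise ValueError(f"k must be between 8 and 62, got {k}")
    return 2 ** k


def blocks_needed(entry_bytes: int, k: int) -> int:
    """Number of blocks an entry occupies; every entry uses at least one."""
    size = block_size(k)
    return max(1, -(-entry_bytes // size))  # ceiling division


if __name__ == "__main__":
    print(max_file_size_pbytes())          # size limit in PBytes
    print(blocks_needed(64, 8))            # a tiny record in 256-byte blocks
    print(blocks_needed(2 * 2 ** 30, 20))  # a 2 GiB record in 1 MiB blocks
```

Small block sizes limit the space wasted by tiny records, while large block sizes keep the number of blocks per entry low for multi-gigabyte records.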
LObStER: The Data Containers
[Diagram labels: header-blocks; direct-pointer-blocks; indirect-pointer-block; data-blocks]
Long Term Archiving
Several steps:
o specification & concept
o filling of metadata & data
o quality checks & DOI
• LTA for, e.g., EUCLIPSE, MedCLIVAR, COMBINE
Long Term Archiving (LTA)
Costs depend on complexity and efforts at our site:
• metadata
• reformatting
• etc.
• Quality Checks on three levels
  QC L1: conformity to general standards (format, ...)
  QC L2: coarse automated content checks
  QC L3: detailed spot checks:
    TQA – Technical Quality Assurance
    SQA – Scientific Quality Assurance
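The three QC levels can be read as a staged pipeline in which a dataset only reaches the next level after passing the previous one. The sketch below illustrates that reading. The concrete checks (an accepted-format set, a finiteness test, a sampled range test on values assumed to be temperatures in Kelvin) are invented placeholders, not the WDCC's actual QC tools, and the real L3 step involves human review (TQA/SQA) that is not modelled here.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    file_format: str
    values: List[float] = field(default_factory=list)


# Assumptions for illustration only.
ACCEPTED_FORMATS = {"netcdf"}
PLAUSIBLE_RANGE = (150.0, 350.0)  # e.g. near-surface temperature in K


def qc_l1(ds: Dataset) -> bool:
    """QC L1: conformity to general standards (here: file format only)."""
    return ds.file_format.lower() in ACCEPTED_FORMATS


def qc_l2(ds: Dataset) -> bool:
    """QC L2: coarse automated content check (non-empty, all values finite)."""
    return len(ds.values) > 0 and all(math.isfinite(v) for v in ds.values)


def qc_l3(ds: Dataset, step: int = 5) -> bool:
    """QC L3: spot check on every `step`-th value against a plausible range."""
    low, high = PLAUSIBLE_RANGE
    return all(low <= v <= high for v in ds.values[::step])


QC_LEVELS = [qc_l1, qc_l2, qc_l3]


def highest_qc_level(ds: Dataset) -> int:
    """Return the highest level passed (0 to 3), stopping at the first failure."""
    passed = 0
    for level, check in enumerate(QC_LEVELS, start=1):
        if not check(ds):
            break
        passed = level
    return passed


if __name__ == "__main__":
    sample = Dataset("tas_example", "NetCDF", [280.0] * 20)
    print(highest_qc_level(sample))
```

Recording the highest level passed, rather than a single pass/fail flag, matches the idea of storing quality metadata alongside the data.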
LTA: CMIP5 as an Example of a Federated Activity
[Diagram labels: QC services; QC Service Layer; Distributed QC Level 2 Checks at Multiple Sites; QC L2 Tool; Central QC Repository; Project QC Metadata Repository; Central QC Level 3 Checks; QC L3 Tools; SQA GUI; DOI Publication Agency; Long-Term Archive]
LTA: CMIP5 as an Example
[Diagram labels: Data Nodes; Data from nodes; Data; Quality Control; TQA; SQA by Author; MD Input; MD on data; MD on quality; MD on model & simulation; Project MD Repository; MD harvest during project; MD harvest after archiving; MD export; MD LTA; Data Long Term Archive (LTA); DOI Publication Agency with Long Term Archive; Data Catalogue; DOI Catalogue; DOI Target Page; DOI access; Registration; IDF]
WDC-Climate as Publishing Agency of the IDF
International DOI Foundation: doi.org
Registration Agencies: DataCite (DataCite.org)
National Organizations: TIB, BL, … (tib-hannover.de)
Publisher: WDCC, … (wdc-climate.de)
Visibility of LTA Data in Public Catalogues
• DOI is given
• Catalogue metadata is sent to the Registration Agency via the national organization
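Once a DOI is given, the dataset can be reached through the central resolver at doi.org, which redirects to the publisher's target page. A minimal sketch of how a DOI string maps to a resolver URL follows. The DOI used is a made-up placeholder, not a real WDCC identifier, and the helper names are illustrative.

```python
from typing import Tuple
from urllib.parse import quote

RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into its registrant prefix and its suffix.

    The prefix starts with "10." and is assigned through a registration
    agency; the suffix is chosen by the publisher.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the resolver URL; the suffix is percent-encoded where needed."""
    prefix, suffix = split_doi(doi)
    return RESOLVER + prefix + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI for illustration only.
    print(resolver_url("10.1234/EXAMPLE_DATASET"))
```

The sketch only builds the URL and makes no network request; following the redirect would lead to the DOI target page held by the publishing agency.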
The Data Life Cycle Management
Virtual Research Environment
Data Production
Data Evaluation
Data Dissemination
Long Term Archive
THANK YOU,
QUESTIONS?