C3-Grid * Federation System for Climate Data Handling


Page 1: C3-Grid *  Federation System for Climate Data Handling

18.09.08 / S. Kindermann / DKRZ GO-ESSP 2008 1

C3-Grid* Federation System for Climate Data Handling

Stephan Kindermann

German Climate Computing Center – DKRZ

* Collaborative Climate Community Grid Project (Part of D-Grid Initiative)

Page 2: C3-Grid *  Federation System for Climate Data Handling


Overview

• C3Grid Overview: Architecture, Partners, Goals

• C3Grid Federation System Components: C3Grid ISO Discovery Metadata and Metadata Catalog

• A short interoperability study: C3Grid ISO Metadata / GeoNetwork

• Data Access and Preprocessing

• C3Grid Security

• C3Grid / IPCC?

Page 3: C3-Grid *  Federation System for Climate Data Handling


C3Grid: Overview

[Architecture diagram. Labels shown:

• C3Grid Data Providers: World Data Centers (Climate, Mare, RSAT), Research Institutes (PIK, GKSS, AWI, MPI-M), Universities (FU Berlin, Uni Köln); DWD, IFM-Geomar and DKRZ also appear

• Data Access Interface, ISO Discovery Metadata, ISO 19139 Discovery Catalog

• Portal, C3RC

• Collaborative Grid Workspace: Data + Metadata, Workflow, Result Data Products + Metadata

• Grid Data / Job Interface, C3Grid Data and Job Management Middleware, D-Grid (SRM, d-cache, ..)

• Markers (A) and (B) refer to the later sections on metadata for data discovery and on data access and preprocessing]

Page 4: C3-Grid *  Federation System for Climate Data Handling


(A) Metadata for Data Discovery: Design and Implementation

[Diagram detail (A): C3Grid Data Providers, Data Access Interface, ISO Discovery Metadata, ISO 19139 Discovery Catalog]

Page 5: C3-Grid *  Federation System for Climate Data Handling


(A) Metadata – harvesting and lookup components

[Diagram: harvesting and index building. Data Providers and a File System are read by OAI-PMH Harvesters and a Directory Harvester. The Index Builder accepts each document as a DOM tree (central buffer), validates it against the schema, transforms it by XSL, applies XPath expressions to fill the index fields, serializes the DOM into an XML blob, and adds the document to a Lucene Index. Lucene Indexes and Virtual Indexes are reached through the Search Interface.]

• Fast Range Queries

• Java API + Web Service Interface

Made available on sourceforge.net; see also: http://www.panfmp.org

• Technology:

  ISO 19115/19139 metadata profile

  OAI-PMH harvesting catalogue

  Lucene-based catalogue search

  GridSphere-based portal
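
The harvesting and lookup chain on this slide (accept a document as a DOM tree, apply XPath per field, store the serialized XML blob, answer range queries) can be illustrated with a small Python sketch. This is not the panFMP or Lucene API: the class `TinyIndex`, the field names and the XPath expressions are hypothetical, schema validation is reduced to a well-formedness check, and a sorted list with `bisect` only stands in for a real range-query index.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical field definitions: field name -> XPath into a metadata record.
FIELD_XPATHS = {
    "title": "./title",
    "min_year": "./extent/begin",
}


class TinyIndex:
    """Toy stand-in for a search index: keeps fields plus the XML blob."""

    def __init__(self):
        self.docs = {}   # doc id -> {"fields": ..., "blob": ...}
        self.years = []  # sorted (min_year, doc id) pairs for range queries

    def add_document(self, doc_id, xml_text):
        # Parsing raises ParseError if the document is not well-formed.
        root = ET.fromstring(xml_text)
        fields = {name: root.findtext(path) for name, path in FIELD_XPATHS.items()}
        blob = ET.tostring(root, encoding="unicode")  # serialized DOM
        self.docs[doc_id] = {"fields": fields, "blob": blob}
        bisect.insort(self.years, (int(fields["min_year"]), doc_id))

    def range_query(self, low, high):
        """Return ids of documents with low <= min_year <= high."""
        start = bisect.bisect_left(self.years, (low, ""))
        end = bisect.bisect_right(self.years, (high, "\uffff"))
        return [doc_id for _, doc_id in self.years[start:end]]


def record(title, year):
    return f"<record><title>{title}</title><extent><begin>{year}</begin></extent></record>"


index = TinyIndex()
index.add_document("exp1", record("Run A", 1950))
index.add_document("exp2", record("Run B", 1990))
index.add_document("exp3", record("Run C", 2005))
hits = index.range_query(1980, 2000)
```

Keeping the serialized blob next to the extracted fields means a search hit can return the full metadata record without going back to the data provider.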

Page 6: C3-Grid *  Federation System for Climate Data Handling


(A) C3Grid ISO 19139 profile

Design criteria:

• no schema extensions, profiling by restriction

• restriction using Schematron constraints

• “the granularity of the discovery metadata should reflect the logical organization of the data repository at a sufficiently coarse grained level” (1)

• CF based content description

• Link to resource metadata infrastructure (GT4-MDS based)

(1) Inspire: DT Metadata – Draft Implementing Rules for Metadata (version 2, 02/02/2007)
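
"Profiling by restriction" leaves the ISO schema untouched and layers extra assertions on top of it. The Python sketch below mimics that idea with a list of path rules; it is not Schematron itself, and the two rules are assumptions chosen for illustration, not the actual C3Grid profile constraints.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

# Hypothetical profile rules: (message, path that must match at least once).
RULES = [
    ("record must describe its content", "gmd:contentInfo"),
    ("record must name a distributor", "gmd:distributionInfo"),
]


def check_profile(xml_text):
    """Return the messages of all rules that the record violates."""
    root = ET.fromstring(xml_text)
    return [message for message, path in RULES if root.find(path, NS) is None]


RECORD = """<MD_Metadata xmlns="http://www.isotc211.org/2005/gmd">
  <contentInfo/>
</MD_Metadata>"""

violations = check_profile(RECORD)
```

A record that passes the unchanged ISO schema can still fail such rules, which is how a profile can be narrower than the standard without extending it.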

Page 7: C3-Grid *  Federation System for Climate Data Handling


(A) C3Grid ISO Profile

Description at aggregate level (e.g. experiment)

Aggregate extent description

with multiple verticalExtent sections

Sub-selection in data request

Page 8: C3-Grid *  Federation System for Climate Data Handling


(A) C3Grid ISO Profile: CF usage

<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription><gco:RecordType>air_temperature</gco:RecordType></attributeDescription>
    <dimension xlink:href="#verticalCRS_hPa">
      <MD_RangeDimension>
        <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>
<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription><gco:RecordType>sea_surface_temperature</gco:RecordType></attrib…>
    <dimension xlink:href="#verticalCRS_m">
      <MD_RangeDimension>
        <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>

Reference to vertical CRS

Content description based on (extended) CF names

Link to corresponding vertical CRS
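
To show how a catalog or portal might read these content descriptions, here is a minimal Python sketch that extracts the CF name, the unit and the vertical CRS link from each `MD_CoverageDescription`. The wrapper element and namespace declarations in the sample are assumptions added to make the fragment parseable; the function name `coverage_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Sample record: the slide's fragment wrapped in an assumed root element.
SAMPLE = """<MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <contentInfo><MD_CoverageDescription>
    <attributeDescription><gco:RecordType>air_temperature</gco:RecordType></attributeDescription>
    <dimension xlink:href="#verticalCRS_hPa"><MD_RangeDimension>
      <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
    </MD_RangeDimension></dimension>
  </MD_CoverageDescription></contentInfo>
  <contentInfo><MD_CoverageDescription>
    <attributeDescription><gco:RecordType>sea_surface_temperature</gco:RecordType></attributeDescription>
    <dimension xlink:href="#verticalCRS_m"><MD_RangeDimension>
      <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
    </MD_RangeDimension></dimension>
  </MD_CoverageDescription></contentInfo>
</MD_Metadata>"""


def coverage_entries(xml_text):
    """List CF name, unit and vertical CRS reference per coverage description."""
    root = ET.fromstring(xml_text)
    entries = []
    for coverage in root.iterfind(".//gmd:MD_CoverageDescription", NS):
        name = coverage.findtext("gmd:attributeDescription/gco:RecordType", namespaces=NS)
        unit = coverage.findtext(
            "gmd:dimension/gmd:MD_RangeDimension/gmd:descriptor/gco:CharacterString",
            namespaces=NS,
        )
        dimension = coverage.find("gmd:dimension", NS)
        href = dimension.get(XLINK_HREF) if dimension is not None else None
        entries.append({"cf_name": (name or "").strip(), "unit": unit, "vertical_crs": href})
    return entries


entries = coverage_entries(SAMPLE)
```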

Page 9: C3-Grid *  Federation System for Climate Data Handling


(A) C3Grid ISO profile

Data Distributor Info:

• reference to C3Grid resource metadata catalog (MDS) (names service endpoints)

• (optional: service endpoints)

Page 10: C3-Grid *  Federation System for Climate Data Handling


(A) C3Grid ISO profile

Data provenance description:

• for now (data staging output): simple sequence of ProcessStep descriptions

• later (C3Grid processed data): combined Source/ProcessStep blocks + external data provenance store


Page 12: C3-Grid *  Federation System for Climate Data Handling


C3Grid ISO Profile: A short geonetwork experiment

Federation building:

• OAI-PMH, WebDAV, Z39.50, geonet

• Full ISO metadata support (ISO19139/19119)

• OGC CSW 2.0 reference impl.

• RSS and GeoRSS newsfeeds

• SKOS based thesauri

• adaptable to new schemas

• schematron constraint checking

On roadmap:

• flexible ISO profile support

• shibboleth integration

Page 13: C3-Grid *  Federation System for Climate Data Handling


C3Grid ISO Profile: A short geonetwork experiment

Page 14: C3-Grid *  Federation System for Climate Data Handling


Building complex metadata federations …

Harvesting via:

• CSW

• OAI-PMH

• Geonet

• WebDAV
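
Of the harvesting protocols listed, OAI-PMH is the one C3Grid uses for its own catalogue. The sketch below shows the shape of a `ListRecords` exchange in Python: building the request URL and reading identifiers plus the resumption token from a response. The base URL, the `iso19139` metadata prefix and the canned response are assumptions for illustration; no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, (token.strip() or None)


# Canned response standing in for a provider's answer (hypothetical ids).
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:exp1</identifier></header></record>
    <record><header><identifier>oai:example.org:exp2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai", metadata_prefix="iso19139")
record_ids, next_token = parse_list_records(RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=next_token)
```

A harvester would repeat the request with each returned token until the response carries no token or an empty one.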

Page 15: C3-Grid *  Federation System for Climate Data Handling


C3Grid ISO Profile: A short geonetwork experiment

Import / Edit / Search: ok

Missing:

• content (CF) search

• vertical search

• temporal BBox search

• data staging


Page 18: C3-Grid *  Federation System for Climate Data Handling


Complete portal prototype to search and access (pre-process) data described by the C3Grid ISO profile, built in 3 weeks on the GeoNetwork open source solution.


Page 20: C3-Grid *  Federation System for Climate Data Handling


(B) Data Access and Preprocessing

[Diagram detail (B): C3Grid Data Providers (World Data Centers, Research Institutes, University Partners) with Data Access Interface and ISO Discovery Metadata; ISO Discovery Catalog; Collaborative Grid Workspace with Data + Metadata, Data Analysis Workflow, and Result Data Products + Metadata]

Page 21: C3-Grid *  Federation System for Climate Data Handling


(B) Data Access and Pre-Processing: Implementation

Data Staging Request:

• Data IDs

• Output Properties

• Selection: lon, lat, alt; time; content: CF

[Diagram labels: Data Staging Web Service (offer: time / resource estimation; skeleton impl.; status ..); provider staging jars and provider staging scripts; MD DB; data back ends: DB, Flat File, Archive; WS GRAM with JSDL based description of processing jobs; Local resource manager; Distributed C3Grid Work Space]

• C3Grid Generation 1: secured plain web services (current status)

• C3Grid Generation 2: WSRF service interfaces (scheduled for November 2008)

• Generation 2+: full PKI/SAML security stack
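
The data staging request on this slide carries data IDs, output properties and a selection by lon, lat, alt, time and CF content. A minimal Python sketch of such a request follows. The class and field names are hypothetical and do not reproduce the actual C3Grid web service interface; the validation rules are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Selection:
    lon: Tuple[float, float]                    # (west, east) in degrees
    lat: Tuple[float, float]                    # (south, north) in degrees
    alt: Optional[Tuple[float, float]] = None   # vertical range, unit per CRS
    time: Optional[Tuple[str, str]] = None      # ISO 8601 start and end
    cf_names: List[str] = field(default_factory=list)


@dataclass
class StagingRequest:
    data_ids: List[str]
    selection: Selection
    output_properties: Dict[str, str] = field(default_factory=dict)


def validate(request):
    """Return a list of problems; an empty list means the request looks sane."""
    problems = []
    if not request.data_ids:
        problems.append("no data IDs given")
    south, north = request.selection.lat
    if not (-90 <= south <= north <= 90):
        problems.append("latitude range invalid")
    west, east = request.selection.lon
    if west > east:
        problems.append("longitude range invalid")
    if request.selection.time and request.selection.time[0] > request.selection.time[1]:
        problems.append("time range invalid")
    return problems


good = StagingRequest(
    data_ids=["experiment-42"],
    selection=Selection(lon=(0, 30), lat=(40, 60),
                        time=("1990-01-01", "1999-12-31"),
                        cf_names=["air_temperature"]),
    output_properties={"format": "netcdf"},
)
bad = StagingRequest(data_ids=[], selection=Selection(lon=(30, 0), lat=(40, 95)))
```

Checking the selection before submission lets a portal reject impossible sub-selections without contacting the provider's staging service.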

Page 22: C3-Grid *  Federation System for Climate Data Handling


C3Grid Middleware Components

Scheduler: Globus WSRF based; accepts a WSL workflow description consisting of compute tasks and data staging tasks

Data management: Globus WSRF based; offer negotiation with the scheduler; consistent view of distributed data (later: replica management, caching)

Globus MDS Resource Metadata Catalog: service registry, resource status

Issues: dependency on the Globus software stack; no high-level implementation support tools; Globus 4.1.x migration unclear (??); problems with the delegation implementation (insufficient documentation and guidance)
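
A workflow made of compute tasks and data staging tasks is essentially a dependency graph that the scheduler has to order. The Python sketch below shows that idea with the standard library's `graphlib` (Python 3.9 or later). The dictionary layout and task names are hypothetical and are not the WSL format; a real scheduler would also negotiate offers with data management.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: task name -> (kind, names of tasks it depends on).
WORKFLOW = {
    "stage_temperature": ("data_staging", []),
    "stage_pressure": ("data_staging", []),
    "compute_mean": ("compute", ["stage_temperature", "stage_pressure"]),
    "publish_result": ("data_staging", ["compute_mean"]),
}


def execution_order(workflow):
    """Return one valid order in which every task runs after its inputs."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(WORKFLOW)
```

Both staging tasks are independent of each other, so a scheduler is free to run them in parallel before the compute task starts.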

Page 23: C3-Grid *  Federation System for Climate Data Handling


C3Grid Workflow Analysis

workflow-related:

• analysis and preparation of workflows

• monitoring and management of workflow execution

• (individual) scheduling strategy to optimize the management

task-related:

• handler to facade single / specific tasks

• interaction and monitoring via WS-Notification standard

Page 24: C3-Grid *  Federation System for Climate Data Handling


(C) Security Infrastructure

[Diagram labels: Browser, Webstart app, Shibboleth login; Identity Provider (Home Organisation); Attribute Provider (Virtual Organisation); SLCS (CA), DFN; MyProxy; Portal; GridShib SAML tools; wflow client; Delegation Service; C3Grid Middleware; Grid Services with GridShib for GT policy; Grid Resource (GRAM / DataRAM); Personal / Group Account; X509 Grid-proxy; SAML assertions carrying “Home attributes + VO attributes”]

Page 25: C3-Grid *  Federation System for Climate Data Handling


(C) Security Infrastructure

Status:

• Shibboleth IdPs running at core C3Grid partners

• Online CA for short-lived credentials tested, set up and operated by DFN (the German NREN)

• Online CA (DFN-SLCS) accreditation process with EUGridPMA started

• SLCs contain campus attributes as a SAML assertion

• Java Web Start app to bootstrap SLCS in development at DFN

• GridShib SAML Tools (v0.6.0) tested

• Prototype of shibbolized GridSphere portal tested

• Open issues with GT4 proxy-delegation implementation

Next:

• Integration of components

• Virtual home organization for C3 users without a Shibboleth IdP

• Integration of VO attributes (shibbolized VOMS)

Page 26: C3-Grid *  Federation System for Climate Data Handling


C3Grid / IPCC Use Case

(0) IPCC Metadata harvested / mirrored in CERA DB (WDCC)

(1) Metadata visible in C3 Portal

(2) User issues IPCC data import from external repository

(3) User OpenID IdP / + IPCC_Access role external repos

(4) Download ?? C3 Repository

(5) C3Grid grants access to users with IPCC_Access role

‘Grant procedure?’: contact the IdP / AttributeService before each workflow execution, or use a more offline method?

[Diagram labels: C3RC / C3 Workspace; IPCC data import; Analysis wflow; Wflow result publication]
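
The open question in step (5), whether to ask the IdP / AttributeService before every workflow execution or to rely on a more offline method, can be framed as a caching decision. The Python sketch below is one hypothetical way to express both options with a single parameter; the class `AttributeCache`, the fake attribute service and the one-hour lifetime are assumptions, not part of C3Grid.

```python
import time

REQUIRED_ROLE = "IPCC_Access"


class AttributeCache:
    """Caches user attributes so the attribute service is not asked every time."""

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self._fetch = fetch              # callable: user -> iterable of roles
        self._max_age = max_age_seconds  # 0 means: always ask the service
        self._clock = clock
        self._entries = {}               # user -> (timestamp, set of roles)

    def attributes(self, user):
        now = self._clock()
        cached = self._entries.get(user)
        if cached is None or now - cached[0] >= self._max_age:
            cached = (now, set(self._fetch(user)))
            self._entries[user] = cached
        return cached[1]


def may_run_workflow(user, cache):
    """Grant access only to users holding the required role."""
    return REQUIRED_ROLE in cache.attributes(user)


calls = []


def fake_attribute_service(user):
    # Stand-in for a real IdP / AttributeService lookup.
    calls.append(user)
    return {"alice": ["IPCC_Access"], "bob": []}[user]


cache = AttributeCache(fake_attribute_service, max_age_seconds=3600)
alice_first = may_run_workflow("alice", cache)
alice_again = may_run_workflow("alice", cache)  # served from the cache
bob_allowed = may_run_workflow("bob", cache)
```

Setting `max_age_seconds=0` gives the "contact before each execution" variant; a longer lifetime trades freshness of role revocation for fewer round trips.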

Page 27: C3-Grid *  Federation System for Climate Data Handling


Appendix

Page 28: C3-Grid *  Federation System for Climate Data Handling


C3Grid Content Info (Version 2)

<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription>
      <gco:RecordType>CF_name_with_attribute</gco:RecordType>
    </attributeDescription>
    <contentType>
      <MD_CoverageContentTypeCode codeList="http://wis.wmo.int/2006/catalogues/cf-standard-name-table.xml"
                                  codeListValue="air_temperature">
        air_temperature with a cell_methods attribute including time:mean (interval: 1 day)
      </MD_CoverageContentTypeCode>
    </contentType>
    <dimension xlink:href="#verticalCRS_hPa">
      <MD_RangeDimension>
        <descriptor>
          <gco:CharacterString>K</gco:CharacterString>
        </descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>

Page 29: C3-Grid *  Federation System for Climate Data Handling


Security Aspect: C3Grid step 0 step 1


Page 32: C3-Grid *  Federation System for Climate Data Handling


(C) Data Reuse of Analysis Results: Metadata Generation

Context description of Analysis Data:

• Aggregation

• Processing history

[Diagram labels, component view: C3Grid Workspace, wflow, m_tool, OAI-PMH Server, OAI-Harvester, Lucene+ Index, WS Interface, Portal, “quality check”, API Prototype (Python)]

[Diagram labels, data model: entities collection, p_data, process step, source, parent; attributes Time stamp, Description, Citation info; relations is_part_of, has_parent, is_generated_by, has_input; multiplicities *, +, 0..*, 0..1]
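
The context description above combines aggregation (collections with parents) and processing history (process steps with inputs). The following Python sketch models those relations with dataclasses. It is an illustration only and not the Python API prototype mentioned on the slide; which attribute belongs to which entity and the multiplicities are assumptions, since the diagram layout did not survive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    description: str


@dataclass
class ProcessStep:
    description: str
    timestamp: str
    inputs: List[Source] = field(default_factory=list)              # has_input


@dataclass
class Collection:
    description: str
    parent: Optional["Collection"] = None                           # has_parent


@dataclass
class ProcessedData:
    name: str
    generated_by: List[ProcessStep] = field(default_factory=list)   # is_generated_by
    collection: Optional[Collection] = None                         # is_part_of


def processing_history(data):
    """Process steps in recorded order, one line per step."""
    return [f"{step.timestamp} {step.description}" for step in data.generated_by]


def aggregation_path(data):
    """Collection descriptions from the outermost parent down to the data."""
    path = []
    current = data.collection
    while current is not None:
        path.append(current.description)
        current = current.parent
    return list(reversed(path))


project = Collection("C3Grid analysis results")
experiment = Collection("Storm track study", parent=project)
raw = Source("staged air_temperature subset")
result = ProcessedData(
    name="mean_temperature.nc",
    generated_by=[
        ProcessStep("data staging", "2008-09-01T10:00:00", inputs=[raw]),
        ProcessStep("time mean", "2008-09-01T10:30:00"),
    ],
    collection=experiment,
)
history = processing_history(result)
path = aggregation_path(result)
```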