Eudat and Big Data in Science

EUDAT and Big Data in Science, Wolfgang Gentzsch, Advisor, EUDAT. HPCC 2013, Newport RI, 26-28 March 2013

description

In this video from the 2013 National HPCC Conference, Wolfgang Gentzsch presents: EUDAT and Big Data in Science. “Big data science emerges as a new paradigm for scientific discovery that reflects the increasing value of observational, experimental and computer-generated data in virtually all domains, from physics to the humanities and social sciences. Addressing this new paradigm, the EUDAT project is a European data initiative that brings together a unique consortium of 25 partners – including research communities, national data and high performance computing (HPC) centers, technology providers, and funding agencies – from 13 countries. EUDAT aims to build a sustainable cross-disciplinary and cross-national data infrastructure that provides a set of shared services for accessing and preserving research data. The design and deployment of these services is being coordinated by multi-disciplinary task forces comprising representatives from research communities and data centers.” You can watch the presentation with audio at insideHPC: http://insidehpc.com/2013/03/27/video-eudat-and-big-data-in-science/

Transcript of Eudat and Big Data in Science

Page 1: Eudat and Big Data in Science

EUDAT

EUDAT and Big Data in Science

Wolfgang Gentzsch, Advisor, EUDAT

HPCC 2013 Newport RI, 26-28 March 2013

Page 2: Eudat and Big Data in Science

Data trends

Increasing complexity and variety

Exponential growth: Gigabytes, Terabytes, Petabytes, Exabytes, Zettabytes

• Where to store it?

• How to find it?

• How to make the most of it?

• How to ensure interoperability?

Page 3: Eudat and Big Data in Science

The EUDAT Case

If there are hundreds of Research Infrastructures, how many different data management systems can we sustain?

Page 4: Eudat and Big Data in Science

Collaborative Data Infrastructure - A framework for the future?

• Users and Data Generators: user functionalities, data capture & transfer, virtual research environments

• Community Support Services: data discovery & navigation, workflow generation, annotation, interpretability

• Common Data Services: persistent storage, identification, authenticity, workflow execution, mining

• Across all layers: Trust, Data Curation

Page 7: Eudat and Big Data in Science

Five research communities on board

• EPOS: European Plate Observing System

• CLARIN: Common Language Resources and Technology Infrastructure

• ENES: Service for Climate Modelling in Europe

• LifeWatch: Biodiversity Data and Observatories

• VPH: The Virtual Physiological Human

• All share common challenges:

– Reference models and architectures

– Persistent data identifiers

– Metadata management

– Distributed data sources

– Data interoperability

Page 15: Eudat and Big Data in Science

Building Blocks of the CDI

• Data Staging: dynamic replication to HPC workspace for processing

• Safe Replication: data curation and access optimization

• Simple Store: researcher data store (simple upload, share and access)

• AAI: network of trust among authentication and authorization actors

• Metadata Catalogue: aggregated EUDAT metadata domain; data inventory

• EUDAT Portal: integrated APIs and harmonized access to EUDAT facilities

Page 16: Eudat and Big Data in Science

SAFE_REPLICATION@EUDAT

Allow communities to replicate data to selected data centers for storage and do this in a robust, reliable and highly available manner.

Improve data curation and accessibility.

More info: [email protected]
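
The replication goal above can be illustrated with a minimal sketch. It is not the EUDAT service: the slides do not say how replication is implemented, so local directories stand in for data centers, and the names `sha256_of`, `replicate` and `demo_replication` are hypothetical. The sketch only shows the general idea of copying a data set to several targets and verifying each copy by checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, data_centers: list) -> dict:
    """Copy source into each target directory and verify every copy.

    Each entry of data_centers is a local directory standing in for a
    remote data center (an assumption made for this sketch).
    Returns a mapping of replica path to verified checksum.
    """
    expected = sha256_of(source)
    replicas = {}
    for center in data_centers:
        center.mkdir(parents=True, exist_ok=True)
        target = center / source.name
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for replica at {target}")
        replicas[str(target)] = expected
    return replicas


def demo_replication() -> dict:
    """Replicate one small file to two stand-in data centers."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "community" / "dataset.bin"
        source.parent.mkdir()
        source.write_bytes(b"example observations")
        return replicate(source, [root / "center_a", root / "center_b"])
```

Verifying the checksum after every copy is what makes the replica trustworthy; a production service would also need retries, scheduling and access control, none of which is shown here.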

Page 17: Eudat and Big Data in Science

DATA_STAGING@EUDAT

Allow the communities to dynamically replicate a subset of their data stored in EUDAT to an HPC workspace in order to be processed.

More info: [email protected]
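
As a rough illustration of staging a subset of stored data into a workspace, the sketch below copies only the files that match a selection rule. Everything here is an assumption for illustration: the directory layout, the file names, and the helper names `stage_subset` and `demo_staging` are hypothetical, and a local directory stands in for the HPC workspace.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def stage_subset(store: Path, workspace: Path,
                 wanted: Callable[[Path], bool]) -> List[Path]:
    """Copy files under store that satisfy wanted() into workspace.

    The relative directory layout of the store is preserved.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    staged = []
    for item in sorted(store.rglob("*")):
        if item.is_file() and wanted(item):
            target = workspace / item.relative_to(store)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)
            staged.append(target)
    return staged


def demo_staging() -> List[str]:
    """Stage only the 2013 NetCDF files from a small made-up store."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "store"
        (store / "2012").mkdir(parents=True)
        (store / "2013").mkdir(parents=True)
        (store / "2012" / "run1.nc").write_text("a")
        (store / "2013" / "run2.nc").write_text("b")
        (store / "2013" / "notes.txt").write_text("c")
        staged = stage_subset(
            store,
            Path(tmp) / "hpc_workspace",
            lambda p: p.parent.name == "2013" and p.suffix == ".nc",
        )
        return [p.name for p in staged]
```

Passing the selection rule as a function keeps the staging step independent of how a community decides which data it needs for a given computation.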

Page 18: Eudat and Big Data in Science

METADATA@EUDAT

Create a joint metadata domain for all data stored by EUDAT data centers and a catalogue which exposes the data stored within EUDAT, allowing data searches.

The EUDAT repository should provide an inventory of metadata from different communities.

More info: [email protected]
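
A joint metadata domain with search can be pictured as follows. This is a minimal in-memory sketch, not the EUDAT catalogue: the record fields, the sample records, and the names `Record`, `Catalogue` and `build_demo_catalogue` are all hypothetical. It shows records from different communities being normalized into one inventory that a single search covers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """One normalized metadata record in the joint catalogue."""
    community: str
    identifier: str
    title: str
    keywords: List[str] = field(default_factory=list)


class Catalogue:
    """In-memory inventory of metadata harvested from several communities."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def harvest(self, community: str, raw_records: List[Dict]) -> None:
        """Normalize a community's raw records and add them to the inventory."""
        for raw in raw_records:
            keywords = [k.lower() for k in raw.get("keywords", [])]
            self._records.append(
                Record(community, raw["id"], raw["title"], keywords)
            )

    def search(self, term: str) -> List[Record]:
        """Return records whose title contains term or whose keywords include it."""
        needle = term.lower()
        return [
            r for r in self._records
            if needle in r.title.lower() or needle in r.keywords
        ]


def build_demo_catalogue() -> Catalogue:
    """Build a catalogue from two made-up community feeds."""
    catalogue = Catalogue()
    catalogue.harvest("CLARIN", [
        {"id": "clarin-001", "title": "Spoken corpus sample",
         "keywords": ["Linguistics"]},
    ])
    catalogue.harvest("ENES", [
        {"id": "enes-001", "title": "Regional climate model run",
         "keywords": ["Climate", "Simulation"]},
    ])
    return catalogue
```

For example, `build_demo_catalogue().search("climate")` returns only the made-up ENES record, while a search for "linguistics" matches the CLARIN record through its keywords.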

Page 19: Eudat and Big Data in Science

SIMPLE_STORE@EUDAT

Create an easy-to-use service that will help researchers, mediated by the participating communities, to upload and store data that is not part of the officially handled data sets of the community.

This service will address the long tail of “small” data and the researchers/citizen scientists creating/manipulating them.

More info: [email protected]

Page 20: Eudat and Big Data in Science

Persistent_Identifiers@EUDAT

Deploy a robust, highly available and effective PID service that can be used within the communities and by EUDAT.

Keeping track of the “names” of data sets deposited with the CDI requires robust mechanisms.

More info: [email protected]
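
The role of a PID service, keeping a stable "name" for a data set while its storage locations change, can be sketched in a few lines. This is an illustration only and says nothing about the technology EUDAT deploys: the class `PidRegistry`, the `example` prefix and the URLs are hypothetical.

```python
import uuid
from typing import Dict, List


class PidRegistry:
    """Toy registry mapping persistent identifiers to current locations."""

    def __init__(self, prefix: str = "example") -> None:
        self._prefix = prefix
        self._locations: Dict[str, List[str]] = {}

    def mint(self, location: str) -> str:
        """Create a new identifier for a data set stored at location."""
        pid = f"{self._prefix}/{uuid.uuid4()}"
        self._locations[pid] = [location]
        return pid

    def add_location(self, pid: str, location: str) -> None:
        """Register an additional copy (for example a replica) under pid."""
        if location not in self._locations[pid]:
            self._locations[pid].append(location)

    def move(self, pid: str, old: str, new: str) -> None:
        """Record that one copy moved; the identifier itself never changes."""
        locations = self._locations[pid]
        locations[locations.index(old)] = new

    def resolve(self, pid: str) -> List[str]:
        """Return all locations currently known for pid."""
        return list(self._locations[pid])
```

A citation can then carry the identifier returned by `mint`, and calls to `move` or `add_location` keep `resolve` pointing at valid copies without the cited name ever changing.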

Page 21: Eudat and Big Data in Science

AAI@EUDAT

Provide a solution for a working AAI system in a federated scenario.

Design the AA infrastructure to be used during the EUDAT project and beyond.

More info: [email protected]

Page 22: Eudat and Big Data in Science

OPERATION TEAM

Page 23: Eudat and Big Data in Science

Work plan for the next months

• Moving the services to a production environment

• Capturing additional requirements

• Integrating new partners into EUDAT (in particular research communities)

– Working groups, pilots, observers and associate partners

• Collaborating with other initiatives

– European e-Infrastructures: EGI, PRACE, DANTE, HELIX NEBULA, SCIDIPS-ES, etc.

– Global initiatives: RDA, CODATA, etc.

• Defining EUDAT’s path to sustainability

– Cost and funding models

– Governance

Page 24: Eudat and Big Data in Science

Welcome to the 2nd EUDAT Conference!

28-30 October 2013, Rome

• International event with keynotes from Europe and the US

• A forum to discuss the future of data infrastructures

• Project presentations and poster sessions

• Training tutorials