eXtreme-DataCloudIt is with great pleasure that we intro-duce to you the eXtreme-DataCloud Service...


eXtreme-DataCloud Service Catalogue


http://www.extreme-datacloud.eu/


TABLE OF CONTENTS

The XDC Challenge
XDC Service Catalogue
Why a Catalogue?
Service Index
    Caching On Demand
    dCache
    Dynafed
    EOS
    FTS
    ONEDATA
    PaaS Orchestrator
    RUCIO
    TOSCA types and template plugin
    Orchent
    PaaS Orchestrator Dashboard
Research Communities and Case Study
    CTA
    ECRIN
    LifeWatch
    WLCG
    XFEL
XDC in the Context of the European e-Infrastructures
Technical Support
Share the XDC Experience


The XDC Challenge

Dear Readers,

It is with great pleasure that we introduce to you the eXtreme-DataCloud Service Catalogue. eXtreme-DataCloud was created to develop and release enhanced data management services that can be coherently harmonized in the current and next generation e-Infrastructures. Our mission is to develop open, interoperable and easy-to-use services to build worldwide distributed computing infrastructures. But not only that. We also want to address requirements from the most demanding, data-intensive experiments in several scientific domains, from Life Science to High Energy Physics and from Medical Science to Astrophysics. Our products must be capable of operating, distributing and accessing data at the unprecedented scale requested by those experiments. Scientific computing is now entering the Exascale era and XDC has stepped up to face this challenge in the data management domain.

This Service Catalogue is based on a “toolbox” of already existing, production-quality services that the project has enriched with new functionalities and usability improvements in order to make complex infrastructures exploitable by an increasing number of user communities, in addition to targeting the so-called long tail of science. XDC provides tools to federate geographically distributed storage resources and to manage data and metadata stored in those resources, all glued together by a modern authorization framework. In the end, XDC provides the building blocks for the creation of “DataLakes”: federated storage buckets, tightly linked by high-bandwidth connections, that look like a single, huge datastore. XDC services allow the user to automate the data lifecycle within the “DataLake” and provide tools for data transfers to and from the Lake and for transparent data access from remote locations through caching mechanisms. The provided functionalities allow users to realize policy-driven, Quality-of-Service-based data management in distributed computing.

With our tools we are also addressing the dynamic nature and flexibility of modern e-Infrastructures that, due to the advent of virtualization techniques, Cloud computing paradigms and IaaS/PaaS orchestration tools, have become “liquid”. Computing and storage resources can be created, destroyed, attached and detached from the infrastructure with ease, with a few mouse clicks, at a rate inconceivable only a few years ago. Relevant examples of this volatile nature of modern e-infrastructures are the use of resources temporarily available in huge HPC centres, the creation of diskless sites to cope with peak user activity, and the adaptation of sites providing only storage for long-term preservation. When it comes to data management, this high dynamicity poses huge challenges in terms of efficiency, transparency and reliability. Therefore, one of the main objectives of this project has been to provide data management solutions for the dynamic extension between a computing centre and a remote site, providing transparent bi-directional access to the data stored in both locations, and for the dynamic inclusion of sites with limited storage capacity, providing transparent access to the data stored remotely.

XDC is carried out by partners with long-standing experience in the field of data management and distributed e-infrastructures, and we believe that our products can play an important role in the creation of a European Open Science Cloud. We therefore encourage you to download the software, test it and provide your feedback. All the components are freely downloadable under open source licences.

Happy Reading,

The XDC Collaboration


XDC Service Catalogue

On March 15th, 2020 the eXtreme-DataCloud project announced the general availability of its second public software release, codenamed Quasar, built on the foundation laid by XDC-1 (codenamed Pulsar) to provide new features and enhancements for many of its software components and services. The releases come after an initial phase of requirements gathering which involved different European scientific collaborations in life science, medical science, physics and astrophysics. This resulted in the improvement of software components addressing existing technical gaps concerning ease of use and efficient usage of distributed data and compute resources.

TABLE 1: USE CASES ANALYSED BY XDC

RESEARCH COMMUNITY | CASE STUDY/APPLICATION
LifeWatch | Forecasting of water quality parameters for water supply and other human-related uses, such as issuing alerts and warnings so that citizens and authorities can put appropriate countermeasures in place.
EXFEL | Life cycle management of data produced by the European XFEL, PETRA III and FLASH infrastructures.
WLCG | Adoption of scalable solutions aimed at integrating different data management systems and computing infrastructures. Data-Lake creation and access tools.
CTA | Archive system for the Cherenkov Telescope Array to extract different physics parameters, like gamma-ray direction and energy.
ECRIN | Storage of multinational clinical trial data. Data have to be harmonised and securely transferred in order to comply with local and EU privacy policies and regulations.


TABLE 2: LIST OF REQUIREMENTS PROVIDED BY THE DIFFERENT USER COMMUNITIES

FUNCTIONALITY | REQUIREMENT
Smart Caching | Provide smart caching mechanisms to support the extension of a site to remote locations and to provide alternative models for large data centres.
Smart Caching | Caching mechanisms should guarantee that data are accessed transparently from any location without the need to explicitly copy them to the client location.
Encryption and Security (AAI) | Encryption management service to store sensitive data in remote locations.
Metadata and Data Life Cycle Management | Associate metadata with the uploaded data in a flexible way and without a predefined format.
Metadata and Data Life Cycle Management | Metadata integration from different sources, metadata searching and discovery.
Metadata and Data Life Cycle Management | Functionalities related to Open Data Life Cycle Management, such as Digital Identifier minting, use of open protocols and standards (OAIS, OAI-PMH), FAIR data, etc.
Policy-Driven Data Management | Data management based on Quality of Service and Data Lifecycle Management.
Policy-Driven Data Management | Specification of where the data need to be stored (location, media, type of storage system or hardware, number of replicas, etc.).
Policy-Driven Data Management | Data management based on usage and access pattern: moving data closer to computing, “smart” media migration, etc.
Pre-processing, Processing & Ingestion | Different types of processing at data ingestion time (experiment-independent quality checks before storing data, data skimming, metadata extraction, indexing, formatting, etc.).

The community requirements were translated into a set of XDC components, which are now released and offered as a contribution to the implementation of the European Open Science Cloud.
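To make the Smart Caching requirement of Table 2 more concrete, the following is a minimal Python sketch of a site-local cache that serves data transparently: the client always calls `read`, and a remote fetch happens only on a cache miss. The class name `SiteCache`, the `fetch_remote` callback and the least-recently-used eviction policy are assumptions made for illustration; they are not taken from any XDC component.

```python
from collections import OrderedDict


class SiteCache:
    """Illustrative read-through cache with LRU eviction (hypothetical, not an XDC API)."""

    def __init__(self, capacity_bytes, fetch_remote):
        self._capacity = capacity_bytes
        self._fetch = fetch_remote      # callable: name -> bytes, stands in for remote storage
        self._store = OrderedDict()     # name -> bytes, oldest entry first
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, name):
        # Cache hit: serve locally and mark the entry as recently used.
        if name in self._store:
            self._store.move_to_end(name)
            self.hits += 1
            return self._store[name]

        # Cache miss: fetch from the remote location on behalf of the client.
        self.misses += 1
        data = self._fetch(name)
        if len(data) > self._capacity:
            return data                 # too large to cache, serve it uncached

        self._store[name] = data
        self._used += len(data)
        # Evict least recently used entries until the cache fits again.
        while self._used > self._capacity:
            _, evicted = self._store.popitem(last=False)
            self._used -= len(evicted)
        return data
```

A client would call `cache.read("dataset/file")` without knowing whether the bytes came from local disk or from the remote site, which is the transparency the requirement asks for.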


OPEN SOURCE COMPONENTS

All the XDC components are integrated into a comprehensive Authentication and Authorization Architecture, with support for user authentication through multiple methods (SAML, OpenID Connect and X.509), support for distributed authorization policies and a Token Translation Service, creating credentials for services that do not natively support OpenID Connect.

The eXtreme-DataCloud software is released under commonly available Open Source licences and can be deployed on both public and private cloud infrastructures.

eXtreme-DataCloud Software Releases Documentation is available from the official XDC Repositories (repo.indigo-datacloud.eu/repository/xdc/), for operating system packages and Docker containers.


Why a Catalogue?

The purpose of this catalogue is to give resource providers and researchers from all disciplines all around Europe a practical guide to identify the best eXtreme-DataCloud services for their research and e-infrastructure.


Service Index

XDC releases provide open source components and high-level features that address specific data and compute solutions, as described in this section. Those solutions include QoS management, preprocessing at ingestion and automated data transfers. Therefore a global orchestration layer is needed to take care of the execution of those complex workflows. Figure 1 highlights the main components and their role among the three different levels: Storage, Federation, and Orchestration.

FIGURE 1: XDC PRODUCTION LEVEL COMPONENTS (levels: Storage, Federation, Orchestration; labelled components: Rucio, INDIGO Orchestrator, xRootD Cache, HTTP Cache, QoS CDMI)

XDC COMPONENTS

Caching On Demand
dCache
Dynafed
EOS
FTS
Onedata
PaaS Orchestrator
RUCIO
TOSCA types & templates plugin
PaaS Orchestrator Dashboard
Orchent

Caching On Demand

Short Service Name: CachingOnDemand
Version: 2.0.0
Solution Type: Storage, Federation
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/cod.html

The CachingOnDemand system provides recipes and PaaS description templates for an end-to-end deployment of an XCache cluster.

XDC Functionalities

The new functionalities include a container-based application with support for Docker, and new Ansible recipes including support for Kubernetes and CentOS 7 bare-metal deployment.

Detailed documentation is available at:

GitHub.io

https://cloud-pg.github.io/CachingOnDemand/

dCache

Short Service Name: dCache
Version: 6.1.3
Solution Type: Storage, Orchestration
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/dcache/XDC-2.html

dCache is a distributed storage system proven to scale to hundreds of Petabytes. Originally conceived as a disk cache (hence the name) in front of a tertiary storage to provide efficient data access for data-intensive scientific experiments in the field of High Energy Physics (HEP), it has evolved into a highly scalable general-purpose open source storage solution.

XDC Functionalities

The new functionalities include: storage events support with Kafka and SSE, support to notify events, new plugin for SSE, possibility for clients to discover changes in dCache namespace using an interface modelled after the inotify API, update of dCache View, more robust and scalable 3rd party copying functionality.
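Storage events delivered over SSE (Server-Sent Events) arrive as a plain-text stream of `event:` and `data:` fields separated by blank lines. The sketch below parses such a stream in Python following the standard SSE framing. The event name `inotify` and the JSON payload in the usage note are hypothetical examples, not the documented dCache event format; consult the dCache manuals for the actual schema.

```python
import json


def parse_sse(stream_text):
    """Split a text/event-stream body into (event_type, data) tuples."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far (if it has data).
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


def decode_json_events(stream_text):
    """Convenience helper: decode each data payload as JSON (assumed payload format)."""
    return [(etype, json.loads(data)) for etype, data in parse_sse(stream_text)]
```

For example, feeding the text `event: inotify`, `data: {"mask": ["IN_CREATE"], "name": "run42.dat"}` followed by a blank line to `decode_json_events` yields one event whose payload names the newly created file.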

Detailed documentation is available at:

dCache Manuals

General purpose https://www.dcache.org/manuals/index.shtml

XDC functionalities https://www.dcache.org/manuals/UserGuide-6.0/

Dynafed

Short Service Name: Dynafed
Version: 1.5.0
Solution Type: Federation
Release Notes: http://releases.extreme-datacloud.eu/en/latest/releases/quasar/dynafed.html

The Dynamic Federations system allows a very fast dynamic namespace to be exposed via HTTP and WebDAV, built on the fly by merging and caching (in memory) metadata items taken from a number of (remote) endpoints. One of its fundamental features is to redirect GET/PUT requests to the site or cluster hosting the requested file that is closer to the client that requested it. The focus is on performance, scalability and real-time fault resilience with respect to sites that can go offline. From the perspective of a normal user, HTTP and WebDAV clients can browse the Dynamic Federation as if it were a unique partially cached name space, which is able to redirect them to the right host when they ask for a file replica. Dynafed also supports writing.
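The redirection idea can be sketched in a few lines of Python: among the online replicas of a file, pick the one closest to the client and return its URL as the redirect target. This is only an illustration under stated assumptions. The `Replica` type, the use of great-circle distance as the notion of "closer", and the example URLs are hypothetical and do not describe Dynafed's actual selection logic.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    url: str
    lat: float
    lon: float
    online: bool = True


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_redirect(replicas, client_lat, client_lon):
    """Return the URL of the nearest online replica, or None if no site is available."""
    candidates = [r for r in replicas if r.online]  # offline sites are skipped
    if not candidates:
        return None
    nearest = min(
        candidates,
        key=lambda r: haversine_km(client_lat, client_lon, r.lat, r.lon),
    )
    return nearest.url
```

Skipping offline replicas before ranking is what gives the fault resilience mentioned above: a client is never redirected to a site that is known to be down.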

XDC Functionalities

OIDC support, both as a Relying Party (redirecting a browser to an IdP) and Protected Resource (consuming OAuth access tokens for non-interactive access), is now provided, facilitating in this way the integration with the XDC Orchestrator and allowing browser-based access without X.509 certificates. Dynafed can now function as the active party for data distribution, having enabled the “Fourth party copy” feature. This allows services without third party copy support (such as S3) to participate fully in the data distribution infrastructure.

Detailed documentation is available at:

Dynafed Documentation

http://lcgdm.web.cern.ch/dynafed-dynamic-federation-project

EOS

Short Service Name: EOS
Version: 4.6.3
Solution Type: Storage
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/eos/v4.6.3.html

EOS is an open source storage software solution to manage multi-petabyte storage for the CERN Large Hadron Collider (LHC). The core of the implementation is the XRootD framework, which provides a feature-rich remote access protocol.

XDC Functionalities

QoS classes and QoS API when interacting with namespace entries have now been included, as well as the CDMI gateway for QoS transitions. The Converter Driver, part of the Converter Engine, has been reworked using a threadpool approach and the saving of information in persistent storage (QuarkDB implementation). This allows persistence, as well as more flexibility over the conversion execution, such as runtime configurable threads and runtime statistics.
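The thread pool approach can be illustrated with Python's standard `concurrent.futures` module: conversion jobs are submitted to a pool whose size is set at run time, and simple counters are kept as jobs finish. This is a sketch of the general pattern only. The `convert` function is a placeholder, the job tuples are hypothetical, and the persistence layer (QuarkDB in EOS) is not modelled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert(path, target_layout):
    """Placeholder for one conversion job (hypothetical, not an EOS call)."""
    if not path:
        raise ValueError("empty path")
    return f"{path} -> {target_layout}"


def run_conversions(jobs, max_threads=4):
    """Run (path, layout) jobs on a pool of max_threads workers and collect statistics."""
    stats = Counter()
    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Map each future back to its path so results can be reported per file.
        futures = {pool.submit(convert, path, layout): path for path, layout in jobs}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
                stats["done"] += 1
            except ValueError:
                stats["failed"] += 1
    return results, stats
```

Because `max_threads` is an ordinary argument, the degree of parallelism can be changed between runs without touching the job logic, which mirrors the "runtime configurable threads" idea.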

Detailed documentation is available at:

EOS - OpenStorage Documentation

https://eos-docs.web.cern.ch/eos-docs/

Information specific to XDC features can be found under the sections:

“Configuration” (setting a Filesystem to use logical path).

“Using EOS” (describing the adoption of storage/files process).

“Client Commands” (the command to trigger the import [adoption] procedure).

FTS

Short Service Name: FTS
Version: 3.9.0
Solution Type: Federation, Orchestration
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/pulsar/fts/v3.9.0.html

FTS3 is a bulk data mover, created to distribute globally the multiple petabytes of data from the LHC (Large Hadron Collider) at CERN. Its purpose is to efficiently schedule data transfers, maximising the use of available network and storage resources while ensuring that any policy limits are respected.

XDC Functionalities

OpenID Connect (OIDC) support has been added, based on OAuth2 tokens issued by an Authorization Server such as IAM. Full support of managed QoS transitions is now available.

Detailed documentation is available at:

FTS3 Documentation

http://fts-docs-devel.web.cern.ch/fts-docs-devel/

In particular the Features section with information on OIDC and QoS support:

http://fts-docs-devel.web.cern.ch/fts-docs-devel/docs/features.html

ONEDATA

Short Service Name: ONEDATA
Version: 19.02.1
Solution Type: Storage, Orchestration, Federation
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/onedata/v19.02.1.html

ONEDATA is a global data management system, providing easy access to distributed storage resources, supporting a wide range of use cases from personal data management to data-intensive scientific computations.

XDC Functionalities

Python bindings for ONEDATA have been released in the form of onedataFS, simplifying programmatic operations on data located in a ONEDATA space and improving their performance. The support of the ECRIN and CTA use cases has been improved by enhancing the file indexing performance for scanning an 800k dataset provided by ECRIN and by redesigning the changes stream API, so as to allow more fine-grained control over the stream.

Detailed documentation is available at:

ONEDATA Guides

https://onedata.org/#/home/documentation/index.html

PaaS Orchestrator

Short Service Name: PaaS Orchestrator
Version: 2.3.0-FINAL
Solution Type: Orchestration
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/paas-orchestrator.html

The INDIGO PaaS Orchestrator is a component of the PaaS layer that allows the instantiation of resources on Cloud Management Frameworks (like OpenStack and OpenNebula) and Mesos clusters. It takes the deployment requests, expressed through templates written in TOSCA YAML Simple Profile v1.0, and deploys them on the best cloud site available. To do so, it gathers SLAs, monitoring information and other data from platform services, and asks the cloud provider ranker for a list of the best cloud sites.

XDC Functionalities

The main XDC functionalities are the implementation of a timeout for deployment creation/update, credentials management for providers not integrated with INDIGO IAM, and the update of the A4C TOSCA Parser library (v2.1.0-DEEP-1.2.1). The retry strategy for Marathon deployments has been improved.

Detailed documentation is available at:

GitHub README

https://github.com/indigo-dc/orchestrator/blob/v2.3.0-FINAL/README.md

GitBook Guides

https://indigo-dc.gitbooks.io/indigo-paas-orchestrator/content/
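The site selection step described for the PaaS Orchestrator (collect SLA and monitoring data, then ask a ranker for the best sites) can be sketched in Python as a filter followed by a sort. Everything in this sketch is an assumption made for illustration: the `Site` fields, the eligibility rule and the ranking key are hypothetical and do not reproduce the scoring used by the actual cloud provider ranker.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    sla_ok: bool          # whether the user has a valid SLA with this site (assumed input)
    free_cpus: int        # capacity reported by monitoring (assumed input)
    availability: float   # fraction between 0 and 1 reported by monitoring (assumed input)


def rank_sites(sites, required_cpus):
    """Keep sites that satisfy the request, best candidate first."""
    eligible = [s for s in sites if s.sla_ok and s.free_cpus >= required_cpus]
    # Prefer higher availability; break ties by the amount of free capacity.
    return sorted(eligible, key=lambda s: (s.availability, s.free_cpus), reverse=True)


def choose_site(sites, required_cpus):
    """Return the name of the best site, or None when no site can host the deployment."""
    ranked = rank_sites(sites, required_cpus)
    return ranked[0].name if ranked else None
```

Separating the eligibility filter from the ranking keeps hard constraints (SLA, capacity) apart from soft preferences, so either part can be changed independently.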

RUCIO

Short Service Name: RUCIO
Version: 1.22.0
Solution Type: Orchestration
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/rucio.html

Initially developed by the ATLAS collaboration at CERN, RUCIO is able to manage accounts, files, datasets and distributed storage systems. RUCIO has been integrated with the main services mentioned above, becoming the data management policy engine for XDC.

XDC Functionalities

The authentication and authorization mechanism was extended to support JSON Web Tokens (JWT) using the OpenID Connect protocol. Rucio user pre-provisioning (via the new Rucio SCIM client) has been implemented as a ‘Rucio probe’ script.

Detailed documentation is available at:

GitHub README

https://github.com/indigo-dc/orchestrator-dashboard/blob/master/README.md
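A JWT, as used by the token-based authentication added to RUCIO, is a string of three base64url-encoded segments separated by dots, the second of which carries the claims as JSON. The Python sketch below shows how the claims can be inspected using only the standard library. It deliberately does not verify the signature, so it is suitable for debugging only; a real service must validate the token against the issuer's keys. The helper `make_unsigned_demo_token` and the claim values are hypothetical and exist only to make the example self-contained.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_claims(token):
    """Return the claims of a compact JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWS token")
    return json.loads(b64url_decode(parts[1]))


def make_unsigned_demo_token(claims):
    """Build a demo token with a placeholder signature (illustration only)."""
    header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode("utf-8"))
    payload = b64url_encode(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.signature-placeholder"
```

Reading claims such as `sub` or `scope` this way is handy when checking which identity and permissions a token carries before it is presented to a service.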

TOSCA types and template

Short Service Name: TOSCA types and template
Version: 4.0.1
Solution Type: Service Automation
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/ttt/v4.0.1.html

The TOSCA types repository shows a YAML description of new types added first in the INDIGO-DataCloud project, and afterwards in the DEEP-HybridDataCloud (DEEP) and eXtreme DataCloud (XDC) projects, to extend TOSCA Simple Profile in YAML Version 1.0 to add high level entities. The TOSCA Templates repository contains templates supporting the use cases for INDIGO-DataCloud, EOSC-hub, DEEP-HybridDataCloud and eXtreme DataCloud projects.

XDC Functionalities

Added new TOSCA types for BlockStorage, AttachesTo relationship, Elasticsearch and Kibana, while improving the TOSCA types for Kubernetes. Updated TOSCA templates to ensure compliance with Simple Profile in YAML 1.0. Added TOSCA templates for the LifeWatch use case.

Detailed documentation is available at:

TOSCA types README

https://github.com/indigo-dc/tosca-types/blob/master/README.md

TOSCA templates README

https://github.com/indigo-dc/tosca-templates

How to deploy a TOSCA Template

https://github.com/indigo-dc/tosca-templates/blob/master/doc/tosca-deploy.md

Orchent

Short Service Name: Orchent
Version: 1.2.6
Solution Type: User Solution
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/orchent/v1.2.6.html

Orchent is a command line application to manage deployments and their resources through the PaaS Orchestrator in a fast and easy way.

Detailed documentation is available at:

GitHub README

https://github.com/indigo-dc/orchent/blob/1.2.6/README.md

PaaS Orchestrator Dashboard

Short Service Name: PaaS Orchestrator Dashboard
Version: 1.1.0
Solution Type: User Solution
Release Notes: https://releases.extreme-datacloud.eu/en/latest/releases/quasar/paas-dashboard.html

The first release of INDIGO PaaS Orchestrator - Simple Graphical UI allows users to easily deploy desired workflows and infrastructures. The PaaS Orchestrator Dashboard is a Python application built with the Flask microframework. Flask-Dance is used for OpenID Connect/OAuth2 integration. The Docker image uses Gunicorn as WSGI HTTP server to serve the Flask application.

Detailed documentation is available at:

GitHub README

https://github.com/indigo-dc/orchestrator-dashboard/blob/master/README.md


Research Communities and Case Study

CTA

Very high-energy electromagnetic radiation reaches Earth from a large part of the Cosmos, carrying crucial and unique information about the most energetic phenomena in the Universe. CTA [1] (Cherenkov Telescope Array) will answer many of the persisting questions by enabling the detection of more than a thousand sources over the whole sky.

[1] www.cta-observatory.org


Use Case

The general objective of this use case is the integration of this large data source in order to extract different physics parameters, like gamma-ray direction and energy. Files are associated with metadata. Retrieve and query services can return files using the metadata parameters. Observation files are private for at least one year.

The archive must prevent unauthorized access. CTA operators need to be able to configure policies based on the type of storage: low and high latency storage.

[Figure: CTA archive data flow between PRODUCER and CONSUMER through an OAIS archive. Labels: Ingest (FITS), DL0,...,DL5, LFN list, Preprocessing, Metadata Database, Data Management File catalogue (LFN, PFN), Query, Data request, Data retrieval (FITS files)]

Goals

Integration of different Cloud-based tools that allow the management of the data life cycle and the querying of data based on FAIR principles.

Integration of tools that allow archiving the event/monitoring/calibration data produced by the telescope site in a distributed storage solution, enabling the management of the metadata.

Management of the proprietary period to prevent unauthorized access to data.
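The combination of metadata queries, LFN to PFN resolution and a proprietary period can be sketched in Python as follows. This is a toy model under stated assumptions: the `Observation` fields, the 365-day period (the text only says "at least one year"), the rule that the owner can always read, and the example identifiers are all hypothetical and do not describe the actual CTA archive design.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: the proprietary period is modelled as exactly 365 days.
PROPRIETARY_PERIOD = timedelta(days=365)


@dataclass
class Observation:
    lfn: str            # logical file name, stable across sites
    pfn: str            # physical file name, where the bytes actually live
    observed_on: date
    owner: str
    metadata: dict


def can_read(obs, user, today):
    """The owner can always read; others only after the proprietary period."""
    if user == obs.owner:
        return True
    return today >= obs.observed_on + PROPRIETARY_PERIOD


def query(catalogue, user, today, **criteria):
    """Return the PFNs of readable observations whose metadata match all criteria."""
    return [
        obs.pfn
        for obs in catalogue
        if can_read(obs, user, today)
        and all(obs.metadata.get(key) == value for key, value in criteria.items())
    ]
```

Applying the access check inside the query means a file still under embargo simply does not appear in the results of other users, rather than being listed and then refused.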


ECRIN

ECRIN [2] (European Clinical Research Infrastructure Network) is a not-for-profit intergovernmental organisation that provides support for the development and implementation of multinational clinical research projects in Europe. These are mostly investigator-initiated (rather than industry-sponsored) clinical trials, run by non-commercial Clinical Trials Units (CTUs) based within universities or hospitals, though ECRIN does also support trials initiated by biotech and medical device small and medium enterprises (SMEs).

[2] www.ecrin.org


Use Case

In recent years there has been a growing acceptance that to accurately assess the results of trials, and in particular to combine the results from different trials in meta-analyses, it is necessary to have access to the original source data, the “individual participant data” (IPD), as well as the result summaries found in published papers. As a result, more and more researchers are making such material (generically, “clinical trial data objects”) available for sharing. The researcher or reviewer wishing to locate relevant data objects for a study is therefore faced with an overwhelming mosaic of possible source locations and access mechanisms, and this problem of ‘discoverability’ will become much worse in the future.

Goals

Use a data management system, easily accessible via web-based technologies, able to catalogue all the diverse data and documents associated with clinical research.

Adopt a generic metadata schema that can not only describe the data objects themselves, but also link them to the source trials and provide information on how the data objects can be accessed.

Integration of tools to harvest and map the metadata from existing data repositories, using available APIs and/or data mining techniques.
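The second goal, a schema that describes a data object, links it to its source trials and records how it can be accessed, can be illustrated with a small Python data model. The field names, the access type values and the example identifiers are hypothetical; they are not the ECRIN metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    object_id: str
    object_type: str      # e.g. "IPD dataset", "protocol", "results summary" (illustrative values)
    access_url: str       # where the object, or its access request form, can be found
    access_type: str      # e.g. "public" or "on request" (illustrative values)
    trial_ids: list = field(default_factory=list)  # links to the source trials


def objects_for_trial(objects, trial_id):
    """Find every catalogued data object linked to a given trial."""
    return [obj for obj in objects if trial_id in obj.trial_ids]


def publicly_accessible(objects):
    """Filter the catalogue down to objects that need no access request."""
    return [obj for obj in objects if obj.access_type == "public"]
```

Keeping the trial links on the data object lets one object, such as a pooled dataset, point to several trials, which is the kind of linkage a discoverability catalogue needs.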

LifeWatch

LifeWatch [3] is the e-Science and Technology European Infrastructure for Biodiversity and Ecosystem Research, which aims to advance science in these disciplines and to address the big environmental challenges, as well as to support knowledge-based strategic solutions to environmental preservation. This mission is achieved by providing access to a multitude of datasets, services, tools and computing resources in general, enabling the construction and operation of Virtual Research Environments (VREs) bringing Information and Communication Technologies closer to the final researcher.

[3] www.lifewatch.eu


Use Case

Due to eutrophication and harmful algal blooms, water quality is decreasing, with a very negative economic impact. Forecasting these harmful events is not easy. However, there are different and heterogeneous data sources (satellite data, real-time monitoring systems based on sensors, observations, and meteorological data) which can be integrated to feed the hydrological and water quality models, thus automating the modelling and prediction of water quality. The general objective of this use case is to use Cloud-based services and technologies to integrate these large, heterogeneous data sources in order to run different models and obtain more valuable information, such as water availability and other water quality parameters (turbidity, chlorophyll concentration, etc.).

Goals

Integration of different Cloud-based tools that allow the management of the data life cycle and the production of data based on FAIR+R principles.

Integration of tools that automate the saving of data produced by the sensors into distributed storage solutions, and the management of metadata for the integration of different sources.

Automatic deployment of modelling applications.

WLCG

The WLCG [4] (Worldwide LHC Computing Grid) is a global collaboration involving thousands of researchers who rely on a distributed e-infrastructure shared by the LHC (Large Hadron Collider) experiments at CERN. WLCG is, in fact, composed of at least four main distinct communities, one per experiment: ALICE, ATLAS, CMS and LHCb.

[4] wlcg.web.cern.ch

Use Case

The computing requirements foreseen for the next data-taking period (the High-Luminosity LHC) will call for 10 times more resources than can be provided by extrapolating the evolution of technology at a constant budget. This means that, in the near future, the resources consumed must become considerably more flexible, and a stronger integration among the distributed storage resources must be pursued in order to reduce the number of replicas and better exploit tiered storage systems (providing different Quality of Service levels).

Goals

Intelligent dataset distribution and data lifecycle management based on policies at the site level and at the federated level of resources.

Smart Caching solutions able to temporarily migrate data close to the geographical location where the analysis is performed.

Orchestrating Computing Workflows based on policy-driven or adaptive data movements.
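
As an illustration of the smart caching goal in this section, the following is a minimal sketch of a site-local, read-through cache that temporarily keeps copies of remote files near the analysis and evicts the least recently used ones when space runs out. It is not the XDC caching implementation: the `SiteCache` class, the `fetch_remote` callback and the byte-based capacity are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable, List


class SiteCache:
    """Toy read-through cache with least-recently-used eviction (hypothetical)."""

    def __init__(self, capacity_bytes: int, fetch_remote: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch_remote = fetch_remote  # assumed callback that reads from remote storage
        self._files: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> bytes:
        """Return the file content, fetching it remotely only on a cache miss."""
        if name in self._files:
            self._files.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self._files[name]
        self.misses += 1
        data = self.fetch_remote(name)
        self._store(name, data)
        return data

    def _store(self, name: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # too large to cache; it is served without being stored
        # Evict least recently used files until the new one fits.
        while self._used + len(data) > self.capacity_bytes:
            _, evicted = self._files.popitem(last=False)
            self._used -= len(evicted)
        self._files[name] = data
        self._used += len(data)

    def cached_names(self) -> List[str]:
        """Names currently cached, from least to most recently used."""
        return list(self._files)
```

A production cache would also need policies for data expiry, consistency with the origin storage and concurrent access, none of which this sketch attempts.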

European XFEL

The European XFEL [1] is an X-ray light source located in Hamburg, Germany. It is installed mainly in underground tunnels, which can be accessed at three different sites. The 3.4 kilometre-long facility will run from the DESY campus in Hamburg to the town of Schenefeld in Schleswig-Holstein. At the research campus in Schenefeld, teams of scientists from all over the world will carry out experiments using the X-ray flashes. Using the X-ray flashes of the European XFEL, scientists will be able to map the atomic details of viruses, decipher the molecular composition of cells, take three-dimensional images of the nanoworld, film chemical reactions, and study processes such as those occurring deep inside planets.

[1] www.xfel.eu

Use Case

XFEL is expected to improve its data management in order to make derived data (images) available for further processing at the Kurchatov Institute (NRC) as fast as possible after the raw data has been calibrated at DESY. This will cover initial data taking, a pre-processing step to check for misaligned or faulty images, the raw data storage, the calibration, the storage of the calibrated derived data, the wide area transfer of that data to the Kurchatov Institute (NRC) and the final analysis. For this, work on the WAN transfer will be required. Additionally, in order to support the data retention policy that XFEL has defined, improved quality of service (QoS) capabilities are needed for long-term archival of data.

[Figure: event-driven processing workflow. Labels: Experiment; raw data events; dCache Storage System; ftp; Kafka Event Broker; Trigger; FaaS on auto-scaling OpenStack instance (Processing); Macaroons; Write back of processed file.]

Goals

Enabling flexible wide-area data management based on storage events. File-based events can be triggered on creation, access, deletion and other conditions. Events can be flexibly routed and used to trigger either data management or computing tasks.

Fast data taking, with a pre-processing step to check for misaligned or faulty images.

Improved quality of service (QoS) capabilities to support the data retention policy.
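
The idea of routing storage events to data management or computing tasks can be sketched in a few lines. The example below is only an in-memory illustration: the `StorageEvent` fields, the event kinds and the handler names are hypothetical, and it does not reproduce the dCache event format or use a real Kafka client.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class StorageEvent:
    """Hypothetical file-based event, as a storage system might emit it."""
    kind: str  # assumed labels: "create", "access", "delete"
    path: str


Handler = Callable[[StorageEvent], str]


class EventRouter:
    """Routes each event to every handler registered for its kind."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, kind: str, handler: Handler) -> None:
        self._handlers[kind].append(handler)

    def dispatch(self, event: StorageEvent) -> List[str]:
        # Events with no registered handler are simply ignored.
        return [handler(event) for handler in self._handlers.get(event.kind, [])]


# Hypothetical tasks; real ones would start a computing job or a transfer.
def check_image(event: StorageEvent) -> str:
    return f"pre-process {event.path}"


def schedule_transfer(event: StorageEvent) -> str:
    return f"transfer {event.path}"


def log_deletion(event: StorageEvent) -> str:
    return f"deleted {event.path}"


def build_router() -> EventRouter:
    router = EventRouter()
    router.on("create", check_image)        # computing task
    router.on("create", schedule_transfer)  # data management task
    router.on("delete", log_deletion)
    return router
```

In a deployment like the one pictured in this section, the dispatch step would presumably sit behind a message broker consumer, so that new tasks can be attached to an event kind without touching the storage system itself.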

XDC in the context of the European e-Infrastructures

XDC makes data management services available through a homogeneous set of interfaces that allows integration with other production services in distributed e-infrastructures. The main target is of course the European Open Science Cloud (EOSC) and the computing infrastructures used by the scientific communities represented in the project, i.e. the Worldwide LHC Computing Grid and the EGI Federated Cloud. We are collaborating with other software providers for the EOSC in order to adopt, whenever possible, the same protocols, standards, interfaces and tools to maximize interoperability. In particular, the DEEP-HybridDataCloud [1] project is the computing counterpart of XDC in the field of Artificial Intelligence frameworks instantiated on the Cloud at the PaaS level. XDC and DEEP are pushing their products into the EOSC marketplace for general availability.

[1] https://deep-hybrid-datacloud.eu/

Technical Support

Most complex software contains bugs, and we are not an exception. One of the features of free and open source software is the ability to report bugs, helping to fix or improve the software you use. For this purpose, the eXtreme-DataCloud project relies on the project contacts at http://www.extreme-datacloud.eu/contact/ and on the dedicated channels of each tool.

Share the XDC Experience

Developers, researchers and IT enthusiasts

Please feel free to ask for more information on how to use or adopt the XDC solution for your work by writing to:

[email protected]

You can socialize with us via:

https://twitter.com/xtremedatacloud

https://www.linkedin.com/groups/12181004/

Finally, you can also visit the eXtreme-DataCloud website to stay informed about the project, including new releases and community events:

http://www.extreme-datacloud.eu/

XDC is run by…