PhenoMeNal e-infrastructure (slides)

PhenoMeNal (Horizon 2020)

Marco Capuccini, PhD student

Uppsala University, Sweden

What is PhenoMeNal?

PhenoMeNal: Phenome and Metabolome aNalysis

A comprehensive and standardized e-infrastructure for:

Processing, analysis and information-mining of medical molecular genotyping and phenotyping Big Data generated by metabolomics applications

3 years to achieve this (2 and a half left)

Background

Metabolomics: identification and quantification of molecules involved in metabolic pathways

● Mass spectrometry (MS): good molecular coverage and high throughput (35K spectra per hour)

● Linking metabolomics data to genomic data will enable personalized evidence-based medicine

13 partners

Uppsala University (UU) leads Work Package 5 (WP5)

WP5: Operations and Maintenance of PhenoMeNal GRID/Cloud

Ola Spjuth, Associate professor, FarmBio

Marco Capuccini, PhD student, Scientific Computing

Anders Larsson, Researcher, SciLifeLab

Stephanie Herman, PhD student, Medical Science

Payam Emami, PhD student, Medical Science

Kim Kultima, Associate professor, Medical Science

WP5 in a nutshell

Many tools for MS are already available and trivially parallelizable.

Goal: provide MS tools to medical doctors and biologists in a scalable, secure and easily accessible way
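"Trivially parallelizable" here means that each spectrum (or sample) can be processed independently of the others. A minimal sketch of that pattern follows, assuming a hypothetical `process_spectrum` function that stands in for a real MS tool and spectra given as lists of (m/z, intensity) pairs. A thread pool is used only to keep the example self-contained; in the infrastructure described in these slides each unit of work would more likely be a containerized microservice.

```python
from concurrent.futures import ThreadPoolExecutor


def process_spectrum(spectrum):
    """Hypothetical stand-in for an MS tool.

    Takes one spectrum as a list of (m/z, intensity) pairs and returns
    the m/z value of its most intense peak.
    """
    mz, _intensity = max(spectrum, key=lambda peak: peak[1])
    return mz


def process_all(spectra, workers=4):
    """Process every spectrum independently and return results in input order."""
    # No spectrum depends on another, so the work can simply be mapped
    # over a pool of workers without any coordination between tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_spectrum, spectra))


if __name__ == "__main__":
    example_spectra = [
        [(100.0, 5.0), (200.0, 9.0)],
        [(50.0, 1.0)],
    ]
    print(process_all(example_spectra))
```

Because the tasks share no state, adding workers (or nodes) scales the throughput without changing the tool itself.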

Service-oriented Architecture

Microservice-oriented architecture:
● SaaS: Every tool is provided as a microservice, which is isolated, minimal and complete. Containerization (e.g. Docker) plays an important role.

Infrastructure:
● PaaS: Microservice orchestration through middleware frameworks
● IaaS: EC2, GCE, OpenStack, Vagrant and EGI federated cloud

MANTL by Cisco Cloud

Orchestration
● Marathon: long-lasting services
● Chronos/Galaxy/Jupyter: workflows
● Kubernetes (experimental)

Service discovery
● Consul: LAN and WAN level
● Traefik: reverse proxy

Resource management & scheduling
● Mesos
● Kubernetes (experimental)

Provisioning
● Terraform: host cloud provisioning
● Ansible: VM provisioning
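To make the "Marathon: long-lasting services" item concrete, the sketch below builds an app definition for a Dockerized microservice. The field names (`id`, `cpus`, `mem`, `instances`, `container`) follow the classic Marathon app definition format as commonly documented; the app id, image name and resource values are hypothetical and are not taken from the PhenoMeNal deployment.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem_mb=512, instances=1):
    """Build a Marathon-style app definition for a Docker container.

    All argument values are illustrative assumptions; only the overall
    layout mirrors the usual Marathon app JSON.
    """
    return {
        "id": app_id,            # path-like service identifier
        "cpus": cpus,            # CPU share requested per instance
        "mem": mem_mb,           # memory in MB requested per instance
        "instances": instances,  # number of copies to keep running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for illustration only.
    app = marathon_app("/ms-tool", "example/ms-tool:latest", instances=2)
    print(json.dumps(app, indent=2))
```

A definition like this would typically be submitted to Marathon's REST API, after which Marathon keeps the requested number of instances running on the Mesos cluster.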

Workflow tools

Chronos

Galaxy

Domain-specific language (DSL) in Jupyter

Current status

Some achievements:
● Working deployments on GCE and Smog (UU OpenStack installation)
● We are able to run some simple use cases

In progress, reproducing:

Large-scale Metabolomic Profiling Identifies Novel Biomarkers for Incident Coronary Heart Disease (Ganna et al.)

Future work

Speeding up deployments:
● Packer didn't work
  ○ MANTL components are Dockerized and stored in an attached volume
● REPL/DockerHub mirrors

Autoscaling:
● Run a microservice to autoscale the cluster via Terraform/Ansible
● Integrate INDIGO components
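The autoscaling microservice is listed as future work, so the following is only a sketch of the decision step such a service might take, not the project's design. It assumes the service can read the total CPU requested by queued and running tasks (in millicores) and that every node offers the same capacity; the function name, parameters and limits are all hypothetical. The resulting node count would then be handed to Terraform/Ansible to resize the cluster.

```python
def desired_nodes(requested_millicores, node_millicores, min_nodes=1, max_nodes=10):
    """Return how many nodes are needed to fit the requested CPU.

    requested_millicores: total CPU requested by all tasks (assumed input)
    node_millicores: CPU capacity of one node (assumed identical nodes)
    """
    if node_millicores <= 0:
        raise ValueError("node_millicores must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-requested_millicores // node_millicores)
    # Clamp to the allowed cluster size.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # 9 CPUs requested on 4-CPU nodes: three nodes are needed.
    print(desired_nodes(9000, 4000))
```

Keeping the decision as a pure function separates the scaling policy from the provisioning step, which makes the policy easy to test without touching a real cloud.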

Future work (cont'd)

Datacenter federation (Consul)
● How do we protect data that can be accessed only by certain users?
● Data federation: Object Storage/iRODS, or is it enough to store on GlusterFS?

User accounts
● JupyterHub (SaaS level), MANTL (PaaS level, single-user) and EGI Federated Cloud (IaaS level). It seems tricky at the moment.

Questions?

Marco [email protected]
