PhenoMeNal e-infrastructure (slides)
PhenoMeNal (Horizon 2020)
Marco Capuccini, PhD student
Uppsala University, Sweden
What is PhenoMeNal?
PhenoMeNal: Phenome and Metabolome aNalysis
A comprehensive and standardized e-infrastructure for:
Processing, analysis and information-mining of medical molecular genotyping and
phenotyping Big Data, generated by metabolomics applications
3 years to achieve this (2 and a half left)
Background
Metabolomics: identification and quantification of molecules involved in metabolic pathways
● Mass spectrometry (MS): good molecular coverage and high throughput (35K spectra per hour)
● Linking metabolomics data to genomic data will enable personalized, evidence-based medicine
13 partners
Uppsala University (UU) leads Work Package 5 (WP5)
WP5: Operations and Maintenance of PhenoMeNal GRID/Cloud
Ola Spjuth, Associate professor, FarmBio
Marco Capuccini, PhD student, Scientific Computing
Anders Larsson, Researcher, SciLifeLab
Stephanie Herman, PhD student, Medical Science
Payam Emami, PhD student, Medical Science
Kim Kultima, Associate professor, Medical Science
WP5 in a nutshell
Many tools for MS are already available and trivially parallelizable.
Goal: provide MS tools to medical doctors and biologists in a scalable, secure and
easily accessible way
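To illustrate what "trivially parallelizable" means here, the following is a minimal Python sketch, not one of the project's tools. The `process_sample` function is a hypothetical stand-in for a single MS tool run, and the sample data is invented for the example. The point is only that each sample is processed independently, so the work can be spread over any number of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample):
    """Hypothetical stand-in for one MS tool run on one sample."""
    name, intensities = sample
    # Toy "analysis": report the most intense peak in the spectrum.
    return name, max(intensities)


def process_all(samples, workers=4):
    # Samples do not depend on each other, so the runs need no
    # communication and can be distributed freely across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_sample, samples))


# Invented example data: (sample name, peak intensities).
samples = [
    ("sample_a", [10.0, 250.5, 31.2]),
    ("sample_b", [5.1, 7.3, 99.9]),
    ("sample_c", [400.0, 12.0]),
]
results = process_all(samples)
print(results)
```

In a deployment like the one described in these slides, each call would presumably be a containerized tool scheduled on the cluster rather than a local thread.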
Service-oriented Architecture
Microservice-oriented architecture:
● SaaS: Every tool is provided as a microservice, which is isolated, minimal and complete. Containerization (e.g. Docker) plays an important role.
Infrastructure:
● PaaS: Microservice orchestration through middleware frameworks
● IaaS: EC2, GCE, OpenStack, Vagrant and EGI Federated Cloud
MANTL by Cisco Cloud
Orchestration
● Marathon: long-lasting services
● Chronos/Galaxy/Jupyter: workflows
● Kubernetes (experimental)
Service discovery
● Consul: LAN and WAN level
● Traefik: reverse proxy
Resource management & scheduling
● Mesos
● Kubernetes (experimental)
Provisioning
● Terraform: host cloud provisioning
● Ansible: VM provisioning
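As a rough illustration of how a containerized tool could be run as a long-lasting service on Marathon, the sketch below builds a Marathon app definition in Python. The app id `/ms-tool`, the image name `example/ms-tool:latest`, the port and the resource figures are all assumptions made up for the example; they do not come from the PhenoMeNal deployment.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem=256, instances=1, container_port=8080):
    """Build a Marathon app definition for a Docker-based microservice."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,              # memory in MiB
        "instances": instances,  # number of copies Marathon keeps running
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to assign a free host port.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0}
                ],
            },
        },
    }


# Hypothetical service: names and sizes are placeholders.
app = marathon_app("/ms-tool", "example/ms-tool:latest", instances=2)
payload = json.dumps(app, indent=2)
print(payload)
```

A definition like this would typically be submitted to Marathon's `/v2/apps` REST endpoint. Because the host port is assigned dynamically, a service discovery layer such as Consul and a reverse proxy such as Traefik are what let clients find the running instances.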
Workflow tools
Chronos
Galaxy
DSL in Jupyter
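To give a feel for how a small workflow might be expressed with Chronos, here is a hedged Python sketch that builds two job definitions: a scheduled job and a second job that runs after it through the `parents` field. The job names, images, commands and schedule are hypothetical placeholders, not the project's actual workflow.

```python
import json


def scheduled_job(name, image, command, schedule):
    """Chronos job started by an ISO 8601 repeating-interval schedule."""
    return {
        "name": name,
        "schedule": schedule,
        "command": command,
        "container": {"type": "DOCKER", "image": image},
        "cpus": "0.5",
        "mem": "512",
    }


def dependent_job(name, image, command, parents):
    """Chronos job triggered once all of its parent jobs have succeeded."""
    return {
        "name": name,
        "parents": list(parents),
        "command": command,
        "container": {"type": "DOCKER", "image": image},
        "cpus": "0.5",
        "mem": "512",
    }


# Hypothetical two-step workflow: convert raw data, then extract features.
convert = scheduled_job(
    "convert",
    "example/converter:latest",
    "convert /data/raw /data/mzml",
    "R1/2016-06-01T00:00:00Z/PT1H",  # repeat count / start time / period
)
extract = dependent_job(
    "extract",
    "example/feature-finder:latest",
    "extract /data/mzml /data/features",
    parents=["convert"],
)
workflow = [convert, extract]
print(json.dumps(workflow, indent=2))
```

Galaxy and a DSL in Jupyter address the same need at a higher level: the user describes the steps and their dependencies, and the platform schedules the containers.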
Current status
Some achievements:
● Working deployments on GCE and Smog (UU OpenStack installation)
● We are able to run some simple use cases
In progress, reproducing:
Large-scale Metabolomic Profiling Identifies Novel Biomarkers for Incident Coronary Heart Disease (Ganna et al.)
Future work
Speeding up deployments:
● Packer didn't work
○ MANTL components are Dockerized and stored in an attached volume
● REPL/DockerHub mirrors
Autoscaling:
● Run a microservice to autoscale the cluster via Terraform/Ansible
● Integrate INDIGO components
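One possible shape for the autoscaling microservice's decision logic is sketched below in Python. It is an assumption for illustration, not the planned design: the proportional rule, the thresholds and the Terraform variable name `worker_count` are all hypothetical. The sketch only builds the command line and does not run Terraform.

```python
def desired_workers(current, cpu_percent, target_percent=60,
                    min_workers=2, max_workers=20):
    """Proportional rule: resize so utilization moves toward the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    # Integer ceiling of current * cpu_percent / target_percent.
    wanted = (current * cpu_percent + target_percent - 1) // target_percent
    # Clamp to the allowed cluster size.
    return max(min_workers, min(max_workers, wanted))


def terraform_command(workers, var_name="worker_count"):
    """Command line that would apply the new size (var_name is hypothetical)."""
    return ["terraform", "apply", "-auto-approve", "-var", f"{var_name}={workers}"]


# Example: 4 workers at 90% CPU, aiming for 60%.
new_size = desired_workers(4, 90)
command = terraform_command(new_size)
print(new_size, " ".join(command))
```

After Terraform creates or removes hosts, Ansible would still need to configure the new VMs, which is one reason fast deployments matter for autoscaling.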
Future work (cont'd)
Datacenter federation (Consul)
● How do we protect data that can be accessed only by certain users?
● Data federation: Object Storage/iRODS, or is it enough to store on GlusterFS?
User accounts
● JupyterHub (SaaS level), MANTL (PaaS level, single-user) and EGI Federated Cloud (IaaS level). It seems tricky at the moment.
Questions?
Marco Capuccini