AVALON: Algorithms and Software Architectures for Distributed & High Performance Computing Platforms
avalon.ens-lyon.fr
Christian Perez
2016, April 19th
Avalon Members @ April 1st, 2016: 28 people
Faculty Members (9)
(4 INRIA, 1 CNRS, 3 UCBL, 1 ENSL)
• Christian Perez, DR Inria, HDR, Project leader
• Laurent Lefèvre, CR Inria, HDR
• Gilles Fedak, CR Inria, HDR
• Thierry Gautier, CR Inria
• Frédéric Suter, CR CNRS, HDR
• Yves Caniou, MCF UCBL
• Jean-Patrick Gelas, MCF UCBL
• Olivier Glück, MCF UCBL
• Eddy Caron, MCF ENS Lyon, HDR
PhD students (7+1)
• Daniel Balouek, CIFRE New Generation SR
• Radu Carpa, MESR/ENSL
• Hadrien Croubois, ENSL
• Pedro Paulo, Inria
• Issam Rais, Inria
• Jérôme Richard, Inria
• Violaine Villebonnet, Inria
• Vincent Lanore, ENSL (defended)
Postdoc / Temporary Researcher (2)
• Hélène Coullon, Inria
• Marcos Dias de Asuncao, Inria
Engineers (3+5)
• Simon Delamare, IR CNRS (80%)
• Jean-Christophe Mignot, IR CNRS (20%)
• Matthieu Imbert, Inria SED (40%)
• Marc Pinhède, Grid’5000, Inria
• Olivier Mornard, Moebus, Inria
• Romaric Guiller, e-Biothon, CNRS
• David Loup, Grid’5000, part-time apprentice
• Arnaud Lefray, Quirinus, Inria
Assistant
• Evelyne Blesle, Inria
Avalon: Research Activities
Diagram, top: Applications
Diagram, research layers: Programming Abstractions; Application & Resource Models; Resource Abstractions; Algorithmics
Diagram, platforms: Supercomputers (Exascale), large scale; Desktop Grids, volatility; Clouds (IaaS, PaaS), on demand; Grids (EGI), heterogeneity
Diagram, highlighted themes: Elasticity; Energy
CPU/data-intensive Scientific Applications
• From “simple” to code coupling
• Structure complexity
• “New” forms of interactions (MR)
Computing platforms
• Different characteristics
- Performance, energy, size, cost, reliability, QoS, etc.
• Hybridization
- Sky computing, HPC@Cloud, Exascale, Spot instance
Objectives
• Expressiveness simplicity
• Application portability
• Resource specific optimizations
• Elastic resource management
• Energy consumption
Avalon: Four Research Axes
Energy Application Profiling and Modeling
J.-P. Gelas, O. Glück, L. Lefèvre, J.-C. Mignot
• Energy proportional HPC and Clouds infrastructures
• Energy efficient core and access networks for Cloud scenarios
Data-intensive Application Profiling, Modeling, and Management
G. Fedak, F. Suter, Y. Caniou, M. Dias de Asuncao
• Performance Prediction of Parallel Regular Applications (SimGrid/MPI)
• Modeling Large Scale Storage Infrastructure (SimGrid/Storage)
• Data Management for Hybrid Computing Infrastructures
Resource Agnostic Application Description Model
E. Caron, T. Gautier, C. Pérez
• Moldable Application Description Model (HLCM)
• Dynamic Adaptation of the Application Structure
Application Mapping and Scheduling
E. Caron, T. Gautier, L. Lefèvre, C. Pérez, F. Suter, M. Dias de Asuncao
• Energy-aware hybrid task scheduling (OpenMP)
• Application Mapping and Software Deployment, including Security
• Workflow Management in Cloud Infrastructure (DIET)
Grid’5000: An Experimental Testbed
Scientific issues
• Large scale, volatile, complex systems
- Performance, fault tolerance, scalability, data storage, programming models, algorithms, resource management, etc.
• Methodological challenges (reproducibility)
Instrument created in 2003
Hardware
• 10 sites, ~26 clusters, ~1260 nodes, ~8000 cores
Dedicated backbone
• 10 Gb/s
Features
• Complete reconfiguration (system, application) on bare hardware, network isolation
- Infiniband, Ethernet 10G, dedicated Renater network
• Support for production grid (gLite) software infrastructure
• Support for Cloud (OpenStack, Cloudstack, OpenNebula, Nimbus)
Collaboration with NSF Chameleon
Profiling and Understanding Energy Consumption of Real Applications
Energy Efficient Software in HPC
Two focuses: fault tolerance and data broadcast
Help users to choose the best service
Applications on exascale infrastructures
PhD of Mehdi Diouri
Fast yet Accurate Simulation of MPI Applications with SMPI (SimGrid)
Hybrid model + Informed platform description = Accurate simulation and bug detection
PhD of G. Markomanolis
Xkaapi: Library for Dependent Tasks
• Efficient management of small grain dependent tasks
• Scheduling with work stealing algorithm/heuristic
• Loop scheduler for irregular applications
Figure: Cholesky, 1.8 TFlops
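To make the work-stealing idea concrete, here is a minimal single-threaded simulation in Python. It is only a sketch under simplifying assumptions: tasks are independent (no dependency tracking), workers take turns in rounds, and the name `run_work_stealing` is hypothetical. It is not the Xkaapi API or its actual scheduler.

```python
import random
from collections import deque


def run_work_stealing(tasks, n_workers=4, seed=0):
    """Simulate work stealing; return {worker_id: [tasks it executed]}."""
    rng = random.Random(seed)
    # All tasks start on worker 0, so the other workers must steal to get work.
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(tasks)
    executed = {w: [] for w in range(n_workers)}
    remaining = len(tasks)
    while remaining:
        for w in range(n_workers):
            if queues[w]:
                # Owner side: run the most recently queued task (LIFO end).
                executed[w].append(queues[w].pop())
                remaining -= 1
            else:
                # Thief side: pick a random victim and steal its oldest task (FIFO end).
                victim = rng.randrange(n_workers)
                if victim != w and queues[victim]:
                    queues[w].append(queues[victim].popleft())
    return executed


if __name__ == "__main__":
    for worker, done in run_work_stealing(list(range(12))).items():
        print(worker, done)
```

Owners and thieves work on opposite ends of each queue, which is the usual way to keep the two from contending for the same task.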
High & Low Level Component Model
HLCM: Component Programming Model
• Component model (hierarchical)
- Primitive and composite
• Connector based
- Primitive and composite
• Generic model
- Support meta-programming (template à la C++)
• Currently static
HLCMi: an implementation of HLCM
• Model-transformation based (EMF)
• Connectors
- Use/Provide
- Shared Data
- Collective Communications, MxN
- Some skeletons: Domain Decomposition, MapReduce, etc.
L2C: Component Execution Model
• Minimalist component model for HPC
• Co-development with CEA in negotiation
PhD of J. Bigot
Diagram: two components attached to a connector through roles
MapReduce for Large, Distributed, and Dynamic Datasets
Scheduling algorithms for optimizing shuffle phase
MapReduce runtime for datasets that are
• Distributed over hybrid and widely distributed infrastructures
- Cloud, Desktop PCs, sensors, smartphones…
• Dynamic, i.e. that grow or shrink over time, or are partially unavailable because of infrastructure failures
MapReduce/BitDew
• First implementation of MapReduce for Internet Desktop Grid
• 2-level scheduler, latency hiding, p-failures resilient, collective communications
• Distributed result checking of intermediate results
• MapReduce/ActiveData: incremental processing of dynamic datasets
• Storage on hybrid Cloud + Desktop PCs nodes
• Privacy computing on hybrid infrastructures using Information Dispersal Algorithms
Figure: Throughput of WordCount application on Grid’5000 (512 nodes), up to 2 TB
Figure: Time of map phase and shuffle w.r.t. number of mappers and reducers
Execution time reduced by up to 47%!
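For readers unfamiliar with the WordCount benchmark named above, the three MapReduce phases (map, shuffle, reduce) can be sketched in a few lines of Python. This is an in-memory illustration only. The function names and the partitioning rule are assumptions made for the example, and nothing here reflects the MapReduce/BitDew implementation.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs, n_reducers):
    """Shuffle: route every pair to a reducer and group values by key."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        # Sum of code points: a simple partitioner that is stable across runs.
        index = sum(map(ord, key)) % n_reducers
        partitions[index][key].append(value)
    return partitions


def reduce_phase(partition):
    """Reduce: sum the grouped values of each key in one partition."""
    return {key: sum(values) for key, values in partition.items()}


def word_count(documents, n_reducers=2):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    counts = {}
    for partition in shuffle_phase(pairs, n_reducers):
        counts.update(reduce_phase(partition))
    return counts


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The shuffle step is where all intermediate pairs cross between mappers and reducers, which is why the slide singles it out as a scheduling target.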
Active-Data: A Programming Model for Data Life-Cycle Management
A data life cycle model
Data management systems to expose data life cycle
Well-formalized representation inspired by Petri nets
A programming model and a runtime environment
Associate a code to each step of the data life cycle
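The idea of attaching code to each step of a data life cycle can be sketched as follows. This is a deliberately reduced model: a single data item moves through named states (a one-token simplification of a Petri net), and the class `LifeCycle` with its methods `on` and `fire` are hypothetical names chosen for this example, not the Active-Data API.

```python
class LifeCycle:
    """A data item moving through states; handlers run when a transition fires."""

    def __init__(self, start, transitions):
        # transitions: {transition_name: (source_state, target_state)}
        self.state = start
        self.transitions = dict(transitions)
        self.handlers = {}

    def on(self, transition, handler):
        """Associate a piece of code with one life-cycle transition."""
        self.handlers.setdefault(transition, []).append(handler)

    def fire(self, transition, data_id):
        source, target = self.transitions[transition]
        if self.state != source:
            raise ValueError(f"{transition!r} is not enabled in state {self.state!r}")
        self.state = target
        for handler in self.handlers.get(transition, []):
            handler(data_id)


if __name__ == "__main__":
    cycle = LifeCycle("created", {"store": ("created", "stored"),
                                  "delete": ("stored", "deleted")})
    cycle.on("store", lambda data_id: print("replicating", data_id))
    cycle.fire("store", "file-1")
```

Firing a transition that is not enabled raises an error, which mirrors the role of the formal model: only life-cycle steps that the model allows can trigger user code.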
PhD of A. Simonet
Mapping Applications over Clouds
Scheduling algorithms for dynamic workflows (PhD A. Muresan)
• Workflows with conditionals and loops over IaaS Clouds
• HEFT-based algorithms on sub-workflows
• taking into account cost and performance
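As background for the HEFT-based bullet, the prioritization step of standard HEFT ranks each task by its "upward rank": its own cost plus the most expensive path to the end of the workflow. The sketch below computes that rank on a plain DAG. It assumes computation costs are already averaged over the candidate resources, it omits HEFT's resource-selection phase, and it does not handle the conditionals, loops, or monetary cost that the work above addresses. All names are illustrative.

```python
from functools import lru_cache


def upward_ranks(cost, succ, comm):
    """rank(t) = cost[t] + max over successors s of (comm[(t, s)] + rank(s))."""

    @lru_cache(maxsize=None)
    def rank(task):
        return cost[task] + max(
            (comm.get((task, s), 0) + rank(s) for s in succ.get(task, ())),
            default=0,
        )

    return {task: rank(task) for task in cost}


def heft_order(cost, succ, comm):
    """Return tasks sorted by decreasing upward rank (HEFT priority order)."""
    ranks = upward_ranks(cost, succ, comm)
    return sorted(cost, key=lambda task: -ranks[task])


if __name__ == "__main__":
    cost = {"A": 3, "B": 2, "C": 4, "D": 1}            # average computation costs
    succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}   # DAG edges
    comm = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 1}
    print(upward_ranks(cost, succ, comm))  # {'A': 11, 'B': 4, 'C': 6, 'D': 1}
    print(heft_order(cost, succ, comm))    # ['A', 'C', 'B', 'D']
```

Because a task's rank always exceeds that of its successors when costs are positive, the resulting order is also a valid topological order of the workflow.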
Simulation of IaaS frameworks using SimGrid (J. Rouzaud-Cornabas)
• Study of the performance of the Amazon platform
• Same API for the simulation of several different applications
Cloud workload prediction (PhD A. Muresan)
• Identifying similar past occurrences of the current short-term workload history
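One simple way to read "identifying similar past occurrences" is nearest-window matching: take the last few samples, find the most similar window earlier in the history, and use what followed it as the forecast. The sketch below does exactly that with a squared-distance measure. The distance, the window size, and the function name `predict_next` are assumptions for illustration, not the method used in the cited work.

```python
def predict_next(history, window=3):
    """Predict the next value from the most similar past window of samples."""
    if len(history) <= window:
        raise ValueError("history must be longer than the window")
    current = history[-window:]
    best_start, best_dist = None, float("inf")
    # Candidate windows must end before the last sample so that a successor exists.
    for start in range(len(history) - window):
        candidate = history[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(candidate, current))
        if dist < best_dist:
            best_start, best_dist = start, dist
    # The forecast is the value that followed the best-matching window.
    return history[best_start + window]


if __name__ == "__main__":
    load = [10, 20, 30, 40, 10, 20, 30]
    print(predict_next(load))  # 40
```

In the example, the current pattern 10, 20, 30 already occurred at the start of the history and was followed by 40, so 40 is returned.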
Economic model for Cloud infrastructure (PhD A. Muresan, L. Rodero-Merino)
• Simulations of an economic approach to set resource prices
• Resolve when to scale resources depending on the users' demand (ensuring fair share among users)
Desktop Grids and Hybrid Distributed Computing Infrastructures (DCI)
SpeQuloS: a QoS service for Best-Effort DCIs
• Best-Effort DCIs: Desktop Grids, Amazon EC2 spot instances…
• No resource availability guarantee
• SpeQuloS:
- Provides QoS to Bags of Tasks (BoT) executed on the European Desktop Grid Infrastructure
- Dynamic provision of Cloud resources to alleviate the tail effect
• Results
- Removes the tail in 50% of the BoTs and halves it in 80%
- Up to 9x speed-up
- Improves BoT execution predictability and stability
Multi-criteria Scheduling for Hybrid DCIs
• Scalable and “smart” approach based on pull-based scheduler with Promethee methods
• Criteria considered: time/cost/trust
• Platforms considered: Desktop Grid/Clouds/Grids
Execution time reduced by up to 47%!
Tail effect: the last fraction of the BoT that takes the longest time to complete
Figure: The SpeQuloS framework
Major Software
DIET
• Hierarchical resource management (grids & clouds)
• Used in production in the Décrypthon project
• Co-development with SysFera (GRAAL’s startup)
SimGrid
• In collaboration with Algorille, Mescal and Univ. of Hawaii at Manoa
• Simulation platform
• Chosen by CERN to simulate their data management infrastructure
BitDew/ActiveData
• Open source middleware for data management on Desktop Grids
ShowWatts/Kwapi
• Energy usage measurements of large scale distributed systems
HLCM
• Implementation of a generic connector-based hierarchical component model
Major Projects and Contracts
International
• Inria-Illinois-ANL-BSC-JSC-Riken/AICS Joint Laboratory on Extreme Scale Computing (2014-18)
• Green’Touch (2012-2016)
Europe
• PaaSage (FP7 ICT, 2012-16)
- Model Based Cloud Platform Upperware
• Nesus (COST ICT, 2014-19)
- Network for Sustainable Ultrascale Computing
• H2020 Centre of Excellence
• ETP4HPC (SRA 2)
- Energy and Resiliency
- Programming Environment
- System Software and Management
Industry Contracts
• New Generation SR
• Defab
National
• PIA ELCI (2014-17)
- Environnement Logiciel pour le Calcul Intensif
• ANR Moebus (2013-2017)
- Multi-objective scheduling for HPC
National Laboratory of Excellence
• Mathematics and Computer Science (2011-18)
• Physics, Radiobiology, Medical Imaging and Simulation (PRIMES, 2012-19)
Inria
• IPL C2S@Exa (2013-17)
• IPL Discovery (2015-19)
• IPL Hac-Species (2016-20)
• ADT Aladdin
Startup Incubation
• Quirinus
Avalon Team Presentation @ INRIA Seminar, 19/04/2016