Combinatorial Optimization on the Computational Grid Experiments on Grid5000 Nouredine Melab ([email protected]) Member of Grid5000 steering committee Laboratoire d’Informatique Fondamentale de Lille Parallel Cooperative Optimization Research Group INRIA DOLPHIN Project

Combinatorial Optimization on the Computational Grid

Experiments on Grid5000

Nouredine Melab ([email protected])
Member of Grid5000 steering committee

Laboratoire d’Informatique Fondamentale de Lille

Parallel Cooperative Optimization Research Group

INRIA DOLPHIN Project

Combinatorial optimization problems

High-dimensional and complex optimization problems in many areas of industrial concern

Parallel hybrid optimization methods provide effective solutions efficiently, but they remain insufficient for large problems …

… Need for large-scale parallelism (Grid computing)

(Multi-Objective)
$$\min f(x) = \bigl(f_1(x), f_2(x), \ldots, f_n(x)\bigr), \quad n \ge 2, \quad \text{s.t. } x \in S$$

(Mono-Objective)
$$\min f(x), \quad \text{s.t. } x \in S$$
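To make the multi-objective formulation concrete, here is a minimal sketch of how two objective vectors can be compared under minimization. It assumes the usual Pareto-dominance reading of the vector "min", which the slide does not spell out; the function names and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical bi-objective values (f_1(x), f_2(x)) of four candidates.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)
print(front)
```

In the mono-objective case the same comparison reduces to an ordinary `<` on a single value.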

A taxonomy of optimization methods

Exact algorithms: optimal solutions for small problem instances
    Branch and X
    Dynamic Programming
    A*

Heuristics: near-optimal solutions for large problem instances
    Specific heuristics
    Meta-heuristics
        Single solution: Local Search, Simulated Annealing, Tabu Search
        Population of solutions: Evolutionary Algorithms, Scatter search, Swarm search

Design and implementation of Grid-based algorithms … solving challenging problems in combinatorial optimization

Meta-heuristics (near-optimal): Parallel hybrid design, Implementation (ParadisEO@Grid)

Exact algorithms: Parallel design, Implementation (B&B@Grid)

Cooperation

Protein Structure Prediction
Flow-Shop scheduling problem

Supported by ANR-GRID DOCK
Supported by ACI-GRID DOC-G
Supported by ANR-GRID CHOC

Meta-heuristics: Parallel models and hybridization mechanisms

Parallel models
    They improve efficiency and effectiveness
    Population-based meta-heuristics
        Island model, parallel evaluation of the population, parallel evaluation of a single solution
    Single solution-based meta-heuristics
        Multi-start model, parallel exploration of the neighborhood, parallel evaluation of a single solution

Hybridization mechanisms …
    … combine different methods for better robustness and effectiveness, but are CPU-time intensive

N. Melab, E-G. Talbi, S. Cahon, E. Alba and G. Luque. Parallel Meta-heuristics: Algorithms and Frameworks. Chapter 6 in “Parallel Combinatorial Optimization”, Wiley Series on Parallel and Distributed Computing, ISBN: 0-471-72101-8, Nov 2006.
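As an illustration of the island model named above, the following is a minimal sequential sketch, not ParadisEO code. It assumes a toy OneMax objective, a ring migration topology and steady-state replacement; all names and parameter values are hypothetical. On a real platform each island would run on its own processor and migrate asynchronously.

```python
import random


def fitness(bits):
    """Toy OneMax objective (to maximize): number of ones."""
    return sum(bits)


def evolve(pop, rng, generations):
    """Steady-state GA: one-point crossover, one bit flip, replace the worst."""
    for _ in range(generations):
        a, b = rng.sample(pop, 2)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        child[rng.randrange(len(child))] ^= 1
        worst = min(range(len(pop)), key=lambda k: fitness(pop[k]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child
    return pop


def island_model(n_islands=4, pop_size=10, n_bits=20, epochs=5, gens=50, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for _ in range(epochs):
        # Each island evolves independently (this loop is the parallel part).
        islands = [evolve(pop, rng, gens) for pop in islands]
        # Ring migration: island i receives the best individual of island i-1.
        bests = [max(pop, key=fitness) for pop in islands]
        for i, pop in enumerate(islands):
            worst = min(range(len(pop)), key=lambda k: fitness(pop[k]))
            pop[worst] = list(bests[(i - 1) % n_islands])
    return max((ind for pop in islands for ind in pop), key=fitness)


best = island_model()
print(fitness(best), best)
```

The migration topology (here a ring) and the migration frequency (`gens`) are the kind of parameters that have to be tuned for each deployment.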

“Gridification” of parallel hybrid meta-heuristics

Major properties of computational grids
    Multi-administrative domain, heterogeneity, dynamic availability of resources, large scale

Major adaptations of the different models and mechanisms
    Asynchronous design and implementation
    Granularity management and load balancing
    Checkpointing-based fault tolerance (a memory for each model)
    Adaptation of the parameters of each model (e.g. migration topology for the island model)

N. Melab, S. Cahon and E-G. Talbi. Grid Computing for Parallel Bioinspired Algorithms. Journal of Parallel and Distributed Computing (JPDC), Elsevier Science, Vol.66(8), Pages 1052-1061, 2006.

Our contributions

Multi-Objective EO (MOEO) for the design of multi-objective evolutionary algorithms

Moving Objects (MO) for the design of local search algorithms

ParadisEO for parallel hybrid metaheuristics

PARAllel and DIStributed Evolving Objects
http://www2.lifl.fr/OPAC/Softwares/ParadisEO/

Message passing (MPI, PVM): Clusters, Networks of Workstations
Multi-programming (PThreads): Shared Memory Multi-processors (SMP)
Parallel distributed computing: Clusters of SMPs (CLUMPS)
Grid computing: Condor-MW and Globus (MPICH-G2)

[Figure: ParadisEO@Grid architecture. Components: EO, MO, MOEO. Underlying layers: PVM, PThreads, MPI (LAM, CH), Condor-MW, Globus]

S. Cahon, N. Melab and E-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, Elsevier Science, Vol.10(3), pages 357-380, May 2004.

Evolving Objects framework (EO)

European project (Geneura Team, INRIA, LIACS)

http://eodev.sourceforge.net

Transparent use

ParadisEO-G4: ParadisEO on Globus 4

Design and implementation
    Gridification of the parallel models and hybridization mechanisms provided in ParadisEO
    MPICH-G2 as the communication library

Deployment on the computational Grid (Grid5000)
    Building of a system image for Globus 4 including MPICH-G2
    Virtual Globus Grid on Grid5000 for the Grid-based deployment of the parallel hybrid meta-heuristics provided in ParadisEO

Protein Structure Prediction on the Grid: Modelling

The problem consists in finding …

… the ground-state (stable tertiary) conformation of a protein from its primary structure, a sequence of amino acids (residues)

Modelled as a bi-objective optimization problem
    Candidate solutions: molecular conformations (geometries), coded as vectors of torsion angles
    Objective: molecular conformations with lower free energies (bonded atoms and non-bonded atoms)

Protein Structure Prediction on the Grid: Complexity and landscape analysis

For a molecule of 40 residues with 10 conformations per residue, 10^40 conformations are obtained on average … 10^18 years are required at 10^14 conformations explored per second!

Landscape analysis
    Multi-modal landscape
    Need for parallel hybrid (global and local) meta-heuristics and Grid computing
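The order of magnitude quoted for exhaustive enumeration can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the 365.25-day year is an assumption, the other figures come from the slide.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: Julian year

conformations = 10 ** 40                # 10 conformations for each of 40 residues
rate = 10 ** 14                         # conformations explored per second

years = conformations / rate / SECONDS_PER_YEAR
print(f"{years:.1e} years")
```

The result is a few times 10^18 years, consistent with the figure given on the slide.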

Parallel asynchronous hierarchical hybrid meta-heuristic

[Figure: hierarchy of parallel models. Labels: Cooperative GAs (Island model); High-level co-evolutionary hybridization; Multi-start model; Parallel evaluation of the population]

A-A. Tantar, N. Melab, E-G. Talbi, O. Dragos and B. Parain. A Parallel Hybrid Genetic Algorithm for Protein Structure Prediction on the Computational Grid. FGCS, Elsevier Science, Vol.23(3), 398-409, 2007.

[Figure: an individual of the Genetic Algorithm population, a vector (∂1, ∂2, …, ∂n), is passed to Local Search, which returns an optimized individual (∂'1, ∂'2, …, ∂'n)]

Preliminary experimental results on Grid5000

Implementation with ParadisEO-G4

Protein: Tryptophan-cage from Protein Data Bank (PDB - 1L2Y)

Grid5000: 7 sites, Avg. 800 CPUs – Execution time: 1h – Cumul. time: 1 month

Average Quality Improvement: 62%

Interconnection Grid5000-DAS

Benefits
    More resources for dealing with very large proteins with grid-based meta-heuristics
    New scientific challenge: scalability of ParadisEO-G

Requirements
    Need for a virtual Globus grid between Grid5000 and DAS
        Common certification authority?
    Extend the default run time of jobs in DAS
        Deployment time of the virtual Globus grid: ~10 minutes
        Only 5 minutes for the combinatorial optimization process on DAS!

Parallel models for exact optimization (B&B inspired)

B&B = Exploration + bounding of tree nodes

Parallel models
    Parallel multi-parametric model
    Parallel exploration of the search tree
    Parallel evaluation of the bounds
    Parallel evaluation of a single bound/solution

Parallel exploration of the search tree
    Massive parallelism needing a computational grid
    Gridification is required

The proposed approach: objectives

Efficient work distribution during the exploration
    Need for low-cost communication of work units

Efficient checkpointing-based fault tolerance
    Search for an exact solution in a volatile environment
    Low-cost communication and storage of work units

Efficient termination detection
    May be implicit

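Principles of the approach

The approach uses a special coding …
    Node number
    Work unit (collection of nodes) = an interval

[Figure: example search tree with numbered nodes; the two sub-trees are coded as the intervals [0,2] and [3,5], and the whole tree as [0,5]]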

The approach is Dispatcher-Worker, based on the work stealing paradigm
    Dispatcher: maintains a pool of work units (intervals) and the global solution found so far
    Worker: performs B&B on a given interval and updates the global solution

Work distribution and checkpointing
    Communication of intervals (two numbers)
    Two efficient operators: folding and unfolding of intervals
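The interval idea can be sketched as follows. This is an illustrative toy, not the B&B@Grid implementation: it assumes leaves of a permutation tree numbered 0 to n!-1, closed intervals, and a dispatcher that serves a request by halving the largest interval held by another worker. The class, method and function names are hypothetical, and `leaf_to_permutation` only hints at what an unfolding operator has to do.

```python
from math import factorial


def leaf_to_permutation(number, n):
    """Decode leaf `number` (0 <= number < n!) into a permutation of range(n)."""
    remaining, perm = list(range(n)), []
    for i in range(n, 0, -1):
        index, number = divmod(number, factorial(i - 1))
        perm.append(remaining.pop(index))
    return perm


class Dispatcher:
    """Holds the pool of unexplored intervals and the best value found so far."""

    def __init__(self, n_leaves):
        self.unassigned = [(0, n_leaves - 1)]
        self.assigned = {}                  # worker id -> (lo, hi), closed
        self.best = float("inf")

    def request_work(self, worker):
        """Give an interval to `worker`, stealing half of another one if needed."""
        if self.unassigned:
            self.assigned[worker] = self.unassigned.pop()
            return self.assigned[worker]
        victims = [w for w, (lo, hi) in self.assigned.items()
                   if w != worker and hi > lo]
        if not victims:
            return None
        victim = max(victims,
                     key=lambda w: self.assigned[w][1] - self.assigned[w][0])
        lo, hi = self.assigned[victim]
        mid = (lo + hi) // 2
        self.assigned[victim] = (lo, mid)   # the victim keeps the first half
        self.assigned[worker] = (mid + 1, hi)
        return self.assigned[worker]

    def checkpoint(self, worker, next_leaf, best):
        """Record progress: two numbers per worker are enough to restart it."""
        lo, hi = self.assigned[worker]
        self.best = min(self.best, best)
        if next_leaf > hi:
            del self.assigned[worker]       # interval fully explored
            return None
        self.assigned[worker] = (next_leaf, hi)
        return self.assigned[worker]        # may have shrunk after a steal

    def finished(self):
        """Implicit termination: no interval is left anywhere."""
        return not self.unassigned and not self.assigned


d = Dispatcher(n_leaves=factorial(3))       # 6 leaves, interval [0,5]
first = d.request_work("w1")
second = d.request_work("w2")
print(first, second, d.assigned)
```

With 6 leaves, the second request splits [0,5] into [0,2] and [3,5], the same intervals as in the example tree. Because a work unit is only two numbers, both the messages and the checkpoint stay small.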

The Flow Shop Scheduling Problem

N jobs to be scheduled on M machines
    A machine cannot be assigned to two jobs (colors) simultaneously
    Jobs (colors) must be scheduled in the same order on all machines
    One objective must be minimized: Cmax, the makespan (total completion time)

[Figure: Gantt chart of 4 jobs on 3 machines (M1, M2, M3)]

4 jobs on 3 machines
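The makespan of a given job order follows directly from the two constraints above. Here is a minimal sketch for a 4-job, 3-machine instance; the processing times are made up for illustration, and the brute-force search is only viable for tiny instances (which is exactly why B&B and a grid are needed for 50 jobs).

```python
from itertools import permutations


def makespan(order, p):
    """Cmax of a permutation flow shop; p[j][m] = time of job j on machine m."""
    n_machines = len(p[0])
    completion = [0] * n_machines           # completion time of the last job per machine
    for j in order:
        for m in range(n_machines):
            # Job j starts on machine m once the machine is free and
            # the job has left machine m-1.
            ready = completion[m - 1] if m > 0 else 0
            completion[m] = max(completion[m], ready) + p[j][m]
    return completion[-1]


# Hypothetical processing times: 4 jobs (rows) on 3 machines (columns).
times = [[3, 2, 4],
         [1, 4, 2],
         [2, 3, 1],
         [4, 1, 3]]

identity_cmax = makespan((0, 1, 2, 3), times)
best_order = min(permutations(range(4)), key=lambda o: makespan(o, times))
print(identity_cmax, best_order, makespan(best_order, times))
```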

[Figure: experimental testbed, a grid of more than 2000 processors. The network of the campus of Université de Lille1 (labels: FIL (Lille1) 170, IUT A 118; other values shown: 123, 1718) and the Grid5000 node at Lille are connected through RENATER to the other sites of Grid’5000. Front-end: IP forwarding, NAT. The dispatcher runs on a computation node.]

Experimental results

Standard Taillard’s benchmark: Ta056 - 50 jobs on 20 machines

Best known solution: 3681, Ruiz & Stützle, 2004
Exact solution: 3679, Mezmaz, Melab & Talbi, 2006

Running wall clock time: 25 days 46 min

CPU time on a single processor: 22 years 185 days 16 hours

Avg. num. of exploited processors: 328

Maximum number of exploited processors: 1195

Parallel efficiency: 97 %

Processors per site: Bordeaux (88), Orsay (360), Sophia (190), Lille (98), Toulouse (112), Rennes (456), Univ. Lille1 (304)

M. Mezmaz, N. Melab, E-G. Talbi. A Grid-enabled Branch and Bound Algorithm for Solving Challenging Combinatorial Optimization Problems. Research Report, INRIA 5945, July 2006 (https://hal.inria.fr/inria-00083814).
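The reported figures for Ta056 are mutually consistent, as a short calculation suggests. The sketch assumes that the single-processor CPU time is the total time accumulated over all workers and that a year counts 365 days; neither is stated on the slide.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365                         # assumption: non-leap years

cpu_days = 22 * DAYS_PER_YEAR + 185 + 16 / HOURS_PER_DAY    # 22 years 185 days 16 hours
wall_days = 25 + 46 / (HOURS_PER_DAY * 60)                  # 25 days 46 min

avg_processors = cpu_days / wall_days
print(round(avg_processors))
```

The ratio comes out at about 328, matching the average number of exploited processors given above.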

Interconnection Grid5000-DAS

Benefits
    More resources for solving larger problem instances efficiently and optimally with grid-based combinatorial optimization
    New scientific challenge: scalability (limits and solutions)
        The dispatcher has never crashed on Grid5000 (up to 2500 processors)

Requirements
    Avoiding the special configuration of the front-end to allow transparent inter-grid communications between the dispatcher and the workers
        Viewing DAS as a Grid5000 site and vice versa?
    Best-effort reservation mode in DAS
        Long-running problems
        Using the nodes as long as they are not requested for reservation