A modeling approach for estimating execution time of long-running Scientific Applications

Seyed Masoud Sadjadi (1), Shu Shimizu (2), Javier Figueroa (1,3), Raju Rangaswami (1), Javier Delgado (1), Hector Duran (4), Xabriel J. Collazo-Mojica (5)

Presented by: Xabriel J. Collazo-Mojica (5)

1: Florida International University (FIU), Miami, Florida, USA; 2: IBM Tokyo Research Laboratory, Tokyo, Japan; 3: University of Miami, Coral Gables, Florida, USA; 4: University of Guadalajara, CUCEA, Mexico; 5: University of Puerto Rico, Mayagüez Campus, Puerto Rico

Miami, Florida – April 2008


Page 1: A modeling approach for estimating execution time of long-running Scientific Applications

A modeling approach for estimating execution time of long-running Scientific Applications

Seyed Masoud Sadjadi (1), Shu Shimizu (2), Javier Figueroa (1,3), Raju Rangaswami (1), Javier Delgado (1), Hector Duran (4), Xabriel J. Collazo-Mojica (5)

Presented by: Xabriel J. Collazo-Mojica (5)

1: Florida International University (FIU), Miami, Florida, USA; 2: IBM Tokyo Research Laboratory, Tokyo, Japan; 3: University of Miami, Coral Gables, Florida, USA; 4: University of Guadalajara, CUCEA, Mexico; 5: University of Puerto Rico, Mayagüez Campus, Puerto Rico

Miami, Florida – April 2008

Page 2: A modeling approach for estimating execution time of long-running Scientific Applications

Presentation Outline

• Motivation

• Research Approach

• Research Validation

• Related Work

• Concluding Remarks

• Future Research

HPGC '08 - April 14 - LA Grid 2

Page 3: A modeling approach for estimating execution time of long-running Scientific Applications

Motivation

• The impact of hurricanes is devastating

• The Weather Research and Forecasting (WRF) model
  • Most popular
  • It is computationally and storage intensive

• We need higher-resolution and more precise forecasts

• Many organizations are willing to share resources

• But these resources are dynamic and unpredictable


Page 4: A modeling approach for estimating execution time of long-running Scientific Applications

Motivation

• At the time of a hurricane, we need to act fast
  • What resources should we allocate?
  • We need to finish within a strict deadline (i.e., in time for the hurricane forecast)
  • We need to make a decision within seconds

• We need to model the execution time of WRF based on the target resources
  • In our case: clusters with different parameters


Page 5: A modeling approach for estimating execution time of long-running Scientific Applications

Approach to Modeling Resource Usage

[Figure: WRF resource usage diagram (image not included in transcript)]


Page 6: A modeling approach for estimating execution time of long-running Scientific Applications

Approach to Modeling Execution Parallelism

• Platform heterogeneity
  • We assume identical individual resource characteristics of computation, communication and storage power.

• Execution scale
  • We add a parameter to model the number of nodes utilized during execution.

[Figure: nodes 1, 2, 3, …, N]


Page 7: A modeling approach for estimating execution time of long-running Scientific Applications

Application Resource Usage Model

• Characterize applications according to their resource usage characteristics (i.e., application "profiles")

• Assumptions:
  • Execution time is based on contributors
  • Product of contributors determines total execution time
  • Computation nodes are homogeneous (e.g., Beowulf cluster)
  • Non-ad-hoc application characteristics


Page 8: A modeling approach for estimating execution time of long-running Scientific Applications

Application Resource Usage Model - Contributors

• Model aims to allow as many contributors as necessary

• This paper focuses on 2 contributors

• First contributor: Parallelism
  • Ppara = degree of parallelism
  • α0 = constant contribution
  • α1 = variable contribution

• Second contributor: CPU Performance
  • Pclock = clock speed of compute node
  • β0 = constant contribution related to CPU performance
  • β1 = variable contribution related to CPU performance
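The two contributors can be combined into a single estimate. The following is a minimal sketch, not the authors' implementation. It assumes each contributor has the form "constant + variable / parameter", which is a guess consistent with the stated assumptions (the product of contributors determines total execution time, and time falls with the inverse of computational power) but is not spelled out on the slides. The function name and all coefficient values are hypothetical.

```python
def estimate_execution_time(p_para, p_clock, alpha0, alpha1, beta0, beta1):
    """Estimate execution time as the product of two contributors.

    Assumed form (not confirmed by the slides):
        T = (alpha0 + alpha1 / p_para) * (beta0 + beta1 / p_clock)
    """
    if p_para <= 0 or p_clock <= 0:
        raise ValueError("p_para and p_clock must be positive")
    parallelism = alpha0 + alpha1 / p_para      # first contributor
    cpu_performance = beta0 + beta1 / p_clock   # second contributor
    return parallelism * cpu_performance


# Hypothetical coefficients, chosen only to illustrate the shape of the model.
coeffs = dict(alpha0=0.1, alpha1=1.0, beta0=50.0, beta1=3000.0)
for nodes in (1, 2, 4, 8):
    t = estimate_execution_time(nodes, 3.0, **coeffs)
    print(f"{nodes} nodes at 3.0 GHz: {t:.1f} time units")
```

With this form, the constant terms bound the achievable speedup: adding nodes or clock speed shrinks only the variable part of each contributor.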


Page 9: A modeling approach for estimating execution time of long-running Scientific Applications

Experimental Approach - Environment

• GCB cluster: Rocks ver. 4.0, 8 nodes, each with 32-bit x86 Intel 3.0 GHz processors, 1 GB of main memory, and a gigabit network connection

• Mind cluster: Rocks ver. 4.0, 16 nodes, each with dual Xeon 3.6 GHz processors, 2 GB of main memory, and a gigabit network connection

• CPU vs. number of nodes: CPU percentages from 100% down to 10% in steps of 10%
  • We use CPULimit to cap the CPU percentage
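The experiment sweep described above can be enumerated as a grid of configurations. This is a sketch under two assumptions that the slides do not state: that every node count from 1 up to the cluster size was exercised, and that a CPU cap of p% can be treated as a proportionally slower clock. The function name `experiment_grid` is hypothetical.

```python
from itertools import product


def experiment_grid(max_nodes, nominal_ghz):
    """Enumerate (nodes, cpu_percent, effective_ghz) configurations."""
    cpu_percents = range(100, 0, -10)  # 100, 90, ..., 10
    grid = []
    for nodes, pct in product(range(1, max_nodes + 1), cpu_percents):
        # Assumption: a cap of pct% behaves like a proportionally slower clock.
        grid.append((nodes, pct, nominal_ghz * pct / 100))
    return grid


# GCB cluster figures from the environment description: 8 nodes, 3.0 GHz.
gcb = experiment_grid(max_nodes=8, nominal_ghz=3.0)
print(len(gcb))  # 8 node counts x 10 CPU percentages = 80
```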


Page 10: A modeling approach for estimating execution time of long-running Scientific Applications

Experimental Approach - Monitoring and Prediction

• Two tools were used

• Amon – A Monitoring Tool
  • Daemon-like application that collects and reports exploratory variables

• Aprof – A Profiling Tool
  • Statistical prediction program
  • Listens to Amon reports from compute nodes
  • Stores collected data as a matrix for each application


Page 11: A modeling approach for estimating execution time of long-running Scientific Applications

Experimental Approach - Monitoring and Prediction


Page 12: A modeling approach for estimating execution time of long-running Scientific Applications

Application Resource Usage Model - Validation

• Intuitive assumption: execution time varies linearly with the inverse of total computational power.

• Predictions within a cluster (e.g., GCB to GCB)
  • GCB: FE 5.34%, ME 5.86%
  • Mind: FE 5.66%, ME 3.80%

• Predictions across clusters
  • GCB to Mind: FE 9.97%, ME 5.86%
  • Mind to GCB: FE 5.83%, ME 4.13%

• These results validate our simple model.
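The linear-in-inverse-power assumption can be checked with an ordinary least-squares fit. The sketch below is illustrative only: it is not Aprof, the data are synthetic rather than measurements from GCB or Mind, total computational power is assumed to be nodes × clock speed, and the error metric is a generic mean relative error, since the slides do not define FE or ME. All function names are hypothetical.

```python
def fit_inverse_power(samples):
    """Least-squares fit of T = a + b * (1 / power).

    samples: list of (power, time) pairs; power = nodes * clock (assumed).
    """
    xs = [1.0 / p for p, _ in samples]
    ys = [t for _, t in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_relative_error(a, b, samples):
    """Average of |predicted - measured| / measured over the samples."""
    errors = [abs((a + b / p) - t) / t for p, t in samples]
    return sum(errors) / len(errors)


# Synthetic data generated from T = 20 + 600 / power (not real measurements).
train = [(3.0 * n, 20.0 + 600.0 / (3.0 * n)) for n in (1, 2, 4, 8)]
test_set = [(3.6 * n, 20.0 + 600.0 / (3.6 * n)) for n in (2, 4, 16)]
a, b = fit_inverse_power(train)
print(f"a={a:.2f}, b={b:.2f}, error={mean_relative_error(a, b, test_set):.2%}")
```

Training on one set of power values and evaluating on another mirrors the cross-cluster setup, where a model fitted on one cluster is used to predict the other.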


Page 13: A modeling approach for estimating execution time of long-running Scientific Applications

Application Resource Usage Model - Mind to GCB prediction


Page 14: A modeling approach for estimating execution time of long-running Scientific Applications

Concluding Remarks

• We've proposed a new approach for modeling resource usage and execution time of a distributed application

• Experimental results using WRF execution on two different clusters show good accuracy: within 10% for cross-cluster predictions
  • Using only two parameters: CPU speed and number of nodes.

• Specifically for WRF, we are one step closer to devising a complete solution for our goal of higher-resolution weather predictions and simulations.


Page 15: A modeling approach for estimating execution time of long-running Scientific Applications

Related Work

• S. Shimizu, R. Rangaswami, and H. A. Duran-Limon. "Platform-independent Modeling and Prediction of Application Resource Usage Characteristics."
  • Basis for prediction model
  • It is limited to one node

• D. M. Swany and R. Wolski. "Multivariate Resource Performance Forecasting in the Network Weather Service."
  • High-accuracy prediction model
  • They emphasize latency and bandwidth


Page 16: A modeling approach for estimating execution time of long-running Scientific Applications

Related Work

• R. Badia, F. Escale, E. Gabriel, J. Gimenez, R. Keller, J. Labarta, M. S. Müller. "Performance Prediction in a Grid Environment."
  • Offline prediction
  • Need to link their library to the application to be profiled


Page 17: A modeling approach for estimating execution time of long-running Scientific Applications

Future Research

• Extend our parallelism model to address heterogeneous resources.

• Include more resource parameters in the model

• Started joint research with Barcelona Supercomputing Center
  • We acknowledge that Amon & Aprof have limitations
  • We will integrate our tools with their simulation application, DIMEMAS


Page 18: A modeling approach for estimating execution time of long-running Scientific Applications

Acknowledgements

• National Science Foundation
  • REU Grant # IIS-0552555
  • PIRE Grant # OISE-0730065
  • CREST Grant # HRD-0317692
  • GCB Grant # OCI-0636031

• IBM Research

• LA Grid

• FIU SCIS


Page 19: A modeling approach for estimating execution time of long-running Scientific Applications

Questions?