A modeling approach for estimating execution time of long-running Scientific Applications
Seyed Masoud Sadjadi (1), Shu Shimizu (2), Javier Figueroa (1,3), Raju Rangaswami (1), Javier Delgado (1), Hector Duran (4), Xabriel J. Collazo-Mojica (5)
Presented by: Xabriel J. Collazo-Mojica (5)
(1) Florida International University (FIU), Miami, Florida, USA
(2) IBM Tokyo Research Laboratory, Tokyo, Japan
(3) University of Miami, Coral Gables, Florida, USA
(4) University of Guadalajara, CUCEA, Mexico
(5) University of Puerto Rico, Mayagüez Campus, Puerto Rico
Miami, Florida – April 2008
Presentation Outline
• Motivation
• Research Approach
• Research Validation
• Related Work
• Concluding Remarks
• Future Research
HPGC '08 - April 14 - LA Grid 2
Motivation
• The impact of hurricanes is devastating
• The Weather Research and Forecasting (WRF) model
  • Most popular
  • It is computationally and storage intensive
• We need higher-resolution and more precise forecasts
• Many organizations are willing to share resources
  • But these resources are dynamic and unpredictable
Motivation
• At the time of a hurricane, we need to act fast
  • What resources should we allocate?
  • We need to finish within a strict deadline (i.e., in time for the hurricane forecast)
  • We need to make a decision in a matter of seconds
• We need to model the execution time of WRF based on the target resources
  • In our case: clusters with different parameters
Approach to Modeling Resource Usage
[Figure: WRF resource usage diagram]
Approach to Modeling Execution Parallelism
• Platform heterogeneity
  • We assume that individual resources have identical computation, communication, and storage power.
• Execution scale
  • We add a parameter to model the number of nodes used during execution.
[Figure: nodes 1, 2, 3, …, N]
Application Resource Usage Model
• Characterize applications according to their resource usage characteristics (i.e., application "profiles")
• Assumptions:
  • Execution time is based on contributors
  • The product of the contributors determines total execution time
  • Computation nodes are homogeneous (e.g., a Beowulf cluster)
  • Non-ad-hoc application characteristics
Application Resource Usage Model - Contributors
• The model aims to allow as many contributors as necessary
• This paper focuses on two contributors
• First contributor: Parallelism
  • Ppara = degree of parallelism
  • α0 = constant contribution
  • α1 = variable contribution
• Second contributor: CPU Performance
  • Pclock = clock speed of compute node
  • β0 = constant contribution related to CPU performance
  • β1 = variable contribution related to CPU performance
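The two contributors can be combined as in the sketch below. The slides name the symbols but do not show the formula, so the functional form used here is an assumption: each contributor is a constant term plus a term inversely proportional to its parameter, and the two contributors are multiplied. The function names and coefficient values are hypothetical and chosen only for illustration.

```python
def parallelism_contributor(p_para, alpha0, alpha1):
    """Assumed form: constant part plus a part that shrinks as nodes are added."""
    if p_para <= 0:
        raise ValueError("p_para must be positive")
    return alpha0 + alpha1 / p_para


def cpu_contributor(p_clock, beta0, beta1):
    """Assumed form: constant part plus a part that shrinks as clock speed rises."""
    if p_clock <= 0:
        raise ValueError("p_clock must be positive")
    return beta0 + beta1 / p_clock


def estimate_execution_time(p_para, p_clock, alpha0, alpha1, beta0, beta1):
    """Total time as the product of the two contributors (per the slide's assumption)."""
    return (parallelism_contributor(p_para, alpha0, alpha1)
            * cpu_contributor(p_clock, beta0, beta1))


if __name__ == "__main__":
    # Hypothetical coefficients, not values from the paper.
    coeffs = dict(alpha0=0.2, alpha1=0.8, beta0=100.0, beta1=3000.0)
    for nodes in (1, 2, 4, 8):
        seconds = estimate_execution_time(nodes, 3.0, **coeffs)
        print(f"{nodes} node(s) at 3.0 GHz: {seconds:.1f} s")
```

Under this form, α0 and β0 capture the part of the run that does not speed up with more nodes or a faster clock, which is why the estimate flattens out as either parameter grows.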
Experimental Approach - Environment
• GCB cluster: Rocks ver. 4.0, 8 nodes, each with 32-bit x86 Intel 3.0 GHz processors, 1 GB of main memory, and a gigabit network connection
• Mind cluster: Rocks ver. 4.0, 16 nodes, each with dual Xeon 3.6 GHz processors, 2 GB of main memory, and a gigabit network connection
• CPU vs. number of nodes: CPU percentages from 100% down to 10% in steps of 10%
  • We use CPULimit to cap the CPU percentage
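The sweep described above can be enumerated as in the following sketch. Two points are assumptions rather than details from the slides: that the node count runs from 1 up to the cluster size, and that a capped CPU percentage can be treated as a proportionally slower clock. The function name `experiment_grid` is hypothetical.

```python
from itertools import product


def experiment_grid(max_nodes, nominal_ghz, percents=range(100, 0, -10)):
    """Yield (nodes, cpu_percent, effective_ghz) for every run in the sweep."""
    for nodes, pct in product(range(1, max_nodes + 1), percents):
        # Assumption: a CPU cap of pct% behaves like a clock of pct% of nominal.
        yield nodes, pct, nominal_ghz * pct / 100.0


if __name__ == "__main__":
    gcb_runs = list(experiment_grid(max_nodes=8, nominal_ghz=3.0))
    print(len(gcb_runs), "configurations for an 8-node, 3.0 GHz cluster")
    print("first:", gcb_runs[0], "last:", gcb_runs[-1])
```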
Experimental Approach - Monitoring and Prediction
• Two tools were used
• Amon – A Monitoring Tool
  • Daemon-like application that collects and reports exploratory variables
• Aprof – A Profiling Tool
  • Statistical prediction program
  • Listens to Amon reports from compute nodes
  • Stores collected data as a matrix for each application
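A statistical prediction step of this kind could look like the sketch below. The slides do not describe Aprof's internals, so everything here is an assumption: the row layout (nodes, clock speed, measured seconds), the function names, and the choice of an ordinary least-squares fit of execution time against the inverse of total computational power (nodes × clock speed).

```python
def fit_inverse_power(rows):
    """Fit seconds = c0 + c1 / (nodes * clock_ghz) by ordinary least squares.

    rows: list of (nodes, clock_ghz, seconds) observations for one application.
    """
    if len(rows) < 2:
        raise ValueError("need at least two observations")
    xs = [1.0 / (nodes * ghz) for nodes, ghz, _ in rows]
    ys = [seconds for _, _, seconds in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct configurations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def predict(c0, c1, nodes, clock_ghz):
    """Predict execution time for a target configuration."""
    return c0 + c1 / (nodes * clock_ghz)


if __name__ == "__main__":
    # Synthetic observations generated from known coefficients, not measured data.
    rows = [(n, g, 50.0 + 600.0 / (n * g)) for n in (1, 2, 4, 8) for g in (3.0, 1.5)]
    c0, c1 = fit_inverse_power(rows)
    print(f"c0={c0:.2f}, c1={c1:.2f}, predicted for 16 nodes at 3.6 GHz: "
          f"{predict(c0, c1, 16, 3.6):.2f} s")
```

Coefficients fitted on one cluster's observations can then be passed to `predict` with another cluster's node count and clock speed.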
Application Resource Usage Model - Validation
• Intuitive assumption: execution time decreases linearly with the inverse of total computational power.
• Predictions within a cluster (e.g., GCB to GCB)
  • GCB - FE 5.34%, ME 5.86%
  • Mind - FE 5.66%, ME 3.80%
• Predictions across clusters
  • GCB to Mind - FE 9.97%, ME 5.86%
  • Mind to GCB - FE 5.83%, ME 4.13%
• These results validate our simple model.
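The errors above are reported as percentages, but the slides do not define FE and ME. As an illustration only, the sketch below scores a set of predictions with the mean absolute percentage error, a common choice that is not necessarily the metric used in the paper. The function name and the sample numbers are hypothetical.

```python
def mean_absolute_percentage_error(predicted, measured):
    """Average of |predicted - measured| / measured, expressed in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m <= 0 for m in measured):
        raise ValueError("measured execution times must be positive")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Made-up execution times in seconds, for illustration only.
    predicted = [410.0, 225.0, 130.0]
    measured = [400.0, 240.0, 125.0]
    print(f"{mean_absolute_percentage_error(predicted, measured):.2f}%")
```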
[Figure: Application Resource Usage Model - Mind to GCB prediction]
Concluding Remarks
• We've proposed a new approach for modeling the resource usage and execution time of a distributed application
• Experimental results using WRF on two different clusters show good accuracy: within 10% for cross-cluster predictions
  • Using only two parameters: CPU speed and number of nodes
• Specifically for WRF, we are one step closer to devising a complete solution for our goal of higher-resolution weather predictions and simulations
Related Work
• S. Shimizu, R. Rangaswami, and H. A. Duran-Limon. "Platform-independent Modeling and Prediction of Application Resource Usage Characteristics."
  • Basis for the prediction model
  • It is limited to one node
• D. M. Swany and R. Wolski. "Multivariate Resource Performance Forecasting in the Network Weather Service."
  • High-accuracy prediction model
  • They emphasize latency and bandwidth
Related Work
• R. Badia, F. Escale, E. Gabriel, J. Gimenez, R. Keller, J. Labarta, and M. S. Müller. "Performance Prediction in a Grid Environment."
  • Offline prediction
  • Requires linking their library to the application to be profiled
Future Research
• Extend our parallelism model to address heterogeneous resources
• Include more resource parameters in the model
• Started joint research with Barcelona Supercomputing Center
  • We acknowledge that Amon and Aprof have limitations
  • We will integrate our tools with their simulation application, DIMEMAS
Acknowledgements
• National Science Foundation
  • REU Grant # IIS-0552555
  • PIRE Grant # OISE-0730065
  • CREST Grant # HRD-0317692
  • GCB Grant # OCI-0636031
• IBM Research
• LA Grid
• FIU SCIS
Questions?