Ajou University, South Korea
Chameleon: A Resource Scheduler in A Data Grid Environment
Sang Min Park, Jai-Hoon Kim
Ajou University
South Korea
Contents
Introduction to Data Grid
Related Works
Scheduling Model
Scheduler Implementation
Testbed and Application
Results
Conclusions
Introduction to Data Grid
Data Grid Motivations
Petabyte scale data production
Distributed data storage to store parts of data
Distributed computing resources which process the data
Two Most Important Approaches for Data Grid
Secure, reliable, and efficient data transport protocol (e.g., GridFTP)
Replication (e.g., the Replica Catalog)
Replication
Large files are partially replicated among sites
Reduces data access time
Application scheduling and dynamic replication issues are emerging
Related Works
Data Grid
Replica catalog – mapping from logical file name to physical instance
GridFTP – Secure, reliable, and efficient file transfer protocol
Job Scheduling
Various scheduling algorithms for computational Grid
Application Level Scheduling (AppLes)
Large data collections have not been considered
Job Scheduling in Data Grid
Only rough analytical and simulation studies have been presented
Our work defines a more in-depth scheduling model
Scheduling Model - Assumptions
Assumptions
Site has both data storage and computing facilities
Files are replicated at a subset of the Grid sites
Each site has a different amount of computational capability
Grid users request job execution through job schedulers
[Diagram: Sites A-D, each with computing facilities and a data store, connected via the Internet; a scheduler receives job (data processing) requests]
Scheduling Model - System Factors
Dynamic system factors - factors that change over time
Network bandwidth
Data transfer time is inversely proportional to network bandwidth
NWS - a tool for measuring and forecasting network bandwidth
Available computing nodes
Determines the execution time of jobs
Determined by the job load on a site
System attributes
Machine architecture (clusters, MPPs, etc)
Processor speed, Available memory, I/O performance, etc.
Scheduling Model - System Factors
Application-specific factors - factors unique to Data Grid applications
Size of input data (replica)
If the data is not at the computing site, a data fetch is needed
Transferring large data consumes much time
Size of application code
Application code must be migrated to the sites that perform the computation
Not critical to overall performance (small size)
Size of produced output data
When the computing job takes place at a remote site, the result data must be returned to the local site
Strongly related to the size of the input data
Scheduling Model - Application Scenarios
The model consists of 5 distinct application scenarios
1. Local Data and Local Execution
2. Local Data and Remote Execution
3. Remote Data and Local Execution
4. Remote Data and Same Remote Execution
5. Remote Data and Different Remote Execution
Scheduling Model - Application Scenarios
Terms in the scenarios
Parameter      Meaning
N_i            Number of available computing nodes at site i
D_input        Size of input data (replica)
D_app          Size of application codes
D_output       Size of produced output data
BW_WAN(i, j)   Bandwidth of the WAN connection between sites i and j
BW_LAN(i)      Bandwidth of the LAN connection between nodes at site i
Exec_i         Expected execution time of the job at site i
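As a sketch only (the class and field names are illustrative, not from the paper), the parameters above can be grouped into small records that the later cost formulas consume:

```python
from dataclasses import dataclass

@dataclass
class SiteParams:
    """Per-site quantities from the model (illustrative units: MB/s, seconds)."""
    n_nodes: int      # N_i: available computing nodes at the site
    bw_lan: float     # BW_LAN(i): LAN bandwidth between nodes at the site
    exec_time: float  # Exec_i: expected execution time of the job at the site

@dataclass
class JobParams:
    """Per-job data sizes (illustrative unit: MB)."""
    d_input: float    # D_input: size of the input data (replica)
    d_app: float      # D_app: size of the application codes
    d_output: float   # D_output: size of the produced output data
```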
Scheduling Model - Application Scenarios
1. Local Data and Local Execution
Time_1 = N_local * (D_input + D_app + D_output) / BW_LAN(local) + Exec_local

[Diagram: local site - the master node distributes the input data and application codes to the computing nodes over the LAN and collects the result data]
The input data (replica) is located locally, and processing is performed with locally available processors
Data in move consists of
Input data (replica)
Application code
Output data
Cost consists of
1. Data transfer time between master and computing nodes via LAN
2. Job execution time using local processors
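As a hedged sketch (the function name and units are illustrative, not from the paper), the scenario-1 cost above - LAN transfer of input, code, and output for the local nodes, plus the execution time - can be computed as:

```python
def time_local_local(n_nodes, d_input, d_app, d_output, bw_lan, exec_est):
    """Scenario 1 (local data, local execution): the master moves the input
    replica, application code, and result data to/from the n_nodes local
    computing nodes over the LAN, then the job runs locally.
    Assumed units: MB, MB/s, seconds."""
    lan_transfer = n_nodes * (d_input + d_app + d_output) / bw_lan
    return lan_transfer + exec_est
```

With database-scale inputs (hundreds of MB) on a fast LAN, the transfer term stays small relative to the execution term, so scenario 1 is dominated by local computing capability.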
Scheduling Model - Application Scenarios
2. Local Data and Remote Execution
Time_2 = (D_input + D_app + D_output) / BW_WAN(local, remote_i)
         + N_remote_i * (D_input + D_app + D_output) / BW_LAN(remote_i) + Exec_remote_i

[Diagram: the local master node sends the input data and application codes over the WAN to the master node of remote site i, which distributes them to its computing nodes over the LAN; the result data is returned to the local site]
The locally stored replica is transferred to the remote computation site. Cost consists of:
1. Data (input + codes + output) movement time via WAN between the local and remote site
2. Data movement time via LAN within the remote site
3. Job execution time on the remote site
Scheduling Model - Application Scenarios
3. Remote Data and Local Execution
The remote replica is copied to the local site, and processing is performed locally. Cost consists of:
1. Input data movement time via WAN between the local and remote site
2. Data movement time via LAN within the local site
3. Job execution time on the local processors
Time_3 = D_input / BW_WAN(local, remote_i)
         + N_local * (D_input + D_app + D_output) / BW_LAN(local) + Exec_local

[Diagram: the replica store at remote site i sends the input data over the WAN to the local master node, which distributes the input data and application codes to the local computing nodes over the LAN and collects the result data]
Scheduling Model - Application Scenarios
4. Remote Data and Same Remote Execution
The remote site holding the replica performs the computation. Cost consists of:
1. Data (code + output) movement time via WAN between the local and remote site
2. Data movement time via LAN within the remote site
3. Job execution time on the remote site
Time_4 = (D_app + D_output) / BW_WAN(local, remote_i)
         + N_remote_i * (D_input + D_app + D_output) / BW_LAN(remote_i) + Exec_remote_i

[Diagram: the local master node sends the application codes over the WAN to the master node of remote site i, which already holds the input data; that master distributes data and codes to its computing nodes over the LAN, and the result data is returned to the local site]
Scheduling Model - Application Scenarios
5. Remote Data and Different Remote Execution
Remote site j performs the computation with the replica copied from remote site i. Cost consists of:
1. Input replica movement time via WAN between remote sites i and j
2. Data (codes + output) movement time via WAN between the local site and remote site j
3. Data movement time via LAN within remote site j
4. Job execution time at remote site j
Time_5 = D_input / BW_WAN(remote_i, remote_j)
         + (D_app + D_output) / BW_WAN(local, remote_j)
         + N_remote_j * (D_input + D_app + D_output) / BW_LAN(remote_j) + Exec_remote_j

[Diagram: the replica store at remote site i sends the input data over the WAN to remote site j; the local master node sends the application codes over the WAN to remote site j, whose master node distributes data and codes to its computing nodes over the LAN; the result data is returned to the local site]
Scheduling Model - Scheduler
Operations of the scheduler:
1. Predict the response time of each scenario
2. Compare the response times of the scenarios
3. Choose the best scenario, along with the sites that hold the data and perform the job execution
4. Request the data movement and job execution
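The selection step above can be sketched as follows. This is a hedged illustration only: the function name, dictionary keys, and units are invented for the example, and in practice the bandwidths and execution-time estimates would come from measurements (e.g., NWS forecasts) rather than constants.

```python
def best_scenario(d_in, d_app, d_out, bw, n, exec_est):
    """Predict the response time of all five scenarios and pick the best.

    d_in, d_app, d_out: data sizes (MB)
    bw:       link bandwidths (MB/s), keyed by link
    n:        available computing nodes per site
    exec_est: expected execution time per site (seconds)
    """
    all_d = d_in + d_app + d_out
    times = {
        # 1. Everything moves over the local LAN; the job runs locally.
        "local data / local execution":
            n["local"] * all_d / bw["lan_local"] + exec_est["local"],
        # 2. Everything crosses the WAN to remote site i; the job runs there.
        "local data / remote execution":
            all_d / bw["wan_local_i"]
            + n["i"] * all_d / bw["lan_i"] + exec_est["i"],
        # 3. The input replica is fetched over the WAN; the job runs locally.
        "remote data / local execution":
            d_in / bw["wan_local_i"]
            + n["local"] * all_d / bw["lan_local"] + exec_est["local"],
        # 4. Only code and output cross the WAN; site i already holds the replica.
        "remote data / same remote execution":
            (d_app + d_out) / bw["wan_local_i"]
            + n["i"] * all_d / bw["lan_i"] + exec_est["i"],
        # 5. The replica moves from site i to site j; code/output move between
        #    the local site and j; the job runs at j.
        "remote data / different remote execution":
            d_in / bw["wan_i_j"] + (d_app + d_out) / bw["wan_local_j"]
            + n["j"] * all_d / bw["lan_j"] + exec_est["j"],
    }
    best = min(times, key=times.get)
    return best, times
```

With a large input replica and a slow WAN, scenario 4 tends to win whenever a replica-holding site has spare computing capacity, which is the intuition behind scheduling jobs toward the data.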
Scheduler Implementation
A scheduler prototype, called Chameleon, was developed to evaluate the scheduling model. It is built on top of services provided by Globus: GRAM, MDS, GridFTP, and the Replica Catalog. NWS is used for measuring and forecasting network bandwidth. The scheduling algorithms are based on the scheduling models presented.
[Diagram: layered architecture - applications (HEP, Earth Observation, Biology) run on Chameleon, the resource scheduler, whose components are the Data Mover, Information Monitor, Location Finder, and Runner; Chameleon gathers information, takes resource locations, submits jobs, and copies data through the middleware layer (Globus GRAM, MDS, GridFTP, Replica Service; NWS for network monitoring), which sits on the Grid fabric (computational resources, storage, networks, local schedulers)]
Testbed for Experiments
Site Location Number of proc. Local Scheduler
Ajou University S.Korea 8 PBS
Yonsei Univ. 1 S.Korea 12 PBS
Yonsei Univ. 2 S.Korea 12 PBS
KISTI S.Korea 36 LSF
KUT S.Korea 6 PBS
Chonbuk Univ. S.Korea 1 Fork
Pusan Univ. S.Korea 24 PBS
POSTECH S.Korea 8 PBS
AIST Japan 10 SGE
Applications
Gene sequence comparison applications (bioinformatics)
Computationally intensive analysis of a large protein database
Bio-scientists predict the structure and functions of a newly found protein by comparing it against a well-known protein database
The size of the database exceeds 500 MB
There are various versions of the protein database
Large databases are replicated in the Data Grid
Two well-known applications, BLAST and FASTA, are executed
Applications - Parameters
Parameter                                  PSI-BLAST   FASTA
Size of input replica (protein database)   502 MB      502 MB
Size of output data                        10 MB       200 MB
Size of application codes                  7 MB        1 MB
Experimental Results (1)
[Testbed map: Ajou Univ. (local), Yonsei SP LAB (site A), Yonsei BIO LAB (site B), KISTI (site C), KUT (site D), Chonbuk Univ. (site E), POSTECH (site F), Pusan Univ. (site G), AIST (site H) connected over the WAN; marked sites hold the replicated database, the others do not]

Results when executing PSI-BLAST in this replication scenario
(response time in seconds; X: execution, Y: replica fetch, Z: code + result move)

                               X      Y     Z
local                       2277      0     0
site A                      1351      0   153
site B                      1110    698   112
site C                       977    743   113
Chameleon prediction (A)    1216      0   115
Experimental Results (2)
Results when executing FASTA in the same replication scenario
(response time in seconds; X: execution, Y: replica fetch, Z: code + result move)

                               X      Y      Z
local                       3140      0      0
site A                      1637      0   1163
site B                      1584    620    689
site C                      1473    628    402
Chameleon prediction (C)    1401    700    314

(The PSI-BLAST results from the previous slide were shown alongside for comparison.)
Experimental Results (3)
[Testbed map: same sites as before; no replication takes place - only the local site holds the database]

Results when executing PSI-BLAST with no replication
(response time in seconds; X: execution, Y: replica fetch, Z: code + result move)

                               X      Y     Z
local                       2277      0     0
site A                      1351    932    41
site G                      1813    791    45
site C                       977   1088    33
Chameleon prediction (C)    1095    840    50
Experimental Results (4)
Number of Replica
Sites with Replica
1 Local
2 Local, E
3 Local, E, D
4 Local, E, D, F
5 Local, E, D, F, G
6 Local, E, D, F, G, H
7 Local, E, D, F, G, H, B
8 Local, E, D, F, G, H, B, A
9 Local, E, D, F, G, H, B, A, C
[Plot: response time (sec., approx. 1000-2400) versus number of replicas (1-9), comparing the prediction with the actual execution]

Increasing the number of replicas decreases the response time.
Conclusions
Job scheduling models for the Data Grid
The models consist of 5 distinct scenarios
A scheduler prototype, called Chameleon, was developed based on the presented scheduling models
Meaningful experiments were performed with Chameleon on the constructed Grid testbed
Better performance is achieved by considering data locations as well as computational capabilities
References
ANTZ: http://www.antz.or.kr
ApGrid: http://www.apgrid.org
B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke. "Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing," IEEE Mass Storage Conference, 2001.
Mark Baker, Rajkumar Buyya and Domenico Laforenza. "The Grid: International Efforts in Global Computing," International Conference on Advances in Infrastructure for E-Business, Science, and Education on the Internet (SSGRR 2000), L'Aquila, Italy, July 2000.
F. Berman and R. Wolski. "The AppLes Project: A Status Report," Proceedings of the 8th NEC Research Symposium, Berlin, Germany, May 1997.
Rajkumar Buyya, Kim Branson, Jon Giddy and David Abramson. "The Virtual Laboratory: A Toolset for Utilising the World-Wide Grid to Design Drugs," 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May 2002.
CERN DataGrid Project: http://www.cern.ch/grid/
Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury and Steven Tuecke. "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets," Journal of Network and Computer Applications, 23:187-200, 2001.
Dirk Düllmann, Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt Stockinger. "Models for Replica Synchronisation and Consistency in a Data Grid," 10th IEEE Symposium on High Performance and Distributed Computing (HPDC-10), San Francisco, California, August 2001.
I. Foster and C. Kesselman. "The Grid: Blueprint for a New Computing Infrastructure," Morgan Kaufmann, 1999.
I. Foster, C. Kesselman and S. Tuecke. "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, 15(3), 2001.
Cynthia Gibas. "Developing Bioinformatics Computer Skills," O'Reilly, April 2001.
The Globus Project: http://www.globus.org
Leanne Guy, Erwin Laure, Peter Kunszt, Heinz Stockinger and Kurt Stockinger. "Replica Management in Data Grids," Technical Report, Global Grid Forum Informational Document, GGF5, Edinburgh, Scotland, July 2002.
Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt Stockinger. "Data Management in an International Data Grid Project," 1st IEEE/ACM International Workshop on Grid Computing (Grid 2000), Bangalore, India, December 2000.
Kavitha Ranganathan and Ian Foster. "Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications," 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 2002.
Kavitha Ranganathan and Ian Foster. "Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid," International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001.
Kavitha Ranganathan and Ian Foster. "Identifying Dynamic Replication Strategies for a High Performance Data Grid," International Workshop on Grid Computing, Denver, November 2001.
Heinz Stockinger, Kurt Stockinger, Erich Schikuta and Ian Willers. "Towards a Cost Model for Distributed and Replicated Data Stores," 9th Euromicro Workshop on Parallel and Distributed Processing (PDP 2001), Mantova, Italy, February 2001.
S. Vazhkudai, S. Tuecke and I. Foster. "Replica Selection in the Globus Data Grid," 1st IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), Brisbane, Australia, May 2001.
Rich Wolski, Neil Spring and Jim Hayes. "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing," Journal of Future Generation Computing Systems, 15(5-6), pp. 757-768, October 1999.