Grid Simulation Tools for Job Scheduling and Data File Replication


Scalable Computing and Communications: Theory and Practice, First Edition. Samee U. Khan, Lizhe Wang, and Albert Y. Zomaya. © 2012 John Wiley & Sons, Inc. Published 2012 by John Wiley & Sons, Inc.

35 Grid Simulation Tools for Job Scheduling and Data File Replication

Javid Taheri, Albert Y. Zomaya, and Samee U. Khan

35.1 INTRODUCTION

In recent years, grid computing has emerged as one of the main computational platforms for performing extremely difficult and/or time-consuming tasks. The amount of data generated every second is sometimes far more than even dedicated grids can process. This makes the design of optimal solutions that efficiently balance and deploy existing systems one of the main objectives, and bottlenecks, in grids. These computational platforms are, however, too complicated for the efficiency of any algorithm to be estimated without actually testing it. Because access to real systems is almost impossible for many reasons, including cost and trust, simulation becomes an inevitable stage before the actual deployment of any algorithm in this field. In this chapter, several simulation tools are first listed, and then the problem statement behind all these grids is mathematically modeled and presented.

Simulation is a major step in modeling many real-world processes before their actual deployment. Proper simulations can provide an extensive study of a system and reveal many of its unknown aspects before deployment, including, but not limited to, feasibility, behavioral, and performance analysis. Industrial processes, parallel and distributed systems, and environmental resources are among the many fields that benefit directly from such simulations. Although simulations are meant to represent the operations of one system by using another, they are always designed for very specific purposes. For example, simulating the heating process of a gas is quite different from simulating its volume expansion, although the two are related.

Research in grids can be divided into two major categories: analytical and experimental. In analytical studies, the overall goal is to develop purely analytical/mathematical models for different aspects of computing in grids. Analytical studies are particularly useful for discovering the tractability of many grid-related problems, for example, the NP-completeness of routing, partitioning, and scheduling. Although fundamental theorems might be found through such studies, their models and assumptions are usually too simple to encompass all parameters of real systems and, consequently, to convince engineers to deploy them in reality. Thus, analytical results are always followed by proper experiments to validate them. Experimental research in grids aims to investigate, design, or implement real-world applications. Although such experiments are eminently believable and demonstrate that the proposed algorithms can feasibly be implemented in practice, they are usually very time and/or labor intensive: an experimental study may take hours, days, or even months of preparation. Furthermore, for such studies, the entire application should be designed and built to be fully functional; for example, it must include all design/algorithm alternatives as well as all deployment hooks. Thus, experimental research is usually considered an acceptable engineering practice provided that it is equipped with many full-fledged solutions so that the best algorithm can be found.

To this end, two different test beds are used to facilitate studies in this category of research: owned test beds and real grids. Owned test beds are usually small, well behaved, controlled, and stable; thus, in most cases they cannot be counted as true substitutes for real grids. Using real grids, on the other hand, is not very practical either. For example, obtaining privileged access to real grids is still very challenging for many researchers. Also, real grids are not made as scientists’ playpens, where different experiments may disrupt each other and compromise the fidelity of the whole grid. Such experiments can also easily be affected by platform failures and configuration changes. As a result, because experiments in real grids are considered uncontrollable and unrepeatable, in many cases they cannot provide a true analysis of novel algorithms. Furthermore, regardless of which type of test bed (owned or real) is used, experiments are always limited to the test bed itself. Thus, it is always difficult (1) to detect which parts of the experimental results were due to idiosyncrasies of the test bed, (2) to justify extrapolation of the results, and (3) to justify the viability of a solution in many untested situations (e.g., different network capacities). These issues explain the lack of interest in using test beds to test novel algorithms in many studies.

Simulation of real grids, on the other hand, can solve many, if not all, of the aforementioned difficulties. In such simulations, (1) there is no need to build a real system; (2) it is easy to conduct controlled and repeatable experiments; (3) there is no limit to experimental scenarios; and (4) it is possible for others to repeat the experiment and reproduce the results. However, the key issue of validation, that is, the correspondence between the simulated environment and the real world, remains challenging and questionable if not dealt with properly. Although simulation of distributed computing platforms (the ancestors of grids) has decades of history, most of these works cannot be generalized to this fairly new discipline. Simplistic platform models, simplistic application models, and overly straightforward simulations are the major reasons for this inapplicability.


To solve these issues, grid simulation has gone through many phases to migrate from such simplistic models with hardly justifiable environments toward models that encompass real systems. To achieve this, grid simulators are usually built using synthetic data for the following three components: (1) network topology, (2) computer resources, and (3) applications. Network topology is modeled as a graph with individual bandwidth and latency characteristics for each link. Computer resources model all aspects of the computing or storage elements of a grid, for example, heterogeneous computer resources or storage capacities. Applications model the attributes of many well-known grid applications, for example, in bioinformatics or astronomy. The level of detail in simulating any of these components can significantly affect the performance and reliability of a simulator. For example, the two extreme approaches to modeling networks are (1) to model connections as unlimited links or (2) to scrutinize every single packet in the network. The first approach is very simple and fast; however, it is questionable for large grids with petabytes of data to transfer. The second approach, on the other hand, is very precise; however, it is unnecessarily complicated. Therefore, a trade-off is usually chosen in simulating many aspects of grids. Here, for example, extensive studies showed that modeling network links with only two attributes, bandwidth and latency, is sufficiently precise in many cases [1–3].
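The two-attribute link model can be illustrated with a short sketch. The Python fragment below is not taken from any of the simulators discussed in this chapter; the `Link` class, its field names, and the chosen units (seconds and bytes per second) are assumptions made purely for illustration. It estimates a transfer time as the link latency plus the file size divided by the bandwidth and contrasts this with the unlimited-link extreme, in which every transfer is instantaneous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A network link described only by latency and bandwidth (assumed units)."""
    latency_s: float               # delay before data starts to arrive, in seconds
    bandwidth_bytes_per_s: float   # sustained throughput, in bytes per second

    def transfer_time(self, size_bytes: float) -> float:
        """Estimated time to move size_bytes across this link."""
        if size_bytes < 0:
            raise ValueError("size_bytes must be non-negative")
        return self.latency_s + size_bytes / self.bandwidth_bytes_per_s


def unlimited_link_time(size_bytes: float) -> float:
    """The 'unlimited link' extreme: every transfer takes no time at all."""
    return 0.0


if __name__ == "__main__":
    wan = Link(latency_s=0.05, bandwidth_bytes_per_s=125_000_000)  # about 1 Gbit/s
    one_terabyte = 1e12
    print(f"two-attribute model:  {wan.transfer_time(one_terabyte):.2f} s")
    print(f"unlimited-link model: {unlimited_link_time(one_terabyte):.2f} s")
```

Under these assumed numbers, a one-terabyte file occupies the link for more than two hours, a cost that the unlimited-link model ignores entirely; this is why the simpler extreme becomes questionable once large data files are involved.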

35.2 SIMULATION PLATFORMS

Based on different levels of abstraction in the aforementioned components, several simulators have already been designed and implemented to test/verify algorithms. A few of these simulators are designed to simulate specific characteristics, while others are designed to model general-purpose platforms.

35.2.1 SimJava

SimJava [4, 5] is a Java-based toolkit designed by the University of Edinburgh (United Kingdom) to simulate complex event-based systems. SimJava uses a discrete event simulation kernel and provides visual animation of its simulated objects during the simulation process. SimJava uses Java technology to ensure easy incorporation of its live diagrams into web pages as well as easier portability across different platforms. SimJava provides a set of foundation classes to enable easy creation and animation of discrete event simulation models and comprises the following three packages: (1) eduni.simjava, (2) eduni.simanim, and (3) eduni.simdiag. eduni.simjava builds text-based Java simulations and produces an output trace file that reflects the results of a simulation; eduni.simanim is equipped with applet templates to provide easy visualization of eduni.simjava output files; and eduni.simdiag is a collection of JavaBeans-based classes that provide high-level presentation of simulation results.

SimJava is designed to simulate static networks of active entities that communicate with each other by sending and receiving passive event objects. As a result, SimJava is able to provide efficient lightweight packages to simulate and model hardware and distributed software systems, including communication protocols, parallel software modeling, and computer architectures [4, 5]. In fact, although SimJava is not specifically designed to simulate grid environments, it has been used as an intermediate platform by many simulators designed exclusively for this area. Figure 35.1 shows a sample snapshot of SimJava during a simulation.
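The idea of active entities exchanging passive events through a discrete event kernel can be sketched in a few lines. The following Python fragment is only an illustration of the general technique and does not reproduce SimJava's Java API; the `Simulation` and `Entity` classes, their method names, and the ping/pong exchange are all hypothetical.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Simulation:
    """Tiny discrete event kernel: a virtual clock plus a time-ordered event queue."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._counter = itertools.count()  # tie-breaker keeps same-time events FIFO

    def schedule(self, delay: float, action: Callable[[], None]) -> None:
        """Queue an event (a callback) to fire `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run(self) -> None:
        """Fire events in timestamp order, advancing the virtual clock each time."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()


class Entity:
    """An active entity that reacts to passive events sent by other entities."""

    def __init__(self, name: str, sim: Simulation, log: List[str]) -> None:
        self.name, self.sim, self.log = name, sim, log

    def send(self, target: "Entity", payload: str, delay: float) -> None:
        self.sim.schedule(delay, lambda: target.receive(self, payload))

    def receive(self, sender: "Entity", payload: str) -> None:
        self.log.append(
            f"t={self.sim.now:.1f} {self.name} got {payload!r} from {sender.name}"
        )
        if payload == "ping":                 # reply once, two time units later
            self.send(sender, "pong", delay=2.0)


def demo() -> List[str]:
    log: List[str] = []
    sim = Simulation()
    a, b = Entity("A", sim, log), Entity("B", sim, log)
    a.send(b, "ping", delay=1.0)
    sim.run()
    return log


if __name__ == "__main__":
    print("\n".join(demo()))
```

The clock never advances in real time: it jumps from one event timestamp to the next, which is what allows such kernels to simulate long-running systems quickly.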

35.2.2 Bricks

Bricks [5, 6] is a performance evaluation system developed in Java by Tokyo Institute of Technology (Japan) to analyze and compare the performance of various scheduling schemes in high-performance global computing environments. Bricks provides (1) simulation of various behaviors of resource scheduling algorithms, (2) programming modules for scheduling, (3) network topology of clients and servers in global computing systems, and (4) processing schemes for networks and servers. Bricks also gathers information on global computing resources to analyze resource scheduling algorithms as well as to systematically monitor and predict resources in global computing environments. Through its foreign interfaces, this component-based architecture also allows different system components to be exploited to simulate different system algorithms.

FIGURE 35.1. A sample SimJava snapshot.

Bricks operates as a discrete event simulator of a queuing system in virtual time and consists of the following two components to coordinate the simulation behavior of global computing systems: (1) a global computing environment and (2) a scheduling unit. The global computing environment module in Bricks consists of clients, networks, and servers. Clients represent users who submit specifically described computing jobs; networks represent the interconnection of clients and servers; and servers represent computing resources that execute submitted jobs. The scheduling unit of Bricks consists of NetworkMonitor, ServerMonitor, ResourceDB, Predictor, and Scheduler. NetworkMonitor measures network characteristics (e.g., bandwidth and latency) of every link in the network and stores them in ResourceDB for later use; ServerMonitor measures the performance (e.g., load and availability) of different servers in a system and stores its results in ResourceDB; ResourceDB works as a scheduling database and stores all types of information about different parts of a system; Predictor uses the information stored in ResourceDB to predict resource availability and network conditions; and Scheduler uses Predictor’s suggestions to schedule newly submitted jobs to the computing resources of a system. Figure 35.2 shows the overall architecture of this simulator [6].
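The division of labor inside such a scheduling unit (monitors write measurements to a database, a predictor reads them, and a scheduler acts on the predictions) can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions and is not Bricks code: the class and method names are hypothetical, the predictor simply averages the most recent samples, and the scheduler greedily picks the server with the smallest estimated transfer-plus-compute time.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class ResourceDB:
    """Stores time-ordered measurements keyed by resource name."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def store(self, key: str, value: float) -> None:
        self._samples[key].append(value)

    def history(self, key: str) -> List[float]:
        return list(self._samples[key])


class Predictor:
    """Predicts the next value as the mean of the last `window` samples (assumed policy)."""

    def __init__(self, db: ResourceDB, window: int = 3) -> None:
        self.db, self.window = db, window

    def predict(self, key: str) -> float:
        recent = self.db.history(key)[-self.window:]
        if not recent:
            raise KeyError(f"no measurements for {key}")
        return mean(recent)


class Scheduler:
    """Chooses the server with the smallest predicted completion time."""

    def __init__(self, predictor: Predictor, servers: List[str]) -> None:
        self.predictor, self.servers = predictor, servers

    def estimate(self, server: str, job_ops: float, data_bytes: float) -> float:
        speed = self.predictor.predict(f"server:{server}")        # operations/s
        bandwidth = self.predictor.predict(f"network:{server}")   # bytes/s
        return data_bytes / bandwidth + job_ops / speed

    def choose(self, job_ops: float, data_bytes: float) -> str:
        return min(self.servers, key=lambda s: self.estimate(s, job_ops, data_bytes))


def build_demo_scheduler() -> Scheduler:
    db = ResourceDB()
    # Values that a server monitor and a network monitor might have recorded.
    samples = {
        "server:fast": (100.0, 120.0, 110.0), "network:fast": (1e6, 1e6),
        "server:near": (50.0, 50.0),          "network:near": (1e8, 1e8),
    }
    for key, values in samples.items():
        for value in values:
            db.store(key, value)
    return Scheduler(Predictor(db), ["fast", "near"])


if __name__ == "__main__":
    scheduler = build_demo_scheduler()
    print(scheduler.choose(job_ops=1000.0, data_bytes=1e8))   # data-heavy job
    print(scheduler.choose(job_ops=1e6, data_bytes=1e3))      # compute-heavy job
```

With these made-up measurements, the data-heavy job goes to the well-connected server and the compute-heavy job to the faster one, which shows why the scheduler needs both kinds of monitor.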

35.2.3 MicroGrid

MicroGrid [5, 7] was developed by the University of California at San Diego (United States) to provide platforms for developing/implementing virtual grid infrastructures. MicroGrid platforms can be used to analyze grid resource management issues of Globus applications. Virtual grid infrastructures allow dynamic resource management techniques to be analyzed with a minimum amount of effort and increase the transparency of many repeatable/controllable experiments. Using different control algorithms, MicroGrid provides full-scale evaluation of middleware, applications, and network services for computational grids. MicroGrid aims to achieve the following goals: (1) to support scalable simulations of grid applications using a wide variety of scalable clustered resources, (2) to support realistic grid software environments so that simulated applications can be executed with identical application programming interfaces (APIs) and achieve accurate results, (3) to support configuring grid resource performance attributes, and (4) to be based on an open software environment so that other researchers can easily extend and improve its capabilities.

MicroGrid comprises two separate parts: (1) a local resource simulator to simulate individual local resources and (2) a network simulator (NS) to simulate interactions among local resources. Figure 35.3 shows the overall diagram of this simulator [7].

35.2.4 SimGrid

SimGrid [5, 8, 9] was also developed by the University of California at San Diego (United States) to evaluate scheduling algorithms for distributed applications in heterogeneous computational grids. SimGrid aims to provide fundamental/abstract functions so that domain-specific simulations can be built upon it. SimGrid specifically tries to (1) provide the right model and level of abstraction for its intended purposes, (2) rapidly prototype and evaluate scheduling algorithms, (3) enable more realistic simulations, and (4) generate more accurate simulation results.

FIGURE 35.2. Bricks’ architecture. [Labels shown: Scheduling Unit (NetworkMonitor, ServerMonitor, ResourceDB, Scheduler, Predictor, NetworkPredictor, ServerPredictor) and Global Computing Environment (Client, Network, Server), connected by arrows numbered (0a), (0b), and (1) to (10).]

SimGrid includes many features such as (1) resource models for CPUs and network links; (2) arbitrary, dynamic, trace-based resource performance metrics; (3) resource time sharing and time slicing; (4) task dependencies; (5) performance prediction of simulation errors; (6) flexible simulation termination conditions; and (7) simple APIs.

Because resources (such as CPUs and network links) are assumed to be independent in SimGrid, no interconnection topology exists in this simulator. As a result, users can deploy this flexible computing environment to simulate their application-specific environments with arbitrary requirements. Because SimGrid cannot differentiate between computations and data transfers (both are seen as tasks), users must pay extra attention to ensure that computations and file transfers are properly scheduled on processors and network links, respectively. PSTSim [5] and DAGSim [5, 10] are among the many applications that benefit from the flexibility of SimGrid. PSTSim simulates the scheduling of parameter sweep applications, and DAGSim evaluates scheduling algorithms for DAG-structured applications. Figure 35.4 shows the component overview of this simulator [9].
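The consequence of treating computations and transfers uniformly as tasks can be made concrete with a small sketch. The Python fragment below is not SimGrid's API; `Resource`, `Task`, and `schedule` are hypothetical names, and the trace format (a sorted list of start time and rate pairs) is an assumption. Each resource delivers a trace-driven number of work units per second, and a user-side check refuses to place a computation on a link or a transfer on a CPU, which is exactly the kind of care the simulator leaves to the user.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Resource:
    """A CPU or a network link; both simply deliver units of work per second."""
    name: str
    kind: str                          # "cpu" or "link" (assumed labels)
    trace: List[Tuple[float, float]]   # (start_time, rate) pieces, sorted by time

    def finish_time(self, start: float, cost: float) -> float:
        """Time at which `cost` units of work begun at `start` are completed."""
        remaining, now = cost, start
        for i, (t0, rate) in enumerate(self.trace):
            t1 = self.trace[i + 1][0] if i + 1 < len(self.trace) else float("inf")
            if t1 <= now:
                continue                       # this piece ended before the task began
            begin = max(now, t0)
            if rate > 0:
                capacity = rate * (t1 - begin)  # work deliverable within this piece
                if remaining <= capacity:
                    return begin + remaining / rate
                remaining -= capacity
            now = t1
        raise ValueError("trace never provides enough capacity")


@dataclass
class Task:
    name: str
    kind: str      # "computation" or "transfer"
    cost: float    # operations for a computation, bytes for a transfer


def schedule(task: Task, resource: Resource, start: float) -> float:
    """Place a task on a resource, rejecting the mismatches a user must avoid."""
    expected = "cpu" if task.kind == "computation" else "link"
    if resource.kind != expected:
        raise ValueError(f"{task.name} ({task.kind}) does not belong on a {resource.kind}")
    return resource.finish_time(start, task.cost)


def demo() -> Tuple[float, float]:
    link = Resource("wan", "link", [(0.0, 10.0), (5.0, 20.0)])  # bandwidth doubles at t=5
    cpu = Resource("cpu0", "cpu", [(0.0, 4.0)])
    arrived = schedule(Task("send input", "transfer", 100.0), link, start=0.0)
    done = schedule(Task("process input", "computation", 20.0), cpu, start=arrived)
    return arrived, done


if __name__ == "__main__":
    print(demo())
```

In the demonstration, the transfer is served at 10 units per second for the first 5 seconds and at 20 units per second afterward, and the dependent computation starts only once the transfer has finished.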

FIGURE 35.4. SimGrid’s component overview.

FIGURE 35.3. MicroGrid’s diagram.


35.2.5 GridSim

GridSim [5, 11] was developed by Monash University (Australia) to simulate application schedulers for distributed computing systems such as clusters and grids. GridSim is Java based and allows simulation of different classes of heterogeneous resources, users, applications, resource brokers (RBs), and schedulers in a distributed computing environment.

GridSim’s architecture has five layers (Fig. 35.5). The first layer provides a portable/scalable Java interface and runtime environment. This layer is implemented using the Java Virtual Machine (JVM) and can be executed on both single-processor and multiprocessor (including cluster) systems. The second layer provides the basic discrete event infrastructure. SimJava [4] is used to build this layer on top of the interfaces provided by the first layer. The third layer models and simulates core grid entities such as resources and information services. This layer is based on the discrete event services defined in the second layer. The fourth layer simulates resource aggregators called grid RBs or schedulers. The fifth and final layer evaluates scheduling and resource management policies, heuristics, and algorithms. This layer uses the two layers below it and focuses on application and resource modeling with different scenarios.

GridSim entities are users, brokers, resources, grid information services, and input/outputs. Users represent grid customers with individual attributes (job types, scheduling optimization policies, activity rates, time zones, job deadlines, and budget affordability); brokers schedule jobs to resources based on users’ requirements; resources represent individual computing resources with individual characteristics (number of processors, cost of processing, processors’ speed, internal scheduling policy, local load factors, and time zone); grid information services provide resource registration services and monitor the list of available resources in a grid; and input/outputs define the input/output information flow links among different entities of a grid. The Nimrod-G [12] RB and Libra [13] are among the many products that use GridSim as their intermediate simulation framework. Figure 35.6 shows a sample flow diagram of a GridSim-based simulation [11].

FIGURE 35.5. Five layers of GridSim. [Labels shown: Distributed Resources (PCs, Workstations, Clusters, SMPs); Virtual Machine (Java, cJVM, RMI); Basic Discrete Event Simulation Infrastructure (SimJava, Distributed SimJava); GridSim Toolkit, with Resource Modeling and Simulation with Time and Space shared schedulers (Resource Entities, Information Services, Job Management, Resource Allocation, Statistics, Network; Single CPU, SMPs, Clusters, Load Pattern, Reservation); Grid Resource Brokers or Schedulers; Application Modeling; and Application, User, Grid Scenario’s Input and Results (Application Configuration, Resource Configuration, User Requirements, Grid Scenario, Output).]

35.2.6 GangSim

FIGURE 35.6. A sample GridSim flow diagram. [Labels shown: Application, Jobs, Scheduler, User #i, Broker #i, Resource #j (Job In Queue, Process Queue, Job Out Queue), Resource List, Information Service, Report Writer #i, Statistics Recorder #i, and Shutdown Signal Manager #i, connected through Input and Output ports over the Internet.]

GangSim [14] was developed by The University of Chicago (United States) to support studies of scheduling strategies in grid environments, with a particular focus on investigating interactions among local and community resource allocation policies. GangSim is based on the Ganglia distributed monitoring framework, in which mixing of simulated and real grids is allowed. GangSim models comprise the following real grid elements: a job submission infrastructure, a monitoring infrastructure, and a usage policy infrastructure. As shown in Figure 35.7, the principal components of GangSim are external schedulers (ESs), local schedulers (LSs), data schedulers (DSs), monitoring distribution points (MDPs), site policy enforcement points (S-PEPs), and virtual organization (VO) policy enforcement points (V-PEPs). In GangSim, sites aggregate computing nodes and VOs aggregate users, who may be further organized into groups. Each site is characterized by its capacity, number of CPUs, disk space, and connecting networks. A VO is composed of a set of user groups. Users submit jobs and/or workloads. ESs, LSs, and DSs are decision-making points for different purposes. An ES queues submitted jobs and selects the best candidate site to execute each job. Upon this assignment, the corresponding LSs receive jobs and schedule them to actual CPUs inside sites, while DSs provide the data files needed to execute these jobs. GangSim assumes that data files are distributed once among sites prior to job scheduling and are not moved afterward. Pegasus and Euryale are examples of ES policies; Condor, PBS, and LSF are examples of LS policies [14]. In GangSim, MDPs represent the monitoring infrastructure, equipped with various metrics to evaluate the performance of different components of a grid environment. Policy enforcement points (PEPs) enforce policies to steer resource allocation units so that users’ required policies are always met. S-PEPs and V-PEPs perform this task at the site and VO levels, respectively.
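The two-level decision process described above, in which an external scheduler picks a site and a local scheduler then picks a CPU, can be sketched as follows. This Python fragment is a simplified illustration rather than GangSim code: the names `Job`, `Site`, and `external_schedule` are hypothetical, the site policy is reduced to a set of admitted VOs as a stand-in for an S-PEP rule, the ES policy is assumed to be "earliest possible start," and the LS policy is assumed to be "first CPU to become free."

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    vo: str          # virtual organization of the submitting user
    runtime: float   # seconds on one CPU


@dataclass
class Site:
    """A site with a local scheduler (LS) over its CPUs and a simple site policy."""
    name: str
    n_cpus: int
    allowed_vos: Set[str]                         # stand-in for an S-PEP rule
    _free_at: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self._free_at = [0.0] * self.n_cpus       # equal values already form a heap

    def admits(self, job: Job) -> bool:
        return job.vo in self.allowed_vos

    def earliest_start(self) -> float:
        return self._free_at[0]

    def run_locally(self, job: Job, now: float) -> Tuple[float, float]:
        """LS step: use the CPU that frees first; return (start, finish)."""
        start = max(now, heapq.heappop(self._free_at))
        finish = start + job.runtime
        heapq.heappush(self._free_at, finish)
        return start, finish


def external_schedule(
    job: Job, sites: List[Site], now: float = 0.0
) -> Optional[Tuple[str, float, float]]:
    """ES step: among sites whose policy admits the job, pick the earliest start."""
    candidates = [s for s in sites if s.admits(job)]
    if not candidates:
        return None                                # no site policy allows this job
    best = min(candidates, key=lambda s: max(now, s.earliest_start()))
    start, finish = best.run_locally(job, now)
    return best.name, start, finish


def demo() -> List[Optional[Tuple[str, float, float]]]:
    sites = [Site("s1", 1, {"atlas"}), Site("s2", 2, {"atlas", "cms"})]
    jobs = [Job("j1", "atlas", 10.0), Job("j2", "atlas", 5.0),
            Job("j3", "cms", 4.0), Job("j4", "atlas", 3.0), Job("j5", "lhcb", 1.0)]
    return [external_schedule(job, sites) for job in jobs]


if __name__ == "__main__":
    for placement in demo():
        print(placement)
```

Swapping the `min` key in `external_schedule` or the heap discipline in `run_locally` is all that is needed to try a different ES or LS policy, which mirrors the separation of decision points in the text.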

35.2.7 MONARC

MONARC [15] was developed in Java through the collaboration of Politehnica University of Bucharest (Romania), CERN, and Caltech (United States). MONARC is a multithreaded, process-oriented simulation framework for modeling large-scale distributed systems. It is designed to provide realistic simulation of a wide range of distributed system technologies with respect to their specific components and characteristics. MONARC particularly aims to (1) extend and optimize grid modules to provide better simulation of processing nodes; (2) design and run simulation experiments for data processing activities, job scheduling, and minimum spanning tree computation in overlay networks; and (3) run multithreading performance tests on multiprocessor platforms (Sun Enterprise 10000, multicore PCs).

The two fundamental components of MONARC are computing and monitoring. The computing component aims to provide realistic simulations of all grid modules and their interactions through abstraction, while monitoring aims to provide measurement metrics for better analysis of a system. Computing components are further categorized into three layers: simulation engine, basic components, and specific components. The simulation engine layer is the core of this event-driven simulator; basic components model jobs, CPUs, WANs, data files, and LANs, as well as other fundamental modules of a system. Specific components use basic components to simulate the behavior of different types of jobs, for example, job schedulers with specific scheduling algorithms or database servers that support data replication. Using this structure, a wide range of models, from centralized to distributed systems, with arbitrary levels of complexity can be developed. Multiple regional centers with different hardware configurations and possibly different sets of replicated data are examples of models that can be created/simulated by MONARC. Figure 35.8 shows MONARC’s components and their constituent layers [15].

FIGURE 35.7. GangSim’s components. [Labels shown: N users; User; Work Manager; W Work Managers with External Managers; ES; LS; DS; Computers; Storage; S sites; V-PEP; S-PEP; MDP; VO Policy; Site Policy.]

35.2.8 OptorSim

OptorSim [16] is a time-based simulation package, also written in Java, for investigating the performance of different job scheduling and data replication schemes. OptorSim is designed based on the structure of the European DataGrid and tries to simulate all its necessary components with an emphasis on its replication infrastructure. OptorSim is composed of computing elements (CEs), storage elements (SEs), an RB, and a replica manager (RM).

CEs accept file-dependent jobs; SEs host the data files required to execute jobs; RBs control the scheduling of jobs in grid sites (CEs/SEs); and RMs inside SEs deploy their replica optimizers (ROs) to automatically create, delete, or replicate data files for each SE. OptorSim also incorporates a simulated peer-to-peer messaging system to implement auction-based replica algorithms. OptorSim can also be used to test new dynamic replication strategies to optimize data access efficiency in data grids. As input, OptorSim takes a grid configuration and a tailor-made RO algorithm; in turn, it simulates the execution of the provided file-dependent jobs in a grid environment. OptorSim also provides facilities to visualize simulation results for better performance analysis of replication algorithms.
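To give a sense of what a replica optimizer decides, the sketch below implements one very simple strategy in Python. It is not OptorSim code (OptorSim itself is written in Java), and the strategy is an assumption chosen for illustration: every file that is missing locally is replicated, and the least recently used replicas are deleted to make room. The class names `StorageElement` and `LruReplicaOptimizer` and the file catalog are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List


class StorageElement:
    """Holds file replicas up to a fixed capacity; dict order tracks recency of use."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.files: "OrderedDict[str, float]" = OrderedDict()   # name -> size

    def used(self) -> float:
        return sum(self.files.values())


class LruReplicaOptimizer:
    """Replicates every missing file, deleting least recently used replicas to fit."""

    def __init__(self, storage: StorageElement, catalog: Dict[str, float]) -> None:
        self.storage, self.catalog = storage, catalog
        self.remote_reads = 0

    def access(self, name: str) -> str:
        """Serve one file request; return 'local', 'replicated', or 'remote'."""
        se = self.storage
        if name in se.files:
            se.files.move_to_end(name)        # mark as most recently used
            return "local"
        self.remote_reads += 1                # the file must cross the network
        size = self.catalog[name]
        if size > se.capacity:
            return "remote"                   # too large to ever be stored here
        while se.used() + size > se.capacity:
            se.files.popitem(last=False)      # delete the least recently used replica
        se.files[name] = size
        return "replicated"


def demo() -> List[str]:
    catalog = {"a": 4.0, "b": 4.0, "c": 4.0, "d": 20.0}   # file sizes, arbitrary units
    optimizer = LruReplicaOptimizer(StorageElement(capacity=10.0), catalog)
    return [optimizer.access(name) for name in ["a", "b", "a", "c", "b", "d"]]


if __name__ == "__main__":
    print(demo())
```

Counting `remote_reads` over a job's file access pattern gives a crude measure of how well a strategy performs; comparing such counts across strategies is the kind of experiment a replication simulator is built to run.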

OptorSim provides tools to define arbitrary data grid topologies, job creation patterns, jobs’ file access patterns, jobs’ processing times, scheduling algorithms for RBs, replication algorithms for ROs, traffic patterns in the network, and many more. Figure 35.9 shows the overall architecture of this simulator [16].

FIGURE 35.8. MONARC’s components and layers.

35.2.9 EcoGrid

EcoGrid [17] is another Java-based simulator for evaluating the performance of scheduling algorithms in grids. EcoGrid is dynamically configurable and supports resource modeling, advance reservation of resources, and integration of new scheduling policies. Although EcoGrid is primarily designed to evaluate economy-based scheduling algorithms, it can also be used for non-economy-based scheduling algorithms by setting only one of its parameters. EcoGrid uses the following components to model its grid environment (Fig. 35.10): configuration manager (CM), random number generator (RNG), load generator (LG), resource calendar (RC), computer node (CN), computer cluster (CC), media directory (MD), grid process (GP), grid (G), grid scheduler (GS), statistical analyzer (SA), and grid data provider (GDP). CM dynamically configures the grid simulator; RNG provides random numbers based on different statistical distributions; RC determines the load, price, booking status, holidays, time zone, and many other attributes of CNs; LG produces various work processing loads for simulation; SA provides computational components to analyze the statistics generated by different components of a system; GDP facilitates data access from various sources and separates the data access functionality from the core grid functionality of a system; CCs are collections of CNs that represent the processing units of a grid; MD (1) stores details of service-providing clusters to be later accessed by resource consumers and (2) provides a resource matching algorithm to determine the most suitable cluster and the most suitable node; GP represents a task with specific characteristics (e.g., execution time, input data file size, starting date, and deadline date) to be executed on a grid; GS represents a set of algorithms to schedule processes on clusters and nodes; G represents a set of clusters that perform jobs; and CNs consist of hard disks, memories, and processors to represent the actual CEs of a grid.

FIGURE 35.9. OptorSim’s architecture.

FIGURE 35.10. EcoGrid’s components. [Labels shown: SA, LG, RC, CN, GP, GS, G, CC, MD, CM, RNG, GDP, GridUtil, Core Library, JRE 1.6.0, DB & FS, Platform.]

35.2.10 GridNet

GridNet [18] is a modular NS-based simulator, written in C++, to model different data grid configurations and resource specifications. GridNet modules are composed of objects (nodes, links, and messages) that are mapped onto the NS’s application-level object classes. Different network configurations, types of nodes, node resources, replication strategies, and cost functions can be built using these NS-based objects. NS packets are used to simulate all actions inside a grid, for example, data exchange among nodes, user requests, and the start/end of data transmission. To carry out a replication decision, which in GridNet is made by the NS’s data transfer control unit, nodes generate new NS traffic to forward requests to other nodes or to send requested data to a client. In fact, GridNet only adds grid-specific elements, such as grid nodes and replication strategies, to the NS and uses the NS’s event-driven capability to simulate a data grid environment.

GridNet adopts the data grid architecture of CERN (the European Organization for Nuclear Research) and the Grid Physics Network (GriPhyN) [19] and assumes that a grid consists of several sites with different computational and data-storage resources. Each node can specify its relative computing capacity, specify its storage capacity, organize its local data files, and maintain a list of its peer replica neighbors. GridNet uses the following three node types to define its components: servers, caches, and clients. Server nodes represent main storage sites where grid data are stored (each node may host the whole or just some parts of a data file); cache nodes represent intermediate storage sites (e.g., a regional storage site) that replicate parts of the data stored in main storage sites/servers; and client nodes represent sites where data access is requested. Figure 35.11 shows the simulation architecture of this simulator.

35.2.11 Opportunistic Grid Simulation Tools

Opportunistic Grid Simulation Tool (OGST) [20] was developed in Java as an extension to the GridSim toolkit. OGST’s main objectives are (1) to assist developers of opportunistic grid middleware in validating their new concepts and implementations under different execution conditions and scenarios and (2) to simulate large-scale applications and resource scenarios involving several users in a repetitive and controlled way. OGST, which was developed based on the InteGrade project, also allows automatic creation of simulated grid environments composed of a large number of highly heterogeneous nodes. OGST explicitly implements two application execution models: regular and bag-of-tasks. OGST provides a library of four scheduling heuristics (InteGrade, OLB, MCT, and Min-Min) and also supports the implementation of other algorithms. OGST uses the following components to simulate its environments: (1) feature generators (FG) to define simulated grid environments (nodes and network links) and applications with their arrival rates; (2) an application submission and control tool (ASCT) to handle grid users’ application submissions and receive notification upon their jobs’ conclusion; (3) a global resource manager (GRM) to receive application submissions; (4) a scheduling strategy (SS) to schedule application tasks for execution on grid nodes; (5) a trader manager (TM) to provide data on the availability of grid resources, including node failure and recovery; (6) a local resource manager (LRM) to execute the application tasks scheduled to a node, maintain a list of tasks waiting for execution, and handle local resource loads; (7) a simulation data record manager (SDRM) to collect and store simulation data such as conclusion time stamps and to generate performance graphs such as the average application completion time; and (8) an application replication manager (ARM) to replicate resources in case of accidental node failures. Figure 35.12 shows how the main components of this simulator are connected to each other [20].
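Of the heuristics named above, Min-Min is a widely used list-scheduling rule: at each step, the minimum completion time of every unscheduled task over all machines is computed, and the task whose minimum is smallest is assigned to the corresponding machine. The Python sketch below shows the standard formulation on an expected-time-to-compute matrix; it is an independent illustration, not OGST's implementation, and the function name `min_min`, the matrix layout, and the tie-breaking order are assumptions.

```python
from typing import Dict, List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], float]:
    """Min-Min list scheduling.

    etc[t][m] is the expected time to compute task t on machine m.
    Returns (task -> machine assignment, makespan).
    """
    n_machines = len(etc[0]) if etc else 0
    ready = [0.0] * n_machines            # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        best = None                       # (completion_time, task, machine)
        for t in sorted(unscheduled):     # sorted for deterministic tie-breaking
            # Machine giving task t its minimum completion time.
            m = min(range(n_machines), key=lambda j: ready[j] + etc[t][j])
            candidate = (ready[m] + etc[t][m], t, m)
            if best is None or candidate < best:
                best = candidate
        finish, t, m = best
        assignment[t] = m                 # commit the task with the smallest minimum
        ready[m] = finish
        unscheduled.remove(t)

    return assignment, max(ready, default=0.0)


if __name__ == "__main__":
    # Three tasks, two machines; rows are tasks, columns are machines.
    etc = [[3.0, 5.0],
           [2.0, 4.0],
           [6.0, 1.0]]
    print(min_min(etc))
```

MCT (minimum completion time) differs in that it takes tasks in arrival order and assigns each one immediately to the machine that would finish it earliest, without first comparing across all unscheduled tasks.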

35.2.12 SchMng

Schelling Manager (SchMng) is a Visual Net C++-based software package developed at The University of Sydney (Australia) to gauge the combined performance of different job scheduling and data replication techniques. Because different approaches made different assumptions to capture the complexity of solving grid-related job scheduling and data replication problems, SchMng was designed to encompass as many features as possible from all approaches [1, 21–31] previously designed to solve grid-related optimization problems. SchMng’s framework (shown in Fig. 35.13) consists of heterogeneous (1) computational nodes, (2) storage nodes (SNs), (3) interconnecting networks, (4) schedulers, (5) users, (6) jobs, and (7) data files.

FIGURE 35.11. GridNet’s architecture. [Labels shown: two GridNet Nodes, each with a Replica Manager, Replica Optimizer, Data Monitor, Storage Element, and Replica Routing Table, on NS Nodes joined by an NS Link.]

In SchMng, computer centers with heterogeneous CEs are modeled as a collection of computational nodes (CNs); each CN (1) consists of several homogeneous CEs with identical characteristics and (2) is equipped with a local storage capability. Figure 35.14 shows a sample computer center consisting of four CNs with such storage capability. CNs are characterized by (1) their processing speed and (2) their number of processors. The processing speed of a CN is a relative number that reflects how fast it is compared with the other CNs in the system. The number of processors of a CN determines its capability to execute jobs with certain degrees of parallelism in a nonpreemptive fashion; that is, jobs cannot interrupt each other's execution during their runtimes. SNs are SEs in the system that host the data files required by jobs. Two types of SNs exist in SchMng: isolated and attached. Isolated SNs are individual entities that only host data files and deliver them to requesting CNs; attached SNs, on the other hand, are the local storage capacities of CNs, which host their local data files and also provide them to other CNs on request. Although from the optimization point of view there is no difference between the two and they are treated equally in a grid system, isolated SNs usually have more capacity than attached ones, whereas attached SNs can upload data files to their associated CNs almost instantly.
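The CN and SN descriptions above map naturally onto two small record types. The sketch below is illustrative only and does not reproduce SchMng's data structures: the class names, field names, node labels (`CN1` to `CN4`), and the capacity values in the tests are assumptions, while the processor counts and speeds are the ones labeled in Figure 35.14.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ComputationalNode:
    """A CN: several identical CEs, described by processor count and relative speed."""
    name: str         # hypothetical label
    processors: int   # number of processors ("Prcs" in Figure 35.14)
    speed: float      # relative processing speed ("Spd" in Figure 35.14)

    def can_run(self, required_processors: int) -> bool:
        # A CN can host a job only if it has enough processors for it.
        return required_processors <= self.processors


@dataclass(frozen=True)
class StorageNode:
    """An SN: isolated when attached_to is None, otherwise attached to that CN."""
    name: str
    capacity: float                    # hypothetical unit, e.g. GB
    attached_to: Optional[str] = None  # name of the owning CN, if any

    @property
    def is_isolated(self) -> bool:
        return self.attached_to is None


def eligible_nodes(nodes: List[ComputationalNode], required_processors: int) -> List[ComputationalNode]:
    """Keep only the CNs with enough processors for the requested parallelism."""
    return [node for node in nodes if node.can_run(required_processors)]


# The four CNs of the sample computer center in Figure 35.14.
SAMPLE_CENTER = [
    ComputationalNode("CN1", processors=128, speed=8),
    ComputationalNode("CN2", processors=128, speed=1),
    ComputationalNode("CN3", processors=8, speed=8),
    ComputationalNode("CN4", processors=8, speed=1),
]

if __name__ == "__main__":
    print([node.name for node in eligible_nodes(SAMPLE_CENTER, 16)])
```

A job that needs 16 processors can only be placed on the two 128-processor CNs, regardless of their speed.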

FIGURE 35.12. OGST’s components. [Diagram; labeled components: Resource Provider, LRM, FG, SDRM, BD, Utilities, Grid User, GRM, SS (e.g., MCT), ARM, Cluster Manager, TM, ASCT.]

FIGURE 35.13. SchMng’s framework.

FIGURE 35.14. A sample computer center in SchMng. [Diagram; four CNs labeled Prcs:128 Spd:8, Prcs:128 Spd:1, Prcs:8 Spd:8, and Prcs:8 Spd:1.]

CNs and SNs are connected through an interconnection network that is composed of individual links. Each link has its own characteristics and is modeled using two parameters: delay and bandwidth. Delay is set based on the average waiting time for a data file to start flowing from one side of the link to the other; bandwidth is set based on the average bandwidth between the two sides of the link. Although this formulation differs from reality, in which delay and bandwidth among nodes vary significantly with a system’s traffic, extensive simulation showed that this difference is negligible when the number of jobs and data files increases in a system [1–3].

Schedulers are independent entities in the system that accept jobs and data files from users and schedule/assign/replicate them to relevant CNs and SNs. Schedulers, which can be connected to all CNs/SNs or only to a subset of them, are in fact the decision makers of the whole system: they decide where each job should be executed and where each data file should be stored/replicated. Schedulers can be either subentities of CNs/SNs or individual job/data file brokers that accept jobs and data files from users. In SchMng, the more general case in which schedulers are treated as individual job/data file brokers is assumed.
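The two-parameter link model described above suggests a simple estimate of how long a data file takes to cross a link. The sketch below assumes that the transfer time is the link's delay plus the file size divided by the average bandwidth; the text gives the two parameters but not this formula, so the formula, the class name `Link`, and the units are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A network link modeled only by its average delay and average bandwidth."""
    delay: float      # seconds before a data file starts flowing (assumed unit)
    bandwidth: float  # average throughput, in size units per second (assumed unit)

    def transfer_time(self, file_size: float) -> float:
        """Estimated time to move a file over this link: delay + size / bandwidth."""
        if self.bandwidth <= 0:
            raise ValueError("bandwidth must be positive")
        return self.delay + file_size / self.bandwidth


if __name__ == "__main__":
    link = Link(delay=0.5, bandwidth=100.0)
    print(link.transfer_time(1000.0))  # 0.5 s delay + 10 s of transfer
```

Because both parameters are averages, the estimate ignores traffic-dependent variation, which matches the simplification the text argues is negligible for large workloads.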

Users generate jobs with specific characteristics. Each user is connected to only one scheduler, to which he or she submits jobs. Although the majority of users only use preexisting data files in a system, they can also generate their own data files should they want to. Jobs are generated by users and are submitted to schedulers to be executed by CNs. Each job consists of several dependent tasks described by a DAG; each task has two characteristics: (1) execution time and (2) number of processors. The execution time is the number of seconds a particular task needs to be executed/finalized on the slowest CN in the system; the actual execution time of a task can be significantly reduced if it is assigned to a faster CN instead. The number of processors determines a task’s degree of parallelism. Using this factor, schedulers eliminate CNs that do not have enough processors to execute specific jobs/tasks. Jobs are generated with different shapes to reflect different classes of operations as outlined by Task Graphs for Free (TGFF) [32] and have the following characteristics: (1) width, (2) height, (3) number of processors, (4) time to execute, (5) shape, and (6) a list of required data files. Width is the maximum number of tasks that can run concurrently inside a job; height is the number of levels/stages a job has; the number of processors is the maximum number of processors that any of its tasks needs in order to run; the time to execute is the minimum time in which a job can run on the slowest CN in a system; and the list of required data files is the list of data files a CN must download before executing a job. Jobs’ shapes are (1) series–parallel, (2) homogeneous–parallel, (3) heterogeneous–parallel, and (4) single task. Figure 35.15 and Table 35.1 show sample jobs and their characteristics. Data files are assumed to be owned by SNs and are allowed to have up to a predefined number of replicas in a system. Schedulers can only delete or move replicas; that is, the original data files are always kept untouched.
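The job characteristics above can be derived from a task graph. The following sketch makes two assumptions that the text does not state: a job's width is approximated by the largest number of tasks sharing a level of the DAG, and a task's run time on a CN is its time on the slowest CN scaled by the ratio of the two relative speeds. The function names and the sample graph are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def task_levels(preds: Dict[str, List[str]]) -> Dict[str, int]:
    """Level of each task: 1 for entry tasks, else 1 + the deepest predecessor.

    `preds` maps every task to the list of tasks it depends on and must be acyclic.
    """
    levels: Dict[str, int] = {}

    def level(task: str) -> int:
        if task not in levels:
            levels[task] = 1 + max((level(p) for p in preds[task]), default=0)
        return levels[task]

    for task in preds:
        level(task)
    return levels


def job_shape(preds: Dict[str, List[str]]) -> Tuple[int, int]:
    """Return (width, height): most tasks on one level, and the number of levels."""
    levels = task_levels(preds)
    per_level = Counter(levels.values())
    return max(per_level.values()), max(levels.values())


def scaled_runtime(time_on_slowest: float, cn_speed: float, slowest_speed: float = 1.0) -> float:
    """Assumed scaling: run time shrinks in proportion to the CN's relative speed."""
    return time_on_slowest * slowest_speed / cn_speed


if __name__ == "__main__":
    # One entry task fans out to three parallel tasks that join in a final task.
    job = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["b", "c", "d"]}
    print(job_shape(job))          # width 3, height 3
    print(scaled_runtime(80, 8))   # 80 s on the slowest CN, run on a CN of speed 8
```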

35.3 PROBLEM STATEMENT: DATA-AWARE JOB SCHEDULING (DAJS)

This section describes the underlying problem all GSs are trying to solve. Regardless of the type of applications submitted to grids, GSs always try to (1) minimize the execution time of the submitted jobs, (2) minimize the transfer time of all data files to the jobs that request them, or (3) both. Therefore, in this section, the DAJS problem is formally defined to cover both cases [2, 3].

DAJS is a bi-objective optimization problem and is defined as assigning jobs to CNs and replicating data files on SNs to concurrently minimize (1) the overall execution time of a batch of jobs as well as (2) the transfer time of all data files to their dependent jobs. As can be seen, DAJS can easily be reduced to optimizing only the execution time or only the transfer time of a system. In its general formulation, because these two objectives are usually interdependent, and in many cases even conflicting, minimizing one objective usually results in compromising the other. For example, achieving lower execution time requires scheduling jobs to powerful CNs, whereas achieving lower transfer times requires using powerful links with higher bandwidths in a system. Table 35.2 summarizes the symbols we use to mathematically formulate the DAJS problem.

FIGURE 35.15. Job shapes: (a) series–parallel, (b) homogeneous–parallel, (c) heterogeneous–parallel, and (d) single task.

To formulate this problem, assume jobs are partitioned into several job sets, $JobSets = \{JSet_1, JSet_2, \ldots, JSet_{N_{CN}}\}$, to be executed by CNs, and data files are partitioned into several data file sets, $DataSets = \{DSet_1, DSet_2, \ldots, DSet_{N_{SN}}\}$, to be replicated on SNs; a partition of a set is defined as the decomposition of a set into disjoint subsets whose union is the original set. For example, if $N_J = 9$ and $N_{CN} = 3$, then $JobSets = \{\{1, 5, 7\}, \{2, 4, 8, 9\}, \{3, 6\}\}$ means jobs $\{J_1, J_5, J_7\}$, $\{J_2, J_4, J_8, J_9\}$, and $\{J_3, J_6\}$ are assigned/scheduled to $CN_1$, $CN_2$, and $CN_3$, respectively.
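The partition requirement can be checked mechanically. The sketch below encodes the nine-job, three-CN example from the paragraph above and verifies that the job sets are disjoint and together cover every job; the function names `is_partition` and `job_to_cn` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def is_partition(subsets: List[Set[int]], universe: Iterable[int]) -> bool:
    """True if the subsets are pairwise disjoint and their union is the universe."""
    seen: Set[int] = set()
    for subset in subsets:
        if seen & subset:  # an element appears in two subsets
            return False
        seen |= subset
    return seen == set(universe)


def job_to_cn(job_sets: List[Set[int]]) -> Dict[int, int]:
    """Map each job index to the (1-based) index of the CN it is scheduled on."""
    return {job: cn for cn, subset in enumerate(job_sets, start=1) for job in subset}


if __name__ == "__main__":
    job_sets = [{1, 5, 7}, {2, 4, 8, 9}, {3, 6}]  # the example with 9 jobs and 3 CNs
    print(is_partition(job_sets, range(1, 10)))
    print(job_to_cn(job_sets)[8])  # job 8 is scheduled on CN 2
```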

Based on this model, DAJS is defined as finding elements of job sets and data file sets to minimize the following two objective functions:

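\[
\begin{aligned}
&1.\quad \min \; \max_{i = 1, \ldots, N_{CN}} JSet_i^{EX} \\
&2.\quad \min \; \sum_{i=1}^{N_{CN}} JSet_i^{TT} \\
&\text{s.t.} \\
&1.\quad JSet_i^{prcs} \le CN_i^{prcs}, \qquad i = 1, \ldots, N_{CN} \\
&2.\quad DSet_i^{size} \le SN_i^{size}, \qquad i = 1, \ldots, N_{SN}
\end{aligned}
\]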

Here, if $JSet_i = \{J_1, J_2, \ldots, J_K\}$ contains $K$ jobs scheduled to be executed by $CN_i$, then the execution time and the total transfer time of this job set can be calculated as follows:

\[
JSet_i^{EX} = \max_{k = 1, \ldots, K} \left( J_k^{ST} + J_k^{EX} \right)
\]

and

\[
JSet_i^{TT} = \sum_{k=1}^{K} J_k^{TT}.
\]

TABLE 35.1. Tasks’ Characteristics for Jobs in Figure 35.15

| Shape | Width | Height | Number of Tasks | Number of Processors | Time to Execute |
|---|---|---|---|---|---|
| Series–parallel | 6 | 12 | 62 | 7 | 491 |
| Homogeneous–parallel | 7 | 12 | 53 | 8 | 260 |
| Heterogeneous–parallel | 9 | 14 | 65 | 6 | 470 |
| Single task | 1 | 1 | 1 | 4 | 20 |

TABLE 35.2. Notation Summary

| Symbol | Description |
|---|---|
| $N_{CN}$, $N_{SN}$, $N_J$, $N_D$ | Total number of CNs, SNs, jobs, and data files in the system |
| $CN_i^{prcs}$ | Total number of processors for CN #i |
| $SN_i^{size}$ | Size of SN #i |
| $J_i^{ST}$, $J_i^{EX}$, $J_i^{TT}$ | Start time, execution time, and transfer time for all data files after executing job #i |
| $JSet_i^{EX}$, $JSet_i^{TT}$ | Execution time and total transfer time of all data files addressed by a collection of jobs described in $JSet_i$ to be executed by $CN_i$ |
| $DSet_i^{size}$ | Total size of a collection of data files addressed by $DSet_i$ to be hosted by $SN_i$ |
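As a worked illustration of the job-set quantities $JSet_i^{EX}$ and $JSet_i^{TT}$ and of the two DAJS objectives, the sketch below evaluates them for a small, made-up schedule. The `Job` record, the function names, and all numbers are assumptions; only the formulas come from the formulation in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    start: float      # J^ST, start time on its CN
    execution: float  # J^EX, execution time on its CN
    transfer: float   # J^TT, transfer time of its data files


def jset_ex(jobs: List[Job]) -> float:
    """JSet^EX: the latest finish time (start + execution) among the set's jobs."""
    return max((job.start + job.execution for job in jobs), default=0.0)


def jset_tt(jobs: List[Job]) -> float:
    """JSet^TT: the sum of the transfer times of the set's jobs."""
    return sum(job.transfer for job in jobs)


def dajs_objectives(job_sets: List[List[Job]]) -> Tuple[float, float]:
    """Return (max over CNs of JSet^EX, sum over CNs of JSet^TT)."""
    makespan = max((jset_ex(jobs) for jobs in job_sets), default=0.0)
    total_transfer = sum(jset_tt(jobs) for jobs in job_sets)
    return makespan, total_transfer


if __name__ == "__main__":
    schedule = [
        [Job(0.0, 10.0, 2.0), Job(10.0, 5.0, 3.0)],  # jobs on CN 1: finishes at 15
        [Job(0.0, 20.0, 4.0)],                       # jobs on CN 2: finishes at 20
    ]
    print(dajs_objectives(schedule))
```

In this example the first objective is set by CN 2, which finishes last, while the second objective adds up the transfer times of all three jobs.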

In the stated bi-objective formulation, the first constraint guarantees that all CNs are capable of executing their assigned jobs, while the second constraint guarantees that the total size of all data files each SN hosts does not exceed its total capacity. It is also worth mentioning that the overall execution time of a set of jobs greatly depends on each CN’s local scheduling policy; thus, such scheduling policies must be carefully set for each case to achieve the best performance. Extensive research, however, showed that the local scheduling policy of first come, first served with backfilling usually results in optimal deployment of CNs’ resources when large numbers of jobs are submitted [33]. Based on extensive studies, SchMng also assumes that if several jobs in a CN require the same data file, the requested data file is downloaded only once and then stored in a local repository (cache) for further local requests [1, 21–29, 34, 35].
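The caching assumption changes how transfer time accumulates on a CN: a data file shared by several jobs is paid for once. The sketch below compares the two accounting schemes under the simplifying assumption that each file has a fixed transfer time to the CN in question; the function names, file names, and times are hypothetical.

```python
from typing import Dict, List, Set


def transfer_time_with_cache(required: List[Set[str]], file_time: Dict[str, float]) -> float:
    """Each distinct file needed on the CN is downloaded once and then reused."""
    needed: Set[str] = set().union(*required)
    return sum(file_time[name] for name in needed)


def transfer_time_without_cache(required: List[Set[str]], file_time: Dict[str, float]) -> float:
    """Every job downloads all of its required files, even repeated ones."""
    return sum(file_time[name] for files in required for name in files)


if __name__ == "__main__":
    # Two jobs on the same CN; both need file "b".
    required = [{"a", "b"}, {"b", "c"}]
    file_time = {"a": 1.0, "b": 2.0, "c": 4.0}
    print(transfer_time_with_cache(required, file_time))     # "b" counted once
    print(transfer_time_without_cache(required, file_time))  # "b" counted twice
```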

REFERENCES

[1] R. McClatchey, A. Anjum, H. Stockinger, A. Ali, I. Willers, and M. Thomas, “Data intensive and network aware (DIANA) grid scheduling,” Journal of Grid Computing, 5(1):43–64, 2007.

[2] J. Taheri, Y.C. Lee, and A.Y. Zomaya, “A bee colony algorithm for simultaneous job and datafile scheduling in grids,” Technical Report, The University of Sydney, 2011.

[3] J. Taheri, Y.C. Lee, and A.Y. Zomaya, “Bestmap: Network aware job and data allocation in grid environments,” Technical Report, The University of Sydney, 2011.

[4] F. Howell and R. McNab, “SimJava: A discrete event simulation package for Java with applications in computer systems modelling,” First International Conference on Web-Based Modelling and Simulation, 1998, San Diego, CA.

[5] A. Sulistio, C.S. Yeo, and R. Buyya, “Simulation of parallel and distributed systems: A taxonomy and survey of tools,” ••. Available at http://www.cs.mu.oz.au/raj/papers/simtools.pdf. Accessed •• ••, 2011.

[6] A. Takefusa, S. Matsuoka, and H. Nakada, “Overview of a performance evaluation system for global computing scheduling algorithms,” in 8th IEEE International Symposium on High Performance Distributed Computing (HPDC8), 1999.

[7] H.J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura, and A. Chien, “The MicroGrid: A scientific tool for modeling computational grids,” Scientific Programming, 8(3):127–141, 2000.

[8] H. Casanova, “SimGrid: A toolkit for the simulation of application scheduling,” in First IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 430–437, 2001.

[9] H. Casanova, A. Legrand, and M. Quinson, “SimGrid: A generic framework for large-scale distributed experiments,” in 10th International Conference on Computer Modeling and Simulation, pp. 126–131, 2008.

[10] A. Jarry, H. Casanova, and F. Berman, “DAGSim: A simulator for DAG scheduling algorithms,” Technical Report RR2000-46, LIP, 2000.


[11] R. Buyya and M. Murshed, “GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing,” Concurrency and Computation: Practice and Experience, 14(13):1175–1220, 2002.

[12] R. Buyya, D. Abramson, and J. Giddy, “Nimrod-G: An architecture for a resource management and scheduling system in a global computational grid,” in The 4th International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region (HPC-ASIA), Vol. 1, pp. 283–289, 2000.

[13] J. Sherwani, N. Ali, N. Lotia, Z. Hayat, and R. Buyya, “Libra: An economy driven job scheduling system for clusters,” in 6th International Conference on High Performance Computing in Asia Pacific Region (HPC-Asia), pp. 16–19, 2002.

[14] C.L. Dumitrescu and I. Foster, “GangSim: A simulator for grid scheduling studies,” in IEEE International Symposium on Cluster Computing and the Grid (CCGrid), Vol. 2, pp. 1151–1158, 2005.

[15] C. Dobre and C. Stratan, “MONARC simulation framework,” Transactions on Automatic Control and Computer Science, 49(63):35–42, 2004.

[16] D.G. Cameron, A.P. Millar, C. Nicholson, R. Carvajal-Schiaffino, F. Zini, and K. Stockinger, “OptorSim: A simulation tool for scheduling and replica optimisation in data grids,” in Computing in High Energy Physics (CHEP 2004), Interlaken, Switzerland, 2004.

[17] H. Mehta, P. Kanungo, and M. Chandwani, “EcoGrid: A dynamically configurable object oriented simulation environment for economy-based grid scheduling algorithms,” in The 4th Annual ACM Bangalore Conference, pp. 1–8, Bangalore, India: ACM, 2011.

[18] H. Lamehamedi, Z. Shentu, B. Szymanski, and E. Deelman, “Simulation of dynamic data replication strategies in data grids,” 2003.

[19] GriPhyN, “Grid physics network in atlas,” ••. Available at http://www.usatlas.bnl.gov/computing/grid/griphyn/. Accessed •• ••, 2011.

[20] G.C. Filho and F.J. da Silva e Silva, “OGST: An opportunistic grid simulation tool,” in 2nd International Workshop Latin American Grid (LAGrid 2008), Campo Grande, Mato Grosso do Sul, 2008.

[21] K. Ranganathan and I. Foster, “Decoupling computation and data scheduling in distributed data-intensive applications,” in Proceedings of 11th IEEE International Symposium on High Performance Distributed Computing (HPDC ’02), pp. 352–358, 2002.

[22] W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger, and K. Stockinger, “Data management in an international data grid project,” in Grid Computing, Vol. 1971 of Lecture Notes in Computer Science (R. Buyya and M. Baker, eds), pp. 77–90. New York: Springer-Verlag, 2000.

[23] W.H. Bell, D.G. Cameron, L. Capozza, A.P. Millar, K. Stockinger, and F. Zini, “OptorSim: A grid simulator for studying dynamic data replication strategies,” International Journal of High Performance Computing Applications, 17(4):403–416, 2003.

[24] A. Chakrabarti and S. Sengupta, “Scalable and distributed mechanisms for integrated scheduling and replication in data grids,” in Distributed Computing and Networking, Vol. 4904 of Lecture Notes in Computer Science (S. Rao, M. Chatterjee, P. Jayanti, C. Murthy, and S. Saha, eds), pp. 227–238. Berlin/Heidelberg: Springer, 2008.

[25] M. Tang, B.-S. Lee, X. Tang, and C.-K. Yeo, “The impact of data replication on job scheduling performance in the data grid,” Future Generation Computer Systems, 22(3):254–268, 2006.

[26] R.-S. Chang, J.-S. Chang, and S.-Y. Lin, “Job scheduling and data replication on data grids,” Future Generation Computer Systems, 23(7):846–860, 2007.

[27] N.N. Dang and S.B. Lim, “Combination of replication and scheduling in data grids,” International Journal of Computer Science and Network Security, 7(3):304–308, 2007.


[28] S. Abdi and S. Mohamadi, “Two level job scheduling and data replication in data grid,” International Journal of Grid Computing & Applications, 1(1):23–37, 2010.

[29] A. Anjum, R. McClatchey, A. Ali, and I. Willers, “Bulk scheduling with the DIANA scheduler,” IEEE Transactions on Nuclear Science, 53(6):3818–3829, 2006.

[30] F. Berman, G. Fox, and A.J.G. Hey, Grid Computing: Making the Global Infrastructure a Reality. Wiley Series in Communication Networking & Distributed Systems. New York: John Wiley & Sons, 2003.

[31] R. Subrata, A.Y. Zomaya, and B. Landfeldt, “A cooperative game framework for QoS guided job allocation schemes in grids,” IEEE Transactions on Computers, 57(10):1413–1422, 2008.

[32] R.P. Dick, D.L. Rhodes, and W. Wolf, “TGFF: Task Graphs for Free,” in Proceedings of the 6th International Workshop on Hardware/Software Codesign, pp. 97–101, 1998.

[33] H. Shan, L. Oliker, and R. Biswas, “Job superscheduler architecture and performance in computational grid environments,” in Proceedings of the ACM/IEEE SC2003 Conference (SC ’03) (L. Oliker and R. Biswas, eds), pp. 44–58, 2003.

[34] S.-M. Park and J.-H. Kim, “Chameleon: A resource scheduler in a data grid environment,” in Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGRID ’03), pp. 258–265, 2003.

[35] A. Chakrabarti, R. Dheepak, and S. Sengupta, “Integration of scheduling and replication in data grids,” in High Performance Computing—HiPC 2004, Vol. 3296 of Lecture Notes in Computer Science (L. Bougé and V. Prasanna, eds), pp. 85–101. Berlin/Heidelberg: Springer, 2005.
