Simulation of a Scheduling Strategy for Dependent Tasks in … · 2017-05-31 · Simulation of a...

Simulation of a Scheduling Strategy for Dependent Tasks in Cloud Computing

K. Nithyanandakumari¹, S. Sivakumar²

¹,²Department of Computer Science, Cardamom Planters’ Association College, Bodinayakanur, Tamilnadu

Abstract

Cloud computing is a compelling paradigm that supports better utilization of information technology (IT) infrastructure, services and applications. Task scheduling is one of the most important parts of cloud computing. A scheduling problem involves tasks that must be scheduled on resources, subject to some constraints, so as to optimize an objective function. Scheduling strategies are divided into dependent task scheduling and independent task scheduling. Dependency means that tasks must be executed in a precedence order, i.e., a task can only be scheduled after all its parent tasks are completed. For dependent task scheduling, the relation between the tasks is typically represented by a task graph or precedence graph. This research work simulates a scheduling strategy that specifies when and on which resource each task will be executed. The objective is to maximize system throughput by assigning each task to a suitable processor, maximizing resource utilization, and minimizing execution time.

Keywords: Cloud computing, Task scheduling, Task Dependency, Scheduling strategy.

I. INTRODUCTION

Cloud computing is a modern computing paradigm that evolved from grid computing, parallel computing and distributed computing, and it provides dynamic services over the Internet [1]. Its continuous services are delivered through global data centers built on virtualized compute and storage technologies [2]. Using cloud computing, one can lease the required computing resources, software or a development platform from a cloud service provider and pay according to usage. Because of its diversified, dynamic and flexible nature, cloud service providers offer different resources and services to different users [3]. The goal of a cloud service provider is to dynamically provide customers with a widely distributed set of services, viz. hardware, software, applications, large-scale data storage and virtualized resources, with high throughput, improved resource utilization, high reliability, Quality of Service (QoS), built-in disaster recovery and high efficiency. Task scheduling is one of the most essential components of cloud computing for achieving these objectives [4].

Task scheduling allocates resources among given tasks within a finite time to achieve the desired quality of service. The goal of scheduling is to map tasks to appropriate resources so as to optimize one or more objectives. Formally, a scheduling problem involves tasks that must be scheduled on resources, subject to some constraints, to optimize some objective function [5]. Efficient scheduling is necessary for both cloud service users and providers. Scheduling is categorized into user level and system level. User-level scheduling deals with problems raised by service provisioning between providers and customers. System-level scheduling handles resource management within data centers [6-7]. The scheduling process in the cloud can be generalized into three steps [8]:

a. Resource discovering and filtering: The data center broker discovers the resources present in the network system and collects status information related to them.

b. Resource selection: The target resource is selected based on certain parameters of the task and the resource.

c. Task submission: The task is submitted to the selected resource.

International Journal of Computational and Applied Mathematics. ISSN 1819-4966 Volume 12, Number 1 (2017) © Research India Publications http://www.ripublication.com

Further, scheduling algorithms differ based on the dependency among the tasks to be scheduled. Task dependency is one of the important factors with a high impact on the selection of the scheduling strategy. Based on dependency, tasks may be classified as independent or dependent. Independent tasks have no dependencies among them and no precedence order to be followed during scheduling; this is called independent task scheduling. In contrast, dependent tasks have a task-precedence order that must be met during the scheduling process [9]. Scheduling of dependent tasks is also known as workflow scheduling. Dependent tasks are tasks in the task sets submitted by users that have a dependency relationship with each other: before a subtask can be executed, the execution results of its parent tasks must be known [10]. This relationship can be represented by a Directed Acyclic Graph (DAG) in which nodes represent the tasks and edges represent the dependencies, weighted by communication overhead, as shown in Fig. 1 [11]. Proper scheduling can have a significant impact on the performance of the system.

Figure 1. General DAG model
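The DAG representation described above can be illustrated with a minimal Python sketch. The task names, the edge weights and the helper `topological_order` are hypothetical and chosen only for illustration; the ordering uses Kahn's algorithm, a standard technique that this paper does not prescribe.

```python
from collections import deque

# Hypothetical example DAG: edges[(parent, child)] = communication overhead.
tasks = ["t1", "t2", "t3", "t4"]
edges = {("t1", "t2"): 3, ("t1", "t3"): 2, ("t2", "t4"): 4, ("t3", "t4"): 1}


def topological_order(tasks, edges):
    """Return one order that respects every precedence edge (Kahn's algorithm)."""
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)

    # Tasks with no unfinished parents are ready to be scheduled.
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


order = topological_order(tasks, edges)
```

Any order produced this way places every parent before its children, which is the minimum a dependent-task scheduler has to guarantee.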

In line with that, this paper proposes a model for dependent task scheduling in cloud computing that minimizes the overall execution time subject to QoS-based constraints, viz. precedence constraints.

II. SYSTEM MODEL FOR DEPENDENT TASK SCHEDULING

Scheduling is a process that maps and manages the execution of dependent tasks on distributed resources. It allocates suitable resources to workflow tasks so that the execution can be completed to satisfy the objective functions subject to the given constraints. Fig. 2 shows the system model for dependent task scheduling. A given large task is divided into subtasks by the task partitioning module, and the subtasks are passed to the load leveller. The load leveller uses a distribution method that assigns tasks across all the resources according to the computing capability of the resources and the task requirements, in order to balance the load and avoid congestion.

The load leveller obtains information on available resources from the resource monitoring module, which monitors all the resources of the datacenter. The assignment of tasks to appropriate resources is called matching of tasks. Finally, the scheduler allocates resources to tasks for execution. This allocation is called mapping. In this scheduling model, tasks are scheduled individually as they are received, without waiting for the next batch time interval.

Figure 2. System model for dependent task scheduling
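As a rough illustration of how a load leveller might place tasks one at a time as they arrive, the sketch below uses a greedy earliest-finish rule. This rule, the function `assign`, the resource speeds and the task lengths are all assumptions made for the example; the paper does not specify the distribution method, and precedence constraints are ignored here for brevity.

```python
def assign(task_length, speeds, ready_at):
    """Place one task on the resource that would finish it earliest.

    task_length: work in million instructions (assumed unit)
    speeds:      resource name -> processing speed in MIPS
    ready_at:    resource name -> time at which the resource becomes free
    """
    finish = {r: ready_at[r] + task_length / speeds[r] for r in speeds}
    chosen = min(finish, key=finish.get)
    ready_at[chosen] = finish[chosen]  # the resource is busy until then
    return chosen


speeds = {"R1": 1000, "R2": 2000}      # hypothetical resources
ready_at = {r: 0.0 for r in speeds}

# Tasks are handled individually in arrival order, not in batches.
placements = [assign(length, speeds, ready_at) for length in [4000, 2000, 2000]]
```

In this example the faster resource receives two of the three tasks, and the slower one takes the task that would otherwise have to wait.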

Let $T = \{T_1, T_2, T_3, \ldots, T_n\}$ be the set of tasks. A task $T_i$ is usually made of several subtasks $\{t_1, t_2, t_3, \ldots, t_k\}$, each of which is a computational and indivisible schedulable unit.

We consider the subtasks to be parts of large tasks. The characteristics of subtasks are:

• Non-preemptive: Once started, a subtask must run to completion on its resource.

• Precedence relation between tasks: Specifies an order in which the subtasks can be processed (Dependency Constraint).

Such precedence relations are described by a four-tuple that represents the DAG $G$.

Let $G = (V, E, W, C)$, where $V = \{t_1, t_2, \ldots, t_k\}$ is the set of subtasks and the pair $(i, j) \in E$ iff $t_i$ must be completed before $t_j$.

$W = \{w_i\}$ is the collection of execution times, where $w_i$ is the execution time of $t_i$.

$C = \{c_{i,j}\}$ is the collection of communication times, where $c_{i,j}$ is the communication cost from task $t_i$ to task $t_j$.

Let $R = \{R_1, R_2, R_3, \ldots, R_l\}$ be the set of resources. A resource is a basic computational entity on which tasks are scheduled, allocated and processed. A set of characteristics can be attached to each resource $R_j \in R$, represented as $RC_j = \{s, m, d\}$, where

s - Processing speed of the resource in Million Instructions Per Second (MIPS).

m - Available memory in bytes.

d - Available disk space.

In addition, resource pre-reservation is also needed because of task dependencies to assure

the correct execution of the workflow.
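The resource characteristics {s, m, d} described above can be captured in a small record, as in the following sketch. Measuring task length in million instructions (so that execution time is length divided by speed), expressing disk space in bytes, and the helper names `execution_time` and `feasible` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ResourceCharacteristics:
    s: float  # processing speed in MIPS
    m: int    # available memory in bytes
    d: int    # available disk space (bytes assumed)


def execution_time(length_mi, rc):
    """Seconds to run a task of length_mi million instructions (assumed measure)."""
    return length_mi / rc.s


def feasible(rc, mem_needed, disk_needed):
    """A resource can host a task only if memory and disk are sufficient."""
    return rc.m >= mem_needed and rc.d >= disk_needed


GIB = 2**30
resources = {  # hypothetical resources
    "R1": ResourceCharacteristics(s=1000, m=4 * GIB, d=100 * GIB),
    "R2": ResourceCharacteristics(s=2500, m=2 * GIB, d=50 * GIB),
}

# Match a 5000 MI task needing 3 GiB of memory and 10 GiB of disk.
choices = {
    name: execution_time(5000, rc)
    for name, rc in resources.items()
    if feasible(rc, 3 * GIB, 10 * GIB)
}
best = min(choices, key=choices.get)
```

Here the faster resource is excluded because it lacks memory, which shows why matching has to consider all three characteristics and not speed alone.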

In order to describe the mathematical model, some notations are given in this section.

• $pred(t_i)$: the collection of predecessors of task $t_i$; $npred(t_i)$ is the number of predecessors of task $t_i$.

• $succ(t_i)$: the collection of successors of task $t_i$; $nsucc(t_i)$ is the number of successors of task $t_i$.

• $est(t_i)$: the earliest start time of task $t_i$.

• $F(t_i)$: the finishing time of task $t_i$.
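To make the notation concrete, the sketch below computes pred, succ, est and F for a small hypothetical DAG. It assumes that a task may start as soon as every parent has finished and its data has arrived, with no contention for resources; that recurrence is a common simplification and is not stated in this paper.

```python
# Hypothetical DAG: w[t] = execution time, c[(p, t)] = communication time.
w = {"t1": 2, "t2": 3, "t3": 4, "t4": 1}
c = {("t1", "t2"): 3, ("t1", "t3"): 2, ("t2", "t4"): 4, ("t3", "t4"): 1}


def pred(t):
    """Predecessors (parents) of task t."""
    return [p for (p, child) in c if child == t]


def succ(t):
    """Successors (children) of task t."""
    return [child for (p, child) in c if p == t]


def earliest_times(order):
    """Compute est and F for tasks listed in a precedence-respecting order.

    Assumed recurrence: est(t) = max over parents p of F(p) + c[(p, t)],
    or 0 for an entry task, and F(t) = est(t) + w[t].
    """
    est, finish = {}, {}
    for t in order:
        est[t] = max((finish[p] + c[(p, t)] for p in pred(t)), default=0)
        finish[t] = est[t] + w[t]
    return est, finish


est, F = earliest_times(["t1", "t2", "t3", "t4"])
```

For this example, t4 cannot start before time 12 because the result of t2 arrives later than that of t3.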

Optimization Criteria

Task scheduling is a multi-objective combinatorial optimization problem, i.e., a problem of simultaneously minimizing or maximizing multiple conflicting objectives. This scheduling strategy considers two minimization objectives and one maximization objective. They are:



Makespan: The finishing time of the last subtask to complete. The most popular optimization criterion in task scheduling is to minimize makespan.

$$\text{Makespan} = \max_{1 \le i \le k} F(t_i)$$

where $F(t_i)$ indicates the finishing time of subtask $t_i$.

Flowtime: The sum of the finishing times of all the tasks. To minimize flowtime, tasks should be executed in ascending order of their processing time.

$$\text{Flowtime} = \sum_{i=1}^{k} F(t_i)$$

where $F(t_i)$ indicates the finishing time of subtask $t_i$.

Resource utilization: Another important criterion is the maximization of resource utilization, i.e., keeping resources as busy as possible.

$$\text{Resource utilization} = \frac{\sum_{i=1}^{l} \text{Time taken by resource } i \text{ to finish all tasks}}{\text{Makespan} \times l}$$

where $l$ is the number of processors.
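The three criteria can be evaluated for a given schedule as in the following sketch. The schedule data and function names are hypothetical. The sketch also assumes that "time taken by resource i to finish all tasks" means the total time the resource spends executing its tasks; reading it as the completion time of the last task on resource i would give a different value whenever a resource has idle gaps.

```python
# Hypothetical schedule: task -> (resource, start time, finish time).
schedule = {
    "t1": ("R1", 0, 2),
    "t2": ("R1", 2, 5),
    "t3": ("R2", 4, 8),
    "t4": ("R1", 9, 10),
}


def makespan(schedule):
    """Finishing time of the last task to complete."""
    return max(finish for _, _, finish in schedule.values())


def flowtime(schedule):
    """Sum of the finishing times of all tasks."""
    return sum(finish for _, _, finish in schedule.values())


def resource_utilization(schedule, resources):
    """Total busy time over all resources divided by makespan * l (busy time assumed)."""
    busy = {r: 0 for r in resources}
    for resource, start, finish in schedule.values():
        busy[resource] += finish - start
    return sum(busy.values()) / (makespan(schedule) * len(resources))


ms = makespan(schedule)
ft = flowtime(schedule)
ru = resource_utilization(schedule, ["R1", "R2"])
```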

The objectives defined above must be attained subject to the following constraints.

• At any instant, a task executes on only one resource.

• A task is hosted only on a resource that is operating.

• The total processing requirements of all tasks hosted on a resource should not exceed the maximum processing capacity of that resource.

• The total memory requirements of all tasks hosted on a resource should not exceed the maximum memory available on that resource.

• If there are precedence orders among tasks, a task cannot be scheduled until all its parent tasks are finished.
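A candidate schedule can be checked against the constraints above with a small validator such as the sketch below. The record types, field names and units are hypothetical, and the capacity and memory checks sum over all tasks hosted on a resource, which is one literal reading of the constraints. The single-resource constraint holds by construction because each task has exactly one placement.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    operating: bool
    capacity: float  # maximum processing capacity (assumed unit: MIPS)
    memory: float    # maximum available memory in bytes


@dataclass
class Placement:
    resource: str    # exactly one resource per task
    start: float
    finish: float
    load: float      # processing requirement of the task
    memory: float    # memory requirement of the task


def violations(placements, resources, parents):
    """Return a list of human-readable constraint violations (empty if valid)."""
    problems = []
    for name, res in resources.items():
        hosted = [p for p in placements.values() if p.resource == name]
        if hosted and not res.operating:
            problems.append(f"{name} is not operating")
        if sum(p.load for p in hosted) > res.capacity:
            problems.append(f"{name} exceeds processing capacity")
        if sum(p.memory for p in hosted) > res.memory:
            problems.append(f"{name} exceeds memory")
    for task, p in placements.items():
        for parent in parents.get(task, []):
            if placements[parent].finish > p.start:
                problems.append(f"{task} starts before parent {parent} finishes")
    return problems


resources = {"R1": Resource(True, 1000, 4e9), "R2": Resource(False, 500, 2e9)}
placements = {
    "t1": Placement("R1", 0, 2, 400, 1e9),
    "t2": Placement("R2", 1, 4, 300, 1e9),
}
problems = violations(placements, resources, parents={"t2": ["t1"]})
```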

In multi-objective optimization, multiple solutions are generated rather than a single solution. These solutions form a set called the Pareto optimal set. Pareto optimization is widely used to solve multi-objective optimization problems with conflicting objectives. A solution S is chosen only if no other solution is better than S when all objectives are taken into account. If S is worse than some solution S0 with respect to one objective, S is still chosen provided that it is better than S0 with respect to some other objective. Hence every Pareto optimal solution is good with respect to some optimization criterion. The Pareto approach is chosen to provide as many non-dominated solutions as possible. A solution is said to be non-dominated if no other solution is at least as good in every objective and strictly better in at least one. Therefore, finding the Pareto optimal set of a problem is the main concern of multi-objective optimization.
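The dominance test behind the Pareto optimal set can be sketched in a few lines of Python. The candidate values are hypothetical; each tuple holds (makespan, flowtime, negated resource utilization), where utilization is negated so that all three entries are minimized.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate schedules: (makespan, flowtime, -utilization).
candidates = [
    (10, 25, -0.50),
    (12, 22, -0.55),
    (12, 26, -0.45),
    (10, 25, -0.40),
]
front = pareto_front(candidates)
```

The first two candidates survive because each is better than the other in at least one objective; the last two are dominated by the first.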

III. CONCLUSION

Schedulers are essential for cloud computing. They determine the processing resource to which each task of a workflow should be allocated. Task scheduling in the cloud is multi-objective in its general formulation. This paper presents a general multi-objective dependent task scheduling model and the main characteristics to be considered when scheduling tasks. The authors are working on the programming model and its deployment in the IaaS cloud model.

REFERENCES

[1] S. Selvarani, G. Sudha Sadhasivam, “Improved Cost-Based Algorithm for Task Scheduling in Cloud Computing,” IEEE Xplore, Jan 2011.

[2] Shawish, A., & Salama, M., “Cloud computing: paradigms and technologies,” in Inter-cooperative Collective Intelligence: Techniques and Applications, pp. 39-67, Springer Berlin Heidelberg, 2014.

[3] Ferguson, D., Yemini, Y., Nikolaou, C., “Microeconomic for load balancing in distributed computer Systems,” in Proceeding of the Eighth International Conference on Distributed Systems, San Jose, pp. 491–499, IEEE Press, 1988.

[4] Mehwish Awan, Munam Ali Shah, “A Survey on Task Scheduling Algorithms in Cloud Computing Environment,” International Journal of Computer and Information Technology (ISSN: 2279 – 0764), Volume 04 – Issue 02, March 2015.

[5] Mala Kalra, Sarbjeet Singh, “A review of metaheuristic scheduling techniques in cloud computing,” Egyptian Informatics Journal, 16, pp. 275–295, 2015.

[6] Paul M. and Sanyal G., “Survey and Analysis of Optimal Scheduling Strategies in Cloud Environment,” in Proceedings of the IEEE International Conference on Information and Communication Technologies, Georgia, USA, pp. 789-792, 2012.

[7] Qiyi H. and Tinglei H., “An Optimistic Job Scheduling Strategy based on QoS for Cloud Computing,” in Proceedings of the IEEE International Conference on Intelligent Computing and Integrated Systems, Guilin, China, pp. 673-675, 2010.

[8] Esma Insaf Djebbar, Ghalem Belalem, “Tasks Scheduling and Resource Allocation for High Data Management in Scientific Cloud Computing Environment,” in S. Boumerdassi et al. (Eds.): MSPN 2016, LNCS 10026, pp. 16–27, Springer International Publishing AG, 2016.

[9] Ruby Annette J, Aisha Banu W, Shriram, “A Taxonomy and Survey of Scheduling Algorithms in Cloud: based on Task Dependency,” International Journal of Computer Applications (0975 – 8887), Volume 82 – No 15, November 2013.

[10] Jayaswal S, Agarwal P, “Balancing U-shaped assembly lines with resource dependent task times: A Simulated Annealing approach,” Journal of Manufacturing Systems, 33(4): 522-534, 2014.

[11] K. Nithyanandakumari, Dr. S. Sivakumar, “A study on DAG model for Task Scheduling in Cloud Environment,” 2017 International Conference on Advanced Computing and Communication Systems (ICACCS-2017), Jan. 06 – 07, 2017, Coimbatore, India. IEEE ISBN No. 978-1-5090-4558-7.

[12] Jia Yu, Rajkumar Buyya and Kotagiri Ramamohanarao, “Workflow Scheduling Algorithms for Grid Computing,” Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, VIC 3010, Australia.

[13] Tingting Wang, Zhaobin Liu, Yi Chen, Yujie Xu, Xiaoming Dai, “Load Balancing Task Scheduling based on Genetic Algorithm in Cloud Computing,” IEEE 12th International Conference on Dependable, Autonomic and Secure Computing, 978-1-4799-5079-9/14, © 2014 IEEE.

[14] Ritu Garg, Awadhesh Kumar Singh, “Multi-Objective Optimization to Workflow Grid scheduling using Reference Point based Evolutionary Algorithm,” International Journal of Computer Applications (0975 – 8887), Volume 22 – No. 6, May 2011.

[15] Xhafa F, Abraham A, “Computational models and heuristic methods for Grid scheduling problems,” Future Generation Computer Systems, 26:608–21, 2010. http://dx.doi.org/10.1016/j.future.2009.11.005.

