Simulation of a Scheduling Strategy for Dependent Tasks in … · 2017-05-31 · Simulation of a...
Simulation of a Scheduling Strategy for Dependent Tasks in Cloud Computing
K. Nithyanandakumari¹, S. Sivakumar²
¹,²Department of Computer Science, Cardamom Planters’ Association College,
Bodinayakanur, Tamilnadu
Abstract
Cloud computing is an appealing paradigm that supports better utilization of
information technology (IT) infrastructure, services and applications. Task scheduling is one
of the most important parts of cloud computing. A scheduling problem involves tasks that must
be scheduled on resources, subject to some constraints, to optimize an objective function.
Scheduling strategies are divided into dependent task scheduling and independent task
scheduling. Dependency means that the tasks are executed in some precedence order, i.e.,
a task can only be scheduled after all its parent tasks are completed. For dependent task
scheduling, the relation between the tasks is typically represented by a task graph or
precedence graph. This research work simulates a scheduling strategy that specifies when
and on which resource each task will be executed. The objective is to maximize the system
throughput by assigning each task to a suitable processor, maximizing resource
utilization, and minimizing execution time.
Keywords: Cloud computing, Task scheduling, Task Dependency, Scheduling strategy.
I. INTRODUCTION
Cloud computing is a modern computing paradigm, refined from grid computing, parallel
computing and distributed computing, which provides dynamic services over the Internet
[1]. The continuous services offered by cloud computing are delivered through its
global data centers, which are built on virtualized compute and storage technologies [2]. Using
cloud computing technology, one can lease the required computing resources, software or
a development platform from the cloud service provider and pay as per the usage. Different
resources and services are offered to different users by cloud service providers owing to the
diversified, dynamic and flexible nature of the cloud [3]. The goal of a cloud service provider
is to dynamically provide a widely distributed set of services, viz. hardware, software,
applications, huge data storage, virtualized resources, high throughput, improved resource
utilization, high reliability, Quality of Service (QoS), built-in disaster recovery and high
efficiency, to the customers. To achieve these objectives, one of the most essential
components of cloud computing is task scheduling [4].
Task scheduling allocates resources among given tasks in a finite time to
achieve the desired quality of service. The goal of scheduling is to map tasks to appropriate
resources so as to optimize one or more objectives. Formally, a scheduling problem involves
tasks that must be scheduled on resources, subject to some constraints, to optimize some
objective function [5]. Efficient scheduling is necessary for both cloud service users and
providers. Scheduling is categorized into user level and system level. At user level, scheduling
deals with problems raised by service provisioning between providers and customers.
System-level scheduling handles resource management within data centers [6-7]. The scheduling
process in the cloud can be generalized into three steps [8]:
a. Resource discovering and filtering: Data center broker discovers the resources present in
the network system and collects status information related to them.
b. Resource selection: Target resource is selected based on certain parameters of task and
resource.
c. Task submission: Task is submitted to the selected resource.
International Journal of Computational and Applied Mathematics. ISSN 1819-4966 Volume 12, Number 1 (2017) © Research India Publications http://www.ripublication.com
Further, scheduling algorithms differ based on the dependency among the tasks to be scheduled.
Task dependency is one of the important factors with a high impact on the selection of the
scheduling strategy. Based on dependency, tasks may be classified as independent or
dependent. Independent tasks have no dependencies among them and no precedence order to be
followed during scheduling; scheduling them is called independent task
scheduling. In contrast, dependent tasks have a task-precedence order that must be met during
the scheduling process [9]. Scheduling of dependent tasks is also known as workflow
scheduling. Dependent tasks are tasks in the user-submitted task sets that have a dependency
relationship with each other: before a subtask can be executed, the execution results of its
parent tasks must be known [10]. This relationship can be represented by a Directed Acyclic
Graph (DAG) in which nodes represent the tasks and edges represent the communication
overhead, as shown in Fig. 1 [11]. Proper scheduling can have a significant impact on the
performance of the system.
Figure 1. General DAG model
In line with that, this paper proposes a model for dependent task scheduling in cloud
computing that minimizes the overall execution time subject to QoS-based constraints, viz.
precedence constraints.
II. SYSTEM MODEL FOR DEPENDENT TASK SCHEDULING
Scheduling is a process that maps and manages the execution of dependent tasks on
distributed resources. It allocates suitable resources to workflow tasks so that the execution
can be completed to satisfy the objective functions subject to the given constraints. Fig. 2
shows the system model for dependent task scheduling. A given large task is divided into
subtasks by the task partitioning module, and the subtasks are given to the load leveller. The
load leveller uses a distribution method that assigns tasks across all the resources according
to the computing capability of the resources and the task requirements, in order to balance
the load and avoid congestion.
The load leveller has access to the information on available resources from the resource
monitoring module, which monitors the resources of the entire datacenter. The assignment of
tasks to appropriate resources is called matching of tasks. Finally, the scheduler allocates
resources to tasks for execution. This allocation is called mapping. In this scheduling model
the tasks are scheduled individually as they are received, without waiting for the next batch
time interval.
Figure 2. System model for dependent task scheduling
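The load leveller's behaviour of assigning each task on arrival according to resource capability can be illustrated with a small sketch. The paper does not name the distribution method, so the greedy "earliest estimated completion" rule below is an assumption chosen for illustration, as are the function name `assign_online`, the task lengths in million instructions (MI) and the resource speeds. The sketch also ignores precedence constraints.

```python
def assign_online(arrivals, speeds):
    """Assign each task, in arrival order, to the resource that would finish it first.

    arrivals: list of (task_name, length_in_MI), hypothetical workload
    speeds:   dict resource_name -> speed in MIPS, hypothetical resources
    Returns a list of (task, resource, start, finish) tuples.
    """
    ready_at = {r: 0.0 for r in speeds}  # time at which each resource becomes free
    plan = []
    for name, length in arrivals:
        # Estimated completion = time the resource is free + execution time on it.
        # The resource name breaks ties so the result is deterministic.
        best = min(speeds, key=lambda r: (ready_at[r] + length / speeds[r], r))
        start = ready_at[best]
        ready_at[best] = start + length / speeds[best]
        plan.append((name, best, start, ready_at[best]))
    return plan


speeds = {"R1": 1000.0, "R2": 500.0}
arrivals = [("a", 4000.0), ("b", 2000.0), ("c", 1000.0)]
plan = assign_online(arrivals, speeds)
```

With these made-up numbers, task "a" goes to the faster resource R1, task "b" goes to R2 because R1 is already busy, and task "c" returns to R1, which is how a capability-aware rule spreads load without batching.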
Let $T = \{T_1, T_2, T_3, \ldots, T_n\}$ be the set of tasks. A task $T_i$ is usually made of
several subtasks $\{t_1, t_2, t_3, \ldots, t_k\}$, each of which represents a computational and
indivisible schedulable unit.
We consider the subtasks to be parts of large tasks. The characteristics of subtasks are:
• Non-preemptive: A subtask, once started, should be completed entirely on its resource.
• Precedence relation between tasks: Specifies an order in which the subtasks
can be processed (Dependency Constraint).
Such precedence relations are described by a four-tuple that represents the DAG $G$.
Let $G = (V, E, W, C)$, where $V = \{t_1, t_2, \ldots, t_k\}$ is the set of subtasks and the pair
$(i, j) \in E$ iff $t_i$ must be completed before $t_j$.
$W = \{w_i\}$ is the collection of execution times, where $w_i$ is the execution time of $t_i$.
$C = \{c_{i,j}\}$ is the collection of communication times, where $c_{i,j}$ is the
communication cost from task $t_i$ to $t_j$.
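As a minimal sketch of the four-tuple, the graph can be held in two dictionaries and checked for acyclicity with a topological sort. The subtask names, execution times and communication costs below are made-up illustration values, and the function name `topological_order` is hypothetical; the sort itself is Kahn's algorithm.

```python
from collections import deque

# Hypothetical example values for G = (V, E, W, C).
W = {"t1": 3.0, "t2": 2.0, "t3": 4.0, "t4": 1.0}        # w_i: execution time of t_i
C = {("t1", "t2"): 1.0, ("t1", "t3"): 2.0,
     ("t2", "t4"): 1.5, ("t3", "t4"): 0.5}              # c_ij: communication cost t_i -> t_j
V = set(W)                                               # subtasks
E = set(C)                                               # (i, j): t_i must finish before t_j


def topological_order(V, E):
    """Return the subtasks in an order that respects every edge of E (Kahn's algorithm)."""
    indegree = {v: 0 for v in V}
    for _, j in E:
        indegree[j] += 1
    ready = deque(sorted(v for v in V if indegree[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for i, j in sorted(E):
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    if len(order) != len(V):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


order = topological_order(V, E)  # ['t1', 't2', 't3', 't4'] for the values above
```

Any order returned this way is a valid execution sequence on a single resource; a cycle in the precedence relation is reported as an error because no such sequence exists.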
Let $R = \{R_1, R_2, R_3, \ldots, R_l\}$ be the set of resources. A resource is a basic
computational entity where the tasks are scheduled, allocated and processed accordingly. A set
of characteristics can be attached to each resource $R_j \in R$ and represented as
$RC_j = \{s, m, d\}$ where
s - Processing speed of the resource in Million Instructions Per Second (MIPS).
m - Available memory in bytes.
d - Available disk space.
In addition, resource pre-reservation is also needed because of task dependencies, to assure
the correct execution of the workflow.
In order to describe the mathematical model, some notations are given in this section.
• pred($t_i$): the collection of the predecessors of task $t_i$; npred($t_i$) is the number of
predecessors of task $t_i$.
• succ($t_i$): the collection of the successors of task $t_i$; nsucc($t_i$) is the number of
successors of task $t_i$.
• est($t_i$): the earliest start time of task $t_i$.
• F($t_i$): the finishing time of task $t_i$.
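The notation can be made concrete with a short sketch that derives pred, succ, est and F from the execution times and communication costs. The paper does not give a formula for est, so the rule used here is an assumption: a subtask may start once every predecessor has finished and its data has arrived, i.e. est(t) is the maximum of F(p) + c(p, t) over all predecessors p, with unlimited resources. The example values and the function name `earliest_times` are hypothetical.

```python
# Hypothetical example values: execution times w_i and communication costs c_ij.
W = {"t1": 3.0, "t2": 2.0, "t3": 4.0, "t4": 1.0}
C = {("t1", "t2"): 1.0, ("t1", "t3"): 2.0,
     ("t2", "t4"): 1.5, ("t3", "t4"): 0.5}


def pred(t, C):
    """Predecessors of t: every i with an edge (i, t)."""
    return sorted(i for (i, j) in C if j == t)


def succ(t, C):
    """Successors of t: every j with an edge (t, j)."""
    return sorted(j for (i, j) in C if i == t)


def earliest_times(W, C):
    """Compute est(t) and F(t) for every subtask, assuming unlimited resources."""
    est, F = {}, {}
    remaining = set(W)
    while remaining:
        # A subtask is ready when all of its predecessors already have a finish time.
        ready = [t for t in sorted(remaining) if all(p in F for p in pred(t, C))]
        if not ready:
            raise ValueError("precedence relation contains a cycle")
        for t in ready:
            est[t] = max((F[p] + C[(p, t)] for p in pred(t, C)), default=0.0)
            F[t] = est[t] + W[t]
            remaining.remove(t)
    return est, F


est, F = earliest_times(W, C)
npred_t4 = len(pred("t4", C))   # 2
nsucc_t1 = len(succ("t1", C))   # 2
```

For these values the last subtask t4 cannot start before 9.5, because it must wait for t3 (finishing at 9.0) plus the 0.5 communication cost, and it finishes at 10.5.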
Optimization Criteria
Task scheduling is a multi-objective combinatorial optimization problem. It can be defined
as the problem of simultaneously minimizing or maximizing multiple conflicting
objectives. This scheduling strategy considers two minimization objectives and
one maximization objective. They are:
• Makespan: Indicates the finishing time of the last subtask. The most popular
optimization criterion while scheduling tasks is to minimize makespan.
$$\text{Makespan} = \max_{1 \le i \le k} F(t_i)$$
where $F(t_i)$ indicates the finishing time of subtask $t_i$.
• Flowtime: It is the sum of the finishing times of all the tasks. To minimize the
flowtime, tasks should be executed in ascending order of their processing
time.
$$\text{Flowtime} = \sum_{i=1}^{k} F(t_i)$$
where $F(t_i)$ indicates the finishing time of subtask $t_i$.
• Resource utilization: Another important criterion is maximization of resource
utilization, i.e. keeping resources as busy as possible.
$$\text{Resource utilization} = \frac{\sum_{i=1}^{l} (\text{time taken by resource } i \text{ to finish all tasks})}{\text{Makespan} \times l}$$
where $l$ is the number of processors.
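The three criteria can be evaluated directly from a finished schedule, as in the sketch below. The schedule data and function names are hypothetical. One interpretation is assumed: "time taken by resource i to finish all tasks" is read as the total busy time of resource i (the sum of the durations of the subtasks it ran), not the instant at which it became idle.

```python
from collections import defaultdict

# Hypothetical schedule: (subtask, resource, start, finish).
schedule = [
    ("t1", "R1", 0.0, 4.0),
    ("t2", "R2", 0.0, 2.0),
    ("t3", "R2", 4.0, 8.0),
    ("t4", "R1", 4.0, 6.0),
]
resources = ["R1", "R2"]


def makespan(schedule):
    """Finishing time of the last subtask."""
    return max(finish for _, _, _, finish in schedule)


def flowtime(schedule):
    """Sum of the finishing times of all subtasks."""
    return sum(finish for _, _, _, finish in schedule)


def resource_utilization(schedule, resources):
    """Total busy time over all resources divided by (makespan * number of resources)."""
    busy = defaultdict(float)
    for _, resource, start, finish in schedule:
        busy[resource] += finish - start
    total_busy = sum(busy[r] for r in resources)
    return total_busy / (makespan(schedule) * len(resources))


m = makespan(schedule)                          # 8.0
f = flowtime(schedule)                          # 20.0
u = resource_utilization(schedule, resources)   # 0.75
```

Here each resource is busy for 6 of the 8 time units, so utilization is 12 / (8 × 2) = 0.75; the idle gap on R2 between 2.0 and 4.0 is what keeps it below 1.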
The objectives defined above must be attained subject to the following constraints.
• At any instant, a task executes on only one resource.
• A task is hosted only on a resource that is operating.
• The total processing requirements of all tasks hosted on a resource should not
exceed the maximum processing capacity of that resource.
• The total memory requirements of all tasks hosted on a resource should not exceed
the maximum memory available with that resource.
• If there are precedence orders among tasks, then a task cannot be scheduled
until all of its parent tasks are finished.
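A feasibility check for these constraints might look like the following sketch. All names, data layouts and numbers are assumptions made for illustration: each task carries a hypothetical processing requirement (`mips`) and memory requirement (`mem`), each resource carries its speed `s`, memory `m` and an `up` flag, and a placement maps every task to exactly one resource, which covers the first constraint by construction.

```python
def violations(tasks, resources, placement, parents):
    """Return a list of constraint violations; an empty list means the placement is feasible.

    tasks:     task -> {"mips": processing requirement, "mem": memory requirement}
    resources: resource -> {"s": capacity in MIPS, "m": memory, "up": operating flag}
    placement: task -> (resource, start, finish); one entry per task
    parents:   task -> list of parent tasks
    """
    problems = []
    load = {r: {"mips": 0.0, "mem": 0.0} for r in resources}
    for task, (resource, start, _finish) in placement.items():
        if not resources[resource]["up"]:
            problems.append(f"{task}: resource {resource} is not operating")
        load[resource]["mips"] += tasks[task]["mips"]
        load[resource]["mem"] += tasks[task]["mem"]
        for parent in parents.get(task, []):
            # A task may start only after each parent has been placed and has finished.
            if parent not in placement or placement[parent][2] > start:
                problems.append(f"{task}: starts before parent {parent} finishes")
    for resource, used in load.items():
        if used["mips"] > resources[resource]["s"]:
            problems.append(f"{resource}: processing capacity exceeded")
        if used["mem"] > resources[resource]["m"]:
            problems.append(f"{resource}: memory exceeded")
    return problems


tasks = {"t1": {"mips": 300, "mem": 512}, "t2": {"mips": 500, "mem": 256}}
resources = {"R1": {"s": 1000, "m": 1024, "up": True},
             "R2": {"s": 400, "m": 512, "up": False}}
parents = {"t2": ["t1"]}

feasible = {"t1": ("R1", 0.0, 3.0), "t2": ("R1", 3.0, 5.0)}
infeasible = {"t1": ("R1", 0.0, 3.0), "t2": ("R2", 2.0, 5.0)}

ok = violations(tasks, resources, feasible, parents)
bad = violations(tasks, resources, infeasible, parents)
```

The second placement breaks three constraints at once: R2 is not operating, t2 starts before its parent t1 finishes, and t2 asks for more processing capacity than R2 offers.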
In multi-objective optimization, a set of solutions is generated rather
than a single solution. This set is called the Pareto optimal set. Pareto
optimization is widely used to solve multi-objective optimization problems with conflicting
objectives. A solution S is chosen only if no other solution is better than S
when all objectives are taken into account. If S is worse than some solution S0 with
respect to one objective, S is still chosen provided that it is better than S0 with respect to
some other objective. Hence every Pareto optimal solution is good with respect to some
optimization criterion. The Pareto approach is chosen to provide as many non-dominated
solutions as possible. A solution is said to be non-dominated if no other solution is at least
as good in every objective and strictly better in at least one. Therefore, finding the Pareto
optimal set of a problem is the main concern of multi-objective optimization.
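Non-dominance can be checked mechanically, as in the sketch below. The candidate schedules and their scores are made-up values, and the function names are hypothetical. Each candidate is scored as (makespan, flowtime, resource utilization); utilization is negated so that all three components are minimized.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one.

    Both vectors hold objectives to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objective vector is not dominated by any other."""
    return [name for name, vec in candidates.items()
            if not any(dominates(other, vec) for other in candidates.values())]


# Hypothetical candidate schedules: (makespan, flowtime, resource utilization).
scores = {
    "A": (8.0, 20.0, 0.75),
    "B": (9.0, 18.0, 0.70),
    "C": (10.0, 22.0, 0.60),
    "D": (8.0, 21.0, 0.75),
}
# Negate utilization so that every component is a minimization objective.
minimized = {name: (mk, fl, -ut) for name, (mk, fl, ut) in scores.items()}

front = pareto_front(minimized)   # ['A', 'B']
```

Schedule A dominates both C and D, while A and B are mutually non-dominated: A has the shorter makespan and higher utilization, B has the smaller flowtime. Both therefore belong to the Pareto optimal set.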
III. CONCLUSION
Schedulers are essential for cloud computing. A scheduler determines the processing resource
on which each task of a workflow should be allocated. Task scheduling in the cloud is
multi-objective in its general formulation. This paper illustrates a general multi-objective
dependent task scheduling model, presenting the main characteristics to be considered when
scheduling tasks. The authors are working on the programming model and its deployment in the
IaaS cloud model.
REFERENCES
[1] S. Selvarani, G. Sudha Sadhasivam, “Improved Cost-Based Algorithm For Task Scheduling In
Cloud Computing,” IEEE Xplore, Jan 2011.
[2] Shawish, A., & Salama, M. Cloud computing: paradigms and technologies. In Inter-cooperative
Collective Intelligence: Techniques and Applications (pp. 39-67). Springer Berlin
Heidelberg, 2014.
[3] Ferguson, D., Yemini, Y., Nikolaou, C.: Microeconomic for load balancing in distributed
computer Systems. In: Proceeding of the Eighth International Conference on Distributed
Systems. San Jose, pp. 491–499. IEEE Press (1988).
[4] Mehwish Awan, Munam Ali Shah, “A Survey on Task Scheduling Algorithms in Cloud
Computing Environment,” International Journal of Computer and Information
Technology (ISSN: 2279 – 0764), Volume 04 – Issue 02, March 2015.
[5] Mala Kalra, Sarbjeet Singh, “A review of metaheuristic scheduling techniques in cloud
computing,” Egyptian Informatics Journal (2015) 16, pp. 275–295.
[6] Paul M. and Sanyal G., “Survey and Analysis of Optimal Scheduling Strategies in
Cloud Environment,” in Proceedings of the IEEE International Conference on Information and
Communication Technologies, Georgia, USA, pp. 789-792, 2012.
[7] Qiyi H. and Tinglei H., “An Optimistic Job Scheduling Strategy based on QoS for Cloud
Computing,” in Proceedings of the IEEE International Conference on Intelligent Computing
and Integrated Systems, Guilin, China, pp. 673-675, 2010.
[8] Esma Insaf Djebbar, Ghalem Belalem, “Tasks Scheduling and Resource Allocation for High
Data Management in Scientific Cloud Computing Environment,” in S. Boumerdassi et al. (Eds.):
MSPN 2016, LNCS 10026, pp. 16–27, Springer International Publishing AG, 2016.
[9] Ruby Annette J, Aisha Banu W, Shriram, “A Taxonomy and Survey of Scheduling Algorithms
in Cloud: based on Task Dependency,” International Journal of Computer Applications (0975 –
8887), Volume 82 – No 15, November 2013.
[10] Jayaswal S, Agarwal P, “Balancing U-shaped assembly lines with resource dependent task
times: A Simulated Annealing approach,” Journal of Manufacturing Systems, 2014, 33(4):
522-534.
[11] K. Nithyanandakumari, Dr. S. Sivakumar, “A study on DAG model for Task Scheduling in
Cloud Environment,” 2017 International Conference on Advanced Computing and
Communication Systems (ICACCS-2017), Jan. 06 – 07, 2017, Coimbatore, INDIA. IEEE
ISBN No. 978-1-5090-4558-7.
[12] Jia Yu, Rajkumar Buyya and Kotagiri Ramamohanarao, “Workflow Scheduling Algorithms for
Grid Computing,” Grid Computing and Distributed Systems (GRIDS) Laboratory, Department
of Computer Science and Software Engineering, The University of Melbourne, VIC 3010 Australia.
[13] Tingting Wang, Zhaobin Liu, Yi Chen, Yujie Xu, Xiaoming Dai, “Load Balancing Task
Scheduling based on Genetic Algorithm in Cloud Computing,” IEEE 12th International
Conference on Dependable, Autonomic and Secure Computing, IEEE, 2014.
[14] Ritu Garg, Awadhesh Kumar Singh, “Multi-Objective Optimization to Workflow
Grid Scheduling using Reference Point based Evolutionary Algorithm,” International Journal of
Computer Applications (0975 – 8887), Volume 22 – No. 6, May 2011.
[15] Xhafa F, Abraham A. Computational models and heuristic methods for Grid scheduling
problems. Future Generation Computer Systems 2010;26:608–21. http://dx.doi.org/10.1016/j.future.2009.11.005.