[IEEE 2008 Second International Conference on Future Generation Communication and Networking Symposia (FGCNS) - Hinan, China (2008.12.13-2008.12.15)]

A Hybrid Task Scheduling for Multi-Core Platform

Liang-Teh Lee*, Huang-Yuan Chang*†, and Shu-Wei Chao*
*Dept. of Computer Science and Engineering, Tatung University, Taiwan

†Dept. of Electronic Engineering, Technology and Science Institute of Northern Taiwan, Taiwan [email protected], [email protected], [email protected]

Abstract

With the advancement of technology, various services, such as Mobile Computing and Digital Home, can be combined in a single computer system. With their compact size, low power consumption, and full computing power, multi-core processor systems can be used to fulfill these kinds of applications. For a hybrid task set comprising large numbers of real-time and non-real-time applications, however, current operating systems have no scheduling policy that effectively improves performance on multi-core platforms. Operating-system task scheduling designed for general-purpose applications cannot satisfy the demands of hybrid task sets, especially those with a large number of real-time tasks. In this paper, we propose a hybrid task scheduling scheme for multi-core platforms. In the proposed scheme, a two-level hierarchical scheduling is applied to coordinate real-time and non-real-time tasks. It not only maintains the response time of general tasks, but also supports the real-time requirements of the other tasks. The experimental results show that higher efficiency can be obtained by applying the proposed scheme on multi-core architectures.

1. Introduction

With the advancement of technology, a wide range of digital products and services is rapidly gaining popularity. Facing such a variety of applications, users need a simpler and more convenient environment. Nowadays various services, such as Mobile Computing and Digital Home, can be combined in a single computer system. A multi-core CPU (or chip-level multiprocessor, CMP) [1] combines two or more independent cores into a single package composed of a single integrated circuit. It provides compact size, low power consumption, and full computing power to fulfill this kind of application. Such chips include Intel’s Core 2 Duo and Core 2 Quad, IBM’s PowerPC, and AMD’s Opteron. The extent of their popularity is also the biggest advantage of such platforms.

However, current operating systems have no scheduling policy that effectively improves performance on multi-core platforms. Operating-system task scheduling designed for general-purpose applications cannot satisfy the demands of hybrid task sets, especially those with a large number of real-time tasks. Additionally, systems also need to support non-real-time tasks, as in a Digital Home: VOD (Video On Demand) imposes real-time requirements for multimedia applications, environment sensing needs to detect changes in temperature or humidity, and the user interface provides processing capability for one or more user terminals. For such a hybrid set that includes both real-time and non-real-time tasks, using only general-purpose scheduling or traditional real-time scheduling cannot achieve the performance the system should deliver.

In this paper, we propose a method to solve this problem. In the proposed scheme, a two-level hierarchical scheduling is applied to coordinate real-time and non-real-time tasks. It not only maintains the response time of general tasks, but also supports the real-time requirements of the other tasks. The experimental results show that higher efficiency can be obtained by applying the proposed scheme on multi-core architectures. To reasonably constrain the discussion, we henceforth limit attention to the multi-core architecture shown in Fig. 1, wherein all cores are symmetric and share a chip-wide L2 cache.

Figure 1. Multi-core architecture: M symmetric cores (Core 1, …, Core M), each with a private L1 cache, sharing a chip-wide L2 cache.

2008 Second International Conference on Future Generation Communication and Networking Symposia

978-0-7695-3546-3/08 $25.00 © 2008 IEEE

DOI 10.1109/FGCNS.2008.152



Multiprocessor real-time scheduling can be divided into three major categories according to the degree of run-time migration that is allowed for job instances of a task across processors [2]: full migration, no migration, and restricted migration. With no migration, tasks are statically (off-line) partitioned and allocated to processors. At run time, job instances of tasks are scheduled on their respective processors by each processor's local scheduling algorithm, as in single-processor scheduling. Conversely, with full migration, jobs are allowed to migrate arbitrarily across processors during their execution. This usually implies a global scheduling strategy, where a single shared scheduling queue is maintained for all processors and a processor-wide scheduling decision is made by a single (global) scheduling algorithm. With restricted migration, some limited form of migration, such as at job boundaries, is allowed.

The partitioned scheduling paradigm is easier to put into practice, because once tasks are allocated to processors, the multiprocessor real-time scheduling problem becomes a set of single-processor real-time scheduling problems, one for each processor, which have been well studied and for which optimal algorithms exist. The computational complexity of partitioned scheduling is the same as that of single-processor scheduling, both for the algorithms and for the schedulability tests. However, when task sets can change, the partitioned scheduling paradigm is analogous to the bin-packing problem, which is known to be NP-hard in the strong sense. Therefore, the partitioned scheduling paradigm cannot easily reach a high CPU utilization rate.

The global scheduling paradigm can achieve a 100% CPU utilization rate in the ideal situation, but it suffers from frequent task migrations, context switches, and the high computational complexity of its algorithms. These overheads have a high impact on performance.

In view of this, we propose a two-level hierarchical scheduling. At the first level, it uses a deferred server (DS) [3] to coordinate real-time and non-real-time tasks. At the second level, it schedules the real-time tasks so as to maintain the response time of general tasks while supporting the real-time requirements of the other tasks. It has low computational complexity, yet it can attain a high CPU utilization rate like global scheduling. The experimental results show that higher efficiency can be obtained by applying the proposed scheme on multi-core architectures.

In other related research, the Pfair (proportionate fairness) [4][5][6] class of algorithms, which allow full migration and fully dynamic priorities, has been shown to be theoretically optimal. However, these algorithms incur significant run-time overhead due to their quantum-based scheduling approach.

Enrico Bini [7] proposed an approach for analyzing the schedulability of periodic task sets under the Rate Monotonic priority assignment. Using this approach, a new schedulability test was derived, called δ-HET (Hyperplanes δ-Exact Test), which can be tuned through a parameter to balance complexity versus acceptance ratio. In work on single-processor systems that support simultaneous multithreading (SMT), SMT co-scheduling algorithms can also improve processor throughput [8]. In addition, an approach for supporting soft real-time periodic tasks on performance-asymmetric multi-core platforms has been proposed [9].

Starting from general multiprocessor scheduling, we discuss how to increase the efficiency with which the features of multi-core architectures are used. The rest of this paper is organized as follows. In Section 2, we present a brief overview of real-time scheduling. In Section 3, we describe our proposed scheduling on multi-core platforms. In Section 4, we present experimental results, and in Section 5, we conclude.

2. Background

For the real-time system, we assume that the scheduler allocates N successive periodic tasks to M processors. The kth processor is denoted as Pk, where k is in [1, M]. The ith task is denoted as Ti, where i is in [1, N]. Each task Ti is defined by Ti(ei, pi, di, ri), where ei is the worst-case execution time, pi is the period, di is the deadline, and ri is the release time. Then Ui = ei/pi denotes the processor utilization of Ti. For simplicity, we consider pi = di and ei <= pi. Summarizing the above, task Ti can be characterized as Ti(ei, pi, ri).
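The task model above can be sketched in code. This is a minimal illustration of the notation, not part of the paper's implementation; the class and field names are our own:

```python
from dataclasses import dataclass

@dataclass
class Task:
    e: float  # worst-case execution time e_i
    p: float  # period p_i; the deadline d_i equals p_i
    r: float  # release time r_i

    @property
    def utilization(self) -> float:
        # U_i = e_i / p_i
        return self.e / self.p

# Example: task T1(6, 9, 0) used later in this section.
t1 = Task(e=6, p=9, r=0)
```

A task set is then simply a list of such records, and its total weight is the sum of the individual utilizations.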

In a real-time system, an optimal scheduler does not necessarily mean "fastest average response time" or "shortest average waiting time." Rather, it is one which may fail to meet a deadline of a task only if no other scheduler can. EDF (earliest deadline first) and LLF (least laxity first) are optimal algorithms in the uniprocessor environment, but the EDF algorithm is not optimal in the multiprocessor environment.

Figure 2 shows an example of EDF and LLF [10] scheduling a set of three periodic tasks, T1(6,9,0), T2(4,7,0), and T3(4,7,0), executed on two processors. In Figure 2(a), the EDF algorithm assigns priorities to tasks according to their absolute deadlines. At time 9, T1 cannot complete its work before its deadline. In Figure 2(b), the LLF algorithm assigns priorities to tasks according to their relative laxity (pi - ei), and all tasks complete their work before their respective deadlines. Therefore, the EDF



algorithm is no longer an optimal scheduler in the multiprocessor environment.

Although the LLF algorithm is still optimal on the multiprocessor platform, context switches occur frequently. If too much time is spent on context switches, tasks may have no way to complete before their respective deadlines. In addition, the migration of T2 and T3 must also be taken into account.
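The deadline miss in Figure 2(a) can be reproduced with a small unit-quantum simulation of global EDF. This is our own illustrative sketch (the function and variable names are ours, not the paper's), and it assumes all tasks are released at time 0:

```python
def global_edf(tasks, cores, horizon):
    # tasks: list of (e, p) with d_i = p_i, all released at time 0.
    # Unit-quantum global EDF: each step runs the `cores` ready jobs
    # with the earliest absolute deadlines; returns deadline-miss times.
    jobs = [{"rem": e, "dl": p, "e": e, "p": p} for e, p in tasks]
    misses = []
    for t in range(1, horizon + 1):
        ready = sorted((j for j in jobs if j["rem"] > 0), key=lambda j: j["dl"])
        for j in ready[:cores]:
            j["rem"] -= 1                 # execute for one time unit
        for j in jobs:
            if t == j["dl"]:              # period boundary: check, re-release
                if j["rem"] > 0:
                    misses.append(t)
                j["rem"], j["dl"] = j["e"], t + j["p"]
    return misses
```

Running it on the example task set T1(6,9), T2(4,7), T3(4,7) with two cores reports a deadline miss at time 9, matching Figure 2(a), while a uniprocessor task set with total weight below one produces no misses.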

3. Scheduling Algorithms

For scheduling real-time tasks and non-real-time tasks together, our proposed scheduling uses a two-level hierarchical approach. Level 1 is responsible for coordinating real-time and non-real-time scheduling, and it partitions the task set into several groups to facilitate subsequent scheduling. Level 2 is devoted to scheduling real-time tasks. It uses a modified LLF algorithm and task weights to adjust priorities. The quantity ei/pi is the weight, or utilization, of Ti, denoted wt(Ti).

Level 1: Non-real-time tasks can be seen as sporadic tasks, and so we can handle them with a DS (deferred server), which is characterized by a period and a computation time. In Figure 3, the periodic task set is composed of two tasks, T1(3,6,0) and Tds(2,4,0). Tds is the task server and has the highest priority. The non-real-time task set is composed of two tasks, J1(e = 2, r = 1) and J2(e = 3, r = 6). J2 cannot complete its work at once since the replenishment amount is only 2; it is executed to completion when the server replenishes its capacity, i.e., within the interval [8, 9].
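The Figure 3 scenario can be traced with a minimal unit-time sketch. This is our own illustration (the names and the exact tie-breaking are assumptions), under the usual DS rules that the server has the highest priority and is replenished to full capacity at every period boundary:

```python
def simulate_ds(horizon=12):
    # Deferred server Tds: period 4, capacity 2, highest priority.
    # Periodic task T1: execution time 3, period 6.
    # Non-real-time jobs: J1(e=2, r=1) and J2(e=3, r=6).
    cap = 0
    jobs = [{"name": "J1", "rem": 2, "r": 1}, {"name": "J2", "rem": 3, "r": 6}]
    t1_rem = 0
    finish = {}
    for t in range(horizon):
        if t % 4 == 0:
            cap = 2                       # replenish server capacity
        if t % 6 == 0:
            t1_rem = 3                    # release the next job of T1
        pending = [j for j in jobs if j["r"] <= t and j["rem"] > 0]
        if pending and cap > 0:           # server (highest priority) runs
            j = pending[0]
            j["rem"] -= 1
            cap -= 1
            if j["rem"] == 0:
                finish[j["name"]] = t + 1
        elif t1_rem > 0:                  # otherwise T1 runs
            t1_rem -= 1
    return finish
```

With these parameters, J1 finishes at time 3, and J2 exhausts the server's capacity at time 8 and finishes at time 9, consistent with the interval [8, 9] described above.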

For scheduling real-time tasks, we classify all real-time tasks into heavy and light. A task whose weight is greater than 1/2 is heavy, and a task whose weight is smaller than 1/2 is light. We then partition the task set into several groups according to weight. Every group can contain only one heavy task but may contain more than one light task, and we limit the total weight of each group to one. Figure 4 shows an example of this classification, which facilitates the subsequent scheduling.
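The grouping rule above does not specify an allocation order, so the sketch below fills groups first-fit as one plausible reading; the function and the example task set are our own assumptions, not the paper's algorithm:

```python
def classify_and_group(tasks):
    # tasks: list of (name, e, p); weight wt(T) = e/p.
    heavy = [t for t in tasks if t[1] / t[2] > 0.5]
    light = [t for t in tasks if t[1] / t[2] <= 0.5]
    groups = []
    for h in heavy:                       # one heavy task seeds each group
        groups.append({"tasks": [h], "weight": h[1] / h[2]})
    for l in sorted(light, key=lambda t: t[1] / t[2], reverse=True):
        w = l[1] / l[2]
        for g in groups:                  # first-fit: keep group weight <= 1
            if g["weight"] + w <= 1.0:
                g["tasks"].append(l)
                g["weight"] += w
                break
        else:                             # no group fits: open a new one
            groups.append({"tasks": [l], "weight": w})
    return groups

# Example: T1 (weight 0.75) is heavy; T2 (0.25) and T3 (0.4) are light.
groups = classify_and_group([("T1", 3, 4), ("T2", 1, 4), ("T3", 2, 5)])
```

Here T1 seeds the first group, T3 does not fit beside it (0.75 + 0.4 > 1), so it opens a second group, and T2 fills the first group exactly to weight 1.0.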

Level 2: We start to schedule real-time tasks in level 2. According to the result of the classification in level 1, our approach proceeds in three steps. First, priorities are assigned to heavy tasks according to EDF. Then, priorities are assigned to light tasks according to LLF. Finally, light tasks are scheduled while the heavy task that belongs to the same group is executing. Moreover, there is a rule to be followed: if an idle processor exists, the light task with the highest priority can get that processor.
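The two priority rules can be expressed compactly. The sketch below is our own (the names, the dictionary layout, and the heavy-before-light ordering are assumptions): heavy tasks get EDF keys (absolute deadline), light tasks get LLF keys (laxity at the current time):

```python
def priorities(tasks, now):
    # tasks: dicts with name, deadline, remaining, heavy (bool).
    # Heavy tasks are ordered by EDF (earlier absolute deadline first);
    # light tasks by LLF (smaller laxity = deadline - now - remaining first).
    heavy = sorted((t for t in tasks if t["heavy"]), key=lambda t: t["deadline"])
    light = sorted((t for t in tasks if not t["heavy"]),
                   key=lambda t: t["deadline"] - now - t["remaining"])
    return [t["name"] for t in heavy] + [t["name"] for t in light]

tasks = [
    {"name": "H1", "deadline": 9, "remaining": 6, "heavy": True},
    {"name": "L1", "deadline": 7, "remaining": 2, "heavy": False},
    {"name": "L2", "deadline": 7, "remaining": 3, "heavy": False},
]
order = priorities(tasks, 0)
```

At time 0, L2 (laxity 4) precedes L1 (laxity 5) among the light tasks; a light task whose laxity reaches zero would then be first in line to preempt.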

Figure 2. (a) Global EDF and (b) global LLF schedules for a set of three periodic tasks on two processors (P1, P2). Under EDF, T1 misses its deadline at time 9; under LLF, all tasks meet their deadlines.

Figure 3. Deferred server (DS). J1 and J2 are two different non-real-time tasks; T1,1 is the first job of T1.

Figure 4. Example of the classification in level 1



Due to the overhead caused by task migration, we use EDF to schedule heavy tasks: heavy tasks spend more time on migration and context switches than light tasks do, so scheduling them with EDF inhibits this overhead, while light tasks are scheduled with LLF. When the laxity of a light task reaches zero, it can preempt the heavy task that has the lowest priority.

4. Experimental Results

In this section, we first present ideal results from experiments, and then provide results from experiments in which we implement our approach in the SESC simulator [11][12]. SESC is a microprocessor architectural simulator, developed primarily by the i-acoma research group at UIUC and various groups at other universities, that models different processor architectures, such as single processors, chip multiprocessors, and processors-in-memory. It models a full out-of-order pipeline with branch prediction, caches, buses, and every other component of a modern processor necessary for accurate simulation.

In our first experiment, we randomly generated 30 task sets and simulated their scheduling on one-core and two-core systems. Each task was randomly generated such that periods varied from 2 to 10 time units and computation times varied from 1 to 9 time units. The run time of each task set is the least common multiple (LCM) of the task periods. We use the average deadline miss ratio (ADMR) to measure scheduling performance. It can be defined as:

ADMR = ( Σ_{i=1}^{N} Miss_i ) / (total number of executions)
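This experimental setup can be sketched as follows; the function names are our own, and clamping the computation time to at most the period reflects the model's assumption that ei <= pi:

```python
import random

def random_task_set(n):
    # Periods uniform in [2, 10]; computation times in [1, min(9, period)].
    tasks = []
    for _ in range(n):
        p = random.randint(2, 10)
        e = random.randint(1, min(9, p))
        tasks.append((e, p))
    return tasks

def admr(miss_counts, total_executions):
    # Average deadline miss ratio: total deadline misses over
    # the total number of job executions.
    return sum(miss_counts) / total_executions

tasks = random_task_set(30)
```

For instance, if three tasks miss 1, 2, and 3 deadlines over 30 job executions in total, the ADMR is 0.2.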

Figure 5(a) shows the result of our first experiment on a one-core processor. Because EDF and LLF are optimal algorithms in the uniprocessor environment, a task set whose total weight is less than one has no deadline misses. When the total weight of the task set exceeds one, the average deadline miss ratio will

Figure 5. Average deadline miss ratio for EDF, LLF, and the proposed scheduling on (a) a one-core and (b) a two-core processor in the ideal circumstance

Figure 6. Average deadline miss ratio for EDF, LLF, and the proposed scheduling on (a) a one-core and (b) a two-core processor in the second (SESC-based) experiment



rise significantly. Furthermore, we can see that the average deadline miss ratio of LLF exceeds that of EDF when the total weight of the task set is over approximately 1.1. Figure 5(b) shows the result of our first experiment on a two-core processor. Because EDF is not optimal in the multiprocessor environment, deadline misses occur when the total weight of the task set exceeds 1.8.

As in the first experiment, we randomly generated 30 task sets and simulated their scheduling on one-core and two-core systems. In addition, we added a DS for non-real-time tasks and randomly generated a sporadic task. In our second experiment, we set the time unit to one millisecond and ran more realistic and complex experiments by using the SESC simulator. The simulated architecture consists of two cores, each with a dedicated 16K L1 data cache (4-way set associative, random replacement) and a 16K L1 instruction cache (2-way set associative, LRU replacement), and a shared 2048K 8-way set associative on-chip L2 cache with an LRU replacement policy. Each cache has a 64-byte line size.

Figure 6(a) shows the result of our second experiment on a one-core processor. Because of the added non-real-time tasks, the average deadline miss ratios rose. On a two-core processor, the efficiency of our approach is roughly the same as that of LLF when the total weight of the task sets is greater than 1.2. Figure 6(b) shows the result of our second experiment on a two-core processor. Whether applied to a one-core or a two-core processor, the proposed scheduling can reach efficacy as high as LLF's in the low-overhead situation.

Figure 7 shows the average accept ratio of non-real-time tasks under the proposed scheduling on (a) a one-core and (b) a two-core processor when the server capacity is 1, 2, 4, and 8. The average accept ratio is defined as the number of completed non-real-time tasks divided by the total number of non-real-time tasks. From the figure, we can see that one-core and two-core processors have similar average accept ratios. If the server capacity is 1, the capacity is so short that the cost of context switching exceeds the actual computation time. More complex architectures will exacerbate this problem, so general multiprocessor systems will also incur it. In this experiment, we use multi-threading to implement all tasks and run them on multi-core platforms, which reduces the interference caused by task context switches.

5. Conclusions and Future Work

We have presented an approach for supporting soft real-time periodic tasks and non-real-time tasks. It uses a two-level hierarchical scheduling method that decreases the average deadline miss ratio. It not only maintains the response time of general tasks, but also supports the real-time requirements of the other tasks. The experimental results show that higher efficiency can be obtained by applying the proposed scheme on multi-core architectures. By contrast, the Pfair class of algorithms, which allow full migration and fully dynamic priorities, has been shown to be theoretically optimal but incurs significant run-time overhead due to its quantum-based scheduling approach.

We know that modifying the scheduling algorithm alone does not improve performance enough; other means must be used to enhance performance significantly. There are two kinds of methods that we plan to adopt to enhance the whole system's performance.

Besides, for scheduling algorithms, we should consider what to do before scheduling and after

Figure 7. Average accept ratio for the proposed scheduling on (a) a one-core and (b) a two-core processor



scheduling. To maintain the accept ratio of real-time tasks, we can use schedulability analysis to determine whether a task can be safely added to a particular core before scheduling. If applications need to restore the priorities of tasks that have missed their deadlines, a feedback policy can deal with this properly after scheduling. Other modern technology can also help us to improve performance, especially on the hardware architecture side, such as SMT, asymmetric multi-core platforms, functional processors, etc. These research directions will be taken into account in the future.

6. References

[1] R. Kalla, B. Sinharoy, and J. M. Tendler, “IBM Power5 chip: a dual-core multithreaded processor”, IEEE Micro, volume 24, issue 2, Mar.-Apr. 2004, pp. 40–47.
[2] J. Anderson, V. Bud, and U. C. Devi, “An EDF-based scheduling algorithm for multiprocessor soft real-time systems”, IEEE ECRTS, July 2005, pp. 199–208.
[3] J. Strosnider, J. Lehoczky, and L. Sha, “The deferrable server algorithm for enhanced aperiodic responsiveness in hard real-time environments”, IEEE Transactions on Computers, volume 44, issue 1, 1995, pp. 73–91.
[4] S. Baruah, N. Cohen, C. G. Plaxton, and D. Varvel, “Proportionate progress: A notion of fairness in resource allocation”, Algorithmica, volume 15, 1996, pp. 600.
[5] J. Anderson and A. Srinivasan, “A new look at pfair priorities”, Technical Report TR00-023, University of North Carolina at Chapel Hill, Sept. 2000. Available at http://uuu.cs.unc.edu/~anderson/papers.html.
[6] Hyeonjoong Cho, B. Ravindran, and E. D. Jensen, “An Optimal Real-Time Scheduling Algorithm for Multiprocessors”, Proceedings of the IEEE International Real-Time Systems Symposium, 2006, pp. 101–110.
[7] E. Bini and G. C. Buttazzo, “Schedulability analysis of periodic fixed priority systems”, IEEE Transactions on Computers, volume 53, issue 11, Nov. 2004, pp. 1462–1473.
[8] R. Jain, C. Hughes, and S. V. Adve, “Soft real-time scheduling on simultaneous multithreaded processors”, Proceedings of the 23rd Real-Time Systems Symposium, 2002, pp. 134–145.
[9] J. M. Calandrino, D. Baumberger, Tong Li, S. Hahn, and J. H. Anderson, “Soft Real-Time Scheduling on Performance Asymmetric Multicore Platforms”, Proceedings of the Real-Time and Embedded Technology and Applications Symposium, April 2007, pp. 101–112.
[10] F. Cottet, J. Delacroix, C. Kaiser, and Z. Mammeri, Scheduling in Real-Time Systems, John Wiley & Sons, Ltd., 2002, pp. 31–32.

[11] J. Renau, SESC website, http://sesc.sourceforge.net.
[12] Jack E. Veenstra and Robert J. Fowler, “MINT: a front end for efficient simulation of shared-memory multiprocessors”, Proceedings of MASCOTS '94 (Modeling, Analysis, and Simulation of Computer and Telecommunication Systems), 1994, pp. 201–207.
