
An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics

Ching-Chi Lin
Institute of Information Science, Academia Sinica
Department of Computer Science and Information Engineering, National Taiwan University

You-Cheng Syu, Pangfeng Liu
Department of Computer Science and Information Engineering, National Taiwan University
Graduate Institute of Networking and Multimedia, National Taiwan University

Chao-Jui Chang, Jan-Jan Wu
Institute of Information Science, Academia Sinica
Research Center for Information Technology Innovation, Academia Sinica

Po-Wen Cheng, Wei-Te Hsu
Information and Communications Research Laboratories, Industrial Technology Research Institute

Introduction

Modern processors support DVFS on a per-core basis.
◦Dynamic Voltage and Frequency Scaling (DVFS)
For the same core, increasing computing power means higher power consumption.

Challenge

Find a good balance between performance and power consumption.

Two Scenarios

Batch mode
◦A set of computation-intensive tasks with the same arrival time.
Online mode
◦Two types of tasks with different priorities: interactive and non-interactive.
◦Tasks can arrive at any time.

Example: Judge System

Online mode
◦Users submit their code/answers, and wait for their scores.
  ◦Interactive: user requests, such as score querying.
  ◦Non-interactive: processing user submissions.
Batch mode
◦Re-judge and validate all submitted code/answers.

Our Contribution

Present task scheduling strategies that solve three important issues simultaneously.
◦The assignment of tasks to cores.
◦The execution order of tasks on a core.
◦The processing frequency for the execution of each task.

Our Contribution (Cont.)

For batch mode, we propose the Workload Based Greedy algorithm.
For online mode, we propose the Least Marginal Cost heuristic.

Models

Task Model
◦Assume the number of CPU cycles required to complete a task, $L_k$, is known.
◦The arrival time of a task:
  ◦batch mode: 0.
  ◦online mode: known.

Models (Cont.)

Processing frequency
◦Only a set of discrete processing frequencies, $p_i$, is available.
◦The core frequency remains the same while executing a task.

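Models (Cont.)

Power and Performance
◦For a task $j_k$ executed at frequency $p_k$, the execution time $t_k$ and energy consumption $e_k$ are

$$t_k = L_k \, T(p_k)$$

$$e_k = L_k \, E(p_k)$$

◦$E(p_k)$ and $T(p_k)$ are the energy and time required to execute one cycle with frequency $p_k$.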
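As a small illustration of the model above, the following Python sketch computes $t_k = L_k T(p_k)$ and $e_k = L_k E(p_k)$ for one task. The frequency table `FREQS` and its per-cycle figures are hypothetical values chosen for readability; they are not taken from the slides or from any real processor.

```python
# Hypothetical table (an assumption, not from the slides):
# frequency in GHz -> (T: seconds per cycle, E: joules per cycle)
FREQS = {
    1.0: (1.0e-9, 1.0e-9),
    2.0: (0.5e-9, 2.0e-9),
    4.0: (0.25e-9, 6.0e-9),
}


def task_time_and_energy(cycles, freq):
    """Return (t_k, e_k) = (L_k * T(p_k), L_k * E(p_k)) for a single task."""
    t_per_cycle, e_per_cycle = FREQS[freq]
    return cycles * t_per_cycle, cycles * e_per_cycle


if __name__ == "__main__":
    # The same task gets faster but more energy-hungry as the frequency rises.
    for freq in sorted(FREQS):
        t, e = task_time_and_energy(2_000_000_000, freq)
        print(f"{freq} GHz: {t:.2f} s, {e:.2f} J")
```

Because the frequency stays fixed while a task runs, a single table lookup per task is enough in this sketch.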

Task Scheduling in Batch Mode

Two categories:
◦Tasks with deadline
◦Tasks without deadline
Two environments:
◦Single core
◦Multi-core
Four combinations in total.

Tasks with Deadline

[Objective] Every task must meet its deadline, and the overall energy consumption must be less than $E^*$.
Deciding whether such a schedule exists is an NP-complete problem on both single-core and multi-core platforms.
◦Shown by a reduction from the Partition problem.

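Tasks without Deadline

[Objective] Minimize the cost function C

$$C = \sum_{k=1}^{n} C_k = \sum_{k=1}^{n} \left( C_{k,\mathrm{energy}} + C_{k,\mathrm{time}} \right) = \sum_{k=1}^{n} \left\{ R_e L_k E(p_k) + R_t \sum_{i=1}^{k} L_i T(p_i) \right\}$$

◦$R_e$: the cost of a joule of energy
◦$R_t$: the cost of a second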
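The cost function charges each task $R_e$ per joule it consumes and $R_t$ per second until it completes, so a task also pays for the time it waits behind earlier tasks. A minimal Python sketch of that bookkeeping for one core follows; the function name `schedule_cost` and the tuple layout of `schedule` are assumptions made for illustration.

```python
def schedule_cost(schedule, r_e, r_t):
    """Total cost C of tasks run back to back on one core.

    schedule: list of (L_k, T(p_k), E(p_k)) tuples in execution order, i.e.
              cycles, seconds per cycle and joules per cycle for each task.
    r_e:      cost of one joule of energy (R_e).
    r_t:      cost of one second (R_t).
    """
    total = 0.0
    finish = 0.0  # running sum of L_i * T(p_i) for i <= k
    for cycles, t_cycle, e_cycle in schedule:
        finish += cycles * t_cycle
        energy_cost = r_e * cycles * e_cycle
        time_cost = r_t * finish
        total += energy_cost + time_cost
    return total


if __name__ == "__main__":
    # Two tasks: 2 cycles at the slow setting, then 4 cycles at a faster one.
    print(schedule_cost([(2, 1.0, 1.0), (4, 0.5, 2.0)], r_e=1.0, r_t=1.0))
```

In the usage example the first task costs 2 (energy) + 2 (time) and the second costs 8 (energy) + 4 (time), since it completes at time 4.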

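Tasks without Deadline: Single Core

Rewrite the cost function C into

$$C = \sum_{k=1}^{n} C(k, p_k) \, L_k$$

where exchanging the order of summation in the time term gives $C(k, p_k) = R_e E(p_k) + (n - k + 1) R_t T(p_k)$, because the task in position $k$ delays itself and the $n - k$ tasks behind it.
Minimize $C(k, p_k)$ for every task in order to minimize C.
Define $C(k) = \min_{p_k} C(k, p_k)$
◦$C(k)$ is a non-increasing function of $k$.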

Minimizing the Cost

Since

$$\min(C) = \sum_{k=1}^{n} C(k) \, L_k$$

and $C(k)$ is non-increasing, the tasks are in non-decreasing order of $L_k$ in an optimal solution: the largest per-cycle costs are paired with the shortest tasks.
Choose $p_k$ for each sorted task with the minimum $C(k, p_k)$.
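The single-core rule can be sketched in a few lines of Python: sort the tasks by $L_k$ in non-decreasing order, then pick for each position the frequency with the lowest per-cycle cost. The sketch assumes the per-cycle cost $C(k, p) = R_e E(p) + (n - k + 1) R_t T(p)$, which follows from the cost function by regrouping the time term; the `FREQS` table is hypothetical, and `brute_force_cost` is only a sanity check for tiny inputs, not part of the method.

```python
from itertools import permutations, product

# Hypothetical (T, E) pairs: seconds and joules per cycle at each frequency.
FREQS = [(1.0, 1.0), (0.5, 2.0), (0.25, 6.0)]


def per_cycle_cost(k, n, r_e, r_t):
    """Return (C(k), frequency index) for position k (1-based) out of n."""
    return min((r_e * e + (n - k + 1) * r_t * t, idx)
               for idx, (t, e) in enumerate(FREQS))


def schedule_single_core(lengths, r_e, r_t):
    """Shortest task first; each position gets its cheapest frequency."""
    n = len(lengths)
    plan, total = [], 0.0
    for k, cycles in enumerate(sorted(lengths), start=1):
        cost, idx = per_cycle_cost(k, n, r_e, r_t)
        plan.append((cycles, idx))
        total += cost * cycles
    return plan, total


def brute_force_cost(lengths, r_e, r_t):
    """Try every order and frequency choice (only feasible for tiny inputs)."""
    best = float("inf")
    for order in permutations(lengths):
        for choice in product(range(len(FREQS)), repeat=len(lengths)):
            finish = cost = 0.0
            for cycles, idx in zip(order, choice):
                t, e = FREQS[idx]
                finish += cycles * t
                cost += r_e * cycles * e + r_t * finish
            best = min(best, cost)
    return best


if __name__ == "__main__":
    plan, total = schedule_single_core([3, 1, 2], r_e=1.0, r_t=1.0)
    print(plan, total, brute_force_cost([3, 1, 2], 1.0, 1.0))
```

In this example the first (shortest) task runs at a higher frequency than the last one, because every second it takes is also charged to the two tasks waiting behind it.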

Tasks without Deadline: Multi-Core

Two cases
◦Homogeneous multi-core: the same T and E for every core.
◦Heterogeneous multi-core: different T and E.
Same idea
◦Minimize the total cost by minimizing $C(k)$ for every task on all cores.

Workload Based Greedy

Sort the tasks according to $L_k$ in descending order.
Start from the task with the largest $L_k$.
◦Find the position $k$ on core $j$ with the minimum $C_j(k)$ among all cores, and assign the task to that position.
◦Compute $p_k$ for the task.
Repeat until all tasks are scheduled.
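One way to read the steps above is sketched below in Python. It assumes that the per-cycle cost of the slot that is m-th from the end of a core's queue is $\min_p \{R_e E_j(p) + m R_t T_j(p)\}$, so slots are filled from the back of each queue and a heap always yields the cheapest free slot across cores. The function names, the heap-based bookkeeping, and the (T, E) tables are assumptions made for illustration, not details from the slides.

```python
import heapq


def slot_cost(core_freqs, m, r_e, r_t):
    """Cheapest per-cycle cost and frequency index for the m-th slot from the end."""
    return min((r_e * e + m * r_t * t, idx)
               for idx, (t, e) in enumerate(core_freqs))


def workload_based_greedy(lengths, cores, r_e, r_t):
    """cores[j] is a list of hypothetical (T, E) pairs available on core j.

    Returns (queues, total cost); queues[j] lists (cycles, frequency index)
    in execution order for core j.
    """
    # Heap entries: (per-cycle cost of next free slot, core id, slot index, freq index)
    heap = []
    for j, freqs in enumerate(cores):
        cost, idx = slot_cost(freqs, 1, r_e, r_t)
        heap.append((cost, j, 1, idx))
    heapq.heapify(heap)

    placed = [[] for _ in cores]
    total = 0.0
    for cycles in sorted(lengths, reverse=True):  # largest task first
        cost, j, m, idx = heapq.heappop(heap)     # cheapest free slot overall
        placed[j].append((cycles, idx))
        total += cost * cycles
        next_cost, next_idx = slot_cost(cores[j], m + 1, r_e, r_t)
        heapq.heappush(heap, (next_cost, j, m + 1, next_idx))

    # Slots were filled from the end of each queue, so reverse for execution order.
    return [list(reversed(q)) for q in placed], total


if __name__ == "__main__":
    two_cores = [[(1.0, 1.0), (0.5, 2.0)]] * 2  # homogeneous case
    print(workload_based_greedy([4, 3, 2, 1], two_cores, r_e=1.0, r_t=1.0))
```

Passing different (T, E) tables per core covers the heterogeneous case with the same code, since each core's next slot is priced with its own table.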

Workload Based Greedy Example

[Figure: the sorted tasks J1, J2, J3, … (in descending order) are assigned one at a time to positions in the execution order of Core0, Core1, and Core2; J1 is shown being placed first.]

Task Scheduling in Online Mode

[Objective] Minimize the total cost for every time interval during the execution of tasks.
◦Time interval: the time between two consecutive arrival events.

Some Assumptions

Two categories of tasks:
◦Interactive tasks
◦Non-interactive tasks
◦Interactive tasks have higher priority than non-interactive tasks.
Tasks can arrive at any time.
Multi-core environment.

Least Marginal Cost

For every newly arrived task
◦For each core, compute the minimum cost of inserting the task and the position that achieves it.
◦Insert the task at that position on the core with the minimum cost among all cores.
Notice that interactive tasks have higher priority than non-interactive tasks.
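The heuristic above can be sketched as follows in Python. The slides do not spell out how the insertion cost is computed, so this sketch makes several assumptions: a queue is priced with the batch-mode cost (each position gets its cheapest frequency), the marginal cost is the increase in that price, the task currently running on a core is ignored, and priority is enforced by keeping all interactive tasks ahead of all non-interactive ones. All names are hypothetical.

```python
def queue_cost(queue, freqs, r_e, r_t):
    """Cost of running `queue` in order on one core.

    queue: list of (cycles, is_interactive); freqs: list of (T, E) pairs.
    """
    n = len(queue)
    total = 0.0
    for k, (cycles, _) in enumerate(queue, start=1):
        total += cycles * min(r_e * e + (n - k + 1) * r_t * t for t, e in freqs)
    return total


def valid_positions(queue, is_interactive):
    """Positions that keep every interactive task ahead of non-interactive ones."""
    n_interactive = sum(1 for _, flag in queue if flag)
    if is_interactive:
        return range(0, n_interactive + 1)
    return range(n_interactive, len(queue) + 1)


def least_marginal_cost(queues, cores, task, r_e, r_t):
    """Insert `task` where it raises the total cost the least.

    Returns (core id, position, marginal cost) and updates `queues` in place.
    """
    best = None
    for j, (queue, freqs) in enumerate(zip(queues, cores)):
        base = queue_cost(queue, freqs, r_e, r_t)
        for pos in valid_positions(queue, task[1]):
            candidate = queue[:pos] + [task] + queue[pos:]
            marginal = queue_cost(candidate, freqs, r_e, r_t) - base
            if best is None or marginal < best[0]:
                best = (marginal, j, pos)
    marginal, j, pos = best
    queues[j].insert(pos, task)
    return j, pos, marginal


if __name__ == "__main__":
    cores = [[(1.0, 1.0), (0.5, 2.0)]] * 2
    queues = [[(1, True), (5, False)], []]
    print(least_marginal_cost(queues, cores, (2, True), 1.0, 1.0))
    print(least_marginal_cost(queues, cores, (1, False), 1.0, 1.0))
```

In the usage example the interactive task goes to the idle core, while the short non-interactive task is cheaper to place ahead of the long non-interactive task on the busy core than behind the interactive task on the other one.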

Evaluation

Conduct experiments to compare the overall cost of our scheduling strategies with that of other strategies.
Environment:
◦24 physical servers, each with two 4-core X5460 CPUs with hyperthreading, 16 GB of memory, and a 250 GB disk.

Evaluation: Batch Mode

Input: 12 benchmarks from SPEC2006int
◦train and ref inputs

Experimental Results: Batch Mode

Strategies compared: Workload Based Greedy (WBG), Opportunistic Load Balancing (OLB), and Power-Saving (PS).
◦WBG reduces the total cost by about 27% and 20% compared with OLB and PS, respectively.

[Figure: three bar charts comparing WBG, OLB, and PS. Panels: Time and Energy; Time; Energy.]

Evaluation: Online Mode

Input: trace from an online judging system.
◦768 non-interactive tasks.
◦50,525 interactive tasks.
◦Length of trace: half an hour.

Experimental Results: Online Mode

Strategies compared: Least Marginal Cost (LMC), Opportunistic Load Balancing (OLB), and On-Demand (OD).
◦LMC reduces the total cost by about 17% and 24% compared with OLB and OD, respectively.

[Figure: three bar charts comparing LMC, OLB, and OD. Panels: Time and Energy; Time; Energy.]

Conclusion

We propose energy-efficient scheduling algorithms for multi-core systems with DVFS features.
◦For batch mode and online mode.
◦The experimental results show total cost reductions of about 20% to 27% in batch mode and about 17% to 24% in online mode.
We will integrate our work into our existing judging system.

Questions?