This slide will be uploaded to ƒhiro ... · This slide will be uploaded to ... Classification of...

91
Hiroaki Takada 1 Challenges of Hard Real-Time OS This slide will be uploaded to http://www.ertl.jp/hiro/tmp/100905.pdf

Transcript of This slide will be uploaded to ƒhiro ... · This slide will be uploaded to ... Classification of...

Hiroaki Takada 1

Challenges of Hard Real-Time OS

This slide will be uploaded to http://www.ertl.jp/̃hiro/tmp/100905.pdf

Hiroaki Takada

Self Introduction Hard Real-Time Systems ‒ Main Focus of this Lecture Necessity and Problems of Multiprocessor Systems ▶  Shared resource contentions

Classification of Multiprocessors ▶  Classifications from SW and HW points of view ▶  SMP, FDMP, LCMP

RTOS for Multiprocessors RTOS for FDMP ▶  Necessity, Functionality, Implementation Issues

RTOS for SMP ▶  Functionality, Our Approach

Energy Consumption Optimization Concluding Remarks

Challenges of Hard Real-Time OS

2

Hiroaki Takada 3

Challenges of Hard Real-Time OS

Hiroaki Takada

Current Positions ▶  Professor, Nagoya University ▶  Executive Director, Center for Embedded Computing Systems (NCES), Nagoya University

▶  Chairman, TOPPERS Project and several others

Major Research Topics ▶  Real-time operating systems for embedded systems ▶  Real-time scheduling and analysis ▶  Electronic system-level design of embedded systems ▶  Automotive embedded systems ! Several joint projects with Toyota Motor Corp. and other Japanese automotive industries

Challenges of Hard Real-Time OS

4

Hiroaki Takada

Nagoya ▶  Center city of third largest metropolitan area in Japan ▶ Tokyo (incl. Yokohama), Osaka, Nagoya, …

▶  Located around the center of Japanese Main Island (between Tokyo and Osaka)

▶  Manufacturing industry center of Japan ▶  Automotive industries are concentrated, especially ▶ The headquarters of Toyota Motor Corp. (located in Toyota City) is near to Nagoya.

Nagoya University ▶  National University located in Nagoya City ▶  Within top 10 (I hope top 5!) universities of Japan

Challenges of Hard Real-Time OS

5

Hiroaki Takada

NCES = Nagoya Univ. Center for Embedded Computing Systems

Objectives ▶  To establish a research and educational hub for embedded systems for satisfying strong industrial demands on technologies and human resources.

Scope of NCES Activities ▶  Under the collaborations of academia, industry and/or government, NCES is involved in the following activities related to embedded systems: ▶  applied research aimed at practical use based on the fundamental research at University ▶  development of prototype software ▶  education/training of embedded system engineers

Challenges of Hard Real-Time OS

6

Hiroaki Takada

Projects funded by Industries ▶  OS for in-vehicle multimedia systems (TMC) ▶  Next-generation automotive network (AutoNetworks Technologies, Ltd.)

▶  Analysis and design of real-time task scheduling for automotive integrated control systems (TMC)

▶  Fault-tolerant design support through architecture description of automotive systems (TCRL)

Projects funded by Government ▶  Energy Consumption optimization of embedded systems (JST CREST)

▶  Automotive software platform conforming to the functional safety standard (METI)

Challenges of Hard Real-Time OS

7

Hiroaki Takada

TOPPERS = Toyohashi Open Platform forEmbedded and Real-Time Systems

Objectives of the Project ▶  To develop various open-source software for embedded systems including RTOS and to promote their use. Building a widely used open-source OS as Linux in the area of embedded systems!

Main Activities of the Project ▶  Building a definitive µITRON-conformant RTOS ▶  Developing a next generation RTOS technology ▶  Developing software development technology and tools for embedded systems

▶  Fostering Embedded System Engineers

Challenges of Hard Real-Time OS

8

Hiroaki Takada

! All SW listed below can be downloaded from the TOPPERS website at http://www.toppers.jp/.

TOPPERS/JSP Kernel (JSP = Just Standard Profile) ▶  RTOS conformant to the standard profile of µITRON4.0 specification

TOPPERS/ATK1 (ATK = Automotive Kernel) ▶  RTOS conformant to OSEK/VDX OS specification

TOPPER/ASP Kernel (ASP = Advanced Standard Profile) ▶  Improvement of JSP kernel ▶  Basis of TOPPERS new generation kernels

TOPPERS/FMP Kernel (FMP = Flexible Multiprocessing) ▶  Extension of ASP kernel to various types of multiprocessor systems

Challenges of Hard Real-Time OS

9

Hiroaki Takada

TECS (TOPPERS Embedded Component System) ▶  Specification and tools for component-based development of embedded software.

TINET ▶  Compact TCP/IP protocol stack conformant to ITRON TCP/IP API specification.

▶  Both IPv4 and IPv6 are supported. TLV (TraceLogVisualizer) ▶  Customizable tool to visualize various trace logs, including the trace log of RTOS

Several Open Educational Materials ▶  Educational materials including presentation slides, software, and so on.

Challenges of Hard Real-Time OS

10

Hiroaki Takada

Consumer Applications

11

PM-A970 (EPSON)

DO!KARAOKE (PANASONIC)

IPSiO GX e3300 (Ricoh)

GT-541 (Brother) UA-101 (Roland)

Hiroaki Takada

Industrial and Other Applications

12

Kizashi (SUZUKI)

ASTRO-H (JAXA) ... under development

AP-X (Kyowa MEDIX)

OSP-P200 (Okuma)

DP-350 (Daihen)

Hiroaki Takada

Challenges of Hard Real-Time OS

13

Hiroaki Takada

Definition of Hard Real-Time Systems ▶  If a timing constraint of the system is violated, some catastrophic event (such as loss of life) occurs.

Examples of Hard Real-Time Systems ▶  Automotive control systems ▶ Engine management, Brake control, Steering control, Airbag control, …

▶  Railway control systems ▶  Aerospace applications ▶  Process conrtol systems ▶  Robotics (in Future?) ▶  …

Challenges of Hard Real-Time OS

14

Hiroaki Takada

System Components ▶  control computer (ECU) ▶  many sensors ▶  crank position sensor ▶  air flow meter ▶  intake temperature sensor ▶  throttle sensor

▶  some actuators Basic Functions of the Control System ▶  Calculates fuel injection volume and ignition timing and controls the actuators in every rotation cycle.

Challenges of Hard Real-Time OS

15 Courtesy: Toyota Motor Corp.

Hiroaki Takada

Timing Behavior of Engine Management System ▶  When rotation speed is 6000rpm, one cycle is 20msec.

▶  Timing precision of the ignition is 10μsec. order.

Challenges of Hard Real-Time OS

16 Courtesy: Toyota Motor Corp.

Hiroaki Takada

Required Real-Time Property (Example) ▶  The calculation of the fuel injection volume must be finished before the injection timing.

▶  The calculation of the ignition timing must be finished before the ignition timing.

▶  There is no additional value, even if these calculations finish earlier.

Safety Requirement (Example) ▶  Missing an ignition must not happen, because inflammable gas is emitted outside of the engine and can lead to a fire (because catalyst burns).

▶  If the ignition plug of a cylinder is broken, fuel must not be injected to the cylinder.

Challenges of Hard Real-Time OS

17

Hiroaki Takada

Challenges of Hard Real-Time OS

18

Hiroaki Takada

! Use of more than one processors in an embedded system is NOT new.

Conventional use of multiprocessors in embedded systems ▶  Large-scale & high-performance embedded systems ▶  Subsystems with different requirements (e.g. mechanical control & GUI)

▶  Integration of independently-developed subsystems ▶  Processors embedded in components (e.g. sound chip) ! Use of multiple processors in a system usually increases the product cost, however.

On-chip multiprocessors changed this situation ▶  On-chip multiprocessors can often reduce cost.

19

Challenges of Hard Real-Time OS

Hiroaki Takada

to achieve both high performance and low power (or energy consumption) simultaneously

Limitation of performance improvement of (single) processor Higher performance processor is less energy efficient.

▶  Clock frequency improvement relied on transistor scaling so far, but we cannot expect much in future. ▶  signal integrity, process variation, leakage power, etc.

▶  Exploiting more ILP (instruction-level parallelism) results in increasing power consumption. ▶  deep pipelines, wide issues, out-of-order execution, branch prediction, register renaming, speculation, etc.

▶  Larger number of slower processors can achieve higher energy efficiency.

Challenges of Hard Real-Time OS

20

Hiroaki Takada

Energy Efficiency is Getting Worse

Challenges of Hard Real-Time OS

21

Source: Chris Rowen, “The Reinvention of the Microprocessor for MPSOC,” MPSoC 2006.

Hiroaki Takada

Software development becomes difficult ▶  Programming model is different. ▶ Parallel programming is much more difficult than sequential programming.

▶  Reuse of existing software is not easy. ▶ Realization of mutual exclusion is different.

! Software engineers do not like to use multiprocessors. Inherent Unpredictability

serious difficulty for hard real-time systems ▶  Shared resource contentions ▶  Cache misses ▶  cache memory is MUST for SMP (Symmetric Multiprocessor)

Challenges of Hard Real-Time OS

22

Hiroaki Takada

Example 1: Contention for shared bus (or memory)

Assumptions ▶  10ns for each memory access ▶  round robin arbitration ▶  The task accesses the shared memory 10000 times during an execution and finishes execution in 1ms without any shared bus contention.

▶  WCET (worst case execution time) considering shared bus contentions is 1ms+10ns×(N-1)×10000. ▶  1.1ms if N=2 ▶  1.3ms if N=4 … acceptable or too pessimistic?

Challenges of Hard Real-Time OS

23

processor 1 processor 2 processor N

shared memory

shared bus

Hiroaki Takada

Example 2: Contention for spinlock (or shared variable)

Assumptions ▶  The task locks the spinlock 10 times during an execution (for accessing the shared variable) and finishes execution in 1ms without any spin lock contention. ▶  The spinlock is locked for 10µs. ▶  FCFS-ordered spinlock

▶  WCET considering spinlock contentions is 1ms+10µs×(N-1)×10. ▶  1.1ms if N=2 ▶  1.3ms if N=4 … same with Example 1

Challenges of Hard Real-Time OS

24

processor 1 processor 2 processor N …

spinlock shared variable

Hiroaki Takada

Distribution of Execution Times

Challenges of Hard Real-Time OS

25

prob

ability de

nsity P

exec. time t

probability that the exec. time of the program is t

min exec. time average exec. time

analyzed max exec. time true max exec. time cannot be known

can be unbounded

If this difference is large, the analysis is said to be pessimistic.

Hiroaki Takada

p-reliable WCET (Probabilistic Real-Time Guarantee)

Challenges of Hard Real-Time OS

26

exec. time t

probability that the exec. time of the program is less than t (in logarithmic axis)

min exec. time average exec. time

analyzed max exec. time true max exec. time

p-reliable max. exec time

(1 -∫P

) in loga

rithm

ic axis

1 0.1 10-2

10-3 required reliability p

Hiroaki Takada

Exec. Time Distributions of the Example 1 & 2

Challenges of Hard Real-Time OS

27

prob

ability de

nsity P

exec. time t

Spinlock contentions (Example 2)

min exec. time average exec. time

analyzed max exec. time true max exec. time

Shared bus contentions (Example 1)

Hiroaki Takada

Performance Metrics for Hard Real-Time Systems ▶  Appropriate performance metric depends on the application.

When Analyzed WCET is Used ▶  Accept the pessimism or decrease shared resource accesses.

When p-reliable WCET is Used ▶  Shared bus contentions can be handled with probabilistic real-time guarantee.

▶  Spinlock contentions again depend on the application. → discussed in detail, later

Challenges of Hard Real-Time OS

28

Hiroaki Takada

Challenges of Hard Real-Time OS

29

Hiroaki Takada

Kind of Processors ▶  Homogeneous ▶  Heterogeneous ▶ Mild … common basic instruction set + extensions ▶  Strong … different instruction set

▶  Multi-threading Interconnections among Processors ▶  Shared memory (tightly-coupled) ▶  Symmetric or UMA (uniform memory access time) ▶  Distributed or NUMA (non-uniform memory access time)

▶  Message passing (loosely-coupled, distributed system) SMP and AMP ▶  SMP (symmetric multiprocessor): homogeneous and symmetric shared memory

▶  AMP (asymmetric multiprocessor) or ASMP: others

Challenges of Hard Real-Time OS

30

Hiroaki Takada

Shared memory (tightly-coupled) ▶  Symmetric (SMP) ▶  Each processor has the same role. ▶  Each task can be executed on any processor. ▶  Tasks are (statically or dynamically) distributed to processors to obtain maximum performance (load balancing).

▶  Function-distributed (FDMP) ▶  Processors have different roles. ▶  Each task is fixed to a processor and cannot migrate to others.

Message passing (loosely-coupled, distributed system) ▶  Symmetric ▶  Function-distributed

Challenges of Hard Real-Time OS

31

Hiroaki Takada

Definition (again) and Characteristics ▶  In HW, each processor can symmetrically access (almost) all resources.

▶  In SW, each task can be executed on any processor. ▶ HW SMP is necessary to realize SW SMP. ▶ SW FDMP (function-distribution) can be implemented on HW SMP.

▶  Multi-threading processor is basically same with SMP from SW point of view. ▶ Except that simultaneous multi-threading (SMT) is different from SMP when considering the affinity among tasks.

Challenges of Hard Real-Time OS

32

Hiroaki Takada

SMP Example: ARM MPCORE

Challenges of Hard Real-Time OS

33

Source: Web site of ARM

interrupt controller for distributing interrupt among processors

up to 4 ARM11 processor + L1 cache

peripherals (timer, etc.) for each processor

cache coherence controller

Hiroaki Takada

Application Area and Advantages ▶  Widely used for PC and servers. ▶  Effective, when workload is changed dynamically. ▶  Performance can be raised to some extent, without tuning of HW and SW. As the result, reusability of HW and SW is high.

▶  Chip can be general-purpose (advantage of HW SMP).

Disadvantages ▶  Pessimistic analyzed WCET, because most resources are shared among processors.

▶  High cost and energy consumption, because coherent cache and high speed interconnect are necessary to obtain high performance.

Challenges of Hard Real-Time OS

34

Hiroaki Takada

Definition (again) ▶  Processors have different roles. ▶  Each task is fixed to a processor.

Characteristics ▶  HW architecture can be optimized considering the role of each processor. Optimization result is often an AMP architecture. ▶ Memory and peripherals are connected to the local bus of a processor. ▶ Acceleration HW unit or coprocessor is added to a processor. ▶ Suitable type of processor to its role is used.

Challenges of Hard Real-Time OS

35

Hiroaki Takada

FDMP Example 1: Toshiba MPEG2 Codec LSI ▶  6 MeP (Media embedded processor) cores with different acceleration HW units are used.

Challenges of Hard Real-Time OS

36

Source: Web site of Toshiba

Motion Estimation MeP Module

Bit-stream Process MeP Module

Video Filter MeP Module

Audio CODEC MeP Module

Video CODEC MeP Module System Control MeP Module

MeP Core MeP Core MeP Core

MeP Core MeP Core MeP Core

DCT/Q/MC HWE

SIMD UCI

VLC DSP

Audio VLIW Coprocessor

Video Filter HWE

Bit-stream HWE

Block Match HWE

LSI

Global Bus

SDRAMC

Host IF

JTAGDebug IF

Hiroaki Takada

FDMP Example 2: TI OMAP 1710 ▶  1 general-purpose processor (ARM926) and 1 DSP (TMS320C55x) are used.

Challenges of Hard Real-Time OS

37

Source: Web site of TI

Hiroaki Takada

Advantages ▶  Analyzed WCET is tight (less pessimistic), because shared resources are limited.

▶  Low energy consumption, because coherent cache and high speed interconnect can be omitted.

Disadvantages ▶  Tuning of both HW and SW is necessary to obtain the above advantages. This results in lower reusability.

▶  Software designer must assign tasks to processors. ▶  Dynamic load balancing is not easy (but possible). ▶  Chip becomes special-purpose (disadvantage of HW AMP).

Challenges of Hard Real-Time OS

38

Hiroaki Takada

Definition (again) and Characteristics ▶  In HW, no shared memory among processors. ▶  In SW, only message communication is used among processors. ▶ When a task accesses a resource attached to another processor, an access request is sent to the processor. ▶ Even if HW has a shared memory, when the shared memory is used only for implementing message passing, the system is LCMP from SW point of view.

▶  Latency of inter-processor communication is longer than tightly-coupled multiprocessor. Bandwidth is not necessarily narrow.

Challenges of Hard Real-Time OS

39

Hiroaki Takada

OS Support for LCMP ▶  OS supporting LCMP can be classified as a distributed OS.

▶  Distributed OS was intensively studied, but was not successful. Why? ▶ Under long communication latency, a larger granularity request can achieve higher performance. ▶ OS system call request is too small.

▶  Implementing remote requests in upper layer is successful. ▶ RPC (remote procedure call) ▶ Distributed object framework (CORBA, SOAP)

! Exclude from the scope of the following discussions.

Challenges of Hard Real-Time OS

40

Hiroaki Takada

Challenges of Hard Real-Time OS

41

Hiroaki Takada

Ideal RTOS for Multiprocessor ▶  From application SW point of view, it is ideal that existing application software running on uniprocessor RTOS can run on multiprocessor RTOS without modification.

▶  In other words, multiprocessor RTOS should be compatible with uniprocessor RTOS, except that more than one tasks are executed in parallel.

Difficulty ▶  Parallel execution of tasks is a great difference! ▶  Many existing application SW does not work under parallel execution.

Challenges of Hard Real-Time OS

42

Hiroaki Takada

General-Purpose OS ▶  Most general-purpose OS, such as Windows and Linux, support SMP.

▶  These OS take care of load balancing. ▶  Though there are real-time extensions of these OS, they are not suitable for hard real-time systems.

RTEMS (Real-Time Executive for Multiprocessor Systems) ▶  RTEMS supports multiprocessor systems as its recent name shows.

▶  Only FDMP is supported (no task migration). ▶  Implemented with remote invocation method (explained in the next section).

Challenges of Hard Real-Time OS

43

Hiroaki Takada

TOPPERS/FMP Kernel ▶  TOPPERS/FMP kernel supports both FDMP and SMP.

▶  Task migration is supported (explained later). ▶  Implemented with direct access method.

AUTOSAR OS Specification ▶  Multiprocessor extension is included in the latest release (Release 4.0).

▶  Only FDMP is supported (no task migration) ▶  Can be Implemented both with remote invocation method and with direct access method.

Others ▶  Many commercial OS, including VxWorks, OSE, and Thread-X, now support multiprocessor systems.

Challenges of Hard Real-Time OS

44

Hiroaki Takada

Challenges of Hard Real-Time OS

45

Hiroaki Takada

Using uniprocessor RTOS for FDMP design ▶  An FDMP system can be designed using a uniprocessor RTOS on each processor.

▶  In this case, communication with another processor is implemented in application level.

▶  A problem of this approach: ▶ When some task is moved to another processor in design time, inter-processor communication is necessary to be re-designed and re-implemented. ▶  SW reusability is low.

With FDMP RTOS … ▶  Tasks on different processors can communicate with normal RTOS API.

▶  Existing SW for uniprocessor is easier to port to multiprocessor.

Challenges of Hard Real-Time OS

46

Hiroaki Takada

Object Management Concept ▶  Each OS object (task, semaphore, mailbox, …) belongs to one of the processors.

▶  Task is executed only by the processor to which it belongs.

▶  All tasks can access all OS objects with same API.

Challenges of Hard Real-Time OS

47

I/O processor 1

local memory

tasks

synchronization/communication objects

I/O processor 2

local memory

tasks

Hiroaki Takada

Task Scheduling ▶  Tasks are scheduled independently on each processor with the same scheduling algorithm with uniprocessor RTOS.

▶  No dynamic task migration. → The technique to realize mutual exclusion among tasks by

raising priority (typically, priority ceiling protocol) cannot be used across processor boundaries.

Interrupt Processing ▶  Each interrupt is assigned to one of the processors. ▶  The interrupt handler is executed by the processor to which the interrupt is assigned.

▶  Function to disable interrupts is valid only within the processor.

→ Mutual exclusion among tasks on different processors cannot be realized by disabling interrupts.

Challenges of Hard Real-Time OS

48

Hiroaki Takada

Mutual Exclusion by Disabling Interrupts ▶  Realizing mutual exclusion (among tasks and interrupt handlers) by disabling interrupts is widely used in existing application SW for uniprocessors. ▶  This technique cannot be used across processor boundaries.

▶  Function to disable interrupts on all processors is not a solution, because … ▶ When a processor disables interrupts on all processors, another processor is possibly executing an interrupt handler! ▶  In order to realize mutual exclusion among processors, a processor must wait until the other processors complete the execution of interrupt handlers, in addition to disable interrupts on all processors.

Challenges of Hard Real-Time OS

49

Hiroaki Takada

How Application SW Realizes Mutual Exclusion? ▶  Mutual exclusion among tasks ▶ With semaphore or mutex function of RTOS, mutual exclusion among any tasks can be realized. ▶ With the functions to disable interrupt and to temporarily raise priority, only mutual exclusion among tasks on the same processor can be realized.

▶  Mutual exclusion between a task and an interrupt handler ▶ With the function to disable interrupt, mutual exclusion between a task and an interrupt handler on the same processor can be realized. ▶  In order to support mutual exclusion between a task and an interrupt handler on different processors, new function (such as application-level spinlock) is necessary.

Challenges of Hard Real-Time OS

50

Hiroaki Takada

Implementation Policy ▶  To exploit the advantage of FDMP, as many task as possible should be executed within a processor (without accessing remote resources).

→ OS-internal data structures (such as object management blocks and ready queues) should be prepared for each processor.

Challenges of Hard Real-Time OS

51

I/O processor 1

local memory

tasks

synchronization/communication objects

I/O processor 2

local memory

tasks

Hiroaki Takada

Two Implementation Methods ▶  Operations on a remote object can be implemented with two methods. ▶ Direct access method ▶ Remote invocation method

Direct Access Method ▶  Operations on a remote object are realized by directly accessing the object management block located on a local shared memory. ▶ Mutual exclusion among processors is necessary to avoid parallel access to the object management block. ▶ When task switching is necessary on the target processor, an interrupt request is sent to the processor (called IPI: inter-processor interrupt).

Challenges of Hard Real-Time OS

52

Hiroaki Takada

Remote Invocation Method ▶  Operations on a remote object are realized by requesting the operations to the processor to which the object belongs. ▶ Whenever an operation is requested to a processor, an interrupt request is sent to the processor. ▶ The requesting processor must wait for the completion of the operation to obtain the result.

Direct Access vs. Remote Invocation ▶  When shared memory access latency is short, direct access method is advantageous.

▶  Remote invocation can be realized as a middleware. ! We focus on direct access method, later.

Challenges of Hard Real-Time OS

53

Hiroaki Takada

Necessity of Spinlocks ▶  When a processor accesses OS-internal data structures within OS, spinlocks are usually used for realizing mutual exclusion. ▶ Spinlock: busy waiting lock for mutual exclusion (a task repeatedly checks the lock in a loop and waits until the lock becomes available) ▶ Because blocking locks (semaphore, mutex, …) are realized with OS, they cannot be used for implementing the OS itself!

Challenges of Hard Real-Time OS

54

Hiroaki Takada

Test&Set Lock: A Typical Spinlock Algorithm ▶  Simplest version using atomic test&set instruction:

▶  Simple and widely used. ▶  Not appropriate for hard real-time system, because the time until a processor can acquire a lock is unbounded.

Controlling the Lock Acquisition Order ▶  In hard real-time systems, the order in which a processor acquires the lock must be controlled. ▶ Ticket-based spinlock algorithms ▶ Queue-based spinlock algorithms

Challenges of Hard Real-Time OS

55

while  (test&set(L)  ==  Locked)  {  /*  spin  */  }  /*  cri9cal  sec9on  */  L  =  Released;  

Hiroaki Takada

Queue-based Spinlock Algorithms ▶  Processors waiting for a lock make a queue. ▶  Many variations. ▶ MCS lock, …

▶  The queue can be FCFS order or priority order, depending on the algorithm.

▶  Some algorithms support dequeueing (called timeout or preemption)

Spinlock Waiting Time ▶  A processor is just spinning while waiting for a spinlock.

▶  The spinlock waiting time is completely wasted, thus should be shorten. More harmful than priority inversion!

Challenges of Hard Real-Time OS

56

Hiroaki Takada

Giant Lock ▶  All data structures are guarded with a single lock. ▶  Only one processor can execute OS at the same time. ▶  Chance of parallel execution is limited.

▶  Can be effective, when the number of processors is small. Fine Grained Lock ▶  Data structures are divided into some classes and each class is guarded with one lock. ▶  The average spinlock waiting time is decreased.

▶  However, the analyzed maximum spinlock waiting time cannot be decreased (and is often increased).

▶  Depending on OS functionality and lock unit design, two or more spinlocks need to be obtained in a nested manner (nested spinlock). ▶  This increases the worst-case execution time. ▶  Deadlock avoidance may be necessary.

Challenges of Hard Real-Time OS

57

Hiroaki Takada

Inter- and Intra Processor Synchronizations ▶  OS-internal data structures must be guarded both from other tasks on the same processor (intra-processor sync.) and from tasks on other processors (inter-processor sync.)

▶  Inter-processor sync. is realized by acquiring a spinlock. ▶  Intra-processor sync. is realized by disabling interrupts.

Spinlock Acquisition vs. Disabling Interrupts ▶  Spinlock acquisition and disabling interrupt should be atomic, because … ▶  If a spinlock is acquired first, other processors wastefully wait for the spinlock during an interrupt service. ▶  If interrupts are disabled first, the worst-case interrupt latency becomes long.

Challenges of Hard Real-Time OS

58

Hiroaki Takada

Spinlock with Preemption ▶  Test&set lock can be easily modified to handle this situation.

▶  Queue-based spinlocks can also be modified (queue-based spin lock with preemption/timeout).

Challenges of Hard Real-Time OS

59

disable_interrupts;  while  (test&set(L)  ==  Locked)  {  

 if  (interrupt_requested)  {      enable_interrupt;      /*  interrupts  are  serviced  here.  */      disable_interrupt;    }  

}  /*  cri9cal  sec9on  */  L  =  Released;  enable_interrupts;  

Hiroaki Takada

Nested Spinlock with Preemption ▶  When two spinlocks are necessary to be acquired in a nested manner, the solution becomes more complex.

Challenges of Hard Real-Time OS

60

retry:    disable_interrupts;    while  (test&set(L1)  ==  Locked)      /*  same  with  the  previous  slide  */    while  (test&set(L2)  ==  Locked)  {      if  (interrupt_requested)  {        L1  =  RELEASED;          /*  L1  should  be  released  */        enable_interrupt;        /*  interrupts  are  serviced  here.  */        disable_interrupt;        goto  retry;      }    }    /*  cri9cal  sec9on  */    L2  =  Released;  L1  =  RELEASED;    enable_interrupts;  

Hiroaki Takada

Is this Retry OK for Hard Real-Time Systems? ▶  Unbounded number of retry is usually harmful for hard real-time systems.

▶  In this case, however, by adding the overhead of the retry to the WCET of the interrupt handler, the schedulability test is possible.

Challenges of Hard Real-Time OS

61

Hiroaki Takada

Queue-Based Nested Spinlock with Preemption ▶  Queue-based nested spinlock with preemption raises several new problems. ▶ With simple application of FCFS-order spinlocks, the worst-case waiting time becomes O(nm), where n is the number of processors and m is the maximum nest level. Totally FCFS approach and priority-inheritance spinlock are necessary to solve this. ▶ When an interrupt is serviced during spinning, the order within the queue should be canceled or preserved? ▶ When a spinlock is necessary to be acquired within the interrupt handler, how to handle it?

▶  A solution exists, but is very complex. The overhead is also large. ▶ We are trying to implement this with HW.

Challenges of Hard Real-Time OS

62

Hiroaki Takada

▶  When FDMP RTOS is implemented with remote invocation method, spinlock is not necessary because there are no shared data structure.

▶  The intra- vs. inter-processor problem also exists in a different form as follows. ▶ With remote invocation method, requests from other processors are notified as interrupts (IPI). ▶ Which should be assigned higher priority: IPI or local interrupts from I/O devices? ▶  If IPI is given a higher priority, the worst-case interrupt latency to local interrupts becomes long. ▶  If local interrupts are given higher priorities, requesting processors wastefully wait for local interrupt handlings.

▶  This problem is more difficult to solve. ▶  Application-level solution will be necessary.

Challenges of Hard Real-Time OS

63

Hiroaki Takada

Challenges of Hard Real-Time OS

64

Hiroaki Takada

Object Management Concept ▶  All OS objects (task, semaphore, mailbox, …) are global (do not belong to any processor).

▶  OS dynamically determines on which processor a task is executed.

Challenges of Hard Real-Time OS

65

I/O

processor 1

global memory

tasks

synchronization/communication objects

processor 2 processor n …

Hiroaki Takada

Task Scheduling ▶  A straight-forward extension of priority-based scheduling is to always execute top n high priority tasks. n : the number of processors

▶  The function to bind a task to a processor (processor affinity) is very useful.

▶  This extension is not good, because … ▶  Poor schedulability. ▶  Too many task migrations can happen.

Interrupt Processing ‒ Two Approaches ▶  Each interrupt is statically assigned to a processor. ▶  The processor serving an interrupt request is dynamically determined. ▶  The processor that accepts the interrupt first. ▶  The processor that executes lowest priority task.

Challenges of Hard Real-Time OS

66

Hiroaki Takada

An Example of Too Many Task Migrations ▶  Suppose the following task set consisting of 4 tasks.

▶  When task A is activated, task C is migrated to processor 2. When task B is activated task C goes back to processor 1.

▶  Static task allocation can be better.

Challenges of Hard Real-Time OS

67

processor 1 processor 2 high priority task A bound to processor 1

high priority task B bound to processor 2

middle priority unbound task C

low priority unbound task D

activated frequently and finishes in short

activated frequently and finishes in short

Hiroaki Takada

Basic Concept ▶  Task migration is introduced to FDMP RTOS. ▶  Similar approach with Linux 2.6.

Object Management Concept (same with FDMP RTOS) ▶  Each OS object belongs to one of the processors. ▶  Task is executed by the processor to which it belongs.

Task Migration Support ▶  New service calls to move a task to another processor are introduced.

▶  Policy-mechanism separation ▶ Migration mechanism: supported by OS ▶ Migration policy: up to application

Challenges of Hard Real-Time OS

68

Hiroaki Takada

Introduced Service Calls ▶  mig_tsk(ID tskid, ID prcid) ▶ Migrate the task specified with tskid to the processor specified with prcid. ▶ Our implementation has a restriction that only the tasks that are assigned to the same processor with the calling task can be migrated.

▶  mact_tsk(ID tskid, ID prcid) ▶ Activate the task specified with tskid on the processor specified with prcid. ▶ Useful service call, when a task is periodically activated and the execution time of each activation is short.

Challenges of Hard Real-Time OS

69

Hiroaki Takada

Dynamic Migration ▶  The workload of each processor is measured in runtime. ▶  A task is selected from a high workload processor and is migrated to a low workload processor.

▶  Because guarantee of real-time constraints is difficult, this is suitable for soft real-time tasks (hard real-time system often includes soft real-time tasks).

▶  How to determine the workload of a processor is the issue. Planned Migration ▶  The best allocation of tasks to processors for each operation mode of the system is determined in design time.

▶  In runtime, tasks are migrated by the application SW when the operation mode is changed.

▶  Easier to guarantee real-time constraints.

Challenges of Hard Real-Time OS

70

Hiroaki Takada

Approach in Linux 2.6 ▶  Load average (average number of active tasks) of a processor is thought to be the workload of the processor.

Our Approach under Investigation ▶  Each target task reports its progress to the OS. ▶  In a multimedia application, the number of generated frames but not consumed yet can be defined to be its progress.

▶  Average progress of the target tasks assigned to a processor is thought to be the workload.

▶  Better than the Linux approach when the progress of a task can be defined appropriately.

Challenges of Hard Real-Time OS

71

Hiroaki Takada

Challenges of Hard Real-Time OS

72

Hiroaki Takada

Project Title ▶  HW/SW co-optimization for low energy embedded systems

Project Members ▶  Nagoya Univ. (H. Takada) ▶  Kyushu Univ. (T. Ishihara) ▶  Toshiba (T. Fukaya) ▶  Ritsumeikan Univ. (H. Tomiyama)

Funding and Period ▶  The project is funded by the CREST program of JST (Japan Science and Technology Agency)

▶  From Oct. 2005 to Mar. 2011

Challenges of Hard Real-Time OS

73

Hiroaki Takada

Basic Directions and Approaches ▶  Target is embedded systems. ▶ We assume that application is known. ▶ The characteristics of application are fully exploited.

▶  Co-optimization across design layer boundaries. ▶ Rooms for optimization become large through the co-optimization from SW design layer to HW circuit layer.

▶  Minimize energy consumption while guaranteeing required QoS (performance, reliability, …). ▶  In more precise, minimize energy consumption while guaranteeing real-time constraints.

Most Important Concept ▶  Dynamic Energy/Performance Scaling (DEPS)

Challenges of Hard Real-Time OS

74

Hiroaki Takada

What is DEPS? ▶  Control the system to work under the optimal tradeoff point of energy and performance. ▶  In our project, use the slowest processor or slowest processor configuration with which real-time constraints can be met.

▶  A generalization of DVFS (Dynamic Voltage/Frequency Scaling)

Motivation ▶  The operational voltage of recent processors is too low to take the advantage of DVFS.

▶  Higher performance processor becomes more complicated resulting in worse energy efficiency.

Challenges of Hard Real-Time OS

75

Hiroaki Takada

Limitation of DVFS (Dynamic Voltage/Frequency Scaling) ▶  DVFS dynamically changes the operation voltage and frequency of a processor depending on how busy the processor is.

▶  DVFS was an effective approach to reduce energy consumption.

▶  The operational voltage of recent processors is very low, there is little room to lower the voltage dynamically.

Concept of DEPS ▶  A generalization of DVFS. ▶  DEPS uses other control parameters in addition to voltage/frequency.

Challenges of Hard Real-Time OS

76

Hiroaki Takada

Rationale behind DEPS ▶  Higher performance processor is less energy efficient, because of ▶  large cache, deep pipelines, wide issues, out-of-order execution, branch prediction, …

▶  By turning off these features when the processor is not busy, energy consumption can be reduced.

How DEPS Works? ▶  DEPS controls the system to work under the optimal tradeoff point of energy and performance.

▶  In our project, DEPS switches to the slowest processor configuration (or slowest processor) with which real-time constraints can be met.

Challenges of Hard Real-Time OS

77

Hiroaki Takada

Theoretical Difference between DVFS and DEPS ▶  In DVFS, the execution time of a task can be estimated from the frequency.

▶  In DEPS, the relationship between execution time and processor configuration (cache size, pipeline depth, … in addition to frequency) depends on the characteristics of the task.

Challenges of Hard Real-Time OS

78

Energy

Execution timeTask 2 ( period = P 2 )

Task 1 ( period = P 1 )

_ C21(T21,E21)

_ C22(T22,E22)

_ C23(T23,E23)

_ C11(T11,E11)

_ C12(T12,E12)

_ C13(T13,E13)

_ C14(T14,E14)

Cij (Tij, Eij)•Cij DEPS configuration j of task i•Tij worst case exe. time under Cij•Eij energy consumption during Tij

(1/Performance)

Hiroaki Takada

What is DEPS-Ready IP Core? ▶  An IP core (incl. processor) which has several configurations with different energy/performance tradeoff and can dynamically switches among them.

▶  Or a set of IP cores which have the same instruction set but have different energy/performance tradeoff. e.g. a pair of ARM7 and ARM11 processors

Multi-Performance Processor (MPP) ▶  Our DEPS-ready processor core. ▶  MPP has two processing element with different voltage and frequency and dynamically switches between them quickly.

▶  Cache size can be changed dynamically.

Challenges of Hard Real-Time OS

79

Hiroaki Takada

Multi MPP Core Chip under Development ▶  An SMP consisting of 3 MPP Cores ▶  Frequency is low because an old 0.18µm process is used.

Challenges of Hard Real-Time OS

80

DMAC

PE-M 30MHz @1.0V

8KB

4KB

Tag 2KB 2KB 2KB 2KB

4KB

PE-H 60MHz @1.8V

8KB

BUSIF DMAC

PE-M 30MHz @1.0V

8KB

4KB

Tag 2KB 2KB 2KB 2KB

4KB

PE-H 60MHz @1.8V

8KB

BUSIF DMAC

PE-M 30MHz @1.0V

8KB

4KB

Tag 2KB 2KB 2KB 2KB

4KB

PE-H 60MHz @1.8V

8KB

BUSIF

AMBA AHB 30MHz

The number of cache ways can be changed

dynamically.

Processing element with high voltage and high frequency

Processing element with low voltage and

low frequency

Hiroaki Takada

Assumption ▶  Existence of DEPS-ready IP core

Target System ▶  Hard real-time embedded systems ▶  A set of independent periodic tasks or sporadic tasks (aperiodic tasks with minimal inter-arrival separation)

▶  Priority-base scheduling with static priority assignment Optimization Policy ▶  Constraint: guarantee that all tasks complete within their deadlines

▶  Goal: minimize energy consumption

Challenges of Hard Real-Time OS

81

Hiroaki Takada

Three Stage Optimizations ▶  Intra-task optimization ▶ Characteristics of each task is extracted by analyzing the execution trace obtained by instruction-set level simulation.

▶  Inter-task optimization ▶ Each task is allocated to a processor and assigned an execution time budget.

▶  Runtime optimization ▶ RTOS dynamic calculates the slack time and determines the optimal (not necessarily best) processor configuration.

Challenges of Hard Real-Time OS

82

Hiroaki Takada

Input ▶  Source program of a task ▶  Input data for the task with weight (frequently executed input data has to have a large weight)

▶  Information on processor configurations What is done in this Stage ▶  Check points are inserted to the task code for intra-task DEPS. ▶  Processor configuration can be changed at each check point. ▶  Check points are determined by analyzing the execution trace of the task (called execution trace mining)

▶  DEPS profile of the task is generated.

Challenges of Hard Real-Time OS

83

Hiroaki Takada

Where to Insert Check Points? ▶  Points at which characteristics of the task is greatly changed. ▶ For example, boundaries of processor-bound section and memory-bound section of the task.

▶  Points at which remaining worst-case execution time (RWCET) can be predicted more precisely. ▶ Typically, after a conditional branch such that RWCET when branch is taken and that when not-taken have large difference.

▶  If a task can execute for long without passing a check point, a check point should be inserted to avoid this case.

Challenges of Hard Real-Time OS

84

Hiroaki Takada

DEPS Profile ▶  list of all effective combi-nations of configurations

▶  remaining worst-case execution time (RWCET) under each combination

▶  remaining average energy consumption (RAEC) under each combination

Challenges of Hard Real-Time OS

85

CP0 CP1 CP2 CP3 RWCET RAEC config2 config2 config1 config4 23.7 433.2 config3 config3 config1 config4 28.3 345.1 config3 config4 config2 config6 32.1 301.5 config4 config4 config2 config6 35.2 273.8 config6 config6 config4 config6 45.1 205.2

DEPS Profile of a Task (Image)

CP1

CP2

CP3

100%

10%

90%

CP0 EXIT

START

Control Flow and Check Points of an Example Task

Hiroaki Takada

Input ▶  Task set information (cycle, deadline, priority, …) ▶  DEPS profile of each task ▶  Information on processor configurations

What is done in this Stage ▶  Tasks are allocated to processors. ▶  Basically, workload should be balanced to reduce energy consumption.

▶  Execution time budgets are assigned to tasks. ▶  In other words, default combination of configurations are determined for each task. ▶  Budgets are assigned to minimize energy consumption.

▶  DEPS management table for each check point is generated.

Challenges of Hard Real-Time OS

86

Hiroaki Takada

DEPS Management Table ▶  all effective configurations for each check point ▶  remaining worst-case execution time (RWCET) under each configuration

Challenges of Hard Real-Time OS

87

CP3 RWCET config4 13.6 config6 19.4

CP1

CP2

CP3

100%

10%

90%

CP0 EXIT

START

DEPS Management Table for each Check Point

CP0 RWCET config2 23.7 config3 32.1 config4 35.2 config6 45.1 CP1 RWCET

config2 18.7 config3 22.7 config4 27.8 config6 36.5

CP2 RWCET config1 14.6 config2 17.4 config4 20.8

Hiroaki Takada

RTOS Supporting DEPS ▶  Multiprocessor RTOS with following extensions ▶  Slack time calculation ▶ Slack time (the maximum time that can be wasted without missing any deadline) is calculated in run time (approximation is used).

▶  Configuration determination and switch ▶ For each check point, RTOS determines the configuration from the calculated slack time and DEPS management table. ▶ Slack time is consumed greedily. ▶ RTOS switches the configuration at each check point and at each task switching.

Challenges of Hard Real-Time OS

88

Hiroaki Takada

Challenges of Hard Real-Time OS

89

Hiroaki Takada

FDMP has an advantage, currently ▶  With the current technologies, real-time constraints are hard to guarantee with SMP.

▶  With FDMP, it is relatively easy. Then, why SMP? ▶  Considering the mask cost of future large-scale LSI, the advantage of HW SMP that chip can be general-purpose has significant meaning.

Future Challenge ▶  How to realize a predictable real-time system with general-purpose chip. ▶ Cooperation of SW and HW is indispensable.

Challenges of Hard Real-Time OS

90

Hiroaki Takada

▶  How to realize a high-performance embedded system with limited energy consumption and low cost?

My Future View ▶  General-purpose chip including heterogeneous manycore and reconfigurable logic.

▶  Interconnection network and memories can be imple-mented on another chip and are integrated as a 3D LSI.

▶  Challenge: HW and SW design environment and tools for this chip.

Challenges of Hard Real-Time OS

91

processor core memory

reconf. logic