D4.2: Real-Time Linux (final) -...

23
This Project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement: 688860 Project title: High-Performance Real-time Architectures for Low-Power Embedded Systems Acronym: HERCULES Project ID: 688860 Call identifier: H2020 - ICT 04-2015 - Customised and low power computing Project Coordinator: Prof. Marko Bertogna, University of Modena and Reggio Emilia D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final Version: 1.1 Deliverable No.: D4.2 Lead task beneficiary EVI Partners Involved: EVI Author: Claudio Scordino Status: Final Date: 30/6/2018 Nature 1 : S Dissemination level PU Public X PP Restricted to other programme participants (including the Commission Services – CS & IAB) RE Restricted to a group specified by the consortium (including the IAB) CO Confidential to consortium (including CS & IAB) 1 For deliverables: R = Report; P = Prototype; D = Demonstrator; S = Software/Simulator; O = Other For milestones: O = Operational; D = Demonstrator; S = Software/Simulator; ES = Executive Summary; P = Prototype

Transcript of D4.2: Real-Time Linux (final) -...

Page 1: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

This Project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement: 688860

Project title: High-Performance Real-time Architectures for Low-Power Embedded Systems

Acronym: HERCULES

Project ID: 688860

Call identifier: H2020 - ICT 04-2015 - Customised and low power computing

Project Coordinator: Prof. Marko Bertogna, University of Modena and Reggio Emilia

D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Version: 1.1

Deliverable No.: D4.2

Lead task beneficiary

EVI

Partners Involved: EVI

Author: Claudio Scordino

Status: Final

Date: 30/6/2018

Nature1: S

Dissemination level

PU Public X

PP Restricted to other programme participants (including the Commission Services – CS & IAB)

RE Restricted to a group specified by the consortium (including the IAB)

CO Confidential to consortium (including CS & IAB)

1 For deliverables: R = Report; P = Prototype; D = Demonstrator; S = Software/Simulator; O = Other For milestones: O = Operational; D = Demonstrator; S = Software/Simulator; ES = Executive Summary; P = Prototype

Page 2: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page i

This Project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement: 688860

Document history

Version Date Author Comments

0.1 2018-06-14 EVI First version

0.2 2018-06-25 PIT Internal review by PIT.

1.0 2018-06-28 EVI Integrated review comments

1.1 2018-06-30 UNIMORE Final revision

This document reflects only the author's view and the EU Commission is not responsible for any use that may be made of the information it contains.

Page 3: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page ii

This Project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement: 688860

Table of contents

Document history ........................................................................................................................................... i

Table of contents ........................................................................................................................................... ii

List of figures ................................................................................................................................................... 4

Glossary ............................................................................................................................................................ 4

1. EXECUTIVE SUMMARY ...................................................................................................................... 5

2. INTRODUCTION .................................................................................................................................. 6

2.1. Objectives ............................................................................................................................. 6

3. SCHED_DEADLINE .............................................................................................................................. 7

3.1 Description ....................................................................................................................................... 7

3.2 Usage ................................................................................................................................................ 8

3.3 Issues ................................................................................................................................................ 8

4. GRUB .................................................................................................................................................... 10

4.1 Description ......................................................................................................................................10

4.2 Installation .......................................................................................................................................10

4.3 Usage ...............................................................................................................................................10

4.4 Testing the correctness ..................................................................................................................11

4.5 Evaluating the benefits ...................................................................................................................12

5. GRUB-PA ............................................................................................................................................. 14

5.1 Description ......................................................................................................................................14

5.2 Installation .......................................................................................................................................14

5.3 Usage ...............................................................................................................................................14

5.4 Test setup ........................................................................................................................................14

5.5 Experimental results .......................................................................................................................17

6. CONCLUSIONS .................................................................................................................................. 21

7. REFERENCES ...................................................................................................................................... 22

Page 4: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 4

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

List of figures

Figure 1: Priorities of the Linux kernel scheduling classes. ................................................................................ 7 Figure 2: Code coverage of test-dl. ...................................................................................................................11 Figure 3: Analysis example of the trace through kernelshark. ...........................................................................12 Figure 4: CDF of response times (CBS vs GRUB). ...........................................................................................13

Figure 5: Example of Python code on LISA. ......................................................................................................15 Figure 6: Example of CPU frequency plot on LISA............................................................................................16 Figure 7: Example of task execution plot on LISA. ............................................................................................16 Figure 8: Example of GRUB's bandwidth plot on LISA. .....................................................................................16 Figure 9: Example of power consumption measured through the Baylibre's tool. .............................................17 Figure 10: Test 1 of GRUB-PA (1 task, 90%, 100msec). ..................................................................................18 Figure 11: Test 2 of GRUB-PA (1 task, 100%, 100msec). ................................................................................19 Figure 12: Test 3 of GRUB-PA (1 task, 90%, 10msec). ....................................................................................19

Figure 13: Test 4 of GRUB-PA (4 tasks, 90%, 100msec). .................................................................................20 Glossary

Item Description

CBS Constant Bandwidth Server

GA Grant Agreement

GPOS General-Purpose Operating System

GRUB Greedy Reclamation of Unused

Bandwidth

OS Operating System

Page 5: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 5

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

1. EXECUTIVE SUMMARY

Deliverable D4.2 is a software package containing the final version of the real-time Linux OS for the reference many-core platform, developed in Task 4.1 of the HERCULES project. As illustrated in the GA, such task aims at “instrumenting the general-purpose Linux OS with proper real-time functionalities to achieve the required timing predictability and power efficiency.” This document describes the developed software and explains how to install and use the software package provided along with the document.

Page 6: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 6

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

2. INTRODUCTION

2.1. Objectives

As illustrated in the GA, in HERCULES “the SCHED_DEADLINE scheduler will be used as a starting point and will be properly improved with better algorithms (e.g., IRIS [Mar04]) or resource sharing strategies (e.g., M-BWI [Fag12]); moreover, it will be extended to implement the mechanisms designed in WP5, which could be either power-aware real-time algorithms (e.g., GRUB-PA [Sco06]) or synchronization with power management hardware components.” The work in Task 4.1, therefore, has moved along three different lines of development:

1. the design/implementation of a set of changes to add reclaiming features to the existing implementation of the SCHED_DEADLINE scheduler, by implementing the Greedy Reclamation of Unused Bandwidth (GRUB [Lip00]) algorithm;

2. the design/implementation of an extension to add energy-aware functionalities on top of such reclaiming mechanism;

3. the design/implementation of a test-suite for properly testing both the previous functionalities. Additionally, Task 4.1 also investigated the possibility of changing the resource sharing algorithm currently used in SCHED_DEADLINE (i.e. Deadline Inheritance) to a more sophisticated algorithm (e.g., M-BWI [Fag12] or BROE [Ber09]).

Page 7: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 7

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

3. SCHED_DEADLINE

3.1 Description

Since release 2.6.23, Linux has a modular scheduling framework that can be easily extended with additional policies. It has been implemented by Molnar [Lel16] as a replacement of the previous O(1) scheduler. It consists of a main schedule() function and a set of scheduling classes, each encapsulating a specific scheduling policy. The binding between each policy and the related scheduler is done through a set of hooks (i.e., function pointers) provided by each scheduling class and called by the core scheduler. SCHED_DEADLINE [Lel16, WikSD] was originally born within the context of the ACTORS project [Bin11], financially supported by the European Commission under the FP7 framework. ACTORS combined virtualization, feedback control, and a data-flow programming model for realizing a high-performance infrastructure for adaptable resource management in embedded devices. The reference platform was a Linux-based multi-core ARM chip. Within this project, partner EVI was in charge of the development of a real-time CPU scheduler at kernel level. Such scheduler was created in collaboration with the Real-Time Systems Laboratory of Scuola Sant’Anna (which already had a wide experience in development of real-time schedulers for Linux). Rather than making ad-hoc changes to the Linux scheduling core (an approach that proved to offer a lower overhead [Cer14]), EVI created an additional in-kernel scheduling class to ease the acceptance of the scheduler into the official Linux kernel. The project, in fact, from the very beginning aimed at the ambitious goal of creating a real-time scheduler available by default on all Linux distributions. Thus, SCHED_DEADLINE sits on top of the Linux modular scheduling framework, without changing the behaviour of already existing scheduling policies (i.e., best-effort and fixed-priority). The counterpart is that SCHED_DEADLINE automatically inherited the performance and the latencies of the Linux scheduling framework. Figure 1 shows the set of scheduling policies currently available on Linux.

Figure 1: Priorities of the Linux kernel scheduling classes.

On March 2014, after the submission of nine releases, SCHED_DEADLINE has been accepted and merged into the Linux kernel 3.14. SCHED_DEADLINE provides a scheduling policy suitable for real-time applications by implementing a resource reservation algorithm in the Linux kernel. This algorithm, named Constant Bandwidth Server (CBS [Abe98]), implements resource reservations over an Earliest Deadline First (EDF) scheduler [Liu73].

Page 8: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 8

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

Resource reservations [Mer93] is an effective technique for providing temporal isolation, which allows using real-time scheduling even in GPOSs. The basic idea is to assign each real-time task a “reservation'' consisting of a “runtime” (called “budget” in the real-time literature) and a “period”. The scheduler will guarantee that the task will execute at most for a time equal to the “runtime” every “period”. If the task tries to execute for a longer time, then the scheduler throttles it (preventing him to be selected for execution) until the end of the current reservation period. In this way, each task is constrained to not use more than its reserved CPU share (called “bandwidth” in the real-time literature).

3.2 Usage

Unfortunately, despite the long time already elapsed from the integration of SCHED_DEADLINE into the official Linux kernel, the GNU libc has not yet integrated the API for dealing with this scheduler. This is somehow similar to what is happening with the (still missing) gettid() libc call. To use the SCHED_DEADLINE scheduler it is therefore necessary to rely on external libraries or on a direct syscall as in the example below:

#ifdef __x86_64__ #define __NR_sched_setattr 314 #endif #ifdef __i386__ #define __NR_sched_setattr 351 #endif #ifdef __arm__ #define __NR_sched_setattr 380 #endif #ifdef __aarch64__ #define __NR_sched_setattr 274 #endif /*…*/ struct sched_attr attr; attr.size = sizeof(attr); attr.sched_flags =0; attr.sched_nice = 0; attr.sched_priority = 0; /* This creates a 10ms/30ms reservation */ attr.sched_policy = SCHED_DEADLINE; attr.sched_runtime = 10 * 1000 * 1000; attr.sched_period = attr.sched_deadline = 30 * 1000 * 1000; sched_setattr(0, &attr, flags);

3.3 Issues

Although referred with the name of “CBS”, the algorithm actually implemented in the original SCHED_DEADLINE is HCBS (i.e. CBS with Hard Reservation [ScoPhD]). In fact, if a real-time task tries to execute for more than its share, it is blocked, even when no other tasks are ready for execution. In the scheduling terms, this means that the algorithm is not “work conserving”. This behaviour has been designed to avoid starvation of lower priority tasks; moreover, this allows solving the “greedy task” problem (also known as

Page 9: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 9

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

“deadline aging” [ScoPhD, Sco18]), where a task consumes its future reservation due to other real-time tasks not consuming their owns. However, this approach does not allow fully exploiting the system resources; it would be therefore desirable to allow the real-time tasks to get a bigger CPU share whenever the system is idle or underutilized.

Page 10: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 10

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

4. GRUB

4.1 Description

The Greedy Reclamation of Unused Bandwidth (GRUB [Lip00]) algorithm aims at adding the reclaiming functionality to the CBS-class of algorithms, preserving the same semantics and real-time guarantees. It introduces an additional state (i.e., ActiveNonContending) entered by a task that blocks and whose share cannot (yet) be reclaimed. The task enters the Inactive state when it persists in being blocked and its bandwidth can be successfully reclaimed. An in-depth description of the algorithm along with a few examples can be found in the real-time literature [Lip00, Sco06, ScoPhD] and in the deliverable D5.1 During the last decades, several scheduling algorithms that outperform CBS have been proposed in the literature. The reason why the HERCULES Consortium has chosen the GRUB algorithm is that it is the one explicitly agreed by the Linux kernel community during the RT Summit (Berlin, October 2017). Since Task 4.1 aimed at releasing the code as public domain software, the Consortium has decided to implement the very same algorithm requested by the Linux community. Implementing the GRUB algorithm in Linux has been far from easy, due to the partitioned nature of the Linux scheduler (consisting of a set of isolated runqueues, one for each scheduling class and for each physical core), and because the original algorithm was designed for a single-core machine [Abe14]. In the HERCULES project, partner EVI collaborated with Scuola Sant’Anna and ARM Ltd. to implement the GRUB algorithm by modifying the existing SCHED_DEADLINE code. In particular, EVI’s activity has been focused on the design and development of the code for tracking the inactive bandwidth of the runqueues as well as on testing the overall implementation through unit testing.

4.2 Installation

The GRUB implementation has been already merged into the official Linux kernel since release 4.13 [Lin01, WikSD] with some additional features (i.e., runtime overrun signal) merged since release 4.16 [Lin02]. The code is therefore available and maintained directly inside the official Linux kernel. The patches are also provided in the grub/ and runtime-overrun/ directories inside the software package delivered along with this document.

4.3 Usage

The API for using GRUB is the same of SCHED_DEADLINE. The reclaiming features can be enabled by just setting

attr.sched_flags = SCHED_FLAG_RECLAIM; in the sched_setattr() call. The SCHED_FLAG_RECLAIM is set to 0x02 inside the uapi/linux/sched.h file. Similarly, the SCHED_FLAG_DL_OVERRUN flag (equal to 0x04) lets the task getting the delivery of a SIGXCPU signal whenever there is a runtime overrun. The following code shows how to install a user-space handler for receiving and handling such signal:

static void handler (int signum) {

/*Handle signal…*/

Page 11: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 11

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

}

int main (void) {

struct sigaction s; bzero(&s, sizeof(s)); s.sa_handler = handler; sigaction (SIGXCPU, &s, NULL); /*…*/

}

4.4 Testing the correctness

The GRUB algorithm is quite complex with respect to the original SCHED_DEADLINE implementation [Abe14]: it adds a further state, a kernel timer for each task that enters the ActiveNonContending state, and keeps track of a couple of bandwidths for each runqueue. Unfortunately, the Linux modular scheduling framework illustrated in the previous sections is far from being perfect; it does not fully isolate the scheduling classes and it makes very hard to intercept the events relevant for the GRUB algorithm. For this reason, we needed to implement a unit testing framework to ensure that the bandwidth is correctly accounted in all circumstances and states in which the SCHED_DEADLINE class and the real-time tasks could enter. We therefore implemented a set of tests, then integrated with a set of scripts for the traditional Linux kernel analysis tools (i.e., ftrace [Ftrace], trace-cmd[LWN10], kernelshark[LWN11] and ARM’s LISA[LISA]). Through this framework we have been able of verifying the accounting of the bandwidth as well as the amount of CPU share given to the tasks in several different circumstances. The tests have been made publicly available through EVI’s GitHub account [Test-dl]. They are also provided in the grub-tests/ directory inside the software package delivered along with this document. Using the gcov tool we measured a code coverage equal to 84% for the deadline.c file:

Figure 2: Code coverage of test-dl.

Page 12: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 12

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

The instructions for installing and using the test-suite are provided in the README.md file available on both EVI’s GitHub repository [Test-dl] and in the grub-tests/ directory inside the software package delivered along with this document.

Such instructions explain how to directly run the tests on the target. The graphical analysis tools (e.g., kernelshark) can be run by connecting to the board using the -X option of ssh (which enables X11 forwarding). A screenshot follows.

Figure 3: Analysis example of the trace through kernelshark.

Note that kernelshark allows getting a list of the events traced by the kernel, to filter a specific task (either in the graph or in the list of events) or a specific event, and to measure the distance between two events in the graph, thus allowing to check the timing correctness of the relevant events in the Linux kernel.

4.5 Evaluating the benefits

The following experimental result, shown by EVI at the Linux RT Summit’17, compares GRUB with the previous CBS algorithm. The experiment runs two tasks:

Page 13: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 13

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

• Task 1 with reservation (6 ms, 20 ms) and a constant execution time of 5 ms.

• Task 2 with reservation (45 ms, 260 ms) and an execution time with occasional variance (i.e. 35 ms – 52 ms).

Figure 4 shows the Cumulative Distribution Function (CDF), which is the probability that the Response time is less or equal to x ms.

Figure 4: CDF of response times (CBS vs GRUB).

It is possible to note that with the original CBS, task T2 occasionally experiences a response time higher than the reservation period. Using GRUB, instead, T2 always completes before the reservation period (by reclaiming the spare bandwidth unused by T1). In practice, GRUB allows any task to reclaim unused real-time bandwidth whenever available. The result is a lower amount of deadline misses, especially on systems not optimally configured.

Page 14: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 14

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

5. GRUB-PA

5.1 Description

The GRUB-PA algorithm exploits the reclaiming feature of the original GRUB algorithm for lowering the CPU operating frequency (hence, its power consumption) respecting the deadlines of the real-time tasks. The algorithm, already presented in the real-time literature [Sco06, ScoPhD], has been described in the deliverable D5.1. In the project HERCULES, partner EVI has collaborated with ARM Ltd and Scuola Sant’Anna for designing and developing the power-aware extensions to GRUB. The resulting work has been illustrated in a paper presented at SAC 2018 [Sco18].

5.2 Installation

The energy efficiency extension of GRUB (i.e., GRUB-PA [ScoPhd, Sco06, Sco18]), co-authored by partner EVI in the context of the HERCULES project, has been already merged into the official Linux kernel since release 4.16 [Lin03, WikSD]. The code is therefore available and maintained directly inside the official Linux kernel. The patches are also provided in the grub-pa/ directory inside the software package delivered along with this document.

5.3 Usage

The energy efficiency extension is automatically invoked whenever the CPU’s schedutil frequency governor is selected in the Linux kernel configuration. Unfortunately, such extension is currently available only for ARM architectures, as the x86 architecture does not provide any reliable mechanism for getting the actual CPU frequency. In the future we will investigate the usage of performance counters to get a reliable estimation of the CPU frequency also on such architecture.

5.4 Test setup

The tests of the GRUB-PA scheduler have been performed by using ARM’s LISA testing framework [LISA]. By integrating with Jupyter [Jup], this open-source tool allows to write and run Python-based scripts (called “notebooks”) directly on a browser, as the following screenshot shows:

Page 15: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 15

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

Figure 5: Example of Python code on LISA.

The example of script provided inside the tests-grub-pa/ directory, for example, allows to run real-time tasks on ARM-based NXP i.mx6 and Odroid XU4 targets through simple statements written in Python. The framework allows to describe and run a test, then plot the CPU frequency as shown in Figure 6:

Page 16: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 16

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

Figure 6: Example of CPU frequency plot on LISA.

It is also possible to plot the execution schedule of specific tasks as shown in Figure 7:

Figure 7: Example of task execution plot on LISA.

Plot the trace of traced variables like the GRUB runqueue’s bandwidth:

Figure 8: Example of GRUB's bandwidth plot on LISA.

Page 17: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 17

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

The power consumption has been measured through a Baylibre ACME Cape [BayLib1, BayLib2] integrated with LISA:

Figure 9: Example of power consumption measured through the Baylibre's tool.

Note that when using the ACME Cape (which relies on a separate Beaglebone Black for measuring the energy consumption), the time is not aligned with the time reported by the ftrace trace.

5.5 Experimental results

To validate the proposed scheduler we have performed a set of extensive tests using a Freescale Sabre board based on a quad-core Cortex-A9 ARM SoC [NXP] that can run at three different frequencies: 396 MHz, 792 MHz and 996 MHz. The choice of this specific board depended on two factors:

• the very good support by the standard mainline Linux kernel, easing upstream integration;

• the processor very similar to the one of the HERCULES reference platforms (i.e. multi-core ARM). The synthetic real-time load has been generated through the rt-app toolkit [rt-app], implementing one or more real-time tasks composed by periodic jobs with a fixed execution time, run through the ARM's LISA framework [LISA] already described in the previous section. To have consistent data, all the provided values were averaged over 10 consecutive runs. The energy consumption has been measured through the Baylibre's ACME Cape board [BayLib1, BayLib2] integrated with LISA. We highlight the fact that the measured values are related to the energy consumed by the whole embedded board, not just the SoC. In particular, the tests aimed at comparing the GRUB-PA implementation versus the state of the art (i.e. the mainline schedutil governor) in terms of both energy consumption and real-time guarantees (i.e., percentage of deadline misses).

Page 18: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 18

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

The following results, officially presented by partner EVI at the Linux RT Summit 2017, show that GRUB-PA allows a significant reduction of the energy consumption still guaranteeing the real-time constraints. In particular, the results compare the energy consumption and the deadline misses of the performance governor (i.e., CPU frequency always at the maximum value), of the current schedutil implementation (i.e., CPU frequency at the maximum value when executing SCHED_DEADLINE tasks) and the proposed GRUB-PA implementation (i.e., CPU frequency at the minimum value that allows to respect the real-time constraints). The first set of experiments consisted in a periodic user-space task with period 100 ms and fixed execution time C, scheduled by a SCHED_DEADLINE reservation with a runtime Q such that C=0.9*Q. Multiple experiments have been performed, with the runtime Q ranging from 10ms to 100ms (and hence C ranging from 9ms to 90ms). Figure 10 shows the average energy consumption measured by the energy meter for different values of the reservation as well as the percentage of deadline misses registered by the rt-app tool. Such figures show that both the performance governor and the proposed GRUB-PA scheduler have a negligible amount of deadline misses. However, GRUB-PA is more efficient in terms of energy consumption whenever the real-time load is below 70%. It is worth noting that the original schedutil governor (then superseded by GRUB-PA) presented a quite high percentage of deadline misses.

A second set of experiments further stressed the deadline scheduler by setting the execution time C of the task exactly equal to runtime Q of the reservation serving the task. The experimental results, shown in Figure 11, confirm the trend already observed.

Figure 10: Test 1 of GRUB-PA (1 task, 90%, 100msec).

Page 19: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 19

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

A third set of experiments aimed at investigating the behaviour of the scheduler when reducing the timing granularity. We have thus reduced the task period (and the SCHED_ DEADLINE reservation period) to 10ms, with the reservation's runtime Q ranging from 1ms to 10ms and the task execution time C (equal to 0.9*Q) ranging from 0.9ms to 9ms. The results illustrated in Figure 12 show a significant gain in terms of energy efficiency. Looking at the distribution of the misses with respect to the reservation's utilization, we can see that GRUB-PA has a worse number of misses than the current schedutil governor only for values of the reservation bandwidth higher than 90%; the default schedutil governor, in fact, presents a non-negligible percentage of misses across the whole range of values. Again, this means that the average percentage of misses is lower using GRUB-PA than the governor originally available in the mainline kernel.

A further set of experiments aimed at investigating the performance when increasing the number of real-time tasks. The fourth test run four SCHED_DEADLINE tasks, equal to the number of the available cores. The four tasks, encapsulated in different reservations, had a period of 100ms and execution time equal to 90% of their reservation's runtime, similarly to the first test.

Figure 11: Test 2 of GRUB-PA (1 task, 100%, 100msec).

Figure 12: Test 3 of GRUB-PA (1 task, 90%, 10msec).

Page 20: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 20

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

The experimental results, shown in Figure 13, confirm the behavior already illustrated: despite a very small amount of deadline misses in case of 30% and 100% total bandwidth, the proposed implementation shows a real-time behavior similar to the performance governor with a much lower energy consumption. Additionally, we report a system deadlock of the target that we experienced using the schedutil governor and an utilization equal to 100% (such deadlock, instead, did not happen when using the proposed GRUB-PA scheduler).

Further experimental results have been reported in the paper presented at the SAC 2018 conference [Sco18]. Such results confirm the trend already illustrated and have paved the way to the inclusion of GRUB-PA into the official Linux kernel (happened with release 4.16). For the sake of completeness, it is worth reporting that we noted a performance degradation (in terms of amount of deadline misses) whenever the granularity of the real-time task’s period becomes comparable with the time needed by the platform for switching frequency. In such case, the platform (meant as both hardware and operating system) cannot react in time to the request of increasing frequency, and the amount of deadline misses becomes not negligible. In such circumstance (which depends on both the specific task set and the hardware platform) it is better to rely on a fixed CPU frequency for respecting the real-time constraints of the tasks.

Figure 13: Test 4 of GRUB-PA (4 tasks, 90%, 100msec).

Page 21: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 21

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

6. CONCLUSIONS

This document illustrated how to use the software designed and developed in Task 4.1 of the HERCULES project and provided as deliverable D4.2. It has also shown the efficiency in terms of power consumption of the developed GRUB-PA algorithm. An important achievement of the HERCULES project is that such code (i.e. the GRUB and GRUB-PA algorithms) have already become part of the official Linux kernel, thus contributing to such OS widely used in several application domains. Task 4.1 has also investigated the possibility of replacing the current resource reservation policy in the Linux kernel (i.e. Deadline Inheritance) with a more efficient algorithm (i.e. M-BWI or BROE). Unfortunately, the lack of a clear goal from the Linux community did not allow defining a path for identifying and developing an algorithm widely accepted. Nevertheless, partner EVI will continue to provide its support and experience to the Linux kernel community even out of the scope of the HERCULES project.

Page 22: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 22

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

7. REFERENCES

[Abe98] Luca Abeni and Giorgio Buttazzo. Integrating multimedia applications in hard real-time

systems. In Proceedings of the 19th IEEE Real-Time Systems Symposium (RTSS), pages 4 –

13, Madrid, Spain, December 1998.

[Abe14] Luca Abeni, Juri Lelli, Claudio Scordino, Luigi Paolopoli, Greedy CPU reclaiming for

SCHED_DEADLINE, Proceedings of 16th Real-Time Linux Workshop (RTLWS),

Dusseldorf, Germany, October 2014.

[BayLib1] BayLibre, ACME Cape, https://baylibre-acme.github.io/

[BayLib2] BayLibre, ACME Cape Wiki, http://wiki.baylibre.com/doku.php?id=acme:start

[BayGraph] BayLibre, pyacmegraph, https://github.com/baylibre-acme/pyacmegraph

[Ber09] Marko Bertogna, Nathan Fisher, Sanjoy Baruah, Resource-sharing servers for Open

Environments, IEEE Transactions on Industrial Informatics. 5(3): 202-220. August 2009.

2009 Best Paper for the IEEE Transactions on Industrial Informatics.

[Bin11] Enrico Bini, Giorgio Buttazzo, Johan Eker, Stefan Schorr, Raphael Guerra, Gerhard Fohler,

Karl-Erik Arzen, Vanessa Romero Segovia, Claudio Scordino, Resource Management on

Multicore Systems: The ACTORS Approach, IEEE Micro, vol. 31, no. 3, pp. 72-81,

May/June 2011.

[Cer14] Cerqueira F, Vanga M, Brandenburg BB. Scaling global scheduling with message passing.

20th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS

2014), Berlin, Germany, 2014; 263–274.

[Fag12] Dario Faggioli, Giuseppe Lipari and Tommaso Cucinotta, Analysis and implementation of

the multiprocessor bandwidth inheritance protocol, Real-Time Systems, vol. 48, no. 6, pp.

789-825, 2012.

[Ftrace] https://www.kernel.org/doc/Documentation/trace/ftrace.txt

[Jup] Jupyter, http://jupyter.org/

[Lip00] Giuseppe Lipari and Sanjoy K. Baruah. Greedy reclamation of unused bandwidth in constant

bandwidth servers. In IEEE Proceedings of the 12th Euromicro Conference on Real-Time

Systems, Stokholm, Sweden, June 2000.

[Lel16] Juri Lelli, Claudio Scordino, Luca Abeni, Dario Faggioli, Deadline scheduling in the Linux

kernel, Software: Practice and Experience, 46(6): 821-839, June 2016.

[Lin01] sched/deadline: Implement GRUB accounting,

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c52f14d384628

db0217a7a9080ab800d5ffb2d72

[Lin02] sched/deadline: Implement "runtime overrun signal" support,

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34be39305a77b

8b1ec9f279163c7cdb6cc719b91

[Lin03] sched/cpufreq: Use the DEADLINE utilization signal,

Page 23: D4.2: Real-Time Linux (final) - Hercules2020hercules2020.eu/wp-content/uploads/2016/04/D4.2_Realtime_Linux_… · D4.2: Real-Time Linux (final) Document title: Real-Time Linux - Final

Page 23

☐☐HERCULES:

High-Performance Real-time Architectures for Low-Power Embedded Systems

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4edd662ac165

7126df7ffd74a278958b133a77d

[Liu73] Liu CL, Layland JW. Scheduling algorithms for multiprogramming in a hard-real-time

environment. Journal of the Association for Computing Machinery, 20(1):46–61, 1973.

[LISA] ARM LISA framework, https://github.com/ARM-software/lisa

[LWN10] Linux Weekly News, trace-cmd: A front-end for Ftrace, https://lwn.net/Articles/410200/

[LWN11] Linux Weekly News, Using KernelShark to analyze the real-time scheduler,

https://lwn.net/Articles/425583/

[Mar04] Luca Marzario, Giuseppe Lipari, Patricia Balbastre, and Alfons Crespo. IRIS: A New

Reclaiming Algorithm for Server-Based Real-Time Systems. In Proceedings of the IEEE

Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 211–

218, 2004.

[Mer93] Mercer CW, Rajkumar R, Tokuda H. Applying hard real-time technology to multimedia

systems. Workshop on the Role of Real-Time in Multimedia/Interactive Computing System,

1993.

[NXP] NXP i.MX6, https://www.nxp.com/products/processors-and-microcontrollers/applications-

processors/i.mx-applications-processors/i.mx-6-processors:IMX6X_SERIES

[rt-app] Scheduler tools, rt-app, https://github.com/scheduler-tools/rt-app

[Sco06] Claudio Scordino, Giuseppe Lipari, A Resource Reservation Algorithm for Power-Aware

Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions on Computers,

December 2006.

[Sco18] Claudio Scordino, Luca Abeni, Juri Lelli, Energy-Aware Real-Time Scheduling in the Linux

Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC 2018), Pau, France,

April 2018.

[ScoPhD] Claudio Scordino, Dynamic Voltage Scaling for Energy-Constrained Real-Time Systems,

PhD Thesis, University of Pisa, December 2007.

[Test-dl] Evidence Srl, test-sched-dl, https://github.com/evidence/test-sched-dl

[WikSD] Wikipedia, SCHED_DEADLINE, https://en.wikipedia.org/wiki/SCHED_DEADLINE