HOW HYPERVISOR OPERATING SYSTEMS CAN COPE WITH MULTI-CORE CERTIFICATION CHALLENGES

Sven Nordhoff, Director Certification, SYSGO AG

© SYSGO AG · PDF file · 2017-06-12

Agenda

• Introduction Hypervisor / Segregation Kernel

• How to cope with Multi-Core?

• EASA/FAA Multi-Core issues

• Hypervisor OS solutions to support Multi-Core certification

• Current and future work

For more details, see the conference paper.

Introduction Hypervisor / Segregation Kernel

How to cope with Multi-Core?

Hypervisor OS – Architecture

• Small and fast micro kernel (Time Partitioning)

• Separation of critical and uncritical functionality (Resource Partitioning)

• Separation of HW and I/O resources (Resource Partitioning)

[Figure: three partitions, Low-criticality (Network driver), High-criticality (Actuator control) and Medium-criticality (File system), run on top of the Separation Kernel (Micro Kernel, Real-time), which runs on the Hardware (CPUs, memory, and devices).]

Hypervisor OS – Resource Partitioning

• Use MMU to map memory to a partition
• Execute in kernel mode
• Use IOMMU / PAMU to configure memory access for devices
• Guaranteed access to assigned resources
• No access to other resources / no error propagation
• Execute in user mode
• Static configuration of OS resources
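A static resource configuration can be checked offline before it is deployed. The following is a minimal sketch, not PikeOS tooling: it assumes each partition is given fixed physical memory regions and reports any two regions owned by different partitions that overlap. The partition names, addresses and the `MemoryRegion` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRegion:
    partition: str  # hypothetical partition name
    base: int       # start address (inclusive)
    size: int       # length in bytes

    @property
    def end(self) -> int:
        """First address after the region (exclusive)."""
        return self.base + self.size


def find_overlaps(regions):
    """Return pairs of overlapping regions that belong to different partitions."""
    conflicts = []
    ordered = sorted(regions, key=lambda r: r.base)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.base >= first.end:
                break  # sorted by base: no later region can overlap `first`
            if first.partition != second.partition:
                conflicts.append((first, second))
    return conflicts


# Hypothetical static configuration: three partitions with adjacent regions.
STATIC_CONFIG = [
    MemoryRegion("actuator_control", 0x1000_0000, 0x0010_0000),
    MemoryRegion("file_system", 0x1010_0000, 0x0020_0000),
    MemoryRegion("network_driver", 0x1030_0000, 0x0010_0000),
]

if __name__ == "__main__":
    for a, b in find_overlaps(STATIC_CONFIG):
        print(f"conflict: {a.partition} overlaps {b.partition}")
```

Because the configuration is static, a check of this kind can run at build time, so a conflicting mapping never reaches the target.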

Hypervisor OS – Time Partitioning

[Figure: timeline with four resource partitions, Low-criticality (Graphics), High-criticality (Actuator control), Medium-criticality (Communication) and High-criticality (Database), assigned to the time partitions labelled Time-Part A to Time-Part D; the time axis shows windows of 8 ms, 8 ms and 6 ms. Annotation: 4 resource partitions, 3 time partitions.]
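The time-partitioning scheme can be illustrated with a short sketch of a cyclic schedule. It assumes that the three durations on the time axis (8 ms, 8 ms, 6 ms) are the windows of one repeating major frame; the window names are hypothetical, and the assignment of resource partitions to windows is not modelled because it cannot be recovered from the figure.

```python
from itertools import accumulate

# Window durations taken from the time axis of the figure; names are hypothetical.
WINDOWS = [("window_1", 8), ("window_2", 8), ("window_3", 6)]

# Length of one full cycle of the schedule, in milliseconds.
MAJOR_FRAME_MS = sum(duration for _, duration in WINDOWS)


def active_window(t_ms):
    """Return the name of the window that is active at time t_ms."""
    offset = t_ms % MAJOR_FRAME_MS
    window_ends = accumulate(duration for _, duration in WINDOWS)
    for (name, _), end in zip(WINDOWS, window_ends):
        if offset < end:
            return name
    raise AssertionError("offset is always smaller than the major frame")


if __name__ == "__main__":
    for t in (0, 8, 16, 22, 30):
        print(t, active_window(t))
```

The schedule depends only on time, never on the behaviour of the partitions, which is what makes the CPU budget of each window predictable.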

Hypervisor OS – Partitioning of Multi-Core possible?

[Figure: the Low-criticality (Network driver), High-criticality (Actuator control) and Medium-criticality (File system) partitions are distributed across Core 1, Core 2 and Core 3, on top of the Separation Kernel (Micro Kernel, Real-time) and the Hardware (CPUs, memory, and devices); question marks between the cores mark the MCP interference channels (memory, caches, interconnect).]

• Is separation of critical and uncritical functions on an MCP possible?
• Are Hypervisor OS techniques adequate to cope with this?
• Time Partitioning?
• Resource Partitioning?
• Others?

Problems to certify Multi-Core Architectures

Multi-Core Certification Concerns

• Design Assurance
  • Lack of processor design documents may lead to undetected interference channels.
• In-Service History
  • Little in-service history is available for multi-core based designs in avionics applications.
• Hardware Interference Channels
  • Cores within a chip interact with each other and share resources of the chip (e.g. cache). This interaction and sharing leads to interference (e.g. timing) that needs to be analyzed, and mitigation and verification activities to cope with it need to be defined.
• Measurement of WCET
  • Because resources are shared in an MCP and complex infrastructure (such as buses) is used, analyzing or measuring the worst-case timing is hard.

MCP Interference Channels

• Shared caches

• Typically one L1 cache per CPU

• L2 cache shared on some CPUs

• L3 cache typically shared

• Cache coherency protocol

• Problem grows with number of cores

• Configurable on some CPUs through core cross bar

• Global / Local cache flush and invalidate

• Shared buses (Core Connection, Processor, Memory, PCI)

• Shared Interrupts

• Shared devices (Memory, Timer, I/O)

Worst case timing problems

Example: Access to shared data regions (Intel, AMD)

Presented at the 29th Digital Avionics Systems Conference, Rudolf Fuchsen, 2010

EASA/FAA Multi-core Processors certification guidance

• MCP CAST-32 (rev0, 05/2014)
• MCP CRI (not published yet)

EASA/FAA MCP Status

• The CAST 32 paper (rev0) was published in 05/2014
  • includes 24 objectives on how to handle MCPs at DAL A/B
  • includes 16 objectives on how to handle MCPs at DAL C
  • limited to 2 active cores
  • does not address IMAs or incremental certification
  • contains a lot of additional information and guidelines
  • is no longer the current EASA/FAA MCP position
• The EASA MCP CRI is not published yet, but was under discussion in 2016
  • includes 10 objectives on how to handle MCPs at DAL A/B
  • includes 7 objectives on how to handle MCPs at DAL C
  • no longer limited to two active cores
  • implements the current EASA/FAA MCP position
  • addresses IMA
  • … but this paper is not publicly available

(1) From the presentation "FAA Status on Multi-Core Processors", John Strasburger, 04/2014

EASA/FAA CAST 32 / CRI Validity

• When does it apply?
  • Levels A, B, and C applications
  • Smaller subset of objectives for DAL C applications
  • If only one core is active: just two objectives
  • Depending on the project, specific objectives may not apply
  • A table shows which of the objectives apply by development assurance level (A, B, C)
• Exempted configurations
  • Hyperthreading CPUs are not covered by the CAST and CRI
  • Two identical cores running in lock step
  • Processors linked by conventional data buses, and not by shared memory, shared cache, or a coherency fabric / module / interconnect
• Systems on IDAL C which do not need robust partitioning do not need to fulfill all objectives.
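The applicability rules above can be read as a small decision procedure. The sketch below encodes them under loudly stated assumptions: the objective counts are the ones quoted on the status slide, the flag names for the exempted configurations are hypothetical, and the single-core rule is applied to both CAST-32 and the CRI because the slide does not say which document it refers to. The result is an upper bound, since project-specific objectives may not apply; it is an illustration, not a statement of the regulatory text.

```python
from typing import Optional

# Objective counts as quoted on the status slide (not verified against the papers).
OBJECTIVE_COUNTS = {
    ("CAST-32", "A"): 24, ("CAST-32", "B"): 24, ("CAST-32", "C"): 16,
    ("CRI", "A"): 10, ("CRI", "B"): 10, ("CRI", "C"): 7,
}

# Hypothetical flag names for the exempted configurations listed on the slide.
EXEMPT_FLAGS = {"hyperthreading", "lockstep_cores", "conventional_bus_only"}


def objectives_to_address(guidance: str, dal: str, active_cores: int,
                          flags: frozenset = frozenset()) -> Optional[int]:
    """Upper bound on the objectives to address, or None if the guidance does not apply."""
    if dal not in ("A", "B", "C"):
        return None  # the slide names only levels A, B and C
    if EXEMPT_FLAGS & set(flags):
        return None  # exempted configuration
    if active_cores <= 1:
        return 2     # assumption: applies to both CAST-32 and the CRI
    return OBJECTIVE_COUNTS[(guidance, dal)]


if __name__ == "__main__":
    print(objectives_to_address("CRI", "A", active_cores=2))
    print(objectives_to_address("CAST-32", "C", active_cores=2))
    print(objectives_to_address("CRI", "B", 2, frozenset({"lockstep_cores"})))
```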

CAST-32 – Topics (1)

a) Configuration Settings
• Configuration of required, unused and dynamic features needs to be analyzed, determined and documented (DO-254)

b) Processor Errata
• A process needs to be in place to assess MCP errata sheets regularly (same approach as for COTS/microcontroller certification)

c) SW Hypervisors and MCP HW Hypervisor Features
• Need to be identified in plans (PSAC, PHAC) and be compliant with DO-178B/C

d) MCP Interference Channels
• Identify interference channels and verify the means of mitigation

e) Shared Memory and Cache (between processing cores)
• Describe the shared resource approach in SW plans (PSAC)
• Prevent disruptions to deterministic software execution
• Analysis and tests to determine the worst-case effects of shared memory and cache (WCET analysis)

f) Planning and Verification of Resource Usage
• Describe the approach in SW plans (PSAC)
• Allocate, manage and measure resource and interconnect usage.
• Verify that resource and interconnect demands do not exceed the capacity.
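The last point of topic f) amounts to a budget check: the summed demand of all hosted software on each shared resource has to stay within what the platform provides. A minimal sketch follows, assuming demands and capacities are available as plain numbers per resource. The resource names, units and figures are hypothetical, and a real analysis would derive them from measurement on the target.

```python
def over_capacity(capacity, demands):
    """Return {resource: (total_demand, capacity)} for every exceeded resource.

    capacity: {resource: available units}
    demands:  {application: {resource: requested units}}
    """
    totals = {}
    for per_app in demands.values():
        for resource, units in per_app.items():
            totals[resource] = totals.get(resource, 0) + units
    return {
        resource: (total, capacity.get(resource, 0))
        for resource, total in totals.items()
        if total > capacity.get(resource, 0)
    }


# Hypothetical platform budget and per-application demands.
CAPACITY = {"dram_bandwidth_mb_s": 3200, "l2_cache_kib": 512}
DEMANDS = {
    "actuator_control": {"dram_bandwidth_mb_s": 800, "l2_cache_kib": 128},
    "file_system": {"dram_bandwidth_mb_s": 1200, "l2_cache_kib": 256},
    "network_driver": {"dram_bandwidth_mb_s": 900, "l2_cache_kib": 128},
}

if __name__ == "__main__":
    exceeded = over_capacity(CAPACITY, DEMANDS)
    print(exceeded if exceeded else "all demands within capacity")
```

A resource that is requested but missing from the capacity table is treated as having zero capacity, so an unplanned resource shows up as a violation instead of passing silently.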

CAST-32 – Topics (2)

g) Software Planning and Development Processes
• Identify the MCP software architecture (in SW plans like the PSAC)
• Describe the development and verification planned to demonstrate that it executes deterministically (standard DO-178B/C requirements)

h) Software Verification
• In the best case, verification on the target MCP environment (justification if not)
• Software needs to comply with DO-178B/C
• Data & control coupling between all software components hosted via shared memory needs to be verified.

i) Discovery of Additional Features or Problems
• Any other MCP problem not described in CAST 32 needs to be considered

j) Error Detection and Handling and Safety Nets
• Errors and failures (of the MCP) need to be addressed by a Safety Net approach

EASA/FAA CRI objectives (1)

§4.1 Software Planning
§4.2 The Planning and Setting of MCP Resources

• MCP_Planning_1:
  • Identify the MCP processor and the number of active cores,
  • the MCP software architecture (e.g. AMP, SMP),
  • SW components (e.g. OS, Hypervisor) with MCP-related dynamic features
  • Methods and tools for SW development/verification (EASA/FAA SW Guidance)
• MCP_Planning_2:
  • Usage and configuration of MCP shared resources (avoid conflicts).
  • Verify the usage of shared resources (e.g. resource conflicts, overflow).
  • Identify MCP dynamic features and their usage.
• MCP_Resource_Usage_1:
  • Define the MCP configuration settings
  • Satisfy the functional, performance and timing requirements of the system (!)
• MCP_Resource_Usage_2:
  • Identify and verify the adequacy of mitigation means for critical configuration settings

EASA/FAA CRI objectives

§4.3 Interference Channels and Resource Usage

• MCP_Resource_Usage_3:
  • Identification of interference channels
  • Verification of the means of mitigation of the interference.
• MCP_Resource_Usage_4:
  • Identify the available MCP resources and interconnect in the final configuration
  • Allocate MCP resources to SW applications
  • Verify that this configuration does not exceed the available resources when all the hosted software is executing on the target processor (WCET).

§4.4 Software Verification

• MCP_Software_1:
  • Verification that all SW components comply with the EASA/FAA Software Guidance (function correctly and have sufficient time to complete their execution).
• MCP_Software_2:
  • Verification of the data and control coupling between all the individual software components (same core, different cores)

EASA/FAA CRI objectives

§4.5 Error Detection and Handling, and Safety Nets
§4.6 Reporting of Compliance with the Objectives of the CRI

• MCP_Error_Handling_1:
  • Identification of the effects of failures that may occur within the MCP
  • Identification of means to handle the safety objectives accordingly
  • Detect and handle MCP-related failures within the equipment in a fail-safe manner
• MCP_Accomplishment_Summary_1:
  • Summary of how the objectives of this CRI are met
  • Document this summary in the
    • SW Accomplishment Summary (SAS),
    • HW Accomplishment Summary (HAS)
    • or other deliverable documentation.

EASA/FAA CRI objectives

§5 Applicability of the MCP CRI Objectives according to their DAL

Mapping between CRI and CAST 32

Hypervisor / Segregation Kernel solutions to support Multi-Core certification

MCP Hypervisor Software Design – Certification Aspects

• Support static partitioning of CPU cores
• Load balancing algorithms may not be used during execution of critical applications
• Control access to shared resources
• Avoid parallel execution of different criticality levels
• Eliminate "False Sharing" between partitions
• Avoid Multi-Core application design for safety critical applications
• Ensure proper data alignment inside shared software components
• Eliminate interference via OS internal synchronization objects
• Minimize / Avoid OS global locks
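False sharing arises when data owned by different partitions ends up in the same cache line, so a write on one core invalidates the line for the other. The sketch below shows the alignment check behind the "proper data alignment" point. It assumes a 64-byte cache line, which is common but platform-specific, and the addresses and helper names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: must be taken from the target CPU's documentation


def cache_lines(base, size, line=CACHE_LINE_BYTES):
    """Return the range of cache-line indices touched by [base, base + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = base // line
    last = (base + size - 1) // line
    return range(first, last + 1)


def shares_cache_line(a, b, line=CACHE_LINE_BYTES):
    """True if the (base, size) regions a and b touch at least one common cache line."""
    a_lines = cache_lines(*a, line=line)
    b_lines = cache_lines(*b, line=line)
    return a_lines.start < b_lines.stop and b_lines.start < a_lines.stop


def align_up(address, line=CACHE_LINE_BYTES):
    """Round address up to the next cache-line boundary."""
    return (address + line - 1) // line * line


if __name__ == "__main__":
    part_a = (0x1000, 40)  # hypothetical data of partition A
    part_b = (0x1028, 40)  # hypothetical data of partition B, packed right behind A
    print(shares_cache_line(part_a, part_b))
    part_b_aligned = (align_up(0x1028), 40)
    print(shares_cache_line(part_a, part_b_aligned))
```

Padding each partition's data to a cache-line boundary costs a little memory but removes this coupling between cores.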

Scheduling scheme on an MCP platform with one real-time partition

• Avoid interference between applications with real-time demands
• It is recommended to run the time-critical application in its own time partition, without any activity in time partition 0 (service time partition)
• Resource partitions need to be set adequately to avoid sharing conflicts between cores
• Allocate one time partition to the safety-critical applications

[Figure: configuration and execution views of a schedule with the time partitions Tp_1, Tp_2 and Tp_3, which repeat cyclically in the execution view; entries are labelled 1 to 4 and A to D. Legend: CPU core, resource partition, time partition; safety-critical and non-safety-critical entries are distinguished.]
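The recommendation to give the safety-critical application its own time partition can be verified on the configuration itself. The sketch below assumes a schedule expressed as a mapping from time partition to the resource partition running on each core, and flags every time partition in which safety-critical and non-safety-critical partitions would execute in parallel. The core names, partition numbers and assignments are hypothetical and are not taken from the figure.

```python
# Hypothetical criticality of each resource partition.
CRITICALITY = {1: "safety-critical", 2: "non-safety", 3: "non-safety", 4: "non-safety"}

# Hypothetical configuration: time partition -> {core: resource partition or None (idle)}.
SCHEDULE = {
    "Tp_1": {"A": 1, "B": None, "C": None, "D": None},
    "Tp_2": {"A": 2, "B": 2, "C": 3, "D": 4},
    "Tp_3": {"A": 4, "B": 3, "C": None, "D": None},
}


def mixed_criticality_windows(schedule, criticality):
    """Return time partitions where critical and non-critical partitions run in parallel."""
    offenders = []
    for time_partition, cores in schedule.items():
        running = {rp for rp in cores.values() if rp is not None}
        critical = {rp for rp in running if criticality[rp] == "safety-critical"}
        if critical and running - critical:
            offenders.append(time_partition)
    return offenders


if __name__ == "__main__":
    offenders = mixed_criticality_windows(SCHEDULE, CRITICALITY)
    print(offenders if offenders else "safety-critical partition runs alone")
```

Leaving the other cores idle while the critical partition runs gives up throughput, but it removes cross-core interference from that window altogether.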

Define Cache / TLB Worst Case Jitter

• Caches and TLBs
  • The MCP Hypervisor OS shall provide means to invalidate instruction caches and TLBs and to flush the data cache at time partition switches.
  • This ensures that caches and TLBs are in a defined state when a partition starts execution.
  • The cache / TLB flush and invalidate operation takes place during the time partition switch, so it steals CPU cycles from the partition being activated.
  • A small time partition window allocated to an unused time partition ID should be inserted before the time-critical application to eliminate its jitter. This is shown in the figure below.

The platform-specific worst-case execution time analysis must provide the value for the worst-case jitter.
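The guard-window technique can be sketched as a transformation of the schedule table. The example assumes a schedule given as a list of (time partition ID, duration in microseconds) and inserts a window with an unused ID, sized to the worst-case flush jitter, in front of every time-critical window. The IDs, durations and the 150 µs jitter value are hypothetical; the real jitter value has to come from the platform-specific WCET analysis.

```python
def insert_guard_windows(windows, critical_ids, guard_id, worst_case_jitter_us):
    """Return a new schedule with a guard window before each time-critical window.

    windows: list of (time_partition_id, duration_us) in execution order.
    guard_id must not be used by any application, so the guard window only
    absorbs the cache / TLB flush that follows the preceding partition.
    """
    if any(tp_id == guard_id for tp_id, _ in windows):
        raise ValueError("guard_id must be an unused time partition ID")
    guarded = []
    for tp_id, duration_us in windows:
        if tp_id in critical_ids:
            guarded.append((guard_id, worst_case_jitter_us))
        guarded.append((tp_id, duration_us))
    return guarded


if __name__ == "__main__":
    schedule = [(1, 2000), (2, 5000), (3, 3000)]  # hypothetical windows
    for window in insert_guard_windows(schedule, critical_ids={2}, guard_id=7,
                                       worst_case_jitter_us=150):
        print(window)
```

In this sketch the guard window lengthens the major frame; an integrator could equally take the time from the preceding non-critical window to keep the frame length constant.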

MCP Compliance – Additional Analysis Documentation

• SYSGO shows compliance with the certification objectives (e.g. EN50128, DO-178B/C) by providing a "Certification Kit" including all relevant life cycle data and analysis means.
• The NEW MCP Analysis will answer most of the CRI-related questions
  • e.g. identification and mitigation of interference channels
  • is typically CPU / MCP architecture specific
  • will not address all HW-related CRI aspects

Documentation                                               Means of Compliance
PikeOS Partitioning Analysis (X86)                          SW Partitioning
PikeOS Stack Analysis (X86)                                 Stack Usage
PikeOS WCET Timing Analysis                                 WCET Analysis
PikeOS Safety Manual                                        How to use the OS in a "safe" & "secure" way
Safety Bulletin / Errata (for cert projects)                Management of OPRs
PikeOS Multi-Core Analysis (for a given CPU architecture)   Statement of EASA/FAA MCP CRI Compliance

Current and future work

Current Work

• PikeOS 3.4 is certified for Multi-Core projects against EN 50128 SIL 4 (Railway).
  • Dual-Core approach
• SYSGO AG and Thales are currently preparing the next generation of PikeOS, which is to be certifiable for:
  • DO-178C SW level C Multi-Core projects
  • DO-178C SW level A Multi-Core projects (which also include IMA considerations)

SYSGO Engagement in Multi-Core Research

• ARAMiS stands for Automotive, Railway and Avionics Multicore Systems. It is a three-year research project that started on December 1, 2011, and has received funding from the German Federal Ministry of Education and Research.
• EMC² – 'Embedded Multi-Core systems for Mixed Criticality applications in dynamic and changeable real-time environments' is an ARTEMIS Joint Undertaking project in the Innovation Pilot Program 'Computing platforms for embedded systems' (AIPP5).
• SYSGO AG is supporting the working group Multi-Core For Avionics (MCFA).
• PROXIMA pursues the development of probabilistically time analysable (PTA) techniques and tools for multicore/many-core platforms. The project will selectively introduce randomization in the timing behaviour of certain hardware and software resources as a way to facilitate the use of probabilities to predict the overall timing behaviour of the software and its likelihood of timing failure.
• ASHLEY: Extension of DME concepts and solutions. Multi-domain, secured data distribution services to streamline aircraft data distribution.

Thank you for your attention!

Questions?

More information at www.sysgo.com