© SYSGO AG 1
HOW HYPERVISOR OPERATING SYSTEMS
CAN COPE WITH MULTI-CORE
CERTIFICATION CHALLENGES
Sven Nordhoff,
Director Certification SYSGO AG
Agenda
• Introduction Hypervisor / Segregation Kernel
• How to cope with Multi-Core?
• EASA/FAA Multi-Core issues
• Hypervisor OS solutions to support Multi-Core certification
• Current and future work
For more details, see the conference paper.
Introduction Hypervisor / Segregation Kernel
How to cope with Multi-Core?
Hypervisor OS – Architecture
• Small and fast micro kernel (Time Partitioning)
• Separation of critical and uncritical functionality (Resource Partitioning)
• Separation of HW and I/O resources (Resource Partitioning)
[Figure: a low-criticality partition (network driver), a high-criticality partition (actuator control) and a medium-criticality partition (file system) run on top of the separation kernel (micro kernel, real-time), which runs on the hardware (CPUs, memory, and devices)]
Hypervisor OS – Resource Partitioning
• Use MMU to map memory to a partition
• Use IOMMU / PAMU to configure memory access for devices
• Execute in kernel mode
• Execute in user mode
• Static configuration of OS resources
• Guaranteed access to assigned resources
• No access to other resources / no error propagation
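The static configuration idea can be illustrated with a small offline check. This is only a sketch under stated assumptions, not PikeOS configuration tooling: the `MemRegion` type, the partition names and all addresses are hypothetical. It verifies that no two partitions are assigned overlapping memory ranges, which is one precondition for "no access to other resources".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRegion:
    partition: str  # hypothetical partition name
    base: int       # start address of the region
    size: int       # length in bytes

    @property
    def end(self) -> int:
        return self.base + self.size  # first address after the region


def find_overlaps(regions):
    """Return pairs of regions owned by different partitions that overlap."""
    conflicts = []
    ordered = sorted(regions, key=lambda r: r.base)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.base >= a.end:
                break  # sorted by base, so no later region can overlap a
            if a.partition != b.partition:
                conflicts.append((a, b))
    return conflicts


# Hypothetical static configuration (all addresses are made up).
CONFIG = [
    MemRegion("actuator_control", 0x1000_0000, 0x0010_0000),
    MemRegion("file_system", 0x1010_0000, 0x0020_0000),
    MemRegion("network_driver", 0x1030_0000, 0x0010_0000),
]

# An empty list means the configuration is free of cross-partition overlaps.
conflicts = find_overlaps(CONFIG)
```

Because the configuration is static, a check of this kind can run at build time rather than on the target.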
Hypervisor OS – Time Partitioning
[Figure: 4 resource partitions, low-criticality (graphics), high-criticality (actuator control), medium-criticality (communication) and high-criticality (database), are assigned to time partitions (labelled Time-Part A to Time-Part D in the diagram) along a time axis with windows of 8 ms, 8 ms and 6 ms; the slide annotates this as 4 resource partitions and 3 time partitions]
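A fixed cyclic time-partition schedule of this kind can be sketched in a few lines. The window lengths (8 ms, 8 ms, 6 ms) come from the slide; the mapping of those windows to the names `TP_A`, `TP_B`, `TP_C` and their order are assumptions made for illustration, and the function names are hypothetical.

```python
# Assumed mapping of the window lengths on the slide (8 ms, 8 ms, 6 ms)
# to three time partitions; the slide does not state the order explicitly.
SCHEDULE = [("TP_A", 8), ("TP_B", 8), ("TP_C", 6)]


def major_frame_ms(schedule):
    """Length of one full cycle of the schedule in milliseconds."""
    return sum(duration for _, duration in schedule)


def active_time_partition(schedule, t_ms):
    """Return the time partition whose window contains time t_ms."""
    if not schedule:
        raise ValueError("schedule must contain at least one window")
    offset = t_ms % major_frame_ms(schedule)  # position inside the cycle
    for name, duration in schedule:
        if offset < duration:
            return name
        offset -= duration
    raise ValueError("window durations must be positive")
```

With these assumed values the major frame is 22 ms, so the window pattern repeats every 22 ms regardless of what the partitions actually do.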
Hypervisor OS – Partitioning of Multi-Core possible?
[Figure: the low-criticality (network driver), high-criticality (actuator control) and medium-criticality (file system) partitions are distributed over Core 1, Core 2 and Core 3, above the separation kernel (micro kernel, real-time) and the hardware (CPUs, memory, and devices); question marks sit between the cores]
• MCP interference channels (memory, caches, interconnect)
• Are Hypervisor OS techniques adequate to cope with this?
• Time Partitioning?
• Resource Partitioning?
• Others?
• Is separation of critical and uncritical functions on MCP possible?
Problems to certify Multi-Core Architectures
Multi-Core Certification Concerns
• Design Assurance
• Lack of processor design documents may lead to undetected interference channels
• In Service History
• There is not much in-service history available for multi-core based designs in avionics applications
• Hardware Interference Channels
• Cores within a chip interact with each other, and resources of the chip are shared (e.g. cache). This interaction and/or sharing leads to interference (e.g. timing), which needs to be analyzed; mitigation and verification activities to cope with this interference need to be defined.
• Measurement of WCET timing
• Due to the sharing of resources in an MCP and the use of complex infrastructure (such as buses), analysis and/or measurement of the worst-case timing is hard to achieve.
MCP Interference Channels
• Shared caches
• Typically one L1 cache per CPU
• L2 cache shared on some CPUs
• L3 cache typically shared
• Cache coherency protocol
• Problem grows with number of cores
• Configurable on some CPUs through core cross bar
• Global / Local cache flush and invalidate
• Shared buses (Core Connection, Processor, Memory, PCI)
• Shared Interrupts
• Shared devices (Memory, Timer, I/O)
Worst case timing problems
Example: Access to shared data regions (Intel, AMD)
Presented at the 29th Digital Avionics Systems Conference, Rudolf Fuchsen, 2010
EASA/FAA Multi-core Processors certification guidance
MCP CAST-32 (rev0, 05/2014)
MCP CRI (not published yet)
EASA/FAA MCP Status
• CAST 32 paper (rev0) was published 05/2014
• includes 24 objectives on how to handle MCP in DAL A/B
• includes 16 objectives on how to handle MCP in DAL C
• limited to 2 active cores
• does not address IMAs or incremental certification
• a lot of additional information and guidelines
• not the current EASA/FAA MCP position anymore
• EASA MCP CRI not published yet but under discussion in 2016
• includes 10 objectives on how to handle MCP in DAL A/B
• includes 7 objectives on how to handle MCP in DAL C
• not limited to two active cores anymore
• implements the current EASA/FAA MCP position
• IMA addressed
• … but this paper is not publicly available
(1) From the presentation "FAA Status on Multi-Core Processors", John Strasburger, 04/2014
EASA/FAA CAST 32 / CRI Validity
• When does it apply ?
• Levels A, B, and C applications
• Smaller subset of objectives for DAL C applications.
• If only one core is active, just two objectives apply
• Depending on the project – specific objectives may not apply
• Table shows which of the objectives apply by development
assurance level (A, B, C)
• Exempted configurations
• Hyperthreading CPUs are not covered by the CAST and CRI
• When two identical cores run in lock step
• Processors linked by conventional data buses, and not by shared
memory, shared cache, a coherency fabric / module / interconnect
• Systems on IDAL C which do not need robust partitioning do not
need to fulfill all objectives.
CAST-32 – Topics (1)
a) Configuration Settings
• Configuration of required, unused and dynamic features needs to be analyzed, determined and documented (DO-254)
b) Processor Errata
• A process needs to be in place to assess MCP errata sheets regularly (same approach as for COTS/microcontroller certification)
c) SW Hypervisors and MCP HW Hypervisor Features
• Need to be identified in plans (PSAC, PHAC) and be compliant with DO-178B/C
d) MCP Interference Channels
• Identify interference channels and verify the means of mitigation
e) Shared Memory and Cache (between processing cores)
• Describe shared resource approach in SW plans (PSAC)
• Prevent disruptions to deterministic software execution
• Analysis and tests to determine worst-case effects of shared memory and cache (WCET analysis)
f) Planning and Verification of Resource Usage
• Describe approach in SW plans (PSAC)
• Allocate, manage and measure resource and interconnect usage.
• Verify resource and interconnect demands do not exceed the capacity.
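The capacity check in topic f) can be pictured as a simple budget comparison. The sketch below is an illustration only: the function name, the partition names, the bandwidth figures and the 20 % margin are all hypothetical, and summing per-partition worst-case demands is itself an assumption, since real interconnect contention is not necessarily additive.

```python
def check_budget(capacity, demands, margin_pct=0):
    """Compare summed worst-case demands against a derated capacity.

    Returns (ok, total_demand, usable_capacity).
    """
    if not 0 <= margin_pct < 100:
        raise ValueError("margin_pct must be in the range 0..99")
    total = sum(demands.values())
    usable = capacity * (100 - margin_pct) / 100  # capacity minus safety margin
    return total <= usable, total, usable


# Hypothetical worst-case memory bandwidth demands in MB/s.
DEMANDS = {"actuator_control": 300, "file_system": 250, "network_driver": 200}

# Hypothetical interconnect capacity of 1000 MB/s with a 20 % margin.
ok, total, usable = check_budget(1000, DEMANDS, margin_pct=20)
```

In a real project the demand figures would have to come from measurement or analysis on the target MCP; the arithmetic is the easy part.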
CAST-32 – Topics (2)
g) Software Planning and Development Processes
• Identify the MCP software architecture (in SW plans like PSAC)
• Describe the development and verification planned to demonstrate that it executes deterministically (standard DO-178B/C requirements)
h) Software Verification
• Ideally, verification takes place on the target MCP environment (justification needed if not)
• Software needs to comply with DO-178B/C
• Data and control coupling between all software components hosted via shared memory needs to be verified
i) Discovery of Additional Features or Problems
• Any other MCP problem not described in CAST 32 needs to be considered
j) Error Detection and Handling and Safety Nets
• Errors and failures of the MCP need to be addressed by a Safety Net approach
EASA/FAA CRI objectives (1)
§4.1 Software Planning
§4.2 The Planning and Setting of MCP Resources
• MCP_Planning_1:
• Identify the MCP processor and the number of active cores
• MCP software architecture (e.g. AMP, SMP)
• SW components (e.g. OS, Hypervisor) with MCP-related dynamic features
• Methods and tools for SW development/verification (EASA/FAA SW Guidance)
• MCP_Planning_2:
• Usage and configuration of MCP shared resources (avoid conflicts)
• Verify usage of shared resources (e.g. resource conflicts, overflow)
• Identify the MCP dynamic features and their usage
• MCP_Resource_Usage_1:
• Define the MCP configuration settings
• Satisfy the functional, performance and timing requirements of the system (!)
• MCP_Resource_Usage_2:
• Identify and verify adequacy of mitigation means for critical configuration settings
EASA/FAA CRI objectives
§4.3 Interference Channels and Resource Usage
• MCP_Resource_Usage_3:
• Identification of interference channels
• Verification of means of mitigation of the interference.
• MCP_Resource_Usage_4:
• Identify available MCP resources and interconnect in final configuration
• Allocate MCP resource to SW applications
• Verify that this configuration does not exceed the available resources when all the hosted software is executing on the target processor (WCET).
§4.4 Software Verification
• MCP_Software_1:
• Verification that all SW components comply with EASA/FAA Software
Guidance (function correctly and have sufficient time to complete their
execution).
• MCP_Software_2:
• Verification of the data and control coupling between all the individual
software components (same core, different cores)
EASA/FAA CRI objectives
§4.5 Error Detection and Handling, and Safety Nets
§4.6 Reporting of Compliance with the Objectives of the CRI
• MCP_Error_Handling_1:
• Identification of the effects of failures that may occur within the MCP
• Identification of means to handle the safety objectives accordingly
• Detect and handle MCP related failures within the equipment in a fail-
safe manner
• MCP_Accomplishment_Summary_1:
• Summary how the objectives of this CRI are met
• Document this summary in the
• SW Accomplishment Summary (SAS),
• HW Accomplishment Summary (HAS)
• or other deliverable documentation.
EASA/FAA CRI objectives
§5 Applicability of the MCP CRI Objectives according to their DAL.
Mapping between CRI and CAST 32
Hypervisor / Segregation Kernel solutions to
support Multi-Core certification
MCP Hypervisor Software Design – Certification Aspects
• Support static partitioning of CPU cores
• Load balancing algorithms may not be used during execution of critical applications
• Control access to shared resources
• Avoid parallel execution of different criticality levels
• Eliminate “False Sharing” between partitions
• Avoid Multi-Core application design for safety critical applications
• Ensure proper data alignment inside shared software components
• Eliminate interference via OS internal synchronization objects
• Minimize / Avoid OS global locks
Scheduling scheme on a MCP platform with one
real time partition
• Avoid interference between applications with real time demands
• It is recommended to run the time critical application in its own time
partition without any activity in time partition 0 (service time partition)
• Resource Partitions need to
be set adequately to avoid
sharing conflicts between
cores
• Allocate one timing partition
to the safety-critical
applications
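[Figure: configuration and execution views of a schedule over the time partitions Tp_1, Tp_2 and Tp_3, repeated over two cycles; labels 1 to 4 and A to D mark which resource partition runs on which CPU core in each time partition, and a legend distinguishes safety-critical from non-safety-critical partitions]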
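The rule behind this scheduling scheme, no mixing of criticality levels across cores within one time partition, lends itself to a static check. The following is a minimal sketch under stated assumptions: the assignment list is hypothetical and is not read off the figure, and `mixed_criticality_windows` is an illustrative name, not a PikeOS API.

```python
from collections import defaultdict


def mixed_criticality_windows(assignments):
    """Return the time partitions in which cores run different criticality levels.

    assignments: iterable of (time_partition, core, criticality) tuples.
    """
    levels = defaultdict(set)
    for time_partition, _core, criticality in assignments:
        levels[time_partition].add(criticality)
    return sorted(tp for tp, seen in levels.items() if len(seen) > 1)


# Hypothetical configuration with two cores and three time partitions.
ASSIGNMENTS = [
    ("Tp_1", 1, "safety-critical"),
    ("Tp_1", 2, "safety-critical"),
    ("Tp_2", 1, "non-safety-critical"),
    ("Tp_2", 2, "non-safety-critical"),
    ("Tp_3", 1, "safety-critical"),
    ("Tp_3", 2, "non-safety-critical"),  # violates the rule
]

violations = mixed_criticality_windows(ASSIGNMENTS)
```

In this made-up configuration only Tp_3 is flagged, because a safety-critical and a non-safety-critical partition would run in parallel there.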
Define Cache / TLB Worst Case Jitter
• Caches and TLBs
• MCP Hypervisor OS shall provide means to invalidate instruction caches and TLBs and to flush the data cache between time partition switches.
• This ensures that caches and TLBs are in a defined state when a partition starts
execution.
• The cache / TLB flush and invalidate operation takes place during the time partition
switch, so it will steal the CPU cycles from the partition to be activated.
• A small time partition window which is allocated to an unused time partition ID
should be inserted before the time critical application to eliminate the jitter of the
time critical application. This is shown in the figure below.
The platform specific worst case execution time analysis must provide the value for the
worst case jitter.
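The guard-window idea can be sketched as a transformation of the schedule table. Everything below is illustrative: the partition names, the 1 ms guard and the 0.4 ms worst-case flush time are hypothetical values, `TP_UNUSED` stands for the unused time partition ID, and the sketch assumes the guard window extends the major frame rather than shortening a neighbouring window.

```python
def insert_guard_windows(schedule, critical, guard_ms, worst_case_flush_ms,
                         unused_id="TP_UNUSED"):
    """Insert a window for an unused time partition before each critical one.

    The cache / TLB flush then happens at the switch into the guard window,
    so its cost is not taken from the time-critical partition.
    """
    if guard_ms < worst_case_flush_ms:
        raise ValueError("guard window is shorter than the worst-case flush time")
    result = []
    for name, duration in schedule:
        if name in critical:
            result.append((unused_id, guard_ms))
        result.append((name, duration))
    return result


# Hypothetical schedule in ms; TP_B hosts the time-critical application.
BASE = [("TP_A", 8), ("TP_B", 8), ("TP_C", 6)]
GUARDED = insert_guard_windows(BASE, {"TP_B"}, guard_ms=1, worst_case_flush_ms=0.4)
```

The worst-case flush time used to size the guard window has to come from the platform-specific WCET analysis; it cannot be derived from the schedule itself.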
MCP Compliance – Additional Analysis
Documentation
• SYSGO shows compliance with the certification objectives (e.g. EN 50128, DO-178B/C) by providing a “Certification Kit” including all relevant life cycle data and analysis means.
• The new MCP Analysis will answer most of the CRI-related questions
• e.g. identification and mitigation of interference channels
• is typically CPU / MCP architecture specific
• will not address all HW related CRI aspects
Documentation | Means of Compliance
PikeOS Partitioning Analysis (X86) | SW Partitioning
PikeOS Stack Analysis (X86) | Stack Usage
PikeOS WCET Timing Analysis | WCET Analysis
PikeOS Safety Manual | How to use the OS in a "safe" & "secure" way
Safety Bulletin / Errata (for cert projects) | Management of OPRs
PikeOS Multi-Core Analysis (for a given CPU architecture) | Statement of EASA/FAA MCP CRI Compliance
Current and future work
Current Work
• PikeOS 3.4 is certified for Multi-Core projects against EN
50128 SIL 4 (Railway).
• Dual-Core approach
• SYSGO AG and Thales are currently preparing the next generation of PikeOS to be certifiable for
• DO-178C SW level C Multi-Core projects
• DO-178C SW level A Multi-Core projects (which also includes IMA considerations)
SYSGO Engagement in Multi-Core Research
• ARAMiS stands for Automotive, Railway and Avionics Multicore Systems
• ARAMiS is a three-year research project that started on December 1, 2011. It has received funding from the German Federal Ministry of Education and Research.
• EMC² – ‘Embedded Multi-Core systems for Mixed Criticality applications in dynamic and changeable real-time environments’ is an ARTEMIS Joint Undertaking project in the Innovation Pilot Program ‘Computing platforms for embedded systems’ (AIPP5).
• MCFA: SYSGO AG is supporting the working group Multi-Core For Avionics (MCFA)
• PROXIMA pursues the development of probabilistically time analysable (PTA) techniques and tools for multicore/many-core platforms. The project will selectively introduce randomization in the timing behaviour of certain hardware and software resources as a way to facilitate the use of probabilities to predict the overall timing behaviour of the software and its likelihood of timing failure.
• ASHLEY: Extension of DME concepts and solutions. Multi-domain, secured data distribution services to streamline aircraft data distribution.
Thank you for your attention!
Questions?
More information at www.sysgo.com