Model-Driven Performance Evaluation and Formal … Performance Evaluation and Formal Veriﬁcation...

Model-Driven Performance Evaluation and Formal Verification forMulti-level Embedded System Design

Daniela Genius1, Letitia W. Li2, Ludovic Apvrille2

1 Sorbonne Universites, UPMC Paris 06, LIP6, CNRS UMR 7606, Paris, France2 Telecom ParisTech, Universite Paris-Saclay, Biot, France

[email protected], {letitia.li,ludovic.apvrille}@telecom-paristech.fr

Keywords: Virtual prototyping, Embedded systems, System-level design, Formal verification

Abstract: The design methodology of an embedded system should start with a system-level partitioning dividing func-tions into hardware and software. However, since this partitioning decision is taken at a high level of ab-straction, we propose regularly validating the selected partitioning during software development. The paperintroduces a new model-based engineering process with a supporting toolkit, first performing system-levelpartitioning, and then assessing the partitioning choices thus obtained at different levels of abstraction dur-ing software design. This assessment shall in particular validate the assumptions made on system-level (e.g.on cache miss rates) that cannot be precisely determined without low-level hardware model. High-level parti-tioning simulations/verification rely on custom model-checkers and abstract models of software and hardware,while low-level prototyping simulations rely on automatically generated C-POSIX software code executing ona cycle-precise virtual prototyping platform. An automotive case study on an automatic braking applicationillustrates our complete approach.

1 Introduction

Embedded systems are composed of a tightlyintegrated ensemble of HW and SW components.The design of these systems usually starts with asystem-level partitioning phase, continues with sep-arate software and hardware design, and finishes witha HW/SW integration. In fact, this integration canalso be performed progressively during software de-sign using prototyping techniques.

At system-level partitioning, properties of em-bedded applications can be tested rapidly and theirdescription remains somewhat “human-readable”.Properties can also be proven formally. Exploration,however, remains at a rather abstract level e.g. manyhardware parameters are approximated. For example,the cache miss rate is modeled as a fixed value (e.g.,5%) obtained from the architect’s experience.

After partitioning, software is designed at lowerabstraction levels. Commonly, the hardware target isnot available, leading to the use of simulation tech-niques with precise hardware models. System pa-rameters, such as the cache miss ratio, can be closelyevaluated with simulations and formal proofs. Unfor-tunately, more details also means much slower sim-ulations, and infeasible formal proofs, even if com-

positional approaches could help handle entire hard-ware platforms. However, they are costly (Basu et al.,2011; Syed-Alwi et al., 2013) in terms of develop-ment time.

To improve both development stages (partitioning,prototyping), we propose to unify them in a commonSysML formalism. In fact, prototyping can rely onsoftware and hardware elements that were formallyevaluated at partitioning. Partitioning models can beenhanced using precise parameters that can be ob-tained during simulation at the prototyping level. Ourtoolkit, TTool (Apvrille, 2015), supports both stages,and makes it possible, at the push of a button, to eval-uate the design at a given development stage, and topropagate the results to enhance the system at an-other development state, thus easing development it-erations. We previously described (Li et al., 2016) ourapproach towards multi-level Design Space Explo-ration, but without the ability to generate detailed per-formance metrics during prototyping that we presentin this paper.

Section 2 presents related work. Section 3presents the overall design method. Section 4 detailsan automotive case study used to exemplify the high-level design space exploration (Section 5), as well assoftware component design and performance evalua-

tion (Section 6). A final discussion and perspectiveson future work are presented in Section 7.

2 System-level Design for EmbeddedSystems

A number of system-level design tools exist, of-fering a variety of verification and simulation capa-bilities at different levels of abstraction.

Ptolemy (Buck et al., 2002) proposes a modelingenvironment for the integration of diverse executionmodels, in particular hardware and software compo-nents. If design space exploration can be performedwith Ptolemy, its first intent is the simulation of themodeled systems.

Metropolis (Balarin et al., 2003) targets hetero-geneous systems, and architectural and applicationconstraints are closely interwoven. This approach ismore oriented towards application modeling, even ifhardware components are closely associated to themapping process. While our approach uses Model-Driven Engineering, Metropolis uses Platform-BasedDesign.

Sesame (Erbas et al., 2006) proposes modelingand simulation features at several abstraction lev-els for Multiprocessor System-on-Chip architectures.Pre-existing virtual components are combined to forma complex hardware architecture. Models’ semanticsvary according to the levels of abstraction, rangingfrom Kahn process networks (KPN (Kahn, 1974)) todata flow for model refinement, and to discrete eventsfor simulation. Currently, Sesame is limited to theallocation of processing resources to application pro-cesses. It models neither memory mapping nor thechoice of the communication architecture.

The ARTEMIS (Pimentel et al., 2001) project isstrongly based on the Y-chart approach. Applicationand architecture are clearly separated: the applicationproduces an event trace at simulation time, which isread by the architecture model. However, behaviordepending on timers and interrupts cannot be takeninto account.

MARTE (Vidal et al., 2009) shares many com-monalities with our approach, in terms of the ca-pacity to separately model communications from thepair application-architecture. However, it intrinsicallylacks a separation between control and message ex-change.

Other works based on UML/MARTE, such asGaspard2 (Gamatie et al., 2011), are dedicated to bothhardware and software synthesis, relying on a refine-ment process based on user interaction to progres-sively lower the level of abstraction of input models.

However, such a refinement does not completely sep-arate the application (software synthesis) or architec-ture (hardware synthesis) models from communica-tion.

Rhapsody can automatically generate software,but not hardware descriptions from SysML. MDGenfrom Sodius (Sodius Corporation, 2016) adds tim-ing and hardware specific artifacts such as clock/resetlines automatically to Rhapsody models, generatessynthesizable, cycle-accurate SystemC implementa-tions, and automates exploration of architectures.

The Architecture Analysis & Design LanguageAADL (Feiler et al., 2004) allows the use of formalmethods for safety-critical real-time systems. Simi-lar to our environment, a processor model can havedifferent underlying implementations and its charac-teristics can easily be changed at the modeling stage.Recently, (Yu et al., 2015) developed a model-basedformal integration framework which endows AADLwith a language for expressing timing relationships.

Capella (Polarsys, 2008) relies on Arcadia, a com-prehensive model-based engineering method. It is in-tended to check the feasibility of customer require-ments, called needs, for very large systems. Capellaprovides architecture diagrams allocating functions tocomponents, and advanced mechanisms to model bit-precise data structures.

3 Methodology

3.1 Modeling Phases

Our approach combines partitioning - the partitioningdecision relies on design space exploration techniques- and software design. The latter includes the proto-typing of the designed software. All stages are sup-ported within the same SysML-based free and open-source environment/toolkit (as shown in Figure 1):

1. The overall method starts with a partitioningphase containing three sub-phases: the modelingof the functions to be realized by the system (func-tional view), the modeling of the candidate archi-tecture as an assembly of highly abstracted hard-ware nodes, and the mapping phase. A functionmapped on a processor is a software function, afunction mapped on a hardware accelerator cor-responds to a custom ASIC (Application-specificIntegrated Circuit).

2. Once the system is fully partitioned, the secondphase starts with the design of the software andthe hardware. Our approach offers software mod-eling while taking into account hardware parame-

Final software code

Refinements

VHDL/Verilog

Software Design and Prototyping

(AVATAR)

Deployment view

......

Hardware design

Abstractions

Abstractions

Reconsiderationof partitioningdecisions

Simulation and Verification : safety, security and performance

Mapping view

Functional view Architecture view

Software Component Hardwaremodel

Partitioning with

Design Space Exploration techniques

(DIPLODOCUS)

Figure 1: Overall Approach

ters for prototyping purposes. Thus, a deploymentview displays how the software components areallocated to the hardware components. Code canthen be generated both for the software compo-nents of the application (in C/POSIX code) andfor the virtual hardware nodes (in SoCLib (So-CLib consortium, 2010) System C format).

Choice of parameters on the higher level is subject tovalidation or invalidation due to experimental resultson the generated prototype. Thus, simulations resultsat prototyping level could lead to reconsider the parti-tioning decisions.

3.2 Simulation, Verification andPrototyping

During the methodological phases, simulation andformal verification help in deciding whether safety,performance and security requirements are fulfilled.Our toolkit offers a press-button approach for per-forming these proofs. Model transformations trans-late the SysML models into an intermediate form thatis sent into the underlying simulation and formal ver-ification utilities. Backtracing to models is then per-formed to better inform the users about the verifica-tion results. Proofs of safety involve UPPAAL seman-tics (Bengtsson and Yi., 2004), and security proofsuse ProVerif (Blanchet, 2010). Before the next stage,simulation and formal verification ensure that our de-sign meets performance, behavioral, and schedulabil-

ity requirements. Simulation of partitioning specifi-cations involves executing tasks on the different hard-ware elements in a transactional high-level way. Eachtransaction executes for a variable time depending onexecution cycles and CPU parameters. The simula-tion shows performance results like bus usage, CPUusage, execution time, etc., so as to help users de-cide on an architecture and mapping. For example,singles execution sequences can be investigated withgtkwave. Also, our toolkit assists the user by automat-ically generating all possible architectures and map-pings, and summarizes performance results of eachpossible mapping. Users are provided with the “best”architecture under specified criteria, such as minimallatency or bus/CPU load.

During functional modeling, verification intendsto identify general safety properties (e.g., absence ofdeadlock situations). At the mapping stage, verifi-cation intends to ascertain if performance and secu-rity requirements are met. Hardware components arehighly abstracted. For example, a CPU can be de-fined with a set of parameters such as an averagecache-miss ratio, power-saving mode activation, con-text switch penalty, etc.

After mapping, software components can also beverified independently of any hardware architecturein terms of safety and security. For example, whendesigning a component implementing a security pro-tocol, the reachability of the states and absence of se-curity vulnerabilities can be verified. When the soft-

<<CPURR>>CPU_CU

Braking - FV::DSRCManagement

Braking - FV::CorrectnessCheckingBraking - FV::CorrectnessChecking

Braking - FV::NeighbourhoodTableManagementBraking - FV::NeighbourhoodTableManagement

<<HWA>>PTC_Devices

Braking - FV::doReduceDrivingPowerBraking - FV::doReduceDrivingPower

<<CPURR>>CPU_PTC

Braking - FV::DrivingPowerReductionstrategyBraking - FV::DrivingPowerReductionstrategy

<<MEMORY>>Flash_PTC

<<BUS-RR>>CAN_CU

<<MEMORY>>RAM_PTC

<<BUS-RR>>CAN_CSCU

<<BRIDGE>>CSCU_to_CAN

<<BRIDGE>>PTC_to_CAN

<<MEMORY>>Flash_CSCU

<<MEMORY>>Flash_BCU

<<HWA>>UMTS

<<HWA>>DSRC

Braking - FV::DSRCRxTx

<<MEMORY>>Flash_CU

<<BUS-RR>>CAN_CU

<<BRIDGE>>CU_to_CAN

<<BUS-RR>>CAN

<<CPURR>>CPU_CSCU

Braking - FV::ObjectListManagementBraking - FV::ObjectListManagement

Braking - FV::PlausibilityCheckBraking - FV::PlausibilityCheck

Braking - FV::VehicleDynamicsManagementBraking - FV::VehicleDynamicsManagement

<<MEMORY>>RAM_CSCU

<<HWA>>ChassisSensors

Braking - FV::GetVehicleDynamicsBraking - FV::GetVehicleDynamics

<<HWA>>EnvSensors

Braking - FV::GetEnvironmentInformationBraking - FV::GetEnvironmentInformation

<<BRIDGE>>BCU_to_CSCU <<BUS-RR>>

CAN_BCU

<<CPURR>>CPU_BCU

Braking - FV::DangerAvoidanceStrategyBraking - FV::DangerAvoidanceStrategy

Braking - FV::BrakeManagementBraking - FV::BrakeManagement

<<MEMORY>>RAM_BCU

<<HWA>>BrakingControlDeviceBraking - FV::DoBrakeBraking - FV::DoBrake

<<HWA>>GPS

Braking - FV::GPSReception

<<MEMORY>>RAM_CU

Figure 2: Automotive Case Study Architecture Diagram

ware components become more refined, it becomesimportant to evaluate their performance when exe-cuted on the target platform. Since the target sys-tem is commonly not yet available, our approach of-fers two facilities: a Deployment Diagram in whichsoftware components can be mapped over hardwarenodes (see Figure 4), and a press-button approach totransform this Deployment Diagram into a specifica-tion built upon virtual component models. For this,we use SoCLib, a public domain library of componentmodels written in SystemC. SoCLib targets shared-memory multiprocessor-on-chip system (MP-SoC) ar-chitectures based on the Virtual Component Intercon-nect (VCI) protocol (VSI Alliance, 2000) which sep-arates the components’ functionality from commu-nication. Hardware is described at several abstrac-tion levels: TLM (Transaction level), CABA (Cy-cle/Bit Accurate), and RTL (Register Transfer Level).SoCLib also contains a set of performance evalua-tion tools (Genius et al., 2011). Last but not least,the SoCLib prototyping platform comes with an oper-ating system well adapted to multiprocessor-on-chip(Becoulet, 2009).

If the performance results of the SystemC simula-tion differ too greatly from the ones obtained duringthe design space exploration stage – e.g., a cache missratio – then, design space exploration shall be per-formed again to assess if the selected architecture isstill the best according to the system requirements. Ifnot, software components may be (re)designed. Oncethe iterations over the high-level design space explo-ration and the low level virtual prototyping of soft-ware components are finished, software code can be

generated from the most refined software model.

4 Automotive Case Study

Our methodology is illustrated using an automo-tive embedded system designed in the scope of theEuropean EVITA project (EVITA, 2011). Recent on-board Intelligent Transport (IT) architectures com-prise a very heterogeneous landscape of communica-tion network technologies (e.g., LIN, CAN, MOST,and FlexRay) that interconnect in-car Electronic Con-trol Units (ECUs).

The increasing number of such equipment trig-gers the development of novel applications that arecommonly spread among several ECUs to fulfill theirgoals. Prototyping on multiprocessor architectures,even if they are more generic than the final hardware,is thus very useful.

An automatic braking application serves as a casestudy (Kelling et al., 2009). The system works essen-tially as follows: an obstacle is detected by anotherautomotive system which broadcasts that informationto neighboring cars. A car receiving such informa-tion has to decide if it is concerned with this obstacle.This decision includes a plausibility check functionthat takes into account various parameters, such as thedirection and speed of the car, and also informationpreviously received from neighboring cars. Once thedecision to brake has been taken, the braking orderis forwarded to relevant ECUs. Also, the presence ofthis obstacle is forwarded to other neighboring cars incase they have not yet received this information.

Figure 3: Active Braking Block Diagram

The stages of the methodology include Partition-ing by Design Space Exploration, Software Design,and Prototyping, with different models at each stage.Figure 2 shows the model for Partitioning: an Archi-tecture Diagram with the tasks divided onto differentCPUs and Hardware Accelerators. Figure 3 shows theBlock Diagram for Software Design. Figure 4 showsthe Deployment Diagram. We elaborate in detail onthe different stages in the following sections.

5 Hardware/Software Partitioning

5.1 Modeling

The HW/SW Partitioning phase of our methodologyintends to model the abstract, high-level functional-ity of a system (Knorreck et al., 2013). It followsthe Y-chart approach, first modeling the abstract func-tional tasks, candidate architectures, and then finally

mapping tasks to the hardware components (Kienhuiset al., 2002). The application is modeled as a set ofcommunicating tasks on the Component Design Dia-gram (an extension of the SysML Block Instance Di-agram). Task behavior is modeled using communi-cation operators, computation elements, and controlelements.

The architectural modeling (Figure 2) is displayedas a graph of execution nodes, communication nodes,and storage nodes. Execution nodes, such as CPUsand Hardware Accelerators, include parameters suchdata size, instruction execution time, and clock ratio(see Figure 5. CPUs also must be defined by taskswitching time, cache-miss percentage, etc. Commu-nication nodes include bridges and buses. Buses con-nect execution and storage nodes, and bridges connectbuses. Buses are defined by parameters such as arbi-tration policy, data size, clock ratio, etc, and bridgesare characterized by data size and clock ratio. Stor-age nodes are Memories, which are defined by datasize and clock ratio.

Figure 4: Deployment Diagram of the Active Braking Application: five CPUs and five RAMs

Mapping involves specifying the location of taskson the architectural model. A task mapped onto a pro-cessor will be implemented in software, and a taskmapped onto a hardware accelerator will be imple-mented in hardware. The exact physical path of adata/event write may also include mapping channelsto buses and bridges. Alternatively, if the data pathis complex (e.g., DMA transfer), channels can bemapped over communication patterns (Enrici et al.,2014).

Figure 5: Adapting architecture parameters during parti-tioning

5.2 High-Level Simulation

Using simulation techniques described in section 3.2,we can see that the mapping of tasks of our case study(see Figure 2) ensures that the maximum latency be-tween the decision (DangerAvoidanceStrategy) andthe resulting actions (doReduceDrivingPower andDoBrake) respect safety requirements. Similarly, wehave verified that the worst latency between the recep-tion of an emergency message by DRSCManagementand the consequent actions (e.g., DoBrake) is alwaysalso below the specified limit. These performanceverifications are performed according to the selectedfunctions, operating systems and hardware compo-nents. In particular, many parameters of the hardwarecomponents are simple values (we have for exampleselected a cache-miss ratio of 5%) that are meant tobe confirmed during the software design phase.

6 Software Design withAVATAR/SoCLib

Once the partitioning is complete, the AVATARmethodology (Pedroza et al., 2011) allows the userto design the software, perform functional simulationand formal verification, and finally test the softwarecomponents in a virtual prototyping environment.

6.1 Software Components

Figure 3 shows the software components of the activebraking use case modeled using an AVATAR blockdiagram. These modeling elements have been se-lected during the previous modeling stage (partition-ing). Software components are grouped according totheir destination ECU:

• Communication ECU manages communicationwith neighboring vehicles.

• Chassis Safety Controller ECU (CSCU) pro-cesses emergency messages and sends orders tobrake to ECUs.

• Braking Controller ECU (BCU) contains twoblocks: DangerAvoidanceStrategy determineshow to efficiently and safely reduce the vehiclespeed, or brake if necessary. BrakeManager oper-ates the brake for a given duration.

• Power Train Controller ECU (PTC) enforcesthe engine torque modification request.

The AVATAR model can be functionally simu-lated using the integrated simulator of our toolkit,which takes into account temporal operators but com-pletely ignores hardware, operating systems and mid-dleware. While being simulated, the model of thesoftware components is animated. This simulationaims at identifying logical modeling bugs. Figure 6shows the state machine of DangerAvoidanceStrat-egy, Figure 8 shows a visualization of the generatedsequence diagram.

Figure 6: High Level Simulation of the Active Braking Au-tomotive system: State Machine

We show traces for the CarPositionSimulatorblock and for three of the blocks which interact in anemergency braking situation: DrivingPowerReduc-tionStrategy, DangerAvoidanceStrategy and BrakeM-anagement.

6.2 Formal Verification

During formal verification of safety properties withUPPAAL, a model checker for networks of timed au-tomata, the behavioral model of a system to be ver-ified is first translated into a UPPAAL specificationto be checked for desired behavior. For example,UPPAAL may verify the lack of deadlock, such astwo threads both waiting for the other to send a mes-sage. Behavior may also be verified through “Reach-ability”, “Leads to”, and other general statements.The designer can indicate which states in the Ac-tivity Diagram or State Machine Diagram should bechecked if they can be reached in any execution trace.“Leads to” allows us to verify that one state mustalways be followed by another. Other user-definedUPPAAL queries can check if a condition is alwaystrue, is true for at least one execution trace, or if itwill be true eventually for all execution traces. Thesestatements may be entered directly on the UPPAALmodel checker, or permanently stored on the modelas pragma to be verified in UPPAAL.

For example, for our case study, we can verify thatstate ‘Plausibility Check’ is always executed after aneighboring car signals that it has detected an obsta-cle. We can also verify that an order to brake canbe received, or state ‘Braking Management’ in Task‘Danger Avoidance Strategy’ is reachable. Figure 7shows the UPPAAL verification window which al-lows the user to customize which queries to execute,and then returns the results as shown.

Figure 7: UPPAAL Formal Verification

Figure 8: High Level Simulation of the Active Braking Automotive system: generated Sequence diagram

6.3 Prototyping

To prototype the software components with the otherelements of the destination platform (hardware com-ponents, operating system), a user must first mapthem to a model of the target system. Mapping canbe performed using the new deployment features re-cently introduced in (Genius and Apvrille, 2016). AnAVATAR Deployment Diagram is used for that pur-pose. It features a set of hardware components, theirinterconnection, tasks, and channels.

The partitioning phase selected an architecturewith five clusters. Some tasks are destined to be soft-ware tasks (they are mapped onto CPUs), and theothers are expected to be realized as hardware ac-celerators. Yet, each specific hardware accelerator inSoCLib needs to be developed specifically which re-quires a significant effort. We do not consider thatcase in the paper since all AVATAR tasks are soft-ware tasks. The five clusters are represented by fiveCPUs and the channels between AVATAR tasks areimplemented as software channels mapped to on-chipRAM.

Some properties pertaining to mapping must beexplicitly captured in the Deployment Diagram, suchas CPUs, memories and their parameters, while oth-ers, such as simulation infrastructure and interruptmanagement, are added transparently to the top cellduring the transformation to SoCLib. Figure 4 showsthe Deployment Diagram of the software componentsof the active braking application mapped on five pro-cessors and five memory elements. From the Deploy-ment Diagram, a SoCLib prototype is then generated.This prototype consists of a SystemC top cell, the em-bedded software in the form of POSIX threads com-piled for the target processors, and the embedded op-erating system (Figure 9).

6.4 Capturing PerformanceInformation

We now present how performance information can beobtained from the use case simulated with SoCLib. Inthe experiments shown here, we use PowerPC cores.The cycle accurate bit accurate (CABA)-level simu-lation allows measurement of cache miss rates, la-

Figure 9: AVATAR/SoCLib Prototyping Environment in TTool

tency of any transaction on the interconnect, tak-ing/releasing of locks, etc. Since SoCLib hardwaremodels are much more precise than the ones usedat the design space exploration level, precise timingand hardware mechanisms can be evaluated. How-ever, these evaluations take considerable time com-pared to high-level simulation/evaluation. We restrictourselves to using only the hardware counters avail-able in the SoCLib cache module.

We start by an overview of performance problems.For this, we use an overall metric summing up all phe-nomena that slow down execution of instructions bythe processor, such as memory access latency, inter-connect contention, overhead due to context switch-ing etc.: Cycles per Instruction (CPI). For bottom linecomparison, the CPI is first measured on a mono pro-cessor platform (Figure 10). On this platform, the sin-gle processor is constantly overloaded (CPI > 16).

Our tool allows per-processor performance evalu-ation, which is particularly useful in detecting unbal-anced CPU loads. Even when prototyping onto fiveprocessors (Figure 11) to reflect the DIPLODOCUSpartitioning, the CPU loads are not very well bal-anced. This is due to the fact that currently, a cen-tral request manager is required to capture the se-mantics of AVATAR channels. Requests are storedin waiting queues for synchronous as well as asyn-

14

14.2

14.4

14.6

14.8

15

15.2

15.4

15.6

15.8

16

16.2

0 5 10 15 20 25 30

CPI

mio. simulation cycles

cpu

Figure 10: CPI per processor for a mono processor config-uration

chronous communication, and, in synchronous com-munications, cancelled when they became obsolete.CPU0 frequently needs to access memory areas stor-ing the boot sequence and the central request man-ager. Future work will address a better distribution ofthese functionalities, called the AVATAR runtime, overthe MPSoC architecture. Another interesting observa-tion is that in the five processor configuration, CPU4is more strongly challenged than the others. Look-ing at the AVATAR block diagram, it becomes clearthat the CSCU, mapped on CPU4, is connected byAVATAR channels to all the other ECUs.

We now investigate the cache miss rate. One

2

4

6

8

10

12

14

16

0 5 10 15 20 25 30

CPI


cpu0cpu1cpu2cpu3cpu4

Figure 11: CPI per processor for a 5 processor configuration

important parameter of the CPU used in theDIPLODOCUS partitioning is the overall cache missrate (see line Cache-miss in Figure 5). While the es-timated 5% of cache misses includes both data andinstruction cache misses, SoCLib measures them sep-arately. Instruction cache miss rates will be higher forthe cache of CPU0 because the central request man-ager runs on this CPU, as noted in the previous para-graph.

We vary size and associativity of both caches, ini-tially considering direct mapped caches (Figure 13),then setting associativity to four (Figure 14) for thesame size. This action can be performed with a fewmouse clicks (see Figure 12).

Figure 12: Varying cache associativity with a few mouseclicks

For the instruction cache, using the same param-eters (Figures 15 and 16), miss rates are closer to theestimated ones.

Even if we do not explore the cache parametersfully in the work presented here, we can already con-clude from this first exploration that data cache misseswere overestimated; they are below 10−7. As for in-struction cache misses, they are below 10% for thecache of CPU0, below 2% for the other four caches.

0

2x10-7

4x10-7

6x10-7

8x10-7

1x10-6

0 5 10 15 20 25 30

Cach

e m

iss

rate


cpu 0cpu1cpu2cpu3cpu4

Figure 13: Data cache misses per processor for a 5 proces-sor configuration with a direct mapped cache

0

2x10-7

4x10-7

6x10-7

8x10-7

1x10-6

0 5 10 15 20 25 30

Cach

e m

iss

rate



Figure 14: Data cache misses per processor for a 5 proces-sor configuration with 4 cache sets

We can thus lower the estimations, distinguishingbetween CPU0 and the others. Since our toolkit doesnot distinguish between data and instruction cachemisses during partitioning, we take the less favorablecase of instruction cache misses and raise the missrate for CPU0 to 10%, and lower it to 2% for the oth-ers. Figure 5 shows the window for customizing theCPU during partitioning, where we can now adapt thecache miss rate (and redo the partitioning).

We finally compare the influence of the intercon-nect latency (10 and 20 cycles, see Figures 17 and18). We observe a significant influence on the cost ofa cache miss; latency of data cache misses is generally

0.25

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

0.7

0 5 10 15 20 25 30

Cach

e m

iss

rate



Figure 15: Instruction cache misses per processor for a 5processor configuration with a direct mapped cache

0

0.02

0.04

0.06

0.08

0.1

0.12

0 5 10 15 20 25 30

Cach

e m

iss

rate



Figure 16: Instruction cache misses per processor for a 5processor configuration with 4 cache sets

higher.We observe after these first exploration steps that

apart from correcting the estimated cache miss ratein DIPLODOCUS, adding another CPU in order totake some of the load from CPU4 would improve theperformance.

As we can see in the CPU attributes window ofFigure 5, our toolkit potentially allows a designer toimprove estimates of several more hardware parame-ters like branch misprediction rate and go idle time.Until now, we used only the hardware counters im-plemented in the SoCLib components. Taking intoaccount the OS, over which we have full control, wewill soon be able to address other issues such as taskswitching time.

30

35

40

45

50

55

60

65

0 5 10 15 20 25 30

cycl

es


latency 10latency 20

Figure 17: Cost of instruction cache miss in cycles for a 5processor configuration with 4 cache sets

7 Discussion and Future Work

Our model-driven approach with a SysML-basedmethodology and supporting toolkit enables designersto capture systems at multiple levels and facilitates thetransitions between embedded system design stages.Prototyping from AVATAR enables the user to takeinto account performance results in a few clicks in the

40

45

50

55

60

65

70

75

80

85

90

95

0 5 10 15 20 25 30

cycl

es


latency 10latency 20

Figure 18: Cost of data cache miss in cycles for a 5 proces-sor configuration with 4 cache sets

Deployment Diagram, though the process is not yetfully automated.

In order to deliver more realistic results, weare currently working on integrating clustered archi-tectures. These architectures are supported in So-Clib, but various details make top cell generationmuch more complex (two-level mapping table, ad-dress computation complexity, etc.).

To help backtrace low level results (prototyping)to a higher level (partitioning), we are currently work-ing on providing the performance graphs shown in thepaper directly and automatically in the toolkit. Also,most metrics we have exemplified are CABA-based.We could also propose two other abstraction levelsof SoCLib: TLM (Transaction Level) and TLM-T(Transaction Level with Time). Future work will fo-cus on adding these intermediate levels, considerablyspeeding up prototypes at the cost of loss of preci-sion to be evaluated. However, using this intermediatelevel of abstraction would smooth the developmentgap between system-level and low-level prototyping.

REFERENCES

Apvrille, L. (2015). Webpage of TTool. Inhttp://ttool.telecom-paristech.fr/.

Balarin, F., Watanabe, Y., Hsieh, H., Lavagno,L., Passerone, C., and Sangiovanni-Vincentelli,A. L. (2003). Metropolis: An integrated elec-tronic system design environment. IEEE Com-puter, 36(4):45–52.

Basu, A., Bensalem, S., Bozga, M., Combaz, J., Jaber,M., Nguyen, T.-H., and Sifakis, J. (2011). Rig-orous component-based system design using theBIP framework.

Becoulet, A. (2009). Mutekh operating system (web-page). http://www.mutekh.org.

Bengtsson, J. and Yi., W. (2004). Timed automata:

Semantics, algorithms and tools. In LectureNotes on Concurrency and Petri Nets, pages 87–124. W. Reisig and G. Rozenberg (eds.), LNCS3098, Springer-Verlag.

Blanchet, B. (2010). Proverif automatic crypto-graphic protocol verifier user manual. Techni-cal report, CNRS, Departement d’InformatiqueEcole Normale Superieure, Paris.

Buck, J., Ha, S., Lee, E. A., and Messerschmitt, D. G.(2002). Ptolemy: a framework for simulatingand prototyping heterogeneous systems. Read-ings in hardware/software co-design, pages 527–543.

Enrici, A., Apvrille, L., and Pacalet, R. (2014). Auml model-driven approach to efficiently allo-cate complex communication schemes. In MOD-ELS conference, Valencia, Spain.

Erbas, C., Cerav-Erbas, S., and Pimentel, A. D.(2006). Multiobjective optimization and evolu-tionary algorithms for the application mappingproblem in multiprocessor system-on-chip de-sign. IEEE Transactions on Evolutionary Com-putation, 10(3):358–374.

EVITA (2011). E-safety Vehicle InTrusion protectedApplications. http://www.evita-project.org/.

Feiler, P. H., Lewis, B. A., Vestal, S., and Colbert,E. (2004). An overview of the SAE architectureanalysis & design language (AADL) standard:A basis for model-based architecture-driven em-bedded systems engineering. In Dissaux, P.,Filali-Amine, M., Michel, P., and Vernadat, F.,editors, IFIP-WADL, volume 176 of IFIP, pages3–15. Springer.

Gamatie, A., Beux, S. L., Piel, E., Atitallah, R. B.,Etien, A., Marquet, P., and Dekeyser, J.-L.(2011). A model-driven design framework formassively parallel embedded systems. ACMTrans. Embedded Comput. Syst, 10(4):39.

Genius, D. and Apvrille, L. (2016). Virtual yet pre-cise prototyping : An automotive case study. InERTSS’2016, Toulouse.

Genius, D., Faure, E., and Pouillon, N. (2011). Map-ping a telecommunication application on a mul-tiprocessor system-on-chip. In Gogniat, G.,Milojevic, D., and Erdogan, A. M. A. A., ed-itors, Algorithm-Architecture Matching for Sig-nal and Image Processing, chapter 1, pages 53–77. Springer LNEE vol. 73.

Kahn, G. (1974). The semantics of a simple lan-guage for parallel programming. In Rosenfeld,J. L., editor, Information Processing ’74: Pro-ceedings of the IFIP Congress, pages 471–475.North-Holland, New York, NY.

Kelling, E., Friedewald, M., Leimbach, T., Menzel,M., Sieger, P., Seudie, H., and Weyl, B. (2009).Specification and evaluation of e-security rele-vant use cases. Technical Report DeliverableD2.1, EVITA Project.

Kienhuis, B., Deprettere, E., van der Wolf, P., andVissers, K. (2002). A Methodology to De-sign Programmable Embedded Systems: The Y-Chart Approach. In Embedded Processor DesignChallenges, pages 18–37. Springer.

Knorreck, D., Apvrille, L., and Pacalet, R. (2013).Formal System-level Design Space Exploration.Concurrency and Computation: Practice andExperience, 25(2):250–264.

Li, L., Apvrille, L., and Genius, D. (2016). Vir-tual prototyping of automotive systems: Towardsmulti-level design space exploration. In Confer-ence on Design and Architectures for Signal andImage Processing.

Pedroza, G., Knorreck, D., and Apvrille, L. (2011).AVATAR: A SysML environment for the formalverification of safety and security properties. InThe 11th IEEE Conference on Distributed Sys-tems and New Technologies (NOTERE’2011),Paris, France.

Pimentel, A. D., Hertzberger, L. O., Lieverse, P.,van der Wolf, P., and Deprettere, E. F. (2001).Exploring embedded-systems architectures withartemis. IEEE Computer, 34(11):57–63.

Polarsys (2008). ARCADIA/CAPELLA (webpage).SoCLib consortium (2010). SoCLib: an open

platform for virtual prototyping of multi-processors system on chip (webpage). Inhttp://www.soclib.fr.

Sodius Corporation (2016). MDGen for SystemC.http://sodius.com/products-overview/systemc.

Syed-Alwi, S.-H., Braunstein, C., and Encrenaz, E.(2013). Efficient Refinement Strategy Exploiting Component Properties in a CEGAR Process,volume 265 of Lecture Notes in Electrical Engi-neerin g, chapter 2, pages 17–36. Springer.

Vidal, J., de Lamotte, F., Gogniat, G., Soulard, P.,and Diguet, J.-P. (2009). A co-design approachfor embedded system modeling and code genera-tion with UML and MARTE. In DATE’09, pages226–231.

VSI Alliance (2000). Virtual Component InterfaceStandard (OCB 2 2.0). Technical report, VSI Al-liance.

Yu, H., Joshi, P., Talpin, J.-P., Shukla, S. K., and Shi-raishi, S. (2015). The challenge of interoperabil-ity: model-based integration for automotive con-trol software. In DAC, pages 58:1–58:6. ACM.

Model-Driven Performance Evaluation and Formal … Performance Evaluation and Formal Veriﬁcation...

Documents

Transcript of Model-Driven Performance Evaluation and Formal … Performance Evaluation and Formal Veriﬁcation...