
Noname manuscript No. (will be inserted by the editor)

Scalable and Precise Estimation and Debugging of the Worst-Case Execution Time for Analysis-Friendly Processors

A Comeback of Model Checking

Martin Becker¹ · Ravindra Metta² · R Venkatesh² · Samarjit Chakraborty¹

This is a pre-print of an article published in the International Journal on Software Tools for Technology Transfer. The final authenticated version is available online at: https://doi.org/TODO

Abstract Estimating the Worst-Case Execution Time (WCET) of an application is an essential task in the context of developing real-time or safety-critical software, but it is also a complex and error-prone process. Conventional approaches require at least some manual inputs from the user, such as loop bounds and infeasible path information, which are hard to obtain and can lead to unsafe results if they are incorrect. This is aggravated by the lack of a comprehensive explanation of the WCET estimate, i.e., a specific trace showing how the WCET was reached. It is therefore hard to spot incorrect inputs and hard to improve the worst-case timing of the application. Meanwhile, modern processors have reached a complexity that defies analysis and puts more and more burden on the practitioner. In this article we show how all of these issues can be significantly mitigated or even solved, if we use processors that are amenable to WCET analysis. We define and identify such processors, and then we propose an automated tool set which estimates a precise WCET without unsafe manual inputs, and also reconstructs a maximum-detail view of the WCET path that can be examined in a debugger environment. Our approach is based on Model Checking, which however is known to scale badly with growing application size. We address this issue by shifting the analysis to source code level, where source code transformations can be applied that retain the timing behavior but reduce the complexity. Our experiments show that fast and precise estimates can be achieved with Model Checking, that its scalability can even exceed that of current approaches, and that new opportunities arise in the context of "timing debugging".

E-mail: [email protected], Tel.+49-89-289-23556

¹ Chair of Real-Time Computer Systems, Technical University of Munich, Munich, Germany · ² Tata Research Development and Design Centre, Pune, India

Keywords Worst-Case Execution Time · Debugging · Static Analysis · Deterministic Processor

1 Introduction

Many real-time systems need to provide strict guarantees for response times. For instance, airbag deployment in a car, collision avoidance systems in aircraft, and control systems in spacecraft have to meet deadlines for ensuring vehicle and passenger safety. The need to analyze this class of time-critical systems has been the main motivator [50] for research on calculating the Worst-Case Execution Time (WCET), which is the longest time a program takes to terminate, considering all possible inputs and control flows.

This timing depends on the structure of the program as well as on the processor it is running on. In the most general case, the WCET problem is undecidable (e.g., because loop bounds have to be known), and hence only an upper bound of the WCET can be determined through automatic analysis. Therefore, in a practical setting, the problem reduces to finding the tightest safe upper bound, and in particular to using a technique that scales well with program complexity.

The techniques being applied today for analyzing the timing behavior, such as Integer Linear Programming (ILP) and Abstract Interpretation (AI), or a combination thereof [59], work very well for analyzing complex programs, but they exhibit several weaknesses: (1) User annotations, such as loop bounds, have to be provided [59,55], but are hard to obtain, influence the tightness, and may even refute the soundness of the WCET estimate. Providing too large bounds leads to a large overestimation, and too small bounds may yield an unsafe estimate. As a result, providing safe and tight bounds has become a research field on its own with a wide range of different approaches, e.g., using Abstract Execution [25],



Fig. 1 Overview of our WCET tools and their artifacts

refinement invariants [23] and pattern matching [28]. (2) Existing approaches work at the machine code level, where the high-level information from the original program is hard to extract. Variables are distributed over multiple registers, type information is lost, loops and conditional statements can be implemented in many different ways, and indirect addressing can make it close to impossible to track data flows and function calls. As a consequence, overapproximations have to be used, which usually result in pessimistic estimates. (3) Overapproximations are also used to bound the control flow. As a result, the control flow that would correspond to the WCET might not even exist in many cases, contributing further to less tight estimates. (4) Finally, practitioners are facing another challenge with today's analysis tools. Once the WCET of an application has been computed, the output offers little to no explanation of how the WCET has developed, even less of how it can be influenced through changes in the program. Clearly, there is room for improvement in the process of WCET estimation.

However, there exists an even more fundamental problem: analysis cannot keep up with modern processor architecture. In recent years, the research community arrived at the alarming conclusion that a sound analysis of modern processors is almost impossible, since their complex microarchitecture requires considerably large models, which in turn lead to a state-space explosion [44,48,39,33, Chapter 5.20].

In this article, we see this discouraging situation as a chance to anticipate what improvements could be possible in WCET analysis if processors would become more analyzable, as proposed by various researchers [39,48]. Assuming that we have a "deterministic processor", we question the techniques being used today. Specifically, we show that by virtue of more analyzable processors, WCET analysis can build on approaches that have already been declared dead for that purpose. The weaknesses in the estimation process, such as the difficulty of providing flow annotations and the imprecision of the estimate, can be significantly reduced, and the usability of WCET tools increased. Moreover, we get a chance to realize a true "time debugging", through which practitioners get a handle on the timing behavior, in the same way as functional behavior is being debugged today.

The Idea: Assuming that we have a processor that is amenable to WCET analysis (we specify and identify such processors later on), our approach builds on the following ideas:

– First, we propose WCET estimation by analysis of the source code, instead of the machine code. This is enabled by choosing analysis-friendly processors, and makes the control flows easier to analyze. The additional information, such as variable types and their ranges, helps cutting down the number of possibilities to be analyzed, thereby providing more precise estimates in shorter time.

– Second, we propose to use Model Checking instead of traditional ILP or AI techniques for path analysis, because it can precisely determine the longest feasible path, as it explores all paths in the program explicitly. Therefore, we expect tighter estimates, and ideally we no longer need flow constraints to be specified by the user, which makes WCET analysis safer and easier to apply.

– Last but not least, we propose to leverage the counterexample that is generated by the model checker, to reconstruct the precise path that is taken in the WCET case, and to enable an interactive replay that shows not only the path being taken and how time is accumulating, but also the contents of all program variables.

However, there are three main challenges in realizing our idea: First, microarchitecture-specific execution times from the machine-level code need to be represented in the source code. Towards that, a back-mapping from machine-level code to source code has to be developed. Second, Model Checking had been considered for WCET estimation earlier, but it was found to scale poorly with program size [58,40], and has thus never been seen as a competitive technique for WCET estimation. We show that this scalability problem can be mitigated by the mentioned shift of the analysis from machine code to source code level, and that source transformation techniques, such as program slicing and loop summarization, can be used to reduce the complexity and allow Model Checking to scale better. The third challenge is to reconstruct the WCET path, such that it is accessible to the user and provides a specific explanation why a path was taken. That is not directly possible from the counterexample, as it carries incomplete information. We therefore need to devise an efficient path reconstruction technique.

Summary of our approach: The overall approach consists of two steps, as depicted in Fig. 1. First, we estimate the WCET using a model checker, then we reconstruct the WCET path from the counterexample that is provided. The WCET estimation – detailed in Fig. 3 – starts by establishing a mapping between machine code and source code, and evaluating the instruction timing of the program. Given this information, we annotate the source code with increments to a counter variable, which is updated at the end of each source block according to the instruction timing. The resulting code is then sliced with respect to the time annotations, to remove all computations not affecting the control flow. In the next step, we accelerate all loops that can be accelerated. Next, we over-approximate the remaining loops with a large or unknown number of iterations. Then, we perform an iterative search procedure with a model checker to determine the WCET value. We terminate the search procedure as soon as we find a WCET estimate within the precision specified by the user. Finally, the path reconstruction takes place in a debugger, which forces decision variables to the values given in the counterexample and thereby reconstructs the precise path leading to the WCET, whilst collecting a timing profile similar to gprof.

The contributions are as follows:

– Efficient application of Model Checking at source code level to find the worst-case execution time of a C program (Section 3.6).

– Application of source code transformation techniques to reduce the complexity of the system subject to Model Checking (Section 3.4.1ff).

– A debugger-based technique to reconstruct and replay the precise control flow, all variable contents, and a timing profile of the path leading to the worst-case execution time (Section 4).

– A prototype of a tool set called TIC which implements our proposed approach.

– Experiments with the standard Mälardalen WCET Benchmark Suite, to assess the impact of the source code transformations on scalability and tightness of the WCET estimates, in comparison to an ILP-based analyzer and a cycle-accurate simulator (Section 5).

This article is an extension of our earlier work [42].

2 Technical Background

2.1 WCET Analysis

The goal of WCET analysis is to estimate the longest time a (sub)program P takes to terminate, while considering all possible inputs and control flows that might occur, but excluding any waiting times caused by sleep states or interruption by other processes. In real-time systems, this estimate is subsequently used as an input for schedulability analysis, which then models the influence of other processes and computes an upper bound of the reaction time of P. The reaction time, finally, should be shorter than any deadline imposed on P.

For example, the deadline for P could be given by the maximum time that is permissible to detect a car crash and activate the airbags. Consequently, the WCET estimate is a vital metric for real-time systems, and thus needs to be safe (i.e., never smaller than what can be observed when executing P) and tight (i.e., as close as possible to the observed value).

The WCET of an application is influenced by the processor architecture, e.g., caches and pipelines, as well as by the program structure. Therefore, WCET analysis usually comprises the following steps (not necessarily in that order):

1. Compilation: Cross-compile P for the processor it is supposed to run on. The source code of P is translated to machine instructions I, applying various optimizations.

2. Flow Analysis: Analyze I to discover all possible control flows. This includes finding all potential branches in I and storing them in a control flow graph G, including their branch conditions.

3. Value Analysis: Calculate possible ranges for operand values in I, to resolve indirect jumps and classify memory accesses into different memory regions (e.g., slow DRAM vs. fast core-coupled memory).

4. Loop Analysis: Bound the control flow of G, that is, identify loops and compute their maximum execution counts based on branch conditions, and annotate the nodes and edges in G with those execution counts.

5. Microarchitectural Analysis: Predict the timing effects of caches, pipelines, and other architecture-dependent constructs, based on the memory mapping and the paths in G. Annotate nodes and edges in G with instruction timing considering these features.

6. Path Analysis: Formulate a mathematical model based on G and solve for the WCET. The discovered control flow graph G and the computed loop bounds are analyzed together to find those paths along G which could produce the longest execution time.

Steps 2 through 5 are often referred to as low-level analysis, and step 6 as high-level analysis. The employed methods typically involve a combination of Abstract Interpretation and Linear Programming [58]: Flow analysis parses the ISA-specific binary and builds the control flow graph, value and loop analysis typically use Abstract Interpretation to deduce variable values and loop counts that may influence control flow or timing, microarchitectural analysis typically builds on Abstract Interpretation to approximate cache and pipeline effects, and finally path analysis is usually done by translating the annotated control flow graph into a constrained optimization problem [38].
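To make the last step concrete: the standard encoding (often called implicit path enumeration) turns the annotated graph into an integer linear program over per-block execution counts. The following is a generic textbook sketch, not tied to any particular tool. With c_i the time of basic block B_i and x_i its execution count, the solver computes

    WCET_est = max Σ_i c_i · x_i

subject to structural constraints (the entry block executes once, and for every block the inflow equals the outflow) and externally supplied loop bounds of the form x_body ≤ n · x_entry for a loop bounded by n iterations. These loop bounds are precisely the manual annotations whose elimination motivates our approach.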

2.1.1 WCET-Amenable Processors

In this paper, we focus on "predictable" hardware, as recently defined by Axer et al. [3]. In particular, the ideal processor for WCET analysis has a constant instruction timing. By this we mean that each instruction takes a constant and bounded amount of time, and this time should neither depend on processor states or operand values, nor be subject to additional waiting states (e.g., pipeline stalls due to pending bus transfers). We allow an exception for branch/jump instructions, where variable instruction timing (e.g., taking a branch may take more time than not taking it) can be covered in our flow analysis. With such a WCET-amenable processor, it is neither necessary to perform a value analysis at register level, nor do we require complex microarchitectural models. Although these requirements seem unrealistic even for simple processors, they do not automatically forbid the use of features such as pipelines and caches, as we shall explain in the following.

Pipeline Requirements. Processors may have a pipeline, but the timing of successive instructions must not depend on each other. Again, we allow one exception for conditional jumps (decisions upon the control flow), where we model the variable timing. Furthermore, the processor may make use of bypass/operand forwarding, but potential bubbles are assumed to be included in the instruction timing, or otherwise avoided by the compiler; an architectural way around this problem are interleaved pipelines [17]. Furthermore, there must be no out-of-order processing. Through that, structural and data hazards need not be modeled, but only control hazards at the granularity of basic blocks.

Cache Requirements. Processors may have instruction and data caches, but we do not allow for cache misses. That is, cache contents are selected at or prior to compile time and kept static during execution, or deterministically loaded by software at locations known at compile time. For example, caches could be loaded every time a function is entered (as in [54]), or explicitly through statements in the source code (scratchpad memory). Cache locking [45] provides an alternative mechanism to reach the same effect (which, incidentally, can improve the WCET [16]), and potentially qualifies many more processors for our approach. With these cache requirements, timing effects due to caches can be annotated in the source code as part of our analysis, and do not have to be modeled at instruction granularity.

On-Chip Bus Transfers. In many system-on-chip processors, there are peripherals (e.g., a UART peripheral) which are accessed from the core via on-chip buses. Since usually there will be no model available for the behavior of the peripheral, we cannot support such accesses in a WCET analysis. A WCET-amenable processor therefore should provide time bounds for instructions performing such accesses. Consequently, no model is necessary for bus arbiters and peripheral states.

Multicore Processors. With the presence of hardware threads, there could be interference caused by resource sharing. In principle, there are techniques which provide some temporal isolation for the hardware threads, as described in [3]. We expect that a WCET-amenable processor has to implement such techniques, since otherwise the WCET problem becomes even more intractable than it already is for today's monoprocessors. Consequently, we are only considering monoprocessors here, assuming that an extension to multicore processors can build on such isolation, and otherwise capture high-level interactions such that they are visible in the source code.

Amenable Processors. Due to the complexity and variety of modern processor architectures, only a detailed review on a case-by-case basis allows us to identify existing processors which fulfill our requirements. Therefore, we can only give a few select examples, and otherwise refer to our requirements for what we wish future processor architectures would look like, to keep WCET analysis tractable. Slight overapproximations, such as using only the maximum execution time for time-variable instructions, might be applied to use our approach even for processors that are not strictly compliant. The processor family that we target in this paper, the 8-bit Atmel AVR family [36], is a good example for that. While there is a dependency of the instruction timing on the operands in some cases – namely, slight variations in instruction timing for flash memory access – this is negligible and can be overapproximated. Our results shown later in this article justify this approach. Other processors that are a good fit are the SPARC V7 (specifically the ERC32 model [56]), the ARM7TDMI family [22], and the Analog Devices ADSP-21020. Academic examples include the Java-Optimized Processor [54], and the Precision Timed Architecture [39] with minor modifications (namely, port-based I/O to avoid time variances in load/store instructions, and absence of structural hazards).

2.1.2 WCET Analysis at Source Code Level

Our main goal is to shift WCET analysis from the instruction level to the source code level, where control flows are easier to follow and type information is available. We expect two profound effects from this shift. First, a model checker can leverage the clearly visible control flow and type information to perform an automatic path analysis, without requiring user inputs. That should lead to tighter and safer estimates than other approaches, such as ILP. Second, the complexity of the analysis is reduced, since operations are represented in a more high-level view.

A necessary prerequisite for such a source-level analysis is to annotate the source code with the instruction timing, as faithfully as possible. We do so by introducing a counter variable [30] into the source code, which is a global variable representing execution time that is incremented after each statement according to the time taken by that statement (see Fig. 4b for an example).

Towards that, timing information needs to be extracted from the executable code, whilst considering the microarchitecture of the target processor, and then mapped back to the source code in the form of assignments to our counter variable at the correct locations. For a WCET-amenable processor, considering the microarchitecture boils down to modeling variable instruction timing only at branch points. The time annotations in the source code must therefore allow the encoding of such variable timing. In principle, this is only possible precisely when the control flow of the instructions is also mapped back to the source code. That is, conditional jumps in the instructions must be lifted to the source code. Later, we show that some overapproximations can be applied in case a back-mapping is not possible because a source equivalent is lacking for some instructions.

In general, mapping the instruction timing back to the source code means establishing a mapping from the instructions to source-level constructs. This can be difficult, because compiler optimization may produce a control flow in the executable that is very different from that of the source. For example, functions that are present at the source level may be absent in the executable because of inlining. Vice versa, new functions could be introduced in the executable which have no direct match in the source code. For instance, it is common for compilers to introduce loops and functions to compute 64-bit multiplications and shifts, or to implement switch-case statements as binary search or lookup tables. These kinds of transformations make it hard to automatically map the basic blocks in the machine instructions to the source code. An ideal compiler would keep track of the mapping during the translation process; however, we are not aware of any compiler doing so. Therefore, some overapproximations must be applied to overcome these difficulties, and they can go a long way in automating the back-mapping.

For the work in this paper, we have established a heuristic mapping for the AVR gcc compiler with little difficulty, as explained later in Section 3.2. Other researchers have accomplished the same goal for different WCET-amenable processors [32,52], even under some optimization, suggesting that such a mapping can usually be established when only considering the timing.

In the next section, we elaborate how Model Checking can be used on this time-annotated source code to estimate the WCET.

2.2 Model Checking

Model Checking [13] is a formal analysis technique used to verify a property on a model. Given a model and a property, a model checker – a tool implementing this technique – determines if the model satisfies the property in every possible execution. If the property does not hold, the model checker produces evidence for the violation, called a counterexample. Model checkers perform reachability analysis over finite-state transition systems, where the number of states is a product of program locations and variable valuations at these locations. Therefore, though Model Checking is sound and complete for finite-state models, scalability is often an issue for complex models.

It has been demonstrated that model checkers are useful at computing the WCET of simple programs [32,37], but the scalability issue has not been addressed before. The idea is to take a program that is annotated with timing information, translate it into a model, and use a model checker to verify the property "at program exit, execution time is always less than X", where X is a proposed WCET value. The model checker acts as an oracle, telling if there exists any execution that takes longer than this proposal. This process is repeated with changing proposals, until we find the lowest upper bound for which no counterexample is generated. We will follow the same approach here, but address the scalability issue by minimizing the number of oracle queries, and also the complexity of the individual queries.

For our experiments we have used the model checker CBMC [12], due to its robustness and performance. It is a bounded model checker which accepts models and properties in the form of ANSI-C programs. Some important features of CBMC that we use in the WCET computation are:

– Assertions: CBMC allows expressing properties with assertions. In our case, these assertions are of the form assert(_time < X), where the constant X is the proposed WCET, and _time denotes the counter variable reflecting the time passing by in the program.

– Non-determinism: CBMC allows any of the program variables to be assigned a non-deterministic value. This is done using assignments of the form y = nondet(), where nondet() is an undefined function having the same return type as y. This results in y being assigned a non-deterministic value from the range specified by its type.

– Assumptions: CBMC allows the use of assume statements to block the analysis of undesirable paths in a program. A statement of the form assume(c) marks all those paths infeasible for which c evaluates to false at the execution of this statement. This feature can be used to constrain the value domain of non-deterministic assignments.

– Checking multiple assertions: CBMC allows multiple assertions in the input program, which can be checked at once using the --all-properties option. This option uses an optimal number of solver calls to verify programs with multiple assertions.
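As a minimal illustration of how these features interact, consider the following self-contained fragment (the function example and the bounds are our own, purely hypothetical choices; nondet_int is an undefined function as described above, and __CPROVER_assume is CBMC's built-in assume statement):

    #include <assert.h>

    int nondet_int(void);

    void example(void) {
      int y = nondet_int();               /* y may take any int value           */
      __CPROVER_assume(y >= 0 && y < 64); /* discard all paths outside 0..63    */
      assert(2 * y < 128);                /* holds: verified for all y in 0..63 */
      assert(y < 32);                     /* fails: counterexample, e.g. y = 32 */
    }

Invoking, e.g., cbmc --function example --all-properties example.c checks both assertions in one run; the second one yields a counterexample trace.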

The technique we present here does not depend on CBMC specifically. It could be replaced by any other model checker, possibly through an additional front-end that translates C code into a model to work with (e.g., the CPROVER tools).

2.3 Program Slicing

Program slicing was first introduced by Mark Weiser in 1981 [57]. Given an imperative program and a slicing criterion, a program slicer uses data flow and control flow analysis to eliminate those parts of the program that do not impact the slicing criterion. That is, it removes statements which do not influence the control flow related to the criterion and do not change its value. The resulting program is called a "program slice"; it behaves identically w.r.t. the slicing criterion, but has a reduced complexity.

Slicing works by constructing and evaluating a Program Dependency Graph (PDG), which captures the control and data flow dependencies of a program. For example, Figure 2 shows the PDG corresponding to the code in Figure 4b. This graph has two kinds of edges to denote dependencies: dashed edges denote data dependence, and solid edges denote a control flow dependency. For example, the loop increment "i++" is control-dependent on the loop condition "i < 35", and data-dependent on itself as well as on the loop initialization.

Given the PDG from Fig. 2 and a slicing criterion, we start at the node that corresponds to the location of the criterion, and traverse the PDG from this point until all the root nodes of the graph are reached. Subsequently, we remove from the program all statements and expressions that correspond to nodes that have not been visited during this traversal. For our example program in Fig. 4b, the criterion is the latest possible assignment to the variable _time in line 17, and the corresponding PDG node is "_time += 5" (in the lower right corner of the graph). When traversing the graph from there, the outlined/red nodes (e.g., "out = acc/scl") are not reachable. Therefore, these parts of the program do not impact the value of the variable _time at our location, and can be safely removed from the program. The resulting program slice is given in Fig. 4c.

2.4 Loop Acceleration

Loop acceleration describes the action of replacing a loop with a precise closed-form formula capturing the effects of the loop upon its termination. This has been shown to be effective for Model Checking of source code [14,6].

For loop acceleration to be applicable, the loop should have the following characteristics:

– The loop should iterate a fixed number of times, say n, which could either be a constant or a symbolic expression. For example, the loop may execute 10 times or n times, or x * y times, and so on.

– The statements in the loop consist only of assignments and linear arithmetic operators, that is, first-order polynomial computations (and no higher order).

– The loop body consists of straight-line code, that is, there are no branching statements such as if-else statements inside the loop body.

When a loop satisfies the above constraints, it is possible to replace the loop with simple linear recurrence relations that precisely compute the summarized effect of all the iterations of the loop on each of the assignments in the loop body. For example, the for-loop in Figure 4c is replaced with the block spanning lines 7 through 13 in Figure 4d.
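As a hand-written sketch of such a rewrite (the acceleration itself is performed by a tool, and the function and variable names here are ours), consider a loop whose body is linear in the iteration counter:

    /* Before: n iterations of straight-line, linear updates */
    void before(unsigned n) {
      unsigned i, sum = 0, x = 0;
      for (i = 0; i < n; i++) {
        sum += 3;                        /* constant increment          */
        x   += i;                        /* linear in the loop counter  */
      }
    }

    /* After acceleration: closed-form effect of all n iterations */
    void after(unsigned n) {
      unsigned sum = 3 * n;              /* 3 added n times             */
      unsigned x   = (n * (n - 1)) / 2;  /* sum of 0 .. n-1             */
      unsigned i   = n;                  /* final counter value         */
    }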

2.5 Abstraction

Abstraction is a term used to describe a modification to an input program in such a way that the resulting output program allows more runs than the input program. That is, the set of all possible program states at the end of the output program is a superset of all possible states at the end of the input program. In other words, abstraction creates a safe overapproximation of a program.

For instance, in Section 2.4, suppose a loop has only the first two characteristics, but violates the last one, viz., the loop does have branching statements in the form of if-else statements. Then it is still possible to replace the loop with an over-approximating formula. The simplest way to achieve this is to use a non-deterministic assignment that allows the result to take any arbitrary value. For example, the effect of the loop in Figure 5a on the variable ans is abstracted as shown in line 5 of Figure 5b. There we allow ans to take on any non-deterministic value in its type range, which is a superset of all the feasible values among all possible executions.

3 Finding the Worst-Case Execution Time

Our tool set TIC estimates the WCET at source level, with the help of timing information that is extracted from the executable code. The overall workflow is illustrated in Fig. 3 and shall now be explained in detail. Given a C program, a corresponding executable that results from cross-compilation for the target, and information about the target architecture, TIC does the following:

1. Estimates the time taken for each basic block in the machine code of the program.

2. Establishes a mapping from these basic blocks to a set of source lines and instruments the C program with the timing obtained in the previous steps.

3. Detects and prepares the handling of persistent variables, i.e., the internal states of the program, since those may have an impact on the control flow and thus on the WCET.



Fig. 2 Program Dependence Graph of the program from Fig. 4b. Sliced statements are outlined/red


Fig. 3 WCET analysis using Model Checking

4. Applies source code transformations to eliminate all irrelevant data computations and to summarize loops.

5. Adds a driver function which represents the entry point for the model checker and encodes the search for the WCET as a property.

6. Uses a model checker and an appropriate search strategy to estimate the WCET with the desired precision.

7. If the model checker does not scale, then it abstracts the program and repeats the search.

The details of each of the above steps are given in the following sections. As a running example, we use a simplified version of the fir benchmark from the Mälardalen WCET benchmark suite.

3.1 Estimation of Basic Block Time

We first analyze the executable file produced by the target-specific compiler to construct a control flow graph. Towards that, we identify basic blocks. These are maximal sequences of instructions that are free of branches, i.e., where the only entry point is the first instruction and the only exit point is the last instruction. The basic blocks therefore become the nodes in our control flow graph, and the edges represent branches or function calls, labelled with their branching condition. Finally, we determine the execution time for each basic block by adding up the time taken by the contained instructions, and annotate our nodes with this block timing. Since we allow branch instructions to have a variable timing (e.g., the instruction may take longer if a conditional branch is taken), we annotate the precise timing of the jump instructions on the edges of the graph. Through that, there is no overestimation due to such timing variations.

Our implementation re-uses parts of the Bound-T WCET analyzer [31] to build the control flow graph and estimate basic block times. We therefore have exactly the same inputs for our approach and the ILP-based analysis in Bound-T. Alternatively, a more generic binary analysis library could be used, such as the Binary Analysis Platform [9].
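The artifact produced in this step can be pictured as a small annotated graph; the following is a sketch of the information we attach (the type and field names are ours, not Bound-T's API):

    /* One outgoing transition; branch timing lives on the edges, */
    /* e.g., the taken vs. not-taken cost of a conditional jump.  */
    struct edge {
      struct bb *target;
      unsigned branch_cycles;          /* precise cost of this transition    */
    };

    /* One CFG node: a basic block with its summed instruction time */
    struct bb {
      unsigned first_addr, last_addr;  /* block boundaries in the binary     */
      unsigned cycles;                 /* sum of contained instruction times */
      struct edge *succ;               /* outgoing branches or calls         */
    };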


3.2 Back-Mapping and Annotation of Timing

The next task is to match the control flow of the machine instructions with the control flow of the source code, and to back-annotate the basic block timing in the source code, in the form of increments to a counter variable. The result shall be a source code that is instrumented with timing, in the following called the instrumented program.

Specifically, here we annotate the basic blocks of the source with the instruction timing, as close as possible to the corresponding source construct. That is, each maximal sequence of statements that is free of branches (just like basic blocks at machine code level) is immediately followed (or preceded, in the case of loop headers) by a timing annotation that reflects the execution time of the instructions corresponding to that block.

First of all, an approximate and incomplete mapping of machine instructions to source line numbers is given by GNU's addr2line tool. However, it is incomplete because some instructions do not have a direct match in the source code, and it is only approximate because it does not provide column numbers. Complex expressions falling into multiple basic blocks, such as loop headers or calls in compound expressions, cannot be resolved with this information. We therefore use this information as an initial mapping, and then apply safe heuristics to complete the picture.

For the heuristics that manage the back-annotation of the basic block (BB) timing information to the source code, there are two cases to handle:

1. One-to-many mapping. One basic block in the executable corresponds to one or more contiguous expressions in the source code. This case is trivial to handle: all that needs to be done is to instrument the basic block before the last expression with the corresponding timing information.

2. Many-to-one mapping. In these cases, several basic blocks in the executable map to a single source expression. Typically, this case arises when a source expression splits into different control paths in the machine instructions, for example in the case of a division or a bit shift operation being converted into loops. In such cases, it is hard to instrument the source code with this information, since it would require translating the loop back into the source first. To tackle this issue, we summarize the timing information from such multi-path instruction blocks, and instrument the source code with their worst-case timing value.

As an example, the timing of the source code in Figure 4a was mapped as follows:

Table 1 Mapping of basic blocks (BB) to source code

    Start line   End line   BB (#)   Time   Comment
    1            3          1        44     including for init
    3            3          2        21     for conditional
    3            6          3        36     including for iter
    7            7          4        24     before div
    7            7          mult.    737    div block
    7            18         6        5      after div

The table contains the timing information of each basic block, along with the span of the block in the source code. This information is used to instrument the source code as shown in Figure 4b: we introduce a global counter variable _time, which is incremented through the macro TIC by the corresponding amount of time at the respective source locations.

Basic block 2 is an instance of a one-to-many mapping. This block maps to the conditional part of the for-statement on line 3 of Figure 4a, and therefore the annotation is inserted just before that statement, on line 8 in Figure 4b. Similarly, basic blocks 1 and 3 are one-to-many mappings. Basic block 1 implements the start of the function (line 1), as well as the declarations and initialization on line 2, and also the for-initialization block on line 3. All these source-level lines fall into a single source basic block. Similarly, the for-loop increment on line 3, along with the statements on lines 4 and 5, maps to basic block 3. The instrumentation in these cases is placed in lines 6 and 12, respectively, in Fig. 4b.

The division assignment in line 7 is an instance of a many-to-one mapping, i.e., many basic blocks, and therefore also some conditional jumps, map to this single statement. Here, these basic blocks include a compiler-inserted function call (for the long division). In particular, the compiler-inserted function contains a loop, for which we do not compute the timing precisely; instead, we over-approximate this loop with its worst-case bound, and subsequently use a single timing annotation for the entire function. Thereafter, the statement at line 7 can be represented by three parts: one before the division (function call), one for the division (function WCET), and the final one after the call. The resulting three timing annotations are shown on lines 14, 16 and 17 of Figure 4b.

This concludes the shift of WCET analysis from machine code level to source level. At this point, we have an instrumented program, i.e., a source code carrying the execution timing, ready to be analyzed for the WCET. From now on, we continue the analysis with the instrumented source code only.

3.3 Handling of Persistent Variables

A WCET estimation of a function f (including all its direct and indirect callees) must consider all possible inputs to f, and from those derive the longest feasible paths. Such inputs can be function parameters, but also referenced variables that are persistent between successive calls of f (they can also be seen as the hidden program state). In the C language, such persistent variables are static and global variables that are referenced by f or its callees.


     1  int task(int fir, int scl) {
     2    int i, out, acc=0;
     3    for(i = 1; i < 35; i++) {
     4      acc += fir;
     5      fir <<= 1;
     6    }
     7    out = acc/scl;
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18    return out;
    19  }

(a) Original code

     1  #define TIC(t) (_time += (t))
     2  unsigned long _time = 0;
     3
     4  int task(int fir, int scl) {
     5    int i, out, acc=0;
     6    TIC(44);        // BB1
     7    for(i = 1;
     8        TIC(21),    // BB2
     9        i < 35; i++) {
    10      acc += fir;
    11      fir <<= 1;
    12      TIC(36);      // BB3
    13    }
    14    TIC(24);        // BB4
    15    out = acc/scl;
    16    TIC(737);       // mult. divmodsi4
    17    TIC(5);         // BB6
    18    return out;
    19  }

(b) Instrumented

     1  #define TIC(t) (_time += (t))
     2  unsigned long _time = 0;
     3
     4  int task(int fir, int scl) {
     5    int i, out, acc=0;
     6    TIC(44);
     7    for(i = 1;
     8        TIC(21),
     9        i < 35; i++) {
    10      acc += fir;
    11      fir <<= 1;
    12      TIC(36);
    13    }
    14    TIC(24);
    15    out = acc/scl;
    16    TIC(737);
    17    TIC(5);
    18    return out;
    19  }

(c) Sliced

     1  #define TIC(t) (_time += (t))
     2  unsigned long _time = 0;
     3
     4  void task(void) {
     5    int i, _k0, _k1;
     6    TIC(44);
     7    {
     8      _k0 = 1;
     9      i = 35;
    10      _k1 = i - _k0;
    11      TIC(21*_k1 + 36*_k1);
    12      TIC(21);
    13    }
    14    TIC(24);
    15
    16    TIC(737);
    17    TIC(5);
    18
    19  }

(d) Accelerated

Fig. 4 Example for source code instrumentation and transformations to compute the WCET of a function task

In this work, we over-approximate the content of such persistent variables by initializing them with a non-deterministic value, as explained in Section 2.2. This guarantees that all the feasible and infeasible values of the persistent variables, as allowed by their data type size, are considered as inputs to f. Thus, the WCET estimate for f is always a safe over-approximation of the maximum observable execution time of f. It is possible to remove (some of) the infeasible values, either with manual inputs by users or by analyzing the callees of f. This may lead to a tighter WCET, but we did not explore these approaches, as they are orthogonal to our main work.
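For instance, if the analyzed function keeps a hidden state in a static variable, the driver simply assigns it a nondeterministic value before the call; a sketch with hypothetical names (nondet_int is an undefined function, as in Section 2.2):

    int nondet_int(void);

    static int calls = 0;      /* persistent state: survives invocations of f */

    int f(int x) {
      if (++calls > 100)       /* control flow depends on the hidden state    */
        return 0;
      return x + calls;
    }

    void driver(void) {
      calls = nondet_int();    /* over-approximate all reachable states       */
      f(nondet_int());         /* analyze f for all inputs and states         */
    }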

3.4 Source Transformation Techniques

The source code is now instrumented with its timing behavior, and ready to be analyzed for the WCET. However, a direct application of Model Checking to compute the WCET of large and complex programs would not scale, due to the size of the generated model. The analysis time could quickly reach several hours for seemingly small programs, and the memory requirements may also quickly exceed the available resources. Our next step, therefore, is to apply source code transformations which retain the timing behavior but reduce the program complexity. This can be done effectively thanks to the additional information available in the source code, such as data types and clearly visible control flows. The transformations are executed sequentially in stages, which we explain in the following. All three stages require only little computational effort themselves, and therefore speed up the overall process of WCET estimation.

3.4.1 Stage 1: Slicing

Slicing reduces the size of the program by removing all statements that do not influence a certain criterion, as explained in Section 2.3. For the specific case of WCET analysis using a counter variable, our slicing criterion is the value of this variable upon program termination. Statements not impacting the counter variable are thereby eliminated, and WCET estimation becomes less complex.

As an example, consider again the instrumented program in Figure 4b. Firstly, observe that line 15 is not needed to compute timing, since the variable _time has no data or control dependency on the variable out. Similarly, lines 10 and 11 do not impact the timing computation and can be sliced away. The sliced source for this example is shown in Figure 4c.

We used a program slicer that builds an inter-procedural program dependency graph [47], capturing both intra-procedural and inter-procedural data and control dependencies. It then runs one pass over this graph to identify statements that impact the property to be verified and outputs the sliced code. The slicer is a conservative one, which means that it discards statements only when it is sure that the statements do not affect the timing analysis. In all other cases, the statements are preserved.

3.4.2 Stage 2: Loop Acceleration

Major scalability obstacles in Model Checking are loop-like constructs, since they have to be unrolled before analysis. They can therefore increase the program complexity significantly. This problem can be solved by applying loop acceleration, as described in Section 2.4. The resulting program will have a reduced number of loops, and therefore exhibit a reduced complexity.

As an example, consider the loop in Figure 4c: Here, _time is incremented in line 12 within the loop body through TIC. The effect of this repeated increment of _time can be summarized by the expression _time = _time + n · 36, where n is the number of loop iterations. Line 11 in Figure 4d shows the accelerated assignment. Note that two new variables _k0 and _k1 have been introduced, representing the initial value of the loop counter (_k0) and the number of loop iterations (_k1). After accelerating all variables, the loop in Figure 4c can be removed and replaced by the statements given in Figure 4d, lines 7 through 13. Here, lines 8 to 11 represent the effect of all the 34 iterations of the loop. Line 12 captures the time taken for evaluating the loop condition after the final iteration.


     1  for (i=1;
     2       TIC(20), i<=len;
     3       TIC(8), i++) {
     4    TIC(35);
     5    if(ans & 0x8000) {
     6      TIC(31);
     7      ans ^= 4129;
     8    } else {
     9      TIC(23);
    10      ans <<= 1;
    11    }
    12  }

(a) Branch inside loop

     1  int _k0, _k1;
     2  _k0 = 1;
     3  i = len + 1;
     4  _k1 = i - _k0;
     5  ans = nondet();
     6  TIC(_k1*(20+8+35)+_k1*31);
     7  TIC(20);
     8
     9
    10
    11
    12

(b) Abstracted loop

Fig. 5 Loop abstraction using LABMC


As the accelerated program of Fig. 4d is free of (most) loops, it is less complex than the instrumented program. For this example, the program size (as determined by CBMC) reduces from 325 steps for the instrumented program to only 60 steps for the accelerated program. Last but not least, note that a prior slicing is important for loop acceleration; if we had not sliced the instrumented program w.r.t. _time, then the above loop could not have been replaced, as the assignment to acc on line 4 of Fig. 4a cannot be accelerated due to the assignment to fir on line 5.

TIC implements acceleration as proposed in [14], mainly as it has been shown to be effective on SVCOMP benchmark C programs and industrial programs [6].

3.4.3 Stage 3: Loop Abstraction

Finally, further reduction of complexity at the cost of precision can be accomplished by abstraction, as described in Section 2.5. However, we tailor the abstractions by including some specific time-approximations that preserve the WCET estimate and reduce the complexity even further.

Abstraction is used to get rid of loops that could not be accelerated, such as the loop shown in Figure 5a: The if-condition on line 5 depends on the variable ans, which is updated both in the then-branch (line 7) and the else-branch (line 10). These assignments determine how many times the then- and else-branches would be executed. As these branches depend on values updated within the loop, we cannot determine the number of times this branch would be executed, and hence we cannot accelerate the loop.

The abstracted version of the loop in Figure 5a is shown in Figure 5b. First, we introduce variables _k0 and _k1 to capture the initial value of the loop counter (line 2) and the value of it upon loop termination (line 4). Then, the variable ans is assigned a non-deterministic value (line 5). This assignment, as explained in Section 2.2, allows ans to take any value in the entire range of its data type, int. While in the original program ans may take only a subset of int values, we now allow it to take on any int value, and thus have constructed an abstraction.

After this abstraction, the loop can now be accelerated as explained before. Line 6 contains the accelerated assignment to _time. In this, _k1*(20+8+35) accounts for the time increments in Figure 5a corresponding to the loop condition evaluation (line 2), the loop counter increment (line 3), and the first part of the loop body on line 4. Finally, line 7 captures the time taken for evaluating the loop condition after the final iteration.

Time Approximations. The trailing _k1*31 in Fig. 5b on line 6 summarizes the time taken by the if-statement forming the remainder of the loop body, but it is somewhat special. It contains a WCET-specific modification to the abstraction explained in Section 2.5: Since the values of ans have now been abstracted, we can no longer reason about which branch of the if-statement is taken in the WCET case. Therefore, when abstracting such undecidable conditional statements, we over-approximate the effects on our _time variable by only considering the longest branch (here: the then-branch in line 6 with 31 clock cycles). Note that this does not change the WCET estimate, because the model checker would have picked the longer branch anyway, since the possible values of ans include the case allowing the then-branch to be taken. It is thus a safe modification to the abstraction in the context of WCET analysis, which however reduces the complexity of the program further.

If a loop contains unstructured code, such as break and return statements, these are easily handled through a non-deterministic Boolean variable that allows these statements to be executed or not executed.
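For example, a break whose condition depends on data computed inside the loop can be over-approximated by letting the model checker decide freely whether it triggers; a sketch (nondet_bool is again an undefined function):

    _Bool nondet_bool(void);

    void loop_abstracted(int n) {
      int i;
      for (i = 0; i < n; i++) {
        /* original: if (ans == 0) break;  -- data-dependent condition */
        if (nondet_bool())       /* abstraction: the break may or may  */
          break;                 /* not be taken in any iteration      */
        /* ... abstracted loop body ... */
      }
    }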

To implement abstraction, we again used the LABMC tool [14]. However, the standard abstraction has been tailored for the counter variable _time, as explained above, to pick the maximum time increment from the branches.

3.4.4 Further Transformations

A number of other source transformations could be applied to further reduce the program complexity and yet retain the worst-case timing information. For example, after the back-annotation of timing, the counter variable could be summarized by merging increments belonging to the same source block. Or, live variable analysis and loop folding could be used to reduce the unwinding depth of a program [4]. However, all of this would make it harder to understand how source-level statements contribute to timing, and thus has not been investigated in the context of this work.

Furthermore, CBMC itself (more specifically, goto-instrument, a front-end of CBMC) features transformations that can be applied to the source program, such as k-induction for loops, constant propagation, and inlining. While our tool-chain supports using those CBMC features, none of them have proven effective in reducing the complexity or analysis time in our WCET framework. In particular, the loop acceleration included there takes longer than our entire WCET analysis and does not improve the analysis time thereafter, and "full slicing" even produces unsound results. Therefore, our source transformations introduced above are justified, because they have a proven impact on WCET analysis.

3.5 Adding a Driver Function

To run a model checker on the instrumented (sliced, accelerated, abstracted) source code, we add a driver function to the source that implements the following operations in sequence:

1. Initialize the counter variable _time to zero.
2. Initialize all input variables of the program to nondeterministic values according to their data type. This includes the handling of persistent variables as described in Section 3.3.
3. Call the function f for which the WCET shall be estimated.
4. Encode assertion properties to query for the WCET (details in Section 3.6).

The driver function is handed over as the entry point to the model checker.
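For the running example, such a driver could look as follows (a sketch following our naming conventions; 900 stands in for one arbitrary WCET candidate X):

    #include <assert.h>

    extern unsigned long _time;      /* counter variable from Fig. 4b     */
    int nondet_int(void);
    int task(int fir, int scl);      /* instrumented function under test  */

    void driver(void) {
      _time = 0;                     /* 1. reset the counter variable     */
      int fir = nondet_int();        /* 2. nondeterministic inputs (and   */
      int scl = nondet_int();        /*    persistent variables, if any)  */
      task(fir, scl);                /* 3. call the analyzed function     */
      assert(_time <= 900);          /* 4. query one WCET candidate X     */
    }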

3.6 Determining WCET

At this point, a model checker such as CBMC can verify whether the counter variable _time always carries a value less than X after the program terminates. In this section we explain how to choose candidates for X, such that we eventually approach the WCET.

A WCET candidate X is encoded as assert(_time <= X), and subsequently passed to the model checker. Unless the model checker runs into a timeout or out of memory, only two outcomes are possible:

1. Successfully verified, i.e., _time can never exceed X. Therefore, X is a valid upper bound for the WCET.

2. Verification failed, i.e., _time may exceed X in some executions. Therefore, X is a lower bound for the WCET. If a counterexample was generated, then it may contain a value Y > X, which then is a tighter lower bound for the WCET.

Our strategy is to use both outcomes to narrow down on the WCET value from both sides: Initially, we start with the lower bound e_lower as zero, and the upper bound e_upper as a very large value (intmax). We now place a number of assertions¹ in the program, where each is querying one WCET candidate X. In particular, the candidates are equidistantly spaced between e_lower and e_upper; except for the first step, where we use a logarithmic spacing to initially find the correct order of magnitude. Subsequently, we invoke the model checker to verify the assertions – in the case of CBMC, all at once. For each assertion we obtain a result. We set e_lower to the largest X for which the assertion failed (or, when a counterexample with Y > X was generated, to Y), and e_upper to the smallest X for which the assertion was successfully verified. The search is then repeated with the new bounds, and stopped when the upper and lower bounds are close enough to each other, which can be interpreted as a precision goal for the WCET estimate. The full algorithm is given in Algorithm 1.

¹ We have empirically chosen N_assert = 10; placing either more or fewer assertions usually takes longer, because either the computational effort grows, or more iterations are required.

Algorithm 1: Iterative search for WCET bound
Input: instrumented C source code C, required precision P
Output: WCET estimate eupper, s.t. eupper − elower < P
begin
    Nassert ← 10                     // number of asserts per call
    elower ← 0
    eupper ← intmax
    p ← eupper − elower
1   while p > P and not timeout do
        if eupper = intmax then
            candidates ← logspace(elower..eupper, Nassert)
        else
            candidates ← linspace(elower..eupper, Nassert)
2       C′ ← insert asserts for candidates into C
3       results ← model checker(C′)
        for i = 1 to Nassert do
            if verified(results[i]) then
                eupper ← min(eupper, candidates[i])
            else
4               B ← getCounterexample(results[i])
                elower ← max(elower, candidates[i], B)
5       p ← eupper − elower
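For illustration, the two candidate-spacing helpers of Algorithm 1 could be implemented as follows (a sketch in C; the actual search script is written in Python, see below, and only the helper names linspace/logspace are taken from Algorithm 1):

    #include <math.h>

    #define NASSERT 10  /* number of asserts per model checker call */

    /* Equidistant candidates strictly between lo and hi. */
    void linspace(long lo, long hi, long out[NASSERT]) {
        for (int i = 0; i < NASSERT; i++)
            out[i] = lo + (hi - lo) / (NASSERT + 1) * (i + 1);
    }

    /* Logarithmically spaced candidates, used in the first iteration
       to find the correct order of magnitude. */
    void logspace(long lo, long hi, long out[NASSERT]) {
        double llo = log10((double)(lo > 0 ? lo : 1));
        double lhi = log10((double)hi);
        for (int i = 0; i < NASSERT; i++)
            out[i] = (long)pow(10.0, llo + (lhi - llo) / (NASSERT + 1) * (i + 1));
    }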

At any point in time the model checker could terminate, due to either a user-defined timeout or running out of memory. In such cases the algorithm returns as WCET estimate the tightest upper bound eupper that could be verified so far. In combination with the precision goal P, this gives the user fine control over how much effort shall be spent on computing the WCET. For example, an imprecise but fast estimate may be sufficient during early development, whereas a precise analysis may only be of interest when the WCET estimate approaches a certain time budget.

The maximum number of search iterations can be determined in advance; in the worst case, the number of search iterations n is

    n = ⌈ log_Nassert ( (eupper − elower) / P ) ⌉ ,    (1)


where eupper = intmax and elower = 0, if no a-priori knowledge about the bounds of the WCET is available. Usually the number of iterations is lower, since the values found in the counterexamples speed up the convergence (point 4 in Alg. 1).
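For instance, starting from eupper = intmax ≈ 2.1·10⁹, elower = 0, a precision goal of P = 10,000 clock cycles and Nassert = 10, Equation (1) yields at most n = ⌈log₁₀(2.1·10⁵)⌉ = 6 iterations.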

Leveraging A-Priori Knowledge. In this article we assume that no information about the timing behavior of the program is available. If, however, the user already has some knowledge of the WCET bounds, then these bounds can be tightened by the algorithm, reducing the number of iterations. If an upper bound is known, then eupper can be initialized with that bound, and the algorithm tightens it up to the required precision. Similarly, if a lower bound is known from measuring the worst-case execution time on the target (e.g., from a high watermark), then elower can be set to that measured value.

Implementation. The WCET search procedure was implemented as a Python script. It further implements the monitoring of memory and CPU time as given in Table 4. As model checker we used CBMC with various solver backends. However, other model checkers, such as CPAchecker, could be used as alternatives with only minor changes.

3.6.1 Target-Specific Analysis

Since the analysis takes place at source level, it is essential to include target-specific information, such as word width, endianness, interrupts, I/O facilities, etc. If neglected, the behavior of the model in the analysis and that of the real target may differ, leading to unsafe results. Most importantly, we provide target-specific preprocessor definitions and the word widths with the corresponding flags that CBMC offers. For more details on how to include target-specific information for the model checker, we refer the reader to [4], where the specific pitfalls for CBMC are explained. We further employ checks during the WCET path reconstruction, which can identify missing or incorrect target information.

3.7 Determining BCET

Our tool set can also be used to compute the Best-Case Execution Time (BCET), with minor modifications. Such a lower bound of the execution time may be of interest for running a schedulability analysis of event-driven tasks, for example, interrupts. Schedulability analysis then requires a minimum inter-arrival time (MINT), i.e., the shortest possible time between two consecutive releases, to bound the processing load that is generated by the task. If the software has no inherent mechanism to limit the MINT of an event-driven task, then the BCET can be computed and used in place of the MINT.
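A plausible encoding of the dual query – our assumption, as the modifications are not spelled out here – flips the direction of the assertion: if the dual assertion is successfully verified, then X is a safe lower bound on the execution time, and the search converges on the BCET from below:

    assert(time >= X);   /* verified: the program never terminates in fewer than X cycles */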

4 Reconstructing the WCET Trace

From a software developer's point of view, the WCET value in itself (and its inputs) is of limited use. Without understanding the exact path and the decision variables leading to the WCET, counteractive measures are limited to a time-consuming trial-and-error approach. Instead, the following information would be of high value for comprehension and proactive mitigation of the WCET case:

1. The exact inputs leading to the WCET path,
2. a concrete walk-through of the worst-case path where all variables can be inspected, to identify the timing drivers, and
3. a timing profile of the worst-case path (how much time was spent where?).

Especially a detailed walk-through of the worst-case path is very important to understand why a certain path incurs a high timing cost. From our experience, it is not sufficient to know only the WCET path. Oftentimes the specific values of decision variables are important to understand why this path was taken, but such information is not provided by any tool that we know of. The user is therefore left with the mental puzzle of finding an explanation of how the presented path came to be, and at times there can be very subtle and unintuitive reasons. For example, in our prime benchmark, we found that an integer overflow in a loop condition caused the loop to terminate only after a multiple of its explicit bound. In turn, this overflow depends on the word width of the processor being used (more details about this case are given in Section 5.3). In such a case a developer would most likely struggle to explain why the loop did not obey its bounds, and thus not understand the WCET estimate. In summary, the WCET trace that we want to reconstruct shall be detailed enough to inspect not only the control flow and its timing, but also all data that is relevant for the specific path being taken. Towards that, we want to leverage the final counterexample from Alg. 1 to provide a maximum-detail explanation of how the WCET can be reached.

The challenge in reconstructing the WCET path from a counterexample is illustrated in Figure 6, for the same program from Fig. 4b introduced earlier. The goal is to annotate the corresponding control flow graph in Fig. 6a with execution counts at both its nodes (basic blocks in source code) and its edges (branches being taken). However, the counterexample produced by the model checker only contains sparse information, as shown in Fig. 6b. Typically, it only provides a subset of the visited code locations and the variable assignments that are relevant for branches being taken, and assignments to time oftentimes occur only at the end of the counterexample, depending on which solver backend was chosen. In fact, the counterexample is only guaranteed to provide exactly one value for the variable time, which is at the location where the WCET assertion is failing.


Fig. 6 Reconstruction problem of the WCET path for example code from Figure 4a: (a) the control flow graph, with execution counts and times of all basic blocks unknown; (b) typical output of the model checker, with total time = 2,769 but execution counts known only for some blocks and edges; (c) the fully reconstructed path, with time, count and total annotated at every basic block.

It is clear that the counterexample provided by the model checker is insufficient as an explanation for the WCET path, much less for the values of arbitrary variables. It may carry even less information than what is provided by an ILP-based approach, where execution counts for all basic blocks on the WCET path are available. Without such data, we can neither compute a timing profile in the presence of function calls, nor is it possible for the developer to understand or even walk through the WCET path.

Therefore, towards the reconstruction of the WCET path, we have to interpolate the control flow in between the locations given in the counterexample, and deduce variable valuations that are not available (most importantly, of the variable time). There are two fundamental approaches for reconstructing the path:

1. By Analysis: Use SMT, AI, or a similar technique to fill the "location gaps" of the counterexample, and to conclude about the assignments of all (possibly sliced) variables. This is expected to be computationally complex and not precise.
2. By Execution: Execute the code with the worst-case inputs. An "injection" of critical values beyond inputs is required, for example to all persistent variables. These values could be compiled into the application through look-up tables that contain the known assignments from the counterexample, or injected dynamically during execution, as sketched below.
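For illustration, the look-up-table variant could be realized as follows (a hypothetical sketch; the values and names are invented, not taken from an actual counterexample):

    /* Values for successive nondet_int() calls, extracted from the counterexample. */
    static const int nondet_int_values[] = { 42, -7, 1000 };
    static unsigned nondet_int_pos;

    /* Replay build: each call returns the next recorded value. */
    int nondet_int(void) {
        return nondet_int_values[nondet_int_pos++];
    }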

It should be clear that an execution is preferable in terms of precision and speed; however, there are some challenges in such an approach:

1. Using an Interpreter: While this is the most logical choice, only few good interpreters are available for the C language. Most of them have limited "stepping" support (e.g., cling uses JIT compilation, which means stepping would inline/flatten functions), and in general they are slow. There would be no support for target-specific code, such as direct pointer dereferencing. Often only a subset of the language is implemented.
2. Instrumented Execution: Compile and run the application, preferably on the target for maximum fidelity, while capturing an execution trace which subsequently could be replayed. Capturing a complete trace including variable valuations could produce a huge amount of data and incur memory and timing overhead. If the program under analysis is supposed to run on a different target (cross-compilation), then it might not be feasible to capture the trace, for reasons of memory limitations or missing interfaces.

We decided for an execution-based approach, since this is computationally less complex than analysis, and thus expected to be faster. The problem of insufficiently granular interpretation and trace capturing has been addressed as follows: Our path reconstruction is accomplished by executing the application in a debugger, whilst injecting variable valuations from the counterexample when necessary. By choosing a debugger as the replay framework, replaying the WCET has the potential to become intuitive to most developers, and thus could be seamlessly integrated into the software development process.


Fig. 7 Automatic WCET replay using a debugger: the replay engine controls the executable (compiled for host or target from the time-annotated C source, with C stubs for the nondet functions) through GDB/MI. It breaks at every nondet_() stub, looks up the corresponding value in the counterexample, sets it as the return value, and continues, until the program terminates and the WCET is reached.

Furthermore, debuggers are readily available for most targets, and some processor simulators can even be debugged like a real target (in our case, simulavr allows this). And, finally, word widths, endianness, etc., are bound to be correct, in contrast to any approach based on an interpreter. However, to use these advantages to full capacity, the replay process must have low overhead, and must be fully automatable so that it requires no additional inputs compared to traditional debugging.

4.1 Preparing Replay

Our proposed replay process is illustrated in Figure 7. We start by compiling the instrumented (or sliced, accelerated, abstracted) program with additional stubs (empty functions) for each "nondet" function (see Section 2.2), as sketched below. Then, we load the program into a debugger. Depending on which compiler was chosen, this could either be the host's debugger, or the one for the target (connecting to either an actual target or to a simulator). Since in all cases the replay process is similar, for the rest of this paper we do not distinguish between a host-based replay, a simulator, or a real target.
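The stubs are empty function bodies, one per nondet function referenced by the program (as shown in Fig. 7); their return values are never computed by the program itself, but are forced by the debugger at a breakpoint (Section 4.2):

    /* Empty stubs: the replay engine breaks here and sets the return value,
       so the missing return statement is never actually relied upon. */
    short nondet_short() {}
    int nondet_int() {}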

Finally, it should be mentioned that the replay path is based on the time instrumentation in the sources, which is why the path always exists, unless the model checker is handed wrong target-specific settings (e.g., word widths), or unless the analysis is unsound.

4.2 Injection of Nondeterminism

To inject the critical variables as found by the model checker, we set breakpoints on the inserted nondet stubs and start the execution. As soon as the debugger reaches one of our breakpoints (additional user breakpoints for stepping may exist and do not interfere), we automatically inject the correct value as follows: First, we query the current call stack to identify the caller, i.e., the source location of the nondet call. Knowing the location of the designated nondet assignment, the value that leads to the WCET is extracted from the counterexample, and subsequently forced as the return value in the debugger. As an effect, the call to the empty nondet stub returns the critical value suggested by the model checker. After that, the execution is resumed automatically.

However, there exists a second source of non-determinism besides the explicit "nondet" function calls. In the C language, every uninitialized variable that is neither static nor at file scope initially carries an unspecified value. Therefore, every such uninitialized local variable is considered by the model checker as a non-deterministic input, as well. Since no explicit "nondet" function calls exist for them, the breakpoints inserted earlier do not allow us to inject values into those variables. As a solution, we first identify all uninitialized local variables from the parse tree of the C source (declarations without right-hand sides, mallocs, etc.), and then insert additional breakpoints for every such variable that is mentioned in the counterexample. Through this, injecting values into uninitialized variables is handled in the same way as the nondet function calls (not shown in Fig. 7 for clarity).
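For example, in the following hypothetical fragment, the model checker treats x as a nondeterministic input although no nondet call appears in the code:

    int g(void) {
        int x;        /* uninitialized local: carries an unspecified value */
        if (x > 0)    /* the branch depends on x, so x shows up in the counterexample */
            return 1;
        return 0;
    }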

With this technique, the injection of the critical variables from the counterexample is accomplished without any user interaction, and without embedding the assignments in the program itself (no memory overhead). Furthermore, this live injection allows for additional checks during the execution, such as matching the assignment with the actual call location, and ensuring that the execution path does not deviate from the analysis.

4.3 Identification of WCET-Irrelevant Variables

The valuations of some variables do not have an effect on the control flow, and thus do not influence the timing². As explained before, such variables are identified and sliced away during the analysis phase. In particular, both our preprocessing (slicing, acceleration, abstraction) and the model checker itself remove such variables. Consequently, the counterexample does not include valuations for variables that have been found irrelevant for the WCET.

As an effect, any location with a non-deterministic assignment that is visited during replay and does not have an equivalent assignment in the counterexample indicates that the respective assignment is irrelevant for the WCET. We highlight such irrelevant statements to the developer, to help focus on the drivers of the worst-case timing, and not get distracted by surrounding code.

² Note that this only holds true for cache-less processors.


Table 2 Timing profile obtained from the WCET path of the adpcm decode benchmark

%total  total cycles  %self  self cycles  calls  self/call  total/call  name
100.0   69,673        35.3   24,593       1      24,593     69,673      decode
13.7    9,522         13.7   9,522        2      4,761      4,761       upzero
13.2    9,168         13.2   9,168        2      4,584      4,584       uppol2
12.9    8,984         12.9   8,984        2      4,492      4,492       filtez
9.3     6,472         9.3    6,472        2      3,236      3,236       uppol1
5.5     3,854         5.5    3,854        2      1,927      1,927       filtep
5.1     3,576         5.1    3,576        2      1,788      1,788       scalel
2.5     1,758         2.5    1,758        1      1,758      1,758       logscl
2.5     1,746         2.5    1,746        1      1,746      1,746       logsch


4.4 Collecting a Timing Profile

When larger programs are being analyzed, it may quickly become impractical to step through the whole path to find the main drivers of the worst-case execution time. To help the developer identify interesting segments that need to be stepped through in the debugger, we also generate a timing profile of the WCET path, showing which location was visited how often, and how much time was spent there.

Towards that, we capture the complete control flow on the fly during the replay. Since the timing profile is especially useful for larger programs, capturing the control flow during the debugger execution must scale well with growing path length, and thus cannot be realized by a slow step-by-step execution in the debugger. Instead, we set a hardware watchpoint on our counter variable. That is, every time this variable is modified, the debugger pauses execution and notifies the replay engine of the new variable content. Since hardware watchpoints are realized through exceptions, the debugger can run without polling and interruptions between the watchpoints, and therefore the control flow is captured with very little additional delay. Considering that the counter variable is incremented at least once per source block, the sequence of all reached watchpoints (their valuations and locations) represents the actual control flow in the source code. As a result, a timing profile similar to the outputs of the well-known tools gprof or callgrind can be reconstructed and intuitively used by the developer. Table 2 shows the resulting flat WCET timing profile for the adpcm decode benchmark. Note that in addition to the shown per-function metrics, the execution times are also available at the even finer granularity of source blocks, which helps pinpointing the timing bottlenecks to specific source blocks within the functions.
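For illustration, consider a time-instrumented fragment as sketched below (hypothetical code; the cycle constants stand for back-annotated block costs). Every write to time triggers the hardware watchpoint exactly once per executed source block, so the recorded sequence of locations and counter values yields both the path and its timing profile:

    extern unsigned long time;   /* instrumented counter, watched by the debugger */

    int max_of(const int a[], int n) {
        int max = 0;
        for (int i = 0; i < n; i++) {
            time += 21;          /* watchpoint fires: loop body entered (21 cycles) */
            if (a[i] > max) {
                time += 36;      /* watchpoint fires: then-block taken (36 cycles) */
                max = a[i];
            }
        }
        return max;
    }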

5 Experiments

We applied TIC to the Malardalen WCET Benchmark Suite [24] to evaluate the performance and the tightness of the WCET estimates computed with TIC. As a target, we used the Atmel ATmega 128 [36], for which WCET analyzers (Bound-T [31]) and simulators (simulavr) are freely available and can be used as a baseline for evaluating TIC. This target satisfies our requirements for WCET-amenable processors, since there is practically no timing dependence on the operands.

The complete set of experimental data (instrumented, sliced, accelerated and abstracted sources, as well as complete traces of the WCET search) is available at https://github.com/TRDDC-TUM/wcet-benchmarks.

5.1 Setup and Reference Data

Selected Benchmarks. We selected a representative subset consisting of 17 Malardalen WCET benchmarks, such that all program properties, e.g., multi-path flows, nested loops, arrays and bit operations, were covered, except for recursion and unstructured code (they cannot be handled by the basic block extractor, yet), and floating-point variables (they cannot be handled by the timing instrumentor, yet). However, these missing properties can in principle be addressed; this is not a limitation of our approach.

Host Platform. We conducted our experiments on a 64-bit machine with a 2.7 GHz Intel Xeon E5-2680 processor and 16 GB RAM, using CBMC 5.6 as model checker. As CBMC's backend we used a portfolio of solvers, consisting of minisat (built-in), mathsat (v5.3.10/GMP), cvc4 (v1.5pre/GLPK-cut), z3 (v4.5.1) and yices (v2.5.1/GMP). We stopped the analysis as soon as the first solver provided a WCET estimate. Solvers finishing within the same second were considered equally fast. All programs have been analyzed sequentially, to minimize interference between the analyses and with third-party background processes. The computational effort (CPU time, peak memory usage) is derived from the Linux system call wait3, and is thus expected to be accurate.

Bounding the Control Flow. In most cases our approach could bound the control flow automatically, and no manual input was required. However, in two benchmarks, namely bs and insertsort, CBMC could not find the bounds automatically, and we were not able to accelerate or abstract them either; thus bounds had to be provided. This occasional need for manual bounds is discussed in detail in Section 6.3.

In contrast, we frequently had to provide loop bounds for the ILP-based WCET analyzer. Since in an ILP approach the bounds cannot be verified, they have to be specified correctly and tightly in the first place. Towards this, whenever the ILP-based estimation required manual loop annotations, we have taken the deduced bounds from our approach and handed them to the ILP-based analyzer. Consequently, both techniques had similar preconditions for their WCET estimation.

Simulation Baseline. All benchmarks were also simulated with the cycle-accurate ISS simulavr, with the goal of having sanity checks for the WCET estimates, and also to get an impression of their tightness. Whenever possible, the simulation was performed with those inputs triggering the WCET. In the other cases, we used random simulations in an attempt to trigger the WCET, but naturally we cannot quantify how close to the actual WCET we have come (e.g., in nsichneu). Hence, the simulation results can only serve as a lower bound for the actual WCET (i.e., no estimate must be less) and as an upper bound for the tightness (i.e., the overestimation of the actual WCET is at most the difference to the simulation value).

5.2 Results

Tables 3 and 4 summarize our experiments. We evaluated our technique of WCET estimation for each source processing stage (Section 3.1) of the selected benchmarks, that is:

1. instrumented with execution times,
2. sliced w.r.t. timing,
3. loops accelerated, and
4. abstracted.

In Table 3 we compare the tightness of our WCET estimate with that of Bound-T, an ILP-based WCET analyzer. We computed the tightest possible WCET, i.e., precision P = 1, while allowing for a (relatively long) timeout of one hour, to show how our source transformations influence the tightness. Cases denoted with timeout are those where no solution was found within that time budget. Finally, the columns denoted as ∆ represent the difference between the respective WCET estimates and the simulation value, i.e., they give an upper bound of the tightness for both techniques.

Table 4 summarizes the computational effort of the estimation process for a practical time budget of one minute and a precision of 10,000 clock cycles. The table also quantifies the speedup we have achieved with our source transformation techniques. Again we denote timeout in cases where the WCET could either not be bounded within the time budget, or not up to the required precision. The column prog. size shows the number of program steps found by CBMC. Cases where the program size is not given (viz., in fir and prime) indicate a state-space explosion. That is, the model checker never finished constructing the state space before the timeout. The column iter. denotes the number of iterations of the search procedure (Algorithm 1), and the column time denotes the total time taken by the search procedure in seconds. Cases with a valid program size and a timeout (e.g., bsort100) indicate that the solver backend could not verify the given properties within 10 minutes.

Table 3 Tightest WCET estimates per method and benchmark (timeout 1 hour)

benchmark       observed (sim)  ILP WCET    ∆%       stage    MC WCET     ∆%
adpcm-decode    48,168          71,575      +48.6    instr.   69,673      +44.6
                                                     sliced   69,673      +44.6
                                                     accel.   69,673      +44.6
                                                     abstr.   69,673      +44.6
adpcm-encode    72,638          113,154     +55.8    instr.   110,901     +52.7
                                                     sliced   110,901     +52.7
                                                     accel.   110,901     +52.7
                                                     abstr.   110,901     +52.7
bs              401             496         +23.7    instr.   410         ≈0
bsort100        788,766         1,553,661   +97.0    instr.   timeout     –
                                                     abstr.   797,598     +1.1
cnt             8,502           8,564       ≈0       instr.   8,564       ≈0
                                                     sliced   8,564       ≈0
                                                     abstr.   8,564       ≈0
crc             129,470         143,137     +10.6    instr.   130,114     ≈0
                                                     sliced   130,114     ≈0
                                                     accel.   130,114     ≈0
                                                     abstr.   143,426     +10.8
fdct            17,500          17,504      ≈0       instr.   17,504      ≈0
                                                     sliced   17,504      ≈0
                                                     accel.   17,504      ≈0
fibcall         1,777           1,781       ≈0       instr.   1,780       ≈0
                                                     sliced   1,780       ≈0
                                                     accel.   1,780       ≈0
fir             5,204,167       5,690,524   +9.3     instr.   timeout     –
                                                     sliced   timeout     –
                                                     accel.   5,476,023   +5.2
insertsort      5,472           5,476       ≈0       instr.   5,476       ≈0
jfdctint        14,050          14,054      ≈0       instr.   14,054      ≈0
                                                     sliced   14,054      ≈0
                                                     accel.   14,054      ≈0
matmult         1,010,390       1,010,394   ≈0       instr.   1,010,394   ≈0
                                                     sliced   1,010,394   ≈0
                                                     accel.   1,010,394   ≈0
ndes            459,967         470,499     ≈0       instr.   timeout     –
                                                     sliced   timeout     –
                                                     abstr.   465,459     ≈0
ns              56,409          56,450      ≈0       instr.   56,413      ≈0
                                                     abstr.   56,450      ≈0
nsichneu        33,199          timeout     –        instr.   timeout     –
                                                     abstr.   75,369      +127.0
prime           27,702,943      30,343,092  +9.5     instr.   timeout     –
                                                     sliced   timeout     –
                                                     abstr.   30,146,785  +8.8
ud              35,753          93,487      +161.5   instr.   38,992      +9.1
                                                     sliced   38,992      +9.1
                                                     accel.   38,992      +9.1

Stages: instr. = instrumented, sliced, accel. = accelerated, abstr. = abstracted; ∆ = upper bound for tightness


Finally, some benchmarks in Tables 3 and 4 lack a sliced, accelerated or abstracted version. This occurs when the respective source transformation technique did not result in any changes to the source code. For example, bsort100 remained unmodified after slicing and acceleration, and thus does not have a dedicated sliced or accelerated version.

5.3 WCET Path Reconstruction

We were able to reconstruct and replay the WCET path for all benchmarks. The time taken for the debugger-based replay is in the range of a few seconds in all cases.


Table 4 Complexity and computational effort of the WCET search for a precision of 10,000 clock cycles and a timeout of 1 min

benchmark       stage    fastest solver(s)  iter.  time [s]  mem [MB]  prog. size
adpcm-decode    instr.   C                  3      7.4       356       3,537
                sliced   C                  3      2.8       120       2,416
                accel.   C                  3      2.9       121       2,134
                abstr.   C                  3      2.5       123       1,676
adpcm-encode    instr.   C                  –      timeout   15,853    4,911
                sliced   C                  3      10.4      168       4,131
                accel.   C                  3      8.5       173       3,986
                abstr.   C                  3      2.6       136       1,846
bs              instr.   A, B, C, D, E      1      0.2       117       303
bsort100        instr.   –                  –      timeout   13,259    1.57·10⁵
                abstr.   A, C               5      2.3       114       5,833
cnt             instr.   C                  3      22.1      102       3,238
                sliced   C                  3      23.8      100       2,212
                abstr.   A, B, C, D, E      3      0.3       104       373
crc             instr.   A                  3      4.6       407       41,398
                sliced   C, D               3      4.7       367       40,648
                accel.   A, C, D            3      4.1       268       39,812
                abstr.   A, B, C, D, E      3      1.2       103       11,142
fdct            instr.   A, B, C, D, E      3      0.6       101       990
                sliced   A, B, C, D, E      3      0.3       96        183
                accel.   A, B, C, D, E      3      0.3       97        99
fibcall         instr.   A, B, C, D, E      1      0.1       96        592
                sliced   A, B, C, D, E      1      0.1       96        496
                accel.   A, B, C, D, E      1      0.1       96        79
fir             instr.   –                  –      timeout   6,718     –
                sliced   –                  –      timeout   16,322    –
                accel.   A                  5      18.5      1,574     50,488
insertsort      instr.   C                  1      2.3       127       1,663
jfdctint        instr.   A, B, C, D, E      3      0.5       101       787
                sliced   A, B, C, D, E      3      0.3       96        168
                accel.   A, B, C, D, E      3      0.7       98        100
matmult         instr.   B                  5      22.7      773       70,769
                sliced   A                  5      6.0       661       62,367
                accel.   A, B, C, D, E      5      1.6       101       9,638
ndes            instr.   –                  –      timeout   40,867    75,932
                sliced   –                  –      timeout   40,877    75,797
                abstr.   A, C               2      0.9       167       5,727
ns              instr.   C                  3      28.3      827       22,399
                abstr.   C                  3      6.5       200       8,272
nsichneu        instr.   –                  –      timeout   25,491    23,244
                abstr.   A, B, C, D, E      3      0.5       120       85
prime           instr.   –                  –      timeout   22,052    –
                sliced   –                  –      timeout   23,532    –
                abstr.   A, B, D, E         5      0.7       99        158
ud              instr.   D                  3      1.2       101       2,382
                sliced   A, B, C, D, E      3      0.4       98        1,881
                accel.   A, B, C, D, E      3      0.4       99        1,380

Stages: instr. = instrumented, sliced, accel. = accelerated, abstr. = abstracted
Solvers: A = minisat, B = mathsat, C = yices, D = z3, E = cvc4

For better usability, we enabled our replay engine to output the trace as a CBMC-like counterexample, additionally augmented with all assignments to time. Recall that the complete information about the variable time implicitly carries the control flow, because each block in the source code increments this variable. As an effect, the graphical user interface can load this augmented counterexample, map it back to the control flow graph, and compute a timing profile at the granularity of blocks or functions.

Identifying a Timing Hotspot. As an example, we show the resulting annotated control flow graph for the ud benchmark in Figure 8. There, we have applied a heat map over the timing, where red marks the timing hotspots. At first glance, it is apparent that there exists one source block forming a timing bottleneck, which consumes about one third of the overall execution time.

The ud program performs a simple form of LU decomposition of a matrix A, and subsequently solves an equation system Ax = b. The timing bottleneck in this program occurs in the LU decomposition, but interestingly not at the innermost or most frequently executed loop, but at a location where a division occurs. With this, the reconstruction of the WCET path made it easy to spot that, for the chosen target, a long division incurs a high execution cost (see also the inline annotations of TIC), and that this is the single driver for the WCET of this program.

Discovering a Bug. The replay of the WCET path can also reveal defects in the program. Such a defect has been found in the prime benchmark on a (simulated) 32-bit target, where initially the WCET estimate just seemed suspiciously high. After reconstructing the WCET path, an unexpected implausibility showed up. The path was indicating that the following loop had been executed 349,865 times:

for (uint i = 3; i * i <= n; i += 2) {
    if (divides(i, n)) return 0;
    ...
}

where n is a formal parameter of the surrounding function that shall be checked for being a prime number. At first glance it seems like the loop can only execute 65,536 times: the maximum 32-bit unsigned integer n that the user can provide is 4,294,967,295, therefore the loop must execute at most ⌈√4,294,967,295⌉ = 65,536 times to evaluate whether that number is prime. Thus, we replayed the WCET path in our debugger, setting a watchpoint on i. It turned out that the expression i*i overflows for some specific values of n > 4,294,836,225, since such values require at least i ≥ 65,536, which, when squared, lies beyond the maximum representable unsigned integer and thus overflows. Specifically, this happens only for those numbers which are also not divisible by anything less than 65,536, which would be hard to replicate if only the path and no values were available. As an effect, the loop keeps running until the 32-bit modulo of i*i finally exceeds n, at which point the loop terminates (which luckily is always the case). Clearly, this is a defect in the context of this program, resulting in a WCET that was orders of magnitude higher than it should have been³. Thanks to the ability of interactively replaying the WCET path and inspecting variable values, such defects can be easily spotted.
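The wrap-around can be reproduced in isolation (assuming a 32-bit unsigned int, as on the simulated target):

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
        uint32_t i = 65537;     /* first odd candidate whose square no longer fits */
        uint32_t sq = i * i;    /* 4,295,098,369 wraps around to 131,073 */
        assert(sq == 131073u);  /* so i*i <= n may stay true and the loop keeps running */
        return 0;
    }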

Top Ten Longest Paths. In principle, our approach could also provide the "top 10" longest paths in the program. This could be done by repeating the overall WCET estimation, while excluding already known paths from the solution, and decreasing the WCET proposal to the model checker if no counterexample can be found.

³ Note that this code works correctly for 16-bit targets, as ∀n ∈ uint16, ∃i ≤ 255, s.t. divides(i, n).


Fig. 8 WCET path reconstruction for benchmark ud with identified timing hotspot

However, this would become an explicit enumeration of paths, which is typically exponential in the size of the program [38], and would still leave us with the question of what the program's overall probability distribution looks like (unless we repeat our estimation until every possible path length has been found). We did not investigate this further, as it clearly would not scale well with program size.

6 Discussion

Using our approach, the WCET could be estimated for all benchmarks within a few seconds, including a timing profile and a path replay of the WCET case. Through that, we not only provide an estimate with minimal user inputs, but we also generate useful insights into the WCET path, enabling a true "timing debugging". As we elaborate on the following pages, source transformation techniques had a noticeable impact on the scalability of Model Checking, and thus paved the way for using Model Checking in WCET analysis. As a result, the estimates are comparable to those of an ILP-based technique, but come with less effort and more feedback for the user.

6.1 Tightness

The tightness of TIC's WCET estimate can be expected to be almost exact when Model Checking is allowed to explore all paths (i.e., no abstraction applied), and to become less tight when abstractions are applied, comparable to an ILP-based solver where only loops are bounded with no further flow analysis. In fact, Table 3 shows that our WCET estimates are often even tighter than those of an ILP-based approach.

For the adpcm benchmarks, there is still a lot of room for improvement. The computed WCET, even on the original version, is likely a large overestimation, as suggested by the simulation. The reasons for this are discussed in the following.


6.1.1 Intrinsic Sources of Overapproximation

During the back translation of timing information from machine instructions to source code, overapproximations have been used as described earlier. We argue that these overapproximations are common even in the existing ILP-based techniques. During back translation, wherever there is a difference between the source blocks and the basic blocks in the machine instructions, we over-approximate that part of the machine instructions into one block. However, these over-approximations are usually small, since the differences are often formed by few and small basic blocks, amounting to only a few clock cycles per iteration.

Without any abstractions, TIC exhaustively explores all feasible paths in the (instrumented) source code, and therefore makes no additional overapproximations. Thus, when identifying the longest path, TIC will never be worse than an ILP-based path search. In fact, the ILP solver could only perform better if the control flow could be bounded exactly and then encoded in the ILP formulation.

A typical case is that of crc. When no abstractions were applied, our estimate is around 10% tighter than that of Bound-T (130,114 vs. 143,137), which we tracked down to infeasible paths being considered by Bound-T: In crc, there exists a loop containing an if-else statement, where the if-part takes more time than the else-part. Bound-T, not analyzing the data flow and the dependency between these two parts, always assumes the longer if-branch is taken. However, this is only feasible 70% of the time, whereas in the other 30% the shorter else-branch is taken. Without additional user annotations or a flow analysis, ILP cannot discover this dependency and thus must overestimate the WCET.

Consequently, TIC performs better than an ILP-based approach when abstractions are left aside, provided the program complexity has been reduced enough by slicing and acceleration. For the remaining cases, where abstraction was necessary to make Model Checking scale, we now must discuss the overestimation caused thereby.

6.1.2 Overestimation due to Abstraction

Abstraction overapproximates the control and data flows, trading analysis time for tightness. When applying loop abstraction, the abstracted code forces the model checker to pick times along the branch with the longest times, cutting out shorter branches. Thus, we compute the WCET assuming that every branch inside the loop always takes the worst local choice, which may be infeasible. Naturally, this leads to an overestimation of the WCET. However, ILP-based analyzers usually over-approximate with similar strategies when bounding the control flow. The result is a WCET estimate comparable to that of an ILP-based analyzer.

Again, consider crc as a typical case. When no abstractions were applied, our estimate was around 10% tighter than that of Bound-T. When applying abstractions, the estimate became very close to the estimate of Bound-T, whereas the complexity was cut down to approximately 25%.

Surprisingly, in some benchmarks, namely adpcm-decode and -encode, cnt, fdct, jfdctint and ud, the loop abstraction did not lead to a higher WCET estimate. This is because in all the loops with branches in these programs, (a) either there is an input to the program that forces the branch with the highest time to be taken in all iterations of the loop, or (b) there is a break or return statement that is taken in the last iteration of the loop. These cases match the exact pessimistic loop abstraction. In short, these loops do exhibit the pessimistic worst-case timing behavior.

An extreme overestimation due to abstraction seems likely (reminder: the simulation is not guaranteed to contain the worst case) for nsichneu, where the estimate is 127% higher than the observed WCET. This benchmark has only one loop with two iterations, yet its estimate is far away from the observed WCET. Upon inspection, we found that this loop has 256 if-statements in a single sequence, many of which do not get executed in every iteration. However, our loop abstraction pessimistically assumes that all 256 if-statements are executed in each iteration, which explains the overestimation in this case. Note that this benchmark could not be solved with Bound-T at all.

Consequently, TIC performs close to, and often slightly better than, an ILP-based approach when abstractions are used. However, this result depends on the way control flows are bounded before the ILP solver is called, and thus it might not hold true when AI and ILP are combined.

6.2 Reduction of Computational Effort

Our claim was that the scalability issues of Model Checking could be mitigated with appropriate source preprocessing techniques, which we expected to be particularly effective at source code level. The experimental data clearly confirms that claim. In all cases, the analysis time – usually in the range of minutes up to unsolvable size for the original version – could be reduced to an acceptable time of at most a few seconds, which makes TIC an interesting alternative to existing WCET estimation techniques.

However, taking the analysis time (computational effort) as a measure of the computational complexity can be misleading. The time to solve the WCET problem in our approach consists of two fundamental parts: 1. building the SAT/SMT model, and 2. solving the model. Whereas the time to build the model is often proportional to the size of the program (number of steps as found by CBMC), the time for solving the model cannot be predicted trivially by looking at program metrics. In particular, we have found no correlation between the analysis time and any of program size, cyclomatic complexity, number of statements, or number of loops.


The reason for this is that the analysis time of a SAT or SMT problem can vary significantly between two problems of the same size, or between two solvers on the same problem, due to the nature of modern solver algorithms [15]. For instance, compare the instrumented adpcm-encode (program size 4,911 steps, timeout after 1 minute) with the abstracted ndes (program size 5,727 steps, solved in less than one second).

Therefore, we used a portfolio of solvers to reduce the effect of solver-specific strengths and weaknesses, and we consider the program size (last column in Table 4) as the prime indicator for the complexity of the program. With this, we show in the following that the complexity of a program can be significantly reduced with our techniques. Nevertheless, note that the solving time also points towards our interpretation. In all benchmarks we were able to greatly reduce the computational effort with each source processing stage. In particular, several benchmarks could not be solved in a reasonable amount of time without our proposed processing, viz., adpcm-encode, bsort100, fir, ndes, nsichneu and prime. The memory usage suggests that a state-space explosion prevents building the model. Their program size could be reduced to a tractable size with acceleration and abstraction. Moreover, the benchmark nsichneu could not even be processed with Bound-T, since the bounding of the control flow failed because of too many variables and flows (out of memory after running for several hours). After applying our abstraction, the WCET of this benchmark could be successfully estimated with Model Checking, within one second of analysis time.

The overall impact of the source transformations is quantified in Figure 9, where the program size after the respective processing stage is compared to the instrumented program, over all benchmarks. It can be seen that on average each additional processing stage reduces the program complexity; on average we reach 78% of the original complexity after slicing, 63% after acceleration, and 22% after abstraction. Furthermore, it can be seen that in the worst case slicing and acceleration have no effect, whereas abstraction has a much more consistent impact on the complexity.

Note that the numbers in Figure 9 exclude those benchmarks where a timeout occurred for the instrumented version, i.e., the original program without our source processing. Here, the program size could not be determined. For each of those benchmarks, we additionally spent 24 hours of processing to determine the program size, but without success.

As explained in Section 3.6, a further reduction of the analysis time can be reached if a lower bound for the WCET is already known. This is often the case for safety-critical systems, where run-time measurements (like high watermarks) are common practice. The user would initialize the search algorithm with an observed WCET, thereby reducing the analysis time. However, this does not change the structure of the model under analysis, and therefore is not considered a complexity reduction.

Fig. 9 Box plot showing the reduction of program complexity after the different source transformation stages (sliced, accelerated, abstracted), relative to the original program. Whiskers denote the min/max values, diamonds the average value.


6.3 Safety and Usability

Most programs can be analyzed automatically, without any user input. However, two programs, namely bs and insertsort, could not be bounded automatically. The loop bounds could not be deduced by CBMC, which shows up as an infinite loop unwinding phase. In such cases, the user has to specify an unwinding depth at which the loop unwinding stops. CBMC then places an assertion at this point, which proves that the given depth is sufficient. In case of an insufficient bound, the model checker identifies this as a failed property. In case of a too generous bound, the unwinding may take longer, but the WCET is not overestimated. Therefore, unsound or too imprecise flow constraints do not refute (or even worsen) the WCET estimate, which makes our approach safer than others.

Another contribution to the safety of our toolchain are the run-time checks introduced during the WCET replay. As discussed earlier, a wide range of problems can be discovered by this, including the confounding of input data and the failure to specify the details of the target architecture.

Regarding the usability of our toolchain, we argue that the results of a source-level WCET analysis, especially the possibility to replay the worst-case path, should increase the level of acceptance of WCET analysis in a practical setting. As opposed to machine code or assembly, developers can be expected to be proficient in understanding source code, for its higher level of abstraction and higher readability. By offering the possibility to replay the WCET path in an ordinary-looking debugging session – which can be controlled in the usual ways of stepping, inspecting and resuming – it becomes intuitive to walk through the path and identify the drivers of the WCET. This is also supported by generating a profiler-like timing profile of the worst-case path, which is another well-known view of a program, and a quick entry point for WCET inspection. A developer can identify those parts of the WCET path that need an in-depth inspection, and subsequently dive into debugging.

In summary, our approach is inherently safer than existing approaches due to its protection against unsound user input, and it ensures that developers need no additional tools or training to analyze and understand the WCET path of a program.

6.4 Correctness

The WCET estimate computed by TIC will always be an upper bound of the actual WCET. This is easy to see, as in each step of TIC we either over-approximate or preserve the timing information of the machine instructions. In step 1, while the execution time computed for each basic block in TIC (Section 3.1) is precise, the back-annotation done in Section 3.2 over-approximates the time in the case where multiple basic blocks map to one source line. Therefore, at the end of this step, the WCET of the machine instructions is either preserved or over-approximated in the source.

The slicing removes only those statements that do not impact the values of the counter variable, thus preserving the WCET as per the instrumented program.

In the next stage, acceleration preserves the computation of the counter variable, and abstraction potentially over-approximates it. So, at the end of this step, too, the WCET is either preserved or over-approximated. Finally, the iterative search procedure of Algorithm 1 terminates only upon finding an upper bound of the WCET as per the accelerated or abstracted program. Thus, if TIC scales for the input program, it provides a safe upper bound for the WCET.

6.5 Threats to Validity

Microarchitectural Features. We presented our results for a simple microcontroller. Modern processors, such as ARM SoCs, have architectural features like caches, pipelines, and memory-mapped I/O. As shown in [11], some of these features can be modeled in C in the form of appropriate constraints. However, adding such constraints would increase the complexity of the C code. Thus, while TIC can be extended to handle these features, the additional constraints may hinder its scalability. In such a situation, we can eliminate infeasible paths using Model Checking (as in [11]) and reduce the search space of the model checker (as in [29]) while applying our technique. Furthermore, when the instruction timing depends on the value of operands, a register value analysis becomes necessary. For example, to determine the timing of a load instruction, the specific address decides whether this is cached, or whether it becomes a slow bus access, possibly with wait states, or a fast access to a core-coupled memory. At the moment, we have not addressed how to carry over such a value analysis to the source code level.

Back-Annotation of Timing. The mapping of temporal information from machine instructions to source level is a rather technical problem. A compiler could be built which enables complete traceability in at least one direction. Plans for such a compiler have been made in [51]; unfortunately, we are not aware of any implementation, which leaves us with the task of matching instructions to source code. Without any mapping information, this poses an optimal inexact graph matching problem. Specifically, when compiler optimization is off, we expect this to boil down to a graph minor test (where the edit sequence represents our wanted mapping and the source graph is fixed), known to have a polynomial complexity in the number of nodes [53]. With optimization on, however, it is unclear how the matching could be established in general. We therefore have to rely on some mapping information from the compilation process (as given by addr2line in our toolchain), and apply techniques to complete the mapping.

Our mapping strategy assumes that the program is compiled using gcc 4.8.1 for the 8-bit AVR microprocessor family [36], with standard compilation options. If we change the target-compiler pair or the compilation options, then our back-mapping strategy may not work. Since it was established on a case-by-case basis, our strategy might not be complete, but it was extensive enough to cover all cases in the Malardalen benchmarks. In general, compiler transformations are a common problem for all WCET techniques. And, to the best of our knowledge, there has been no generic solution to this problem, except to provide support for each architecture, compiler and transformation individually, often in the form of pattern matching [59].

Compiler-Inserted Code. The compiler may insert low-level procedures, such as loops for arithmetic shifts or code for soft-floating-point operations. These are not covered by our current tool. We believe that this is only a matter of tooling. If the source code for such computations is not available, then TIC requires a pre-computed library of the WCETs of these low-level functions, to facilitate the back-mapping for the low-level loops.

7 Related Work

WCET Analysis. An excellent survey of techniques and tools for WCET analysis was published by Wilhelm et al. [59]. We refer the reader to this article for a profound overview of the topic. A commonly used set of benchmark programs for WCET analysis are the Malardalen WCET benchmarks [24].

Model Checking for WCET Analysis. There have been studies highlighting the inadequacy [58,40] of Model Checking for the task of WCET computation, concluding that it does not scale for WCET estimation. Experiments in [40] confirmed that model checkers often face run-time or memory infeasibility for complex programs, whereas the ILP technique can compute the WCET almost instantly. However, because they used the same benchmarks that we are using here (on a very similar processor) and we come to the opposite conclusion, this confirms that our shift of the analysis to the source code indeed mitigates the scalability issue.

Further, there have been instances where a model checker was used as a sub-analysis or optimization step in WCET estimation. Chattopadhyay et al. [11] propose the use of AI for cache analysis and Model Checking for pruning infeasible paths considered by AI. Marref et al. [41] show the automatic derivation of exact program flow constraints for WCET computation, using Model Checking for hypothesis validation. Another proponent of Model Checking in WCET analysis is found in Metzner [43], who has shown that it can improve cache analysis, while not suffering from numerical instabilities, unlike ILP. However, none of these approaches addressed the scalability issue of Model Checking.

WCET Analysis at Higher Levels of Abstraction. One of the first works proposing WCET analysis at source level was published by Puschner [50]. He introduced the notion of timing schemata, where each source-level construct is assigned a constant execution time. Many constraints were imposed on the programming language, as well as annotation constructs to aid the analysis. A similar approach was described shortly after that in [46]. In fact, the processors they used were very comparable to the one used in our experiments. However, in their case, all loops must be statically bounded, and overestimation is a direct result of assigning a constant execution time to source-level constructs (since the compiler may translate the same construct very differently, depending on the specific variable types and values).

Holsti [30] proposed modeling the execution time as a global variable, and using dependency and value analyses (Presburger arithmetic) to determine the value of the variable at the end of a function, meanwhile excluding infeasible paths. Similar to our approach, slicing and loop acceleration were suggested. However, he showed by example that only some infeasible paths can be excluded by his method, whereas we can precisely detect all infeasible paths. Furthermore, his approach seems to have scalability issues, which unfortunately are not detailed further due to a lacking set of experiments. Kim et al. [32] experimented with computing the WCET using Model Checking at source level on small and simple programs, but without addressing the scalability issues, or providing experimental data.

Puschner [49] later computed the WCET on an AST-like representation of a program, where flow constraints are expected to be given by the user, and assuming that the compiler performs the back-mapping of actual execution times to the source code. He used ILP to compute the longest path. It should be noted that any ILP-based approach cannot handle variable execution times of basic blocks without large over-approximation, whereas Model Checking can encode such properties with non-determinism and range constraints. Furthermore, complete path reconstruction is not possible with that approach either.

WCET analysis has also been proposed at an intermediate level between source code and machine code, similarly because of the easier analysis of data and control flow. Altenbernd et al. [2] developed an approximation of the WCET which works on an ALF representation of the program, without analyzing the executable. They automatically identified a timing model of the intermediate instruction set through the execution and measurement of training programs. As an effect, the analysis is very efficient, but the result is a possibly unsafe WCET estimate.

Program Slicing, Acceleration and Abstraction. Hatcliff [27] was the first to suggest the use of program slicing to help scale up Model Checking. In this work, we have built on the acceleration and abstraction capabilities of LABMC [14]. Different abstractions for improving the precision of WCET computation or determining loop bounds have been explored by other researchers. Ermedahl et al. [19] show a precision improvement in WCET computations by clustering basic blocks in a program. Knoop et al. [34] use recurrence relations to determine loop bounds in programs. Blazy et al. [7] use program slicing and loop bound calculation techniques to formally verify computed loop bounds. Cerny et al. [10] apply a segment abstraction technique to improve the precision of WCET computations. While these abstractions could be used in some situations, in general they are either too restrictive because they do not work for a large class of programs, or they fail to address the scalability issue arising in the context of using a model checker to compute the WCET. Al-Bataineh et al. [1] use a precise acceleration of timed-automata models (with cyclic behavior) for WCET computation, to scale up their IPET technique. However, these ideas are not readily applicable to loop acceleration in C programs in the absence of suitable abstractions.

Timing Debugging. Understanding how the worst-case timing is produced is very important for practitioners, but there is only a small body of work on this topic. Reconstructing the inputs leading to the WCET has been done before by Ermedahl [18], and it is perhaps the closest work with respect to the degree of detail that is provided on the WCET path. Our approach, however, uses entirely different methods. While Ermedahl applies a mixture of static analysis and measurements to perform a systematic search over the value space with WCET estimation, and only provides the inputs to the WCET path, our approach leverages the output of the model checker that witnesses the WCET estimate, performs only a single run of the application, and reconstructs the WCET inputs, as well as the precise path being taken, together with a timing profile. It is thus less expensive and better integrated with the actual WCET estimation.

Making the worst-case (and best-case) timing visible in the source code is a rather old idea, which first appeared around 1995 [35]. The authors highlighted the WCET/BCET paths in the source code, allowing the user to visually see the WCET path and how time passes on that path. This work was later extended in [60] to apply genetic algorithms for minimizing the WCET, and still later to introduce compiler transformations that reduce the WCET (this is, however, beyond the scope of this article). A similar, but interactive tool was presented in [26]. This was also a tree-based approach, for it was focusing on analysis speed instead of precision. All these approaches did not provide tight results, as they were based on timing trees which did not consider data dependencies; they can only reconstruct the WCET path w.r.t. locations and the time spent there, but lack a specific trace including variable values. Furthermore, trivial loop bounds had to be given manually by the user.

The commercial WCET tool RapiTime [5] estimates the timing of software, and enables the user to identify timing bottlenecks at source-code level. For that, the tool provides a simple path highlighting, but also allows predicting what would happen if a specific function were optimized. However, the tool is measurement-based and thus cannot give any guarantees. The commercial tool AbsInt [20] provides timing debugging capabilities at the assembly level, e.g., a call graph and a timing profile. It is also capable of mapping this timing information back to a high-level model, where the building blocks of the model are annotated with their contribution to the WCET. Further, the tool allows making changes to the model and comparing the results with the previous ones, essentially enabling a trial-and-error approach to improving the WCET. Finally, a similar tool set that realizes a generic formal interface between a modeling tool and a timing analysis tool has been presented in [21], but it relies on external tools for the actual analysis. All of these tools enable timing debugging to some extent, but they cannot provide a specific trace with values, or even allow the user to interactively step through it. In contrast, our WCET path has the maximum level of detail, and can be inspected in a well-known debugger environment, thus offering a deeper and more intuitive explanation of how the WCET path came to be.

Possible Enhancements. One approach that could be combined with ours to further speed up Model Checking is that of Henry et al. [29]. They also employ an SMT solver to compute the WCET (just like our back-end), but they propose additional constraints to have the solver ignore infeasible paths. This helps to further increase the scalability, but under the assumption that the programs are loop-free, or that loops have been unrolled. It is therefore an enhancement that fits our approach well. Brandner et al. [8] compute time-criticality information for all parts of a program, to help focus optimization on paths that are close to the WCET, and not only on those on it. However, this work only provides a relative ranking of all code blocks (not absolute numbers on time consumption), and requires an external WCET analyzer that annotates code blocks with individual WCETs. It can therefore be viewed as another possible extension of our work.

8 Concluding Remarks

We have shown that Model Checking can be a competitive approach to Worst-Case Execution Time analysis, in particular when an analysis-friendly processor is used. The estimates are comparable to, and sometimes even more precise than, estimates of the widely used ILP technique, but come with several practical advantages. In addition to a precise estimate, we also reconstruct and replay the WCET path, while providing profiling data and a well-known debugger interface, allowing the user to inspect arbitrary details of the WCET path at both the source code and the machine code level. Although our approach does not entirely remove the need for manual inputs from the user, WCET analysis is no longer prone to human error stemming from them, because the model checker also verifies whether such inputs are sound. If bounds that are too small are given, an error is flagged. Bounds that are too large, on the other hand, only influence the analysis time, but not the outcome. In summary, we therefore arrive at a safer WCET analysis and a more intuitive understanding of the outcome.
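
As a minimal sketch of why unsound manual inputs are caught, consider turning a user-supplied loop bound into a property that the model checker must prove rather than trust. The constant name, the input range, and the instrumentation below are illustrative assumptions, not our tool's actual interface.

```c
#include <assert.h>

void __CPROVER_assume(_Bool condition);  /* CBMC-style assumption */

#define USER_BOUND 64u   /* manual input: claimed maximum loop iterations */

void scale(int *x, unsigned len)
{
    __CPROVER_assume(len <= 100);        /* documented input range (assumed) */
    unsigned iter = 0;
    for (unsigned i = 0; i < len; i++) {
        iter++;
        x[i] /= 2;
    }
    /* Checked, not trusted: with len up to 100 this claim is violated,
     * so the model checker returns a counterexample (e.g., len = 65)
     * instead of silently producing an unsafe WCET estimate. */
    assert(iter <= USER_BOUND);
}
```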

An essential part of our approach is the shift of the analysis from machine instructions to the source code. Through this, data and control flows can be tracked more precisely, and source code transformation techniques can be applied to summarize loops, to remove statements not related to timing, and to over-approximate larger programs. As a result, the analysis time can be reduced significantly, making Model Checking a viable approach to the Worst-Case Execution Time problem.
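
To illustrate the source-level setting, here is a minimal sketch of the kind of instrumentation that makes execution time a program variable (cf. Holsti [30]): each source block is charged an assumed cycle cost, and a candidate bound is posed as an assertion for the model checker. The costs and the bound below are placeholders, not measured values.

```c
#include <assert.h>

static unsigned long cycles = 0;          /* execution time as a variable */
#define TICK(c) (cycles += (c))

static int abs_val(int v)
{
    TICK(3);        /* assumed cost: function entry and comparison */
    if (v < 0) {
        TICK(2);    /* assumed cost: negation path */
        v = -v;
    }
    return v;
}

void harness(int input)                   /* 'input' is left unconstrained */
{
    abs_val(input);
    /* The model checker searches all inputs for a violation; the smallest
     * bound T for which the assertion holds is the WCET estimate, and T
     * can be found by binary search over repeated verification runs. */
    assert(cycles <= 5);                  /* candidate bound T = 5 */
}
```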

What we have shown in this article is merely the first step of reintroducing Model Checking to Worst-Case Execution Time analysis. However, there is substantially more work to be done to catch up with the proven approaches combining AI and ILP: here, we assumed processors with almost constant instruction timing. While processors for real-time applications should certainly be simplified to address the self-made and acknowledged problem of processors becoming more and more unpredictable, the approach presented here must be extended to meet current processors halfway. Specifically, we plan to include a register-level value analysis to be able to handle variable timing due to memory-mapped I/O, and to propose a specific source-level model for scratchpad memories. This will result in support for a much wider range of processors. The challenge in these extensions will mainly be how to lift models for these architectural behaviors to the source code without impairing the scalability of Model Checking too much. Nevertheless, pursuing this route should be worth the effort, given the practical benefits over existing approaches, such as higher automation and usability.

References

1. Al-Bataineh, O., Reynolds, M., French, T.: Accelerating worst case execution time analysis of timed automata models with cyclic behaviour. Formal Aspects of Computing 27(5), 917–949 (2015)

2. Altenbernd, P., Gustafsson, J., Lisper, B., Stappert, F.: Early execution time-estimation through automatically generated timing models. Real-Time Systems 52(6), 731–760 (2016)

3. Axer, P., Ernst, R., Falk, H., Girault, A., Grund, D., Guan, N., Jonsson, B., Marwedel, P., Reineke, J., Rochange, C., Sebastian, M., von Hanxleden, R., Wilhelm, R., Yi, W.: Building timing predictable embedded systems. ACM Trans. Embedded Comput. Syst. 13(4), 82:1–82:37 (2014)

4. Becker, M., Neumair, M., Sohn, A., Chakraborty, S.: Approaches for Software Verification of an Emergency Recovery System for Micro Air Vehicles. In: F. Koornneef, C. van Gulijk (eds.) Proc. Computer Safety, Reliability, and Security - 34th International Conference (SAFECOMP), Lecture Notes in Computer Science, vol. 9337. Springer (2015)

5. Bernat, G., Davis, R., Merriam, N., Tuffen, J., Gardner, A., Bennett, M., Armstrong, D.: Identifying opportunities for worst-case execution time reduction in an avionics system. Ada User Journal 28(3), 189–195 (2007)

6. Beyer, D.: Status report on software verification (competition summary SV-COMP 2014). In: E. Ábrahám, K. Havelund (eds.) Proc. 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, vol. 8413, pp. 373–388. Springer (2014)

7. Blazy, S., Maroneze, A.O., Pichardie, D.: Formal verification of loop bound estimation for WCET analysis. In: E. Cohen, A. Rybalchenko (eds.) Proc. 5th International Conference on Verified Software: Theories, Tools, Experiments (VSTTE), Lecture Notes in Computer Science, vol. 8164, pp. 281–303. Springer (2014)

8. Brandner, F., Hepp, S., Jordan, A.: Static profiling of the worst-case in real-time programs. In: L. Cucu-Grosjean, N. Navet, C. Rochange, J.H. Anderson (eds.) Proc. 20th International Conference on Real-Time and Network Systems (RTNS), pp. 101–110. ACM (2012)

9. Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: A binary analysis platform. In: G. Gopalakrishnan, S. Qadeer (eds.) Proc. 23rd International Conference on Computer Aided Verification (CAV), Lecture Notes in Computer Science, vol. 6806, pp. 463–469. Springer (2011)

10. Černý, P., Henzinger, T.A., Kovács, L., Radhakrishna, A., Zwirchmayr, J.: Segment abstraction for worst-case execution time analysis. In: J. Vitek (ed.) Proc. 24th European Symposium on Programming Languages and Systems (ESOP), Lecture Notes in Computer Science, vol. 9032, pp. 105–131. Springer (2015)

11. Chattopadhyay, S., Roychoudhury, A.: Scalable and precise refinement of cache timing analysis via path-sensitive verification. Real-Time Systems 49(4), 517–562 (2013)

12. Clarke, E.M., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In: K. Jensen, A. Podelski (eds.) Proc. 10th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, vol. 2988, pp. 168–176. Springer (2004)

13. Clarke Jr., E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge, MA, USA (1999)

14. Darke, P., Chimdyalwar, B., Venkatesh, R., Shrotri, U., Metta, R.: Over-approximating loops to prove properties using bounded model checking. In: W. Nebel, D. Atienza (eds.) Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1407–1412. ACM (2015)

15. Demyanova, Y., Pani, T., Veith, H., Zuleger, F.: Empirical software metrics for benchmarking of verification tools. In: D. Kroening, C.S. Păsăreanu (eds.) Proc. 27th International Conference on Computer Aided Verification (CAV), Lecture Notes in Computer Science, vol. 9206, pp. 561–579. Springer (2015)

16. Ding, H., Liang, Y., Mitra, T.: WCET-centric partial instruction cache locking. In: P. Groeneveld, D. Sciuto, S. Hassoun (eds.) Proc. 49th Annual Design Automation Conference (DAC), pp. 412–420. ACM (2012)

17. Edwards, S.A., Kim, S., Lee, E.A., Liu, I., Patel, H.D., Schoeberl, M.: A disruptive computer design idea: Architectures with repeatable timing. In: Proc. 27th International Conference on Computer Design (ICCD), pp. 54–59. IEEE Computer Society (2009)

18. Ermedahl, A., Fredriksson, J., Gustafsson, J., Altenbernd, P.: Deriving the worst-case execution time input values. In: Proc. 21st Euromicro Conference on Real-Time Systems (ECRTS), pp. 45–54. IEEE Computer Society (2009)

19. Ermedahl, A., Stappert, F., Engblom, J.: Clustered worst-case execution-time calculation. IEEE Trans. Computers 54(9), 1104–1122 (2005)

20. Ferdinand, C., Heckmann, R., Le Sergent, T., Lopes, D., Martin, B., Fornari, X., Martin, F.: Combining a high-level design tool for safety-critical systems with a tool for WCET analysis of executables. In: Proc. 4th European Congress on Embedded Real Time Software (ERTS). SIA/AAAF/SEE (2008)

21. Fuhrmann, I., Broman, D., von Hanxleden, R., Schulz-Rosengarten, A.: Time for reactive system modeling: Interactive timing analysis with hotspot highlighting. In: A. Plantec, F. Singhoff, S. Faucou, L.M. Pinho (eds.) Proc. 24th International Conference on Real-Time Networks and Systems (RTNS), pp. 289–298. ACM (2016)

22. Furber, S.: ARM System-on-Chip Architecture. Addison Wesley (2000)

23. Gulwani, S., Jain, S., Koskinen, E.: Control-flow refinement and progress invariants for bound analysis. In: M. Hind, A. Diwan (eds.) Proc. Conference on Programming Language Design and Implementation (PLDI), pp. 375–385. ACM (2009)

24. Gustafsson, J., Betts, A., Ermedahl, A., Lisper, B.: The Mälardalen WCET benchmarks: Past, present and future. In: B. Lisper (ed.) Proc. 10th International Workshop on Worst-Case Execution Time Analysis (WCET), OASICS, vol. 15, pp. 136–146. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Germany (2010)

25. Gustafsson, J., Ermedahl, A., Sandberg, C., Lisper, B.: Automatic derivation of loop bounds and infeasible paths for WCET analysis using abstract execution. In: Proc. 27th International Real-Time Systems Symposium (RTSS), pp. 57–66 (2006)

26. Harmon, T., Klefstad, R.: Interactive back-annotation of worst-case execution time analysis for Java microprocessors. In: Proc. 13th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 209–216. IEEE Computer Society (2007)

27. Hatcliff, J., Dwyer, M.B., Zheng, H.: Slicing software for model construction. Higher-Order and Symbolic Computation 13(4), 315–353 (2000)

28. Healy, C.A., Sjödin, M., Rustagi, V., Whalley, D.B., van Engelen, R.: Supporting timing analysis by automatic bounding of loop iterations. Real-Time Systems 18(2/3), 129–156 (2000)

29. Henry, J., Asavoae, M., Monniaux, D., Maiza, C.: How to compute worst-case execution time by optimization modulo theory and a clever encoding of program semantics. In: Y. Zhang, P. Kulkarni (eds.) Proc. 15th Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), pp. 43–52. ACM (2014)

30. Holsti, N.: Computing time as a program variable: a way around infeasible paths. In: R. Kirner (ed.) Proc. 8th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis, OASICS, vol. 8. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany (2008)

31. Holsti, N., Saarinen, S.: Status of the Bound-T WCET tool. Space Systems Finland Ltd (2002)

32. Kim, S., Patel, H.D., Edwards, S.A.: Using a Model Checker to Determine Worst-Case Execution Time. Tech. rep., Columbia University (2009). CUCS-038-09

33. Kirner, R., Puschner, P.P.: Obstacles in worst-case execution time analysis. In: Proc. 11th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), pp. 333–339. IEEE Computer Society (2008)

34. Knoop, J., Kovács, L., Zwirchmayr, J.: Symbolic loop bound computation for WCET analysis. In: E.M. Clarke, I. Virbitskaite, A. Voronkov (eds.) Proc. 8th International Conference Perspectives of Systems Informatics (PSI), Revised Selected Papers, Lecture Notes in Computer Science, vol. 7162, pp. 227–242. Springer (2012)

35. Ko, L., Healy, C.A., Ratliff, E., Arnold, R.D., Whalley, D.B., Harmon, M.G.: Supporting the specification and analysis of timing constraints. In: Proc. 2nd Real-Time Technology and Applications Symposium (RTAS), pp. 170–178. IEEE Computer Society (1996)

36. Kühnel, C.: AVR RISC Microcontroller Handbook, first edn. Newnes (1998)

37. Kuo, M.M.Y., Yoong, L.H., Andalam, S., Roop, P.S.: Determining the worst-case reaction time of IEC 61499 function blocks. In: Proc. 8th IEEE International Conference on Industrial Informatics, pp. 1104–1109. IEEE (2010)

38. Li, Y.T., Malik, S.: Performance analysis of embedded software using implicit path enumeration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16(12) (1997)

39. Lickly, B., Liu, I., Kim, S., Patel, H.D., Edwards, S.A., Lee, E.A.: Predictable programming on a precision timed architecture. In: E.R. Altman (ed.) Proc. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), pp. 137–146. ACM (2008)

40. Lv, M., Gu, Z., Guan, N., Deng, Q., Yu, G.: Performance comparison of techniques on static path analysis of WCET. In: C. Xu, M. Guo (eds.) Proc. International Conference on Embedded and Ubiquitous Computing (EUC), pp. 104–111. IEEE Computer Society (2008)

41. Marref, A.: Fully-automatic derivation of exact program-flow constraints for a tighter worst-case execution-time analysis. In: L. Carro, A.D. Pimentel (eds.) Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 200–208. IEEE (2011)

42. Metta, R., Becker, M., Bokil, P., Chakraborty, S., Venkatesh, R.: TIC: a scalable model checking based approach to WCET estimation. In: T. Kuo, D.B. Whalley (eds.) Proc. 17th Conference on Languages, Compilers, Tools, and Theory for Embedded Systems (LCTES), pp. 72–81. ACM (2016)

43. Metzner, A.: Why model checking can improve WCET analysis. In: R. Alur, D.A. Peled (eds.) Proc. 16th International Conference on Computer Aided Verification (CAV), Lecture Notes in Computer Science, vol. 3114, pp. 334–347. Springer (2004)

44. Mitra, T., Teich, J., Thiele, L.: Adaptive Isolation for Predictability and Security (Dagstuhl Seminar 16441). Dagstuhl Reports 6(10), 120–153 (2017)

45. Mittal, S.: A survey of techniques for cache locking. ACM Trans. Design Autom. Electr. Syst. 21(3), 49:1–49:24 (2016)

46. Park, C.Y., Shaw, A.C.: Experiments with a program timing tool based on source-level timing schema. IEEE Computer 24(5), 48–57 (1991)

47. Pingali, K., Bilardi, G.: APT: A data structure for optimal control dependence computation. In: D.W. Wall (ed.) Proc. Conference on Programming Language Design and Implementation (PLDI), pp. 32–46. ACM (1995)

48. Puschner, P.: Is WCET Analysis a Non-Problem? – Towards New Software and Hardware Architectures. In: G. Bernat (ed.) Proc. 2nd International Workshop on Worst-Case Execution Time Analysis (WCET), pp. 89–92. Technical University of Vienna, Austria (2002)

49. Puschner, P.P.: A tool for high-level language analysis of worst-case execution times. In: Proc. 10th Euromicro Conference on Real-Time Systems (ECRTS), pp. 130–137. IEEE Computer Society (1998)

50. Puschner, P.P., Koza, C.: Calculating the maximum execution time of real-time programs. Real-Time Systems 1(2), 159–176 (1989)

51. Puschner, P.P., Prokesch, D., Huber, B., Knoop, J., Hepp, S., Gebhard, G.: The T-CREST approach of compiler and WCET-analysis integration. In: Proc. 16th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC), pp. 1–8. IEEE Computer Society (2013)

52. Raymond, P., Maiza, C., Parent-Vigouroux, C., Carrier, F.: Timing analysis enhancement for synchronous program. In: M. Auguin, R. de Simone, R.I. Davis, E. Grolleau (eds.) Proc. 21st International Conference on Real-Time Networks and Systems (RTNS), pp. 141–150. ACM (2013)

53. Robertson, N., Seymour, P.: Graph minors. XIII. The disjoint paths problem. Journal of Combinatorial Theory, Series B 63(1), 65–110 (1995)

54. Schoeberl, M.: JOP: A Java Optimized Processor. In: R. Meersman, Z. Tari (eds.) Proc. International Workshop On The Move to Meaningful Internet Systems (OTM), pp. 346–359. Springer Berlin Heidelberg (2003)

55. Souyris, J., Pavec, E.L., Himbert, G., Jegu, V., Borios, G., Heckmann, R.: Computing the Worst Case Execution Time of an Avionics Program by Abstract Interpretation. In: Proc. 5th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis (2005)

56. Sun Microsystems Inc.: The SPARC Architecture Manual, Version 7. Sun Microsystems Inc. (1987)

57. Weiser, M.: Program slicing. In: S. Jeffrey, L.G. Stucki (eds.) Proc. 5th International Conference on Software Engineering (ICSE), pp. 439–449. IEEE Computer Society (1981)

58. Wilhelm, R.: Why AI + ILP is good for WCET, but MC is not, nor ILP alone. In: B. Steffen, G. Levi (eds.) Proc. 5th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), Lecture Notes in Computer Science, vol. 2937, pp. 309–322. Springer (2004)

59. Wilhelm, R., Engblom, J., Ermedahl, A., Holsti, N., Thesing, S., Whalley, D., Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner, P., Staschulat, J., Stenström, P.: The Worst-Case Execution Time Problem – Overview of Methods and Survey of Tools. ACM Trans. Embed. Comput. Syst. 7(3), 36:1–36:53 (2008)

60. Zhao, W., Kulkarni, P.A., Whalley, D.B., Healy, C.A., Mueller, F., Uh, G.: Tuning the WCET of embedded applications. In: Proc. 10th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 472–481. IEEE Computer Society (2004)