Journal of Electronic Testing (2019) 35:367–381 · https://doi.org/10.1007/s10836-019-05806-y

Assessing the Reliability of Successive Approximate Computing Algorithms under Fault Injection

Gennaro S. Rodrigues 1 · Adria Barros de Oliveira 1 · Fernanda Lima Kastensmidt 1 · Vincent Pouget 2 · Alberto Bosio 3

Received: 26 November 2018 / Accepted: 13 May 2019 / Published online: 30 May 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
This work presents two fault injection and dependability test methodologies exploring the fault tolerance of successive approximation algorithms. This type of approximate computing algorithm can present an inherent fault tolerance, converging to a final correct output even under faults affecting processed data. A set of algorithms was implemented as embedded software on the ARM Cortex-A9 processor of a Xilinx Zynq-7000 series board. Experiments consist of exposing the decapsulated processor to laser beams targeting the data cache memory and of emulation fault injections at the register file. Results show that successive approximation is effective in protecting the output from faults injected at the data cache memory, but not from the ones injected at the register file. The experiments also show that most of the silent data corruption errors provoked by data cache fault injections are not significant and can be accepted as correct by merely tolerating a result variation of as little as 1%.

Keywords Reliability · Fault tolerance · Approximate computing · Laser · Fault injection

1 Introduction

Safety-critical systems need to achieve both excellent performance and high dependability, given that they often deal with human lives and high-cost equipment. Those systems are constantly prone to faults, given the harsh environments to which they are subjected (e.g., radiation for aerospace systems). When a fault affects the system in a way that is perceived by the user or other parts of the system, we say that an error occurs [3]. An error that does not permanently damage the system is called a soft error, also referred to as a single event upset (SEU). In some cases, such as when exposed to extreme radiation environments, electronic systems are affected by multiple-bit upsets (MBU), but those cases are rare.

Responsible Editor: L. M. Bolzani Pöhls

Gennaro S. Rodrigues

1 Porto Alegre, Brazil

2 Montpellier, France

3 Lyon, France

Developing fault-tolerant hardware is expensive, and achieving good trade-offs between performance, energy consumption and reliability is an arduous task. This scenario has highlighted commercial off-the-shelf (COTS) components as a good hardware alternative for the safety-critical systems industry. COTS are capable of achieving excellent performance and power consumption at a low cost and add flexibility to the system design. However, they are usually not fault-tolerant and shall, therefore, be protected at the software level. Consequently, it is imperative to know the error susceptibility of safety-critical software executing on top of COTS. The development of software fault tolerance techniques and computing paradigms that minimize error occurrence is also desired. An example of a COTS device employed in a multitude of aerospace and other safety-critical applications is the Zynq-7000 SoC.

Past works have shown that software of an approximate nature presents higher intrinsic fault tolerance than conventional algorithms [13]. Approximate computing techniques [16] produce inexact computation results with accuracy that is acceptable for many applications [18]. They have been used in many scenarios, from big data to scientific applications [8]. Approximate computing has been proposed as an approach to developing energy-efficient systems [5], saving computational resources and presenting better execution times.

Successive approximation algorithms achieve approximate computation through iterative execution. They make use of mathematical properties and numerical analysis theory to approximate their results to the mathematically expected one. Those algorithms are iteration-based and get closer to an acceptable result on every execution of a loop. Because the value is approximated on each iteration, it is expected that if an error occurs, causing an iteration execution that is out of the expected calculation path, it will be corrected in the subsequent iterations. Past works presented a preliminary study on the inherent fault-tolerant nature of successive approximation algorithms [13], comparing them to conventional algorithms.

Different types of applications result in different dependabilities, even when they are implemented and executed on the very same hardware. That is because the resources used are unique to each application, and different parts of the hardware have different fault tolerances. This implies that safety-critical systems need to be thoroughly evaluated for their reliability with a multitude of methods.

This work presents successive approximation as a means of achieving the benefits of approximate computing while providing fault tolerance, exploring its possible variations. A past work [14] presented preliminary studies on the dependability of successive approximations. In this paper the work from [14] is expanded, providing results from two fault injection methodologies. Each of those methodologies targets different resources of the system, injecting faults and analyzing their impact on the benchmarks. Unlike [14], in this work faults are injected both in the cache memory and in the register file of the studied microprocessor. As will be further discussed, each of those injections has different implications. The contributions of this work are:

– The presentation of a method to assess the fault tolerance of algorithms under laser fault injection on the processor cache memory.

– A method to assess the fault tolerance of algorithms under emulation fault injection on the processor register file.

– The assessment of the fault tolerance of successive approximation algorithms related to the variation of their approximation intensity.

– Discussions on the differences between the behavior of faults affecting the data cache memory and the processor register file.

The work is organized as follows. Section 2 presents an overview of the proposed approximate computing method and the impact of different test methodologies on the experimental results. The experiment methodologies and the benchmark applications used in this work are presented in Section 3. The experimental results are discussed in Section 4. Finally, the conclusions are presented in Section 5.

2 Successive Approximation and Fault Tolerance

The increase in computing power enables the processing of more complex tasks. Among them, physical problems are often described by Partial Differential Equations (PDEs). Since analytic solutions for those equations cannot be obtained with generalist approaches, numerical algorithms are used. This can be achieved by software implementing successive approximation algorithms, among others [1]. Successive approximation algorithms consist mainly of numerical calculations. Those are intended for when an exact solution is not computationally achievable. Some of the most significant mathematical problems (e.g., derivative and integral calculations) have solutions based only on successive approximations. Those solutions are of great importance for a considerable number of applications. Among them are safety-critical systems, which deal with human lives and require high reliability. It is therefore imperative to know approximate algorithms' susceptibility to errors. Past works showed that successive approximation algorithms are capable of achieving some fault tolerance without any overhead in comparison with conventional algorithms [13].

The final output value of those algorithms is calculated and approximated on each iteration of a loop. Therefore, the more loop iterations are executed, the more accurate the final result is. Because of that nature, it is expected that even if an error occurs and causes the execution to diverge from the correct path, it can be corrected in the following iterations (forcing the execution back to the correct path). This, however, is only valid when analyzing SEUs, not permanent hardware faults. It is also worth mentioning that an error affecting one of the most critical iterations of the algorithm may never be corrected. This will depend on the algorithm itself: some have iterations of different importance (i.e., higher impact on the final result) and others do not. A large number of iterations implies a smaller probability of a fault affecting one of the important ones, leading to higher fault tolerance. A lower number of iterations implies lower fault tolerance, because each iteration has a higher impact on the output value.

While increasing the number of loop iterations of a successive approximation algorithm may lower the probability that an incoming fault provokes an error, it may also increase the probability that a fault affects the system: the longer the execution time of an application, the higher the probability that a fault reaches it. For example, if a given safety-critical system is exposed to an environment that provokes one fault per second, an application running faster than 1 s will probably not even be touched by the fault, while one with an execution time of more than 1 s will certainly present at least one fault. This fault may or may not cause an error. It is clear that a trade-off between reliability, loop size, execution time and accuracy is present and needs to be known during the development of a safety-critical system.

Safety-critical systems, such as aerospace and military applications, often have strict guidelines and requirements. Avionics software, for example, requires Radio Technical Commission for Aeronautics (RTCA) certification before being used on real systems. The DO-178B/C certification by RTCA imposes a large number of limitations and safety measures to avoid catastrophic errors. Those safety measures are often in conflict with the idea of approximate computing: they demand double-checking of data and strict execution deadlines. While there is no explicit ban on the usage of successive approximation algorithms, a designer shall be well aware of the natural reliability behavior of those algorithms and of the fact that they might imply deviations from the expected execution time. For example, a fault might be mitigated in the following iterations but cause the algorithm to need a higher number of iterations before achieving sufficient accuracy, increasing the execution time and missing a deadline. Another example that might cause problems is the application of successive approximation to applications in which the inputs are not well defined. Some successive approximation algorithms stop executing after a specific condition is met (e.g., a small difference between two consecutive iterations, meaning the result is near). For those applications, defining a strict deadline might be a challenge. All those issues show that successive approximation provides reliability, but at costs that are sometimes unknown. This justifies an intense study of their usability in safety-critical systems.

3 Experiment Methodologies

The proposed approximation method shall be evaluated on performance, fault tolerance, and possible error correction capacity. Different experiments can be performed to assess each one of those characteristics. The most realistic experiment type is radiation exposure, where the target system is exposed to particles similar to those found in harsh environments. In that case, the whole system is affected by the radiation effects. Therefore, the inherent fault tolerance of the benchmarks can be compromised, since there are sensitive parts that are not protected by the technique presented in this paper.

Testing the system under radiation would not evaluate the successive approximation efficiency, but system reliability as a whole. The best manner to acquire interesting data is to study soft errors that affect only the processor's registers and memory. That way, we can evaluate how the software reacts to faults affecting the resources it uses. For that purpose, fault injection experiments are performed by emulation in the processor register file and by laser pulses targeting the data cache. The proposed methodologies inject faults only at the defined sensitive regions. If radiation experiments were used to simulate the actual application conditions, it would take a very long time to gather sufficient data for any probabilistic study to be possible. Because of that, in this work, the system is evaluated under fault emulation.

This work presents two different fault injection methodologies, intended for different purposes. The first one is laser fault injection, detailed in Section 3.2. In this case, laser pulses target the cache memory of an embedded processor, causing bit-flips. The second one consists of injecting faults on the processor register file, presented in Section 3.3. The methodologies differ in the way faults are injected, but the benchmarks applied to both cases are the same (to be presented in Section 3.1), and so is the classification of the errors caused by the faults (Section 3.4).

The device under test (DUT) of this work is a Zynq-7000 APSoC, designed by Xilinx. The Zynq board embeds a high-performance ARM Cortex-A9 processor with two cache levels on the processing system (PS), alongside a programmable logic (PL) layer. The PL presents an FPGA based on the Xilinx 7-Series with approximately 27.7 Mb of configuration logic bits and 4.5 Mb of Block RAM (BRAM). The dual-core 32-bit ARM Cortex-A9 processor runs at a maximum of 666 MHz and is designed in 28 nm technology. It has two L1 caches (data and instruction) per core with 32 KB each, and one L2 cache with 512 KB shared between both cores. A 256 KB on-chip SRAM memory (OCM) is shared between the PS and PL parts, and so is the DDR (external memory interface). In this work, only the PS part of the board is used.

3.1 Study Cases and Benchmarks

As study cases and examples of successive approximation, three algorithms are presented. Those are numerical methods used to compute calculus approximations and can be applied to a multitude of safety-critical applications. One example is the case of nuclear power plants, concerning the numerical integration of the neutron diffusion equation [12, 17]. The numerical methods implemented using successive approximation and evaluated in this work are:

1. Newton-Raphson: The Newton-Raphson method is an algorithm used to find the roots of a function. It calculates the intersection of the tangent line of the function at an initial guess point x0 with the x-axis. It is calculated iteratively, as stated in Eq. 1, until it reaches a sufficient approximation. The algorithm considers a "good enough" approximation to be achieved when the two last consecutive iterations calculate points very close to each other. It means that the algorithm has already converged to the best result possible, or is very close to it. A combined code sketch of the three methods follows this list.

   x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}    (1)

2. Trapezoid Rule: The trapezoid rule algorithm is used to calculate the integral of a function. It approximates the area under a curve by a set of trapezoids and then calculates their areas. Considering N equally spaced trapezoids defined between points a and b of the function, each trapezoid k has a base of length \Delta x_k = \Delta x = \frac{b-a}{N}. That being said, Eq. 2 defines the integral approximation with the trapezoid rule.

   \int_a^b f(x)\,dx \approx \frac{\Delta x}{2} \sum_{k=1}^{N} \left( f(x_{k-1}) + f(x_k) \right)    (2)
3. Simpson's Rule: Another way of numerically approximating the integral of a function is with Simpson's rule. The difference between Simpson's rule and the trapezoid rule is that it calculates the area under parabolas instead of trapezoids. This way, it usually approximates the result with greater accuracy and in fewer iterations than the trapezoid rule. Equation 3 presents the Simpson approximation for an integral with a step size of h = (b - a)/2.

   \int_a^b f(x)\,dx \approx \frac{h}{3} \left[ f(a) + 4 f\!\left( \frac{a+b}{2} \right) + f(b) \right]    (3)
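The paper describes the three benchmarks only at the algorithmic level; their source code is not listed. The following is a minimal C sketch of the three methods as they can be read from Eqs. 1–3, with the stop condition (for Newton-Raphson) and the number of subdivisions n (for the integration rules) exposed as the parameters that the variants manipulate. Function names and the tolerance parameter are illustrative, not taken from the paper.

```c
#include <math.h>

/* Newton-Raphson root finding (Eq. 1): iterate until two consecutive
 * estimates are closer than 'tol' or 'max_iter' is reached. */
double newton_raphson(double (*f)(double), double (*df)(double),
                      double x0, double tol, int max_iter) {
    double x = x0;
    for (int i = 0; i < max_iter; i++) {
        double x_next = x - f(x) / df(x);   /* x_{n+1} = x_n - f(x_n)/f'(x_n) */
        if (fabs(x_next - x) < tol)         /* two consecutive points very close */
            return x_next;
        x = x_next;
    }
    return x;
}

/* Composite trapezoid rule (Eq. 2) with n equally spaced trapezoids. */
double trapezoid(double (*f)(double), double a, double b, int n) {
    double dx = (b - a) / n;
    double sum = 0.0;
    for (int k = 1; k <= n; k++)
        sum += f(a + (k - 1) * dx) + f(a + k * dx);
    return 0.5 * dx * sum;
}

/* Simpson's rule: with n = 1 this is exactly Eq. 3; larger n gives the
 * composite version, which is one way to force more loop iterations. */
double simpson(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.0;
    for (int k = 0; k < n; k++) {
        double left = a + k * h;
        double mid  = left + 0.5 * h;
        sum += (h / 6.0) * (f(left) + 4.0 * f(mid) + f(left + h));
    }
    return sum;
}
```

In this reading, calling, e.g., trapezoid(f, a, b, 128) versus trapezoid(f, a, b, 12746), or tightening tol and moving x0 for Newton-Raphson, is essentially how the variants described next trade execution time for accuracy.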

Those algorithms are considered in this work as benchmarks. For each one, three variants are proposed. Each variant has a different number of iterations (and therefore accuracy), but the algorithm remains the same. These different numbers of iterations are achieved by manipulating the inputs of the algorithms. In the Newton-Raphson algorithm, the initial point and the stop condition are changed so that the algorithm requires more iterations to achieve the expected result. For the trapezoid and Simpson rules, the numbers of trapezoids and parabolas used to calculate the total area are increased for each subsequent variant, thus forcing a higher number of iterations. Table 1 provides some details on each of those variants and on how the number of iterations affects the application execution. Some algorithms converge faster to a final acceptable result than others (notice the difference between Trapezoid and Newton-Raphson in Table 1), and therefore naturally present a lower number of iterations. The different iteration counts of the benchmarks allow an in-depth assessment of how they impact the reliability of the algorithm execution. Those three different algorithms, with their iteration-count variants, were chosen to broaden the study space and represent multiple different convergence possibilities.

The data in Table 1 was gathered from the implementation on the Zynq-7000 APSoC. The data concerning the used registers will be very important for the register fault injection experiments. This data was gathered considering only the general-purpose registers (from r0 to r12, plus the stack pointer, link register, and program counter) and will be further detailed in Section 3.3. The table shows that those benchmarks tend to use few registers. It also shows that the number of accesses to the L1 data cache is heavily impacted by the number of iterations of the loop, with the exception of the Trapezoid application. In that case, it is probably because all the Trapezoid variants already have a high data cache access rate. The fact that Trapezoid is the benchmark with the highest execution time supports that idea.

Faults affecting the register file are expected to have a higher probability of vanishing than the ones affecting the cache memory. As Table 1 shows, the proposed benchmarks are far from using all available registers. Low use of registers means that the sensitive area of the register file is small; therefore, faults affecting it may touch registers that are not even in use. Injecting faults in the cache memory, however, may lead to unexpected behaviors. Some of the benchmarks have a high number of cache memory accesses. However, this can either mean that the fault will be read into the application and cause an error, or that a faulty memory space will be overwritten, causing the fault to vanish.

Table 1 Benchmarks details

App.             Var.  Num. of Iters.  Used Registers                L1 Data Cache Accesses [per ms]  Exec. Time [ms]
Newton-Raphson   1     14              r2, r3, r11, pc, sp, lr       97.2k                            0.44
                 2     37                                            202.4k                           1.19
                 3     71                                            682.8k                           3.19
Simpson          1     242             r2, r3, r11, pc, sp, lr       178.2k                           0.94
                 2     423                                           1648.1k                          9.31
                 3     3081                                          2350.4k                          18.62
Trapezoid        1     128             r0, r2, r3, r11, pc, sp, lr   6053.2k                          202.34
                 2     1274                                          6792.4k                          605.29
                 3     12746                                         6763.1k                          33540.09

3.2 Laser Fault Injection

Laser testing is commonly used as an in-lab tool for injecting transient localized perturbations into a device under test by photoelectric stimulation, especially for single-event effects investigations [4], security evaluation [15], and, more generally, to evaluate the fault tolerance of an application [10].

The experiment setup is presented in Fig. 1. It consists of the DUT, a host computer, and the laser equipment. The host computer is responsible for controlling the laser beam and listening to messages from the DUT. The DUT periodically sends messages to the host computer, to report an error or just to confirm it is alive. Error messages are reported when there is a difference between an execution output and the golden output. The golden output is the result of a fault-free execution at the beginning of the experiment, called the golden execution. The alive message is essential because some faults will cause the DUT to become unresponsive or hang (error definitions will be further detailed in Section 3.4), needing a reset. A reset consists of re-programming and configuring the DUT and is performed when a timeout occurs while the host computer waits for an alive message from the DUT. This timeout is set to about three minutes but may vary for different experiments with different response times. During the reset, the DUT is turned off and back on again, resetting the ARM processor, and the DUT warns the host computer so that the laser beam is deactivated. This prevents any errors during the system initialization and golden execution. The laser beam is then re-activated after the host computer receives an alive message from the DUT, meaning the ARM processor is running again as usual.

Fig. 1 Laser experiments setup

The communication between the host computer and the DUT is rather complex and is highly susceptible to errors because it happens during the fault injection. To avoid errors that are not interesting to our experiment and would make it less efficient, we developed a strategy to reduce this communication to the minimum necessary. During a benchmark execution, the algorithm runs N times, filling an output vector, which is then compared with the result of the fault-free execution (golden value). This way the DUT only has to send messages to the host computer once every N runs, or when an error is detected. The value of N may vary for different benchmarks, according to their execution times.
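As a rough illustration of this strategy (the actual bare-metal firmware is not listed in the paper; the value of N, the function names and the reporting routines below are hypothetical), the DUT-side loop could look like:

```c
#define N_RUNS 100   /* hypothetical; N varies per benchmark (see Table 2) */

/* Assumed helpers, not from the paper: run_benchmark() performs one run of
 * the algorithm; report_error() and report_alive() send UART messages to
 * the host computer. */
extern double run_benchmark(void);
extern void report_error(int run, double got, double golden);
extern void report_alive(void);

void dut_main_loop(const double *golden) {
    double out[N_RUNS];
    for (;;) {
        for (int i = 0; i < N_RUNS; i++)
            out[i] = run_benchmark();        /* fill the output vector */
        for (int i = 0; i < N_RUNS; i++)
            if (out[i] != golden[i])         /* differs from fault-free result */
                report_error(i, out[i], golden[i]);
        report_alive();                      /* one message every N runs */
    }
}
```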

The laser fault injection experiments were performed on the two-photon absorption (TPA) microscope of the IES laser facilities, University of Montpellier. The TPA method was preferred to the more classical single-photon approach because it provided a better reproducibility of the fault occurrences in this 28 nm DUT with a thick substrate (700 μm). The laser wavelength was 1.55 μm with a pulse duration of 400 fs. The laser beam was focused through the backside of the DUT by a 100× lens. The DUT was scanned under the static beam using motorized stages.

The laser pulse energy was set at 250 pJ. This value was found in previous work [11] to induce between 0 and 3 bit flips per pulse in the region of interest of the DUT, depending on the laser position. Being slightly above the energy threshold for a single bit flip, this value provides some tolerance on the position of the laser spot along the Z-axis (optical axis) during the scan, to accommodate small focus variations due to thermal dilatation of the substrate induced by the chip activity modulation.

Laser pulses were triggered at a constant frequency of 10 Hz, without any synchronization with the DUT clock. This asynchronous approach was preferred in this work because it naturally introduces randomness in the arrival time of the laser pulse in the application execution cycle, which allows statistically covering any vulnerability time window of the application in a reasonable amount of experiment time.

The region of interest (ROI), of approximately 200 × 400 μm², was defined to cover the L1 data cache of the core under test (Fig. 2). This ROI was scanned once for each application variant with a step of 1 μm along the Y-axis. Considering the constant pulse triggering rate, the maximum scanning speed along the X-axis was adjusted to have at least one pulse every μm along X. Due to the acceleration and deceleration phases at the extremities of each scan line, this approach leads to smaller steps along the X-axis between consecutive pulses near the edges of the ROI. However, this approach was preferred to a strict 1 μm step in order to maintain a constant laser pulse rate, and thus an accurate control of the number of pulses per application execution cycle. Indeed, in this work, we are more interested in the time-related statistics of the faults than in the accurate spatial localization of the fault occurrences.

Fig. 2 Infrared microphotograph of the DUT core under test, showing the scanned area (L1 data cache)

The laser beam was automatically turned off and the scanning motion paused when the DUT needed to be reset. Depending on the number of resets required during each run, a scan of the ROI typically took between 40 and 60 minutes to complete.

Table 2 presents the details of the laser fault injection experiments, applied to each studied application defined in Section 3.1. The number of runs is the size of the output vector, i.e., the number of times N an algorithm runs per execution. The "Total Workload" column represents the size in bytes of the output vector. With that data it is possible to infer the workload per run (the size of a single output) by simply dividing the total workload by the number of runs N. The "Execution Time" is the time of a complete execution (N runs). Finally, the "Avg. Shots per Exec." column presents the average number of laser shots per execution, which is calculated by dividing the execution time by the time between laser shots (i.e., the inverse of the laser frequency).

Table 2 Laser fault injection details on benchmarks

App.             Var.  Num. of Runs (N)  Total Workload [Bytes]  Avg. Shots per Exec.  Exec. Time [ms]
Newton-Raphson   1     100               800                     0.417                 41.72
                 2     100               800                     1.302                 130.20
                 3     100               800                     3.379                 337.91
Simpson          1     100               400                     0.9669                96.69
                 2     100               400                     9.4887                948.87
                 3     100               400                     18.9963               1899.63
Trapezoid        1     150               1200                    302.8096              30280.96
                 2     70                560                     444.4170              44441.70
                 3     1                 8                       546.1586              54615.86
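As a quick sanity check on Table 2 (a worked example; the 64-bit output size is our assumption, not stated in the paper): for Newton-Raphson variant 1, 800 bytes / 100 runs = 8 bytes per run, i.e., one double-precision result per run; and with the 10 Hz pulse rate, 41.72 ms × 10 pulses/s ≈ 0.417 average shots per execution, matching the table.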

During the program execution, data can be stored in one or more memory components of the system, such as the RAM and the L2 and L1 caches. While at the software level this memory management is invisible (unless made explicit by the designer), from the hardware point of view only one replica is used by the program and influences its execution. Therefore, a fault affecting one of the non-active variable replicas is masked and does not lead to a program failure. Thus, if we consider a system having more than one cache level, we can expect an increase in reliability, because the probability that a fault affects an active variable will be lower than in a system with only one cache. Finally, we can state that we considered the worst case in our experiments. More details about the impact of faults on a memory hierarchy can be found in [6].

3.3 Emulation Fault Injection on Register File

The register file area of the ARM processor is physically very small in comparison with the cache memory area. Because of that, a laser fault injection on the register file is impractical. To perform a register file fault injection, the FPGA layer of the Zynq-7000 APSoC board is used to implement a fault injector. The fault injection emulation system consists of the following modules:

– Injector Module: Intellectual property (IP) designed in a hardware description language (VHDL) and implemented in the FPGA layer of the Zynq board. It is responsible for performing the fault injection procedure, to be detailed further.

– Power Control: Electrical device in charge of powering up the board in each injection cycle.

– System Controller: Software application running on a host computer, responsible for Power Control management. It also saves the fault injection logs, which are received via serial communication.

Figure 3 presents the experiment setup environment. The Zynq board (DUT) and the Power Control are connected to a host computer. The host computer is responsible for controlling the system and registering the experiment logs.

Fig. 3 Onboard register file fault injection setup view

A USB-TTL converter, connected to the DUT and the host computer, is responsible for transmitting serial data containing information about the errors. The adopted methodology follows the same scheme presented in [9].

The injector module injects bit-flips into the processor's register file. The affected ARM registers are the general-purpose ones, from R0 to R12, and the specific ones, which are the Stack Pointer (SP), Link Register (LR), and Program Counter (PC). The faults are injected using an interrupt mechanism that locks the processor and applies a XOR mask to the target register, provoking a bit-flip. The target register and the bit to be flipped are randomly defined. The injection time is also randomly determined, being a random point between the start and the finish of the execution. This is intended to simulate real scenarios, where a fault can affect the system at any moment during the application execution. Figure 4 presents the procedure flow performed by the injector module. First, the injector is configured with all the random injection data defined by the ARM CPU0, since generating random numbers in FPGA logic is complex. Once configured, the injector counts clock cycles until it reaches the injection time. Next, an interrupt is raised to inject the bit-flip in the processor register defined in the configuration. At the end of the application, the module compares the application results with the output of a golden execution (i.e., with no fault injected) to check for errors.
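The XOR-mask principle can be illustrated as follows (a conceptual sketch only; in the real setup the mask is applied to a physical ARM register from within the interrupt handler driven by the FPGA injector, and the random choices are made by the ARM CPU0, not by a C library call):

```c
#include <stdint.h>
#include <stdlib.h>

/* Emulate an SEU: flip one randomly chosen bit of a 32-bit register value
 * by XOR-ing it with a one-hot mask. */
uint32_t inject_bit_flip(uint32_t reg_value) {
    uint32_t mask = 1u << (rand() % 32);   /* randomly selected target bit */
    return reg_value ^ mask;               /* XOR mask provokes the bit-flip */
}
```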

Contrary to the laser fault injection, the emulation fault injection is programmed to inject one fault per execution of the algorithm. This would be impossible in the laser fault injection due to the frequency of the laser pulse and the delays of the experimental system, and because some of the benchmarks are very fast. The emulation fault injection architecture makes it unnecessary to keep executing the benchmark in a loop to force faults on its execution. As a consequence, this type of fault injection does not need to produce a vector of outputs, as is done in the laser fault injection. This has direct impacts on the types of errors that are studied with each injection methodology, as will be presented in Section 3.4, and on the characteristics of the benchmarks. The details of the emulation fault injection differ from the laser ones (presented in Table 2) not only due to its different characteristics but also because of the different focus of this methodology. All the relevant data on this type of injection was presented in Table 1. The most important information is the number of registers in use. The execution time is also important because it may define the probability of an error being corrected (due to a higher number of iterations), but the emulation fault injection ensures that there will be one fault injected per execution, no matter the execution time.

Fig. 4 Register file fault injection procedure flow [9]

3.4 Errors

After each algorithm execution, the outputs are compared with the golden value to check their correctness. When the output value and the golden value are different, the DUT sends the host computer a message containing the details of the error. This step of the experiments is common to both fault injection methodologies presented in this work. The host computer receives the error messages and saves them into a log to be further analyzed. The errors are classified into three different types: Hang, Silent Data Corruption (SDC), and Multi-SDC. Those are defined in Table 3. The Multi-SDC error type is unique to the laser injection. That is because, contrary to the laser injection, the onboard fault injection executes the algorithm only once (so there is no output vector, only a single output). For the register file fault injection, another definition is made for the outcome of an injected fault. This definition, however, does not concern an error, but rather the absence of one. When a fault is injected but causes no error, it is considered an "Unace". This name comes from the fact that the fault probably hit data in the register file that was not critical at that point of the software execution.

Table 3 Error type classifications

Type       Error
Hang       Causes the application to be either stuck at a certain point or to crash.
SDC        Output difference between the golden execution and the exposed one.
Multi-SDC  Multiple SDC occurrences in the same run, e.g., multiple positions of the output vector corrupted. Unique to laser fault injection.

4 Results

The fault tolerance of the benchmarks is evaluated in two aspects. First, we assess how the number of iterations impacts the error susceptibility, i.e., how each variant presented in Table 1 behaves under fault injection. Varying the number of iterations of each benchmark has a significant impact on fault tolerance. This evaluation is made with results from both laser and emulation fault injection. Figures 5, 6 and 8 present the error relative probability (per laser pulse) for each benchmark and its variants, as presented in Table 2. This probability is calculated by normalizing the error occurrence values by their maximum for each benchmark. The normalization is needed because the error occurrence depends on the execution time and on the shots per execution of the benchmark, and those are very different for each application. Figures 9, 10 and 11 present the error occurrence (in percentage) for each type of error presented in Section 3.4 after 5000 emulation fault injections (per benchmark) at the register file. The number of fault injections performed in each of the experiments was computed using the approach presented in [7], in order to obtain statistically significant results with an error margin of 1% and a confidence level of 95%.
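For reference, a commonly used form of the sample-size estimate from [7] (quoted here as we understand it; the notation may differ slightly from the original paper) is

n = \frac{N}{1 + e^2 \cdot \frac{N - 1}{t^2 \cdot p\,(1 - p)}}

where N is the total number of possible fault locations and injection times, e is the error margin (here 0.01), t is the cut-off corresponding to the confidence level (about 1.96 for 95%), and p is the estimated probability of a fault producing an error (0.5 in the worst case).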

Fig. 5 Simpson error relative probability (per laser pulse)

Secondly, we evaluate how tolerating small variations in the output value can reduce the number of errors considered to be of SDC type. For that assessment, we compare the output values with the golden value and check how different they are. So, for example, if an application can tolerate an output variation of 2%, an output value is only considered erroneous if it deviates from the golden value by more than 2%. This evaluation was made with the results of the laser fault injection. Each benchmark is evaluated and compared with the others separately in the following subsections.
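A sketch of how such a relative-tolerance check can be expressed is given below (the paper does not list its comparison code; the relative-error criterion is one reasonable reading of "less than 98% equal"):

```c
#include <math.h>
#include <stdbool.h>

/* Returns true if 'out' deviates from 'golden' by more than 'tol'
 * (e.g., tol = 0.02 tolerates a 2% output variation). */
bool is_sdc(double out, double golden, double tol) {
    if (golden == 0.0)
        return fabs(out) > tol;   /* fallback when the golden value is zero */
    return fabs(out - golden) / fabs(golden) > tol;
}
```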

4.1 Error Susceptibility

4.1.1 Data Cache Fault Injection

The application error relative probability per laser pulse of the Simpson benchmark is presented in Fig. 5. As expected, the variants with a larger number of iterations are more fault-tolerant. However, more iterations mean more latency. As Table 2 shows, variant 3 of the Simpson benchmark is almost 20 times slower than variant 1, but Fig. 5 shows the error occurrence does not decrease at the same pace. It means that, for this algorithm, increasing the number of iterations improves reliability, but the price is high.

Fig. 6 Trapezoid error relative probability (per laser pulse)


Fig. 7 Application error relative probability (per laser pulse) calculated for Trapezoid and Simpson together

The Trapezoid rule also shows a significant improvement in reliability for higher numbers of iterations, but it tends to stabilize, as Fig. 6 shows. This is because the Trapezoid rule converges more slowly than the Simpson method. For that same reason, the number of iterations of each version of this benchmark is higher than those of the other ones (see Table 1). Variants 2 and 3 of the Trapezoid benchmark had very similar results. It indicates that the benchmark might have an optimal point of fault tolerance at around 1200 iterations (according to Table 1). Using more iterations than that would add more execution time to the algorithm but have no impact on fault tolerance. Nevertheless, as Table 2 shows, the execution time differences among the three Trapezoid variants are not as big as in other benchmarks.

According to Table 2, the execution time of Trapezoid is much longer than that of Simpson. Given that both benchmarks are applied to solve the same problem (calculating an integral), we can draw interesting conclusions from that. Figure 7 presents the values of the application error relative probability per laser pulse calculated and normalized for Simpson and Trapezoid together. For that figure, instead of normalizing each variant of the algorithm in relation to its first variant (to evaluate the drop in error relative probability), all of the variants were normalized with respect to the first variant of the Simpson algorithm, which is the variant with the highest error probability. In that manner, Fig. 7 draws a comparison of error probability between the two algorithms. It is noticeable that Trapezoid is much more fault-tolerant than Simpson. This result indicates that having a higher number of iterations is beneficial for fault tolerance, but some applications might pay a high price for that. It is also clear that, for this kind of approximate computing algorithm, the drawback of increasing reliability is execution time. When using an approximate computing technique to solve a computation problem, different approaches will provide very different fault tolerances, even if they are similar.

Newton-Raphson presents a behavior similar to Simpson. Figure 8 indicates that variant 2 of this benchmark already achieves a considerable fault tolerance improvement, having a relative probability two orders of magnitude smaller than the first variant. It is interesting to notice that this benchmark is the one with the lowest number of iterations, as reported in Table 1. This indicates that the number of iterations alone is not enough to provide a fault tolerance estimation. Unlike the other two, this benchmark is not used to calculate an integral, but the roots of a function. It also has a very fast convergence, so a high number of iterations is not necessary.

Fig. 8 Newton-Raphson error relative probability (per laser pulse)


Fig. 9 Simpson error occurrence for emulation register file fault injection

4.1.2 Register File Fault Injection

Figure 9 presents the percentage of each error type occurrence for the emulation fault injection at the register file. The y-axis is presented in log scale to ease the reading of the data, given that there are significant differences between the occurrences. In that case, contrary to the cache memory evaluation, increasing the number of iterations of the algorithm has little to no effect on the distribution of errors. The high number of unaces shows that the faults tend to vanish. As discussed before, faults are expected to vanish due to the nature of successive approximation. However, it is interesting to see that they did not vanish the same way the ones injected at the cache memory did. In this case, all the variants had the same fault tolerance.

Figure 10 presents the percentage of each error type occurrence for the emulation fault injection on the Trapezoid algorithm. For that algorithm, the occurrence of SDCs dropped as the number of iterations increased. However, the SDC occurrence for variant 1 was already very low. Comparing the results from Figs. 10 and 9, it is clear that the Trapezoid algorithm is much less prone to SDCs than Simpson, a behavior that was already noticed in Fig. 7.

Fig. 10 Trapezoid error occurrence for emulation register file fault injection

The emulation fault injection results for the Newton-Raphson benchmark are presented in Fig. 11. Again, the variation in the number of iterations did not affect the distribution of error types. Hangs are also more frequent than SDCs, which is expected given that those are iteration-based algorithms: a significant part of the execution concerns the loop management, which is therefore a definite critical point of failure. Still, most of the faults (around 82%) caused no errors (unace).

Fig. 11 Newton-Raphson error occurrence for emulation register file fault injection

It is interesting to see that the fault effects on the register file and on the cache memory are very distinct. While the faults injected at the cache tend to vanish for a higher number of iterations, the ones injected at the register file have almost the same effect no matter the loop size (the exception of the Trapezoid benchmark is noticeable, but the difference between the variants' SDC error occurrences is still meager). Two facts can explain it: the register usage of the benchmarks is low, and the data cache memory usage is crucial. As Table 1 shows, the benchmarks do not use all the registers. However, the fault injection on the register file considers all of them when selecting one for the random injection. Thus the probability that a fault affects a register in use is not very high. The way registers are used also affects their criticality: they are continually being overwritten, and so are the faults injected into them. The data cache memory has a higher data latency (i.e., data usually stays untouched longer than in registers), and it is also where most of the results are stored (while registers are used not to store data, but mainly to process it). Therefore, faults injected in the data cache have a more significant probability of spreading to the final output of the application.

Fig. 13 Simpson error occurrence drop in relation to output variation tolerance

4.2 Error Tolerance

Fig. 12 Fault impact measured in PSNR

Let us resort to the following example from [2] to illustrate the idea behind this section's results analysis. The example is based on a 32-bit RISC-V core available in the PULPino open-source microcontroller system. The running application corresponds to software implementing the Discrete Cosine Transform (DCT) for a JPEG encoder. The goal of this experiment was to determine the impact of the faults on the running application. Figure 12 reports some DCT degradations. Figure 12a shows the golden output, while Fig. 12b, c and d show the output affected by a PSNR of 40 dB, 30 dB and 5 dB, respectively. As can be seen, a PSNR of 5 dB translates into an unacceptable degradation of the output, while for larger values of PSNR the outputs are viable. Based on this analysis, there is no need to take into account differences in the output caused by faults leading to large PSNRs (as they do not significantly affect the output). Only the "critical faults" (i.e., those resulting in crashes or in a PSNR lower than 5 dB) have to be classified as SDC.
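For reference, PSNR here is presumably the standard peak signal-to-noise ratio between the golden and the faulty outputs; the paper does not restate the definition, but the usual form is

\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)

where MAX_I is the maximum representable pixel value (255 for 8-bit images) and MSE is the mean squared error between the two images; a lower PSNR therefore means a stronger degradation.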

Figure 13 presents the results for the error reduction when accepting output variations. It is clear that an output variation tolerance of about 2.5% is enough to produce a large reduction in error occurrence. Each variant presented different results in that evaluation, but a general trend is clear: most of the errors in this application's outputs are small, i.e., the final value does not differ much from the expected one. An approximate computing system that is able to tolerate those small errors may benefit from this output relaxation to provide reliability.

Figure 14 presents the effect of the output variation tolerance on the number of perceived errors for the Trapezoid algorithm. It has a very different behavior from the other benchmarks. In the worst case, for variant 2, the occurrence of SDC errors drops by more than 25%, but it then remains constant even when accepting more significant variations. The other variants presented a drop in total SDC errors of about 70% when accepting up to 2.5% output variation from the golden value. This unexpected Trapezoid behavior can be explained by its already low error occurrence: because Trapezoid already presented much fewer errors than Simpson, it has fewer errors to tolerate, and thus a larger share of them is significant.

Fig. 14 Trapezoid error occurrence drop in relation to output variation tolerance

The error drop when varying the output value error tolerance for the Newton-Raphson algorithm is shown in Fig. 15. It presents the same trend as the Simpson benchmark. Variant 3 of Newton-Raphson is the one that had the lowest drop in error count when increasing the tolerance. It means that its errors had a higher difference in relation to the expected output; in other words, they were "more erroneous". The error drop stagnates after around 4% of output variation tolerance.

All the results from the output tolerance variation show significant error drops. Tolerating small deviations from the expected output of an algorithm is the very definition of approximate computing. Those results indicate that approximate computing not only can be applied to safety-critical systems, but might even improve their fault tolerance. However, it is important to notice that some systems may not tolerate even minimal output deviations. Those are not good candidates for any approximate computing technique.

Fig. 15 Newton-Raphson error occurrence drop in relation to output variation tolerance


5 Conclusions

After analyzing successive approximation algorithms under two different fault injection methodologies, it becomes clear that this is a promising approach to approximate computing. As expected, faults injected on different parts of the system had very distinct effects. Nevertheless, the results indicate that those algorithms present an inherent fault tolerance.

Results show that faults affecting the data cache memory can be mitigated by increasing the algorithm loop size. However, as the comparison between the Trapezoid and Simpson benchmarks shows, the number of iterations alone is not enough to assure that a method will achieve good resilience: the algorithm itself has a significant impact on fault tolerance. Faults affecting the register file, however, are normally not mitigated by increasing the number of iterations of a loop-based algorithm. In all cases, more than 82% of the register file faults caused no errors.

All the benchmarks showed a trend of having a significant drop in the number of SDC errors for small output variation tolerances. It shows that most of the SDC-type errors affecting approximate computing by successive approximation algorithms have values not very different from the expected one. In other words, the errors are not significant. Many applications that use this kind of algorithm may tolerate small variations in the output without a problem. For those applications, successive approximation arises as an ideal method for approximate computing.

The usage of this kind of algorithm in safety-critical systems without the addition of external fault tolerance methods remains questionable. The results presented in this work point towards the idea that they are indeed capable of correcting faults, especially if the system can tolerate small variations in the output. However, not all applications are capable of that. Also, the inherent fault tolerance of successive approximation algorithms may not be enough to provide sufficient reliability for a safety-critical application. In those cases, approximate computing could be used not only as a means of providing reliability but also as a way of improving fault tolerance methods, making them more reliable and less costly.

References

1. Chapter two: The method of successive approximations. In: Bellman R, Cooke KL, Lockett JA (eds) Algorithms, Graphs and Computers. Mathematics in Science and Engineering, vol 62, pp 49–100. Elsevier (1970). https://doi.org/10.1016/S0076-5392(08)61895-0. http://www.sciencedirect.com/science/article/pii/S0076539208618950

2. Anghel L, Benabdenbi M, Bosio A, Traiola M, Vatajelu EI (2018) Test and reliability in approximate computing. Journal of Electronic Testing. https://doi.org/10.1007/s10836-018-5734-9

3. Baumann RC (2005) Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans Device Mater Reliab 5(3):305–316. https://doi.org/10.1109/TDMR.2005.853449

4. Buchner SP, Miller F, Pouget V, McMorrow DP (2013) Pulsed-laser testing for single-event effects investigations. IEEE Trans Nucl Sci 60(3):1852–1875. https://doi.org/10.1109/TNS.2013.2255312

5. Han J, Orshansky M (2013) Approximate computing: an emerging paradigm for energy-efficient design. In: Proc 18th IEEE European Test Symposium (ETS), pp 1–6. https://doi.org/10.1109/ETS.2013.6569370

6. Kooli M, Di Natale G, Bosio A (2019) Memory-aware design space exploration for reliability evaluation in computing systems. Journal of Electronic Testing. https://doi.org/10.1007/s10836-019-05785-0

7. Leveugle R, Calvez A, Maistri P, Vanhauwaert P (2009) Statistical fault injection: quantified error and confidence. In: Proc Design, Automation & Test in Europe Conference & Exhibition (DATE), pp 502–506. https://doi.org/10.1109/DATE.2009.5090716

8. Nair R (2014) Big data needs approximate computing: technical perspective. Commun ACM 58(1):104. https://doi.org/10.1145/2688072

9. de Oliveira AB, Tambara LA, Kastensmidt FL (2017) Exploring performance overhead versus soft error detection in lockstep dual-core ARM Cortex-A9 processor embedded into Xilinx Zynq APSoC. Springer International Publishing, Cham, pp 189–201

10. Pouget V, Douin A, Foucard G, Peronnard P, Lewis D, Fouillat P, Velazco R (2008) Dynamic testing of an SRAM-based FPGA by time-resolved laser fault injection. In: Proc 14th IEEE International On-Line Testing Symposium, pp 295–301. https://doi.org/10.1109/IOLTS.2008.39

11. Pouget V, Jonathas S, Job R, Vaille J, Wrobel F, Saigne F (2017) Structural pattern extraction from asynchronous two-photon laser fault injection using spectral analysis, vol 76–77. https://doi.org/10.1016/j.microrel.2017.07.028. http://www.sciencedirect.com/science/article/pii/S0026271417303128

12. Rajesh G, Vinayagasundaram B, Moorthy GS (2014) Data fusion in wireless sensor network using Simpson's 3/8 rule. In: Proc International Conference on Recent Trends in Information Technology, pp 1–5. https://doi.org/10.1109/ICRTIT.2014.6996201

13. Rodrigues GS, Kastensmidt FL (2017) Evaluating the behavior of successive approximation algorithms under soft errors. In: Proc 18th IEEE Latin American Test Symposium (LATS), pp 1–6. https://doi.org/10.1109/LATW.2017.7906764

14. Rodrigues GS, Kastensmidt FL, Pouget V, Bosio A (2018) Exploring the inherent fault tolerance of successive approximation algorithms under laser fault injection. In: Proc 19th IEEE Latin American Test Symposium (LATS), pp 1–6. https://doi.org/10.1109/LATW.2018.8349675

15. Trichina E, Korkikyan R (2010) Multi fault laser attacks on protected CRT-RSA. In: Proc Workshop on Fault Diagnosis and Tolerance in Cryptography, pp 75–86. https://doi.org/10.1109/FDTC.2010.14

16. Venkataramani S, Chakradhar ST, Roy K, Raghunathan A (2015) Approximate computing and the quest for computing efficiency. In: Proc 52nd Annual Design Automation Conference, DAC '15. ACM, New York, pp 120:1–120:6. https://doi.org/10.1145/2744769.2751163

17. Vollmer M (2004) Numerical integration of PDEs for safety critical applications implemented by I&C systems. In: Heisel M, Liggesmeyer P, Wittmann S (eds) Computer Safety, Reliability, and Security. Springer, Berlin, pp 269–282

18. Xu Q, Mytkowicz T, Kim NS (2016) Approximate computing: a survey. IEEE Design & Test 33(1):8–22. https://doi.org/10.1109/MDAT.2015.2505723


Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Gennaro S. Rodrigues holds a computer engineering degree from the Federal University of Rio Grande do Sul (UFRGS), Brazil (2016). He has also studied "Electronique, Robotique et Informatique Industrielle" at Polytech Montpellier, France, under the auspices of the exchange program BRAFITEC, and is currently a Ph.D. student in microelectronics at UFRGS, in a co-direction project with the Laboratoire d'Informatique, de Robotique et de Microelectronique de Montpellier (LIRMM), France. His research efforts focus on fault tolerance methods, safety-critical systems, and approximate computing. Lately, he has been studying the use of approximate computing to provide fault tolerance at low cost.

Adria Barros de Oliveira holds a Bachelor's degree in Teleinformatics Engineering from the Federal University of Ceara (2014) with emphasis in Computer Engineering. She did an undergraduate sandwich program in Informatics Engineering at the University of Alicante, Spain (2013). Currently, she is a PhD student in microelectronics at the Federal University of Rio Grande do Sul, where she also did her master's degree. Her research focuses on fault tolerance solutions to protect embedded processors against soft errors provoked by radiation effects. Her main interests include processor reliability, reconfigurable architectures, and fault tolerance.

Fernanda Lima Kastensmidt holds a degree in Electrical Engineering from the Federal University of Rio Grande do Sul (1997), a Master's degree in Computer Science from the Federal University of Rio Grande do Sul (1999) and a PhD in Computer Science from the Federal University of Rio Grande do Sul (2003). She is currently Associate Professor at the Federal University of Rio Grande do Sul and Coordinator of the Graduate Program in Microelectronics (PGMICRO). She has experience in the areas of Microelectronics and Computer Engineering, with emphasis on hardware, working mainly on the following topics: radiation fault protection techniques, fault-tolerant systems design, programmable architectures, FPGAs, systems qualification, and integrated circuits under failures and fault modeling. She is the author of the book Fault Tolerance Techniques for SRAM-based FPGAs, published in 2006 by Springer, and co-author of 3 more scientific books. She participated in the payload project of the NanoSat-BR1 satellite, launched in June 2014, and is currently in the NanoSat-BR2 project, where part of the payload is responsible for analyzing the effects of the SAA on integrated circuits manufactured in nanometric technology.

Vincent Pouget received his PhD in Applied Physics in 2000 from the University of Bordeaux, France. He is currently a research scientist with the CNRS, the French National Center for Scientific Research, in the Nanoelectronics group at the IMS (Integration from Material to Systems) Laboratory, where he is in charge of the ATLAS laser facility. He is also the founder of the PULSCAN company. His main research interests include laser applications for the semiconductor industry and reliability issues in advanced technologies.

Alberto Bosio got his PhD in Computer Engineering in the area of digital systems dependability at the Politecnico di Torino (Italy) in 2006. He was an Associate Professor at the Universite de Montpellier from 2007. Since 2018 he is a Full Professor at INL - Ecole Centrale de Lyon. He is the co-author of 1 book, 35 international journal papers, 3 patents, 7 invited papers, 2 embedded tutorials and more than 100 papers in international conferences. He supervised 13 Ph.D. students and actively participated in 19 European- and national-funded projects and research contracts with industrial partners. He has served as a committee and organizing member in several international conferences, as well as a reviewer for many international journals. He is a member of the IEEE and the Chair of the European Test Technology Technical Council (eTTTC).