Runtime Verification and Validation for Multi-Core Based On-Board Computing. Hans P. Zima and Mark L. James, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, {zima,mjames}@jpl.nasa.gov. High Performance Embedded Computing (HPEC) Workshop, MIT Lincoln Laboratory, September 23rd, 2009.

Page 1

Runtime Verification and Validation for Multi-Core Based On-Board Computing

Hans P. Zima and Mark L. James

Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA

{zima,mjames}@jpl.nasa.gov

High Performance Embedded Computing (HPEC) Workshop

MIT Lincoln Laboratory, September 23rd, 2009

Page 2

Contents

1. Introduction

2. An Introspection Framework for Fault Tolerance

3. V&V versus Introspection

4. Automatic Generation of Fault-Tolerant Code

5. Concluding Remarks

The work described in this presentation has been funded by the JPL Research and Technology (R&TD) project 01STCR R.08.023.015, “Introspection Framework for Fault Tolerance in Support of Autonomous Space Systems”

Page 3

NASA/JPL: Potential Future Missions (Artist Concepts)

Neptune Triton Explorer

Europa Astrobiology Laboratory

Titan Explorer

Europa Explorer

Mars Sample Return

Page 4

New Requirements

Future missions and the limited downlink to Earth lead to two major new requirements:

1. Autonomy

2. High-Capability On-Board Computing

Missions will require on-board computational power ranging from tens of Gigaflops to Teraflops.

Emerging multi-core technology is expected to provide this capability.

Page 5

Future Multi-Core Architectures: From 10s to 100s of Processors on a Chip

Tile64 (Tilera Corporation, 2007)

64 identical cores, arranged in an 8x8 grid

iMesh on-chip network, 27 Tb/sec bandwidth

170-300 mW per core; 600 MHz to 1 GHz

192 GOPS (32 bit), about 10 GOPS/Watt

Maestro (2010/11)

RHBD 7x7 grid, SW-compatible version of Tile64 with FP

270 mW per core; 480 MHz; 70 GOPS; max 28 W

Kilocore 1025 (Rapport Inc. and IBM, 2008)

PowerPC and 1024 8-bit processing elements (125 MHz)

32x32 “stripes” dedicated to different tasks

80-core research chip from Intel (2011)

2D on-chip mesh network for message passing

1.01 TF (3.16 GHz); 62 W power, 16 GOPS/Watt
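The efficiency figures quoted on this slide can be cross-checked with simple arithmetic. The sketch below rests on assumptions that the slide does not state: that the Tile64 figure corresponds to all 64 cores drawing the upper bound of 300 mW each, and that the Maestro and Intel figures are taken at the quoted 28 W and 62 W. Note that the Tile64 and Maestro numbers count 32-bit operations while the Intel number counts floating-point operations, so the ratios are not directly comparable.

```latex
\begin{align*}
\text{Tile64:}\quad & \frac{192\ \text{GOPS}}{64 \times 0.3\ \text{W}}
   = \frac{192\ \text{GOPS}}{19.2\ \text{W}} = 10\ \text{GOPS/W} \\
\text{Maestro:}\quad & \frac{70\ \text{GOPS}}{28\ \text{W}} = 2.5\ \text{GOPS/W} \\
\text{Intel 80-core:}\quad & \frac{1010\ \text{GFLOPS}}{62\ \text{W}} \approx 16.3\ \text{GFLOPS/W}
\end{align*}
```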

The issue of dependability, and in particular fault tolerance, has to be addressed in this new context.

Page 6

Dependability*

The ability of a computing system to deliver service that can be justifiably trusted

Threats: Faults, Errors, Failures

Attributes: Availability, Reliability, Safety, Integrity, Maintainability

Means (Fault Protection): Fault Prevention, Fault Removal, Fault Tolerance

*A. Avizienis, J.-C. Laprie, B. Randell: Fundamental Concepts of Dependability. UCLA CSD Report 010028, 2000

Page 7

Threats: the Fault-Error-Failure Chain

Fault (defect in a system) -> activation -> Error (invalid system state) -> propagation to service boundary -> Failure (violation of system specification)

Other labels in the diagram: propagation (of errors); external fault (caused by external failure); System Boundary (Service Interface)

Page 8

Means (Fault Protection)

Fault Prevention: via quality control during design and manufacturing of hardware and software

structured programming, modularization, information hiding; firewalls

shielding and radiation hardening

Fault Removal: Verification and Validation (V&V), model checking, etc.

Fault Tolerance: the ability to preserve the delivery of correct service (system specification) in the presence of active faults

error detection

recovery: error handling and fault handling

fault masking: redundancy-based recovery without explicit error detection (e.g., TMR)
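Fault masking by TMR (triple modular redundancy) can be illustrated with a majority voter. This is a minimal sketch, not code from the presentation: the function names and the injected bit flip are hypothetical, and the three replicas are plain Python callables rather than separate cores.

```python
from collections import Counter
from typing import Callable, Sequence


def tmr_vote(results: Sequence[int]) -> int:
    """Return the value that at least two of the three replicas agree on."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three replica results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three replicas disagree")
    return value


def run_tmr(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run the same computation on three replicas and mask one wrong result."""
    return tmr_vote([replica(x) for replica in replicas])


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Hypothetical fault: bit 3 of the result is flipped.
    return (x * x) ^ (1 << 3)


# The faulty replica returns 17, but it is outvoted by the two correct results.
masked = run_tmr([square, faulty_square, square], 5)
```

The voter never identifies which replica failed; it only masks the wrong value, which matches the slide's description of recovery without explicit error detection.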

Page 9

Fault Tolerance

Fault -> (activation) -> Error -> (propagation to service boundary) -> Failure

Fault tolerance interrupts this chain (X) and leads to a well-defined system state through:

elimination of detected errors

prevention of fault activations

Page 10

Contents

1. Introduction

2. An Introspection Framework for Fault Tolerance

3. V&V versus Introspection

4. Automatic Generation of Fault-Tolerant Code

5. Concluding Remarks

Page 11

A Framework for Introspection

Introspection…

provides dynamic monitoring, analysis, and feedback, enabling the system to become self-aware and context-aware:

monitoring execution behavior

reasoning about its internal state

changing the system or system state when necessary

adaptively exploits the threads available in a multi-core system

can be applied to a range of different scenarios, including:

fault tolerance

performance tuning

power management

behavior analysis

Page 12

An Introspection Module

Application and Introspection System, linked by sensors and actuators

Inference Engine (SHINE): Monitoring, Analysis, Recovery, Prognostics

Knowledge Base: System Knowledge, Application Knowledge, Domain Knowledge
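The sensor, inference, and actuator structure of such a module can be sketched as a small rule-driven loop. This is only an illustration under stated assumptions: it does not use or imitate the SHINE API, and every name in it (Rule, IntrospectionModule, the heartbeat rule, restart_task) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Reading = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[Reading], bool]  # analysis: does the reading indicate an error?
    action: Callable[[Reading], None]     # recovery: what the actuator should do


@dataclass
class IntrospectionModule:
    rules: List[Rule]                     # stands in for the knowledge base
    log: List[str] = field(default_factory=list)

    def step(self, reading: Reading) -> None:
        """Evaluate one sensor reading against all rules and fire matching actions."""
        for rule in self.rules:
            if rule.condition(reading):
                self.log.append(rule.name)
                rule.action(reading)


# Hypothetical application state observed through a sensor.
app_state = {"heartbeat_age": 0.0, "restarts": 0}


def read_sensors() -> Reading:
    return {"heartbeat_age": app_state["heartbeat_age"]}


def restart_task(_reading: Reading) -> None:
    # Actuator: recover by restarting the task and resetting its heartbeat.
    app_state["restarts"] += 1
    app_state["heartbeat_age"] = 0.0


module = IntrospectionModule(
    rules=[Rule("task_hang", lambda r: r["heartbeat_age"] > 2.0, restart_task)]
)

for age in (0.5, 1.0, 3.5):
    app_state["heartbeat_age"] = age
    module.step(read_sensors())
```

Only the third reading exceeds the assumed 2.0 second threshold, so the recovery action fires once.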

Page 13

Example: SHINE Diagnostics and Prognostics Architecture for DSN Health Management

[Block diagram. Recoverable component labels: FDIR Front-End, DSN Processor; FDIR Data Handler; Station Monitor Data; BEAM Frame Generator; Time Variant Slot Filler; Stability State Estimate; Stability Limit Estimate; Stability Limit Checker; Inter-Channel Relationships; Model-based State Diagnosis; Fault Identification; Prognostics; State Summarizer; Channel Level Diagnostics; Failing Channel Contexts; DSSC Health Assessor (Heartbeats); System-Wide Health Assessment (SHINE); Diagnostics (SHINE); Planning; Planner GUI; knowledge bases (KB); Reports & Logs; Files; Complex, Station and Subsystem Level Health Summaries. GUI outputs: English-Based Diagnosis, Block Diagrams, Failure Predictions, Channel Diagnosis, BEAM Specific Displays.]

Page 14

Contents

1. Introduction

2. An Introspection Framework for Fault Tolerance

3. V&V versus Introspection

4. Automatic Generation of Fault-Tolerant Code

5. Concluding Remarks

Page 15

Traditional V&V versus Introspection: Key Differences

Verification and Validation (V&V)

is applied before actual program execution, either to the static source program or in the context of test executions for particular input set(s)

focuses on design errors

Introspection

performs execution-time monitoring, analysis, and recovery

current approach focuses on transient errors

can, in principle, also address design errors

Page 16

Limitations of V&V

Verification: theoretical limits

undecidability: many problems are inherently unsolvable (e.g., halting problem, constant propagation)

NP-completeness: many theoretically solvable problems are intractable due to exponential complexity (e.g., SAT problem, many graph problems)

Model checking: subject to scalability challenge

exponential growth of state space

Test

tests can prove presence or absence of faults for specific input sets

but they cannot prove their absence for all inputs (Edsger Dijkstra)

V&V is inherently unable to deal with transient errors or execution anomalies
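The exponential growth of the state space can be made concrete with a toy explicit-state exploration. The sketch below is an assumption-laden illustration, not a model checker from the presentation: the system is a set of independent components that each cycle through a few local states, and the function name is hypothetical.

```python
from collections import deque
from typing import Tuple

State = Tuple[int, ...]


def reachable_states(num_components: int, states_per_component: int = 2) -> int:
    """Count reachable global states by explicit breadth-first exploration."""
    start: State = (0,) * num_components
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for i in range(num_components):
            # One component advances (cyclically) to its next local state.
            next_local = (state[i] + 1) % states_per_component
            successor = state[:i] + (next_local,) + state[i + 1:]
            if successor not in seen:
                seen.add(successor)
                frontier.append(successor)
    return len(seen)


small = reachable_states(10)      # ten two-state components
medium = reachable_states(4, 3)   # four three-state components
```

With k local states per component and n components the exploration visits k^n global states, so each added component multiplies the work by k.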

Page 17

Introspection Can Complement V&V

Introspection performs execution-time monitoring, analysis, and recovery

Introspection can deal with transient errors, execution anomalies, performance problems

this capability is inherently beyond the scope of V&V technology

and it can be extended to deal with design errors

Future Goal: integration of introspection with current V&V technology into a comprehensive V&V scheme

Page 18

Contents

1. Introduction

2. An Introspection Framework for Fault Tolerance

3. V&V versus Introspection

4. Automatic Generation of Fault-Tolerant Code

5. Concluding Remarks

Page 19

Effects of Single Event Upsets (SEUs)

SEUs and MBUs (multiple-bit upsets) are radiation-induced transient hardware errors, which may corrupt software in multiple ways:

instruction codes and addresses

user data structures

synchronization objects

protected OS data structures

synchronization and communication

Potential effects include:

wrong or illegal instruction codes and addresses

wrong user data in registers, cache, or DRAM

control flow errors

unwarranted exceptions

hangs and crashes

synchronization and communication faults

Page 20

Design and Interaction Faults …

…can result in errors similar to those that can be caused by transient faults

Page 21

Fault Examples: Sequential Threads

Corruption of the assignment to variable a:

S1: a = exp …
S2: x = f(a)

after the fault (X), S2 performs an undefined assignment to x

Corruption of variable n:

for i in 1:n { S(i) }

after the fault, the loop range is destroyed: for i in ?:? { S(i) }

Corruption of condition cond:

if cond then S1 else S2 end if

after the fault (X): if ??? then ? else ? end if, a branch to a wrong or undefined target
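The loop-range example can be reproduced with a simulated bit flip and a plausibility check on the loop bound. This is a minimal sketch under assumptions: the helper names, the exception class, and the expected upper bound of 1000 are hypothetical, and a range check of this kind only catches upsets that push n outside the expected interval.

```python
from typing import Callable, List


class LoopBoundError(Exception):
    """Raised when a loop bound fails its plausibility check."""


def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, as a single event upset might do."""
    return value ^ (1 << bit)


def checked_loop(n: int, n_max: int, body: Callable[[int], None]) -> int:
    """Run body(i) for i in 1..n, but only if n lies in the expected range."""
    if not (0 <= n <= n_max):
        raise LoopBoundError(f"loop bound {n} outside [0, {n_max}]")
    for i in range(1, n + 1):
        body(i)
    return n


visited: List[int] = []
n = 10
checked_loop(n, 1000, visited.append)      # fault-free run: ten iterations

corrupted_n = flip_bit(n, 20)              # hypothetical upset in bit 20 of n
try:
    checked_loop(corrupted_n, 1000, visited.append)
    detected = False
except LoopBoundError:
    detected = True
```

A flip in a low-order bit (for example 10 becoming 11) would pass this check, which is why such detectors are usually combined with other assertions.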

Page 22

Example: Parallel Algorithms, Safety Violation

P1:

repeat forever
  …
  wait(mutex)
  Critical Section 1: exclusive access of P1 to resource R
  signal(mutex)
  …
end repeat

P2:

repeat forever
  …
  wait(mutex)
  Critical Section 2: exclusive access of P2 to resource R
  signal(mutex)
  …
end repeat

Corruption of mutex by a fault (X) destroys the safety property of the program
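The safety violation can be replayed deterministically with a simulated semaphore and a runtime monitor for the property "at most one process is in the critical section". The sketch is single-threaded on purpose so that the interleaving is fixed; both classes and the way the fault is injected are hypothetical illustrations.

```python
from typing import FrozenSet, List, Set, Tuple


class BinarySemaphore:
    """Single-threaded stand-in for a mutex; try_wait() returns False instead of blocking."""

    def __init__(self) -> None:
        self.value = 1

    def try_wait(self) -> bool:
        if self.value > 0:
            self.value -= 1
            return True
        return False

    def signal(self) -> None:
        self.value = 1


class MutualExclusionMonitor:
    """Checks the safety property: at most one process is inside the critical section."""

    def __init__(self) -> None:
        self.inside: Set[str] = set()
        self.violations: List[Tuple[str, FrozenSet[str]]] = []

    def enter(self, process: str) -> None:
        if self.inside:
            # Record who entered and who was already inside.
            self.violations.append((process, frozenset(self.inside)))
        self.inside.add(process)

    def leave(self, process: str) -> None:
        self.inside.discard(process)


mutex = BinarySemaphore()
monitor = MutualExclusionMonitor()

p1_acquired = mutex.try_wait()       # P1 acquires the mutex
monitor.enter("P1")
p2_blocked = not mutex.try_wait()    # P2 has to wait while P1 holds it

mutex.value = 1                      # hypothetical fault: mutex state is corrupted
if mutex.try_wait():                 # P2 gets in although P1 never signalled
    monitor.enter("P2")
```

The monitor keeps its own record of who is inside, so it still notices the violation even though the mutex itself reports nothing wrong.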

Page 23

Example: Parallel Algorithms, Data Race

forall i in D, independent
  A(f(i)) = exp(i)
end

Fault resulting in the existence of indices i1, i2, such that f(i1) = f(i2) = j: the assignments to A(f(i1)) and A(f(i2)) both write element A(j) of A(1) … A(n), which is a data race
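Whether two iterations write the same element can be checked at run time by grouping the iterations by their write target. This is a minimal sketch with hypothetical names; it assumes the index function f can be evaluated for every i in D before the loop runs.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def write_conflicts(
    domain: Iterable[int], f: Callable[[int], Hashable]
) -> Dict[Hashable, List[int]]:
    """Map each target j to the iterations writing A(j), keeping only contested targets."""
    writers: Dict[Hashable, List[int]] = defaultdict(list)
    for i in domain:
        writers[f(i)].append(i)
    return {j: its for j, its in writers.items() if len(its) > 1}


permutation = [3, 1, 2, 0]   # fault-free index map: a permutation, so no collisions
corrupted = [3, 1, 1, 0]     # hypothetical fault: entry 2 was overwritten with 1

no_conflicts = write_conflicts(range(4), lambda i: permutation[i])
conflicts = write_conflicts(range(4), lambda i: corrupted[i])
```

An empty result means f is injective on D, so the independent assertion holds for this execution; a non-empty result names the colliding iterations.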

Page 24

Example: Parallel Algorithms, Deadlock

[Figure: resource allocation graph in which processes P0, P1, …, Pn-1 and resources R1, …, Rn form a cycle]

Cyclic resource allocation graph results in deadlock (as a result of programming or runtime error)

Static approach/analysis: deadlock prevention

Runtime analysis: deadlock avoidance and detection
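Runtime deadlock detection amounts to finding a cycle in the resource allocation graph. The sketch below uses a depth-first search; the graph encoding (an edge from a process to a resource it requests, and from a resource to the process holding it) and the function name are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional


def find_cycle(graph: Dict[Hashable, List[Hashable]]) -> Optional[List[Hashable]]:
    """Return one cycle of the directed graph as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path: List[Hashable] = []

    def visit(node: Hashable) -> Optional[List[Hashable]]:
        color[node] = GREY
        path.append(node)
        for succ in graph.get(node, []):
            state = color.get(succ, WHITE)
            if state == GREY:             # back edge: succ is on the current path
                return path[path.index(succ):]
            if state == WHITE:
                cycle = visit(succ)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# P0 requests R1, which is held by P1; P1 requests R2, which is held by P0.
deadlocked = {"P0": ["R1"], "R1": ["P1"], "P1": ["R2"], "R2": ["P0"]}
# Same requests, but R2 is free, so there is no cycle.
healthy = {"P0": ["R1"], "R1": ["P1"], "P1": ["R2"], "R2": []}

cycle = find_cycle(deadlocked)
no_cycle = find_cycle(healthy)
```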

Page 25

Leveraging Results from HPC Program Analysis Technology for Fault Tolerance

High-Performance Computing has produced a wealth of methods for parallel program analysis since the 1990s

source program static analysis and restructuring

dynamic performance and behavior analysis of program executions

Many of these methods can be adapted to or directly applied to fault tolerance

static program analysis (variable use, dependences, call chains)

dynamic analysis of program and data flow

This provides a basis for the generation of error-checking assertions and error correction and recovery

Page 26

Analysis

Static analysis and profiling determine properties of dynamic program behavior before actual execution

Analysis of Sequential Threads

control flow graph: represents the control structure in a program unit

data flow analysis: solves data flow problems over a program graph

dependence analysis: determines relationships between assignments of values to (possibly subscripted, or pointer) variables and their uses

program slice: the set of all statements that can affect a variable’s value

call graph: represents method/function/procedure calling relationships

Analysis of Parallel Constructs

data parallel loops: analysis of “independence” property

locality and communication analysis

race condition analysis

safety and liveness analysis

deadlock analysis

Page 27

Control Flow and Data Flow Analysis

Control Flow Graph G = (N, E, n0) models the control structure in a program unit (control flow analysis)

N: set of basic blocks

E: control transfers between basic blocks

n0: initial node

Monotone Data Flow Systems M = (L, F, G, g)

L is a bounded semilattice representing the objects of interest, e.g., “definitions” reaching a basic block, variable-value associations, available expressions

F: monotone function space over L

G = (N, E, n0): flow graph; g: N → F

under appropriate conditions, an optimal “meet over all paths (MOP)” solution can be determined by a general algorithm

Example flow graph with nodes 1 to 6 and node functions g(1), …, g(6):

N = {1, 2, 3, 4, 5, 6}

E = {(1,2), (2,3), (2,4), (3,5), (4,5), (5,6), (6,2)}

L: P(DEFS), the powerset of all “definitions”

g(n): transformation function for n ∈ N

For example, the transformation along path 1 → 2 → 3 → 5 → 6 can be computed as g(6) ∘ g(5) ∘ g(3) ∘ g(2) ∘ g(1) (L0)
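The example graph can be run through a reaching-definitions analysis to show both the path composition and an iterative solution. This is a sketch under stated assumptions: node 1 is taken as the entry node, the definitions d1, d3, d4 and their gen/kill sets are hypothetical, and the solver is an ordinary worklist iteration rather than the specific algorithm the slide refers to.

```python
from functools import reduce
from typing import Dict, FrozenSet, List, Tuple

Defs = FrozenSet[str]

NODES: List[int] = [1, 2, 3, 4, 5, 6]
EDGES: List[Tuple[int, int]] = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 2)]

# Hypothetical definitions: d1 (node 1) and d3 (node 3) define x, d4 (node 4) defines y.
GEN: Dict[int, Defs] = {1: frozenset({"d1"}), 3: frozenset({"d3"}), 4: frozenset({"d4"})}
KILL: Dict[int, Defs] = {1: frozenset({"d3"}), 3: frozenset({"d1"})}
EMPTY: Defs = frozenset()


def g(n: int, x: Defs) -> Defs:
    """Transformation function of node n: gen(n) joined with (x minus kill(n))."""
    return GEN.get(n, EMPTY) | (x - KILL.get(n, EMPTY))


def along_path(path: List[int], x0: Defs = EMPTY) -> Defs:
    """Compose the node functions along a path, applied to the initial value x0."""
    return reduce(lambda x, n: g(n, x), path, x0)


def reaching_definitions() -> Tuple[Dict[int, Defs], Dict[int, Defs]]:
    """Worklist iteration; the meet operation is set union over all predecessors."""
    preds = {n: [a for a, b in EDGES if b == n] for n in NODES}
    succs = {n: [b for a, b in EDGES if a == n] for n in NODES}
    inn: Dict[int, Defs] = {n: EMPTY for n in NODES}
    out: Dict[int, Defs] = {n: EMPTY for n in NODES}
    worklist = list(NODES)
    while worklist:
        n = worklist.pop(0)
        inn[n] = EMPTY.union(*(out[p] for p in preds[n]))
        new_out = g(n, inn[n])
        if new_out != out[n]:
            out[n] = new_out
            worklist.extend(s for s in succs[n] if s not in worklist)
    return inn, out


path_result = along_path([1, 2, 3, 5, 6])
inn, out = reaching_definitions()
```

Along the single path 1, 2, 3, 5, 6 only d3 survives, while the solution at node 6 also contains d1 and d4 because it merges the paths through node 4 and around the loop.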

Page 28

Dependence Analysis

Dependence is a relation in a set of statement executions that characterizes their access (R/W) to common variables

In general, dependence analysis may not be statically feasible, for example, in a 2D Euler solver performing a sweep over an irregular grid

“Optimistic parallelization” (or the correctness of an independent assertion) may have to be dynamically verified

for e in all_edges do [ independent ]
  …
  delta = f(GRID(vx1(e)).V1, GRID(vx2(e)).V1)
  …
  GRID(vx1(e)).V2 -= delta
  GRID(vx2(e)).V2 += delta
  …
end

[Figure: edge e connecting grid vertices GRID(vx1(e)) and GRID(vx2(e))]

Page 29

Generation of Assertion-Based Fault Tolerance

[Figure: source program P is processed by program analysis and profiling, followed by instrumentation and error detector generation, which yields P*, the instrumented source program with error detectors. Further labels in the diagram: Assertions, User, Knowledge Base.]

Examples:

assert (A(i)>B(i)) and (D<Epsilon) at L; error (FT1,i,A(i),B(i),D,Epsilon)

assert (x < y+1000) invariant in (Loop1); error (FT2,…)

assert independence in (ParLoop); error (FT3,…)
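The first two assertion examples can be mimicked by a small runtime monitor that evaluates a condition and, when it fails, records a fault tag together with the values involved. This is a hypothetical sketch of what instrumented code might do, not the generated code from the presentation; the class name, the sample data, and the choice to log rather than recover are all assumptions.

```python
from typing import Any, List, Tuple

ErrorRecord = Tuple[str, Tuple[Any, ...]]


class AssertionMonitor:
    """Collects error records for assertions that fail at run time."""

    def __init__(self) -> None:
        self.errors: List[ErrorRecord] = []

    def check(self, condition: bool, fault_tag: str, *context: Any) -> bool:
        """Record (fault_tag, context) when the asserted condition is false."""
        if not condition:
            self.errors.append((fault_tag, context))
        return condition


monitor = AssertionMonitor()

# Point assertion, modeled on: assert (A(i)>B(i)) and (D<Epsilon) at L; error (FT1,...)
A = [5.0, 7.0, 2.0]
B = [1.0, 9.0, 1.0]
D, EPSILON = 0.01, 0.1
for i in range(len(A)):
    monitor.check(A[i] > B[i] and D < EPSILON, "FT1", i, A[i], B[i], D, EPSILON)

# Loop invariant, modeled on: assert (x < y+1000) invariant in (Loop1); error (FT2,...)
x, y = 0, 0
for _ in range(3):
    x += 400
    monitor.check(x < y + 1000, "FT2", x, y)
```

In a full system the error records would be handed to the introspection module for analysis and recovery instead of being kept in a list.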

Page 30

Concluding Remarks

Future deep-space missions will require on-board high-capability computing for support of autonomy and science

Introspection

provides a generic framework for dynamic monitoring and analysis of program execution

can exploit multi-core technology

a prototype framework for introspection supporting fault tolerance has been implemented for the Tile64 architecture

Future work will address automatic analysis and generation of application-adaptive fault-tolerant code

Integration of introspection with conventional V&V technology adds a new dimension to fault tolerance

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology