If you fix everything you lose fixes for everything else


Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)

International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007


This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.


What does this mean?

Q: for what models does (a few peeks) = (many hard stares)?

A supposedly NP-hard task: abduction over first-order theories

nogood/2

A: models with “collars”

Grow
– Monte Carlo a model
   picking input settings at random
– For each run
   score each output
   add the score to each input setting

Harvest
– Rule generation experiments, favoring settings with better scores

If “collars”, then
– … small rules …
– … learned quickly …
– … will suffice

“Collar” variables set the other variables
– Narrows: Amarel in the 60s
– Minimal environments: DeKleer ’85
– Master variables: Crawford & Baker ’94
– Feature subset selection: Kohavi & John ’97
– Back doors: Williams et al ’03
– Etc.

Implications for uncertainty?

Feather & Menzies RE’02
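The grow and harvest steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_model`, its three inputs and their ranges are hypothetical stand-ins, a lower score is taken to be better, and settings are ranked by mean score rather than a raw sum. The harvest here only ranks settings; the rule generation experiments are not shown.

```python
import random
from collections import defaultdict

def toy_model(settings):
    """Hypothetical model: input 'a' is the collar; 'b' and 'c' barely matter."""
    return 10 * settings["a"] + 0.1 * settings["b"] + 0.1 * settings["c"]

def grow(model, ranges, runs, rng):
    """Monte Carlo the model: pick settings at random, then add each run's
    score to every (input, value) setting used in that run."""
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(runs):
        settings = {name: rng.choice(values) for name, values in ranges.items()}
        score = model(settings)
        for setting in settings.items():
            total[setting] += score
            count[setting] += 1
    # Assumption: use the mean score per setting so sampling frequency does not bias the ranking.
    return {s: total[s] / count[s] for s in total}

def harvest(mean_scores, top):
    """Return the `top` settings with the best (lowest) mean score."""
    return sorted(mean_scores, key=mean_scores.get)[:top]

rng = random.Random(1)
ranges = {"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}
best = harvest(grow(toy_model, ranges, runs=2000, rng=rng), top=1)
print(best)
```

In this toy model the collar variable dominates the score, so a few thousand random runs are enough for its best setting to rise to the top of the ranking.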

STAR: collars + simulated annealing on Boehm’s USC software process models

USC software process models for effort, defects, threats
– y[i] = impact[i] * project[i] + b[i] for i in {1,2,3,…}
– project[i] lies between a lower and an upper bound: uncertainty in project description
– impact[i] lies between a lower and an upper bound: uncertainty in model calibration

Random solution
– Pick project[i] and impact[i] from anywhere within their bounds
– project[i] bounds set via domain knowledge; e.g. process maturity in 3 to 5
– impact[i] bounds known from history

Score solution by effort (Ef), defects (De) and threat (Th)

For example

uncontrollable

controllable
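A minimal Python sketch of drawing one random solution for y[i] = impact[i] * project[i] + b[i]. Every name and number below is a hypothetical placeholder except the process maturity range of 3 to 5, which comes from the slide. The sketch stops at y[i]: how the USC models combine these terms into Ef, De and Th is not given here and is not reproduced.

```python
import random

# Hypothetical (low, high) bounds; only "process maturity in 3 to 5" is from the slide.
PROJECT_RANGES = {"process_maturity": (3, 5), "team_experience": (1, 5)}
IMPACT_RANGES = {"process_maturity": (-0.5, -0.1), "team_experience": (-0.4, -0.05)}
B = {"process_maturity": 2.0, "team_experience": 1.5}  # hypothetical offsets b[i]

def random_solution(rng):
    """Pick project[i] and impact[i] anywhere within their bounds."""
    project = {i: rng.uniform(*PROJECT_RANGES[i]) for i in PROJECT_RANGES}
    impact = {i: rng.uniform(*IMPACT_RANGES[i]) for i in IMPACT_RANGES}
    return project, impact

def outputs(project, impact):
    """y[i] = impact[i] * project[i] + b[i] for every i."""
    return {i: impact[i] * project[i] + B[i] for i in project}

rng = random.Random(0)
project, impact = random_solution(rng)
y = outputs(project, impact)
```

Narrowing a `PROJECT_RANGES` entry corresponds to constraining a controllable setting; the `IMPACT_RANGES` entries play the role of the uncontrollable calibration uncertainty.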

Two studies: y[i] = impact[i] * project[i] + b[i]

One: certain methods
– Using much historical data
– Learn the magnitude of the impact[i] relationship
– With fixed impact[i], Monte Carlo at random across the project[i] settings

E.g.
– Regression-based tools that learn impact[i] from historical records
– 93 records of JPL systems
– SCAT: JPL’s current methods
– 2CEE: WVU’s improvement over SCAT (currently under test)

Two: methods with more uncertainty
– Using no historical data
– Monte Carlo at random across the project[i] settings and impact[i] settings

E.g. STAR
– Monte Carlo a model
– Score each output
– Sort settings by their “C”, where “C” = cumulative score
– Rule generation experiments, favoring settings with better “C”

Tame uncontrollables via historical records

Inside STAR

1. Sampling: simulated annealing
2. Summarizing: post-processor

for setting in Sx { value[setting] += E }

Sort all settings by their value
– Ignore uncontrollables impact[i]
– Assume the top i (1 ≤ i ≤ max) project[i] settings
– Randomly select the rest

“Policy point”
– Smallest i with lowest E

Median = 50th percentile
Spread = (75th - 50th) percentile

Plot labels: Good, Bad; 22 good ideas; 38 not-so-good ideas
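The summarizing step can be sketched as follows. This is a hedged illustration, not STAR itself: uniform random sampling stands in for simulated annealing, `RANGES` and `energy` are hypothetical, settings are ranked by mean E (an assumption, to avoid bias from sampling frequency), and when two top-ranked settings clash on one input the higher-ranked one wins.

```python
import random
import statistics

RANGES = {"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}  # hypothetical inputs

def energy(s):
    """Hypothetical energy E (lower is better)."""
    return 10 * s["a"] + 3 * s["b"] + 0.1 * s["c"]

def rank_settings(rng, runs=3000):
    """Sampling stand-in: uniform random draws instead of simulated annealing.
    value[setting] += E, then rank by mean E per setting (best first)."""
    value, seen = {}, {}
    for _ in range(runs):
        s = {k: rng.choice(v) for k, v in RANGES.items()}
        e = energy(s)
        for setting in s.items():
            value[setting] = value.get(setting, 0.0) + e
            seen[setting] = seen.get(setting, 0) + 1
    return sorted(value, key=lambda st: value[st] / seen[st])

def median_and_spread(xs):
    """Median = 50th percentile; spread = 75th minus 50th percentile."""
    _, q2, q3 = statistics.quantiles(xs, n=4)
    return q2, q3 - q2

def policy_point(ranked, rng, max_i, runs=200):
    """Fix the top i settings, pick the rest at random, and return the
    smallest i whose median E is lowest, plus the per-i summaries."""
    summary = []
    for i in range(1, max_i + 1):
        fixed = {}
        for name, val in ranked[:i]:
            fixed.setdefault(name, val)  # higher-ranked setting wins a clash
        es = []
        for _ in range(runs):
            s = {k: fixed.get(k, rng.choice(v)) for k, v in RANGES.items()}
            es.append(energy(s))
        summary.append(median_and_spread(es))
    medians = [m for m, _ in summary]
    return medians.index(min(medians)) + 1, summary

rng = random.Random(7)
ranked = rank_settings(rng)
best_i, summary = policy_point(ranked, rng, max_i=6)
```

`summary[i - 1]` holds the (median, spread) pair obtained when the top i settings are imposed, which is the kind of curve from which a policy point is read off.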

SCAT vs 2CEE vs STAR project[i]

Control impact[i] via historical data

Stagger around superset of possible impact[i]


Plot: Flight (effort), axis 0 to 1600; median and spread for SCAT, 2CEE and STAR.

Median: 50% point
Spread: (75 - 50)%

Flight (effort): STAR/2CEE = 50/800 = 6%; STAR/SCAT = 50/1300 = 4%

Plots: OSP (effort), Flight (effort), Ground (effort), OSP2 (effort); each shows median and spread for SCAT, 2CEE and STAR.

OSP (effort): STAR/2CEE = 400/1600 = 25%; STAR/SCAT = 400/1900 = 21%

Ignoring historical data is useful (!!!?)

If you fix everything, you lose fixes for everything else

Luke, trust the force, I mean, collars

IEEE Computer, Jan 2007: “The strangest thing about software”

Extra Material

Related work

Feather, DDP, treatment learning
– Optimization of requirement models

Xerox PARC, 1980s, qualitative representations (QR)
– Not overly specific
– Quickly collected in a new domain
– Used for model diagnosis and repair
– Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors

Abduction
– World W = minimal set of assumptions A (w.r.t. size) such that
   T ∪ A ⇒ G
   not (T ∪ A ⇒ error)
– Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
– Theoretically slow (NP-hard) but this should be practical:
   abduction + stochastic sampling
   find collars
   learn constraints on collars
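To make the abduction definition concrete, here is a brute-force Python sketch over a tiny propositional theory. The theory, the assumables and the name `abduce` are all hypothetical, and the exhaustive subset search is exactly the slow route that stochastic sampling is meant to avoid; it is shown only to illustrate the two conditions.

```python
from itertools import combinations

def closure(rules, facts):
    """Forward-chain Horn rules (body set, head) from the given facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known

def abduce(rules, assumables, goal):
    """Smallest assumption sets A where T + A derives the goal and does not derive 'error'."""
    for size in range(len(assumables) + 1):
        worlds = []
        for a in combinations(sorted(assumables), size):
            derived = closure(rules, a)
            if goal in derived and "error" not in derived:
                worlds.append(set(a))
        if worlds:
            return worlds  # all consistent worlds of minimal size
    return []

# Hypothetical theory T: an empty body marks a fact.
rules = [
    (frozenset(), "clear_sky"),
    (frozenset({"rain"}), "wet_grass"),
    (frozenset({"sprinkler"}), "wet_grass"),
    (frozenset({"rain", "clear_sky"}), "error"),  # a nogood
]
worlds = abduce(rules, {"rain", "sprinkler"}, goal="wet_grass")
```

Here assuming rain would explain the wet grass but contradicts the clear sky, so the only minimal world is the one that assumes the sprinkler.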

Possible optimizations (not used here)

STAR, an example of a general process:
– Stochastic sampling
– Sort settings by “value”
– Rule generation experiments favoring highly “value”-ed settings

See also: elite sampling in the cross-entropy method

If SA convergence too slow
– Try moving back select into the SA
– Constrain solution mutation to prefer highly “value”-ed settings

BORE (best or rest)
– n runs
– Best = top 10% scores
– Rest = remaining 90%
– {a, b} = frequency of discretized range in {best, rest}
– Sort settings by -1 * (a/n)^2 / (a/n + b/n)
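A small Python sketch of the BORE ranking as written above. Assumptions: the settings are already discretized, a lower score is better, and the toy scoring function in the usage lines is hypothetical.

```python
import random
from collections import Counter

def bore_rank(runs, best_fraction=0.10):
    """runs: list of (settings dict, score); lower score = better (assumption).
    Returns settings sorted by -1 * (a/n)^2 / (a/n + b/n), best first."""
    n = len(runs)
    ordered = sorted(runs, key=lambda run: run[1])
    cut = max(1, int(n * best_fraction))
    best, rest = ordered[:cut], ordered[cut:]
    # a, b = how often each (input, value) setting appears in best and in rest
    a = Counter(st for settings, _ in best for st in settings.items())
    b = Counter(st for settings, _ in rest for st in settings.items())

    def key(st):
        fa, fb = a[st] / n, b[st] / n
        return -1 * fa ** 2 / (fa + fb)

    return sorted(set(a) | set(b), key=key)

# Hypothetical toy model: 'a' and 'b' drive the score, 'c' barely matters.
rng = random.Random(3)
runs = []
for _ in range(1000):
    s = {k: rng.choice([0, 1, 2]) for k in ("a", "b", "c")}
    runs.append((s, 10 * s["a"] + s["b"] + 0.01 * s["c"]))
ranked = bore_rank(runs)
```

Squaring the best-frequency term rewards settings that are common in best, not merely more common in best than in rest.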

Other valuable tricks:
– Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
– Limited discrepancy search: Harvey & Ginsberg
– Treatment learning: Menzies & Yu

Ask me why, off-line

“Uncertainty helps planning”

(questions? comments?)

At the “policy point”, STAR’s random solutions are surprisingly accurate
– LC: learn impact[i] via regression (JPL data)
– STAR: no tuning, randomly pick impact[i]

Diff = ∑ mre(lc) / ∑ mre(star)
Mre = abs(predicted - actual) / actual

{ “” “”} same at {95, 99}% confidence (MWU)

Why so little Diff (median = 75%)?
– Most influential inputs tightly constrained

diff same

diff diff

same same

diff diff

same same

∑ mre(lc) / ∑ mre(star)    strategic    tactical
ground                     66%          63%
all                        91%          75%
OSP2                       99%          125%
OSP                        112%         111%
flight                     101%         121%
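The Diff measure defined on this slide is simple arithmetic; a Python sketch follows. The effort numbers are invented for illustration and are not the JPL data.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of mre(lc) / sum of mre(star)."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

# Hypothetical effort values, for illustration only.
actuals = [100.0, 200.0, 400.0]
lc_predictions = [110.0, 180.0, 400.0]    # mre: 0.10, 0.10, 0.00
star_predictions = [120.0, 200.0, 360.0]  # mre: 0.20, 0.00, 0.10
ratio = diff(lc_predictions, star_predictions, actuals)
```

A Diff below 100% means LC had the lower total error; a Diff above 100% means STAR did.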

(Model uncertainty = collars) << inputs

In many models, a few “collar” variables set the other variables
– Narrows (Amarel in the 60s)
– Minimal environments (DeKleer ’85)
– Master variables (Crawford & Baker ’94)
– Feature subset selection (Kohavi & John ’97)
– Back doors (Williams et al ’03)
– See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)

Collars appear in all execution traces (by definition)
– You don’t have to find the collars, they’ll find you

So, to handle uncertainty
– Write a simulator
– Stagger over uncertainties
– From stagger, find collars
– Constrain collars

This talk: a very simple example of this process

Comparisons

Standard software process modeling
– Models written more than run (PROSIM community)
   limited sensitivity analysis
   limited trade space
– Or, expensive, error-prone, incomplete data collection programs
   point solutions

Here:
– No data collection
– Found stable conclusions within a space of possibilities
– Search: very simple
– Solution, not brittle
   with trade-off space

Plot label: 22 good ideas, sorted

Summary

Living with uncertainty
– Sometimes, simpler than you may think
– More useful than you might think

Simple:
– Here, the smallest change to simulated annealing

Useful:
– Sometimes uncertainty can teach you more than certainty
– If you fix everything, you lose fixes to everything else

Collars control certainty
– Uncertainty plus constrained collars gives more certainty
– Also, can drive model to better performance

An example you can explain to any business user

Plot labels: Good, Bad; 22 good ideas, sorted
