If you fix everything you lose fixes for everything else

Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)

International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007

This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.

[email protected] [email protected]



Page 2: If you fix everything you lose fixes for everything else

What does this mean?

Q: For what models does (a few peeks) = (many hard stares)?

A supposedly NP-hard task:

– Abduction over first-order theories

– nogood/2

Page 3: If you fix everything you lose fixes for everything else

A: models with “collars”

Grow
– Monte Carlo a model: pick input settings at random
– For each run: score each output, then add the score to each input setting

Harvest
– Rule generation experiments, favoring settings with better scores

If “collars”, then
– … small rules …
– … learned quickly …
– … will suffice

“Collar” variables set the other variables
– Narrows: Amarel in the 60s
– Minimal environments: DeKleer ’85
– Master variables: Crawford & Baker ’94
– Feature subset selection: Kohavi & John ’97
– Back doors: Williams et al. ’03
– Etc.

Implications for uncertainty?

Feather & Menzies RE’02
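The grow/harvest loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the toy `model`, the input names `a` to `h`, and the three settings per input are all hypothetical, and the per-setting score is averaged rather than simply summed (the slide only says the score is added to each input setting). Lower scores are treated as better.

```python
import random
from collections import defaultdict

NAMES = list("abcdefgh")   # hypothetical input names
VALUES = [0, 1, 2]         # hypothetical settings for every input

def model(settings):
    """Toy model: inputs 'a' and 'b' act as collars; the rest barely matter."""
    minor = sum(v for k, v in settings.items() if k not in ("a", "b"))
    return 10 * settings["a"] + 5 * settings["b"] + 0.01 * minor

def grow(runs, rng):
    """Monte Carlo the model and add each run's score to the settings it used."""
    total, count = defaultdict(float), defaultdict(int)
    for _ in range(runs):
        settings = {name: rng.choice(VALUES) for name in NAMES}
        score = model(settings)
        for setting in settings.items():   # setting = (input name, value)
            total[setting] += score
            count[setting] += 1
    # Assumption: normalize by how often each setting was sampled.
    return {s: total[s] / count[s] for s in total}

def harvest(mean_score, top):
    """Favor the settings with the best (lowest) mean score."""
    return sorted(mean_score, key=mean_score.get)[:top]

small_rule = harvest(grow(2000, random.Random(1)), top=2)
print(small_rule)
```

Because two inputs dominate the toy model, a two-setting rule found from random sampling already pins down most of the output, which is the "small rules, learned quickly" claim in miniature.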

Page 4: If you fix everything you lose fixes for everything else

STAR: collars + simulated annealing on Boehm’s USC software process models

USC software process models for effort, defects, threats
– y[i] = impact[i] * project[i] + b[i] for i ∈ {1,2,3,…}
– lower bound ≤ project[i] ≤ upper bound: uncertainty in project description
– lower bound ≤ impact[i] ≤ upper bound: uncertainty in model calibration

Random solution
– Pick project[i] and impact[i] from anywhere within their bounds
– project[i] bounds set via domain knowledge; e.g. process maturity in 3 to 5
– impact[i] bounds known from history

Score solution by effort (Ef), defects (De) and threat (Th)

For example: [figure with ranges labeled “uncontrollable” and “controllable”]
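A minimal sketch of drawing one random solution from y[i] = impact[i] * project[i] + b[i]. Only the process maturity range of 3 to 5 comes from the slide; the second attribute, every impact range, and the b values are made-up placeholders, and scoring by effort, defects and threat is not shown.

```python
import random

# Assumed ranges: only process_maturity's 3..5 is from the slide.
PROJECT_RANGE = {"process_maturity": (3, 5), "tool_use": (1, 5)}
IMPACT_RANGE = {"process_maturity": (-0.5, -0.1), "tool_use": (-0.3, 0.0)}
B = {"process_maturity": 1.2, "tool_use": 1.1}   # hypothetical offsets b[i]

def random_solution(rng):
    """Pick project[i] and impact[i] anywhere inside their ranges."""
    project = {k: rng.uniform(lo, hi) for k, (lo, hi) in PROJECT_RANGE.items()}
    impact = {k: rng.uniform(lo, hi) for k, (lo, hi) in IMPACT_RANGE.items()}
    return project, impact

def y(project, impact):
    """y[i] = impact[i] * project[i] + b[i] for every attribute i."""
    return {k: impact[k] * project[k] + B[k] for k in project}

rng = random.Random(0)
project, impact = random_solution(rng)
print(y(project, impact))
```

Narrowing a `PROJECT_RANGE` entry models a project decision; narrowing an `IMPACT_RANGE` entry models calibration from historical data.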

Page 5: If you fix everything you lose fixes for everything else

Two studies: y[i] = impact[i] * project[i] + b[i]

One: certain methods
– Using much historical data
– Learn the magnitude of the impact[i] relationship
– With fixed impact[i], Monte Carlo at random across the project[i] settings
E.g.
– Regression-based tools that learn impact[i] from historical records
– 93 records of JPL systems
– SCAT: JPL’s current methods
– 2CEE: WVU’s improvement over SCAT (currently under test)

Tame uncontrollables via historical records

Two: methods with more uncertainty
– Using no historical data
– Monte Carlo at random across the project[i] settings and impact[i] settings
E.g. STAR
– Monte Carlo a model
– Score each output
– Sort settings by their “C”, where “C” = cumulative score
– Rule generation experiments, favoring settings with better “C”

Page 6: If you fix everything you lose fixes for everything else

Inside STAR

1. Sampling: simulated annealing
2. Summarizing: post-processor

for setting ∈ Sx { value[setting] += E }

Sort all settings by their value
– Ignore uncontrollables impact[i]
– Assume the top (1 ≤ i ≤ max) project[i] settings
– Randomly select the rest

“Policy point”:
– Smallest i with lowest E

Median = 50th percentile
– Spread = 75th percentile minus 50th percentile

[Figure: settings ranked from Good to Bad: 22 good ideas, 38 not-so-good ideas]
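The post-processor described here can be sketched as follows. Several things are assumptions rather than STAR itself: plain random sampling stands in for simulated annealing, the `energy` function and the attributes `p0` to `p4` are hypothetical, values are normalized by sample count before sorting, and a ranked setting is skipped when its attribute is already fixed. Only project settings are ranked, mirroring "ignore uncontrollables".

```python
import random
import statistics
from collections import defaultdict

NAMES = ["p0", "p1", "p2", "p3", "p4"]   # hypothetical project attributes
VALUES = [1, 2, 3]

def energy(s):
    """Toy score E (lower is better); only p0 and p1 matter (the collars)."""
    return 9 * s["p0"] + 3 * s["p1"]

def rank_settings(runs, rng):
    """Sample, do value[setting] += E, then sort settings by mean value."""
    value, seen = defaultdict(float), defaultdict(int)
    for _ in range(runs):
        s = {n: rng.choice(VALUES) for n in NAMES}
        e = energy(s)
        for setting in s.items():
            value[setting] += e
            seen[setting] += 1
    return sorted(value, key=lambda k: value[k] / seen[k])

def top_policy(ranked, i):
    """First i ranked settings, skipping attributes that are already fixed."""
    fixed = {}
    for name, val in ranked:
        if len(fixed) == i:
            break
        fixed.setdefault(name, val)
    return fixed

def summarize(fixed, trials, rng):
    """Fix some settings, pick the rest at random; return (median, spread)."""
    es = sorted(energy({n: fixed.get(n, rng.choice(VALUES)) for n in NAMES})
                for _ in range(trials))
    median = statistics.median(es)
    return median, es[(3 * len(es)) // 4] - median

rng = random.Random(7)
ranked = rank_settings(3000, rng)
medians = [summarize(top_policy(ranked, i), 200, rng)[0]
           for i in range(len(NAMES) + 1)]
policy_point = medians.index(min(medians))   # smallest i with lowest median E
print(ranked[:2], medians, policy_point)
```

In this toy, fixing the two collar settings already reaches the lowest median, so the policy point lands at i = 2 and the remaining settings can stay random.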

Page 7: If you fix everything you lose fixes for everything else


SCAT vs 2CEE vs STAR project[i]

Page 8: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

Control impact[i] via historical data

Page 9: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

Stagger around superset of possible impact[i]

Control impact[i] via historical data

Page 10: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

[Chart: Flight (effort), axis 0 to 1600, showing median and spread for SCAT, 2CEE, and STAR]

Median: 50% point

Spread: (75 - 50)%

Stagger around superset of possible impact[i]

Control impact[i] via historical data

Page 11: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

[Chart: Flight (effort), axis 0 to 1600, showing median and spread for SCAT, 2CEE, and STAR]

Median: 50% point

Spread: (75 - 50)%

STAR/2CEE = 50/800 = 6%

STAR/SCAT = 50/1300 = 4%

Stagger around superset of possible impact[i]

Control impact[i] via historical data

Page 12: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

[Charts: effort for OSP (axis 0 to 2500), Flight (0 to 1600), Ground (0 to 800), and OSP2 (0 to 450); each shows median and spread for SCAT, 2CEE, and STAR. Median: 50% point. Spread: (75 - 50)%.]

Project   STAR/2CEE         STAR/SCAT
OSP       400/1600 = 25%    400/1900 = 21%
Flight    50/800 = 6%       50/1300 = 4%
Ground    30/620 = 5%       30/730 = 4%
OSP2      180/400 = 45%     180/1900 = 60% [sic]

Stagger around superset of possible impact[i]

Control impact[i] via historical data
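As a quick check of the ratio arithmetic, the fractions can be recomputed from the numbers printed on the slide. The dictionary below just restates those printed values as (STAR, 2CEE, SCAT); the slide does not say here whether they are medians or spreads, so no such reading is assumed.

```python
# Values as printed on the slide: (STAR, 2CEE, SCAT) per project.
printed = {
    "OSP":    (400, 1600, 1900),
    "Flight": (50, 800, 1300),
    "Ground": (30, 620, 730),
    "OSP2":   (180, 400, 1900),
}

def percent(numerator, denominator):
    """Ratio as a whole-number percentage."""
    return round(100 * numerator / denominator)

ratios = {name: (percent(star, cee), percent(star, scat))
          for name, (star, cee, scat) in printed.items()}

for name, (vs_2cee, vs_scat) in ratios.items():
    print(f"{name:7s} STAR/2CEE={vs_2cee}%  STAR/SCAT={vs_scat}%")
```

Seven of the eight printed percentages are reproduced. The OSP2 STAR/SCAT entry comes out near 9% rather than the printed 60%, so one of the printed OSP2 figures is probably a slide typo; which one cannot be told from the transcript.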


Page 14: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

[Charts and ratios for OSP, Flight, Ground, and OSP2, unchanged from Page 12]

Stagger around superset of possible impact[i]

Control impact[i] via historical data

Ignoring historical data is useful (!!!?)


Page 16: If you fix everything you lose fixes for everything else

SCAT vs 2CEE vs STAR project[i]

[Charts and ratios for OSP, Flight, Ground, and OSP2, unchanged from Page 12]

Stagger around superset of possible impact[i]

Control impact[i] via historical data

Ignoring historical data is useful (!!!?)

If you fix everything, you lose fixes for everything else

Page 17: If you fix everything you lose fixes for everything else

Luke, trust the force, I mean, collars

IEEE Computer, Jan 2007: “The strangest thing about software”

Page 18: If you fix everything you lose fixes for everything else

Extra Material

Page 19: If you fix everything you lose fixes for everything else

Related work

Feather, DDP, treatment learning
– Optimization of requirement models

Xerox PARC, 1980s, qualitative representations (QR)
– Not overly specific
– Quickly collected in a new domain
– Used for model diagnosis and repair
– Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors

Abduction:
– World W = minimal set of assumptions A (w.r.t. size) such that
   T ∪ A ⇒ G
   not (T ∪ A ⇒ error)
– Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
– Theoretically slow (NP-hard) but this should be practical:
   Abduction + stochastic sampling
   Find collars
   Learn constraints on collars
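The abduction definition above can be made concrete with a tiny exhaustive search. This is a sketch under simplifying assumptions: the theory `T` is propositional Horn rules rather than first-order, and the rule set and the `ASSUMABLE` list are invented for illustration. The brute-force loop is the slow route the slide warns about; the stochastic-sampling alternative is not shown.

```python
from itertools import combinations

# Hypothetical propositional theory T as (body, head) Horn rules.
T = [({"rain"}, "wet"), ({"sprinkler"}, "wet"),
     ({"wet", "cold"}, "ice"), ({"rain", "sprinkler"}, "error")]
ASSUMABLE = ["rain", "sprinkler", "cold"]

def closure(facts):
    """Forward-chain T from the assumed facts until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in T:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

def worlds(goal):
    """Smallest assumption sets A where T with A derives goal but not error."""
    for size in range(len(ASSUMABLE) + 1):
        found = [set(a) for a in combinations(ASSUMABLE, size)
                 if goal in closure(a) and "error" not in closure(a)]
        if found:
            return found
    return []

print(worlds("ice"))
```

The search tries every subset of assumables in order of size, which is exponential in the number of assumables; that cost is what motivates looking for collars instead.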

Page 20: If you fix everything you lose fixes for everything else

Possible optimizations (not used here)

STAR, an example of a general process:
– Stochastic sampling
– Sort settings by “value”
– Rule generation experiments favoring highly “value”-ed settings
See also: elite sampling in the cross-entropy method

If SA convergence too slow
– Try moving back select into the SA
– Constrain solution mutation to prefer highly “value”-ed settings

BORE (best or rest)
– n runs
– Best = top 10% scores
– Rest = remaining 90%
– {a,b} = frequency of discretized range in {best, rest}
– Sort settings by -1 * (a/n)^2 / (a/n + b/n)

Other valuable tricks:
– Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
– Limited discrepancy search: Harvey & Ginsberg
– Treatment learning: Menzies & Yu

Ask me why, off-line
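The BORE ranking on this slide can be sketched as below. Assumptions: lower scores count as "best", the inputs are already discrete so no discretization step is needed, and the toy score function and input names `x`, `y`, `z` are hypothetical. The sort key is the slide's -1 * (a/n)^2 / (a/n + b/n), with the setting itself as a tie-breaker.

```python
import random
from collections import Counter

def bore_rank(runs, best_fraction=0.10):
    """runs: list of (score, settings dict). Returns settings, best first."""
    n = len(runs)
    ordered = sorted(runs, key=lambda r: r[0])
    cut = max(1, int(n * best_fraction))
    best, rest = ordered[:cut], ordered[cut:]
    a = Counter(item for _, s in best for item in s.items())
    b = Counter(item for _, s in rest for item in s.items())

    def key(item):
        fa, fb = a[item] / n, b[item] / n
        return (-1 * fa ** 2 / (fa + fb), item)   # item breaks exact ties

    return sorted(set(a) | set(b), key=key)

rng = random.Random(3)
runs = []
for _ in range(1000):
    s = {k: rng.choice([0, 1, 2]) for k in "xyz"}
    runs.append((10 * s["x"] + s["y"], s))   # toy score: z is irrelevant

ranked = bore_rank(runs)
print(ranked[:2])
```

Squaring a/n rewards settings that are frequent in best, not merely more frequent in best than in rest, so rare but lucky settings sink down the ranking.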

Page 21: If you fix everything you lose fixes for everything else

“Uncertainty helps planning”

(questions? comments?)

Page 22: If you fix everything you lose fixes for everything else

At the “policy point”, STAR’s random solutions are surprisingly accurate

LC: learn impact[i] via regression (JPL data)
STAR: no tuning, randomly pick impact[i]

Diff = ∑ mre(lc) / ∑ mre(star)
Mre = abs(predicted - actual) / actual

“diff” / “same”: different or same at {95, 99}% confidence (MWU)

Why so little Diff (median = 75%)?
– Most influential inputs tightly constrained

∑ mre(lc) / ∑ mre(star)   strategic   tactical
ground                     66%         63%
all                        91%         75%
OSP2                       99%         125%
OSP                        112%        111%
flight                     101%        121%

[Slide labels, in extraction order: diff same; diff diff; same same; diff diff; same same]
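The two formulas on this slide translate directly into code. The prediction and actual values below are made up purely to exercise the arithmetic; they are not JPL data.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of mre(lc) divided by sum of mre(star)."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

# Made-up numbers for illustration only.
actuals = [100, 200, 400]
lc_predictions = [110, 180, 400]      # relative errors 0.1, 0.1, 0.0
star_predictions = [120, 200, 360]    # relative errors 0.2, 0.0, 0.1

print(f"Diff = {diff(lc_predictions, star_predictions, actuals):.0%}")
```

Since LC is the numerator, a Diff below 100% means the regression-tuned model had the smaller summed error, and a Diff above 100% means STAR's untuned estimates had the smaller summed error.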

Page 23: If you fix everything you lose fixes for everything else

(Model uncertainty = collars) << inputs

In many models, a few “collar” variables set the other variables
– Narrows (Amarel in the 60s)
– Minimal environments (DeKleer ’85)
– Master variables (Crawford & Baker ’94)
– Feature subset selection (Kohavi & John ’97)
– Back doors (Williams et al. ’03)
– See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)

Collars appear in all execution traces (by definition)
– You don’t have to find the collars, they’ll find you

So, to handle uncertainty
– Write a simulator
– Stagger over uncertainties
– From stagger, find collars
– Constrain collars

This talk: a very simple example of this process

Page 24: If you fix everything you lose fixes for everything else

Comparisons

Standard software process modeling
– Models written more than run (PROSIM community)
   Limited sensitivity analysis
   Limited trade space
– Or, expensive, error-prone, incomplete data collection programs
   Point solutions

Here:
– No data collection
– Found stable conclusions within a space of possibilities
– Search: very simple
– Solution, not brittle
   With trade-off space

[Figure: 22 good ideas, sorted]

Page 25: If you fix everything you lose fixes for everything else

Summary

Living with uncertainty
– Sometimes, simpler than you may think
– More useful than you might think

Simple:
– Here, the smallest change to simulated annealing

Useful:
– Sometimes uncertainty can teach you more than certainty
– If you fix everything, you lose fixes to everything else

Collars control certainty
– Uncertainty plus constrained collars ⇒ more certainty
– Also, can drive model to better performance

An example you can explain to any business user

[Figure: 22 good ideas, sorted from Good to Bad]