If you fix everything you lose fixes for everything else
Tim Menzies (WVU), Jairus Hihn (JPL),
Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)
International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007
This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research
Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or
imply its endorsement by the United States Government.
[email protected]@mix.wvu.edu
2
What does this mean?
Q: for what models does (a few peeks) = (many hard stares)?
A supposedly NP-hard task
abduction over first-order theories
nogood/2
3
A: models with “collars”
Grow
– Monte Carlo a model, picking input settings at random
– For each run: score each output, then add the score to each input setting
Harvest
– Rule generation experiments, favoring settings with better scores
If “collars”, then
– … small rules …
– … learned quickly …
– … will suffice
“Collar” variables set the other variables
– Narrows: Amarel in the 60s
– Minimal environments: DeKleer ’85
– Master variables: Crawford & Baker ’94
– Feature subset selection: Kohavi & John ’97
– Back doors: Williams et al. ’03
– Etc.
Implications for uncertainty?
– Feather & Menzies, RE’02
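The grow and harvest steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy `model`, the input names in `RANGES`, the "lower score is better" convention, and the use of a mean (rather than a raw sum) of scores per setting are all assumptions made for the example. The toy model is deliberately built so that one input, `a`, acts as a collar.

```python
import random
from collections import defaultdict

# Hypothetical toy model: the output is dominated by input "a" (the collar).
def model(settings):
    return 100 * settings["a"] + settings["b"] + settings["c"]

# Assumed input ranges, for illustration only.
RANGES = {"a": [0, 1], "b": [0, 1, 2], "c": [0, 1, 2]}

def grow(runs, rng):
    """Monte Carlo the model; add each run's score to the settings it used."""
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(runs):
        settings = {name: rng.choice(values) for name, values in RANGES.items()}
        score = model(settings)  # assumed: lower score is better
        for pair in settings.items():
            total[pair] += score
            count[pair] += 1
    # Mean score per (input, value) setting, so settings seen more often
    # are not penalized.
    return {pair: total[pair] / count[pair] for pair in total}

def harvest(mean_scores):
    """Rank settings, best first, so rule generation can favor the better ones."""
    return sorted(mean_scores, key=mean_scores.get)

ranked = harvest(grow(2000, random.Random(1)))
print(ranked[0], ranked[-1])
```

In this toy setup the two values of the collar input end up at opposite ends of the ranking, while the settings of `b` and `c` cluster in the middle: a small rule about `a` alone is enough.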
4
STAR: collars + simulated annealing on Boehm’s USC software process models
USC software process models for effort, defects, threats
– y[i] = impact[i] * project[i] + b[i], for i ∈ {1,2,3,…}
– (lower bound) ≤ project[i] ≤ (upper bound): uncertainty in project description
– (lower bound) ≤ impact[i] ≤ (upper bound): uncertainty in model calibration
Random solution
– Pick project[i] and impact[i] from anywhere within their bounds
– Bounds on project[i] set via domain knowledge; e.g. process maturity in 3 to 5
– Bounds on impact[i] known from history
Score solution by effort (Ef), defects (De) and threat (Th)
[Figure: example, with settings marked as controllable or uncontrollable]
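A random solution for a model of the form y[i] = impact[i] * project[i] + b[i] can be drawn as below. This is only a sketch: the attribute names, every numeric range, the `B` offsets, and the choice of a uniform distribution are placeholders assumed for illustration, not values taken from the USC models.

```python
import random

# Assumed (low, high) bounds. In practice the project bounds would come from
# domain knowledge and the impact bounds from history. Names are hypothetical.
PROJECT_RANGE = {"process_maturity": (3, 5), "team_size": (1, 4)}
IMPACT_RANGE = {"process_maturity": (-0.2, -0.1), "team_size": (0.05, 0.15)}
B = {"process_maturity": 1.5, "team_size": 1.0}

def random_solution(rng):
    """Pick project[i] and impact[i] uniformly from within their bounds."""
    project = {k: rng.uniform(*PROJECT_RANGE[k]) for k in PROJECT_RANGE}
    impact = {k: rng.uniform(*IMPACT_RANGE[k]) for k in IMPACT_RANGE}
    return project, impact

def outputs(project, impact):
    """y[i] = impact[i] * project[i] + b[i], one value per attribute i."""
    return {k: impact[k] * project[k] + B[k] for k in project}

rng = random.Random(0)
project, impact = random_solution(rng)
y = outputs(project, impact)
print(y)
```

Fixing `impact` and sampling only `project` would correspond to a calibrated model; sampling both, as here, also explores the calibration uncertainty.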
5
Two studies
y[i] = impact[i] * project[i] + b[i]
One: certain methods
– Using much historical data
– Learn the magnitude of the impact[i] relationship
– With fixed impact[i], Monte Carlo at random across the project[i] settings
– E.g. regression-based tools that learn impact[i] from historical records (93 records of JPL systems)
– SCAT: JPL’s current methods
– 2CEE: WVU’s improvement over SCAT (currently under test)
Two: methods with more uncertainty
– Using no historical data
– Monte Carlo at random across the project[i] settings and impact[i] settings
– E.g. STAR: Monte Carlo a model; score each output; sort settings by their “C” (“C” = cumulative score); rule generation experiments, favoring settings with better “C”
Tame uncontrollables via historical records
6
Inside STAR
1. Sampling: simulated annealing
– for setting ∈ Sx { value[setting] += E }
2. Summarizing: post-processor
– Sort all settings by their value
– Ignore uncontrollables impact[i]
– Assume the top (1 ≤ i ≤ max) project[i] settings
– Randomly select the rest
“Policy point”: smallest i with lowest E
– Median = 50th percentile
– Spread = (75th - 50th) percentile
[Figure labels: Bad, Good; 22 good ideas; 38 not-so-good ideas]
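The median and spread used to summarize a batch of runs can be computed as follows. This is a sketch under stated assumptions: the nearest-rank percentile rule is one of several common conventions (others give slightly different values), and the effort numbers are made up for the example.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the
    data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def median_and_spread(values):
    """Median = 50th percentile; spread = 75th percentile minus the median."""
    median = percentile(values, 50)
    spread = percentile(values, 75) - median
    return median, spread

efforts = [120, 80, 100, 300, 90, 110, 95, 105]  # made-up effort estimates
print(median_and_spread(efforts))
```

Because both statistics are percentile-based, the single outlier (300) does not move either of them, which makes them suitable for summarizing noisy Monte Carlo output.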
7
SCAT vs 2CEE vs STAR project[i]
– SCAT, 2CEE: control impact[i] via historical data
– STAR: stagger around superset of possible impact[i]
[Figure: four charts of effort (Flight, Ground, OSP, OSP2), each showing median and spread for SCAT, 2CEE and STAR]
Median: 50% point
Spread: (75 - 50)%
effort   STAR/2CEE         STAR/SCAT
Flight   50/800 = 6%       50/1300 = 4%
Ground   30/620 = 5%       30/730 = 4%
OSP      400/1600 = 25%    400/1900 = 21%
OSP2     180/400 = 45%     180/300 = 60%
Ignoring historical data is useful (!!!?)
If you fix everything, you lose fixes for everything else
Luke, trust the force, I mean, collars
IEEE Computer, Jan 2007: “The strangest thing about software”
Extra Material
19
Related work
Feather, DDP, treatment learning
– Optimization of requirement models
Xerox PARC, 1980s, qualitative representations (QR)
– Not overly specific
– Quickly collected in a new domain
– Used for model diagnosis and repair
– Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors
Abduction
– World W = minimal set of assumptions A (w.r.t. size) such that T ∪ A => G and Not(T ∪ A => error)
– Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
– Theoretically slow (NP-hard) but this should be practical: abduction + stochastic sampling; find collars; learn constraints on collars
20
Possible optimizations (not used here)
STAR, an example of a general process:
– Stochastic sampling
– Sort settings by “value”
– Rule generation experiments favoring highly “value”-ed settings
– See also elite sampling in the cross-entropy method
If SA convergence too slow
– Try moving back select into the SA
– Constrain solution mutation to prefer highly “value”-ed settings
BORE (best or rest)
– n runs
– Best = top 10% scores
– Rest = remaining 90%
– {a,b} = frequency of discretized range in {best, rest}
– Sort settings by -1 * (a/n)^2 / (a/n + b/n)
Other valuable tricks:
– Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
– Limited discrepancy search: Harvey & Ginsberg
– Treatment learning: Menzies & Yu
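The BORE ranking listed above can be sketched as follows. Assumptions made for this example: each run is a `(score, settings)` pair, lower scores are better, settings are already discretized, and both frequencies are divided by n exactly as the formula is written on the slide. The run data is made up.

```python
from collections import Counter

def bore(runs, best_fraction=0.10):
    """Rank settings by -1 * (a/n)^2 / (a/n + b/n), most promising first."""
    n = len(runs)
    ordered = sorted(runs, key=lambda run: run[0])  # assumed: lower is better
    cut = max(1, int(n * best_fraction))
    best, rest = ordered[:cut], ordered[cut:]
    # a, b: how often each setting appears in the best and the rest.
    a = Counter(s for _, settings in best for s in settings)
    b = Counter(s for _, settings in rest for s in settings)

    def key(setting):
        fa, fb = a[setting] / n, b[setting] / n
        return -1 * fa ** 2 / (fa + fb)

    return sorted(set(a) | set(b), key=key)

# Made-up runs: the five best scores all contain the setting ("x", 1).
runs = [(i, {("x", 1 if i < 5 else 0), ("y", i % 2)}) for i in range(50)]
ranking = bore(runs)
print(ranking[0])
```

Squaring a/n rewards settings that are frequent in the best runs, and the denominator discounts settings that are just as frequent in the rest, so a setting must be both common among and distinctive of the best runs to rank highly.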
“Uncertainty helps planning”
Ask me why, off-line
(questions? comments?)
22
At the “policy point”, STAR’s random solutions are surprisingly accurate
– LC: learn impact[i] via regression (JPL data)
– STAR: no tuning, randomly pick impact[i]
– Diff = ∑ mre(lc) / ∑ mre(star)
– Mre = abs(predicted - actual) / actual
– “same” = same at {95, 99}% confidence (MWU)
Why so little Diff (median = 75%)?
– Most influential inputs tightly constrained
diff/same labels, in slide order: diff same; diff diff; same same; diff diff; same same
∑ mre(lc) / ∑ mre(star)   strategic   tactical
ground                    66%         63%
all                       91%         75%
OSP2                      99%         125%
OSP                       112%        111%
flight                    101%        121%
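The Diff measure follows directly from its two definitions, Mre = abs(predicted - actual) / actual and Diff = ∑ mre(lc) / ∑ mre(star). A minimal worked sketch, with entirely made-up effort values (none of these numbers come from the JPL data):

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of mre(lc) divided by sum of mre(star)."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

actuals = [100, 200, 400]    # made-up actual efforts
lc_pred = [90, 220, 400]     # made-up LC estimates
star_pred = [80, 200, 500]   # made-up STAR estimates
ratio = diff(lc_pred, star_pred, actuals)
print(f"{ratio:.0%}")
```

A Diff below 100% means the total relative error of LC is smaller than that of STAR; a Diff above 100% means STAR's total relative error is the smaller one.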
23
(Model uncertainty = collars) << inputs
In many models, a few “collar” variables set the other variables
– Narrows (Amarel in the 60s)
– Minimal environments (DeKleer ’85)
– Master variables (Crawford & Baker ’94)
– Feature subset selection (Kohavi & John ’97)
– Back doors (Williams et al. ’03)
– See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)
Collars appear in all execution traces (by definition)
– You don’t have to find the collars, they’ll find you
So, to handle uncertainty
– Write a simulator
– Stagger over uncertainties
– From stagger, find collars
– Constrain collars
This talk: a very simple example of this process
24
Comparisons
Standard software process modeling
– Models written more than run (PROSIM community): limited sensitivity analysis, limited trade space
– Or, expensive, error-prone, incomplete data collection programs: point solutions
Here:
– No data collection
– Found stable conclusions within a space of possibilities
– Search: very simple
– Solution, not brittle: with trade-off space
[Figure: 22 good ideas, sorted]
25
Summary
Living with uncertainty
– Sometimes, simpler than you may think
– More useful than you might think
Simple:
– Here, the smallest change to simulated annealing
Useful:
– Sometimes uncertainty can teach you more than certainty
– If you fix everything, you lose fixes to everything else
Collars control certainty
– Uncertainty plus constrained collars gives more certainty
– Also, can drive model to better performance
An example you can explain to any business user
[Figure: 22 good ideas, sorted, from Bad to Good]