Hubbard Decision Research The Applied Information Economics Company Bootstrap Hints.




Overview of Bootstrapping Hints

The objective of a good bootstrap model is to realistically model the judges' intuitive judgments while being even more accurate than the judges themselves

The measure of effectiveness in this area is the R squared

Roughly, R squared means the % of variance explained by the model

These hints should help improve R squared
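As a rough illustration of what "% of variance explained" means, here is a minimal Python sketch of the R squared calculation. The function name `r_squared` and the sample numbers are assumptions made for illustration only; they are not taken from any client data.

```python
def r_squared(actual, predicted):
    """Fraction of the variance in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the evaluators' estimates around their mean
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Variation left unexplained by the model
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total

# Hypothetical evaluator estimates and model outputs for the same four items
evaluator_estimates = [0.20, 0.40, 0.60, 0.80]
model_estimates = [0.25, 0.35, 0.65, 0.75]

print(round(r_squared(evaluator_estimates, model_estimates), 3))
```

An R squared of 1 would mean the model reproduces the evaluators' estimates exactly; values closer to 0 mean the model explains little of their variation.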



Strategies for Improving R Squared

Hints for choosing the right variables

Hints for improving data gathering

Hints for improving quantification

Hints for finding higher-order variables



Hints for Choosing Variables

For some commonly bootstrapped variables, such as Confidence Index and Cancellation Probability, these variables may be considered:

Project cost and/or duration

Is it a compliance project and/or is the project a documented strategic requirement?

What is the scope of the business covered? (e.g., number of departments involved, number of users, etc.)

Sponsor characteristics such as level, whether the sponsor is business or IT, or the sponsor's success record in past projects

Whether the investment is new software development, package modification, upgrades to previous systems, hardware only, etc.

Technology risk such as proven track records, IT familiarity with the technology, the maturity of the technology

Watch how many variables are added: using many more than 8 variables starts to become unproductive and may degrade the accuracy of the model. Stick to the important ones



Data Gathering Hints

You will usually get a higher R squared when averaging the estimates of larger groups of evaluators

Be sure to allow time for calibration

Use a trial bootstrap list that the evaluators discuss as a group

They can check results with “pair-wise comparisons”: they pick pairs of investments at random, determine which they would prefer, then confirm that the evaluators' scores reflect this
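The pair-wise comparison check could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function `pairwise_check`, the project names, the scores, and the idea that a higher score means "preferred" are all hypothetical, and the group's stated preference is supplied as a plain function.

```python
import random

def pairwise_check(scores, prefers, n_pairs=10, seed=0):
    """Return randomly drawn pairs whose scores contradict the stated preference.

    scores  : dict of investment name -> evaluator score (higher assumed better)
    prefers : function (a, b) -> name of the investment the group prefers
    """
    rng = random.Random(seed)
    names = sorted(scores)
    conflicts = []
    for _ in range(n_pairs):
        a, b = rng.sample(names, 2)      # pick a pair at random
        if scores[a] == scores[b]:
            continue                     # tied scores cannot contradict anything
        higher_scored = a if scores[a] > scores[b] else b
        if prefers(a, b) != higher_scored:
            conflicts.append(tuple(sorted((a, b))))
    return conflicts

# Hypothetical scores and stated preferences
scores = {"Project A": 0.9, "Project B": 0.5, "Project C": 0.2}
stated = {
    frozenset(["Project A", "Project B"]): "Project A",
    frozenset(["Project A", "Project C"]): "Project C",  # disagrees with the scores
    frozenset(["Project B", "Project C"]): "Project B",
}
conflicts = pairwise_check(scores, lambda a, b: stated[frozenset([a, b])], n_pairs=20)
print(sorted(set(conflicts)))
```

Any pair returned is a candidate for group discussion: either the scores or the stated preference needs another look.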



Hints for Quantifying Variables

Regression assumes that each variable has a basically linear relationship with the output

Reviewing each variable for non-linearity and finding a way to make it linear will improve R squared

Variables that can be captured as 0 or 1 (binary) need no review

Continuous variables need to be graphed to check for non-linearity

Discrete variables that are not binary require pivot table analysis (see pivot table procedure for details)
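For a discrete variable with more than two levels, a pivot table essentially compares the average bootstrapped output at each level. The pivot table procedure referred to above is not reproduced here, so the following is only a minimal Python sketch of that kind of summary; the name `mean_output_by_level`, the variable `investment_type`, its levels, and the scores are hypothetical.

```python
from collections import defaultdict

def mean_output_by_level(levels, outputs):
    """Average the evaluators' output for each level of a discrete variable."""
    grouped = defaultdict(list)
    for level, output in zip(levels, outputs):
        grouped[level].append(output)
    return {level: sum(vals) / len(vals) for level, vals in grouped.items()}

# Hypothetical data: investment type and the evaluators' confidence score
investment_type = ["new", "upgrade", "new", "hardware", "upgrade", "hardware"]
confidence = [0.40, 0.70, 0.50, 0.80, 0.60, 0.90]

for level, mean in mean_output_by_level(investment_type, confidence).items():
    print(level, round(mean, 2))
```

Levels whose averages are clearly different may deserve their own numeric value in the regression, while levels with similar averages might be grouped together.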



Continuous Variables

One way to improve R squared is to convert your non-linear variables into linear variables

To check which variables are non-linear make an XY graph of the continuous variable on the X axis and the bootstrapped variable (from the evaluators) on the Y axis

If you find an obviously non-linear relationship, you can change the variable so that it becomes linear

Depending on how the graph looks, you can take the appropriate steps
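Alongside the visual check, one could score a few candidate transformations numerically. The sketch below, written in Python rather than Excel, fits each transformed variable against the evaluators' output and ranks the results by R squared. The candidate list mirrors the spreadsheet formulas suggested on the slides that follow; the helper names and the data are assumptions, and the data is generated to follow a logarithmic pattern purely for illustration. It requires all X values to be positive.

```python
import math

def r_squared_simple(x, y):
    """R squared of a one-variable linear fit (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Candidate transformations; Excel's LOG(X) defaults to base 10
CANDIDATES = {
    "X": lambda v: v,
    "log(X)": math.log10,
    "X^.5": lambda v: v ** 0.5,
    "X^2": lambda v: v ** 2,
    "exp(X)": math.exp,
    "1/X": lambda v: 1.0 / v,
}

def rank_transforms(x, y):
    """Return (name, R squared) pairs, best-fitting transformation first."""
    scored = {name: r_squared_simple([f(v) for v in x], y)
              for name, f in CANDIDATES.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data in which the output grows with the logarithm of X
x = [1, 2, 4, 8, 16]
y = [0.1 + 0.2 * math.log10(v) for v in x]

for name, score in rank_transforms(x, y):
    print(f"{name:8s} {score:.3f}")
```

A ranking like this is a complement to the XY graph, not a replacement: with few data points, a high score can be a coincidence.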



Linear

This is an obvious linear relationship; leave it just as it is

[Figure: XY plot with both axes running from 0% to 100%]



Scattered Distribution

If the XY plot is not obviously non-linear, then just leave it like it is

If the Excel regression output indicates that this variable has little or no effect, consider removing it

[Figure: XY plot; horizontal axis 0% to 7%, vertical axis 0% to 6%]



Clustered Distribution

Here, a “threshold” would be the best quantification of this variable

Instead of being linear, this variable appears to make a difference only when it is above or below a certain value (in this case, about 6% on the horizontal scale)

Try converting the continuous variable to a binary. In this case you would use “=if(x<.06, 1,0)”

[Figure: XY plot; horizontal axis 0% to 14%, vertical axis 0% to 16%]
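For readers working outside Excel, the threshold formula “=if(x<.06, 1,0)” can be written in Python as in the sketch below. The function name `threshold_indicator` and the sample values are hypothetical; the 6% cutoff is the one read off the horizontal scale in this example.

```python
def threshold_indicator(x, cutoff=0.06):
    """Python equivalent of the spreadsheet formula =IF(x<0.06, 1, 0)."""
    return 1 if x < cutoff else 0

# Hypothetical values of the continuous variable
values = [0.02, 0.05, 0.06, 0.09, 0.13]
binary = [threshold_indicator(v) for v in values]
print(binary)  # [1, 1, 0, 0, 0]
```

The binary column then replaces the original continuous variable in the regression.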



Upward Sloping

If the graph slopes upward, then you might try putting the scale of the X axis on “logarithmic”

If this makes it look linear then use the formula “=log(X)”

If that doesn’t work try “=X^.5” or some other power of X less than 1

[Figure: XY plot; horizontal axis 0 to 20, vertical axis 0% to 90%]



Leveling Off

Try setting the scale of the Y axis to “logarithmic”

If this makes it look linear then use “=exp(X)”

If it doesn’t work, try “=X^2” or some other power of X

[Figure: XY plot; horizontal axis 0% to 300%, vertical axis 0% to 90%]



Downward Sloping

Try setting the scale of the Y axis to “logarithmic”

If this makes it look linear then use “=exp(x)”

If it doesn’t work, try “=1/X”

[Figure: XY plot; horizontal axis 0% to 300%, vertical axis 0% to 90%]



Hints for Higher-Order Terms

After your first attempt at a regression, you may improve your R squared by adding some “higher-order” variables

A higher-order variable is one built from other variables, such as the product of two variables, a conditional statement involving other variables, etc.

To find potential candidates for higher-order terms, ask yourself if the importance of some variables depends on the values of other variables

Try several new terms and plot each one. If a term shows an obvious linear relationship, then add it

If you make a higher-order variable, run a new regression, and the R squared is higher, it was probably a good choice



Continuous Higher-Order Terms

If the importance of one variable depends on the value of another, and they are both continuous, try the following – we’ll call these two variables X and Y

If the bootstrapped variable should increase when both X and Y are high (or when both are low) then try “=X*Y”

If the bootstrapped variable should increase when one variable is high and the other is low then try “=X/Y”

If X is especially important when Y is over/under a certain value N then try “=if(Y>N, X, 0)”
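The three spreadsheet formulas above translate directly into Python. The sketch below is illustrative only: the function names, the sample (X, Y) rows, and the threshold value are hypothetical. Each function produces a new column that could be added to the regression as a higher-order term.

```python
def product_term(x, y):
    """Spreadsheet formula =X*Y, the product of the two variables."""
    return x * y

def ratio_term(x, y):
    """Spreadsheet formula =X/Y; Y must be non-zero."""
    return x / y

def conditional_term(x, y, n):
    """Spreadsheet formula =IF(Y>N, X, 0): X counts only when Y exceeds N."""
    return x if y > n else 0

# Hypothetical (X, Y) rows and a hypothetical threshold N
rows = [(2.0, 4.0), (3.0, 0.5), (1.0, 6.0)]
n = 1.0

for x, y in rows:
    print(product_term(x, y), ratio_term(x, y), conditional_term(x, y, n))
```

For the "under a certain value" case, the comparison in `conditional_term` would be reversed to `y < n`.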



Discrete Higher-Order Terms

You might try a pivot table that compares the average bootstrapped output variable in combinations of the two variables – put one variable in the columns of a pivot and the other in the rows

You can then try a nested IF statement that allows you to put a separate discrete value on each combination of the two variables

For example, suppose you found a compounding relationship between “strategic” (Y) and “multiple departments” (X)

You might try “=if(X=1,if(Y=1,.41,.11),.5)”

[Pivot table: average bootstrapped output for each combination]

                     Multiple Departments (X)
                     1        0
Strategic (Y)   1    .41      .51
                0    .11      .49

The two values in the Multiple Departments = 0 column (.51 and .49) are not significantly different, so you can average them and use the same value
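The nested IF formula “=if(X=1,if(Y=1,.41,.11),.5)” can be expressed in Python as in the sketch below. The function name `combined_value` is hypothetical; the values .41, .11 and .5 are the ones used in the formula above.

```python
def combined_value(x, y):
    """Python equivalent of =IF(X=1, IF(Y=1, .41, .11), .5).

    x : 1 if the investment involves multiple departments, else 0
    y : 1 if the investment is strategic, else 0
    """
    if x == 1:
        return 0.41 if y == 1 else 0.11
    return 0.5  # same value for either setting of Y when X is 0

# Every combination of the two binary variables
for x in (1, 0):
    for y in (1, 0):
        print(x, y, combined_value(x, y))
```

The resulting single column replaces the two separate binary variables wherever their combined effect matters.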



Improvements Due to Bootstrap

This chart shows the percentage reduction in error achieved by bootstrapped estimates relative to intuitive estimates

Results vary depending on how objective and systematic the model was – like ours

[Bar chart: percentage reduction in error, horizontal axis from 0% to 40%, one bar per study area listed below]

Cancer patient life-expectancy

Life-insurance salesrep performance

Graduate students grades

Changes in stock prices

Mental illness using personality tests

Student ratings of teaching effectiveness

IQ scores using Rorschach tests

Psychology course grades

Business failures using financial ratios

Mean across many studies



Actual Classification Plots

An Illinois insurance company created a classification chart to help prioritize the current list of proposed investments

They wanted to determine which investments could be accepted without more analysis and which needed more analysis

18 investments were plotted on the classification chart

The results had a profound effect on investment priorities

Some investments that were assumed to be beneficial now required analysis and some that required analysis could now be approved immediately



Classification of Example Projects

[Classification chart: the 18 numbered investments plotted by Expected Investment Size ($000), on a logarithmic horizontal axis from 10 to 10,000, against Confidence Index, on a vertical axis from 0.3 to 1. One region of the chart is labeled “No Classification Needed”.]

Do Abbreviated Risk-Return Analysis:
6. DLSW Router Network Redesign
9. Extended Hours
18. Doc. Access Strategy

Do Full Risk-Return Analysis:
8. Pearl Indicator and Pearl I/O interface
11. Richardson Data Center Consolidation
15. MVS DB2 Tools

Reject; Consider Other Options:
1. Data Strategy
2. Enterprise Security Strategy
3. Remote Server Redundancy
12. MQ Series: Base
13. Development Environment 2000 (mf)
14. “Source Control” Source Code Mgmt
16. Enterprise InterNet

Success Factor Adjustments:
4. Network OS migration to Novell 5.x
10. Optimize Single Code Base

Accept without Further Analysis:
5. Lucent switch upgrade
7. Image Server Relocation
17. Enterprise IntraNet to all sites



Bootstrapping Deliverables

Final presentation including:

  An XY chart showing correlation of original estimates to the bootstrap model

  Any “solution space” that was developed, such as classification charts

A worksheet for input of various values which uses the bootstrap model to estimate some output variable(s)

Any customization to RAVI documentation for that client for proper use of the worksheets and solution spaces

Any recommendations based on the bootstrap
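The input worksheet amounts to applying the regression coefficients to a new set of input values. A minimal Python sketch of that calculation follows; the function name `bootstrap_estimate`, the variable names, the intercept, and the coefficients are all hypothetical placeholders, since the real values come from the regression output for a given client.

```python
def bootstrap_estimate(inputs, intercept, coefficients):
    """Linear bootstrap model: intercept plus coefficient times each input."""
    return intercept + sum(coefficients[name] * inputs[name]
                           for name in coefficients)

# Hypothetical regression output (not taken from any real model)
intercept = 0.5
coefficients = {"strategic": 0.2, "log_cost": -0.1}

# Hypothetical values entered for one proposed investment
inputs = {"strategic": 1, "log_cost": 2.0}

print(bootstrap_estimate(inputs, intercept, coefficients))
```

Any transformed or higher-order variables used in the regression would need to be computed from the raw inputs before this step.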
