
DECISION ANALYSIS USING MICROSOFT EXCEL

SPRING 2006

Michael R. Middleton
School of Business and Management
University of San Francisco


Copyright © 2006 by Michael R. Middleton

Detailed Contents

PART 1 MODELS AND SENSITIVITY ANALYSIS ...................... 11

Chapter 1 Introduction to Decision Modeling .......... 13
  1.1 Models to Aid Decision Making .......... 13
    Components of a Decision Model .......... 14
  1.2 Basic What-If Model .......... 16
    Influence Diagram Representation .......... 16
    Decision Tree Representation .......... 18
    Consequence Table Representation .......... 18

Chapter 2 Sensitivity Analysis Using SensIt .......... 19
  2.1 How to Install SensIt .......... 19
  2.2 How to Uninstall or Delete SensIt .......... 20
  2.3 SensIt Overview .......... 20
  2.4 Example Problem .......... 20
  2.5 One Input, One Output .......... 21
    Cells for Input Variable .......... 22
    Cells for Output Variable .......... 22
    Input Values .......... 22
  2.6 Many Inputs, Many Outputs Tornado .......... 23
    Ranges for Input Variables .......... 24
    Cells for Output Variable .......... 25
    Ranges for Input Values .......... 25
  2.7 Tornado Sorted by Downside Risk .......... 26
  2.8 Tornado Sorted by Upside Potential .......... 26
  2.9 Tornado Showing Major Uncertainties .......... 27
  2.10 Spider .......... 28
  2.11 Tips for Many Inputs, One Output .......... 29
  2.12 Eagle Airlines Problem .......... 31

Chapter 3 Multiattribute Utility .......... 33
  3.1 Applications of Multi-Attribute Utility .......... 33


  3.2 MultiAttribute Utility Swing Weights .......... 34
    Attribute Scores .......... 35
    Swing Weights .......... 36
    Overall Scores .......... 37
  3.3 Sensitivity Analysis Methods .......... 38
    Dominance .......... 39
    Monetary Equivalents Assessment .......... 39
    Additive Utility Function .......... 40
    Weight Ratio Assessment .......... 41
    Weight Ratio Sensitivity Analysis .......... 43
    Swing Weight Assessment .......... 44
    Swing Weight Sensitivity Analysis .......... 46
    Direct Weight Assessment and Sensitivity Analysis .......... 49
    Summary .......... 51
    Sensitivity Analysis Examples References .......... 51
    Screenshots from Excel to Word .......... 52

PART 2 MONTE CARLO SIMULATION....................................... 53

Chapter 4 Introduction to Monte Carlo Simulation .......... 55
  4.1 Introduction .......... 55

Chapter 5 Uncertain Quantities .......... 57
  5.1 Discrete Uncertain Quantities .......... 57
  5.2 Continuous Uncertain Quantities .......... 57
    Case A: Uniform Density .......... 57
    Case B: Ramp Density .......... 60
    Case C: Triangular Density .......... 62

Chapter 6 Simulation Without Add-Ins .......... 65
  6.1 Simulation Using Excel Functions .......... 65

Chapter 7 Monte Carlo Simulation Using RiskSim .......... 67
  7.1 Using RiskSim Functions .......... 67
  7.2 Using RiskSim Functions .......... 68
  7.3 Updating Links To RiskSim Functions .......... 68
  7.4 Monte Carlo Simulation .......... 70
  7.5 Random Number Seed .......... 71
  7.6 One-Output Example .......... 72
  7.7 RiskSim Output for One-Output Example .......... 73
  7.8 Customizing RiskSim Charts .......... 75
  7.9 Random Number Generator Functions .......... 77
    RandBinomial .......... 77


    RandBiVarNormal .......... 78
    RandCumulative .......... 79
    RandDiscrete .......... 80
    RandExponential .......... 82
    RandInteger .......... 83
    RandNormal .......... 84
    RandSample .......... 85
    RandPoisson .......... 85
    RandTriangular .......... 86
    RandUniform .......... 87
  7.10 RiskSim Technical Details .......... 88
  7.11 Modeling Uncertain Relationships .......... 90
    Base Model, Four Inputs .......... 90
    Three Inputs .......... 91
    Two Inputs .......... 92
    Four Inputs with Three Uncertainties .......... 93
    Intermediate Details .......... 95

Chapter 8 Multiperiod What-If Modeling .......... 97
  8.1 Apartment Building Purchase Problem .......... 97
    Apartment Building Analysis Notes .......... 100
  8.2 Product Launch Financial Model .......... 101
  8.3 Machine Simulation Model .......... 105
    AJS Process 1 .......... 105
    AJS Process 2 .......... 106

Chapter 9 Modeling Inventory Decisions .......... 113
  9.1 Newsvendor Problem .......... 113
    Stationery Wholesaler Example .......... 113

Chapter 10 Modeling Waiting Lines .......... 115
  10.1 Queue Simulation .......... 115

PART 3 DECISION TREES ........................................................ 121

Chapter 11 Introduction to Decision Trees .......... 123
  11.1 Decision Tree Structure .......... 123
    DriveTek Problem, Part A .......... 123
    Nodes and Branches .......... 124
  11.2 Decision Tree Terminal Values .......... 126
    DriveTek Problem, Part B .......... 126
  11.3 Decision Tree Probabilities .......... 128
    DriveTek Problem, Part C .......... 128


Chapter 12 Decision Trees Using TreePlan .......... 129
  12.1 TreePlan Installation .......... 129
    Occasional Use .......... 129
    Selective Use .......... 129
    Steady Use .......... 130
  12.2 Building a Decision Tree in TreePlan .......... 130
  12.3 Anatomy of a TreePlan Decision Tree .......... 132
  12.4 Step-by-Step TreePlan Tutorial .......... 134
    DriveTek Problem .......... 134
    Nodes and Branches .......... 135
    Terminal Values .......... 136
    Building the Tree Diagram .......... 137
    Interpreting the Results .......... 145
    Formatting the Tree Diagram .......... 146
    Displaying Model Inputs .......... 148
    Printing the Tree Diagram .......... 150
    Alternative Model .......... 151
  12.5 Decision Tree Solution .......... 151
    Strategy .......... 151
    Payoff Distribution .......... 152
    DriveTek Strategies .......... 152
    Strategy Choice .......... 156
    Certainty Equivalent .......... 157
    Rollback Method .......... 159
    Optimal Strategy .......... 160
  12.6 Newox Decision Tree Problem .......... 162
  12.7 Brandon Decision Tree Problem .......... 163
    Decision Tree Strategies .......... 163

Chapter 13 Sensitivity Analysis for Decision Trees .......... 171
  13.1 One-Variable Sensitivity Analysis .......... 171
  13.2 Two-Variable Sensitivity Analysis .......... 173
    Setup for Data Table .......... 174
    Obtaining Results Using Data Table Command .......... 174
    Embellishments .......... 175
  13.3 Multiple-Outcome Sensitivity Analysis .......... 176
  13.4 Robin Pinelli's Sensitivity Analysis .......... 177

Chapter 14 Value of Information in Decision Trees .......... 181
  14.1 Value of Information .......... 181
  14.2 Expected Value of Perfect Information .......... 181
    Expected Value of Perfect Information, Reordered Tree .......... 182
    Expected Value of Perfect Information, Payoff Table .......... 185
    Expected Value of Perfect Information, Expected Improvement .......... 186


    Expected Value of Perfect Information, Single-Season Product .......... 187
  14.3 DriveTek Post-Contract-Award Problem .......... 190
  14.4 Sensitivity Analysis vs EVPI .......... 194

Chapter 15 Value of Imperfect Information .......... 195
  15.1 Technometrics Problem .......... 195
    Prior Problem .......... 195
    Imperfect Information .......... 196
    Probabilities From Relative Frequencies .......... 196
    Revision of Probability .......... 200

Chapter 16 Modeling Attitude Toward Risk .......... 201
  16.1 Risk Utility Function .......... 201
  16.2 Exponential Risk Utility .......... 204
  16.3 Approximate Risk Tolerance .......... 207
  16.4 Exact Risk Tolerance Using Excel .......... 207
  16.5 Exact Risk Tolerance Using RiskTol.xla .......... 211
  16.6 Exponential Utility and TreePlan .......... 212
  16.7 Exponential Utility and RiskSim .......... 212
  16.8 Risk Sensitivity for Machine Problem .......... 214
  16.9 Risk Utility Summary .......... 215
    Concepts .......... 215
    Fundamental Property of Utility Function .......... 216
    Using a Utility Function To Find the CE of a Lottery .......... 216
    Exponential Utility Function .......... 216
    TreePlan's Simple Form of Exponential Utility .......... 216
    Approximate Assessment of RiskTolerance .......... 216
    Exact Assessment of RiskTolerance .......... 217
    Using Exponential Utility for TreePlan Rollback Values .......... 217
    Using Exponential Utility for a Payoff Distribution .......... 218

PART 4 DATA ANALYSIS ......................................................... 219

Chapter 17 Introduction to Data Analysis .......... 221
  17.1 Levels of Measurement .......... 221
    Categorical Measure .......... 221
    Numerical Measure .......... 221
  17.2 Describing Categorical Data .......... 222
  17.3 Describing Numerical Data .......... 222
    Frequency Distribution and Histogram .......... 222
    Numerical Summary Measures .......... 222
    Distribution Shapes .......... 223


Chapter 18 Univariate Numerical Data .......... 225
  18.1 Analysis Tool: Descriptive Statistics .......... 225
    Formatting the Output Table .......... 228
    Interpreting Descriptive Statistics .......... 229
    Another Measure of Skewness .......... 231
  18.2 Analysis Tool: Histogram .......... 233
    Histogram Embellishments .......... 235
  18.3 Better Histograms Using Excel .......... 237
  Exercises .......... 238

Chapter 19 Bivariate Numerical Data .......... 239
  19.1 XY (Scatter) Charts .......... 240
  19.2 Analysis Tool: Correlation .......... 242
  19.3 Analysis Tool: Covariance .......... 244
  19.4 Correlations for Several Variables .......... 245
  Exercises .......... 247

Chapter 20 One-Sample Inference for the Mean .......... 249
  20.1 Normal versus t Distribution .......... 249
  20.2 Hypothesis Tests .......... 249
    Left-Tail, Right-Tail, or Two-Tail .......... 250
    Decision Approach or Reporting Approach .......... 250

Chapter 21 Simple Linear Regression .......... 253
  21.1 Inserting a Linear Trendline .......... 254
    Trendline Interpretation .......... 256
    Trendline Embellishments .......... 257
  21.2 Regression Analysis Tool .......... 257
    Regression Interpretation .......... 261
    Regression Charts .......... 262
  21.3 Regression Functions .......... 264
  Exercises .......... 267

Chapter 22 Simple Nonlinear Regression .......... 269
  22.1 Polynomial .......... 271
  22.2 Logarithmic .......... 273
  22.3 Power .......... 275
  22.4 Exponential .......... 277
  Exercises .......... 282

Chapter 23 Multiple Regression .......... 283
  23.1 Interpretation of Regression Output .......... 285
    Significance of Coefficients .......... 285
    Interpretation of the Regression Statistics .......... 286


    Interpretation of the Analysis of Variance .......... 286
  23.2 Analysis of Residuals .......... 286
  23.3 Using TREND to Make Predictions .......... 288
    Interpretation of the Predictions .......... 289
  Exercises .......... 290

Chapter 24 Regression Using Categorical Variables .......... 293
  24.1 Categories as Explanatory Variables .......... 293
  24.2 Interpretation of Regression Using Indicators .......... 296
  24.3 Interpretation of Multiple Regression .......... 297
  24.4 Categories as the Dependent Variable .......... 298
    Interpretation of the Classifications .......... 301
  Exercises .......... 302

Chapter 25 Regression Models for Cross-Sectional Data .......... 305
  25.1 Cross-Sectional Regression Checklist .......... 305
    Plot Y versus each X .......... 305
    Examine the correlation matrix .......... 305
    Calculate the regression model with diagnostics .......... 305
    Use the model .......... 306

Chapter 26 Time Series Data and Forecasts .......... 307
  26.1 Time Series Patterns .......... 307

Chapter 27 Autocorrelation and Autoregression .......... 311
  27.1 Linear Time Trend .......... 312
  27.2 Durbin-Watson Statistic .......... 313
  27.3 Autocorrelation .......... 314
  27.4 Autoregression .......... 316
  27.5 Autocorrelation Coefficients Function .......... 320
  27.6 AR(2) Model .......... 322
  Exercises .......... 324

Chapter 28 Time Series Smoothing .......... 325
  28.1 Moving Average Using Add Trendline .......... 327
  28.2 Moving Average Data Analysis Tool .......... 329
  28.3 Exponential Smoothing Tool .......... 330
  Exercises .......... 333

Chapter 29 Time Series Seasonality .......... 335
  29.1 Regression Using Indicator Variables .......... 336
  29.2 AR(4) Model .......... 342
  29.3 Classical Time Series Decomposition .......... 347
  Exercises .......... 354


Chapter 30 Regression Models for Time Series Data .......... 357
  30.1 Time Series Regression Checklist .......... 357
    Plot Y versus time .......... 357
    Plot Y versus each X .......... 357
    Examine the correlation matrix .......... 357
    Calculate the regression model with diagnostics .......... 358
    Use the model .......... 358
  30.2 Autocorrelation of Residuals .......... 359

PART 5 CONSTRAINED OPTIMIZATION.................................. 361

Chapter 31 Product Mix Optimization .......... 363
  31.1 Linear Programming Concepts .......... 363
    Formulation .......... 363
    Graphical Solution .......... 363
    Sensitivity Analysis .......... 363
  31.2 Basic Product Mix Problem .......... 365
  31.3 Outdoors Problem .......... 370
    Spreadsheet Model .......... 372
    Solver Reports .......... 373

Chapter 32 Modeling Marketing Decisions ............................................................... 375 32.1 Allocating Advertising Expenditures ................................................................. 375

Chapter 33 Nonlinear Product Mix Optimization .................................................... 381 33.1 Diminishing Profit Margin ................................................................................. 381

Chapter 34 Integer-Valued Optimization Models..................................................... 383 34.1 Transportation Problem...................................................................................... 383 34.2 Modified Transportation Problem ...................................................................... 384 34.3 Scheduling Problem ........................................................................................... 386

Chapter 35 Optimization Models for Finance Decisions .......................................... 389 35.1 Working Capital Management Problem............................................................. 389 35.2 Work Cap Alternate Formulations ..................................................................... 391 35.3 Stock Portfolio Problem..................................................................................... 393 35.4 MoneyCo Problem ............................................................................................. 395

Appendix Excel for the Macintosh.............................................................................. 397 The Shortcut Menu................................................................................................. 397 Relative and Absolute References.......................................................................... 397

References ..................................................................................................................... 399

Part 1 Models and Sensitivity Analysis

Chapter 1 introduces the terminology for decision models that is used throughout the book. Several ways to describe a decision problem are discussed, including spreadsheet models, influence charts, decision trees, and consequence tables.

Chapter 2 contains the documentation and examples for the SensIt sensitivity analysis add-in for Excel.

Chapter 3 discusses multi-attribute utility which is a useful model for decision problems with conflicting objectives. The discussion includes extensive sensitivity analysis for multi-attribute utility using standard Excel features.



Chapter 1 Introduction to Decision Modeling

1.1 MODELS TO AID DECISION MAKING

Decision: irrevocable allocation of resources

Model: abstract representation of reality

What makes decisions difficult?

Complexity: many factors to consider; relationships among factors

Uncertainty

Conflicting objectives

How does modeling help?

Complexity: model; consider each factor separately; consider relationships explicitly; avoid being overwhelmed

Uncertainty: sensitivity analysis and probability

Conflicting objectives: consider each objective; consider tradeoffs explicitly

Goals of modeling: recommended solution, insight, clarity of action


Figure 1.1 Overall Model-Building Flowchart

Components of a Decision Model

Controllable input variables: "What you can do," decision variables, alternatives

Uncontrollable input variables: "What you know and don't know," uncertainties, constraints

Relationships: how inputs are related to output, usually with intermediate variables, structure

Intermediate variables: useful for linking inputs to output

Output variable: "What you want," performance measure, overall satisfaction

[Figure 1.1 labels: Real World, Difficult Problem, Abstraction, Model, Math Model, Operations on Model, Model Results, Implementation]


Influence chart

Rectangle for controllable inputs

Rounded rectangle or oval for other variables

Figure 1.2 Generic Influence Chart

[Chart nodes: Controllable Factor (Input), Uncontrollable Factor (Input), Intermediate Variables, Performance Measure (Output)]


1.2 BASIC WHAT-IF MODEL

Influence Diagram Representation

Figure 1.3 Typical Influence Diagram

Figure 1.4 Typical Spreadsheet Model

[Labels for Figures 1.3 and 1.4. Inputs: Unit Price, Units Sold, Unit Variable Cost, Fixed Costs. Intermediate Variables: Sales Revenue, Total Variable Cost, Total Costs. Output: Net Cash Flow.]


Figure 1.5 Formulas for Typical Spreadsheet Model

Figure 1.6 Defined Names for Typical Spreadsheet Model
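The what-if structure of Figures 1.3 through 1.6 can also be sketched outside Excel. The Python sketch below assumes the usual accounting relationships implied by the influence diagram (revenue equals price times units, and so on); the function name and all numeric inputs are hypothetical and are not taken from the figures.

```python
def net_cash_flow(unit_price, units_sold, unit_variable_cost, fixed_costs):
    """Output variable computed from four inputs via three intermediate variables.

    The relationships are assumed from the influence diagram, not copied
    from the workbook formulas.
    """
    sales_revenue = unit_price * units_sold                # intermediate variable
    total_variable_cost = unit_variable_cost * units_sold  # intermediate variable
    total_costs = total_variable_cost + fixed_costs        # intermediate variable
    return sales_revenue - total_costs                     # output variable


# Hypothetical base case, followed by a "what if units sold were higher?" case.
base_case = net_cash_flow(unit_price=10, units_sold=1000,
                          unit_variable_cost=6, fixed_costs=3000)
what_if = net_cash_flow(unit_price=10, units_sold=1200,
                        unit_variable_cost=6, fixed_costs=3000)
print(base_case, what_if)
```

Changing one argument and re-running plays the same role as typing a new value into an input cell and letting the worksheet recalculate.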


Decision Tree Representation

Figure 1.7 Decision Fan and Event Fan

Figure 1.8 Conceptual Decision Tree

Consequence Table Representation

Figure 1.9 Professor's Summer Decision

                     Conflicting Objectives
Alternatives         Cash Flow   Hassle-Free   Happy Deans   Professional Fame
Develop Software     $2700       Yes           Maybe         Maybe
Teach MBAs           $4300       No            Yes           No
Vacation             $0          Yes           No            No

[Figure 1.7 labels: Decision Fan (decision with many possible alternatives); Event Fan (event with many possible outcomes)]

[Figure 1.8 labels: Unit Price, Units Sold, Unit Variable Cost, Fixed Costs, Net Cash Flow $]

Chapter 2 Sensitivity Analysis Using SensIt

SensIt is a sensitivity analysis add-in for Microsoft Excel (Excel 97 and later versions) for Windows and Macintosh. The original version was written by Mike Middleton of the University of San Francisco and Jim Smith of Duke University, and the current version was rewritten in VBA by Mike Middleton.

2.1 HOW TO INSTALL SENSIT

There are several ways to install SensIt:

(1) Start Excel, and use Excel’s File | Open command to open the SensIt xla file from floppy or hard drive.

(2) Copy the SensIt xla file to the Program Files | Microsoft Office | Office | Library folder of your hard drive, in which case SensIt will automatically appear in Excel's Add-In Manager. Start Excel, and use Excel’s Tools | Add-Ins command to load and unload SensIt as needed by checking or unchecking the SensIt Sensitivity Analysis checkbox.

(3) Copy the SensIt xla file to your choice of a folder on the hard drive. Start Excel, choose Tools | Add-Ins | Browse, navigate to the location of the SensIt xla file, select it, and click OK. Subsequently, use Excel’s Tools | Add-Ins command to load and unload SensIt as needed by checking or unchecking the SensIt Sensitivity Analysis checkbox.

(4) Copy the SensIt xla file to the Program Files | Microsoft Office | Office | XLStart folder of your hard drive, in which case the file will be opened every time you start Excel.

All of SensIt’s functionality, including its built-in help, is a part of the SensIt xla file. There is no separate setup file or help file. When you use SensIt, it does not create any Windows Registry entries (although Excel may use such entries to keep track of its add-ins). SensIt does create a temporary worksheet for intermediate calculations, but after the calculations are successfully completed, SensIt deletes the temporary worksheet.


2.2 HOW TO UNINSTALL OR DELETE SENSIT

(A) First, use your file manager to locate the SensIt xla file, and delete the file from your hard drive.

(B1) If SensIt is listed under Excel's add-in manager and the box is checked, when you start Excel you will see "Cannot find ..." Click OK. Choose Tools | Add-Ins, uncheck the box for SensIt; you will see "Cannot find ... Delete from list?" Click Yes.

(B2) If SensIt is listed under Excel's add-in manager and the box is not checked, start Excel and choose Tools | Add-Ins. Check the box for SensIt; you will see "Cannot find ... Delete from list?" Click Yes.

2.3 SENSIT OVERVIEW

To run SensIt, start Excel and open the SensIt xla file. Alternatively, install SensIt using one of the methods described above. SensIt adds a Sensitivity Analysis command to the Tools menu. The Sensitivity Analysis command has three subcommands: One Input, One Output; Many Inputs, One Output; and Help.

Before using the SensIt options, you must have a spreadsheet model with one or more inputs and an output. SensIt's features make it easy for you to see how sensitive the output is to changes in the inputs.

Use SensIt’s One Input, One Output option to see how your model’s output depends on changes in a single input variable. This feature creates an XY (Scatter) chart type.

Use SensIt’s Many Inputs, One Output option to see how your model’s output depends on ranges you specify for each of the model’s input variables. This feature creates a tornado chart (a horizontal Bar chart type) and a spider chart (an XY (Scatter) chart type).

2.4 EXAMPLE PROBLEM

Eagle Airlines is deciding whether to purchase a five-seat aircraft where some proportion of the hours flown would be charter flights and some hours would be regularly scheduled ticketed flights with an uncertain number of seats sold (capacity). A spreadsheet model that does not include financing costs is shown below.


Figure 2.1 Model Display

Spreadsheet Model For Eagle Airlines

Cell   Input Variables                   Input Cells
B4     Charter Price/Hour                $325
B5     Ticket Price/Hour                 $100
B6     Hours Flown                       800
B7     Capacity of Scheduled Flights     50%
B8     Proportion of Chartered Flights   0.5
B9     Operating Cost/Hour               $245
B10    Insurance                         $20,000

       Intermediate Calculations
B13    Total Revenue                     $230,000
B14    Total Cost                        $216,000

       Performance Measure
B17    Annual Profit                     $14,000

Adapted from Bob Clemen's textbook, Making Hard Decisions, 2nd ed., Duxbury (1996).

Figure 2.2 Model Formulas

       Intermediate Calculations
B13    Total Revenue   =(B8*B6*B4)+((1-B8)*B6*B5*B7*5)
B14    Total Cost      =(B6*B9)+B10

       Performance Measure
B17    Annual Profit   =B13-B14
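For readers who want to check the figures outside Excel, the formulas in Figure 2.2 can be transcribed into a short Python function. This is only a sketch: the function and parameter names are hypothetical, while the default values are the base-case inputs from Figure 2.1 and the constant 5 is the number of seats.

```python
def eagle_model(charter_price=325.0, ticket_price=100.0, hours_flown=800.0,
                capacity=0.50, prop_chartered=0.5, operating_cost=245.0,
                insurance=20000.0, seats=5):
    """Return (total revenue, total cost, annual profit) for the Eagle model."""
    # Mirrors =(B8*B6*B4)+((1-B8)*B6*B5*B7*5)
    total_revenue = (prop_chartered * hours_flown * charter_price
                     + (1 - prop_chartered) * hours_flown
                     * ticket_price * capacity * seats)
    # Mirrors =(B6*B9)+B10
    total_cost = hours_flown * operating_cost + insurance
    # Mirrors =B13-B14
    return total_revenue, total_cost, total_revenue - total_cost


revenue, cost, profit = eagle_model()
print(f"Total Revenue ${revenue:,.0f}  Total Cost ${cost:,.0f}  "
      f"Annual Profit ${profit:,.0f}")
```

With the default arguments the three results agree with the values displayed in Figure 2.1.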

2.5 ONE INPUT, ONE OUTPUT

Use SensIt’s One Input, One Output option to see how your model’s output depends on changes in a single input variable.


Figure 2.3 SensIt One Input, One Output Dialog Box

Cells for Input Variable

In the Label reference edit box, type a cell reference, or point to the cell containing a text label and click. In the Value reference edit box, type a cell reference, or point to the cell containing a numeric value that is an input cell of your model.

Cells for Output Variable

In the Label reference edit box, type a cell reference, or point to the cell containing a text label and click. In the Value reference edit box, type a cell reference, or point to the cell containing a formula that is the output of your model.

Input Values

Type numbers in the Start, Step, and Stop edit boxes to specify values to be used in the input variable’s cell. Cell references are not allowed.

Click OK: SensIt uses the Start, Step, and Stop values to prepare a table of values. Each value is copied to the input variable Value cell, the worksheet is recalculated, and the value of the output variable Value cell is copied to the table. (You could do this manually in Excel using the Edit | Fill | Series and Data | Table commands.) SensIt uses the paired input and output values to prepare an XY (Scatter) chart. The text in the label cells you identified is used as the chart’s axis labels. (You could do this manually using the ChartWizard.)
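The Start, Step, and Stop procedure can be imitated in a few lines of Python. The sketch below is not SensIt's code; it assumes the Eagle Airlines formulas from Figure 2.2, and the function names are hypothetical. It varies Hours Flown from 400 to 1000 in steps of 50 while holding the other inputs at their base values.

```python
def annual_profit(hours_flown, charter_price=325.0, ticket_price=100.0,
                  capacity=0.5, prop_chartered=0.5, operating_cost=245.0,
                  insurance=20000.0, seats=5):
    """Eagle Airlines annual profit, following the formulas in Figure 2.2."""
    revenue = (prop_chartered * hours_flown * charter_price
               + (1 - prop_chartered) * hours_flown
               * ticket_price * capacity * seats)
    cost = hours_flown * operating_cost + insurance
    return revenue - cost


def one_input_table(model, start, step, stop):
    """Evaluate model at start, start + step, ..., stop (inclusive)."""
    count = int(round((stop - start) / step)) + 1
    inputs = [start + i * step for i in range(count)]
    return [(x, model(x)) for x in inputs]


table = one_input_table(annual_profit, start=400, step=50, stop=1000)
for hours, profit in table:
    print(f"{hours:5d} {profit:10,.0f}")
```

The resulting pairs are the ones SensIt would plot on the XY (Scatter) chart, with Hours Flown on the horizontal axis.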


Figure 2.4 SensIt Numerical and Chart Output

SensIt 1.20 Professional
One Input, One Output

Date          (current date)
Time          (current time)
Workbook      senssamp.xls
Input Cell    Model!$B$6
Output Cell   Model!$B$17

Hours Flown   Annual Profit
400           -$3,000
450           -$875
500           $1,250
550           $3,375
600           $5,500
650           $7,625
700           $9,750
750           $11,875
800           $14,000
850           $16,125
900           $18,250
950           $20,375
1000          $22,500

[XY (Scatter) chart, SensIt 1.20 Professional: Annual Profit (vertical axis, -$5,000 to $25,000) versus Hours Flown (horizontal axis, 400 to 1000)]

From the table and chart, we observe that Eagle must fly approximately 470 hours to achieve a positive profit (the output changes sign between 450 and 500 hours), assuming all other inputs stay the same. The exact threshold value for Hours Flown could be obtained using Excel's Goal Seek feature.
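The break-even search can also be sketched in Python. Goal Seek uses its own iterative procedure; the bisection search below is a simple stand-in and is not meant to describe how Excel works internally. The model follows Figure 2.2, the function names are hypothetical, and the bracket of 450 to 500 hours is taken from the sign change in the table of Figure 2.4.

```python
def annual_profit(hours_flown, charter_price=325.0, ticket_price=100.0,
                  capacity=0.5, prop_chartered=0.5, operating_cost=245.0,
                  insurance=20000.0, seats=5):
    """Eagle Airlines annual profit, following the formulas in Figure 2.2."""
    revenue = (prop_chartered * hours_flown * charter_price
               + (1 - prop_chartered) * hours_flown
               * ticket_price * capacity * seats)
    cost = hours_flown * operating_cost + insurance
    return revenue - cost


def find_root(f, lo, hi, tol=1e-6):
    """Bisection: f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("the interval does not bracket a sign change")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # root lies in [mid, hi]
    return (lo + hi) / 2


breakeven_hours = find_root(annual_profit, 450.0, 500.0)
print(f"Break-even Hours Flown: {breakeven_hours:.1f}")
```

Because profit is linear in Hours Flown here, the same threshold also follows directly from dividing the $20,000 insurance cost by the $42.50 contribution per hour at the base-case inputs.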

2.6 MANY INPUTS, MANY OUTPUTS TORNADO

Use SensIt’s Tornado option to see how your model’s output depends on ranges you specify for each of the model’s input variables. Before using Tornado, arrange your model input cells in adjacent cells in a single column, arrange corresponding labels in adjacent cells in a single column, and arrange Low, Base, and High input values for each input variable in three separate columns. Alternatively, the three columns containing input values can be worst case, likely case, and best case. An appropriate arrangement is shown below.


Figure 2.5 Model Display with Lower and Upper Bounds

Spreadsheet Model For Eagle Airlines

Input Variables                   Input Cells   Lower Bound   Base Value   Upper Bound
Charter Price/Hour                $325          $300          $325         $350
Ticket Price/Hour                 $100          $95           $100         $108
Hours Flown                       800           500           800          1000
Capacity of Scheduled Flights     50%           40%           50%          60%
Proportion of Chartered Flights   0.5           0.45          0.5          0.7
Operating Cost/Hour               $245          $230          $245         $260
Insurance                         $20,000       $18,000       $20,000      $25,000

Intermediate Calculations
Total Revenue                     $230,000
Total Cost                        $216,000

Performance Measure
Annual Profit                     $14,000

Adapted from Bob Clemen's textbook, Making Hard Decisions, 2nd ed., Duxbury (1996).

Figure 2.6 SensIt Many Inputs, One Output Dialog Box

Ranges for Input Variables

Type a range reference, or point to the range (click and drag) containing text labels and the range containing numeric values that are inputs to your model. If the range is not contiguous, select the first portion and then hold down the Control key while making the remaining selections. Alternatively, type a comma between each portion.


Cells for Output Variable

Type a cell reference, or point to the cell containing a text label and the cell containing a formula that’s the output of your model.

Ranges for Input Values

Type a range reference, or point to the range (click and drag) containing numeric values for each of your model’s inputs. You can make non-contiguous selections similar to the ranges for input variables. Be sure that all five range selections have the appropriate cells in the same order.

After you click OK, for each input variable, SensIt sets all other input values at their Base case values, copies the One Extreme input value to the input variable cell, recalculates the worksheet, and copies the value of the output variable cell to the table; the steps are repeated using each Other Extreme input value. For each input variable, SensIt computes the range of the output variable values (the swing), sorts the table from largest swing down to smallest, and prepares a bar chart.
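The one-at-a-time procedure just described can be sketched in Python. This is an illustration under stated assumptions, not SensIt's implementation: the model follows Figure 2.2, the bounds come from Figure 2.5, and the dictionary and function names are hypothetical.

```python
# Base-case inputs and (lower bound, upper bound) pairs from Figure 2.5.
BASE = {
    "Charter Price/Hour": 325.0,
    "Ticket Price/Hour": 100.0,
    "Hours Flown": 800.0,
    "Capacity of Scheduled Flights": 0.50,
    "Proportion of Chartered Flights": 0.5,
    "Operating Cost/Hour": 245.0,
    "Insurance": 20000.0,
}
RANGES = {
    "Charter Price/Hour": (300.0, 350.0),
    "Ticket Price/Hour": (95.0, 108.0),
    "Hours Flown": (500.0, 1000.0),
    "Capacity of Scheduled Flights": (0.40, 0.60),
    "Proportion of Chartered Flights": (0.45, 0.7),
    "Operating Cost/Hour": (230.0, 260.0),
    "Insurance": (18000.0, 25000.0),
}


def annual_profit(v):
    """Eagle Airlines annual profit from a dictionary of input values."""
    p = v["Proportion of Chartered Flights"]
    h = v["Hours Flown"]
    revenue = (p * h * v["Charter Price/Hour"]
               + (1 - p) * h * v["Ticket Price/Hour"]
               * v["Capacity of Scheduled Flights"] * 5)
    cost = h * v["Operating Cost/Hour"] + v["Insurance"]
    return revenue - cost


def tornado(model, base, ranges):
    """Vary one input at a time; return rows sorted by swing, largest first."""
    rows = []
    for name, (one_extreme, other_extreme) in ranges.items():
        out_one = model({**base, name: one_extreme})      # others stay at base
        out_other = model({**base, name: other_extreme})
        low, high = sorted((out_one, out_other))
        rows.append((name, low, high, high - low))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


rows = tornado(annual_profit, BASE, RANGES)
for name, low, high, swing in rows:
    print(f"{name:33s}{low:>10,.0f}{high:>10,.0f}{swing:>10,.0f}")
```

Each row holds the low output, high output, and swing for one input; a horizontal bar from the low output to the high output, one bar per row, gives the tornado chart.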

Figure 2.7 SensIt Tornado Numerical and Chart Output

SensIt 1.20 Professional
Many Inputs, One Output
Single-Factor Sensitivity Analysis

Date   (current date)     Workbook      senssamp.xls
Time   (current time)     Output Cell   Cases!$B$17

                                  Input Value                            Output Value (Annual Profit)
Input Variable                    Low Output   Base Case   High Output   Low       Base Case   High      Swing
Capacity of Scheduled Flights     40%          50%         60%           -$6,000   $14,000     $34,000   $40,000
Operating Cost/Hour               $260         $245        $230          $2,000    $14,000     $26,000   $24,000
Hours Flown                       500          800         1000          $1,250    $14,000     $22,500   $21,250
Charter Price/Hour                $300         $325        $350          $4,000    $14,000     $24,000   $20,000
Proportion of Chartered Flights   0.45         0.5         0.7           $11,000   $14,000     $26,000   $15,000
Ticket Price/Hour                 $95          $100        $108          $9,000    $14,000     $22,000   $13,000
Insurance                         $25,000      $20,000     $18,000       $9,000    $14,000     $16,000   $7,000

[Tornado chart: horizontal bars of Annual Profit (axis from -$15,000 to $40,000), one bar per input variable in the order of the table above, each bar labeled at its ends with the low-output and high-output input values]


The uncertainty about Capacity of Scheduled Flights is associated with the widest swing in Annual Profit.

2.7 TORNADO SORTED BY DOWNSIDE RISK

The tornado chart is originally sorted by Swing. To sort by downside risk, i.e., by the low output values, select the data in cells A10:J16, choose Data | Sort, check that "No header row" is selected, select "Sort by" column F Ascending, and click OK. The results are shown below.

Figure 2.8 SensIt Tornado Sorted by Downside Risk

                                  Input Value                            Output Value (Annual Profit)
Input Variable                    Low Output   Base Case   High Output   Low       Base Case   High      Swing
Capacity of Scheduled Flights     40%          50%         60%           -$6,000   $14,000     $34,000   $40,000
Hours Flown                       500          800         1000          $1,250    $14,000     $22,500   $21,250
Operating Cost/Hour               $260         $245        $230          $2,000    $14,000     $26,000   $24,000
Charter Price/Hour                $300         $325        $350          $4,000    $14,000     $24,000   $20,000
Ticket Price/Hour                 $95          $100        $108          $9,000    $14,000     $22,000   $13,000
Insurance                         $25,000      $20,000     $18,000       $9,000    $14,000     $16,000   $7,000
Proportion of Chartered Flights   0.45         0.5         0.7           $11,000   $14,000     $26,000   $15,000

[Tornado chart: horizontal bars of Annual Profit (axis from -$15,000 to $40,000), one bar per input variable in the order of the table above, each bar labeled at its ends with the low-output and high-output input values]

2.8 TORNADO SORTED BY UPSIDE POTENTIAL

To sort by upside potential, i.e., by the high output values, select the data in cells A10:J16, choose Data | Sort, check that "No header row" is selected, select "Sort by" column H Descending, and click OK. The results are shown below.
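The two re-sorts described in Sections 2.7 and 2.8 amount to ordering the same rows by a different column. A minimal Python sketch, using the low output, high output, and swing values copied from Figure 2.7 (the variable names are hypothetical):

```python
# (input variable, low output, high output, swing), in the original
# swing-sorted order of Figure 2.7.
rows = [
    ("Capacity of Scheduled Flights", -6000, 34000, 40000),
    ("Operating Cost/Hour", 2000, 26000, 24000),
    ("Hours Flown", 1250, 22500, 21250),
    ("Charter Price/Hour", 4000, 24000, 20000),
    ("Proportion of Chartered Flights", 11000, 26000, 15000),
    ("Ticket Price/Hour", 9000, 22000, 13000),
    ("Insurance", 9000, 16000, 7000),
]

# Downside risk: ascending by low output (column F in the worksheet).
by_downside = sorted(rows, key=lambda row: row[1])

# Upside potential: descending by high output (column H in the worksheet).
by_upside = sorted(rows, key=lambda row: row[2], reverse=True)

print("Sorted by downside risk:   ", [row[0] for row in by_downside])
print("Sorted by upside potential:", [row[0] for row in by_upside])
```

Python's sort is stable, so rows that tie on the sort column keep their swing-sorted order.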


Figure 2.9 SensIt Tornado Sorted by Upside Potential

                                  Input Value                            Output Value (Annual Profit)
Input Variable                    Low Output   Base Case   High Output   Low       Base Case   High      Swing
Capacity of Scheduled Flights     40%          50%         60%           -$6,000   $14,000     $34,000   $40,000
Operating Cost/Hour               $260         $245        $230          $2,000    $14,000     $26,000   $24,000
Proportion of Chartered Flights   0.45         0.5         0.7           $11,000   $14,000     $26,000   $15,000
Charter Price/Hour                $300         $325        $350          $4,000    $14,000     $24,000   $20,000
Hours Flown                       500          800         1000          $1,250    $14,000     $22,500   $21,250
Ticket Price/Hour                 $95          $100        $108          $9,000    $14,000     $22,000   $13,000
Insurance                         $25,000      $20,000     $18,000       $9,000    $14,000     $16,000   $7,000

[Tornado chart: horizontal bars of Annual Profit (axis from -$15,000 to $40,000), one bar per input variable in the order of the table above, each bar labeled at its ends with the low-output and high-output input values]

2.9 TORNADO SHOWING MAJOR UNCERTAINTIES

In some situations you may have twenty or more input variables and you wish to show the variation of only the top five or ten. To illustrate this modification, consider showing only the top four input variables in the example. Click one of the bars on the left side of the vertical base case line to select Series 1 (shown at the right end of the formula bar), and then click and drag the fill handle from A16 up to A13 and the fill handle from F16 up to F13. Click one of the bars on the right side of the vertical base case line to select Series 2, and then click and drag the fill handle from H16 up to H13. To resize the chart, click just inside its outer border and drag the bottom center fill handle upward. The resulting chart is shown below.


Figure 2.10 SensIt Tornado Showing Only Major Uncertainties

[Tornado chart: horizontal bars of Annual Profit (axis from -$15,000 to $40,000) for the top four input variables only: Capacity of Scheduled Flights (40% to 60%), Operating Cost/Hour ($260 to $230), Hours Flown (500 to 1000), and Charter Price/Hour ($300 to $350)]

2.10 SPIDER

Use SensIt’s Spider option to see how your model’s output depends on the same percentage changes for each of the model’s input variables.

Click OK: SensIt Spider uses the Start (%), Step (%), and Stop (%) values and the original (base case) numeric value in each input variable cell to prepare a table of percentage change input values. For each input variable, all other input values are set at their base case values, each percentage change input value is copied to the input variable cell, the worksheet is recalculated, and the value of the output variable cell is copied to the table. SensIt prepares two XY (Scatter) charts; the horizontal axis is percentage change of input variables; the vertical axis is model output value on one chart and percentage change of model output value on the other; the input variables’ labels are used for chart legends.
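The Spider calculation can be sketched in Python as follows. This is an illustration, not SensIt's code: the model follows Figure 2.2, the Start, Step, and Stop values of 50%, 10%, and 150% are assumptions chosen for the example, and the names are hypothetical.

```python
BASE = {
    "Charter Price/Hour": 325.0,
    "Ticket Price/Hour": 100.0,
    "Hours Flown": 800.0,
    "Capacity of Scheduled Flights": 0.50,
    "Proportion of Chartered Flights": 0.5,
    "Operating Cost/Hour": 245.0,
    "Insurance": 20000.0,
}


def annual_profit(v):
    """Eagle Airlines annual profit from a dictionary of input values."""
    p = v["Proportion of Chartered Flights"]
    h = v["Hours Flown"]
    revenue = (p * h * v["Charter Price/Hour"]
               + (1 - p) * h * v["Ticket Price/Hour"]
               * v["Capacity of Scheduled Flights"] * 5)
    cost = h * v["Operating Cost/Hour"] + v["Insurance"]
    return revenue - cost


def spider(model, base, start_pct, step_pct, stop_pct):
    """For each input, scale it to each percentage of its base value."""
    count = int(round((stop_pct - start_pct) / step_pct)) + 1
    pcts = [start_pct + i * step_pct for i in range(count)]
    table = {}
    for name, base_value in base.items():
        # All other inputs stay at base case while this one is scaled.
        table[name] = [model({**base, name: base_value * pct / 100.0})
                       for pct in pcts]
    return pcts, table


pcts, table = spider(annual_profit, BASE, start_pct=50, step_pct=10, stop_pct=150)
base_output = annual_profit(BASE)
for name, outputs in table.items():
    # Second chart: output expressed as a percentage of the base-case output.
    as_pct = [100.0 * out / base_output for out in outputs]
    print(f"{name:33s}{outputs[0]:>10,.0f}{outputs[-1]:>10,.0f}"
          f"{as_pct[0]:>9.1f}%{as_pct[-1]:>9.1f}%")
```

Plotting each list in `table` against `pcts` gives one line per input variable; all lines cross at 100%, where every input is at its base value.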


Figure 2.11 SensIt Spider Numerical and Chart Output

SensIt 1.20 Professional
Many Inputs, One Output
Single-Factor Sensitivity Analysis

Date   (current date)     Workbook      senssamp.xls
Time   (current time)     Output Cell   Cases!$B$17

                                  Input Value                            Input Value as % of Base               Output Value (Annual Profit)
Input Variable                    Low Output   Base Case   High Output   Low Output   Base Case   High Output   Low       Base Case   High      Swing
Capacity of Scheduled Flights     40%          50%         60%           80.0%        100.0%      120.0%        -$6,000   $14,000     $34,000   $40,000
Operating Cost/Hour               $260         $245        $230          106.1%       100.0%      93.9%         $2,000    $14,000     $26,000   $24,000
Hours Flown                       500          800         1000          62.5%        100.0%      125.0%        $1,250    $14,000     $22,500   $21,250
Charter Price/Hour                $300         $325        $350          92.3%        100.0%      107.7%        $4,000    $14,000     $24,000   $20,000
Proportion of Chartered Flights   0.45         0.5         0.7           90.0%        100.0%      140.0%        $11,000   $14,000     $26,000   $15,000
Ticket Price/Hour                 $95          $100        $108          95.0%        100.0%      108.0%        $9,000    $14,000     $22,000   $13,000
Insurance                         $25,000      $20,000     $18,000       125.0%       100.0%      90.0%         $9,000    $14,000     $16,000   $7,000

[Spider chart: Annual Profit (vertical axis, -$15,000 to $40,000) versus Input Value as % of Base Case (horizontal axis, 50.0% to 150.0%), with one line per input variable. Legend: Capacity of Scheduled Flights, Operating Cost/Hour, Hours Flown, Charter Price/Hour, Proportion of Chartered Flights, Ticket Price/Hour, Insurance]

2.11 TIPS FOR MANY INPUTS, ONE OUTPUT

When defining the high and low cases for each variable, it is important to be consistent so that the "high" cases are all equally high and the "low" cases are equally low. This will ensure that the output results can be meaningfully compared.

For example, if you are thinking about the uncertainty using probability and very extreme values are possible but with low probability of occurrence, you might take all of the base case values to be estimates of the mean of the input variable, take the low cases to be values such that there is a 1-in-10 chance of the variable being below this amount, and take the high cases to be values such that there is a 1-in-10 chance of the variable being above this amount. Or, you might use the 5th and 95th percentiles for each of the input variables.


Alternatively, in some situations the values for each input variable may have lower and upper bounds, so you may specify low and high values that are the absolute lowest and highest possible values.

When you click OK, SensIt sets all of the input variables to their base-case values and records the output value. Then SensIt goes through each of the input variables one at a time, plugs the low-case value into the input cell, and records the value in the output cell. It then repeats the process for the high case. For each substitution, all input values are kept at their base-case values except for the single input value that is set at its low or high value. SensIt then produces a spreadsheet that lists the numerical results as shown in columns F, G, and H of the worksheet with the tornado chart.

In the worksheet, the variables are sorted by their "swing" -- the absolute value of the difference between the output values in the low and high cases. "Swing" serves as a rough measure of the impact of each input variable. The rows of numerical output are sorted from highest swing at the top down to lowest swing at the bottom. Then SensIt creates a bar chart of the sorted data.

In general, you should focus your modeling efforts on those variables with the greatest impact on the value measure.

If your model has input variables that are discrete or categorical, you should create multiple tornado charts using different base case values of that input variable. For example, if your model has an input variable "Government Regulation" that has possible values 0 (zero) or 1, the low and high values will be 0 and 1, but you should run one tornado chart with base case = 0 and another tornado chart with base case = 1.


2.12 EAGLE AIRLINES PROBLEM

Figure 2.12 Ten-Variable Eagle Model Display

Spreadsheet Model For Eagle Airlines

Cell   Variable                          Input Cells   Lower Bound   Base Value   Upper Bound
B4     Hours Flown                       800           500           800          1000
B5     Charter Price/Hour                $325          $300          $325         $350
B6     Ticket Price/Hour                 $100          $95           $100         $108
B7     Capacity of Scheduled Flights     50%           40%           50%          60%
B8     Proportion Of Chartered Flights   0.5           0.45          0.5          0.7
B9     Operating Cost/Hour               $245          $230          $245         $260
B10    Insurance                         $20,000       $18,000       $20,000      $25,000
B11    Proportion Financed               0.4           0.3           0.4          0.5
B12    Interest Rate                     11.5%         10.5%         11.5%        13.0%
B13    Purchase Price                    $87,500       $85,000       $87,500      $90,000

B15    Total Revenue                     $230,000
B16    Total Cost                        $220,025

       Annual Profit                     $9,975

Adapted from Bob Clemen's textbook, Making Hard Decisions

Figure 2.13 Ten-Variable Eagle Model Formulas

B15    Total Revenue   =(B8*B4*B5)+((1-B8)*B4*B6*B7*5)
B16    Total Cost      =(B4*B9)+B10+(B13*B11*B12)

       Annual Profit   =B15-B16


Figure 2.14 Ten-Variable Worst Case and Best Case Inputs Determined by Solver

Variable                          Worst Case   Base Case   Best Case
Hours Flown                       1000         800         1000
Charter Price/Hour                $300         $325        $350
Ticket Price/Hour                 $95          $100        $108
Capacity of Scheduled Flights     40%          50%         60%
Proportion Of Chartered Flights   0.45         0.5         0.7
Operating Cost/Hour               $260         $245        $230
Insurance                         $25,000      $20,000     $18,000
Proportion Financed               0.5          0.4         0.3
Interest Rate                     13.0%        11.5%       10.5%
Purchase Price                    $90,000      $87,500     $85,000

Total Revenue                     $239,500     $230,000    $342,200
Total Cost                        $290,850     $220,025    $250,678

Annual Profit                     -$51,350     $9,975      $91,523
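The worst-case and best-case values in Figure 2.14 were determined with Solver. As a cross-check, the sketch below uses a different technique: it enumerates every combination of lower and upper bounds. That shortcut is adequate here only because each input enters the formulas of Figure 2.13 linearly when the others are held fixed, so the extremes occur at corner points; it is not a general substitute for Solver. All names in the code are hypothetical.

```python
from itertools import product

# (lower bound, upper bound) for the ten inputs, from Figure 2.12.
BOUNDS = {
    "hours": (500.0, 1000.0),
    "charter_price": (300.0, 350.0),
    "ticket_price": (95.0, 108.0),
    "capacity": (0.40, 0.60),
    "prop_chartered": (0.45, 0.7),
    "operating_cost": (230.0, 260.0),
    "insurance": (18000.0, 25000.0),
    "prop_financed": (0.3, 0.5),
    "interest_rate": (0.105, 0.13),
    "purchase_price": (85000.0, 90000.0),
}


def annual_profit(v):
    """Ten-variable Eagle model, following the formulas in Figure 2.13."""
    revenue = (v["prop_chartered"] * v["hours"] * v["charter_price"]
               + (1 - v["prop_chartered"]) * v["hours"]
               * v["ticket_price"] * v["capacity"] * 5)
    cost = (v["hours"] * v["operating_cost"] + v["insurance"]
            + v["purchase_price"] * v["prop_financed"] * v["interest_rate"])
    return revenue - cost


names = list(BOUNDS)
# 2**10 = 1024 corner points of the box defined by the bounds.
corners = [dict(zip(names, combo))
           for combo in product(*(BOUNDS[n] for n in names))]
worst = min(corners, key=annual_profit)
best = max(corners, key=annual_profit)

print(f"Worst case profit: {annual_profit(worst):,.0f} at hours = {worst['hours']:.0f}")
print(f"Best case profit:  {annual_profit(best):,.0f} at hours = {best['hours']:.0f}")
```

Hours Flown is at its upper bound of 1000 in both cases: with the other inputs at their worst values the contribution per hour is negative, so flying more hours makes the worst case worse.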

Chapter 3 Multiattribute Utility

3.1 APPLICATIONS OF MULTI-ATTRIBUTE UTILITY

Strategy for Dealing with Microcomputer Networking
  Impact on microcomputer users
  Productivity enhancement
  User satisfaction
  Impact on mainframe capacity
  Costs
  Upward compatibility of the network
  Impacts on organizational structure
  Risks

Purchase of manufacturing machinery
  Price
  Technical features
  Service

Choosing a manager candidate
  Education
  Management skills
  Technical skills
  Personal skills

Choosing a beverage container (soft drink industry)
  Energy to produce
  Cost
  Environmental waste
  Customer service

Selecting a best job
  Monetary compensation
  Geographical location
  Travel requirements
  Nature of work

3.2 MULTIATTRIBUTE UTILITY SWING WEIGHTS

Excel Workbook Clemen15.xls

Conflicting Objectives: Fundamental Objectives versus Means Objectives

Clemen, Making Hard Decisions, Ch. 15

Multiattribute Utility

Set of Objectives should be

1) complete

2) as small as possible

3) not redundant

4) decomposable ("independent" or unrelated)

Additive Utility Function

Overall Score of Alternative = Sum [ Weight times Attribute Score of Alternative ]

Figure 3.1 Data for Example

Attribute             Red Portalo   Blue Norushi   Yellow Standard
Life span, in years   12            9              6
Price                 $17,000       $10,000        $8,000
Color                 Red           Blue           Yellow


Attribute Scores

Figure 3.2 Individual Utility for Life Span

Life Span

Years   Score
6       0
9       0.5
12      1

[Chart "Scores for Life Span": Life Span Score (vertical axis, 0.0 to 1.0) versus Life Span, in years (horizontal axis, 5 to 13)]

Figure 3.3 Individual Utility for Price

Price

Price     Score
$17,000   0
$10,000   0.78
$8,000    1

[Chart "Scores for Price": Price Score (vertical axis, 0.0 to 1.0) versus Price (horizontal axis, $5,000 to $20,000)]


Figure 3.4 Individual Utility for Color

Color

Color    Score
Red      0
Blue     0.667
Yellow   1

[Chart "Scores for Color": Color Score (vertical axis, 0.0 to 1.0) for each Color (Red, Blue, Yellow)]

Swing Weights

Figure 3.5 Swing Weight Assessment Display

Swing Weights

Attribute Swung        Consequence to Compare
from Worst to Best     Life span   Price     Color    Rank   Rate   Weight
(Benchmark)            6 years     $17,000   red      4      0      0.000
Life span              12 years    $17,000   red      2      75     0.405
Price                  6 years     $8,000    red      1      100    0.541
Color                  6 years     $17,000   yellow   3      10     0.054
                                                             185

1) Hypothetical alternatives (number of attributes plus one)

Benchmark alternative is worst for all attributes

Each other hypothetical alternative has one attribute at best, all others at worst

2) Rank the hypothetical alternatives

3) Benchmark has rating zero, first ranked alternative has rating 100


Assign level-of-satisfaction ratings to the intermediate alternatives

4) Weight equals rating divided by sum of ratings
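Step 4 is simple arithmetic, and it can be checked with a short Python sketch using the ratings from Figure 3.5 (the variable names are hypothetical):

```python
# Level-of-satisfaction ratings from Figure 3.5. The benchmark is rated 0
# and the first-ranked hypothetical alternative (Price swung to best) is 100.
ratings = {"(Benchmark)": 0, "Life span": 75, "Price": 100, "Color": 10}

total = sum(ratings.values())
weights = {name: rate / total for name, rate in ratings.items()}

for name, weight in weights.items():
    print(f"{name:12s} rate {ratings[name]:3d}  weight {weight:.3f}")
print(f"Sum of ratings: {total}")
```

Because each weight is a rating divided by the same total, the weights always sum to one, and only the ratios among the ratings matter.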

Figure 3.6 Swing Weight Assessment Formulas

Swing Weights

Attribute Swung        Consequence to Compare
from Worst to Best     Life span   Price     Color    Rank   Rate          Weight
(Benchmark)            6 years     $17,000   red      4      0             =F6/$F$10
Life span              12 years    $17,000   red      2      75            =F7/$F$10
Price                  6 years     $8,000    red      1      100           =F8/$F$10
Color                  6 years     $17,000   yellow   3      10            =F9/$F$10
                                                             =SUM(F6:F9)

Overall Scores

Figure 3.7 Swing Weight Overall Scores Display

Overall Scores

                Red Portalo             Blue Norushi            Yellow Standard
                Attribute   Attribute   Attribute   Attribute   Attribute   Attribute
Attribute       Value       Score       Value       Score       Value       Score
Life span       12          1.000       9           0.500       6           0.000
Price           $17,000     0.000       $10,000     0.780       $8,000      1.000
Color           Red         0.000       Blue        0.667       Yellow      1.000

Overall Score               0.40541                 0.66038                 0.59459

Best            Blue Norushi

Figure 3.8 Swing Weight Overall Scores Formulas

Overall Scores

                Red Portalo             Blue Norushi            Yellow Standard
                Attribute   Attribute   Attribute   Attribute   Attribute   Attribute
Attribute       Value       Score       Value       Score       Value       Score
Life span       12          1.000       9           0.500       6           0.000
Price           $17,000     0.000       $10,000     0.780       $8,000      1.000
Color           Red         0.000       Blue        0.667       Yellow      1.000

Overall Score (Red Portalo)       =SUMPRODUCT($G$7:$G$9,K6:K8)
Overall Score (Blue Norushi)      =SUMPRODUCT($G$7:$G$9,N6:N8)
Overall Score (Yellow Standard)   =SUMPRODUCT($G$7:$G$9,Q6:Q8)

Best   =IF(K10=MAX(K10,N10,Q10),"Red Portalo",IF(N10=MAX(K10,N10,Q10),"Blue Norushi","Yellow Standard"))
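The additive utility calculation (overall score equals the sum of weight times attribute score) can be reproduced with a short Python sketch. The weights are the ratings 75, 100, and 10 divided by their sum of 185, and the attribute scores come from Figures 3.2 through 3.4; the variable names are hypothetical.

```python
weights = {"Life span": 75 / 185, "Price": 100 / 185, "Color": 10 / 185}

# Attribute scores for each alternative (Figures 3.2, 3.3, and 3.4).
scores = {
    "Red Portalo":     {"Life span": 1.0, "Price": 0.0,  "Color": 0.0},
    "Blue Norushi":    {"Life span": 0.5, "Price": 0.78, "Color": 0.667},
    "Yellow Standard": {"Life span": 0.0, "Price": 1.0,  "Color": 1.0},
}

# Same role as SUMPRODUCT(weights, attribute scores) in the worksheet.
overall = {
    alternative: sum(weights[attr] * attr_scores[attr] for attr in weights)
    for alternative, attr_scores in scores.items()
}
best = max(overall, key=overall.get)

for alternative, score in overall.items():
    print(f"{alternative:16s}{score:.5f}")
print("Best:", best)
```

The dictionary comprehension plays the part of the three SUMPRODUCT cells, and `max` with `key=overall.get` plays the part of the nested IF formula.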


Figure 3.9 Sensitivity Analysis

Sensitivity Analysis Data Tables

Life Span Rate (10 to 100)                        Color Rate (0 to 75)
W9 Output Formula: =J12                           Z9 Output Formula: =J12
Column Input Cell: F7                             Column Input Cell: F9

Life Span Rate   Best                             Color Rate   Best
10               Yellow Standard                  0            Blue Norushi
15               Yellow Standard                  5            Blue Norushi
20               Yellow Standard                  10           Blue Norushi      (Base Case)
25               Yellow Standard                  15           Blue Norushi
30               Yellow Standard                  20           Blue Norushi
35               Yellow Standard                  25           Blue Norushi
40               Yellow Standard                  30           Blue Norushi
45               Yellow Standard                  35           Blue Norushi
50               Yellow Standard                  40           Blue Norushi
55               Blue Norushi                     45           Blue Norushi
60               Blue Norushi                     50           Yellow Standard
65               Blue Norushi                     55           Yellow Standard
70               Blue Norushi                     60           Yellow Standard
75               Blue Norushi      (Base Case)    65           Yellow Standard
80               Blue Norushi                     70           Yellow Standard
85               Blue Norushi                     75           Yellow Standard
90               Blue Norushi
95               Blue Norushi
100              Blue Norushi
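The two data tables in Figure 3.9 can be reproduced with a short Python sketch that re-evaluates the best alternative as one rating changes while the other two stay at their base-case values. The ratings and attribute scores come from the earlier figures; the function and variable names are hypothetical.

```python
BASE_RATINGS = {"Life span": 75, "Price": 100, "Color": 10}

SCORES = {
    "Red Portalo":     {"Life span": 1.0, "Price": 0.0,  "Color": 0.0},
    "Blue Norushi":    {"Life span": 0.5, "Price": 0.78, "Color": 0.667},
    "Yellow Standard": {"Life span": 0.0, "Price": 1.0,  "Color": 1.0},
}


def best_alternative(ratings):
    """Convert ratings to swing weights and return the top-scoring alternative."""
    total = sum(ratings.values())
    overall = {
        alt: sum(ratings[attr] / total * s[attr] for attr in ratings)
        for alt, s in SCORES.items()
    }
    return max(overall, key=overall.get)


# One-way tables, like the column-input data tables in Figure 3.9.
life_table = [(rate, best_alternative({**BASE_RATINGS, "Life span": rate}))
              for rate in range(10, 101, 5)]
color_table = [(rate, best_alternative({**BASE_RATINGS, "Color": rate}))
               for rate in range(0, 76, 5)]

for rate, best in life_table:
    print(f"Life Span Rate {rate:3d}: {best}")
for rate, best in color_table:
    print(f"Color Rate {rate:3d}: {best}")
```

The choice switches from Yellow Standard to Blue Norushi between Life Span ratings of 50 and 55, and from Blue Norushi to Yellow Standard between Color ratings of 45 and 50, so the base-case choice is more robust to the Life Span rating (base 75) than the table alone might suggest for Color (base 10).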

3.3 SENSITIVITY ANALYSIS METHODS

SENSITIVITY ANALYSIS FOR MULTI-ATTRIBUTE UTILITY USING EXCEL

This paper describes several standard methods for analyzing decisions where the outcomes have multiple attributes. The example problem concerns a large company that is planning to purchase several hundred cars for use by the sales force. The company wants a car that is inexpensive, safe, and lasts a long time. Figure 1 shows data for seven cars that are being considered.


Figure 1 Attribute Data for Seven Alternatives

Alternatives

Attribute   Alta   Bulldog   Cruiser   Delta    Egret    Fleet   Garnett
Cost        $20    $18       $16       $14      $12      $10     $15
Lifetime    10     10        8         8        6        6       8
Safety      High   Medium    High      Medium   Medium   Low     Low

Cost: thousands of dollars
Lifetime: expected years
Safety: third-party rating

Other attributes might be important, e.g., comfort and prestige. The cost attribute should include operating costs, insurance, and salvage value, in addition to purchase price. It might be appropriate to combine the cost and lifetime attributes into a single attribute, e.g., cost per year. Clemen [1] suggests that a set of attributes should be complete (so that all important objectives are included), as small as possible (to facilitate analysis), not redundant (to avoid double-counting a common underlying characteristic), and decomposable (so that the decision maker can think about each attribute separately).

Dominance

An alternative can be eliminated if another alternative is better on some objectives and no worse on the others. The Garnett is more expensive than the Delta, has the same lifetime, and has a lower safety rating. So the Garnett can be eliminated from further consideration.
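The dominance check can also be written out as a short program. The following Python sketch is illustrative only; it uses the attribute data from Figure 1, and the numeric safety ranks (Low = 0, Medium = 1, High = 2) are an assumption introduced here solely to order the ratings.

```python
# Attribute data from Figure 1: (cost in $ thousands, lifetime in years, safety rating).
# SAFETY_RANK is an assumed ordinal coding used only to compare ratings.
SAFETY_RANK = {"Low": 0, "Medium": 1, "High": 2}

cars = {
    "Alta":    (20, 10, "High"),
    "Bulldog": (18, 10, "Medium"),
    "Cruiser": (16, 8, "High"),
    "Delta":   (14, 8, "Medium"),
    "Egret":   (12, 6, "Medium"),
    "Fleet":   (10, 6, "Low"),
    "Garnett": (15, 8, "Low"),
}

def dominates(a, b):
    """True if car a is no worse than car b on every attribute and better on at least one."""
    cost_a, life_a, safe_a = a
    cost_b, life_b, safe_b = b
    rank_a, rank_b = SAFETY_RANK[safe_a], SAFETY_RANK[safe_b]
    no_worse = (cost_a <= cost_b, life_a >= life_b, rank_a >= rank_b)
    better = (cost_a < cost_b, life_a > life_b, rank_a > rank_b)
    return all(no_worse) and any(better)

# A car is dominated if any other car dominates it.
dominated = {
    name
    for name, attrs in cars.items()
    if any(dominates(other, attrs) for other_name, other in cars.items() if other_name != name)
}
non_dominated = [name for name in cars if name not in dominated]

print("Dominated:", sorted(dominated))
print("Non-dominated:", non_dominated)
```

With these data only the Garnett is flagged, which matches the elimination described above.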

Monetary Equivalents Assessment

One method for comparing multi-attribute alternatives is to subjectively assign monetary values to the non-monetary attributes. For example, the decision maker may determine that each additional year of expected lifetime is worth $500, medium safety is $4,000 better than low safety, and high safety is $6,000 better than low safety. Arbitrarily using Fleet as the base case with total equivalent cost of $10,000, Figure 2 shows costs and equivalent costs, in thousands of dollars, in rows 9:11. The negative entries for Lifetime and Safety correspond to positive benefits relative to the Fleet car's base case values.

Based on this method, the Egret is chosen. Sensitivity analysis, not shown here, would involve seeing how the choice depends on subjective equivalents different from the $500 per year lifetime and the $4,000 and $6,000 safety assessments.
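As a cross-check of the equivalent-cost arithmetic, here is a minimal Python sketch. It assumes the stated equivalents ($500 per year of lifetime, $4,000 for Medium and $6,000 for High safety relative to Low) and Fleet's 6-year lifetime as the base; the variable names are illustrative and do not come from the worksheet.

```python
# Non-dominated alternatives: (cost in $ thousands, lifetime in years, safety rating).
cars = {
    "Alta":    (20, 10, "High"),
    "Bulldog": (18, 10, "Medium"),
    "Cruiser": (16, 8, "High"),
    "Delta":   (14, 8, "Medium"),
    "Egret":   (12, 6, "Medium"),
    "Fleet":   (10, 6, "Low"),
}

LIFETIME_VALUE = 0.5          # $ thousands per extra year of lifetime ($500)
SAFETY_VALUE = {"Low": 0.0, "Medium": 4.0, "High": 6.0}   # $ thousands relative to Low
BASE_LIFETIME = 6             # Fleet's lifetime, the base case

def equivalent_cost(cost, lifetime, safety):
    """Cost minus the monetary value of lifetime and safety benefits relative to Fleet."""
    return cost - LIFETIME_VALUE * (lifetime - BASE_LIFETIME) - SAFETY_VALUE[safety]

equiv = {name: equivalent_cost(*attrs) for name, attrs in cars.items()}
choice = min(equiv, key=equiv.get)   # lowest equivalent cost is preferred

for name, value in equiv.items():
    print(f"{name:8s} ${value:.0f}")
print("Choice:", choice)
```

Changing `LIFETIME_VALUE` or `SAFETY_VALUE` and rerunning is one simple way to explore the sensitivity analysis mentioned above.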

Hammond et al. [3] describe another method involving even swaps that could be used to select the best alternative.


Figure 2 Monetary Equivalents for Non-Dominated Alternatives

Non-Dominated Alternatives

Attribute         Alta   Bulldog   Cruiser   Delta    Egret    Fleet
Cost              $20    $18       $16       $14      $12      $10
Lifetime, years   10     10        8         8        6        6
Safety rating     High   Medium    High      Medium   Medium   Low

Non-Dominated Alternatives

Attribute         Alta   Bulldog   Cruiser   Delta    Egret    Fleet
Cost              $20    $18       $16       $14      $12      $10
Lifetime, $       -$2    -$2       -$1       -$1      $0       $0
Safety, $         -$6    -$4       -$6       -$4      -$4      $0

Equiv. Cost       $12    $12       $9        $9       $8       $10

Additive Utility Function

The additive multi-attribute utility function U includes individual utility functions U_i for each attribute x_i, usually scaled from 0 to 1, and weights w_i that reflect the decision maker's tradeoffs among the attributes.

\[ U(x_1, x_2, x_3) = w_1 U_1(x_1) + w_2 U_2(x_2) + w_3 U_3(x_3), \quad \text{where } w_1 + w_2 + w_3 = 1 \tag{1} \]

Weights may be specified directly, as ratios, or using a swing weight procedure. Individual utility functions are assessed using the range of attribute values for the alternatives being considered.

The individual utility values for Cost and Lifetime shown in Figure 3 are based on proportional scores, corresponding to linear utility functions. For example, each thousand dollar difference in cost is associated with a 0.1 difference in utility. The utility values for Safety are subjective judgments. For example, the decision maker thinks that a change in Safety from Low to Medium achieves only two-thirds of the satisfaction associated with a change from Low to High.
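The proportional scores can be reproduced with a few lines of Python. This is a sketch under the stated assessments (cost worst $20,000 and best $10,000; lifetime worst 6 and best 10 years; Safety utilities 0, 2/3, and 1); the function name `linear_utility` is hypothetical.

```python
# Non-dominated alternatives: (cost in $ thousands, lifetime in years, safety rating).
cars = {
    "Alta":    (20, 10, "High"),
    "Bulldog": (18, 10, "Medium"),
    "Cruiser": (16, 8, "High"),
    "Delta":   (14, 8, "Medium"),
    "Egret":   (12, 6, "Medium"),
    "Fleet":   (10, 6, "Low"),
}

# Subjective utilities for the safety ratings, as assessed in the text.
SAFETY_UTILITY = {"Low": 0.0, "Medium": 2 / 3, "High": 1.0}

def linear_utility(x, worst, best):
    """Proportional score: 0 at the worst value, 1 at the best value, linear in between."""
    return (x - worst) / (best - worst)

utilities = {
    name: (
        linear_utility(cost, worst=20, best=10),      # lower cost is better
        linear_utility(lifetime, worst=6, best=10),   # longer lifetime is better
        SAFETY_UTILITY[safety],
    )
    for name, (cost, lifetime, safety) in cars.items()
}

for name, (u_cost, u_life, u_safe) in utilities.items():
    print(f"{name:8s} {u_cost:.3f} {u_life:.3f} {u_safe:.3f}")
```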


Figure 3 Individual Utilities

Non-Dominated Alternatives

Attribute   Alta   Bulldog   Cruiser   Delta    Egret    Fleet
Cost        $20    $18       $16       $14      $12      $10
Lifetime    10     10        8         8        6        6
Safety      High   Medium    High      Medium   Medium   Low

Assess individual utility for each attribute.
Cost        U($20,000)=0, U($10,000)=1, linear
Lifetime    U(6 years)=0, U(10 years)=1, linear
Safety      U(Low)=0, U(Medium)=2/3, U(High)=1

Non-Dominated Alternatives

Attribute   Alta    Bulldog   Cruiser   Delta   Egret   Fleet
Cost        0.000   0.200     0.400     0.600   0.800   1.000
Lifetime    1.000   1.000     0.500     0.500   0.000   0.000
Safety      1.000   0.667     1.000     0.667   0.667   0.000

Compared to the assessments for individual utility, the assessments for tradeoffs are usually much more difficult to make. The following sections focus on assessments of tradeoff weights and sensitivity analysis.

Weight Ratio Assessment

One method for measuring trade-offs among the conflicting objectives is to assess weight ratios. For example, the decision maker may judge that cost is five times as important as lifetime, which may be interpreted to mean that the change in overall satisfaction corresponding to a change in cost from $20,000 to $10,000 is five times the change in overall satisfaction corresponding to a change in lifetime from 6 years to 10 years. Similarly, the decision maker may judge that a $10,000 decrease in cost is one and a half times as satisfying as a change from a low to a high safety rating. The assessments are shown in cells J4:J5 in Figure 4.


Figure 4 Weight Ratio Assessment and Choice

Non-Dominated Alternatives (columns A:G)

Attribute   Alta    Bulldog   Cruiser   Delta   Egret   Fleet
Cost        0.000   0.200     0.400     0.600   0.800   1.000
Lifetime    1.000   1.000     0.500     0.500   0.000   0.000
Safety      1.000   0.667     1.000     0.667   0.667   0.000

Overall     0.464   0.452     0.625     0.613   0.667   0.536

Max Value   0.667
Location    5
Choice      Egret

Choice      Egret

Assess weight ratios (columns I:J).

Weight Ratio    Input
Cost/Lifetime   5.0
Cost/Safety     1.5

Weights
Cost            0.536
Lifetime        0.107
Safety          0.357

With three attributes, the two assessed weight ratios determine two equations and the requirement that the weights sum to one determines a third equation. Using algebra, a solution for the three unknown weights is shown in cells J8:J10 in Figure 5.
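The algebra can be sketched in Python as follows. Dividing the sum-to-one condition by the cost weight gives w_cost = 1 / (1/r1 + 1/r2 + 1), where r1 is the Cost/Lifetime ratio and r2 is the Cost/Safety ratio. The function and variable names are illustrative assumptions, not part of the worksheet.

```python
# Individual utilities (cost, lifetime, safety) for the non-dominated alternatives.
UTILITIES = {
    "Alta":    (0.0, 1.0, 1.0),
    "Bulldog": (0.2, 1.0, 2 / 3),
    "Cruiser": (0.4, 0.5, 1.0),
    "Delta":   (0.6, 0.5, 2 / 3),
    "Egret":   (0.8, 0.0, 2 / 3),
    "Fleet":   (1.0, 0.0, 0.0),
}

def weights_from_ratios(cost_over_lifetime, cost_over_safety):
    """Solve w_cost/w_life = r1, w_cost/w_safety = r2, and w_cost + w_life + w_safety = 1."""
    w_cost = 1 / (1 / cost_over_lifetime + 1 / cost_over_safety + 1)
    return (w_cost, w_cost / cost_over_lifetime, w_cost / cost_over_safety)

weights = weights_from_ratios(5.0, 1.5)

# Additive utility: weighted sum of the individual utilities.
overall = {
    name: sum(w * u for w, u in zip(weights, us))
    for name, us in UTILITIES.items()
}
choice = max(overall, key=overall.get)

print("Weights:", [round(w, 3) for w in weights])
for name, value in overall.items():
    print(f"{name:8s} {value:.3f}")
print("Choice:", choice)
```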

The formula for overall utility in cell B7, with a relative reference to the attribute utilities in B3:B5 and an absolute reference to the weights in J8:J10, is copied to cells C7:G7.

The MAX worksheet function determines the maximum overall utility in B7:G7, the MATCH function determines the location of that maximum in B7:G7, and the INDEX function returns the alternative name located in B2:G2. The zero argument in the MATCH function is needed to specify that an exact match is required; the zero argument in the INDEX function is used as a placeholder and could be omitted in this application without affecting the results. Cell B13 combines these functions into a single formula.

Figure 5 Formulas for Weight Ratio Assessment and Choice

Non-Dominated Alternatives (columns A:B)

Attribute   Alta
Cost        0
Lifetime    1
Safety      1

Overall     =SUMPRODUCT(B3:B5,$J$8:$J$10)

Max Value   =MAX(B7:G7)
Location    =MATCH(B9,B7:G7,0)
Choice      =INDEX(B2:G2,0,B10)

Choice      =INDEX(B2:G2,0,MATCH(MAX(B7:G7),B7:G7,0))

Assess weight ratios (columns I:J).

Weight Ratio    Input
Cost/Lifetime   5
Cost/Safety     1.5

Weights
Cost            =1/(1/J4+1/J5+1)
Lifetime        =J8/J4
Safety          =J8/J5

After deleting cells A9:B12, the single formula is in cell B9. The arrangement shown in Figure 6 is used for the remaining analyses.


Figure 6 Weight Ratio Choice for Sensitivity Analysis

Non-Dominated Alternatives

Attribute   Alta    Bulldog   Cruiser   Delta   Egret   Fleet
Cost        0.000   0.200     0.400     0.600   0.800   1.000
Lifetime    1.000   1.000     0.500     0.500   0.000   0.000
Safety      1.000   0.667     1.000     0.667   0.667   0.000

Overall     0.464   0.452     0.625     0.613   0.667   0.536

Choice      Egret

Weight Ratio Sensitivity Analysis

The decision maker specified tradeoffs using weight ratios, so it is appropriate to see whether the choice is sensitive to changes in those assessed values. To construct a two-way data table for sensitivity analysis of the weight ratios as shown in Figures 7 and 8, enter a set of values in a row, N4:R4, and another set of values in a column, M5:M13. In the top left cell of the data table, M4, enter a formula for determining the data table's output values, =B9. (To improve the appearance of the table, cell M4 is formatted with a custom three-semicolon format so that the formula result is not displayed.) Select M4:R13. Choose Data | Table. In the Data Table dialog box, specify J4 as the Row Input Cell and J5 as the Column Input Cell. Click OK.

Figure 7 Coarse Two-Factor Sensitivity Analysis of Weight Ratios

Two-Factor Sensitivity Analysis

Cost/Safety     Cost/Lifetime Weight Ratio
Weight Ratio    3.0       4.0       5.0       6.0       7.0
1.00            Cruiser   Cruiser   Cruiser   Cruiser   Cruiser
1.25            Cruiser   Egret     Egret     Egret     Egret
1.50            Egret     Egret     Egret     Egret     Egret
1.75            Egret     Egret     Egret     Egret     Egret
2.00            Egret     Egret     Egret     Egret     Egret
2.25            Egret     Egret     Egret     Egret     Egret
2.50            Egret     Egret     Egret     Egret     Egret
2.75            Egret     Egret     Egret     Egret     Egret
3.00            Egret     Egret     Egret     Egret     Egret

Cell P7, corresponding to the original assessments, has a border. The data table is dynamic, so the macro view may be refined near the base-case assessments by specifying different input values.


Figure 8 Fine Two-Factor Sensitivity Analysis of Weight Ratios

Two-Factor Sensitivity Analysis

Cost/Safety     Cost/Lifetime Weight Ratio
Weight Ratio    4.0       4.5       5.0       5.5       6.0
1.00            Cruiser   Cruiser   Cruiser   Cruiser   Cruiser
1.10            Cruiser   Cruiser   Cruiser   Egret     Egret
1.20            Cruiser   Egret     Egret     Egret     Egret
1.30            Egret     Egret     Egret     Egret     Egret
1.40            Egret     Egret     Egret     Egret     Egret
1.50            Egret     Egret     Egret     Egret     Egret
1.60            Egret     Egret     Egret     Egret     Egret
1.70            Egret     Egret     Egret     Egret     Egret
1.80            Egret     Egret     Egret     Egret     Egret

Figure 8 shows that the Cost/Safety weight ratio must be less than 1.2 to affect the choice. If the decision maker regards 1.2 as "far away" from 1.5, then the Egret choice is appropriate. Otherwise, the decision maker should think more carefully about the original assessments before making a choice based on this analysis. The assessment of the Cost/Lifetime weight ratio is not as critical, because any value between 4 and 6 yields the same choice.
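The same two-way analysis can be approximated outside Excel by looping over both weight ratios. The Python sketch below is an illustration, not the Data Table mechanism itself; it recomputes the weights and the choice for each combination, and the helper name `choice` is hypothetical.

```python
# Individual utilities (cost, lifetime, safety) for the non-dominated alternatives.
UTILITIES = {
    "Alta":    (0.0, 1.0, 1.0),
    "Bulldog": (0.2, 1.0, 2 / 3),
    "Cruiser": (0.4, 0.5, 1.0),
    "Delta":   (0.6, 0.5, 2 / 3),
    "Egret":   (0.8, 0.0, 2 / 3),
    "Fleet":   (1.0, 0.0, 0.0),
}

def choice(cost_over_lifetime, cost_over_safety):
    """Return the alternative with the highest additive utility for the given weight ratios."""
    w_cost = 1 / (1 / cost_over_lifetime + 1 / cost_over_safety + 1)
    weights = (w_cost, w_cost / cost_over_lifetime, w_cost / cost_over_safety)
    overall = {
        name: sum(w * u for w, u in zip(weights, us))
        for name, us in UTILITIES.items()
    }
    return max(overall, key=overall.get)

row_inputs = [3.0, 4.0, 5.0, 6.0, 7.0]             # Cost/Lifetime ratios
col_inputs = [1.0 + 0.25 * k for k in range(9)]    # Cost/Safety ratios 1.00 to 3.00

# One entry per (Cost/Safety, Cost/Lifetime) pair, like the cells of a two-way table.
table = {(r2, r1): choice(r1, r2) for r2 in col_inputs for r1 in row_inputs}

print("      " + " ".join(f"{r1:8.1f}" for r1 in row_inputs))
for r2 in col_inputs:
    print(f"{r2:5.2f} " + " ".join(f"{table[(r2, r1)]:>8s}" for r1 in row_inputs))
```

Replacing `row_inputs` and `col_inputs` with a finer grid near the base case plays the same role as refining the data table's input values.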

Swing Weight Assessment

Compared to weight ratio assessment, the swing weight method requires assessments that are similar to directly assigning an overall utility to an alternative. However, the hypothetical alternatives requiring assessment in this method are constructed so that it should be easier for the decision maker to assign overall utilities to them instead of to the actual alternatives.

The swing weight method involves four steps as shown in Figure 9.

1) Develop the hypothetical alternatives. The number of hypothetical alternatives equals the number of attributes plus one. The benchmark alternative in column J is worst for all attributes. Each other hypothetical alternative, shown in columns K, L, and M, has one attribute at best and all others at worst.

2) Rank the hypothetical alternatives, as shown in row 7. This is an intermediate step that facilitates assigning overall utilities.

3) Assign overall utility scores reflecting overall satisfaction for the hypothetical alternatives. The benchmark worst case has score zero, and the first-ranked alternative has score 100. Then assign level-of-satisfaction scores to the intermediate alternatives, as shown in cells L9 and M9.


4) Sum the scores, as shown in cell N9. In the additive utility function, the weight for each attribute equals the score divided by sum of the scores. (The algebra solution, not shown here, is based on the special zero and one individual utility values of the hypothetical alternatives.) Formulas are shown in Figure 10.
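Step 4 can be illustrated with a minimal Python sketch that uses the assessed scores of 100, 20, and 70 for the Best Cost, Best Lifetime, and Best Safety hypothetical alternatives. The dictionary layout and names are assumptions made for illustration.

```python
# Overall scores assigned to the hypothetical alternatives (the benchmark worst case scores 0).
scores = {"Cost": 100, "Lifetime": 20, "Safety": 70}

total = sum(scores.values())
# Each attribute's weight is its score divided by the sum of the scores.
weights = {attribute: score / total for attribute, score in scores.items()}

# Individual utilities (cost, lifetime, safety) for the non-dominated alternatives.
UTILITIES = {
    "Alta":    (0.0, 1.0, 1.0),
    "Bulldog": (0.2, 1.0, 2 / 3),
    "Cruiser": (0.4, 0.5, 1.0),
    "Delta":   (0.6, 0.5, 2 / 3),
    "Egret":   (0.8, 0.0, 2 / 3),
    "Fleet":   (1.0, 0.0, 0.0),
}

w = (weights["Cost"], weights["Lifetime"], weights["Safety"])
overall = {name: sum(wi * ui for wi, ui in zip(w, us)) for name, us in UTILITIES.items()}
choice = max(overall, key=overall.get)

print("Total score:", total)
print("Weights:", {k: round(v, 3) for k, v in weights.items()})
print("Choice:", choice)
```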

Figure 9 Hypothetical Alternatives and Weights for Swing Weight Assessment

Hypothetical Alternatives (columns I:N)

Attribute       Worst   Best Cost   Best Lifetime   Best Safety
Cost            $20     $10         $20             $20
Lifetime        6       6           10              6
Safety          Low     Low         Low             High

Rank            4       1           3               2
                                                                  Total
Overall Score   0       100         20              70            190

Weight          0.000   0.526       0.105           0.368

Decision Maker's Inputs Underlined

Figure 10 Formulas for Swing Weight Assessment

Hypothetical Alternatives (columns I:N)

Attribute       Worst      Best Cost   Best Lifetime   Best Safety
Cost            20         10          20              20
Lifetime        6          6           10              6
Safety          Low        Low         Low             High

Rank            4          1           3               2
                                                                     Total
Overall Score   0          100         20              70            =SUM(J9:M9)

Weight          =J9/$N$9   =K9/$N$9    =L9/$N$9        =M9/$N$9

Decision Maker's Inputs Underlined

The individual utility values are in a column, and the weights are in a row. The SUMPRODUCT function requires that the two arrays for its arguments have the same orientation, so the TRANSPOSE function converts the weights into a column format, as shown in Figure 11. The function in B7 must be array-entered; after typing the function, hold down Control and Shift while you press Enter.


Figure 11 Formulas for Swing Weight Choice

Non-Dominated Alternatives (columns A:B)

Attribute   Alta
Cost        0
Lifetime    1
Safety      1

Overall     =SUMPRODUCT(B3:B5,TRANSPOSE($K$11:$M$11))

Choice      =INDEX(B2:G2,0,MATCH(MAX(B7:G7),B7:G7,0))

Figure 12 Swing Weight Choice

Non-Dominated Alternatives

Attribute   Alta    Bulldog   Cruiser   Delta   Egret   Fleet
Cost        0.000   0.200     0.400     0.600   0.800   1.000
Lifetime    1.000   1.000     0.500     0.500   0.000   0.000
Safety      1.000   0.667     1.000     0.667   0.667   0.000

Overall     0.474   0.456     0.632     0.614   0.667   0.526

Choice      Egret

Swing Weight Sensitivity Analysis

The decision maker specified tradeoffs using overall scores for the hypothetical alternatives, so it is appropriate to see whether the choice is sensitive to changes in those assessed values. Figure 13 shows the sensitivity for the Best-Lifetime score that was specified as 20 relative to the worst-case benchmark and the highest-ranked Best-Cost hypothetical alternative. The Best-Lifetime alternative is still ranked 3 as long as its score is between 0 and 70.

To improve the appearance of the sensitivity analysis tables in Figure 13, the output formula cells, R13 and T13, have a three-semicolon custom format.


Figure 13 Sensitivity Analysis of Swing Weight Best-Lifetime Score

Single-Factor Sensitivity Analysis (columns P:U)

Best Lifetime Overall Score
Base case Score is 20
Rank 3 as long as Score is between 0 and 70

Output Formula in cell R13: =B9
Data Table Column Input Cell: M9

                                 Detail
Best Lifetime                    Best Lifetime
Overall Score    Choice          Overall Score   Choice
0                Egret           30              Egret
5                Egret           31              Egret
10               Egret           32              Egret
15               Egret           33              Egret
20 (Base Case)   Egret           34              Cruiser
25               Egret           35              Cruiser
30               Egret
35               Cruiser
40               Cruiser
45               Cruiser
50               Cruiser
55               Cruiser
60               Cruiser
65               Cruiser
70               Cruiser

The results in the left table of Figure 13, cells Q13:R28, indicate that the Best-Lifetime score must be greater than 30 to affect the choice. A refined data table in cells T13:U19 shows that the score must be greater than 33 before the choice changes from Egret to Cruiser. If the decision maker regards 33 as "far away" from 20, then the Egret choice is appropriate.
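A one-way sensitivity analysis of a swing weight score can be sketched in Python by recomputing the weights and the choice for each trial score. This is an illustration under the base-case scores of 100, 20, and 70; the helper name `choice` is hypothetical, and ties (which Excel's MATCH resolves by position) are not examined here.

```python
# Individual utilities (cost, lifetime, safety) for the non-dominated alternatives.
UTILITIES = {
    "Alta":    (0.0, 1.0, 1.0),
    "Bulldog": (0.2, 1.0, 2 / 3),
    "Cruiser": (0.4, 0.5, 1.0),
    "Delta":   (0.6, 0.5, 2 / 3),
    "Egret":   (0.8, 0.0, 2 / 3),
    "Fleet":   (1.0, 0.0, 0.0),
}

def choice(score_cost, score_lifetime, score_safety):
    """Convert swing weight scores to weights and return the highest-utility alternative."""
    total = score_cost + score_lifetime + score_safety
    weights = (score_cost / total, score_lifetime / total, score_safety / total)
    overall = {
        name: sum(w * u for w, u in zip(weights, us))
        for name, us in UTILITIES.items()
    }
    return max(overall, key=overall.get)

# Coarse table: Best-Lifetime score from 0 to 70 in steps of 5, other scores at base case.
coarse = {score: choice(100, score, 70) for score in range(0, 71, 5)}
# Detailed table: Best-Lifetime score from 30 to 35 in steps of 1.
detail = {score: choice(100, score, 70) for score in range(30, 36)}
# Smallest integer score at which the choice is no longer Egret.
switch = next(score for score in range(0, 71) if choice(100, score, 70) != "Egret")

for score, name in coarse.items():
    print(f"{score:3d} {name}")
print("Choice first changes at score", switch)
```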

Figure 14 shows a similar sensitivity analysis for the Best-Safety score. The assessed score of 70 must rise above 89 before the choice changes from Egret to Cruiser, or fall to 30 or below before the choice changes to Fleet.


Figure 14 Sensitivity Analysis of Swing Weight Best-Safety Score

Single-Factor Sensitivity Analysis (columns W:AB)

Best Safety Overall Score
Base case Score is 70
Rank 2 as long as Score is between 20 and 100

Output Formula in cell Y13 and cell AB13: =B9
Data Table Column Input Cell: N9

                                 Detail
Best Safety                      Best Safety
Overall Score    Choice          Overall Score   Choice
20               Fleet           85              Egret
25               Fleet           86              Egret
30               Fleet           87              Egret
35               Egret           88              Egret
40               Egret           89              Egret
45               Egret           90              Cruiser
50               Egret
55               Egret
60               Egret
65               Egret
70 (Base Case)   Egret
75               Egret
80               Egret
85               Egret
90               Cruiser
95               Cruiser
100              Cruiser

To construct a two-way data table for sensitivity analysis of the swing weight assessments as shown in Figure 15, enter a set of values in a row, R4:V4, and another set of values in a column, Q5:Q13. In the top left cell of the data table, Q4, enter a formula for determining the data table's output values, =B9. (To improve the appearance of the table, cell Q4 is formatted with a custom three-semicolon format so that the formula result is not displayed.) Select Q4:V13. Choose Data | Table. In the Data Table dialog box, specify L9 as the Row Input Cell and M9 as the Column Input Cell. Click OK.


Figure 15 Sensitivity Analysis of Both Swing Weight Scores

Two-Way Sensitivity Analysis

Best Safety      Best Lifetime Overall Score
Overall Score    10       15       20       25        30
50               Egret    Egret    Egret    Egret     Egret
55               Egret    Egret    Egret    Egret     Egret
60               Egret    Egret    Egret    Egret     Egret
65               Egret    Egret    Egret    Egret     Egret
70               Egret    Egret    Egret    Egret     Egret
75               Egret    Egret    Egret    Egret     Cruiser
80               Egret    Egret    Egret    Egret     Cruiser
85               Egret    Egret    Egret    Cruiser   Cruiser
90               Egret    Egret    Cruiser  Cruiser   Cruiser

The table shows that the choice changes from Egret to Cruiser if the combination of assessments is changed from 20 & 70 to 30 & 75. This table could be refined to examine the exact threshold values.

Direct Weight Assessment and Sensitivity Analysis

In some situations the decision maker may be able to assign tradeoff weights directly. Figure 16 shows results using the formulas shown in Figure 17.

Figure 16 Direct Weight Assessment

Non-Dominated Alternatives (columns A:G)

Attribute   Alta    Bulldog   Cruiser   Delta   Egret   Fleet
Cost        0.000   0.200     0.400     0.600   0.800   1.000
Lifetime    1.000   1.000     0.500     0.500   0.000   0.000
Safety      1.000   0.667     1.000     0.667   0.667   0.000

Overall     0.500   0.467     0.650     0.617   0.667   0.500

Choice      Egret

Weights (columns I:J)
Cost        0.500
Lifetime    0.100
Safety      0.400

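The formula in cell B9 includes an IF function to verify that each weight is at least 0 and that the sum of the weights does not exceed one. Because the Safety weight in cell J4 is computed as one minus the other two weights, these checks ensure that each weight is between 0 and 1, inclusive. If the checks fail, the formula returns empty text. This formula must be array-entered; after typing the function, hold down Control and Shift while you press Enter.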


Figure 17 Formulas for Direct Weight Assessment

Non-Dominated Alternatives (columns A:B)

Attribute   Alta
Cost        0
Lifetime    1
Safety      1

Overall     =SUMPRODUCT(B3:B5,$J$2:$J$4)

Choice      =IF(AND(SUM(J2:J4)<=1,J2:J4>=0),INDEX(B2:G2,0,MATCH(MAX(B7:G7),B7:G7,0)),"")

Weights (columns I:J)
Cost        0.5
Lifetime    0.1
Safety      =1-J3-J2

Figure 18 shows a two-way table for sensitivity analysis of the weights. Cell R5 corresponds to the approximate base case assessments in the weight ratio and swing weight methods.

Figure 18 Sensitivity Analysis of Direct Weight Assessment

Two-Factor Sensitivity Analysis (columns L:V)

Lifetime   Cost Weight
Weight     0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9
0.1        Alta      Cruiser   Cruiser   Cruiser   Egret     Egret     Fleet     Fleet     Fleet
0.2        Alta      Alta      Cruiser   Cruiser   Cruiser   Egret     Fleet     Fleet
0.3        Alta      Alta      Alta      Cruiser   Delta     Fleet     Fleet
0.4        Alta      Alta      Alta      Bulldog   Bulldog   Fleet
0.5        Alta      Alta      Alta      Bulldog   Bulldog
0.6        Alta      Alta      Bulldog   Bulldog
0.7        Alta      Bulldog   Bulldog
0.8        Alta      Bulldog
0.9        Bulldog

Figure 19 is a more detailed view. The choice formula in cell B9 is modified by placing the INDEX function inside the LEFT function so that only the first letter of the alternative's name is returned.


Figure 19 Detailed Sensitivity Analysis of Direct Weight Assessment

Two-Factor Sensitivity Analysis (columns L:AH)
A = Alta, B = Bulldog, C = Cruiser, D = Delta, E = Egret, F = Fleet

Lifetime   Cost Weight
Weight     0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
0.00       A    C    C    C    C    C    C    C    C    C    E    E    E    E    E    E    F    F    F    F    F
0.05       A    A    C    C    C    C    C    C    C    C    E    E    E    E    E    F    F    F    F    F
0.10       A    A    A    C    C    C    C    C    C    C    E    E    E    E    F    F    F    F    F
0.15       A    A    A    A    C    C    C    C    C    C    E    E    E    E    F    F    F    F
0.20       A    A    A    A    A    A    C    C    C    C    C    E    E    F    F    F    F
0.25       A    A    A    A    A    A    A    C    C    C    D    D    F    F    F    F
0.30       A    A    A    A    A    A    A    A    C    D    D    D    F    F    F
0.35       A    A    A    A    A    A    A    A    A    D    D    D    F    F
0.40       A    A    A    A    A    A    A    A    B    B    B    D    F
0.45       A    A    A    A    A    A    A    B    B    B    B    B
0.50       A    A    A    A    A    A    A    B    B    B    B
0.55       A    A    A    A    A    A    B    B    B    B
0.60       A    A    A    A    A    A    B    B    B
0.65       A    A    A    A    A    B    B    B
0.70       A    A    A    A    B    B    B
0.75       A    A    A    A    B    B
0.80       A    A    A    B    B
0.85       A    A    B    B
0.90       A    A    B
0.95       A    B
1.00       A

The results in Figure 19 show that all alternatives in this data set are candidates depending on the tradeoffs specified by the decision maker. In general, moving left to right, if more weight is given to cost, a less expensive alternative is chosen.
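A comparable grid can be generated with a short Python sketch. To sidestep floating-point trouble when checking that the weights are valid, the sketch below indexes the weights in integer twentieths (steps of 0.05); that indexing, and the helper name `choice`, are assumptions made for illustration. Exact ties may be resolved differently than in Excel.

```python
# Individual utilities (cost, lifetime, safety) for the non-dominated alternatives.
UTILITIES = {
    "Alta":    (0.0, 1.0, 1.0),
    "Bulldog": (0.2, 1.0, 2 / 3),
    "Cruiser": (0.4, 0.5, 1.0),
    "Delta":   (0.6, 0.5, 2 / 3),
    "Egret":   (0.8, 0.0, 2 / 3),
    "Fleet":   (1.0, 0.0, 0.0),
}

STEPS = 20  # weights are multiples of 1/20 = 0.05

def choice(i_cost, j_lifetime):
    """Choice for cost weight i/20 and lifetime weight j/20; safety gets the remainder.

    Returns empty text when any weight would be negative, like the worksheet formula.
    """
    k_safety = STEPS - i_cost - j_lifetime
    if min(i_cost, j_lifetime, k_safety) < 0:
        return ""
    weights = (i_cost / STEPS, j_lifetime / STEPS, k_safety / STEPS)
    overall = {
        name: sum(w * u for w, u in zip(weights, us))
        for name, us in UTILITIES.items()
    }
    return max(overall, key=overall.get)

# First letter of the choice for every valid (lifetime, cost) weight combination.
grid = {
    (j, i): choice(i, j)[:1]
    for j in range(STEPS + 1)
    for i in range(STEPS + 1 - j)
}

for j in range(STEPS + 1):
    row = " ".join(grid[(j, i)] for i in range(STEPS + 1 - j))
    print(f"{j / STEPS:4.2f}  {row}")
```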

Summary

This paper considered three methods for assessing tradeoffs in the additive utility function. For each method sensitivity analysis is useful for gaining insight into which tradeoff assumptions are critical. Kirkwood [2] includes Excel VBA methods for sensitivity analysis of individual utility functions in addition to weights.

References

[1] Clemen, R.T. Making Hard Decisions: An Introduction to Decision Analysis, 2nd Edition. Duxbury Press, 1996.

[2] Kirkwood, C.W. Strategic Decision Making: Multiobjective Decision Analysis with Spreadsheets. Duxbury Press, 1997.

[3] Hammond, J.S., Keeney, R.L., and Raiffa, H. Smart Choices: A Practical Guide to Making Better Decisions. Harvard Business School Press, 1999.

Part 2 Monte Carlo Simulation

Part 2 discusses Monte Carlo simulation, which is useful for incorporating uncertainty into spreadsheet what-if models.

Separate chapters describe simulation using standard Excel features and simulation using the RiskSim simulation add-in for Excel.

Additional topics in this part include multi-period evaluation models, inventory decisions, and queuing models.



Chapter 4 Introduction to Monte Carlo Simulation

4.1 INTRODUCTION

Figure 4.1 Conceptual Simulation as a Sample of Tree Endpoints
[Tree diagram labels: Unit Price; Units Sold (~400 values); Unit Variable Cost (~500 values); Fixed Costs (3 values); Net Cash Flow, $ (~600,000 values)]

Figure 4.2 Probability Distributions for Sampling Tree Endpoints

Figure 4.3 Conceptual Simulation as Influence Chart with Repeated What-Ifs
[Diagram labels for Figures 4.2 and 4.3: Unit Price ($29); Units Sold; Unit Variable Cost; Fixed Costs; Net Cash Flow, $; distribution types Constant, Discrete, Normal, Uniform]

Chapter 5 Uncertain Quantities

5.1 DISCRETE UNCERTAIN QUANTITIES

Discrete UQ: a few, distinct values

Assign probability mass to each value (probability mass function).

Contrast discrete UQs with continuous UQs. Continuous UQs have an infinite number of values or so many distinct values that it is difficult to assign probability to each value. Instead, for a continuous UQ we assign probability only to ranges of values.

5.2 CONTINUOUS UNCERTAIN QUANTITIES

Probability Density Functions and Cumulative Probability for Continuous Uncertain Quantities

The total area under a probability density function equals one.

A portion of the area under a density function is a probability.

The height of a density function is not a probability.

The simplest probability density function is the uniform density function.

Case A: Uniform Density

The number of units of a new product that will be sold is an uncertain quantity.

What is the minimum quantity? “1000 units”

What is the maximum quantity? “5000 units”

Are any values in the range between 1000 and 5000 more likely than others? “No”

Represent the uncertainty using a uniform density function.


Technical point: For a continuous UQ, P(X=x) = 0.

For a continuous UQ, probability is non-zero only for a range of values.

For convenience in computation and assessment, we may use a continuous UQ to approximate a discrete UQ, and vice versa.

In Figure 5.1, the range of values is 5000 – 1000 = 4000, which is the base (width) of the rectangular area under the uniform density function. The area of a rectangle is Base * Height, and the area under the uniform density function in Figure 5.1 must equal 1. So, Height = 1 / Base. Here the Base is 5000 – 1000 = 4000 units. Therefore, Height = 1/4000 = 0.00025.

Figure 5.1 Uniform Density Function
[Chart: horizontal axis Unit Sales, x (0 to 6000); vertical axis Probability Density, f(x) (0 to 0.00025)]

Figure 5.2 Cumulative Probability for Uniform Density
[Chart: horizontal axis Unit Sales, x (0 to 6000); vertical axis Cumulative Probability, P(X<=x) (0.00 to 1.00)]

Both probability mass functions (for discrete UQs) and probability density functions (for continuous UQs) have corresponding cumulative probability functions.

It is important to understand the relationship between a density function and its cumulative probability function.

Cumulative probability can be expressed in four ways:

P(X<=x)   probability that UQ X is less than or equal to x (inclusive left-tail)

P(X<x)    probability that UQ X is strictly less than x (exclusive left-tail)

P(X>=x)   probability that UQ X is greater than or equal to x (inclusive right-tail)

P(X>x)    probability that UQ X is strictly greater than x (exclusive right-tail)

For continuous UQs the cumulative probability is the same for inclusive and exclusive.

P(X<=x) is the most common type.


Figure 5.2 is the cumulative probability function corresponding to the uniform density function shown in Figure 5.1.

What is the probability that sales will be between 3,500 and 4,000 units?

P(3500<=X<=4000) = 0.125

P(3500<=X<=4000) = P(X<=4000) – P(X<=3500) = 0.750 – 0.625 = 0.125

Mathematical observation: The uniform density function is a constant; the corresponding cumulative function (the integral of the constant function) is linear.
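The uniform density and its linear cumulative function can be checked numerically. The Python sketch below assumes the Case A inputs (minimum 1000, maximum 5000); the function names are illustrative.

```python
def uniform_pdf(x, lo, hi):
    """Height of the uniform density: constant 1/(hi - lo) inside the range, 0 outside."""
    return 1 / (hi - lo) if lo <= x <= hi else 0.0

def uniform_cdf(x, lo, hi):
    """P(X <= x) for a uniform UQ: linear from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

LO, HI = 1000, 5000

height = uniform_pdf(2000, LO, HI)
# Probability of a range is the difference of two cumulative probabilities.
prob = uniform_cdf(4000, LO, HI) - uniform_cdf(3500, LO, HI)

print("Height:", height)
print("P(3500<=X<=4000):", prob)
```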

Case B: Ramp Density

The number of units of a new product that will be sold is an uncertain quantity.



Are any values in the range between 1000 and 5000 more likely than others?

“Yes, values close to 5000 are much more likely than values close to 1000.”

Represent the uncertainty using a ramp density function.

The area of a triangle is Base * Height / 2, and the area under the ramp density function in Figure 5.3 must equal 1. So, Height = 2 / Base. Here, the Base is 5000 – 1000 = 4000 units. Therefore, Height = 2 / 4000 = 0.0005.


Figure 5.3 Ramp Density Function
[Chart: horizontal axis Unit Sales, x (0 to 6000); vertical axis Probability Density, f(x) (0 to 0.0005)]

Figure 5.4 Cumulative Probability for Ramp Density
[Chart: horizontal axis Unit Sales, x (0 to 6000); vertical axis Cumulative Probability, P(X<=x) (0.00 to 1.00)]


An important observation is that flatter portions of a cumulative probability function correspond to ranges with low probability. Steeper portions of a cumulative probability function correspond to ranges with high probability.


P(3500<=X<=4000) = 0.171875

P(3500<=X<=4000) = P(X<=4000) – P(X<=3500) = 0.562500 – 0.390625 = 0.171875

The ramp density may not be appropriate for describing uncertainty in many situations, but it is an important building block for the extremely useful triangular density function.

Mathematical observation: The ramp density function is linear; the corresponding cumulative function (the integral of the linear function) is quadratic.
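For the ramp density rising from zero at the minimum to its peak at the maximum, the density is f(x) = 2(x - min)/(max - min)^2 and the cumulative function is ((x - min)/(max - min))^2. The Python sketch below applies these formulas to the Case B inputs; the function names are illustrative.

```python
def ramp_pdf(x, lo, hi):
    """Ramp density: 0 at lo, rising linearly to 2/(hi - lo) at hi."""
    if x < lo or x > hi:
        return 0.0
    return 2 * (x - lo) / (hi - lo) ** 2

def ramp_cdf(x, lo, hi):
    """P(X <= x) for the ramp density: the integral of the linear density, a quadratic."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return ((x - lo) / (hi - lo)) ** 2

LO, HI = 1000, 5000

peak = ramp_pdf(HI, LO, HI)
prob = ramp_cdf(4000, LO, HI) - ramp_cdf(3500, LO, HI)

print("Peak height:", peak)
print("P(X<=4000):", ramp_cdf(4000, LO, HI))
print("P(X<=3500):", ramp_cdf(3500, LO, HI))
print("P(3500<=X<=4000):", prob)
```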

Case C: Triangular Density

The number of units of a new product that will be sold is an uncertain quantity.



Are any values in the range between 1000 and 5000 more likely than others?

“Yes, values close to 4000 are more likely.”

Represent the uncertainty using a triangular density function.

The area of a triangle is Base * Height / 2, and the area under the triangular density function in Figure 5.5 must equal 1. So, Height = 2 / Base. Here, the Base is 5000 – 1000 = 4000 units. Thus, Height = 2 / 4000 = 0.0005.


Figure 5.5 Triangular Density Function
[Chart: horizontal axis Unit Sales, x (0 to 6000); vertical axis Probability Density, f(x) (0 to 0.0005)]

Figure 5.6 Cumulative Probability for Triangular Density
[Chart: horizontal axis Unit Sales, x (0 to 6000); vertical axis Cumulative Probability, P(X<=x) (0.00 to 1.00)]


Again, an important observation is that flatter portions of a cumulative probability function correspond to ranges with low probability (the range close to 1000 and the range close to 5000 in Figure 5.6). Steeper portions of a cumulative probability function correspond to ranges with high probability (the range close to 4000).


P(3500<=X<=4000) = 0.229167

P(3500<=X<=4000) = P(X<=4000) – P(X<=3500) = 0.750000 – 0.520833 = 0.229167

The triangular density function is extremely useful for describing uncertainty in many situations. It requires only three inputs: minimum, mode (most likely value), and maximum.

Mathematical observation: The triangular density function has two linear segments, i.e., piecewise linear; the corresponding cumulative function (the integral of each linear function) is two quadratic segments, i.e., piecewise quadratic.
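The piecewise quadratic cumulative function can be written directly from the three inputs. The Python sketch below uses the standard triangular distribution formulas with the Case C inputs (minimum 1000, mode 4000, maximum 5000); the function name is illustrative.

```python
def triangular_cdf(x, lo, mode, hi):
    """P(X <= x) for a triangular density with the given minimum, mode, and maximum."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    if x <= mode:
        # Rising segment: area of the small triangle to the left of x.
        return (x - lo) ** 2 / ((hi - lo) * (mode - lo))
    # Falling segment: one minus the area of the small triangle to the right of x.
    return 1 - (hi - x) ** 2 / ((hi - lo) * (hi - mode))

LO, MODE, HI = 1000, 4000, 5000

p_4000 = triangular_cdf(4000, LO, MODE, HI)
p_3500 = triangular_cdf(3500, LO, MODE, HI)

print("P(X<=4000):", round(p_4000, 6))
print("P(X<=3500):", round(p_3500, 6))
print("P(3500<=X<=4000):", round(p_4000 - p_3500, 6))
```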

Chapter 6 Simulation Without Add-Ins

6.1 SIMULATION USING EXCEL FUNCTIONS

Figure 6.1 Display

Software Decision Analysis (columns A:G)

                                   RAND()
Unit Price           $29
Units Sold           661           0.3502    Normal     Mean = 700, StDev = 100
Unit Variable Cost   $10.92        0.9832    Uniform    Min = $6, Max = $11
Fixed Costs          $12,000       0.7364    Discrete

Net Cash Flow        -$47

Discrete distribution for Fixed Costs
Value      Probability   Cumulative
$10,000    0.25          0.25
$12,000    0.50          0.75
$15,000    0.25          1.00

Figure 6.2 Formulas

Software Decision Analysis (columns A:G)

                                                                   RAND()
Unit Price           29
Units Sold           =INT(NORMINV(C4,700,100))                     =RAND()   Normal     Mean = 700, StDev = 100
Unit Variable Cost   =6+5*C5                                       =RAND()   Uniform    Min = $6, Max = $11
Fixed Costs          =IF(C6<0.25,10000,IF(C6<0.75,12000,15000))    =RAND()   Discrete

Net Cash Flow        =B4*(B3-B5)-B6

Discrete distribution for Fixed Costs
Value      Probability   Cumulative
10000      0.25          0.25
12000      0.5           0.75
15000      0.25          1
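The worksheet formulas translate almost line for line into Python. The sketch below is an illustration rather than a reproduction of the workbook: `statistics.NormalDist.inv_cdf` stands in for NORMINV, `math.floor` for INT, and `random.random` for RAND. The seed, the trial count of 1000, and the helper names are assumptions.

```python
import math
import random
from statistics import NormalDist, mean

UNIT_PRICE = 29

def net_cash_flow(u_units, u_cost, u_fixed):
    """One what-if trial from three uniform random numbers in (0, 1)."""
    # Normal, mean 700 and standard deviation 100, truncated to whole units like INT.
    units_sold = math.floor(NormalDist(700, 100).inv_cdf(u_units))
    # Uniform between $6 and $11.
    unit_variable_cost = 6 + 5 * u_cost
    # Discrete: $10,000 (0.25), $12,000 (0.50), $15,000 (0.25), via cumulative probabilities.
    if u_fixed < 0.25:
        fixed_costs = 10000
    elif u_fixed < 0.75:
        fixed_costs = 12000
    else:
        fixed_costs = 15000
    return units_sold * (UNIT_PRICE - unit_variable_cost) - fixed_costs

def rand_open(rng):
    """Uniform draw strictly inside (0, 1), because inv_cdf rejects exactly 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

rng = random.Random(12345)   # fixed seed (an arbitrary choice) for repeatable results
results = [
    net_cash_flow(rand_open(rng), rand_open(rng), rand_open(rng))
    for _ in range(1000)
]

print("Trial with the displayed random numbers:", round(net_cash_flow(0.3502, 0.9832, 0.7364), 2))
print("Mean net cash flow over 1000 trials:", round(mean(results)))
```

The single trial uses the four-decimal random numbers displayed in Figure 6.1, so its result can differ slightly from the worksheet's value, which is based on the unrounded random numbers.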



Chapter 7 Monte Carlo Simulation Using RiskSim

7.1 USING RISKSIM FUNCTIONS

RiskSim is a Monte Carlo Simulation add-in for Microsoft Excel (Excel 97 and later versions) for Windows and Macintosh.

RiskSim provides random number generator functions as inputs for your model, automates Monte Carlo simulation, and creates charts. Your spreadsheet model may include various uncontrollable uncertainties as input assumptions (e.g., demand for a new product, uncertain variable cost of production, competitor reaction), and you can use simulation to determine the uncertainty associated with the model's output (e.g., annual profit). RiskSim automates the simulation by trying hundreds of what-ifs consistent with your assessment of the uncertainties.

To use RiskSim, you

(1) create a spreadsheet model

(2) optionally use SensIt to identify critical inputs

(3) enter one of RiskSim's eleven random number generator functions in each input cell of your model

(4) choose Tools | Risk Simulation from Excel's menu

(5) specify the model output cell and the number of what-if trials

(6) interpret RiskSim's histogram and cumulative distribution charts.

RiskSim facilitates Monte Carlo simulation by providing:

Eleven random number generator functions

Ability to set the seed for random number generation

Automatic repeated sampling for simulation

Frequency distribution of simulation results

Histogram and cumulative distribution charts


7.2 USING RISKSIM FUNCTIONS

RiskSim adds eleven random number generator functions to Excel. You can use these functions as inputs to your model by typing in a worksheet cell or by using the Function Wizard. From the Insert menu choose Function, or click the Function Wizard button. RiskSim's functions are listed in a User Defined category. The eleven functions are:

RANDBINOMIAL(trials,probability_s)

RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12)

RANDCUMULATIVE(value_cumulative_table)

RANDDISCRETE(value_discrete_table)

RANDEXPONENTIAL(lambda)

RANDINTEGER(bottom,top)

RANDNORMAL(mean,standard_dev)

RANDPOISSON(mean)

RANDSAMPLE(population)

RANDTRIANGULAR(minimum,most_likely,maximum)

RANDUNIFORM(minimum,maximum)

RiskSim's RAND... functions include extensive error checking of arguments. After verifying that the functions are working properly, you may want to substitute RiskSim's FAST... functions which have minimal error checking and therefore run faster. From the Edit menu choose Replace; in the Replace dialog box, type =RAND in the "Find What" edit box, type =FAST in the "Replace with" edit box, and click the Replace All button.

7.3 UPDATING LINKS TO RISKSIM FUNCTIONS

When you insert a RiskSim random number generator function in a worksheet cell, the function is linked to the disk location of the RiskSim xla file you are currently using. During the current Excel session, the formula bar shows only the name of the RiskSim function. But when you save and close the workbook, Excel saves the complete path to the disk location of the RiskSim function. For example, after closing and reopening the workbook, the formula bar might show C:\MyAddIns\risk231p.xla\RandNormal(100, 10). This is standard behavior for Excel user defined functions like the ones contained in the RiskSim xla file.

When you open the workbook, Excel looks for the RiskSim xla file using the saved path. If Excel cannot find the RiskSim xla file at the saved path location (e.g., if you deleted the RiskSim xla file from the C:\MyAddIns folder or if you opened the workbook on another computer where the RiskSim xla file is not located at the same path), Excel displays a dialog box like the one shown below.

Figure 7.3 Excel 2003 Warning To Update Links

If you see this dialog box or a similar warning when you open an Excel file, choose the "Don't Update" option. The workbook will be opened, but any cell containing a reference to a RiskSim function will display the #NAME? or similar error code.

To update the links after the workbook is open, be sure that a RiskSim xla file is open. Then choose Edit | Links to see the dialog box shown below. (In this example the workbook originally used functions from the RiskSim xla file located at C:\middleton\risksim\risksim.xla.)

Figure 7.4 Excel 2003 Edit Links Dialog Box


To update the links, click the Change Source button. A file browser window will open, where you can navigate to the RiskSim xla file that is open. After you select the file using the file browser, click OK. Back in the Edit Links dialog box, click the Close button.

In Excel 2003 the Edit Links dialog box has a Startup Prompt button. To avoid possible problems when Excel tries to automatically update links while a file is being opened, we recommend the default "Let users choose to display the alert or not."

Figure 7.5 Excel 2003 Startup Prompt Dialog Box

7.4 MONTE CARLO SIMULATION

After specifying random number generator functions as inputs to your model, from the Tools menu choose Risk Simulation | One Output.

Figure 7.6 RiskSim Dialog Box


Optionally, select the "Output Label Cell" edit box, and point or type a reference to a cell containing the name of the model output (for example, a cell whose contents is the text label "Net Profit").

Select the "Output Formula Cell" edit box, and point to a single cell on your worksheet or type a cell reference. The output cell of your model must contain a formula that depends, usually indirectly, on the model inputs determined by the random number generator functions.

Select the "Random Number Seed" edit box, and type an integer in the range 1 through 2,147,483,647. (If you want to change the seed without performing a simulation, enter zero in the "Number Of Trials" edit box.)

Select the "Number Of Trials" edit box, and type an integer value (for example, 100 or 500). This value, sometimes called the sample size or number of iterations, specifies the number of times the worksheet will be recalculated to determine output values of your model.

7.5 RANDOM NUMBER SEED

The "Random Number Seed" edit box on the RiskSim dialog box allows you to set the seed for RiskSim's random number generator functions. The seed must be an integer in the range 1 through 2,147,483,647. RiskSim's random number generator functions depend on RiskSim's own uniform random number function that is completely independent of Excel's built-in RAND().

Random numbers generated by the computer are actually pseudo-random. The numbers appear to be random, and they pass various statistical tests for randomness. But they are actually calculated by an algorithm where each random number depends on the previous random number. Such an algorithm generates a repeatable sequence. The seed specifies where the algorithm starts in the sequence.

A Monte Carlo simulation model usually has uncontrollable inputs (uncertain quantities using random number generator functions), controllable inputs (decision variables that have fixed values for a particular set of simulation iterations), and an output variable (a performance measure or operating characteristic of the system).

For example, a simple queuing system model may have an uncertain arrival pattern, a controllable number of servers, and total cost (waiting time plus server cost) as output. To evaluate a different number of servers, you would specify the same seed before generating the uncertain arrivals. Then the variation in total cost should depend on the different number of servers, not on the particular sequence of random numbers that generates the arrivals.
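The idea of reusing one seed across decision alternatives can be sketched outside Excel. The Python fragment below is only an illustration under stated assumptions: the toy cost model, the function name total_cost, the rates, and the server cost of 10 are all hypothetical, and Python's random.Random stands in for RiskSim's seeded generator.

```python
import random

def total_cost(servers, seed, customers=200):
    """Toy queuing cost model (hypothetical; not RiskSim's calculation)."""
    rng = random.Random(seed)            # same seed -> same random stream
    clock = 0.0
    free_at = [0.0] * servers            # time at which each server is next free
    waiting = 0.0
    for _ in range(customers):
        clock += rng.expovariate(3.0)    # time between arrivals, mean 1/3 minute
        service = rng.expovariate(0.5)   # service time, mean 2 minutes
        k = min(range(servers), key=lambda i: free_at[i])
        start = max(clock, free_at[k])   # wait if the earliest server is busy
        waiting += start - clock
        free_at[k] = start + service
    return waiting + 10.0 * servers      # waiting cost plus assumed server cost

# Both alternatives see exactly the same arrivals and service times.
cost_with_6 = total_cost(6, seed=1)
cost_with_7 = total_cost(7, seed=1)
```

Because the draws are made in the same order for every customer, any difference between cost_with_6 and cost_with_7 comes from the number of servers rather than from a different random sequence.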


7.6 ONE-OUTPUT EXAMPLE

In this example the decision maker has described his subjective uncertainty using normal, triangular, and discrete probability distributions.

Figure 7.7 One-Output Example Model Display

Row  A                           B        Notes (columns C:H)
1    Software Decision Analysis
2
3    Unit Price                  $29      Price is controllable and constant.
4    Units Sold                  739      Normal: Mean = 700, StDev = 100
5    Unit Variable Cost          $8.05    Triangular: Min = $6, Mode = $8, Max = $11
6    Fixed Costs                 $12,000  Discrete: Value, Probability
7                                         $10,000  0.25
8    Net Cash Flow               $3,485   $12,000  0.50
9                                         $15,000  0.25

Figure 7.8 One-Output Example Model Formulas

Row  A                           B
1    Software Decision Analysis
2
3    Unit Price                  $29
4    Units Sold                  =INT(RANDNORMAL(700,100))
5    Unit Variable Cost          =RANDTRIANGULAR(6,8,11)
6    Fixed Costs                 =RANDDISCRETE(E7:F9)
7
8    Net Cash Flow               =B4*(B3-B5)-B6
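For readers who want to see the same model outside a spreadsheet, here is a minimal Python sketch of the one-output example. It is not RiskSim: Python's random module replaces the RiskSim functions, the names net_cash_flow and simulate are hypothetical, and the results will not match RiskSim's numbers for any seed.

```python
import math
import random

def net_cash_flow(rng):
    """One what-if trial of the example model, B4*(B3-B5)-B6."""
    unit_price = 29
    units_sold = math.floor(rng.gauss(700, 100))       # like INT(RANDNORMAL(700,100))
    unit_variable_cost = rng.triangular(6, 11, 8)      # arguments are low, high, mode
    fixed_costs = rng.choices([10000, 12000, 15000],
                              weights=[0.25, 0.50, 0.25])[0]
    return units_sold * (unit_price - unit_variable_cost) - fixed_costs

def simulate(trials, seed):
    """Repeat the trial with a seeded generator and collect the outputs."""
    rng = random.Random(seed)
    return [net_cash_flow(rng) for _ in range(trials)]

outputs = simulate(1000, seed=1)
mean_output = sum(outputs) / len(outputs)
```

Each call to net_cash_flow plays the role of one worksheet recalculation, and the list of outputs corresponds to the column of sampled values that RiskSim records.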

Figure 7.9 RiskSim Dialog Box for One-Output Example


7.7 RISKSIM OUTPUT FOR ONE-OUTPUT EXAMPLE

When you click the Simulate button, RiskSim creates a new worksheet in your Excel workbook named "RiskSim Summary 1." A summary of your inputs and the output is shown in cells L1:R9 with the accompanying histogram and cumulative distribution charts.


Figure 7.10 RiskSim Summary Output for One-Output Example

Row  Inputs (columns L:O)          Output (columns Q:R)
1    RiskSim 2.31 Pro              Mean             $2,335
2    Date          (current date)  St. Dev.         $2,800
3    Time          (current time)  Mean St. Error   $89
4    Workbook      risksamp.xls    Minimum          -$6,288
5    Worksheet     Simulation      First Quartile   $523
6    Output Cell   $B$8            Median           $2,470
7    Output Label  Net Cash Flow   Third Quartile   $4,157
8    Seed          1               Maximum          $12,838
9    Trials        1000            Skewness         -0.1133

[Chart: RiskSim 2.31 Pro - Histogram. Horizontal axis: Net Cash Flow, -$8,000 to $14,000. Vertical axis: Frequency, 0 to 180.]

[Chart: RiskSim 2.31 Pro - Cumulative Chart. Horizontal axis: Net Cash Flow, -$8,000 to $14,000. Vertical axis: Cumulative Probability, 0.0 to 1.0.]

The histogram is based on the frequency distribution in columns I:J. The cumulative distribution is based on the sorted output values in column C and the cumulative probabilities in column D.


Figure 7.11 RiskSim Numerical Output for One-Output Example

Row  A      B              C        D           F        G           I            J
1    Trial  Net Cash Flow  Sorted   Cumulative  Percent  Percentile  Upper Limit  Frequency
2    1      $1,594         -$6,288  0.0005      0%       -$6,288     -$8,000      0
3    2      $1,593         -$6,239  0.0015      5%       -$2,324     -$7,000      0
4    3      $1,533         -$5,635  0.0025      10%      -$1,465     -$6,000      2
5    4      $7,480         -$5,213  0.0035      15%      -$699       -$5,000      2
6    5      $5,968         -$4,831  0.0045      20%      $62         -$4,000      11
7    6      $1,862         -$4,601  0.0055      25%      $523        -$3,000      18
8    7      -$1,677        -$4,588  0.0065      30%      $1,009      -$2,000      34
9    8      $2,727         -$4,487  0.0075      35%      $1,336      -$1,000      54
10   9      $6,167         -$4,420  0.0085      40%      $1,625      $0           77
11   10     $4,740         -$4,336  0.0095      45%      $2,035      $1,000       101
12   11     $1,783         -$4,298  0.0105      50%      $2,470      $2,000       146
13   12     $904           -$4,285  0.0115      55%      $2,897      $3,000       126
14   13     $1,518         -$4,243  0.0125      60%      $3,216      $4,000       155
15   14     $1,596         -$4,116  0.0135      65%      $3,544      $5,000       110
16   15     $1,536         -$4,113  0.0145      70%      $3,805      $6,000       73
17   16     -$701          -$3,954  0.0155      75%      $4,157      $7,000       52
18   17     -$414          -$3,951  0.0165      80%      $4,615      $8,000       21
19   18     $783           -$3,906  0.0175      85%      $5,168      $9,000       8
20   19     $5,087         -$3,849  0.0185      90%      $5,777      $10,000      9
21   20     $2,804         -$3,793  0.0195      95%      $6,680      $11,000      0
22   21     $1,869         -$3,757  0.0205      100%     $12,838     $12,000      0
23   22     $1,402         -$3,719  0.0215                           $13,000      1
24   23     $2,120         -$3,608  0.0225                           $14,000      0
25   24     $7,783         -$3,591  0.0235                                        0
26   25     $704           -$3,548  0.0245
27   26     $5,471         -$3,485  0.0255
28   27     $4,743         -$3,403  0.0265

The cumulative probabilities start at 1/(2*N), where N is the number of trials, and increase by 1/N. The rationale is that the lowest ranked output value of the sampled values is an estimate of the population's values in the range from 0 to 1/N, and the lowest ranked value is associated with the median of that range.
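The rule for the cumulative probabilities can be written as (i - 0.5)/N for the value ranked i. A short Python sketch follows; the function name is hypothetical.

```python
def cumulative_probabilities(n):
    """Return 1/(2N), 3/(2N), ..., (2N-1)/(2N) for ranks 1 through N."""
    return [(rank - 0.5) / n for rank in range(1, n + 1)]

probabilities = cumulative_probabilities(1000)
```

With 1000 trials the first values are 0.0005, 0.0015, and 0.0025, which agree with the Cumulative column of the numerical output.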

Column B contains the original sampled output values.

Columns F:G show percentiles based on Excel's PERCENTILE worksheet function. Refer to Excel's online help for the interpolation method used by the PERCENTILE function.

The summary measures in columns Q:R are also based on Excel worksheet functions: AVERAGE, STDEV, QUARTILE, and SKEW.
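As a rough cross-check of these summary measures outside Excel, the following Python sketch computes comparable quantities. Several points are assumptions rather than statements about RiskSim: the function name summarize is hypothetical, the "inclusive" quantile method is used because it interpolates between sorted values in the manner of Excel's QUARTILE, the skewness formula is the sample formula documented for Excel's SKEW, and the mean standard error is taken to be the standard deviation divided by the square root of the number of trials (consistent with $2,800 and 1000 trials giving about $89).

```python
import math
import statistics

def summarize(outputs):
    """Summary measures for a list of simulation outputs."""
    n = len(outputs)
    mean = statistics.mean(outputs)
    stdev = statistics.stdev(outputs)    # sample standard deviation
    q1, median, q3 = statistics.quantiles(outputs, n=4, method="inclusive")
    skewness = n / ((n - 1) * (n - 2)) * sum(
        ((x - mean) / stdev) ** 3 for x in outputs)
    return {
        "mean": mean,
        "stdev": stdev,
        "mean_st_error": stdev / math.sqrt(n),   # assumed definition
        "first_quartile": q1,
        "median": median,
        "third_quartile": q3,
        "skewness": skewness,
    }

summary = summarize([1, 2, 3, 4, 5])
```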

7.8 CUSTOMIZING RISKSIM CHARTS

If the labels on the horizontal axis are numbers with many digits, some of the labels may wrap around so that some of the digits display below the others. One way to remedy this anomaly is to widen the chart (click just inside the outer border of the chart so that eight chart handles are shown and then drag the middle chart handle on the left or right to widen the chart). Another way is to select the horizontal axis (click between the labels on the horizontal axis so that "Value (X) axis" appears in the name box in the upper left of Excel) and change to a smaller font size using the Font Size drop-down edit box on the Formatting tool bar.

The histogram chart is a combination chart using a column chart type for the vertical bars and an XY (Scatter) chart type for the horizontal axis. The two chart types align properly as long as the horizontal axis retains the same minimum and maximum values.

For example, if you want more spacing between the dollar labels on the horizontal axis, select the horizontal axis (so that "Value (X) axis" appears in the name box in the upper left of Excel), choose Format | Selected Axis | Scale, and change the "Major unit" from 2000 to 4000. Do not change the Minimum = –8000 or the Maximum = 14000. The histogram appears as shown below.

Figure 7.12 Original Histogram With Modified Horizontal Axis Major Unit

[Chart: RiskSim 2.31 Pro - Histogram. Horizontal axis: Net Cash Flow, labeled every $4,000 from -$8,000 to $12,000. Vertical axis: Frequency, 0 to 160.]

The cumulative chart is a standard XY (Scatter) chart type, so you can change the major unit as described above, but you can also change the minimum and maximum without affecting the integrity of the chart.

Another way to obtain more spacing on the horizontal axis of the histogram or cumulative chart is to use a custom format. For example, if you want to show values in thousands instead of the original units, select the horizontal axis (click between the labels on the horizontal axis so that "Value (X) axis" appears in the name box in the upper left of Excel), choose Format | Selected Axis | Number | Custom, and enter a comma at the end of the current format shown in the "Type:" edit box. After changing the original format "$#,##0" to "$#,##0," and modifying the horizontal axis title, the cumulative chart appears as shown below.

Figure 7.13 Original Cumulative Chart With Horizontal Axis Custom Format

[Chart: RiskSim 2.31 Pro - Cumulative Chart. Horizontal axis: Net Cash Flow, in thousands of dollars, -$8 to $14. Vertical axis: Cumulative Probability, 0.0 to 1.0.]

7.9 RANDOM NUMBER GENERATOR FUNCTIONS

RandBinomial

Returns a random value from a binomial distribution. The binomial distribution can model a process with a fixed number of trials where the outcome of each trial is a success or failure, the trials are independent, and the probability of success is constant. RANDBINOMIAL counts the total number of successes for the specified number of trials. If n is the number of trials, the possible values for RANDBINOMIAL are the non-negative integers 0,1,...,n.

RANDBINOMIAL Syntax: RANDBINOMIAL(trials,probability_s)

Trials (often denoted n) is the number of independent trials.

Probability_s (often denoted p) is the probability of success on each trial.

RANDBINOMIAL Remarks

Returns #N/A if there are too few or too many arguments.

Returns #NAME? if an argument is text and the name is undefined.


Returns #NUM! if trials is non-integer or less than one, or probability_s is less than zero or more than one.

Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or contains text.

RANDBINOMIAL Example

A salesperson makes ten unsolicited calls per day, where the probability of making a sale on each call is 30 percent. The uncertain total number of sales in one day is =RANDBINOMIAL(10,0.3)

RANDBINOMIAL Related Function

FASTBINOMIAL: Same as RANDBINOMIAL without any error checking of the arguments.

CRITBINOM(trials,probability_s,RAND()): Excel's inverse of the cumulative binomial, or CRITBINOM(trials,probability_s,RANDUNIFORM(0,1)) to use the RiskSim Seed feature.

RandBiVarNormal

Returns two random values from a bivariate normal distribution with a specified correlation.

To use this random number generator function, select two adjacent cells on the worksheet. Type =RANDBIVARNORMAL followed by numerical values for the five arguments or references to cells containing the values, separated by commas, enclosed in starting and ending parentheses. After typing the ending parenthesis, do not press Enter. Instead, hold down the Control and Shift keys while you press Enter, thus "array entering" the function.

Syntax:

RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12)

Returns #REF! if the array function is not entered into two adjacent cells.

Returns #NUM! if a standard deviation is negative or the correlation is outside the range between -1 and +1.

Returns #VALUE! if an argument is not numeric.

Example: Select two adjacent cells, type

=RANDBIVARNORMAL(100,10,50,5,0.5)

Hold down Control and Shift while you press Enter.


RandCumulative

Returns a random value from a piecewise-linear cumulative distribution. This function can model a continuous-valued uncertain quantity, X, by specifying points on its cumulative distribution. Each point is specified by a possible value, x, and a corresponding left-tail cumulative probability, P(X<=x). Random values are based on linear interpolation between the specified points.

RANDCUMULATIVE Syntax: RANDCUMULATIVE(value_cumulative_table)

Value_cumulative_table must be a reference, or the defined name of a reference, for a two-column range, with values in the left column and corresponding cumulative probabilities in the right column.

RANDCUMULATIVE Remarks


Returns #NAME? if the argument is text and the name is undefined.

Returns #NUM! if the first (top) cumulative probability is not zero, if the last (bottom) cumulative probability is not one, or if the values or cumulative probabilities are not in ascending order.

Returns #REF! if the number of columns in the table reference is not two.

Returns #VALUE! if the argument is not a reference, if the argument is a defined name but not for a reference, or if any cell of the table contains text or is blank.

RANDCUMULATIVE Example

A corporate planner thinks that minimum possible market demand is 1000 units, median is 5000, and maximum possible is 9000. Also, there is a ten percent chance that demand will be less than 4000 and a ten percent chance it will exceed 7000. The values, x, and cumulative probabilities, P(X<=x), are entered into spreadsheet cells A1:B5.

Figure 7.14 RandCumulative Example Spreadsheet Data

The function is entered into another cell: =RANDCUMULATIVE(A1:B5)

RANDCUMULATIVE Related Function


FASTCUMULATIVE: Same as RANDCUMULATIVE without any error checking of the arguments.
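The sampling idea behind a piecewise-linear cumulative distribution can be sketched in Python. This is an illustration rather than RiskSim's code: the function name sample_cumulative is hypothetical, and the table holds the five points implied by the planner's assessments in the example (cumulative probabilities 0, 0.1, 0.5, 0.9, and 1).

```python
import bisect

# (value, cumulative probability) pairs from the market demand example
demand_table = [(1000, 0.0), (4000, 0.1), (5000, 0.5), (7000, 0.9), (9000, 1.0)]

def sample_cumulative(table, r):
    """Map a uniform value r in [0, 1] to a value by linear interpolation."""
    values = [value for value, _ in table]
    cums = [cum for _, cum in table]
    j = bisect.bisect_left(cums, r)      # first point with cumulative >= r
    if j == 0:
        return values[0]
    x0, x1 = values[j - 1], values[j]
    p0, p1 = cums[j - 1], cums[j]
    return x0 + (x1 - x0) * (r - p0) / (p1 - p0)

median_demand = sample_cumulative(demand_table, 0.5)
```

Feeding the function a stream of uniform random values produces demand values whose cumulative distribution follows the specified points.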

Figure 7.15 RandCumulative Example Probability Density Function

[Chart: horizontal axis Market Demand, x, in units, 0 to 10000; vertical axis Probability Density, f(x), 0 to 0.0005.]

Figure 7.16 RandCumulative Example Cumulative Probability Function


[Chart: horizontal axis 0 to 10000; vertical axis Cumulative Probability, P(X<=x), 0 to 1.]

RandDiscrete

Returns a random value from a discrete probability distribution. This function can model a discrete-valued uncertain quantity, X, by specifying its probability mass function. The function is specified by each possible discrete value, x, and its corresponding probability, P(X=x).

RANDDISCRETE Syntax: RANDDISCRETE(value_discrete_table)

Value_discrete_table must be a reference, or the defined name of a reference, for a two-column range, with values in the left column and corresponding probability mass in the right column.

RANDDISCRETE Remarks



Returns #NUM! if a probability is negative or if the probabilities do not sum to one.

Returns #REF! if the number of columns in the table reference is not two.

Returns #VALUE! if the argument is not a reference, if the argument is a defined name but not for a reference, or if any cell of the table contains text or is blank.

RANDDISCRETE Example

A corporate planner thinks that uncertain market demand, X, can be approximated by three possible values and their associated probabilities: P(X=3000) = 0.3, P(X=4000) = 0.6, and P(X=5000) = 0.1. The values and probabilities are entered into spreadsheet cells A1:B3.

Figure 7.17 RandDiscrete Example Spreadsheet Data

The function is entered into another cell: =RANDDISCRETE(A1:B3)

RANDDISCRETE Related Function

FASTDISCRETE: Same as RANDDISCRETE without any error checking of the arguments.


Figure 7.18 RandDiscrete Example Probability Mass Function


[Chart: horizontal axis 0 to 7000; vertical axis 0 to 0.7.]

Figure 7.19 RandDiscrete Example Cumulative Probability Function


[Chart: horizontal axis 0 to 7000; vertical axis Cumulative Probability, P(X<=x), 0 to 1.]

RandExponential

Returns a random value from an exponential distribution. This function can model the uncertain time interval between successive arrivals at a queuing system or the uncertain time required to serve a customer.

RANDEXPONENTIAL Syntax: RANDEXPONENTIAL(lambda)

Lambda is the mean number of occurrences per unit of time.


RANDEXPONENTIAL Remarks



Returns #NUM! if lambda is negative or zero.

Returns #VALUE! if the argument is a defined name of a cell and the cell is blank or contains text.

RANDEXPONENTIAL Examples

Cars arrive at a toll plaza with a mean rate of 3 cars per minute. The uncertain time between successive arrivals, measured in minutes, is =RANDEXPONENTIAL(3). The average value returned by repeated recalculation of RANDEXPONENTIAL(3) is 0.333.

A bank teller requires an average of two minutes to serve a customer. The uncertain customer service time, measured in minutes, is =RANDEXPONENTIAL(0.5). The average value returned by repeated recalculation of RANDEXPONENTIAL(0.5) is 2.

RANDEXPONENTIAL Related Functions

FASTEXPONENTIAL: Same as RANDEXPONENTIAL without any error checking of the arguments.

−LN(RAND())/lambda: Excel's inverse of the exponential, or −LN(RANDUNIFORM(0,1))/lambda to use the RiskSim Seed feature.

RANDPOISSON: Counts number of occurrences for a Poisson process.

RandInteger

Returns a uniformly distributed random integer between two integers you specify.

RANDINTEGER Syntax: RANDINTEGER(bottom,top)

Bottom is the smallest integer RANDINTEGER will return.

Top is the largest integer RANDINTEGER will return.

RANDINTEGER Remarks



Returns #NUM! if top is less than or equal to bottom.

Returns #VALUE! if bottom or top is not an integer or if an argument is a defined name of a cell and the cell is blank or contains text.


RANDINTEGER Example

The number of orders a particular customer will place next year is between 7 and 11, with no number more likely than the others. The uncertain number of orders is =RANDINTEGER(7,11).

RANDINTEGER Related Function

FASTINTEGER: Same as RANDINTEGER without any error checking of the arguments.

RANDBETWEEN(bottom,top): Excel’s function for uniformly distributed integers, without RiskSim’s capability of setting the seed.

RandNormal

Returns a random value from a normal distribution. This function can model a variety of phenomena where the values follow the familiar bell-shaped curve, and it has wide application in statistical quality control and statistical sampling.

RANDNORMAL Syntax: RANDNORMAL(mean,standard_dev)

Mean is the arithmetic mean of the normal distribution.

Standard_dev is the standard deviation of the normal distribution.

RANDNORMAL Remarks



Returns #NUM! if standard_dev is negative.


RANDNORMAL Example

The total market for a product is approximately normally distributed with mean 60,000 units and standard deviation 5,000 units. The uncertain total market is =RANDNORMAL(60000,5000).

RANDNORMAL Related Function

FASTNORMAL: Same as RANDNORMAL without any error checking of the arguments.

NORMINV(RAND(),mean,standard_dev): Excel's inverse of the normal, or NORMINV(RANDUNIFORM(0,1),mean,standard_dev) to use the RiskSim Seed feature.


RandSample

Returns a random sample without replacement from a population.

To use this random number generator function, select a number of cells equal to the sample size, either in a single column or in a single row. Type =RANDSAMPLE followed by a reference to the cells containing the population values, enclosed in parentheses. After typing the ending parenthesis, do not press Enter. Instead, hold down the Control and Shift keys while you press Enter, thus "array entering" the function.

Syntax: RANDSAMPLE(population)

The population argument is a reference to a range of values in a single column.

Returns #N/A if the population range is not part of a single column.

Returns #REF! if the function is not entered into two adjacent cells.

Example: Type population values into cells A1:A5. For a sample of size 3, select cells B1:B3, and type =RANDSAMPLE(A1:A5) but don't press Enter. Hold down Control and Shift while you press Enter.

RandPoisson

Returns a random value from a Poisson distribution. This function can model the uncertain number of occurrences during a specified time interval, for example, the number of arrivals at a service facility during an hour. The possible values of RANDPOISSON are the non-negative integers, 0, 1, 2, ... .

RANDPOISSON Syntax: RANDPOISSON(mean)

Mean is the mean number of occurrences per unit of time.

RANDPOISSON Remarks



Returns #NUM! if mean is negative or zero.

Returns #VALUE! if mean is a defined name of a cell and the cell is blank or contains text.

RANDPOISSON Examples

Cars arrive at a toll plaza with a mean rate of 3 cars per minute. The uncertain number of arrivals in a minute is =RANDPOISSON(3). The average value returned by repeated recalculation of RANDPOISSON(3) is 3.


A bank teller requires an average of two minutes to serve a customer. The uncertain number of customers served in a minute is =RANDPOISSON(0.5). The average value returned by repeated recalculation of RANDPOISSON(0.5) is 0.5.

RANDPOISSON Related Functions

FASTPOISSON: Same as RANDPOISSON without any error checking of the arguments.

RANDEXPONENTIAL: Describes time between occurrences for a Poisson process.

RandTriangular

Returns a random value from a triangular probability density function. This function can model an uncertain quantity where the most likely value (mode) has the largest probability of occurrence, the minimum and maximum possible values have essentially zero probability of occurrence, and the probability density function is linear between the minimum and the mode and between the mode and the maximum. This function can also model a ramp density function where the minimum equals the mode or the mode equals the maximum.

RANDTRIANGULAR Syntax: RANDTRIANGULAR(minimum,most_likely,maximum)

Minimum is the smallest value RANDTRIANGULAR will return.

Most_likely is the most likely value RANDTRIANGULAR will return.

Maximum is the largest value RANDTRIANGULAR will return.

RANDTRIANGULAR Remarks



Returns #NUM! if minimum is greater than or equal to maximum, if most_likely is less than minimum, or if most_likely is greater than maximum.


RANDTRIANGULAR Example

The minimum time required to complete a particular task that is part of a large project is 4 hours, the most likely time required is 6 hours, and the maximum time required is 10 hours.

The function returning the uncertain time required for the task is entered into a cell: =RANDTRIANGULAR(4,6,10).


RANDTRIANGULAR Related Function

FASTTRIANGULAR: Same as RANDTRIANGULAR without any error checking of arguments.

Figure 7.20 RandTriangular Example Probability Density Function

[Chart: horizontal axis Task Time, x, in hours, 0 to 10; vertical axis Probability Density, f(x), 0 to 0.6.]

Figure 7.21 RandTriangular Example Cumulative Probability Function

[Chart: horizontal axis Task Time, x, in hours, 0 to 10; vertical axis Cumulative Probability, P(X<=x), 0 to 1.]

RandUniform


Returns a uniformly distributed random value between two values you specify. As a special case, RANDUNIFORM(0,1) plays the same role as Excel's built-in RAND() function, but it draws from RiskSim's own generator and therefore responds to the RiskSim seed.

RANDUNIFORM Syntax: RANDUNIFORM(minimum,maximum)

Minimum is the smallest value RANDUNIFORM will return.

Maximum is the largest value RANDUNIFORM will return.

RANDUNIFORM Remarks



Returns #NUM! if minimum is greater than or equal to maximum.


RANDUNIFORM Example

A corporate planner thinks that the company's product will garner between 10% and 15% of the total market, with all possible percentages equally likely in the specified range. The uncertain market proportion is =RANDUNIFORM(0.10,0.15).

RANDUNIFORM Related Function

FASTUNIFORM: Same as RANDUNIFORM without any error checking of the arguments.

7.10 RISKSIM TECHNICAL DETAILS

RiskSim's random number generator functions are based on a uniformly distributed random number function called RandSeed which is not directly accessible by the user. RandSeed returns a random value x in the range 0<x<=1. Internally, decimal values for RandSeed are calculated by dividing a uniformly distributed random integer by 2,147,483,647, which is RandSeed's period. Random integers in the range 1 through 2,147,483,647 are generated using the well-documented Park-Miller algorithm, where each random integer depends on the previous random integer.

When RiskSim starts, the initial integer seed depends on the system clock. Unlike Excel's RAND() function, you can use RiskSim at any time to specify an integer seed in the range 1 through 2,147,483,647, which is used as the previous random integer for the sequence of random numbers generated by the RiskSim functions.
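A generator of the Park-Miller type can be sketched in a few lines of Python. The multiplier 16807 is an assumption (it is the value commonly associated with the Park-Miller "minimal standard" generator; the text does not state which multiplier RiskSim uses), the function names are hypothetical, and the way RiskSim maps integers onto its stated range is not reproduced here.

```python
MODULUS = 2**31 - 1      # 2,147,483,647
MULTIPLIER = 16807       # assumed multiplier; not given in the text

def next_state(state):
    """Each random integer depends only on the previous random integer."""
    return (MULTIPLIER * state) % MODULUS

def uniform_stream(seed, count):
    """Return count uniform values generated from an integer seed."""
    values = []
    state = seed
    for _ in range(count):
        state = next_state(state)
        values.append(state / MODULUS)
    return values

first_run = uniform_stream(1, 5)
second_run = uniform_stream(1, 5)    # same seed, so the same sequence
```

Specifying the seed fixes the starting point, which is why two runs with the same seed produce identical sequences.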


In the Risk Simulation dialog box, the "Random number seed" edit box changes the seed only for the RiskSim functions; it does not have any effect on Excel's built-in RAND() function.

Each of RiskSim's random number generator functions uses RandSeed as a building block.

RANDBINOMIAL(trials,probability_s) uses RandSeed as the cumulative probability in Excel's built-in CRITBINOM function.

RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12) uses two values of RandNormal to obtain correlated normal values.
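One standard way to turn two independent normal values into a correlated pair is sketched below in Python. The text does not give RiskSim's formula, so this construction, the function name, and the use of Python's random module are all assumptions.

```python
import math
import random

def bivariate_normal(mean1, stdev1, mean2, stdev2, correl12, rng):
    """Return a correlated pair built from two independent standard normals."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mean1 + stdev1 * z1
    # Mix z1 and z2 so that the second value has the requested correlation.
    x2 = mean2 + stdev2 * (correl12 * z1 + math.sqrt(1.0 - correl12**2) * z2)
    return x1, x2

rng = random.Random(1)
pair = bivariate_normal(100, 10, 50, 5, 0.5, rng)
```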

RANDCUMULATIVE(value_cumulative_table) uses the value of RandSeed, R, searches to find the adjacent cumulative probabilities that bracket R, and interpolates on the linear segment of the cumulative distribution to find the corresponding value.

RANDDISCRETE(value_discrete_table) compares RandSeed with summed probabilities of the input table until the sum exceeds the RandSeed value, and then returns the previous value from the input table.
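The comparison with summed probabilities can be illustrated in Python. This is a sketch, not RiskSim's code: the function name is hypothetical, the table is the market demand example from the RandDiscrete section, and the final return statement is an added guard against floating-point round-off.

```python
# (value, probability) pairs from the market demand example
demand_table = [(3000, 0.3), (4000, 0.6), (5000, 0.1)]

def sample_discrete(table, r):
    """Return the value whose running probability sum first exceeds r."""
    running = 0.0
    for value, probability in table:
        running += probability
        if running > r:
            return value
    return table[-1][0]      # guard: r at or above the rounded total

demand = sample_discrete(demand_table, 0.5)
```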

RANDEXPONENTIAL(lambda) uses the value of RandSeed, R, as follows. If the exponential density function is f(t) = lambda*EXP(-lambda*t), the cumulative is P(T<=t) = 1 - EXP(-lambda*t). Associating R with P(T<=t), the inverse cumulative is t = -LN(1-R)/lambda. Since R and 1-R are both uniformly distributed between 0 and 1, RiskSim uses -LN(R)/lambda for the returned value.
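The inverse-cumulative formula above translates directly into Python. The function name is hypothetical, and r stands for a uniform value in the range 0<r<=1.

```python
import math

def sample_exponential(lam, r):
    """Return -LN(r)/lambda for a uniform value r in (0, 1]."""
    return -math.log(r) / lam

# With r = EXP(-1), the value equals the mean, 1/lambda.
service_time = sample_exponential(0.5, math.exp(-1))
```

For lambda = 0.5 the result is approximately 2, the mean service time in the bank teller example.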

RANDINTEGER(bottom,top) returns bottom + INT(RandSeed*(top-bottom+1)).

RANDNORMAL(mean,standard_dev) uses two RandSeed values in the well-documented Box-Muller method.
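The basic Box-Muller transformation can be sketched as follows. The text does not say which variant RiskSim uses, so the cosine form and the function name here are assumptions; r1 and r2 stand for two uniform values in the range 0<r<=1.

```python
import math

def box_muller(mean, standard_dev, r1, r2):
    """Convert two uniform values into one normally distributed value."""
    radius = math.sqrt(-2.0 * math.log(r1))
    z = radius * math.cos(2.0 * math.pi * r2)    # standard normal value
    return mean + standard_dev * z

# r1 = EXP(-0.5) gives radius 1 and r2 = 1 gives cosine 1, so z is 1.
total_market = box_muller(60000, 5000, math.exp(-0.5), 1.0)
```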

RANDPOISSON(mean) compares RandSeed with cumulative probabilities of Excel's built-in POISSON function until the probability exceeds the RandSeed value, and then returns the previous value.
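A comparable search through Poisson cumulative probabilities is sketched below in Python. It computes the probabilities directly instead of calling a worksheet function; the function name and the loop guard are assumptions added for the illustration.

```python
import math

def sample_poisson(mean, r):
    """Return the smallest count whose cumulative probability reaches r."""
    count = 0
    term = math.exp(-mean)       # P(X = 0)
    cumulative = term
    while cumulative < r and term > 0.0:
        count += 1
        term *= mean / count     # P(X = count) from P(X = count - 1)
        cumulative += term
    return count

arrivals = sample_poisson(3, 0.5)
```

For a mean of 3 the cumulative probabilities are about 0.05, 0.20, 0.42, and 0.65 for counts 0 through 3, so a uniform value of 0.5 maps to 3 arrivals.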

RANDSAMPLE(population) uses RandSeed for each of the cells that were selected when the function was array-entered, avoiding population values that have already been selected, thus providing sampling without replacement.

RANDTRIANGULAR(minimum,most_likely,maximum) uses RandSeed once. The triangular density function has two linear segments, so the cumulative distribution has two quadratic segments. The returned value is determined by interpolation on the appropriate quadratic segment.
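Because each segment of the cumulative distribution is quadratic, it can be inverted in closed form. The Python sketch below shows that inversion; it is not necessarily how RiskSim performs the calculation, and the function name is hypothetical.

```python
import math

def sample_triangular(minimum, most_likely, maximum, r):
    """Invert the triangular cumulative distribution at a uniform value r."""
    span = maximum - minimum
    cut = (most_likely - minimum) / span     # cumulative probability at the mode
    if r <= cut:
        # Left quadratic segment, between the minimum and the mode.
        return minimum + math.sqrt(r * span * (most_likely - minimum))
    # Right quadratic segment, between the mode and the maximum.
    return maximum - math.sqrt((1.0 - r) * span * (maximum - most_likely))

median_task_time = sample_triangular(4, 6, 10, 0.5)
```

For the task time example the median is 10 minus the square root of 12, or about 6.54 hours, which is above the mode of 6 because the density has a longer right tail.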

RANDUNIFORM(minimum,maximum) returns minimum + RandSeed*(maximum-minimum). RANDUNIFORM(0,1) is equivalent to Excel's built-in RAND() function.


RiskSim includes a FAST... version of each of the nine functions, e.g., FASTBINOMIAL, FASTCUMULATIVE, etc. The FAST... functions are identical to the RAND... functions except there is no error checking of arguments.

7.11 MODELING UNCERTAIN RELATIONSHIPS

Base Model, Four Inputs

Price is fixed. The three uncontrollable inputs are independent.

Figure 7.22 Four Inputs Influence Chart

[Influence chart nodes: Price, Fixed Costs, Units Sold, Unit Variable Cost, Net Cash Flow.]

Figure 7.23 Four Inputs Display

Row  A                      B
1    Controllable Input
2    Price                  $29
3    Uncontrollable Inputs
4    Fixed Costs            $12,000
5    Units Sold             700
6    Unit Variable Cost     $8
7    Output Variable
8    Net Cash Flow          $2,700


Figure 7.24 Four Inputs Formulas

Row  A                      B
1    Controllable Input
2    Price                  29
3    Uncontrollable Inputs
4    Fixed Costs            12000
5    Units Sold             700
6    Unit Variable Cost     8
7    Output Variable
8    Net Cash Flow          =(B2-B6)*B5-B4

Three Inputs

Price is variable. Units sold depends on price. The two cost inputs are independent.

Figure 7.25 Three Inputs Influence Chart

[Influence chart nodes: Price, Fixed Costs, Units Sold, Unit Variable Cost, Net Cash Flow.]

Figure 7.26 Three Inputs Display

Row  A                      B        D          E
1    Controllable Input              Price      Units Sold
2    Price                  $29      $29        700
3    Uncontrollable Inputs           $39        550
4    Fixed Costs            $12,000  $49        400
5    Unit Variable Cost     $8       $59        250
6    Intermediate Variable
7    Units Sold             700      Slope      -15
8    Output Variable                 Intercept  1135
9    Net Cash Flow          $2,700


Figure 7.27 Three Inputs Formulas

Row  A                      B               D          E
1    Controllable Input                     Price      Units Sold
2    Price                  29              29         700
3    Uncontrollable Inputs                  39         550
4    Fixed Costs            12000           49         400
5    Unit Variable Cost     8               59         250
6    Intermediate Variable
7    Units Sold             =E8+E7*B2       Slope      =SLOPE(E2:E5,D2:D5)
8    Output Variable                        Intercept  =INTERCEPT(E2:E5,D2:D5)
9    Net Cash Flow          =(B2-B5)*B7-B4
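The slope and intercept in this model come from an ordinary least-squares line through the four price and units-sold points. A Python sketch of the same calculation follows; the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept, as SLOPE and INTERCEPT compute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

prices = [29, 39, 49, 59]
units = [700, 550, 400, 250]
slope, intercept = fit_line(prices, units)

price = 29
units_sold = intercept + slope * price               # like =E8+E7*B2
net_cash_flow = (price - 8) * units_sold - 12000     # like =(B2-B5)*B7-B4
```

The four points lie exactly on a line, so the fit returns a slope of -15 and an intercept of 1135, and a price of $29 gives 700 units sold and a net cash flow of $2,700.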

Two Inputs

Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.

Figure 7.28 Two Inputs Influence Chart

[Influence chart nodes: Price, Fixed Costs, Units Sold, Unit Variable Cost, Net Cash Flow.]


Figure 7.29 Two Inputs Display

Row  A                      B        D            E
1    Controllable Input              Price        Units Sold
2    Price                  $29      $29          700
3    Uncontrollable Inputs           $39          550
4    Fixed Costs            $12,000  $49          400
5    Intermediate Variable           $59          250
6    Unit Variable Cost     $8.00
7    Units Sold             700      Slope        -15
8    Output Variable                 Intercept    1135
9    Net Cash Flow          $2,700
10
11                                   Fixed Costs  Unit Variable Cost
12                                   $10,000      $11
13                                   $12,000      $8
14                                   $15,000      $6
15
16                                   a            0.000000166667
17                                   b            -0.005166666667
18                                   c            46

Figure 7.30 Two Inputs Formulas

     A                       B                         C    D             E
1    Controllable Input                                     Price         Units Sold
2    Price                   29                             29            700
3    Uncontrollable Inputs                                  39            550
4    Fixed Costs             12000                          49            400
5    Intermediate Variable                                  59            250
6    Unit Variable Cost      =E16*B4^2+E17*B4+E18
7    Units Sold              =E8+E7*B2                      Slope         =SLOPE(E2:E5,D2:D5)
8    Output Variable                                        Intercept     =INTERCEPT(E2:E5,D2:D5)
9    Net Cash Flow           =(B2-B6)*B7-B4
10
11                                                          Fixed Costs   Unit Variable Cost
12                                                          10000         11
13                                                          12000         8
14                                                          15000         6
15
16                                                          a             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
17                                                          b             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
18                                                          c             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
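With only three data points, a quadratic fit passes exactly through them, so the coefficients a, b, and c can be checked without LINEST. The Python sketch below is an illustration under that assumption; it uses the Lagrange interpolation formulas rather than Excel's regression routine, and the function name `quadratic_through` is hypothetical.

```python
# Fixed cost / unit variable cost pairs from cells D12:E14.
POINTS = [(10000, 11), (12000, 8), (15000, 6)]


def quadratic_through(points):
    """Coefficients (a, b, c) of y = a*x^2 + b*x + c through three points.

    Uses the Lagrange form; this is a swapped-in technique, not LINEST itself.
    """
    a = b = c = 0.0
    for i, (xi, yi) in enumerate(points):
        xj, xk = [x for j, (x, _) in enumerate(points) if j != i]
        denom = (xi - xj) * (xi - xk)
        a += yi / denom
        b -= yi * (xj + xk) / denom
        c += yi * xj * xk / denom
    return a, b, c


a, b, c = quadratic_through(POINTS)
print("a =", a, "b =", b, "c =", c)

fixed_costs = 12000
print("unit variable cost:", a * fixed_costs**2 + b * fixed_costs + c)
```

The printed coefficients should agree, up to rounding, with the a, b, and c values displayed in Figure 7.29.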

Four Inputs with Three Uncertainties

Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.

Fixed costs, units sold, and unit variable cost are uncertain.


Figure 7.31 Three Uncertainties Influence Chart

[Influence chart nodes: Price, Fixed Costs, Units Sold Median, Units Sold Uncertainty, Units Sold, Unit Variable Cost Median, Unit Variable Cost Uncertainty, Unit Variable Cost, Net Cash Flow]

Figure 7.32 Three Uncertainties Display

     A                                B          C    D             E
1    Controllable Input                               Price         Units Sold
2    Price                            $29             $29           700
3    Uncontrollable Inputs                            $39           550
4    Fixed Costs                      $12,000         $49           400
5    Units Sold Uncertainty           10              $59           250
6    Unit Variable Cost Uncertainty   $0.10
7    Intermediate Variable                            Slope         -15
8    Units Sold Median                700             Intercept     1135
9    Units Sold                       710
10   Unit Variable Cost Median        $8.00
11   Unit Variable Cost               $8.10           Fixed Costs   Unit Variable Cost
12   Output Variable                                  $10,000       $11
13   Net Cash Flow                    $2,839          $12,000       $8
14                                                    $15,000       $6
15
16                                                    a             0.000000166667
17                                                    b             -0.005166666667
18                                                    c             46


Figure 7.33 Three Uncertainties Formulas

     A                                B                         C    D             E
1    Controllable Input                                              Price         Units Sold
2    Price                            29                             29            700
3    Uncontrollable Inputs                                           39            550
4    Fixed Costs                      12000                          49            400
5    Units Sold Uncertainty           10                             59            250
6    Unit Variable Cost Uncertainty   0.1
7    Intermediate Variable                                           Slope         =SLOPE(E2:E5,D2:D5)
8    Units Sold Median                =E8+E7*B2                      Intercept     =INTERCEPT(E2:E5,D2:D5)
9    Units Sold                       =B8+B5
10   Unit Variable Cost Median        =E16*B4^2+E17*B4+E18
11   Unit Variable Cost               =B10+B6                        Fixed Costs   Unit Variable Cost
12   Output Variable                                                 10000         11
13   Net Cash Flow                    =(B2-B11)*B9-B4                12000         8
14                                                                   15000         6
15
16                                                                   a             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
17                                                                   b             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
18                                                                   c             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))

Intermediate Details

Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.

Fixed costs, units sold, and unit variable cost are uncertain.

Include revenue, total variable cost, and total costs as intermediate variables.

Figure 7.34 Intermediate Details Influence Chart

[Influence chart nodes: Price, Fixed Costs, Units Sold Median, Units Sold Uncertainty, Units Sold, Unit Variable Cost Median, Unit Variable Cost Uncertainty, Unit Variable Cost, Revenue, Total Variable Cost, Total Costs, Net Cash Flow]


Figure 7.35 Intermediate Details Display

     A                                B          C    D             E
1    Controllable Input                               Price         Units Sold
2    Price                            $29             $29           700
3    Uncontrollable Inputs                            $39           550
4    Fixed Costs                      $12,000         $49           400
5    Units Sold Uncertainty           10              $59           250
6    Unit Variable Cost Uncertainty   $0.10
7    Intermediate Variable                            Slope         -15
8    Units Sold Median                700             Intercept     1135
9    Units Sold                       710
10   Revenue                          $20,590
11   Unit Variable Cost Median        $8.00           Fixed Costs   Unit Variable Cost
12   Unit Variable Cost               $8.10           $10,000       $11
13   Total Variable Cost              $5,751          $12,000       $8
14   Total Costs                      $17,751         $15,000       $6
15   Output Variable
16   Net Cash Flow                    $2,839          a             0.000000166667
17                                                    b             -0.005166666667
18                                                    c             46

Figure 7.36 Intermediate Details Formulas

     A                                B                         C    D             E
1    Controllable Input                                              Price         Units Sold
2    Price                            29                             29            700
3    Uncontrollable Inputs                                           39            550
4    Fixed Costs                      12000                          49            400
5    Units Sold Uncertainty           10                             59            250
6    Unit Variable Cost Uncertainty   0.1
7    Intermediate Variable                                           Slope         =SLOPE(E2:E5,D2:D5)
8    Units Sold Median                =E8+E7*B2                      Intercept     =INTERCEPT(E2:E5,D2:D5)
9    Units Sold                       =B8+B5
10   Revenue                          =B9*B2
11   Unit Variable Cost Median        =E16*B4^2+E17*B4+E18           Fixed Costs   Unit Variable Cost
12   Unit Variable Cost               =B11+B6                        10000         11
13   Total Variable Cost              =B12*B9                        12000         8
14   Total Costs                      =B4+B13                        15000         6
15   Output Variable
16   Net Cash Flow                    =B10-B14                       a             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
17                                                                   b             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
18                                                                   c             =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
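The chain of intermediate variables in the Intermediate Details model can be traced step by step in a short Python sketch. This is only an illustration: the function name `intermediate_details` is hypothetical, and the quadratic coefficients are written as the exact fractions 1/6,000,000 and -31/6,000, an assumption chosen because they round to the displayed values of a and b.

```python
SLOPE, INTERCEPT = -15.0, 1135.0          # linear fit of units sold on price
A, B, C = 1 / 6_000_000, -31 / 6_000, 46  # assumed exact forms of a, b, c


def intermediate_details(price=29.0, fixed_costs=12000.0,
                         units_sold_uncertainty=10.0,
                         unit_cost_uncertainty=0.10):
    """Return every intermediate variable of the model as a dictionary."""
    units_sold_median = INTERCEPT + SLOPE * price
    units_sold = units_sold_median + units_sold_uncertainty
    revenue = units_sold * price
    unit_cost_median = A * fixed_costs**2 + B * fixed_costs + C
    unit_cost = unit_cost_median + unit_cost_uncertainty
    total_variable_cost = unit_cost * units_sold
    total_costs = fixed_costs + total_variable_cost
    return {
        "Units Sold": units_sold,
        "Revenue": revenue,
        "Unit Variable Cost": unit_cost,
        "Total Variable Cost": total_variable_cost,
        "Total Costs": total_costs,
        "Net Cash Flow": revenue - total_costs,
    }


for name, value in intermediate_details().items():
    print(f"{name:22s} {value:12,.2f}")
```

Setting both uncertainty arguments to zero gives the median-based result of the earlier Two Inputs model.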

Chapter 8 Multiperiod What-If Modeling

8.1 APARTMENT BUILDING PURCHASE PROBLEM

You are considering the purchase of an apartment building in northern California. The building contains 25 units and is listed for $2,000,000. You plan to keep the building for three years and then sell it.

You know that the annual taxes on the property are currently $20,000 and will increase to $25,000 after closing. You estimate that these taxes will grow at a rate of 2 percent per year. You estimate that it will cost about $1,000 per unit per year to maintain the apartments, and these maintenance costs are expected to grow at a 15 percent per year rate.

You have not decided on the rent to charge. Currently, the rent is $875 per unit per month, but there is substantial turnover, and the occupancy is only 75 percent. That is, on average, 75 percent of the units are rented at any time. You estimate that if you lowered the rent to $675 per unit per month, you would have 100 percent occupancy. You think that intermediate rental charges would produce intermediate occupancy percentages; for example, a $775 rental charge would have 87.5 percent occupancy.

You will decide on the monthly rental charge for the first year, and you think the rental market is such that you will be able to increase it 7 percent per year for the second and third years. Furthermore, whatever occupancy percentage occurs in the first year will hold for the second and third years. For example, if you decide on the $675 monthly rental charge for the first year, the occupancy will be 100 percent all three years.

At the end of three years, you will sell the apartment building. The realtors in your area usually estimate the selling price of a rental property as a multiple of its annual rental income (before expenses). You estimate that this multiple will be 9. That is, if the rental income in the third year is $200,000, then the sale price will be $1,800,000.

Your objective is to achieve the highest total accumulated cash at the end of the three-year period. If rental income exceeds expenses in the first or second year, you will invest the excess in one-year certificates of deposit (CDs) yielding 5 percent. Thus, total accumulated cash will include net cash flow (income minus expense) in each of the three years, interest from CDs received at the end of the second and third years, and cash from the sale of the property at the end of the third year.

In your initial analysis you have decided to ignore depreciation and other issues related to income taxes.

Instead of purchasing the apartment building, you could invest the entire $2,000,000 in certificates of deposit yielding 5 percent per year.
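The problem statement can be turned into a compact cash-flow calculation. The Python sketch below is an illustration rather than the workbook itself: the function name `apartment_final_cash` and its parameter names are hypothetical, and the occupancy line assumes the straight-line relationship through the three rent and occupancy points given above (100 percent at $675, 75 percent at $875).

```python
def apartment_final_cash(monthly_rent=775.0, unit_maintenance=1000.0,
                         maint_growth=0.15, tax_growth=0.02,
                         rent_multiplier=9.0, first_year_tax=25000.0,
                         rent_growth=0.07, cd_yield=0.05,
                         units=25, years=3):
    """Total accumulated cash at the end of the holding period."""
    # Occupancy fraction from the line through ($675, 100%) and ($875, 75%).
    # No cap is applied, so rents below $675 would imply more than 100%.
    occupancy = (184.375 - 0.125 * monthly_rent) / 100.0

    cd_balance = 0.0  # cash carried forward in one-year CDs
    rent = monthly_rent
    maintenance = unit_maintenance * units
    tax = first_year_tax
    income = 0.0
    for year in range(1, years + 1):
        income = rent * units * occupancy * 12
        operating_cash = income - maintenance - tax
        interest = cd_yield * cd_balance      # earned on last year's balance
        cd_balance += interest + operating_cash
        if year < years:                      # grow inputs for the next year
            rent *= 1 + rent_growth
            maintenance *= 1 + maint_growth
            tax *= 1 + tax_growth
    sale_receipt = rent_multiplier * income   # multiple of final-year rent
    return cd_balance + sale_receipt


print("Buy building:", round(apartment_final_cash()))
print("CDs only:    ", round(2_000_000 * 1.05 ** 3))
```

At the $775 rent the sketch should agree with the final cash values in the base case model, and changing `monthly_rent` gives a quick what-if comparison against the CD alternative.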


Figure 8.1 Base Case Model Display

Rows 1 to 17 (columns A, B, E, F):

     A                             B             E              F
1    Apartment Building Purchase                 Monthly Rent   Occupancy
2                                                $675           100
3    Controllable Factors                        $775           87.5
4    Unit monthly rent             $775          $875           75
5    Uncertain Factors
6    Annual unit maintenance       $1,000        slope          -0.125
7    Annual maint. increase        15%           intercept      184.375
8    Annual tax increase           2.0%
9    Gross rent multiplier         9.00
10   Other Assumptions
11   First year property taxes     $25,000
12   Annual rent increase          7%
13   CD annual yield               5%
14   Intermediate variable
15   Occupancy percentage          87.50%
16   Performance measure
17   Final cash value              $2,610,848

Rows 19 to 38 (columns A to D):

     A                             B             C             D
19                                 One           Two           Three
20   Unit monthly rent             $775          $829          $887
21   Annual rental income          $203,438      $217,678      $232,916
22
23   Annual maintenance cost       $25,000       $28,750       $33,063
24   Annual property tax           $25,000       $25,500       $26,010
25   Total annual expenses         $50,000       $54,250       $59,073
26
27   Operating cash flow           $153,438      $163,428      $173,843
28
29   CD investment                               $153,438      $324,538
30   Year-end CD interest                        $7,672        $16,227
31
32   Sale receipt                                              $2,096,240
33
34   Final Cash Value                                          $2,610,848
35
36   CD investment                 $2,000,000    $2,100,000    $2,205,000
37   Year-end CD interest          $100,000      $105,000      $110,250
38   Final Cash Value                                          $2,315,250


Figure 8.2 Base Case Model Formulas

Rows 1 to 17 (columns A, B, E, F):

     A                             B                   E              F
1    Apartment Building Purchase                       Monthly Rent   Occupancy
2                                                      675            100
3    Controllable Factors                              775            87.5
4    Unit monthly rent             775                 875            75
5    Uncertain Factors
6    Annual unit maintenance       1000                slope          =SLOPE(F2:F4,E2:E4)
7    Annual maint. increase        0.15                intercept      =INTERCEPT(F2:F4,E2:E4)
8    Annual tax increase           0.02
9    Gross rent multiplier         9
10   Other Assumptions
11   First year property taxes     25000
12   Annual rent increase          0.07
13   CD annual yield               0.05
14   Intermediate variable
15   Occupancy percentage          =(F7+F6*B4)/100
16   Performance measure
17   Final cash value              =D34

Rows 19 to 38 (columns A to D):

     A                             B                    C                    D
19                                 One                  Two                  Three
20   Unit monthly rent             =B4                  =B20*(1+$B$12)       =C20*(1+$B$12)
21   Annual rental income          =B20*25*$B$15*12     =C20*25*$B$15*12     =D20*25*$B$15*12
22
23   Annual maintenance cost       =B6*25               =(1+$B$7)*B23        =(1+$B$7)*C23
24   Annual property tax           =B11                 =(1+$B$8)*B24        =(1+$B$8)*C24
25   Total annual expenses         =SUM(B23:B24)        =SUM(C23:C24)        =SUM(D23:D24)
26
27   Operating cash flow           =B21-B25             =C21-C25             =D21-D25
28
29   CD investment                                      =B27                 =C27+C29+C30
30   Year-end CD interest                               =B13*C29             =B13*D29
31
32   Sale receipt                                                            =D21*B9
33
34   Final Cash Value                                                        =SUM(D27:D32)
35
36   CD investment                 2000000              =B36+B37             =C36+C37
37   Year-end CD interest          =B13*B36             =B13*C36             =B13*D36
38   Final Cash Value                                                        =D36+D37

Figure 8.3 Ranges based on decision maker’s or expert’s judgment

Uncertain Factors          Low      Base     High
Annual unit maintenance    $700     $1,000   $2,000
Annual maint. increase     10%      15%      30%
Annual tax increase        2.0%     2.0%     3.0%
Gross rent multiplier      7.00     9.00     10.00

Apartment Building Analysis Notes

Influence Diagram (for single period)

Modeling effect of rent on occupancy rate

Linear fit: algebra (slope and intercept)

XY Scatter chart; Insert Trendline

Quadratic fit: if $775 yields 82.5% occupancy instead of 87.5%


Base Case model

Use Solver to find optimum rent to maximize final cash value

Use Sensit.xla Plot of final cash value depending on rent; relatively insensitive

Use Sensit.xla Spider

Sensitivity Cases

Ranges based on decision maker’s or expert’s judgment

Sensit.xla Tornado chart: identify critical variables

Monte Carlo simulation

RiskSim.xla

Triangular distributions for critical variables

What is the probability that final cash will be less than $2,315,250?
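The one-at-a-time calculation behind a tornado chart can be sketched in Python using the Low and High values of Figure 8.3. This is an illustration of the idea, not SensIt's implementation: the function `final_cash` is a hypothetical re-statement of the base case model, and each factor is moved to its low and high value while the others stay at base.

```python
def final_cash(unit_maintenance=1000.0, maint_growth=0.15, tax_growth=0.02,
               rent_multiplier=9.0, monthly_rent=775.0):
    """Hypothetical re-statement of the three-year base case model."""
    occupancy = (184.375 - 0.125 * monthly_rent) / 100.0
    rent, maintenance, tax = monthly_rent, unit_maintenance * 25, 25000.0
    cd_balance = income = 0.0
    for year in (1, 2, 3):
        income = rent * 25 * occupancy * 12
        cd_balance += 0.05 * cd_balance + income - maintenance - tax
        if year < 3:
            rent *= 1.07
            maintenance *= 1 + maint_growth
            tax *= 1 + tax_growth
    return cd_balance + rent_multiplier * income


# Low and High values from Figure 8.3 (Base values are the defaults above).
RANGES = {
    "unit_maintenance": (700.0, 2000.0),
    "maint_growth": (0.10, 0.30),
    "tax_growth": (0.02, 0.03),
    "rent_multiplier": (7.0, 10.0),
}

swings = []
for name, (low, high) in RANGES.items():
    out_low = final_cash(**{name: low})
    out_high = final_cash(**{name: high})
    swings.append((abs(out_high - out_low), name, out_low, out_high))

# Largest swing first, the order in which tornado bars are usually drawn.
for swing, name, out_low, out_high in sorted(swings, reverse=True):
    print(f"{name:18s} low={out_low:12,.0f} high={out_high:12,.0f} swing={swing:10,.0f}")
```

Factors with the widest swing are the natural candidates for probability distributions in a later Monte Carlo simulation.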

8.2 PRODUCT LAUNCH FINANCIAL MODEL

Figure 8.4 Original Model Display

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

A B C D E F G H I J K L FINANCE The @RISK Demonstration Model :Product Launch Risk Analysis 2001-2010

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========

Price No Entry $70.00 $88.20 $119.00 $112.70 $99.40 $94.50 $91.70 $90.30 Price With Entry $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94 Volume No Entry 3500 4340 6580 5565 5180 5180 4970 4935 Volume With Entry

3300 4158 3564 3399 3300 3300 3432 3696

Competitor Entry: 1 Design Costs $50,000.00 Capital Investment $100,000.00Operating Expense Factor 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15

Sales Price $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94 Sales Volume 3300 4158 3564 3399 3300 3300 3432 3696 Sales Revenue $174,900 $279,875 $283,338 $216,176 $201,135 $183,645 $187,353 $191,970Unit Production Cost $23.33 $24.26 $25.23 $26.24 $27.29 $28.38 $29.52 $30.70 Overhead $3,300 $6,944 $10,528 $8,904 $8,288 $8,288 $7,952 $7,896 Cost of Goods Sold $80,289 $107,830 $100,461 $98,104 $98,354 $101,957 $109,264 $121,366Gross Margin $94,611 $172,045 $182,877 $118,072 $102,781 $81,688 $78,089 $70,604 Operating Expense $12,043 $16,175 $15,069 $14,716 $14,753 $15,294 $16,390 $18,205 Net Before Tax ($50,000) $0 $82,568 $155,870 $167,808 $103,357 $88,028 $66,395 $61,699 $52,400 Depreciation $20,000 $20,000 $20,000 $20,000 $20,000Tax ($23,000) ($9,200) $28,781 $62,500 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104 Taxes Owed $0 $0 $0 $59,081 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104 Net After Tax ($50,000) $0 $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296

======== ======== ======== ======== ======== ======== ======== ======== ======== ========Net Cash Flow ($50,000) ($100,000) $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296 NPV 10% $164,877


Figure 8.5 Input Assumptions

123456789

101112131415161718192021222324252627282930313233

A B C D E F G H I J K L

FINANCE The @RISK Demonstration Model :Product Launch Risk Analysis 2001-2010

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010======== ======== ======== ======== ======== ======== ======== ======== ======== ========

Price No Entry $70.00 $88.20 $119.00 $112.70 $99.40 $94.50 $91.70 $90.30Price With Entry $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94Volume No Entry 3500 4340 6580 5565 5180 5180 4970 4935Volume With Entry 3300 4158 3564 3399 3300 3300 3432 3696Competitor Entry: 1

Design Costs $50,000.00Capital Investment $100,000.00Operating Expense Factor 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15

Sales Price $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94Sales Volume 3300 4158 3564 3399 3300 3300 3432 3696Sales Revenue $174,900 $279,875 $283,338 $216,176 $201,135 $183,645 $187,353 $191,970Unit Production Cost $23.33 $24.26 $25.23 $26.24 $27.29 $28.38 $29.52 $30.70Overhead $3,300 $6,944 $10,528 $8,904 $8,288 $8,288 $7,952 $7,896Cost of Goods Sold $80,289 $107,830 $100,461 $98,104 $98,354 $101,957 $109,264 $121,366Gross Margin $94,611 $172,045 $182,877 $118,072 $102,781 $81,688 $78,089 $70,604Operating Expense $12,043 $16,175 $15,069 $14,716 $14,753 $15,294 $16,390 $18,205Net Before Tax ($50,000) $0 $82,568 $155,870 $167,808 $103,357 $88,028 $66,395 $61,699 $52,400Depreciation $20,000 $20,000 $20,000 $20,000 $20,000Tax ($23,000) ($9,200) $28,781 $62,500 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104Taxes Owed $0 $0 $0 $59,081 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104Net After Tax ($50,000) $0 $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296

======== ======== ======== ======== ======== ======== ======== ======== ======== ========Net Cash Flow ($50,000) ($100,000) $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296NPV 10% $164,877

Figure 8.6 Modifications for SensIt Display

123456789

10111213141516171819202122232425262728293031323334353637383940414243444546

A B C D E F G H I J K LInputs

Price w/ o Entry $70.00Price w/ Entry $53.00Volume No Entry 3,500Volume w/ Entry 3,300Competitor Entry 1Design Costs $50,000Capital Investment $100,000Operating Expense Factor 15.0%Unit Production Costs 23.33Overhead $3,300

FINANCE The @RISK Demonstration Model :Product Launch Risk Analysis 2001-2010

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010======== ======== ======== ======== ======== ======== ======== ======== ======== ========

Price No Entry $70.00 $88.20 $119.00 $112.70 $99.40 $94.50 $91.70 $90.30Price With Entry $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94Volume No Entry 3500 4340 6580 5565 5180 5180 4970 4935Volume With Entry 3300 4158 3564 3399 3300 3300 3432 3696Competitor Entry: 1

Design Costs $50,000.00Capital Investment $100,000.00Operating Expense Factor 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15

Sales Price $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94Sales Volume 3300 4158 3564 3399 3300 3300 3432 3696Sales Revenue $174,900 $279,875 $283,338 $216,176 $201,135 $183,645 $187,353 $191,970Unit Production Cost $23.33 $24.26 $25.23 $26.24 $27.29 $28.38 $29.52 $30.70Overhead $3,300 $6,944 $10,528 $8,904 $8,288 $8,288 $7,952 $7,896Cost of Goods Sold $80,289 $107,830 $100,461 $98,104 $98,354 $101,957 $109,264 $121,366Gross Margin $94,611 $172,045 $182,877 $118,072 $102,781 $81,688 $78,089 $70,604Operating Expense $12,043 $16,175 $15,069 $14,716 $14,753 $15,294 $16,390 $18,205Net Before Tax ($50,000) $0 $82,568 $155,870 $167,808 $103,357 $88,028 $66,395 $61,699 $52,400Depreciation $20,000 $20,000 $20,000 $20,000 $20,000Tax ($23,000) ($9,200) $28,781 $62,500 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104Taxes Owed $0 $0 $0 $59,081 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104Net After Tax ($50,000) $0 $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296

======== ======== ======== ======== ======== ======== ======== ======== ======== ========Net Cash Flow ($50,000) ($100,000) $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296NPV 10% $164,877


Figure 8.7 Modifications for SensIt Formulas

123456789

1011121314151617181920212223242526272829303132333435

A B C D E FInputs

Price w/ o Entry 70Price w/ Entry 53Volume No Entry 3500Volume w/ Entry 3300Competitor Entry 1Design Costs 50000Capital Investment 100000Operating Expense Factor 0.15Unit Production Costs 23.33Overhead 3300

FINANCE The @RISK DemonstratioProduct Launch Risk Analysis 2001-20

2001 2002 2003 2004======== ======== ======== ========

Price No Entry =C2 =1.26*E20Price With Entry =C3 =1.27*E21Volume No Entry =C4 =1.24*E22Volume With Entry =C5 =1.26*E23Competitor Entry: =C6

Design Costs =C7Capital Investment =C8Operating Expense Factor =C9 =$E$28

Sales Price =IF($C$24=0,E20,E21) =IF($C$24=0,F20,F21)Sales Volume =IF($C$24=0,E22,E23) =IF($C$24=0,F22,F23)Sales Revenue =(E30*E31) =(F30*F31)Unit Production Cost =C10 =1.04*E33Overhead =C11 6944Cost of Goods Sold =(E31*E33)+E34 =(F31*F33)+F34

Figure 8.8 Data for Competitor Entry as Base Case

Inputs                     Value      Low       Base       High
Price w/o Entry            $70.00     $50.00    $70.00     $90.00
Price w/ Entry             $53.00     $40.00    $53.00     $68.00
Volume No Entry            3,500      3,100     3,500      3,900
Volume w/ Entry            3,300      2,800     3,300      3,800
Competitor Entry           0          0         1          1
Design Costs               $50,000    $37,000   $50,000    $63,000
Capital Investment         $100,000   $60,000   $100,000   $140,000
Operating Expense Factor   15.0%      6.5%      15.0%      23.0%
Unit Production Costs      23.33      15.50     23.33      32.00
Overhead                   $3,300     $2,800    $3,300     $4,000


Figure 8.9 Tornado Chart for Competitor Entry as Base Case

Sensit - Sensitivity Analysis - Tornado

[Tornado chart. Horizontal axis: NPV 10%, from $0 to $700,000. Bars in the order listed, each labeled with its low and high input values: Competitor Entry (0, 1); Price w/ Entry ($40.00, $68.00); Unit Production Costs (15.50, 32.00); Volume w/ Entry (2,800, 3,800); Capital Investment ($60,000, $140,000); Operating Expense Factor (6.5%, 23.0%); Design Costs ($37,000, $63,000); Overhead ($2,800, $4,000); Price w/o Entry ($50.00, $90.00); Volume No Entry (3,100, 3,900).]

Figure 8.10 Data for No Competitor Entry as Base Case

Inputs                     Value      Low       Base       High
Price w/o Entry            $70.00     $50.00    $70.00     $90.00
Price w/ Entry             $53.00     $40.00    $53.00     $68.00
Volume No Entry            3,500      3,100     3,500      3,900
Volume w/ Entry            3,300      2,800     3,300      3,800
Competitor Entry           0          0         0          1
Design Costs               $50,000    $37,000   $50,000    $63,000
Capital Investment         $100,000   $60,000   $100,000   $140,000
Operating Expense Factor   15.0%      6.5%      15.0%      23.0%
Unit Production Costs      23.33      15.50     23.33      32.00
Overhead                   $3,300     $2,800    $3,300     $4,000


Figure 8.11 Tornado Chart for No Competitor Entry as Base Case

Sensit - Sensitivity Analysis - Tornado

[Tornado chart. Horizontal axis: NPV 10%, from $100,000 to $1,100,000. Bars in the order listed, each labeled with its low and high input values: Price w/o Entry ($50.00, $90.00); Competitor Entry (0, 1); Unit Production Costs (15.50, 32.00); Volume No Entry (3,100, 3,900); Operating Expense Factor (6.5%, 23.0%); Capital Investment ($60,000, $140,000); Design Costs ($37,000, $63,000); Overhead ($2,800, $4,000); Price w/ Entry ($40.00, $68.00); Volume w/ Entry (2,800, 3,800).]

8.3 MACHINE SIMULATION MODEL

Adapted from Clemen's Making Hard Decisions.

AJS, Ltd., is a manufacturing company that performs contract work for a wide variety of firms. It primarily manufactures and assembles metal items, and so most of its equipment is designed for precision machining tasks. The executives of AJS are currently trying to decide between two processes for manufacturing a product. Their main criterion for measuring the value of a manufacturing process is net present value (NPV). The contractor will pay AJS $8 per unit. AJS is using a three-year horizon for its evaluation (the current year and the next two years).

AJS Process 1

Under the first process, AJS's current machinery is used to make the product. The following inputs are used:

Demand

Demand for each of the three years is unknown. The three annual demands are modeled as discrete uncertain quantities with the probability distributions shown in the spreadsheet display.

Variable Cost

Variable cost per unit changes each year, depending on the costs for materials and labor. The uncertainty about each variable cost is represented by a continuous normal distribution with mean $4.00 and standard deviation $0.40.

Machine Failure

Each year, AJS's machines fail occasionally, but it is impossible to predict when or how many failures will occur during the year. Each time a machine fails, it costs the firm $8,000. The uncertainty about the number of machine failures in each of the three years is represented by a Poisson random variable with an average of 4 failures per year.

Fixed Cost

Each year a fixed cost of $12,000 is incurred.

AJS Process 2

The second process involves scrapping the current equipment (it has no salvage value) and purchasing new equipment to make the product at a cost of $60,000. Assume that the firm pays cash for the new machine, and ignore tax effects.

Demand

Because of the new machine, the final product is slightly altered and improved, and consequently the demands are likely to be higher than before, although more uncertain. The new demand distributions are shown in the spreadsheet display.

Variable Cost

Variable cost per unit still changes each year. With the new machine it is judged to be slightly lower but with more uncertainty, so the cost is described by a normal distribution with mean $3.50 and standard deviation $1.00.

Machine Failure

Equipment failures are less likely with the new equipment, with an average of three per year. Such failures tend to be less serious with the new machine, costing only $6,000.

Fixed Cost

The annual fixed cost of $12,000 is unchanged.
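The two processes can be compared with a small Monte Carlo simulation written in Python. This sketch is an illustration and not the RiskSim add-in: the names `PROCESSES`, `rand_poisson`, and `simulate_npv` are hypothetical, the Poisson sampler uses Knuth's multiplication method, the seed and trial count are arbitrary choices, and the demand distributions are those given in the spreadsheet displays for Process 1 and Process 2.

```python
import math
import random

UNIT_PRICE, FIXED_COST, DISCOUNT_RATE = 8.0, 12000.0, 0.10

PROCESSES = {
    "Process 1": {
        "equipment": 0.0, "failure_cost": 8000.0, "mean_failures": 4.0,
        "var_cost": (4.00, 0.40),
        "demand": [([11000, 16000, 21000], [0.2, 0.6, 0.2]),
                   ([8000, 19000, 27000], [0.2, 0.4, 0.4]),
                   ([4000, 21000, 37000], [0.1, 0.5, 0.4])],
    },
    "Process 2": {
        "equipment": 60000.0, "failure_cost": 6000.0, "mean_failures": 3.0,
        "var_cost": (3.50, 1.00),
        "demand": [([14000, 19000, 24000], [0.3, 0.4, 0.3]),
                   ([12000, 23000, 31000], [0.36, 0.36, 0.28]),
                   ([9000, 26000, 42000], [0.4, 0.1, 0.5])],
    },
}


def npv(initial, cash_flows, rate=DISCOUNT_RATE):
    """Initial flow plus later flows discounted 1, 2, 3, ... periods."""
    return initial + sum(cf / (1 + rate) ** t
                         for t, cf in enumerate(cash_flows, start=1))


def rand_poisson(rng, mean):
    """Poisson sample by Knuth's method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_npv(process, rng):
    flows = []
    for values, probs in process["demand"]:
        demand = rng.choices(values, weights=probs)[0]
        var_cost = rng.gauss(*process["var_cost"])
        failures = rand_poisson(rng, process["mean_failures"])
        flows.append(demand * (UNIT_PRICE - var_cost)
                     - failures * process["failure_cost"] - FIXED_COST)
    return npv(-process["equipment"], flows)


rng = random.Random(2006)  # arbitrary seed so the run is repeatable
results = {name: [simulate_npv(p, rng) for _ in range(2000)]
           for name, p in PROCESSES.items()}
for name, values in results.items():
    mean = sum(values) / len(values)
    loss = sum(v < 0 for v in values) / len(values)
    print(f"{name}: mean NPV {mean:,.0f}, P(NPV < 0) {loss:.3f}")
```

Because the random number streams differ from RiskSim's, the summary statistics will not match the add-in's output exactly, but the means should be of similar size.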


Figure 8.12 Process 1 Display and Formulas

     A               B          C          D          E          F          G
1    Process 1
2                    Zero                  One                   Two
3    Demand          D          P(D)       D          P(D)       D          P(D)
4                    11,000     0.2        8,000      0.2        4,000      0.1
5                    16,000     0.6        19,000     0.4        21,000     0.5
6                    21,000     0.2        27,000     0.4        37,000     0.4
7
8    Var Cost        Mean       StDev
9    Normal          $4.00      $0.40
10
11   Machine         Mean
12   Failure         4
13   Poisson
14
15   Equipment       $0
16   Unit Price      $8
17   Failure Cost    $8,000
18   Fixed Cost      $12,000
19   Discount Rate   10%
20
21   Year            Initial    Zero       One        Two
22   Demand                     16,000     19,000     21,000     Mode
23   Var Cost                   $4.00      $4.00      $4.00      Mean
24   Failures                   4          4          4          Mean
25   Cash Flow       $0         $20,000    $32,000    $40,000
26
27   NPV             $74,681

Formula in B25: =-B15

Formula in C25: =C22*($B16-C23)-C24*$B17-$B18 (copy to D25:E25)

Formula in B27: =B25+NPV(B19,C25:E25)


Figure 8.13 Process 2 Display

     A               B          C          D          E          F          G
1    Process 2
2                    Zero                  One                   Two
3    Demand          D          P(D)       D          P(D)       D          P(D)
4                    14,000     0.3        12,000     0.36       9,000      0.4
5                    19,000     0.4        23,000     0.36       26,000     0.1
6                    24,000     0.3        31,000     0.28       42,000     0.5
7
8    Var Cost        Mean       StDev
9    Normal          $3.50      $1.00
10
11   Machine         Mean
12   Failure         3
13   Poisson
14
15   Equipment       $60,000
16   Unit Price      $8
17   Failure Cost    $6,000
18   Fixed Cost      $12,000
19   Discount Rate   10%
20
21   Year            Initial    Zero       One        Two
22   Demand                     19,000     23,000     26,000     Mode
23   Var Cost                   $3.50      $3.50      $3.50      Mean
24   Failures                   3          3          3          Mean
25   Cash Flow       -$60,000   $55,500    $73,500    $87,000
26
27   NPV             $116,563

Figure 8.14 RiskSim Functions for Process 1 and Process 2

     A           B                        C                                D                                E
21   Year        Initial                  Zero                             One                              Two
22   Demand                               =randdiscrete(B4:C6)             =randdiscrete(D4:E6)             =randdiscrete(F4:G6)
23   Var Cost                             =randnormal($B$9,$C$9)           =randnormal($B$9,$C$9)           =randnormal($B$9,$C$9)
24   Failures                             =randpoisson($B$12)              =randpoisson($B$12)              =randpoisson($B$12)
25   Cash Flow   =-B15                    =C22*($B16-C23)-C24*$B17-$B18    =D22*($B16-D23)-D24*$B17-$B18    =E22*($B16-E23)-E24*$B17-$B18
26
27   NPV         =B25+NPV(B19,C25:E25)


Figure 8.15 RiskSim Output for Process 1

RiskSim - One Output - Summary        Mean             $90,526
Date          9-Apr-01                St. Dev.         $47,290
Time          7:07 PM                 Mean St. Error   $1,495
Workbook      AJS_WhatIf.xls          Minimum          -$59,664
Worksheet     Process 1 Probability   First Quartile   $58,050
Output Cell   $B$27                   Median           $91,460
Output Label  NPV                     Third Quartile   $124,435
Seed          0.5                     Maximum          $234,703
Trials        1,000                   Skewness         -0.1034

[RiskSim Histogram, 09-Apr-01, 07:07 PM. Vertical axis: Frequency, 0 to 400. Horizontal axis: NPV, Upper Limit of Interval, -$100,000 to $200,000.]

[RiskSim Cumulative Chart, 09-Apr-01, 07:07 PM. Vertical axis: Cumulative Probability, 0.0 to 1.0. Horizontal axis: NPV, -$100,000 to $250,000.]


Figure 8.16 RiskSim Output for Process 2

RiskSim - One Output - Summary        Mean             $116,159
Date          9-Apr-01                St. Dev.         $73,675
Time          7:08 PM                 Mean St. Error   $2,330
Workbook      AJS_WhatIf.xls          Minimum          -$70,685
Worksheet     Process 2 Probability   First Quartile   $60,199
Output Cell   $B$27                   Median           $114,335
Output Label  NPV                     Third Quartile   $168,191
Seed          0.5                     Maximum          $347,514
Trials        1,000                   Skewness         0.1390

[RiskSim Histogram, 09-Apr-01, 07:08 PM. Vertical axis: Frequency, 0 to 300. Horizontal axis: NPV, Upper Limit of Interval, -$100,000 to $300,000.]

[RiskSim Cumulative Chart, 09-Apr-01, 07:08 PM. Vertical axis: Cumulative Probability, 0.0 to 1.0. Horizontal axis: NPV, -$100,000 to $350,000.]


Follow these instructions to show two or more risk profiles on the same chart.

Use RiskSim to obtain the sorted values, cumulative probabilities, and XY charts for strategy A and strategy B.

To add the data for strategy B to the existing plot for strategy A, select the sorted values and cumulative probabilities for strategy B (without including the text labels in row 1), and choose Edit | Copy.

Click just inside the outer border of the strategy A chart to select it. From the main menu, choose Edit | Paste Special. In the Paste Special dialog box, select "Add cells as New series," select "Values (Y) in Columns," check the box for "Categories (X Values) in First Column," and click OK.

Use the same method to add data for other strategies to the strategy A chart.

To change the lines and markers of a data series, click a data point on the chart to select the data series, and choose Format | Selected Data Series | Patterns.

If the X values are quite different for the various strategies, it may be necessary to adjust the minimum and maximum values on the Scale tab of the Format Axis dialog box.


Figure 8.17 Comparison of Process 1 and Process 2

                   Process 1    Process 2
Mean               $90,526      $116,159
St. Dev.           $47,290      $73,675
Mean St. Error     $1,495       $2,330
Minimum            -$59,664     -$70,685
First Quartile     $58,050      $60,199
Median             $91,460      $114,335
Third Quartile     $124,435     $168,191
Maximum            $234,703     $347,514
Skewness           -0.1034      0.1390

[RiskSim Cumulative Chart. Vertical axis: Cumulative Probability, 0.0 to 1.0. Horizontal axis: NPV, -$100,000 to $350,000. Series: Process 1, Process 2.]

Chapter 9 Modeling Inventory Decisions

This chapter describes simulation and expected value methods for determining how much of a product or service to have on hand for a single period when there is uncertain demand and no possibility of reordering.

9.1 NEWSVENDOR PROBLEM

This approach is appropriate for decision situations with highly seasonal or style goods, perishable goods like flowers and foods, goods that become obsolete, like newspapers and magazines, and perishable services, like airline seats for a specific flight and hotel rooms for a specific date.

This decision problem is sometimes called the newsvendor problem, and it is the basis for more elaborate models called yield management or revenue management.

Stationery Wholesaler Example

A wholesaler of stationery is deciding how many desk calendars to stock for the coming year. It is impossible to reorder, and leftover units are worthless. The wholesaler has approximated the uncertain demand as shown in the following table.

Demand, in thousands    Probability
100                     0.10
200                     0.15
300                     0.50
400                     0.25

The calendars sell for $100 per thousand, and the incremental cost of purchase is $70 per thousand. The incremental cost of selling (sales commissions) is $5 per thousand.
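The expected value method for this example can be sketched in a few lines of Python. The sketch rests on two assumptions that the text does not state explicitly: the sales commission is paid only on calendars actually sold, and the candidate stocking quantities are limited to the four demand levels. The names `profit` and `expected_profit` are hypothetical.

```python
# Demand (thousands of calendars) and probabilities from the table above.
DEMAND = {100: 0.10, 200: 0.15, 300: 0.50, 400: 0.25}

PRICE = 100.0          # dollars per thousand sold
PURCHASE_COST = 70.0   # dollars per thousand stocked
SELLING_COST = 5.0     # dollars per thousand sold (assumed: only on sales)


def profit(stock, demand):
    """Profit in dollars for one stocking quantity and one demand outcome."""
    sold = min(stock, demand)          # leftover units are worthless
    return (PRICE - SELLING_COST) * sold - PURCHASE_COST * stock


def expected_profit(stock):
    return sum(prob * profit(stock, demand) for demand, prob in DEMAND.items())


table = {stock: expected_profit(stock) for stock in DEMAND}
for stock, value in table.items():
    print(f"stock {stock} thousand: expected profit ${value:,.0f}")

best = max(table, key=table.get)
print("best stocking quantity (thousands):", best)
```

Under these assumptions, stocking at the most likely demand level gives the highest expected profit, and stocking for the largest demand level loses money on average.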



Chapter 10 Modeling Waiting Lines

10.1 QUEUE SIMULATION

A warehouse has one dock used to unload railroad freight cars. Incoming freight cars are delivered to the warehouse during the night. It takes exactly half a day to unload a car. If more than two cars are waiting to be unloaded on a given day, the unloading of some of the cars is postponed until the following day. The cost is $100 per day for each car delayed.

Past experience has indicated that the number of cars arriving during the night has the frequencies shown in the table below. Furthermore, there is no apparent pattern, so the number arriving on any night is independent of the number arriving on any other night.

Figure 10.1 Arrival Frequency

Number of cars arriving    Relative frequency
0                          0.23
1                          0.30
2                          0.30
3                          0.10
4                          0.05
5                          0.02
6 or more                  0.00
Total                      1.00

Concepts for Queuing (waiting-line) Models

Arrival pattern

Service time

Number of servers

Queue discipline


Performance measures

Equilibrium

Average waiting time

Average number of customers in line

System utilization, rho = mean arrival rate / mean service rate

Stable system: rho < 1
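As a quick check of the utilization formula, the Python sketch below computes rho for the warehouse dock. It assumes that the mean arrival rate is the expected value of the arrival frequencies in Figure 10.1 and that the mean service rate is two cars per day, since each car takes half a day to unload.

```python
# Relative frequencies of nightly arrivals from Figure 10.1.
ARRIVALS = {0: 0.23, 1: 0.30, 2: 0.30, 3: 0.10, 4: 0.05, 5: 0.02}

mean_arrival_rate = sum(cars * freq for cars, freq in ARRIVALS.items())
mean_service_rate = 2.0   # cars per day: half a day to unload each car

rho = mean_arrival_rate / mean_service_rate
print("mean arrivals per night:", round(mean_arrival_rate, 2))
print("utilization rho:", round(rho, 2))
print("stable system:", rho < 1)
```

A value of rho below 1 means the dock can keep up with arrivals on average, although delays still occur on days when several cars arrive together.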

Figure 10.2 Influence Chart for Simulation Model

[Influence chart. Nodes: Unloading Capacity, Total Cost, Cost of Delays, and for each of Day 1, Day 2, ..., Day N: Number of Arrivals, Number To Unload, Number Actually Unloaded, Number Delayed.]


Figure 10.3 Simulation Model Spreadsheet Model Display

Unloading Capacity (cell C1): 2
Daily Delay Cost (cell I1): $100
Annual Delay Cost (cell I3): $16,500

Row   Day             Random    Number of   Number      Actually   Number
                      Number    Arrivals    To Unload   Unloaded   Delayed
5     1               0.812     3           3           2          1
6     2               0.524     2           3           2          1
7     3               0.671     2           3           2          1
8     4               0.250     1           2           2          0
9     5               0.940     3           3           2          1
10    6               0.771     2           3           2          1
11    7               0.026     0           1           1          0
12    8               0.178     0           0           0          0
13    9               0.683     2           2           2          0
14    10              0.727     2           2           2          0
44    40              0.082     0           0           0          0
45    41              0.425     1           1           1          0
46    42              0.826     3           3           2          1
47    43              0.855     3           4           2          2
48    44              0.971     3           5           2          3
49    45              0.429     1           4           2          2
50    46              0.592     2           4           2          2
51    47              0.085     0           2           2          0
52    48              0.018     0           0           0          0
53    49              0.678     2           2           2          0
54    50              0.510     2           2           2          0
56    Total                     86                                 33
58    Daily Average             1.72                               0.66

Figure 10.4 Simulation Model Spreadsheet Model Formulas

123456789

Unloading Capacity (cell C1): 2
Daily Delay Cost (cell I1): 100
Annual Delay Cost: =250*F58*I1

Columns: A = Day, B = Random Number, C = Number of Arrivals, D = Number To Unload, E = Number Actually Unloaded, F = Number Delayed. Rows 15 through 43 are hidden.

Row  A   B        C                                             D         E               F
5    1   =RAND()  =IF(B5<0.2,0,IF(B5<0.5,1,IF(B5<0.8,2,3)))     =C5       =MIN(D5,$C$1)   =D5-E5
6    2   =RAND()  =IF(B6<0.2,0,IF(B6<0.5,1,IF(B6<0.8,2,3)))     =F5+C6    =MIN(D6,$C$1)   =D6-E6
7    3   =RAND()  =IF(B7<0.2,0,IF(B7<0.5,1,IF(B7<0.8,2,3)))     =F6+C7    =MIN(D7,$C$1)   =D7-E7
8    4   =RAND()  =IF(B8<0.2,0,IF(B8<0.5,1,IF(B8<0.8,2,3)))     =F7+C8    =MIN(D8,$C$1)   =D8-E8
9    5   =RAND()  =IF(B9<0.2,0,IF(B9<0.5,1,IF(B9<0.8,2,3)))     =F8+C9    =MIN(D9,$C$1)   =D9-E9
10   6   =RAND()  =IF(B10<0.2,0,IF(B10<0.5,1,IF(B10<0.8,2,3)))  =F9+C10   =MIN(D10,$C$1)  =D10-E10
11   7   =RAND()  =IF(B11<0.2,0,IF(B11<0.5,1,IF(B11<0.8,2,3)))  =F10+C11  =MIN(D11,$C$1)  =D11-E11
12   8   =RAND()  =IF(B12<0.2,0,IF(B12<0.5,1,IF(B12<0.8,2,3)))  =F11+C12  =MIN(D12,$C$1)  =D12-E12
13   9   =RAND()  =IF(B13<0.2,0,IF(B13<0.5,1,IF(B13<0.8,2,3)))  =F12+C13  =MIN(D13,$C$1)  =D13-E13
14   10  =RAND()  =IF(B14<0.2,0,IF(B14<0.5,1,IF(B14<0.8,2,3)))  =F13+C14  =MIN(D14,$C$1)  =D14-E14
44   40  =RAND()  =IF(B44<0.2,0,IF(B44<0.5,1,IF(B44<0.8,2,3)))  =F43+C44  =MIN(D44,$C$1)  =D44-E44
45   41  =RAND()  =IF(B45<0.2,0,IF(B45<0.5,1,IF(B45<0.8,2,3)))  =F44+C45  =MIN(D45,$C$1)  =D45-E45
46   42  =RAND()  =IF(B46<0.2,0,IF(B46<0.5,1,IF(B46<0.8,2,3)))  =F45+C46  =MIN(D46,$C$1)  =D46-E46
47   43  =RAND()  =IF(B47<0.2,0,IF(B47<0.5,1,IF(B47<0.8,2,3)))  =F46+C47  =MIN(D47,$C$1)  =D47-E47
48   44  =RAND()  =IF(B48<0.2,0,IF(B48<0.5,1,IF(B48<0.8,2,3)))  =F47+C48  =MIN(D48,$C$1)  =D48-E48
49   45  =RAND()  =IF(B49<0.2,0,IF(B49<0.5,1,IF(B49<0.8,2,3)))  =F48+C49  =MIN(D49,$C$1)  =D49-E49
50   46  =RAND()  =IF(B50<0.2,0,IF(B50<0.5,1,IF(B50<0.8,2,3)))  =F49+C50  =MIN(D50,$C$1)  =D50-E50
51   47  =RAND()  =IF(B51<0.2,0,IF(B51<0.5,1,IF(B51<0.8,2,3)))  =F50+C51  =MIN(D51,$C$1)  =D51-E51
52   48  =RAND()  =IF(B52<0.2,0,IF(B52<0.5,1,IF(B52<0.8,2,3)))  =F51+C52  =MIN(D52,$C$1)  =D52-E52
53   49  =RAND()  =IF(B53<0.2,0,IF(B53<0.5,1,IF(B53<0.8,2,3)))  =F52+C53  =MIN(D53,$C$1)  =D53-E53
54   50  =RAND()  =IF(B54<0.2,0,IF(B54<0.5,1,IF(B54<0.8,2,3)))  =F53+C54  =MIN(D54,$C$1)  =D54-E54

Row 56, Total: C56 is =SUM(C5:C54) and F56 is =SUM(F5:F54)
Row 58, Daily Average: C58 is =C56/50 and F58 is =F56/50
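The worksheet logic above can also be sketched outside Excel. The following Python sketch is only an illustration and is not part of the workbook; the function names `arrivals_from_random` and `simulate_annual_delay_cost` are hypothetical. The thresholds 0.2, 0.5, and 0.8, the capacity of 2, the daily delay cost of 100, and the multiplier 250 are taken from the worksheet formulas.

```python
import random


def arrivals_from_random(u):
    """Map a uniform random number to a number of arrivals (mirrors the nested IF)."""
    if u < 0.2:
        return 0
    if u < 0.5:
        return 1
    if u < 0.8:
        return 2
    return 3


def simulate_annual_delay_cost(days=50, capacity=2, daily_delay_cost=100, rng=None):
    """Run one 50-day trial and return the annual delay cost."""
    if rng is None:
        rng = random.Random()
    delayed = 0          # number delayed from the previous day (column F)
    total_delayed = 0    # running sum of column F
    for _ in range(days):
        arrivals = arrivals_from_random(rng.random())   # column C
        to_unload = delayed + arrivals                  # column D
        unloaded = min(to_unload, capacity)             # column E
        delayed = to_unload - unloaded                  # column F
        total_delayed += delayed
    average_delayed = total_delayed / days              # daily average of column F
    return 250 * average_delayed * daily_delay_cost     # annual delay cost


if __name__ == "__main__":
    print(simulate_annual_delay_cost())
```

Each call plays the role of one worksheet recalculation, so repeated calls give different annual costs unless a seeded generator is passed as `rng`.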


Figure 10.5 Simulation Model Dynamic Histogram Display

50-Day Trial (current worksheet result): $17,000
Minimum: $1,500    Maximum: $58,000    Mean: $12,845    StDev: $9,016

Interval Max    Frequency
5000            11
10000           39
15000           24
20000           10
25000           8
30000           3
35000           2
40000           2
45000           0
50000           0
55000           0
60000           1
65000           0
70000           0
75000           0
80000           0
85000           0
90000           0
95000           0
100000          0
More            0

Trial results displayed (trials 1 through 44):

 1 $12,500    2 $28,000    3 $2,500     4 $16,000
 5 $6,000     6 $9,500     7 $10,500    8 $13,500
 9 $7,000    10 $15,500   11 $21,500   12 $16,000
13 $9,000    14 $4,500    15 $8,500    16 $8,500
17 $15,000   18 $9,000    19 $2,000    20 $10,500
21 $16,500   22 $18,500   23 $8,500    24 $5,500
25 $13,500   26 $5,000    27 $23,500   28 $58,000
29 $11,500   30 $7,000    31 $7,000    32 $6,000
33 $7,500    34 $9,500    35 $7,500    36 $12,500
37 $8,500    38 $14,000   39 $7,000    40 $31,000
41 $22,000   42 $40,000   43 $10,500   44 $8,500

[Chart titled "Simulation": histogram with horizontal axis "Annual Cost of Delays" (5000 to 100000) and vertical axis "Frequency of 100 50-Day Trials" (0 to 45).]


Figure 10.6 Simulation Model Dynamic Histogram Formulas

Worksheet rows 1 through 26, columns L through S:

50-Day Trial (row 1): =I3
Trials 1 through 25 (rows 2 through 26): each cell shows =TABLE(,K1)

Minimum (row 1): =MIN(M2:M101)
Maximum (row 2): =MAX(M2:M101)
Mean (row 4): =AVERAGE(M2:M101)
StDev (row 6): =STDEV(M2:M101)

Interval Max (rows 2 through 21): 5000, 10000, 15000, and so on in steps of 5000 up to 100000; row 22 is labeled More
Frequency (rows 2 through 22): each cell shows =FREQUENCY(M2:M101,R2:R21)
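The data table and FREQUENCY formulas above repeat the 50-day trial 100 times and count the results by interval. A rough Python sketch of the same idea follows. It is an illustration under stated assumptions rather than the workbook's method: the names `one_trial` and `frequency` and the seed 2006 are hypothetical, and the trial logic simply restates the queue formulas of the simulation model.

```python
import random
from bisect import bisect_left
from statistics import mean, stdev


def one_trial(rng, days=50, capacity=2, daily_delay_cost=100):
    """One 50-day trial of the queue model; returns the annual delay cost."""
    delayed = total = 0
    for _ in range(days):
        u = rng.random()
        arrivals = 0 if u < 0.2 else 1 if u < 0.5 else 2 if u < 0.8 else 3
        delayed = max(delayed + arrivals - capacity, 0)
        total += delayed
    return 250 * (total / days) * daily_delay_cost


def frequency(values, bins):
    """Count values per interval as Excel's FREQUENCY does.

    A value is counted in the first interval whose maximum is greater than or
    equal to the value; the extra last count holds values above every maximum.
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts


rng = random.Random(2006)                       # hypothetical seed, for repeatability
trials = [one_trial(rng) for _ in range(100)]   # 100 trials, like M2:M101
bins = list(range(5000, 100001, 5000))          # interval maximums, like R2:R21
counts = frequency(trials, bins)                # 21 counts, the last one is "More"

print("Minimum", min(trials), "Maximum", max(trials))
print("Mean", mean(trials), "StDev", stdev(trials))
for upper, count in zip(bins + ["More"], counts):
    print(upper, count)
```

`statistics.stdev` computes the sample standard deviation, which corresponds to Excel's STDEV.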

Part 3 Decision Trees

Part 3 describes decision tree models, which are particularly useful for sequential decision problems under uncertainty. Documentation and examples are included for the TreePlan decision tree add-in for Excel.

Sensitivity analysis with standard Excel features is used to check decision tree input assumptions regarding probabilities and cash flows.

Subsequent chapters describe value of information and risk attitude.



Chapter 11 Introduction to Decision Trees

A decision tree can be used as a model for sequential decision problems under uncertainty. A decision tree describes graphically the decisions to be made, the events that may occur, and the outcomes associated with combinations of decisions and events. Probabilities are assigned to the events, and values are determined for each outcome. A major goal of the analysis is to determine the best decisions.

11.1 DECISION TREE STRUCTURE

Decision tree models include such concepts as nodes, branches, terminal values, strategy, payoff distribution, certainty equivalent, and the rollback method. The following problem illustrates the basic concepts.

DriveTek Problem, Part A

DriveTek Research Institute discovers that a computer company wants a new tape drive for a proposed new computer system. Since the computer company does not have research people available to develop the new drive, it will subcontract the development to an independent research firm. The computer company has offered a fixed fee for the best proposal for developing the new tape drive. The contract will go to the firm with the best technical plan and the highest reputation for technical competence.

DriveTek Research Institute wants to enter the competition. Management estimates a moderate cost for preparing a proposal, but they are concerned that they may not win the contract.

If DriveTek decides to prepare a proposal, and if they win the contract, their engineers are not sure about how they will develop the tape drive. They are considering three alternative approaches. The first approach is a very expensive mechanical method, and the engineers are certain they can develop a successful model with this approach. A second approach involves electronic components. The engineers think that the electronic approach is a relatively inexpensive method for developing a model of the tape drive, but they are not sure that the results will be adequate to satisfy the contract. A third inexpensive approach uses magnetic components. This magnetic method costs more than the electronic method, and the engineers think that it has a higher chance of success.

DriveTek Research can work on only one approach at a time and has time to try only two approaches. If it tries either the magnetic or electronic method and the attempt fails, the second choice must be the mechanical method to guarantee a successful model.

The management of DriveTek Research needs help in incorporating this information into a decision to proceed or not.

Nodes and Branches

Decision trees have three kinds of nodes and two kinds of branches. A decision node is a point where a choice must be made; it is shown as a square. The branches extending from a decision node are decision branches, each branch representing one of the possible alternatives or courses of action available at that point. The set of alternatives must be mutually exclusive (if one is chosen, the others cannot be chosen) and collectively exhaustive (all possible alternatives must be included in the set).

There are two major decisions in the DriveTek problem. First, the company must decide whether or not to prepare a proposal. Second, if it prepares a proposal and is awarded the contract, it must decide which of the three approaches to try to satisfy the contract.

An event node is a point where uncertainty is resolved (a point where the decision maker learns about the occurrence of an event). An event node, sometimes called a "chance node," is shown as a circle. The event set consists of the event branches extending from an event node, each branch representing one of the possible events that may occur at that point. The set of events must be mutually exclusive (if one occurs, the others cannot occur) and collectively exhaustive (all possible events must be included in the set). Each event is assigned a subjective probability; the sum of probabilities for the events in a set must equal one.

The three sources of uncertainty in the DriveTek problem are: whether it is awarded the contract or not, whether the electronic approach succeeds or fails, and whether the magnetic approach succeeds or fails.

In general, decision nodes and branches represent the controllable factors in a decision problem; event nodes and branches represent uncontrollable factors.

Decision nodes and event nodes are arranged in order of subjective chronology. For example, the position of an event node corresponds to the time when the decision maker learns the outcome of the event (not necessarily when the event occurs).

The third kind of node is a terminal node, representing the final result of a combination of decisions and events. Terminal nodes are the endpoints of a decision tree, shown as the end of a branch on hand-drawn diagrams and as a triangle or vertical bar on computer-generated diagrams.

The following table shows the three kinds of nodes and two kinds of branches used to represent a decision tree.

Figure 11.1 Nodes and Symbols

Type of Node    Written Symbol    Computer Symbol    Node Successor
Decision        square            square             decision branches
Event           circle            circle             event branches
Terminal        endpoint          triangle or bar    terminal value

In the DriveTek problem, the first portion of the decision tree is shown in Figure 11.2.

Figure 11.2 DriveTek Initial Decision and Event

Prepare proposal
    Awarded contract
    Not awarded contract
Don't prepare proposal

If DriveTek is awarded the contract, they must decide which approach to use. For the electronic and magnetic approaches, the result is uncertain, as shown in Figure 11.3. The arrangement of the decision and event branches is called the structure of the decision tree.


Figure 11.3 DriveTek Decisions and Events (Structure)

Prepare proposal
    Awarded contract
        Use mechanical method
        Try electronic method
            Electronic success
            Electronic failure
        Try magnetic method
            Magnetic success
            Magnetic failure
    Not awarded contract


For representing a sequential decision problem, the tree diagram is usually better than the written description. In some decision problems, the choice may be obvious by looking at the diagram. That is, the decision maker may know enough about the desirability of the outcomes (endpoints in the tree) and how likely they are. But usually the next step in the analysis after documenting the structure is to assign values to the endpoints.

11.2 DECISION TREE TERMINAL VALUES

Each terminal node has an associated terminal value, sometimes called a payoff value, outcome value, or endpoint value. Each terminal value measures the result of a scenario: the sequence of decisions and events on a unique path leading from the initial decision node to a specific terminal node. To determine the terminal value, one approach assigns a cash flow value to each decision branch and event branch and then sums the cash flow values on the branches leading to a terminal node to determine the terminal value. Some problems require a more elaborate value model to determine the terminal values.

DriveTek Problem, Part B

DriveTek thinks it will cost $50,000 to prepare a proposal. If they are awarded the contract, DriveTek will receive an immediate payment of $250,000. The engineers think that the sure-success mechanical method will cost $120,000. The possibly-successful electronic approach will cost $50,000, and the more-likely-successful magnetic approach will cost $80,000. These cash flows, which are associated with many of the decision and event branches, are shown in Figure 11.4.

Figure 11.4 DriveTek Cash Flows and Outcome Values

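Each branch shows its cash flow in parentheses; each endpoint shows its terminal value.

Prepare proposal (-$50,000)
    Awarded contract ($250,000)
        Use mechanical method (-$120,000): terminal value $80,000
        Try electronic method (-$50,000)
            Electronic success ($0): terminal value $150,000
            Electronic failure (-$120,000): terminal value $30,000
        Try magnetic method (-$80,000)
            Magnetic success ($0): terminal value $120,000
            Magnetic failure (-$120,000): terminal value $0
    Not awarded contract ($0): terminal value -$50,000
Don't prepare proposal ($0): terminal value $0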

Figure 11.4 also shows the sum of branch cash flows at the endpoints. For example, the $30,000 terminal value on the far right of the diagram is associated with the scenario shown in Figure 11.5.

Figure 11.5 Terminal Value for a Scenario

Branch Type    Branch Name                                   Cash Flow
Decision       Prepare proposal                              –$50,000
Event          Awarded contract                              +$250,000
Decision       Try electronic method                         –$50,000
Event          Electronic failure (Use mechanical method)    –$120,000
               Terminal value                                = +$30,000
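The terminal value for this scenario is simply the sum of the cash flows along the path. A minimal Python sketch of that arithmetic follows; the list layout and the name `terminal_value` are illustrative choices, while the branch names and cash flows are those of the scenario above.

```python
# Each entry: (branch type, branch name, cash flow in dollars)
scenario = [
    ("Decision", "Prepare proposal", -50_000),
    ("Event", "Awarded contract", 250_000),
    ("Decision", "Try electronic method", -50_000),
    ("Event", "Electronic failure (Use mechanical method)", -120_000),
]


def terminal_value(path):
    """Sum the branch cash flows along one path from the root to an endpoint."""
    return sum(cash_flow for _branch_type, _name, cash_flow in path)


print(terminal_value(scenario))  # prints 30000
```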


11.3 DECISION TREE PROBABILITIES

DriveTek Problem, Part C

DriveTek management thinks there is a fifty-fifty chance of winning the contract. The engineers think that the inexpensive electronic method has only a 50% chance of satisfactory results. In their opinion the somewhat more costly magnetic method has a 70% chance of success.

Figure 11.6 DriveTek Probabilities and Terminal Values


Event branches show their probabilities; endpoints show their terminal values.

Prepare proposal
    0.5 Awarded contract
        Try electronic method
            0.5 Electronic success: $150,000
            0.5 Electronic failure: $30,000
        Try magnetic method
            0.7 Magnetic success: $120,000
            0.3 Magnetic failure: $0
    0.5 Not awarded contract: -$50,000


Figure 11.6 is a complete decision tree model.
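A complete model of this kind can be written down as a nested data structure and checked for consistency. The Python sketch below is one possible representation, not a TreePlan format: the tuple layout and the names `check_probabilities` and `count_terminals` are assumptions made for illustration. The labels, probabilities, and terminal values are those of the DriveTek problem, Parts A through C.

```python
from math import isclose

# A node is either a terminal value (a number) or a pair (kind, branches).
# Decision branches are (label, subtree); event branches are (probability, label, subtree).
tree = ("decision", [
    ("Prepare proposal", ("event", [
        (0.5, "Awarded contract", ("decision", [
            ("Use mechanical method", 80_000),
            ("Try electronic method", ("event", [
                (0.5, "Electronic success", 150_000),
                (0.5, "Electronic failure", 30_000),
            ])),
            ("Try magnetic method", ("event", [
                (0.7, "Magnetic success", 120_000),
                (0.3, "Magnetic failure", 0),
            ])),
        ])),
        (0.5, "Not awarded contract", -50_000),
    ])),
    ("Don't prepare proposal", 0),
])


def check_probabilities(node):
    """Return True if the probabilities at every event node sum to one."""
    if not isinstance(node, tuple):
        return True  # terminal value, nothing to check
    kind, branches = node
    if kind == "event" and not isclose(sum(b[0] for b in branches), 1.0):
        return False
    return all(check_probabilities(b[-1]) for b in branches)


def count_terminals(node):
    """Count the endpoints (terminal nodes) of the tree."""
    if not isinstance(node, tuple):
        return 1
    _kind, branches = node
    return sum(count_terminals(b[-1]) for b in branches)


print(check_probabilities(tree), count_terminals(tree))
```

The check enforces the requirement that the probabilities for the events in a set must equal one; the endpoint count corresponds to the number of distinct scenarios.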

Next: How do you decide what choice to make at each decision node?

Concepts: Payoff distribution, certainty equivalent, expected value, rollback method

Chapter 12 Decision Trees Using TreePlan

TreePlan is a decision tree add-in for Microsoft Excel 97 (and later versions of Excel) for Windows and Macintosh. It was developed by Professor Michael R. Middleton at the University of San Francisco and modified for use at Fuqua (Duke) by Professor James E. Smith.

12.1 TREEPLAN INSTALLATION

All of TreePlan’s functionality is in a single file, TreePlan.xla. Depending on your preference, there are three ways to install TreePlan. (These instructions also apply to the other Decision ToolPak add-ins: SensIt.xla and RiskSim.xla.)

Occasional Use

If you plan to use TreePlan on an irregular basis, simply use Excel’s File | Open command to load TreePlan.xla each time you want to use it. You may keep the TreePlan.xla file on a floppy disk, your computer’s hard drive, or a network server.

Selective Use

You can use Excel’s Add-In Manager to install TreePlan. First, copy TreePlan.xla to a location on your computer’s hard drive. Second, if you save TreePlan.xla in the Excel or Office Library subdirectory, go to the third step. Otherwise, run Excel, choose Tools | Add-Ins; in the Add-Ins dialog box, click the Browse button, use the Browse dialog box to specify the location of TreePlan.xla, and click OK. Third, in the Add-Ins dialog box, note that TreePlan is now listed with a check mark, indicating that its menu command will appear in Excel, and click OK.

If you do not plan to use TreePlan for a while and you want to free up main memory, uncheck the box for TreePlan in the Add-In Manager. When you do want to use TreePlan, choose Tools | Add-Ins and check TreePlan’s box.


To remove TreePlan from the Add-In Manager, use Windows Explorer or another file manager to delete TreePlan.xla from the Library subdirectory or from the location you specified when you used the Add-In Manager’s Browse command. The next time you start Excel and choose Tools | Add-Ins, a dialog box will state “Cannot find add-in … treeplan.xla. Delete from list?” Click Yes.

Steady Use

If you want TreePlan’s options immediately available each time you run Excel, use Windows Explorer or another file manager to save TreePlan.xla in the Excel XLStart directory. Alternatively, in Excel you can use Tools | Options | General to specify an alternate startup file location and use a file manager to save TreePlan.xla there. When you start Excel, it tries to open all files in the XLStart directory and in the alternate startup file location.

For additional information visit “TreePlan FAQ” at www.treeplan.com.

After opening TreePlan.xla in Excel, the command "Decision Tree" appears at the bottom of the Tools menu (or, if you have a customized main menu, at the bottom of the sixth main menu item).

12.2 BUILDING A DECISION TREE IN TREEPLAN

You can start TreePlan either by choosing Tools | Decision Tree from the menu bar or by pressing Ctrl+t (hold down the Ctrl key and press t). If the worksheet doesn't have a decision tree, TreePlan prompts you with a dialog box with three options; choose New Tree to begin a new tree. TreePlan draws a default initial decision tree with its upper left corner at the selected cell. For example, the figure below shows the initial tree when $B$2 is selected. (Note that TreePlan writes over existing values in the spreadsheet: begin your tree to the right of the area where your data is stored, and do not subsequently add or delete rows or columns in the tree-diagram area.) In Excel 5 and 95 a terminal node is represented by a triangle instead of a vertical bar.

Figure 12.1 TreePlan Initial Default Decision Tree

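[Worksheet view, rows 1 through 11, columns A through I]

Decision node (choice 1, rollback value 0)
    Decision 1 (cash flow 0, terminal value 0)
    Decision 2 (cash flow 0, terminal value 0)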


Build up a tree by adding or modifying branches or nodes in the default tree. To change the branch labels or probabilities, click on the cell containing the label or probability and type the new label or probability. To modify the structure of the tree (e.g., add or delete branches or nodes in the tree), select the node or the cell containing the node in the tree to modify, and choose Tools | Decision Tree or press Ctrl+t. TreePlan will then present a dialog box showing the available commands.

For example, to add an event node to the top branch of the tree shown above, select the square cell (cell G4) next to the vertical line at the end of a terminal branch and press Ctrl+t. TreePlan then presents this dialog box.

Figure 12.2 TreePlan Terminal Dialog Box

To add an event node to the branch, we change the selected terminal node to an event node by selecting Change to event node in the dialog box, selecting the number of branches (here two), and pressing OK. TreePlan then redraws the tree with a chance node in place of the terminal node.

Figure 12.3

[Worksheet view, rows 1 through 16, columns A through M]

Decision node (choice 1, rollback value 0)
    Decision 1 (cash flow 0, rollback value 0)
        0.5 Event 3 (cash flow 0, terminal value 0)
        0.5 Event 4 (cash flow 0, terminal value 0)
    Decision 2 (cash flow 0, terminal value 0)

The dialog boxes presented by TreePlan vary depending on what you have selected when you choose Tools | Decision Tree or press Ctrl+t. The dialog box shown below is presented when you press Ctrl+t with an event node selected; a similar dialog box is presented when you select a decision node. If you want to add a branch to the selected node, choose Add branch and press OK. If you want to insert a decision or event node before the selected node, choose Insert decision or Insert event and press OK. To get a description of the available commands, click on the Help button.

Figure 12.4

The Copy subtree command is particularly useful when building large trees. If two or more parts of the tree are similar, you can copy and paste "subtrees" rather than building up each part separately. To copy a subtree, select the node at the root of the subtree and choose Copy subtree. This tells TreePlan to copy the selected node and everything to the right of it in the tree. To paste this subtree, select a terminal node and choose Paste subtree. TreePlan then duplicates the specified subtree at the selected terminal node.

Since TreePlan decision trees are built directly in Excel, you can use Excel's commands to format your tree. For example, you can use bold or italic fonts for branch labels: select the cells you want to format and change them using Excel's formatting commands. To help you, TreePlan provides a Select dialog box that appears when you choose Tools | Decision Tree or press Ctrl+t without a node selected. You can also bring up this dialog box by pressing the Select button on the Node dialog box. From here, you can select all items of a particular type in the tree. For example, if you choose Probabilities and press OK, TreePlan selects all cells containing probabilities in the tree. You can then format all of the probabilities simultaneously using Excel's formatting commands. (Because of limitations in Excel, the Select dialog box will not be available when working with very large trees.)

12.3 ANATOMY OF A TREEPLAN DECISION TREE

An example of a TreePlan decision tree is shown below. In the example, a firm must decide (1) whether to prepare a proposal for a possible contract and (2) which method to use to satisfy the contract. The tree consists of decision nodes, event nodes and terminal nodes connected by branches. Each branch is surrounded by cells containing formulas, cell references, or labels pertaining to that branch. You may edit the labels, probabilities, and partial cash flows associated with each branch. The partial cash flows are the amount the firm "gets paid" to go down that branch. Here, the firm pays $50,000 if it decides to prepare the proposal, receives $250,000 up front if awarded the contract, spends $50,000 to try the electronic method, and spends $120,000 on the mechanical method if the electronic method fails.

Figure 12.5


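Decision node (choice 1, rollback EV $20,000)
    Prepare proposal (partial cash flow -$50,000, rollback EV $20,000)
        0.5 Awarded contract (partial cash flow $250,000, rollback EV $90,000), leading to a decision node (choice 2)
            Use mechanical method (partial cash flow -$120,000, terminal value $80,000)
            Try electronic method (partial cash flow -$50,000, rollback EV $90,000)
                0.5 Electronic success (partial cash flow $0, terminal value $150,000)
                0.5 Electronic failure (partial cash flow -$120,000, terminal value $30,000)
        0.5 Not awarded contract (partial cash flow $0, terminal value -$50,000)
    Don't prepare proposal (partial cash flow $0, terminal value $0)

Callouts in the figure:

BRANCH LABELS: Type text in these cells.
DECISION NODES: TreePlan formula for which alternative is optimal.
PARTIAL CASH FLOWS: Enter numbers or formulas in these cells.
PROBABILITIES: Enter numbers or formulas in these cells.
ROLLBACK EVs: TreePlan formula for expected value at this point in the tree.
TERMINAL NODES
EVENT NODES
TERMINAL VALUES: TreePlan formula for sum of partial cash flows along path.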

The trees are "solved" using formulas embedded in the spreadsheet. The terminal values sum all the partial cash flows along the path leading to that terminal node. The tree is then "rolled back" by computing expected values at event nodes and by maximizing at decision nodes; the rollback EVs appear next to each node and show the expected value at that point in the tree. The numbers in the decision nodes indicate which alternative is optimal for that decision. In the example, the "1" in the first decision node indicates that it is optimal to prepare the proposal, and the "2" in the second decision node indicates the firm should try the electronic method because that alternative leads to a higher expected value, $90,000, than the mechanical method, $80,000.
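The rollback calculation described above can be sketched in a few lines of Python. This is an illustration of the method, not TreePlan's own formulas: the nested-tuple layout and the names `rollback` and `best_choice` are assumptions. The labels, probabilities, and terminal values are those of the example tree (mechanical and electronic methods only).

```python
def rollback(node):
    """Expected value at event nodes, maximum at decision nodes."""
    if not isinstance(node, tuple):
        return node  # terminal value
    kind, branches = node
    if kind == "event":
        # branches are (probability, label, subtree)
        return sum(p * rollback(sub) for p, _label, sub in branches)
    # decision node: branches are (label, subtree)
    return max(rollback(sub) for _label, sub in branches)


def best_choice(decision_node):
    """1-based number of the optimal branch, like the number shown in a decision node."""
    _kind, branches = decision_node
    values = [rollback(sub) for _label, sub in branches]
    return values.index(max(values)) + 1


method_decision = ("decision", [
    ("Use mechanical method", 80_000),
    ("Try electronic method", ("event", [
        (0.5, "Electronic success", 150_000),
        (0.5, "Electronic failure", 30_000),
    ])),
])

root = ("decision", [
    ("Prepare proposal", ("event", [
        (0.5, "Awarded contract", method_decision),
        (0.5, "Not awarded contract", -50_000),
    ])),
    ("Don't prepare proposal", 0),
])

print(best_choice(method_decision), rollback(method_decision))
print(best_choice(root), rollback(root))
```

Under these inputs the sketch reproduces the values discussed in the text: the electronic method is branch 2 with an expected value of $90,000, and preparing the proposal is branch 1 with an expected value of $20,000.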

TreePlan has a few options that control the way calculations are done in the tree. To select these options, press the Options button in any of TreePlan's dialog boxes. The first choice is whether to Use Expected Values or Use Exponential Utility Function for computing certainty equivalents. The default is to roll back the tree using expected values. If you choose to use exponential utilities, TreePlan will compute utilities of endpoint cash flows at the terminal nodes and compute expected utilities instead of expected values at event nodes. Expected utilities are calculated in the cell below the certainty equivalents. You may also choose to Maximize (profits) or Minimize (costs) at decision nodes; the default is to maximize profits. If you choose to minimize costs instead, the cash flows are interpreted as costs, and decisions are made by choosing the minimum expected value or certainty equivalent rather than the maximum. See the Help file for details on these options.
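To give a feel for the exponential utility option, the sketch below computes a certainty equivalent for a single event node. TreePlan's exact utility formula and scaling are not given in this section, so the sketch assumes one common textbook form, U(x) = -exp(-x/R), where R is a risk tolerance. The value R = 100,000 and the function names are hypothetical; the outcomes are the electronic method's terminal values.

```python
from math import exp, log


def expected_value(outcomes):
    """outcomes is a list of (probability, value) pairs."""
    return sum(p * x for p, x in outcomes)


def certainty_equivalent(outcomes, risk_tolerance):
    """Certainty equivalent under the assumed utility U(x) = -exp(-x / R)."""
    expected_utility = sum(p * -exp(-x / risk_tolerance) for p, x in outcomes)
    # Invert the utility function: x = -R * ln(-U)
    return -risk_tolerance * log(-expected_utility)


electronic = [(0.5, 150_000), (0.5, 30_000)]
R = 100_000  # hypothetical risk tolerance

print(expected_value(electronic))
print(certainty_equivalent(electronic, R))
```

For a risk-averse decision maker the certainty equivalent is below the expected value, and it approaches the expected value as the risk tolerance grows.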

12.4 STEP-BY-STEP TREEPLAN TUTORIAL

A decision tree can be used as a model for sequential decision problems under uncertainty. A decision tree describes graphically the decisions to be made, the events that may occur, and the outcomes associated with combinations of decisions and events. Probabilities are assigned to the events, and values are determined for each outcome. A major goal of the analysis is to determine the best decisions.

Decision tree models include such concepts as nodes, branches, terminal values, strategy, payoff distribution, certainty equivalent, and the rollback method. The following problem illustrates the basic concepts.

DriveTek Problem

DriveTek Research Institute discovers that a computer company wants a new tape drive for a proposed new computer system. Since the computer company does not have research people available to develop the new drive, it will subcontract the development to an independent research firm. The computer company has offered a fee of $250,000 for the best proposal for developing the new tape drive. The contract will go to the firm with the best technical plan and the highest reputation for technical competence.

DriveTek Research Institute wants to enter the competition. Management estimates a cost of $50,000 to prepare a proposal with a fifty-fifty chance of winning the contract.

However, DriveTek's engineers are not sure about how they will develop the tape drive if they are awarded the contract. Three alternative approaches can be tried. The first approach is a mechanical method with a cost of $120,000, and the engineers are certain they can develop a successful model with this approach. A second approach involves electronic components. The engineers estimate that the electronic approach will cost only $50,000 to develop a model of the tape drive, but with only a 50 percent chance of satisfactory results. A third approach uses magnetic components; this costs $80,000, with a 70 percent chance of success.

DriveTek Research can work on only one approach at a time and has time to try only two approaches. If it tries either the magnetic or electronic method and the attempt fails, the second choice must be the mechanical method to guarantee a successful model.

The management of DriveTek Research needs help in incorporating this information into a decision to proceed or not.


[Source: The tape drive example is adapted from Spurr and Bonini, Statistical Analysis for Business Decisions, Irwin.]

Nodes and Branches

Decision trees have three kinds of nodes and two kinds of branches. A decision node is a point where a choice must be made; it is shown as a square. The branches extending from a decision node are decision branches, each branch representing one of the possible alternatives or courses of action available at that point. The set of alternatives must be mutually exclusive (if one is chosen, the others cannot be chosen) and collectively exhaustive (all possible alternatives must be included in the set).

There are two major decisions in the DriveTek problem. First, the company must decide whether or not to prepare a proposal. Second, if it prepares a proposal and is awarded the contract, it must decide which of the three approaches to try to satisfy the contract.

An event node is a point where uncertainty is resolved (a point where the decision maker learns about the occurrence of an event). An event node, sometimes called a "chance node," is shown as a circle. The event set consists of the event branches extending from an event node, each branch representing one of the possible events that may occur at that point. The set of events must be mutually exclusive (if one occurs, the others cannot occur) and collectively exhaustive (all possible events must be included in the set). Each event is assigned a subjective probability; the sum of probabilities for the events in a set must equal one.

The three sources of uncertainty in the DriveTek problem are: whether it is awarded the contract or not, whether the electronic approach succeeds or fails, and whether the magnetic approach succeeds or fails.

In general, decision nodes and branches represent the controllable factors in a decision problem; event nodes and branches represent uncontrollable factors.

Decision nodes and event nodes are arranged in order of subjective chronology. For example, the position of an event node corresponds to the time when the decision maker learns the outcome of the event (not necessarily when the event occurs).

The third kind of node is a terminal node, representing the final result of a combination of decisions and events. Terminal nodes are the endpoints of a decision tree, shown as the end of a branch on hand-drawn diagrams and as a triangle on computer-generated diagrams.

The following table shows the three kinds of nodes and two kinds of branches used to represent a decision tree.


Figure 12.6 Nodes and Symbols

Type of Node    Written Symbol    Computer Symbol    Node Successor
Decision        square            square             decision branches
Event           circle            circle             event branches
Terminal        endpoint          triangle or bar    terminal value

Terminal Values

Each terminal node has an associated terminal value, sometimes called a payoff value, outcome value, or endpoint value. Each terminal value measures the result of a scenario: the sequence of decisions and events on a unique path leading from the initial decision node to a specific terminal node.

To determine the terminal value, one approach assigns a cash flow value to each decision branch and event branch and then sums the cash flow values on the branches leading to a terminal node. In the DriveTek problem, there are distinct cash flows associated with many of the decision and event branches. Some problems require a more elaborate value model to determine the terminal values.

The following diagram shows the arrangement of branch names, probabilities, and cash flow values on an unsolved tree.


Figure 12.7

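Each branch shows its cash flow in parentheses; event branches also show their probabilities.

Prepare proposal (-$50,000)
    0.5 Awarded contract ($250,000)
        Use mechanical method (-$120,000)
        Try electronic method (-$50,000)
            0.5 Electronic success ($0)
            0.5 Electronic failure (-$120,000)
        Try magnetic method (-$80,000)
            0.7 Magnetic success ($0)
            0.3 Magnetic failure (-$120,000)
    0.5 Not awarded contract ($0)
Don't prepare proposal ($0)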

To build the decision tree, you use TreePlan’s dialog boxes to develop the structure. You enter a branch name, branch cash flow, and branch probability (for an event) in the cells above and below the left side of each branch. As you build the tree diagram, TreePlan enters formulas in other cells.

Building the Tree Diagram

1. Start with a new worksheet. (If no workbook is open, choose File | New. If a workbook is open, choose Insert | Worksheet.)

2. Select cell A1. From the Tools menu, choose Decision Tree. In the TreePlan New dialog box, click the New Tree button. A decision node with two branches appears.


Figure 12.8

Figure 12.9

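[Worksheet view, rows 1 through 9, columns A through G]

Decision node (choice 1, rollback value 0)
    Decision 1 (cash flow 0, terminal value 0)
    Decision 2 (cash flow 0, terminal value 0)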

3. Do not type the quotation marks in the following instructions. Select cell D2, and enter Prepare proposal. Select cell D4, and enter –50000. Select cell D7, and enter Don't prepare proposal.

Figure 12.10

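[Worksheet view, rows 1 through 9, columns A through G]

Decision node (choice 2, rollback value 0)
    Prepare proposal (cash flow -50000, terminal value -50000)
    Don't prepare proposal (cash flow 0, terminal value 0)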

4. Select cell F3. From the Tools menu, choose Decision Tree. In the TreePlan Terminal dialog box, select Change To Event Node, select Two Branches, and click OK. The tree is redrawn.


Figure 12.11

Figure 12.12

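[Worksheet view, rows 1 through 14, columns A through K]

Decision node (choice 2, rollback value 0)
    Prepare proposal (cash flow -50000, rollback value -50000)
        0.5 Event 3 (cash flow 0, terminal value -50000)
        0.5 Event 4 (cash flow 0, terminal value -50000)
    Don't prepare proposal (cash flow 0, terminal value 0)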

5. Select cell H2, and enter Awarded contract. Select cell H4, and enter 250000. Select cell H7, and enter Not awarded contract.


Figure 12.13

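[Worksheet view, rows 1 through 14, columns A through K]

Decision node (choice 1, rollback value 75000)
    Prepare proposal (cash flow -50000, rollback value 75000)
        0.5 Awarded contract (cash flow 250000, terminal value 200000)
        0.5 Not awarded contract (cash flow 0, terminal value -50000)
    Don't prepare proposal (cash flow 0, terminal value 0)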

6. Select cell J3. From the Tools menu, choose Decision Tree. In the TreePlan Terminal dialog box, select Change To Decision Node, select Three Branches, and click OK. The tree is redrawn.

Figure 12.14

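[Worksheet view, rows 1 through 24, columns A through O]

Decision node (choice 1, rollback value 75000)
    Prepare proposal (cash flow -50000, rollback value 75000)
        0.5 Awarded contract (cash flow 250000, rollback value 200000), leading to a decision node (choice 1)
            Decision 5 (cash flow 0, terminal value 200000)
            Decision 6 (cash flow 0, terminal value 200000)
            Decision 7 (cash flow 0, terminal value 200000)
        Not awarded contract (cash flow 0, terminal value -50000)
    Don't prepare proposal (cash flow 0, terminal value 0)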

7. Select cell L2, and enter Use mechanical method. Select cell L4, and enter –120000. Select cell L7, and enter Try electronic method. Select cell L9, and enter –50000. Select cell L12, and enter Try magnetic method. Select cell L14, and enter –80000.

Figure 12.15

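[Worksheet view, rows 1 through 24, columns A through O]

Decision node (choice 1, rollback value 50000)
    Prepare proposal (cash flow -50000, rollback value 50000)
        0.5 Awarded contract (cash flow 250000, rollback value 150000), leading to a decision node (choice 2)
            Use mechanical method (cash flow -120000, terminal value 80000)
            Try electronic method (cash flow -50000, terminal value 150000)
            Try magnetic method (cash flow -80000, terminal value 120000)
        Not awarded contract (cash flow 0, terminal value -50000)
    Don't prepare proposal (cash flow 0, terminal value 0)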

8. Select cell N8. From the Tools menu, choose Decision Tree. In the TreePlan Terminal dialog box, select Change To Event Node, select Two Branches, and click OK. The tree is redrawn.


Figure 12.16

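[Worksheet view, rows 1 through 29, columns A through S]

Decision node (rollback value 50000)
    Prepare proposal (cash flow -50000, rollback value 50000)
        0.5 Awarded contract (cash flow 250000, rollback value 150000), leading to a decision node (choice 2)
            Use mechanical method (cash flow -120000, terminal value 80000)
            Try electronic method (cash flow -50000, rollback value 150000)
                0.5 Event 8 (cash flow 0, terminal value 150000)
                0.5 Event 9 (cash flow 0, terminal value 150000)
            Try magnetic method (cash flow -80000, terminal value 120000)
        Not awarded contract (cash flow 0, terminal value -50000)
    Don't prepare proposal (cash flow 0, terminal value 0)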

9. Select cell P7, and enter Electronic success. Select cell P12, and enter Electronic failure. Select cell P14, and enter -120000.


Figure 12.17

[Tree diagram: the event branches are labeled "Electronic success" (terminal value 150000) and "Electronic failure" (cash flow -120000, terminal value 30000); the rollback value of "Try electronic method" is 90000, and the rollback value of "Prepare proposal" is 35000.]

10. Select cell N18. From the Tools menu, choose Decision Tree. In the TreePlan Terminal dialog box, select Change To Event Node, select Two Branches, and click OK. The tree is redrawn.


Figure 12.18

[Tree diagram: the node following "Try magnetic method" is now an event node with two branches, Event 10 and Event 11, each with probability 0.5 and terminal value 120000.]

11. Select cell P16, and enter .7. Select cell P17, and enter Magnetic success. Select cell P21, and enter .3. Select cell P22, and enter Magnetic failure. Select cell P24, and enter -120000.


Figure 12.19

[Tree diagram: the completed DriveTek tree. "Magnetic success" has probability 0.7 and terminal value 120000; "Magnetic failure" has probability 0.3, cash flow -120000, and terminal value 0. The rollback values are 84000 for "Try magnetic method", 90000 for "Try electronic method", and 20000 for "Prepare proposal".]

12. Double-click the sheet tab (or right-click the sheet tab and choose Rename from the shortcut menu), and enter Original. Save the workbook.

Interpreting the Results

The $30,000 terminal value on the far right of the diagram in cell S13 is associated with the following scenario:

Figure 12.20

Branch Type    Branch Name                                   Cash Flow
Decision       Prepare proposal                               -$50,000
Event          Awarded contract                               $250,000
Decision       Try electronic method                          -$50,000
Event          Electronic failure (Use mechanical method)    -$120,000
               Terminal value                                  $30,000

TreePlan put the formula =SUM(P14,L11,H12,D20) into cell S13 for determining the terminal value.
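The same sum can be checked outside Excel. The following is a minimal Python sketch, not part of TreePlan; the dictionary name and its labels are illustrative assumptions, and the cash flows are those listed in Figure 12.20.

```python
# Partial cash flows on the path to the terminal node in cell S13
# (values taken from Figure 12.20; names are illustrative only).
path_cash_flows = {
    "Prepare proposal": -50_000,
    "Awarded contract": 250_000,
    "Try electronic method": -50_000,
    "Electronic failure (use mechanical method)": -120_000,
}

# The terminal value is the sum of the partial cash flows along the path,
# which is what the worksheet formula =SUM(P14,L11,H12,D20) computes.
terminal_value = sum(path_cash_flows.values())
print(terminal_value)  # 30000
```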


Other formulas, called rollback formulas, are in cells below and to the left of each node. These formulas are used to determine the optimal choice at each decision node.

In cell B26, a formula displays 1, indicating that the first branch is the optimal choice. Thus, the initial choice is to prepare the proposal. In cell J11, a formula displays 2, indicating that the second branch (numbered 1, 2, and 3, from top to bottom) is the optimal choice. If awarded the contract, DriveTek should try the electronic method. A subsequent chapter provides more details about interpretation.

Formatting the Tree Diagram

The following steps show how to use TreePlan and Excel features to format the tree diagram. You may choose to use other formats for your own tree diagrams.

13. From the Edit menu, choose Move or Copy Sheet (or right-click the sheet tab and choose Move Or Copy from the shortcut menu). In the lower left corner of the Move Or Copy dialog box, check the Create A Copy box, and click OK.

14. On sheet Original (2), select cell H9. From the Tools menu, choose Decision Tree. In the TreePlan Select dialog box, verify that the option button for Cells with Probabilities is selected, and click OK. With all probability cells selected, click the Align Left button.

Figure 12.21

15. Select cell H12. From the Tools menu, choose Decision Tree. In the TreePlan Select dialog box, verify that the option button for Cells with Partial Cash Flows is selected, and click OK. With all partial cash flow cells selected, click the Align Left button. With those cells still selected, choose Format | Cells. In the Format Cells dialog box, click the Number tab. In the Category list box, choose Currency; type 0 (zero) for Decimal Places; select $ in the Symbol list box; select -$1,234 for Negative Numbers. Click OK.

Figure 12.22

16. Select cell I12. From the Tools menu, choose Decision Tree. In the TreePlan Select dialog box, verify that the option button for Cells with Rollback EVs/CEs is selected, and click OK. With all rollback cells selected, choose Format | Cells. Repeat the Currency formatting of step 15 above.

17. Select cell S3. From the Tools menu, choose Decision Tree. In the TreePlan Select dialog box, verify that the option button for Cells with Terminal Values is selected, and click OK. With all terminal value cells selected, choose Format | Cells. Repeat the Currency formatting of step 15 above.


Figure 12.23

[Tree diagram: the DriveTek tree with partial cash flows, rollback values, and terminal values displayed in currency format, for example -$120,000 and $80,000.]

18. Double-click the Original (2) sheet tab (or right-click the sheet tab and choose Rename from the shortcut menu), and enter Formatted. Save the workbook.

Displaying Model Inputs

When you build a decision tree model, you may want to discuss the model and its assumptions with co-workers or a client. For such communication it may be preferable to hide the results of formulas that show rollback values and decision node choices. The following steps show how to display only the model inputs.

19. From the Edit menu, choose Move or Copy Sheet (or right-click the sheet tab and choose Move Or Copy from the shortcut menu). In the lower left corner of the Move Or Copy dialog box, check the Create A Copy box, and click OK.

20. On sheet Formatted (2), select cell B1. From the Tools menu, choose Decision Tree. In the TreePlan Select dialog box, verify that the option button for Columns with Nodes is selected, and click OK. With all node columns selected, choose Format | Cells | Number. In the Category list box, select Custom. Select the entry in the Type edit box, and type ;;; (three semicolons). Click OK.


Figure 12.24

Explanation: A custom number format has four sections of format codes. The sections are separated by semicolons, and they define the formats for positive numbers, negative numbers, zero values, and text, in that order. When you specify three semicolons without format codes, Excel does not display positive numbers, negative numbers, zero values, or text. The formula remains in the cell, but its result is not displayed. Later, if you want to display the result, you can change the format without having to enter the formula again. Editing an existing format does not delete it. All formats are saved with the workbook unless you explicitly delete a format.

21. Select cell A27. From the Tools menu, choose Decision Tree. In the TreePlan Select dialog box, verify that the option button for Cells with Rollback EVs/CEs is selected, and click OK. With all rollback values selected, choose Format | Cells | Number. In the Category list box, select Custom. Scroll to the bottom of the Type list box, and select the three-semicolon entry. Click OK.

22. Double-click the Formatted (2) sheet tab (or right-click the sheet tab and choose Rename from the shortcut menu), and enter Model Inputs. Save the workbook.


Printing the Tree Diagram

23. In the Name Box list box, select TreeDiagram (or select cells A1:S34).

24. To print the tree diagram from Excel, with the tree diagram range selected choose File | Print Area | Set Print Area. Choose File | Page Setup. In the Page Setup dialog box, click the Page tab; for Orientation click the option button for Landscape, and for Scaling click the option button for Fit To 1 Page Wide By 1 Page Tall. Click the Header/Footer tab; in the Header list box select None, and in the Footer list box select None (or select other appropriate headers and footers). Click the Sheet tab; clear the check box for Gridlines, and clear the check box for Row And Column Headings. Click OK. Choose File | Print and click OK.

25. To print the tree diagram from Word, clear the check boxes for Gridlines and for Row And Column Headings on Excel’s Page Setup dialog box Sheet tab. Select the tree diagram range. Hold down the Shift key and from the Edit menu choose Copy Picture. In the Copy Picture dialog box, click the option button As Shown When Printed, and click OK. In Word select the location where you want to paste the tree diagram and choose Edit | Paste.

Figure 12.25


[Tree diagram: the DriveTek tree showing branch names, probabilities, and cash flows, with rollback values and decision node choices hidden.]


Alternative Model

If you want to emphasize that the time constraint forces DriveTek to use the mechanical approach if they try either of the uncertain approaches and experience a failure, you can change the terminal nodes in cells R13 and R23 to decision nodes, each with a single branch.

Figure 12.26


[Tree diagram: the alternative model, in which "Electronic failure" and "Magnetic failure" each have cash flow $0 and are followed by a single-branch decision node labeled "Use mechanical method" with cash flow -$120,000.]

12.5 DECISION TREE SOLUTION

Strategy

A strategy specifies an initial choice and any subsequent choices to be made by the decision maker. The subsequent choices usually depend upon events. The specification of a strategy must be comprehensive; if the decision maker gives the strategy to a colleague, the colleague must know exactly which choice to make at each decision node.

Most decision problems have many possible strategies, and a goal of the analysis is to determine the optimal strategy, taking into account the decision maker's risk attitude. There are four strategies in the DriveTek problem. One of the strategies is: Prepare the proposal; if not awarded the contract, stop; if awarded the contract, try the magnetic method; if the magnetic method is successful, stop; if the magnetic method fails, use the mechanical method. The four strategies will be discussed in detail below.


Payoff Distribution

Each strategy has an associated payoff distribution, sometimes called a risk profile. The payoff distribution of a strategy is a probability distribution showing the probability of obtaining each terminal value associated with that strategy.

In decision tree models, the payoff distribution can be shown as a list of possible payoff values, x, and the discrete probability of obtaining each value, P(X=x), where X represents the uncertain terminal value associated with a strategy. Since a strategy specifies a choice at each decision node, the uncertainty about terminal values depends only on the occurrence of events. The probability of obtaining a specific terminal value equals the product of the probabilities on the event branches on the path leading to the terminal node.
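The product rule can be illustrated with a short Python sketch. It is not part of TreePlan; the nested-list representation and the function name are assumptions made for this example, and the numbers are those of the "try electronic" strategy in the DriveTek problem.

```python
from collections import defaultdict

# Once a strategy fixes every choice, only event nodes remain.
# An event node is a list of (probability, outcome) pairs; an outcome is
# either a terminal value (a number) or another event node (a list).
strategy_electronic = [
    (0.5, [(0.5, 150_000), (0.5, 30_000)]),  # awarded, then electronic success or failure
    (0.5, -50_000),                          # not awarded
]

def payoff_distribution(node, path_prob=1.0, dist=None):
    """Return {terminal value: probability} for a tree of event nodes."""
    if dist is None:
        dist = defaultdict(float)
    for prob, outcome in node:
        if isinstance(outcome, list):
            # Multiply probabilities along the path to the next event node.
            payoff_distribution(outcome, path_prob * prob, dist)
        else:
            # Accumulate in case several paths share a terminal value.
            dist[outcome] += path_prob * prob
    return dict(dist)

dist = payoff_distribution(strategy_electronic)
print(dist)  # {150000: 0.25, 30000: 0.25, -50000: 0.5}
```

The probabilities in the result sum to one, which is a useful check on any payoff distribution.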

DriveTek Strategies

In this section each strategy of the DriveTek problem is described by a shorthand statement and a more detailed statement. The possible branches following a specific strategy are shown in decision tree form, and the payoff distribution is shown in a table with an explanation of the probability calculations.


Strategy 1 (Mechanical): Prepare; if awarded, use mechanical.

Details: Prepare the proposal; if not awarded the contract, stop (payoff = -$50,000); if awarded the contract, use the mechanical method (payoff = $80,000).

Figure 12.27





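[Tree diagram: the branches followed under Strategy 1: "Prepare proposal", then either "Awarded contract" and "Use mechanical method" ($80,000) or "Not awarded contract" (-$50,000).]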


Figure 12.28

Value, x    Probability, P(X=x)
 $80,000    0.50
-$50,000    0.50
            1.00


Strategy 2 (Electronic): Prepare; if awarded, try electronic.

Details: Prepare the proposal; if not awarded the contract, stop (payoff = -$50,000); if awarded the contract, try the electronic method; if the electronic method is successful, stop (payoff = $150,000); if the electronic method fails, use the mechanical method (payoff = $30,000).

Figure 12.29





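[Tree diagram: the branches followed under Strategy 2: "Prepare proposal", then either "Awarded contract" and "Try electronic method", ending in "Electronic success" ($150,000) or "Electronic failure" ($30,000), or "Not awarded contract" (-$50,000).]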


Figure 12.30


Value, x    Probability, P(X=x)
$150,000    0.25 = 0.5 * 0.5
 $30,000    0.25 = 0.5 * 0.5
-$50,000    0.50
            1.00


Strategy 3 (Magnetic): Prepare; if awarded, try magnetic.

Details: Prepare the proposal; if not awarded the contract, stop (payoff = -$50,000); if awarded the contract, try the magnetic method; if the magnetic method is successful, stop (payoff = $120,000); if the magnetic method fails, use the mechanical method (payoff = $0).

Figure 12.31





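[Tree diagram: the branches followed under Strategy 3: "Prepare proposal", then either "Awarded contract" and "Try magnetic method", ending in "Magnetic success" ($120,000) or "Magnetic failure" ($0), or "Not awarded contract" (-$50,000).]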


Figure 12.32


Value, x    Probability, P(X=x)
$120,000    0.35 = 0.5 * 0.7
      $0    0.15 = 0.5 * 0.3
-$50,000    0.50
            1.00


Strategy 4 (Don't): Don't.

Details: Don't prepare the proposal (payoff = $0).

Figure 12.33





[Tree diagram: the single branch followed under Strategy 4, "Don't prepare proposal" ($0).]


Figure 12.34


Value, x    Probability, P(X=x)
      $0    1.00
            1.00

Strategy Choice

Since each strategy can be characterized completely by its payoff distribution, selecting the best strategy becomes a problem of choosing the best payoff distribution.

One approach is to make a choice by direct comparison of the payoff distributions.


Figure 12.35

Strategy 1 (Mechanical)
Value, x    P(X=x)
 $80,000    0.50
-$50,000    0.50
            1.00

Strategy 2 (Electronic)
Value, x    P(X=x)
$150,000    0.25
 $30,000    0.25
-$50,000    0.50
            1.00

Strategy 3 (Magnetic)
Value, x    P(X=x)
$120,000    0.35
      $0    0.15
-$50,000    0.50
            1.00

Strategy 4 (Don't)
Value, x    P(X=x)
      $0    1.00
            1.00

Another approach for making choices involves certainty equivalents.

Certainty Equivalent

A certainty equivalent is a certain payoff value which is equivalent, for the decision maker, to a particular payoff distribution. If the decision maker can determine his or her certainty equivalent for the payoff distribution of each strategy, then the optimal strategy is the one with the highest certainty equivalent.

The certainty equivalent is the minimum selling price for a payoff distribution; it depends on the decision maker's personal attitude toward risk. A decision maker may be risk preferring, risk neutral, or risk avoiding.

If the terminal values are not regarded as extreme (relative to the decision maker's total assets), if the decision maker will encounter other decision problems with similar payoffs, and if the decision maker has the attitude that he or she will "win some and lose some," then the decision maker's attitude toward risk may be described as risk neutral.

If the decision maker is risk neutral, the expected value is the appropriate certainty equivalent for choosing among the strategies. Thus, for a risk neutral decision maker, the optimal strategy is the one with the highest expected value.

The expected value of a payoff distribution is calculated by multiplying each terminal value by its probability and summing the products. The expected value calculations for each of the four strategies of the DriveTek problem are shown below.


Figure 12.36

Strategy 1 (Mechanical)
Value, x    P(X=x)    x * P(X=x)
 $80,000    0.50       $40,000
-$50,000    0.50      -$25,000
                       $15,000

Strategy 2 (Electronic)
Value, x    P(X=x)    x * P(X=x)
$150,000    0.25       $37,500
 $30,000    0.25        $7,500
-$50,000    0.50      -$25,000
                       $20,000

Strategy 3 (Magnetic)
Value, x    P(X=x)    x * P(X=x)
$120,000    0.35       $42,000
      $0    0.15            $0
-$50,000    0.50      -$25,000
                       $17,000

Strategy 4 (Don't)
Value, x    P(X=x)    x * P(X=x)
      $0    1.00            $0
                            $0

The four strategies of the DriveTek problem have expected values of $15,000, $20,000, $17,000, and $0. Strategy 2 (Electronic) is the optimal strategy with expected value $20,000.
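The expected value calculations can be reproduced with a few lines of Python. This is a minimal sketch for checking the arithmetic, not part of the Excel workflow; the dictionary layout and function name are assumptions made for this example, and the payoff distributions are those of the four DriveTek strategies.

```python
# Payoff distribution of each strategy: {terminal value: probability}.
strategies = {
    "1 (Mechanical)": {80_000: 0.50, -50_000: 0.50},
    "2 (Electronic)": {150_000: 0.25, 30_000: 0.25, -50_000: 0.50},
    "3 (Magnetic)": {120_000: 0.35, 0: 0.15, -50_000: 0.50},
    "4 (Don't)": {0: 1.00},
}

def expected_value(distribution):
    """Multiply each terminal value by its probability and sum the products."""
    return sum(x * p for x, p in distribution.items())

expected_values = {name: expected_value(d) for name, d in strategies.items()}

# A risk neutral decision maker chooses the highest expected value.
best = max(expected_values, key=expected_values.get)

for name, ev in expected_values.items():
    print(f"Strategy {name}: {ev:,.0f}")
print("Optimal for a risk neutral decision maker:", best)
```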

A risk neutral decision maker's choice is based on the expected value. However, note that if strategy 2 (Electronic) is chosen, the decision maker does not receive $20,000. The actual payoff will be $150,000, $30,000, or -$50,000, with probabilities shown in the payoff distribution.


Rollback Method

If we have a method for determining certainty equivalents (expected values for a risk neutral decision maker), we don't need to examine every possible strategy explicitly. Instead, the method known as rollback determines the single best strategy.

The rollback algorithm, sometimes called backward induction or "average out and fold back," starts at the terminal nodes of the tree and works backward to the initial decision node, determining the certainty equivalent rollback values for each node. Rollback values are determined as follows:

• At a terminal node, the rollback value equals the terminal value.

• At an event node, the rollback value for a risk neutral decision maker is the expected value: each branch probability is multiplied by the rollback value of its successor node, and the products are summed.

• At a decision node, the rollback value is set equal to the highest rollback value on the immediate successor nodes.
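The three rules above can be sketched as a short recursive function. The following Python code is an illustration under stated assumptions, not TreePlan's implementation: the dictionary-based node representation and the helper names terminal, event, decision, and rollback are hypothetical, and the tree uses the DriveTek terminal values and probabilities for a risk neutral decision maker.

```python
def terminal(value):
    return {"type": "terminal", "value": value}

def event(*branches):
    # Each branch is a (probability, successor node) pair.
    return {"type": "event", "branches": list(branches)}

def decision(*branches):
    # Each branch is a (branch name, successor node) pair.
    return {"type": "decision", "branches": list(branches)}

def rollback(node):
    """Return the risk neutral rollback value of a node."""
    if node["type"] == "terminal":
        return node["value"]
    if node["type"] == "event":
        # Expected value of the successor rollback values.
        return sum(p * rollback(child) for p, child in node["branches"])
    if node["type"] == "decision":
        # Highest rollback value among the immediate successors.
        return max(rollback(child) for _, child in node["branches"])
    raise ValueError(f"unknown node type: {node['type']}")

drivetek = decision(
    ("Prepare proposal", event(
        (0.5, decision(
            ("Use mechanical method", terminal(80_000)),
            ("Try electronic method",
             event((0.5, terminal(150_000)), (0.5, terminal(30_000)))),
            ("Try magnetic method",
             event((0.7, terminal(120_000)), (0.3, terminal(0)))),
        )),
        (0.5, terminal(-50_000)),
    )),
    ("Don't prepare proposal", terminal(0)),
)

print(rollback(drivetek))  # 20000.0
```

Each node is visited once, so the work grows with the size of the tree rather than with the number of strategies.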

In TreePlan tree diagrams the rollback values are located to the left and below each decision, event, and terminal node. Terminal values and rollback values for the DriveTek problem are shown below.

Figure 12.37


[Tree diagram: terminal values and rollback values for the DriveTek tree. The rollback values are $80,000 for "Use mechanical method", $90,000 for "Try electronic method", $84,000 for "Try magnetic method", $90,000 for "Awarded contract", -$50,000 for "Not awarded contract", $20,000 for "Prepare proposal", and $0 for "Don't prepare proposal".]


Optimal Strategy

After the rollback method has determined certainty equivalents for each node, the optimal strategy can be identified by working forward through the tree. At the initial decision node, the $20,000 rollback value equals the rollback value of the "Prepare proposal" branch, indicating the alternative that should be chosen. DriveTek will either be awarded the contract or not; there is a subsequent decision only if DriveTek obtains the contract. (In a more complicated decision tree, the optimal strategy must include decision choices for all decision nodes that might be encountered.) At the decision node following "Awarded contract," the $90,000 rollback value equals the rollback value of the "Try electronic method" branch, indicating the alternative that should be chosen. Subsequently, if the electronic method fails, DriveTek must use the mechanical method to satisfy the contract.

Cell B26 has the formula =IF(A27=E20,1,IF(A27=E34,2)) which displays 1, indicating that the first branch is the optimal choice. Thus, the initial choice is to prepare the proposal. Cell J11 has the formula =IF(I12=M4,1,IF(I12=M11,2,IF(I12=M21,3))) which displays 2, indicating that the second branch (numbered 1, 2, and 3, from top to bottom) is the optimal choice. If awarded the contract, DriveTek should try the electronic method.

The pairs of rollback values at the relevant decision nodes ($20,000 and $90,000) and the preferred decision branches are shown below in bold.


Figure 12.38

[Tree diagram: the tree of Figure 12.37 with the decision node choices 1 and 2 displayed; the rollback values $20,000 and $90,000 and the "Prepare proposal" and "Try electronic method" branches are in bold.]

Taking into account event branches with subsequent terminal nodes, all branches and terminal values associated with the optimal risk neutral strategy are shown below.


Figure 12.39





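[Tree diagram: the branches of the optimal risk neutral strategy: "Prepare proposal", then either "Awarded contract" and "Try electronic method", ending in "Electronic success" ($150,000) or "Electronic failure" ($30,000), or "Not awarded contract" (-$50,000).]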


The rollback method has identified strategy 2 (Electronic) as optimal. The rollback value on the initial branch of the optimal strategy is $20,000, which must be the same as the expected value for the payoff distribution of strategy 2. Some of the intermediate calculations for the rollback method differ from the calculations for the payoff distributions, but both approaches identify the same optimal strategy with the same initial expected value. For decision trees with a large number of strategies, the rollback method is more efficient.

12.6 NEWOX DECISION TREE PROBLEM

The Newox Company is considering whether or not to drill for natural gas on its own land. If they drill, their initial expenditure will be $40,000 for drilling costs. If they strike gas, they must spend an additional $30,000 to cap the well and provide the necessary hardware and control equipment. (This $30,000 cost is not a decision; it is associated with the event "strike gas.") If they decide to drill but no gas is found, there are no other subsequent alternatives, so their outcome value is -$40,000.

If they drill and find gas, there are two alternatives. Newox could sell to West Gas, which has made a standing offer of $200,000 to purchase all rights to the gas well's production (assuming that Newox has actually found gas). Alternatively, if gas is found, Newox can decide to keep the well instead of selling to West Gas; in this case Newox manages the gas production and takes its chances by selling the gas on the open market.

At the current price of natural gas, if gas is found it would have a value of $150,000 on the open market. However, there is a possibility that the price of gas will rise to double its current value, in which case a successful well will be worth $300,000.

The company's engineers feel that the chance of finding gas is 30 percent; their staff economist thinks there is a 60 percent chance that the price of gas will double.

12.7 BRANDON DECISION TREE PROBLEM

Brandon Appliance Corporation, a predominant producer of microwave ovens, is considering the introduction of a new product. The new product is a microwave oven that will defrost, cook, brown, and boil food as well as sense when the food is done.

Brandon must decide on a course of action for implementing this new product line. An initial decision must be made to (1) nationally distribute the product from the start, (2) conduct a marketing test first, or (3) not market the product at all. If a marketing test is conducted, Brandon will consider the result and then decide whether to abandon the product line or make it available for national distribution.

The finance department has provided some cost information and probability assignments relating to this decision. The preliminary costs for research and development have already been incurred and are considered irrelevant to the marketing decision. A success nationally will increase profits by $5,000,000, and failure will reduce them by $1,000,000, while abandoning the product will not affect profits. The test market analysis will cost Brandon an additional $35,000.

If a market test is not performed, the probability of success in a national campaign is 60 percent. If the market test is performed, the probability of a favorable test result is 58 percent. With favorable test results, the probability for national success is approximately 93 percent. However, if the test results are unfavorable, the national success probability is approximately 14 percent.

Decision Tree Strategies

Brandon Appliance Corporation must decide on a course of action for implementing this new microwave oven. An initial decision must be made to (1) nationally distribute the product from the start, (2) conduct a marketing test first, or (3) not market the product at all. If a marketing test is conducted, Brandon will consider the result and then decide whether to abandon the product line or make it available for national distribution. The following decision tree is based on information about cash flows and probability assignments.

Figure 12.40

[Tree diagram for the Brandon problem, with terminal values in thousands of dollars:]

National
    Success (0.6)                +$5,000
    Failure (0.4)                -$1,000
Test
    Favorable (0.58)
        National
            Success (0.93)       +$4,965
            Failure (0.07)       -$1,035
        Don't                    -$35
    Unfavorable (0.42)
        National
            Success (0.14)       +$4,965
            Failure (0.86)       -$1,035
        Don't                    -$35
Don't                            $0

In a decision tree model, a strategy is a specification of an initial choice and any subsequent choices that must be made by the decision maker.

How many strategies are there in the Brandon problem?

Describe each strategy.


Figure 12.41 Strategy 1: National

[Tree diagram: the branches of the Brandon tree followed under this strategy.]


Figure 12.42 Strategy 2: Test; if Favorable, National; if Unfavorable, National

[Tree diagram: the branches of the Brandon tree followed under this strategy.]


Figure 12.43 Strategy 3: Test; if Favorable, National; if Unfavorable, Don't

[Tree diagram: the branches of the Brandon tree followed under this strategy.]


Figure 12.44 Strategy 4: Test; if Favorable, Don't; if Unfavorable, National

[Tree diagram: the branches of the Brandon tree followed under this strategy.]


Figure 12.45 Strategy 5: Test; if Favorable, Don't; if Unfavorable, Don't

[Tree diagram: the branches of the Brandon tree followed under this strategy.]


Figure 12.46 Strategy 6: Don't

[Tree diagram: the branches of the Brandon tree followed under this strategy.]

Chapter 13 Sensitivity Analysis for Decision Trees

13.1 ONE-VARIABLE SENSITIVITY ANALYSIS

One-Variable Sensitivity Analysis using an Excel data table

1. Construct a decision tree model or financial planning model.

2. Identify the model input cell (H1) and model output cell (A10).

3. Modify the model so that probabilities will always sum to one. (That is, enter the formula =1-H1 in cell H6.)

Figure 13.1 Display for One-Variable Sensitivity Analysis

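[Tree diagram: "Introduce product" (cash flow -$300) leads to an event node with "High sales" (probability 0.6 in the model input cell H1, cash flow +$600, terminal value +$300) and "Low sales" (probability 0.4 from the formula =1-H1 in cell H6, cash flow +$100, terminal value -$200); "Don't introduce" has terminal value $0. The model output cell A10 displays the rollback value +$100.]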

4. Enter a list of input values in a column (N3:N13).

5. Enter a formula for determining output values at the top of an empty column on the right of the input values (=A10 in cell O2).

6. Select the data table range (N2:O13).


7. From the Data menu choose the Table command.

Figure 13.2

[Worksheet: the input values 0.00, 0.10, ..., 1.00 in cells N3:N13, and the formula =A10 in cell O2, which displays +$100.]

8. In the Data Table dialog box, select the Column Input Cell edit box. Type the model input cell (H1), or point to the model input cell (in which case the edit box displays $H$1). Click OK.

Figure 13.3

9. The Data Table command substitutes each input value into the model input cell, recalculates the worksheet, and displays the corresponding model output value in the table.
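What the Data Table command does in steps 4 through 9 can be mimicked with a loop. The Python sketch below is only an illustration; the function names are assumptions, and the terminal values (+300 for high sales, -200 for low sales, 0 for not introducing) are read from the tree in Figure 13.1.

```python
def rollback_value(p_high):
    """Expected value at the initial decision node for a given P(High sales)."""
    p_low = 1 - p_high                         # same role as =1-H1 in cell H6
    introduce = p_high * 300 + p_low * (-200)  # event node expected value
    dont_introduce = 0
    return max(introduce, dont_introduce)      # decision node takes the best branch

def best_choice(p_high):
    """Text label for the optimal branch, similar to the optional CHOOSE formula."""
    introduce = p_high * 300 + (1 - p_high) * (-200)
    return "Introduce" if introduce >= 0 else "Don't"

# Substitute each input value and record the output, as the data table does.
input_values = [i / 10 for i in range(11)]
table = [(p, rollback_value(p)) for p in input_values]

for p, value in table:
    print(f"{p:.2f}  {value:6.0f}")
```

The loop recomputes the model for each input value, which is the same substitute-and-recalculate idea the data table uses.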

10. Optional: Change the formula in cell O2 to =CHOOSE(B9,"Introduce","Don't").


Figure 13.4

P(High Sales)    Exp. Value
0.00               0
0.10               0
0.20               0
0.30               0
0.40               0
0.50              50
0.60             100
0.70             150
0.80             200
0.90             250
1.00             300

13.2 TWO-VARIABLE SENSITIVITY ANALYSIS

Two-Variable Sensitivity Analysis using an Excel data table

Figure 13.5 Decision Tree for Strategy Region Table

[Tree diagram: the formatted DriveTek tree with base case probabilities 0.50 for "Electronic success" and 0.70 for "Magnetic success"; the decision node following "Awarded contract" displays 2 ("Try electronic method"), and the rollback value of "Prepare proposal" is +$20,000.]


Optional: Activate the Base Case worksheet. From the Edit menu, choose Move Or Copy Sheet. In the Move Or Copy dialog box, check the box for Create A Copy, and click OK. Double-click the new worksheet tab and enter Strategy Region Table.

Setup for Data Table

Select cell P11, and enter the formula =1-P6. Select cell P21, and enter the formula =1-P16.

In cell U3 enter P(Elec OK). In cell V3 enter 1, and in cell V4 enter 0.9. Select cells V3:V4. In the lower right corner of cell V4, click the fill handle and drag down to cell V13. With cells V3:V13 still selected, click the Increase Decimal button once so that all values are displayed with one decimal place.

Select columns V:AG. (Select column V. Click and drag the horizontal scroll bar until column AG is visible. Hold down the Shift key and click column AG.) From the Format menu choose Column | Width. In the Column Width edit box type 5 and click OK.

In cell W1 enter P(Mag OK). In cell W2 enter 0 (zero), and in cell X2 enter 0.1. Select cells W2:X2. In the lower right corner of cell X2, click the fill handle and drag right to cell AG2. With cells W2:AG2 still selected, click the Increase Decimal button once so that all values are displayed with one decimal place.

Select cell V2 and enter the formula =CHOOSE(J11,"Mech","Elec","Mag"). With the base case assumptions the formula shows Elec.

Figure 13.6

[Worksheet: the data table setup, with P(Mag OK) values 0.0 through 1.0 in cells W2:AG2, P(Elec OK) values 1.0 down to 0.0 in cells V3:V13, and the formula in cell V2 displaying Elec.]

Obtaining Results Using Data Table Command

Select the entire data table, cells V2:AG13.


From the Data menu, choose Table. In the Table dialog box, type P16 in the Row Input Cell edit box, type P6 in the Column Input Cell edit box, and click OK.

With cells V2:AG13 still selected, click the Align Right button.
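The two-way data table can be cross-checked with a nested loop. The Python sketch below is an illustration rather than part of the Excel procedure; the function name is an assumption, and the terminal values ($80,000 for mechanical, $150,000 or $30,000 for electronic, $120,000 or $0 for magnetic) are those of the DriveTek tree.

```python
def best_method(p_elec_ok, p_mag_ok):
    """Return the branch with the highest expected value at the method decision node."""
    expected = {
        "Mech": 80_000,
        "Elec": p_elec_ok * 150_000 + (1 - p_elec_ok) * 30_000,
        "Mag": p_mag_ok * 120_000 + (1 - p_mag_ok) * 0,
    }
    # max returns the first key with the highest value, so ties go to the
    # earlier branch, as in the nested IF formula in cell J11.
    return max(expected, key=expected.get)

p_mag_values = [i / 10 for i in range(11)]          # 0.0 to 1.0, left to right
p_elec_values = [i / 10 for i in range(10, -1, -1)]  # 1.0 down to 0.0, top to bottom

region_table = [
    [best_method(p_elec, p_mag) for p_mag in p_mag_values]
    for p_elec in p_elec_values
]

print("      " + " ".join(f"{p:<4.1f}" for p in p_mag_values))
for p_elec, row in zip(p_elec_values, region_table):
    print(f"{p_elec:<5.1f} " + " ".join(f"{name:<4}" for name in row))
```

With the base case probabilities, best_method(0.5, 0.7) returns "Elec", matching the formula result in cell V2.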

Figure 13.7

                  P(Mag OK)
            Elec  0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
P(Elec OK)  1.0   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec
            0.9   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec
            0.8   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec
            0.7   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Mag
            0.6   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Mag   Mag
            0.5   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Mag   Mag   Mag
            0.4   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.3   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.2   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.1   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.0   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag

Embellishments

Select cells U1:AG13, and click the Copy button. Select cell AI1, right-click, and from the shortcut menu choose Paste Special. In the Paste Special dialog box, click the Values option button, and click OK. Right-click again, choose Paste Special, click the Formats option button, and click OK.

Select columns AJ:AU. Choose Format | Column | Width, type 5, and click OK.

Select cell AJ2, right-click, and from the shortcut menu choose Clear Contents. Select cells AK2:AU2, move the cursor near the border of the selection until it becomes an arrow, click and drag the selection down to cells AK14:AU14. Similarly, select cell AK1 and move its contents down to cell AP15. Also, move the contents of cell AI3 to cell AI8. Select cell AN1, and enter Strategy Region Table.


Figure 13.8

                            Strategy Region Table

            1.0   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec
            0.9   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec
            0.8   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec
            0.7   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Mag
            0.6   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Mag   Mag
P(Elec OK)  0.5   Elec  Elec  Elec  Elec  Elec  Elec  Elec  Elec  Mag   Mag   Mag
            0.4   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.3   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.2   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.1   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
            0.0   Mech  Mech  Mech  Mech  Mech  Mech  Mech  Mag   Mag   Mag   Mag
                  0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
                                          P(Mag OK)

Apply borders to appropriate ranges and cells to show the strategy regions. Apply shading to cell AR8 to show the base case strategy.

Figure 13.9

[Strategy Region Table in AI1:AU15 with the same cell contents as Figure 13.8, with borders outlining the Elec, Mech, and Mag regions and shading on the base case cell AR8, where P(Elec OK) = 0.5 and P(Mag OK) = 0.7.]

13.3 MULTIPLE-OUTCOME SENSITIVITY ANALYSIS

Sensitivity Analysis for Multiple-Outcome Event Probabilities

Choose one of the outcome probabilities that will be explicitly changed.

For example, focus on P(Low Sales).


Keep same relative likelihood (base case) for the other probabilities.
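The proportional adjustment can be sketched outside Excel as follows. This is a minimal illustration, not the worksheet itself: the function names are hypothetical, and the endpoint values (+$1,500, +$500, -$500 for Intro and $0 for Don't) are taken from Figure 13.10.

```python
def rescale_probabilities(base, focus, new_value):
    """Set base[focus] to new_value and scale the other probabilities
    so they keep their base-case relative likelihoods."""
    others_total = sum(p for name, p in base.items() if name != focus)
    return {
        name: new_value if name == focus
        else p / others_total * (1.0 - new_value)
        for name, p in base.items()
    }


def expected_value(probs, payoffs):
    """Expected value of a payoff distribution."""
    return sum(probs[name] * payoffs[name] for name in probs)


# Base-case probabilities and Intro endpoint values read from Figure 13.10.
base = {"High Sales": 0.2, "Medium Sales": 0.5, "Low Sales": 0.3}
intro_payoffs = {"High Sales": 1500, "Medium Sales": 500, "Low Sales": -500}
dont_payoff = 0

# One-way sensitivity table: P(Low Sales) from 1.0 down to 0.0.
table = {}
for i in range(10, -1, -1):
    p_low = i / 10
    probs = rescale_probabilities(base, "Low Sales", p_low)
    ev_intro = expected_value(probs, intro_payoffs)
    table[p_low] = "Intro" if ev_intro > dont_payoff else "Don't"

for p_low, strategy in table.items():
    print(f"{p_low:4.2f}  {strategy}")
```

With these inputs the strategy switches from Intro to Don't between P(Low Sales) = 0.60 and 0.70, which agrees with the table in Figure 13.10.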

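Figure 13.10

Decision tree (base case):

Decision node: choice 1 (Intro), expected value +$400
  Intro: cash flow -$1,000, expected value +$400
    High Sales, probability 0.2: cash flow +$2,500, endpoint value +$1,500
    Medium Sales, probability 0.5: cash flow +$1,500, endpoint value +$500
    Low Sales, probability 0.3: cash flow +$500, endpoint value -$500
  Don't: cash flow $0, endpoint value $0

Sensitivity table:

P(Low Sales)  OptStrat
1.00          Don't
0.90          Don't
0.80          Don't
0.70          Don't
0.60          Intro
0.50          Intro
0.40          Intro
0.30          Intro   <- Base
0.20          Intro
0.10          Intro
0.00          Intro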

Figure 13.11

Formulas for the worksheet shown in Figure 13.10:

P(High Sales):    =(0.2/(0.2+0.5))*(1-H11)
P(Medium Sales):  =(0.5/(0.2+0.5))*(1-H11)
P(Low Sales):     0.3 (input value)
OptStrat:         =CHOOSE(B13,"Intro","Don't")

The P(Low Sales) column of the table contains the values 1.00, 0.90, ..., 0.00, with the base case (Base ->) at 0.30.

13.4 ROBIN PINELLI'S SENSITIVITY ANALYSIS

Adapted from Clemen's Making Hard Decisions.

Robin Pinelli is considering three job offers. In trying to decide which to accept, Robin has concluded that three objectives are important in this decision. First, of course, is to maximize disposable income (the amount left after paying for housing, utilities, taxes, and other necessities). Second, Robin likes cold weather and enjoys winter sports. The third objective relates to the quality of the community. Being single, Robin would like to live in a city with a lot of activities and a large population of single professionals.


Developing attributes for these three objectives turns out to be relatively straightforward. Disposable income can be measured directly by calculating monthly take-home pay minus average monthly rent (being careful to include utilities) for an appropriate apartment. The second attribute is annual snowfall. For the third attribute, Robin has located a magazine survey of large cities that scores those cities as places for single professionals to live. Although the survey is not perfect from Robin's point of view, it does capture the main elements of her concern about the quality of the singles community and available activities. Also all three of the cities under consideration are included in the survey.

Here are descriptions of the three job offers:

1 MPR Manufacturing in Flagstaff, Arizona. Disposable income estimate: $1600 per month. Snowfall range: 150 to 320 cm per year. Magazine score: 50 (out of 100).

2 Madison Publishing in St. Paul, Minnesota. Disposable income estimate: $1300 to $1500 per month. (The uncertainty here arises because Robin knows there is a wide variety in apartment rental prices and will not know what is appropriate and available until spending some time in the city.) Snowfall range: 100 to 400 cm per year. Magazine score: 75.

3 Pandemonium Pizza in San Francisco, California. Disposable income estimate: $1200 per month. Snowfall range: negligible. Magazine score: 95.

Robin has created a decision tree to represent the situation. The uncertainties about snowfall and disposable income are represented by the chance nodes that Robin has included in the tree. The ratings in the consequence matrix are such that the worst consequence has a rating of zero points and the best has 100.

Ratings in the consequence matrix (three attribute values at each endpoint of the decision tree) are proportional scores, corresponding to linear individual utility over the range of possible values for each attribute.

After considering the situation, Robin concludes that the quality of the city is most important, the amount of snowfall is next, and the third is income. (Income is important, but the variation between $1200 and $1600 is not enough to make much difference to Robin.) Furthermore, Robin concludes that the weight of the magazine rating in the consequence matrix should be 1.5 times the weight for the snowfall rating and three times the weight for the income rating. This information is used to calculate the weights for the three attributes and to calculate overall scores for each of the endpoints in the decision tree.
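The weight calculation can be illustrated with a short sketch. It assumes, as the worksheet formulas in Figure 13.12 do, that the three weights sum to one; the function names are hypothetical, and the ratings 75, 25, and 56 are those of the first Madison Publishing endpoint in Figure 13.12.

```python
def weights_from_ratios(mag_over_snow, mag_over_income):
    """Derive attribute weights that sum to 1 from the two weight ratios."""
    # From w_mag/mag_over_income + w_mag/mag_over_snow + w_mag = 1.
    w_mag = 1.0 / (1.0 / mag_over_snow + 1.0 / mag_over_income + 1.0)
    return {
        "income": w_mag / mag_over_income,
        "snowfall": w_mag / mag_over_snow,
        "magazine": w_mag,
    }


def overall_score(weights, ratings):
    """Weighted sum of the individual attribute ratings (0 to 100 scale)."""
    return sum(weights[name] * ratings[name] for name in weights)


weights = weights_from_ratios(mag_over_snow=1.5, mag_over_income=3.0)
# Endpoint: Madison Publishing, income $1500, snowfall 100 cm.
score = overall_score(weights, {"income": 75, "snowfall": 25, "magazine": 56})

print({name: round(w, 3) for name, w in weights.items()})
print(round(score, 2))
```

With the base-case ratios the weights come out as approximately 0.167, 0.333, and 0.500, and the endpoint score as approximately 48.83, matching the values displayed in Figure 13.12.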


Figure 13.12 Decision Tree and Multi-Attribute Utility (Robin Pinelli)

Robin Pinelli, Clemen2, pp. 150-151

Weight Ratio Input
V2  Mag/Snow    1.50
V3  Mag/Income  3.00

Weights
V6  Income      0.167
V7  Snowfall    0.333
V8  Magazine    0.500

Non-TreePlan Formulas
V6  =V8/V3
V7  =V8/V2
V8  =1/(1/V2+1/V3+1)
O6  =$V$6*Q6+$V$7*R6+$V$8*S6
Select O6:O10; click and drag fill handle to O51.

Decision tree endpoints (overall utility, followed by individual utility for each attribute):

Alternative         Disp. Income (prob.)  Snowfall (prob.)  Overall  Income  Snowfall  Magazine
Madison Publishing  $1500 (0.60)          100 cm (0.15)     48.83    75      25        56
Madison Publishing  $1500 (0.60)          200 cm (0.70)     57.17    75      50        56
Madison Publishing  $1500 (0.60)          400 cm (0.15)     73.83    75      100       56
Madison Publishing  $1300 (0.40)          100 cm (0.15)     40.50    25      25        56
Madison Publishing  $1300 (0.40)          200 cm (0.70)     48.83    25      50        56
Madison Publishing  $1300 (0.40)          400 cm (0.15)     65.50    25      100       56
MPR Manufacturing                         150 cm (0.15)     29.17    100     37.5      0
MPR Manufacturing                         230 cm (0.70)     35.83    100     57.5      0
MPR Manufacturing                         320 cm (0.15)     43.33    100     80        0
Pandemonium Pizza                                           50.00    0       0         100

Rollback values: Madison Publishing 55.08, MPR Manufacturing 35.96, Pandemonium Pizza 50.00. The decision node shows choice 1 (Madison Publishing).


Figure 13.13 Sensitivity Analysis of Weight-Ratio Input Assumptions

Sensitivity Analysis (worksheet columns X:AH)

Upper table. Rows are the Mag/Snow weight ratio; columns are the Mag/Income weight ratio.

Mag/Snow  1.0      1.5      2.0      2.5      3.0      3.5      4.0      4.5      5.0
1.00      Madison  Madison  Madison  Madison  Madison  Madison  Madison  Madison  Madison
1.25      Madison  Madison  Madison  Madison  Madison  Madison  Madison  Madison  Madison
1.50      Madison  Madison  Madison  Madison  Madison  Madison  Madison  Madison  Madison
1.75      Madison  Madison  Madison  Madison  Madison  Madison  Madison  Pizza    Pizza
2.00      Madison  Madison  Madison  Madison  Madison  Pizza    Pizza    Pizza    Pizza
2.25      Madison  Madison  Madison  Madison  Pizza    Pizza    Pizza    Pizza    Pizza
2.50      Madison  Madison  Madison  Pizza    Pizza    Pizza    Pizza    Pizza    Pizza
2.75      Madison  Madison  Madison  Pizza    Pizza    Pizza    Pizza    Pizza    Pizza
3.00      Madison  Madison  Madison  Pizza    Pizza    Pizza    Pizza    Pizza    Pizza

Lower table. Rows are the Mag/Snow weight ratio; columns are the Mag/Income weight ratio.

Mag/Snow  0.25     0.50     0.75     1.00     1.50     2.00     2.50     3.00     3.50
0.25      MPR      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison
0.50      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Madison
0.75      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Madison
1.00      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Madison
1.25      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Madison
1.50      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Madison
1.75      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Madison
2.00      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Madison  Pizza
2.25      MPR      MPR      MPR      Madison  Madison  Madison  Madison  Pizza    Pizza

Formulas
Y3   =CHOOSE(B34,"Madison","MPR","Pizza")
Y17  =CHOOSE(B34,"Madison","MPR","Pizza")

Data Tables Y3:AH11 and Y17:AH26
V3  Row Input Cell
V2  Column Input Cell
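The two-way data tables can be approximated with the following sketch, which recomputes the preferred job for any pair of weight ratios. It is an illustration under stated assumptions rather than the worksheet's method: the endpoint ratings and probabilities are read from Figure 13.12, and the function names are hypothetical.

```python
def weights(mag_snow, mag_income):
    """Return (income, snowfall, magazine) weights that sum to 1."""
    w_mag = 1.0 / (1.0 / mag_snow + 1.0 / mag_income + 1.0)
    return (w_mag / mag_income, w_mag / mag_snow, w_mag)


# Each endpoint: (probability, (income, snowfall, magazine) ratings).
madison = [
    (p_inc * p_snow, (r_inc, r_snow, 56))
    for p_inc, r_inc in [(0.60, 75), (0.40, 25)]
    for p_snow, r_snow in [(0.15, 25), (0.70, 50), (0.15, 100)]
]
mpr = [(0.15, (100, 37.5, 0)), (0.70, (100, 57.5, 0)), (0.15, (100, 80, 0))]
pizza = [(1.0, (0, 0, 100))]
alternatives = {"Madison": madison, "MPR": mpr, "Pizza": pizza}


def expected_score(endpoints, w):
    """Probability-weighted overall score of one alternative."""
    return sum(
        p * sum(wi * ri for wi, ri in zip(w, ratings))
        for p, ratings in endpoints
    )


def best_job(mag_snow, mag_income):
    """Alternative with the highest expected overall score."""
    w = weights(mag_snow, mag_income)
    return max(alternatives, key=lambda name: expected_score(alternatives[name], w))


# Part of the upper table: rows Mag/Snow, columns Mag/Income.
for mag_snow in (1.50, 1.75, 2.00):
    row = [best_job(mag_snow, mi) for mi in (3.0, 3.5, 4.0, 4.5, 5.0)]
    print(f"{mag_snow:4.2f}", " ".join(f"{name:8s}" for name in row))
```

At the base-case ratios (1.50 and 3.00) the sketch selects Madison with an expected score of about 55.08, consistent with Figure 13.12.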

Chapter 14 Value of Information in Decision Trees

14.1 VALUE OF INFORMATION

Useful concept for

Evaluating potential information-gathering activities

Comparing importance of multiple uncertainties

14.2 EXPECTED VALUE OF PERFECT INFORMATION

Several computational methods

Flipping tree, moving an event set of branches, appropriate for any decision tree

Payoff table, most appropriate only for single-stage tree (one set of uncertain outcomes with no subsequent decisions)

Expected improvement

All three methods start by determining Expected Value Under Uncertainty, EVUU, which is the expected value of the optimal strategy without any additional information.

To use these methods, you need (a) a model of your decision problem under uncertainty with payoffs and probabilities and (b) a willingness to summarize a payoff distribution (payoffs with associated probabilities) using expected value.

The methods can be modified to use certainty equivalents for a decision maker who is not risk neutral.


Expected Value of Perfect Information, Reordered Tree

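Figure 14.1 Structure, Cash Flows, Endpoint Values, and Probabilities

Introduce Product: cash flow -$300,000
  High Sales, probability 0.5: cash flow $700,000, endpoint value $400,000
  Medium Sales, probability 0.3: cash flow $400,000, endpoint value $100,000
  Low Sales, probability 0.2: cash flow $100,000, endpoint value -$200,000
Don't Introduce: cash flow $0, endpoint value $0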

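Figure 14.2 Rollback Expected Values

Decision node: choice 1 (Introduce Product), expected value $190,000
  Introduce Product: expected value $190,000
    High Sales, probability 0.5: $400,000
    Medium Sales, probability 0.3: $100,000
    Low Sales, probability 0.2: -$200,000
  Don't Introduce: $0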

The two figures above show what is called the prior problem, i.e., the decision problem under uncertainty before obtaining any additional information.


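Figure 14.3 Structure Using Perfect Prediction

Perfect Prediction (event node with three branches)
  "High Sales": decision between Introduce Product (followed by High Sales, Medium Sales, or Low Sales) and Don't Introduce
  "Medium Sales": decision between Introduce Product (followed by High Sales, Medium Sales, or Low Sales) and Don't Introduce
  "Low Sales": decision between Introduce Product (followed by High Sales, Medium Sales, or Low Sales) and Don't Introduce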

Before you get a perfect prediction, you are uncertain about what that prediction will be.

If you originally think the probability of High Sales is 0.5, then you should also think the probability is 0.5 that a perfect prediction will tell you that sales will be high.

After you get a prediction of "High Sales," the probability of actually having high sales is 1.0.


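Figure 14.4 Rollback Using Free Perfect Prediction

Perfect Prediction: expected value $230,000
  "High Sales", probability 0.5: choice 1 (Introduce Product), value $400,000
    Introduce Product: High Sales probability 1.0, $400,000; Medium Sales $100,000; Low Sales probability 0.0, -$200,000
    Don't Introduce: $0
  "Medium Sales", probability 0.3: choice 1 (Introduce Product), value $100,000
    Introduce Product: High Sales probability 0.0, $400,000; Medium Sales $100,000; Low Sales probability 0.0, -$200,000
    Don't Introduce: $0
  "Low Sales", probability 0.2: choice 2 (Don't Introduce), value $0
    Introduce Product: High Sales probability 0.0, $400,000; Medium Sales $100,000; Low Sales probability 1.0, -$200,000; expected value -$200,000
    Don't Introduce: $0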

EVUU: Expected Value Under Uncertainty

the expected value of the best strategy without any additional information

EVPP: Expected Value using a (free) Perfect Prediction

EVPI: Expected Value of Perfect Information

EVPI = EVPP – EVUU

In this example, EVPI = $230,000 – $190,000 = $40,000


For a perfect prediction, the information message "Low Sales" is the same as the event Low Sales, so the detailed structure shown above is not needed.

Figure 14.5 Shortcut EVPP

Perfect Prediction: expected value $230,000
  High Sales, probability 0.5: choice 1, value $400,000
    Introduce Product: $400,000
    Don't Introduce: $0
  Medium Sales, probability 0.3: choice 1, value $100,000
    Introduce Product: $100,000
    Don't Introduce: $0
  Low Sales, probability 0.2: choice 2, value $0
    Introduce Product: -$200,000
    Don't Introduce: $0

Expected Value of Perfect Information, Payoff Table

This method is most appropriate only for a single-stage decision tree (one set of uncertain outcomes with no subsequent decisions).

Figure 14.6 Payoff Table for Prior Problem with Expected Values

                              Alternatives
Probability  Event            Introduce   Don't
0.5          High Sales       $400,000    $0
0.3          Medium Sales     $100,000    $0
0.2          Low Sales        -$200,000   $0

             Expected Value   $190,000    $0


For each row in the body of the payoff table, if you receive a perfect prediction that the event in that row will occur, which alternative would you choose and what would your payoff be?

Before you receive the prediction, you don't know which of the payoffs you will receive (either $400,000 or $100,000 or $0), so you summarize the payoff distribution using expected value, EVPP.

Figure 14.7 Payoff Table with EVPP

                              Alternatives           Payoff Using
Probability  Event            Introduce   Don't      Perfect Prediction
0.5          High Sales       $400,000    $0         $400,000
0.3          Medium Sales     $100,000    $0         $100,000
0.2          Low Sales        -$200,000   $0         $0

             Expected Value   $190,000    $0         $230,000
                              EVUU                   EVPP

EVPI = $230,000 – $190,000 = $40,000
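The payoff table method can be sketched in a few lines. This is an illustrative calculation using the probabilities and payoffs of Figure 14.7; the variable and function names are hypothetical.

```python
# Payoff table from Figure 14.7: one list entry per event (High, Medium, Low).
probabilities = [0.5, 0.3, 0.2]
payoffs = {
    "Introduce": [400_000, 100_000, -200_000],
    "Don't": [0, 0, 0],
}


def expected_value(probs, values):
    """Probability-weighted sum of a column of payoffs."""
    return sum(p * v for p, v in zip(probs, values))


# EVUU: expected value of the best alternative chosen before the event is known.
evuu = max(expected_value(probabilities, column) for column in payoffs.values())

# EVPP: for each event, take the best payoff available once the event is known.
payoff_using_perfect_prediction = [
    max(column[i] for column in payoffs.values())
    for i in range(len(probabilities))
]
evpp = expected_value(probabilities, payoff_using_perfect_prediction)

evpi = evpp - evuu
print(round(evuu), round(evpp), round(evpi))
```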

Expected Value of Perfect Information, Expected Improvement

Like the payoff table method, this method is most appropriate only for a single-stage decision tree.

(1) Use the prior decision tree or prior payoff table to find EVUU (the expected value of the best strategy without any additional information).

(2) If you are committed to the best strategy, consider each outcome of the uncertain event and whether you would change your choice if you received a perfect prediction that the event was going to occur.

In the example, you would not change your choice if you are told that sales will be high or medium. However, if you are told that sales will be low, you would change your choice from Introduce to Don't.

(3) Determine how much your payoff will improve in each of the cases.

In the example, your payoff will not improve if you are told that sales will be high or medium, but your payoff will improve by $200,000 (from –$200,000 to $0) if you are told that sales will be low.

(4) Compute expected improvement associated with having the perfect prediction by weighting each improvement by its associated probability.


In the example, the improvements associated with a perfect prediction of high, medium, and low are $0, $0, and $200,000, respectively, with probabilities 0.5, 0.3, 0.2.

EVPI = Expected Improvement = 0.5*0 + 0.3*0 + 0.2*200,000 = $40,000
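Steps (1) through (4) can be sketched as follows for the same example. The names are hypothetical and the code is only an illustration of the expected improvement calculation.

```python
probabilities = {"High Sales": 0.5, "Medium Sales": 0.3, "Low Sales": 0.2}
payoffs = {
    "Introduce": {"High Sales": 400_000, "Medium Sales": 100_000, "Low Sales": -200_000},
    "Don't": {"High Sales": 0, "Medium Sales": 0, "Low Sales": 0},
}


def expected_value(alternative):
    """Expected payoff of one alternative under the prior probabilities."""
    return sum(probabilities[e] * payoffs[alternative][e] for e in probabilities)


# Step 1: the best strategy without additional information.
committed = max(payoffs, key=expected_value)

# Steps 2 and 3: for each event, how much would switching improve the payoff?
improvement = {
    event: max(payoffs[a][event] for a in payoffs) - payoffs[committed][event]
    for event in probabilities
}

# Step 4: weight each improvement by the probability of its event.
evpi = sum(probabilities[e] * improvement[e] for e in probabilities)
print(committed, improvement, round(evpi))
```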

Expected Value of Perfect Information, Single-Season Product

Figure 14.8 Prior Problem, Four Alternatives and Three Outcomes

Single-Season Product

Data
Price        $3.00

             Equip. Size
             None    Small    Medium   Large
Fixed Cost   $0      $1,000   $2,000   $3,000
Var. Cost    $0.00   $0.90    $0.70    $0.50
Capacity     0       4500     5500     6500

Payoff Table
                   Equip. Size
Prob.    Demand    None    Small    Medium   Large
0.3      3000      $0      $5,300   $4,900   $4,500
0.4      4000      $0      $7,400   $7,200   $7,000
0.3      5000      $0      $8,450   $9,500   $9,500

         Exp.Val.  $0      $7,085   $7,200   $7,000

C16 formula: =($B$5-C$9)*MIN(C$10,$B16)-C$8 copied to C16:F18

C20 formula: =SUMPRODUCT($A16:$A18,C16:C18) copied to C20:F20

Figure 14.9 EVPP

                   Equip. Size                         Payoff Using
Prob.    Demand    None    Small    Medium   Large     Perfect Prediction
0.3      3000      $0      $5,300   $4,900   $4,500    $5,300
0.4      4000      $0      $7,400   $7,200   $7,000    $7,400
0.3      5000      $0      $8,450   $9,500   $9,500    $9,500

         Exp.Val.  $0      $7,085   $7,200   $7,000    $7,400

H16 formula: =MAX(C16:F16) copied to H16:H18
C20 formula copied to H20


EVPI = EVPP – EVUU = $7,400 – $7,200 = $200
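The worksheet logic of Figures 14.8 and 14.9 can be mirrored in a short sketch. The payoff function follows the C16 formula (contribution per unit times the smaller of capacity and demand, less fixed cost); the dictionary layout and names are hypothetical.

```python
price = 3.00
# Equipment size: (fixed cost, variable cost per unit, capacity in units).
equipment = {
    "None": (0, 0.00, 0),
    "Small": (1000, 0.90, 4500),
    "Medium": (2000, 0.70, 5500),
    "Large": (3000, 0.50, 6500),
}
demand_probabilities = {3000: 0.3, 4000: 0.4, 5000: 0.3}


def payoff(size, demand):
    """Mirror of =($B$5-C$9)*MIN(C$10,$B16)-C$8 from Figure 14.8."""
    fixed_cost, variable_cost, capacity = equipment[size]
    return (price - variable_cost) * min(capacity, demand) - fixed_cost


def expected_payoff(size):
    """Mirror of the SUMPRODUCT formula in row 20."""
    return sum(p * payoff(size, d) for d, p in demand_probabilities.items())


evuu = max(expected_payoff(size) for size in equipment)
evpp = sum(
    p * max(payoff(size, d) for size in equipment)
    for d, p in demand_probabilities.items()
)
evpi = evpp - evuu
print(round(evuu), round(evpp), round(evpi))
```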

Figure 14.10 Basic Probability Decision Tree

Market Survey
  Success Prediction: Introduce Product (High Sales or Low Sales) or Don't Introduce
  Inconclusive: Introduce Product (High Sales or Low Sales) or Don't Introduce
  Failure Prediction: Introduce Product (High Sales or Low Sales) or Don't Introduce
Don't Survey: Introduce Product (High Sales or Low Sales) or Don't Introduce


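Figure 14.11 DriveTek EVPI Magnetic Success/Failure

Root decision: choice 2 (Perfect Prediction), expected value +$30,500

No Additional Information: expected value +$20,000
  Prepare proposal (choice 1): expected value +$20,000
    Awarded contract, probability 0.5: choice 2 (Try electronic method), +$90,000
      Mechanical method: +$80,000
      Try electronic method: +$90,000 (Electronic success 0.5, +$150,000; Electronic failure 0.5, +$30,000)
      Try magnetic method: +$84,000 (Magnetic success 0.7, +$120,000; Magnetic failure 0.3, $0)
    Not awarded contract, probability 0.5: -$50,000
  Alternative to preparing the proposal: $0

Perfect Prediction: expected value +$30,500
  "Magnetic Success", probability 0.7: Prepare proposal, expected value +$35,000
    Awarded contract, probability 0.5: choice 3 (Try magnetic method), +$120,000
      Mechanical method: +$80,000
      Try electronic method: +$90,000 (Electronic success 0.5, +$150,000; Electronic failure 0.5, +$30,000)
      Try magnetic method: +$120,000 (Magnetic success 1.0, +$120,000; Magnetic failure 0.0, $0)
    Not awarded contract: -$50,000
    Alternative to preparing the proposal: $0
  "Magnetic Failure", probability 0.3: Prepare proposal, expected value +$20,000
    Awarded contract, probability 0.5: choice 2 (Try electronic method), +$90,000
      Mechanical method: +$80,000
      Try electronic method: +$90,000 (Electronic success 0.5, +$150,000; Electronic failure 0.5, +$30,000)
      Try magnetic method: $0 (Magnetic success 0.0, +$120,000; Magnetic failure 1.0, $0)
    Not awarded contract: -$50,000
    Alternative to preparing the proposal: $0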


14.3 DRIVETEK POST-CONTRACT-AWARD PROBLEM

DriveTek decided to prepare the proposal, and it turned out that they were awarded the contract. The $50,000 cost and $250,000 up-front payment are in the past. The current decision is to determine which method to use to satisfy the contract.

The following decision trees show costs as negative cash flows, so the decision criterion is to maximize expected cash flow. An alternative formulation (not shown here) would show all costs as positive values and would minimize expected cost.
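The expected values in the figures of this section can be checked with a short sketch. It uses the endpoint cash flows and probabilities from Figure 14.12 and assumes, as Figure 14.15 does, that the electronic and magnetic outcomes are independent. The function names are hypothetical.

```python
MECHANICAL = -120_000
ELECTRONIC = {"success": -50_000, "failure": -170_000}
MAGNETIC = {"success": -80_000, "failure": -200_000}
P_ELEC_SUCCESS = 0.5
P_MAG_SUCCESS = 0.7


def best_value(p_elec, p_mag):
    """Best expected cash flow given success probabilities for each method."""
    electronic = p_elec * ELECTRONIC["success"] + (1 - p_elec) * ELECTRONIC["failure"]
    magnetic = p_mag * MAGNETIC["success"] + (1 - p_mag) * MAGNETIC["failure"]
    return max(MECHANICAL, electronic, magnetic)


# Expected value under uncertainty (Figure 14.12).
evuu = best_value(P_ELEC_SUCCESS, P_MAG_SUCCESS)

# Perfect prediction of the electronic outcome only (Figure 14.13).
evpp_elec = (P_ELEC_SUCCESS * best_value(1.0, P_MAG_SUCCESS)
             + (1 - P_ELEC_SUCCESS) * best_value(0.0, P_MAG_SUCCESS))

# Perfect prediction of the magnetic outcome only (Figure 14.14).
evpp_mag = (P_MAG_SUCCESS * best_value(P_ELEC_SUCCESS, 1.0)
            + (1 - P_MAG_SUCCESS) * best_value(P_ELEC_SUCCESS, 0.0))

# Perfect prediction of both outcomes, assuming independence (Figure 14.15).
evpp_both = sum(
    pe * pm * best_value(e, m)
    for e, pe in ((1.0, P_ELEC_SUCCESS), (0.0, 1 - P_ELEC_SUCCESS))
    for m, pm in ((1.0, P_MAG_SUCCESS), (0.0, 1 - P_MAG_SUCCESS))
)

for label, evpp in (("Elec", evpp_elec), ("Mag", evpp_mag), ("Both", evpp_both)):
    print(label, round(evpp), "EVPI =", round(evpp - evuu))
```

Applying EVPI = EVPP – EVUU to these values gives the value of a perfect prediction for each uncertainty, so the sketch also shows how EVPI can be used to compare the importance of the two uncertainties.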

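Figure 14.12 EVUU

Decision node: choice 2 (Try electronic), expected value -110000
  Use mechanical: -120000
  Try electronic: cash flow -50000, expected value -110000
    Electronic success, probability 0.5: cash flow 0, endpoint value -50000
    Electronic failure, probability 0.5: cash flow -120000, endpoint value -170000
  Try magnetic: cash flow -80000, expected value -116000
    Magnetic success, probability 0.7: cash flow 0, endpoint value -80000
    Magnetic failure, probability 0.3: cash flow -120000, endpoint value -200000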


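Figure 14.13 EVPP Elec

Perfect prediction of the electronic outcome: expected value -83000
  "Electronic success", probability 0.5: choice 2 (Try electronic), value -50000
    Use mechanical: -120000
    Try electronic: Electronic success probability 1, -50000; Electronic failure probability 0, -170000
    Try magnetic: Magnetic success probability 0.7, -80000; Magnetic failure probability 0.3, -200000
  "Electronic failure", probability 0.5: choice 3 (Try magnetic), value -116000
    Use mechanical: -120000
    Try electronic: Electronic success probability 0, -50000; Electronic failure probability 1, -170000
    Try magnetic: Magnetic success probability 0.7, -80000; Magnetic failure probability 0.3, -200000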

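Figure 14.14 EVPP Mag

Perfect prediction of the magnetic outcome: expected value -89000
  "Magnetic success", probability 0.7: choice 3 (Try magnetic), value -80000
    Use mechanical: -120000
    Try electronic: expected value -110000 (success probability 0.5, -50000; failure probability 0.5, -170000)
    Try magnetic: Magnetic success probability 1, -80000; Magnetic failure probability 0, -200000
  "Magnetic failure", probability 0.3: choice 2 (Try electronic), value -110000
    Use mechanical: -120000
    Try electronic: expected value -110000 (success probability 0.5, -50000; failure probability 0.5, -170000)
    Try magnetic: Magnetic success probability 0, -80000; Magnetic failure probability 1, -200000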


Figure 14.15 EVPP Both

Perfect prediction of both outcomes: expected value -71000
  "Electronic success", probability 0.5: value -50000
    "Magnetic success", probability 0.7: choice 2 (Try electronic), value -50000
    "Magnetic failure", probability 0.3: choice 2 (Try electronic), value -50000
  "Electronic failure", probability 0.5: value -92000
    "Magnetic success", probability 0.7: choice 3 (Try magnetic), value -80000
    "Magnetic failure", probability 0.3: choice 1 (Use mechanical), value -120000

In every subtree the endpoint values are -120000 for Use mechanical, -50000 and -170000 for electronic success and failure, and -80000 and -200000 for magnetic success and failure. The predicted outcome has probability 1 and the other outcome has probability 0.


14.4 SENSITIVITY ANALYSIS VS EVPI

Working Paper Title: Do Sensitivity Analyses Really Capture Problem Sensitivity? An Empirical Analysis Based on Information Value

Authors: James C. Felli, Naval Postgraduate School and Gordon B. Hazen, Northwestern University

Date: March 1998

The most common methods of sensitivity analysis (SA) in decision-analytic modeling are based either on proximity in parameter-space to decision thresholds or on the range of payoffs that accompany parameter variation. As an alternative, we propose the use of the expected value of perfect information (EVPI) as a sensitivity measure and argue from first principles that it is the proper measure of decision sensitivity. EVPI has significant advantages over conventional SA, especially in the multiparametric case, where graphical SA breaks down. In realistically sized problems, simple one- and two-way SAs may not fully capture parameter interactions, raising the disturbing possibility that many published decision analyses might be overconfident in their policy recommendations. To investigate the extent of this potential problem, we re-examined 25 decision analyses drawn from the published literature and calculated EVPI values for parameters on which sensitivity analyses had been performed, as well as the entire set of problem parameters. While we expected EVPI values to indicate greater problem sensitivity than conventional SA due to revealed parameter interaction, we in fact found the opposite: compared to EVPI, the one- and two-parameter SAs accompanying these problems dramatically overestimated problem sensitivity to input parameters. This phenomenon can be explained by invoking the flat maxima principle enunciated by von Winterfeldt and Edwards.

http://www.mccombs.utexas.edu/faculty/jim.dyer/DA_WP/WP980019.pdf

Chapter 15 Value of Imperfect Information

15.1 TECHNOMETRICS PROBLEM

Prior Problem

Technometrics, Inc., a large producer of electronic components, is having some problems with the manufacturing process for a particular component. Under its current production process, 25 percent of the units are defective. The profit contribution of this component is $40 per unit. Under the contract the company has with its customers, Technometrics refunds $60 for each component that the customer finds to be defective; the customers then repair the component to make it usable in their applications. Before shipping the components to customers, Technometrics could spend an additional $30 per component to rework any components thought to be defective (regardless of whether the part is really defective). The reworked components can be sold at the regular price and will definitely not be defective in the customers' applications. Unfortunately, Technometrics cannot tell ahead of time which components will fail to work in their customers' applications. The following payoff table shows Technometrics' net cash flow per component.

Figure 15.1 Payoff Table

Component    Technometrics' Choice
Condition    Ship as is    Rework first
Good         +$40          +$10
Defective    -$20          +$10

What should Technometrics do?

How much should Technometrics be willing to pay for a test that could evaluate the condition of the component before making the decision to ship as is or rework first?


Imperfect Information

An engineer at Technometrics has developed a simple test device to evaluate the component before shipping. For each component, the test device registers positive, inconclusive, or negative. The test is not perfect, but it is consistent for a particular component; that is, the test yields the same result for a given component regardless of how many times it is tested. To calibrate the test device, it was run on a batch of known good components and on a batch of known defective components. The results in the table below, based on relative frequencies, show the probability of a test device result, conditional on the true condition of the component.

Figure 15.2 Likelihoods

               Component Condition
Test Result    Good    Defective
Positive       0.70    0.10
Inconclusive   0.20    0.30
Negative       0.10    0.60

For example, of the known defective components tested, sixty percent had a negative test result.

An analyst at Technometrics suggested using Bayesian revision of probabilities to combine the assessments about the reliability of the test device (shown above) with the original assessment of the components' condition (25 percent defectives).

Technometrics uses expected monetary value for making decisions under uncertainty. What is the maximum (per component) the company should be willing to pay for using the test device?

Probabilities From Relative Frequencies

Figure 15.3 Joint Outcome Table

               Component Condition
Test Result    Good    Defective
Positive
Inconclusive
Negative

Random Process: select a component at random


Six possible outcomes (most detailed description of result of random process), described by test result and component condition

Figure 15.4 Six Possible Outcomes

               Component Condition
Test Result    Good     Defective
Positive       P & G    P & D
Inconclusive   I & G    I & D
Negative       N & G    N & D

Event: a collection of outcomes

We say an event has occurred when the single outcome of the random process is contained in the event.

Five obvious events

For example, the event Good contains three outcomes in left column, and the event Negative contains two outcomes in the bottom row.

400 Components Classified by Test Result and Condition

Figure 15.5 Joint Frequency Table

               Component Condition
Test Result    Good    Defective
Positive       210     10
Inconclusive   60      30
Negative       30      60

Figure 15.6 Joint Frequency Table with Row and Column Totals

               Component Condition
Test Result    Good    Defective    Total
Positive       210     10           220
Inconclusive   60      30           90
Negative       30      60           90

Total          300     100          400


Figure 15.7 Joint Probability Table with Row and Column Totals

               Component Condition
Test Result    Good     Defective    Total
Positive       0.525    0.025        0.550
Inconclusive   0.150    0.075        0.225
Negative       0.075    0.150        0.225

Total          0.750    0.250        1.000


Figure 15.8 Decision Tree Model

EVSI $2.25

Root decision: choice 2 (Test), expected value $27.25

No add'l info: choice 1 (Ship as is), EVUU $25.00
  Ship as is: expected value $25.00
    Good, probability 0.7500: $40.00
    Defective, probability 0.2500: -$20.00
  Rework first: $10.00

Test: EVSP $27.25
  Positive, probability 0.5500: choice 1 (Ship as is), value $37.27
    Ship as is: $37.27 (Good 0.9545, $40.00; Defective 0.0455, -$20.00)
    Rework first: $10.00
  Inconclusive, probability 0.2250: choice 1 (Ship as is), value $20.00
    Ship as is: $20.00 (Good 0.6667, $40.00; Defective 0.3333, -$20.00)
    Rework first: $10.00
  Negative, probability 0.2250: choice 2 (Rework first), value $10.00
    Ship as is: $0.00 (Good 0.3333, $40.00; Defective 0.6667, -$20.00)
    Rework first: $10.00

Revision of Probability

Figure 15.9 Display (worksheet range U1:Y15)

Prior          0.75     0.25               = P(Main)
Likelihood     Good     Bad
Positive       0.7      0.1                = P(Info | Main)
Inconclusive   0.2      0.3
Negative       0.1      0.6

Joint          Good     Bad      Preposterior
Positive       0.525    0.025    0.550     = P(Info)
Inconclusive   0.150    0.075    0.225
Negative       0.075    0.150    0.225

Posterior      Good     Bad
Positive       0.9545   0.0455             = P(Main | Info)
Inconclusive   0.6667   0.3333
Negative       0.3333   0.6667

Figure 15.10 Formulas (worksheet range U1:Y15)

Prior          0.75        0.25                          = P(Main)
Likelihood     Good        Bad
Positive       0.7         0.1                           = P(Info | Main)
Inconclusive   0.2         0.3
Negative       0.1         0.6

Joint          Good        Bad         Preposterior
Positive       =V$1*V3     =W$1*W3     =SUM(V8:W8)       = P(Info)
Inconclusive   =V$1*V4     =W$1*W4     =SUM(V9:W9)
Negative       =V$1*V5     =W$1*W5     =SUM(V10:W10)

Posterior      Good        Bad
Positive       =V8/$X8     =W8/$X8                       = P(Main | Info)
Inconclusive   =V9/$X9     =W9/$X9
Negative       =V10/$X10   =W10/$X10
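The revision of probabilities and the resulting value of the test can also be sketched in code. This is an illustration using the prior, likelihoods, and payoffs given above; the dictionary layout and function name are hypothetical, and the labels EVUU, EVSP, and EVSI follow Figure 15.8.

```python
prior = {"Good": 0.75, "Defective": 0.25}
likelihood = {  # P(test result | condition), Figure 15.2
    "Positive": {"Good": 0.70, "Defective": 0.10},
    "Inconclusive": {"Good": 0.20, "Defective": 0.30},
    "Negative": {"Good": 0.10, "Defective": 0.60},
}
payoff = {  # net cash flow per component, Figure 15.1
    "Ship as is": {"Good": 40, "Defective": -20},
    "Rework first": {"Good": 10, "Defective": 10},
}

# Joint, preposterior, and posterior probabilities (Figure 15.10 formulas).
joint = {
    result: {cond: prior[cond] * likelihood[result][cond] for cond in prior}
    for result in likelihood
}
preposterior = {result: sum(joint[result].values()) for result in joint}
posterior = {
    result: {cond: joint[result][cond] / preposterior[result] for cond in prior}
    for result in joint
}


def best_expected_value(probabilities):
    """Expected value of the better choice under the given probabilities."""
    return max(
        sum(probabilities[cond] * payoff[choice][cond] for cond in probabilities)
        for choice in payoff
    )


evuu = best_expected_value(prior)
evsp = sum(preposterior[r] * best_expected_value(posterior[r]) for r in posterior)
evsi = evsp - evuu
print(round(evuu, 2), round(evsp, 2), round(evsi, 2))
```

Under these inputs the sketch reproduces the values in Figure 15.8: EVUU $25.00, EVSP $27.25, and EVSI $2.25 per component.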

Chapter 16 Modeling Attitude Toward Risk

16.1 RISK UTILITY FUNCTION

A certainty equivalent is a certain payoff value which is equivalent, for the decision maker, to a particular payoff distribution. If the decision maker can determine his or her certainty equivalent for the payoff distribution of each strategy in a decision problem, then the optimal strategy is the one with the highest certainty equivalent.

The certainty equivalent, i.e., the minimum selling price for a payoff distribution, depends on the decision maker's personal attitude toward risk. A decision maker may be risk preferring, risk neutral, or risk avoiding.

If the terminal values are not regarded as extreme relative to the decision maker's total assets, if the decision maker will encounter other decision problems with similar payoffs, and if the decision maker has the attitude that he or she will "win some and lose some," then the decision maker's attitude toward risk may be described as risk neutral.

If the decision maker is risk neutral, the certainty equivalent of a payoff distribution is equal to its expected value. The expected value of a payoff distribution is calculated by multiplying each terminal value by its probability and summing the products.

If the terminal values in a decision situation are extreme or if the situation is "one-of-a-kind" so that the outcome has major implications for the decision maker, an expected value analysis may not be appropriate. Such situations may require explicit consideration of risk.

Unfortunately, it can be difficult to determine one's certainty equivalent for a complex payoff distribution. We can aid the decision maker by first determining his or her certainty equivalent for a simple payoff distribution and then using that information to infer the certainty equivalent for more complex payoff distributions.

A utility function, U(x), can be used to represent a decision maker's attitude toward risk. The values or certainty equivalents, x, are plotted on the horizontal axis; utilities or expected utilities, u or U(x), are on the vertical axis. You can use the plot of the function by finding a value on the horizontal axis, scanning up to the plotted curve, and looking left to the vertical axis to determine the utility.

A typical risk utility function might have the general shape shown below if you draw a smooth curve approximately through the points.

Figure 16.1 Typical Risk Utility Function

[Plot. Horizontal axis: Monetary Value x or Certainty Equivalent, from -$50,000 to $150,000. Vertical axis: Utility U(x) or Expected Utility, from 0.0 to 1.0.]

Since more value generally means more utility, the utility function is monotonically non-decreasing, and its inverse is well-defined. On the plot of the utility function, you locate a utility on the vertical axis, scan right to the plotted curve, and look down to read the corresponding value.

The concept of a payoff distribution, risk profile, gamble, or lottery is important for discussing utility functions. A payoff distribution is a set of payoffs, e.g., x1, x2, and x3, with corresponding probabilities, P(X=x1), P(X=x2), and P(X=x3). For example, a payoff distribution may be represented in decision tree form as shown below.


Figure 16.2 Payoff Distribution Probability Tree

Branch 1: probability P(X=x1), payoff x1
Branch 2: probability P(X=x2), payoff x2
Branch 3: probability P(X=x3), payoff x3

The fundamental property of a utility function is that the utility of the certainty equivalent CE of a payoff distribution is equal to the expected utility of the payoffs, i.e.,

U(CE) = P(X=x1)*U(x1) + P(X=x2)*U(x2) + P(X=x3)*U(x3).

It follows that if you compute the expected utility (EU) of a lottery,

EU = P(X=x1)*U(x1) + P(X=x2)*U(x2) + P(X=x3)*U(x3),

the certainty equivalent of the payoff distribution can be determined using the inverse of the utility function. That is, you locate the expected utility on the vertical axis, scan right to the plotted curve, and look down to read the corresponding certainty equivalent.

If a utility function has been determined, you can use this fundamental property to determine the certainty equivalent of any payoff distribution. Calculations for the Magnetic strategy in the DriveTek problem are shown below. First, using a plot of the utility function, locate each payoff x on the horizontal axis and determine the corresponding utility U(x) on the vertical axis. Second, compute the expected utility EU of the lottery by multiplying each utility by its probability and summing the products. Third, locate the expected utility on the vertical axis and determine the corresponding certainty equivalent CE on the horizontal axis.

Figure 16.3 Calculations Using Risk Utility Function

P(X=x)    x           U(x)    P(X=x)*U(x)
0.50      -$50,000    0.00    0.0000
0.15      $0          0.45    0.0675
0.35      $120,000    0.95    0.3325

                      EU      0.4000
                      CE      -$8,000


16.2 EXPONENTIAL RISK UTILITY

Instead of using a plot of a utility function, an exponential function may be used to represent risk attitude. The general form of the exponential utility function is

U(x) = A – B*EXP(–x/RT).

The risk tolerance parameter RT determines the curvature of the utility function reflecting the decision maker’s attitude toward risk. Subsequent sections cover three methods for determining RT.

EXP is Excel's standard exponential function, i.e., EXP(z) represents the value e raised to the power of z, where e is the base of the natural logarithms.

The parameters A and B determine scaling. After RT is determined, if you want to plot a utility function so that U(High) = 1.0 and U(Low) = 0.0, you can use the following formulas to determine the scaling parameters A and B.

A = EXP (–Low/RT) / [EXP (–Low/RT) – EXP (–High/RT)]

B = 1 / [EXP (–Low/RT) – EXP (–High/RT)]

The inverse function for finding the certainty equivalent CE corresponding to an expected utility EU is

CE = –RT*LN[(A–EU)/B],

where LN(y) represents the natural logarithm of y.

After the parameters A, B, and RT have been determined, the exponential utility function and its inverse can be used to determine the certainty equivalent for any lottery.

Calculations for the Magnetic strategy in the DriveTek problem are shown in Figure 4.


Figure 16.4 Exponential Risk Utility Results

Computed values are displayed with four decimal places, but Excel's 15-digit precision is used in all calculations. For a decision maker with a risk tolerance parameter of $100,000, the payoff distribution for the Magnetic strategy has a certainty equivalent of -$7,676. That is, if the decision maker is facing the payoff distribution shown in A9:B12 in Figure 4, he or she would be willing to pay $7,676 to be relieved of the obligation.

Formulas are shown in Figure 5. To construct the worksheet, enter the text in column A and the monetary values in column B. To define names, select A2:B4, and choose Insert | Name | Create. Similarly, select A6:B7, and choose Insert | Name | Create. Then enter the formulas in B6:B7. Enter formulas in C10 and D10, and copy down. Finally, enter the EU formula in D13 and the CE formula in D15. The defined names are absolute references by default.


Figure 16.5 Exponential Risk Utility Formulas

Figure 6 shows results for the same payoff distribution using a simplified form of the exponential risk utility function with A = 1 and B = 1. This function could be represented as U(x) = 1–EXP(–x/RT) with inverse CE = –RT*LN(1–EU). The utility and expected utility calculations are different, but the certainty equivalent is the same.
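As a cross-check outside Excel, the following Python sketch applies the general form and the simplified form to the Magnetic payoff distribution with RT = $100,000. The scaling range (Low = -$50,000, High = $120,000) is an assumption made for this sketch; the function names are hypothetical. Both forms should give the same certainty equivalent, about -$7,676.

```python
import math

def scaling(low, high, rt):
    """Return (A, B) so that U(low) = 0 and U(high) = 1."""
    e_low, e_high = math.exp(-low / rt), math.exp(-high / rt)
    return e_low / (e_low - e_high), 1.0 / (e_low - e_high)

def utility(x, a, b, rt):
    """General exponential utility U(x) = A - B*EXP(-x/RT)."""
    return a - b * math.exp(-x / rt)

def certainty_equivalent(eu, a, b, rt):
    """Inverse function CE = -RT*LN((A-EU)/B)."""
    return -rt * math.log((a - eu) / b)

RT = 100_000
lottery = [(0.50, -50_000), (0.15, 0), (0.35, 120_000)]  # (probability, payoff)

def lottery_ce(a, b):
    eu = sum(p * utility(x, a, b, RT) for p, x in lottery)
    return certainty_equivalent(eu, a, b, RT)

# ASSUMPTION: Low and High are set to the extreme payoffs of this lottery.
a, b = scaling(-50_000, 120_000, RT)
ce_scaled = lottery_ce(a, b)      # general form with scaled A and B
ce_simple = lottery_ce(1.0, 1.0)  # simplified form with A = 1 and B = 1

print(round(ce_scaled), round(ce_simple))
```

The utilities and expected utilities differ between the two calls, but the scaling cancels in the inverse function, which is why the two certainty equivalents agree.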

Figure 16.6 Simplified Exponential Risk Utility Results


16.3 APPROXIMATE RISK TOLERANCE

The value of the risk tolerance parameter RT is approximately equal to the maximum value of Y for which the decision maker is willing to accept a payoff distribution with equally-likely payoffs of $Y and −$Y/2 instead of accepting $0 for certain.

Figure 16.7 Approximate Risk Tolerance

Play:    0.5 Heads   +$Y
         0.5 Tails   -$Y/2
Don't:   $0

For example, in a personal decision, you may be willing to play the game shown in Figure 7 with equally-likely payoffs of $100 and –$50, but you might not play with payoffs of $100,000 and –$50,000. As the better payoff increases from $100 to $100,000 (and the corresponding worse payoff decreases from –$50 to –$50,000), you reach a value where you are indifferent between playing the game and receiving $0 for certain. At that point, the value of the better payoff is an approximation of RT for an exponential risk utility function describing your risk attitude.

In a business decision for a small company, the company may be willing to play the game with payoffs of $200,000 and –$100,000 but not with payoffs of $20,000,000 and -$10,000,000. Somewhere between a better payoff of $200,000 and $20,000,000, the company would be indifferent between playing the game and not playing, thereby determining the approximate RT for their business decision.
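To see why the indifference value of Y approximates RT, the following Python sketch computes the certainty equivalent of the game for several values of Y. It assumes the simplified exponential utility U(x) = 1 - EXP(-x/RT) and an illustrative RT of $100,000; the function name is hypothetical.

```python
import math

def game_ce(y, rt):
    """CE of a 50-50 game paying +Y or -Y/2 under U(x) = 1 - EXP(-x/RT)."""
    # 1 - EU equals the expected value of EXP(-x/RT) over the two payoffs.
    one_minus_eu = 0.5 * math.exp(-y / rt) + 0.5 * math.exp((y / 2) / rt)
    return -rt * math.log(one_minus_eu)

RT = 100_000  # ASSUMPTION: illustrative risk tolerance
for y in (50_000, 90_000, 100_000, 150_000):
    print(f"Y = {y:>7,}  CE of game = {game_ce(y, RT):>9,.0f}")
```

In this sketch the certainty equivalent changes sign a little below Y = RT, so the game at Y = RT is very slightly unattractive. That small gap is why the assessment is described as approximate.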

16.4 EXACT RISK TOLERANCE USING EXCEL

A simple payoff distribution, called a risk attitude assessment lottery, may be used to determine the decision maker's attitude toward risk. This lottery has equal probability of obtaining each of the two payoffs. It is good practice to use a better payoff at least as large as the highest payoff in the decision problem and a worse payoff as small as or smaller than the lowest payoff. In any case, the payoffs should be far enough apart that the decision maker perceives a definite difference in the two outcomes. Three values must be specified for the fifty-fifty lottery: the Better payoff, the Worse payoff, and the Certainty Equivalent, as shown in Figure 8.


Figure 16.8 Risk Attitude Assessment Lottery

Certainty Equivalent  =   0.5   Better Payoff
                          0.5   Worse Payoff

According to the fundamental property of a risk utility function, the utility of the certainty equivalent equals the expected utility of the lottery, so the three values are related as follows.

U(CertEquiv) = 0.5*U(BetterPayoff) + 0.5*U(WorsePayoff)

If you use the general form for an exponential utility function with parameters A, B, and RT, and if you simplify terms, it follows that RT must satisfy the following equation.

Exp(–CertEquiv/RT) = 0.5*Exp(–BetterPayoff/RT) + 0.5*Exp(–WorsePayoff/RT)

Given the values for CE, Better, and Worse, you could use trial-and-error to find the value of RT that exactly satisfies the equation. In Excel you can use Goal Seek or Solver by creating a worksheet like Figure 9.

Enter the text in column A. Enter the assessment lottery values in B2:B4. Enter a tentative RT value in B6. Select A2:B4, and use Insert | Name | Create; repeat for A6:B6 and A8:B9. Note that the parentheses symbol is not allowed in a defined name, so Excel changes U(CE) to U_CE and EU(Lottery) to EU_Lottery.


Figure 16.9 Formulas for Risk Tolerance Search

Figure 16.10 Tentative Values for Risk Tolerance Search

Figure 10 shows tentative values for the search. From the Tools menu, choose Goal Seek. In the Goal Seek dialog box, enter B11, 0, and B6. If you point to cells, the reference appears in the edit box as an absolute reference, as shown in Figure 11. Click OK.


Figure 16.11 Goal Seek Dialog Box

The Goal Seek Status dialog box shows that a solution has been found. Click OK. The worksheet appears as shown in Figure 12.

Figure 16.12 Results of Goal Seek Search

The difference between U(CE) and EU(Lottery) is not exactly zero. If you start at $250,000, Goal Seek converges to a difference of –6.2E–05, or –0.000062, which is closer to zero, resulting in an RT of $243,041.

If extra precision is needed, use Solver. With Solver's default settings, the difference is 2.39E–08 with RT equal to $243,261. If you change the precision from 0.000001 to 0.00000001 or an even smaller value in Solver's Options, the difference will be even closer to zero.
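The same search can be sketched outside Excel with a simple bisection. This is not the method used by Goal Seek, Solver, or risktol.xla; it is a minimal illustration that handles only the risk-averse case (certainty equivalent below the lottery's expected value), and the function names are hypothetical. The inputs are the 50-50 lottery between $0 and $100,000 with a certainty equivalent of $43,000, for which the RISKTOL example in Section 16.9 reports 176226.

```python
import math

def lottery_ce(rt, worse, better, p_better=0.5):
    """CE of a two-payoff lottery under exponential utility with parameter rt."""
    # Payoffs are shifted by `worse` so every exponent is <= 0 (no overflow).
    term = p_better * math.exp(-(better - worse) / rt) + (1.0 - p_better)
    return worse - rt * math.log(term)

def solve_rt(worse, cert_equiv, better, p_better=0.5, rel_tol=1e-12):
    """Bisection search for RT (sketch: risk-averse case only)."""
    expected_value = p_better * better + (1.0 - p_better) * worse
    if not worse < cert_equiv < expected_value:
        raise ValueError("need worse < cert_equiv < expected value")
    lo = hi = better - worse
    # The lottery's CE rises with RT, so bracket the target from both sides.
    for _ in range(200):
        if lottery_ce(lo, worse, better, p_better) <= cert_equiv:
            break
        lo /= 2.0
    else:
        raise ArithmeticError("could not bracket RT from below")
    for _ in range(200):
        if lottery_ce(hi, worse, better, p_better) >= cert_equiv:
            break
        hi *= 2.0
    else:
        raise ArithmeticError("could not bracket RT from above")
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if lottery_ce(mid, worse, better, p_better) < cert_equiv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rt = solve_rt(0, 43_000, 100_000)
print(round(rt))
```

Searching on the certainty equivalent rather than on the utility difference keeps the target in dollar units, so the stopping tolerance is easier to interpret.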


16.5 EXACT RISK TOLERANCE USING RISKTOL.XLA

The Goal Seek and Solver methods for determining the risk tolerance parameter RT yield static results. For a dynamic result, use the risktol.xla add-in function. A major advantage of risktol.xla is that it facilitates sensitivity analysis. Whenever an input to the function changes, the result is recalculated. The function syntax is

RISKTOL(WorsePayoff,CertEquiv,BetterPayoff,BetterProb).

When you open the risktol.xla file, the function is added to the Math & Trig function category list.

The function returns a very precise value of the risk tolerance parameter for an exponential utility function. The result is consistent with CertEquiv as the decision maker’s certainty equivalent for a two-payoff assessment lottery with payoffs WorsePayoff and BetterPayoff, with probability BetterProb of obtaining BetterPayoff and probability 1 − BetterProb of obtaining WorsePayoff.

In case of an error, the RISKTOL function returns:

#N/A if there are too few or too many arguments. The first three arguments (WorsePayoff, CertEquiv, and BetterPayoff) are required; the fourth argument (BetterProb) is optional, with default value 0.5.

#VALUE! if WorsePayoff >= CertEquiv, or CertEquiv >= BetterPayoff, or BetterProb (if specified) <= 0 or >= 1.

#NUM! if the search procedure fails to converge.

In Figure 13, the text in cells A2:A4 has been used as defined names for cells B2:B4, and the text in cell A6 is the defined name for cell B6, as shown in the name box. After opening the risktol.xla file, enter the function name and arguments, as shown in the formula bar. If one of the three inputs changes, the result in cell B6 is recalculated.

Figure 16.13 Exact Risk Tolerance Using RiskTol.xla


16.6 EXPONENTIAL UTILITY AND TREEPLAN

TreePlan's default is to roll back the tree using expected values. If you choose to use exponential utilities in TreePlan's Options dialog box, TreePlan will redraw the decision tree diagram with formulas for computing the utility and certainty equivalent at each node. For the Maximize option, the rollback formulas are U = A-B*EXP(-X/RT) and CE = -LN((A-EU)/B)*RT, where X and EU are cell references. For the Minimize option, the formulas are U = A-B*EXP(X/RT) and CE = LN((A-EU)/B)*RT.

TreePlan uses the name RT to represent the risk tolerance parameter of the exponential utility function. The names A and B determine scaling. If the names A, B, and RT don't exist on the worksheet when you choose to use exponential utility, they are initially defined as A=1, B=1, and RT=999999999999. You can redefine the names using the Insert | Name | Define or Insert | Name | Create commands.
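The Maximize and Minimize rollback formulas can be compared in a short Python sketch. The function names and the sample payoffs are illustrative assumptions and are not part of TreePlan. With the initial definitions A=1, B=1, and RT=999999999999, the certainty equivalent is essentially the expected value, so the tree rolls back as if the decision maker were risk neutral.

```python
import math

DEFAULT_RT = 999_999_999_999  # initial value TreePlan assigns to the name RT

def rollback_max(payoffs, probs, a=1.0, b=1.0, rt=DEFAULT_RT):
    """Maximize option: U = A-B*EXP(-X/RT), CE = -LN((A-EU)/B)*RT."""
    eu = sum(p * (a - b * math.exp(-x / rt)) for p, x in zip(probs, payoffs))
    return -math.log((a - eu) / b) * rt

def rollback_min(costs, probs, a=1.0, b=1.0, rt=DEFAULT_RT):
    """Minimize option: U = A-B*EXP(X/RT), CE = LN((A-EU)/B)*RT."""
    eu = sum(p * (a - b * math.exp(x / rt)) for p, x in zip(probs, costs))
    return math.log((a - eu) / b) * rt

payoffs, probs = [-50_000, 0, 120_000], [0.50, 0.15, 0.35]
print(round(rollback_max(payoffs, probs)))              # close to the expected value
print(round(rollback_max(payoffs, probs, rt=100_000)))  # risk-averse payoff CE
print(round(rollback_min([50_000, 0], [0.5, 0.5], rt=100_000)))  # cost CE above expected cost
```

For the Minimize option, a risk-averse decision maker's certainty equivalent cost is higher than the expected cost, which mirrors the payoff case where the certainty equivalent is lower than the expected value.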

To plot the utility curve, enter a list of X values in a column on the left, and enter the formula =A−B*EXP(−X/RT) in a column on the right, where X is a reference to the corresponding cell on the left. Select the values in both columns, and use the ChartWizard to develop an XY (Scatter) chart.

If RT is specified using approximate risk tolerance values, you can perform sensitivity analysis by (1) using the defined name RT for a cell, (2) constructing a data table with a list of possible RT values and an appropriate output formula (usually a choice indicator at a decision node or a certainty equivalent), and (3) specifying the RT cell as the input cell in the Data Table dialog box.

16.7 EXPONENTIAL UTILITY AND RISKSIM

After using RiskSim to obtain model output results, select the column containing the Sorted Data, copy to the clipboard, select a new sheet, and paste. Alternatively, you can use the unsorted values, and you can also do the following calculations on the original sheet containing the model results. This example uses only ten iterations; 500 or 1,000 iterations are more appropriate.

Use one of the methods described previously to specify values of RT, A, and B. Since the model output values shown in Figures 14 and 15 range from approximately $14,000 to $176,000, the utility function is defined for a range from worse payoff $0 to better payoff $200,000. RT was determined using risktol.xla with a risk-seeking certainty equivalent of $110,000.

To obtain the utility of each model output value in cells A2:A11, select cell B2, and enter the formula =A−B*EXP(−A2/RT). Select cell B2, click the fill handle in the lower right corner of the cell and drag down to cell B11. Enter the formulas in cells A13:C13 and the labels in row 14.


Figure 16.14 Risk Utility Formulas for RiskSim

Row   A                  B                   C
1     Sorted Data        Utility
2     14229.56           =A-B*EXP(-A2/RT)
3     32091.92           =A-B*EXP(-A3/RT)
4     51091.48           =A-B*EXP(-A4/RT)
5     66383.79           =A-B*EXP(-A5/RT)
6     69433.32           =A-B*EXP(-A6/RT)
7     87322.23           =A-B*EXP(-A7/RT)
8     95920.93           =A-B*EXP(-A8/RT)
9     135730.71          =A-B*EXP(-A9/RT)
10    154089.36          =A-B*EXP(-A10/RT)
11    175708.87          =A-B*EXP(-A11/RT)
12
13    =AVERAGE(A2:A11)   =AVERAGE(B2:B11)    =-LN((A-B13)/B)*RT
14    Exp. Value         Exp.Util.           CE

Figure 16.15 Risk Utility Results for RiskSim

Row   A             B           C
1     Sorted Data   Utility
2     $14,230       0.05862
3     $32,092       0.13462
4     $51,091       0.21851
5     $66,384       0.28841
6     $69,433       0.30260
7     $87,322       0.38767
8     $95,921       0.42966
9     $135,731      0.63382
10    $154,089      0.73363
11    $175,709      0.85600
12
13    $88,200       0.40435     $90,757
14    Exp. Value    Exp.Util.   CE
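The worksheet calculations of Figures 16.14 and 16.15 can be sketched in Python as follows. The text does not state the numerical value of RT, so the value below is an assumption: an approximation back-solved for this sketch from the assessment lottery described above ($0 or $200,000 with a certainty equivalent of $110,000). It is negative because the assessed attitude is risk seeking. Under that assumption the results agree with the figure to about the displayed precision.

```python
import math

# Model output values from column A of Figure 16.14.
outputs = [14229.56, 32091.92, 51091.48, 66383.79, 69433.32,
           87322.23, 95920.93, 135730.71, 154089.36, 175708.87]

LOW, HIGH = 0.0, 200_000.0
RT = -496_670.0  # ASSUMPTION: back-solved approximation, not stated in the text

# Scaling parameters so that U(LOW) = 0 and U(HIGH) = 1.
e_low, e_high = math.exp(-LOW / RT), math.exp(-HIGH / RT)
A = e_low / (e_low - e_high)
B = 1.0 / (e_low - e_high)

utilities = [A - B * math.exp(-x / RT) for x in outputs]  # column B
exp_value = sum(outputs) / len(outputs)                   # =AVERAGE(A2:A11)
exp_util = sum(utilities) / len(utilities)                # =AVERAGE(B2:B11)
cert_equiv = -math.log((A - exp_util) / B) * RT           # =-LN((A-B13)/B)*RT

print(f"{exp_value:,.0f}  {exp_util:.5f}  {cert_equiv:,.0f}")
```

The certainty equivalent exceeds the expected value, as expected for a risk-seeking utility function.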


16.8 RISK SENSITIVITY FOR MACHINE PROBLEM

Figure 16.16

Rows 2 through 40 of the worksheet are shown; columns B and C hold the Process 1 values and columns F and G hold the Process 2 values.

Row   Process 1 NPV   Utility     Process 2 NPV   Utility
2     $107,733        0.102133    $86,161         0.082554
3     $39,389         0.038623    $58,417         0.056744
4     $125,210        0.117689    $171,058        0.157228
5     $66,032         0.063899    $263,843        0.231906
6     $32,504         0.031982    $37,180         0.036498
7     $138,132        0.129016    $254,027        0.224329
8     $83,000         0.079649    $118,988        0.112181
9     $48,178         0.047036    $133,862        0.125289
10    $20,130         0.019928    $26,597         0.026247
11    $31,445         0.030956    $187,063        0.170608
12    $19,739         0.019546    $88,060         0.084294
13    $4,641          0.00463     $114,837        0.108489
14    $92,368         0.08823     $130,638        0.122465
15    $102,585        0.097498    $138,882        0.12967
16    $106,411        0.100945    $226,909        0.203006
17    $110,528        0.104639    $156,102        0.144528
18    $171,524        0.15762     $193,209        0.17569
19    $87,698         0.083963    $92,004         0.087898
20    $123,907        0.116538    $163,780        0.151071
21    $69,783         0.067404    $22,176         0.021932
22    $144,052        0.134157    $135,190        0.12645
23    $131,461        0.123187    $61,013         0.059189
24    $34,938         0.034335    $184,907        0.168819
25    $75,551         0.072768    $70,967         0.068507
26    $32,144         0.031633    -$10,251        -0.010304
27    $61,719         0.059853    $89,645         0.085744
28    $139,568        0.130266    $119,405        0.112551
29    $89,107         0.085252    $96,670         0.092144
30    $94,158         0.089861    $114,124        0.107853
31    $81,459         0.07823     $208,778        0.188425
32    $139,258        0.129997    $24,580         0.02428
33    $58,190         0.056529    $155,958        0.144405
34    -$13,104        -0.01319    $198,519        0.180056
35    $36,529         0.035869    $167,568        0.154281
36    $91,239         0.0872      $36,676         0.036011
37    $147,155        0.13684     $225,777        0.202104
38    $154,168        0.142872    $195,738        0.177773
39    $180,770        0.165372    $53,467         0.052063
40    $112,313        0.106235    $213,920        0.192587

RT (cell I2): $1,000,000
AJS, Clemen2, pp. 428-430

              Process 1   Process 2
ExpUtility    0.085527    0.107258
CertEquiv     $89,407     $113,458
ExpValue      $90,526     $116,159

Goal Seek
CE2 - CE1     $24,050

NPV values from RiskSim Summary
Cell I2 has defined name RT

Formulas
C2    =1-EXP(-B2/RT)          Copy down to C1001
G2    =1-EXP(-F2/RT)          Copy down to G1001
J6    =AVERAGE(C2:C1001)
K6    =AVERAGE(G2:G1001)
J8    =-RT*LN(1-J6)
K8    =-RT*LN(1-K6)
J10   =AVERAGE(B2:B1001)
K10   =AVERAGE(F2:F1001)
J16   =K8-J8


Figure 16.17

RiskTolerance   CE Process 1   CE Process 2
$5,000          -$25,597       -$37,262
$10,000         $3,504         -$10,097
$15,000         $23,904        $10,897
$20,000         $37,468        $26,409
$25,000         $46,811        $38,010
$30,000         $53,528        $46,998
$35,000         $58,541        $54,184
$40,000         $62,404        $60,067
$45,000         $65,459        $64,972
$50,000         $67,930        $69,122
$55,000         $69,966        $72,675
$60,000         $71,672        $75,749
$65,000         $73,119        $78,431
$70,000         $74,363        $80,791
$75,000         $75,443        $82,882
$80,000         $76,389        $84,746
$85,000         $77,224        $86,417
$90,000         $77,966        $87,924
$95,000         $78,631        $89,288
$100,000        $79,229        $90,529

Formulas
N2    =J8
O2    =K8

Data Table
I2    Column Input Cell

AJS

[Chart: Certainty Equivalent (vertical axis, -$40,000 to $100,000) versus Risk Tolerance Parameter for Exponential Utility (horizontal axis, $0 to $100,000), with one line for Process 1 and one line for Process 2.]
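The data table in Figure 16.17 recomputes both certainty equivalents for each trial value of RT. A minimal Python sketch of the same sweep follows. It uses only the first five NPV values of each process from Figure 16.16 as stand-in data, so its numbers will not match the figure, which is based on 1,000 simulated values per process. The function name is hypothetical.

```python
import math

def certainty_equivalent(values, rt):
    """CE of equally likely values using U(x) = 1 - EXP(-x/RT)."""
    eu = sum(1.0 - math.exp(-x / rt) for x in values) / len(values)
    return -rt * math.log(1.0 - eu)

# ASSUMPTION: five-value subsets of Figure 16.16, used only as sample data.
process_1 = [107_733, 39_389, 125_210, 66_032, 32_504]
process_2 = [86_161, 58_417, 171_058, 263_843, 37_180]

# One row per trial RT, like the column-input data table with input cell I2.
table = [(rt,
          certainty_equivalent(process_1, rt),
          certainty_equivalent(process_2, rt))
         for rt in range(5_000, 100_001, 5_000)]

for rt, ce1, ce2 in table:
    print(f"{rt:>8,} {ce1:>10,.0f} {ce2:>10,.0f}")
```

Scanning the rows for the RT value at which the ranking of the two processes changes, if any, is the purpose of this kind of sensitivity analysis.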

16.9 RISK UTILITY SUMMARY

Concepts Strategy, Payoff Distribution, Certainty Equivalent

Figure 16.18 Utility Function

[Chart: Utility or Expected Utility, U(x) (vertical axis, 0.0 to 1.0) versus Value or Certainty Equivalent, x (horizontal axis, -50000 to 150000).]


Fundamental Property of Utility Function

The utility of the CE of a lottery equals the expected utility of the lottery's payoffs.

U(CE) = EU = p1*U(x1) + p2*U(x2) + p3*U(x3)

Using a Utility Function To Find the CE of a Lottery

1. U(x): Locate each payoff on the horizontal axis and determine the corresponding utility on the vertical axis.

2. EU: Compute the expected utility of the lottery by multiplying each utility by its probability and summing the products.

3. CE: Locate the expected utility on the vertical axis and determine the corresponding certainty equivalent on the horizontal axis.

Exponential Utility Function

General form: U(x) = A − B*EXP(−x/RT)

Parameters A and B affect scaling.

Parameter RT (RiskTolerance) depends on risk attitude and affects curvature.

Inverse: CE = −RT*LN[(A−EU)/B]

TreePlan's Simple Form of Exponential Utility

Set A and B equal to 1.

U(x) = 1 − EXP(−x/RT)

CE = −RT*LN(1−EU)

Approximate Assessment of RiskTolerance

Refer to the Clemen textbook, Figure 13.12, on page 478.


Figure 16.19 Assessing Approximate Risk Tolerance

Risk tolerance parameter for an exponential utility function is approximately equal to the maximum amount Y for which the decision maker will play.

Play, 0.5 Heads:   +$Y     +$10   +$100   +$1,000   +$10,000   +$100,000   +$200,000   +$300,000
Play, 0.5 Tails:   -$Y/2   -$5    -$50    -$500     -$5,000    -$50,000    -$100,000   -$150,000
Don't:             $0      $0     $0      $0        $0         $0          $0          $0

(more risk aversion on the left, less risk aversion on the right)

Exact Assessment of RiskTolerance

The RISKTOL.XLA Excel add-in file adds the following function to the Math & Trig function category list:

RISKTOL(WorsePayoff,CertEquiv,BetterPayoff,BetterProb)

The first three arguments are required, and the last argument is optional with default value 0.5. WorsePayoff and BetterPayoff are payoffs of an assessment lottery, and CertEquiv is the decision maker's certainty equivalent for the lottery.

RISKTOL returns #N/A if there are too few or too many arguments, #VALUE! if WorsePayoff >= CertEquiv, or CertEquiv >= BetterPayoff, or BetterProb (if specified) <= 0 or >= 1, and #NUM! if the search procedure fails to converge.

For example, consider a 50-50 lottery with payoffs of $100,000 and $0. A decision maker has decided that the certainty equivalent is $43,000. If you open the RISKTOL.XLA file and type =RISKTOL(0,43000,100000) in a cell, the result is 176226. Thus, the value of the RiskTolerance parameter in an exponential utility function for this decision maker should be 176226.

Using Exponential Utility for TreePlan Rollback Values

1. Select a cell, and enter a value for the RiskTolerance parameter.

2. With the cell selected, choose Insert | Name | Define, and enter RT.

3. From TreePlan's Options dialog box, select Use Exponential Utility. The new decision tree diagram includes the EXP and LN functions for determining U(x) and the inverse.


Using Exponential Utility for a Payoff Distribution

Enter the exponential utility function directly, using the appropriate value for RiskTolerance. If the payoff values are equally-likely, use the AVERAGE function to determine the expected utility; otherwise, use SUMPRODUCT. Enter the inverse function directly to obtain the certainty equivalent.

Part 4 Data Analysis

Part 4 reviews basic concepts of data analysis and uses multiple regression to model relationships for both cross-sectional and time series data.

The spreadsheet analysis uses Excel's standard Analysis ToolPak. Several chapters include step-by-step instructions for descriptive statistics, histograms, and multiple regression.



Chapter 17 Introduction to Data Analysis

Why analyze data?

understand and explain past variation

predict future observations

measure relationships among variables

object of analysis: person, thing, business entity, etc.

characteristic of interest: weight, hair color, diameter, sales, etc.

measurement of the characteristic: pounds, blond/brunette/red/etc., inches, dollars, etc.

17.1 LEVELS OF MEASUREMENT

called measurement scales by some authors

important distinctions because analysis and summary methods are very different

two general levels of measurement, each with two specific levels

Categorical Measure

also called qualitative measure

assign a category level to each object of analysis

Nominal Measure: simple classification, "assign a name"

Ordinal Measure: ranked categories, "assign an ordered classification"

Numerical Measure

also called quantitative measure

assign a numerical value to each object of analysis

Interval Measure: rankings and numerical differences are meaningful


Ratio Measure: natural zero and numerical ratios are meaningful

17.2 DESCRIBING CATEGORICAL DATA

List each categorical level with frequencies (counts) or relative frequencies (percentages).

Use an Excel pivot table to obtain frequencies.

Use an Excel bar chart, column chart, or pie chart.

To display the relationship between two categorical measures, use a two-way classification table.

For nominal data, the appropriate summary measure is the mode (the most frequently occurring level).

For ordinal data, the appropriate summary measures are the mode and median (the middle-ranked category level with approximately 50% of the counts below and approximately 50% above).

Do not assign meaningless numerical values to the categorical levels.

Do not use the mean and standard deviation.
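The summaries recommended above can be sketched in Python for a small ordinal data set. The ratings and their ordering are hypothetical values invented for this illustration; in Excel the counts would come from a pivot table.

```python
from collections import Counter

# ASSUMPTION: hypothetical ordinal data with categories ranked low to high.
ratings = ["Good", "Fair", "Good", "Poor", "Excellent", "Good", "Fair"]
order = ["Poor", "Fair", "Good", "Excellent"]

counts = Counter(ratings)                                   # frequencies
n = len(ratings)
relative = {level: counts[level] / n for level in order}    # relative frequencies
mode = counts.most_common(1)[0][0]                          # most frequent level

# Median for ordinal data: sort by rank and take the middle-ranked category.
# (With an odd count there is a single middle rank.)
ranked = sorted(ratings, key=order.index)
median = ranked[(n - 1) // 2]

for level in order:
    print(f"{level:<10} {counts[level]:>2} {relative[level]:>6.1%}")
print("mode:", mode, " median:", median)
```

The categories are never converted to numbers for arithmetic; the ordering is used only to rank them.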

17.3 DESCRIBING NUMERICAL DATA

Frequency Distribution and Histogram

Determine the range (maximum minus minimum), generally use between 5 and 15 equally-spaced intervals, and pick "nice" numbers for the upper limit of each interval (Excel "bins").

Use Excel's Histogram analysis tool, or use Excel's FREQUENCY array-entered worksheet function with an Excel Column chart (vertical bars).
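The counting rule behind a frequency distribution with upper-limit bins can be sketched in Python. The sketch assumes the convention that each bin counts values greater than the previous upper limit and less than or equal to its own, with one extra count for values above the last limit. The data values and bin limits are hypothetical.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin, where `bins` holds ascending upper limits."""
    counts = [0] * (len(bins) + 1)          # extra slot for values above the last limit
    for value in data:
        # Index of the first upper limit that is >= value.
        counts[bisect_left(bins, value)] += 1
    return counts

# ASSUMPTION: hypothetical data and "nice" upper limits.
data = [8, 12, 15, 15, 20, 22, 25, 31, 41]
bins = [10, 20, 30, 40]
print(frequency(data, bins))
```

A value equal to an upper limit, such as 20 here, is counted in the bin that ends at that limit.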

Numerical Summary Measures

Appropriate summary measures for central tendency ("What's a typical value?") include mean (average, most appropriate for mound-shaped data), median, and mode.

Appropriate summary measures for dispersion ("How typical is the typical value?") include range, standard deviation (most appropriate for mound-shaped distributions), and fractiles (first quartile, or 25th percentile, is a value with approximately 25% of the values below it and approximately 75% of the values above).


Appropriate summary measures for shape are Excel's SKEW worksheet function and Pearson's coefficient of skewness.
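One common form of Pearson's coefficient of skewness is three times the difference between the mean and the median, divided by the standard deviation. The Python sketch below assumes that form and uses hypothetical data; it is not the calculation performed by Excel's SKEW function, which is based on third powers of deviations from the mean.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient (median form): 3 * (mean - median) / std dev."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3.0 * (mean - median) / statistics.stdev(values)

# ASSUMPTION: hypothetical data sets chosen to show each sign.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]

for name, values in [("right", right_skewed), ("symmetric", symmetric), ("left", left_skewed)]:
    print(f"{name:<10} {pearson_skewness(values):+.3f}")
```

The sign of the coefficient follows the relationship between mean and median: positive when the mean exceeds the median and negative when it is below.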

Distribution Shapes

Figure 17.1 Positively Skewed Distribution (Skewed to the Right)

[Chart: Frequency (vertical axis) versus Value (horizontal axis).]

In a distribution with positive skew, the mean is greater than the median.

Figure 17.2 Negatively Skewed Distribution (Skewed to the Left)

[Chart: Frequency (vertical axis) versus Value (horizontal axis).]


In a distribution with negative skew, the mean is less than the median.

Figure 17.3 Mound-Shaped Distribution (Symmetric)

[Chart: Frequency (vertical axis) versus Value (horizontal axis).]

In a symmetric distribution, the mean and median are equal.

Figure 17.4 Bimodal Distribution

[Chart: Frequency (vertical axis) versus Value (horizontal axis).]

In a bimodal distribution, there is often a distinguishing characteristic for the two groups of data that have been combined into a single distribution.

Chapter 18 Univariate Numerical Data

Excel includes several analysis tools useful for summarizing single-variable data. The Descriptive Statistics analysis tool provides measures of central tendency, variability, and skewness. The Histogram analysis tool provides a frequency distribution table, cumulative frequencies, and the histogram column chart.

These tools are appropriate for data without any time dimension. If the data were collected over time, first examine a time sequence plot of the data to detect patterns. If the time sequence plot appears random, then the univariate tools may be used to summarize the data.

If the Data Analysis command doesn't appear on the Tools menu, choose the Add-Ins command from the Tools menu; in the Add-Ins Available list box, check the box next to Analysis Tools. If Analysis Tools doesn't appear in the Add-Ins Available list box, you may need to add the Analysis ToolPak through a custom installation using the Microsoft Excel Setup program.

18.1 ANALYSIS TOOL: DESCRIPTIVE STATISTICS

Example 18.1 The operating costs of the vehicles used by your company's salespeople are too high. A major component of operating expense is fuel costs; to analyze fuel costs, you collect mileage data from the company's cars for the previous month. Later you may examine other characteristics of the cars, for example, make, model, driver, or routes.

The following steps describe how to use Excel's Descriptive Statistics analysis tool.

1. Open a new worksheet and enter the gas mileage data in column A as shown in Figure 18.1. Be sure the values in your data set are entered in a single column on the worksheet, with a label in the cell just above the first value. Excel uses this label in the report on summary values.

2. From the Tools menu, choose the Data Analysis command. The Analysis Tools dialog box is shown in Figure 18.1.


Figure 18.1 Analysis Tools Dialog Box

3. Double-click Descriptive Statistics. The dialog box for Descriptive Statistics appears as shown in Figure 18.2, with prompts for inputs and outputs.

4. Input Range: Enter the reference for the range of cells containing the data, including the labels for the data sets. In Example 18.1 either type A1:A18 or click on cell A1 and drag to cell A18 (in which case $A$1:$A$18 appears as the input range). Press the Tab key to move to the next field of the dialog box. Do not press Enter or click OK until all the boxes are filled.

5. Grouped By: Click Columns for this example (if the data were arranged in rows on the worksheet, you would choose Rows).

6. Labels in First Row (or Labels in First Column, where the data are arranged in rows): Select this checkbox because the input range in this example includes a label.


Figure 18.2 Descriptive Statistics Dialog Box

7. Output Range: Click the option button, click the adjacent edit box, and specify a reference for the upper-left cell of the range where the descriptive statistics output should appear, either by typing C1 or by clicking on cell C1 (in which case $C$1 appears as the output range as shown in this example). Alternatively, you can choose to send the output to a new sheet in the current workbook or to a new sheet in a new workbook.

8. Summary statistics: This feature is the primary reason for using the Descriptive Statistics analysis tool, so it should be selected. The summary statistics require two columns in the output range for each data set.

9. Confidence Level for Mean: Select this checkbox to see the half-width of a confidence interval for the mean, and type a number in the % edit box for the desired confidence level. This example requests the half-width for a 90% confidence interval.

10. Kth Largest: Select this checkbox if you want to know the kth largest value in the data set, and type a number for k in the Kth Largest edit box. This example requests the fourth largest value.

11. Kth Smallest: Select this checkbox to get the kth smallest value in the data set and type a number for k in the Kth Smallest edit box. This example requests the fourth smallest value.


12. When finished, click OK. Excel computes the descriptive statistics and puts the results in the output range.

Formatting the Output Table

The following steps describe how to change the column width and numerical display for the descriptive statistics output table.

1. To adjust column C's width to fit the longest entry, double-click the column heading border between C and D. To adjust column D's width, double-click the column heading border between D and E. (Alternatively, select columns C and D. From the Format menu, choose the Column command and choose AutoFit Selection.)

Some of the values in the output table are displayed with nine decimal places. To make the table easier to read, select cells, even noncontiguous ones, as a group and reformat them with fewer decimal places.

2. First select the Mean and Standard Error values in cells D3 and D4. (Click on D3, drag to cell D4, and release the mouse button.) Then hold down the Control key, and click on cell D7, drag to cell D10, and release. Finally, hold down the Control key, and click on cell D18. To decrease the number of decimal places displayed, repeatedly click on the Decrease Decimal button until the selected cells show three decimal places. (Alternatively, select the nonadjacent cells as described and choose the Cells command from the Format menu. In the Format Cells dialog box, select the Number tab. In the Category list box, select Number. Type 3 in the Decimal Places edit box, or click the spinner controls until 3 appears, and click OK.)

3. To adjust column D's width to fit the longest entry, double-click the column heading border between D and E.

The results are shown in columns A through D in Figure 18.3. The values in column D are static. If the data values in column A are changed, these results are not automatically updated. You must use the Descriptive Statistics command again to obtain updated results.

Column F in Figure 18.3 shows the worksheet functions that would produce the same results shown in column D. The worksheet functions are dynamic. If the data values in column A are changed, the result of a worksheet function is automatically recalculated (unless you have selected manual calculation using Tools | Options | Calculation | Manual). A worksheet function is useful if you want dynamic recalculation or if you don't want all of the summary statistics.

A worksheet usually displays the results of formulas, not the formulas themselves. If you want to see all formulas, choose Tools | Options | View | Formulas. However, the formula view uses different column widths and formatting for the entire worksheet. To display only specific formulas, put a single quotation mark before the equal sign so that Excel displays the cell contents as text, as shown in column F in Figure 18.3.

Figure 18.3 Descriptive Statistics Output

Interpreting Descriptive Statistics

The output table contains three measures of central tendency: mean, median, and mode. The mean gas mileage is 23.471 mpg, computed by dividing the sum (399) by the count (17).

The median is the middle-ranked value, here 21 mpg. Thus, approximately half of the cars have gas mileage greater than 21 mpg, and approximately half get less than 21 mpg. If the 17 values are sorted, and ranks 1 through 17 are assigned to the sorted values, then the middle-ranked value is the ninth value, 21 mpg. There are 8 values below this ninth-ranked value and 8 values above. (In a data set with an odd number of values, n, the median is the value with rank (n + 1)/2. In a data set with an even number of values, the median is a value halfway between the two middle values with ranks n/2 and n/2 + 1.)

The mode is the most frequently occurring value, reported here as 21 mpg. Actually, the value 21 mpg appears twice and the value 19 mpg also appears twice, so there are two modes. When two or more values have the same number of duplicate values (multiple modes), Excel reports the value that appears first in your data set.

In some data sets, each value may be unique, in which case each value is a mode, and Excel reports "#N/A." Where this occurs, first develop a frequency distribution and then report a range of values with the highest frequency; this result is termed a modal interval.
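The rank rule for the median and the treatment of multiple modes can be sketched in Python. The data values are hypothetical, since the mileage data set appears only in the figure; `statistics.multimode` returns every mode in the order first encountered, which makes ties visible instead of reporting only one value.

```python
import statistics

def median_by_rank(values):
    """Median using rank (n + 1)/2 for odd n, or the midpoint of ranks n/2 and n/2 + 1."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[(n + 1) // 2 - 1]      # ranks are 1-based, list indexes 0-based
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2

# ASSUMPTION: hypothetical data with two values that each appear twice.
data = [19, 21, 8, 21, 19, 30, 41]

print(median_by_rank(data))             # middle-ranked value
print(statistics.multimode(data))       # all modes, in order of first appearance
print(statistics.multimode([1, 2, 3]))  # every value unique: each value is a mode
```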

The output table contains several measures of variation. The range (33 mpg) equals the maximum (41 mpg) minus the minimum (8 mpg). In some data sets the range may be a misleading measure of variation because it is based only on the two most extreme values, which may not be representative.

The sample standard deviation (9.214 mpg) is the most widely used measure of variation in data analysis. For each value in the data set the deviation between the value and the mean is computed. Each deviation is squared, and the squared deviations are summed. The sum of the squared deviations is divided by the count minus one (that is, n – 1), obtaining the sample variance (84.890). The standard deviation equals the square root of the variance.

The standard deviation has the same units or dimensions as the original values: mpg, in this example. The variance is expressed in squared units: squared miles per gallon. The standard deviation and variance reported in the output table are the sample standard deviation and sample variance, computed using n – 1 in the denominator. To determine the population standard deviation and population variance, computed using n in the denominator, use the STDEVP and VARP worksheet functions.
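The difference between the sample and population calculations is only the denominator, as the following Python sketch shows. The data values are hypothetical and the function name is chosen for this illustration.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)

# ASSUMPTION: hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = math.sqrt(variance(data))                      # n - 1 in the denominator
population_sd = math.sqrt(variance(data, sample=False))    # n in the denominator
print(f"sample sd = {sample_sd:.4f}, population sd = {population_sd:.4f}")
```

The sample standard deviation is always the larger of the two, and the gap shrinks as the count increases.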

The largest(4) and smallest(4) values in the output table are the fourth largest (33 mpg) and fourth smallest (16 mpg) gas mileage values. To obtain similar results for all values in the data set, use the Rank and Percentile analysis tool. These values correspond to approximately the 75th percentile (third quartile) and 25th percentile (first quartile) in the data set of 17 values. Interpolated values for the third and first quartiles are obtained using the QUARTILE worksheet function, =QUARTILE(A2:A18,3) and =QUARTILE(A2:A18,1), respectively.
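The interpolation behind a quartile calculation can be sketched in Python. The sketch assumes the rule commonly documented for QUARTILE, in which the quartile sits at fractional rank (n - 1) * quart / 4 above the smallest value and is interpolated linearly between neighboring sorted values. Treat that rule as an assumption, and the data values as hypothetical.

```python
def quartile(values, quart):
    """Interpolated quartile (quart = 0..4) at zero-based rank (n - 1) * quart / 4."""
    ranked = sorted(values)
    position = (len(ranked) - 1) * quart / 4
    lower = int(position)                    # sorted value at or below the position
    fraction = position - lower              # distance toward the next sorted value
    if lower + 1 >= len(ranked):
        return float(ranked[-1])
    return ranked[lower] + fraction * (ranked[lower + 1] - ranked[lower])

print(quartile([1, 2, 3, 4], 1))     # between the first and second values
print(quartile([1, 2, 3, 4, 5], 3))  # lands exactly on a data value
```

Under this assumed rule, a data set of 17 values needs no interpolation: the first quartile falls on the 5th smallest value and the third quartile on the 13th smallest, which is why the fourth smallest and fourth largest values are only approximately the quartiles.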

The standard error of the mean (2.235 mpg) equals the sample standard deviation divided by the square root of the sample size. The standard error is a measure of uncertainty about the mean, and it is used for statistical inference (confidence intervals and hypothesis tests).
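The arithmetic can be checked with a short Python sketch that uses the rounded values quoted in the text; because the standard deviation is rounded to three decimals, the fourth decimal of the result may differ slightly from Excel's.

```python
import math

sample_sd = 9.214   # sample standard deviation from the output table, in mpg
n = 17              # sample size

# Standard error of the mean = s / sqrt(n)
standard_error = sample_sd / math.sqrt(n)

print(round(standard_error, 3))
```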

The value shown for the confidence level (90.0%) (3.901 mpg) is the half-width of a 90% confidence interval for the mean. The specified confidence level, 90% in this example, corresponds to t = 1.746 for the t distribution with 10% in the sum of two tails and n – 1 = 17 – 1 = 16 degrees of freedom. The half-width of a confidence interval is t times the standard error—that is, 1.7459 times 2.2346 mpg, or 3.901 mpg.

A 90% confidence interval for the mean extends from the mean minus the half-width to the mean plus the half-width; that is, from 23.471 – 3.901 to 23.471 + 3.901, or approximately 19.6 to 27.4 mpg. Therefore, if we think of these 17 cars as a random sample from a larger population, we can say there is a 90% chance that the unknown population mean is between 19.6 and 27.4 mpg.
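The interval arithmetic can be reproduced with the values quoted in the text. This Python sketch takes the t value as given rather than computing it, since the standard library has no t distribution.

```python
mean = 23.471            # sample mean, in mpg
t_value = 1.7459         # t for 90% confidence with 16 degrees of freedom (from the text)
standard_error = 2.2346  # standard error of the mean, in mpg

half_width = t_value * standard_error
lower = mean - half_width
upper = mean + half_width

print(round(half_width, 3), round(lower, 1), round(upper, 1))
```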

Kurtosis measures the degree of peakedness in symmetric distributions. If a symmetric distribution is flatter than the normal distribution (that is, if there are fewer values in the extreme tails than in a corresponding normal distribution), the kurtosis measure is negative. If the distribution is more peaked than the normal distribution (that is, if there are more values in the extreme tails), the kurtosis measure is positive. In this example, the distribution is approximately symmetric with negative kurtosis (–0.547). (Excel computes the kurtosis value using the fourth power of deviations from the mean. For details, search Help for "KURT function.")

Skewness refers to the lack of symmetry in a distribution. If there are a few extreme values in the positive direction, we say the distribution is positively skewed, or skewed to the right. If there are a few extreme values in the negative direction, the distribution is negatively skewed, or skewed to the left. Otherwise, the distribution is symmetric or approximately symmetric. In this example, the measure is positive (+0.361). (Excel computes the skewness value using the third power of deviations from the mean. For details, search Help for "SKEW function.")
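For readers who want to see the third-power and fourth-power calculations spelled out, here is a minimal Python sketch. It assumes the sample formulas documented in Excel's Help for the SKEW and KURT functions; the function names excel_skew and excel_kurt are illustrative, and the test values are hypothetical.

```python
import math


def _mean_and_sd(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd


def excel_skew(values):
    """Sample skewness based on cubed standardized deviations (needs n >= 3)."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    total = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * total


def excel_kurt(values):
    """Sample excess kurtosis based on fourth powers (needs n >= 4)."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    total = sum(((v - mean) / sd) ** 4 for v in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * total - correction


evenly_spaced = [1, 2, 3, 4, 5]   # hypothetical, perfectly symmetric data
print(excel_skew(evenly_spaced), excel_kurt(evenly_spaced))
```

For these evenly spaced values the skewness is zero and the kurtosis is about –1.2.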

Another Measure of Skewness

Pearson's coefficient of skewness is a simple alternative to Excel's measure of skewness. Pearson's coefficient is defined as 3 * (mean – median) / standard deviation. The mean is affected by extreme values in a data set. Extreme values in the positive direction cause the mean to be greater than the median, in which case Pearson's coefficient has a positive value. Extreme values in the negative direction cause the mean to be less than the median, in which case the coefficient is negative. The constant 3 and the standard deviation in Pearson's coefficient affect the scaling and allow comparison of one distribution with another.

Follow these steps to compute Pearson's coefficient of skewness on your worksheet.

1. Select a blank cell (F10) and enter the formula =3*(D3-D5)/D7. Click the Decrease Decimal button to display three decimal places.

2. Enter the label Pearson's Coefficient of Skewness in cells F6 through F9.

3. If you want to document the formula using names, select cells C3:D7. From the Insert menu, choose Name | Create; in the Create Names dialog box, check Create Names in Left Column and click OK. Then select the cell containing the formula (F10) and from the Insert menu choose Name | Apply. In the Apply Names list box, select all names and click OK.

The result is shown in Figure 18.4.


Figure 18.4 Pearson's Coefficient of Skewness

The following guidelines apply to Pearson's Coefficient of Skewness and to Excel's SKEW worksheet function:

Pearson's Skew < –0.5            Excel's SKEW < –1          negatively skewed

–0.5 ≤ Pearson's Skew ≤ +0.5     –1 ≤ Excel's SKEW ≤ +1     approximately symmetric

Pearson's Skew > +0.5            Excel's SKEW > +1          positively skewed

For the small data set of Example 18.1, the value 0.804 for Pearson's Coefficient of Skewness indicates that the data are positively skewed, and the value 0.361 for Excel's SKEW worksheet function (shown in the Descriptive Statistics output) indicates that the data are approximately symmetric with only slight positive skew. For larger data sets, the two measures usually produce the same conclusion.
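The definition and the guidelines for Pearson's coefficient can be combined in a short Python sketch. The function names are illustrative, the five-value data set is hypothetical, and the sample standard deviation is assumed in the denominator, as in the worksheet formula.

```python
import statistics


def pearson_skew(values):
    """Pearson's coefficient: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return 3 * (mean - median) / sd


def describe_pearson_skew(coefficient):
    """Apply the +/-0.5 guideline for Pearson's coefficient."""
    if coefficient < -0.5:
        return "negatively skewed"
    if coefficient > 0.5:
        return "positively skewed"
    return "approximately symmetric"


# Hypothetical data with one extreme value in the positive direction.
example = [1, 2, 3, 4, 10]
coefficient = pearson_skew(example)
print(round(coefficient, 3), describe_pearson_skew(coefficient))

# The value reported in the text for Example 18.1.
print(describe_pearson_skew(0.804))
```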


18.2 ANALYSIS TOOL: HISTOGRAM

The Histogram analysis tool determines a frequency distribution table for your data and prepares a histogram chart. In addition to individual frequencies there is an option to include cumulative frequencies in the results.

You should determine the intervals of the distribution before using this tool. Otherwise, Excel will use a number of intervals approximately equal to the square root of the number of values in your data set, with equal-width intervals starting and ending at the minimum and maximum values of your data set. If you specify the intervals yourself, you can use numbers that are multiples of two, five, or ten, which are much easier to analyze.

To determine intervals, first use the Descriptive Statistics analysis tool to determine the minimum and maximum values of the data set. Alternatively, enter the MIN and MAX functions on your worksheet. Use these extreme values to help determine the limits for your histogram's intervals. Usually 5 to 15 intervals are used for a histogram.

For the gas mileage data, the minimum is 8 and the maximum is 41. A compact histogram could start the first interval at 5, use an interval width of 5, and finish the last interval at 45, requiring 8 intervals. The approach used here adds an empty interval at each end; at the low end is an interval "5 or less," and at the high end is an interval "more than 45."

Excel refers to the maximum value for each interval as a bin. Here, the first bin is 5, and the interval will contain all values that are 5 or less. The Histogram tool automatically adds an interval labeled "More" to the bins you specify. Here, the last bin specified is 45, and the last interval (More) will contain all values greater than 45.

Refer to Figure 18.5 and follow these steps to obtain the frequency distribution and histogram.

1. Hide columns B through F. (Select columns B through F by clicking on B and dragging to F. Right-click and select Hide from the shortcut menu. To unhide the columns, select the two adjacent columns, A and G, right-click, and select Unhide. If column A is hidden, click the Select All button in the top-left corner at the intersection of the row and column headings, right-click a column heading, and select Unhide.)

2. Enter Bin as a label in cell H1, enter 5 in cell H2, and enter 10 in cell H3. Select H2:H3. Drag the AutoFill square in the lower-right corner of the selected range down to cell H10.

3. From the Tools menu, choose the Data Analysis command and choose Histogram from the Analysis Tools list box.


Figure 18.5 Bins and Histogram Dialog Box

4. Input Range: Enter the reference for the range of cells containing the data (A1:A18), including the label.

5. Bin Range: Enter the reference for the range of cells containing the values that separate the intervals (H1:H10), including the label. These interval break points, or bins, must be in ascending order.

6. Labels: Check this box to indicate that labels have been included in the references for the input range and bin range.

7. Output Range: Enter the reference for the upper-left cell of the range where you want the output table to appear (I1). The combined table and chart output requires approximately ten columns.

8. Pareto: To obtain a standard frequency distribution and chart, clear the Pareto checkbox. If this box is checked, the intervals are sorted according to frequencies before preparing the chart. (In this example the box has been cleared.)

9. Cumulative Percentage: Check this box for cumulative frequencies in addition to the individual frequencies for each interval. (In this example the box has been cleared.)

10. Chart Output: Check this box to obtain a histogram chart in addition to the frequency distribution table on the worksheet. (In this example the box has been checked.)


11. After you provide inputs to the dialog box, click OK. (If you receive the error message "Cannot add chart to a shared workbook," click the OK button. Then click New Workbook under Output in the Histogram dialog box. Use the Edit | Move or Copy Sheet command to copy the results to the original workbook.)

Excel puts the frequency distribution and histogram on the worksheet. As shown in Figure 18.6, the output table in columns I and J includes the original bins specified. These bins are actually the upper limit for each interval; that is, the bins are actually bin boundaries.

For example, the interval associated with bin value 15 (cell I4) includes mileage values strictly greater than 10 (the previous bin value) and less than or equal to 15. There are two such mileage values in this data set: 12 mpg and 15 mpg. Thus, for bin value 15 the frequency is 2 (cell J4).
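The rule that each bin is an upper limit can be sketched in Python. The function name frequency_table is illustrative, and the four values are only the ones mentioned in the text (the minimum, the maximum, and the two values in the interval for bin 15), not the full data set.

```python
from bisect import bisect_left


def frequency_table(values, bins):
    """Count values per interval, where each bin is an inclusive upper limit.

    A value v is counted in the first bin b with v <= b; values above the
    last bin are counted in an extra interval labeled "More".
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


bins = list(range(5, 50, 5))   # 5, 10, ..., 45, in ascending order
partial_data = [8, 12, 15, 41]  # only the values quoted in the text

for label, count in frequency_table(partial_data, bins):
    print(label, count)
```

Because bisect_left places a value equal to a bin in that bin's interval, 15 is counted with bin 15 rather than bin 20, matching the "less than or equal to" rule described above.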

Figure 18.6 Histogram Output Table and Chart

Histogram Embellishments

To make the chart more like a traditional histogram and easier to interpret, make the following changes.

1. Legend: Because only one series is shown on the chart, a legend isn't needed. Click on the legend ("Frequency" on the right side of the chart) and press the Delete key.

2. Plot area pattern: The plot area is the rectangular area bounded by the x and y axes. Double-click the plot area (above the bars); in the Format Plot Area dialog box, change Border to None and change Area to None. Click OK.

3. Y-axis labels: If you resize the chart vertically, intermediate values (0.5, 1.5,...) may appear on the y axis, but frequencies must be integer values. Double-click the y-axis (value axis); in the Format Axis dialog box on the Scale tab, set the Major Unit and Minor Unit values to 1. Click OK.


4. Bar width: In traditional histograms, the bars are adjacent to each other, not separated. Double-click one of the bars; in the Format Data Series dialog box on the Options tab, change the gap width from 150% to 0%. Click OK.

5. X-axis labels: Double-click the x-axis (category axis); in the Format Axis dialog box on the Alignment tab, double-click the Degrees edit box and type 0 (zero). With this setting, the x-axis labels will be horizontal even if the chart is resized. Click OK.

6. Chart title: Click on Histogram (chart title). Type Distribution of Gas Mileage, hold down Alt and press Enter, type for 17 cars, and press Enter. Click the Bold button to change from bold to normal type.

7. Y-axis title: Click on Frequency (value axis title). Click the Bold button to change from bold to normal type.

8. X-axis title: Click on Bin. Enter Interval Maximum, in miles per gallon. Click the Bold button to change from bold to normal type. Excel puts the x-axis values at the center of each interval, not at the marks that separate the intervals. This title makes it clear to the reader that these values are the maximum ones for each interval.

9. Bar color: Columns in a dark color may print as black with no gaps, in which case it is difficult to see the boundaries. Click on the center of one of the columns to select the data series. Click the right mouse button, choose Format Data Series, and click the Patterns tab. In the dialog box, leave Border at Automatic and change Area from Automatic to None. Click OK.

To move the chart, click just inside the chart's outer border (chart area) and drag the chart to the desired location. To resize the chart, first click the chart area and then click and drag one of the eight handles.

When you first create a chart, Excel uses automatic scaling for the font sizes of the chart title, the axis titles, and the axis labels. When you resize the chart, the font sizes change and the number of axis labels displayed may change. For example, if the axis labels on the horizontal axis have a large font size and you resize the chart to be narrow, perhaps only every other axis label will be displayed.

One approach to chart and font sizing is to first decide the size of the chart. For this example the chart is 6 columns wide using the standard column width of 8.43 and 14 rows high. The font size of the three titles is Arial 10, and the font size of the two axes is Arial 8 so that all axis labels are displayed. The resulting histogram chart is shown in Figure 18.7.


Figure 18.7 Histogram Chart with Embellishments

18.3 BETTER HISTOGRAMS USING EXCEL

Figure 18.8 Better Histogram Chart (chart title: Histogram; horizontal axis: Miles Per Gallon, 0 to 50 in steps of 5; vertical axis: Frequency, 0 to 5)

A histogram is usually shown in Excel as a Column chart type (vertical bars). The labels of a Column chart are aligned under each bar as shown in Figure 18.7, and there is no Excel feature for changing the alignment. A better histogram has a horizontal axis with numerical labels aligned under the tick marks between the bars as shown in Figure 18.8.

To download a free Excel add-in for automatically creating a better histogram from data on a worksheet or to view step-by-step instructions for creating a better histogram using Excel's built-in features, go to the Better Histograms page at www.treeplan.com.

EXERCISES

Exercise 18.1 Construct a frequency distribution and histogram for the following selling prices of 15 properties:

$26,000   $38,000   $43,600
 31,000    39,600    44,800
 37,400    31,200    40,600
 34,800    37,200    41,800
 39,200    38,400    45,200

Use intervals $5,000 wide starting at $25,000. Comment on the symmetry or skewness of the selling prices.

Exercise 18.2 Determine measures of central tendency and dispersion for the selling prices of the 15 properties in Exercise 18.1. Which measure(s) of central tendency should be used to describe a typical selling price? What is the mode or modal interval?

Exercise 18.3 To verify the symmetry or skewness observed in Exercise 18.1, calculate Pearson's coefficient of skewness.

Chapter 19 Bivariate Numerical Data

A scatterplot is useful for examining the relationship between two numerical variables. In Excel this kind of chart is called an XY (scatter) chart; other names include scatter diagram, scattergram, and XY plot. Such a graphical display is often the first step before fitting a curve to the data using a regression model.

Example 19.1 (Adapted from Cryer, p. 139) The data shown in Figure 19.1 were collected in a study of real estate property valuation. The 15 properties were sold in a particular calendar year in a particular neighborhood in a city stratified into a number of neighborhoods. Although the data displayed are from a single year, similar data are available for each neighborhood for a number of years. Cryer's RealProp.dat file contains 4 variables for 60 observations; these 15 properties are the first and every fourth observation.

Because we expect that selling price might depend on square feet of living space, selling price becomes the dependent variable and square feet the explanatory variable. Some call the dependent variable the response variable or the y variable. Similarly, other terms for the explanatory variable are predictor variable, independent variable, or the x variable.

Our initial purpose is to visually examine the relationship between the square feet of living space and the selling price of the parcels. Then we will calculate two summary measures, correlation and covariance, using both the analysis tool and functions. Finally, we will include a third variable, assessed value of the property, and use the analysis tool to compute pairwise correlations. In subsequent chapters we will fit straight lines and curves to these same data using regression models.


Figure 19.1 Initial XY (Scatter) Chart

19.1 XY (SCATTER) CHARTS

The following steps describe how to create and embellish a scatterplot using Excel's Chart Wizard.

1. Arrange the data in columns on a worksheet with the x values (for the horizontal axis) on the left and the y values (for the vertical axis) on the right as shown in Figure 19.1. If the x variable is not on the left, insert a column on the left, select the x data, and click and drag to move the x data to the column on the left.

2. Select the x and y values (A2:B16). Do not include the labels above the data.

3. Click on the Chart Wizard tool.

4. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, select XY (Scatter) in the Chart Type list box and verify that the chart sub-type is "Scatter. Compares pairs of values." Click on the wide button Press and Hold to View Sample to preview the chart. Click Next.

5. In step 2 (Chart Source Data) on the Data Range tab, verify that cells A2:B16 were selected and that Excel is treating the data series as columns. (If you don't select the data range before starting the Chart Wizard, you can enter the data range in this step.) On the Series tab, verify that Excel is using cells A2:A16 for x values and cells B2:B16 as y values. (If the data ranges for the x and y values aren't correct, you can specify their locations here.) Click Next.


6. In step 3 (Chart Options) on the Titles tab, select the Chart Title edit box and type Real Estate Properties. Don't press Enter; use the mouse or Tab key to move among the edit boxes. Type Living Space, in Sq. Ft. for the value (x) axis title (the horizontal axis), and Selling Price, in Thousands of Dollars for the value (y) axis title (the vertical axis).

7. In step 3 (Chart Options) on the Gridlines tab, clear all checkboxes.

8. In step 3 (Chart Options) on the Legend tab, clear the checkbox for Show Legend. (With only one set of data on the chart, a legend is not needed.) Click Next.

9. In step 4 (Chart Location), verify that you want to place the chart as an object in the current worksheet. Click Finish.

The chart is embedded on the worksheet, as shown in Figure 19.1. The property data show a general positive relationship; more living space is associated with a higher selling price, on the average. Follow steps 10 through 12 to obtain the embellished scatterplot shown in Figure 19.2.

Figure 19.2 Final XY (Scatter) Chart

10. Change the x-axis to display 400 to 1400 square feet. Select the value (x) axis. Right-click, choose Format Axis from the shortcut menu, and click the Scale tab. Type 400 in the Minimum edit box, 1400 in the Maximum edit box, and 200 in the Major Unit edit box. Click OK.

11. Change the y-axis to display 20 to 50 thousands of dollars. Select the value (y) axis. Right-click, choose Format Axis from the shortcut menu, and click the Scale tab; type 20, 50, and 10 in the Minimum, Maximum, and Major Unit edit boxes. Click the Number tab and set Decimal Places to zero. Then click OK.

12. To obtain the appearance shown in Figure 19.2, click just inside the outer border of the chart to select the chart area. Click and drag the sizing handles so the chart is approximately 6 standard column widths by 15 rows. Click the chart title and choose Arial Bold 12 from the formatting toolbar. For each horizontal and vertical axis and title, click the chart object and choose Arial Regular 10 from the formatting toolbar. Double-click the y-axis title and change the space after the comma to a carriage return. Double-click the grey plot area and change the pattern for both border and area to None. Select the Price data (B2:B16) and click the Increase Decimal button several times so that three significant figures are displayed to the right of the decimal point.

19.2 ANALYSIS TOOL: CORRELATION

The correlation coefficient is a useful summary measure for bivariate data, in the same sense that the mean and standard deviation are useful summary measures for univariate data. The possible values for the correlation coefficient range from –1 (exact negative correlation, with all points falling on a downward-sloping straight line) through 0 (no linear relationship) to +1 (exact positive correlation, with all points falling on an upward-sloping straight line). The correlation coefficient measures only the amount of straight-line relationship; a strong curvilinear relationship (a U-shaped pattern, for example) might have a correlation coefficient close to zero. The long name for the correlation coefficient is "Pearson product moment correlation coefficient," which is often shortened to simply "correlation."

The following steps describe how to obtain the correlation coefficient using the analysis tool.

1. Enter the x and y data in a worksheet as shown in columns A and B of Figure 19.3 and enter Analysis Tool: Correlation in cell D1.

2. From the Tools menu, choose Data Analysis. From the Data Analysis dialog box, select Correlation in the Analysis Tools list box and click OK.

3. In the Input section of the Correlation dialog box, specify the location of the data in the Input Range edit box, including the labels (A1:B16). Verify that the data is grouped in columns and be sure the Labels in First Row box is checked.

4. In the Output options section, click the Output Range button, select the Range edit box, and specify the upper-left cell where the correlation output will be located (D2).


5. Click OK. The output appears in cells D2:F4 as shown in Figure 19.3. (The discussions of CORREL function and covariance outputs follow.)

The output is a matrix of pairwise correlations. The diagonal values are 1, indicating that each variable has perfect positive correlation with itself. The value 0.814651 is the correlation of Price and SqFt. The upper-right section is blank, because its values would be the same as those in the lower-left section.

The following steps describe how to use Excel's CORREL function to determine the correlation.

1. Enter CORREL Function in cell D6.

2. Select cell D7. Click the Insert Function tool button (icon fx). In the Insert Function dialog box, select Statistical in the category list box. In the function list box, select CORREL. Then click OK.

3. To move the CORREL dialog box, click in any open area and drag. Select the Array1 edit box, and click and drag on the worksheet to select A2:A16. Select the Array2 edit box, and click and drag to select B2:B16. Do not include the text labels in row 1 in either selection. Then click OK.

The value of the correlation coefficient appears in cell D7. Alternatively, you could have entered the formula =CORREL(A2:A16,B2:B16) by typing or by a combination of typing and pointing. Unlike the static text output of the analysis tool, the worksheet function is dynamic. If the data values in A2:B16 are changed, the value of the correlation coefficient in cell D7 will change.
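To make the calculation behind the correlation coefficient concrete, here is a minimal Python sketch of the Pearson product moment formula. The function name correl is chosen to echo the worksheet function, and all data shown are hypothetical, not the real estate data of Example 19.1.

```python
import math


def correl(xs, ys):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: points on an upward-sloping and a downward-sloping line.
print(correl([1, 2, 3], [2, 4, 6]))
print(correl([1, 2, 3], [6, 4, 2]))

# Hypothetical U-shaped pattern: a strong relationship, but not a linear one.
print(correl([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The third call illustrates the earlier caution about curvilinear patterns: the points follow a U shape exactly, yet the correlation is zero.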

Figure 19.3 Bivariate Correlation and Covariance


19.3 ANALYSIS TOOL: COVARIANCE

The covariance is another measure for summarizing the extent of the linear relationship between two numerical variables. Unfortunately, the covariance is difficult to interpret because its measurement units are the product of the units for the two variables. For the selling price and living space data in Example 19.1, the covariance is expressed in units of square feet times thousands of dollars. It is usually preferable to use the correlation coefficient because it is scale-free. However, the covariance is used in finance theory to describe the relationship of one stock price with another.

The covariance computed by the analysis tool is a population covariance; that is, Excel 2002 uses n in the denominator (instead of using n – 1, which would be appropriate for sample covariance), where n is the number of data points.

The following steps describe how to obtain the covariance using the analysis tool.

1. Enter the x and y data in a worksheet as shown in columns A and B of Figure 19.3 and enter Analysis Tool: Covariance in cell D10.

2. From the Tools menu, choose Data Analysis. From the Data Analysis dialog box, select Covariance in the Analysis Tools list box and click OK.

3. In the Input section of the Covariance dialog box, specify the location of the data in the Input Range edit box, including the labels (A1:B16). Verify that the data is grouped in columns and be sure the Labels box is checked.

4. In the Output Options section, click the Output Range button, select the Range edit box, and specify the upper-left cell where the covariance output will be located (D11).

5. Click OK. The output appears in cells D11:F13 as shown in Figure 19.3.

The output is a matrix of pairwise population covariances. The diagonal values are population variances (the square of the population standard deviation) for each variable. The value 853.2427 is the population covariance of Price and SqFt. The upper-right section is blank, because its values would be the same as those in the lower-left section.

The following steps describe how to use Excel's COVAR function to determine the population covariance.

1. Optional: Enter COVAR Function in cell D15.

2. Select cell D16. Click the Insert Function tool button (icon fx). In the Insert Function dialog box, select Statistical in the category list box. In the function list box, select COVAR. Then click OK.

3. To move the COVAR dialog box, click in any open area and drag. Select the Array1 edit box, and click and drag on the worksheet to select A2:A16. Select the Array2 edit box, and click and drag to select B2:B16. Do not include the text labels in row 1 in either selection. Then click OK.

The population covariance value appears in cell D16. Alternatively, you could have entered the formula =COVAR(A2:A16,B2:B16) by typing or by a combination of typing and pointing. If the data values in A2:B16 are changed, the population covariance value in cell D16 will change. The covariance computed by Excel's COVAR function uses n in the denominator. In this example, n = 15, so the population covariance, 853.2427, equals (14/15) times the sample covariance, 914.1886.
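The difference between the two denominators can be sketched in Python. The covariance function and its sample flag are illustrative names, and the three-point data set is hypothetical; the last lines only re-check the (14/15) relation quoted above.

```python
def covariance(xs, ys, sample=False):
    """Covariance with n in the denominator, or n - 1 when sample=True."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data for illustration.
xs = [1, 2, 3]
ys = [2, 4, 6]
print(covariance(xs, ys))               # n denominator
print(covariance(xs, ys, sample=True))  # n - 1 denominator

# Converting an n - 1 covariance to an n covariance for n = 15,
# using the two figures quoted in the text.
n = 15
converted = 914.1886 * (n - 1) / n
print(round(converted, 4))
```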

19.4 CORRELATIONS FOR SEVERAL VARIABLES

The Correlation analysis tool is most useful for determining pairwise correlations for three or more variables, often as an aid to selecting variables for a multiple regression model. The following steps describe how to obtain correlations for several variables.

1. Enter the data in cells A1:C16 as shown in Figure 19.4. If the data for SqFt and Price are already in columns A and B, select A1:B16, copy to the clipboard (using the shortcut menu), select a new sheet, and paste into cell A1; then select column B, choose Insert from the shortcut menu, and enter the Assessed data.

2. Optional: Enter Analysis Tool: Correlation in cell E1.

Figure 19.4 Pairwise Correlations

3. From the Tools menu, choose Data Analysis. From the Data Analysis dialog box, select Correlation in the Analysis Tools list box and click OK. The Correlation dialog box appears as shown in Figure 19.5.


Figure 19.5 Correlation Dialog Box

4. In the Input section, specify the location of the data in the Input Range edit box, including the labels (A1:C16). Verify that the data is grouped in columns and be sure the Labels box is checked.

5. In the Output Options section, click the Output Range button, click the adjacent edit box, and specify the upper-left cell where the correlation output will be located (E3).

6. Click OK. The output appears in cells E3:H6 as shown in Figure 19.4.

The output shows three pairwise correlations. The highest correlation, 0.814651, is between SqFt and Price. The correlation between Assessed and Price, 0.67537, is smaller, indicating less of a linear relationship between these two variables. The lowest correlation, 0.424219, is between SqFt and Assessed.

If we must use a single explanatory variable to predict selling price in a linear regression model, these correlations suggest that SqFt is a better candidate than Assessed, because 0.814651 is higher than 0.67537. If we can use two explanatory variables to predict selling price in a multiple regression model, both SqFt and Assessed should be useful, and there shouldn't be a problem with multicollinearity because the correlation between these two explanatory variables is only 0.424219.
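The layout of the tool's output, a lower-left triangle with ones on the diagonal, can be imitated with a short Python sketch. The function names and the three short columns are hypothetical; they are not the SqFt, Assessed, and Price data.

```python
import math


def correl(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return variable names and the lower-left triangle of correlations."""
    names = list(columns)
    rows = []
    for i, row_name in enumerate(names):
        # Only columns up to and including the diagonal are computed,
        # because the upper-right triangle would repeat the same values.
        rows.append([correl(columns[row_name], columns[col_name])
                     for col_name in names[: i + 1]])
    return names, rows


# Hypothetical columns of data.
data = {"A": [1, 2, 3, 4], "B": [2, 1, 4, 3], "C": [4, 3, 2, 1]}
names, rows = correlation_matrix(data)
for name, row in zip(names, rows):
    print(name, [round(r, 3) for r in row])
```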


EXERCISES

Exercise 19.1 (Adapted from Keller, p. 642) An economist wanted to determine how office vacancy rates depend on average rent. She took a random sample of the monthly office rents per square foot and the percentage of vacant office space in ten different cities. The results are shown in the following table.

        Vacancy       Monthly Rent
City    Percentage    per Sq. Ft.
  1        10           $5.00
  2         2            2.50
  3         7            4.75
  4         8            4.50
  5         4            3.00
  6        11            4.50
  7         8            4.00
  8         6            3.00
  9         3            3.25
 10         5            2.75

Arrange the data in appropriate columns and prepare a scatterplot. Does there appear to be a positive or negative relationship between the two variables?

Exercise 19.2 Compute the correlation coefficient for the data in Exercise 19.1. Comment on the direction and strength of the linear relationship.

Exercise 19.3 (Adapted from Canavos, p. 104) Does a student's test grade seem to depend on the number of hours spent studying? The following table shows the number of hours 20 students reported studying for a major test and their test grades.

           Study    Test                Study    Test
Student    Hours    Grade    Student    Hours    Grade
   1          5      54         11        12      74
   2         10      56         12        20      78
   3          4      63         13        16      83
   4          8      64         14        14      86
   5         12      62         15        22      83
   6          9      61         16        18      81
   7         10      63         17        30      88
   8         12      73         18        21      87
   9         15      78         19        28      89
  10         12      72         20        24      93


Arrange the data in appropriate columns and prepare a scatterplot. Does there appear to be a positive or negative relationship between the two variables?

Exercise 19.4 Compute the correlation coefficient for the data in Exercise 19.3. Comment on the direction and strength of the linear relationship.

Chapter 20 One-Sample Inference for the Mean

This chapter covers the basic methods of statistical inference for the mean of a single population. These methods are appropriate for a single random sample consisting of values for a single variable. For example, a random sample of a particular brand of tires would be used to construct a confidence interval for the average mileage of all tires of that brand or to test the hypothesis that the average mileage of all tires is at least 40,000 miles.

20.1 NORMAL VERSUS t DISTRIBUTION

If the values in the population have a normal distribution, and if the standard deviation of the population values is known, then the sample means have a normal distribution. However, due to the central limit theorem, the normal distribution is often used to describe uncertainty about sample means when the sample size is large, even though the population distribution may not be normal or the population standard deviation may be unknown. A common guideline is that "large" means 30 or more.

If the values in the population have a normal distribution, and if the standard deviation of the population values is unknown and must be estimated using the sample, then the standardized sample means have a t distribution. The t distribution is often used for analyzing small samples, even when the shape of the population distribution is unknown. You can use a histogram or other methods to check that your sample data are approximately normal. As long as the population isn't extremely skewed or otherwise nonnormal, the t distribution is generally regarded as an adequate approximation for the sampling distribution of means.

20.2 HYPOTHESIS TESTS

A hypothesis test is an alternative to the confidence interval method of statistical inference. To conduct a hypothesis test, first set up two opposing hypothetical statements describing the population. These two statements are called the null hypothesis, H0, and the alternative hypothesis, HA. Usually, the alternative hypothesis is a statement about what we are trying to show or prove. For example, to detect if the mean of monthly accounts is significantly less than $70, the alternative hypothesis is HA: Mean < 70.

The null hypothesis is the opposite of the alternative hypothesis; that is, H0: Mean ≥ 70 or simply H0: Mean = 70. Using the hypothesis test method, develop the distribution of sample results that would be expected if the null hypothesis is true. Then compare the particular sample result with this sampling distribution. If the sample result is one that is likely to be obtained when the null hypothesis is true, we cannot reject the null hypothesis, and we cannot conclude that the alternative hypothesis is true. On the other hand, if the sample result is one that is unlikely to occur when the null hypothesis is true, reject the null hypothesis and conclude the alternative hypothesis may be true.

Left-Tail, Right-Tail, or Two-Tail

There are three kinds of hypothesis tests, depending on the direction specified in the alternative hypothesis. If the alternative hypothesis is HA: Mean < 70, we must observe a sample mean significantly below 70 to reject the null hypothesis and conclude that the population mean is really less than 70. This kind of test is a left-tail test because sample means that cause rejection of the null hypothesis are in the left tail of the sampling distribution.

If we are trying to show that the average breaking strength of steel rods is greater than 500 pounds (HA: Mean > 500), then a right-tail test is appropriate. In this case, we must observe a sample mean significantly greater than 500 to reject the null hypothesis.

If we are trying to detect a change in either direction instead of a single direction, then a two-tail test is appropriate. For example, an insurance company may want to determine whether the actual mean commission payment to its agents differs from the previously planned $32,000 per year. In this situation, the null hypothesis specifies "no change" or "no difference," for example, H0: Mean = 32,000, and the alternative hypothesis is HA: Mean ≠ 32,000. We can reject the null hypothesis if we observe a sample mean either significantly above 32,000 or significantly below 32,000.

Decision Approach or Reporting Approach

There are two ways to summarize the results of a hypothesis test. Using the decision approach, the decision maker must specify a significance level or alpha. Typical significance levels are 10%, 5%, or 1%. This value is the probability in the left tail, right tail, or sum of two tails of the sampling distribution; it determines the region of sample means in which we reject the null hypothesis. In effect, the significance level specifies what the decision maker regards as "close" or "far away" with regard to the null hypothesis. A smaller significance level (for example, 1% instead of 5%) requires the sample mean to be farther away from the hypothesized population mean to reject the null hypothesis. The end result of using this approach is a decision to either reject or not reject the null hypothesis.

The other way to summarize the results of a hypothesis test is to report a p-value (probability value, or prob-value). Using this reporting approach, we do not specify a significance level or make a decision about rejecting the null hypothesis. Instead, we simply report how likely it is that the observed sample result, or a sample result more extreme, could be obtained if the null hypothesis is true. In a left-tail or right-tail test, we report the probability in a single tail; in a two-tail test, we report the probability of obtaining a difference (between the observed sample mean and the hypothesized population mean) in either direction. A small p-value is associated with a more extreme sample result; that is, a sample mean that is significantly different from the hypothesized population mean.
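The two approaches can be illustrated with a short calculation outside Excel. The following Python sketch assumes a large sample, so that the normal distribution applies as discussed in Section 20.1. The sample summary (mean 68, standard deviation 8, sample size 64) is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical sample summary (illustrative values, not from the text)
hypothesized_mean = 70.0   # H0: Mean = 70, HA: Mean < 70
sample_mean = 68.0
sample_sd = 8.0
n = 64

# Standardize the sample mean under the null hypothesis
standard_error = sample_sd / n ** 0.5
z = (sample_mean - hypothesized_mean) / standard_error

# Reporting approach: probability in the left tail
p_left = NormalDist().cdf(z)
# For a two-tail test (HA: Mean not equal to 70), both tails are counted
p_two_tail = 2 * NormalDist().cdf(-abs(z))

# Decision approach: compare the p-value with a chosen significance level
alpha = 0.05
reject_null = p_left < alpha

print(f"z = {z:.2f}, left-tail p = {p_left:.4f}, two-tail p = {p_two_tail:.4f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

With a smaller significance level, such as 1%, the same hypothetical sample would not lead to rejection, which shows how alpha defines "far away" from the hypothesized mean.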



Simple Linear Regression 21

Simple linear regression can be used to determine a straight-line equation describing the average relationship between two variables. Three methods are described in this chapter: the Add Trendline command, the Regression analysis tool, and Excel functions. Before fitting a line, it is important to examine a scatterplot as described in Chapter 19. If the points on the scatterplot fall approximately on a straight line, the methods described in this chapter are appropriate. If the points fall on a curve or have another pattern, consider the nonlinear methods described in Chapter 22.

The data analyzed in this chapter are selling price and living space for 15 real estate properties as shown in Figure 19.2. Because we expect that selling price might depend on square feet of living space, selling price becomes the dependent variable and square feet the explanatory variable. Some call the dependent variable the response variable or the y variable. Similarly, other terms for the explanatory variable are predictor variable, independent variable, or the x variable.

The first step is to examine the relationship between selling price, in thousands of dollars, and living space, in square feet, by constructing a scatterplot. The general approach is to arrange the data so that the x variable for the horizontal axis is in a column on the left and the y variable for the vertical axis is in a column on the right. Then select the data excluding the labels, click the Chart Wizard tool, and follow the steps for an XY (scatter) chart. Details of these steps with subsequent rescaling and formatting are described in Section 19.1. The results are shown in Figure 21.1, where the chart title is Arial 10 bold and the axes and axis titles are Arial 8.


Figure 21.1 Scatterplot before Inserting Trendline

21.1 INSERTING A LINEAR TRENDLINE

The points in Figure 21.1 follow an approximate straight line, so a linear trendline is appropriate. The method of ordinary least squares determines the intercept and slope for the linear trendline such that the sum of the squared vertical distances between the actual y values and the line is as small as possible. Such a line is often called the line of average relationship. The following steps describe inserting a linear trendline on the scatterplot and formatting the results.

1. Select the data series by clicking on one of the data points. The points are highlighted, the name box shows "Series 1," and the formula bar shows that the SERIES is selected.

2. From the Chart menu, choose the Add Trendline command. Alternatively, right-click the data series and choose Add Trendline from the shortcut menu.

3. Click the Type tab of the Add Trendline dialog box, as shown in Figure 21.2.

4. On the Add Trendline Type tab, click the Linear icon. (The nonlinear trend/regression types are described in Chapter 22.)


Figure 21.2 Add Trendline Dialog Box Type Tab

5. Click the Options tab of the Add Trendline dialog box, as shown in Figure 21.3.

6. On the Add Trendline Options tab, select the Automatic: Linear (Series1) button for Trendline Name. Be sure the checkbox for Set Intercept is clear. Click to put checks in the Display Equation on Chart and Display R-squared Value on Chart checkboxes, as shown in Figure 21.3. Then click OK. The trendline, equation, and R² are inserted on the scatterplot as shown in Figure 21.4.

Figure 21.3 Add Trendline Dialog Box Options Tab


Figure 21.4 Initial Trendline on Scatterplot

Trendline Interpretation

We can answer the question "What is the average relationship?" by examining the fitted equation y = 0.021x + 18.789, which may be written as

Predicted Price = 18.789 + 0.021 * SqFt.

The y-intercept or constant term in the equation is 18.789, measured in the same units as the y variable. Naively, the constant term says that a property with zero square feet of living space has a selling price of 18.789 thousands of dollars. However, there are no properties with fewer than 521 square feet in our data, so this constant can be considered a starting point that is relevant for properties with living space between 521 and 1,298 square feet.

The slope or regression coefficient, 0.021, indicates the average change in the y variable for a unit change in the x variable. The measurement units in this example are 0.021 thousands of dollars per square foot, or $21 per square foot. If two properties differ by 100 square feet of living space, we expect the selling prices to differ by 0.021 * 100 = 2.1 thousands of dollars, or $2,100.
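As a minimal sketch of this interpretation, the following Python snippet applies the equation displayed on the chart, using the three-decimal coefficients from the text. The function name predicted_price is a hypothetical helper, not part of Excel.

```python
# Coefficients as displayed on the chart (three decimal places)
INTERCEPT = 18.789   # thousands of dollars
SLOPE = 0.021        # thousands of dollars per square foot

def predicted_price(sqft):
    """Predicted selling price, in thousands of dollars."""
    return INTERCEPT + SLOPE * sqft

# Two properties that differ by 100 square feet of living space
difference = predicted_price(1100) - predicted_price(1000)

print(f"Predicted price at 1,000 sq ft: {predicted_price(1000):.3f}")
print(f"Difference for 100 extra sq ft: {difference:.1f}")
```

The difference depends only on the slope, which is why the intercept drops out when comparing two properties.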

One popular way to answer the question "How good is the relationship?" is to examine the value for R², which measures the proportion of variation in the dependent variable, y, that is explained using the x variable and the regression line. Here the R² value of 0.6637 indicates that approximately 66% of the variation in selling prices can be explained by a linear model using living space. Perhaps the remaining 34% of the variation can be explained using other property characteristics in a multiple regression model.


Trendline Embellishments

If the equation displayed on the chart is used to calculate predicted selling prices, the results may be imprecise because the intercept and slope have only three decimal places. To display more decimal places, double-click the chart to activate it and click on the region containing the equation and R² value to select them for editing. Then click the Increase Decimal tool repeatedly to display more decimal places. The equation values shown in Figure 21.5 were obtained by clicking Increase Decimal twice to change from three decimal places to five. These changes affect both the equation and R² value, and these changes must be made before any other editing.

With the equation and R² value selected, you can move the entire text box by clicking and dragging near the edge of the box, and you can use the regular text editing options for rearranging the text. Figure 21.5 shows the result of such editing; variable names were substituted for x and y, terms were rearranged, and the last three significant figures of R² were deleted. Once you begin any such editing, you are unable to use the Increase Decimal or Decrease Decimal tools to change the displayed precision.

Figure 21.5 Final Trendline on Scatterplot

21.2 REGRESSION ANALYSIS TOOL

The Add Trendline command provides only the fitted line, equation, and R². To obtain additional information for assessing the relationship between the two variables, follow these steps to use the Regression analysis tool.


1. Arrange the data in columns with the x variable on the left and the y variable on the right, as before. Make space for the results of the regression analysis to the right of the data. Allow at least 16 columns. (Delete the scatterplot or move it far to the right.)

2. From the Tools menu, choose the Data Analysis command. In the Data Analysis dialog box, scroll the list box, select Regression, and click OK. The Regression dialog box appears as shown in Figure 21.6.

Figure 21.6 Regression Dialog Box

In the Regression dialog box, move from box to box using the mouse or the tab key. For a box requiring a range, select the box and then select the appropriate range on the worksheet by pointing. To see cells on the worksheet, move the Regression dialog box by clicking on its title bar and dragging, or click the collapse button on the right side of each range edit box. Click the Help button for additional information.

3. Input Y Range: Point to or enter the reference for the range containing values of the dependent variable. Include the label above the data.

4. Input X Range: Point to or enter the reference for the range containing values of the explanatory variable. Include the label above the data.


5. Labels: Select this box, because the labels at the top of the Input Y Range and Input X Range were included in those ranges.

6. Constant is Zero: Select this box only if you want to force the regression line to pass through the origin (0,0).

7. Confidence Level: Excel automatically includes 95% confidence intervals for the regression coefficients. For an additional confidence interval, select this box and enter the level in the Confidence Level box.

8. Output location: Click the Output Range button, click to select the range edit box on its right, and point to or type a reference for the top-left corner cell of a range 16 columns wide where the summary output and charts should appear. Alternatively, click the New Worksheet Ply button if you want the output to appear on a separate sheet and optionally type a name for the new sheet, or click the New Workbook button if you want the output in a separate workbook.

9. Residuals: Select this box to obtain the fitted values (predicted y) and residuals.

10. Residual Plots: Select this box to obtain charts of residuals versus each x variable.

11. Standardized Residuals: Select this box to obtain standardized residuals (each residual divided by the standard deviation of the residuals). This output makes it easy to identify outliers.

12. Line Fit Plots: Select this box to obtain an XY (scatter) chart of the y input data and fitted y values versus the x variable. This chart is similar to the scatterplot with an inserted trendline shown in Figure 21.4.

13. Normal Probability Plots: This option is not implemented properly, so don't check this box.

14. After selecting all options and pointing to or typing references, click OK. (If you receive the error message "Cannot add chart to a shared workbook," click the OK button. Then click New Workbook under Output in the Regression dialog box. If desired, use the Edit | Move or Copy Sheet command to copy the results back to the original workbook.) The summary output and charts appear.

15. Optional: To change column widths so that all summary output is visible, make a nonadjacent selection. First select the cell containing the Adjusted R Square label (D6). Hold down the Control key while clicking the following cells: Significance F (I11), Coefficients (E16), Standard Error (F16), and Upper 95% (J16). From the Format menu, choose Column | AutoFit Selection. The formatted summary output is shown in Figure 21.7.


Figure 21.7 Regression Tool Summary Output

16. Optional: The residual output appears below the summary output. To relocate the residuals to facilitate comparisons, select columns C:E and choose Insert from the shortcut menu. Select the residual output (H24:J39), including the row of labels but excluding the Observation numbers, and choose Cut or Copy from the shortcut menu. Select cell C1 and choose Paste from the shortcut menu. Adjust the widths of columns C:E and decrease the decimals displayed in cells C2:E16 to obtain the results shown in Figure 21.8.

Figure 21.8 Relocated Residual Output


Regression Interpretation

The intercept and slope of the fitted regression line are in the lower-left section labeled "Coefficients" of the summary output in Figure 21.7. The Intercept coefficient 18.7894675 is the constant term in the linear regression equation, and the SqFt coefficient 0.02101025 is the slope. The regression equation is

Predicted Price = 18.7894675 + 0.02101025 * SqFt.

For an explanation of the intercept and slope, refer to Trendline Interpretation, Section 21.1.

In the residual output shown in Figure 21.8, the predicted prices, sometimes termed the fitted values, are the result of estimating the selling price of each property using this regression equation. The residuals are the difference between the actual and fitted values. For example, the first property has 521 square feet. On the average, we would expect this property to have a selling price of $29,736, but its actual selling price is $26,000. The residual for this property is $26,000 – $29,736—that is, –$3,736. Its actual selling price is $3,736 below what is expected. The residuals are also termed deviations or errors.
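The arithmetic for the first property can be reproduced with the coefficients reported in the summary output. This Python sketch is only a check of the numbers in the text; the variable names are illustrative.

```python
# Coefficients from the Regression tool summary output (Figure 21.7)
intercept = 18.7894675   # thousands of dollars
slope = 0.02101025       # thousands of dollars per square foot

# First property in the data set
sqft = 521
actual_price = 26.0      # thousands of dollars

fitted = intercept + slope * sqft    # predicted (fitted) price
residual = actual_price - fitted     # actual minus fitted

print(f"Fitted price: ${fitted * 1000:,.0f}")
print(f"Residual:     {residual * 1000:,.0f} dollars")
```

A negative residual means the actual selling price lies below the regression line.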

The four most common measures to answer the question "How good is the relationship?" are the standard error, R², t statistics, and analysis of variance. The standard error, 3.23777441, shown in cell E7 of Figure 21.7, is expressed in the same units as the dependent variable, selling price. As the standard deviation of the residuals, it measures the scatter of the actual selling prices around the regression line. Because selling price is in thousands of dollars, the standard error is approximately $3,238. The standard error is often called the standard error of the estimate.

R square, shown in cell E5 of Figure 21.7, measures the proportion of variation in the dependent variable that is explained using the regression line. This proportion must be a number between zero and one, and it is often expressed as a percentage. Here approximately 66% of the variation in selling prices is explained using living space as a predictor in a linear equation. Adjusted R square, shown in cell E6, is useful for comparing this model with other models using additional explanatory variables.

The t statistics, shown in cells G17:G18 of Figure 21.7, are part of individual hypothesis tests of the regression coefficients. For example, these 15 properties could be treated as a sample from a larger population. The null hypothesis is that there is no relationship: the population regression coefficient for living space is zero, implying that differences in living space don't affect selling price. With a sample regression coefficient of 0.02101025 and a standard error of the coefficient (an estimate of the sampling error) of 0.004148397, the coefficient is 5.064667 standard errors from zero. The two-tail p-value, 0.000217, shown in cell H18, is the probability of obtaining these results, or something more extreme, assuming the null hypothesis is true. Therefore, we reject the null hypothesis and conclude there is a significant relationship between selling price and living space.
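The t statistic and its p-value can be checked outside Excel. The sketch below divides the coefficient by its standard error and then approximates the two-tail probability by numerically integrating the t density with Simpson's rule. It assumes n − 2 = 13 degrees of freedom, which is standard for simple regression with 15 observations. The function names are hypothetical, and the integration is an approximation, not the method Excel uses.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tail_p_value(t_stat, df, steps=20000):
    """Approximate two-tail p-value using Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3     # P(0 < T < |t|)
    return 1 - 2 * central_area      # area in both tails

coefficient = 0.02101025     # SqFt coefficient from Figure 21.7
std_error = 0.004148397      # standard error of the coefficient
df = 15 - 2                  # assumed: n - 2 for simple regression

t_stat = coefficient / std_error
p_value = two_tail_p_value(t_stat, df)

print(f"t = {t_stat:.6f}, two-tail p-value = {p_value:.6f}")
```

The result should be close to the p-value reported in cell H18, with small differences due to numerical integration.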


The analysis of variance table, shown in cells D10:I14 of Figure 21.7, is a test of the overall fit of the regression equation. Because it summarizes a test of the null hypothesis that all regression coefficients are zero, it will be discussed in Chapter 23 with multiple regression.

Regression Charts

For simple linear regression the analysis tool provides two charts: residual plot and line fit plot. These charts are embedded near the top of the worksheet to the right of the summary output. In the real estate properties example, the charts are originally located in cells M1:S12; after relocating the residuals, the charts are in cells P1:V12.

Figure 21.9 Initial Line Fit Plot

The line fit plot is shown in Figure 21.9. This chart is similar to the scatterplot with inserted trendline, except that the predicted values in this chart are markers without a line. The following steps describe how to format the line fit plot.

1. Select the data series for Predicted Price by clicking one of the square markers that are in a straight line. (Alternatively, select any chart object and use the up and down arrow keys to make the selection.) The points are highlighted and "=SERIES("Predicted Price",...)" appears in the formula bar. Right-click, choose Format Data Series from the shortcut menu, and click the Patterns tab. Select Automatic for Line and select None for Marker. Then click OK.

2. Select the x-axis by clicking on the horizontal line at the bottom of the plot area. A square handle appears at each end of the x-axis. Right-click, choose Format Axis from the shortcut menu, and click the Scale tab. Clear the Auto checkbox for Minimum and type 400 in its edit box; clear the Auto checkbox for Maximum and type 1400 in its edit box; clear the Auto checkbox for Major Unit and type 200 in its edit box. Then click OK.

3. Select the y-axis. Right-click, choose Format Axis from the shortcut menu, and click the Scale tab. Clear the Auto checkbox for Minimum and type 20 in its edit box; clear the Auto checkbox for Maximum and type 50 in its edit box; clear the Auto checkbox for Major Unit and type 10 in its edit box. Click the Number tab, select Number in the Category list box, and click the Decimal Places spinner control to select 0. Then click OK.

4. Optional: To obtain the appearance shown in Figure 21.10, select and enter more descriptive text for the chart title, x-axis title, and y-axis title. Resize the chart so that it is approximately 7 columns wide and 14 rows high. Select the chart title and choose Arial 10 bold from the formatting toolbar. For the legend, axes and axis titles, select each object and choose Arial 8.

Figure 21.10 Final Line Fit Plot

The residual plot (after resizing to approximately 6 columns by 14 rows) is shown in Figure 21.11. This type of chart is useful for determining whether the functional form of the fitted line is appropriate. If the residual plot is a random pattern, the linear fitted line is satisfactory; if the residual plot shows a pattern, additional modeling may be needed. When there is only one x variable (simple regression), the residual plot provides a view that is similar to making the fitted line in Figure 21.10 horizontal. When there are several x variables (multiple regression), the residual plot is an even more valuable tool for checking model adequacy, because there is usually no way to view the fitted equation in three or more dimensions.


Figure 21.11 Regression Tool Residual Plot

21.3 REGRESSION FUNCTIONS

A third method for obtaining regression results is worksheet functions. Five functions described here are appropriate for simple regression (one x variable), and four of these have identical syntax for their arguments. For example, the syntax for the INTERCEPT function is

INTERCEPT(known_y's,known_x's).

The same syntax applies to the SLOPE, RSQ (R square), and STEYX (standard error of estimate) functions. These four functions are entered in cells H2:H5 of Figure 21.12, and the values returned by these functions are shown in cells F2:F5.
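For readers who want to see what these four functions compute, the following Python sketch implements the usual least-squares formulas. The function regression_summary and the five-point data set are hypothetical; the real estate data from Figure 19.2 are not reproduced here.

```python
import math

def regression_summary(known_ys, known_xs):
    """Least-squares counterparts of INTERCEPT, SLOPE, RSQ, and STEYX."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    syy = sum((y - mean_y) ** 2 for y in known_ys)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rsq = sxy ** 2 / (sxx * syy)              # proportion of variation explained
    sse = syy - slope * sxy                   # sum of squared residuals
    steyx = math.sqrt(sse / (n - 2))          # standard error of estimate
    return {"intercept": intercept, "slope": slope, "rsq": rsq, "steyx": steyx}

# Hypothetical data, used only to exercise the formulas
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
summary = regression_summary(ys, xs)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

The argument order (y values first, then x values) follows the Excel syntax shown above.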

To prepare Figure 21.12, the function results in column H are copied to the clipboard (Edit | Copy), and the values are pasted into column F (Edit | Paste Special | Values). The formulas are displayed in column H by choosing Options from the Tools menu, clicking the View tab, and checking the Formulas checkbox in the Window Option section.

Cells H9 and H11 show two methods for obtaining a predicted selling price for a property with 1,000 square feet of living space. If the intercept and slope of the regression equation have already been calculated, the formula "= intercept + slope * x" can be entered into a cell (H9) using appropriate cell references. Here the predicted selling price is 39.7997169881321, in thousands of dollars, or approximately $39,800.

Another method for obtaining a predicted value based on simple linear regression is the FORECAST function, with syntax

FORECAST(x,known_y's,known_x's).


This method, shown in cell H11, calculates the intercept and slope using least squares and returns the predicted value of y for the specified value of x.
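A rough Python counterpart of this calculation is sketched below. The function forecast mirrors the argument order of the Excel function but is a hypothetical helper, and the data are made up for illustration.

```python
def forecast(x, known_ys, known_xs):
    """Fit a least-squares line and return the predicted y for one x value."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x

# Hypothetical data, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
prediction = forecast(6, ys, xs)
print(f"Predicted y at x = 6: {prediction:.4f}")
```

Both methods give the same prediction; the function simply avoids calculating the intercept and slope in separate cells.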

Figure 21.12 Regression Using Functions

Yet another method for obtaining predicted y values is the TREND function, which has the following syntax:

TREND(known_y's,known_x's,new_x's,const)

This function, unlike the FORECAST function, can also be used for multiple regression (two or more x variables). Because the TREND function is an array function, it must be entered in a special way, as described in the following steps.

1. Enter the data for the x and y variables (A2:B16) and values of the x variable (D13:D16) for which predicted y values will be calculated.

2. Select a range where the predicted y values are to appear (H13:H16).

3. From the Insert menu, choose the Function command. Alternatively, click the Insert Function button (icon fx). In the Insert Function dialog box, select Statistical in the category list box and select TREND in the function list box. Then click OK.

4. In the TREND dialog box, type or point (click and drag) to ranges on the worksheet containing the known y values (B2:B16), known x values (A2:A16), and new x values (D13:D16). Do not include the labels in row 1 in these ranges. In the edit box labeled "Const," type the integer 1, which is interpreted as true, indicating that an intercept term is desired. Then click OK.


5. With the function cells (H13:H16) still selected, press the F2 key (for editing). The word "Edit" appears in the status bar at the bottom of the screen. Hold down the Control and Shift keys and press Enter. The formula bar shows curly brackets around the TREND function, indicating that the array function has been entered correctly.
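To make the role of the arguments concrete, here is a minimal Python sketch of what TREND returns for one x variable. The function trend is a hypothetical stand-in that handles only simple regression, and the data are illustrative. When const is false, the line is forced through the origin.

```python
def trend(known_ys, known_xs, new_xs, const=True):
    """Predicted y values for new_xs from a least-squares line (one x variable)."""
    n = len(known_xs)
    if const:
        mean_x = sum(known_xs) / n
        mean_y = sum(known_ys) / n
        sxx = sum((x - mean_x) ** 2 for x in known_xs)
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(known_xs, known_ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
    else:
        # No intercept term: fit y = slope * x through the origin
        slope = (sum(x * y for x, y in zip(known_xs, known_ys))
                 / sum(x * x for x in known_xs))
        intercept = 0.0
    return [intercept + slope * x for x in new_xs]

# Hypothetical data, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
with_intercept = trend(ys, xs, [6, 7], const=True)
through_origin = trend(ys, xs, [6, 7], const=False)
print(with_intercept, through_origin)
```

Returning one value per new x is the reason the Excel function must be array-entered into a range of the same size.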

A companion function, LINEST, provides regression coefficients, standard errors, and other summary measures. Like TREND, this function can be used for multiple regression (two or more x variables) and must be array-entered. Its syntax is

LINEST(known_y's,known_x's,const,stats).

The "const" and "stats" arguments are true-or-false values, where "const" specifies whether the fitted equation has an intercept term and "stats" indicates whether summary statistics are desired.

To obtain the results shown in Figure 21.13, select D1:E5, type or use the Insert Function tool to enter LINEST, press F2, and finally hold down the Control and Shift keys while you press Enter. Cells D7:E11 show the numerical results that appear in cells D1:E5, and cells D13:E17 describe the contents of those cells. These same values appear with labels in the Regression analysis tool summary output shown in Figure 21.7.

Figure 21.13 Regression Using LINEST
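The layout of the five-row, two-column LINEST result for one x variable can be sketched in Python as follows. The function linest_simple is a hypothetical helper that covers only simple regression with an intercept, and the data are illustrative.

```python
import math

def linest_simple(known_ys, known_xs):
    """Return a 5 x 2 table laid out like LINEST output for one x variable."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    syy = sum((y - mean_y) ** 2 for y in known_ys)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = syy - slope * sxy          # residual sum of squares
    ss_reg = syy - ss_resid               # regression sum of squares
    df = n - 2                            # residual degrees of freedom
    se_y = math.sqrt(ss_resid / df)       # standard error of estimate
    se_slope = se_y / math.sqrt(sxx)
    se_intercept = se_y * math.sqrt(1 / n + mean_x ** 2 / sxx)
    r_square = ss_reg / syy
    f_stat = ss_reg / (ss_resid / df)
    return [
        [slope, intercept],
        [se_slope, se_intercept],
        [r_square, se_y],
        [f_stat, df],
        [ss_reg, ss_resid],
    ]

# Hypothetical data, used only for illustration
table = linest_simple([2, 4, 5, 4, 5], [1, 2, 3, 4, 5])
for row in table:
    print(row)
```

Note that the slope appears before the intercept in the first row, which is the reverse of the order in the Regression tool summary output.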


EXERCISES

Exercise 21.1 Refer to the data on vacancy percentages and monthly rents for ten cities in Exercise 19.1.

1. Prepare a scatterplot and insert a linear trendline.

2. Use the Regression analysis tool to obtain complete diagnostics.

3. Make a prediction of vacancy percentage for a city where monthly rent per square foot is $3.50.

Exercise 21.2 Refer to the data on study hours and test grades for 20 students in Exercise 19.3.

1. Prepare a scatterplot and insert a linear trendline.

2. Use the Regression analysis tool to obtain complete diagnostics.


3. Make a prediction of test grade for a student who studies ten hours.

4. Student 7 studied ten hours and received a test grade of 63. Taking into account the number of study hours, is this test grade below average, average, or above average?

Simple Nonlinear Regression 22

This chapter describes four methods for modeling a nonlinear relationship between two variables: polynomial, logarithm, power, and exponential. For each functional form, I describe both inserting a trendline on a scatterplot and using the Regression analysis tool on transformed variables to obtain additional summary measures and diagnostics. For an exponential relationship, I also describe using the LOGEST function to obtain similar results.

It is important to examine a scatterplot as an aid to selecting the appropriate nonlinear form. Figure 22.1 shows four single-bulge nonlinear patterns that might be observed on a scatterplot. Each panel has a label indicating the direction of the bulge, and the direction may be used to determine an appropriate nonlinear form.

Figure 22.1 Single-Bulge Nonlinear Patterns


For example, the upper-left panel shows data where the bulge points toward the northwest (NW). The power (for x > 1) and logarithmic functions are appropriate for this pattern. The lower-left panel shows data with a bulge toward the southwest (SW), in which case the power, logarithmic, or exponential functions are candidates. And the lower-right panel shows data with a bulge toward the southeast (SE), where the power (for x > 1) and exponential functions are appropriate. In addition, all four data patterns may be modeled using a quadratic function (polynomial of order 2).

If the pattern of the data on a scatterplot doesn't fit any of the single-bulge examples shown in Figure 22.1, some other functional form may be needed. For example, if the data have two bulges (an S shape), a cubic function (polynomial of order 3) may be appropriate.

The general approach for inserting a nonlinear trendline is as follows. First, construct the scatterplot. (Arrange the data on a worksheet with the x data in a column on the left and the y data in a column on the right. Select both the x and y data and use the Chart Wizard to construct the XY chart.) Second, click a data point on the chart to select the data series, and choose Add Trendline from the Chart menu; alternatively, right-click the data series and choose Add Trendline from the shortcut menu. The upper portion of the Add Trendline dialog box Type tab is shown in Figure 22.2.

Figure 22.2 Add Trendline Dialog Box Type Tab

To obtain the trendline results shown in this chapter, select the appropriate type (polynomial, logarithmic, power, or exponential) and in the Options tab select the checkboxes for Display Equation on Chart and Display R-squared Value on Chart.

The first example is the real estate property data set described in Chapter 19. The dependent variable is selling price, in thousands of dollars, and the explanatory variable is living space, in square feet. Details for constructing the scatterplot are described in Chapter 19, and steps for inserting a linear trendline are in Chapter 21.


In the residual plot of real estate property data—shown in Figure 21.11—the first two properties with low square footage and the last two or three properties with high square footage have negative residuals. This observation is some indication that a nonlinear fit may be more appropriate. Although the curvature is minimal, the scatterplot shows a slight bulge pointing toward the northwest (NW). Thus, the quadratic (polynomial of order 2), power, and logarithmic functions are candidates.

22.1 POLYNOMIAL

Figure 22.3 shows the results for a quadratic fit (polynomial of order 2). The R² value of 68% is only slightly better than the value of 66% obtained with the linear fit described in Chapter 21.

Figure 22.3 Polynomial Trendline

The following steps describe how to obtain more complete regression results using the quadratic model.

1. Enter the data into columns A and C as shown in Figure 22.4. If the SqFt and Price data are already in columns A and B, select column B and choose Insert from the shortcut menu. Enter the label SqFt^2 in cell B1.

2. Select cell B2 and enter the formula =A2^2. To copy the formula to the other cells in column B, select cell B2 and double-click the fill handle in its lower-right corner. The squared values appear in column B.


3. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, scroll the list box, select Regression, and click OK. The Regression dialog box appears.

4. Input Y Range: Point to or enter the reference for the range containing values of the dependent variable (C1:C16), including the label in row 1.

5. Input X Range: Point to or enter the reference for the range containing values of the explanatory variables (A1:B16), including the labels in row 1.

6. Labels: Select this box, because labels were included in the Input X and Y Ranges.

7. Do not select the checkboxes for Constant is Zero or Confidence Level.

8. Output options: Click the Output Range option button, select the edit box to the right, and point to or enter a reference for the top-left corner cell of a range 16 columns wide where the summary output and charts should appear (E1). If desired, check the appropriate boxes for Residuals. Then click OK.

Figure 22.4 shows the regression output after deleting the ANOVA portion (by selecting E10:M14 and choosing Delete | Shift Cells Up from the shortcut menu). Compared to the linear model in Chapter 21, this quadratic model has a slightly larger standard error and a smaller adjusted R²; using these criteria, the quadratic model is not really better than the linear one.

Figure 22.4 Polynomial Regression Results

To make a prediction of average selling price using the quadratic model, enter the SqFt value in a cell (A17, for example) and a formula for SqFt^2 (=A17^2 in cell B17). Then build a formula for predicted price (=F12+F13*A17+F14*B17 in cell C17). Chapter 23 discusses interpretation of multiple regression output and other methods for making predictions.

The quadratic model, using x and x^2 as explanatory variables, can be used to fit a wide variety of single-bulge data patterns. If a scatterplot shows data with two bulges (an S shape) like the Polynomial icon shown in Figure 22.2, a cubic model may be appropriate. The Add Trendline feature may give erroneous results for a polynomial of order 3, so an alternative is to use the Regression tool using x, x^2, and x^3 as explanatory variables.
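As a cross-check outside Excel, the following Python sketch fits the quadratic model by ordinary least squares with x and x^2 as the two explanatory variables. The data values and the helper names solve and least_squares are hypothetical, chosen so that the true coefficients are known; they are not the real estate data of Figure 22.4, and the code is not Excel's own method.

```python
def solve(a, rhs):
    """Solve the square system a * coef = rhs by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, ys):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data lying exactly on y = 2 + 3x - 0.5x^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2 + 3 * x - 0.5 * x ** 2 for x in xs]

# Two explanatory columns, x and x^2, as in columns A and B of the worksheet.
coef = least_squares([[x, x ** 2] for x in xs], ys)
prediction = coef[0] + coef[1] * 4.5 + coef[2] * 4.5 ** 2
```

Adding x ** 3 as a third entry in each row gives the cubic model in the same way.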

22.2 LOGARITHMIC

The logarithmic model creates a trendline using the equation

y = c * Ln(x) + b

where Ln is the natural log function with base e (approximately 2.718). Because the log function is defined only for positive values of x, the values of the explanatory variable in your data set must be positive. If any x values are zero or negative, the Logarithmic icon on the Add Trendline Type tab will be grayed out. (As a workaround, you can add a constant to each x value.) The results of adding a logarithmic trendline to the scatterplot of real estate property data are shown in Figure 22.5.

Figure 22.5 Logarithmic Trendline

The following steps describe how to use the Regression analysis tool to obtain more complete regression results using the logarithmic model.


1. Enter the data into columns A and C as shown in Figure 22.6. If the SqFt and Price data are already in columns A and B, select column B and choose Insert from the shortcut menu. Enter the label Ln(SqFt) in cell B1.

2. Select cell B2 and enter the formula =LN(A2). To copy the formula to the other cells in column B, select cell B2 and double-click the fill handle in its lower-right corner. The log values appear in column B.

3. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, scroll the list box, select Regression, and click OK. The Regression dialog box appears.

4. Input Y Range: Point to or enter the reference for the range containing values of the dependent variable (C1:C16), including the label in row 1.

5. Input X Range: Point to or enter the reference for the range containing values of the explanatory variable (B1:B16), including the label in row 1.

6. Labels: Select this box, because labels were included in the Input X and Y Ranges.

7. Do not select the checkboxes for Constant is Zero or Confidence Level.


8. Output options: Click the Output Range option button, select the text box to the right, and point to or enter a reference for the top-left corner cell of a range 16 columns wide where the summary output and charts should appear (E1). If desired, check the appropriate boxes for Residuals. Then click OK.

Figure 22.6 shows the regression output after deleting the ANOVA portion (by selecting E10:M14 and choosing Delete | Shift Cells Up from the shortcut menu). Compared with the linear model in Chapter 21, this logarithmic model has a smaller standard error and a higher adjusted R^2; using these criteria, the logarithmic model is somewhat better than the linear one.


Figure 22.6 Logarithmic Regression Results

To make a prediction of average selling price using the logarithmic model, enter the SqFt value in a cell (A17, for example) and a formula for Ln(SqFt) (=LN(A17) in cell B17). Then build a formula for predicted price (=F12+F13*B17 in cell C17).
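The same fit and prediction can be sketched in Python as a check on the worksheet formulas. This is a minimal sketch, assuming hypothetical data generated from a known curve; the function names fit_line and fit_log_model are illustrative and do not come from Excel.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log_model(x, y):
    """Fit y = b + c * ln(x); every x value must be positive."""
    if min(x) <= 0:
        raise ValueError("the logarithmic model needs positive x values")
    b, c = fit_line([math.log(v) for v in x], y)
    return b, c


# Hypothetical data lying exactly on y = 10 + 4 * ln(x).
sqft = [500.0, 750.0, 1000.0, 1250.0, 1500.0]
price = [10.0 + 4.0 * math.log(v) for v in sqft]

b, c = fit_log_model(sqft, price)
predicted = b + c * math.log(900.0)  # same role as =F12+F13*B17
```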

22.3 POWER

The power model creates a trendline using the equation

y = c * x^b.

Excel uses a log transformation of the original x and y data to determine fitted values, so the values of both the dependent and explanatory variables in your data set must be positive. If any y or x values are zero or negative, the Power icon on the Add Trendline Type tab will be grayed out. (As a workaround, you can add a constant to each y and x value.) The results of adding a power trendline to the scatterplot of real estate property data are shown in Figure 22.7.

The power trendline feature does not find values of b and c that minimize the sum of squared deviations between actual y and predicted y (= c * x^b). Instead, Excel's method takes the logarithm of both sides of the power formula, which then can be written as

Ln(y) = Ln(c) + b * Ln(x),

and uses standard linear regression with Ln(y) as the dependent variable and Ln(x) as the explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of squared deviations between actual Ln(y) and predicted Ln(y), using the formula

Ln(y) = Intercept + Slope * Ln(x).


Therefore, the Intercept value corresponds to Ln(c), and c in the power formula is equal to Exp(Intercept). The Slope value corresponds to b in the power formula.
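The log-log procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions: the data are hypothetical and generated from a known power curve, and the names fit_line and fit_power_model are illustrative only.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_model(x, y):
    """Fit y = c * x**b by regressing ln(y) on ln(x)."""
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("the power model needs positive x and y values")
    intercept, slope = fit_line([math.log(v) for v in x],
                                [math.log(v) for v in y])
    # Intercept is Ln(c), so c = Exp(Intercept); Slope is b.
    return math.exp(intercept), slope


# Hypothetical data lying exactly on y = 3 * x**0.5.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * v ** 0.5 for v in x]

c, b = fit_power_model(x, y)
predicted = c * 9.0 ** b  # prediction at x = 9
```

Because the fit minimizes squared deviations in Ln(y), it generally differs from a fit that minimizes squared deviations in y itself; the two agree here only because the hypothetical data contain no scatter.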

Figure 22.7 Power Trendline

The following steps describe how to use the Regression analysis tool on the transformed data to obtain regression results for the power model.

1. Enter the data into columns A and B as shown in Figure 22.8.

2. Enter the label Ln(SqFt) in cell C1. Select cell C2 and enter the formula =LN(A2).

3. Enter the label Ln(Price) in cell D1. Select cell D2 and enter the formula =LN(B2).

4. To copy the formulas to the other cells, select cells C2 and D2, and double-click the fill handle in the lower-right corner of cell D2. The log values appear in columns C and D.

5. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, select Regression and click OK. The Regression dialog box appears.

6. Input Y Range: Point to or enter the reference for the range containing values of the dependent variable (D1:D16), including the label in row 1.

7. Input X Range: Point to or enter the reference for the range containing values of the explanatory variable (C1:C16), including the label in row 1.

8. Labels: Select this box, because labels are included in the Input X and Y Ranges.

9. Do not select the checkboxes for Constant is Zero or Confidence Level.



10. Output options: Click the Output Range option button, select the text box to the right, and point to or enter a reference for the top-left corner cell of a range 16 columns wide where the summary output and charts should appear (F1). If desired, check the appropriate boxes for Residuals. Then click OK.

Figure 22.8 shows the regression output after deleting the ANOVA portion (by selecting F10:N14 and choosing Delete | Shift Cells Up from the shortcut menu). The R Square and Standard Error values cannot be compared directly with the linear model in Chapter 21. Here, R Square is the proportion of variation in Ln(y) explained by Ln(x) in a linear model, and the Standard Error is expressed in the same units of measurement as Ln(y).

Figure 22.8 Power Regression Results

To determine the value of c for the power formula, select cell G14 and enter the formula =EXP(G12). To make a prediction of average selling price using the power model, enter the SqFt value in a cell (A17, for example). Then build a formula for predicted price (=G14*A17^G13 in cell B17).

22.4 EXPONENTIAL

The exponential model creates a trendline using the equation

y = c * e^(b * x).

Excel uses a log transformation of the original y data to determine fitted values, so the values of the dependent variable in your data set must be positive. If any y values are zero or negative, the Exponential icon on the Add Trendline Type tab will be grayed out. (As a workaround, you can add a constant to each y value.)

This function may be used to model exponentially increasing growth. The data shown in Figure 22.9 are an example of such a pattern.

Figure 22.9 Annual Sales Data

Time series data are often displayed using an Excel line chart instead of an XY (scatter) chart. The following steps describe how to construct the line chart with an exponential trendline shown in Figure 22.10.

1. Enter the year and sales data as shown in Figure 22.9.

2. Select the sales data (B2:B9) and click the Chart Wizard button.

3. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select "Line with markers displayed at each data value." Click Next. In step 2 (Chart Source Data) on the Series tab, select the range edit box for Category (X) Axis Labels, and click and drag A2:A9 on the worksheet. Click Next. In step 3 (Chart Options) on the Titles tab, type the chart and axis labels shown in Figure 22.10; on the Legend tab, clear the checkbox for Show Legend. Click Finish.

4. Click one of the data points of the chart to select the data series. Right-click and choose Add Trendline from the shortcut menu. On the Type tab, click the Exponential icon. On the Options tab, click Display Equation on Chart and click Display R-squared Value on Chart. Then click OK.

Because this is a line chart instead of an XY (scatter) chart, Excel does not use the Year data in column A for fitting the exponential function. The Year data are used only as labels for the x-axis, but the values used for x in the exponential function are the numbers 1 through 8.

The exponential trendline feature does not find values of b and c that minimize the sum of squared deviations between actual y and predicted y (= c * e^(b * x)). Instead, Excel's method takes the logarithm of both sides of the exponential formula, which then can be written as

Ln(y) = Ln(c) + b * x

and uses standard linear regression with Ln(y) as the dependent variable and x as the explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of squared deviations between actual Ln(y) and predicted Ln(y), using the formula

Ln(y) = Intercept + Slope * x.

Therefore, the Intercept value corresponds to Ln(c), and c in the exponential formula is equal to Exp(Intercept). The Slope value corresponds to b in the exponential formula.
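The procedure can be sketched in Python, using x = 1, 2, ..., n as the line chart does. The sales figures and the names fit_line and fit_exponential are hypothetical; the figures are generated from a known curve rather than taken from Figure 22.9.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_exponential(ys):
    """Fit y = c * exp(b * x) with x = 1..n by regressing ln(y) on x."""
    if min(ys) <= 0:
        raise ValueError("the exponential model needs positive y values")
    xs = list(range(1, len(ys) + 1))
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    # Intercept is Ln(c), so c = Exp(Intercept); Slope is b.
    return math.exp(intercept), slope


# Hypothetical sales for 8 periods, lying exactly on y = 100 * exp(0.2 * x).
sales = [100.0 * math.exp(0.2 * x) for x in range(1, 9)]

c, b = fit_exponential(sales)
forecast = c * math.exp(b * 9)  # prediction for period 9
```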

Figure 22.10 Exponential Trendline

The following steps describe how to use the Regression analysis tool on the transformed data to obtain regression results for the exponential model.

1. Enter the data into columns A, B, and C as shown in Figure 22.11. If the Year and Sales data are already in columns A and B as shown in Figure 22.9, select column B, choose Insert from the shortcut menu, and enter the label X and integers 1 through 8 in column B.

2. Enter the label Ln(Sales) in cell D1. Enter the formula =LN(C2) in cell D2.

3. To copy the formula, select cell D2 and double-click the fill handle in its lower-right corner. The log values appear in column D.

4. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, select Regression and click OK. The Regression dialog box appears.


5. Input Y Range: Point to or enter the reference for the range containing values of the dependent variable (D1:D9), including the label in row 1.

6. Input X Range: Point to or enter the reference for the range containing values of the explanatory variable (B1:B9), including the label in row 1.

7. Labels: Labels were included in the Input X and Y Ranges, so select this box.

8. Do not select the checkboxes for Constant is Zero or Confidence Level.


9. Output options: Click the Output Range option button, select the range edit box to the right, and point to or enter a reference for the top-left corner cell of a range 16 columns wide where the summary output and charts should appear (F1). If desired, check the appropriate boxes for Residuals. Then click OK.

Figure 22.11 shows the regression output after deleting the ANOVA portion (by selecting F10:N14 and choosing Delete | Shift Cells Up from the shortcut menu). The R Square and Standard Error values cannot be compared directly with the linear model in Chapter 21. Here, R square is the proportion of variation in Ln(y) explained by x in a linear model, and the standard error is expressed in the same units of measurement as Ln(y).

Figure 22.11 Exponential Regression Results

To determine the value of c for the exponential formula, select cell G14, and enter the formula =EXP(G12). To make a prediction of average sales using the exponential model, enter the x value in a cell (9 in cell B10, for example). Then build a formula for predicted sales (=G14*EXP(G13*B10) in cell C10).

An alternative method for obtaining exponential regression results is to use the LOGEST and GROWTH worksheet functions. The descriptions of these functions in Excel's on-line help use the equation

y = b * m^x.

This b value corresponds to c in the trendline exponential equation, and this m corresponds to e^b.
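The correspondence between the two forms can be checked numerically. In this Python sketch the parameter values and the function name trendline_to_logest are hypothetical; the code only converts between the two notations and does not call Excel.

```python
import math


def trendline_to_logest(c, b):
    """Convert trendline parameters (c, b) to the (b, m) of the LOGEST form."""
    return c, math.exp(b)  # LOGEST b equals c; LOGEST m equals e^b


c, b = 100.0, 0.2  # hypothetical trendline parameters
logest_b, logest_m = trendline_to_logest(c, b)

# Both forms give the same fitted value for every x.
same = all(
    math.isclose(c * math.exp(b * x), logest_b * logest_m ** x)
    for x in range(1, 9)
)
```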

LOGEST provides regression coefficients, standard errors, and other summary measures. This function can be used for multiple regression (two or more x variables) and must be array-entered. Its syntax is

LOGEST(known_y's,known_x's,const,stats).

The "const" and "stats" arguments are true-or-false values: "const" specifies whether b is calculated normally (true) or forced to equal one (false), and "stats" indicates whether summary statistics are desired.

To obtain the results shown in Figure 22.12, select E1:F5, type or use the Insert Function button to enter LOGEST, press F2, and finally hold down the Control and Shift keys while you press Enter. Cells E7:F11 show the numerical results that appear in cells E1:F5, and cells E13:F17 describe the contents of those cells. These same values, except m, appear with labels in the Regression analysis tool summary output shown in Figure 22.11.

Figure 22.12 Regression Using LOGEST

The GROWTH function is similar to the TREND function, except that it returns fitted values for the exponential equation instead of the linear equation. GROWTH can also be used for multiple regression (two or more x variables) and must be array-entered.


EXERCISES

Exercise 22.1 Seven identical automobiles were driven by employees for business purposes for several days. The drivers reported average speed, in miles per hour, and gas mileage, in miles per gallon, as shown in the following table.

Speed    Gas Mileage
MPH      MPG
32       20
37       23
44       26
49       27
56       26
62       25
68       22

1. Prepare a scatterplot and insert a quadratic trendline.


3. Make a prediction of gas mileage for an automobile driven at an average speed of 50 miles per hour.

Exercise 22.2 A chain store tried different prices for a television set in five retail markets during a four-week period. The following table shows the retail prices and sales rates, in units sold per thousand of residents in the market.

Price    Sales Rate
$275     1.60
$300     0.95
$325     0.65
$350     0.50
$375     0.45

1. Prepare a scatterplot and insert an appropriate trendline.


3. Make a prediction of sales rate for a market where the price is $295.

Chapter 23 Multiple Regression

In Chapter 21, a simple linear regression model examined the relationship between selling price and living space for 15 real estate properties. The standard error was $3,328, and R square was 0.664, indicating 66% of the variation in selling prices could be explained using living space as the explanatory variable in a linear model.

More of the variation in selling prices might be explained by using an additional variable. Data on the most recent assessed value (for property tax purposes) are also available; perhaps selling price is related to assessed value. Multiple regression can examine the relationship between selling price and two explanatory variables, living space and assessed value. (The pairwise correlations among these three variables were examined in Chapter 19.) The following steps describe how to use the Regression analysis tool for multiple regression.

1. Arrange the data in columns with the two explanatory variables in columns on the left and the dependent variable in a column on the right. The two (or more) explanatory variables must be in adjacent columns. If the data from Chapter 21 (or Example 19.1) are in columns A and B, insert a new column B and enter the new data for assessed value as shown in Figure 23.1.

2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, scroll the list box, select Regression, and choose OK.

3. Input Y Range: Point to or enter the reference for the range containing values of the dependent variable (selling prices, C1:C16). Include the label above the data.

4. Input X Range: Point to or enter the reference for the range containing values of the two explanatory variables (SqFt and Assessed, A1:B16). Include the labels above the data.

5. Other dialog box entries: Fill in the other checkboxes and edit boxes as shown in Figure 23.1. Then click OK. If the error message "Regression - Cannot add chart to a shared workbook" appears, click Cancel; to obtain chart output, select New Workbook under Output Options in the Regression dialog box.



6. Optional: To change column widths so that all summary output labels are visible, select the cell containing the Adjusted R Square label (E6) and hold down the Control key while selecting cells containing the labels Coefficients (F16), Standard Error (G16), Significance F (J11), and Upper 95% (K16). From the Format menu, choose the Column command and select AutoFit Selection. The results are shown in Figure 23.2.


Figure 23.2 Multiple Regression Summary Output

23.1 INTERPRETATION OF REGRESSION OUTPUT

Referring to the coefficients in cells F17:F19 shown in Figure 23.2, and rounding to three decimal places, the regression equation is

Price = 14.123 + 0.017 * SqFt + 0.361 * Assessed.

In a multiple regression model, the coefficients are called net regression coefficients or partial slopes. For example, if assessed value is held constant (or if we could examine a subset of the properties that have equal assessed value), and living space is allowed to vary, then selling price varies by 0.017 thousands of dollars for a unit change in square feet of living space. Similarly, if living space is held constant, then selling price varies by 0.361 thousands of dollars for a unit change in assessed value (also measured in thousands of dollars).
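The meaning of a partial slope can be illustrated with a short Python sketch that evaluates the rounded regression equation above. The function name predicted_price and the two property values are assumptions for illustration; because the coefficients are rounded to three decimal places, the fitted values differ slightly from Excel's full-precision results.

```python
def predicted_price(sqft, assessed):
    """Fitted price in $ thousands, using the rounded coefficients."""
    return 14.123 + 0.017 * sqft + 0.361 * assessed


# Hypothetical property: 800 square feet, assessed at $25 thousand.
base = predicted_price(800, 25.0)

# Hold assessed value constant and add one square foot of living space.
more_space = predicted_price(801, 25.0)

# Hold living space constant and add $1 thousand of assessed value.
more_assessed = predicted_price(800, 26.0)

change_per_sqft = more_space - base         # about 0.017
change_per_assessed = more_assessed - base  # about 0.361
```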

Significance of Coefficients

The t statistic for the SqFt coefficient is greater than two, indicating that 0.017 is significantly different from zero. We can reject the null hypothesis that there is no relationship between SqFt and Price in this model and conclude that a significant relationship exists.

The t statistic for the Assessed coefficient is 2.79, indicating that 0.361 is significantly different from zero.


The p-value is a two-tail probability using the t distribution. Since we would expect to see a positive relationship between selling price and each explanatory variable, one-tail tests are appropriate here. Dividing each p-value in the summary output by two, the one-tail p-values are approximately 0.00038 and 0.0081. Thus, in this model we can reject the hypotheses of no relationship between selling price and each explanatory variable at the 1% level of significance.

The t statistic for the Intercept term is usually ignored.

Interpretation of the Regression Statistics

Referring to row 7 of Figure 23.2, the standard error for the multiple regression model is $2,623, which is an improvement over the $3,328 standard error for the simple regression model. The R-Square value in row 5 indicates that approximately 80% of the variation in selling price can be explained using a linear model with living space and assessed value as explanatory variables. This is also an improvement over the simple model with one explanatory variable, where only 66% of the variation was explained.

Interpretation of the Analysis of Variance

The analysis of variance output shown in rows 10 through 14 of Figure 23.2 is the result of testing the null hypothesis that all regression coefficients are simultaneously equal to zero. The final result is a p-value, labeled Significance F in the output. Here, the p-value is approximately 0.00007, the probability of getting these results in a random sample from a population with no relationship between selling price and the explanatory variables. Our p-value indicates it is extremely unlikely to observe these results in a random sample from such a population, so we reject the hypothesis of no relationship and conclude that at least one significant relationship exists.

23.2 ANALYSIS OF RESIDUALS

Residual plots are useful for checking to see whether the assumptions of linear relationships and constant variance are appropriate. Excel provides plots of residuals versus each of the explanatory variables, as shown in Figure 23.3 and Figure 23.4. These charts are located to the right of the regression summary output.


Figure 23.3 Residuals versus SqFt of Living Space

If the relationship between selling price and living space is linear (after taking into account assessed value), then a random pattern should appear in the residual plot. On the other hand, if we see curvature or some other systematic pattern, then we should change our model to incorporate the nonlinear relationship.

Most observers would conclude that the residual plot is essentially random, so no additional modeling is required. Because our sample size is so small (15 observations), it can be difficult to detect nonlinear patterns.

Residual plots are useful for detecting situations where the residuals are smaller in one region and larger in another. The residual plot would have the shape of a tree resting on its side. In such cases the standard error of the estimate, which summarizes all of the residual terms, would overstate the variation in one region and understate the variation in another.

Looking at the plot of residuals versus assessed values shown in Figure 23.4, the pattern also appears random. Once again, the small sample size makes it difficult to detect nonlinear patterns.


Figure 23.4 Residuals versus Assessed Value

23.3 USING TREND TO MAKE PREDICTIONS

When satisfied with the model, we can proceed to use the model to make predictions of selling price for new properties. Assume there are four properties with 600, 800, 1,000, and 1,200 square feet of living space and assessed values of $22,500, $25,000, $27,500, and $30,000, respectively. The following steps describe how to use the TREND function for making the predictions about selling price. The syntax for the TREND function is

TREND(known_y's,known_x's,new_x's,const).

1. Enter the values for the explanatory variables on the worksheet (A18:B21) as shown in Figure 23.5 (where Predicted Price, Residuals, and Standard Residuals have been relocated, and rows 11 through 14 are hidden).

2. Select the cells that will contain the predicted values (D18:D21). Type an equals sign, the TREND function in lowercase, and appropriate references for the function arguments:

=trend(c2:c16,a2:b16,a18:b21,1)

Don't press Enter; instead, hold down the Control and Shift keys and press Enter. The formula bar displays TREND in uppercase, indicating that Excel recognizes the function name, and displays curly brackets around the function as shown in Figure 23.5, indicating that the array function has been entered correctly.


Instead of typing the TREND function, an alternative is to select the output cells (D18:D21) and click the Insert Function tool (icon fx). In the Insert Function dialog box, select Statistical in the category list box, select TREND in the function list box, and click OK. In the TREND dialog box, type or point to (click and drag) ranges on the worksheet containing the known y values (C2:C16), known x values (A2:B16), and new x values (A18:B21). Do not include the labels in row 1 in these ranges. In the edit box labeled "Const," type the integer 1, which is interpreted as true, indicating that an intercept term is desired. Then click OK. With the function cells (D18:D21) still selected, press the F2 key (for editing). The word "Edit" appears in the status bar at the bottom of the screen. Hold down the Control and Shift keys and press Enter.
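Outside Excel, the same kind of calculation can be sketched in Python. The function below is only an analogue of TREND with an intercept term (const equal to 1), not Excel's implementation; the names trend and solve and the data values are hypothetical, with the data chosen so that the correct predictions are known.

```python
def solve(a, rhs):
    """Solve the square system a * coef = rhs by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def trend(known_ys, known_xs, new_xs):
    """Fit known_ys on known_xs with an intercept, then predict for new_xs."""
    design = [[1.0] + list(row) for row in known_xs]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, known_ys)) for i in range(k)]
    coef = solve(xtx, xty)
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in new_xs]


# Hypothetical data lying exactly on y = 5 + 2*x1 + 3*x2.
known_xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 4]]
known_ys = [5 + 2 * x1 + 3 * x2 for x1, x2 in known_xs]

# One prediction per row of new x values, like the array-entered TREND.
predictions = trend(known_ys, known_xs, [[7, 5], [8, 8]])
```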

Figure 23.5 Multiple Regression Predictions

Interpretation of the Predictions

The best-guess prediction of selling price for a property with 800 square feet of living space and an assessed value of $25,000 is $36,445. An approximate 95% prediction interval uses this best guess plus or minus two standard errors of the estimate ($36,445 ± 2 * $2,623, or $36,445 ± $5,246, which is from $31,199 to $41,691). We are 95% confident that the selling price will be in this range.


However, there are two things approximate about this prediction interval. First, instead of using the standard error of the estimate, which measures only the scatter of the actual values around the regression equation, we should use the standard error of a prediction, which also takes into account uncertainty in the coefficients of the regression equation. The standard error of a prediction is always greater than the standard error of the estimate. Unfortunately, there is no simple way to compute the standard error of a prediction using Excel.

Second, the number of standard errors for a 95% prediction interval based on 15 observations with our model should use a value of the t statistic with 12 degrees of freedom, which is 2.179, not 2. (For a very large sample size, the normal distribution is appropriate, and the number of standard errors is 1.96, which is approximately 2.) Therefore, our approximate interval is very approximate. An exact 95% prediction interval would be wider.
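The effect of the second point can be seen by repeating the interval arithmetic with 2.179 in place of 2. The Python sketch below uses only the figures quoted above; the function name interval is illustrative. It still uses the standard error of the estimate rather than the standard error of a prediction, so the result remains approximate.

```python
def interval(center, multiplier, std_error):
    """Return (lower, upper) as center plus or minus multiplier * std_error."""
    half_width = multiplier * std_error
    return center - half_width, center + half_width


best_guess = 36445.0  # predicted selling price, in dollars
std_error = 2623.0    # standard error of the estimate, in dollars

rough = interval(best_guess, 2.0, std_error)     # multiplier of 2
with_t = interval(best_guess, 2.179, std_error)  # t value, 12 degrees of freedom

extra_width = (with_t[1] - with_t[0]) - (rough[1] - rough[0])
```

With the t value the interval runs from about $30,729 to about $42,161, roughly $939 wider in total than the interval from $31,199 to $41,691.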

EXERCISES

Exercise 23.1 The president of a national real estate company wanted to know why certain branches of the company outperformed others. He felt that the key factors in determining total annual sales (in $ millions) were the advertising budget (in $ thousands) and the number of sales agents. To analyze the situation, he took a sample of eight offices and collected the data in the following table.

          Advertising      Number      Annual Sales
Office    ($ thousands)    of Agents   ($ millions)
1         249              15          32
2         183              14          18
3         310              21          49
4         246              18          52
5         288              13          36
6         248              21          43
7         256              20          24
8         241              19          41

1. Prepare a regression model and interpret the coefficients.

2. Test to determine whether there is a linear relationship between each explanatory variable and the dependent variable, with a 5% level of significance.

3. Make a prediction of annual sales for a branch with an advertising budget of $250,000 and 17 agents.

Exercise 23.2 (adapted from Canavos, p. 602) A university placement office conducted a study to determine whether the variation in starting salaries for school of business graduates can be explained by the students' grade point average (GPA) and age upon graduation. The placement office obtained the sample data shown in the following table.

GPA     Age    Starting Salary
2.95    22     $25,500
3.40    23     28,100
3.20    27     28,200
3.10    25     25,000
3.05    23     22,700
2.75    28     22,500
3.15    26     26,000
2.75    26     23,800

1. Prepare a regression model and interpret the coefficients.

2. Determine whether grade point average and age contribute substantially in explaining the variation in the sample of starting salaries.

3. Make a prediction of starting salary for a 24-year-old graduate with a 3.00 GPA.

Chapter 24 Regression Using Categorical Variables

This chapter describes regression models in which an explanatory variable or dependent variable is categorical (qualitative) instead of numerical (quantitative).

24.1 CATEGORIES AS EXPLANATORY VARIABLES

In the regression models of previous chapters, the explanatory variables were numerical variables. In many situations it is better to use categorical variables as predictors. When binary, the categorical variables indicate the presence or absence of a characteristic, such as male/female, married/unmarried, or weekend/weekday. These binary variables can be used as predictors in a regression model by assigning the value 0 or 1 for each observation in the data set. The 0/1 variable is sometimes called an indicator variable or dummy variable.

In other situations a categorical variable has more than two categories, such as season (winter, spring, summer, or fall), weather (sunny, overcast, or rain), or academic major (accounting, management, or finance). In these cases we use a number of indicator variables equal to one less than the number of categories. For each observation the value of an indicator variable is 1 or 0, indicating whether the observation corresponds to one of the categories. For an observation that corresponds to the category that doesn't have an indicator variable, the value for all indicator variables is 0; this category is sometimes called the default category or base-case category.

Example 24.1 (adapted from Cryer, p. 139) In addition to square feet of living space and assessed value, each property is categorized by construction grade (low, medium, or high) as shown in Figure 24.1. This categorical variable can be used as a predictor variable in a regression model for explaining variation in the selling price of the property.


Figure 24.1 Real Estate Property Data

The initial analysis uses only construction grade as the predictor of selling price, followed by a multiple regression model using construction grade and the other predictor variables (square feet of living space and assessed value).

The following steps describe how to use indicator variables in a regression model. An indicator variable is defined for each of the three categories. Low is selected as the base-case category; only indicator variables for the Medium and High categories are included in the regression model.

1. Arrange the data in a worksheet as shown in Figure 24.1.

2. Select columns C:E. With the pointer in the selected range, right-click and choose Insert from the shortcut menu. Enter the labels Low, Medium, and High in cells C1:E1.

3. Enter a formula in cell C2 for determining values of the Low indicator variable: =IF(B2="Low",1,0). The meaning of this formula is "If the grade is low, use the value 1; otherwise use the value 0."

4. Enter a formula in cell D2 for determining values of the Medium indicator variable: =IF(B2="Medium",1,0). The meaning of this formula is "If the grade is medium, use the value 1; otherwise use the value 0."

5. Enter a formula in cell E2 for determining values of the High indicator variable: =IF(B2="High",1,0). The meaning of this formula is "If the grade is high, use the value 1; otherwise use the value 0." If the three formulas are entered correctly, the contents of cells C2:E2 are 1, 0, and 0.


6. Select the new formulas in cells C2:E2. To copy the formulas to the other cells, double-click the fill handle (small square in the lower-right corner of the selected range). The worksheet should appear as shown in Figure 24.2.

Figure 24.2 Indicator Variables

7. Optional: The formulas in columns C, D, and E contain relative references to column B. If these formulas are copied to other parts of the worksheet, the references may not be correct. To eliminate the formulas and retain the zero-one values, select columns C, D, and E, right-click and choose Copy from the shortcut menu; with C, D, and E still selected, right-click, choose Paste Special from the shortcut menu, select Values (also, select None as the Operation and clear both checkboxes for Skip Blanks and Transpose), and click OK.

8. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, scroll the list box, select Regression, and click OK.

9. If necessary, refer to Chapter 21 for details on filling in the dialog box. The Input Y Range contains the selling prices (G1:G16), the Input X Range contains the values for the two explanatory variables, Medium and High (D1:E16), the Output Range is I1, and the Labels, Residuals, and Standardized Residuals checkboxes are selected.

10. Optional: Adjust column widths so that all labels of the regression output are visible. Details are described in Chapter 21. The formatted Summary Output section is shown in Figure 24.3.
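The indicator-variable construction in steps 3 through 6 can be sketched in Python. The grade list and the function name indicator_columns are hypothetical; the sketch creates one 0/1 column for each category except the base case, in the same way that only Medium and High enter the regression.

```python
def indicator_columns(grades, base_case="Low"):
    """Return {category: [0 or 1, ...]} for every category except the base case."""
    categories = []
    for grade in grades:
        if grade not in categories:
            categories.append(grade)
    return {
        category: [1 if grade == category else 0 for grade in grades]
        for category in categories
        if category != base_case
    }


# Hypothetical construction grades for five properties.
grades = ["Low", "Medium", "High", "Medium", "Low"]
columns = indicator_columns(grades)

# A base-case (Low) property has 0 in every indicator column.
first_property = [columns["Medium"][0], columns["High"][0]]
```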


Figure 24.3 Regression Output Using Two Indicators

24.2 INTERPRETATION OF REGRESSION USING INDICATORS

Referring to the coefficients in the summary output shown in Figure 24.3 and rounding to three decimal places, the fitted regression model is

Price = 29.400 + 9.356 * Medium + 14.533 * High.

For a property with low construction grade (substituting Medium = 0 and High = 0 into the model), the fitted selling price is 29.400. The average selling price for properties with low construction grade is thus $29,400. For a property with medium construction (Medium = 1 and High = 0), the fitted selling price is 38.756. For a property with high construction grade (Medium = 0 and High = 1), the fitted selling price is 43.933.

The Intercept constant, 29.400, is the average selling price for the base-case category. The Medium coefficient, 9.356, indicates the difference in the average selling price for the Medium category from the base-case category, Low. And the High coefficient, 14.533, indicates the difference in the average selling price for the High category from the base-case category.
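As a check on the arithmetic, the three fitted prices can be reproduced by substituting the indicator values into the model. This Python sketch uses the rounded coefficients quoted above; the function name fitted_price is an assumption made for illustration.

```python
# Coefficients rounded to three decimals, as reported in the text (Figure 24.3).
INTERCEPT, B_MEDIUM, B_HIGH = 29.400, 9.356, 14.533

def fitted_price(medium, high):
    """Fitted selling price in thousands of dollars for 0/1 indicator values."""
    return INTERCEPT + B_MEDIUM * medium + B_HIGH * high

fitted = {
    "Low": fitted_price(0, 0),
    "Medium": fitted_price(1, 0),
    "High": fitted_price(0, 1),
}
for grade, price in fitted.items():
    print(f"{grade}: {price:.3f}")
```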

The R-square value of 0.820701 indicates that 82% of the variation in selling prices can be explained using only construction grade. This compares favorably with approximately 80% explained variation for the multiple regression model of Chapter 23 using living space and assessed value as explanatory variables.


These regression results yield the same average selling prices that would be obtained by simply averaging the price for each construction grade. For example, the mean selling price for the three high construction grade properties (44.8, 41.8, and 45.2) is 43.933.
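The equivalence between the indicator regression and the group averages can be verified numerically. The following Python sketch fits a least-squares model with Medium and High indicators by solving the normal equations directly. It is an illustration under stated assumptions: the three high-grade prices are the values quoted above, but the low-grade and medium-grade prices are hypothetical placeholders, and the helper names solve and ols are not part of Excel or of the original worksheet.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients; an intercept column of 1s is added first."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# High-grade prices are the three values quoted in the text; the low and
# medium prices are hypothetical placeholders, not the Figure 24.1 data.
prices = {"Low": [28.0, 30.0, 31.0], "Medium": [37.5, 39.5], "High": [44.8, 41.8, 45.2]}

rows, y = [], []
for grade, values in prices.items():
    for p in values:
        rows.append((1 if grade == "Medium" else 0, 1 if grade == "High" else 0))
        y.append(p)

intercept, b_medium, b_high = ols(rows, y)
means = {g: sum(v) / len(v) for g, v in prices.items()}
print(round(intercept, 3), round(intercept + b_medium, 3), round(intercept + b_high, 3))
print({g: round(avg, 3) for g, avg in means.items()})
```

The intercept matches the mean of the base-case group, and adding each indicator coefficient to the intercept gives the mean of the corresponding group.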

An advantage of using indicator variables is that they can be combined with other explanatory variables in a multiple regression model. The following steps provide a general description of how to use construction grade, assessed value, and living space as explanatory variables.

1. The four x variables (SqFt, Medium, High, and Assessed) must be in adjacent columns. If the data are arranged as shown in Figure 24.2, one method is to select column F (Assessed), right-click, and choose Insert from the shortcut menu. Then select column A (SqFt), right-click, and choose Copy from the shortcut menu; select column F (empty), right-click, and choose Paste from the shortcut menu. (Alternatively, after inserting empty column F, select column A, position the mouse pointer near the edge of column A until it turns into an arrow, and click and drag column A to column F.)

2. In the Regression dialog box, the Input Y Range contains the selling prices (H1:H16), the Input X Range contains the values for the four explanatory variables, Medium, High, SqFt, and Assessed (D1:G16), the Output Range is J1, and the Labels, Residuals, and Standardized Residuals checkboxes are selected.

24.3 INTERPRETATION OF MULTIPLE REGRESSION

After adjusting column widths, the summary output is shown in Figure 24.4. Rounding to three decimal places, the fitted regression model is

Price = 19.152 + 6.035 * Medium + 7.953 * High + 0.010 * SqFt + 0.184 * Assessed.

The net regression coefficients taking all four variables into consideration are different from the model in Chapter 23 (which used only SqFt and Assessed) and the previous model in this chapter (using only Medium and High). For example, for properties with the same construction grade and assessed value, selling price varies by 0.010 thousands of dollars for a unit change in square feet of living space, on the average.

R square indicates that 92% of the variation in selling prices can be explained using this linear model with construction grade, living space, and assessed value as explanatory variables. The remaining unexplained variation is summarized by the $1,783 standard error of estimate.
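The meaning of a net regression coefficient can be illustrated by comparing two fitted prices that differ in only one explanatory variable. In this Python sketch the coefficients are the rounded values from the fitted model above; the two properties are hypothetical, and their SqFt and Assessed values are assumed to be in the same units as the worksheet data.

```python
def fitted_price(medium, high, sqft, assessed):
    """Fitted price in thousands of dollars, coefficients rounded as in the text."""
    return 19.152 + 6.035 * medium + 7.953 * high + 0.010 * sqft + 0.184 * assessed

# Two hypothetical medium-grade properties that differ only in living space.
smaller = fitted_price(1, 0, sqft=1500, assessed=30.0)
larger = fitted_price(1, 0, sqft=1600, assessed=30.0)

print(round(smaller, 3), round(larger, 3), round(larger - smaller, 3))
```

With construction grade and assessed value held fixed, the difference between the two fitted prices equals the SqFt coefficient multiplied by the difference in living space.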


Figure 24.4 Multiple Regression Output

24.4 CATEGORIES AS THE DEPENDENT VARIABLE

Discriminant analysis refers to the use of models where the dependent variable is categorical. If the dependent variable is binary (two categorical values, coded as 0 and 1), then multiple regression can be used to determine a fitted model. The more general problem involving a dependent variable with three or more categories requires advanced nonregression techniques not described here.

Example 17.2 (adapted from Cryer, p. 614) Figure 24.5 contains financial ratio data on 16 firms from 1968 to 1972. Seven of these firms went bankrupt two years later, and nine firms were financially sound at the end of the same period. Two financial ratios were selected as explanatory variables: net income to total assets (NI/TA) and current assets to net sales (CA/NS). The problem is to determine a linear combination of the two variables that best discriminates between the bankrupt firms and the financially sound firms.


Figure 24.5 Financial Ratio and Bankruptcy Data

The following steps describe how to perform discriminant analysis for a binary dependent variable using multiple regression.

1. Enter the data shown in Figure 24.5 on a worksheet.

2. Use the Regression analysis tool as described in Chapters 21, 22, and 23. The Input Y Range is the bankruptcy 1/0 variable (C1:C17), the Input X Range contains the two financial ratios (A1:B17), and the Output Range is E1. Select the Labels checkbox and the Residuals checkbox.

3. Format the regression summary output as described in Chapter 21. The result is shown in Figure 24.6.


Figure 24.6 Financial Ratio and Bankruptcy Regression Output

Referring to the coefficients in the summary output shown in Figure 24.6 and rounding to four decimal places, the fitted regression model is

Bankrupt = -0.0027 - 1.7623 * NI/TA + 0.9600 * CA/NS.

The Predicted Bankrupt values calculated using this model are located below the regression summary output. The following steps relocate the predicted values and calculate other values for the discriminant analysis.

4. To make room for additional calculations, select columns D:F. With the pointer in the selected range, right-click and choose Insert from the shortcut menu.

5. To relocate the predicted values, select cells I25:I41. With the pointer in the selected range, right-click and choose Copy from the shortcut menu. Then select cell D1, right-click, and choose Paste from the shortcut menu.

6. Optional: With the pasted range D1:D17 still selected, choose Column from the Format menu and select AutoFit Selection. Select the predicted values D2:D17 and repeatedly click the Decrease Decimal tool button until three decimal places are displayed.

The regression model uses the two financial ratios to predict the value 1 for bankrupt firms and 0 for the sound firms. However, the predicted values are not exactly equal to 1 or 0, so we need a rule for predicting which firms are bankrupt and which are sound. A simple rule is to predict bankruptcy if the Predicted Bankrupt value is greater than 0.5 and predict soundness if the Predicted Bankrupt value is less than or equal to 0.5.
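The fitted model and the 0.5 rule can be combined into a short calculation. This Python sketch uses the rounded coefficients from Figure 24.6; the function names and the financial ratios of the example firm are assumptions chosen for illustration and are not taken from Figure 24.5.

```python
def predicted_bankrupt(ni_ta, ca_ns):
    """Fitted value from the regression in Figure 24.6 (coefficients rounded)."""
    return -0.0027 - 1.7623 * ni_ta + 0.9600 * ca_ns

def classify(fitted, threshold=0.5):
    """Return 1 (predict bankrupt) if fitted > threshold, else 0 (predict sound)."""
    return 1 if fitted > threshold else 0

# Hypothetical firm: the two ratios below are illustrative, not from Figure 24.5.
fitted = predicted_bankrupt(ni_ta=-0.10, ca_ns=0.60)
print(round(fitted, 4), classify(fitted))
```

A fitted value exactly equal to 0.5 is classified as sound, matching the "less than or equal to 0.5" part of the rule.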


7. Enter the label Classification in cell E1 and adjust the column width. To classify the Predicted Bankrupt values, enter a formula in cell E2: =IF(D2>0.5,1,0). The meaning of this formula is "If the Predicted Bankrupt value is greater than 0.5, use the value 1; otherwise use the value 0."

8. Enter the label Correct in cell F1. To determine which firms were classified correctly, enter a formula in cell F2: =IF(C2=E2,1,0). This means "If the actual Bankrupt value equals the predicted classification, use the value 1; otherwise use the value 0."

9. Select the two formulas (E2:F2). To copy the formulas to the other cells, double-click the fill handle (small square in the lower-right corner of the selected range).

10. To determine the total number of correct classifications, select cell F18 and click the sum tool twice. The results are shown in Figure 24.7.

Figure 24.7 Bankruptcy Predictions

Interpretation of the Classifications

Using the break point 0.5 to determine the classification from the Predicted Bankrupt values, the Correct values in Figure 24.7 show that observations in rows 3, 7, 11, and 15 are misclassified. Two of the seven bankrupt firms were misclassified, and two of the nine sound firms were misclassified.

Overall, 12 of 16 firms (75%) were properly classified by the model. If this "hit rate" is acceptable, then we could use the model to predict the soundness of another firm. We would substitute the firm's financial ratios into our model, evaluate the regression equation to obtain a fitted value, and predict bankruptcy if the fitted value exceeds 0.5.

Additional analysis could involve trying classification threshold values other than 0.5. Such analysis could be automated using Excel's Data Table feature.
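One way to explore other threshold values, outside of an Excel Data Table, is to recompute the hit rate for each candidate threshold. The following Python sketch is illustrative only: the actual outcomes and fitted values are hypothetical (they are not the Figure 24.7 data), and the function name hit_rate is an assumption.

```python
def hit_rate(actual, predicted, threshold):
    """Fraction of observations whose 0/1 classification matches the actual value."""
    correct = sum(
        1 for a, p in zip(actual, predicted) if a == (1 if p > threshold else 0)
    )
    return correct / len(actual)

# Hypothetical actual outcomes and fitted values (not the Figure 24.7 data).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [0.81, 0.46, 0.72, 0.12, 0.55, 0.30, 0.41, 0.63]

for threshold in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(threshold, hit_rate(actual, predicted, threshold))
```

A threshold chosen to maximize the hit rate on the same data used to fit the model will tend to look better than it performs on new firms, so any comparison of thresholds should be interpreted with that in mind.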

EXERCISES

Exercise 24.1 Refer to the real estate property data in Figure 24.1. Determine the selling price per square foot of living space for each of the 15 properties. Develop a regression model using indicator variables for construction grade to explain the variation in price per square foot. Interpret the coefficients. What is the expected price per square foot for a property with low construction grade?

Exercise 24.2 (adapted from Canavos, p. 607) A personnel recruiter for industry wishes to identify the factors that explain the starting salaries for business school graduates. He believes that a student's grade point average (GPA) and academic major are appropriate explanatory variables.

GPA     Major         Starting Salary
2.95    Management    $21,500
3.20    Management     23,000
3.40    Management     24,100
2.85    Accounting     24,000
3.10    Accounting     27,000
2.85    Accounting     27,800
2.75    Finance        20,500
3.10    Finance        22,200
3.15    Finance        21,800

Fit an appropriate model to these data, evaluate it, and interpret it. What is the expected starting salary for an accounting major with a 3.00 GPA?


Exercise 24.3 The performance of each production line employee in a manufacturing plant has been classified as satisfactory or unsatisfactory. Each employee took pre-employment tests for manual dexterity and analytic aptitude. The company wants to use the test data to predict how future job applicants will perform.

Manual       Analytic    Satisfactory=1
Dexterity    Aptitude    Unsatisfactory=0
 85           56         1
 89           70         1
 67           76         1
 67           63         1
 53           73         1
100           93         1
 78           80         1
 64           50         1
 75           76         0
 53           73         0
 67           83         0
 85           90         0
 64           90         0
 60           96         0
 71           80         0
 57           56         0
 75          100         0
 50           90         0

1. Use a regression model for discriminant analysis of these data.

2. What proportion of the employees are properly classified by the model?

3. If a prospective employee scores 75 on manual dexterity and 80 on analytic aptitude, what is the predicted performance: satisfactory or unsatisfactory?


Exercise 24.4 A credit manager has classified each of the company's loans as being either current or in default. For each loan, the manager has data describing the person's annual income and assets (both in thousands of dollars) and years of employment. The manager wants to use this information to develop a rule for predicting whether a loan applicant will default.

                    Years of      Performance
Income    Assets    Employment    (Current=1, Default=0)
44        105       10            1
26        109       19            1
39        120       12            1
50        139       20            1
42         84        9            1
35        120       13            0
28         84       10            0
37        114        5            0
26        109       15            0
33        114       10            1
37        150        5            0
30        144        4            0
32         75       15            1
32        135        8            0
42        135        4            0
33         94       13            1
33        124        7            0
25        135       14            0

1. Use a regression model for discriminant analysis of these data.

2. What proportion of the loans is properly classified by the model?

3. If an applicant has $40,000 annual income, $100,000 assets, and 11 years of employment, what is the predicted performance: current or default?

Chapter 25 Regression Models for Cross-Sectional Data

25.1 CROSS-SECTIONAL REGRESSION CHECKLIST

Plot Y versus each X

1 Verify that the relationship agrees with your prior judgment, e.g., positive vs negative relationship, linear vs nonlinear, strong vs weak

2 Identify outliers or unusual observations and decide whether to exclude

3 Determine whether the relationship is linear; if not, consider using a nonlinear form, e.g., quadratic (include X and X^2 in the model)

Examine the correlation matrix

4 Identify potential multicollinearity problems, i.e., high correlation between a pair of X variables; if so, consider using only one X of the pair in the model

Calculate the regression model with diagnostics

5 Verify that the sign of each regression coefficient agrees with your prior judgment, i.e., positive vs negative relationship; otherwise, consider excluding that X and rerunning the regression

6 Examine each plot of residuals vs X; if there is a non-random pattern (e.g., U-shape or upside-down-U-shape), use a nonlinear form for that X in a new model

7 Identify key X variables by comparing standardized regression coefficients, usually computed by multiplying an X coefficient by the standard deviation of that X and dividing by the standard deviation of Y. This dimensionless standardized regression coefficient measures how much Y (in standard deviation units) is affected by a change in X (in standard deviation units).


8 If a goal is to find a model with small standard error of estimate (approx. standard deviation of residuals), use the t-stat screening method. Disregard the t-stat for the intercept. If there are X variables with a t-stat between -1 and +1, remove the single X variable whose t-stat is closest to zero, and rerun the regression. Remove only one X variable at a time.

9 Before using the final model, examine each plot of residuals vs X to verify that the random scatter is the same for all values of X. If there is more scatter for higher values of X, consider using a log transformation of X in the model (instead of using X itself). If the scatter is not uniform with respect to X, the standard error of estimate may not be a useful measure of uncertainty because it overstates the uncertainty for some values of X and understates the uncertainty for other values of X.

Use the model

10 If the purpose is to identify unusual observations, examine the residuals directly for large negative or large positive values, or examine the standardized residuals (each residual divided by the standard deviation of residuals) for values more extreme than +2 or -2 or for values more extreme than +3 or -3.

11 If the purpose is to make predictions, use the X values for a new observation to compute a predicted Y. Use the standard error of estimate to provide an interval estimate, e.g., an approximate 95% prediction interval that ranges from two standard errors below to two standard errors above the predicted Y. Avoid extrapolation, i.e., do not make predictions using X values outside the range of the original data.
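Several checklist items reduce to short calculations. The following Python sketch illustrates items 7, 8, and 10 under stated assumptions: the function names are hypothetical, the inputs are made-up numbers, and the sample standard deviation is assumed wherever the checklist says "standard deviation."

```python
from statistics import stdev

def standardized_coefficient(coef, x_values, y_values):
    """Item 7: coefficient times sd(X), divided by sd(Y); dimensionless."""
    return coef * stdev(x_values) / stdev(y_values)

def variable_to_remove(t_stats):
    """Item 8: among X variables with -1 < t < 1, name the one closest to zero.

    t_stats maps variable name to t-stat and must exclude the intercept.
    Returns None when no t-stat lies between -1 and +1.
    """
    weak = {name: t for name, t in t_stats.items() if -1 < t < 1}
    return min(weak, key=lambda name: abs(weak[name])) if weak else None

def unusual_observations(residuals, cutoff=2.0):
    """Item 10: indexes whose standardized residual is beyond +/- cutoff."""
    sd = stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r / sd) > cutoff]

# Hypothetical inputs for illustration only.
std_coef = standardized_coefficient(0.5, [1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
to_remove = variable_to_remove({"X1": 2.4, "X2": -0.3, "X3": 0.8})
flagged = unusual_observations([0.1, -0.2, 0.15, -0.1, 0.05, 3.0, -0.1, 0.1, -0.05, 0.0])
print(round(std_coef, 4), to_remove, flagged)
```

After a variable is removed under item 8, the regression must be rerun and the t-stats recomputed before the screening is applied again, because the remaining t-stats change.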

Chapter 26 Time Series Data and Forecasts

26.1 TIME SERIES PATTERNS

Meandering time series pattern: Small changes from period to period, possible larger changes over a longer period of time

Use an autoregressive model

Figure 26.1 Typical Meandering Time Series Pattern (Value versus Time)


Figure 26.2 Typical Long-Term Trend Time Series Patterns (Value versus Time; series shown: Negative Linear, Positive Linear, No Trend, Positive Nonlinear)

Figure 26.3 Typical Quarterly Seasonal Time Series with Linear Trend (Value versus Quarter, quarters 1 through 20)


Figure 26.4 Quarterly Seasonal Pattern with Nonlinear Trend (Value versus Quarter, quarters 1 through 36)

Strong seasonal pattern, no trend during first 12 quarters, positive trend during middle 12 quarters, no trend during last 12 quarters



Chapter 27 Autocorrelation and Autoregression

This chapter describes techniques for analyzing time sequence data that exhibit a non-seasonal meandering pattern, where adjacent observations have values that are usually close but distant observations may have very different values. Meandering patterns are quite common for many economic time series, such as stock prices. If the time sequence data have seasonality—that is, a recurring pattern over time—the techniques described in Chapter 29 are appropriate.

To obtain the results shown in the following figures, enter the month and wage data in columns A and B of a worksheet as shown in Figure 27.1. For each type of analysis described in this chapter, create a copy of the original data by choosing Move or Copy Sheet from the Edit menu, checking the Create a Copy checkbox, and clicking OK.

Figure 27.1 Wage Data and Time Sequence Plot

The first step is to examine a time sequence plot. Select the wage data, and use Excel's Chart Wizard to create a Line chart type. Figure 27.1 shows the data and a plot of average hourly wages of textile and apparel workers for the 18 months from January 1986 through June 1987. These data are the last 18 values from the 72-value data file APAWAGES.DAT that accompanies Cryer, second edition; the original source is Survey of Current Business, September issues, 1981–1987.

27.1 LINEAR TIME TREND

Initial inspection of the time sequence plot in Figure 27.1 suggests that a straight-line fit may be an appropriate model. To obtain the results shown in the following figures, create a copy of the data shown in Figure 27.1. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, select Regression from the Analysis Tools list box and click OK. In the Regression dialog box, the Input Y Range is B1:B19 and the Input X Range is A1:A19. Check the Labels box. Click the Output Range option button, select the adjacent text box, and specify D1. Check the Residuals and Line Fit Plots checkboxes in the Residuals section. Then click OK. (If the error message "Cannot add chart to a shared workbook" appears, click Cancel; in the Regression dialog box, click New Workbook in the Output options, and click OK.) An edited portion of the regression output is shown in Figure 27.2.

Figure 27.2 Simple Linear Regression Output

The R-square value indicates that approximately 73% of the variation in wages can be explained using a linear time trend. The regression model is Fitted Wage = 5.7709 + 0.0095 * Month, indicating that wages increase by 0.0095 dollars per month, on the average. The t statistic and p-value verify that there is a significant linear relationship.

The R Square, t statistic, and p-value indicate an excellent fit, but the line fit plot shown in Figure 27.3 shows that the regression model assumption of independent residuals may be violated. When wages are above the linear time trend, they tend to stay above, and when they are below the trend line, they tend to stay below. In other words, if the previous residual is positive, the current residual is likely to be positive, and if the previous residual is negative, the current residual is likely to be negative. Thus, the residuals are not independent. Successive residuals in this model are positively correlated. This "stickiness" is positive autocorrelation, which can be quantified using the Durbin-Watson statistic.

Figure 27.3 Time Sequence Plot and Linear Fit

27.2 DURBIN-WATSON STATISTIC

The Durbin-Watson statistic may be used to test for correlation of successive residuals in a time series model. The statistic is calculated by first determining the difference between successive residuals. For example, in Figure 27.4, we could compute F26 – F25, F27 – F26, F28 – F27, and so on. These differences are squared and then summed to determine the numerator of the Durbin-Watson statistic. In Excel, the numerator can be computed using the SUMXMY2 function, where XMY2 means the square of x minus y. The denominator of the Durbin-Watson statistic is the sum of the squared residuals, which can be computed using Excel's SUMSQ function. Both functions accept arrays as arguments.

Figure 27.4 Residual Output and Durbin-Watson Statistic


For the linear time trend model, the residuals are in cells F25:F42. In Figure 27.4, cell H25 contains the following formula for computing the Durbin-Watson statistic:

=SUMXMY2(F26:F42,F25:F41)/SUMSQ(F25:F42)

In general, for time periods 1 through n, the first argument for SUMXMY2 is the range containing residuals for periods 2 through n, and the second argument is the range for residuals for periods 1 through n – 1. The argument for SUMSQ is the range containing residuals for periods 1 through n.

The possible values of the Durbin-Watson statistic range from 0 to 4. Values close to 0 indicate strong positive autocorrelation; a value of 2 indicates zero autocorrelation; values near 4 indicate strong negative autocorrelation. Here the value 1.050 shows that there is some positive autocorrelation of residuals.
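The same calculation can be sketched outside Excel. In the following Python illustration the function name durbin_watson is an assumption, and the two residual sequences are hypothetical values chosen to show the behavior at each end of the range; they are not the residuals of Figure 27.4.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals.

    Mirrors =SUMXMY2(F26:F42,F25:F41)/SUMSQ(F25:F42) for residuals in time order.
    """
    numerator = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual sequences (not the Figure 27.4 values).
sticky = [0.03, 0.02, 0.02, -0.01, -0.02, -0.03, -0.01, 0.00]       # runs of same sign
alternating = [0.02, -0.02, 0.02, -0.02, 0.02, -0.02, 0.02, -0.02]  # sign flips each period

print(round(durbin_watson(sticky), 3))
print(round(durbin_watson(alternating), 3))
```

Residuals that stay on the same side of zero for several periods give a value well below 2, and residuals that alternate in sign give a value well above 2.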

27.3 AUTOCORRELATION

The Durbin-Watson statistic measures autocorrelation of residuals associated with a model. It is often useful to examine the correlation of time series values with themselves before modeling. This approach looks at the correlation between current and previous values. The previous values are called lagged values, and the number of time periods between each current and previous value is the lag length. For example, values that are one time period before the current values are called lag 1; values that are two periods earlier are called lag 2.

The following steps describe how to construct an autocorrelation plot for lag 1.

1. Enter the month and wage data in columns A and B of a sheet as shown in Figure 27.1 or copy previously entered data to a new sheet.

2. Select column B, right-click, and choose Insert from the shortcut menu.

3. Type the label Lag 1 in cell B1.

4. Select cells C2:C18 containing the first 17 wage values, right-click, and choose Copy from the shortcut menu.

5. Select cell B3, right-click, and choose Paste from the shortcut menu. The top section of the sheet appears as shown in Figure 27.5.


Figure 27.5 Arranging Lag 1 Data

6. Select row 2, right-click, and choose Delete from the shortcut menu. The results appear as shown in columns A, B, and C in Figure 27.6.

7. To calculate the correlation coefficient, enter the label CORREL= in cell F1 and enter the formula =CORREL(B2:B18,C2:C18) in cell G1. The value of the correlation coefficient, r = 0.8545, appears in cell G1 as shown in Figure 27.6.

8. To prepare the chart, select cells B2:C18 and click the Chart Wizard button.

9. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, click the XY (Scatter) chart type and click Next. In step 2 (Chart Source Data), verify the data range and click Next. In step 3 (Chart Options) on the Titles tab, type chart and axis titles as shown in Figure 27.6; on the Gridlines tab, clear all checkboxes; on the Legend tab, clear the checkbox for Show Legend and click Finish.

10. To facilitate interpreting the autocorrelation plot, change its size and axes. Use the handles on the outermost edge of the chart to obtain a nearly square shape. For both the vertical axis and the horizontal axis, select the axis, double-click or right-click and choose Format Axis from the shortcut menu, click the Scale tab, change Minimum to 5.7, change Maximum to 6, change Major Unit to .05, and click OK. Change font size of the axes and titles to 8. The result appears as shown in Figure 27.6.


Figure 27.6 Lagged Data and Autocorrelation Plot

The autocorrelation plot shown in Figure 27.6 shows relatively strong correlation between current wage and one-month previous wage. When the wage is low in a particular month, it is likely that it will be low in the following month; when the wage is high in a particular month, it is likely to be high in the following month.
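The lag 1 arrangement and the CORREL calculation can also be sketched in a few lines of Python. The series below is hypothetical (it is not the wage data of Figure 27.1), and the function names pearson and lag_pairs are assumptions made for this illustration.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary correlation coefficient, as computed by Excel's CORREL."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lag_pairs(series, lag=1):
    """Return (lagged, current) lists; the first `lag` periods have no lagged value."""
    return series[:-lag], series[lag:]

# Hypothetical meandering series (not the wage data of Figure 27.1).
series = [5.80, 5.82, 5.81, 5.84, 5.86, 5.85, 5.88, 5.90]
lagged, current = lag_pairs(series)
print(round(pearson(lagged, current), 4))
```

As in the worksheet, pairing each value with its predecessor leaves one fewer observation than the original series.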

27.4 AUTOREGRESSION

A regression model may be used to quantify the functional relationship between current and previous values of time sequence data. When regression is used to analyze data that exhibit autocorrelation, the technique is called autoregression, and the model is called an autoregressive model. If only one-period lagged data are used for the explanatory variable, the model is called an AR(1) model.

To develop an AR(1) model for the wage data, prepare the autocorrelation plot described in the previous section. Right-click on a data point and choose Add Trendline from the shortcut menu. In the Add Trendline dialog box, click the Type tab and click the Linear icon. Click the Options tab and click the checkboxes for Display Equation on Chart and Display R-squared Value on Chart. Then click OK. Optionally, click and drag to relocate the equation and R² value. The results appear as shown in Figure 27.7.


Figure 27.7 AR(1) Model Using Add Trendline

The linear fit equation could be written as Wage = 0.8253 + 0.86 * Lag 1, or Current = 0.8253 + 0.86 * Previous, or Y_t = 0.8253 + 0.86 * Y_{t-1}. The R² value indicates that approximately 73% of the variation in wages can be explained using this simple linear autoregressive model.

A forecast of wage for period 19 can be expressed as Y_19 = 0.8253 + 0.86 * Y_18 = 0.8253 + 0.86 * 5.91 = 5.9079. A forecast for period 20 could be based on the forecast for period 19: Y_20 = 0.8253 + 0.86 * Y_19 = 0.8253 + 0.86 * 5.9079 = 5.9061. Of course, the likely error increases for forecasts made further into the future. To quantify the error, to obtain additional diagnostics, and to plot fitted and actual values in a time sequence plot, use the Regression analysis tool.
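The step-by-step forecasts can be reproduced by feeding each forecast back into the equation. This Python sketch uses the rounded trendline coefficients and the last observed wage quoted above; the function name ar1_forecasts is an assumption made for illustration.

```python
def ar1_forecasts(last_value, steps, intercept=0.8253, slope=0.86):
    """Iterate Y_t = intercept + slope * Y_(t-1), feeding each forecast into the next."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = intercept + slope * previous
        forecasts.append(previous)
    return forecasts

# Y_18 = 5.91 is the last observed wage quoted in the text.
y19, y20 = ar1_forecasts(5.91, steps=2)
print(round(y19, 4), round(y20, 4))
```

Because every forecast beyond period 19 is built on an earlier forecast rather than on an observed wage, errors accumulate as the horizon lengthens.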

If a blank sheet is needed, choose Worksheet from the Insert menu. Copy the data shown in columns A, B, and C in Figure 27.7, select a blank worksheet, select cell A1, and Paste. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, click Regression in the Analysis Tools list box and click OK.

In the Regression dialog box, the Input Y Range is C1:C18 and the Input X Range is B1:B18. Check the Labels checkbox. The Output Range is E1. Check the Residuals and Residual Plots checkboxes. Then click OK. (If the error message "Cannot add chart to a shared workbook" appears, click Cancel; in the Regression dialog box, click New Workbook in the Output Options, and click OK.) The results are shown in Figure 27.8.


Figure 27.8 AR(1) Model Using Regression Tool

Referring to cell F7 in Figure 27.8, the standard error of estimate for this AR(1) model is 0.03235, slightly larger than the standard error for the linear time trend model, 0.0319. Thus, an approximate 95% prediction interval uses the previously calculated point estimate plus or minus six cents (two standard errors = 2 * $0.03235 = $0.0647). The residual plot, not shown here, has an essentially random pattern, indicating that the linear relationship between wage and lag 1 is appropriate.
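The approximate interval for the period 19 forecast can be written out explicitly. In this Python sketch the point forecast and standard error are the values quoted in the text; the function name approx_prediction_interval is an assumption, and the two-standard-error rule is the approximation used in this chapter rather than an exact prediction interval.

```python
def approx_prediction_interval(point_forecast, std_error, multiplier=2.0):
    """Return (lower, upper) = forecast -/+ multiplier * standard error of estimate."""
    half_width = multiplier * std_error
    return point_forecast - half_width, point_forecast + half_width

# Point forecast for period 19 and the AR(1) standard error, both from the text.
lower, upper = approx_prediction_interval(5.9079, 0.03235)
print(round(lower, 4), round(upper, 4))
```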


The following steps describe how to construct a time sequence plot showing actual and fitted values.

1. Select C1:C18 and hold down the Control key while selecting F24:F41. Click the Chart Wizard tool.

2. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select Line for the chart type, select "Line with markers displayed at each data value" for the chart sub-type, and click Next.

3. In step 2 (Chart Source Data) on the Series tab, select the range edit box for Category (X) Axis Labels, click and drag cells A2:A18 on the worksheet, and click Next.

4. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles as shown in Figure 27.9. On the Gridlines tab, uncheck all boxes and click Finish.

5. Select the horizontal axis and double-click, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Alignment tab, select the Degrees edit box, type 0 (zero), and click OK.

6. Select the Predicted Wage data series by clicking one of its markers on the chart. Right-click and choose Format Data Series from the shortcut menu. In the Format Data Series dialog box, click the Patterns tab. For Line, click the Custom button and select the small dashed-line pattern from the Line Style drop-down list box. Click OK.

7. Use the chart's sizing handles to resize the chart to be approximately 8 standard columns wide and 17 rows high. Change the font size of the chart title, axis titles, axes, and legend to 8. The chart appears as shown in Figure 27.9.


Figure 27.9 Time Sequence Plot and AR(1) Fit

Each Predicted Wage value shown in Figure 27.9 depends upon the actual wage in the previous month. The standard error of estimate is a summary measure of the vertical distances between the actual wage and predicted wage for each month.

27.5 AUTOCORRELATION COEFFICIENTS FUNCTION

Autocorrelation coefficients are useful for measuring autocorrelation at various lags. The results may be used as a guide for determining the appropriate number of lagged values for explanatory variables in an autoregressive model. A function that provides the autocorrelation coefficients for any specified lag is called an autocorrelation coefficients function (ACF). A plot of autocorrelation coefficients versus lags is called a correlogram.

The following steps describe how to calculate autocorrelation coefficients.

1. Enter the month and wage data in columns A and B, or make a copy of the data shown in Figure 27.1.

2. Enter the label Z in cell C1. Select cells B1:C19 and from the Insert menu choose Name | Create. In the Create Names dialog box, check the Top Row checkbox and click OK. This step creates the name Wage for the range B2:B19 and the name Z for the range C2:C19.

3. Select cell C2 and enter the formula

=(B2-AVERAGE(Wage))/STDEV(Wage).


With cell C2 selected, double-click the fill handle in the lower-right corner. With cells C2:C19 still selected, click the Decrease Decimal button repeatedly until three decimal places are displayed.

4. Enter the labels Lag and ACF in cells E1 and F1, respectively. Enter the digits 1 through 6 in cells E2:E7. (Here we examine only the first 6 lags. For monthly data where seasonality is expected, the first 12 lags should be investigated.)

5. Select cell F2. Enter the formula

=SUMPRODUCT(OFFSET(Z,E2,0,18-E2),OFFSET(Z,0,0,18-E2))/17.

With cell F2 selected, double-click the fill handle in the lower-right corner. With cells F2:F7 still selected, click the Decrease Decimal button repeatedly until three decimal places are displayed. The results appear as shown in columns A:F in Figure 27.10. (To adapt the formula to other data, use the number of observations, n, instead of 18, and use n–1 instead of 17.)

6. To create the correlogram, select cells F2:F7 and click the Chart Wizard tool.

7. In step 1 of the Chart Wizard (Chart Type), select Column as the chart type and Clustered Column as the chart sub-type, and click Next. In step 2 (Chart Source Data), verify the data range and click Next. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles shown in Figure 27.10; on the Gridlines tab, clear all checkboxes; on the Legend tab, clear the checkbox for Show Legend, and click Finish.

8. Double-click the vertical axis, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Scale tab; click Minimum and type –0.2; click Maximum and type 1; click Major Unit and type 0.2; click OK.

9. Double-click the horizontal axis, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Patterns tab; in the Tick-Mark Labels section click Low and click OK. The correlogram appears as shown in Figure 27.10.

The lag 1 autocorrelation coefficient 0.822 shown in Figure 27.10 differs slightly from the regular correlation coefficient 0.8545 for current and lag 1 shown in cell G1 in Figure 27.6. One reason is that the autocorrelation coefficient uses z values for current and lag based on the mean and standard deviation of all 18 observations, whereas the regular correlation coefficient computes z values using the last 17 observations for current and the first 17 observations for lag. The autocorrelation coefficients for wages decrease gradually, indicating that it may be worthwhile to investigate autoregressive models incorporating lagged values beyond lag 1.
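The worksheet calculation of the autocorrelation coefficients can be sketched in Python as follows. The function name acf and the sample series are assumptions (the series is hypothetical, not the wage data of Figure 27.1); the calculation follows the SUMPRODUCT and OFFSET formula of step 5, with z values based on all n observations and a divisor of n – 1.

```python
from statistics import mean, stdev

def acf(series, lag):
    """Autocorrelation coefficient at `lag`, following the worksheet formula:
    z-scores use the mean and sample standard deviation of all n observations,
    and the sum of lagged products is divided by n - 1."""
    n = len(series)
    m, s = mean(series), stdev(series)
    z = [(v - m) / s for v in series]
    return sum(a * b for a, b in zip(z[lag:], z[:n - lag])) / (n - 1)

# Hypothetical series (not the wage data of Figure 27.1).
series = [5.80, 5.82, 5.81, 5.84, 5.86, 5.85, 5.88, 5.90, 5.89, 5.91]
for k in range(1, 4):
    print(k, round(acf(series, k), 3))
```

With this definition the coefficient at lag 0 equals 1, which provides a quick check that the z values and divisor are consistent.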


Figure 27.10 Autocorrelation Coefficients Function (ACF)

27.6 AR(2) MODEL

The autocorrelation coefficients computed in the previous section are 0.822 for lag 1 and 0.664 for lag 2, suggesting that the autoregressive model might be improved by using both lag 1 and lag 2 as explanatory variables.

The following steps describe how to arrange the data for an AR(2) model.

1. Enter the month and wage data in columns A and B, or make a copy of the data shown in Figure 27.1.

2. Select columns B and C. Right-click and choose Insert from the shortcut menu.

3. Enter the labels Lag 1 and Lag 2 in cells B1 and C1, respectively.

4. Copy the wage data in cells D2:D18, select cell B3, and paste.

5. Copy the wage data in cells D2:D17, select cell C4, and paste. The top portion of the worksheet appears as shown in Figure 27.11.

Figure 27.11 Arranging Lag 2 Data

6. Select rows 2 and 3. Right-click and choose Delete from the shortcut menu. Columns A through D appear as shown in Figure 27.12.

After arranging the data, from the Tools menu choose Data Analysis. In the Data Analysis dialog box, click Regression in the Analysis Tools list box and click OK. In the Regression dialog box, the Input Y Range is D1:D17 and the Input X Range is B1:C17. Check the Labels checkbox. The Output Range is F1. Optionally, select outputs in the Residuals section and click OK. Formatted and edited results without the ANOVA table are shown in Figure 27.12.

Figure 27.12 AR(2) Data and Edited Regression Tool Output

Compared to the AR(1) model, this AR(2) model has a slightly higher standard error of estimate and a lower adjusted R square. The t statistic for the Lag 2 explanatory variable is 0.16251, indicating that the Lag 2 regression coefficient is not significantly different from zero. After taking lag 1 into account, the addition of lag 2 is not useful for explaining the variation in wages.
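The data arrangement and regression described above can be sketched in Python as a cross-check. This is a minimal sketch, not the Regression tool's algorithm: the function names lagged_rows, solve_linear, and fit_ar are assumptions, and the coefficients are obtained by solving the least-squares normal equations directly. Rows without a complete set of lagged values are dropped, which corresponds to deleting rows 2 and 3 in the worksheet.

```python
def lagged_rows(series, lags):
    """Build regression rows [1, x(t-lag1), x(t-lag2), ...] and targets x(t).

    The first max(lags) observations have no complete set of lagged values,
    so they are skipped.
    """
    start = max(lags)
    X = [[1.0] + [series[t - lag] for lag in lags]
         for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ar(series, lags):
    """Least-squares coefficients [intercept, coef for each lag]."""
    X, y = lagged_rows(series, lags)
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear(XtX, Xty)
```

Calling fit_ar(wages, [1, 2]) on the 18 wage observations (not reproduced here) should give coefficients comparable to those in Figure 27.12; the sketch does not compute standard errors or t statistics.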


EXERCISES

Exercise 27.1 (adapted from Keller, p. 930) As a preliminary step in forecasting future values, a large mail-order retail outlet has recorded the sales figures, in millions of dollars, shown in the following table.

Year   Sales     Year   Sales
1974    6.7      1984   14.2
1975    7.4      1985   18.1
1976    8.5      1986   16.0
1977   11.2      1987   11.2
1978   12.5      1988   14.8
1979   10.7      1989   15.2
1980   11.9      1990   14.1
1981   11.4      1991   12.2
1982    9.8      1992   15.7
1983   11.5

1. Fit a linear time trend and compute the Durbin-Watson statistic.

2. Construct an autocorrelation plot and develop an autoregressive model.

3. Make forecasts for 1993 using the linear time trend and autoregressive model.

Exercise 27.2 The following table shows annual sales in thousands of units for a new product from the Ekans company.

Year   Sales     Year   Sales     Year   Sales
1980    36       1985    61       1990    79
1981    44       1986    63       1991    87
1982    52       1987    66       1992    97
1983    56       1988    69       1993   101
1984    58       1989    73       1994   103

1. Fit a linear time trend and compute the Durbin-Watson statistic.

2. Calculate values of the autocorrelation function for lags 1 through 6.

3. Try autoregressive models AR(1), AR(2), AR(3), and AR(4). Which of these models is most appropriate?

Chapter 28 Time Series Smoothing

This chapter describes two methods for smoothing time series data: moving averages and exponential smoothing. The purpose of smoothing is to eliminate the irregular and seasonal variation in the data so it's easier to see the long-run behavior of the time series. The long-run pattern is called the trend, and it may also include variation due to the business cycle. The smoothed version of the data may be used to make a forecast of trend, or it may be used as part of the analysis of seasonality, as described in Chapter 29.

The data set used for moving averages in this chapter and for seasonal analysis in Chapter 29 is quarterly U.S. retail sales, in billions of dollars, from first quarter 1983 through fourth quarter 1987. These data, shown in column C of Figure 28.1, are a quarterly aggregation of the monthly data in the file RETAIL.DAT that accompanies the second edition of Cryer; the original source is Survey of Current Business, 1987.

Figure 28.1 Labels and Sales Data

The following steps describe how to construct a time sequence plot using two lines (quarter and year) for labeling the horizontal axis.

1. Enter the labels Year, Quarter, and Sales in row 1 and enter the years, quarters, and sales data in columns A, B, and C.

2. Select cells A1:C21 and click the Chart Wizard button.

3. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, select Line for chart type and "Line with markers displayed at each data value" for chart sub-type. Click Next.

4. In step 2 (Chart Source Data), verify the data range and click Next.

5. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles shown in Figure 28.2. On the Gridlines tab, clear all checkboxes. On the Legend tab, clear the checkbox for Show Legend. Click Finish.

6. Click and drag the sizing handles so that the chart is approximately 9 columns wide and 20 rows high.

7. To change the font size of the chart title, axis titles, and axes to 7, select each object, click the Font Size tool on the Formatting toolbar, and enter 7.

8. Select the vertical axis and click the Decrease Decimal button.

9. Double-click the vertical axis; in the Format Axis dialog box on the Scale tab, enter 200 for the Minimum.

10. Double-click the horizontal axis; in the Format Axis dialog box on the Alignment tab, enter 0 (zero) in the Degrees edit box. The chart appears as shown in Figure 28.2.

Figure 28.2 Time Sequence Plot of Sales Data

Quarterly U.S. retail sales exhibit strong seasonality with an upward linear trend. A moving average may be used to eliminate the seasonal variation so the trend is even more apparent.

28.1 MOVING AVERAGE USING ADD TRENDLINE

The following steps describe how to insert the moving average line on the time sequence chart.

1. Right-click one of the markers of the data series and choose Add Trendline from the shortcut menu.

2. In the Add Trendline dialog box on the Type tab, click the Moving Average icon and enter 4 as the Period, as shown in Figure 28.3. The moving average line appears on the chart as shown in Figure 28.4.


Figure 28.3 Add Trendline Dialog Box

The first moving average shown in Figure 28.4 is an average of the first four quarters and is associated with 1983 quarter IV. The period is specified as 4 in this example because the repeating pattern is four quarters long. If the time series data are monthly, the period is usually 12. If daily data have a recurring pattern each week, the period should be 7.

Figure 28.4 Time Sequence Plot with Moving Average

When the Add Trendline command is used to obtain the moving average, the default pattern is a medium-weight line as shown in Figure 28.4. The style and weight of the line may be changed by double-clicking on the moving average line, but it isn't possible to add markers. Also, there is no way to access the values that Excel uses to plot the moving average.

28.2 MOVING AVERAGE DATA ANALYSIS TOOL

The following steps describe how to obtain the moving average values and a chart.

1. Copy the labels and sales data shown in Figure 28.1 to a new worksheet. Enter the label MovAvg in cell D1 and the label StdError in cell E1.

2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, click Moving Average in the Analysis Tools list box, and click OK. The Moving Average dialog box appears as shown in Figure 28.5.

Figure 28.5 Moving Average Dialog Box

3. Make entries in the Moving Average dialog box as shown in Figure 28.5. Then click OK. (If you receive the error message "Cannot add chart to a shared workbook," click the OK button. To construct a line chart, select the Sales and Moving Average data and click the Chart Wizard button.) The output appears in columns D and E, as shown in Figure 28.6.

4. Double-click the vertical axis. In the Format Axis dialog box on the Scale tab, click the Minimum edit box and enter 200. The results appear as shown in Figure 28.6.


Figure 28.6 Output of Moving Average Analysis Tool

The Moving Average analysis tool puts formulas in the worksheet. Cell D5 contains the formula =AVERAGE(C2:C5), cell D6 contains =AVERAGE(C3:C6), and so on. Each average uses four values: the current sales and the three previous sales.

Cell E8 contains the formula =SQRT(SUMXMY2(C5:C8,D5:D8)/4). The SUMXMY2(C5:C8,D5:D8) portion of this formula computes the difference between the smoothed values in cells D5:D8 and the actual values in cells C5:C8, squares each of the four differences, and sums the squared differences. Each of the standard error values in column E is based on the four most recent values.
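The two worksheet formulas described above can be sketched in Python. This is an illustration under assumptions, not the analysis tool's own code: the function names trailing_moving_average and rolling_std_error are hypothetical, and positions without enough data are returned as None rather than as #N/A.

```python
from math import sqrt


def trailing_moving_average(values, period):
    """Average of the current value and the (period - 1) previous values."""
    out = [None] * len(values)
    for i in range(period - 1, len(values)):
        out[i] = sum(values[i - period + 1 : i + 1]) / period
    return out


def rolling_std_error(values, smoothed, window):
    """Root mean squared difference over the `window` most recent rows.

    Mirrors =SQRT(SUMXMY2(actual, smoothed)/window); a result is produced
    only when all `window` smoothed values are available.
    """
    out = [None] * len(values)
    for i in range(len(values)):
        lo = i - window + 1
        if lo < 0 or any(s is None for s in smoothed[lo : i + 1]):
            continue
        squared = sum((a - s) ** 2
                      for a, s in zip(values[lo : i + 1], smoothed[lo : i + 1]))
        out[i] = sqrt(squared / window)
    return out
```

With period and window both equal to 4, the first moving average falls on the fourth observation (row 5 of the worksheet) and the first standard error on the seventh (row 8), matching the positions of cells D5 and E8.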

A simplistic forecasting model could use the last moving average, 376.8, as a forecast for the next quarter's trend, with the standard error, 23.7, as a measure of uncertainty. A forecast of the seasonal component could be combined with this trend forecast to obtain a more accurate prediction of next quarter's sales.

28.3 EXPONENTIAL SMOOTHING TOOL

The moving average approach to smoothing uses a specified number of actual values to obtain the smoothed result. For seasonal data, the number of values in each average is usually set equal to the cycle length. For example, for quarterly data, four actual values are used to calculate the smoothed value.

Instead of using a finite number of values, the exponential smoothing approach theoretically uses the entire past history of the actual time series values to compute smoothed values. Practically, the smoothed or forecast values are calculated using a simple recursive formula:

Forecast_{t+1} = Alpha * Actual_t + (1 – Alpha) * Forecast_t

where alpha is a number between 0 and 1 called the smoothing constant. To apply this formula to actual values, we must choose an initial forecast value and an appropriate value of alpha.
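To see why the recursive formula uses the entire past history, it may help to substitute the formula into itself repeatedly. The following expansion is a sketch in which F, A, and α abbreviate Forecast, Actual, and alpha, and F_1 denotes the initial forecast value (notation assumed here, not used elsewhere in the text):

```latex
F_{t+1} = \alpha A_t + (1-\alpha) F_t
        = \alpha A_t + \alpha(1-\alpha) A_{t-1} + (1-\alpha)^2 F_{t-1}
        = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j A_{t-j} + (1-\alpha)^t F_1
```

Each earlier actual value receives a weight that shrinks by the factor (1 – α) per period, which is the sense in which the smoothing is exponential.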

Excel uses the term damping factor for the quantity (1 – alpha). Thus, to obtain exponential smoothed forecasts using a smoothing constant, alpha, equal to 0.1, we must specify a value for the damping factor equal to 0.9.

The following data are based on quarterly Iowa nonfarm income per capita from the data file IOWAINC.DAT that accompanies the Cryer textbook. The values shown in column B of Figure 28.7 are percent changes, rounded to one decimal place, using the last 18 periods.

Figure 28.7 Data and Output for Smoothing Constant 0.1

The following steps describe how to use the Exponential Smoothing analysis tool without specifying an initial smoothed value.

1. Enter the Quarter and Actual labels and data in columns A and B of a new worksheet as shown in Figure 28.7. Enter the label Forecast in cell C1 and the label StdError in cell D1.


2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, click Exponential Smoothing in the Analysis Tools list box and click OK. The Exponential Smoothing dialog box appears as shown in Figure 28.8.

Figure 28.8 Exponential Smoothing Dialog Box

3. Make entries in the Exponential Smoothing dialog box as shown in Figure 28.8. Then click OK. The output appears in columns C and D, with the chart output to the right. Adjust the size of the chart by clicking and dragging a handle on the border to obtain the results shown in Figure 28.7.

The Exponential Smoothing analysis tool puts formulas in the worksheet. The actual value in the first period is used as the forecast for the second period. That is, cell C3 contains the formula =B2. The forecast for the third period uses the actual value and forecast from the second period in the recursive formula; cell C4 contains the formula =0.1*B3+0.9*C3. In general, the forecast for a specific period is based on the actual and forecast values from the previous period.

The damping factor specified here is 0.9, so the smoothing constant alpha is 0.1. To obtain a forecast, the most recent actual value receives weight 0.1 in the recursive formula. Because this weight is relatively small, the smoothed values respond very slowly to changes in the actual values.

Cell D6 contains the formula =SQRT(SUMXMY2(B3:B5,C3:C5)/3). Each of the standard error values in column D is based on the three previous actual values and forecasts.

To obtain a forecast for quarter 19, a simplistic forecasting model could use the actual and forecast values from quarter 18 in the recursive formula: 0.1 * 1.3 + 0.9 * 2.669 = 2.532. This forecast could be obtained by selecting cell C19 and dragging the fill handle in the lower-right corner down to cell C20, which then contains the copied formula =0.1*B19+0.9*C19, with the result 2.532.
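The recursive calculation can be sketched in Python as follows. This is an illustrative sketch: the function names exponential_smoothing and next_forecast are assumptions, and the initialization follows the convention described above, in which the first actual value serves as the forecast for the second period.

```python
def next_forecast(actual, forecast, damping):
    """One step of the recursion, with alpha = 1 - damping."""
    alpha = 1 - damping
    return alpha * actual + damping * forecast


def exponential_smoothing(actuals, damping):
    """Return forecasts aligned with periods, plus one extra future period.

    forecasts[0] is None (no forecast for the first period),
    forecasts[1] is the first actual value, and the last element is the
    forecast for the period after the data end.
    """
    forecasts = [None, actuals[0]]
    for t in range(1, len(actuals)):
        forecasts.append(next_forecast(actuals[t], forecasts[t], damping))
    return forecasts
```

For example, next_forecast(1.3, 2.669, 0.9) reproduces the quarter 19 calculation 0.1 * 1.3 + 0.9 * 2.669, approximately 2.532.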

EXERCISES

Exercise 28.1 (adapted from Mendenhall, p. 635) The week's end closing prices for the securities of the Color-Vision Company, a manufacturer of color television sets, have been recorded over a period of 30 consecutive weeks as shown in the following table.

Week   Price     Week   Price     Week   Price
  1     $71       11     $75       21     $72
  2      70       12      70       22      73
  3      69       13      75       23      72
  4      68       14      75       24      77
  5      64       15      74       25      83
  6      65       16      78       26      81
  7      72       17      86       27      81
  8      78       18      82       28      85
  9      75       19      75       29      85
 10      75       20      73       30      84

1. Determine the five-week moving average.

2. Use exponential smoothing with smoothing constant, alpha, of 0.1.

3. Use exponential smoothing with smoothing constant, alpha, of 0.5.

4. Which of the three smoothing results is most appropriate for detecting the long-term trend for these data?

Exercise 28.2 (adapted from Mendenhall, p. 638) The following table shows gross monthly sales revenue, in thousands of dollars, of a pharmaceutical company from January 1989 through December 1992.

Month        1989   1990   1991   1992
January      18.0   23.3   24.7   28.3
February     18.5   22.6   24.4   27.5
March        19.2   23.1   26.0   28.8
April        19.0   20.9   23.2   22.7
May          17.8   20.2   22.8   19.6
June         19.5   22.5   24.3   20.3
July         20.0   24.1   27.4   20.7
August       20.7   25.0   28.6   21.4
September    19.1   25.2   28.8   22.6
October      19.6   23.8   25.1   28.3
November     20.8   25.7   29.3   27.5
December     21.0   26.3   31.4   28.1


1. Construct a time sequence plot of the monthly sales revenue.

2. To help identify the long-term trend, smooth the time series using a three-month moving average.

3. Smooth the time series using exponential smoothing with smoothing constant, alpha, of 0.1.

4. Smooth the time series using exponential smoothing with smoothing constant, alpha, of 0.3.

Chapter 29 Time Series Seasonality

This chapter describes three methods for analyzing seasonal patterns in time series data. These methods may be used whenever the data have a pattern that repeats itself on a regular basis. These recurring patterns are often associated with the seasons of the year, but the same methods of analysis may be applied to any systematic, repeating pattern.

The first two methods use regression: regression using indicator variables and autoregression. The third method, classical time series decomposition, focuses on determining seasonal indexes. The three methods are illustrated using quarterly U.S. retail sales, in billions of dollars, from first quarter 1983 through fourth quarter 1987. To develop Figure 29.1, select A2:C21 and use the Chart Wizard to create a line chart.

Figure 29.1 Labels, Data, and Time Sequence Plot

The time series shown in Figure 29.1 has a strong seasonal pattern with an upward trend. Sales are consistently highest in quarter IV of each year and lowest in quarter I. The trend appears to be linear.

29.1 REGRESSION USING INDICATOR VARIABLES

Retail sales may be analyzed using a multiple regression model including both the trend and seasonal components. The trend component may be modeled as a linear time trend using the data shown in column D in Figure 29.2. The seasonal component may be described using seasonal indicator variables. As shown in columns E:H in Figure 29.2, one of four possible categories (Winter, Spring, Summer, and Fall, corresponding to quarters I, II, III, and IV) is associated with each observation. The number of indicator variables included in the multiple regression model is one less than the number of categories being modeled, so three indicator variables are used. If the data are monthly, 11 indicator variables are used.
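The layout of the time trend and indicator columns can be sketched in Python. This is an illustrative sketch only: the function names design_rows and regression_inputs are assumptions, and the series is assumed to begin in the first season (Winter, quarter I), as the retail sales data do.

```python
SEASONS = ["Winter", "Spring", "Summer", "Fall"]


def design_rows(n_periods, n_seasons=4):
    """Rows of [time, indicator_1, ..., indicator_n_seasons].

    Time runs 1, 2, ..., n_periods; exactly one indicator per row equals 1.
    """
    rows = []
    for t in range(1, n_periods + 1):
        season = (t - 1) % n_seasons
        indicators = [1 if season == j else 0 for j in range(n_seasons)]
        rows.append([t] + indicators)
    return rows


def regression_inputs(rows):
    """Drop the last indicator (the baseline season, Fall here).

    Only n_seasons - 1 indicators are used as x variables; the omitted
    season's effect is absorbed into the constant term.
    """
    return [row[:-1] for row in rows]
```

Calling design_rows(20) gives one row per quarter of the retail sales data, and regression_inputs keeps the Time, Winter, Spring, and Summer columns that would serve as the x variables.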

Figure 29.2 Data for Regression

The following steps describe how to develop a regression model with linear time trend and seasonal indicator variables.

1. Enter the labels and data shown in Figure 29.2. (Enter 1 and 2 in cells D2:D3, select D2:D3, and double-click the fill handle. Enter the zero-one pattern in cells E2:H5, copy, and paste to cells E6, E10, E14, and E18.)

2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, select Regression from the Analysis Tools list box and click OK. The Regression dialog box appears as shown in Figure 29.3.


3. In the Regression dialog box, the Input Y Range is C1:C21 and the Input X Range is D1:G21. (It is important to include only three of the four indicator variables as x variables for the regression model.) Check the Labels box. Click the Output Range option button, select the adjacent text box, and specify J1. Check all checkboxes in the Residuals section. Then click OK. (If the error message "Cannot add chart to a shared workbook" appears, click Cancel; in the Regression dialog box, click New Workbook in the Output Options, and click OK.) An edited portion of the regression output is shown in Figure 29.4.

Figure 29.4 Edited Portion of Regression Summary Output


The Coefficients section of the output in Figure 29.4 shows that the fitted equation is

Sales = 311.005 + 5.106*Time – 56.601*Winter – 19.387*Spring – 22.574*Summer.

After taking seasonality into account, retail sales increase by 5.106 billions of dollars per quarter, on the average. The Fall quarter indicator variable was not included in the regression input, so the Fall seasonal effect is included in the constant term 311.005. The coefficient for the Winter indicator variable tells us that retail sales in the Winter quarter are 56.601 billions of dollars less than sales in the Fall, on the average. Similarly, the seasonal effect of Spring relative to Fall is measured by the –19.387 coefficient, and the effect of Summer relative to Fall is measured by the –22.574 coefficient.

R square indicates that approximately 98.2% of the variation in retail sales can be explained using linear time trend and seasonal indicators. The standard error of the residuals is 6.089 billions of dollars, which may be loosely interpreted as the error associated with predictions using this model. The absolute values of the t statistics are far greater than two, and the related p-values are less than 0.0005, indicating significant relationships between each explanatory variable and retail sales.

The Regression analysis tool's line fit plot for explanatory variable Time shows the actual and fitted values in a time sequence plot. The following steps describe some embellishments to obtain the chart shown in Figure 29.5.

4. Click and drag the chart sizing handles so that the chart is approximately 10 columns wide and 20 rows high. Change the font size to 10 for the chart title, axis titles, axes, and legend.

5. Select the vertical axis. Double-click, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Scale tab. Click Minimum and type 200. Click Maximum and type 450. Click OK.

6. Select the horizontal axis. Double-click, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Scale tab. Click Minimum and type 1. Click Maximum and type 20. Click Major Unit and type 1. Click OK.

7. Click one of the square markers associated with the Predicted Sales data series, or use the up and down arrow keys to select the series. The formula bar shows =SERIES("Predicted Sales",...). Double-click, or right-click and choose Format Data Series from the shortcut menu. In the Format Data Series dialog box, click the Patterns tab. Click Automatic for Line, click None for Marker, and click OK. The chart appears as shown in Figure 29.5.


Figure 29.5 Formatted Regression Chart Output

A forecast of retail sales in quarter 21 (Winter 1988) is obtained by setting Time = 21, Winter = 1, Spring = 0, and Summer = 0. Referring to the fitted equation,

predicted Sales = 311.005 + 5.106 * 21 – 56.601 * 1 – 19.387 * 0 – 22.574 * 0

= 311.005 + 107.226 – 56.601 – 0 – 0

= 361.63 billions of dollars.

Forecasts for individual quarters may be calculated in a similar manner.
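The same arithmetic can be sketched in Python using the coefficients from the fitted equation. The function name predict_sales and the dictionary layout are assumptions made for illustration; Time = 1 is taken to be Winter 1983, so seasons repeat every four periods.

```python
# Coefficients from the fitted equation; Fall is the baseline season
# and therefore has no separate coefficient.
COEF = {
    "Intercept": 311.005,
    "Time": 5.106,
    "Winter": -56.601,
    "Spring": -19.387,
    "Summer": -22.574,
}
SEASONS = ("Winter", "Spring", "Summer", "Fall")


def predict_sales(time):
    """Predicted sales (billions of dollars) for period `time` (1 = Winter 1983)."""
    season = SEASONS[(time - 1) % 4]
    seasonal_effect = COEF.get(season, 0.0)  # 0.0 for the baseline season
    return COEF["Intercept"] + COEF["Time"] * time + seasonal_effect
```

For example, predict_sales(21) repeats the Winter 1988 calculation shown above, and the same function can be evaluated for Time = 22, 23, and 24.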

To calculate fitted values and forecasts for a large number of quarters, the TREND function is convenient. The following steps describe how to obtain fitted values for the first 20 quarters and forecasts for the next 4 quarters.

8. Copy cells A18:B21 and paste into cell A22. Enter 1988 in cell A22.

9. Select cells D20:D21 and drag the fill handle down to cell D25.

10. Copy cells E18:H21 and paste into cell E22.

11. Enter the label Forecast in cell I1.

12. Select cells I2:I25. Click the Insert Function tool button (icon fx). In the Insert Function dialog box, select Statistical in the category list box, select TREND in the function list box, and click OK. In the TREND dialog box, fill in the dialog box as shown in Figure 29.6 and click OK.


Figure 29.6 TREND Function Dialog Box

13. With I2:I25 selected, press F2 (or click in formula bar). To array-enter the formula, hold down the Control and Shift keys and press Enter. Click the Decrease Decimal button to display one decimal place. The results appear as shown in Figure 29.7.

Figure 29.7 Forecast Using TREND Function


The forecasts for the next four quarters are shown in cells I22:I25 in Figure 29.7. The forecast for quarter 21 (Winter 1988) using TREND agrees with the value calculated earlier using the fitted equation from the Regression analysis tool: 361.6 billions of dollars.

The following steps describe how to prepare a time sequence plot showing the actual, fitted, and forecast values.

14. Select cells C1:C25. Hold down the Control key and select I1:I25. Click the Chart Wizard button.

15. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select Line for chart type and "Line with markers displayed at each data value" for chart sub-type. Click Next.

16. In step 2 (Chart Source Data) on the Series tab, select the range edit box for Category (X) Axis Labels, and click and drag A2:B25. Click Next.

17. In step 3 (Chart Options) on the Titles tab, type the chart and axis labels shown in Figure 29.8. On the Gridlines tab, clear all checkboxes. Click Finish.

18. Double-click the vertical axis, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Scale tab; click Minimum and type 200; click Maximum and type 500; click Major Unit and type 50; click OK.

19. Double-click the horizontal axis, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box on the Alignment tab, select the Degrees edit box, type 0 (zero), and click OK.

20. Click on a data point or use the up and down arrow keys to select the actual sales data series. Double-click, or right-click and choose Format Data Series from the shortcut menu. In the Format Data Series dialog box, click the Patterns tab. Click None for Line, click Automatic for Marker, and click OK.

21. Click on a data point or use the up and down arrow keys to select the forecast data series. Double-click, or right-click and choose Format Data Series from the shortcut menu. In the Format Data Series dialog box, click the Patterns tab. Click Automatic for Line, click None for Marker, and click OK.

22. To format the chart as shown in Figure 29.8, click and drag the chart sizing handles so that the chart is approximately 10 standard columns wide and 20 rows high. Change the font size to 8 for the chart title, axis titles, and legend. Change the font size to 6 for the axes.


Figure 29.8 Time Sequence Plot with Forecast

29.2 AR(4) MODEL

Seasonal autoregression is an alternative to using indicator variables to model seasonality. The general idea is to relate values in the current period to values with an appropriate lag. For seasonal quarterly data, we expect current Winter sales to be correlated with the previous year's Winter sales. Autocorrelation and autoregression are discussed in Chapter 27, which includes details for calculating the autocorrelation coefficients function (ACF) in section 27.5. The ACF results are useful for identifying which lagged variables should be included in the autoregressive model.

The following steps describe how to construct the ACF shown in Figure 29.9.

1. Enter the data shown in columns A:C in Figure 29.1 on a new sheet.

2. Enter the labels Z, Lag, and ACF in cells D1, F1, and G1, and enter the digits 1 through 8 in cells F2:F9.

3. Select cells C1:D21. From the Insert menu choose Name | Create. In the Create Names dialog box check the Top Row checkbox. Click OK.

4. In cell D2, enter the formula =(C2-AVERAGE(Sales))/STDEV(Sales). With cell D2 selected, double-click the fill handle.

5. In cell G2, enter the formula

=SUMPRODUCT(OFFSET(Z,F2,0,20-F2),OFFSET(Z,0,0,20-F2))/19.

With cell G2 selected, double-click the fill handle.

6. Select cells G2:G9, click the Chart Wizard button, and create a Clustered Column chart. See Chapter 27, section 27.5, for details on obtaining the appearance shown in Figure 29.9.

Figure 29.9 Autocorrelation Coefficients Function (ACF)

Referring to Figure 29.9, the correlation is highest at lag 4, as expected. An autoregressive model may be used to explain variation in sales with lag 4 for seasonality and lag 1 for short-term trend (after taking seasonality into account). The following steps describe how to construct the AR(4) model.

7. Enter the data shown in columns A:C in Figure 29.1 on a new sheet. Alternatively, copy the data, choose Worksheet from the Insert menu, and paste.

8. Select columns C and D. Right-click and choose Insert from the shortcut menu. Enter the labels Lag 1 and Lag 4 in cells C1 and D1.

9. Select cells E2:E20. Click the Copy button, or right-click and choose Copy from the shortcut menu. Select cell C3. Click the Paste button, or right-click and choose Paste from the shortcut menu.

10. Copy cells E2:E17 and paste into cell D6. The top portion of the worksheet appears as shown in Figure 29.10.


Figure 29.10 Arranging Lagged Data

11. Select rows 2:5. Right-click and choose Delete from the shortcut menu. The data appear as shown in columns C:E in Figure 29.11.

12. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box, select Regression from the Analysis Tools list box and click OK. In the Regression dialog box, the Input Y Range is E1:E17 and the Input X Range is C1:D17. Check the Labels box. Click the Output Range option button, select the adjacent text box, and specify H1. Check the Residuals checkbox in the Residuals section. Then click OK. A portion of the regression output is shown in Figure 29.11.

Figure 29.11 Lagged Data and Regression Output

Rounded to four decimal places, the fitted equation is Sales = 87.5903 – 0.1198 * Lag1 + 0.9236 * Lag4. The t statistics and p-values indicate significant relationships, and R square shows that approximately 97% of the variation in sales can be explained using the lagged variables.

The standard error of this AR(4) model is 5.9 billions of dollars, very close to the standard error of the model using indicator variables, 6.1 billions of dollars. The following steps describe how to obtain forecasts for the next four quarters and a plot of actual, fitted, and forecast values.

13. Copy cells A14:B17 and paste into cell A18. Enter 1988 in cell A18.

14. Enter the label Forecast in cell F1.

15. The Predicted Sales values from regression output appear below the Summary Output. Copy cells I26:I41 into cell F2.

16. Select cell E18. Enter the formula =I$17+I$18*E17+I$19*E14. Click the fill handle and drag down to cell E21. The results appear as shown in Figure 29.12.

Figure 29.12 Preparing Forecasts

17. Select cells E18:E21. Move the mouse pointer near the edge of the selected region until the pointer becomes an arrow. Click and drag right to column F. (Alternatively, cut E18:E21 and paste special values to F18.) The results appear as shown in Figure 29.13.


Figure 29.13 Sales Data and Forecasts for Chart

18. To prepare a line chart, select cells E1:F21 and click the Chart Wizard button. In step 2 (Chart Source Data), select the range edit box for Category (X) Axis Labels, and click and drag cells A2:B21.

19. Details for the Chart Wizard steps and formatting are described in steps 15 through 22 in section 29.1. The results appear as shown in Figure 29.14.

Figure 29.14 Time Sequence Plot with AR(4) Forecast
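The fill-down formula used to prepare the forecasts is a recursion: each new forecast is appended to the series and then serves as a lagged value for later forecasts. A minimal Python sketch follows; the function name ar_forecast and the dictionary of lag coefficients are assumptions made for illustration.

```python
def ar_forecast(history, intercept, lag_coefs, steps):
    """Forecast `steps` periods ahead with an autoregressive equation.

    lag_coefs maps each lag to its coefficient, for example {1: b1, 4: b4}.
    Earlier forecasts are reused as lagged inputs for later forecasts.
    """
    series = list(history)
    for _ in range(steps):
        nxt = intercept + sum(c * series[-lag] for lag, c in lag_coefs.items())
        series.append(nxt)
    return series[len(history):]


# Coefficients from the fitted AR(4) equation, rounded to four decimal places.
INTERCEPT = 87.5903
LAG_COEFS = {1: -0.1198, 4: 0.9236}
```

With the 20 quarterly sales values in a list named sales (not reproduced here), ar_forecast(sales, INTERCEPT, LAG_COEFS, 4) would give forecasts for the four quarters of 1988, differing from the worksheet only by the rounding of the coefficients.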

29.3 CLASSICAL TIME SERIES DECOMPOSITION

A third method for analyzing seasonality is classical time series decomposition. The time series values are decomposed into several components: long-term trend; business cycle effects; seasonality; and unexplained, random variation. Because it is usually very difficult to isolate the business cycle effects, the approach described here assumes the trend component has both long-term average and cyclical effects. The multiplicative model is

Value_t = Trend_t * Seasonal_t * Random_t.

The trend component is expressed in the same units as the original time series values, and the seasonal and random components are expressed as index numbers (percentages) or decimal equivalents.

A common method for estimating the trend component uses moving averages. Other approaches are exponential smoothing, linear time trend using simple regression, and nonlinear regression. The following steps describe centered moving averages.

1. Enter the data shown in columns A:C in Figure 29.7 on a new sheet. Alternatively, copy the data, choose Worksheet from the Insert menu, and paste.

2. Enter the labels Early_MA, Late_MA, and Center_MA in cells D1:F1, as shown in Figure 29.15.


3. Select cell D4 and enter the formula =AVERAGE(C2:C5). This average of the first four quarters is actually associated with a time point located between the second and third quarters. Because it is located on the row of the third quarter, it is labeled "Early_MA."

4. Select cell E4 and enter the formula =AVERAGE(C3:C6). This average of the second through fifth quarters is actually associated with a time point located between the third and fourth quarters. Since it is located on the row of the third quarter, it is labeled "Late_MA."

5. Select cell F4 and enter the formula =AVERAGE(D4:E4). This average of the Early_MA and Late_MA is centered on the third quarter.

6. Select cells D4:F4. Click the fill handle in the lower-right corner of the selection and drag down to cell F19. Format the extended selection to display one decimal place. The results appear as shown in Figure 29.15.

Figure 29.15 Worksheet for Centered Moving Average

7. To chart the moving average, select cells C1:C25. Hold down the Control key and select cells F1:F25. Click the Chart Wizard button.


8. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select Line for chart type and "Line with markers displayed at each data value" for chart sub-type; click Next. In step 2 (Chart Source Data) on the Series tab, select the range edit box for Category (X) Axis Labels, and click and drag A2:B25; click Next. In step 3 (Chart Options) on the Titles tab, type the chart and axis labels shown in Figure 29.16; on the Gridlines tab, clear all checkboxes; click Finish.

9. To format the chart, double-click the vertical axis, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box, click the Scale tab; click Minimum and type 200; click Maximum and type 500; click Major Unit and type 50; click OK.

10. Double-click the horizontal axis, or right-click and choose Format Axis from the shortcut menu. In the Format Axis dialog box on the Alignment tab, select the Degrees edit box, type 0 (zero), and click OK.

11. Click on a data point to select the centered moving average data series. Double-click, or right-click and choose Format Data Series from the shortcut menu. In the Format Data Series dialog box, click the Patterns tab. Click Automatic for Line, click None for Marker, and click OK.

12. To display all labels on the horizontal axis, click and drag the sizing handles to make the chart wider. Also, select a smaller font size for the axes, axis titles, and legend. The results are shown in Figure 29.16.

Figure 29.16 Plot of Actual Sales and Centered Moving Average


13. Enter the labels Ratio, AvgRatio, and Standard in cells G1:I1.

14. Select cell G4. Enter the formula =C4/F4. With cell G4 selected, click the fill handle and drag down to cell G19. The results appear as shown in column G in Figure 29.17. These numbers are the ratio of actual sales to the moving average. For example, the number 1.0748 in cell G5 indicates that actual sales in that particular fourth quarter were 107.48% of the average sales during the year.

15. Select cell H2 and enter the formula =AVERAGE(G6,G10,G14,G18). With cell H2 selected, click the fill handle and drag down to cell H3.

16. Select cell H4 and enter the formula =AVERAGE(G4,G8,G12,G16). With cell H4 selected, click the fill handle and drag down to cell H5. The results are shown in column H in Figure 29.17. These formulas summarize the ratios for a particular quarter for all years. For example, the value 1.0175 (approximately 1.02) in cell H3 indicates that sales in the second quarter are typically 2% above the annual average. If the set of ratios in column G for a particular quarter has outliers, these summaries in column H could use the MEDIAN or TRIMMEAN functions.

17. Select cell H6 and click the AutoSum tool twice.

18. The base for an index is 1.00, so the four prospective indexes should sum to 4. To modify the average ratios so that they sum to 4, select cell I2 and enter the formula =H2*4/$H$6. With cell I2 selected, click the fill handle and drag down to cell I5.

19. Select cell I6 and click the AutoSum tool twice. The seasonal indexes in column I sum to 4 as shown in Figure 29.17.

One use for the seasonal indexes shown in cells I2:I5 in Figure 29.17 is to seasonally adjust historical data. The multiplicative model is Value_t = Trend_t * Seasonal_t * Random_t, so if an original value is divided by the seasonal index, the result has only trend and random components remaining. Successive seasonally adjusted values can be compared to detect changes in the long-run behavior of the time series.
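The worksheet calculations in steps 3 through 19 can also be sketched outside Excel. The following Python is a minimal illustration, not the worksheet itself: the sales figures are hypothetical (the data of Figure 29.7 are not reproduced here), and the function and variable names are assumptions chosen for readability.

```python
from statistics import mean

def seasonal_indexes(sales, period=4):
    """Classical multiplicative decomposition.

    Returns `period` indexes that sum to `period`.
    sales[0] is assumed to be a first-quarter value.
    """
    half = period // 2
    ratios = {}  # position within the year -> list of actual / centered MA
    for t in range(half, len(sales) - half):
        early = mean(sales[t - half:t + half])          # Early_MA, like =AVERAGE(C2:C5)
        late = mean(sales[t - half + 1:t + half + 1])   # Late_MA, like =AVERAGE(C3:C6)
        centered = (early + late) / 2                   # Center_MA
        ratios.setdefault(t % period, []).append(sales[t] / centered)  # Ratio
    avg = [mean(ratios[q]) for q in range(period)]      # AvgRatio, one per quarter
    return [a * period / sum(avg) for a in avg]         # Standard, rescaled to sum to 4

# Hypothetical data: 20 quarters of linear trend times a fixed seasonal pattern.
pattern = [0.9, 1.0, 0.95, 1.15]
sales = [(300 + 5 * t) * pattern[t % 4] for t in range(20)]

indexes = seasonal_indexes(sales)
adjusted = [y / indexes[t % 4] for t, y in enumerate(sales)]  # seasonally adjusted values
print([round(i, 3) for i in indexes])
```

Because the hypothetical series was built from a known pattern, the recovered indexes should land close to that pattern, which gives a quick sanity check on the procedure.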

A second use is to combine the seasonal index with a forecast of trend to obtain a forecast of value. The trend forecast may be obtained by extrapolating the moving average or using a regression model. The following steps describe how to seasonally adjust the historical data, extrapolate the linear time trend of the adjusted values four quarters, and multiply the extrapolated trend by the appropriate seasonal index to obtain the forecasts.


Figure 29.17 Worksheet for Seasonal Indexes

20. Enter the labels Index, Trend, and Forecast in cells J1:L1.

21. Select cells I2:I5 and click the Copy button (or right-click and choose Copy from the shortcut menu). Select cell J2, right-click, and choose Paste Special from the shortcut menu. In the Paste Special dialog box, select Values for Paste and None for Operation. Leave the Skip Blanks and Transpose checkboxes clear and click OK.

22. Copy the values in cells J2:J5 and paste into cells J6, J10, J14, J18, and J22.

23. Select cell K2 and enter the formula =C2/J2. With cell K2 selected, click the fill handle and drag down to cell K21. The values in cells K2:K21 are the seasonally adjusted historical data.

24. With cells K2:K21 selected, right-click and choose Copy from the shortcut menu. With cells K2:K21 still selected, right-click and choose Paste Special from the shortcut menu. In the Paste Special dialog box, select Values for Paste and None for Operation. Leave the Skip Blanks and Transpose checkboxes clear and click OK.

25. With cells K2:K21 selected, click the fill handle in the lower-right corner of cell K21 and drag down to cell K25. The results are shown in column K in Figure 29.18. When Excel's AutoFill is used in this manner, the series of numbers in K2:K21 is extended using a linear trend. The same results could be obtained using the values 1 through 20 as explanatory variables for fitting simple linear regression and using the values 21 through 24 for predictions.

Figure 29.18 Worksheet for Forecasts

Figure 29.19 Extrapolation of Seasonally Adjusted Sales


26. To chart the actual sales, seasonally adjusted sales, and the linear extrapolation, select cells C1:C25, hold down the Control key, and select cells K1:K25. Click the Chart Wizard, prepare a line chart, and format using steps 8 through 12 in this section. The result is shown in Figure 29.19.

27. To combine the trend and seasonal components in the forecasts, select cell L22 and enter the formula =J22*K22. With cell L22 selected, double-click the fill handle. The results appear as shown in Figure 29.18.

28. To chart the actual sales and forecasts, select cells C1:C25, hold down the Control key, and select cells L1:L25. Click the Chart Wizard, prepare a line chart, and format using steps 8 through 12 in this section. The result is shown in Figure 29.20.

Figure 29.20 Actual Sales and Forecasts
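Steps 25 and 27 amount to fitting a straight line to the seasonally adjusted values, extending it four periods, and multiplying by the matching seasonal index. A hedged Python sketch follows; the adjusted values and indexes are hypothetical stand-ins for columns K and J, and the function names are assumptions.

```python
def linear_trend(values):
    """Least-squares intercept and slope for values observed at t = 1, 2, ..., n."""
    n = len(values)
    t_mean = (n + 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values, start=1))
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def forecast(adjusted, indexes, horizon=4):
    """Extrapolate the trend of the adjusted series and reapply the seasonal indexes."""
    intercept, slope = linear_trend(adjusted)
    n = len(adjusted)
    result = []
    for t in range(n + 1, n + horizon + 1):
        trend = intercept + slope * t                            # like column K, rows 22:25
        result.append(trend * indexes[(t - 1) % len(indexes)])   # like =J22*K22
    return result

# Hypothetical inputs: 20 adjusted quarters lying exactly on a line, and four indexes.
adjusted = [300 + 5 * t for t in range(1, 21)]
indexes = [0.9, 1.0, 0.95, 1.15]
forecasts = forecast(adjusted, indexes)
print([round(f, 2) for f in forecasts])
```

With real data the adjusted values will not lie exactly on a line, so the fitted trend is only an approximation of the long-run behavior.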

The three methods analyze seasonality using different models, so there are some differences in the results, as shown in Figure 29.21.

Figure 29.21 Forecast Results


The additive model using linear time trend and seasonal indicator variables and the multiplicative model using classical time series decomposition have very similar results. For these particular data, the autoregressive model produces forecasts that are consistently below the results of the other models; the autoregressive model using lag 1 and lag 4 would be more appropriate for seasonal data with a long-term meandering pattern.

EXERCISES

Exercise 29.1 (adapted from Mendenhall, p. 647) The following table shows quarterly earnings, in millions of dollars, for a multimedia communications firm for the years 1984 through 1989.

                              Year
Quarter     1984     1985     1986     1987     1988     1989
   1       302.2    426.5    504.2    660.9    743.6   1043.6
   2       407.3    451.5    592.4    706.0    774.5   1037.8
   3       483.3    543.9    647.9    751.3    915.7   1167.6
   4       463.2    590.5    726.4    758.6   1013.4   1345.3

1. Construct a time sequence plot of the quarterly earnings.

2. Develop a regression model using linear time trend and quarterly indicator variables. Make forecasts for the next four quarters.

3. Develop a regression model using quadratic time trend and quarterly indicator variables. Make forecasts for the next four quarters.

4. Develop an AR(4) model. Make forecasts for the next four quarters.

5. Use classical time series decomposition to obtain seasonal indexes.

Exercise 29.2 (adapted from Mendenhall, p. 646) Texas Chemical Products manufactures an agricultural chemical that is applied to farmlands after crops have been harvested. Because the chemical tends to deteriorate in storage, Texas Chemical cannot stockpile quantities in advance of the winter season demand for the product. The following table shows sales of the product, in thousands of pounds, over four consecutive years.


                      Year
Month           1      2      3      4
January       123    134    144    145
February      130    146    159    146
March         157    174    168    164
April         155    163    153    158
May           161    176    179    182
June          169    154    164    169
July          142    166    160    166
August        157    168    170    174
September     169    166    160    166
October       185    223    208    215
November      209    238    221    213
December      238    252    244    258

1. Construct a time sequence plot of the monthly sales.

2. Develop a regression model using linear time trend and monthly indicator variables. Make forecasts for the next 12 months.

3. Develop an AR(12) model. Make forecasts for the next 12 months.

4. Use classical time series decomposition to obtain seasonal indexes.

Chapter 30 Regression Models for Time Series Data

30.1 TIME SERIES REGRESSION CHECKLIST

Relevant explanatory variables (X) for time series data related to business activity (Y), e.g., sales over time, include several general types:

a Internal business activity, like advertising, promotion, research and development

b Competitor business activity, like competitor sales and competitor advertising

c Industry activity, like number of competitors and market size

d General economic activity, like personal disposable income

Plot Y versus time

1 Identify any systematic pattern to help determine an appropriate model

Plot Y versus each X

2 Verify that the relationship agrees with your prior judgment, e.g., positive vs negative relationship, linear vs nonlinear, strong vs weak

3 Identify outliers or unusual observations and decide whether to exclude

4 Determine whether the relationship is linear; if not, consider using a nonlinear form, e.g., quadratic (include X and X^2 in the model)

Examine the correlation matrix

5 Include a time period variable in the correlation matrix. For example, if there are n equally-spaced time periods, include a variable in your data set with values 1,2,...,n.

6 Identify potential multicollinearity problems, i.e., high correlation between a pair of X variables; if so, consider using only one X of the pair in the model


Calculate the regression model with diagnostics

7 Verify that the sign of each regression coefficient agrees with your prior judgment, i.e., positive vs negative relationship; otherwise, consider excluding that X and rerun the regression

8 Examine each plot of residuals vs X; if there is a non-random pattern (e.g., U-shape or upside-down-U-shape), use a nonlinear form for that X in a new model

9 In addition to the residual plots generated automatically by Excel's Regression tool, prepare and examine a plot of residuals vs time. If there is a snake-like pattern of residuals, consider adding lag Y as an explanatory variable. Optionally, compute the Durbin-Watson statistic to detect autocorrelation of residuals.

10 Identify key X variables by comparing standardized regression coefficients, usually computed by multiplying an X coefficient by the standard deviation of that X and dividing by the standard deviation of Y. This dimensionless standardized regression coefficient measures how much Y (in standard deviation units) is affected by a change in X (in standard deviation units).

11 If a goal is to find a model with small standard error of estimate (approx. standard deviation of residuals), use the t-stat screening method. Disregard the t-stat for the intercept. If there are X variables with a t-stat between -1 and +1, remove the single X variable whose t-stat is closest to zero, and rerun the regression. Remove only one X variable at a time.

12 Before using the final model, examine each plot of residuals vs X to verify that the random scatter is the same for all values of X. If there is more scatter for higher values of X, consider using a log transformation of X in the model (instead of using X itself). If the scatter is not uniform with respect to X, the standard error of estimate may not be a useful measure of uncertainty because it overstates the uncertainty for some values of X and understates the uncertainty for other values of X.

Use the model

13 If the purpose is to identify unusual observations, examine the residuals directly for large negative or large positive values, or examine the standardized residuals (each residual divided by the standard deviation of residuals) for values more extreme than +2 or -2 or for values more extreme than +3 or -3.

14 If the purpose is to make predictions, use the X values for a new observation to compute a predicted Y. Use the standard error of estimate to provide an interval estimate, e.g., an approximate 95% prediction interval that ranges from two standard errors below to two standard errors above the predicted Y. Note that a time series forecast usually extrapolates beyond the original range of data, so the standard error of estimate is a minimum indication of the uncertainty surrounding a forecast.
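Items 10, 13, and 14 of the checklist reduce to short calculations. The Python below is an illustrative sketch only: the data are hypothetical, the function names are assumptions, and the sample standard deviation (n - 1 divisor) is assumed wherever the checklist says "standard deviation."

```python
from statistics import stdev

def standardized_coefficient(b, x, y):
    """Item 10: the X coefficient times sd(X), divided by sd(Y)."""
    return b * stdev(x) / stdev(y)

def standardized_residuals(residuals):
    """Item 13: each residual divided by the standard deviation of residuals."""
    s = stdev(residuals)
    return [r / s for r in residuals]

def approx_prediction_interval(predicted_y, std_error):
    """Item 14: approximate 95% interval, two standard errors on either side."""
    return predicted_y - 2 * std_error, predicted_y + 2 * std_error

# Hypothetical data; 1.97 is the least-squares slope of y on x for these values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
beta = standardized_coefficient(1.97, x, y)
interval = approx_prediction_interval(12.0, 0.25)
flags = [abs(z) > 2 for z in standardized_residuals([0.07, -0.10, 0.23, -0.14, -0.06])]
print(round(beta, 3), interval, flags)
```

With a single X variable the standardized coefficient equals the correlation between X and Y, which is why it is close to 1 for these nearly linear hypothetical data.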

30.2 AUTOCORRELATION OF RESIDUALS

Figure 30.1 Undesirable Extreme Negative Autocorrelation (plot of Residual versus Time, periods 1 through 15)


Figure 30.2 Undesirable Extreme Positive Autocorrelation (plot of Residual versus Time, periods 1 through 15)

Figure 30.3 Desirable Zero Autocorrelation (plot of Residual versus Time, periods 1 through 15)
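The Durbin-Watson statistic mentioned in the checklist puts a number on the patterns in these figures: values near 4 indicate negative autocorrelation, values near 0 indicate positive autocorrelation, and values near 2 indicate little or none. The sketch below uses two hypothetical residual series of 15 periods, chosen only to mimic the extreme cases; they are not the data behind the figures.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den

alternating = [(-1) ** t for t in range(15)]   # sign flips every period
snake = [t - 7 for t in range(15)]             # slow drift from negative to positive

dw_negative = durbin_watson(alternating)       # near 4
dw_positive = durbin_watson(snake)             # near 0
print(round(dw_negative, 3), round(dw_positive, 3))
```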

Part 5 Constrained Optimization

Part 5 describes decision models involving constrained optimization. The topic is introduced using the classic product mix problem. Subsequent chapters examine constrained optimization problems in the areas of marketing, transportation logistics, and finance.

The spreadsheet analysis uses Excel's standard Solver add-in for linear, nonlinear, and integer problems.



Chapter 31 Product Mix Optimization

31.1 LINEAR PROGRAMMING CONCEPTS

Formulation

Decision variables (Excel Solver “Changing Cells”)

Objective function (“Target Cell”)

Constraints and right-hand-side values (“Constraints”)

Non-negativity constraints (“Constraints”)

Graphical Solution

Constraints

Feasible region

Corner points (extreme points)

Objective function value at each corner point

Total enumeration vs. simplex algorithm (search)

Optimal solution

Sensitivity Analysis

Post-optimality analysis and interpretation of computer print-outs

Shadow price (a marginal value)

(Excel Solver Sensitivity Report, Constraints section, “Shadow Price”)

The shadow price for a particular constraint is the amount of change in the value of the objective function corresponding to a unit change in the right-hand-side value of the constraint.


Range on a right-hand-side (RHS) value

(Excel Solver Sensitivity Report, Constraints section, “Allowable Increase/Decrease”)

Range over which the shadow price applies. The optimal values of the decision variables would change depending on the exact RHS value, but the current mix of decision variables remains optimal over the specified range of RHS values.

Range on an objective function coefficient

(Excel Solver Sensitivity Report, Changing Cells section, “Allowable Increase/Decrease”)

Range over which an objective function coefficient could change with the current optimal solution remaining optimal (same mix and values of decision variables). The value of the objective function would change depending on the exact value of the objective function coefficient.

Simplex algorithm terminology

Slack, surplus, and artificial variables

Basic variables (variables "in the solution," typically with non-zero values)

Non-basic variables (value equal to zero)

Complementary slackness


31.2 BASIC PRODUCT MIX PROBLEM

Figure 31.1 Display

     A          B              C         D        E      F           G
 1   Small Example 1: Product mix problem
 2   Your company manufactures TVs and stereos, using a common parts inventory
 3   of power supplies, speaker cones, etc. Parts are in limited supply and you must
 4   determine the most profitable mix of products to build.
 5
 6                             TV set    Stereo          RHS
 7   Number to Build->         250       100      Used   Available   Slack
 8   Part Name  Chassis        1         1        350    450         100
 9              Picture Tube   1         0        250    250         0
10              Speaker Cone   2         2        700    800         100
11              Power Supply   1         1        350    450         100
12              Electronics    2         1        600    600         0
13
14   Profit     Per Unit       $75       $50
15              By Product     $18,750   $5,000
16              Total          $23,750


Formulas (same worksheet as Figure 31.1)

     A          B              C         D         E                               F           G
 1   Small Example 1: Product mix problem
 2   Your company manufactures TVs and stereos, using a common parts inventory
 3   of power supplies, speaker cones, etc. Parts are in limited supply and you must
 4   determine the most profitable mix of products to build.
 5
 6                             TV set    Stereo                                    RHS
 7   Number to Build->         250       100       Used                            Available   Slack
 8   Part Name  Chassis        1         1         =SUMPRODUCT($C$7:$D$7,C8:D8)    450         =F8-E8
 9              Picture Tube   1         0         =SUMPRODUCT($C$7:$D$7,C9:D9)    250         =F9-E9
10              Speaker Cone   2         2         =SUMPRODUCT($C$7:$D$7,C10:D10)  800         =F10-E10
11              Power Supply   1         1         =SUMPRODUCT($C$7:$D$7,C11:D11)  450         =F11-E11
12              Electronics    2         1         =SUMPRODUCT($C$7:$D$7,C12:D12)  600         =F12-E12
13
14   Profit     Per Unit       $75       $50
15              By Product     =C14*C7   =D14*D7
16              Total          =SUMPRODUCT(C7:D7,C14:D14)


Figure 31.3 Graphical Solution (chart labeled "Five Constraints": Number of TVs, 0 to 600, on the vertical axis and Number of Stereos, 0 to 700, on the horizontal axis, with constraint lines for Picture Tube, Electronics, Chassis & Power Supply, and Speaker Cone bounding the Feasible Region)


Figure 31.4 Solver Parameters Main Dialog Box

Figure 31.5 Solver Add Constraint Dialog Box


Figure 31.6 Solver Options Dialog Box

Figure 31.7 Solver Solution


Figure 31.8 Solver Answer Report

Target Cell (Max)
Cell    Name           Original Value   Final Value
$C$16   Total Profit   $23,750          $25,000

Adjustable Cells
Cell    Name                       Original Value   Final Value
$C$7    Number to Build-> TV set   250              200
$D$7    Number to Build-> Stereo   100              200

Constraints
Cell    Name                Cell Value   Formula          Status        Slack
$E$8    Chassis Used        400          $E$8<=$F$8       Not Binding   50
$E$9    Picture Tube Used   200          $E$9<=$F$9       Not Binding   50
$E$10   Speaker Cone Used   800          $E$10<=$F$10     Binding       0
$E$11   Power Supply Used   400          $E$11<=$F$11     Not Binding   50
$E$12   Electronics Used    600          $E$12<=$F$12     Binding       0

Figure 31.9 Solver Sensitivity Report

Adjustable Cells
                                   Final   Reduced   Objective     Allowable   Allowable
Cell    Name                       Value   Cost      Coefficient   Increase    Decrease
$C$7    Number to Build-> TV set   200     $0.00     $75.00        $25.00      $25.00
$D$7    Number to Build-> Stereo   200     $0.00     $50.00        $25.00      $12.50

Constraints
                            Final   Shadow   Constraint   Allowable   Allowable
Cell    Name                Value   Price    R.H. Side    Increase    Decrease
$E$8    Chassis Used        400     $0.00    450          1E+30       50
$E$9    Picture Tube Used   200     $0.00    250          1E+30       50
$E$10   Speaker Cone Used   800     $12.50   800          100         100
$E$11   Power Supply Used   400     $0.00    450          1E+30       50
$E$12   Electronics Used    600     $25.00   600          50          200


Figure 31.10 Solver Limits Report

Target
Cell    Name           Value
$C$16   Total Profit   $25,000

Adjustable                                 Lower   Target    Upper   Target
Cell    Name                       Value   Limit   Result    Limit   Result
$C$7    Number to Build-> TV set   200     0       $10,000   200     $25,000
$D$7    Number to Build-> Stereo   200     0       $15,000   200     $25,000
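For a two-variable problem like this one, the "total enumeration" idea from Section 31.1 is easy to try directly: intersect every pair of constraint lines, keep the feasible corner points, and pick the one with the highest profit. The Python sketch below does that and then estimates each shadow price by adding one unit to a right-hand side and re-solving. This is an illustration under stated assumptions, not how Solver works internally, and the function name is hypothetical.

```python
from itertools import combinations

def solve_by_enumeration(constraints, profit):
    """Evaluate every corner point of a two-variable LP; return (best value, point).

    Each constraint is (a, b, rhs), meaning a*x + b*y <= rhs.
    """
    rows = constraints + [(-1, 0, 0), (0, -1, 0)]   # non-negativity as -x <= 0, -y <= 0
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                                 # parallel lines never intersect
        x = (r1 * b2 - r2 * b1) / det                # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + 1e-9 for a, b, r in rows):
            value = profit[0] * x + profit[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

names = ["Chassis", "Picture Tube", "Speaker Cone", "Power Supply", "Electronics"]
constraints = [(1, 1, 450), (1, 0, 250), (2, 2, 800), (1, 1, 450), (2, 1, 600)]  # (TV, stereo, available)
profit = (75, 50)

base_value, base_point = solve_by_enumeration(constraints, profit)

shadow = {}
for i, name in enumerate(names):
    bumped = list(constraints)
    a, b, rhs = bumped[i]
    bumped[i] = (a, b, rhs + 1)                      # one more unit of this part
    shadow[name] = solve_by_enumeration(bumped, profit)[0] - base_value

print(base_value, base_point, shadow)
```

The enumerated optimum and the one-unit profit changes can be compared with the Final Value and Shadow Price columns of Figures 31.8 and 31.9.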

31.3 OUTDOORS PROBLEM

Outdoors, Inc., has lawn furniture as one of its product lines. They currently have three items in that line: a lawn chair, a standard bench, and a table. These products are produced in a two-step manufacturing process involving the tube bending department and the welding department. The hours required by each item in each department are as follows:

                     Product
Department   Chair   Bench   Table   Present Capacity
Bending      1.2     1.7     1.2     1,000 hours
Welding      0.8     0.0     2.3     1,200 hours

The profit contribution that Outdoors receives from manufacture and sale of one unit of each product is $3 for a chair, $3 for a bench, and $5 for a table.

The company is trying to plan its production mix for the current selling season. They feel that they can sell any number they produce, but unfortunately production is further limited by available material because of a prolonged strike. The company currently has on hand 2,000 pounds of tubing. The three products require the following amounts of this tubing: 2 pounds per chair, 3 pounds per bench, and 4.5 pounds per table.

In order to determine the optimal product mix, the production manager has formulated the linear programming problem as shown below.


Product        Chair   Bench   Table
Contribution   $3      $3      $5

Constraint                             Relation   Limit
Bending        1.2     1.7     1.2     <=         1,000
Welding        0.8     0.0     2.3     <=         1,200
Tubing         2.0     3.0     4.5     <=         2,000

A. The inventory manager suggests that the company produce 200 units of each product. Is the plan to produce 200 units of each product a feasible plan, i.e., does it satisfy all constraints? If not, which constraints are not satisfied?

B. If the company produces 200 chairs, 200 benches, and 200 tables, how much tubing, if any, will be left over?

Each of the following questions refers to the solution of the original linear programming problem.

C. A local manufacturing firm has excess capacity in its welding department and has offered to sell 100 hours of welding time to Outdoors for $3 per hour. This arrangement would cost $300 and would increase welding capacity from 1,200 hours to 1,300 hours. Should Outdoors purchase the additional welding capacity? Why or why not?

D. The marketing manager thinks that the original estimate of $3 profit contribution per chair should be changed to $2.50 per chair. Should the production manager solve the linear programming problem again using the $2.50 value, or should Outdoors go ahead with the plan to produce 700 chairs, zero benches, and 133 tables? Why or why not?

E. A local metal products distributor has offered to sell Outdoors some additional metal tubing for 60 cents per pound. Should Outdoors buy additional tubing at this price? If so, how much would their contribution increase if they bought 500 pounds and used it in an optimal fashion?

F. The R&D department has been redesigning the bench to make it more profitable. The new design will require 1.1 hours of tube bending time, 2 hours of welding time, and 2.0 pounds of metal tubing. If they can sell one unit of this bench with a unit contribution of $3, what effect will it have on overall contribution?

G. Marketing has suggested a new patio awning that would require 1.8 hours of tube bending time, 0.5 hours of welding time, and 1.3 pounds of metal tubing. What contribution must this new product have to make it attractive to produce this season?

H. Outdoors, Inc., has a chance to sell some of its capacity in tube bending at a price of $1.50 per hour. If it sells 200 hours at that price, how will this affect contribution?

I. If Outdoors, Inc., feels that it must produce benches to round out its production line, what effect will production of benches have on overall contribution?

Adapted from Vatter et al., Quantitative Methods in Management, Irwin, 1978.

Spreadsheet Model

Figure 31.11 Model

    A          B                   C        D       E       F      G           H
1   Outdoors, Inc.
2                                  Chair    Bench   Table
3              Number to Build->   100      100     100     Used   Available   Slack
4   Resource   Tube Bending        1.2      1.7     1.2     410    1000        590
5              Welding             0.8      0       2.3     310    1200        890
6              Tubing              2        3       4.5     950    2000        1050
7   Profits    Per Unit            $3       $3      $5
8              By Product          $300     $300    $500
9              Total               $1,100


Formulas (same worksheet as Figure 31.11)

    A          B                   C        D        E        F                             G           H
2                                  Chair    Bench    Table
3              Number to Build->   100      100      100      Used                          Available   Slack
4   Resource   Tube Bending        1.2      1.7      1.2      =SUMPRODUCT(C$3:E$3,C4:E4)    1000        =G4-F4
5              Welding             0.8      0        2.3      =SUMPRODUCT(C$3:E$3,C5:E5)    1200        =G5-F5
6              Tubing              2        3        4.5      =SUMPRODUCT(C$3:E$3,C6:E6)    2000        =G6-F6
7   Profits    Per Unit            3        3        5
8              By Product          =C7*C3   =D7*D3   =E7*E3
9              Total               =SUMPRODUCT(C3:E3,C7:E7)
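The same worksheet logic (SUMPRODUCT for resources used, Available minus Used for slack) can be written as a small function for checking any proposed production plan. This is a sketch with hypothetical names; the trial plan of 100 units of each product is the one displayed in Figure 31.11.

```python
# Per-unit requirements for (chair, bench, table) and the amount available.
REQUIREMENTS = {
    "Tube Bending": ((1.2, 1.7, 1.2), 1000),
    "Welding": ((0.8, 0.0, 2.3), 1200),
    "Tubing": ((2.0, 3.0, 4.5), 2000),
}
PROFIT_PER_UNIT = (3, 3, 5)

def evaluate_plan(plan):
    """Return (used, slack, profit, feasible) for a plan of (chairs, benches, tables)."""
    used = {name: sum(q * r for q, r in zip(plan, req))       # like =SUMPRODUCT(C$3:E$3,C4:E4)
            for name, (req, _) in REQUIREMENTS.items()}
    slack = {name: REQUIREMENTS[name][1] - used[name]         # like =G4-F4
             for name in used}
    profit = sum(q * p for q, p in zip(plan, PROFIT_PER_UNIT))
    feasible = all(q >= 0 for q in plan) and all(s >= -1e-9 for s in slack.values())
    return used, slack, profit, feasible

used, slack, profit, feasible = evaluate_plan((100, 100, 100))
print(used, slack, profit, feasible)
```

A plan is feasible when every slack is non-negative; a negative slack identifies the violated constraint.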


Figure 31.13 Solution

    A          B                   C           D       E         F        G           H
2                                  Chair       Bench   Table
3              Number to Build->   700         0       133.33    Used     Available   Slack
4   Resource   Tube Bending        1.2         1.7     1.2       1000     1000        0
5              Welding             0.8         0.0     2.3       866.67   1200        333.33
6              Tubing              2.0         3.0     4.5       2000     2000        0
7   Profits    Per Unit            $3          $3      $5
8              By Product          $2,100.00   $0.00   $666.67
9              Total               $2,766.67

Solver Reports

Figure 31.14 Answer Report

Target Cell (Max)
Cell   Name          Original Value   Final Value
$C$9   Total Chair   $1,100           $2,767

Adjustable Cells
Cell   Name                      Original Value   Final Value
$C$3   Number to Build-> Chair   100              700
$D$3   Number to Build-> Bench   100              0
$E$3   Number to Build-> Table   100              133.33

Constraints
Cell   Name                Cell Value   Formula       Status        Slack
$F$4   Tube Bending Used   1000         $F$4<=$G$4    Binding       0
$F$5   Welding Used        866.67       $F$5<=$G$5    Not Binding   333.33
$F$6   Tubing Used         2000         $F$6<=$G$6    Binding       0


Figure 31.15 Sensitivity Report

Adjustable Cells
                                 Final   Reduced   Objective     Allowable   Allowable
Cell   Name                      Value   Cost      Coefficient   Increase    Decrease
$C$3   Number to Build-> Chair   700     $0.00     $3.00         $2.00       $0.778
$D$3   Number to Build-> Bench   0       -$1.383   $3.00         $1.383      1E+30
$E$3   Number to Build-> Table   133     $0.00     $5.00         $1.75       $2.00

Constraints
                           Final    Shadow   Constraint   Allowable   Allowable
Cell   Name                Value    Price    R.H. Side    Increase    Decrease
$F$4   Tube Bending Used   1000     $1.167   1000         200         466.67
$F$5   Welding Used        866.67   $0.00    1200         1E+30       333.33
$F$6   Tubing Used         2000     $0.80    2000         555.56      333.33
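One way to see where the shadow prices and the bench's reduced cost come from, assuming standard linear programming duality, is to note that the two products in the solution (chairs and tables) must each "pay for" exactly the resources they use at the shadow prices of the two binding constraints (tube bending and tubing). The sketch below solves those two equations; the helper name is hypothetical.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

# Chair: 1.2*y_bend + 2.0*y_tube = 3     Table: 1.2*y_bend + 4.5*y_tube = 5
y_bend, y_tube = solve_2x2(1.2, 2.0, 3.0, 1.2, 4.5, 5.0)

# Bench: contribution minus the value of the resources one bench would consume.
bench_reduced_cost = 3.0 - (1.7 * y_bend + 3.0 * y_tube)

# Valuing the binding resources at their shadow prices reproduces total profit.
resource_value = 1000 * y_bend + 2000 * y_tube

print(round(y_bend, 3), round(y_tube, 3), round(bench_reduced_cost, 3), round(resource_value, 2))
```

Welding has slack at the optimum, so its shadow price is zero and it does not enter these equations.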

Chapter 32 Modeling Marketing Decisions

32.1 ALLOCATING ADVERTISING EXPENDITURES

Figure 32.1 Quick Tour

     A               B          C          D          E          F
 1   Quick Tour of Microsoft Excel Solver
 2   Month           Q1         Q2         Q3         Q4         Total
 3   Seasonality     0.9        1.1        0.8        1.2
 4
 5   Units Sold      3,592      4,390      3,192      4,789      15,962
 6   Sales Revenue   $143,662   $175,587   $127,700   $191,549   $638,498
 7   Cost of Sales   89,789     109,742    79,812     119,718    399,061
 8   Gross Margin    53,873     65,845     47,887     71,831     239,437
 9
10   Salesforce      8,000      8,000      9,000      9,000      34,000
11   Advertising     10,000     10,000     10,000     10,000     40,000
12   Corp Overhead   21,549     26,338     19,155     28,732     95,775
13   Total Costs     39,549     44,338     38,155     47,732     169,775
14
15   Prod. Profit    $14,324    $21,507    $9,732     $24,099    $69,662
16   Profit Margin   10%        12%        8%         13%        11%
17
18   Product Price   $40.00
19   Product Cost    $25.00

The following examples show you how to work with the model above to solve for one value or several values to maximize or minimize another value, enter and change constraints, and save a problem model.

Row   Contains                 Explanation
3     Fixed values             Seasonality factor: sales are higher in quarters 2 and 4, and lower in quarters 1 and 3.
5     =35*B3*(B11+3000)^0.5    Forecast for units sold each quarter: row 3 contains the seasonality factor; row 11 contains the cost of advertising.
6     =B5*$B$18                Sales revenue: forecast for units sold (row 5) times price (cell B18).
7     =B5*$B$19                Cost of sales: forecast for units sold (row 5) times product cost (cell B19).
8     =B6-B7                   Gross margin: sales revenues (row 6) minus cost of sales (row 7).
10    Fixed values             Sales personnel expenses.
11    Fixed values             Advertising budget (about 6.3% of sales).
12    =0.15*B6                 Corporate overhead expenses: sales revenues (row 6) times 15%.
13    =SUM(B10:B12)            Total costs: sales personnel expenses (row 10) plus advertising (row 11) plus overhead (row 12).
15    =B8-B13                  Product profit: gross margin (row 8) minus total costs (row 13).
16    =B15/B6                  Profit margin: profit (row 15) divided by sales revenue (row 6).
18    Fixed values             Product price.
19    Fixed values             Product cost.

This is a typical marketing model that shows sales rising from a base figure (perhaps due to the sales personnel) along with increases in advertising, but with diminishing returns. For example, the first $5,000 of advertising in Q1 yields about 1,092 incremental units sold, but the next $5,000 yields only about 775 units more.
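The row formulas listed above translate directly into code, which makes the diminishing returns easy to check. The following Python is a sketch of the worksheet logic for a single quarter; the function names are illustrative and not part of the Solver sample.

```python
from math import sqrt

PRICE, COST = 40.00, 25.00   # cells B18 and B19

def units_sold(seasonality, advertising):
    """Row 5: =35*B3*(B11+3000)^0.5"""
    return 35 * seasonality * sqrt(advertising + 3000)

def product_profit(seasonality, salesforce, advertising):
    """Row 15: gross margin (row 8) minus total costs (row 13)."""
    units = units_sold(seasonality, advertising)
    revenue = units * PRICE                                   # row 6
    gross_margin = revenue - units * COST                     # row 8
    total_costs = salesforce + advertising + 0.15 * revenue   # row 13
    return gross_margin - total_costs

q1_units = units_sold(0.9, 10000)
q1_profit = product_profit(0.9, 8000, 10000)
first_5000 = units_sold(0.9, 5000) - units_sold(0.9, 0)       # incremental units
next_5000 = units_sold(0.9, 10000) - units_sold(0.9, 5000)
print(round(q1_units), round(q1_profit), round(first_5000), round(next_5000))
```

The second $5,000 of advertising adds fewer units than the first because units sold grow with the square root of advertising.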

You can use Solver to find out whether the advertising budget is too low, and whether advertising should be allocated differently over time to take advantage of the changing seasonality factor.

Solving for a Value to Maximize Another Value

One way you can use Solver is to determine the maximum value of a cell by changing another cell. The two cells must be related through the formulas on the worksheet. If they are not, changing the value in one cell will not change the value in the other cell.

For example, in the sample worksheet, you want to know how much you need to spend on advertising to generate the maximum profit for the first quarter. You are interested in maximizing profit by changing advertising expenditures.

On the Tools menu, click Solver. In the Set target cell box, type b15 or select cell B15 (first-quarter profits) on the worksheet. Select the Max option. In the By changing cells box, type b11 or select cell B11 (first-quarter advertising) on the worksheet. Click Solve.

You will see messages in the status bar as the problem is set up and Solver starts working. After a moment, you'll see a message that Solver has found a solution. Solver finds that Q1 advertising of $17,093 yields the maximum profit $15,093.

After you examine the results, select Restore original values and click OK to discard the results and return cell B11 to its former value.
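Because this model is simple, the Q1 result can be cross-checked without Solver. Profit in a quarter has the form k*sqrt(A + 3000) - salesforce - A, where A is advertising and k collects the constants, so setting the derivative to zero gives A = (k/2)^2 - 3000. The sketch below uses that closed form, which is a substitute for Solver's numerical search rather than a description of it; the function names are assumptions.

```python
from math import sqrt

def quarter_profit(advertising, seasonality=0.9, salesforce=8000,
                   price=40.0, cost=25.0, overhead_rate=0.15):
    """Row 15 for one quarter, as a function of advertising (row 11)."""
    units = 35 * seasonality * sqrt(advertising + 3000)
    revenue = units * price
    return revenue - units * cost - salesforce - advertising - overhead_rate * revenue

def best_advertising(seasonality=0.9, price=40.0, cost=25.0, overhead_rate=0.15):
    """Advertising level at which the derivative of quarter_profit is zero."""
    k = 35 * seasonality * (price - cost - overhead_rate * price)  # profit per sqrt(A + 3000)
    return (k / 2) ** 2 - 3000

a_star = best_advertising()
best_profit = quarter_profit(a_star)
print(round(a_star), round(best_profit))
```

The rounded values can be compared with the $17,093 and $15,093 reported by Solver above.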

Resetting the Solver Options

If you want to return the options in the Solver Parameters dialog box to their original settings so that you can start a new problem, you can click Reset All.


Solving for a Value by Changing Several Values

You can also use Solver to solve for several values at once to maximize or minimize another value. For example, you can solve for the advertising budget for each quarter that will result in the best profits for the entire year. Because the seasonality factor in row 3 enters into the calculation of unit sales in row 5 as a multiplier, it seems logical that you should spend more of your advertising budget in Q4 when the sales response is highest, and less in Q3 when the sales response is lowest. Use Solver to determine the best quarterly allocation.

On the Tools menu, click Solver. In the Set target cell box, type f15 or select cell F15 (total profits for the year) on the worksheet. Make sure the Max option is selected. In the By changing cells box, type b11:e11 or select cells B11:E11 (the advertising budget for each of the four quarters) on the worksheet. Click Solve.

After you examine the results, click Restore original values and click OK to discard the results and return all cells to their former values.

You've just asked Solver to solve a moderately complex nonlinear optimization problem; that is, to find values for the four unknowns in cells B11 through E11 that will maximize profits. (This is a nonlinear problem because of the exponentiation that occurs in the formulas in row 5). The results of this unconstrained optimization show that you can increase profits for the year to $79,706 if you spend $89,706 in advertising for the full year.
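Without a budget constraint the four quarters do not interact, so each quarter's best advertising level can be found separately with the same square-root calculus used for a single quarter. The sketch below is a cross-check under that assumption, not Solver's algorithm; the names are hypothetical.

```python
from math import sqrt

SEASONALITY = (0.9, 1.1, 0.8, 1.2)            # row 3
SALESFORCE = (8000, 8000, 9000, 9000)         # row 10
MARGIN_PER_UNIT = 40.0 - 25.0 - 0.15 * 40.0   # price - cost - overhead share

def year_profit(advertising):
    """Cell F15: total product profit for four quarterly advertising amounts."""
    total = 0.0
    for s, staff, a in zip(SEASONALITY, SALESFORCE, advertising):
        units = 35 * s * sqrt(a + 3000)
        total += MARGIN_PER_UNIT * units - staff - a
    return total

# Each quarter's optimum: derivative of k*sqrt(A + 3000) - A equals zero.
unconstrained = [(35 * s * MARGIN_PER_UNIT / 2) ** 2 - 3000 for s in SEASONALITY]

original_profit = year_profit([10000] * 4)
total_spend = sum(unconstrained)
best_profit = year_profit(unconstrained)
print(round(original_profit), round(total_spend), round(best_profit))
```

The spending is largest in Q4 and smallest in Q3, in line with the seasonality factors.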

However, most realistic modeling problems have limiting factors that you will want to apply to certain values. These constraints may be applied to the target cell, the changing cells, or any other value that is related to the formulas in these cells.

Adding a Constraint

So far, the budget recovers the advertising cost and generates additional profit, but you're reaching a point of diminishing returns. Because you can never be sure that your model of sales response to advertising will be valid next year (especially at greatly increased spending levels), it doesn't seem prudent to allow unrestricted spending on advertising.

Suppose you want to maintain your original advertising budget of $40,000. Add the constraint to the problem that limits the sum of advertising during the four quarters to $40,000.

On the Tools menu, click Solver, and then click Add. The Add Constraint dialog box appears. In the Cell reference box, type f11 or select cell F11 (advertising total) on the worksheet. Cell F11 must be less than or equal to $40,000. The relationship in the Constraint box is <= (less than or equal to) by default, so you don't have to change it. In the box next to the relationship, type 40000. Click OK, and then click Solve.

After you examine the results, click Restore original values and then click OK to discard the results and return the cells to their former values.

The solution found by Solver allocates amounts ranging from $5,117 in Q3 to $15,263 in Q4. Total Profit has increased from $69,662 in the original budget to $71,447, without any increase in the advertising budget.

Changing a Constraint

When you use Microsoft Excel Solver, you can experiment with slightly different parameters to decide the best solution to a problem. For example, you can change a constraint to see whether the results are better or worse than before. In the sample worksheet, try changing the constraint on advertising dollars to $50,000 to see what that does to total profits.

On the Tools menu, click Solver. The constraint, $F$11<=40000, should already be selected in the Subject to the constraints box. Click Change. In the Constraint box, change 40000 to 50000. Click OK, and then click Solve. Click Keep solver solution and then click OK to keep the results that are displayed on the worksheet.

Solver finds an optimal solution that yields a total profit of $74,817. That's an improvement of $3,370 over the last figure of $71,447. In most firms, it's not too difficult to justify an incremental investment of $10,000 that yields an additional $3,370 in profit, or a 33.7% return on investment. This solution also results in profits of $4,889 less than the unconstrained result, but you spend $39,706 less to get there.
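The arithmetic behind this comparison can be checked with a short Python sketch. It uses only the dollar figures quoted above; the variable names are illustrative, and the unconstrained profit and spending are backed out from the stated differences rather than read from the worksheet.

```python
# Figures quoted in the text.
profit_at_40k = 71_447   # total profit with the $40,000 advertising limit
profit_at_50k = 74_817   # total profit with the $50,000 advertising limit
extra_budget = 50_000 - 40_000

# Incremental profit and return on the extra $10,000.
extra_profit = profit_at_50k - profit_at_40k
incremental_return = extra_profit / extra_budget

# Unconstrained result implied by the stated differences (derived, not quoted).
unconstrained_profit = profit_at_50k + 4_889
unconstrained_spend = 50_000 + 39_706

print(f"Extra profit: ${extra_profit:,}")
print(f"Return on incremental spending: {incremental_return:.1%}")
print(f"Implied unconstrained profit: ${unconstrained_profit:,}")
print(f"Implied unconstrained advertising: ${unconstrained_spend:,}")
```

Seen this way, the last $39,706 of unconstrained advertising buys only $4,889 of profit, which is the diminishing-returns argument for keeping a budget constraint.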

Saving a Problem Model

When you click Save on the File menu, the last selections you made in the Solver Parameters dialog box are attached to the worksheet and retained when you save the workbook. However, you can define more than one problem for a worksheet by saving them individually using Save Model in the Solver Options dialog box. Each problem model consists of cells and constraints that you entered in the Solver Parameters dialog box.

When you click Save Model, the Save Model dialog box appears with a default selection, based on the active cell, as the area for saving the model. The suggested range includes a cell for each constraint plus three additional cells. Make sure that this cell range is an empty range on the worksheet.

On the Tools menu, click Solver, and then click Options. Click Save Model. In the Select model area box, type h15:h18 or select cells H15:H18 on the worksheet. Click OK.

Note You can also enter a reference to a single cell in the Select model area box. Solver will use this reference as the upper-left corner of the range into which it will copy the problem specifications.

To load these problem specifications later, click Load Model on the Solver Options dialog box, type h15:h18 in the Model area box or select cells H15:H18 on the sample worksheet, and then click OK. Solver displays a message asking if you want to reset the current Solver option settings with the settings for the model you are loading. Click OK to proceed.


Figure 32.2 Quick Tour Influence Chart (SolvSamp.xls, Quick Tour worksheet)

[Influence chart. Variables shown, with the worksheet row number for each: Profit Margin (16), Prod. Profit (15), Gross Margin (8), Cost of Sales (7), Sales Revenue (6), Corporate Overhead (12), Total Costs (13), Units Sold (5), Product Cost (19), Seasonality (3), Advertising (11), Product Price (18), Overhead Rate (15%), Salesforce (10).]

Chapter 33 Nonlinear Product Mix Optimization

33.1 DIMINISHING PROFIT MARGIN

Figure 33.1 Product Mix Problem

Example 1: Product mix problem with diminishing profit margin.
Your company manufactures TVs, stereos and speakers, using a common parts inventory of power supplies, speaker cones, etc. Parts are in limited supply and you must determine the most profitable mix of products to build. But your profit per unit built decreases with volume because extra price incentives are needed to load the distribution channel.

                                        TV set   Stereo   Speaker
Number to Build->                          100      100       100

Part Name       Inventory   No. Used
Chassis               450        200         1        1         0
Picture Tube          250        100         1        0         0
Speaker Cone          800        500         2        2         1
Power Supply          450        200         1        1         0
Electronics           600        400         2        1         1

Diminishing Returns Exponent: 0.9

Profits:
By Product                              $4,732   $3,155    $2,208
Total                                  $10,095

This model provides data for several products using common parts, each with a different profit margin per unit. Parts are limited, so your problem is to determine the number of each product to build from the inventory on hand in order to maximize profits.


Problem Specifications

Target Cell      D18                Goal is to maximize profit.
Changing cells   D9:F9              Units of each product to build.
Constraints      C11:C15<=B11:B15   Number of parts used must be less than or equal to the number of parts in inventory.
                 D9:F9>=0           Number to build value must be greater than or equal to 0.

The formulas for profit per product in cells D17:F17 include the factor ^H15 to show that profit per unit diminishes with volume. H15 contains 0.9, which makes the problem nonlinear. If you change H15 to 1.0 to indicate that profit per unit remains constant with volume, and then click Solve again, the optimal solution will change. This change also makes the problem linear.
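To see how the exponent shapes profit, the following Python sketch recomputes the worksheet values for the starting mix of 100 units each. The unit margins of 75, 50, and 35 are not stated in the text; they are assumptions back-calculated from the displayed profits, and the function names are illustrative only.

```python
# Assumed per-unit margins, back-calculated from the profits shown in Figure 33.1.
UNIT_MARGIN = {"TV set": 75, "Stereo": 50, "Speaker": 35}
EXPONENT = 0.9  # value of cell H15

# Parts used per unit (TV set, Stereo, Speaker) and parts on hand, from Figure 33.1.
PARTS_PER_UNIT = {
    "Chassis": (1, 1, 0),
    "Picture Tube": (1, 0, 0),
    "Speaker Cone": (2, 2, 1),
    "Power Supply": (1, 1, 0),
    "Electronics": (2, 1, 1),
}
INVENTORY = {"Chassis": 450, "Picture Tube": 250, "Speaker Cone": 800,
             "Power Supply": 450, "Electronics": 600}


def product_profits(units, exponent=EXPONENT):
    """Profit by product: margin * units ** exponent (diminishing when exponent < 1)."""
    return {name: UNIT_MARGIN[name] * max(units[name], 0) ** exponent
            for name in UNIT_MARGIN}


def parts_used(units):
    """Number of each part consumed by a given product mix."""
    counts = [units[name] for name in UNIT_MARGIN]
    return {part: sum(per * n for per, n in zip(per_unit, counts))
            for part, per_unit in PARTS_PER_UNIT.items()}


units = {"TV set": 100, "Stereo": 100, "Speaker": 100}
profits = product_profits(units)
total_profit = sum(profits.values())
used = parts_used(units)
feasible = all(used[p] <= INVENTORY[p] for p in INVENTORY)
linear_total = sum(product_profits(units, exponent=1.0).values())

print({k: round(v) for k, v in profits.items()}, round(total_profit))
print(used, feasible)
```

With the exponent set to 1.0 the same mix would earn a constant margin per unit, which is why changing H15 to 1.0 turns the problem into a linear one.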

Chapter 34 Integer-Valued Optimization Models

34.1 TRANSPORTATION PROBLEM

Figure 34.1 Transportation Problem

Example 2: Transportation Problem.
Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand centers, while not exceeding the supply available from each plant and meeting the demand from each metropolitan area.

Number to ship from plant x to warehouse y (at intersection):
Plants:         Total   San Fran   Denver   Chicago   Dallas   New York
S. Carolina         5          1        1         1        1          1
Tennessee           5          1        1         1        1          1
Arizona             5          1        1         1        1          1
                             ---      ---       ---      ---        ---
Totals:                        3        3         3        3          3

Demands by Whse -->          180       80       200      160        220

Shipping costs from plant x to warehouse y (at intersection):
Plants:        Supply   San Fran   Denver   Chicago   Dallas   New York
S. Carolina       310         10        8         6        5          4
Tennessee         260          6        5         4        3          6
Arizona           280          3        4         5        5          9

Shipping:         $83        $19      $17       $15      $13        $19

The problem presented in this model involves the shipment of goods from three plants to five regional warehouses. Goods can be shipped from any plant to any warehouse, but it obviously costs more to ship goods over long distances than over short distances. The problem is to determine the amounts to ship from each plant to each warehouse at minimum shipping cost in order to meet the regional demand, while not exceeding the plant supplies.



Target cell      B20                Goal is to minimize total shipping cost.
Changing cells   C8:G10             Amount to ship from each plant to each warehouse.
Constraints      B8:B10<=B16:B18    Total shipped must be less than or equal to supply at plant.
                 C12:G12>=C14:G14   Totals shipped to warehouses must be greater than or equal to demand at warehouses.
                 C8:G10>=0          Number to ship must be greater than or equal to 0.

You can solve this problem faster by selecting the Assume linear model check box in the Solver Options dialog box before clicking Solve. A problem of this type has an optimum solution at which amounts to ship are integers, if all of the supply and demand constraints are integers.

34.2 MODIFIED TRANSPORTATION PROBLEM

Figure 34.2 Display

Modified Example 2: Transportation Problem.

Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand centers, while not exceeding the supply available from each plant and meeting the demand from each metropolitan area.

Number to ship from plant to warehouse
                                          Warehouse                          Shipped     Plant
Plant                  San Fran   Denver   Chicago   Dallas   New York   from plant    supply
S. Carolina                   1        1         1        1          1            5       310
Tennessee                     1        1         1        1          1            5       260
Arizona                       1        1         1        1          1            5       280
Shipped to warehouse          3        3         3        3          3
Warehouse demand            180       80       200      160        220

Shipping cost from plant to warehouse
                                          Warehouse
Plant                  San Fran   Denver   Chicago   Dallas   New York
S. Carolina                 $10       $8        $6       $5         $4
Tennessee                    $6       $5        $4       $3         $6
Arizona                      $3       $4        $5       $5         $9

Total shipping cost         $83



Modified Example 2: Transportation Problem.

Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand centers, while not exceeding the supply available from each plant and meeting the demand from each metropolitan area.

Number to ship from plant to warehouse
                                          Warehouse                          Shipped          Plant
Plant                  San Fran   Denver   Chicago   Dallas   New York   from plant         supply
S. Carolina                   1        1         1        1          1   =SUM(C10:G10)        310
Tennessee                     1        1         1        1          1   =SUM(C11:G11)        260
Arizona                       1        1         1        1          1   =SUM(C12:G12)        280
Shipped to warehouse   =SUM(C10:C12)   =SUM(D10:D12)   =SUM(E10:E12)   =SUM(F10:F12)   =SUM(G10:G12)
Warehouse demand            180       80       200      160        220

Shipping cost from plant to warehouse
                                          Warehouse
Plant                  San Fran   Denver   Chicago   Dallas   New York
S. Carolina                 $10       $8        $6       $5         $4
Tennessee                    $6       $5        $4       $3         $6
Arizona                      $3       $4        $5       $5         $9

Total shipping cost    =SUMPRODUCT(C10:G12,C19:G21)
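The worksheet formulas above can be mirrored in a few lines of Python as a rough check. This is only a sketch of the SUM and SUMPRODUCT calculations for the starting values (one unit on every route); it does not perform the optimization that Solver does, and the variable names are illustrative.

```python
# Starting shipments (rows: S. Carolina, Tennessee, Arizona;
# columns: San Fran, Denver, Chicago, Dallas, New York).
shipments = [[1, 1, 1, 1, 1],
             [1, 1, 1, 1, 1],
             [1, 1, 1, 1, 1]]
costs = [[10, 8, 6, 5, 4],
         [6, 5, 4, 3, 6],
         [3, 4, 5, 5, 9]]
supply = [310, 260, 280]
demand = [180, 80, 200, 160, 220]

# Row sums (=SUM(C10:G10), ...) and column sums (=SUM(C10:C12), ...).
shipped_from_plant = [sum(row) for row in shipments]
shipped_to_warehouse = [sum(col) for col in zip(*shipments)]

# =SUMPRODUCT(C10:G12,C19:G21): multiply matching cells and add them up.
total_cost = sum(s * c
                 for ship_row, cost_row in zip(shipments, costs)
                 for s, c in zip(ship_row, cost_row))

# The two groups of constraints from the problem specification.
supply_ok = all(out <= cap for out, cap in zip(shipped_from_plant, supply))
demand_ok = all(got >= need for got, need in zip(shipped_to_warehouse, demand))

print(shipped_from_plant, shipped_to_warehouse, total_cost, supply_ok, demand_ok)
```

The starting values respect every supply limit but fall far short of demand, so they are only a starting point for Solver, not a feasible plan.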


34.3 SCHEDULING PROBLEM

Figure 34.4 Personnel Scheduling

Example 3: Personnel scheduling for an Amusement Park.
For employees working five consecutive days with two days off, find the schedule that meets demand from attendance levels while minimizing payroll costs.

Sch.   Days off            Employees   Sun   Mon   Tue   Wed   Thu   Fri   Sat
A      Sunday, Monday              0     0     0     1     1     1     1     1
B      Monday, Tuesday             8     1     0     0     1     1     1     1
C      Tuesday, Wed.               0     1     1     0     0     1     1     1
D      Wed., Thursday             10     1     1     1     0     0     1     1
E      Thursday, Friday            0     1     1     1     1     0     0     1
F      Friday, Saturday            7     1     1     1     1     1     0     1
G      Saturday, Sunday            0     0     1     1     1     1     1     0

Schedule Totals:                  25    25    17    17    15    15    18    25

Total Demand:                           22    17    13    14    15    18    24

Pay/Employee/Day:                $40
Payroll/Week:                 $1,000

The goal for this model is to schedule employees so that you have sufficient staff at the lowest cost. In this example, all employees are paid at the same rate, so by minimizing the number of employees working each day, you also minimize costs. Each employee works five consecutive days, followed by two days off.

Target cell          D20                Goal is to minimize payroll cost.
Changing cells       D7:D13             Employees on each schedule.
Constraints          D7:D13>=0          Number of employees must be greater than or equal to 0.
                     D7:D13=Integer     Number of employees must be an integer.
                     F15:L15>=F17:L17   Employees working each day must be greater than or equal to the demand.
Possible schedules   Rows 7-13          1 means employee on that schedule works that day.

In this example, you use an integer constraint so that your solutions do not result in fractional numbers of employees on each schedule. Selecting the Assume linear model check box in the Solver Options dialog box before you click Solve will greatly speed up the solution process.


Figure 34.5 Personnel Scheduling with Corrections

Example 3: Personnel scheduling for an Amusement Park. (with corrections)
For employees working five consecutive days with two days off, find the schedule that meets demand from attendance levels while minimizing payroll costs.

Sch.   Days off            Employees   Sun   Mon   Tue   Wed   Thu   Fri   Sat
A      Sunday, Monday              0     0     0     1     1     1     1     1
B      Monday, Tuesday             8     1     0     0     1     1     1     1
C      Tuesday, Wed.               0     1     1     0     0     1     1     1
D      Wed., Thursday             10     1     1     1     0     0     1     1
E      Thursday, Friday            0     1     1     1     1     0     0     1
F      Friday, Saturday            7     1     1     1     1     1     0     0
G      Saturday, Sunday            0     0     1     1     1     1     1     0

Schedule Totals:                  25    25    17    17    15    15    18    18

Total Demand:                           22    17    13    14    15    18    24

Pay/Employee/Day:                $40
Payroll/Week:                 $5,000

The goal for this model is to schedule employees so that you have sufficient staff at the lowest cost. In this example, all employees are paid at the same rate, so by minimizing the number of employees working each day, you also minimize costs. Each employee works five consecutive days, followed by two days off.

Target cell          D20                Goal is to minimize payroll cost.
Changing cells       D7:D13             Employees on each schedule.
Constraints          D7:D13>=0          Number of employees must be greater than or equal to 0.
                     D7:D13=Integer     Number of employees must be an integer.
                     F15:L15>=F17:L17   Employees working each day must be greater than or equal to the demand.
Possible schedules   Rows 7-13          1 means employee on that schedule works that day.

In this example, you use an integer constraint so that your solutions do not result in fractional numbers of employees on each schedule. Selecting the Assume linear model check box in the Solver Options dialog box before you click Solve will greatly speed up the solution process.
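The totals in the corrected worksheet can be reproduced with a short Python sketch. It assumes that weekly payroll is the number of employees times the daily pay times five working days, which is consistent with the $5,000 shown in Figure 34.5 but is not spelled out in the text; the names used here are illustrative.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# 1 means an employee on that schedule works that day (corrected worksheet).
WORKS = {
    "A": [0, 0, 1, 1, 1, 1, 1],  # off Sunday, Monday
    "B": [1, 0, 0, 1, 1, 1, 1],  # off Monday, Tuesday
    "C": [1, 1, 0, 0, 1, 1, 1],  # off Tuesday, Wednesday
    "D": [1, 1, 1, 0, 0, 1, 1],  # off Wednesday, Thursday
    "E": [1, 1, 1, 1, 0, 0, 1],  # off Thursday, Friday
    "F": [1, 1, 1, 1, 1, 0, 0],  # off Friday, Saturday
    "G": [0, 1, 1, 1, 1, 1, 0],  # off Saturday, Sunday
}
DEMAND = [22, 17, 13, 14, 15, 18, 24]
PAY_PER_DAY = 40
DAYS_WORKED = 5  # assumption: each employee is paid for five days a week


def schedule_totals(employees):
    """Employees on duty each day for a given number of people per schedule."""
    return [sum(employees[s] * WORKS[s][d] for s in WORKS) for d in range(7)]


def weekly_payroll(employees):
    return sum(employees.values()) * PAY_PER_DAY * DAYS_WORKED


employees = {"A": 0, "B": 8, "C": 0, "D": 10, "E": 0, "F": 7, "G": 0}
totals = schedule_totals(employees)
payroll = weekly_payroll(employees)
understaffed = [day for day, have, need in zip(DAYS, totals, DEMAND) if have < need]

print(totals, payroll, understaffed)
```

With the corrected schedule F, the starting values leave Saturday short of demand, so Solver still has to adjust the number of employees on each schedule.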

Chapter 35 Optimization Models for Finance Decisions

35.1 WORKING CAPITAL MANAGEMENT PROBLEM

Figure 35.1 Working Capital Management

Example 4: Working Capital Management.
Determine how to invest excess cash in 1-month, 3-month and 6-month CDs so as to maximize interest income while meeting company cash requirements (plus safety margin).

             Yield   Term   Purchase CDs in months:
1-mo CDs:     1.0%      1   1, 2, 3, 4, 5 and 6
3-mo CDs:     4.0%      3   1 and 4
6-mo CDs:     9.0%      6   1

Interest Earned: Total $7,700

Month:        Month 1    Month 2    Month 3    Month 4    Month 5    Month 6        End
Init Cash:   $400,000   $205,000   $216,000   $237,000   $158,400   $109,400   $125,400
Matur CDs:               100,000    100,000    110,000    100,000    100,000    120,000
Interest:                  1,000      1,000      1,400      1,000      1,000      2,300
1-mo CDs:     100,000    100,000    100,000    100,000    100,000    100,000
3-mo CDs:      10,000                           10,000
6-mo CDs:      10,000
Cash Uses:     75,000   (10,000)   (20,000)     80,000     50,000   (15,000)     60,000
End Cash:    $205,000   $216,000   $237,000   $158,400   $109,400   $125,400   $187,700

-290000

If you're a financial officer or a manager, one of your tasks is to manage cash and short-term investments in a way that maximizes interest income, while keeping funds available to meet expenditures. You must trade off the higher interest rates available from longer-term investments against the flexibility provided by keeping funds in short-term investments.

This model calculates ending cash based on initial cash (from the previous month), inflows from maturing certificates of deposit (CDs), outflows for new CDs, and cash needed for company operations for each month.

You have a total of nine decisions to make: the amounts to invest in one-month CDs in months 1 through 6; the amounts to invest in three-month CDs in months 1 and 4; and the amount to invest in six-month CDs in month 1.
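The month-by-month bookkeeping can be sketched in Python as follows. The sketch uses the starting values from Figure 35.1 and treats each yield as the return over the whole term of the CD, which matches the interest amounts shown there; the function and variable names are illustrative, not taken from the workbook.

```python
# Yield over the full term, in percent, keyed by term in months.
YIELD_PCT = {1: 1, 3: 4, 6: 9}

INITIAL_CASH = 400_000
# Cash needed for operations in months 1-6 and at the end (negative = cash received).
CASH_USES = [75_000, -10_000, -20_000, 80_000, 50_000, -15_000, 60_000]

# The nine decisions: amount bought, keyed by term and then by purchase month.
purchases = {
    1: {month: 100_000 for month in range(1, 7)},
    3: {1: 10_000, 4: 10_000},
    6: {1: 10_000},
}


def simulate(initial_cash, purchases, cash_uses):
    """Return (ending cash for months 1-6 and End, total interest earned)."""
    cash = initial_cash
    end_cash = []
    total_interest = 0
    for period in range(1, 8):  # period 7 is the "End" column
        matured = interest = bought = 0
        for term, by_month in purchases.items():
            amount = by_month.get(period - term, 0)  # CD bought `term` months ago
            matured += amount
            interest += amount * YIELD_PCT[term] // 100
            bought += by_month.get(period, 0)
        cash += matured + interest - bought - cash_uses[period - 1]
        end_cash.append(cash)
        total_interest += interest
    return end_cash, total_interest


end_cash, total_interest = simulate(INITIAL_CASH, purchases, CASH_USES)
print(end_cash, total_interest)
```

An optimizer would vary the nine purchase amounts to maximize the total interest while keeping every ending cash balance at or above the required minimum.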


Problem Specifications

Target cell      H8                Goal is to maximize interest earned.
Changing cells   B14:G14           Dollars invested in each type of CD.
                 B15, E15, B16
Constraints      B14:G14>=0        Investment in each type of CD must be greater than or equal to 0.
                 B15:B16>=0
                 E15>=0
                 B18:H18>=100000   Ending cash must be greater than or equal to $100,000.

The optimal solution determined by Solver earns a total interest income of $16,531 by investing as much as possible in six-month and three-month CDs, and then turns to one-month CDs. This solution satisfies all of the constraints.

Suppose, however, that you want to guarantee that you have enough cash in month 5 for an equipment payment. Add a constraint that the average maturity of the investments held in month 1 should not be more than four months.

The formula in cell B20 computes a total of the amounts invested in month 1 (B14, B15, and B16), weighted by the maturities (1, 3, and 6 months), and then it subtracts from this amount the total investment, weighted by 4. If this quantity is zero or less, the average maturity will not exceed four months. To add this constraint, restore the original values and then click Solver on the Tools menu. Click Add. Type b20 in the Cell Reference box, type 0 in the Constraint box, and then click OK. To solve the problem, click Solve.
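The calculation described for cell B20 can be written as a small Python function. This is a sketch based on the verbal description above, not the worksheet formula itself; the function name and the `limit_months` parameter are illustrative.

```python
def maturity_slack(one_month, three_month, six_month, limit_months=4):
    """Maturity-weighted investment minus limit_months times total investment.

    A result of zero or less means the average maturity of the month 1
    investments does not exceed limit_months.
    """
    weighted = 1 * one_month + 3 * three_month + 6 * six_month
    total = one_month + three_month + six_month
    return weighted - limit_months * total


# Starting values for month 1: B14 = 100,000, B15 = 10,000, B16 = 10,000.
slack = maturity_slack(100_000, 10_000, 10_000)
average_maturity = (1 * 100_000 + 3 * 10_000 + 6 * 10_000) / 120_000

print(slack, round(average_maturity, 2))
```

For the starting values the result is -290000, the same number that appears at the bottom of Figure 35.1, so the constraint is comfortably satisfied before Solver runs. Writing the constraint as a difference rather than as a ratio keeps it linear in the amounts invested.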

To satisfy the four-month maturity constraint, Solver shifts funds from six-month CDs to three-month CDs. The shifted funds now mature in month 4 and, according to the present plan, are reinvested in new three-month CDs. If you need the funds, however, you can keep the cash instead of reinvesting. The $56,896 turning over in month 4 is more than sufficient for the equipment payment in month 5. You've traded about $460 in interest income to gain this flexibility.


35.2 WORK CAP ALTERNATE FORMULATIONS

Figure 35.2 Working Capital Management Horizontal Time


Figure 35.3 Working Capital Management Vertical Time


35.3 STOCK PORTFOLIO PROBLEM

Figure 35.4 Efficient Stock Portfolio

Example 5: Efficient stock portfolio.
Find the weightings of stocks in an efficient portfolio that maximizes the portfolio rate of return for a given level of risk. This worksheet uses the Sharpe single-index model; you can also use the Markowitz method if you have covariance terms available.

Risk-free rate     6.0%     Market variance     3.0%
Market rate       15.0%     Maximum weight    100.0%

            Beta   ResVar   Weight   *Beta   *Var.
Stock A     0.80     0.04    20.0%   0.160   0.002
Stock B     1.00     0.20    20.0%   0.200   0.008
Stock C     1.80     0.12    20.0%   0.360   0.005
Stock D     2.20     0.40    20.0%   0.440   0.016
T-bills     0.00     0.00    20.0%   0.000   0.000

Total                       100.0%   1.160   0.030

                    Return   Variance
Portfolio Totals:    16.4%       7.1%

Maximize Return: A21:A29     Minimize Risk: D21:D29
0.1644                       0.07077
5                            5
TRUE                         TRUE
TRUE                         TRUE
TRUE                         TRUE
TRUE                         TRUE
TRUE                         TRUE
TRUE                         TRUE
TRUE                         TRUE

One of the basic principles of investment management is diversification. By holding a portfolio of several stocks, for example, you can earn a rate of return that represents the average of the returns from the individual stocks, while reducing your risk that any one stock will perform poorly.

Using this model, you can use Solver to find the allocation of funds to stocks that minimizes the portfolio risk for a given rate of return, or that maximizes the rate of return for a given level of risk.

This worksheet contains figures for beta (market-related risk) and residual variance for four stocks. In addition, your portfolio includes investments in Treasury bills (T-bills), assumed to have a risk-free rate of return and a variance of zero. Initially equal amounts (20 percent of the portfolio) are invested in each security.

Use Solver to try different allocations of funds to stocks and T-bills to either maximize the portfolio rate of return for a specified level of risk or minimize the risk for a given rate of return. With the initial allocation of 20 percent across the board, the portfolio return is 16.4 percent and the variance is 7.1 percent.
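The portfolio totals can be reproduced with the usual single-index formulas, sketched below in Python. The worksheet formulas are not shown in the text, so the expressions here are an assumption: return is taken as the risk-free rate plus portfolio beta times the market premium, and variance as portfolio beta squared times the market variance plus the sum of squared weights times residual variances. They do reproduce the 0.1644 and 0.07077 shown in Figure 35.4.

```python
RISK_FREE = 0.06
MARKET_RATE = 0.15
MARKET_VAR = 0.03

# (name, beta, residual variance) from Figure 35.4.
SECURITIES = [
    ("Stock A", 0.80, 0.04),
    ("Stock B", 1.00, 0.20),
    ("Stock C", 1.80, 0.12),
    ("Stock D", 2.20, 0.40),
    ("T-bills", 0.00, 0.00),
]


def portfolio_stats(weights):
    """Return (expected return, variance) under the assumed single-index formulas."""
    beta = sum(w * b for w, (_, b, _) in zip(weights, SECURITIES))
    residual = sum(w * w * v for w, (_, _, v) in zip(weights, SECURITIES))
    expected_return = RISK_FREE + beta * (MARKET_RATE - RISK_FREE)
    variance = beta ** 2 * MARKET_VAR + residual
    return expected_return, variance


equal_weights = [0.2] * 5
expected_return, variance = portfolio_stats(equal_weights)
print(f"{expected_return:.4f} {variance:.5f}")
```

Because the variance is quadratic in the weights, the risk constraint makes this a nonlinear problem even though the return is linear in the weights.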



Problem Specifications

Target cell               E18          Goal is to maximize portfolio return.
Changing cells            E10:E14      Weight of each stock.
Constraints               E10:E14>=0   Weights must be greater than or equal to 0.
                          E16=1        Weights must equal 1.
                          G18<=0.071   Variance must be less than or equal to 0.071.
Beta for each stock       B10:B13
Variance for each stock   C10:C13

Cells D21:D29 contain the problem specifications to minimize risk for a required rate of return of 16.4 percent. To load these problem specifications into Solver, click Solver on the Tools menu, click Options, click Load Model, select cells D21:D29 on the worksheet, and then click OK until the Solver Parameters dialog box is displayed. Click Solve. As you can see, Solver finds portfolio allocations in both cases that surpass the rule of 20 percent across the board.

You can earn a higher rate of return (17.1 percent) for the same risk, or you can reduce your risk without giving up any return. These two allocations both represent efficient portfolios.

Cells A21:A29 contain the original problem model. To reload this problem, click Solver on the Tools menu, click Options, click Load Model, select cells A21:A29 on the worksheet, and then click OK.

Solver displays a message asking if you want to reset the current Solver option settings with the settings for the model you are loading. Click OK to proceed.


35.4 MONEYCO PROBLEM

Figure 35.5 Display

Return on investments
CD rate = 0.06

                       A       B       C       D       E         CD1         CD2         CD3
Time 1             -1.00   -1.00   -1.00                       -1.00
Time 2                      1.15                   -1.00        1.06       -1.00
Time 3                              1.28   -1.00                            1.06       -1.00
Time 4              1.40                    1.15    1.32                                1.06

Max to invest       $500    $500    $500    $500    $500  $1,000,000  $1,000,000  $1,000,000

Amount Invested     $100    $100    $100    $100    $100        $100        $100        $100   Feasible

Cash flows from investments                                                                 Cash in   Cash out
Time 1             -$100   -$100   -$100      $0      $0       -$100          $0          $0    $1,000       $600
Time 2                $0    $115      $0      $0   -$100        $106       -$100          $0        $0        $21
Time 3                $0      $0    $128   -$100      $0          $0        $106       -$100        $0        $34
Time 4              $140      $0      $0    $115    $132          $0          $0        $106                 $493   Final balance

Legend
data cells       input assumptions, uncontrollable, constraints
changing cells   decision variables, controllable
computed cells   intermediate and output variables, target

Defined Names
Amount_Invested = $B$11:$I$11
Cash_out = $K$14:$K$17
Final_balance = $K$17
Max_to_invest = $B$9:$I$9


Return on investments
CD rate = 0.06

                       A       B       C       D       E         CD1         CD2         CD3
Time 1             -1.00   -1.00   -1.00                       -1.00
Time 2                      1.15                   -1.00     =1+$B$2       -1.00
Time 3                              1.28   -1.00                         =1+$B$2       -1.00
Time 4              1.40                    1.15    1.32                             =1+$B$2

Max to invest       $500    $500    $500    $500    $500  $1,000,000  $1,000,000  $1,000,000

Amount Invested     $100    $100    $100    $100    $100        $100        $100        $100   =IF(AND(Amount_Invested<

Cash flows from investments                                                                       Cash in   Cash out
Time 1   =B4*B$11   =C4*C$11   =D4*D$11   =E4*E$11   =F4*F$11   =G4*G$11   =H4*H$11   =I4*I$11   $1,000    =SUM(B14:J14)
Time 2   =B5*B$11   =C5*C$11   =D5*D$11   =E5*E$11   =F5*F$11   =G5*G$11   =H5*H$11   =I5*I$11   $0        =SUM(B15:J15)
Time 3   =B6*B$11   =C6*C$11   =D6*D$11   =E6*E$11   =F6*F$11   =G6*G$11   =H6*H$11   =I6*I$11   $0        =SUM(B16:J16)
Time 4   =B7*B$11   =C7*C$11   =D7*D$11   =E7*E$11   =F7*F$11   =G7*G$11   =H7*H$11   =I7*I$11             =SUM(B17:J17)   Final balance

Array-entered (Control+Shift+Enter) formula in K11: =IF(AND(Amount_Invested<=Max_to_invest,Cash_out>=0),"Feasible","Not Feasible")
Enter =B4*B$11 in cell B14 and copy to cells B14:I17
Enter =SUM(B14:J14) in cell K14 and copy to K14:K17
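The worksheet logic above can be sketched in Python as follows. The placement of each return in its column is inferred from the cash flow rows of the display, and the function names are illustrative; the feasibility test mirrors the array formula in K11, which checks the investment limits and the sign of each period's net cash.

```python
CD_RATE = 0.06

# Cash flow per dollar invested, keyed by investment and then by time period.
RETURNS = {
    "A":   {1: -1.00, 4: 1.40},
    "B":   {1: -1.00, 2: 1.15},
    "C":   {1: -1.00, 3: 1.28},
    "D":   {3: -1.00, 4: 1.15},
    "E":   {2: -1.00, 4: 1.32},
    "CD1": {1: -1.00, 2: 1 + CD_RATE},
    "CD2": {2: -1.00, 3: 1 + CD_RATE},
    "CD3": {3: -1.00, 4: 1 + CD_RATE},
}
MAX_TO_INVEST = {"A": 500, "B": 500, "C": 500, "D": 500, "E": 500,
                 "CD1": 1_000_000, "CD2": 1_000_000, "CD3": 1_000_000}
CASH_IN = {1: 1000, 2: 0, 3: 0, 4: 0}


def net_cash(amounts):
    """Net cash for each period: cash in plus the sum of that period's cash flows."""
    return {
        t: round(CASH_IN[t] + sum(RETURNS[name].get(t, 0.0) * amount
                                  for name, amount in amounts.items()), 2)
        for t in (1, 2, 3, 4)
    }


def is_feasible(amounts):
    """Mirror of AND(Amount_Invested<=Max_to_invest, Cash_out>=0)."""
    within_limits = all(amounts[n] <= MAX_TO_INVEST[n] for n in amounts)
    non_negative = all(v >= 0 for v in net_cash(amounts).values())
    return within_limits and non_negative


amounts = {name: 100 for name in RETURNS}
balances = net_cash(amounts)
final_balance = balances[4]
print(balances, is_feasible(amounts))
```

As in the worksheet, each period's net cash is computed independently from that period's row, and the value for Time 4 is the final balance that an optimizer would presumably maximize.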


Figure 35.7 Solver Dialog Box

Appendix Excel for the Macintosh

The step-by-step instructions and screen shots in this book are based on Excel 2002 (Office XP). This appendix describes some differences between Excel 2002 on Windows and Excel on the Macintosh.

If you are using Excel on an Apple Macintosh computer, first learn the Macintosh graphical user interface, the basic features of the operating system, and the online help. For example, to get answers to your questions about using Mac OS X, choose Mac Help from the Help menu, type your question, and press the Return key.

The Shortcut Menu One frequently occurring exception is accessing the shortcut menu: Windows users will press the right mouse button; Macintosh users may either hold down the Control (Ctrl) key and click the mouse button or hold down the Option and Command keys and click the mouse button. This book emphasizes double-clicking, right-clicking, and shortcut menus.

Relative and Absolute References When entering formulas with the insertion point in a reference, Windows users will press the F4 key to cycle through the four combinations of relative and absolute references; on a Macintosh without function keys, users will substitute Command-t for F4.



