NG BB 36 Simple Linear Regression

This material is not for general distribution, and its contents should not be quoted, extracted for publication, or otherwise copied or distributed without prior coordination with the Department of the Army, ATTN: ETF.

UNCLASSIFIED / FOUO

National Guard Black Belt Training

Module 36

Simple Linear Regression


Transcript of NG BB 36 Simple Linear Regression


Page 2: NG BB 36 Simple Linear Regression


CPI Roadmap – Analyze

Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.

TOOLS

• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression

ACTIVITIES

• Identify Potential Root Causes

• Reduce List of Potential Root Causes

• Confirm Root Cause to Output Relationship

• Estimate Impact of Root Causes on Key Outputs

• Prioritize Root Causes

• Complete Analyze Tollgate

8-STEP PROCESS

1. Validate the Problem
2. Identify Performance Gaps
3. Set Improvement Targets
4. Determine Root Cause
5. Develop Counter-Measures
6. See Counter-Measures Through
7. Confirm Results & Process
8. Standardize Successful Processes

Phases: Define, Measure, Analyze, Improve, Control

Page 3: NG BB 36 Simple Linear Regression


Learning Objectives

Terminology and data requirements for conducting a regression analysis

Interpretation and use of scatter plots

Interpretation and use of correlation coefficients

The difference between correlation and causation

How to generate, interpret, and use regression equations

Page 4: NG BB 36 Simple Linear Regression


Application Examples

Administrative – A financial analyst wants to predict the cash needed to support growth and increases in training

Market/Customer Research – The main exchange wants to determine how to predict a customer’s buying decision from demographics and product characteristics

Hospitality – The MWR Guest House wants to see if there is a relationship between room service delays and order size

Page 5: NG BB 36 Simple Linear Regression


When Should I Use Regression?

The tool depends on the data type. Regression is typically used with a continuous input and a continuous response, but it can also be used with count or categorical inputs and outputs.

Tool selection by data type:

Independent Variable (X)    Dependent Variable (Y)    Tool
Continuous                  Continuous                Regression
Attribute                   Continuous                ANOVA
Continuous                  Attribute                 Logistic Regression
Attribute                   Attribute                 Chi-Square (χ²) Test
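The data-type matrix on this slide can be read as a simple lookup. The sketch below is one reading of that matrix, not part of the original module; the names `TOOL_BY_DATA_TYPE` and `choose_tool` are made up for illustration.

```python
# Lookup of analysis tool by (X data type, Y data type).
# This mapping is an interpretation of the slide's 2x2 matrix.
TOOL_BY_DATA_TYPE = {
    ("continuous", "continuous"): "Regression",
    ("attribute", "continuous"): "ANOVA",
    ("continuous", "attribute"): "Logistic Regression",
    ("attribute", "attribute"): "Chi-Square Test",
}


def choose_tool(x_type: str, y_type: str) -> str:
    """Return the tool for an independent (X) and dependent (Y) data type."""
    return TOOL_BY_DATA_TYPE[(x_type.lower(), y_type.lower())]


# Deliveries (X) and wait time (Y) are both continuous measurements.
print(choose_tool("continuous", "continuous"))
```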

Page 6: NG BB 36 Simple Linear Regression


General Strategy for Regression Modeling

Planning and Data Collection

• What variables?
• How will I get the data?
• How much data do I need?

Initial Analysis and Reduction of Variables

• What input variables have the biggest effect on the response variable?

Select and Refine Models

• What are some candidate prediction models?
• What is the best model?

Validate Model

• How well does the model predict new observations?

Page 7: NG BB 36 Simple Linear Regression


Regression Terminology

Types of Variables

Input Variables (Xs)

These are also called predictor variables or independent variables

Best if the variables are continuous, but can be count or categorical

Output Variables (Ys)

These are also called response variables or dependent variables (what we’re trying to predict)

Best if the variables are continuous, but can be count or categorical

[Diagram: inputs X1, X2, and X3, plus Error, feed a Process or Product, which produces the output Y]

Page 8: NG BB 36 Simple Linear Regression


Visualize the Data – A Good Start!

Lets you “see” patterns in data

Supports or refutes theories about the data

Helps create or refine hypotheses

Predicts effects under other circumstances (be careful extending predictions beyond the range of data used)

Scatter Plot: A graph showing a relationship (or correlation) between two factors or variables

Be careful: correlation does not guarantee causation!

Page 9: NG BB 36 Simple Linear Regression


Correlation vs. Causation

Correlation by itself does not imply a cause and effect relationship!

Lurking variables!

[Figure: two example scatter plots. One plots average life expectancy against the number of divorces per 10,000; the other plots gas mileage against the price of automobiles.]

Other examples?

When is it correct to infer causation?

Page 10: NG BB 36 Simple Linear Regression


Example: Mortgage Estimates

A Belt is trying to reduce the call length for military clients calling for a good faith estimate on a VA loan

The Belt thinks that there is a relationship between broker experience and call length, and creates a scatter plot to visualize the relationship

Page 11: NG BB 36 Simple Linear Regression


Example: Mortgage Estimate Scatter Plot

Does it look like a relationship exists between Broker Experience and Call Length?

[Figure: scatter plot of Call Length (Y axis, 20 to 60) vs. Broker Experience (X axis, 10 to 30)]

Hypothesis: Brokers with more experience can provide estimates in a shorter time.

Page 12: NG BB 36 Simple Linear Regression


Scatter Plot - Structure

[Figure: scatter plot of Call Length vs. Broker Experience, annotated with the labels below]

Y Axis (Result?)

X Axis (Suspected Influence)

Paired Data

Paired Data? To use a scatter plot, you must have measured two factors for a single observation or item (e.g., for a given measurement, you need to know both the call length and the broker’s experience). You have to make sure that the data “pair up” properly in Minitab, or the diagram will be meaningless.

Page 13: NG BB 36 Simple Linear Regression


Input, Process, Output Context

PREDICTOR MEASURES (X): X Axis – Independent Variable

Input (X)
• Arrival Time
• Accuracy
• Cost
• Key Specs

Process (X)
• Time Per Task
• In-Process Errors
• Labor Hours
• Exceptions

RESULTS MEASURES (Y): Y Axis – Dependent Variable

Output (Y)
• Customer Satisfaction
• Total Defects
• Cycle Time
• Cost

Page 14: NG BB 36 Simple Linear Regression


Scatter Plots

See how one factor relates to changes in another

Develop and/or verify hypotheses

Judge strength of relationship by width or tightness of scatter

Don’t assume a causal relationship!

[Figure: four example scatter plots labeled No Correlation, Positive, Curvilinear, and Negative]

Page 15: NG BB 36 Simple Linear Regression


Exercise: Interpreting Scatter Plots

1. As a team, review assigned Scatter Plots – see next pages

2. What kind of correlation do you see? (Name)

3. What does it mean?

4. What can you conclude?

5. What data might this represent? (Example)

Page 16: NG BB 36 Simple Linear Regression


Example One

Page 17: NG BB 36 Simple Linear Regression


Example Two

Page 18: NG BB 36 Simple Linear Regression


Example Three

Page 19: NG BB 36 Simple Linear Regression


Minitab Example: Scatter Plot

Next, we will work through a Minitab example using data collected at the Anthony’s Pizza company

The Belt suspects that the customers have to wait too long on days when there are many deliveries to make at Anthony’s Pizza

Page 20: NG BB 36 Simple Linear Regression


Minitab Example: Pizza Scatter Plot

A month of data was collected, and stored in the Minitab file Regression-Pizza.mtw

Page 21: NG BB 36 Simple Linear Regression


Pizza Scatter Plot (Cont.)

1. Open worksheet Regression-Pizza.mtw

2. Choose Graph>Scatterplot

Page 22: NG BB 36 Simple Linear Regression


Pizza Scatter Plot (Cont.)

When you click on Scatterplot, this is the first dialog box that comes up

3. Select the Simple Scatterplot

4. Click on OK to move to the next dialog box

Page 23: NG BB 36 Simple Linear Regression


Pizza Scatter Plot (Cont.)

5. Double click on C5 Wait Time to enter it as the Y variable, then double click on C6 Deliveries to enter it as the X variable

6. Edit dialog box options (Optional)

7. Click OK

Page 24: NG BB 36 Simple Linear Regression


Pizza Scatter Plot (Cont.)

Does it look like the number of Deliveries influences the customer’s Wait Time?

[Figure: Scatterplot of Wait Time vs Deliveries, with Wait Time on the Y axis (35 to 55) and Deliveries on the X axis (10 to 35)]

Page 25: NG BB 36 Simple Linear Regression


Pizza Scatter Plot (Cont.)

Note: Hold your cursor over any point on the Scatterplot and Minitab will identify the Row, X-Value, and Y-Value for that point

Page 26: NG BB 36 Simple Linear Regression


Correlation Coefficients (r & r²)

Numbers that indicate the strength of the correlation between two factors

r - strength and the direction of the relationship

Also called Pearson’s Correlation Coefficient

r² - percentage of variation in Y attributable to the independent variable X

Adds precision to a person’s visual judgment about correlation

Test the power of your hypothesis

How much influence does this factor have?

Are there other, more important, “vital few” causes?

Page 27: NG BB 36 Simple Linear Regression


Interpreting Correlation Coefficients

r falls on or between -1 and 1

Calculate in Minitab

As a rule of thumb, values of r below -0.65 or above 0.65 indicate a meaningful correlation

1 = “Perfect” positive correlation

-1 = “Perfect” negative correlation

Use to calculate r²

[Figure: example scatter plots with r = 0 and r = -0.8]
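The module calculates r in Minitab. For readers who want to see the arithmetic, the sketch below computes Pearson's r by hand in Python. It is an illustration rather than the module's method: the function name `pearson_r` is made up here, and the data are the Experience and Absences values from the Absentee Rate exercise later in this module.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired (same length)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Data from the Absentee Rate exercise (months of experience, absences).
experience = [18.1, 20.0, 20.8, 21.5, 22.0, 22.4, 22.9, 24.0, 25.4, 27.3]
absences = [31.5, 33.1, 27.4, 24.5, 27.0, 27.8, 23.3, 24.7, 16.9, 18.1]

r = pearson_r(experience, absences)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

A negative r here would agree with the director's expectation in that exercise that more experienced employees are absent less often.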

Page 28: NG BB 36 Simple Linear Regression


Pearson Correlation Coefficient (r) – Mortgage

Betty Black Belt used the scatter plot to get a visual picture of the relationship between broker experience and call length

Now she uses the Pearson Correlation Coefficient, r, to quantify the strength of the relationship

r = -0.896 (a strong negative correlation)

[Figure: scatter plot of Call Length (Y axis, 20 to 60) vs. Broker Experience (X axis, 10 to 30)]

Page 29: NG BB 36 Simple Linear Regression


Exercise: Correlation

The scatter plot shows that the customers are waiting longer when Anthony’s Pizza has to make more deliveries

Next, the Belt wants to quantify the strength of that relationship

To do that, we will calculate the Pearson Correlation Coefficient, r

Page 30: NG BB 36 Simple Linear Regression


Pizza Correlation

1. Choose Stat > Basic Statistics > Correlation

Page 31: NG BB 36 Simple Linear Regression


Correlation Input Window

2. Double click on C5 Wait Time and C6 Deliveries to add them to the Variables box

3. Uncheck the box, Display p-values

4. Click OK

Page 32: NG BB 36 Simple Linear Regression


Correlation Coefficient

Since r, the Pearson correlation, is 0.970, there is a meaningful correlation between the wait time and number of deliveries

Page 33: NG BB 36 Simple Linear Regression


Interpreting Coefficients – r²

First, we obtained r from the Correlation analysis

Next, in Regression, we will look at r² to see how good our model (regression equation) is

r²: Compute by multiplying r x r (Pearson correlation squared)

Example: With an r value of 0.970 in the Pizza example, the team computed r²:

0.970 x 0.970 = 0.941 or 94.1%

So, 94% of the variation in wait time is explained by the variability in deliveries
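The same arithmetic can be checked in a couple of lines of Python. This is a small illustration only; the variable names are chosen here, and the value of r is the 0.970 reported for the Pizza example.

```python
# Pearson correlation reported for Wait Time vs Deliveries in the Pizza example.
r = 0.970

# r squared is the share of variation in Y explained by X in simple regression.
r_squared = r * r

print(f"r squared = {r_squared:.3f} ({r_squared:.1%})")
```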

Page 34: NG BB 36 Simple Linear Regression


Regression Analysis

Regression Analysis is used in conjunction with Correlation and Scatter Plots to predict future performance using past results

While Correlation shows how strong the linear relationship between two variables is, Regression defines the relationship more precisely

Use this tool when there is existing data over a defined range

Regression analysis is a tool that uses data on relevant variables to develop a prediction equation, or model

Page 35: NG BB 36 Simple Linear Regression


Linear Regression

In Simple Linear Regression, a single variable “X” is used to define/predict “Y”

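e.g., Wait Time = B1 + (B2) x (Deliveries) + (error)

Simple Regression Equation: Y = B1 + (B2) x (X) + (error)

B1 = Intercept (the value of Y when X is 0)

B2 = Slope (the change in Y per unit change in X)

[Figure: fitted line on X-Y axes, with the slope B2 shown as the change in y over the change in x]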

Page 36: NG BB 36 Simple Linear Regression


Exercise: Regression

Since the Pearson Correlation (r) was .970, we know that there is a strong positive correlation between the number of deliveries and the wait time

Next, the Belt would like to get an equation to predict how long the customers will be waiting

Page 37: NG BB 36 Simple Linear Regression


Regression (Cont.)

1. Choose Stat>Regression>Fitted Line Plot

Page 38: NG BB 36 Simple Linear Regression


Fitted Line Input Window

2. Double click on C5 Wait Time to enter it as the Response (Y) variable

3. Double click on C6 Deliveries to enter it as the Predictor (X) variable

4. Make sure Linear is checked for the type of Regression

5. Edit dialog box options (Optional)

6. Click OK

Page 39: NG BB 36 Simple Linear Regression


Pizza Regression Plot

Fitted Line Plot: Wait Time = 32.05 + 0.5825 Deliveries

S = 1.11885
R-Sq = 94.1%
R-Sq(adj) = 93.9%

[Figure: fitted line plot with Wait Time on the Y axis (35 to 55) and Deliveries on the X axis (10 to 35)]

Page 40: NG BB 36 Simple Linear Regression


Regression Analysis Results – Session Window

R-Sq is the amount of variation in the data explained by the model. Notice that 94.1% = 0.970 x 0.970 (after rounding): R-Sq is the square of the Pearson correlation from the previous analysis.

[Callout on the Session Window output: Prediction Equation (Regression Model)]

Page 41: NG BB 36 Simple Linear Regression


Using the Prediction Equation

If we have 20 deliveries to make, how long will the customer have to wait for their order?

Based on our 30 minute guarantee, how acceptable is our performance?
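One way to work these two questions is to plug 20 deliveries into the fitted equation, Wait Time = 32.05 + 0.5825 Deliveries. The sketch below does that in Python. The function and constant names are illustrative, and it assumes Wait Time is measured in minutes, which the slides imply but do not state.

```python
# Coefficients from the fitted line: Wait Time = 32.05 + 0.5825 Deliveries.
INTERCEPT = 32.05
SLOPE = 0.5825
GUARANTEE_MINUTES = 30  # assumes Wait Time is in minutes


def predict_wait_time(deliveries: float) -> float:
    """Predicted wait time for a given number of deliveries."""
    return INTERCEPT + SLOPE * deliveries


wait = predict_wait_time(20)
print(f"Predicted wait for 20 deliveries: {wait:.2f}")
print("Within guarantee:", wait <= GUARANTEE_MINUTES)
```

Note that the intercept alone (32.05) is already above 30, so this fitted line predicts a wait longer than the guarantee for any delivery count within the range of the plotted data.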

Page 42: NG BB 36 Simple Linear Regression


Method of “Least Squares” Regression – Technical Note

Minitab will find the “best fitting” line for us. How does it do that?

• We want to have as little difference as possible between the true observations and the fitted line

• Minitab minimizes the sum of the squared distances between the fitted and true observations

[Figure: Fitted Line Plot, Wait Time = 32.05 + 0.5825 Deliveries, marking the vertical distance between a true observation (the data point) and the corresponding “fitted” observation (the line)]
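The least squares idea can be sketched in a few lines of Python. This is not Minitab's implementation; it uses the standard closed-form solution for a single predictor, and the data points are hypothetical values invented for illustration (they are not the contents of Regression-Pizza.mtw).

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between data points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired data (deliveries, wait time), for illustration only.
xs = [10, 15, 20, 25, 30]
ys = [38.0, 41.0, 43.5, 46.5, 49.5]

b1, b2 = least_squares_fit(xs, ys)
best = sum_squared_residuals(xs, ys, b1, b2)
nudged = sum_squared_residuals(xs, ys, b1, b2 + 0.05)

# Any other line, such as one with a slightly different slope, fits worse.
print(f"intercept = {b1:.3f}, slope = {b2:.3f}")
print("fitted line beats nudged line:", best < nudged)
```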

Page 43: NG BB 36 Simple Linear Regression


Multiple Regression

Use this when you want to consider more than one predictor variable

The benefit is that you might need more predictors to create an accurate model

In the case of our Anthony’s Pizza example, we may want to look at the impact that incorrect orders, damaged pizzas, and cold pizzas have on wait time

Page 44: NG BB 36 Simple Linear Regression


Individual Exercise: Pizza

As an Anthony’s Pizza Belt, you suspect that the number of pizza defects increases when more pizzas are ordered. You want to visualize the data and quantify the relationship

Use the Minitab file Pizza Exercise.mtw data to investigate the relationship between “Total Pizzas” and “Defects”

Create a scatter plot

Determine correlation

Create a fitted line plot

Determine the prediction equation

How many defects do we usually have when 50 pizzas are on order? What do you think of this model?

Page 45: NG BB 36 Simple Linear Regression


Another Exercise: Absentee Rate

The human resources director of a chain of fast-food restaurants studied the absentee rate of employees. Whenever employees called in sick, or simply did not show up, the restaurant manager had to find replacements in a hurry, or else work short-handed

The director had data on the number of absences per 100 employees per week (Y) and the average number of months’ experience at the restaurant (X) for 10 restaurants in the chain. The director expected that long-term employees would be more reliable and absent less often

Page 46: NG BB 36 Simple Linear Regression


Absentee Rate

1. Open a blank Minitab worksheet and input the data

2. Create a scatter plot and decide whether a straight line is a reasonable model

3. Conduct a regression analysis and get the linear prediction equation

4. Predict the number of absences for employees with 19.5 months of experience

Experience Absences

18.1 31.5

20.0 33.1

20.8 27.4

21.5 24.5

22.0 27.0

22.4 27.8

22.9 23.3

24.0 24.7

25.4 16.9

27.3 18.1
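The exercise is meant to be done in Minitab. As a cross-check outside Minitab, steps 3 and 4 could be sketched in Python as below. This uses the ordinary least squares formulas for one predictor and the ten data pairs from the table above; the function and variable names are illustrative, and the sketch skips the scatter plot in step 2.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for one predictor: (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Data from the table: average months of experience (X) and
# absences per 100 employees per week (Y) for 10 restaurants.
experience = [18.1, 20.0, 20.8, 21.5, 22.0, 22.4, 22.9, 24.0, 25.4, 27.3]
absences = [31.5, 33.1, 27.4, 24.5, 27.0, 27.8, 23.3, 24.7, 16.9, 18.1]

intercept, slope = fit_line(experience, absences)
predicted = intercept + slope * 19.5

print(f"Absences = {intercept:.2f} + ({slope:.3f}) x Experience")
print(f"Predicted absences at 19.5 months: {predicted:.1f}")
```

Since 19.5 months lies inside the observed range of experience (18.1 to 27.3), this prediction does not extrapolate beyond the data.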

Page 47: NG BB 36 Simple Linear Regression


Takeaways

Start with a visual tool – create a scatter plot

Calculate the Pearson correlation coefficient, r, to quantify the strength of the relationship

Remember that correlation does not guarantee causation!

Create and interpret the Regression Plot

Use the prediction equation

Validate the prediction model’s r-squared using new data (not part of the data set used in creating the prediction equation)
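The last takeaway can be illustrated with a short sketch: apply the existing prediction equation to observations that were not used to fit it, then compute r-squared on those new points. The code below is one way to do that, not the module's procedure. The new observations are hypothetical values invented for illustration, and `r_squared` uses the common definition 1 - (residual sum of squares / total sum of squares).

```python
def r_squared(actual, predicted):
    """R-squared of predictions against actual values: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def predict_wait_time(deliveries):
    """Prediction equation from the Pizza example fitted line plot."""
    return 32.05 + 0.5825 * deliveries


# Hypothetical NEW observations, not part of the data used to fit the line.
new_deliveries = [12, 18, 24, 31]
new_wait_times = [39.5, 42.0, 46.5, 50.0]

predictions = [predict_wait_time(d) for d in new_deliveries]
validation_r2 = r_squared(new_wait_times, predictions)
print(f"Validation r-squared: {validation_r2:.3f}")
```

If the validation value is much lower than the R-Sq from the original fit, the model may not generalize well to new conditions.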

Page 48: NG BB 36 Simple Linear Regression


What other comments or questions do you have?