Multiple Regression


Page 1: Multiple Regression. Simple Regression in detail: Yi = βo + β1 xi + εi, where Y => dependent variable, X => independent variable, βo => model parameter.

Multiple Regression

Page 2:

Simple Regression in detail

Yi = βo + β1 xi + εi

Where

• Y => Dependent variable

• X => Independent variable

• βo => Model parameter – mean value of the dependent variable (Y) when the independent variable (X) is zero

Page 3:

Simple Regression in detail

• β1 => Model parameter

- Slope that measures the change in the mean value of the dependent variable associated with a one-unit increase in the independent variable

• εi => Error term

- Describes the effects on Yi of all factors other than the value of Xi

Page 4:

Assumptions of the Regression Model

• Error term is normally distributed (normality assumption)

• Mean of error term is zero (E{εi} = 0)

• Variance of error term is a constant and is independent of the values of X (constant variance assumption)

• Error terms are independent of each other (independence assumption)

• Values of the independent variable X are fixed – no error in the X values
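The assumptions above can be illustrated with a small simulation. This is only a sketch: the parameter values (βo = 2.0, β1 = 0.5, σ = 1.0) and the range of X are arbitrary choices, not from the slides.

```python
import random

random.seed(42)  # reproducible draws

beta0, beta1, sigma = 2.0, 0.5, 1.0  # hypothetical model parameters

# Fixed X values (assumption: X is measured without error)
xs = [float(i) for i in range(1, 21)]

# Errors are independent draws from a Normal(0, sigma) distribution,
# so they have mean zero and constant variance regardless of X
errors = [random.gauss(0.0, sigma) for _ in xs]

# Responses generated according to Yi = beta0 + beta1*xi + epsilon_i
ys = [beta0 + beta1 * x + e for x, e in zip(xs, errors)]
```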

Page 5:

Estimating the Model Parameters

• Calculate point estimates bo and b1 of the unknown parameters βo and β1

• Obtain a random sample and use the information from the sample to estimate βo and β1

• Obtain a line of best "fit" for sample data points - least squares line

Ŷi = bo + b1 Xi

Where Ŷi is the predicted value of Yi

Page 6:

Values of Least Squares Estimates bo and b1

b1 = [n Σxiyi - (Σxi)(Σyi)] / [n Σxi² - (Σxi)²]

bo = ȳ - b1 x̄

Where

ȳ = Σyi / n ; x̄ = Σxi / n

• bo and b1 vary from sample to sample. Variation is given by their Standard Errors Sbo and Sb1
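The formulas above translate directly into code. A minimal sketch, using made-up data (the x and y values below are chosen only for illustration):

```python
def least_squares(xs, ys):
    """Point estimates b0, b1 from the least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [2, 4, 5, 8])
# b1 = 38/20 = 1.9 and b0 = 4.75 - 1.9*2.5 = 0.0
```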

Page 7:

Example 1

• To see the relationship between Advertising and Store Traffic

• Store Traffic is the dependent variable and Advertising is the independent variable

• We find using the formulae that bo = 148.64 and b1 = 1.54

• Are bo and b1 significant?

• What is Store Traffic when Advertising is 600?
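Using the fitted values bo = 148.64 and b1 = 1.54 given on the slide, the predicted Store Traffic at Advertising = 600 is simple arithmetic (judging whether bo and b1 are significant requires their standard errors, covered on later slides):

```python
b0, b1 = 148.64, 1.54  # estimates given in Example 1

def predict_traffic(advertising):
    """Predicted Store Traffic for a given Advertising level."""
    return b0 + b1 * advertising

traffic = predict_traffic(600)  # 148.64 + 1.54*600 = 1072.64
```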

Page 8:

Example 2

• Consider the following data

• Using formulae we find that b0 = -2.55 and b1 = 1.05

Sales (Y) Advertising (X)

3 7

8 13

17 13

4 11

15 16

7 6

Page 9:

Example 2

Therefore the regression model would be

Ŷi = -2.55 + 1.05 Xi

r² = (0.74)² = 0.54 (variance in Sales (Y) explained by Advertising (X))

Assume that Sbo (standard error of b0) = 0.51 and

Sb1 = 0.26, at α = 0.05, df = 4.

Is bo significant? Is b1 significant?
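With the stated standard errors, each t-statistic is simply the estimate divided by its standard error; comparing against the two-tailed critical value t(α = 0.05, df = 4) ≈ 2.776 answers the significance question. A sketch using the slide's rounded values:

```python
b0, s_b0 = -2.55, 0.51  # intercept estimate and its standard error
b1, s_b1 = 1.05, 0.26   # slope estimate and its standard error

t_b0 = b0 / s_b0  # -5.0
t_b1 = b1 / s_b1  # ≈ 4.04

t_critical = 2.776  # two-tailed critical t for alpha = 0.05, df = 4

b0_significant = abs(t_b0) > t_critical  # True: reject H0 for b0
b1_significant = abs(t_b1) > t_critical  # True: reject H0 for b1
```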

Page 10:

Idea behind Estimation: Residuals

• Differences between the actual and predicted values are called residuals

• Estimate of the error in the population

ei = yi - ŷi = yi - (bo + b1 xi)

(hatted quantities are predicted values)

• bo and b1 minimize the residual or error sum of squares (SSE)

SSE = Σei² = Σ(yi - ŷi)² = Σ[yi - (bo + b1 xi)]²
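Residuals and SSE can be computed directly for the Example 2 data. A sketch using the slide's rounded coefficients (with rounded values the residuals do not sum exactly to zero; the exact least squares residuals would):

```python
b0, b1 = -2.55, 1.05  # rounded estimates from Example 2
sales = [3, 8, 17, 4, 15, 7]          # observed y
advertising = [7, 13, 13, 11, 16, 6]  # observed x

predicted = [b0 + b1 * x for x in advertising]          # yhat_i
residuals = [y - yhat for y, yhat in zip(sales, predicted)]  # e_i

sse = sum(e * e for e in residuals)  # error sum of squares
```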

Page 11:

Testing the Significance of the Independent Variables

• Null Hypothesis: There is no linear relationship between the independent & dependent variables

• Alternative Hypothesis: There is a linear relationship between the independent & dependent variables

Page 12:

Testing the Significance of the Independent Variables

• Test Statistic

t = (b1 - β1) / sb1

• Degrees of Freedom

v = n - 2

• Testing for a Type II Error

H0: β1 = 0

H1: β1 ≠ 0

• Decision Rule

Reject H0: β1 = 0 if α > p-value

Page 13:

Significance Test for Store Traffic Example

• Null hypothesis, Ho: β1=0

• Alternative hypothesis, HA: β1 ≠ 0

• The test statistic is t = b1 / Sb1 = 1.54 / 0.21 = 7.33

• With α as 0.05 and with degrees of freedom v = n - 2 = 18, the value of t from the table is 2.10

• Since the calculated t (7.33) exceeds the table t (2.10), we reject the null hypothesis of no linear relationship. Therefore Advertising affects Store Traffic

Page 14:

Predicting the Dependent Variable

• How well does the model ŷi = bo + b1 xi predict?

• Error of prediction without the indep var is yi - ȳ

• Error of prediction with the indep var is yi - ŷi

• Thus, by using the indep var the error in prediction reduces by (yi - ȳ) - (yi - ŷi) = (ŷi - ȳ)

• It can be shown that

Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

Page 15:

Predicting the Dependent Variable

• Total variation (SST)= Explained variation (SSM) + Unexplained variation (SSE)

• A measure of the model’s ability to predict is the Coefficient of Determination (r2)

r² = SSM / SST = (SST - SSE) / SST

• For our example, r² = 0.74, i.e., 74% of the variation in Y is accounted for by X

• r² is the square of the correlation between X and Y
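The decomposition SST = SSM + SSE and the two equivalent forms of r² can be verified numerically. A sketch on made-up data (x = 1..4, y chosen arbitrarily), using exact least squares estimates so the identity holds:

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Exact least squares fit
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)               # total variation
ssm = sum((yh - ybar) ** 2 for yh in yhat)           # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # unexplained variation

r2 = ssm / sst  # equals (sst - sse) / sst
```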

Page 16:

Multiple Regression

• Used when more than one indep variable affects dependent variable

• General model

Y = β0 + β1 X1 + β2 X2 + ... + βn Xn + ε

Where

Y: Dependent variable

X1, X2, ..., Xn: Independent variables

β1, β2, ..., βn: Coefficients of the n indep variables

β0: A constant (intercept)

Page 17:

Issues in Multiple Regression

• Which variables to include?

• Is the relationship between the dep variable and each of the indep variables linear?

• Is the dep variable normally distributed for all values of the indep variables?

• Are each of the indep variables normally distributed (without regard to the dep var)?

• Are there interaction variables?

• Are the indep variables themselves highly correlated?

Page 18:

Example 3

• Cataloger believes that age (AGE) and income (INCOME) can predict amount spent in last 6 months (DOLLSPENT)

• The regression equation is

DOLLSPENT = 351.29 - 0.65 INCOME + 0.86 AGE

• What happens when income (age) increases?

• Are the coefficients significant?
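The fitted equation turns directly into a prediction function. The negative INCOME coefficient means predicted spending falls as income rises, while the positive AGE coefficient means it rises with age. The sample INCOME and AGE values below are made up for illustration (the slides do not state the units of INCOME):

```python
def predict_dollspent(income, age):
    """Predicted 6-month spending from the Example 3 equation."""
    return 351.29 - 0.65 * income + 0.86 * age

# Hypothetical customer: INCOME = 40, AGE = 50
spend = predict_dollspent(40, 50)  # 351.29 - 26.0 + 43.0 = 368.29
```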

Page 19:

Example 4

• Which customers are most likely to buy?

• Cataloger believes that the ratio of total orders to total pieces mailed is a good measure of purchase likelihood

• Call this ratio RESP

• Indep variables are

- TOTDOLL: total purchase dollars

- AVGORDR: average dollar order

- LASTBUY: # of months since last purchase

Page 20:

Example 4

• Analysis of Variance table

- How is total sum of squares split up?

- How do you get the various Deg of Freedom?

- How do you get/interpret R-square?

- How do you interpret the F statistic?

- What is the Adjusted R-square?
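For simple regression, the quantities these questions ask about can be computed by hand. A sketch on made-up data (x = 1..4, y arbitrary) with k = 1 predictor: SST splits into SSM (df = k) and SSE (df = n - k - 1), the F statistic is the ratio of mean squares, and adjusted R² penalizes R² for the number of predictors:

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
n, k = len(xs), 1  # k = number of independent variables

xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

ssm = sum((yh - ybar) ** 2 for yh in yhat)           # model SS, df = k
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # error SS, df = n - k - 1
sst = ssm + sse                                      # total SS, df = n - 1

f_stat = (ssm / k) / (sse / (n - k - 1))  # ratio of mean squares
r2 = ssm / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalized for k predictors
```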

Page 21:

Example 4

• Parameter estimates table

- What are the t-values corresponding to the estimates?

- What are the p-values corresponding to the estimates?

- Which variables are the most important?

- What are standardized estimates?

- What to do with non-significant variables?