Categorical Independent Variables STA302 Fall 2013.

Page 1:

Categorical Independent Variables

STA302 Fall 2013

See last slide for copyright information

Page 2:

Categorical means unordered categories

• Like Field of Study: Humanities, Sciences, Social Sciences

• Could number them 1, 2, 3, but what would the regression coefficients mean?

• But you really want them in your regression model.

Page 3:

One Categorical Explanatory Variable

• X=1 means Drug, X=0 means Placebo

• Population mean is E(Y|X=x) = β0 + β1x

• For patients getting the drug, mean response is E(Y|X=1) = β0 + β1

• For patients getting the placebo, mean response is E(Y|X=0) = β0

Page 4:

Sample regression coefficients for a binary explanatory variable

• X=1 means Drug, X=0 means Placebo

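• Predicted response is Ŷ = b0 + b1x, where b0 and b1 are the least-squares estimates

• For patients getting the drug, predicted response is Ŷ = b0 + b1, the sample mean of the drug group

• For patients getting the placebo, predicted response is Ŷ = b0, the sample mean of the placebo group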
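A minimal sketch of the point on this slide: with a single 0/1 indicator, the least-squares intercept is the placebo sample mean and intercept plus slope is the drug sample mean. The data values and the names b0 and b1 are made up for illustration; they are not from the slides.

```python
# Hypothetical data: x = 1 means Drug, x = 0 means Placebo (values invented).
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [7.0, 9.0, 8.0, 10.0, 5.0, 6.0, 4.0, 5.0]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares estimates for simple regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Group sample means, computed directly for comparison.
drug = [yi for xi, yi in zip(x, y) if xi == 1]
placebo = [yi for xi, yi in zip(x, y) if xi == 0]
mean_drug = sum(drug) / len(drug)
mean_placebo = sum(placebo) / len(placebo)

print("b0      =", b0, " placebo mean =", mean_placebo)
print("b0 + b1 =", b0 + b1, " drug mean    =", mean_drug)
```

The slope b1 is therefore the difference between the two sample means.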

Page 5:

Regression test of H0: β1 = 0

• Same as an independent t-test

• Same as a oneway ANOVA with 2 categories

• Same t, same F, same p-value.

• Now extend to more than 2 categories
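The two-category equivalence can be checked numerically. The sketch below, on made-up data, computes the pooled two-sample t statistic, the regression t statistic for the slope, and the oneway ANOVA F statistic using only the standard library. It does not compute p-values; since the statistics and degrees of freedom agree, the p-values must agree too.

```python
import math

# Hypothetical data: x = 1 means Drug, x = 0 means Placebo (values invented).
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [7.0, 9.0, 8.0, 10.0, 5.0, 6.0, 4.0, 5.0]
n = len(y)

g1 = [yi for xi, yi in zip(x, y) if xi == 1]
g0 = [yi for xi, yi in zip(x, y) if xi == 0]
m1 = sum(g1) / len(g1)
m0 = sum(g0) / len(g0)

# Pooled-variance two-sample t statistic.
ss_within = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
sp2 = ss_within / (n - 2)
t_pooled = (m1 - m0) / math.sqrt(sp2 * (1 / len(g1) + 1 / len(g0)))

# Regression t statistic for H0: beta1 = 0.
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
t_reg = b1 / math.sqrt(mse / s_xx)

# Oneway ANOVA F statistic with 2 categories (1 numerator degree of freedom).
ss_between = len(g1) * (m1 - y_bar) ** 2 + len(g0) * (m0 - y_bar) ** 2
f_anova = (ss_between / 1) / (ss_within / (n - 2))

print(t_pooled, t_reg, f_anova, t_reg ** 2)
```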

Page 6:

Drug A, Drug B, Placebo

• x1 = 1 if Drug A, zero otherwise

• x2 = 1 if Drug B, zero otherwise

• E(Y|x) = β0 + β1x1 + β2x2

• Fill in the table

Page 7:

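Drug A, Drug B, Placebo

• x1 = 1 if Drug A, zero otherwise

• x2 = 1 if Drug B, zero otherwise

• E(Y|x) = β0 + β1x1 + β2x2

Group      x1   x2   E(Y|x)
Drug A      1    0   β0 + β1
Drug B      0    1   β0 + β2
Placebo     0    0   β0

Regression coefficients are contrasts with the category that has no indicator: the reference category. Here β1 is the Drug A mean minus the Placebo mean, and β2 is the Drug B mean minus the Placebo mean.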
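As a hedged illustration of the contrast interpretation, the sketch below builds the indicator coding for made-up data and checks that b0 = Placebo mean, b1 = Drug A mean minus Placebo mean, and b2 = Drug B mean minus Placebo mean satisfy the normal equations X'(y - Xb) = 0. Because the columns of X are linearly independent, that makes them the least-squares estimates. This check is a shortcut chosen to avoid a matrix solver, not a method from the slides.

```python
# Hypothetical data for three treatment groups (values invented).
groups = ["A"] * 3 + ["B"] * 3 + ["Placebo"] * 3
y = [8.0, 10.0, 9.0, 6.0, 7.0, 8.0, 4.0, 5.0, 6.0]

# Design matrix rows: intercept, x1 (Drug A indicator), x2 (Drug B indicator).
X = [[1, int(g == "A"), int(g == "B")] for g in groups]


def group_mean(label):
    """Sample mean of y within one treatment group."""
    vals = [yi for g, yi in zip(groups, y) if g == label]
    return sum(vals) / len(vals)


# Candidate coefficients: contrasts with the reference category (Placebo).
b0 = group_mean("Placebo")
b1 = group_mean("A") - b0
b2 = group_mean("B") - b0
b = [b0, b1, b2]

# Residuals and the left side of the normal equations X'(y - Xb).
residuals = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
normal_eqs = [sum(row[j] * r for row, r in zip(X, residuals)) for j in range(3)]

print("b =", b)
print("X'(y - Xb) =", normal_eqs)  # all zeros means b is the least-squares solution
```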

Page 8:

Indicator dummy variable coding with intercept

• Need p-1 indicators to represent a categorical explanatory variable with p categories.

• If you use p dummy variables, columns of the X matrix are linearly dependent.

• Regression coefficients are contrasts with the category that has no indicator.

• Call this the reference category.
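The linear dependence is easy to see directly: with an intercept and one dummy variable per category, the dummy columns add up to the intercept column. A small sketch with made-up group labels:

```python
# Hypothetical group labels for six patients.
groups = ["A", "A", "B", "B", "Placebo", "Placebo"]
labels = ["A", "B", "Placebo"]

# Intercept plus p = 3 dummy variables: one column too many.
X_full = [[1] + [int(g == lab) for lab in labels] for g in groups]

# The nonzero vector c = (1, -1, -1, -1) gives X_full c = 0 in every row,
# which is the definition of linearly dependent columns.
c = [1, -1, -1, -1]
Xc = [sum(ci * xi for ci, xi in zip(c, row)) for row in X_full]
print("X_full c =", Xc)

# Intercept plus p - 1 = 2 dummy variables: Placebo becomes the reference category.
X_ref = [[1] + [int(g == lab) for lab in labels[:-1]] for g in groups]
print("columns with reference coding:", len(X_ref[0]))
```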

Page 9:

Now add a quantitative variable (covariate)

• x1 = Age

• x2 = 1 if Drug A, zero otherwise

• x3 = 1 if Drug B, zero otherwise

• E(Y|x) = β0 + β1x1 + β2x2 + β3x3

Page 10:

Covariates

• Of course there could be more than one

• Reduce MSE, make tests more sensitive

• If values of categorical IV are not randomly assigned, including relevant covariates could change the conclusions.

Page 11:

Interactions

• Interaction between independent variables means “It depends.”

• Relationship between one explanatory variable and the response variable depends on the value of the other explanatory variable.

• Can have
  – Quantitative by quantitative
  – Quantitative by categorical
  – Categorical by categorical

Page 12:

Quantitative by Quantitative

E(Y|x) = β0 + β1x1 + β2x2 + β3x1x2

For fixed x2, E(Y|x) = (β0 + β2x2) + (β1 + β3x2)x1

Both slope and intercept depend on value of x2

And for fixed x1, slope and intercept relating x2 to E(Y) depend on the value of x1
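A small sketch of the quantitative by quantitative case, assuming the product-term model E(Y|x) = β0 + β1x1 + β2x2 + β3x1x2. The coefficient values and the function name `line_in_x1` are hypothetical, chosen only to show the intercept and slope in x1 changing with x2.

```python
def line_in_x1(beta, x2):
    """Intercept and slope of E(Y) as a function of x1, holding x2 fixed."""
    b0, b1, b2, b3 = beta
    return b0 + b2 * x2, b1 + b3 * x2


def expected_y(beta, x1, x2):
    """E(Y|x) under the assumed model with a product term."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


beta = (1.0, 2.0, 0.5, -0.25)  # hypothetical coefficients

line_at_0 = line_in_x1(beta, 0.0)
line_at_4 = line_in_x1(beta, 4.0)
print("x2 = 0: intercept, slope =", line_at_0)
print("x2 = 4: intercept, slope =", line_at_4)

# The rearranged line agrees with the original model at any point, e.g. x1 = 2, x2 = 4.
intercept, slope = line_at_4
print(expected_y(beta, 2.0, 4.0), intercept + slope * 2.0)
```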

Page 13:

Quantitative by Categorical

• One regression line for each category.

• Interaction means slopes are not equal

• Form a product of quantitative variable by each dummy variable for the categorical variable

• For example, three treatments and one covariate: x1 is the covariate and x2, x3 are dummy variables

• E(Y|x) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

Page 14:

Make a table
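One way to make the table is to work out the intercept and slope of each group's regression line. The sketch below assumes the model E(Y|x) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3, with x2 = 1 for group one, x3 = 1 for group two, and group three as the reference category. That assignment of groups to dummy variables and the coefficient values are assumptions for illustration. Under it, the lines are (β0 + β2) + (β1 + β4)x1 for group one, (β0 + β3) + (β1 + β5)x1 for group two, and β0 + β1x1 for group three.

```python
def group_line(beta, group):
    """Intercept and slope of the regression line for one group (1, 2 or 3)."""
    b0, b1, b2, b3, b4, b5 = beta
    x2 = 1 if group == 1 else 0  # assumed: dummy variable for group one
    x3 = 1 if group == 2 else 0  # assumed: dummy variable for group two
    intercept = b0 + b2 * x2 + b3 * x3
    slope = b1 + b4 * x2 + b5 * x3
    return intercept, slope


beta = (10.0, 1.0, 2.0, -1.0, 0.5, 0.0)  # hypothetical coefficients

table = {g: group_line(beta, g) for g in (1, 2, 3)}
print("Group  Intercept  Slope")
for g, (intercept, slope) in table.items():
    print(f"{g:>5}  {intercept:>9}  {slope:>5}")
```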

Page 15:

What null hypothesis would you test for

• Equal slopes

• Comparing slopes for group one vs three

• Comparing slopes for group one vs two

• Equal regressions

• Interaction between group and x1
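One possible set of answers, written as constraints on the regression coefficients. This is a sketch that assumes the six-coefficient model with product terms, with x2 = 1 for group one, x3 = 1 for group two, and group three as the reference category; a different assignment of groups to dummy variables would change which coefficients appear.

```latex
% Assumed model:
%   E(Y|x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
%            + \beta_4 x_1 x_2 + \beta_5 x_1 x_3
% Assumed coding: x_2 = 1 for group one, x_3 = 1 for group two,
% group three is the reference category.
% Slopes: group one \beta_1 + \beta_4, group two \beta_1 + \beta_5, group three \beta_1.
\begin{align*}
\text{Equal slopes:} \quad & H_0: \beta_4 = \beta_5 = 0 \\
\text{Slopes, group one vs three:} \quad & H_0: \beta_4 = 0 \\
\text{Slopes, group one vs two:} \quad & H_0: \beta_4 = \beta_5 \\
\text{Equal regressions:} \quad & H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \\
\text{Interaction between group and } x_1\text{:} \quad & H_0: \beta_4 = \beta_5 = 0
\end{align*}
```

Equal slopes and no interaction between group and x1 are the same null hypothesis.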

Page 16:

General principle

• Interaction between A and B means
  – Relationship of A to Y depends on value of B
  – Relationship of B to Y depends on value of A

• The two statements are formally equivalent

Page 17:

What to do if H0: β4=β5=0 is rejected

• How do you test Group “controlling” for x1?

• A reasonable choice is to set x1 to its sample mean, and compare treatments at that point.

• How about setting x1 to sample mean of the group (3 different values)?

• With random assignment to Group, all three means just estimate E(X1), and the mean of all the x1 values is a better estimate.
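A sketch of comparing treatments at the sample mean of x1, assuming the model E(Y|x) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 with x2 = 1 for group one, x3 = 1 for group two, and group three as the reference. The coefficients and x1 values are made up. In practice the coefficients would be estimates, and each difference below is a linear combination of them that could be tested.

```python
def expected_y(beta, x1, group):
    """E(Y|x) for one group (1, 2 or 3) at covariate value x1, under the assumed coding."""
    b0, b1, b2, b3, b4, b5 = beta
    x2 = 1 if group == 1 else 0
    x3 = 1 if group == 2 else 0
    return b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x1 * x2 + b5 * x1 * x3


beta = (10.0, 1.0, 2.0, -1.0, 0.5, 0.0)     # hypothetical coefficients
x1_values = [20.0, 30.0, 40.0, 50.0, 60.0]  # hypothetical covariate values, all groups pooled
x1_bar = sum(x1_values) / len(x1_values)

# Treatment differences evaluated at the overall sample mean of x1.
diff_1_vs_3 = expected_y(beta, x1_bar, 1) - expected_y(beta, x1_bar, 3)  # beta2 + beta4 * x1_bar
diff_2_vs_3 = expected_y(beta, x1_bar, 2) - expected_y(beta, x1_bar, 3)  # beta3 + beta5 * x1_bar
diff_1_vs_2 = expected_y(beta, x1_bar, 1) - expected_y(beta, x1_bar, 2)

print(x1_bar, diff_1_vs_3, diff_2_vs_3, diff_1_vs_2)
```

Because the slopes differ, these differences change with the value of x1, which is why a single comparison point has to be chosen.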

Page 18:

Copyright Information

This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These PowerPoint slides will be available from the course website:

http://www.utstat.toronto.edu/brunner/oldclass/302f13