Testing Relationships between Variables

18
July 1, 2008 Lecture 17 - Regression Testing 1 Testing Relationships between Variables Statistics 111 - Lecture 17

description

Statistics 111 - Lecture 17. Testing Relationships between Variables. Administrative Notes. Administrative Notes. Homework 5 due tomorrow Lecture on Wednesday will be review of entire course. Final Exam. Thursday from 10:40-12:10 It’ll be right here in this room - PowerPoint PPT Presentation

Transcript of Testing Relationships between Variables

Page 1: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 1

Testing Relationships between Variables

Statistics 111 - Lecture 17

Page 2: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 2

Administrative NotesAdministrative Notes

• Homework 5 due tomorrow

• Lecture on Wednesday will be review of entire course

Page 3: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 3

Final Exam

• Thursday from 10:40-12:10 • It’ll be right here in this room

• Calculators are definitely needed!• Single 8.5 x 11 cheat sheet (two-sided)

allowed• I’ve put a sample final on the course website

Page 4: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 4

Outline

• Review of Regression coefficients• Hypothesis Tests• Confidence Intervals• Examples

Page 5: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 5

Two Continuous Variables

• Visually summarize the relationship between two continuous variables with a scatterplot

• Numerically, we focus on best fit line (regression)

Education and Mortality Draft Order and Birthday

Mortality = 1353.16 - 37.62 · Education Draft Order = 224.9 - 0.226 · Birthday

Page 6: Testing Relationships  between Variables

June 30, 2008 Stat 111 - Lecture 16 - Regression

6

Best values for Regression Parameters

• The best fit line has these values for the regression coefficients:

• Also can estimate the average squared residual:

Best estimate of slope

Best estimate of intercept

Page 7: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 7

Significance of Regression Line

• Does the regression line show a significant linear relationship between the two variables?• If there is not a linear relationship, then we would

expect zero correlation (r = 0)• So the slope b should also be zero

• Therefore, our test for a significant relationship will focus on testing whether our slope is significantly different from zero

H0 : = 0 versus Ha : 0

Page 8: Testing Relationships  between Variables

June 30, 2008 Stat 111 - Lecture 16 - Regression

8

Linear Regression

• Best fit line is called Simple Linear Regression Model:

• Coefficients:is the intercept and is the slope

• Other common notation: 0 for intercept, 1 for slope

• Our Y variable is a linear function of the X variable but we allow for error (εi) in each prediction

• We approximate the error by using the residualObserved Yi

Predicted Yi = + Xi

Page 9: Testing Relationships  between Variables

June 30, 2008 Stat 111 - Lecture 16 - Regression

9

Test Statistic for Slope

• Our test statistic for the slope is similar in form to all the test statistics we have seen so far:

• The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate• We will not be doing this calculation manually

because the JMP software does this calculation for us!

Page 10: Testing Relationships  between Variables

June 30, 2008 Stat 111 - Lecture 16 - Regression

10

Example: Education and Mortality

Page 11: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 11

Confidence Intervals for Coefficients

• JMP output also gives the information needed to make confidence intervals for slope and intercept

• 100·C % confidence interval for slope :

b +/- tn-2* SE(b)

• The multiple t* comes from a t distribution with n-2 degrees of freedom

• 100·C % confidence interval for intercept :

a +/- tn-2* SE(a)• Usually, we are less interested in intercept but it

might be needed in some situations

Page 12: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 12

Confidence Intervals for Example

• We have n = 60, so our multiple t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00

• 95 % confidence interval for slope :

-37.6 ± 2.0*8.307 = (-54.2,-21.0)Note that this interval does not contain zero!

• 95 % confidence interval for intercept :

1353± 2.0*91.42 = (1170,1536)

Page 13: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 13

Another Example: Draft Lottery• Is the negative linear association we see between

birthday and draft order statistically significant?

p-value

Page 14: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 14

Another Example: Draft Lottery

• p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order• Statistical evidence that the randomization was not done

properly!

• 95 % confidence interval for slope :

-.23±1.98*.05 = (-.33,-.13)• Multiple t* = 1.98 from t distribution with n-2 = 363 d.f.• Confidence interval does not contain zero, which we

expected from our hypothesis test

Page 15: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 15

• Dataset of 78 seventh-graders: relationship between IQ and GPA

• Clear positive association between IQ and grade point average

Education Example

Page 16: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 16

Education Example• Is the positive linear association we see between

GPA and IQ statistically significant?

p-value

Page 17: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 17

Education Example

• p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA

• 95 % confidence interval for slope :

.101±1.99*.014 = (.073,.129)

• Multiple t* = 1.99 from t distribution with n-2 = 76 d.f.• Confidence interval does not contain zero, which we

expected from our hypothesis test

Page 18: Testing Relationships  between Variables

July 1, 2008 Lecture 17 - Regression Testing 18

Next Class - Lecture 18

• Review of course material