Chapter 1: Linear Regression with One Predictor Variable
also known as: Simple Linear Regression
Bivariate Linear Regression
Introduction:
· Functional relation between two variables:
𝑌 = 𝑓(𝑋)
Value of X ⇒ Value of Y
Example: °F = 32° + (9/5)°C
is a deterministic relationship
1 value of X ⇒ 1 unique value of Y
· Statistical relation between two variables:
1 value of X ⇒ a distribution of values of Y
Y = Dependent / Response / Outcome Variable
X = Independent / Explanatory / Predictor Variable
Linear Equation: General equation for a straight line
𝑌 = 𝑏0 + 𝑏1𝑋
𝑏0: Intercept = value of Y when X=0
𝑏1: Slope = change in Y per unit change in X
𝑏1 = (change in Y-value) / (change in X-value) = "rise" / "run"
What if X increases by 1 unit?
𝑌 = 𝑏0 + 𝑏1(𝑋 + 1) = {𝑏0 + 𝑏1𝑋} + 𝑏1
Y increases by 𝑏1 units
{ Other than linear: e.g. curvilinear 𝑌 = 𝑏0 + 𝑏1𝑋 + 𝑏2𝑋² }
Regression of Y on X
· Observe data points {(𝑋1, 𝑌1), . . . , (𝑋𝑛, 𝑌𝑛)}
· At each point 𝑋𝑖 there is a distribution of 𝑌𝑖’s
Example: Y = head circumference (cm), X = gestational age (wks)
in a sample of 100 low birth weight infants
Qs: Does average head circumference change with gestational age?
What is the form of the relationship? (linear? curvilinear?)
How to estimate the relationship, given the data?
How to make predictions for new observations?
[Figure: scatterplot of Y = Head Circumference (cm) vs. X = Gestational Age (weeks)]
Descriptive Data:
· Scatterplot
· Correlation Coefficients
Some examples: [figures of example scatterplots omitted]
Population Correlation Coefficient:
Random variables X and Y with parameters μX, μY, σX², σY²
ρ = Cov(X, Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY) , −1 ≤ ρ ≤ +1
𝜌 measures the direction and strength of linear association
between X and Y
Maximum likelihood estimator of 𝜌 is
the Pearson Correlation Coefficient:
r = ∑(Xi − X̄)(Yi − Ȳ) / √[∑(Xi − X̄)² ∑(Yi − Ȳ)²] = ∑(Xi − X̄)(Yi − Ȳ) / [(n − 1) sX sY]
Inference on 𝜌:
If X and Y are both Normal,
𝐻0: 𝜌 = 0 vs. 𝐻𝑎: 𝜌 ≠ 0
T = r√(n − 2) / √(1 − r²) ~ t(n − 2) under H0: ρ = 0
critical value = ±t(n − 2, α/2)
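
A quick numerical sketch of r and the t-test in Python (numpy/scipy). The x and y arrays below are illustrative placeholders, not data from the text:

import numpy as np
from scipy import stats

# placeholder data (illustrative only)
x = np.array([25.0, 27.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0])
y = np.array([22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 30.0, 31.0])
n = len(x)

# Pearson r from the definition above
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

# T = r*sqrt(n-2)/sqrt(1-r^2) ~ t(n-2) under H0: rho = 0
T = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p = 2 * stats.t.sf(abs(T), df=n - 2)   # two-sided p-value

# scipy performs the same test directly
r2, p2 = stats.pearsonr(x, y)          # matches (r, p) above
print(r, T, p)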
Spearman Rank Correlation Coefficient:
For X or Y non-Normal
RXi = rank of Xi
RYi = rank of Yi
R̄X = R̄Y = (n + 1)/2 , the means of the ranks RXi and RYi
Spearman rank correlation coefficient is
rs = ∑(RXi − R̄X)(RYi − R̄Y) / √[∑(RXi − R̄X)² ∑(RYi − R̄Y)²] , −1 ≤ rs ≤ +1
𝐻0: There is no association between X and Y
𝐻𝑎: There is association between X and Y
T = rs√(n − 2) / √(1 − rs²) ~ t(n − 2) under H0
critical value = ±t(n − 2, α/2)
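
The same test applied to ranks, sketched in Python; scipy.stats.spearmanr reports rs with a p-value based on the same t approximation (placeholder arrays again, not data from the text):

import numpy as np
from scipy import stats

# placeholder data (illustrative only)
x = np.array([25.0, 27.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0])
y = np.array([22.0, 24.0, 25.0, 27.0, 26.0, 28.0, 31.0, 30.0])
n = len(x)

rx = stats.rankdata(x)   # ranks of X_i (average ranks for ties)
ry = stats.rankdata(y)   # ranks of Y_i

# Pearson formula applied to the ranks gives r_s
rs = np.sum((rx - rx.mean()) * (ry - ry.mean())) / np.sqrt(
    np.sum((rx - rx.mean()) ** 2) * np.sum((ry - ry.mean()) ** 2))

T = rs * np.sqrt(n - 2) / np.sqrt(1 - rs ** 2)   # ~ t(n-2) under H0
rs2, p2 = stats.spearmanr(x, y)                  # same r_s, with p-value
print(rs, T, rs2, p2)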
The Simple Linear Regression Model
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝜀𝑖 , i = 1, . . . ,n observations
𝑌𝑖 value of response for ith observation
𝑋𝑖 value of predictor for ith observation
Population parameters (unknown):
𝛽0 Population intercept
𝛽1 Population regression coefficient
𝜀𝑖 is ith random error term
Mean: E(εi) = 0
Variance: Var(εi) = σ²
Independence: εi and εj are uncorrelated for i ≠ j
Normality: εi ~ N(0, σ²), i.i.d. for all i
⇒ Yi = (β0 + β1Xi) [constant] + εi [random, i.i.d. N(0, σ²)]
⇒ E(Yi) = E(β0 + β1Xi + εi) = β0 + β1Xi
Var(Yi) = Var(β0 + β1Xi + εi) = Var(εi) = σ²
⇒ Yi ~ N(μi, σ²), independent,
where μi = β0 + β1Xi
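
A short simulation sketch (Python/numpy) that generates data exactly as the model describes: a constant part β0 + β1Xi plus i.i.d. N(0, σ²) errors. The parameter values are arbitrary choices for illustration (they loosely echo the head-circumference example):

import numpy as np

rng = np.random.default_rng(seed=1)

# illustrative parameter values, not estimates from real data
beta0, beta1, sigma = 3.9, 0.78, 1.6
X = rng.uniform(22, 36, size=100)        # fixed predictor values
eps = rng.normal(0.0, sigma, size=100)   # random errors, i.i.d. N(0, sigma^2)
Y = beta0 + beta1 * X + eps              # constant part + random part

# E(Y_i) = beta0 + beta1*X_i and Var(Y_i) = sigma^2, as derived above
print(Y[:5])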
How to obtain β̂0 and β̂1, estimates for β0 and β1?
Least Squares Estimators (LSE):
LSEs minimize the sum of squared deviations of 𝑌𝑖 from 𝐸(𝑌𝑖)
Least Squares Criterion:
Q = ∑ᵢ₌₁ⁿ [Yi − E(Yi)]² = ∑ᵢ₌₁ⁿ [Yi − (β0 + β1Xi)]²
Minimize Q: set first derivatives w.r.t. each parameter = 0
First derivatives are:
∂Q/∂β0 = −2∑(Yi − β0 − β1Xi)   (1)
∂Q/∂β1 = −2∑Xi(Yi − β0 − β1Xi)   (2)
Normal Equations: set (1) = 0 and (2) = 0; call the solutions β̂0 and β̂1
−2∑(Yi − β̂0 − β̂1Xi) = 0
−2∑Xi(Yi − β̂0 − β̂1Xi) = 0
⇒
∑Yi = nβ̂0 + β̂1∑Xi
∑XiYi = β̂0∑Xi + β̂1∑Xi²
Solution to Normal Equations:
Least Squares Estimators (LSE):
β̂1 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²
β̂0 = Ȳ − β̂1X̄
Properties of LSE:
Unbiased estimators (accuracy)
E(β̂0) = β0 , E(β̂1) = β1
Minimum variance among unbiased linear estimators (precision)
Robust to departures from the Normality assumption
Note:
functions are called “estimators”
calculated values from data are called “estimates”
Interpretation:
Intercept (β̂0): value of Y when X = 0
(not always meaningful!)
Slope (β̂1): average change in Y per unit increase in X
the "effect" of X on Y; the "regression coefficient"
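
The closed-form LSEs translate directly into numpy. A sketch, with x and y as placeholder arrays standing in for the observed data:

import numpy as np

# placeholder data; substitute the observed (X_i, Y_i) pairs
x = np.array([25.0, 27.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0])
y = np.array([22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 30.0, 31.0])

# b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2); b0 = Ybar - b1*Xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)

# np.polyfit solves the same least squares problem (degree-1 polynomial);
# it returns [b1, b0]
print(np.polyfit(x, y, deg=1))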
Example: X = gestational age (wks), Y = head circumference (cm)
The least squares regression line is:
Ŷ = 3.91 + 0.78X
Intercept: not meaningful! (extrapolation to X = 0 weeks)
Slope: for every one-week increase in gestational age,
average head circumference increases by about 0.78 cm.
[Figure: scatterplot with fitted least squares line; Y = Head Circumference (cm) vs. X = Gestational Age (weeks)]
Another approach:
Method of Maximum Likelihood
The MLE maximizes the likelihood function
(the likelihood of the observed data, given the model parameters)
Q. Under which parameter values is the sample data
most likely to occur?
[see explanation of MLE on p.27-29]
For simple linear regression:
εi = Yi − β0 − β1Xi ~ N(0, σ²)
⇒ f(εi) = [1/(√(2π) σ)] exp{ −(yi − β0 − β1xi)² / (2σ²) }
⇒ likelihood = L = ∏ᵢ₌₁ⁿ f(εi)
⇒ logₑL = ln{ [1/(2πσ²)^(n/2)] exp[ −(1/(2σ²)) ∑ᵢ₌₁ⁿ (yi − β0 − β1xi)² ] }
= −(n/2) ln(2πσ²) − (1/(2σ²)) ∑ᵢ₌₁ⁿ (yi − β0 − β1xi)²
∂logₑL/∂β0 = 0 ,  ∂logₑL/∂β1 = 0 ,  ∂logₑL/∂σ = 0
⇒ same solutions for β̂0 and β̂1 as the LSE (please prove for yourself!)
same nice properties
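
One way to verify numerically that the MLE matches the LSE for β0 and β1: minimize the negative of the log-likelihood above with scipy.optimize.minimize and compare against the closed-form LSE. A sketch with placeholder data:

import numpy as np
from scipy.optimize import minimize

# placeholder data (illustrative only)
x = np.array([25.0, 27.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0])
y = np.array([22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 30.0, 31.0])
n = len(x)

def neg_loglik(theta):
    b0, b1, log_sigma = theta            # optimize log(sigma) so sigma stays > 0
    sigma2 = np.exp(2.0 * log_sigma)
    resid = y - b0 - b1 * x
    # -log_e L = (n/2) ln(2*pi*sigma^2) + sum(resid^2) / (2*sigma^2)
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + np.sum(resid ** 2) / (2.0 * sigma2)

fit = minimize(neg_loglik, x0=np.array([0.0, 1.0, 0.0]), method="Nelder-Mead")
b0_mle, b1_mle = fit.x[0], fit.x[1]

# closed-form LSE for comparison
b1_lse = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_lse = y.mean() - b1_lse * x.mean()
print(b0_mle, b1_mle, b0_lse, b1_lse)   # agree to optimizer tolerance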
After calculating the fitted regression line:
Fitted value Ŷi: Ŷi = β̂0 + β̂1Xi
the point on the fitted line at the value Xi
Fitted Y-values are estimates of the Mean Response Function
Ŷi is an unbiased estimator of the mean response at Xi
The fitted line is an unbiased estimator of the mean response
function
Note: the point (X̄, Ȳ) is ALWAYS on the fitted regression line,
i.e.,
Ȳ = β̂0 + β̂1X̄
The ith residual 𝑒𝑖:
ei = Yi − Ŷi = Yi − (β̂0 + β̂1Xi)
· it is the vertical distance between (Xi, Yi) and (Xi, Ŷi)
· it is the estimate of the ith error term, ei = ε̂i
· ∑ᵢ₌₁ⁿ ei = 0
Proof:
∑ei = ∑[Yi − (β̂0 + β̂1Xi)]
= ∑Yi − nβ̂0 − β̂1∑Xi
= 0 (by normal equation 1)
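
A quick numeric check of these residual facts, as a sketch with placeholder data: the residuals sum to zero, and (X̄, Ȳ) lies on the fitted line:

import numpy as np

# placeholder data (illustrative only)
x = np.array([25.0, 27.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0])
y = np.array([22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 30.0, 31.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # fitted values
e = y - y_hat         # residuals
print(e.sum())                                    # ~0, by normal equation (1)
print(np.isclose(b0 + b1 * x.mean(), y.mean()))   # (Xbar, Ybar) is on the line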
Error Sum of Squares:
SSE = ∑ᵢ₌₁ⁿ (Yi − Ŷi)² = ∑ᵢ₌₁ⁿ ei²
SSE is minimized when the residuals come from the LSE (or MLE) fit
associated degrees of freedom: df = n − 2
(generally df = n − p, where p = # of parameters in the model)
Mean Squared Error: unbiased estimator of σ²
MSE = SSE/df = SSE/(n − 2) ,  E(MSE) = σ²
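
SSE and MSE in the same sketch form (placeholder data; for simple linear regression p = 2, so df = n − 2):

import numpy as np

# placeholder data (illustrative only)
x = np.array([25.0, 27.0, 28.0, 30.0, 31.0, 33.0, 34.0, 36.0])
y = np.array([22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 30.0, 31.0])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)        # residuals from the least squares fit

sse = np.sum(e ** 2)         # error sum of squares
mse = sse / (n - 2)          # unbiased estimate of sigma^2
print(sse, mse, np.sqrt(mse))   # sqrt(MSE) is the "Root MSE" in the SAS output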
Example: X = gestational age and Y = head circumference
100 observations
scatterplot, fitted line, fitted values
[Figure: scatterplot with fitted line and fitted values; Y = Head Circumference (cm) vs. X = Gestational Age (weeks)]
EXCEL: SUMMARY OUTPUT
Regression Statistics
Multiple R           0.780691936
R Square             0.609479899
Adjusted R Square    0.605495
Standard Error       1.590413353
Observations         100

ANOVA
              df    SS            MS          F          Significance F
Regression     1    386.8673658   386.8674    152.9474   1.00121E-21
Residual      98    247.8826342   2.529415
Total         99    634.75

               Coefficients   Standard Error   t Stat     P-value
Intercept      3.914264144    1.82914689       2.13994    0.034842
X Variable 1   0.780053162    0.063074406      12.36719   1E-21
SAS output:
The REG Procedure
Model: MODEL1
Dependent Variable: headcirc
Number of Observations Read 100
Number of Observations Used 100
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 386.86737 386.86737 152.95 <.0001
Error 98 247.88263 2.52941
Corrected Total 99 634.75000
Root MSE 1.59041 R-Square 0.6095
Dependent Mean 26.45000 Adj R-Sq 0.6055
Coeff Var 6.01290
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 3.91426 1.82915 2.14 0.0348
gestage 1 0.78005 0.06307 12.37 <.0001
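
For comparison with the Excel and SAS output above, the same fit can be sketched in Python with statsmodels. The original 100 observations are not reproduced in these notes, so the snippet simulates stand-in gestage/headcirc vectors; with the real data, summary() would report the same R-square, ANOVA F, and parameter table as SAS:

import numpy as np
import statsmodels.api as sm

# stand-in data: the real gestage/headcirc vectors are not reproduced here
rng = np.random.default_rng(seed=2)
gestage = rng.uniform(22, 36, size=100)
headcirc = 3.9 + 0.78 * gestage + rng.normal(0.0, 1.6, size=100)

X = sm.add_constant(gestage)      # adds the intercept column
fit = sm.OLS(headcirc, X).fit()   # ordinary least squares
print(fit.summary())              # coefficients, t tests, R-square, ANOVA F
print(np.sqrt(fit.mse_resid))     # Root MSE = sqrt(SSE / (n - 2))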