Post on 05-Apr-2020
§15.3–Estimation and Prediction
Tom Lewis
Fall Term 2009
Tom Lewis () §15.3–Estimation and Prediction Fall Term 2009 1 / 10
Outline
1 Estimating the conditional mean of the response variable
2 Estimating the observed value of the response variable
Estimating the conditional mean of the response variable
The problem
We will assume that $y = \beta_1 x + \beta_0 + \varepsilon$ satisfies the assumptions of the regression model and that the standard deviation of $\varepsilon$ is $\sigma$. Given a fixed value $x_p$ of the explanatory variable, what is our best estimate for the value of $\beta_1 x_p + \beta_0$?
The basic idea
Given a sample $\{x_1, x_2, \ldots, x_n\}$ of values of the explanatory variable, together with the observed responses, we can construct the regression equation $\hat{y} = b_1 x + b_0$. The value
\[ \hat{y}_p = b_1 x_p + b_0 \]
should be a close approximation to $\beta_1 x_p + \beta_0$, which is called the conditional mean of the response variable (at $x = x_p$).
To understand how good an approximation $\hat{y}_p$ gives, we need to understand its variability.
Estimating the conditional mean of the response variable
Theorem
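The random variable $\hat{y}_p$ is normally distributed.
The mean of $\hat{y}_p$ is $\beta_1 x_p + \beta_0$.
The standard deviation of $\hat{y}_p$ is
\[ \sigma \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}} \]
In other words,
\[ z = \frac{\hat{y}_p - (\beta_1 x_p + \beta_0)}{\sigma \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}} \]
is a standard normal random variable.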
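The standard deviation formula can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: it assumes the design points $x = 1, 2, \ldots, 10$ and the parameter values $\beta_1 = -2$, $\beta_0 = 30$, $\sigma = 1$, all chosen here for illustration, and the function names are hypothetical. It refits the least-squares line on many simulated samples and compares the spread of the fitted values at $x_p$ with the formula in the theorem.

```python
import math
import random
import statistics


def fitted_value(xs, ys, xp):
    """Fit the least-squares line to (xs, ys) and evaluate it at xp."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b1 * xp + b0


def theoretical_sd(xs, xp, sigma):
    """sigma * sqrt(1/n + (xp - x_bar)^2 / S_xx), as stated in the theorem."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + (xp - x_bar) ** 2 / s_xx)


def simulate(xs, xp, beta1, beta0, sigma, reps=20000, seed=0):
    """Return the empirical mean and SD of the fitted value over many samples."""
    rng = random.Random(seed)
    fits = []
    for _ in range(reps):
        ys = [beta1 * x + beta0 + rng.gauss(0, sigma) for x in xs]
        fits.append(fitted_value(xs, ys, xp))
    return statistics.mean(fits), statistics.stdev(fits)


# Illustrative assumptions: design points 1..10 and the parameters below.
xs = list(range(1, 11))
mean_fit, sd_fit = simulate(xs, xp=8.5, beta1=-2.0, beta0=30.0, sigma=1.0)
print("empirical mean:", mean_fit)
print("empirical SD:  ", sd_fit)
print("theoretical SD:", theoretical_sd(xs, 8.5, 1.0))
```

The empirical mean should sit close to $\beta_1 x_p + \beta_0$ and the empirical standard deviation close to the theoretical value; moving $x_p$ away from $\bar{x}$ increases both standard deviations.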
Estimating the conditional mean of the response variable
Theorem
The random variable
\[ t = \frac{\hat{y}_p - (\beta_1 x_p + \beta_0)}{s_e \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}} \]
has a t-distribution with $n - 2$ degrees of freedom.
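This theorem can be turned into an interval estimate by the usual pivot argument. The sketch below assumes $s_e = \sqrt{SSE/(n-2)}$, the customary standard error of the estimate, which the slide uses without defining, and writes $t_{\alpha/2}$ for the upper $\alpha/2$ critical value of the t-distribution with $n - 2$ degrees of freedom (notation introduced here, not on the slide).

```latex
\[
P\left( -t_{\alpha/2} \le
  \frac{\hat{y}_p - (\beta_1 x_p + \beta_0)}
       {s_e \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}}
  \le t_{\alpha/2} \right) = 1 - \alpha
\]
Solving the inequalities for $\beta_1 x_p + \beta_0$ gives the
$(1 - \alpha)$ confidence interval for the conditional mean:
\[
\hat{y}_p \pm t_{\alpha/2}\, s_e
  \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}
\]
```

The interval is narrowest at $x_p = \bar{x}$ and widens as $x_p$ moves away from the center of the data.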
Estimating the conditional mean of the response variable
Problem
The data in the set LinModSec3.txt was created according to the model $y = -2x + 30 + \varepsilon$, where $\varepsilon$ is a standard normal random variable. Here are some summary statistics:
S_xx     S_yy       S_xy      b_1       b_0
82.5     338.536    -164.6    -1.9952   30.093
Therefore, since SST = S_yy, SSR = S_xy^2 / S_xx, and SSE = SST - SSR:
SST        SSR            SSE
338.536    328.4019394    10.13406061
Create a 95% confidence interval for the conditional mean of the response variable at $x_p = 8.5$.
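One way to carry out this computation is sketched below. The slide does not state $n$ or $\bar{x}$; the sketch ASSUMES $n = 10$ and $\bar{x} = 5.5$, which is what $S_{xx} = 82.5$ would give if the $x$ values were $1, 2, \ldots, 10$. Check both against the actual data file before relying on the result. The critical value 2.306 is the standard table entry for $t_{0.025}$ with 8 degrees of freedom, and the function name is hypothetical.

```python
import math

# Summary statistics quoted in the problem statement.
S_XX = 82.5
SSE = 10.13406061
B1 = -1.9952
B0 = 30.093

# ASSUMPTIONS (not stated on the slide): sample size and mean of x.
# They match S_xx = 82.5 if the x values are 1, 2, ..., 10.
N = 10
X_BAR = 5.5

# t critical value for 95% confidence with N - 2 = 8 degrees of freedom.
T_CRIT = 2.306


def conditional_mean_interval(xp):
    """Return (estimate, lower, upper) for the conditional mean at xp."""
    y_hat = B1 * xp + B0
    s_e = math.sqrt(SSE / (N - 2))  # standard error of the estimate
    margin = T_CRIT * s_e * math.sqrt(1 / N + (xp - X_BAR) ** 2 / S_XX)
    return y_hat, y_hat - margin, y_hat + margin


y_hat, lower, upper = conditional_mean_interval(8.5)
print(f"estimate {y_hat:.4f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

Under those assumptions the interval comes out near (11.95, 14.32), which contains the true conditional mean $-2(8.5) + 30 = 13$ of the generating model.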
Estimating the observed value of the response variable
The problem
We will assume that $y = \beta_1 x + \beta_0 + \varepsilon$ satisfies the assumptions of the regression model and that the standard deviation of $\varepsilon$ is $\sigma$. Given a fixed value $x_p$ of the explanatory variable, what is our best estimate for the value of $y_p$, the observed value of the response variable?
The basic idea
Given a sample $\{x_1, x_2, \ldots, x_n\}$ of values of the explanatory variable, together with the observed responses, we can construct the regression equation $\hat{y} = b_1 x + b_0$. The value
\[ \hat{y}_p = b_1 x_p + b_0 \]
should be a close approximation to $y_p$.
To understand how good an approximation $\hat{y}_p$ gives, we need to understand the variability of $y_p - \hat{y}_p$.
Estimating the observed value of the response variable
Theorem
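The random variable $y_p - \hat{y}_p$ is normally distributed.
The mean of $y_p - \hat{y}_p$ is 0.
The standard deviation of $y_p - \hat{y}_p$ is
\[ \sigma \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}} \]
In other words,
\[ z = \frac{y_p - \hat{y}_p}{\sigma \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}} \]
is a standard normal random variable.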
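The extra 1 under the square root comes from the new observation's own error. A brief derivation follows, assuming that this error, written $\varepsilon_p$ here (a symbol not used on the slides), is independent of the errors in the sample used to fit the line, and using the standard deviation of $\hat{y}_p$ from the earlier theorem.

```latex
\[
y_p = \beta_1 x_p + \beta_0 + \varepsilon_p ,
\qquad
E(y_p - \hat{y}_p) = (\beta_1 x_p + \beta_0) - (\beta_1 x_p + \beta_0) = 0
\]
\[
\operatorname{Var}(y_p - \hat{y}_p)
  = \operatorname{Var}(\varepsilon_p) + \operatorname{Var}(\hat{y}_p)
  = \sigma^2 + \sigma^2\left( \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}} \right)
  = \sigma^2\left( 1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}} \right)
\]
```

Because of the $\sigma^2$ term, this variance never falls below $\sigma^2$ however large the sample, whereas the variance of $\hat{y}_p$ alone shrinks as $n$ and $S_{xx}$ grow.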
Estimating the observed value of the response variable
Theorem
The random variable
\[ t = \frac{y_p - \hat{y}_p}{s_e \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}} \]
has a t-distribution with $n - 2$ degrees of freedom.
Estimating the observed value of the response variable
Problem
The data in the set LinModSec3.txt was created according to the model $y = -2x + 30 + \varepsilon$, where $\varepsilon$ is a standard normal random variable. Here are some summary statistics:
S_xx     S_yy       S_xy      b_1       b_0
82.5     338.536    -164.6    -1.9952   30.093
Therefore, since SST = S_yy, SSR = S_xy^2 / S_xx, and SSE = SST - SSR:
SST        SSR            SSE
338.536    328.4019394    10.13406061
Create a 95% prediction interval for the observed value of the response variable at $x_p = 8.5$.
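A possible computation is sketched below, using the interval $\hat{y}_p \pm t_{\alpha/2}\, s_e \sqrt{1 + 1/n + (x_p - \bar{x})^2 / S_{xx}}$ that follows from the t-statistic for $y_p - \hat{y}_p$. The slide does not state $n$ or $\bar{x}$; the sketch ASSUMES $n = 10$ and $\bar{x} = 5.5$, which is what $S_{xx} = 82.5$ would give if the $x$ values were $1, 2, \ldots, 10$. It also assumes $s_e = \sqrt{SSE/(n-2)}$ and uses the standard table value 2.306 for $t_{0.025}$ with 8 degrees of freedom. The function name is hypothetical.

```python
import math

# Summary statistics quoted in the problem statement.
S_XX = 82.5
SSE = 10.13406061
B1 = -1.9952
B0 = 30.093

# ASSUMPTIONS (not stated on the slide): sample size and mean of x.
# They match S_xx = 82.5 if the x values are 1, 2, ..., 10.
N = 10
X_BAR = 5.5

# t critical value for 95% coverage with N - 2 = 8 degrees of freedom.
T_CRIT = 2.306


def observed_value_interval(xp):
    """Return (estimate, lower, upper) for a new observed response at xp."""
    y_hat = B1 * xp + B0
    s_e = math.sqrt(SSE / (N - 2))  # standard error of the estimate
    # The leading 1 accounts for the new observation's own error term.
    margin = T_CRIT * s_e * math.sqrt(1 + 1 / N + (xp - X_BAR) ** 2 / S_XX)
    return y_hat, y_hat - margin, y_hat + margin


y_hat, lower, upper = observed_value_interval(8.5)
print(f"estimate {y_hat:.4f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

Under those assumptions the interval comes out near (10.28, 15.99), noticeably wider than an interval for the conditional mean at the same $x_p$, because it must cover an individual response and not just its average.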