Day 10 prediction and regression
Uploaded by elih-sutisna-yanto. Category: Education.
![Page 1: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/1.jpg)
Thursday, December 18, 2014
![Page 2: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/2.jpg)
The slides discuss:
• How the concept of correlation from Day 9 can be extended and used in prediction. The statistical technique introduced in these slides is called regression.
• The process of using one variable to predict another when the two are correlated.
• How to calculate and determine how accurate your prediction is going to be, using the standard error of estimate (SE).
• A numerical example to demonstrate and apply the concepts and terms. The statistical and practical significance of the results is also explained and discussed.
![Page 3: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/3.jpg)
Three kinds of relationships between variables
• Association (correlation, covariation)
– Both variables tend to be high or low together (positive relationship), or one tends to be high when the other is low (negative relationship). Variables do not have independent and dependent roles.
• Prediction
– Variables are assigned independent and dependent roles. Both variables are observed. There is a weak causal implication that the independent (predictor) variable is the cause and the dependent variable is the effect.
• Causal
– Variables are assigned independent and dependent roles. The independent variable is manipulated and the dependent variable is observed. Strong causal statements are allowed.
![Page 4: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/4.jpg)
Regression Analyses
Regression: a technique concerned with predicting some variables by knowing others.
The process of predicting variable Y using variable X, that is, of using one variable to predict another when the two are correlated. It makes sense to expect that the higher the correlation between the variables, the more accurate the prediction.
![Page 5: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/5.jpg)
Regression
Uses a variable (x) to predict some outcome variable (y)
Tells you how values in y change as a function of changes in values of x
![Page 6: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/6.jpg)
Examples of Prediction
When we hear thunder and see lightning, we often predict they will be followed by rain.
We also might predict the relationship between the day of the week and the expected crowd at the movie theatre.
We might predict that a bright elementary school student will do well in high school.
A student who is having difficulties on the midterm examination is probably going to get a low grade on the final examination.
![Page 7: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/7.jpg)
Correlation and Regression
Correlation describes the strength of a linear relationship between two variables
Linear means “straight line”
Regression tells us how to draw the straight line described by the correlation
![Page 8: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/8.jpg)
Simple Regression
Prediction is based on the assumption that when two variables are correlated, we can use one of them to predict the other.
The variable used as a predictor is the independent variable (X). The predicted variable (Y) is called the criterion variable or dependent variable.
The technique used for prediction is called regression.
When only one variable is used to predict another, the procedure is called simple regression, and when two or more variables are used as predictors, the procedure is called multiple regression.
![Page 9: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/9.jpg)
The Formula Used in Simple Regression
1.
2.
3.
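The three numbered formulas were images on the slide and did not survive in this transcript. As an assumption, based on the worked example that follows (b is computed first, then a, and both are entered into the regression equation), the standard simple-regression formulas would read as below. The order and numbering are a guess, not the slide's own.

```latex
\begin{align}
Y' &= bX + a \\
b  &= r_{xy}\,\frac{S_Y}{S_X} \\
a  &= \bar{Y} - b\,\bar{X}
\end{align}
```

Here Y' is the predicted score, b the slope, a the intercept, and S_X and S_Y the standard deviations of X and Y. With the values reported in the SPSS output later in these slides (r = .764, S_Y = 6.883, S_X = 7.843), the second formula gives b of about 0.67, in line with the Coefficients table.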
![Page 10: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/10.jpg)
An Example of Simple Regression
Ms. Wright, an eighth-grade language arts teacher, wants to know whether she could use a practice test she constructed to predict the scores of her students on the state-mandated end-of-year language arts test. The teacher hypothesizes that the practice test administered at the beginning of the second semester is a good predictor of the state-mandated test. Thus, she might want to administer the practice test to her students, then use the test results to design early intervention and remediation programs for students who are expected to do poorly on the state-mandated test. To ascertain whether the practice test is a good predictor of the state-mandated test, the teacher uses the scores from the practice test (the predictor, or independent variable) and the scores from the state-mandated test (the criterion, or dependent variable) from last year’s students to generate the regression equation. Since the state-mandated language arts test is scored on a scale of 1 to 50, the teacher has designed her test to use the same scale.
![Page 11: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/11.jpg)
Tasks
![Page 12: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/12.jpg)
Data: SPSS Outcome

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| State-mandated test | 41.60 | 6.883 | 10 |
| Practice test | 41.80 | 7.843 | 10 |

Coefficients (a)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 13.572 | 8.500 | | 1.597 | .149 |
| Practice test | .671 | .200 | .764 | 3.349 | .010 |

a. Dependent Variable: State-mandated test

Model Summary (Change Statistics)

| Model | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|
| 1 | .584 | 11.218 | 1 | 8 | .010 |
![Page 13: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/13.jpg)
Task
1. Research question: Do practice test scores influence state-mandated test scores?
2. Hypotheses
H0: β = 0 (practice test scores do not influence state-mandated test scores)
HA: β ≠ 0 (practice test scores influence state-mandated test scores)
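One way to check these hypotheses by hand is the t statistic computed from the correlation. The sketch below assumes the values reported in the SPSS output (r = .764, n = 10). The function name `slope_t_statistic` is ours, and the critical value 2.306 is the standard two-tailed t value for α = .05 with 8 degrees of freedom, which the slides do not state.

```python
import math

def slope_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: beta = 0 in simple regression, computed from r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.764, 10        # values taken from the SPSS output in these slides
t = slope_t_statistic(r, n)
f = t ** 2              # in simple regression, F equals t squared
t_critical = 2.306      # assumption: two-tailed, alpha = .05, df = n - 2 = 8

print(f"t = {t:.3f}, F = {f:.2f}")
print("Reject H0" if abs(t) > t_critical else "Fail to reject H0")
```

The computed t agrees with the Coefficients table (t = 3.349, Sig. = .010), so H0 would be rejected at the .05 level.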
![Page 14: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/14.jpg)
| Student | Practice Test (X) | State Test (Y) |
|---|---|---|
| A | 45 | 40 |
| B | 45 | 46 |
| C | 46 | 37 |
| D | 50 | 49 |
| E | 35 | 31 |
| F | 47 | 50 |
| G | 23 | 32 |
| H | 46 | 48 |
| I | 40 | 44 |
| J | 41 | 39 |
| Mean | 41.80 | 41.60 |
| SD | 7.843 | 6.883 |

Using the Pearson product-moment correlation coefficient, the teacher finds that the correlation between the two tests is rxy = 0.764. Next, the teacher computes the b coefficient, followed by the value of a, as follows.
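The slide's own computation of b and a was an image and is missing here. As a minimal sketch, assuming the usual formulas b = r·S_Y/S_X and a = Ȳ − b·X̄, the values can be reproduced from the ten score pairs in the table (variable names are ours):

```python
import math

practice = [45, 45, 46, 50, 35, 47, 23, 46, 40, 41]  # X, students A-J
state = [40, 46, 37, 49, 31, 50, 32, 48, 44, 39]     # Y, students A-J

n = len(practice)
mean_x = sum(practice) / n
mean_y = sum(state) / n

# Sums of squared deviations and of cross-products
ss_x = sum((x - mean_x) ** 2 for x in practice)
ss_y = sum((y - mean_y) ** 2 for y in state)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(practice, state))

r = sp_xy / math.sqrt(ss_x * ss_y)   # Pearson correlation
sd_x = math.sqrt(ss_x / (n - 1))     # sample standard deviations
sd_y = math.sqrt(ss_y / (n - 1))

b = r * sd_y / sd_x                  # slope
a = mean_y - b * mean_x              # intercept

print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
```

These values match the r, B, and Constant entries in the SPSS output.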
![Page 15: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/15.jpg)
After finding the values of b (the slope) and a (the intercept), they can be entered into the regression equation.
![Page 16: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/16.jpg)
Now, after administering the practice test to her students, the teacher can use the equation to predict their scores on the state-administered language arts test. For example, the teacher can predict that a student with a practice test (X) score of 30 is expected to have a score of 33.73 on the state test:
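The prediction step can be sketched as a small function. The coefficients are taken from the SPSS Coefficients table (B = .671, Constant = 13.572); the function and constant names are hypothetical.

```python
B_SLOPE = 0.671        # unstandardized coefficient for the practice test (SPSS output)
A_INTERCEPT = 13.572   # constant (SPSS output)

def predict_state_score(practice_score: float) -> float:
    """Predicted state-test score: Y' = bX + a."""
    return B_SLOPE * practice_score + A_INTERCEPT

predicted = predict_state_score(30)
print(f"Predicted state score for X = 30: {predicted:.2f}")
```

With these coefficients the prediction comes out at about 33.70. The slide's 33.73 differs slightly, presumably because the intercept was rounded during the hand calculation.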
![Page 17: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/17.jpg)
Of course, using this equation to predict the scores of new students on the state-mandated language arts test is predicated on the assumption that the new students taking the practice test are similar to those whose scores were used to derive the regression equation.
Using the equation above, we found that the standard error of estimate for the data in the table above is 4.44.
This means that for each student, on average, the teacher is likely to overestimate or underestimate the state-mandated language arts score by close to 4.5 points. For example, for students whose Y′ score is about 42, about 68 percent of the time the actual Y score will lie within 4.44 points above or below the Y′ score (i.e., between approximately 37.5 and 46.5).
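The equation for the standard error of estimate was an image and is not in the transcript. As an assumption, the form S_E = S_Y·√(1 − r²) reproduces the reported value, so the sketch below uses it with S_Y and r from the SPSS output:

```python
import math

def standard_error_of_estimate(sd_y: float, r: float) -> float:
    """S_E = S_Y * sqrt(1 - r^2), the form assumed to be used on the slide."""
    return sd_y * math.sqrt(1 - r ** 2)

se = standard_error_of_estimate(sd_y=6.883, r=0.764)

# Band of one standard error around a predicted score of 42
y_pred = 42
low, high = y_pred - se, y_pred + se
print(f"S_E = {se:.2f}; about 68% of actual scores fall between {low:.1f} and {high:.1f}")
```

Statistical software usually computes this quantity by dividing the residual sum of squares by n − 2, which gives a somewhat larger value for a sample as small as ten students.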
![Page 18: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/18.jpg)
The Coefficient of Determination r²
The teacher found that the correlation between the two tests was rxy = 0.764. To find the coefficient of determination, we square the correlation (rxy²). With rxy = 0.764, the coefficient of determination is 0.764² = 0.584 (or 58%).
This coefficient means that about 58 percent of the variation in performance on the state test (Y) can be accounted for by individual differences in performance on the practice test (X); 42 percent of the variation is due to other factors. In other words, 58 percent of the total variation on the state test (Y) can be explained by the linear relationship between X (practice test) and Y (state-mandated test) as described by the regression equation. The other 42 percent of the total variation in Y (state-mandated test) remains unexplained.
The coefficient of determination (r²) can be used to describe the relationship between the variables. In our example, the language arts teacher used her own practice test to predict her students’ scores on the end-of-year state-mandated language arts test.
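The "explained versus unexplained variation" reading can be checked directly from the ten score pairs. This is a sketch with our own variable names: it fits the least-squares line, then compares the residual variation with the total variation in Y.

```python
practice = [45, 45, 46, 50, 35, 47, 23, 46, 40, 41]  # X
state = [40, 46, 37, 49, 31, 50, 32, 48, 44, 39]     # Y

n = len(practice)
mean_x = sum(practice) / n
mean_y = sum(state) / n

# Least-squares slope and intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(practice, state))
         / sum((x - mean_x) ** 2 for x in practice))
intercept = mean_y - slope * mean_x

# Total variation in Y, and the part left unexplained by the regression line
ss_total = sum((y - mean_y) ** 2 for y in state)
ss_residual = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(practice, state))

r_squared = 1 - ss_residual / ss_total
print(f"explained: {r_squared:.1%}, unexplained: {1 - r_squared:.1%}")
```

The explained share agrees with the R Square Change of .584 in the Model Summary.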
![Page 19: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/19.jpg)
Graphing the Regression Equation
Figure: A regression line for predicting scores of ten students on the state-mandated test using the practice test scores as a predictor
![Page 20: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/20.jpg)
Regression Equation
The regression equation describes the regression line mathematically in terms of its intercept and slope.
[Figure: scatter plot with regression line; x-axis: Wt (kg), 60 to 120; y-axis: SBP (mmHg), 80 to 220]
![Page 21: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/21.jpg)
Linear Equations
Y = bX + a
a = Y-intercept
b = slope = (change in Y) / (change in X)
[Figure: straight line on X and Y axes showing the Y-intercept and the slope]
![Page 22: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/22.jpg)
Task
Work in groups of 5-6. An English teacher in an Islamic boarding school wants to know whether the students’ grammar scores influence the students’ speaking ability.

| No | Students’ grammar scores (X) | Students’ speaking ability (Y) |
|---|---|---|
| 1 | 70 | 80 |
| 2 | 75 | 90 |
| 3 | 60 | 70 |
| 4 | 55 | 65 |
| 5 | 45 | 65 |
| 6 | 65 | 75 |
| 7 | 65 | 70 |
| 8 | 65 | 80 |
| 9 | 45 | 60 |
| 10 | 60 | 65 |
| 11 | 55 | 60 |
| 12 | 75 | 70 |
| 13 | 80 | 75 |
| 14 | 70 | 60 |
| 15 | 75 | 80 |
| 16 | 85 | 85 |
| 17 | 70 | 75 |
| 18 | 60 | 65 |
| 19 | 60 | 75 |
| 20 | 55 | 80 |
![Page 23: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/23.jpg)
Tasks
![Page 24: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/24.jpg)
Hours studying and grades
![Page 25: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/25.jpg)
Regressing grades on hours
[Figure: linear regression scatter plot; x-axis: Number of hours spent studying (2 to 10); y-axis: Final grade in course (70 to 90)]
Final grade in course = 59.95 + 3.17 * study; R-Square = 0.88
Predicted final grade in class = 59.95 + 3.17 * (number of hours you study per week)
![Page 26: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/26.jpg)
Predict the final grade of…
Someone who studies for 12 hours: Final grade = 59.95 + (3.17 * 12) = 97.99
Someone who studies for 1 hour: Final grade = 59.95 + (3.17 * 1) = 63.12
Predicted final grade in class = 59.95 + 3.17 * (hours of study)
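The two predictions can be reproduced with a short function. The intercept and slope come from the slide; the function and constant names are hypothetical.

```python
INTERCEPT = 59.95   # predicted grade at zero hours of study (from the slide)
SLOPE = 3.17        # grade points gained per additional hour of study (from the slide)

def predict_final_grade(hours_per_week: float) -> float:
    """Predicted final grade = 59.95 + 3.17 * hours of study."""
    return INTERCEPT + SLOPE * hours_per_week

for hours in (1, 12):
    print(f"{hours} hours -> predicted grade {predict_final_grade(hours):.2f}")
```

The plot on the previous slide appears to cover roughly 2 to 10 hours of study, so predictions for 1 or 12 hours extend the line beyond the plotted range and should be read with some caution.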
![Page 27: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/27.jpg)
An additional way to interpret Pearson r
• Coefficient of determination
– r²
– The proportion of the variability of Y accounted for by X
[Figure: overlapping circles for X and the variability of Y; the area of overlap represents the proportion of variability of Y accounted for by X (value is expressed as a %)]
![Page 28: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/28.jpg)
Exercise
A sample of 6 persons was selected. The values of their age (x variable) and their weight are shown in the following table. Find the regression equation and the predicted weight when age is 8.5 years.
![Page 29: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/29.jpg)
| Serial no. | Age (x) | Weight (y) |
|---|---|---|
| 1 | 7 | 12 |
| 2 | 6 | 8 |
| 3 | 8 | 12 |
| 4 | 5 | 10 |
| 5 | 6 | 11 |
| 6 | 9 | 13 |
![Page 30: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/30.jpg)
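Answer

| Serial no. | Age (x) | Weight (y) | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 7 | 12 | 84 | 49 | 144 |
| 2 | 6 | 8 | 48 | 36 | 64 |
| 3 | 8 | 12 | 96 | 64 | 144 |
| 4 | 5 | 10 | 50 | 25 | 100 |
| 5 | 6 | 11 | 66 | 36 | 121 |
| 6 | 9 | 13 | 117 | 81 | 169 |
| Total | 41 | 66 | 461 | 291 | 742 |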
![Page 31: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/31.jpg)
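$$\bar{x} = \frac{41}{6} = 6.83, \qquad \bar{y} = \frac{66}{6} = 11$$

$$b = \frac{461 - \dfrac{41 \times 66}{6}}{291 - \dfrac{(41)^2}{6}} = 0.92$$

Regression equation

$$\hat{y}_{(x)} = 11 + 0.92\,(x - 6.83)$$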
![Page 32: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/32.jpg)
$$\hat{y}_{(x)} = 4.675 + 0.92x$$

$$\hat{y}_{(8.5)} = 4.675 + 0.92 \times 8.5 = 12.50 \text{ kg}$$

$$\hat{y}_{(7.5)} = 4.675 + 0.92 \times 7.5 = 11.58 \text{ kg}$$
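The same answer can be worked from the column totals in the table using the computational formula for the slope. This is a sketch with our own variable names, carrying full precision rather than the rounded values on the slide.

```python
n = 6
sum_x, sum_y = 41, 66       # totals of age and weight
sum_xy, sum_x2 = 461, 291   # totals of xy and x squared

mean_x = sum_x / n
mean_y = sum_y / n

# Slope from raw sums: (Sxy - Sx*Sy/n) / (Sx2 - Sx^2/n)
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = mean_y - b * mean_x     # intercept

def predict_weight(age: float) -> float:
    """Predicted weight (kg) for a given age (years)."""
    return a + b * age

print(f"b = {b:.3f}, a = {a:.3f}")
print(f"predicted weight at 8.5 years: {predict_weight(8.5):.2f} kg")
```

Without intermediate rounding the intercept is about 4.69 and the prediction at 8.5 years about 12.54 kg. The slide's 4.675 and 12.50 kg differ slightly, presumably because of rounding in the hand calculation.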
![Page 33: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/33.jpg)
[Figure: regression line; x-axis: Age (in years), 7 to 9; y-axis: Weight (in kg), 11.4 to 12.6]
We create a regression line by plotting two estimated values for y against their x values, then extending the line to the right and left.
![Page 34: Day 10 prediction and regression](https://reader033.fdocuments.net/reader033/viewer/2022052413/55a201ec1a28ab4d268b4684/html5/thumbnails/34.jpg)
Data: SPSS Outcome for Grammar Test and Speaking Test

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| Students' speaking scores | 72.25 | 8.656 | 20 |
| Students' grammar scores | 64.50 | 10.748 | 20 |

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .621 (a) | .385 | .351 | 6.972 |

a. Predictors: (Constant), Students' grammar scores

Coefficients (a)

| Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 40.000 | 9.724 | | 4.113 | .001 |
| Students' grammar scores | .500 | .149 | .621 | 3.360 | .003 |

a. Dependent Variable: Students' speaking scores
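The SPSS figures for the grammar and speaking task can be cross-checked from the 20 score pairs in the task table. The helper `simple_regression` is our own illustration, not part of SPSS. It computes the standard error of estimate with n − 2 in the denominator, the convention that reproduces the Model Summary value.

```python
import math

grammar = [70, 75, 60, 55, 45, 65, 65, 65, 45, 60,
           55, 75, 80, 70, 75, 85, 70, 60, 60, 55]   # X
speaking = [80, 90, 70, 65, 65, 75, 70, 80, 60, 65,
            60, 70, 75, 60, 80, 85, 75, 65, 75, 80]  # Y

def simple_regression(x, y):
    """Return slope, intercept, r, and standard error of estimate (n - 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    r = sp_xy / math.sqrt(ss_x * ss_y)
    ss_residual = ss_y - slope * sp_xy   # variation in Y not explained by the line
    se_estimate = math.sqrt(ss_residual / (n - 2))
    return {"b": slope, "a": intercept, "r": r, "se": se_estimate}

result = simple_regression(grammar, speaking)
print({name: round(value, 3) for name, value in result.items()})
```

The output lines up with the tables above: B = .500, Constant = 40.000, R = .621, and Std. Error of the Estimate = 6.972.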