Presentation on Regression Analysis
Presentation on Chapter 9
Presented by
Dr. J. P. Verma
MSc (Statistics), PhD, MA (Psychology), Masters (Computer Application)
Professor (Statistics), Lakshmibai National Institute of Physical Education, Gwalior, India (Deemed University)
Email: [email protected]
2
What is Regression?
Going back to the original
Why use it? To answer questions like:
Can I predict the fat % on the basis of the skinfolds?
What will be the weight of a person if the height is 175 cm?
3
Purpose of Regression Analysis
To predict a phenomenon
which has not occurred so far
which is difficult to measure in a field situation
which should occur for a particular value of the independent variable
4
Types of Regression Analysis
Simple Regression
Multiple Regression
5
This Presentation is based on
Chapter 9 of the book
Sports Research with Analytical Solution Using SPSS
Published by Wiley, USA
The complete presentation can be accessed on the companion website of the book
For feedback write to [email protected]
6
Simple Regression
Developing a regression equation with one dependent and one independent variable
7
Mechanism of prediction in regression analysis
Develop an equation of a line between the Y (dependent) and X (independent) variables
$y = bx + c$
[Figure: the line $y = bx + c$ with intercept c; x-axis: Height; y-axis: Weight]
8
Application of Regression in Physical Education
In Physical Education, predicting:
Obesity
Coronary heart disease risk
Body mass index
Fitness status
In Sports:
Projection of winning medals
Estimating performance
Runs scored
Efficient prediction enhances success in sports
9
Procedure for Developing Regression Equations
How to find the regression line Y = bX + c?
Deviation method
Least square method
10
Deviation Method for Constructing Regression Equation
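Regression equation of Y on X
$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$ …………(1)
$Y = r \frac{\sigma_y}{\sigma_x} X + \left( \bar{Y} - r \frac{\sigma_y}{\sigma_x} \bar{X} \right)$, which has the form $Y = bX + c$
Regression equation of X on Y
$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$ …………(2)
Computing coefficients
b = regression coefficient (slope) = $r \frac{\sigma_y}{\sigma_x}$
c = intercept = $\bar{Y} - r \frac{\sigma_y}{\sigma_x} \bar{X}$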
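The deviation-method coefficients can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the function name `deviation_method` is hypothetical, and the data are the ten athletes' heights and lean body weights used in the illustration later in this presentation.

```python
import math

# Athletes' data from the illustration later in this presentation:
# height in cm (x) and lean body weight in lbs (y).
heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]


def deviation_method(x, y):
    """Return (b, c, r) for the regression of y on x: y = b*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    sd_x = math.sqrt(sxx / (n - 1))     # sample standard deviations
    sd_y = math.sqrt(syy / (n - 1))
    b = r * sd_y / sd_x                 # regression coefficient (slope)
    c = mean_y - b * mean_x             # intercept
    return b, c, r


b, c, r = deviation_method(heights, weights)
print(f"b = {b:.3f}, c = {c:.3f}, r = {r:.3f}")
```

On these data the slope and intercept come out close to the values 3.527 and -517.047 reported in the SPSS output later in the presentation.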
11
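Can these Two Equations be the Same?
Yes, if the slopes of the two equations are the same
$(y - \bar{y}) = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$ ------(1)
$(x - \bar{x}) = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$ ------(2)
After solving both for $(y - \bar{y})$:
$(y - \bar{y}) = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$ ------(3)
$(y - \bar{y}) = \frac{1}{r} \frac{\sigma_y}{\sigma_x} (x - \bar{x})$ ------(4)
Equations (3) and (4) would be the same if $r \frac{\sigma_y}{\sigma_x} = \frac{1}{r} \frac{\sigma_y}{\sigma_x}$
that is, $r^2 = 1$ or $r = \pm 1$
Implication: if the relationship between two variables is either perfectly positive or perfectly negative, one can be estimated from the other with 100% accuracy, which is rarely the case.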
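The condition for the two regression lines to coincide can be checked numerically. The sketch below relies on the fact that the product of the two slopes (y on x, and x on y) equals r², so the lines coincide only when that product is 1. The function names and the small data sets are hypothetical.

```python
def slopes(x, y):
    """Return (b_yx, b_xy): slope of y on x and slope of x on y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / syy


def same_line(x, y, tol=1e-9):
    """Both lines pass through the means, so they coincide when
    b_yx * b_xy (which equals r squared) is 1."""
    b_yx, b_xy = slopes(x, y)
    return abs(b_yx * b_xy - 1.0) < tol


# Hypothetical data: a perfect positive, a perfect negative, and a noisy relation.
x = [1, 2, 3, 4]
perfect_positive = same_line(x, [3, 5, 7, 9])   # r = +1
perfect_negative = same_line(x, [9, 7, 5, 3])   # r = -1
noisy = same_line(x, [3, 5, 6, 9])              # |r| < 1
print(perfect_positive, perfect_negative, noisy)
```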
12
Precaution in Regression Analysis
Regression focuses on association and not causation
But association is a necessary prerequisite for inferring causation
The independent variable must precede the dependent variable in time
The dependent and independent variables must be plausibly linked by a theory
13
Least Square Method for Constructing Regression Equation
Uses the concept of differential calculus
For N population points (x1, y1), (x2, y2), ..., (xN, yN) an aggregate trend line can be obtained:
$\hat{y} = \beta_0 + \beta_1 x$
where
$\hat{y}$ : the estimated value of y
$\beta_0$ : the population intercept (regression constant)
$\beta_1$ : the population slope (regression coefficient)
For a particular score $y_i$:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Regression lines are almost always developed on the basis of sample data, hence $\beta_0$ and $\beta_1$ are estimated by the sample intercept $b_0$ and the sample slope $b_1$
14
Regression Line with Least Square Method
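An infinite number of trend lines can be developed by changing the slope $b_1$ and the intercept $b_0$
$y_i = b_0 + b_1 x_i + \varepsilon_i$
For n sample data points the aggregate regression line is $\hat{y} = b_0 + b_1 x$
[Figure: the line $\hat{y} = b_0 + b_1 x$ with intercept $b_0$, plotted on x and y axes]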
15
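Regression Line with Least Square Method
What is the issue? To find the best line so that the sum of squared deviations is minimized
For a particular point (x1, y1) in the scattergram
$y_1 = b_0 + b_1 x_1 + \varepsilon_1$
or
$\varepsilon_i = y_i - \hat{y}_i$
To get the best line, $S^2 = \sum \varepsilon_i^2$ needs to be minimized
$S^2 = \sum \varepsilon_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$ - A least square method
[Figure: scattergram with the line $\hat{y} = b_0 + b_1 x$, its intercept $b_0$, and the deviation $y_i - \hat{y}_i$ of a point from the line]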
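To make the idea of "the best among infinitely many lines" concrete, the sketch below evaluates S² for a few candidate lines on a small hypothetical data set. The data and the candidate lines are invented for illustration; the first candidate is the least-squares line for these points.

```python
def sum_squared_deviations(x, y, b0, b1):
    """S^2 = sum of (y_i - b0 - b1*x_i)^2 for the line y = b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (not from the presentation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Candidate (intercept, slope) pairs; (2.2, 0.6) is the least-squares line.
candidates = [(2.2, 0.6), (2.0, 0.6), (2.2, 0.7), (1.0, 1.0)]

scores = {line: sum_squared_deviations(x, y, *line) for line in candidates}
best = min(candidates, key=lambda line: scores[line])

for line in candidates:
    print(line, round(scores[line], 3))
print("smallest S^2 at", best)
```

Any change to the intercept or slope of the least-squares line increases S², which is what the minimization on the next slide formalizes.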
16
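What do we do in the Least Square method?
Find the values of the intercept ($b_0$) and slope ($b_1$) for which $S^2$ is minimized
This is done by using differential calculus
$S^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$
$\frac{\partial S^2}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$
$\frac{\partial S^2}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0$
Solving, we get the normal equations
$n b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$
$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
$b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - (\sum x)^2}$
$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$\hat{y} = b_0 + b_1 x$ - A line of best fit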
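The closed-form solution of the normal equations translates directly into code. The following is a minimal sketch using raw sums; the function name `least_squares_line` is hypothetical, and the data are the athletes' heights and lean body weights from the illustration later in this presentation.

```python
def least_squares_line(x, y):
    """Solve the two normal equations for (b0, b1) using raw sums."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sum_xx - sum_x ** 2      # zero only if all x are equal
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    return b0, b1


# Height in cm (x) and lean body weight in lbs (y) for the ten athletes.
heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]

b0, b1 = least_squares_line(heights, weights)
print(f"weight = {b0:.3f} + {b1:.3f} * height")
```

For larger or badly scaled data, formulas based on deviations from the means are usually numerically safer than raw sums; the raw-sum form is kept here because it mirrors the slide.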
17
Assumptions in Regression Analysis
Data must be parametric
There are no outliers in the data
Variables are normally distributed (if not, try log, square root, square, and inverse transformations)
The regression model is linear in nature
The errors are independent (no autocorrelation)
The error terms are normally distributed
There is no multicollinearity
The error has a constant variance (assumption of homoscedasticity)
18
Regression Analysis -An Illustration
Athletes data
_________________________
Height        LBW
(in cm)       (in lbs)
(x)           (y)
_________________________
191           162.5
186           136
191.5         163.5
188           154
190           149
188.5         140.5
193           157.3
190.5         154.5
189           151.5
192           160.5
_________________________
19
SPSS Commands for Regression Analysis
Analyze > Regression > Linear
After selecting variables
Click the tag Statistics on the screen
Check the boxes of
R squared change
Descriptive
Part and partial correlations
Press Continue
Click the Method option and select any one of the following options
Stepwise
Enter
Forward
Backward
Press OK for output
20
Different Methods Used in Regression Analysis
Stepwise: variables selected at a particular stage are tested for significance at every stage
Enter: all variables are selected for developing the regression equation
Forward: variables once selected at a particular stage are retained in the model in subsequent stages
Backward: all variables are used to develop the regression model and then the variables are dropped one by one depending upon their low predictability
21
Output of Regression Analysis in SPSS
Model summary
ANOVA table showing F-values for all the models
Regression coefficients and their significance
22
SPSS output – Model Summary
Model Summary (b)
_____________________________________________________________________
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
_____________________________________________________________________
1        .816    .666        .624                 5.56
_____________________________________________________________________
a. Predictors: (Constant), Height
b. Dependent Variable: Body Weight
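The quantities in the model summary can be reproduced from the athletes' data. The sketch below assumes the usual definitions for a model with k predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), and standard error of the estimate = sqrt(SS_residual/(n - k - 1)). The function name `model_summary` is hypothetical.

```python
import math

heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]


def model_summary(x, y):
    """Return R, R Square, Adjusted R Square and Std. Error of the Estimate
    for a simple regression of y on x."""
    n = len(x)
    k = 1                                  # one predictor (height)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_regression = sxy ** 2 / sxx         # explained sum of squares
    ss_residual = syy - ss_regression      # unexplained sum of squares
    r_square = ss_regression / syy
    adjusted = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    std_error = math.sqrt(ss_residual / (n - k - 1))
    return {
        "R": math.sqrt(r_square),          # |r| in simple regression
        "R Square": r_square,
        "Adjusted R Square": adjusted,
        "Std. Error of the Estimate": std_error,
    }


summary = model_summary(heights, weights)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```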
23
SPSS output Testing Regression coefficients
Regression analysis output for the Body weight example
______________________________________________________________________
                 Unstandardized Coefficients    Standardized Coefficients
Model            B            Std. Error        Beta        t         Sig.
______________________________________________________________________
1  (Constant)    -517.047     167.719                       -3.083    .015
   Height        3.527        .883              .816        3.995     .004
______________________________________________________________________
Dependent Variable: Body weight
R = 0.816, R² = 0.666, Adjusted R² = 0.624
Look at the value of t computed in the last slide and in the SPSS output
Y(Weight) = -517.047 + 3.527 × (Height)
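The standard errors and t values in the coefficients table follow from the residual variance. The sketch below assumes the standard simple-regression formulas SE(b1) = s/sqrt(Sxx) and SE(b0) = s·sqrt(1/n + x̄²/Sxx), where s is the standard error of the estimate. The function name is hypothetical, and the Sig. column is not computed because the t distribution is not available in the Python standard library.

```python
import math

heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]


def coefficient_tests(x, y):
    """Return coefficients, their standard errors and t values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt((syy - b1 * sxy) / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return {
        "b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
        "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
    }


result = coefficient_tests(heights, weights)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```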
24
SPSS Output - ANOVA table
$F = t^2 = 3.995^2 = 15.96$
In simple regression the significance of the regression coefficient and of the model are the same
Significance of the model is tested by the F value in ANOVA
ANOVA table
___________________________________________________________________
Model            Sum of Squares    df    Mean Square    F         Sig.
___________________________________________________________________
2  Regression    494.203           1     494.203        15.959    .004
   Residual      247.738           8     30.967
   Total         741.941           9
___________________________________________________________________
a. Predictors: (Constant), Height
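The ANOVA decomposition and the identity F = t² can be verified on the athletes' data. This is a sketch with a hypothetical function name; it splits the total sum of squares into regression and residual parts and compares F with the squared t value of the slope.

```python
import math

heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]


def anova_table(x, y):
    """Return sums of squares, degrees of freedom, F, and the slope's t value."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    ss_regression = b1 * sxy
    ss_residual = ss_total - ss_regression
    df_regression, df_residual = 1, n - 2
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_value = ms_regression / ms_residual
    t_slope = b1 / math.sqrt(ms_residual / sxx)
    return {
        "ss_regression": ss_regression, "ss_residual": ss_residual,
        "ss_total": ss_total, "df_residual": df_residual,
        "ms_residual": ms_residual, "F": f_value, "t_slope": t_slope,
    }


table = anova_table(heights, weights)
print(f"F = {table['F']:.3f}, t^2 = {table['t_slope'] ** 2:.3f}")
```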
25
Testing Assumptions of Regression Analysis with Residuals
Let's see what a residual is
26
Analyzing Residuals
Table: Computation of residuals
______________________________________________________________
Height (in cm)    Body weight (in lbs)    Predicted     Residual
x                 y                       $\hat{y}$     $y - \hat{y}$
______________________________________________________________
191               162.5                   156.61        5.89
186               136                     138.975       -2.975
191.5             163.5                   158.3735      5.1265
188               154                     146.029       7.971
190               149                     153.083       -4.083
188.5             140.5                   147.7925      -7.2925
193               157.3                   163.664       -6.364
190.5             154.5                   154.8465      -0.3465
189               151.5                   149.556       1.944
192               160.5                   160.137       0.363
______________________________________________________________
Residuals are estimates of experimental errors
For instance, for x = 188, $\hat{y}$ = -517.047 + 3.527 × 188 = 146.029
Maximum error (worst case): 7.971 lbs for height = 188 cm
Minimum error (best case): 0.3465 lbs in absolute value for height = 190.5 cm
Useful in identifying outliers
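The residual table can be reproduced with the rounded coefficients from the SPSS output. The sketch below computes the predicted weight and residual for each athlete and then picks out the worst and best cases by absolute residual; the variable names are hypothetical.

```python
# Rounded coefficients from the SPSS output: weight = B0 + B1 * height.
B0, B1 = -517.047, 3.527

heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]

predicted = [B0 + B1 * h for h in heights]
residuals = [w - p for w, p in zip(weights, predicted)]

# Pair each height with its residual, then rank by absolute size.
pairs = list(zip(heights, residuals))
worst = max(pairs, key=lambda pair: abs(pair[1]))
best = min(pairs, key=lambda pair: abs(pair[1]))

for h, w, p, e in zip(heights, weights, predicted, residuals):
    print(f"{h:6} {w:7} {p:9.4f} {e:8.4f}")
print("worst case:", worst[0], "cm; best case:", best[0], "cm")
```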
27
Residual Plot
[Figure: Residual plot for the data on lean body mass and height; x-axis: Height in cms (184 to 194); y-axis: Residuals (-10 to 8)]
Obtained by plotting the ordered pairs $(x_i, y_i - \hat{y}_i)$
Useful in testing the assumptions in the regression analysis
28
How to test linearity of regression model?
[Figure: Residual plot for a curvilinear regression model; x-axis: Independent variable x; y-axis: Residuals, centered at 0]
For low and high values of x the residuals are positive
and for middle values they are negative
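The sign pattern described for a curvilinear relationship is easy to reproduce. The sketch below fits a straight line to hypothetical data generated from y = x² and shows that the residuals are positive at both ends and negative in the middle; all names and data are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical curved data: y = x^2 on a symmetric range.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
signs = ["+" if e > 0 else "-" if e < 0 else "0" for e in residuals]
print(residuals)
print(" ".join(signs))
```

A systematic run of signs like this, rather than a random scatter around zero, is the visual cue that a straight line is the wrong functional form.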
29
How to test the Independence of errors?
[Figure: Residual plot showing that the errors are related; x-axis: Independent variable x; y-axis: Residuals, centered at 0]
No serial correlation should occur between a given error term and itself over various time intervals
What is the pattern? A small positive residual occurs next to a small positive residual, and a large positive residual occurs next to a large positive residual
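The slides judge independence visually. One common numeric companion, not named in the presentation and shown here only as an assumption about what a reader might use, is the Durbin-Watson statistic: the sum of squared successive differences of the residuals divided by the sum of squared residuals. Values near 2 suggest no first-order serial correlation, values well below 2 suggest positive correlation, and values well above 2 suggest negative correlation. The residual series below are hypothetical and must be in time (or x) order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time (or x) order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: neighbours resemble each other (related errors).
drifting = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]
# Hypothetical residuals: the sign flips at every step.
alternating = [1, -1, 1, -1, 1, -1]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(round(dw_drifting, 3), round(dw_alternating, 3))
```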
30
How to test the normality of error terms?
Normal Q-Q plot of the residuals
For the errors to be normally distributed, all the points should be very close to the straight line
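The points of a normal Q-Q plot can be computed without plotting software. The sketch below pairs the sorted standardized residuals with theoretical standard-normal quantiles; the plotting positions (i - 0.5)/n are an assumption (SPSS and other packages may use a different convention), and the correlation between the two columns is offered only as a rough indicator of how close the points lie to a straight line. The residuals are those from the body weight example.

```python
import math
from statistics import NormalDist

# Residuals from the body weight example (see the residual table).
residuals = [5.89, -2.975, 5.1265, 7.971, -4.083,
             -7.2925, -6.364, -0.3465, 1.944, 0.363]


def qq_points(values):
    """Return (theoretical quantile, standardized residual) pairs."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    ordered = sorted((v - mean) / sd for v in values)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def correlation(pairs):
    """Pearson correlation between the two coordinates of the pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    saa = sum((a - mean_a) ** 2 for a, _ in pairs)
    sbb = sum((b - mean_b) ** 2 for _, b in pairs)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    return sab / math.sqrt(saa * sbb)


points = qq_points(residuals)
qq_r = correlation(points)
print(f"Q-Q correlation: {qq_r:.3f}")
```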
31
How to test that errors have constant variance - A test of homoscedasticity
[Figure: Residual plot showing unequal error variance; x-axis: Independent variable x; y-axis: Residuals, centered at 0]
For the homoscedasticity assumption to hold true, variations among the error terms should be similar at different points of x.
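A rough numeric counterpart to the visual check is to compare the spread of the residuals at low and high values of x. The sketch below is an informal illustration in the spirit of a split-sample variance comparison, not a formal test and not a procedure from the presentation; the function name and both data sets are hypothetical.

```python
def spread_ratio(x, residuals):
    """Mean squared residual in the upper half of x divided by that in the
    lower half. A ratio far from 1 hints at unequal error variance.
    Assumes the lower half does not consist entirely of zero residuals."""
    pairs = sorted(zip(x, residuals))          # order points by x
    half = len(pairs) // 2                     # middle point dropped if n is odd
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    mean_sq_lower = sum(e ** 2 for e in lower) / len(lower)
    mean_sq_upper = sum(e ** 2 for e in upper) / len(upper)
    return mean_sq_upper / mean_sq_lower


x = list(range(1, 11))
# Hypothetical fan-shaped residuals: spread grows with x.
fan = [0.1, -0.1, 0.2, -0.2, 0.1, 2, -3, 4, -5, 6]
# Hypothetical residuals with the same spread everywhere.
even = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

ratio_fan = spread_ratio(x, fan)
ratio_even = spread_ratio(x, even)
print(round(ratio_fan, 1), round(ratio_even, 1))
```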
32
Healthy residual Plot
Holds all the assumptions of regression analysis:
The regression model is linear in nature
The errors are independent
The error terms are normally distributed
The error has a constant variance
[Figure 6.9: Healthy residual plot; x-axis: Independent variable x; y-axis: Residuals, centered at 0]
33
Different ways of testing a regression model
Analyzing residuals
Residual plot
Standard error of estimate
Testing significance of slopes
Testing the significance of the overall model
Coefficient of determination (R²)
34