Presentation on Regression Analysis

Post on 15-Apr-2017


Presentation on Regression Analysis

Presentation on Chapter 9

Presented by

Dr. J. P. Verma, MSc (Statistics), PhD, MA (Psychology), Masters (Computer Application)

Professor (Statistics), Lakshmibai National Institute of Physical Education,

Gwalior, India (Deemed University)

Email: vermajprakash@gmail.com

2

What is Regression?

Literally, "going back to the original."

Why use it? To answer questions like:

Can I predict the fat % on the basis of the skinfolds?

What will be the weight of a person whose height is 175 cms?

3

Purpose of Regression Analysis

To predict a phenomenon:
which has not occurred so far,
which is difficult to measure in a field situation, or
which should occur for a particular value of the independent variable.

4

Types of Regression Analysis

Simple Regression

Multiple Regression

5

This Presentation is based on

Chapter 9 of the book

Sports Research with Analytical Solution Using SPSS

Published by Wiley, USA

Complete Presentation can be accessed on

Companion Website

of the Book

Request an evaluation copy. For feedback, write to vermajprakash@gmail.com

6

Simple Regression

Developing a regression equation with one dependent and one independent variable

7

Mechanism of prediction in regression analysis

Develop the equation of a line between the Y (dependent) and X (independent) variables:

y = bx + c

(Figure: scatter of Weight (y) against Height (x) with the fitted line y = bx + c; c is the intercept on the y-axis.)

8

Application of Regression in Physical Education

In Physical Education: predicting obesity, coronary heart disease risk, body mass index, and fitness status.

In Sports: projection of winning medals, estimating performance, and runs scored.

Efficient prediction enhances success in sports.

9

Procedure for Developing Regression Equations

How to find the regression line Y = bX + c?

Two procedures: the deviation method and the least squares method.

10

Deviation Method for Constructing Regression Equation

Regression equation of Y on X

)XX(rYYx

y

YXrXrYx

y

x

y

cbXY

Regression equation of X on Y

)YY(rXXy

x

b = regression coefficient x

yr

c= slope YXr

x

y

…………(1)

…………(2)

Computing coefficients
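The deviation-method coefficients can be checked numerically against a direct least-squares fit; a minimal sketch (the data here are made up for illustration):

```python
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
sx, sy = x.std(), y.std()        # standard deviations of x and y

# Deviation method: slope b = r * sy / sx, intercept c = ybar - b * xbar
b = r * sy / sx
c = y.mean() - b * x.mean()

# Direct least-squares fit for comparison (returns slope, intercept)
b_ls, c_ls = np.polyfit(x, y, 1)

print(b, c)  # both methods give the same line
```

Both routes produce identical coefficients, since b = r σy/σx is algebraically the same as the least-squares slope.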

11

Can these Two Equations be the Same? Yes, if the slopes of the two equations are the same.

(y − ȳ) = r (σy/σx) (x − x̄)   ------(1)

(x − x̄) = r (σx/σy) (y − ȳ)   ------(2)

Writing both as equations for y:

from (1), the slope is r σy/σx   ------(3)

from (2), rewritten with y as the subject, the slope is σy/(r σx)   ------(4)

Equations (3) and (4) would be the same if r σy/σx = σy/(r σx), i.e., if

r² = 1, or r = ±1

Implication: if the relationship between two variables is either perfectly positive or perfectly negative, one variable can be estimated from the other with 100% accuracy, which is rarely the case.
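The r = ±1 case can be demonstrated numerically: with perfectly linearly related (hypothetical) data, the slopes from equations (3) and (4) coincide. A minimal sketch:

```python
import numpy as np

# Hypothetical, perfectly linearly related data, so r = 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

r = np.corrcoef(x, y)[0, 1]
sx, sy = x.std(), y.std()

slope_eq3 = r * sy / sx          # slope from equation (3): regression of y on x
slope_eq4 = sy / (r * sx)        # slope from equation (4): x-on-y line solved for y

print(r, slope_eq3, slope_eq4)   # r = 1, the two slopes agree
```

For any |r| < 1 the two slopes differ, so the two regression lines are distinct.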

12

Precaution in Regression Analysis

But association is a necessary prerequisite for inferring causation

The independent variable must precede the dependent variable in time.

The dependent and independent variables must be plausibly lined by a theory

Regression focuses on association and not causation

13

Least Square Method for Constructing Regression Equation

Uses the concepts of differential calculus.

For N population points (x1, y1), (x2, y2), …, (xN, yN) an aggregate trend line can be obtained:

ŷ = β0 + β1 x

where
ŷ : the estimated value of y
β0 : the population intercept (regression constant)
β1 : the population slope (regression coefficient)

For a particular score yi:

yi = β0 + β1 xi + εi

Regression lines are almost always developed on the basis of sample data; hence β0 and β1 are estimated by the sample intercept b0 and slope b1.

14

Regression Line with Least Square Method

An infinite number of trend lines can be developed by changing the slope b1 and intercept b0:

yi = b0 + b1 xi + ei

For n sample data points the aggregate regression line is

ŷ = b0 + b1 x

(Figure: scatter of (x, y) points with a candidate line ŷ = b0 + b1 x; b0 is the intercept on the y-axis.)

15

Regression Line with Least Square Method

What is the issue? To find the best line, i.e., the one for which the sum of squared deviations is minimized.

For a particular point (x1, y1) in the scattergram:

y1 = b0 + b1 x1 + e1,  or  e1 = y1 − ŷ1

To get the best line, the sum

S² = Σ ei² = Σ (yi − ŷi)² = Σ (yi − b0 − b1 xi)²

needs to be minimized — the least squares method.

(Figure: scatter of points with a candidate line ŷ = b0 + b1 x, intercept b0, and the vertical deviation ei = yi − ŷi marked for one point.)

16

What do we do in the Least Square method?

Find the values of the intercept (b0) and slope (b1) for which S² is minimized. This is done using differential calculus:

S² = Σ (yi − ŷi)² = Σ (yi − b0 − b1 xi)²

∂S²/∂b0 = −2 Σ (yi − b0 − b1 xi) = 0

∂S²/∂b1 = −2 Σ xi (yi − b0 − b1 xi) = 0

Solving, we get the normal equations:

b0 n + b1 Σ xi = Σ yi

b0 Σ xi + b1 Σ xi² = Σ xi yi

which give

b1 = [n Σ xi yi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²]

b0 = [(Σ yi)(Σ xi²) − (Σ xi)(Σ xi yi)] / [n Σ xi² − (Σ xi)²] = ȳ − b1 x̄

ŷ = b0 + b1 x — a line of best fit.
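The two normal equations form a 2×2 linear system in b0 and b1, which can be solved directly; a minimal sketch on hypothetical data, checked against the closed-form formulas above:

```python
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)

# Normal equations as a 2x2 linear system:
#   n*b0       + (sum x)*b1   = sum y
#   (sum x)*b0 + (sum x^2)*b1 = sum x*y
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, rhs)

# Same answer from the closed-form formulas on the slide
b1_closed = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
b0_closed = y.mean() - b1_closed * x.mean()

print(b0, b1)
```

Either route yields the same line of best fit ŷ = b0 + b1 x.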

17

Assumptions in Regression Analysis

Data must be parametric.
There are no outliers in the data.
The variables are normally distributed (if not, try log, square root, square, and inverse transformations).
The regression model is linear in nature.
The errors are independent (no autocorrelation).
The error terms are normally distributed.
There is no multicollinearity.
The errors have a constant variance (assumption of homoscedasticity).
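The independence-of-errors assumption (no autocorrelation) can be checked numerically with the Durbin–Watson statistic, which ranges from 0 to 4 with values near 2 suggesting no first-order autocorrelation. A minimal sketch (the residual values here are made up for illustration):

```python
import numpy as np

def durbin_watson(residuals):
    # DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); ~2 indicates independent errors
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals from some fitted regression
e = np.array([0.5, 0.3, -0.4, 0.9, -0.7, -0.2, 0.6, -0.1])
dw = durbin_watson(e)
print(round(dw, 3))
```

Values well below 2 suggest positive serial correlation among the errors; values well above 2 suggest negative serial correlation.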

18

Regression Analysis – An Illustration

Athletes data
_________________
Height     LBW
(in cms)   (in lbs)
(x)        (y)
_________________
191        162.5
186        136
191.5      163.5
188        154
190        149
188.5      140.5
193        157.3
190.5      154.5
189        151.5
192        160.5
_________________
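The closed-form least-squares formulas can be applied directly to these athlete data; the coefficients reproduce the SPSS output shown on the later slides (b1 ≈ 3.527, b0 ≈ −517.047, R² ≈ 0.666). A sketch:

```python
import numpy as np

# Athletes data from the slide: height (cms) and lean body weight (lbs)
x = np.array([191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192])
y = np.array([162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5])
n = len(x)

# Closed-form least-squares slope and intercept
b1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
b0 = y.mean() - b1 * x.mean()

r = np.corrcoef(x, y)[0, 1]   # correlation, so r**2 is the coefficient of determination
print(round(b1, 3), round(b0, 3), round(r, 3), round(r ** 2, 3))
```

This matches the SPSS regression coefficients and model summary for the same data.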

19

SPSS Commands for Regression Analysis

Analyze → Regression → Linear

After selecting the variables:

Click the tab Statistics on the screen and check the boxes for
R squared change, Descriptives, and Part and partial correlations.

Press Continue.

Click the Method option and select any one of the following:
Enter, Stepwise, Forward, Backward

Press OK for the output.

20

Different Methods Used in Regression Analysis

Stepwise: a variable selected at a particular stage is tested for its significance at every stage.

Enter: all variables are selected for developing the regression equation.

Forward: a variable, once selected at a particular stage, is retained in the model in subsequent stages.

Backward: all variables are used to develop the regression model, and then variables are dropped one by one depending upon their low predictability.

21

Output of Regression Analysis in SPSS

Model summary
ANOVA table showing F-values for all the models
Regression coefficients and their significance

22

SPSS output – Model Summary

Model Summary(b)
_________________________________________________
Model   R      R Square   Adjusted    Std. Error of
                          R Square    the Estimate
1       .816   .666       .624        5.56
_________________________________________________
a. Predictors: (Constant), Height
b. Dependent Variable: Body Weight

23

SPSS Output – Testing Regression Coefficients

Regression analysis output for the Body weight example
______________________________________________________________________
                Unstandardized          Standardized
                Coefficients            Coefficients
Model           B           Std. Error  Beta           t        Sig.
______________________________________________________________________
1  (Constant)   -517.047    167.719                    -3.083   .015
   Height          3.527       .883     .816            3.995   .004
______________________________________________________________________
Dependent Variable: Body weight
R = 0.816   R² = 0.666   Adjusted R² = 0.624

Look at the value of t computed in the SPSS output.

The fitted equation: Ŷ(Weight) = −517.047 + 3.527 × (Height)

24

SPSS Output - ANOVA table

The significance of the model is tested by the F value in ANOVA. In simple regression the significance of the regression coefficient and of the model are the same:

F = t² = 3.995² = 15.96

ANOVA table
___________________________________________________________________
Model           Sum of Squares   df   Mean Square      F      Sig.
___________________________________________________________________
1  Regression       494.203       1      494.203    15.959    .004
   Residual         247.738       8       30.967
   Total            741.941       9
___________________________________________________________________
a. Predictors: (Constant), Height
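The ANOVA decomposition and the F = t² identity can be verified directly from the athlete data; a sketch:

```python
import numpy as np

# Athletes data (height in cms, lean body weight in lbs)
x = np.array([191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192])
y = np.array([162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)     # least-squares slope and intercept
y_hat = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)       # Total SS
ss_residual = np.sum((y - y_hat) ** 2)       # Residual SS
ss_regression = ss_total - ss_residual       # Regression SS

df_reg, df_res = 1, n - 2
F = (ss_regression / df_reg) / (ss_residual / df_res)

# t statistic for the slope: b1 / SE(b1)
se_b1 = np.sqrt((ss_residual / df_res) / np.sum((x - x.mean()) ** 2))
t = b1 / se_b1

print(round(F, 3), round(t ** 2, 3))   # in simple regression F equals t^2
```

The sums of squares reproduce the ANOVA table (494.203, 247.738, 741.941) and F ≈ 15.96.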

25

Testing Assumptions of Regression Analysis with Residuals

Let's see what a residual is.

26

Analyzing Residuals

Table: Computation of residuals
________________________________________________________
Height       Body weight     Predicted     Residual
(in cms) x   (in lbs) y      ŷ             (y − ŷ)
________________________________________________________
191          162.5           156.61          5.89
186          136             138.975        -2.975
191.5        163.5           158.3735        5.1265
188          154             146.029         7.971
190          149             153.083        -4.083
188.5        140.5           147.7925       -7.2925
193          157.3           163.664        -6.364
190.5        154.5           154.8465       -0.3465
189          151.5           149.556         1.944
192          160.5           160.137         0.363
________________________________________________________

Residuals (y − ŷ) are estimates of the experimental errors.

For instance, for x = 188: ŷ = −517.047 + 3.527 × 188 = 146.029.

Maximum error (worst case): 7.971 lbs for height = 188 cms.
Minimum error (best case): 0.3465 lbs for height = 190.5 cms.

Residuals are useful in identifying outliers.
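The residual table can be reproduced from the fitted equation given on the slides; a sketch that also picks out the worst and best cases:

```python
import numpy as np

# Athletes data and the fitted equation y_hat = -517.047 + 3.527 * x from the slides
x = np.array([191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192])
y = np.array([162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5])

y_hat = -517.047 + 3.527 * x
residuals = y - y_hat

worst = np.argmax(np.abs(residuals))   # worst case: largest absolute residual (height 188)
best = np.argmin(np.abs(residuals))    # best case: smallest absolute residual (height 190.5)

print(x[worst], round(residuals[worst], 4))
print(x[best], round(residuals[best], 4))
```

Unusually large residuals flagged this way are candidates for outliers.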

27

Residual Plot

(Figure: residual plot for the data on lean body mass and height — residuals from about −10 to +8 plotted against height in cms, from 184 to 194.)

Obtained by plotting the ordered pairs (xi, yi − ŷi).

Useful in testing the assumptions of the regression analysis.

28

How to test linearity of regression model?

(Figure: residual plot for a curvilinear regression model — residuals plotted against the independent variable x.)

For low and high values of x the residuals are positive, and for middle values they are negative. Such a systematic pattern in the residual plot indicates that the linearity assumption is violated and a curvilinear model may be appropriate.

29

How to test the Independence of errors?

(Figure: residual plot against the independent variable showing that the errors are related.)

No serial correlation should occur between a given error term and itself over various time intervals.

What is the pattern? A small positive residual occurs next to a small positive residual, and a large positive residual occurs next to a large positive residual — a sign that the errors are not independent.

30

How to test the normality of error terms?

Normal Q-Q plot of the residuals

For the errors to be normally distributed, all the points should lie very close to the straight line.

31

How to test that errors have constant variance - A test of homoscedasticity

(Figure: residual plot against the independent variable x showing unequal error variance — the spread of the residuals changes markedly across the range of x.)

For the homoscedasticity assumption to hold true, the variation among the error terms should be similar at different values of x.

32

Healthy residual Plot

(Figure 6.9: Healthy residual plot — residuals scattered randomly around zero, with similar spread, across the range of the independent variable x.)

A healthy residual plot holds all the assumptions of regression analysis:

The regression model is linear in nature.
The errors are independent.
The error terms are normally distributed.
The errors have constant variance.

33

Different ways of testing a regression model

Analyzing residuals
Residual plot
Standard error of estimate
Testing the significance of slopes
Testing the significance of the overall model
Coefficient of determination (R²)
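The standard error of estimate, √(SSE / (n − 2)), and the coefficient of determination can both be computed from the residuals of the athlete data; the values match the SPSS model summary (5.56 and 0.666). A sketch:

```python
import numpy as np

# Athletes data (height in cms, lean body weight in lbs)
x = np.array([191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192])
y = np.array([162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

sse = np.sum(residuals ** 2)
see = np.sqrt(sse / (n - 2))     # "Std. Error of the Estimate" in the SPSS model summary

r2 = 1 - sse / np.sum((y - y.mean()) ** 2)   # coefficient of determination
print(round(see, 2), round(r2, 3))
```

A smaller standard error of estimate means the observed points sit closer to the fitted line.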

34

To buy the book

Sports Research With Analytical Solutions Using SPSS

and all associated presentations, visit the companion website of the book.

The complete presentation is available on the companion website of the book.

Request an evaluation copy. For feedback, write to vermajprakash@gmail.com