Presentation on Regression Analysis


Transcript of Presentation on Regression Analysis

Page 1: Presentation on Regression Analysis

Presentation on Regression Analysis

Presentation on Chapter 9

Presented by

Dr. J.P. Verma, MSc (Statistics), PhD, MA (Psychology), Masters (Computer Application)

Professor (Statistics), Lakshmibai National Institute of Physical Education,

Gwalior, India (Deemed University)

Email: [email protected]

Page 2: Presentation on Regression Analysis

2

What is Regression?

Going back to the original

Why to use?

To answer questions like:

Can I predict the fat % on the basis of the skinfolds?

What will be the weight of the person if the height is 175 cms?

Page 3: Presentation on Regression Analysis

3

Purpose of Regression Analysis

To predict the phenomenon:

which has not occurred so far

which is difficult to measure in a field situation

which should occur for a particular value of the independent variable

Page 4: Presentation on Regression Analysis

4

Types of Regression Analysis

Simple Regression

Multiple Regression

Page 5: Presentation on Regression Analysis

5

This Presentation is based on

Chapter 9 of the book

Sports Research with Analytical Solution Using SPSS

Published by Wiley, USA

Complete Presentation can be accessed on

Companion Website

of the Book

Request an Evaluation Copy. For feedback write to [email protected]

Page 6: Presentation on Regression Analysis

6

Simple Regression

Developing a regression equation with

one dependent and one independent variable

Page 7: Presentation on Regression Analysis

7

Mechanism of prediction in regression analysis

Develop an equation of the line between Y (dependent) and X (independent) variables:

y = bx + c

[Figure: scatter plot of Weight (y) against Height (x) with the fitted straight line y = bx + c; c is the intercept]

Page 8: Presentation on Regression Analysis

8

Application of Regression in Physical Education

In Physical Education: predicting obesity, coronary heart disease risk, body mass index, fitness status

In Sports: projection of winning medals, estimating performance, runs scored

Efficient prediction enhances success in sports

Page 9: Presentation on Regression Analysis

9

Procedure for Developing Regression Equations

How to find the regression line?  Y = bX + c

Deviation method

Least squares method

Page 10: Presentation on Regression Analysis

10

Deviation Method for Constructing Regression Equation

Regression equation of Y on X

(Y - Ȳ) = r (σy / σx) (X - X̄)   …………(1)

i.e.  Y = r (σy / σx) X + [Ȳ - r (σy / σx) X̄],  which is of the form  Y = bX + c

Regression equation of X on Y

(X - X̄) = r (σx / σy) (Y - Ȳ)   …………(2)

where b = regression coefficient (slope) = r (σy / σx)

and c = regression constant (intercept) = Ȳ - r (σy / σx) X̄

Computing coefficients
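A minimal Python sketch of the deviation method, computing b and c from the means, standard deviations, and correlation of the two variables (illustrative code, not taken from the book):

    import statistics as st

    def deviation_method(x, y):
        """Regression of Y on X: slope b = r * Sy/Sx, constant c = Ybar - b * Xbar."""
        n = len(x)
        x_bar, y_bar = st.mean(x), st.mean(y)
        s_x, s_y = st.pstdev(x), st.pstdev(y)
        # Pearson correlation coefficient from the deviations
        r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n * s_x * s_y)
        b = r * s_y / s_x          # regression coefficient (slope)
        c = y_bar - b * x_bar      # regression constant (intercept)
        return b, c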

Page 11: Presentation on Regression Analysis

11

Can these Two Equations be the Same? Yes, if the slopes of the two equations are the same.

(y - ȳ) = r (σy / σx) (x - x̄)   ------(1)

(x - x̄) = r (σx / σy) (y - ȳ)   ------(2)

After solving (writing both as y on x):

(y - ȳ) = r (σy / σx) (x - x̄)   ------(3)

(y - ȳ) = (1/r) (σy / σx) (x - x̄)   ------(4)

Equations (3) and (4) would be the same if r (σy / σx) = (1/r) (σy / σx), i.e. r² = 1 or r = ±1

Implication: If the relationship between the two variables is either perfectly positive or perfectly negative, one can be estimated with the help of the other with 100% accuracy, which is rarely the case.

Page 12: Presentation on Regression Analysis

12

Precaution in Regression Analysis

Regression focuses on association and not causation.

But association is a necessary prerequisite for inferring causation.

The independent variable must precede the dependent variable in time.

The dependent and independent variables must be plausibly linked by a theory.

Page 13: Presentation on Regression Analysis

13

Least Square Method for Constructing Regression Equation

Uses the concept of differential calculus.

For N population points (x1, y1), (x2, y2), ……. (xN, yN) an aggregate trend line can be obtained:  ŷ = β0 + β1x

where ŷ : the estimated value of y

β0 : the population intercept (regression constant)

β1 : the population slope (regression coefficient)

For a particular score yi:  yi = β0 + β1xi + εi

Almost always regression lines are developed on the basis of sample data; hence β0 and β1 are estimated by the sample intercept b0 and slope b1.

Page 14: Presentation on Regression Analysis

14

Regression Line with Least Square Method

An infinite number of trend lines can be developed by changing the slope b1 and intercept b0.

For a particular score:  yi = b0 + b1xi + ei

For n sample data points the aggregate regression line is  ŷ = b0 + b1x

[Figure: scatter plot of y against x with the fitted line ŷ = b0 + b1x and intercept b0]

Page 15: Presentation on Regression Analysis

15

Regression Line with Least Square Method

What is the issue? To find the best line so that the sum of squared deviations is minimized.

For a particular point (x1, y1) in the scattergram:  y1 = b0 + b1x1 + e1,  or  e1 = y1 - ŷ1

To get the best line,  S² = Σ ei² = Σ (yi - ŷi)² = Σ (yi - b0 - b1xi)²  needs to be minimized - the least squares method.

[Figure: scattergram with the fitted line ŷ = b0 + b1x, intercept b0, and the vertical deviation ei = yi - ŷi shown for one point]

Page 16: Presentation on Regression Analysis

16

What do we do in the Least Squares method?

Find the values of the intercept (b0) and slope (b1) for which S² is minimized.

This is done by using differential calculus.

S² = Σ (yi - ŷi)² = Σ (yi - b0 - b1xi)²

∂S²/∂b0 = 0  ⇒  -2 Σ (yi - b0 - b1xi) = 0

∂S²/∂b1 = 0  ⇒  -2 Σ xi (yi - b0 - b1xi) = 0

Solving, we get the normal equations:

n b0 + b1 Σ xi = Σ yi

b0 Σ xi + b1 Σ xi² = Σ xi yi

which give

b1 = [n Σ xi yi - (Σ xi)(Σ yi)] / [n Σ xi² - (Σ xi)²]

b0 = [(Σ yi)(Σ xi²) - (Σ xi)(Σ xi yi)] / [n Σ xi² - (Σ xi)²]  =  ȳ - b1 x̄

ŷ = b0 + b1x  - a line of best fit
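A minimal Python sketch of these formulas, computing b1 and b0 directly from the raw sums (illustrative code, not taken from the book):

    def least_squares(x, y):
        """Slope b1 and intercept b0 of the line of best fit, from the normal equations."""
        n = len(x)
        sum_x, sum_y = sum(x), sum(y)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        sum_x2 = sum(xi ** 2 for xi in x)
        b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
        b0 = (sum_y - b1 * sum_x) / n      # equivalent to ybar - b1 * xbar
        return b0, b1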

Page 17: Presentation on Regression Analysis

17

Assumptions in Regression Analysis

Data must be parametric

There are no outliers in the data

Variables are normally distributed (if not, try log, square root, square, and inverse transformations)

The regression model is linear in nature

The errors are independent (no autocorrelation)

The error terms are normally distributed

There is no multicollinearity

The errors have a constant variance (assumption of homoscedasticity)
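Several of these assumptions can also be screened numerically before looking at the plots shown later in the presentation. A hedged sketch using common Python diagnostics (Durbin-Watson for independent errors, Shapiro-Wilk for normality of the residuals, Breusch-Pagan for constant variance); this is one possible toolchain, not the book's own procedure:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import shapiro
    from statsmodels.stats.stattools import durbin_watson
    from statsmodels.stats.diagnostic import het_breuschpagan

    def check_assumptions(x, y):
        X = sm.add_constant(np.asarray(x, dtype=float))
        model = sm.OLS(np.asarray(y, dtype=float), X).fit()
        resid = model.resid
        print("Durbin-Watson (near 2 suggests independent errors):", durbin_watson(resid))
        stat, p = shapiro(resid)
        print("Shapiro-Wilk p-value (normality of errors):", p)
        lm, lm_p, f, f_p = het_breuschpagan(resid, X)
        print("Breusch-Pagan p-value (constant error variance):", lm_p)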

Page 18: Presentation on Regression Analysis

18

Regression Analysis - An Illustration

Athletes data

Height in cms (x)    LBW in lbs (y)
191      162.5
186      136
191.5    163.5
188      154
190      149
188.5    140.5
193      157.3
190.5    154.5
189      151.5
192      160.5
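This data set can also be analysed outside SPSS; a minimal sketch (assuming numpy) that should recover a slope of about 3.527 and an intercept of about -517, the values reported in the SPSS output later in the presentation:

    import numpy as np

    height = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]          # x, in cms
    lbw    = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]  # y, in lbs

    b1, b0 = np.polyfit(height, lbw, 1)   # least-squares slope and intercept
    print(f"LBW = {b0:.3f} + {b1:.3f} * Height")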

Page 19: Presentation on Regression Analysis

19

SPSS Commands for Regression Analysis

Analyze → Regression → Linear

After selecting the variables, click the tab Statistics on the screen and check the boxes for

R squared change, Descriptives, Part and partial correlations

Press Continue

Click the Method option and select any one of the following options:

Enter, Stepwise, Forward, Backward

Press OK for output
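For readers without SPSS, a rough Python analogue of the same analysis with the Enter method (illustrative only; the output layout differs from SPSS):

    import numpy as np
    import statsmodels.api as sm

    height = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
    lbw    = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]

    X = sm.add_constant(np.asarray(height, dtype=float))      # adds the regression constant
    results = sm.OLS(np.asarray(lbw, dtype=float), X).fit()   # "Enter": all predictors included
    print(results.summary())   # R square, ANOVA F value, coefficients and their significance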

Page 20: Presentation on Regression Analysis

20

Different Methods Used in Regression Analysis

Stepwise: Variables selected at a particular stage are tested for their significance at every subsequent stage.

Enter: All variables are selected for developing the regression equation.

Forward: Variables, once selected at a particular stage, are retained in the model in subsequent stages.

Backward: All variables are used to develop the regression model and then the variables are dropped one by one depending upon their low predictability.

Page 21: Presentation on Regression Analysis

21

Output of Regression Analysis in SPSS

Model summary

ANOVA table showing F-values for all the models

Regression coefficients and their significance

Page 22: Presentation on Regression Analysis

22

SPSS output – Model Summary

Model Summary(b)

Model    R      R Square    Adjusted R Square    Std. Error of the Estimate
1        .816   .666        .624                 5.56

a. Predictors: (Constant), Height
b. Dependent Variable: Body Weight
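As a quick check on the Model Summary, the adjusted R square follows from the R square, the sample size n, and the number of predictors k; a short sketch using the values reported in the table (n = 10 athletes, k = 1 predictor):

    r_square = 0.666
    n, k = 10, 1
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    print(round(adj_r_square, 3))   # about 0.624, matching the table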

Page 23: Presentation on Regression Analysis

23

SPSS Output - Testing Regression Coefficients

Regression analysis output for the body weight example

                   Unstandardized Coefficients    Standardized Coefficients
Model              B           Std. Error         Beta           t        Sig.
1  (Constant)      -517.047    167.719                           -3.083   .015
   Height          3.527       .883               .816           3.995    .004

Dependent Variable: Body weight
R = 0.816   R2 = 0.666   Adjusted R2 = 0.624

Look at the value of t computed in the last slide and in the SPSS output

Ŷ (Weight) = -517.047 + 3.527 × (Height)

Page 24: Presentation on Regression Analysis

24

SPSS Output - ANOVA table

F = t² = 3.995² = 15.96

In simple regression the significance of the regression coefficient and of the model are the same.

The significance of the model is tested by the F value in the ANOVA.

ANOVA table

Model           Sum of Squares    df    Mean Square    F        Sig.
1  Regression   494.203           1     494.203        15.959   .004
   Residual     247.738           8     30.967
   Total        741.941           9

a. Predictors: (Constant), Height
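Both claims on this slide can be checked in a few lines: F is the square of the slope's t value, and the standard error of the estimate reported in the Model Summary is the square root of the residual mean square; the numbers are taken from the tables above:

    t_slope = 3.995
    print(round(t_slope ** 2, 2))        # about 15.96, the F value in the ANOVA table

    residual_ms = 247.738 / 8            # residual sum of squares / residual df = 30.967
    print(round(residual_ms ** 0.5, 2))  # about 5.56, the std. error of the estimate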


Page 25: Presentation on Regression Analysis

25

Testing Assumptions of Regression Analysis with Residuals

Let's see what the residual is.

Page 26: Presentation on Regression Analysis

26

Analyzing Residuals

Table: Computation of residuals

Height in cms (x)    Body weight in lbs (y)    ŷ           y - ŷ
191       162.5     156.61      5.89
186       136       138.975    -2.975
191.5     163.5     158.3735    5.1265
188       154       146.029     7.971
190       149       153.083    -4.083
188.5     140.5     147.7925   -7.2925
193       157.3     163.664    -6.364
190.5     154.5     154.8465   -0.3465
189       151.5     149.556     1.944
192       160.5     160.137     0.363

Residuals (y - ŷ) are estimates of the experimental errors.

For instance, for x = 188,  ŷ = -517.047 + 3.527 × 188 = 146.029

Maximum error: 7.971 lbs for height = 188 cms (worst case)

Minimum error: 0.3465 lbs for height = 190.5 cms (best case)

Useful in identifying the outliers
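The residual column can be reproduced from the fitted equation reported earlier; a short sketch that also picks out the worst and best cases:

    heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
    lbw     = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]

    predicted = [-517.047 + 3.527 * h for h in heights]        # y-hat from the fitted line
    residuals = [y - yhat for y, yhat in zip(lbw, predicted)]

    worst = max(residuals, key=abs)   # about  7.971 lbs at height 188 cms (worst case)
    best  = min(residuals, key=abs)   # about -0.3465 lbs at height 190.5 cms (best case)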

Page 27: Presentation on Regression Analysis

27

Residual Plot

[Figure: Residual plot for the data on lean body mass and height - residuals (about -10 to +8 lbs) plotted against height in cms (184 to 194)]

Obtained by plotting the ordered pairs (xi, yi - ŷi)

Useful in testing the assumptions in the regression analysis
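A residual plot of this kind can be drawn with matplotlib; a minimal sketch using the fitted equation reported earlier:

    import matplotlib.pyplot as plt

    heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
    weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]
    residuals = [y - (-517.047 + 3.527 * x) for x, y in zip(heights, weights)]

    plt.scatter(heights, residuals)        # ordered pairs (x_i, y_i - yhat_i)
    plt.axhline(0, linestyle="--")         # reference line at zero residual
    plt.xlabel("Height in cms")
    plt.ylabel("Residuals")
    plt.title("Residual plot for lean body mass and height")
    plt.show()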

Page 28: Presentation on Regression Analysis

28

How to test linearity of regression model?

[Figure: residual plot against the independent variable x showing a curved pattern - a curvilinear regression model]

For low and high values of x the residuals are positive, and for middle values they are negative; such a pattern indicates a curvilinear rather than a linear model.

Page 29: Presentation on Regression Analysis

29

How to test the Independence of errors?

[Figure: residual plots against the independent variable x showing that the errors are related (serially correlated)]

No serial correlation should occur between a given error term and itself over various time intervals.

What is the pattern? A small positive residual occurs next to a small positive residual, and a large positive residual occurs next to a large positive residual, showing that the errors are not independent.

Page 30: Presentation on Regression Analysis

30

How to test the normality of error terms?

Normal Q-Q plot of the residuals

For the errors to be normally distributed, all the points should lie very close to the straight line.
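A normal Q-Q plot of the residuals can be produced with scipy; a short sketch reusing the residuals computed for the earlier residual plot:

    import matplotlib.pyplot as plt
    from scipy import stats

    heights = [191, 186, 191.5, 188, 190, 188.5, 193, 190.5, 189, 192]
    weights = [162.5, 136, 163.5, 154, 149, 140.5, 157.3, 154.5, 151.5, 160.5]
    residuals = [y - (-517.047 + 3.527 * x) for x, y in zip(heights, weights)]

    stats.probplot(residuals, dist="norm", plot=plt)   # points close to the line suggest normal errors
    plt.title("Normal Q-Q plot of the residuals")
    plt.show()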

Page 31: Presentation on Regression Analysis

31

How to test that errors have constant variance - A test of homoscedasticity

[Figure: residual plots against the independent variable x showing unequal error variance - the spread of the residuals changes across values of x]

For the homoscedasticity assumption to hold true, the variation among the error terms should be similar at different values of x.


Page 32: Presentation on Regression Analysis

32

Healthy residual Plot

The regression model is linear in nature

The errors are independent

The error terms are normally distributed

The errors have a constant variance

Holds all the assumptions of regression analysis

[Figure: residuals plotted against the independent variable x, scattered randomly around zero with no systematic pattern]

Figure 6.9 Healthy residual plot

Page 33: Presentation on Regression Analysis

33

Different ways of testing a regression model

Analyzing residuals

Residual plot

Standard error of estimate

Testing the significance of slopes

Testing the significance of the overall model

Coefficient of determination (R2)

Page 34: Presentation on Regression Analysis

34

To buy the book

Sports Research With Analytical Solutions Using SPSS

and all associated presentations click Here

Complete presentation is available on companion website of the book

For feedback write to [email protected]. Request an Evaluation Copy.