Calculating Residuals © Christine Crisp “Teach A Level Maths” Vol. 2: A2 Core Modules.


"Certain images and/or photos on this presentation are the copyrighted property of JupiterImages and are being used with permission under license. These images and/or photos may not be copied or downloaded without permission from JupiterImages"

Statistics 1

AQA, EDEXCEL, OCR

[Scatter diagram: “Foot length and height of UK children”, axes Height (cm) and Foot length (cm), with the y on x regression line drawn.]

Once we have found a regression line, we may need to know how close any particular observation is to the line. To do this, we find a residual. For the height and foot length data . . .

To find the residual for the point $(x_A, y_A)$ we find $y_A - y$, where $y$ is the value on the regression line at $x = x_A$.

e.g. The marks for 10 students in Maths and Physics are as follows:

Student      A   B   C   D   E   F   G   H   I   J
Maths, x    41  37  38  39  47  42  34  35  48  49
Physics, y  36  20  31  24  35  42  26  27  29  37

The regression line for y on x is

$y = 1.81 + 0.70x$

Residual of point A = $y_A - y$ (the residual is negative if the point is below the line).

To find $y$, substitute the value of $x$ at point A into the regression line:

$y = 1.81 + 0.70(41) = 30.51$

$y_A - y = 36 - 30.51 = 5.49$
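The worked example can be checked with a minimal Python sketch. This is not part of the original slides: it assumes the regression line is the usual least-squares y on x line, and the function name `fit_y_on_x` and the variable names are illustrative only.

```python
# Maths (x) and Physics (y) marks for students A to J, from the table.
maths = [41, 37, 38, 39, 47, 42, 34, 35, 48, 49]
physics = [36, 20, 31, 24, 35, 42, 26, 27, 29, 37]


def fit_y_on_x(xs, ys):
    """Least-squares y on x line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_y_on_x(maths, physics)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 1.81 + 0.70x

# Residual of A using the coefficients rounded to 2 d.p., as in the example.
a_2dp, b_2dp = round(a, 2), round(b, 2)
y_line = a_2dp + b_2dp * maths[0]   # value on the line at x = 41
residual_a = physics[0] - y_line    # y_A - y
print(f"y = {y_line:.2f}, residual = {residual_a:.2f}")
```

The residual of 5.49 depends on rounding the coefficients first. With unrounded coefficients the value on the line at x = 41 is 30.7, because 41 is the mean Maths mark and the least-squares line passes through the point of means, so the residual would be 5.3.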

SUMMARY

To find the residual for a particular observation, A, at $(x_A, y_A)$:

• calculate the y-coordinate, $y$, on the regression line corresponding to the x-value at A,
• find $y_A - y$.
• The residual is negative if the point is below the line.
• Since $y = a + bx$, the residual at A is also given by $y_A - a - bx_A$.
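The formula $y_A - a - bx_A$ can be written as a small helper. The sketch below is illustrative only: the function name `residual` is an assumption, and the coefficients are the rounded values quoted for the Maths and Physics example.

```python
def residual(a, b, x_obs, y_obs):
    """Residual of (x_obs, y_obs) from the line y = a + b*x."""
    return y_obs - a - b * x_obs


# Line quoted in the Maths/Physics example: y = 1.81 + 0.70x
a, b = 1.81, 0.70
res_a = residual(a, b, 41, 36)  # student A
res_b = residual(a, b, 37, 20)  # student B

for label, r in (("A", res_a), ("B", res_b)):
    side = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"{label}: residual {r:.2f}, point is {side} the line")
```

Student A has a positive residual and lies above the line; student B has a negative residual and lies below it.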

Outliers

Outliers are points that lie well away from the regression line.

Since a residual measures the vertical distance of a point from the regression line, residuals are used to identify outliers.

Outliers can have a considerable effect on a regression line and make it unreliable.

e.g. The diagram is a scatter diagram of the data shown in the table.

x  y
1  5
2  18
3  12
4  14
5  12
6  11
7  7
8  3

If we were to draw the line “by eye”, the 1st point . . . would lie well away from the line we would want to draw.

However, the calculation of the regression line includes the 1st point and distorts the position of the line.

The diagram shows the y on x regression line for all the data. The residuals are shown by the red lines.

$y = 14.21 - 0.88x$

The left-hand end of the line is further down than it would be without the 1st point.
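To see how the residuals pick out the 1st point, the following Python sketch fits the line to all eight points and lists each residual. It assumes a least-squares y on x fit; `fit_y_on_x` and the other names are illustrative and do not come from the slides.

```python
# Data from the table: the 1st point is (1, 5).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 18, 12, 14, 12, 11, 7, 3]


def fit_y_on_x(xs, ys):
    """Least-squares y on x line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


a, b = fit_y_on_x(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Index of the point whose residual is largest in size.
worst = max(range(len(xs)), key=lambda i: abs(residuals[i]))

print(f"y = {a:.2f} + ({b:.2f})x")
for x, y, r in zip(xs, ys, residuals):
    print(f"x = {x}, y = {y:2d}, residual = {r:6.2f}")
print(f"largest residual in size: point {worst + 1}")
```

The 1st point has a residual of about -8.33, the largest in size, which is consistent with treating it as an outlier.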

Removing the 1st point gives the regression line

$y = 21.36 - 2.07x$

compared with $y = 14.21 - 0.88x$ for all the data.

The sum of the squares of the residuals is $\sum R^2 = 139$ for the line through all the data, $y = 14.21 - 0.88x$, and $\sum R^2 = 19.9$ for the line found without the 1st point, $y = 21.36 - 2.07x$.

Without the 1st point, we have a regression line that is a much better fit.
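The two sums of squares can be reproduced with a short Python sketch. It assumes least-squares y on x lines fitted separately to each data set; the helper names `fit_y_on_x` and `sum_squared_residuals` are illustrative.

```python
def fit_y_on_x(xs, ys):
    """Least-squares y on x line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def sum_squared_residuals(xs, ys):
    """Fit the line to (xs, ys) and return the sum of squared residuals."""
    a, b = fit_y_on_x(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 18, 12, 14, 12, 11, 7, 3]

rss_all = sum_squared_residuals(xs, ys)
rss_without_first = sum_squared_residuals(xs[1:], ys[1:])  # drop (1, 5)

print(f"all data:          {rss_all:.1f}")
print(f"without 1st point: {rss_without_first:.1f}")
```

The smaller sum of squares without the 1st point is what is meant by a much better fit.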

Exercise

1. The table shows the number of accidents to children as a percentage of those to adults, y, in 9 areas of London together with the percentage of open space in those areas, x.

Area                          A     B     C     D     E     F     G     H     I
Open Spaces (%), x            5     1.3   1.4   7     4.5   5.2   6.3   14.6  14.8
Children’s Accidents (%), y   46.3  42.9  40    38.2  37    33.6  30.8  23.8  17.1

(a) Find the equation of the regression line of y on x.

(b) Estimate the percentage of accidents to children in an area with 10% open space.

(c) Find the residual for A.

Solutions

(a) The equation of the regression line for y on x is

$y = 45.40 - 1.65x$

(b) $x = 10 \Rightarrow y = 28.9$

Nearly 29% of accidents will involve children.

(c) At A$(5, 46.3)$,

$x = 5 \Rightarrow y = 45.40 - 1.65(5) = 37.15$

Residual $= y_A - y = 46.3 - 37.15 = 9.15$
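The solution can be checked with the Python sketch below. It assumes a least-squares y on x fit, with the data read from the exercise table in area order A to I; the names `fit_y_on_x`, `open_space` and `accidents` are illustrative.

```python
# Exercise data for areas A to I.
open_space = [5, 1.3, 1.4, 7, 4.5, 5.2, 6.3, 14.6, 14.8]        # x (%)
accidents = [46.3, 42.9, 40, 38.2, 37, 33.6, 30.8, 23.8, 17.1]  # y (%)


def fit_y_on_x(xs, ys):
    """Least-squares y on x line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# (a) regression line, with coefficients rounded to 2 d.p. as in the solution
a, b = fit_y_on_x(open_space, accidents)
a_2dp, b_2dp = round(a, 2), round(b, 2)
print(f"(a) y = {a_2dp:.2f} + ({b_2dp:.2f})x")

# (b) estimate for an area with 10% open space
estimate_at_10 = a_2dp + b_2dp * 10
print(f"(b) y = {estimate_at_10:.1f}")

# (c) residual for A = y_A - (a + b * x_A)
residual_a = accidents[0] - (a_2dp + b_2dp * open_space[0])
print(f"(c) residual = {residual_a:.2f}")
```

Using the unrounded coefficients instead changes the residual for A slightly, so small differences from 9.15 are to be expected if no rounding is done before substituting.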

The following slides contain repeats of information on earlier slides, shown without colour, so that they can be printed and photocopied. For most purposes the slides can be printed as “Handouts” with up to 6 slides per sheet.

SUMMARY

To find the residual for a particular observation, A, at $(x_A, y_A)$:

• calculate the y-coordinate, $y$, on the regression line corresponding to the x-value at A,
• find $y_A - y$.
• The residual is negative if the point is below the line.
• Since $y = a + bx$, the residual at A is also given by $y_A - a - bx_A$.

e.g. The marks for 10 students in Maths and Physics are as follows:

Student      A   B   C   D   E   F   G   H   I   J
Maths, x    41  37  38  39  47  42  34  35  48  49
Physics, y  36  20  31  24  35  42  26  27  29  37

The regression line for y on x is

$y = 1.81 + 0.70x$

Residual of point A = $y_A - y$ (the residual is negative if the point is below the line).

To find $y$, substitute the value of $x$ at point A into the regression line:

$y = 1.81 + 0.70(41) = 30.51$

$y_A - y = 36 - 30.51 = 5.49$

Outliers

Outliers are points that lie well away from the regression line.

Since a residual measures the vertical distance of a point from the regression line, residuals are used to identify outliers.

Outliers can have a considerable effect on a regression line and make it unreliable.

e.g. The diagram is a scatter diagram of the data shown in the table.

x  y
1  5
2  18
3  12
4  14
5  12
6  11
7  7
8  3

If we were to draw the line “by eye”, the 1st point . . . would lie well away from the line we would want to draw.

However, the calculation of the regression line includes the 1st point and distorts the position of the line.

e.g. The diagram shows the y on x regression line for the data in the table. The residuals are shown by the lines parallel to the y-axis.

$y = 14.21 - 0.88x$

The 1st point has the largest residual.

The sum of the squares of the residuals, $\sum R^2 = 139$

Removing the 1st point gives

$y = 21.36 - 2.07x$

The sum of the squares of the residuals, $\sum R^2 = 19.9$

Without the 1st point, we have a regression line that is a much better fit.