Section 1 Topic 3: Summarising metric data: Median, IQR, and boxplots.


Page 2:

Summarising metric data: Median, IQR & Box Plots

Can we describe a distribution with just one or two numbers?

What is the median, how is it calculated and what does it tell us?

What is the interquartile range, how is it calculated and what does it tell us?

What is a five number summary?

What is a box plot and why is it useful?

Page 3:

Will less than the whole picture do?

Summary statistics

Measures of centre: median, mean

Measures of spread: range, interquartile range, standard deviation

Page 4:

Median

3 5 1 4 8

First, order the data set numerically:

1 3 4 5 8

50% higher than or equal to median

50% lower than or equal to median

Location of Median = (n+1)/2

= (5+1)/2

= 3rd observation

Notes p.97

Page 5:

For an odd number of data values the median will be one of the data values

1 3 4 5 8

Median = 4

For an even number of data values the median may not coincide with an actual data value

3 4 5 8

Location of Median = (4+1)/2

= 5/2

= 2.5, i.e. halfway between the 2nd and 3rd observations

Median = (4 + 5)/2 = 4.5
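As a rough illustration of the location rule, the Python sketch below orders the data and applies (n+1)/2 to both slide examples. The function name `median_by_location` is made up for this example, and it assumes a non-empty list of numbers.

```python
def median_by_location(values):
    """Return (location, median) using location = (n + 1) / 2."""
    ordered = sorted(values)              # first, order the data numerically
    n = len(ordered)
    location = (n + 1) / 2                # 1-based position of the median
    lower = ordered[int(location) - 1]    # value at the position, rounded down
    if location.is_integer():
        return location, lower            # odd n: the median is a data value
    upper = ordered[int(location)]        # the next value up
    return location, (lower + upper) / 2  # even n: average the two middle values


print(median_by_location([3, 5, 1, 4, 8]))  # (3.0, 4)
print(median_by_location([3, 4, 5, 8]))     # (2.5, 4.5)
```

A location of 2.5 is read as "halfway between the 2nd and 3rd ordered values", which is why the two middle values are averaged.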

Page 6:

Limitations of the range: it depends on only the two extreme values.

Data set 1: 5 6 7 8 9 10 11 12

Range = 12 - 5 = 7

Data set 2: 5 12 12 12 12 12 12 12

Range = 12 - 5 = 7
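The point of the two data sets can be checked with a few lines of Python. This is only a sketch; the helper name `value_range` is an assumption chosen for the example.

```python
data_set_1 = [5, 6, 7, 8, 9, 10, 11, 12]
data_set_2 = [5, 12, 12, 12, 12, 12, 12, 12]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


range_1 = value_range(data_set_1)
range_2 = value_range(data_set_2)
print(range_1, range_2)  # 7 7
```

Both data sets have the same range even though their values are spread very differently, because only the minimum and maximum enter the calculation.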

Page 7:

Interquartile range

Quartiles are the points that divide a distribution into quarters.

Q1 (25%), Q2 (50%, the median), Q3 (75%)

The interquartile range (IQR) is defined to be the spread of the middle 50% of data values, so that

IQR = Q3 - Q1

Notes p.99
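A minimal sketch of IQR = Q3 - Q1 in Python follows. The slides do not say which quartile convention the notes use, so this assumes one common choice: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd. Other conventions can give slightly different quartiles. The function name `quartiles` is hypothetical, and at least two data values are assumed.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[:n // 2]           # values below the median position
    upper_half = ordered[n // 2 + n % 2:]   # values above it (middle value skipped if n is odd)
    return median(lower_half), median(ordered), median(upper_half)


q1, q2, q3 = quartiles([5, 6, 7, 8, 9, 10, 11, 12])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 6.5 8.5 10.5 4.0
```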

Page 8:

Why is the IQR more useful than the range?

IQR describes the middle 50% of observations.

Upper 25% and lower 25% of observations are discarded.

IQR generally not affected by outliers.
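To see this in numbers, the sketch below compares the range and the IQR before and after one extreme value is introduced. The data are hypothetical (the value 120 is invented for illustration), and the quartiles again assume the median-of-halves convention, which the slides do not confirm.

```python
from statistics import median


def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the two halves (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[:n // 2]
    upper_half = ordered[n // 2 + n % 2:]
    return median(upper_half) - median(lower_half)


def value_range(values):
    return max(values) - min(values)


original = [5, 6, 7, 8, 9, 10, 11, 12]
with_outlier = [5, 6, 7, 8, 9, 10, 11, 120]  # hypothetical: largest value replaced by 120

print(value_range(original), iqr(original))          # 7 4.0
print(value_range(with_outlier), iqr(with_outlier))  # 115 4.0
```

The range jumps from 7 to 115, while the IQR stays at 4 because the extreme value lies in the discarded upper 25%.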

Page 9:

Picturing quartiles with a histogram

[Figure: histogram (vertical axis: Frequency) with Q1, Q2 and Q3 marked, dividing the data into the bottom 25%, middle 50% and top 25%]

Notes p.97

Page 10:

Five number summary

Minimum value, Q1, Median, Q3, Maximum value
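The five numbers can be collected in one small function. This is a sketch under the same assumption as before: quartiles are computed as medians of the lower and upper halves, which may differ from the convention in the course notes. The name `five_number_summary` is chosen for the example, and at least two data values are assumed.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[:n // 2]           # below the median position
    upper_half = ordered[n // 2 + n % 2:]   # above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


summary = five_number_summary([3, 5, 1, 4, 8])
print(summary)  # (1, 2.0, 4, 6.5, 8)
```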

Page 11:

The Boxplot

Graphical representation of the five number summary

Notes p.98

Page 12:

Constructing a Boxplot

Notes p.99

Page 13:

*Exercise 4

Notes p.103

Page 14:

Relating a boxplot to the shape of the distribution: Symmetric

[Figure: boxplot with Q1, M (median) and Q3 marked]

For a symmetric distribution, the box plot is also symmetric. The median is in the middle of the box and the whiskers are approximately equal in length.

Notes p.104

Page 15:

Positively skewed distributions

[Figure: boxplot of a positively skewed distribution, with Q1, M (median) and Q3 marked]

The box plot of a positively skewed distribution has the median off-centre and to the left. The left-hand whisker will be short, while the right-hand whisker will be long, reflecting the gradual tailing off of data values to the right.

Page 16:

Negatively skewed distributions

[Figure: boxplot of a negatively skewed distribution, with Q1, M (median) and Q3 marked]

The box plot of a negatively skewed distribution has the median off-centre and to the right. The right-hand whisker will be short, while the left-hand whisker will be long, reflecting the gradual tailing off of data values to the left.
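The three shapes described on these slides can be turned into a rough rule of thumb that compares the two whiskers and the two halves of the box. The sketch below is not a formal test of skewness: the function name `describe_shape`, the 25% tolerance, and both example summaries are assumptions made purely for illustration.

```python
def describe_shape(minimum, q1, med, q3, maximum, tol=0.25):
    """Rough shape description from a five number summary (heuristic only)."""

    def compare(left, right):
        # +1 if the right-hand length is clearly longer, -1 if the left-hand is, else 0
        if right > left * (1 + tol):
            return 1
        if left > right * (1 + tol):
            return -1
        return 0

    whiskers = compare(q1 - minimum, maximum - q3)  # left whisker vs right whisker
    box = compare(med - q1, q3 - med)               # median position inside the box
    if whiskers == 0 and box == 0:
        return "roughly symmetric"
    if whiskers >= 0 and box >= 0:
        return "positive skew"   # long right whisker and/or median to the left
    if whiskers <= 0 and box <= 0:
        return "negative skew"   # long left whisker and/or median to the right
    return "unclear"             # whiskers and box point in opposite directions


print(describe_shape(1, 4, 5, 6, 9))    # roughly symmetric
print(describe_shape(1, 2, 3, 6, 12))   # positive skew
```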

Page 17:

Boxplot with outliers

Possible outliers are defined as any values outside of the interval

(Q1 - 1.5 × IQR, Q3 + 1.5 × IQR)

We say possible, since the point may just be part of the tail of the distribution, but we may not have enough data to be sure.

Notes p.101

Page 18:

Boxplot with outliers

Min Q1 M Q3 Max

38 63 70 75 76
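Applying the interval (Q1 - 1.5 × IQR, Q3 + 1.5 × IQR) to the five number summary above can be sketched as follows. Only the five values from the slide are used; the variable names are chosen for this example.

```python
minimum, q1, med, q3, maximum = 38, 63, 70, 75, 76  # five number summary from the slide

iqr = q3 - q1                 # 75 - 63 = 12
lower_fence = q1 - 1.5 * iqr  # 63 - 18 = 45.0
upper_fence = q3 + 1.5 * iqr  # 75 + 18 = 93.0

min_is_possible_outlier = minimum < lower_fence  # 38 < 45
max_is_possible_outlier = maximum > upper_fence  # 76 > 93 is false

print(lower_fence, upper_fence, min_is_possible_outlier, max_is_possible_outlier)
# 45.0 93.0 True False
```

The minimum of 38 falls below the lower limit of 45, so it would be flagged as a possible outlier, while the maximum of 76 is well inside the interval.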

Page 19:

*Exercise 5

Notes p.107