Recap All about measures of location measures of centre Mean Median Mode measures of Any Position...

Recap: all about measures of location. Measures of centre: Mean, Median, Mode. Measures of any position: Percentiles. You should be able to calculate these from grouped and raw data. You should also be able to draw a box and whisker plot. MH-Variance -Kuwait


MH-Variance -Kuwait

Recap

All about measures of location

Measures of centre: Mean, Median, Mode

Measures of any position: Percentiles

You should be able to calculate these from grouped and raw data

You should also be able to draw a box and whisker plot


This week: Measures of Spread

Sample of heights of people in Coventry and Norwich

We need more than the mean to compare data sets

We need a numerical measure representing how the data varies


Measures of Spread

Range

Inter Quartile Range

Variance

Standard Deviation

In this hour's lesson we concentrate on how to calculate the last two of these measures: variance and standard deviation


Range

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Range = largest value - smallest value

Range = 615 - 425 = 190


Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data.

It overcomes the sensitivity to extreme data values.

[Figure: axis running from 375 to 625 in steps of 25]


Interquartile Range

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

L25 = (n + 1) × 25/100 = 71/4 = 17.75, so take the 18th value (n = 70)

L75 = (n + 1) × 75/100 = 71 × 3/4 = 53.25, so take the 53rd value

1st Quartile (Q1) = 445

3rd Quartile (Q3) = 525

Interquartile Range = Q3 - Q1 = 525 - 445 = 80
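The range and interquartile range calculations can be sketched in Python as below. This is a minimal illustration, not part of the original slides: the function name `value_at_location` is hypothetical, and it assumes the slide's convention of rounding the location (n + 1) × p/100 to the nearest whole position.

```python
# The 70 data values from the slide, already in ascending order.
values = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def value_at_location(sorted_values, percentile):
    """Return the value at location (n + 1) * p / 100, rounded to the
    nearest position (assumed convention, matching the slide)."""
    n = len(sorted_values)
    location = (n + 1) * percentile / 100
    position = int(location + 0.5)          # round to nearest position
    position = max(1, min(n, position))     # keep inside the data
    return sorted_values[position - 1]      # positions are 1-based


data = sorted(values)
data_range = data[-1] - data[0]
q1 = value_at_location(data, 25)
q3 = value_at_location(data, 75)
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Other textbooks and software interpolate between neighbouring values instead of rounding, so quartiles from a library function may differ slightly from these.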


Basic Notation

As we will be working with formulas, we need to be clear about some notation

Data set “X”: 10, 30, 301, 46, 18, 21, 19, 83, 4, ..., 88

Elements: x1, x2, x3, x4, x5, x6, x7, x8, x9, ..., xn

We often refer to a data set with an upper case letter like X, in which case the numbers in the data set are called elements (x1, x2, ..., xn)

“n” or “N” is the number of elements or observations

$\sum X = \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$


Net deviations from the mean will always sum to zero

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

[Figure: data points x1, x2, x3, x4 on either side of the mean $\bar{x}$]

So the “total distance” from the mean is zero, because the +ve and -ve contributions cancel
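A short derivation shows why the deviations always cancel. It uses only the definition of the mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, so that $\sum x_i = n\bar{x}$:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} \\
  &= n\bar{x} - n\bar{x} \\
  &= 0
\end{align*}
```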


Measures of data Spread

• But we want a measure that will represent these net deviations somehow.

• One way to ensure a non-zero result is to square each deviation before adding it.

• We can then average these squared deviations by dividing by their number “n” and use this to compare data sets.

• OR, we can average and then take the square root of the above.

• This latter approach will have the same units as the underlying data.

Variance: units squared

Standard deviation: same units as the data


Calculate the Variance for the following data set

This data relates to measures of distance travelled to work, in units of miles

Mean is 10.9, n = 5

x_i      x_i - x̄     (x_i - x̄)²
10       -0.9        0.81
3.5      -7.4        54.76
27       16.1        259.21
12       1.1         1.21
2        -8.9        79.21
Total                395.2

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{395.2}{5} = 79.04$

This is the population variance (miles²)

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{395.2}{5}} = 8.89$

This is the population standard deviation (miles)
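The same worked example can be reproduced step by step in Python. This is only a sketch of the calculation on the slide; the variable names are illustrative and the data are the five distances (in miles) given above.

```python
from math import sqrt

# Distances travelled to work, in miles (from the slide).
distances = [10, 3.5, 27, 12, 2]

n = len(distances)
mean = sum(distances) / n

# Deviation of each value from the mean, then its square.
deviations = [x - mean for x in distances]
squared_deviations = [d ** 2 for d in deviations]

# Treating the five values as a whole population: divide by N.
population_variance = sum(squared_deviations) / n      # miles squared
population_sd = sqrt(population_variance)              # miles

print("Mean:", mean)
print("Sum of squared deviations:", round(sum(squared_deviations), 2))
print("Population variance:", round(population_variance, 2))
print("Population standard deviation:", round(population_sd, 2))
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same population measures, which is a convenient way to check hand calculations.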


Population Variance for Grouped Data

Rent (€)    f_i    M_i      M_i - x̄    (M_i - x̄)²    f_i(M_i - x̄)²
420-439     8      429.5    -63.7      4058.96       32471.71
440-459     17     449.5    -43.7      1910.56       32479.59
460-479     12     469.5    -23.7      562.16        6745.97
480-499     8      489.5    -3.7       13.76         110.11
500-519     7      509.5    16.3       265.36        1857.55
520-539     4      529.5    36.3       1316.96       5267.86
540-559     2      549.5    56.3       3168.56       6337.13
560-579     4      569.5    76.3       5820.16       23280.66
580-599     2      589.5    96.3       9271.76       18543.53
600-619     6      609.5    116.3      13523.36      81140.18
Total       70                                       208234.29

The mean used in the deviations is x̄ ≈ 493.21.

$\sigma^2 = \frac{208234.29}{70} \approx 2974.78$

$\sigma = \sqrt{\frac{208234.29}{70}} \approx 54.54$

$s^2 = \frac{208234.29}{69} \approx 3017.89$

$s = \sqrt{\frac{208234.29}{69}} \approx 54.94$

M_i is the class midpoint; it plays the role of our x_i
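The grouped-data calculation can be sketched in Python as follows. It is an illustration under stated assumptions: each class is represented by its midpoint, the class limits and frequencies are those in the rent table, and the variable names are my own.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each rent class, from the table.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Sum of f_i * (M_i - mean)^2 over all classes.
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))

population_variance = sum_sq / n        # divide by N
sample_variance = sum_sq / (n - 1)      # divide by n - 1

print("n:", n, "mean:", round(mean, 2))
print("Sum of f(M - mean)^2:", round(sum_sq, 2))
print("Population variance:", round(population_variance, 2),
      "sd:", round(sqrt(population_variance), 2))
print("Sample variance:", round(sample_variance, 2),
      "sd:", round(sqrt(sample_variance), 2))
```

Because the code keeps the unrounded mean, individual rows can differ from the table in the second decimal place, while the totals agree to two decimal places.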


Variance for Grouped Data

For sample data:

$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$

For population data:

$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$

Sample variance $s^2$ is commonly referred to as $\sigma^2_{n-1}$

Sample standard deviation $s$ is commonly referred to as $\sigma_{n-1}$

So why is the sample measure divided by (n-1)? We will deal with this soon!


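Formulae

RAW DATA, Sample Variance

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$

RAW DATA, Population Variance

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N}$

GROUPED DATA, Sample Variance

$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{\sum f_i x_i^2 - n\bar{x}^2}{n - 1}$

GROUPED DATA, Population Variance

$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \frac{\sum f_i x_i^2 - N\mu^2}{N}$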
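As a numerical check that the two forms of each formula agree, the sketch below computes the sum of squared deviations both ways, for raw data and for grouped data. The function names `sxx_definition` and `sxx_shortcut` are hypothetical, and the test data are the distance and rent examples from the earlier slides.

```python
def sxx_definition(xs, fs=None):
    """Sum of f_i * (x_i - mean)^2; fs defaults to 1 for raw data."""
    fs = fs or [1] * len(xs)
    n = sum(fs)
    mean = sum(f * x for f, x in zip(fs, xs)) / n
    return sum(f * (x - mean) ** 2 for f, x in zip(fs, xs))


def sxx_shortcut(xs, fs=None):
    """Same quantity via sum of f_i * x_i^2 minus n * mean^2."""
    fs = fs or [1] * len(xs)
    n = sum(fs)
    mean = sum(f * x for f, x in zip(fs, xs)) / n
    return sum(f * x * x for f, x in zip(fs, xs)) - n * mean ** 2


# Raw data: distances to work (miles).
raw = [10, 3.5, 27, 12, 2]
raw_a, raw_b = sxx_definition(raw), sxx_shortcut(raw)

# Grouped data: rent class midpoints and frequencies.
mids = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]
freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
grp_a, grp_b = sxx_definition(mids, freqs), sxx_shortcut(mids, freqs)

print("Raw:", round(raw_a, 2), round(raw_b, 2))
print("Grouped:", round(grp_a, 2), round(grp_b, 2))
print("Sample variance (raw):", round(raw_a / (len(raw) - 1), 2))
```

Dividing either result by n - 1 (sample) or N (population) gives the corresponding variance. On a computer the shortcut form subtracts two large numbers, so it can lose a little floating-point precision; by hand it is usually the quicker route.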


Things we will now do

1- Understand why the following two formulas are the same, and appreciate that the second form is much quicker to calculate than the first form

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$

2- I would like you to think of calculating variance as

$s^2 = \frac{S_{xx}}{n - 1}$ or $\sigma^2 = \frac{S_{xx}}{n}$

where $S_{xx}$ can be calculated in different ways

$S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$

and can be divided appropriately depending on whether we have a sample or a population

3- We should investigate why we average, for $s^2$, by (n-1) when we are dealing with a sample

We will deal with this third and unusual point next!
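For the first point, the equivalence of the two forms follows from expanding the square and using $\sum x_i = n\bar{x}$. A short derivation, written here as a sketch of the standard algebra:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} \left( x_i^2 - 2\bar{x}x_i + \bar{x}^2 \right) \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{align*}
```

Dividing both sides by n - 1 gives the two expressions for the sample variance.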


Why we divide by (n-1)

[Figure: a population with variance σ², and a sample drawn from it]

We take a random sample from the population and use it to estimate σ²


We are trying to estimate the true population variance σ²

In the real world we take a sample and use it

[Figure: the population has variance σ²; the sample gives two candidate estimates, σ_s² (dividing by n) and s² (dividing by n-1)]

I am going to show you that s² will be the better estimator of the true population variance, σ²


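Taking lots of samples of fixed size n and building distributions of s² and σ_s²

Each sample gives a pair of values:

$s_1^2, \sigma_1^2 \quad s_2^2, \sigma_2^2 \quad s_3^2, \sigma_3^2 \quad s_4^2, \sigma_4^2 \quad s_5^2, \sigma_5^2 \quad \dots \quad s_m^2, \sigma_m^2$

Averaging over all m samples:

$\overline{s^2} = \frac{\sum_{i=1}^{m} s_i^2}{m}$

$\overline{\sigma_s^2} = \frac{\sum_{i=1}^{m} \sigma_i^2}{m}$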


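Calculating s² and σ_s² for many samples, then grouping and counting, we can build distributions for s² and σ_s²

[Figure: the s² distribution and the σ_s² distribution plotted against the population variance σ²; the σ_s² distribution is centred at a value < σ²]

The RED distribution is centred around the real population variance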


Showing E(s²) = σ²

I will generate a population of numbers and calculate the population variance (σ²)

Row        Sample        Calculated
Row 1      Sample 1      s², σ_s²
Row 2      Sample 2      s², σ_s²
Row 3      Sample 3      s², σ_s²
Row 4      Sample 4      s², σ_s²
...
Row 100    Sample 100    s², σ_s²
Average                  AVG(s²), AVG(σ_s²)

Then show that AVG(s²) ≈ σ²

while AVG(σ_s²) < σ²

Therefore E(s²) = σ²
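The idea can be checked with a small Python sketch. Note that this is not the 100-random-sample demonstration described on the slide: to make the result exact and repeatable, it swaps in an exhaustive enumeration of every possible sample of size n drawn with replacement from a tiny population. The population (the five distances from the earlier example) and the sample size of 3 are arbitrary assumptions chosen for illustration.

```python
from itertools import product


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


# Assumed tiny population, reused from the distance example.
population = [10, 3.5, 27, 12, 2]
pop_var = sum_sq_dev(population) / len(population)   # sigma^2, divide by N

n = 3  # sample size (arbitrary choice)

# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# s^2 divides by n - 1; sigma_s^2 divides by n.
avg_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_sigma_s2 = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("Population variance:", round(pop_var, 4))
print("Average of s^2:", round(avg_s2, 4))
print("Average of sigma_s^2:", round(avg_sigma_s2, 4))
```

Averaged over all possible samples, s² matches σ², while the divide-by-n version comes out smaller by the factor (n - 1)/n. With only 100 random samples, as on the slide, the averages would be close to these values rather than exactly equal.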


Summary

We have looked at the formula for calculating the variance and its square root, the standard deviation

We have noted that we average by n-1 or n depending on whether we are working with a sample or a population

We have noted that we can write $S_{xx} = \sum (x - \bar{x})^2$ in different ways that are faster to calculate. We will work through these different ways shortly

But first, some questions