Recap All about measures of location measures of centre Mean Median Mode measures of Any Position...
MH-Variance -Kuwait
Recap
All about measures of location
Measures of centre: Mean, Median, Mode
Measures of any position: Percentiles
You should be able to calculate these from grouped and raw data
You should also be able to draw a box and whisker plot
This week: Measures of Spread
Sample of heights of people in Coventry and Norwich
We need more than the mean to compare data sets.
We need a numerical measure representing how the data varies.
Measures of Spread
Range
Inter Quartile Range
Variance
Standard Deviation
In this hour's lesson we concentrate on how to calculate the last two of these measures, the variance and the standard deviation.
Range
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
[Figure: box-and-whisker plot, axis from 375 to 625 in steps of 25]
Interquartile Range
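There are n = 70 values, sorted in ascending order.
Location of the 1st quartile: L25 = (n+1)*25/100 = 71/4 = 17.75, so take the 18th value: Q1 = 445
Location of the 3rd quartile: L75 = (n+1)*75/100 = 71*3/4 = 53.25, so take the 53rd value: Q3 = 525
Interquartile Range = Q3 - Q1 = 525 - 445 = 80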
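The range and quartile calculations above can be checked with a few lines of Python. This is a minimal sketch rather than the lecturer's own method: the helper name `value_at_percentile` is hypothetical, and it assumes the slides' rule of computing the location (n+1)*p/100 and rounding it to the nearest position (17.75 gives the 18th value, 53.25 gives the 53rd). Other textbooks and software interpolate between values instead, so their quartiles can differ slightly.

```python
# Data from the slides, stored as (value, how many times it occurs).
counts = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
data = sorted(value for value, count in counts for _ in range(count))


def value_at_percentile(sorted_data, p):
    """Value at the p-th percentile using the location rule (n + 1) * p / 100.

    Assumption: the location is rounded to the nearest position, as on the slides.
    """
    n = len(sorted_data)
    location = (n + 1) * p / 100
    position = int(location + 0.5)        # round half up to a 1-based position
    position = min(max(position, 1), n)   # keep the position inside the data
    return sorted_data[position - 1]      # convert to a 0-based index


data_range = max(data) - min(data)
q1 = value_at_percentile(data, 25)
q3 = value_at_percentile(data, 75)
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 190 445 525 80
```

Storing the data as value and count pairs is only a convenience to keep the listing short; a plain list of the 70 values would work the same way.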
Basic Notation
As we will be working with formulas, we need to be clear about some notation.
Data set “X”:  10, 30, 301, 46, 18, 21, 19, 83, 4, ..., 88
Elements:      x1, x2, x3, x4, x5, x6, x7, x8, x9, ..., xn
We often refer to a data set with an upper case letter like X, in which case the numbers in the data set are called elements (x1, x2, ..., xn).
“n” or “N” is the number of elements or observations.
\[ \sum X = \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n \]
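Net deviations from the mean will always sum to zero
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
[Figure: points $x_1$, $x_2$, $x_3$, $x_4$ on a number line around the mean $\bar{x}$]
So the “total distance” from the mean is zero, because the +ve and -ve contributions cancel.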
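Why the deviations always cancel follows directly from the definition of the mean, $\bar{x} = \frac{1}{n}\sum x_i$, which can be rearranged to $\sum x_i = n\bar{x}$. A short derivation, using only the notation introduced so far:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} \\
  &= n\bar{x} - n\bar{x} \\
  &= 0
\end{align*}
```

As a made-up illustration (these values are not from the slides): for 2, 4 and 9 the mean is 5 and the deviations are -3, -1 and +4, which sum to zero.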
Measures of data Spread
• But we want a measure that will represent these net deviations somehow.
• One way to ensure a non-zero result is to square each deviation before adding it.
• We can then average these squared deviations by dividing by their number “n” and use this to compare data sets.
• OR, we can average and then take the square root of the above.
• This latter approach will have the same units as the underlying data.
Variance: units squared
Standard deviation: units of the data
Calculate the Variance for the following data set
This data relates to distance travelled to work, in miles.
Mean is 10.9, n = 5

x_i     x_i - x̄    (x_i - x̄)^2
10       -0.9         0.81
3.5      -7.4        54.76
27       16.1       259.21
12        1.1         1.21
2        -8.9        79.21
Total               395.2

\[ \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{395.2}{5} = 79.04 \]
This is the population variance (miles$^2$)
\[ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} = \sqrt{\frac{395.2}{5}} = 8.89 \]
This is the population standard deviation (miles)
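The worked example above can be reproduced step by step in Python. This is a small sketch under the slide's own assumption that the five values are a whole population, so the divisor is N; the variable names are illustrative only.

```python
import math

distances = [10, 3.5, 27, 12, 2]   # miles travelled to work, from the slide

n = len(distances)
mean = sum(distances) / n
deviations = [x - mean for x in distances]     # x_i - mean
squared = [d ** 2 for d in deviations]         # (x_i - mean)^2
sxx = sum(squared)                             # sum of squared deviations

population_variance = sxx / n                  # units: miles squared
population_sd = math.sqrt(population_variance) # units: miles

print(round(mean, 2), round(sxx, 2),
      round(population_variance, 2), round(population_sd, 2))
# 10.9 395.2 79.04 8.89
```

Keeping the deviations and squared deviations as separate lists mirrors the columns of the table, which makes it easy to compare each intermediate value with the slide.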
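Population Variance for Grouped Data
$M_i$ is the class midpoint; it plays the role of our $x_i$.
Grouped mean: $\bar{x} = \sum f_i M_i / N = 34525 / 70 \approx 493.2$

Rent (€)   f_i   M_i     M_i - x̄   (M_i - x̄)^2   f_i (M_i - x̄)^2
420-439     8    429.5    -63.7      4058.96         32471.71
440-459    17    449.5    -43.7      1910.56         32479.59
460-479    12    469.5    -23.7       562.16          6745.97
480-499     8    489.5     -3.7        13.76           110.11
500-519     7    509.5     16.3       265.36          1857.55
520-539     4    529.5     36.3      1316.96          5267.86
540-559     2    549.5     56.3      3168.56          6337.13
560-579     4    569.5     76.3      5820.16         23280.66
580-599     2    589.5     96.3      9271.76         18543.53
600-619     6    609.5    116.3     13523.36         81140.18
Total      70                                       208234.29

Treating the data as a population:
\[ \sigma^2 = \frac{208234.29}{70} \approx 2974.78 \qquad \sigma = \sqrt{\frac{208234.29}{70}} \approx 54.54 \]
Treating the data as a sample:
\[ s^2 = \frac{208234.29}{69} \approx 3017.89 \qquad s = \sqrt{\frac{208234.29}{69}} \approx 54.94 \]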
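The grouped calculation can be sketched in Python as follows. The class limits and frequencies are those in the table; everything else (variable names, the choice to compute both the population and the sample versions side by side) is an illustrative assumption. Because the code keeps full precision in the mean, individual table entries may differ from the slide in the second decimal place, but the total agrees.

```python
import math

# (lower limit, upper limit, frequency) for each rent class, from the slide
classes = [(420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
           (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
           (580, 599, 2), (600, 619, 6)]

midpoints = [(low + high) / 2 for low, high, _ in classes]   # M_i
freqs = [f for _, _, f in classes]                           # f_i

n = sum(freqs)
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
sxx = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))

population_variance = sxx / n          # divide by N for a population
sample_variance = sxx / (n - 1)        # divide by n - 1 for a sample
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(n, round(grouped_mean, 2), round(sxx, 2))
print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```

Computing the sum of squared deviations once and dividing it two ways makes the point that the population and sample formulas differ only in the divisor.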
Variance for Grouped Data
For sample data:
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
For population data (with $\mu$ the population mean):
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]
Sample variance $s^2$ is commonly referred to as $\sigma^2_{n-1}$
Sample standard deviation $s$ is commonly referred to as $\sigma_{n-1}$
So why is the sample measure divided by (n-1)? We will deal with this soon!
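Formulae
RAW DATA, Sample Variance
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} \]
RAW DATA, Population Variance
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N} \]
GROUPED DATA, Sample Variance
\[ s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{\sum f_i x_i^2 - n\bar{x}^2}{n - 1} \]
GROUPED DATA, Population Variance
\[ \sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \frac{\sum f_i x_i^2 - N\mu^2}{N} \]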
Things we will now do
1- Understand why the following two formulas are the same, and appreciate that the second form is much quicker to calculate than the first form
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} \]
2- I would like you to think of calculating variance as
\[ s^2 = \frac{S_{xx}}{n - 1} \quad \text{or} \quad \sigma^2 = \frac{S_{xx}}{n} \]
where $S_{xx}$ can be calculated in different ways,
\[ S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2 \]
and can be divided appropriately depending on whether we have a sample or a population
3- We should investigate why we average, for $s^2$, by (n-1) when we are dealing with a sample
We will deal with this third and unusual point next!!
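The equivalence of the two forms in point 1 comes from expanding the square and using $\sum x_i = n\bar{x}$. A short derivation, written with the same symbols as the formulas above:

```latex
\begin{align*}
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} \left( x_i^2 - 2\bar{x}x_i + \bar{x}^2 \right) \\
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2 \\
  &= \sum x_i^2 - n\bar{x}^2
\end{align*}
```

As a check with the five distances used earlier (10, 3.5, 27, 12, 2; mean 10.9): the sum of the squares is 989.25 and n times the squared mean is 5 × 10.9² = 594.05, so the shortcut gives 989.25 - 594.05 = 395.2, the same total as adding the squared deviations one by one. The second form is quicker because it needs only the running totals of x and x², not a separate pass to subtract the mean from every value.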
Why we divide by (n-1)
We are trying to estimate the true population variance $\sigma^2$.
In the real world we take a random sample from the population and use it to estimate $\sigma^2$.
From the sample we can calculate two candidate estimates: $s^2$, which divides by n-1, and $\sigma_s^2$, which divides by n.
I am going to show you that $s^2$ will be the better estimator of the true population variance, $\sigma^2$.
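Taking lots of samples of fixed size n and building distributions of $s^2$ and $\sigma_s^2$
For each sample i we calculate $s_i^2$ (dividing by n-1) and $\sigma_i^2$ (dividing by n):
Sample 1: $s_1^2$, $\sigma_1^2$
Sample 2: $s_2^2$, $\sigma_2^2$
Sample 3: $s_3^2$, $\sigma_3^2$
Sample 4: $s_4^2$, $\sigma_4^2$
Sample 5: $s_5^2$, $\sigma_5^2$
...
Sample m: $s_m^2$, $\sigma_m^2$
With m samples in total, the averages are
\[ \overline{s^2} = \frac{\sum_{i=1}^{m} s_i^2}{m} \qquad \overline{\sigma_s^2} = \frac{\sum_{i=1}^{m} \sigma_i^2}{m} \]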
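By calculating $s^2$ and $\sigma_s^2$ for many samples, then grouping and counting the results, we can build distributions for $s^2$ and $\sigma_s^2$.
[Figure: the $s^2$ distribution and the $\sigma_s^2$ distribution plotted against the population variance $\sigma^2$; the $\sigma_s^2$ distribution is centred below $\sigma^2$]
The RED distribution is centred on the real population variance.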
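Showing $E(s^2) = \sigma^2$
I will generate a population of numbers and calculate the population variance ($\sigma^2$).
Then I take 100 samples, one per row, and calculate $s^2$ and $\sigma_s^2$ for each:

Row 1     Sample 1     s^2    σ_s^2
Row 2     Sample 2     s^2    σ_s^2
Row 3     Sample 3     s^2    σ_s^2
Row 4     Sample 4     s^2    σ_s^2
...
Row 100   Sample 100   s^2    σ_s^2
                       AVG(s^2)   AVG(σ_s^2)

Then show that AVG($s^2$) ≈ $\sigma^2$, while AVG($\sigma_s^2$) < $\sigma^2$
Therefore $E(s^2) = \sigma^2$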
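The demonstration described above can be sketched as a small simulation. Everything specific here is an assumption made for illustration and is not taken from the slides: the population (10,000 whole numbers between 0 and 100), the sample size of 5, the fixed random seed, and drawing each sample with replacement. The sketch also uses 20,000 samples rather than 100 rows so that the averages settle down; with only 100 rows the same pattern appears but is noisier.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def sum_sq_dev(xs):
    """Sxx: the sum of squared deviations of xs from their own mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


# Hypothetical population and its variance (divide by N).
population = [random.randint(0, 100) for _ in range(10_000)]
pop_var = sum_sq_dev(population) / len(population)

sample_size = 5        # assumed fixed sample size n
num_samples = 20_000   # the slides use 100 rows; more rows give a steadier average

s2_values = []         # Sxx / (n - 1) for each sample
sigma_s2_values = []   # Sxx / n for each sample
for _ in range(num_samples):
    sample = random.choices(population, k=sample_size)  # draw with replacement
    sxx = sum_sq_dev(sample)
    s2_values.append(sxx / (sample_size - 1))
    sigma_s2_values.append(sxx / sample_size)

avg_s2 = sum(s2_values) / num_samples
avg_sigma_s2 = sum(sigma_s2_values) / num_samples

print("population variance:", round(pop_var, 1))
print("average of s^2     :", round(avg_s2, 1))
print("average of sigma_s^2:", round(avg_sigma_s2, 1))
```

In a run like this the average of the n-1 version should land close to the population variance, while the average of the n version sits below it by roughly the factor (n-1)/n, which is the pattern the slides set out to show.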
Summary
We have looked at the formula for calculating the variance and its square root, the standard deviation.
We have noted that we average by n for a population or by n-1 for a sample.
We have noted that we can write $S_{xx} = \sum (x - \bar{x})^2$ in different ways that are faster to calculate. We should work through these different ways shortly.
But first
Some questions