Why statisticians were created Measure of dispersion FETP India.

Post on 12-Jan-2016


Why statisticians were created

Measure of dispersion

FETP India

Competency to be gained from this lecture

Calculate a measure of variation that is adapted to the sample studied

Key issues

• Range
• Inter-quartile variation
• Standard deviation

Measures of spread, dispersion or variability

• The measure of central tendency provides important information about the distribution

• However, it does not provide information concerning the relative position of other data points in the sample

• Measures of spread, dispersion or variability are needed to address this

Range

Why one needs to measure variability

Marks obtained

Student     Biology   Physics   Chemistry
1           200       199       100
2           200       200       200
3           200       201       300
Mean        200       200       200
Variation   Nil       Slight    Substantial
Range       0         2         200

Every concept comes from a failure of the previous concept

• Mean is distorted by outliers
• Median takes care of the outliers

The range: A simple measure of dispersion

• Take the difference between the lowest value and the highest value
• Limitations:
  - The range says nothing about the values between the extreme values
  - The range is not stable: as the sample size increases, the range can change dramatically
  - The range does not lend itself to further statistical analysis

Example of a range

• Take a sample of 10 heights: 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm
• Lowest (minimum) value: 70 cm
• Highest (maximum) value: 140 cm
• Range: 140 – 70 = 70 cm
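The same calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function name `value_range` is made up for the example.

```python
# Heights in cm, taken from the example above.
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]


def value_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


print(min(heights_cm), max(heights_cm), value_range(heights_cm))  # 70 140 70
```

Only the two extreme values enter the calculation; the eight heights in between have no influence on the result.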


Three different distributions with the same range (35 kg)

[Figure: dot plots of three distributions drawn on the same 30 to 70 kg axis, labelled Even, Uneven and Clumped]

The range increases with the sample size

Set                        Values                       Min   Max   Range
Initial set (5 values)     30 40 53 58 65  -  -  -      30    65    35
New set (3 more values)    30 40 53 58 65 48 51 64      30    65    35
New set (3 more values)    30 40 53 58 65 48 51 70      30    70    40
New set (3 more values)    30 40 53 58 65 28 51 70      28    70    42

Two ranges based on different sample sizes are not comparable
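A short Python sketch, using the values from the table above, shows how the range behaves when observations are added. The variable names are chosen for this illustration only.

```python
# Initial set of 5 values and the three alternative sets of 3 extra values
# from the table above.
initial = [30, 40, 53, 58, 65]
extra_sets = [[48, 51, 64], [48, 51, 70], [28, 51, 70]]


def value_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


# Range of the initial set, then of the initial set plus each set of extras.
ranges = [value_range(initial)]
ranges += [value_range(initial + extra) for extra in extra_sets]

print(ranges)  # [35, 35, 40, 42]
```

Adding observations can leave the range unchanged or widen it, but it can never narrow it, which is why ranges from samples of different sizes should not be compared directly.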

Percentiles and quartiles

• Percentiles
  - Those values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts
  - The median is the 50th percentile
• Quartiles
  - The values which divide a series of observations, arranged in ascending order, into 4 equal parts
  - The median is the 2nd quartile

Inter-quartile range

First 25% | Q1 | 2nd 25% | Q2 (Median) | 3rd 25% | Q3 | 4th 25%

Sorting the data in increasing order

• Median
  - Middle value (if n is odd)
  - Average of the two middle values (if n is even)
  - A measure of the “centre” of the data
• Quartiles divide the set of ordered values into 4 equal parts

The inter-quartile range

• The central portion of the distribution
• Calculated as the difference between the third quartile and the first quartile
• Includes about one-half of the observations
• Leaves out one quarter of the observations at each end
• Limitations:
  - Only takes into account two values
  - Not a mathematical concept upon which theories can be developed

The inter-quartile range: Example

• Values: 29, 31, 24, 29, 30, 25
• Arranged in ascending order: 24, 25, 29, 29, 30, 31
• Q1: position (n+1)/4 = 7/4 = 1.75, so Q1 = 24 + 0.75 × (25 – 24) = 24.75
• Q3: position 3(n+1)/4 = 21/4 = 5.25, so Q3 = 30 + 0.25 × (31 – 30) = 30.25
• Inter-quartile range = Q3 – Q1 = 30.25 – 24.75 = 5.5
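The positions (n+1)/4 and 3(n+1)/4 used above can be turned into a small Python sketch. The helper names `value_at_position` and `quartiles_n_plus_1` are made up for this illustration, and the linear interpolation between neighbouring values is the convention assumed in the worked example; other software may use a different convention and return slightly different quartiles.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    whole = int(position)          # rank of the value just below the position
    fraction = position - whole    # how far to move towards the next value
    if whole < 1:
        return sorted_values[0]
    if whole >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + fraction * (above - below)


def quartiles_n_plus_1(values):
    """Return (Q1, Q3) using positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3


values = [29, 31, 24, 29, 30, 25]
q1, q3 = quartiles_n_plus_1(values)
print(q1, q3, q3 - q1)  # 24.75 30.25 5.5
```

Python's standard library function `statistics.quantiles(values, n=4)` should give the same quartiles with its default "exclusive" method, but that is worth checking against the version in use.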

[Figure: graphic representation of the inter-quartile range]

The mean deviation from the mean

• Calculate the mean of all values
• Calculate the difference between each value and the mean
• Calculate the average difference between each value and the mean
• Limitation:
  - Negative and positive deviations cancel each other out, so the average is 0 even when there is substantial variation

Standard deviation

The mean deviation from the mean: Example

Data: 10, 20, 30, 40, 50, 60, 70
Mean = 280/7 = 40
Deviations from the mean (10 – 40, 20 – 40, ...): -30, -20, -10, 0, 10, 20, 30
Sum = 0

Absolute mean deviation from the mean

• Calculate the mean of all values
• Calculate the difference between each value and the mean and take the absolute value
• Calculate the average of these absolute differences
• Limitation:
  - The absolute value is not convenient from a mathematical point of view

Absolute mean deviation from the mean: Example

Data: 10, 20, 30, 40, 50, 60, 70
Mean = 280/7 = 40
Deviations from the mean (10 – 40, 20 – 40, ...): -30, -20, -10, 0, 10, 20, 30
Absolute values: 30, 20, 10, 0, 10, 20, 30
Absolute mean deviation from the mean = 120/7 = 17.1
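The two examples can be reproduced with a minimal Python sketch. It uses the same seven data values; the variable names are illustrative only.

```python
data = [10, 20, 30, 40, 50, 60, 70]

mean = sum(data) / len(data)

# Signed deviations: negative and positive values cancel out.
deviations = [x - mean for x in data]
mean_deviation = sum(deviations) / len(data)

# Absolute deviations: the sign is dropped before averaging.
mean_absolute_deviation = sum(abs(d) for d in deviations) / len(data)

print(mean, mean_deviation, round(mean_absolute_deviation, 1))  # 40.0 0.0 17.1
```

The signed deviations average to 0 regardless of how spread out the data are, while the absolute deviations give a usable measure of spread.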

Calculating the variance (1/2)

1. Calculate the mean as a measure of central location (MEAN)

2. Calculate the difference between each observation and the mean (DEVIATION)

3. Square the differences (SQUARED DEVIATION)
   • Negative and positive deviations will not cancel each other out
   • Values further from the mean have a bigger impact

Calculating the variance (2/2)

4. Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS)

5. Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n-1) to give the VARIANCE

• Why divide by n - 1?
  - Adjustment for the fact that the mean is just an estimate of the true population mean
  - Tends to make the variance larger
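Steps 1 to 5 can be summarised in a single formula. The notation is an assumption made for this summary: x_i are the n observations, x̄ is their mean and s² denotes the variance.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```

The numerator is the SUM OF THE SQUARED DEVIATIONS from step 4, and the denominator is the n - 1 from step 5.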


The standard deviation

• Take the square root of the variance
• Limitations:
  - Sensitive to outliers

$$SD = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$$

Example

Patient   No. of X-rays   Deviation from mean   Absolute deviation   Squared deviation   Square of observation
A         10              10 - 9 = 1            1                    1² = 1              10² = 100
B         8               8 - 9 = -1            1                    (-1)² = 1           8² = 64
C         6               6 - 9 = -3            3                    (-3)² = 9           6² = 36
D         12              12 - 9 = 3            3                    3² = 9              12² = 144
E         9               9 - 9 = 0             0                    0² = 0              9² = 81
Total     45              0                     8                    20                  425

Mean = 45/5 = 9 X-rays
Mean deviation = 8/5 = 1.6 X-rays
Variance = 20/(5 - 1) = 20/4 = 5 (X-rays squared)
Standard deviation = √5 = 2.2 X-rays
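The table can be checked with a short Python sketch. It computes the variance step by step (mean, deviations, squared deviations, sum, division by n - 1) and then again with the shortcut formula based on the sums of x and x². Variable names are illustrative.

```python
import math
import statistics

xrays = [10, 8, 6, 12, 9]  # number of X-rays for patients A to E
n = len(xrays)

# Step-by-step calculation, as in the table.
mean = sum(xrays) / n
squared_deviations = [(x - mean) ** 2 for x in xrays]
variance = sum(squared_deviations) / (n - 1)
sd = math.sqrt(variance)

# Shortcut formula using the column totals (sum of x and sum of x squared).
sum_x = sum(xrays)
sum_x2 = sum(x * x for x in xrays)
sd_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(mean, variance, round(sd, 1), round(sd_shortcut, 1))  # 9.0 5.0 2.2 2.2
```

Both routes give the same standard deviation, and Python's `statistics.stdev`, which also divides by n - 1, agrees with them.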

Properties of the standard deviation

• Unaffected if same constant is added to (or subtracted from) every observation

• If each value is multiplied (or divided) by a constant, the standard deviation is also multiplied (or divided) by the same constant
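Both properties can be checked numerically. The sketch below reuses the X-ray counts from the example; the constants 100 and 3 are arbitrary values chosen for the illustration.

```python
import math
import statistics

xrays = [10, 8, 6, 12, 9]
sd = statistics.stdev(xrays)

# Add the same constant to every observation: the spread does not change.
shifted = [x + 100 for x in xrays]
sd_shifted = statistics.stdev(shifted)

# Multiply every observation by a constant: the spread is multiplied too.
scaled = [x * 3 for x in xrays]
sd_scaled = statistics.stdev(scaled)

print(math.isclose(sd_shifted, sd))      # True
print(math.isclose(sd_scaled, 3 * sd))   # True
```

`math.isclose` is used instead of `==` because floating-point rounding can introduce tiny differences in the last digits.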

Need for a measure of variation that is independent of the measurement unit

• The standard deviation is expressed in the same unit as the mean
  - e.g., 3 cm for height, 1.4 kg for weight
• Sometimes, it is useful to express variability as a percentage of the mean
  - e.g., in the case of laboratory tests, the experimental variation is ± 5% of the mean

The coefficient of variation

• Calculate the standard deviation
• Divide by the mean
  - The standard deviation becomes “unit free”
• Coefficient of variation (%) = [SD / Mean] x 100 (pure number)
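A minimal Python sketch of the formula follows. It reuses the ten heights from the range example and expresses them in centimetres and in metres to illustrate that the coefficient of variation does not depend on the unit. The function name `coefficient_of_variation` is made up for this example.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
heights_m = [h / 100 for h in heights_cm]  # same heights, different unit

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The standard deviations differ by a factor of 100, the coefficients do not.
print(math.isclose(cv_cm, cv_m))  # True
```

Changing the unit rescales the standard deviation and the mean by the same factor, so their ratio stays the same.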

Uses of the coefficient of variation

• Compare the variability in two variables studied which are measured in different units
  - Height (cm) and weight (kg)
• Compare the variability in two groups with widely different mean values
  - Incomes of persons in different socio-economic groups

A summary of measures of dispersion

Measure                Advantages                         Disadvantages
Range                  • Obvious                          • Uses only 2 observations
                       • Easy to calculate                • Increases with the sample size
                                                          • Can be distorted by outliers
Inter-quartile range   • Not affected by extreme values   • Uses only 2 observations
                                                          • Not amenable for further statistical treatment
Standard deviation     • Uses every value                 • Highly influenced by extreme values
                       • Suitable for further analysis

Choosing a measure of central tendency and a measure of dispersion

Type of distribution         Measure of central tendency   Measure of dispersion
Normal                       • Mean                        • Standard deviation
Skewed                       • Median                      • Inter-quartile range
Exponential or logarithmic   • Geometric mean              • Consult with the statistician

Key messages

• Report the range but be aware of its limitations

• Report the inter-quartile range when you use the median

• Report the standard deviation when you use a mean