08 measures of dispersion

Research Methodology Workshop
Dr. Nimit Chowdhary, Professor
Saturday, April 14, 2012
© Dr. Nimit Chowdhary

Objective: to be able to compute four common measures of variability
Range
Interquartile range
Standard deviation
Variance



The range is the difference between the largest and the smallest values in a set of values.

Example: 2 4 9 5 7 3

Smallest = 2, largest = 9
Range = Largest – Smallest = 9 – 2 = 7

(+) Easy to calculate
(-) Relies on only two values
(-) Ignores variability of all middle values

Data set A:

1 2 3 4 5 6 7 8 9

Range= 9 – 1 = 8

Data set B:

1 1 1 1 1 1 1 1 9

Range= 9 – 1 = 8
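The comparison of data sets A and B can be checked with a short Python sketch. The helper name `value_range` is an illustrative choice, not something from the slides.

```python
# Data sets A and B as listed above.
data_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_b = [1, 1, 1, 1, 1, 1, 1, 1, 9]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


range_a = value_range(data_a)
range_b = value_range(data_b)

# Both data sets report the same range even though their middle values differ.
print(range_a, range_b)  # 8 8
```

The two sets get the same range although B is almost constant, which is the weakness noted above: the range looks only at the two extreme values.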


The interquartile range is a measure of variability, based on dividing the dataset into quartiles.

Quartiles divide an ordered data set into four equal parts.

The values that divide each part are called the first, second and third quartiles.

First, second and third quartiles are denoted by Q1, Q2 and Q3 respectively.


Arrange the data set in numerical order.
Define the quartiles:
The second quartile Q2 is the median of the entire data set.
Q1 is the median of the data below Q2.
Q3 is the median of the data above Q2.
The interquartile range is

IQR = Q3 – Q1


Ordered data set: 0 1 2 3 4 5 6 7 8 9

Median is Q2
Q2 = (4 + 5)/2 = 4.5
Q1 = 2 (median of 0 1 2 3 4)
Q3 = 7 (median of 5 6 7 8 9)

Interquartile range = Q3 – Q1 = 7 – 2 = 5
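The median-of-halves procedure can be sketched in Python as follows. The function name `quartiles` is hypothetical, and leaving the middle value out of both halves when the number of values is odd is an assumption, since the slides only show an even-sized data set. Other quartile conventions (for example the ones used by library percentile functions) can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # data below Q2
    upper = ordered[half + n % 2:]   # data above Q2 (skips the middle value if n is odd)
    return median(lower), median(ordered), median(upper)


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 2 4.5 7 5
```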


IQR ignores outliers!

Ordered data set: 0 1 2 3 4 5 6 7 8 999

Median is Q2
Q2 = (4 + 5)/2 = 4.5
Q1 = 2
Q3 = 7

Interquartile range = Q3 – Q1 = 7 – 2 = 5
Range = 999 – 0 = 999

While the range is strongly influenced by outliers, the IQR is not.
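A small Python sketch of this outlier comparison, using the same median-of-halves rule for the quartiles. The helper names `iqr` and `value_range` are illustrative.

```python
from statistics import median


def iqr(values):
    """Interquartile range with Q1 and Q3 taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[n // 2 + n % 2:]
    return median(upper) - median(lower)


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


original = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [0, 1, 2, 3, 4, 5, 6, 7, 8, 999]

print(value_range(original), iqr(original))          # 9 5
print(value_range(with_outlier), iqr(with_outlier))  # 999 5
```

Replacing 9 with 999 changes the range from 9 to 999, while the IQR stays at 5.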

Variance is the average squared deviation from the mean

$\sigma^2 = \sum (X_i - \mu)^2 / N$

where
$\sigma^2$ = variance
$\sum$ = summation symbol
$X_i$ = element i from the data set
$\mu$ = mean of the data set
$N$ = number of elements in the data set


Find the variance of the following: 0, 1, 5, 6

Number of entries: $N = 4$
Mean: $\mu = \sum X / N$
Deviation sum of squares: $SS = \sum (X - \mu)^2$
Variance: $\sigma^2 = \sum (X - \mu)^2 / N$


Find the variance of the following: 0, 1, 5, 6

Mean: $\mu = \sum X / N = (0 + 1 + 5 + 6)/4 = 12/4 = 3$

Deviation sum of squares: $SS = \sum (X - \mu)^2$
$= (0 - 3)^2 + (1 - 3)^2 + (5 - 3)^2 + (6 - 3)^2$
$= 9 + 4 + 4 + 9 = 26$

Variance: $\sigma^2 = \sum (X_i - \mu)^2 / N = 26/4 = 6.5$
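The worked example for 0, 1, 5, 6 can be reproduced step by step in Python. This is a minimal sketch; the variable names are illustrative, and `statistics.pvariance` from the standard library is used only as a cross-check.

```python
from statistics import pvariance

data = [0, 1, 5, 6]

n = len(data)                             # N = 4
mean = sum(data) / n                      # 12 / 4 = 3.0
ss = sum((x - mean) ** 2 for x in data)   # 9 + 4 + 4 + 9 = 26.0
variance = ss / n                         # 26 / 4 = 6.5

print(mean, ss, variance)  # 3.0 26.0 6.5

# The standard library's population variance divides by N as well.
print(pvariance(data))
```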

The standard deviation is the square root of the variance.

$\text{Standard deviation} = \sigma = \sqrt{\sum (X_i - \mu)^2 / N}$
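For the data set 0, 1, 5, 6, whose variance is 6.5, the standard deviation can be sketched in Python as below. The variable names are illustrative, and `statistics.pstdev` is used as a cross-check because it divides by N.

```python
import math
from statistics import pstdev

data = [0, 1, 5, 6]

mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 6.5
std_dev = math.sqrt(variance)                              # square root of the variance

print(round(std_dev, 2))  # 2.55

# Cross-check against the standard library's population standard deviation.
print(round(pstdev(data), 2))
```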


What happens to variability when you add a constant to each value in the data set?

All measures of variability- range, interquartile range, variance, and standard deviation- stay the same
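This claim can be checked numerically. The sketch below adds an arbitrary constant (100, chosen only for illustration) to every value and recomputes the four measures. The function name `dispersion_summary` is hypothetical, and the quartiles use the median-of-halves rule.

```python
from statistics import median, pstdev, pvariance


def dispersion_summary(values):
    """Return range, IQR, population variance and population standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[n // 2 + n % 2:])
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "variance": pvariance(ordered),
        "std_dev": pstdev(ordered),
    }


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
shifted = [x + 100 for x in data]   # add the same constant to each value

before = dispersion_summary(data)
after = dispersion_summary(shifted)
print(before)
print(after)
```

Every deviation from the mean is unchanged by the shift, because the mean moves by the same constant as each value.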


The variance and standard deviation are the most common and useful measures of variability.

These two measures provide information about how the data vary about the mean.


When the data are clustered about the mean, the variance and standard deviation will be relatively small.

When the data are widely scattered about the mean, the variance and standard deviation will be relatively large.


The sample variance is an approximate average of the squared deviations of the data values from the sample mean.

The sample variance is denoted by $s^2$ and is computed from the following formula:

$s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$

What is the variance for the following sample values?

3 8 6 14 0 11

NOTE: Do not let the formula intimidate you. We will build a table to help with the computations.


NOTE: The mean = (3 + 8 + 6 + 14 + 0 + 11)/6 = 42/6 = 7.

x      x - mean   (x - mean)^2
3        -4          16
8         1           1
6        -1           1
14        7          49
0        -7          49
11        4          16
Sum       0         132

$s^2$ = 132/(6 – 1) = 132/5 = 26.4
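The table-based computation for the sample 3 8 6 14 0 11 can be sketched in Python. The column layout and variable names are illustrative, and `statistics.variance` (which divides by n – 1) is used only as a cross-check.

```python
from statistics import variance

sample = [3, 8, 6, 14, 0, 11]

n = len(sample)
mean = sum(sample) / n   # 42 / 6 = 7.0

# Print one row per observation: value, deviation, squared deviation.
print(f"{'x':>4} {'x - mean':>9} {'squared':>8}")
ss = 0.0
for x in sample:
    dev = x - mean
    ss += dev ** 2
    print(f"{x:>4} {dev:>9.1f} {dev ** 2:>8.1f}")

s2 = ss / (n - 1)        # 132 / 5
print("sum of squares:", ss)
print("sample variance:", s2)

# Cross-check against the standard library's sample variance.
print(variance(sample))
```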

In the previous example, observe that the variance is large relative to the size of the data values.

This can be observed from the plot which shows that the data values are very much spread out about the mean value of 7.


The sample standard deviation is the positive square root of the variance.

NOTE: the standard deviation has the same unit as the variable.

Example: The sample standard deviation for the previous example is $s = \sqrt{26.4} \approx 5.14$.
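For the sample 3 8 6 14 0 11, whose sample variance is 26.4, the standard deviation can be sketched as below. The variable names are illustrative, and `statistics.stdev` is the standard library's sample standard deviation (divisor n – 1).

```python
import math
from statistics import stdev

sample = [3, 8, 6, 14, 0, 11]

mean = sum(sample) / len(sample)
s2 = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)  # sample variance, 26.4
s = math.sqrt(s2)                                              # positive square root

print(round(s, 2))  # 5.14

# Cross-check against the standard library.
print(round(stdev(sample), 2))
```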

If all of the observations have the same value, the sample variance (standard deviation) will be zero. That is, there is no variability in the data set.

The variance (standard deviation) is influenced by outliers in the data set.

The unit for the standard deviation is the same as that for the raw data.

Thus it is preferred to use the standard deviation rather than the variance as the measure of variability.
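Two of the properties above, zero variability for identical observations and sensitivity to outliers, can be illustrated with a short sketch. The data sets `constant` and `with_outlier` are made up for illustration and do not come from the slides.

```python
from statistics import stdev, variance

constant = [7, 7, 7, 7, 7, 7]        # hypothetical data: all observations equal
sample = [3, 8, 6, 14, 0, 11]        # sample used in the earlier example
with_outlier = sample + [100]        # hypothetical outlier appended

# Identical observations give zero variance and zero standard deviation.
print(variance(constant), stdev(constant))

# A single extreme value inflates the standard deviation.
print(stdev(sample))
print(stdev(with_outlier))
```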


The population variance is the average of the squared deviations of the data values from the population mean.

The population variance is denoted by $\sigma^2$ and is computed from the following formula:

$\sigma^2 = \sum (X_i - \mu)^2 / N$

The population standard deviation is the positive square root of the population variance.

The population standard deviation is denoted by $\sigma$ and is computed from the following formula:

$\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$


The coefficient of variation (CV) allows us to compare the variation of two (or more) different variables.

Explanation of the term – sample coefficient of variation: the sample coefficient of variation is defined as the sample standard deviation divided by the sample mean of the data set.

Usually, the result is expressed as a percentage.

NOTE: The sample coefficient of variation standardizes the variation by dividing it by the sample mean.


The coefficient of variation has no units, since the standard deviation and the mean have the same units and these cancel each other out.

Because of this property, we can use this measure to compare the variations for different variables with different units.


The mean number of tourists arriving at a monument over a four-month period was 90, and the standard deviation was 5. The average expenditure made at the site was Rs.5,400, and the standard deviation was Rs. 775. Compare the variations of the two variables.


CV (tourists) = 5/90 × 100% ≈ 5.6%
CV (expenditure) = 775/5,400 × 100% ≈ 14.4%

Since the CV is larger for the expenditure, there is more variability in the recorded expenditure than in the number of tourists.
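The comparison for the monument example (tourists: mean 90, standard deviation 5; expenditure: mean Rs. 5,400, standard deviation Rs. 775) can be sketched in Python. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100


cv_tourists = coefficient_of_variation(5, 90)
cv_expenditure = coefficient_of_variation(775, 5400)

print(f"CV tourists:    {cv_tourists:.2f}%")     # 5.56%
print(f"CV expenditure: {cv_expenditure:.2f}%")  # 14.35%

# The variable with the larger CV shows more relative variability.
more_variable = "expenditure" if cv_expenditure > cv_tourists else "tourists"
print(more_variable)  # expenditure
```

Because each CV is a unitless ratio, the count of tourists and the rupee amounts can be compared directly.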

Explanation of the term – population coefficient of variation: the population coefficient of variation is defined as the population standard deviation divided by the population mean of the data set.

NOTE: The population CV has the same properties as the sample CV.


Different measures of dispersion
Range
Interquartile range
Variance
Standard deviation

Concept of the coefficient of variation