Chapter 04 - Measures of Dispersion and Skewness.pdf

Measures of Dispersion, Skewness, and Kurtosis


• Descriptive summary measure

• Helps characterize data

• Variation of observations

• Determine the degree of dispersion of observations about the center of the distribution

• Absolute dispersion

• Same unit as the observations

• Relative dispersion

• No unit

• Measures of dispersion cannot be negative

• Smallest possible value is zero

Absolute Dispersion

Range

• Simplest and easiest measure of dispersion to use

• Difference between the highest and the lowest observation

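$$\text{Range} = \text{maximum} - \text{minimum}$$

• For grouped data: upper class limit of the highest class interval (UCL_HCI) minus lower class limit of the lowest class interval (LCL_LCI)

$$\text{Range} = UCL_{HCI} - LCL_{LCI}$$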

Disadvantages

• Description of data is not comprehensive

• Affected by outliers

• Smaller for small samples; larger for large samples

• Cannot be computed when there is an open-ended class interval

Advantages

• Simple

• Easy to compute

• Easy to understand
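As a minimal sketch, the range of ungrouped data can be computed directly. The function name `data_range` is hypothetical, the values are the barangay counts listed later in this chapter, and the extra value 40 is an invented outlier used only to show how strongly one extreme value affects the range.

```python
def data_range(values):
    """Range = maximum - minimum (ungrouped data)."""
    return max(values) - min(values)


# Number of underweight children per barangay (from the exercise in this chapter).
counts = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]

print(data_range(counts))          # 7
print(data_range(counts + [40]))   # 37, a single invented outlier dominates
```

Only the two extreme observations enter the calculation, which is why the description of the data is not comprehensive.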

Variance

• Describes the variation of the measurements

• Average squared difference of each observation from the mean

• May also be used as a measure of how good the mean is as a measure of central tendency

• Unit of the variance is the squared unit of the observations

• People tend to use the standard deviation (the positive square root of the variance) for easier interpretation

Population Variance

• Denoted by σ²

• N elements

• Parameter

• Cannot be computed using sample data

Sample Variance

• Denoted by s² (also written S²)

• n elements

• Statistic

• Estimates the value of the population variance

• Utilizes every observation

• Affected by outliers; extreme values inflate the standard deviation

• Cannot be computed when there are open-ended intervals

• Adding a constant c to, or subtracting it from, each observation yields the same standard deviation as the original data set

• Multiplying or dividing each observation by a constant multiplies or divides the standard deviation by the absolute value of that constant
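The last two properties can be checked numerically. The sketch below uses the true-false exam data from this chapter; the constants 10 and -3 are arbitrary choices made for illustration, and Python's `statistics.stdev` is used for the sample standard deviation.

```python
import math
import statistics

# Incorrect answers on a true-false exam (data from this chapter).
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

s = statistics.stdev(errors)                               # sample standard deviation
s_shifted = statistics.stdev([x + 10 for x in errors])     # add c = 10 to every value
s_scaled = statistics.stdev([-3 * x for x in errors])      # multiply every value by -3

print(math.isclose(s_shifted, s))            # shifting leaves s unchanged
print(math.isclose(s_scaled, abs(-3) * s))   # scaling multiplies s by |constant|
```

A negative multiplier is used on purpose: the standard deviation stays non-negative, so it scales by the absolute value of the constant.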

Relative Dispersion

• Compares the variability of two or more data sets even if they have different means or different units of measurement

• Coefficient of variation (CV): ratio of the standard deviation to the mean, expressed as a percentage

• Small CV means less variability; large CV means greater variability

• Not to be used when the mean is 0 or negative

• A sample survey in a certain province showed the number of underweight children under five years of age in each barangay: 3 5 6 4 7 8 6 9 10 4 6 7 5 8 9 8 3 4 5 5

• Given the frequency distribution table of scores

The number of incorrect answers on a true-false exam for a random sample of 20 students was recorded as follows: 2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2.

The frequency distribution of the scores of 200 students in a college entrance exam is given below.

Scores     Freq.   <CFD    LCB     UCB
59 – 62      2        2    58.5    62.5
63 – 66     12       14    62.5    66.5
67 – 70     24       38    66.5    70.5
71 – 74     46       84    70.5    74.5
75 – 78     62      146    74.5    78.5
79 – 82     36      182    78.5    82.5
83 – 86     16      198    82.5    86.5
87 – 90      2      200    86.5    90.5

<CFD = less-than cumulative frequency; LCB = lower class boundary; UCB = upper class boundary

• Relying solely on the mean and standard deviation may be misleading

• It is possible for two data sets to have the same mean and standard deviation, yet different shapes

• If it is possible to divide the histogram at the center into two identical halves, where each half is a mirror image of the other, then the distribution is symmetric. Otherwise, it is skewed.

Positively Skewed

• Skewed to the right

• Values concentrated at the left

• Upper tail stretches out more than the lower tail

Negatively Skewed

• Skewed to the left

• Values concentrated at the right

• Lower tail stretches out more than the upper tail

Measure of Skewness

• Single value that indicates the degree and direction of asymmetry

• Denoted by Sk

Sk = 0: symmetric

Sk > 0: positively skewed

Sk < 0: negatively skewed

• To determine the degree of skewness, use |Sk| (the magnitude of Sk)

• If |Sk| is far from 0, the distribution is seriously skewed

• Most commonly used measures

• Pearson’s first and second coefficients of skewness

• Coefficient of skewness based on third moment

• Coefficient of skewness based on the quartiles

• Pearson’s coefficients are based on the relationships among the mean, median, and mode

• The signs of these measures depend only on the sign of the numerator because S is never negative

• The problems with Pearson’s first coefficient of skewness are the same as the problems of using the mode

• The coefficient of skewness based on the quartiles follows from the definition of quartiles

• Around 25 percent of the observations fall between Q1 and the median (Md)

• Around 25 percent of the observations fall between the median and Q3

• Symmetric distribution: distance between Q1 and Md = distance between Md and Q3

• Skewed distribution

• Positively skewed: Md is closer to Q1

• Negatively skewed: Md is closer to Q3

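Kurtosis

• Term coined by Karl Pearson

• From the Greek word kurtos, which means convex

• Describes the shape of the hump of a relative frequency distribution compared to the normal distribution

• Three classifications

• Mesokurtic: hump the same as that of the normal curve

• Leptokurtic: hump more peaked than that of the normal curve

• Platykurtic: hump flatter than that of the normal curve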
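The slides give no formula for kurtosis. One common choice, assumed here and not taken from the source, is the moment-based coefficient m4 / m2², which equals 3 for a normal distribution. The sketch below classifies a data set against that benchmark; the function names, the tolerance of 0.1, and both toy data sets are hypothetical.

```python
def moment_kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def classify_kurtosis(values, tol=0.1):
    """Compare against 3, the value for a normal curve (tol is an arbitrary choice)."""
    k = moment_kurtosis(values)
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]             # evenly spread values
peaked = [0] * 8 + [10, -10]       # most values at the center, two far out

print(classify_kurtosis(flat))     # platykurtic
print(classify_kurtosis(peaked))   # leptokurtic
```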

Boxplot

• Graph

• Displays the following

• Location

• Spread

• Symmetry

• Extremes

• Outliers

1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.

2. Put a vertical line at the median, across the interior of the rectangle.

3. Compute the interquartile range, lower fence (FL), and upper fence (FU).

4. Locate the smallest and largest values within the intervals [FL, Q1] and [Q3, FU], respectively. Draw a line from each of these values to the nearest quartile.

5. Values falling outside the fences are considered outliers, denoted by “x”.

• Construct the boxplot for the following data set:

1 15 21 22 24

10 18 22 23 25

14 20 22 24 28
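The numbers needed for this boxplot can be sketched in a few lines. Two assumptions are made because the slides do not state them: quartiles are located with the (n + 1)p position rule (the `"exclusive"` method of Python's `statistics.quantiles`), and the fences are placed 1.5 × IQR beyond the quartiles. A different quartile convention may give slightly different values.

```python
import statistics

data = sorted([1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28])

# Steps 1-2: quartiles and median (assumed (n + 1)p position rule).
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

# Step 3: interquartile range and fences (assumed 1.5 x IQR rule).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 4: whiskers end at the most extreme values still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Step 5: anything beyond the fences is marked as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr)             # 15.0 22.0 24.0 9.0
print(lower_fence, upper_fence)        # 1.5 37.5
print(whisker_low, whisker_high)       # 10 28
print(outliers)                        # [1]
```

Under these assumptions the box runs from 15 to 24 with the median line at 22, the whiskers reach 10 and 28, and the value 1 is plotted as an outlier.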

Definition

• Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

• Sample Variance

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

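Computational Formula

• Population Variance

$$\sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}{N^2}$$

• Sample Variance

$$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}$$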
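A short sketch can confirm that the definition and the computational formula for the sample variance agree. It uses the true-false exam data from this chapter; the variable names are illustrative only.

```python
import math

# Incorrect answers on a true-false exam (data from this chapter).
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]
n = len(errors)
mean = sum(errors) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in errors) / (n - 1)

# Computational formula: needs only the sum and the sum of squares.
sum_x = sum(errors)
sum_x2 = sum(x * x for x in errors)
s2_computational = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

print(sum_x, sum_x2)                                   # 47 155
print(math.isclose(s2_definition, s2_computational))   # True
```

The computational form avoids computing the mean first, which is convenient by hand; both forms give 891/380 for this data set.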

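Definition

• Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}$$

• Sample Variance

$$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$$

• Where, for grouped data, f_i is the frequency and X_i is the class mark of the i-th class, and k is the number of classes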

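Computational Formula

• Population Variance

$$\sigma^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left( \sum_{i=1}^{k} f_i X_i \right)^2}{N^2}$$

• Sample Variance

$$S^2 = \frac{n \sum_{i=1}^{k} f_i X_i^2 - \left( \sum_{i=1}^{k} f_i X_i \right)^2}{n(n - 1)}$$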
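The frequency-weighted formulas can be applied to the entrance exam table from earlier in this chapter. The sketch below assumes that X_i is the class mark (midpoint of the class limits) and treats the 200 scores as a sample, which the slides do not state explicitly.

```python
import math

# (lower class limit, upper class limit, frequency) from the entrance exam table.
classes = [(59, 62, 2), (63, 66, 12), (67, 70, 24), (71, 74, 46),
           (75, 78, 62), (79, 82, 36), (83, 86, 16), (87, 90, 2)]

marks = [(low + high) / 2 for low, high, _ in classes]   # class marks X_i
freqs = [f for _, _, f in classes]                       # frequencies f_i
n = sum(freqs)

sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
mean = sum_fx / n

# Definition and computational formula for the sample variance (grouped data).
s2_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
s2_computational = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

print(n, mean)                                          # 200 75.22
print(math.isclose(s2_definition, s2_computational))    # True
```

Because every score in a class is replaced by its class mark, the result approximates the variance that the raw scores would give.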

Population CV

$$CV = \frac{\sigma}{\mu} \times 100\%$$

• Where

• σ is the population standard deviation

• μ is the population mean

Sample CV

$$CV = \frac{s}{\bar{X}} \times 100\%$$

• Where

• s is the sample standard deviation

• $\bar{X}$ is the sample mean
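As an illustration, the sample CV can be used to compare the two ungrouped data sets given in this chapter, which measure different things. The helper name `sample_cv` is hypothetical, and raising an error for a non-positive mean is one possible way to enforce the restriction noted earlier.

```python
import statistics


def sample_cv(values):
    """Sample coefficient of variation, in percent."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("CV should not be used when the mean is 0 or negative")
    return statistics.stdev(values) / mean * 100


# Underweight children per barangay, and incorrect answers per student.
counts = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

print(round(sample_cv(counts), 1))
print(round(sample_cv(errors), 1))
```

The exam data has the smaller standard deviation but the larger CV, so relative to its mean it is the more variable of the two.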

First Coefficient of Skewness

$$Sk_1 = \frac{\bar{X} - Mo}{S}$$

Second Coefficient of Skewness

$$Sk_2 = \frac{3(\bar{X} - Md)}{S}$$

Where $\bar{X}$ = sample mean; Md = sample median; Mo = sample mode; S = sample standard deviation

Coefficient of Skewness Based on the Quartiles

$$Sk_4 = \frac{(Q_3 - Md) - (Md - Q_1)}{Q_3 - Q_1} = \frac{Q_1 + Q_3 - 2Md}{Q_3 - Q_1}$$
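The three coefficients can be sketched with Python's `statistics` module and applied to the true-false exam data from this chapter. The function names are hypothetical, and the quartiles use the module's default `"exclusive"` method, which is an assumption since the slides do not name a quartile convention.

```python
import statistics


def pearson_first(values):
    """Sk1 = (mean - mode) / S."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)


def pearson_second(values):
    """Sk2 = 3 (mean - median) / S."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


def quartile_skewness(values):
    """Sk4 = (Q1 + Q3 - 2 Md) / (Q3 - Q1), with assumed quartile convention."""
    q1, md, q3 = statistics.quantiles(values, n=4)
    return (q1 + q3 - 2 * md) / (q3 - q1)


# Incorrect answers on a true-false exam (data from this chapter).
errors = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

print(pearson_first(errors))
print(pearson_second(errors))
print(quartile_skewness(errors))
```

On this data set the first coefficient is negative (the mean 2.35 lies below the mode 3) while the second is positive (the mean lies above the median 2), and the quartile-based coefficient is 0. The disagreement illustrates the problems of relying on the mode.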