BS-2
Uploaded by imtiazbulbul
1/27/2015
Lecture 2: Methods for Describing data through
Numerical Measures
North South University School of BusinessSlide 1 of 76
Outline
• Measures of central tendency and dispersion
• Characteristics, uses, advantages, and disadvantages of each measure of location and dispersion
• Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations
• Quartiles, deciles, and percentiles
• Box plots
• Coefficient of skewness and coefficient of variation
• Scatter diagram
• Contingency table
Numerical Ways of Describing Data
• Measures of location
• Measures of dispersion
Parameter and Statistic
A parameter is a measurable characteristic of a population.
A statistic is a measurable characteristic of a sample.
Measures of Location and Dispersion
• Measures for Population Data
• Measures for Sample Data
• Measures for Ungrouped data
• Measures for grouped data
Measures of Location
• Mean (arithmetic, weighted, geometric)
• Median
• Mode
Arithmetic Mean
The arithmetic mean is the most widely used measure of location and shows the central value of the data.
It is calculated by summing the values and dividing by the number of values.
Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:
$$\mu = \frac{\sum X}{N}$$
where µ is the population mean, N is the total number of observations, X is a particular value, and Σ indicates the operation of adding.
Example 1
The Kiers family owns four cars. The following is the current mileage on each of the four cars: 56,000, 42,000, 23,000, 73,000.
Find the mean mileage for the cars.
$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$
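The calculation in Example 1 can be checked with a short Python sketch. The function name `arithmetic_mean` is illustrative only and is not part of the lecture.

```python
def arithmetic_mean(values):
    """Sum the values and divide by the number of values."""
    return sum(values) / len(values)

# Current mileage of the four Kiers family cars (Example 1).
mileage = [56_000, 42_000, 23_000, 73_000]
population_mean = arithmetic_mean(mileage)
print(population_mean)  # 48500.0
```

The same function applies to a sample; only the interpretation (µ versus the sample mean) changes.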
Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$$\bar{X} = \frac{\sum X}{n}$$
where n is the total number of values in the sample.
Example 2
A sample of five executives received the following bonus last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0
$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$
Properties of the Arithmetic Mean
• Every set of interval-level and ratio-level data has a mean.
• All the values are included in computing the mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small data values.
• The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.
Example 3
Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:
$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$
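The zero-sum property of deviations from the mean can be verified numerically. This is a minimal sketch using the three values from Example 3; the variable names are illustrative.

```python
values = [3, 8, 4]
mean = sum(values) / len(values)

# Deviation of each value from the mean: -2.0, 3.0, -1.0
deviations = [x - mean for x in values]
total_deviation = sum(deviations)
print(mean, total_deviation)  # 5.0 0.0
```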
Weighted Mean
The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$
Example 4
During a one hour period on a hot Saturday afternoon cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.
$$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$$
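A minimal Python sketch of the weighted mean, using the prices that appear in the slide's computation (5 drinks at $0.50, 15 at $0.75, 15 at $0.90, 15 at $1.15). The function name `weighted_mean` is an illustrative assumption.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w * x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [0.50, 0.75, 0.90, 1.15]  # price per drink, as used in the computation
drinks_sold = [5, 15, 15, 15]      # weights: number of drinks sold at each price
mean_price = weighted_mean(prices, drinks_sold)
print(round(mean_price, 2))  # 0.89
```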
The Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest.
There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers and is found at the (n+1)/2 ranked observation.
The median (cont’d)
The ages for a sample of five college students are: 21, 25, 19, 20, 22.
Arranging the data in ascending order gives: 19, 20, 21, 22, 25.
Thus the median is 21.
Example 5
The heights of four basketball players, in inches, are: 76, 73, 80, 75.
Arranging the data in ascending order gives: 73, 75, 76, 80.
The median is found at the (n+1)/2 = (4+1)/2 = 2.5th data point.
Thus the median is 75.5.
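Both median examples (odd and even number of values) can be reproduced with the sketch below. The helper `median` is written out for illustration; it is not taken from the lecture.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [21, 25, 19, 20, 22]   # five college students
heights = [76, 73, 80, 75]    # four basketball players (Example 5)
print(median(ages))     # 21
print(median(heights))  # 75.5
```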
Properties of the Median
There is a unique median for each data set.
It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.
It can be computed for ratio-level, interval-level, and ordinal-level data.
The Mode
The mode is another measure of location and represents the value of the observation that appears most frequently.
Example 6
The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of
81 occurs the most often, it is the mode.
Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes,
trimodal, and so on.
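A short Python sketch for finding the mode (or modes) of the exam scores in Example 6. The function name `modes` is illustrative; it returns every value tied for the highest frequency, so bimodal data yields two values.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
print(modes(scores))           # [81]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (a bimodal data set)
```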
The Relative Positions of the Mean, Median, and Mode
Symmetric distribution: A distribution having the same shape on either side of the center.
Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution.
Can be positively or negatively skewed, or bimodal.
The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Zero skewness: Mean = Median = Mode
The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution
• Positively skewed: Mean and median are to the right of the mode.
Mean > Median > Mode
The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution
• Negatively skewed: Mean and median are to the left of the mode.
Mean < Median < Mode
Geometric Mean
The geometric mean (GM) of a set of n positive numbers is defined as the nth root of the product of the n numbers. The formula is:
$$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$$
The geometric mean is used to average percents, indexes, and relatives.
Example 7
The interest rates on three bonds were 5, 21, and 4 percent.
The arithmetic mean is (5+21+4)/3 = 10.0.
The geometric mean is
$$GM = \sqrt[3]{(5)(21)(4)} = 7.49$$
The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
Example 8
• The return on investment earned by Atkins Construction Company for four successive years was: 30%, 20%, -40%, and 200%. What is the geometric mean rate of return on investment?
Each return is first converted to a growth factor (1 + rate), so -40% becomes 0.6 and 200% becomes 3.0:
$$GM = \sqrt[n]{X_1 X_2 \cdots X_n} = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294$$
Subtracting 1 gives the average rate of return: 29.4%.
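Examples 7 and 8 can be reproduced with a small Python sketch. The function name `geometric_mean` is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

# Example 7: interest rates on three bonds, in percent.
bond_rates = [5, 21, 4]
gm_bonds = geometric_mean(bond_rates)
print(round(gm_bonds, 2))  # 7.49

# Example 8: returns of 30%, 20%, -40%, 200% expressed as growth factors.
growth_factors = [1.3, 1.2, 0.6, 3.0]
average_return = geometric_mean(growth_factors) - 1
print(round(average_return, 3))  # 0.294
```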
Geometric Mean (cont’d)
Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.
$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$
[Figure: Growth in Sales 1999-2004; x-axis: Year; y-axis: Sales in Millions ($)]
Example 9
The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
$$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = 0.0127$$
The value 0.0127 indicates that the average annual growth over the 8-year period was 1.27%.
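The growth-rate form of the geometric mean in Example 9 can be sketched as follows. The function name `average_growth_rate` and its parameter names are assumptions made for illustration.

```python
def average_growth_rate(start_value, end_value, periods):
    """Average per-period growth: nth root of (end / start), minus 1."""
    return (end_value / start_value) ** (1 / periods) - 1

# Female college enrollment: 755,000 in 1992 to 835,000 in 2000 (8 periods).
rate = average_growth_rate(755_000, 835_000, 8)
print(round(rate, 4))  # 0.0127
```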
Measures of Dispersion
Dispersion refers to the spread or variability in the data.
Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.
Range = Largest value – Smallest value
Example 10
The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.
-8.1    3.2    5.9    8.1   12.3
-5.1    4.1    6.3    9.2   13.3
-3.1    4.6    7.9    9.5   14.0
-1.4    4.8    7.9    9.7   15.0
 1.2    5.7    8.0   10.3   22.1
Highest value: 22.1 Lowest value: -8.1
Range = Highest value – lowest value = 22.1 - (-8.1) = 30.2
Mean Absolute Deviation (MAD)
MAD: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
The main features:
• All values are used in the calculation.
• It is not unduly influenced by large or small values.
• The absolute values are difficult to manipulate.
$$MAD = \frac{\sum |X - \bar{X}|}{n}$$
Example 11
The weights of a sample of crates containing books for the bookstore (in pounds) are: 103, 97, 101, 106, 103.
Find the mean deviation.
$$\bar{X} = 102$$
The mean deviation is:
$$MAD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$$
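A minimal Python sketch of the mean absolute deviation for the crate weights in Example 11. The function name is illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

crate_weights = [103, 97, 101, 106, 103]  # pounds
mad = mean_absolute_deviation(crate_weights)
print(mad)  # 2.4
```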
Variance and Standard Deviation
Variance: the arithmetic mean of the squared deviations from the mean.
Standard deviation: The square root of the variance.
Population Variance
The major characteristics:
• Not as heavily influenced by extreme values as the range.
• The units are awkward, the square of the original units.
• All values are used in the calculation.
Variance and standard deviation
Population variance formula:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
X is the value of an observation in the population
µ is the arithmetic mean of the population
N is the number of observations in the population
Population standard deviation formula:
$$\sigma = \sqrt{\sigma^2}$$
Example 10 (revisited)
In Example 10, the variance and standard deviation are:
$$\mu = 6.62$$
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227$$
$$\sigma = \sqrt{42.227} = 6.498$$
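The range, population variance, and population standard deviation for the 25 Return on Equity values of Example 10 can be checked with the sketch below. Variable names are illustrative; the data are the 25 values listed in Example 10.

```python
import math

# Return on Equity of the 25 companies (Example 10).
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

data_range = max(roe) - min(roe)
mu = sum(roe) / len(roe)                               # population mean
variance = sum((x - mu) ** 2 for x in roe) / len(roe)  # divide by N, not N - 1
sigma = math.sqrt(variance)

print(round(data_range, 1))                            # 30.2
print(round(mu, 2), round(variance, 3), round(sigma, 3))  # 6.62 42.227 6.498
```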
Sample variance and standard deviation
Sample variance (s²):
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Sample standard deviation (s):
$$s = \sqrt{s^2}$$
Example 12
The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.
Find the sample variance and standard deviation.
$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$
$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$
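A short Python sketch of the sample variance and standard deviation for the wages in Example 12. Note the divisor n - 1, which distinguishes it from the population formula. The function name is illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

wages = [7, 5, 11, 8, 6]  # hourly wages in dollars
s2 = sample_variance(wages)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 5.3 2.3
```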
Chebyshev’s theorem
Chebyshev’s theorem: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least:
$$1 - \frac{1}{k^2}$$
where k is any constant greater than 1.
Chebyshev’s theorem (cont’d)
The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company’s profit-sharing plan was $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?
$$1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$$
About 92%.
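The Chebyshev bound is simple to evaluate for any k greater than 1. The sketch below uses an illustrative function name and reproduces the Dupree Paint result for k = 3.5.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

bound = chebyshev_lower_bound(3.5)
print(round(bound, 2))  # 0.92
```

For comparison, k = 2 gives at least 75% and k = 3 gives at least about 88.9%, weaker than the Empirical Rule because Chebyshev's theorem makes no assumption about the shape of the distribution.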
Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution:
• About 68% of the observations will lie within ±1s of the mean
• About 95% of the observations will lie within ±2s of the mean
• Virtually all (99.7%) the observations will be within ±3s of the mean
Interpretation and Uses of the Standard Deviation
[Figure: Bell-shaped curve showing the relationship between µ and σ, with regions labeled 68%, 95%, and 99.7%]
The Mean of Grouped Data
The mean of a sample of data organized in a frequency distribution is computed by the following formula:
$$\bar{X} = \frac{\sum fM}{n}$$
where M is the class midpoint, f is the class frequency, and n is the total number of observations.
Example 13
A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.
Movies showing    Frequency f    Class midpoint M    (f)(M)
1 up to 3         1              2                   2
3 up to 5         2              4                   8
5 up to 7         3              6                   18
7 up to 9         1              8                   8
9 up to 11        3              10                  30
Total             10                                 66
$$\bar{X} = \frac{\sum fM}{n} = \frac{66}{10} = 6.6$$
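The grouped-data mean of Example 13 can be sketched in Python as below. Representing each class as a `(lower, upper, frequency)` tuple is an assumption made for illustration.

```python
# Each class: (lower limit, upper limit, frequency), from Example 13.
classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

def grouped_mean(classes):
    """Sum of frequency * class midpoint, divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n

mean_movies = grouped_mean(classes)
print(mean_movies)  # 6.6
```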
The Median of Grouped Data
The median of a sample of data organized in a frequency distribution is computed by:
$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$$
where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.
Finding the Median Class
• Construct a cumulative frequency distribution.
• Decide the class that contains the median. The median class is the first class with the value of cumulative frequency at least n/2.
Example 13 (revisited)
Movies showing    Frequency    Cumulative Frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10
Example 13 (cont’d)
From the table, L = 5, n = 10, f = 3, i = 2, CF = 3
$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$$
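The median-class search and interpolation can be combined in one function. This is a sketch under the assumption that classes are given in ascending order as `(lower, upper, frequency)` tuples; the function name is illustrative.

```python
classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

def grouped_median(classes):
    """Median = L + ((n/2 - CF) / f) * i, using the first class whose
    cumulative frequency reaches n/2 as the median class."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # CF: cumulative frequency preceding the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes given")

median_movies = grouped_median(classes)
print(round(median_movies, 2))  # 6.33
```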
The Mode of Grouped Data
The mode for grouped data is approximated by the midpoint of the class with the largest class frequency.
Movies showing    Frequency f    Class midpoint M
1 up to 3         1              2
3 up to 5         2              4
5 up to 7         3              6
7 up to 9         1              8
9 up to 11        3              10
The modes in Example 13 are 6 and 10, so the distribution is bimodal.
The Standard Deviation of Grouped Data
The standard deviation of a sample of data organized in a frequency distribution is computed by the following formula:
$$s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}}$$
Example 13 (revisited)
A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the standard deviation of movies showing.
Movies showing    Frequency f    Class midpoint M    (M - X̄)    f(M - X̄)²
1 up to 3         1              2                   -4.6       21.16
3 up to 5         2              4                   -2.6       13.52
5 up to 7         3              6                   -0.6       1.08
7 up to 9         1              8                   1.4        1.96
9 up to 11        3              10                  3.4        34.68
Total             10                                            72.40
$$s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}} = \sqrt{\frac{72.40}{10 - 1}} = 2.8363$$
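The grouped standard deviation of Example 13 can be checked with the sketch below, again assuming classes are stored as `(lower, upper, frequency)` tuples. The names are illustrative.

```python
import math

classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

def grouped_std(classes):
    """Sample standard deviation from a frequency distribution."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean = sum(f * m for m, f in midpoints) / n
    sum_sq = sum(f * (m - mean) ** 2 for m, f in midpoints)  # 72.40 here
    return math.sqrt(sum_sq / (n - 1))

s_movies = grouped_std(classes)
print(round(s_movies, 4))  # 2.8363
```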
Other Measures of Dispersion
• Quartiles divide a set of observations into four equal parts
• Deciles divide a set of observations into 10 equal parts
• Percentiles divide a set of observations into 100 equal parts
Quartiles
Locate the median (50th percentile), the first quartile (25th percentile), and the third quartile (75th percentile).
Location of a Percentile
$$L_p = (n + 1)\frac{P}{100}$$
where P is the desired percentile
Example 14
Stock prices on twelve consecutive days for a major publicly traded company:
86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85.
[Figure: chart of the stock price on each of the twelve days]
Example 14 (cont’d)
Using the twelve stock prices, we can find the median, 25th, and 75th percentiles as follows:
L75 = (12 + 1)(75/100) = 9.75th observation (Quartile 3)
L50 = (12 + 1)(50/100) = 6.50th observation (Median)
L25 = (12 + 1)(25/100) = 3.25th observation (Quartile 1)
Example 14 (cont’d)
To locate the values, the first step is to organize the data in increasing order:
Position    1   2   3   4   5   6   7   8   9   10  11  12
Price       69  78  79  82  83  84  85  86  88  91  92  96
25th percentile (Q1): Price at 3.25 observation = 79 + .25(82 - 79) = 79.75
50th percentile (Median, Q2): Price at 6.50 observation = 84 + .5(85 - 84) = 84.50
75th percentile (Q3): Price at 9.75 observation = 88 + .75(91 - 88) = 90.25
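The percentile location formula and the interpolation step can be sketched in Python as follows. The function name `percentile` and the handling of locations that fall before the first or after the last observation are assumptions added for illustration; note that other software may use a different percentile convention and return slightly different values.

```python
def percentile(values, p):
    """Value at location L_p = (n + 1) * P / 100, interpolating between
    the two neighbouring ordered observations."""
    ordered = sorted(values)
    location = (len(ordered) + 1) * p / 100  # 1-based position
    whole = int(location)
    fraction = location - whole
    if whole < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if whole >= len(ordered):      # assumption: clamp to the largest value
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]
q1, q2, q3 = (percentile(prices, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 79.75 84.5 90.25
```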
Interquartile Range
The interquartile range is the distance between the third quartile Q3 and the first quartile Q1.
This distance will include the middle 50 percent of the observations.
Interquartile range = Q3 - Q1
Example 15
For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range?
The interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.
Box Plots
A box plot is a graphical display, based on quartiles, that helps to picture a set of data.
Five pieces of data are needed to construct a box plot: the Minimum Value, the First Quartile, the Median, the Third Quartile, and the Maximum Value.
Example 16
Based on a sample of 20 deliveries, Buddy’s Pizza determined the following information. The minimum delivery time was 13 minutes and the maximum 30 minutes. The first quartile was 15 minutes, the median 18 minutes, and the third quartile 22 minutes. Develop a box plot for the delivery times.
Example 16 (cont’d)
[Figure: box plot of the delivery times on an axis from 12 to 32 minutes, marking Min, Q1, Median, Q3, and Max]
Coefficient of Variation
Relative dispersion
The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:
$$CV = \frac{s}{\bar{X}}(100\%)$$
Skewness
Skewness is the measurement of the lack of symmetry of the distribution.
The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:
$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$
A value of 0 indicates a symmetric distribution.
Some software packages use a different formula which results in a wider range for the coefficient.
Example 14 revisited
86 79 92 84 69 88 91 83 96 78 82 85
Using the twelve stock prices, we find the mean to be 84.42, standard deviation, 7.18, median, 84.5.
Coefficient of variation:
$$CV = \frac{s}{\bar{X}}(100\%) = 8.5\%$$
Coefficient of skewness:
$$sk = \frac{3(\bar{X} - \text{Median})}{s} = -0.035$$
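The coefficient of variation and the coefficient of skewness for the twelve stock prices can be reproduced with Python's standard `statistics` module. This is a sketch; `statistics.stdev` computes the sample standard deviation (divisor n - 1), which matches the value 7.18 quoted above.

```python
import statistics

prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]

mean = statistics.mean(prices)      # about 84.42
s = statistics.stdev(prices)        # sample standard deviation, about 7.18
median = statistics.median(prices)  # 84.5

cv = s / mean * 100                 # coefficient of variation, in percent
sk = 3 * (mean - median) / s        # coefficient of skewness as defined in this lecture

print(round(mean, 2), round(s, 2), median)  # 84.42 7.18 84.5
print(round(cv, 1), round(sk, 3))           # 8.5 -0.035
```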
Relationship Between Two Variables
• Univariate Data (Single Variable)
• Bivariate Data (Two Variables)
– Scatter diagram
– Contingency table
Scatter diagram
Scatter diagram: A technique used to show the relationship between variables.
Variables must be at least interval scaled.
Relationship can be positive (direct) or negative (inverse).
Example 14 revisited
The twelve days of stock prices and the overall market index on each day are given as follows:
Price    Index (000s)
69       5.1
78       6.2
79       6.2
82       7.0
83       7.1
84       7.1
85       7.2
86       7.2
88       7.3
91       7.5
92       7.5
96       8.0
[Figure: scatter diagram, Relationship between Market Index and Stock Price; x-axis: Index; y-axis: Price]
Contingency table
A contingency table is used to classify observations according to two identifiable characteristics.
Contingency tables are used when one or both variables are nominally scaled.
A contingency table is a cross tabulation that simultaneously summarizes two variables of interest.
Example 17
Weight Loss
45 adults, all 60 pounds overweight, are randomly assigned to three weight loss programs. Twenty weeks into the program, a researcher gathers data on weight loss and divides the loss into three categories: less than 20 pounds, 20 up to 40 pounds, 40 or more pounds. Here are the results.
Example 17 (cont’d)
Weight Loss Plan    Less than 20 pounds    20 up to 40 pounds    40 pounds or more
Plan 1              4                      8                     3
Plan 2              2                      12                    1
Plan 3              12                     2                     1
Compare the weight loss under the three plans.
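One way to compare the plans is to convert each row of the contingency table to percentages, so that each plan's 15 participants are on a common scale. The sketch below is an illustration, not the lecture's prescribed method; the dictionary layout and function name are assumptions.

```python
# Counts per plan in the order: less than 20 lb, 20 up to 40 lb, 40 lb or more.
table = {
    "Plan 1": [4, 8, 3],
    "Plan 2": [2, 12, 1],
    "Plan 3": [12, 2, 1],
}

def row_percentages(table):
    """Percentage of each plan's participants falling in each weight-loss category."""
    return {
        plan: [round(100 * count / sum(counts), 1) for count in counts]
        for plan, counts in table.items()
    }

percentages = row_percentages(table)
for plan, row in percentages.items():
    print(plan, row)
# Plan 1 [26.7, 53.3, 20.0]
# Plan 2 [13.3, 80.0, 6.7]
# Plan 3 [80.0, 13.3, 6.7]
```

Read this way, 80% of Plan 3 participants lost less than 20 pounds, while 80% of Plan 2 participants lost 20 up to 40 pounds and Plan 1 has the largest share (20%) losing 40 pounds or more.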
Practice Problems
• Problem 11 (Page 62)
(Problem 13)
• Problem 21 (Page 68)
(Problem 25 (Page 69))
• Problem 27 (Page 70)
(Problem 31 (Page 71))
• Problem 42 (Page 76)
(Problem 46 (Page 79))
• Problem 47 (Page 79)
(Problem 51 (Page 82))
• Problem 49 (Page 81)
(Problem 53 (Page 84))
Assignment-2
• Problem 55 (Page 84)
(Problem 59 (Page 88))
• Problems 11, 13 (Page 108)
(Problems 11, 13 (Page 110))
• Problem 15 (Page 111)
(Problem 15 (Page 113))
• Problem 20 (Page 113)
• Problem 25 (Page 117)
(Problem 21 (Page 118))