Post on 21-Jul-2016
Chapter 2: Methods for Describing Sets of Data
Business Statistics
[Figure: bar chart titled "Our market share far exceeds all competitors!" comparing Us, Y, and X on a vertical axis that runs only from 30% to 36%.]
Data Presentation
• Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
• Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram
Presenting Qualitative Data
Summary Table
1. Lists categories & number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), % or both

Each row is a category; the count comes from a tally of responses.

Major         Count
Accounting    130
Economics      20
Management     50
Total         200
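The percentages mentioned in point 3 follow directly from the counts. A minimal Python sketch using the counts in the table above; the variable names are illustrative and not part of the slides:

```python
# Counts taken from the summary table of majors.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

total = sum(counts.values())

# Relative frequency of each category, expressed as a percentage.
percents = {major: 100 * count / total for major, count in counts.items()}

for major, count in counts.items():
    print(f"{major:<12}{count:>6}{percents[major]:>7.1f}%")
print(f"{'Total':<12}{total:>6}")
```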
Bar Graph
[Figure: bar graph of Frequency (0 to 150) by Major (Acct., Econ., Mgmt.).]
• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Vertical axis starts at the zero point
• Percent may be used instead of frequency
• Equal bar widths
Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent)
[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10%. The Econ. slice spans (360°)(10%) = 36°.]
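The angle rule can be checked for every slice, not only the 10% one. A short Python sketch, assuming the three percentages shown in the Majors pie chart; the dictionary names are illustrative:

```python
# Percentages from the Majors pie chart.
percents = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}

# Angle of each slice: (360 degrees) x (percent / 100).
angles = {major: 360 * pct / 100 for major, pct in percents.items()}

for major, angle in angles.items():
    print(f"{major:<6} {angle:.0f} degrees")
```

The slice angles add up to the full 360°, which is a quick sanity check on the percentages.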
Pareto Diagram
Like a bar graph, but with the categories arranged by height in descending order from left to right.
[Figure: Pareto diagram of Frequency (0 to 150) by Major, ordered Acct., Mgmt., Econ.]
• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Vertical axis starts at the zero point
• Percent may be used instead of frequency
• Equal bar widths
Thinking Challenge
You’re an analyst for IRI. You want to show the market shares held by Web browsers in 2006. Construct a bar graph, pie chart, & Pareto diagram to describe the data.

Browser              Mkt. Share (%)
Firefox              14
Internet Explorer    81
Safari                4
Others                1
Bar Graph Solution
[Figure: bar graph of Market Share (%) (0% to 100%) by Browser: Firefox, Internet Explorer, Safari, Others.]
Pie Chart Solution
[Figure: pie chart of Market Share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%.]
Pareto Diagram Solution
[Figure: Pareto diagram of Market Share (%) (0% to 100%) by Browser, ordered Internet Explorer, Firefox, Safari, Others.]
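The only step that separates a Pareto diagram from a bar graph is the ordering of the categories. A minimal Python sketch of that step, using the browser shares from the Thinking Challenge; the text bars are just an illustration, not a plotting recommendation:

```python
# Market shares (%) from the Thinking Challenge table.
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}

# Pareto ordering: categories sorted by bar height, tallest first.
pareto_order = sorted(shares, key=shares.get, reverse=True)

for browser in pareto_order:
    # One '#' per 2 percentage points gives a rough text bar.
    print(f"{browser:<18}{shares[browser]:>3}% {'#' * (shares[browser] // 2)}")
```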
Presenting Quantitative Data
Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value (e.g., 26 has stem 2 and leaf 6)
• Stem value defines class
• Leaf value defines frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

2 | 144677
3 | 028
4 | 1
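Splitting each observation into stem and leaf can be sketched in a few lines of Python. This assumes two-digit whole numbers, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group observations by tens digit (stem); units digits become the leaves."""
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    # Join the leaves of each stem into one string, e.g. 2 -> "144677".
    return {stem: "".join(str(leaf) for leaf in leaves) for stem, leaves in rows.items()}

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

The number of leaves on a row is the frequency of that class, which is why the display doubles as a rough histogram.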
Frequency Distribution Table Steps
1. Determine range
2. Select number of classes (usually between 5 & 15 inclusive)
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes

Formulas
• Range: $R = \text{highest value} - \text{lowest value}$
• Number of classes: $C = 1 + \frac{10}{3} \log N$, where $N$ is the number of observations
• Class interval: $CI = R / C$ (rounded)
• Class limits/boundaries: lowest limit $\le$ lowest value; highest limit $\ge$ highest value
• Class midpoint: $CM = (\text{lower limit} + \text{upper limit}) / 2$
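The first three formulas can be tried on the ten observations from the stem-and-leaf slide. This is a sketch under one stated assumption: the slides do not give the base of the logarithm, and base 10 is assumed here. Variable names are illustrative:

```python
import math

# The ten observations used in the stem-and-leaf display.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

value_range = max(data) - min(data)                  # R = highest - lowest
num_classes = 1 + (10 / 3) * math.log10(len(data))   # C = 1 + 10/3 x log N (base 10 assumed)
class_interval = value_range / num_classes           # CI = R / C, rounded in practice

print(value_range, round(num_classes, 2), round(class_interval, 2))
```

With only ten observations the formula suggests about four classes; the "5 to 15" guideline in step 2 is aimed at larger data sets.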
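Histogram
[Figure: histogram with Count (0 to 5) on the vertical axis, which may instead show frequency, relative frequency, or percent, and class boundaries 0, 15.5, 25.5, 35.5, 45.5, 55.5 on the horizontal axis. Each bar starts at its lower boundary, and the bars touch.]

Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2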
Raw Data:
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
20, 18, 42, 25, 57, 26, 35, 29, 34, 40
33, 21, 56, 45, 51, 23, 36, 54, 20, 19
Construct a frequency distribution table.
Relative Frequency Distribution

Class      Frequency    %
18 – 23    7            23
24 – 29    8            27
30 – 35    5            17
36 – 41    4            13
42 – 47    2             7
48 – 53    1             3
54 – 59    3            10
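Step 6 of the procedure (count observations and assign them to classes) can be sketched in Python for the 30 raw values above, using the classes 18–23 through 54–59 from the table. The variable names are illustrative:

```python
# The 30 raw observations from the Raw Data slide.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38,
       20, 18, 42, 25, 57, 26, 35, 29, 34, 40,
       33, 21, 56, 45, 51, 23, 36, 54, 20, 19]

# Class limits as listed in the table: 18-23, 24-29, ..., 54-59 (width 6).
classes = [(low, low + 5) for low in range(18, 60, 6)]

# Frequency: number of observations falling inside each class (limits inclusive).
frequencies = [sum(low <= x <= high for x in raw) for low, high in classes]

# Relative frequency as a whole-number percentage of all observations.
percents = [round(100 * f / len(raw)) for f in frequencies]

for (low, high), f, p in zip(classes, frequencies, percents):
    print(f"{low} - {high}  {f:>2}  {p:>3}%")
```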
Numerical Data Properties
Standard Notation

Measure               Sample       Population
Mean                  $\bar{X}$    $\mu$
Standard Deviation    $S$          $\sigma$
Variance              $S^2$        $\sigma^2$
Size                  $n$          $N$
Numerical Data Properties
• Central Tendency (Location): Mean, Median, Mode
• Variation (Dispersion): Range, Interquartile Range, Variance, Standard Deviation
• Shape
• Relative Standing: Percentiles, Z–scores
Central Tendency
Mean
1. Measure of central tendency
2. Most common measure
3. Acts as ‘balance point’
4. Affected by extreme values (‘outliers’)
5. Formula (sample mean):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$$
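The same calculation as a short Python sketch, using the six raw values from the example; the variable names are illustrative:

```python
# Raw data from the Mean Example.
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

# Sample mean: sum of the observations divided by the sample size n.
mean = sum(data) / len(data)

print(f"{mean:.2f}")
```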
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence:
$$\text{Positioning Point} = \frac{n + 1}{2}$$
4. Not affected by extreme values
Median Example (Odd-sized Sample)
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered: 21.5 22.6 22.6 23.7 24.1
Position: 1 2 3 4 5
$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$$
$$\text{Median} = 22.6$$
Median Example (Even-sized Sample)
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$
$$\text{Median} = \frac{7.7 + 8.9}{2} = 8.30$$
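Both median examples follow the same positioning-point rule, which can be sketched in Python as below. The function name `median` is chosen for illustration; Python's standard library offers `statistics.median` for the same purpose:

```python
def median(data):
    """Median via the positioning point (n + 1) / 2 on the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the ordered sequence
    if n % 2 == 1:
        # Odd n: the positioning point is a whole number.
        return ordered[int(position) - 1]
    # Even n: the point falls halfway between the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(median(odd_sample), round(median(even_sample), 2))
```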
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative data
Mode Example
• No Mode. Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode. Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode. Raw Data: 21 28 28 41 43 43
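The three cases can be reproduced with a small Python sketch. It assumes the convention used in the examples, namely that a data set has no mode when every value occurs exactly once; the function name `modes` is hypothetical:

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or an empty list if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: treated as "no mode"
    return sorted(value for value, count in counts.items() if count == highest)

no_mode = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
one_mode = [6.3, 4.9, 8.9, 6.3, 4.9, 4.9]
several_modes = [21, 28, 28, 41, 43, 43]
print(modes(no_mode), modes(one_mode), modes(several_modes))
```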
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
Describe the stock prices in terms of central tendency.
Mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$$
Median
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$$
$$\text{Median} = \frac{16 + 16}{2} = 16$$
Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16
Summary of Central Tendency Measures

Measure    Formula                   Description
Mean       $\sum X_i / n$            Balance Point
Median     Position $(n + 1) / 2$    Middle Value When Ordered
Mode       none                      Most Frequent
Variation
Range
1. Measure of dispersion
2. Difference between largest & smallest observations:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
3. Ignores how data are distributed
[Figure: two data sets plotted on an axis from 7 to 10; each has Range = 10 – 7 = 3.]
Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{X}$ or $\mu$)
[Figure: data plotted on an axis from 4 to 12, with the mean $\bar{X} = 8.3$ marked.]
Sample Variance Formula
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
Note: n – 1 in the denominator! (Use N for the population variance.)
Standard Deviation Formula
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$$
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$$
$$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$$
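The example can be verified term by term with a short Python sketch. It computes the sample variance from the definition and, as a cross-check, with `statistics.variance` from the standard library, which also divides by n – 1:

```python
import statistics

# Raw data from the Variance Example.
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

mean = sum(data) / len(data)

# Sample variance: squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in data]
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(round(sample_variance, 3), round(statistics.variance(data), 3))
```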
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
What are the variance and standard deviation of the stock prices?
Variation Solution
Raw Data: 17 16 21 18 13 16 12 11
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$$
$$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$$
Sample Standard Deviation
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34$$
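A Python sketch of the full solution for the eight stock prices, from mean to standard deviation; the variable names are illustrative:

```python
import math

# Closing stock prices from the Thinking Challenge.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = sum(prices) / len(prices)

# Sample variance uses n - 1 in the denominator.
variance = sum((p - mean) ** 2 for p in prices) / (len(prices) - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, round(variance, 2), round(std_dev, 2))
```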
Summary of Variation Measures

Measure                            Formula                                             Description
Range                              $X_{\text{largest}} - X_{\text{smallest}}$          Total Spread
Standard Deviation (Sample)        $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$       Dispersion about Sample Mean
Standard Deviation (Population)    $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$               Dispersion about Population Mean
Variance (Sample)                  $\frac{\sum (X_i - \bar{X})^2}{n - 1}$              Squared Dispersion about Sample Mean
Interpreting Standard Deviation
Interpreting Standard Deviation: Chebyshev’s Theorem (applies to a data set of any shape)
• No useful information about the fraction of data in the interval $\bar{x} - s$ to $\bar{x} + s$
• At least 3/4 of the data lies in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$
• At least 8/9 of the data lies in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$
• In general, for $k > 1$, at least $1 - 1/k^2$ of the data lies in the interval $\bar{x} - ks$ to $\bar{x} + ks$
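[Figure: Interpreting Standard Deviation: Chebyshev’s Theorem. Axis marked $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$, showing no useful information within 1s, at least 3/4 of the data within 2s, and at least 8/9 of the data within 3s.]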
Chebyshev’s Theorem Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34.
Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.
At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.
$\bar{x} = 15.5$, $s = 3.34$
$(\bar{x} - 2s, \bar{x} + 2s) = (15.5 - 2 \cdot 3.34, 15.5 + 2 \cdot 3.34) = (8.82, 22.18)$
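The general rule for k > 1 can be sketched as a small Python helper. The function name `chebyshev_interval` is hypothetical; it returns the interval and the guaranteed minimum fraction 1 – 1/k², and the last lines compare that bound with the fraction of the eight stock prices that actually fall inside:

```python
def chebyshev_interval(mean, s, k):
    """Interval mean +/- k*s and the minimum fraction of data it must contain."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem gives no useful bound for k <= 1")
    return (mean - k * s, mean + k * s), 1 - 1 / k ** 2

prices = [17, 16, 21, 18, 13, 16, 12, 11]

(low, high), bound = chebyshev_interval(15.5, 3.34, 2)

# Fraction of the observed prices that fall inside the interval.
inside = sum(low <= p <= high for p in prices) / len(prices)

print(round(low, 2), round(high, 2), bound, inside)
```

The theorem only gives a lower bound, so the observed fraction can be (and here is) larger than 75%.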
Interpreting Standard Deviation: Empirical Rule
• Applies to data sets that are mound shaped and symmetric
• Approximately 68% of the measurements lie in the interval μ – σ to μ + σ
• Approximately 95% of the measurements lie in the interval μ – 2σ to μ + 2σ
• Approximately 99.7% of the measurements lie in the interval μ – 3σ to μ + 3σ
[Figure: Interpreting Standard Deviation: Empirical Rule. Axis marked μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ, showing approximately 68% of the measurements within 1σ, approximately 95% within 2σ, and approximately 99.7% within 3σ.]
Empirical Rule Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.
Empirical Rule Example
• According to the Empirical Rule, approximately 68% of the data will lie in the interval $(\bar{x} - s, \bar{x} + s)$: (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
• Approximately 95% of the data will lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$: (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
• Approximately 99.7% of the data will lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$: (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
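The three intervals can be generated in one pass. A minimal Python sketch, assuming (as the example does) that the data are mound shaped and symmetric; the dictionary `rule` simply pairs each multiple of s with the approximate percentage from the Empirical Rule:

```python
mean, s = 15.5, 3.34

# Empirical Rule: approximate percentage of data within k standard deviations.
rule = {1: 68.0, 2: 95.0, 3: 99.7}

# Interval (mean - k*s, mean + k*s) for each k, rounded to cents.
intervals = {k: (round(mean - k * s, 2), round(mean + k * s, 2)) for k in rule}

for k, pct in rule.items():
    low, high = intervals[k]
    print(f"about {pct}% of the data in ({low}, {high})")
```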
Numerical Measures of Relative Standing
Numerical Measures of Relative Standing: Percentiles
Describes the relative location of a measurement compared to the rest of the data
The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it
Median = 50th percentile
Percentile Example
You scored 560 on the GMAT exam. This score puts you in the 58th percentile.
• What percentage of test takers scored lower than you did?
• What percentage of test takers scored higher than you did?
Percentile Example
What percentage of test takers scored lower than you did?
58% of test takers scored lower than 560.
What percentage of test takers scored higher than you did?
(100 – 58)% = 42% of test takers scored higher than 560.
Numerical Measures of Relative Standing: Z–Scores
• Describes the relative location of a measurement compared to the rest of the data
• Sample z–score: $z = \frac{x - \bar{x}}{s}$
• Population z–score: $z = \frac{x - \mu}{\sigma}$
• Measures the number of standard deviations away from the mean a data value is located
Z–Score Example
The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes.
• Find the z–score for an item that took 20 minutes to assemble.
• Find the z–score for an item that took 27.5 minutes to assemble.
Z–Score Example
For x = 20, μ = 22.5, σ = 2.5:
$$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$$
For x = 27.5, μ = 22.5, σ = 2.5:
$$z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$$
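The two calculations differ only in the value of x, so a one-line helper covers both. A Python sketch with a hypothetical function name `z_score`:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Assembly-time example: mean 22.5 minutes, standard deviation 2.5 minutes.
z_fast = z_score(20, 22.5, 2.5)
z_slow = z_score(27.5, 22.5, 2.5)
print(z_fast, z_slow)
```

A negative z–score marks a value below the mean, a positive one a value above it.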
Quartiles & Box Plots
Quartiles
1. Measure of noncentral tendency
2. Split ordered data into 4 quarters
[Figure: ordered data divided into four 25% segments by Q1, Q2, Q3.]
3. Position of i-th quartile:
$$\text{Positioning Point of } Q_i = \frac{i \cdot (n + 1)}{4}$$
Quartile (Q1) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$$Q_1 \text{ Position} = \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (6 + 1)}{4} = 1.75 \approx 2$$
$$Q_1 = 6.3$$
Quartile (Q2) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$$Q_2 \text{ Position} = \frac{2 \cdot (n + 1)}{4} = \frac{2 \cdot (6 + 1)}{4} = 3.5$$
$$Q_2 = \frac{7.7 + 8.9}{2} = 8.3$$
Quartile (Q3) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$$Q_3 \text{ Position} = \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (6 + 1)}{4} = 5.25 \approx 5$$
$$Q_3 = 10.3$$
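The three examples suggest a rounding convention: a positioning point ending in .5 averages the two neighbouring values, and any other fractional point is rounded to the nearest position. The Python sketch below assumes that convention (the slides do not state it explicitly, and other textbooks and libraries interpolate differently); the function name `quartile` is hypothetical, and very small samples where the point falls outside 1..n are not handled:

```python
import math

def quartile(data, i):
    """i-th quartile via the positioning point i * (n + 1) / 4 (assumed rounding rule)."""
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / 4          # 1-based positioning point
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Halfway between two positions: average the neighbouring values.
        return (ordered[lower - 1] + ordered[lower]) / 2
    # Otherwise round to the nearest whole position.
    nearest = lower + 1 if fraction > 0.5 else lower
    return ordered[nearest - 1]

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, round(q2, 2), q3)
```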
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles: Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
What are the quartiles, Q1 and Q3, and the interquartile range?
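Quartile Solution*: Q1
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$Q_1 \text{ Position} = \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (8 + 1)}{4} = 2.25$$
Position 3 is used in this solution, so $Q_1 = 13$.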
Quartile Solution*: Q3
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$Q_3 \text{ Position} = \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (8 + 1)}{4} = 6.75 \approx 7$$
$$Q_3 = 18$$
Interquartile Range Solution*
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$\text{Interquartile Range} = Q_3 - Q_1 = 18.0 - 13.0 = 5$$
Box Plot
1. Graphical display of data using 5-number summary
[Figure: box plot on an axis from 4 to 12, labeled Xsmallest, Q1, Median, Q3, Xlargest.]
Shape & Box Plot
[Figure: three box plots, each marked Q1, Median, Q3, illustrating Left-Skewed, Symmetric, and Right-Skewed data.]
Graphing Bivariate Relationships
• Describes a relationship between two quantitative variables
• Plot the data in a Scattergram
[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship.]
Scattergram Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ (x)    Sales (Units) (y)
1           1
2           1
3           2
4           2
5           4

Draw a scattergram of the data.
Scattergram Example
[Figure: scattergram of Sales (0 to 4) against Advertising (0 to 5).]
Time Series Plot
• Used to graphically display data produced over time
• Shows trends and changes in the data over time
• Time recorded on the horizontal axis
• Measurements recorded on the vertical axis
• Points connected by straight lines
Time Series Plot Example
The following data show the average retail price of regular gasoline in New York City for 8 weeks in 2006. Draw a time series plot for this data.

Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298
Time Series Plot Example
[Figure: time series plot of Price (2.05 to 2.35) against Date (10/16 to 12/4).]
Distorting the Truth with Descriptive Techniques
Errors in Presenting Data
1. Using ‘chart junk’
2. No relative basis in comparing data batches
3. Compressing the vertical axis
4. No zero point on the vertical axis
‘Chart Junk’
[Figure: Minimum Wage shown two ways. Bad presentation: graphic labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Good presentation: chart of $ (0 to 4) by year (1960, 1970, 1980, 1990).]
No Relative Basis
[Figure: A’s by Class (FR, SO, JR, SR) shown two ways. Bad presentation: vertical axis is Freq. (0 to 300). Good presentation: vertical axis is % (0% to 30%).]
Compressing Vertical Axis
[Figure: Quarterly Sales ($) for Q1 to Q4 shown two ways. Bad presentation: vertical axis runs from 0 to 200. Good presentation: vertical axis runs from 0 to 50.]
No Zero Point on Vertical Axis
[Figure: Monthly Sales ($) for J, M, M, J, S, N shown two ways. Bad presentation: vertical axis runs from 36 to 45. Good presentation: vertical axis runs from 0 to 60.]