BCOR 1020 Business Statistics Lecture 5 – January 31, 2008.



BCOR 1020 Business Statistics

Lecture 5 – January 31, 2008


Overview

• Chapter 4 – Descriptive Statistics…
  – Standardized Data
  – Percentiles and Quartiles
  – Boxplots


Chapter 4 – Standardized Data

Chebyshev’s Theorem – Developed by mathematicians Jules Bienaymé (1796-1878) and Pafnuty Chebyshev (1821-1894).

• For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean (k > 1) must be at least 100[1 – 1/k²].
  – For k = 2 standard deviations, 100[1 – 1/2²] = 75%. (So, at least 75.0% will lie within μ ± 2σ.)
  – For k = 3 standard deviations, 100[1 – 1/3²] = 88.9%. (So, at least 88.9% will lie within μ ± 3σ.)

• Although applicable to any data set, these limits tend to be too wide to be useful.
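The bound above is a one-line calculation. A minimal sketch follows; the function name `chebyshev_min_percent` is an illustrative choice, not something from the lecture.

```python
def chebyshev_min_percent(k: float) -> float:
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0% or less, so the bound says nothing.
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


# Reproduce the two cases worked on the slide.
for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.1f}%")
```

This prints 75.0% for k = 2 and 88.9% for k = 3, matching the values above.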


Clickers

Using Chebyshev’s Theorem, determine the minimum percentage of observations that lie within 4 standard deviations of the mean.

100[1 – 1/k²]

A = 75.0%

B = 88.9%

C = 93.8%

D = 96.0%


Chapter 4 – Standardized Data

The Empirical Rule:
• The normal or Gaussian distribution was named for Karl Gauss (1777-1855).
• The normal distribution is symmetric and is also known as the bell-shaped curve.
• The Empirical Rule states that given data from a normal distribution, we expect that for…
  k = 1: About 68.26% will lie within μ ± 1σ
  k = 2: About 95.44% will lie within μ ± 2σ
  k = 3: About 99.73% will lie within μ ± 3σ
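The Empirical Rule percentages can be checked against the normal distribution itself. The sketch below uses the standard identity P(|Z| < k) = erf(k/√2); the function name `normal_percent_within` is an assumption made for illustration.

```python
import math


def normal_percent_within(k: float) -> float:
    """Percentage of a normal distribution lying within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"k = {k}: about {normal_percent_within(k):.2f}%")
```

The computed values are 68.27%, 95.45% and 99.73%. The first two differ from the slide's figures only in the last digit, most likely because the slide's figures were built from rounded table entries.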


Chapter 4 – Standardized Data

The Empirical Rule:
• Distance from the mean is measured in terms of the number of standard deviations.
• Unusual Observations:
  – Unusual observations are those that lie beyond μ ± 2σ.
  – Outliers are observations that lie beyond μ ± 3σ.

Note: no upper bound is given. Data values outside μ ± 3σ are rare.


Clickers

Suppose 80 students take an exam. Assuming exam scores follow a normal distribution, approximately how many students would you expect to have scores within 2 standard deviations of the mean?

A = 55

B = 76

C = 79

D = 80


Chapter 4 – Standardized Data

Defining a Standardized Variable:
• A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.

Standardization formula for a population: z_i = (x_i – μ) / σ

Standardization formula for a sample: z_i = (x_i – x̄) / s

• z_i tells how far away the observation is from the mean (in terms of σ).

• A negative z value means the observation is below the mean.

• Positive z means the observation is above the mean.
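A minimal sketch of the two standardization formulas, using Python's standard `statistics` module. The function names and the exam scores are hypothetical, chosen only to illustrate the calculation.

```python
import statistics


def standardize_population(data):
    """z_i = (x_i - mu) / sigma, treating the data as the whole population."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation (divides by N)
    return [(x - mu) / sigma for x in data]


def standardize_sample(data):
    """z_i = (x_i - xbar) / s, treating the data as a sample."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return [(x - xbar) / s for x in data]


scores = [60, 70, 80, 90, 100]  # hypothetical exam scores
for x, z in zip(scores, standardize_sample(scores)):
    side = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"x = {x:3d}  z = {z:+.2f}  ({side} the mean)")
```

Scores below the mean of 80 get negative z values and scores above it get positive ones, as the bullets above describe.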


Chapter 4 – Standardized Data

Defining a Standardized Variable:
• MegaStat calculates standardized values and also checks for outliers.
• In Excel, use =STANDARDIZE(x, Mean, StDev) to calculate the standardized z value of an observation x.


Chapter 4 – Standardized Data

Example: Unusual Observations in the P/E Data
• The P/E ratio data contains several large data values. Are they unusual or outliers?

Raw Data:

7 8 8 10 10 10 10 12 13 13 13 13
13 13 13 14 14 14 15 15 15 15 15 16
16 16 17 18 18 18 18 19 19 19 19 19
20 20 20 21 21 21 22 22 23 23 23 24
25 26 26 26 26 27 29 29 30 31 34 36
37 40 41 45 48 55 68 91

Standardized Data: [table not recoverable from the transcript]
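One way to answer the question is to standardize the raw data and apply the 2σ and 3σ rules from the Empirical Rule slides. This is a sketch only: it uses the sample mean and sample standard deviation in place of μ and σ, and the variable names are illustrative.

```python
import statistics

pe_ratios = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13,
    13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16,
    16, 16, 17, 18, 18, 18, 18, 19, 19, 19, 19, 19,
    20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24,
    25, 26, 26, 26, 26, 27, 29, 29, 30, 31, 34, 36,
    37, 40, 41, 45, 48, 55, 68, 91,
]

xbar = statistics.mean(pe_ratios)
s = statistics.stdev(pe_ratios)  # sample standard deviation

# Pair each observation with its z value.
z_scores = [(x, (x - xbar) / s) for x in pe_ratios]

# "Unusual" means more than 2 standard deviations from the mean;
# "outlier" means more than 3, so every outlier is also unusual.
unusual = [x for x, z in z_scores if abs(z) > 2]
outliers = [x for x, z in z_scores if abs(z) > 3]

print(f"mean = {xbar:.2f}, s = {s:.2f}")
print("beyond 2 standard deviations:", unusual)
print("beyond 3 standard deviations:", outliers)
```

With these 68 values the mean is about 22.72 and s is about 14.08, so 55, 68 and 91 lie beyond 2 standard deviations, and 68 and 91 also lie beyond 3.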


Chapter 4 – Standardized Data

Outliers: What do we do with outliers in a data set?
• If due to erroneous data, then discard.
• An outrageous observation (one completely outside of an expected range) is certainly invalid.

• Recognize unusual data points and outliers and their potential impact on your study.

• Research books and articles on how to handle outliers.


Chapter 4 – Standardized Data

Estimating Sigma:
• It is common to use the sample standard deviation (s) as an estimate of σ.
• We can also use the empirical rule to define a simple (quick-and-dirty) estimate:
  – For a normal distribution, the range of 99.73% of the values is 6σ (from μ – 3σ to μ + 3σ).
  – If you know the range R (high – low), you can estimate the standard deviation as σ = R/6.
  – Useful for approximating the standard deviation when only R is known.
  – This estimate depends on the assumption of normality.
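The quick estimate is simple enough to sketch in a few lines. The function name and the low and high values below are hypothetical, and the result is only as good as the normality assumption.

```python
def estimate_sigma_from_range(low: float, high: float) -> float:
    """Quick estimate of the standard deviation as R/6, assuming roughly normal data."""
    data_range = high - low  # R = high - low
    return data_range / 6


# Hypothetical example: values known to run from 40 to 100.
print(estimate_sigma_from_range(40, 100))  # prints 10.0
```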


Chapter 4 – Percentiles & Quartiles

Percentiles:
• Percentiles are data that have been divided into 100 groups.
  – For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.

• Deciles are data that have been divided into 10 groups (i.e. 10th, 20th, 30th, etc. percentiles).

• Quintiles are data that have been divided into 5 groups (i.e. 20th, 40th, 60th, 80th, 100th percentiles).

• Quartiles are data that have been divided into 4 groups (i.e. 25th, 50th, 75th, 100th percentiles).


Chapter 4 – Percentiles & Quartiles

Percentiles:
• Percentiles are used to establish benchmarks for comparison purposes…
  – (e.g., health care, manufacturing and banking industries use the 5th, 25th, 50th, 75th and 90th percentiles).
  – Percentiles are used in employee merit evaluation and salary benchmarking.


Chapter 4 – Percentiles & Quartiles

Quartiles:
• Quartiles are scale points that divide the sorted data into four groups of approximately equal size.

• The three values that separate the four groups are called Q1, Q2, and Q3, respectively.
  – Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios.

Lower 25% | Q1 | Second 25% | Q2 | Third 25% | Q3 | Upper 25%


Chapter 4 – Percentiles & Quartiles

Quartiles:

• The second quartile Q2 is the median, an important indicator of central tendency.

Lower 50% | Q2 | Upper 50%

• Q1 and Q3 measure dispersion since the interquartile range Q3 – Q1 measures the degree of spread in the middle 50 percent of data values.

Lower 25% | Q1 | Middle 50% | Q3 | Upper 25%


Chapter 4 – Percentiles & Quartiles

Method of Medians:
• For small data sets, find quartiles using the method of medians:

Step 1. Sort the observations.

Step 2. Find the median Q2.

Step 3. Find the median of the data values that lie below Q2. This is Q1.

Step 4. Find the median of the data values that lie above Q2. This is Q3.
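The four steps can be sketched as follows. One detail is an assumption here: "below Q2" and "above Q2" are taken by position in the sorted list, so with an odd number of observations the middle value is left out of both halves. The function name and sample data are hypothetical.

```python
import statistics


def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the method of medians. Needs at least 2 values."""
    xs = sorted(data)                    # Step 1: sort the observations
    n = len(xs)
    half = n // 2
    q2 = statistics.median(xs)           # Step 2: the median is Q2
    lower = xs[:half]                    # values positioned below Q2
    # For odd n, skip the middle observation itself.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    q1 = statistics.median(lower)        # Step 3: median of the lower half
    q3 = statistics.median(upper)        # Step 4: median of the upper half
    return q1, q2, q3


print(quartiles_by_medians([12, 3, 7, 9, 15, 5, 10]))  # prints (5, 9, 12)
```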


Clickers

Recall the following P/E ratios for 68 stocks in a portfolio. First, find Q1, Q2 and Q3.

We can use quartiles to define benchmarks for stocks that are low-priced (bottom quartile or Q1) or high-priced (top quartile or Q3). What is the P/E ratio benchmark for high-priced stocks in this portfolio?

A = 14 B = 19

C = 26 D = 36

7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14

14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19

19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26

26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91


Chapter 4 – Percentiles & Quartiles

Example: P/E Ratios and Quartiles:
• Recall from the previous question:

Lower 25% of P/E Ratios | Q1 = 14 | Second 25% of P/E Ratios | Q2 = 19 | Third 25% of P/E Ratios | Q3 = 26 | Upper 25% of P/E Ratios

• These quartiles express central tendency (M = Q2) and dispersion (the interquartile range IQR).

• Because of clustering of identical data values, these quartiles do not provide clean cut points between groups of observations.


Chapter 4 – Percentiles & Quartiles

Excel Quartiles:
• Use Excel function =QUARTILE(Array, k) to return the kth quartile.
• Excel treats quartiles as a special case of percentiles. For example, to calculate Q3…
  – We can use either =QUARTILE(Array, 3) or =PERCENTILE(Array, 0.75)
• Excel calculates the quartile positions as:
  Position of Q1 = 0.25n + 0.75
  Position of Q2 = 0.50n + 0.50
  Position of Q3 = 0.75n + 0.25
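The position formulas usually give a fractional position in the sorted data. The sketch below assumes that a fractional position is resolved by linear interpolation between the two neighboring values, which the slide does not state. The function name and the data are hypothetical.

```python
def quartile_by_position(data, k: int) -> float:
    """k-th quartile (k = 1, 2, 3) at 1-based position p*n + (1 - p), where p = k/4."""
    xs = sorted(data)
    n = len(xs)
    p = k / 4
    position = p * n + (1 - p)      # e.g. 0.25n + 0.75 for Q1
    lo = int(position)              # 1-based index of the value at or below the position
    frac = position - lo            # how far to move toward the next value
    if lo >= n:                     # position lands on the last value (only when n = 1)
        return float(xs[-1])
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


sample = [7, 8, 10, 12, 13, 14, 15, 16]  # hypothetical data, n = 8
for k in (1, 2, 3):
    print(f"Q{k} = {quartile_by_position(sample, k)}")
```

For this sample the positions are 2.75, 4.5 and 6.25, giving Q1 = 9.5, Q2 = 12.5 and Q3 = 14.25. Note that interpolated quartiles need not equal the values found by the method of medians.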


Chapter 4 – Percentiles & Quartiles

Caution:
• Quartiles generally resist outliers.

• However, quartiles do not provide clean cut points in the sorted data, especially in small samples with repeating data values.

Data set A: 1, 2, 4, 4, 8, 8, 8, 8 Q1 = 3, Q2 = 6, Q3 = 8

Data set B: 0, 3, 3, 6, 6, 6, 10, 15 Q1 = 3, Q2 = 6, Q3 = 8

• Although they have identical quartiles, these two data sets are not similar. The quartiles do not represent either data set well.
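A short check of this caution, as a sketch: the quartiles below are computed with the method of medians (which reproduces the values stated for both data sets), and the mean and range are added to show how the two sets differ. The helper name is illustrative.

```python
import statistics


def quartiles_by_medians(data):
    """(Q1, Q2, Q3) by the method of medians, splitting the sorted data by position."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


data_a = [1, 2, 4, 4, 8, 8, 8, 8]
data_b = [0, 3, 3, 6, 6, 6, 10, 15]

for name, data in (("A", data_a), ("B", data_b)):
    q1, q2, q3 = quartiles_by_medians(data)
    print(
        f"Data set {name}: Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, "
        f"mean = {statistics.mean(data)}, range = {max(data) - min(data)}"
    )
```

Both sets give Q1 = 3, Q2 = 6 and Q3 = 8, yet their means (5.375 versus 6.125) and ranges (7 versus 15) differ.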


Chapter 4 – Percentiles & Quartiles

Central Tendency & Dispersion Using Quartiles:

Some robust measures of central tendency using quartiles are:
• Median (M = Q2) – we’ve already discussed.
• Midhinge – The mean of the 1st and 3rd quartiles:

Midhinge = (Q1 + Q3) / 2

Both are robust measures of central tendency since they ignore extreme values (outliers).


Chapter 4 – Percentiles & Quartiles

Central Tendency & Dispersion Using Quartiles:

Some robust measures of dispersion using quartiles are:
• Midspread (Interquartile Range, IQR) – A robust measure of dispersion:

Midspread = Q3 – Q1

• Coefficient of Quartile Variation (CQV) – Measures relative dispersion, expresses the midspread as a percent of the midhinge:

CQV = 100 × (Q3 – Q1) / (Q3 + Q1)

  – Similar to the CV, CQV can be used to compare data sets measured in different units or with different means.
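The quartile-based measures from this slide and the previous one can be sketched together. The function names and the quartile values Q1 = 10 and Q3 = 30 are hypothetical, and the CQV follows the formula as written on the slide.

```python
def midhinge(q1: float, q3: float) -> float:
    """Mean of the first and third quartiles."""
    return (q1 + q3) / 2


def midspread(q1: float, q3: float) -> float:
    """Interquartile range, Q3 - Q1."""
    return q3 - q1


def cqv(q1: float, q3: float) -> float:
    """Coefficient of quartile variation: 100 * (Q3 - Q1) / (Q3 + Q1)."""
    return 100 * (q3 - q1) / (q3 + q1)


q1, q3 = 10, 30  # hypothetical quartiles
print("Midhinge: ", midhinge(q1, q3))   # prints 20.0
print("Midspread:", midspread(q1, q3))  # prints 20
print("CQV:      ", cqv(q1, q3))        # prints 50.0
```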


Clickers

Recall from the data set of 68 P/E ratios:

Min = 7, Q1 = 14, Q2 = 19, Q3 = 26, Max = 91

What is the Midspread (Interquartile Range)?

A) 12

B) 19

C) 77

D) 84


Chapter 4 – Boxplots

Boxplots – A useful tool of exploratory data analysis (EDA).
• Also called a box-and-whisker plot.
• Based on a five-number summary:

Xmin, Q1, Q2, Q3, Xmax

Example: Consider the five-number summary for the 68 P/E ratios…

Xmin = 7, Q1 = 14, Q2 = 19, Q3 = 26, Xmax = 91


Chapter 4 – Boxplots

• The boxplot for the P/E ratio data is …

[Figure: boxplot of the P/E ratios, labeled with Minimum, Q1, Median (Q2), Q3, Maximum, the box and the whiskers. The distribution is right-skewed.]


Chapter 4 – Boxplots

Fences and Unusual Data Values – Use quartiles to detect unusual data points.
• These points are called fences and can be found using the following formulas:

              Inner fences          Outer fences
Lower fence   Q1 – 1.5 (Q3 – Q1)    Q1 – 3.0 (Q3 – Q1)
Upper fence   Q3 + 1.5 (Q3 – Q1)    Q3 + 3.0 (Q3 – Q1)

• Values outside the inner fences are unusual while those outside the outer fences are outliers.
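As a sketch, the fence formulas can be applied to the quartiles given earlier for the P/E data (Q1 = 14, Q3 = 26). The function name `fences` is an illustrative choice.

```python
def fences(q1: float, q3: float):
    """Return ((inner lower, inner upper), (outer lower, outer upper))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer


inner, outer = fences(14, 26)  # quartiles of the 68 P/E ratios
print("Inner fences:", inner)  # prints (-4.0, 44.0)
print("Outer fences:", outer)  # prints (-22.0, 62.0)
```

The lower fences are negative here, so for this data only the upper fences can flag anything.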


Chapter 4 – Boxplots

Fences and Unusual Data Values:
• Truncate the whisker at the fences and display unusual values and outliers as dots.

Example: Boxplot of P/E ratios with fences…

[Figure: boxplot of the P/E ratios with the inner fence and outer fence marked. Unusual values and outliers are shown as dots beyond the whisker.]

Based on these fences, there are three unusual P/E values and two outliers.
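The counts above can be reproduced from the raw data. This sketch takes Q1 = 14 and Q3 = 26 from the five-number summary and treats a value as unusual when it is outside the inner fences but not outside the outer fences. Variable names are illustrative.

```python
pe_ratios = [
    7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
    19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
    26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91,
]

q1, q3 = 14, 26           # quartiles from the five-number summary
iqr = q3 - q1

inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

# Outliers fall outside the outer fences.
outliers = [x for x in pe_ratios if x < outer_low or x > outer_high]

# Unusual values fall outside the inner fences but inside the outer fences.
unusual = [
    x for x in pe_ratios
    if (x < inner_low or x > inner_high) and outer_low <= x <= outer_high
]

print("Unusual:", unusual)    # prints [45, 48, 55]
print("Outliers:", outliers)  # prints [68, 91]
```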


Chapter 4 – Standardized Data

Example: Unusual Observations in the P/E Data
• The P/E ratio data contains several large data values. Are they unusual or outliers? Compare the boxplot to standardized data analysis…

Standardized Data: [table not recoverable from the transcript]