BUS304 – Chapter 3: Data Characterization


Page 2

Today: Mean and Variance

Mean:

Also called the “average”

Characterizes the center of the data distribution

The most commonly used measure

Sample mean $\bar{x}$: the average derived from a sample

Population mean $\mu$: the average derived from the population

Exercise:

Compute the mean weight of the Chargers’ offensive players and of their defensive players.

Which mean should be higher?

Why?

Are they population means or sample means?

Formula: $\text{mean} = \dfrac{\text{sum of data}}{\text{number of data}}$

Ways to compute the mean:

1. Use calculator.

2. Use Excel. (function: average)


Page 3

Sensitivity to outliers

Compute the mean for the following 2 groups of data.

Household income in community a (unit: $10,000):

#1   #2   #3   #4   #5   #6   #7   #8
5    4    3    4    3    5    4    5

Household income in community b (unit: $10,000):

#1   #2   #3   #4   #5   #6   #7   #8
5    4    3    4    3    5    4    100

Suppose the mayor decides to provide more public facilities to poor communities, and the decision is based on whether the mean income in the community is below $50,000 per year. Does such a decision rule make sense?
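As a rough check on this exercise, the sketch below computes both community means in plain Python. The names `community_a`, `community_b`, `THRESHOLD`, and `mean` are illustrative assumptions rather than anything from the slides; incomes are in units of $10,000, as on the slide.

```python
# Household incomes from the slide, in units of $10,000.
community_a = [5, 4, 3, 4, 3, 5, 4, 5]
community_b = [5, 4, 3, 4, 3, 5, 4, 100]

# Hypothetical name for the mayor's cutoff: $50,000 = 5 units.
THRESHOLD = 5


def mean(data):
    """Sum of the data divided by the number of data values."""
    return sum(data) / len(data)


mean_a = mean(community_a)
mean_b = mean(community_b)

for name, m in [("a", mean_a), ("b", mean_b)]:
    label = "below" if m < THRESHOLD else "not below"
    print(f"community {name}: mean = {m} ({label} the threshold)")
```

Seven of the eight households in community b have the same incomes as in community a, yet the single income of 100 moves the mean from 4.125 to 16, which is the sensitivity to outliers that the slide title refers to.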

Page 4

Compute the mean from a frequency table

Below is a frequency table showing the number of days the teams take to finish their projects.

Days to Complete   Frequency   Relative Frequency
5                  4           ?
6                  12          ?
7                  8           ?
8                  6           ?
9                  4           ?
10                 2           ?

How many days on average does a team take to finish one project?

$\text{mean} = \dfrac{\text{total days}}{\text{total teams}}$

Create a histogram using the data in the table, and locate the mean on the graph.

How would you describe the shape of the histogram?

What is the relationship between the mean and the peak?

Use relative frequency to find the mean.
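The two routes to the mean on this slide (total days over total teams, and relative frequencies as weights) can be sketched as follows. This is a minimal illustration, assuming the table is stored as a dictionary named `freq`; exact fractions are used only so that the two results can be compared without rounding noise.

```python
from fractions import Fraction

# Frequency table from the slide: days to complete -> number of teams.
freq = {5: 4, 6: 12, 7: 8, 8: 6, 9: 4, 10: 2}

total_teams = sum(freq.values())
total_days = sum(days * count for days, count in freq.items())

# Method 1: total days divided by total teams.
mean_from_counts = Fraction(total_days, total_teams)

# Method 2: weight each value by its relative frequency.
rel_freq = {days: Fraction(count, total_teams) for days, count in freq.items()}
mean_from_rel_freq = sum(days * rf for days, rf in rel_freq.items())

print("total teams:", total_teams)
print("mean (counts):", float(mean_from_counts))
print("mean (relative frequency):", float(mean_from_rel_freq))
```

Both methods give the same answer because dividing each frequency by the total before summing is the same as dividing the whole sum by the total.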

Page 5

Compute the mean from a histogram

[Histogram: frequency (y-axis, 0 to 7) by bin (x-axis); bins 5, 15, 25, 35, 45, 55, More have frequencies 0, 3, 6, 5, 4, 2, 0]

A histogram conveys the same information as the frequency table.

$\text{mean} = \dfrac{\text{total data value}}{\text{data size}} = \dfrac{15 \times 3 + 25 \times 6 + 35 \times 5 + 45 \times 4 + 55 \times 2}{3 + 6 + 5 + 4 + 2} = 33$

Mathematical expression: $\bar{x} = 33$ if the data are a sample; $\mu = 33$ if the data are a population.

Page 6

Weighted Mean

The simple mean treats each piece of information equally. E.g. the average score of the students.

Sometimes different data should be given different weights, because one item may be more important than another.

• E.g. an instructor assigns 60% of the final score to homework and 40% to the final exam. If a student’s homework score is 84 and the exam score is 70, compute the student’s final score (the weighted mean of the homework score and the exam score).

-- This instructor thinks homework reveals more comprehensive information about a student’s knowledge, and hence gives it more weight.
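The homework and exam example can be worked through with a small helper. This is only a sketch: `weighted_mean` is a hypothetical function written for this illustration, and it divides by the sum of the weights so that it also works when the weights do not add up to 1.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [84, 70]       # homework score, final exam score
weights = [0.6, 0.4]    # 60% homework, 40% final exam

final_score = weighted_mean(scores, weights)
simple_mean = sum(scores) / len(scores)

print(f"weighted mean: {final_score:.1f}")
print(f"simple mean:   {simple_mean:.1f}")
```

The weighted mean (78.4) is higher than the simple mean (77.0) because the larger weight falls on the higher of the two scores.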

Page 7

When to use weighted mean?

Some other examples of weighted mean:

A student’s GPA. A course with more credits carries more weight.

An economic growth indicator. (Some industries affect the economy more than others.)

Crunch-time leader: a player who performs the best in the last few minutes of the game. This can reveal the person’s performance under pressure.

Expectation (you will see it in Chapter 4)

• E.g. in a gambling game, if with 60% chance you lose one dollar, and with 40% chance you gain one dollar, the expectation is

$60\% \times (-\$1) + 40\% \times \$1 = -\$0.20$

Other examples? (average Cal State tuition)

Always think about whether you should use a weighted mean or a simple mean.
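The gambling example on this slide is a weighted mean in which the weights are probabilities. A minimal sketch, assuming the outcomes and probabilities are held in two parallel lists (the variable names are illustrative):

```python
from fractions import Fraction

# Outcomes of the game in dollars, with their probabilities (from the slide).
outcomes = [-1, 1]
probabilities = [Fraction(60, 100), Fraction(40, 100)]

# Probabilities must add up to 1 for this to be a valid weighted mean.
assert sum(probabilities) == 1

expectation = sum(x * p for x, p in zip(outcomes, probabilities))
print("expected gain per game:", float(expectation))
```

Exact fractions are used here so that the result is exactly -1/5 of a dollar rather than a floating-point approximation of -0.2.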

Page 8

Break

Page 9

Variance

A measure of data spread.

Also called “the average of squared deviations from the mean”

The larger the variance, the fatter the histogram.

-- population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

-- sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Note the difference!

Page 10

Steps to compute the variance

1. Identify whether the data are from a population or a sample (the formulae are different).

2. Use the following table to compute the deviations:

a) Find the mean: $\text{mean} = \dfrac{5 + 4 + 4 + 5 + 3 + 2}{6} = 3.833$

b) Find the distance of each value from the mean (fill out the 2nd column)

c) Find the squared distance (the 3rd column)

d) Add up the 3rd column

e) Divide by

i. the population size; or

ii. the sample size - 1

Data list   Distance from the mean   Square the distance
5           = 5 - mean = 1.167       = (1.167)^2 = 1.36
4
4
5
3
2
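Steps a) to e) can be sketched in Python as follows, using the six values from the table. The variable names are illustrative, and both divisors are shown because the slide does not say whether these data are a population or a sample.

```python
data = [5, 4, 4, 5, 3, 2]

# a) Find the mean.
mean = sum(data) / len(data)

# b) Distance of each value from the mean (2nd column).
distances = [x - mean for x in data]

# c) Squared distance (3rd column).
squares = [d ** 2 for d in distances]

# d) Add up the 3rd column.
sum_of_squares = sum(squares)

# e) Divide by the population size, or by the sample size - 1.
population_variance = sum_of_squares / len(data)
sample_variance = sum_of_squares / (len(data) - 1)

print(f"mean = {mean:.3f}")
for x, d, sq in zip(data, distances, squares):
    print(f"{x:>4} {d:>8.3f} {sq:>8.3f}")
print(f"population variance = {population_variance:.3f}")
print(f"sample variance     = {sample_variance:.3f}")
```

The first printed row reproduces the worked entry in the table: a distance of 1.167 and a squared distance of about 1.36.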

Page 11

Comparing variance vs. histogram

Find the variance for the following groups of sample data:

Group 1: 11, 12, 13, 16, 16, 17, 18, 21
Group 2: 14, 15, 15, 15, 16, 16, 16, 17
Group 3: 11, 11, 11, 12, 19, 20, 20, 20

Compare the mean and variance.

Create the histogram to compare the distribution.
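A short sketch of the comparison, assuming the 24 values on this slide form three groups of eight in the order listed (the group labels are an assumption of this example). It relies on `statistics.mean` and `statistics.variance` from the Python standard library; the latter divides by n - 1, which matches the sample variance formula.

```python
import statistics

groups = {
    "group 1": [11, 12, 13, 16, 16, 17, 18, 21],
    "group 2": [14, 15, 15, 15, 16, 16, 16, 17],
    "group 3": [11, 11, 11, 12, 19, 20, 20, 20],
}

summary = {}
for name, data in groups.items():
    # statistics.variance divides by n - 1 (sample variance).
    summary[name] = (statistics.mean(data), statistics.variance(data))

for name, (m, s2) in summary.items():
    print(f"{name}: mean = {m}, sample variance = {s2:.3f}")
```

Under this grouping all three means are equal, so the mean alone cannot tell the groups apart; the variance is what separates the tightly clustered group from the widely spread ones.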

Page 12

What does variance mean?

Variance indicates variation:

The larger the variance, the more spread out the data.

It indicates unpredictability. E.g.

• Weather data: when the weather changes dramatically, it is hard to predict tomorrow’s temperature. (Looking at temperature data, which has the larger variance, Chicago or San Diego?)

• Stocks: more risk on returns.

• A person’s performance: consistency, emotional…

• Other examples?

Page 13

Use a frequency table to compute the population variance

Data: 14, 15, 15, 15, 16, 16, 16, 17

Data value   Frequency   Relative Frequency
14           1           0.125
15           3           0.375
16           3           0.375
17           1           0.125

Working from the raw data list:

Data   Distance   Square
14
15
15
15
16
16
16
17

Working from the frequency table:

Data   Distance   Square
14
15
16
17

Compute the weighted average of the squared distances, using the relative frequencies as weights.
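The frequency-table route can be sketched as below and cross-checked against the raw data list. This is a minimal illustration: the dictionary `rel_freq` is an assumed representation of the table, and `statistics.pvariance` (population variance, dividing by N) is used only as an independent check.

```python
import statistics

raw_data = [14, 15, 15, 15, 16, 16, 16, 17]

# Relative frequency table from the slide: data value -> relative frequency.
rel_freq = {14: 0.125, 15: 0.375, 16: 0.375, 17: 0.125}

# The mean is the weighted average of the data values.
mean = sum(x * rf for x, rf in rel_freq.items())

# The population variance is the weighted average of the squared distances.
variance = sum((x - mean) ** 2 * rf for x, rf in rel_freq.items())

print("mean:", mean)
print("population variance (frequency table):", variance)
print("population variance (raw data):", statistics.pvariance(raw_data))
```

Weighting by relative frequency already includes the division by the population size, so no further division is needed after the weighted sum.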

Page 14

Standard Deviation

Square root of variance.

An indicator of data deviation that can be directly compared to the mean.

Sample standard deviation = square root of the sample variance: $s = \sqrt{s^2}$

OR

Population standard deviation = square root of the population variance: $\sigma = \sqrt{\sigma^2}$

Exercise: compute the standard deviation from the histogram on slide no. 5 and locate it on the histogram.
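One possible way to approach this exercise is sketched below, using the bin values and frequencies read from the slide 5 histogram. The dictionary `histogram` and the other names are assumptions of this example; because slide 5 leaves open whether the data are a sample or a population, both standard deviations are computed.

```python
import math

# Histogram from slide 5: bin value -> frequency.
histogram = {15: 3, 25: 6, 35: 5, 45: 4, 55: 2}

n = sum(histogram.values())
mean = sum(x * f for x, f in histogram.items()) / n
sum_of_squares = sum(f * (x - mean) ** 2 for x, f in histogram.items())

# The slide leaves open whether the data are a sample or a population,
# so both versions are computed here.
population_sd = math.sqrt(sum_of_squares / n)
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(f"mean = {mean}")
print(f"population standard deviation = {population_sd:.2f}")
print(f"sample standard deviation     = {sample_sd:.2f}")
print(f"one standard deviation around the mean: "
      f"{mean - population_sd:.2f} to {mean + population_sd:.2f}")
```

The last line gives the interval to mark on the histogram: one standard deviation to either side of the mean of 33.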


Page 15

Empirical Rule

If the data are bell shaped (most of the time), then

about 68% of all data will fall in the range of $\mu \pm \sigma$

about 95% of all data will fall in the range of $\mu \pm 2\sigma$

about 99.7% of all data will fall in the range of $\mu \pm 3\sigma$

[Figure: bell-shaped curve centered at $\mu$, marking the 68%, 95%, and 99.7% ranges]
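The three percentages can be checked numerically under the assumption that "bell shaped" means exactly normal, which is stronger than what the slide states. The sketch uses `math.erf` from the standard library; `normal_coverage`, `empirical_range`, and the values of `mu` and `sigma` are hypothetical and chosen only for illustration.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def empirical_range(mu, sigma, k):
    """Interval from mu - k*sigma to mu + k*sigma."""
    return (mu - k * sigma, mu + k * sigma)


# Hypothetical example values, not taken from the slides.
mu, sigma = 100.0, 15.0

for k in (1, 2, 3):
    low, high = empirical_range(mu, sigma, k)
    print(f"within {k} sd: {low} to {high}, about {normal_coverage(k):.1%}")
```

For real data that are only roughly bell shaped, the observed fractions will differ somewhat from these values, which is why the rule is called empirical.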