BUS304 – Data Characterization, Chapter 3: Data Characterization


Chapter 3

Data Characterization


Today: Mean and Variance

Mean:

Also called “average”

Formula:

\[ \text{mean} = \frac{\text{Sum of data}}{\text{Number of data}} \]

Characterizes the center of the data distribution

The most commonly used measure

Sample mean x̄: the average derived from a sample

Population mean μ: the average derived from the population

Exercise:

Compute the mean weight for the Chargers’ offense players and defense players.

Which mean should be higher?

Why?

Are they population means or sample means?

Ways to compute the mean:

1. Use a calculator.

2. Use Excel (function: AVERAGE).



Sensitivity to outliers

Compute the mean for the following 2 groups of data.

Household income in community a (unit = $10,000):

#1 #2 #3 #4 #5 #6 #7 #8
5  4  3  4  3  5  4  5

Household income in community b (unit = $10,000):

#1 #2 #3 #4 #5 #6 #7 #8
5  4  3  4  3  5  4  100

Suppose the mayor decides to provide more public facilities to poor communities, and the decision is based on whether the mean income in the community is below $50,000 per year. Does such a decision make sense?
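To see the effect of the single large value, the two means can be checked with a short Python sketch. The variable names are illustrative, and the $50,000 threshold is expressed as 5 in the slide's unit of $10,000.

```python
from statistics import mean

# Household incomes from the two tables above, in units of $10,000.
community_a = [5, 4, 3, 4, 3, 5, 4, 5]
community_b = [5, 4, 3, 4, 3, 5, 4, 100]

THRESHOLD = 5  # $50,000 expressed in units of $10,000

mean_a = mean(community_a)  # 33 / 8
mean_b = mean(community_b)  # 128 / 8

# Same community b, but leaving out household #8 (the outlier).
mean_b_without_outlier = mean(community_b[:-1])  # 28 / 7

print(f"Community a: mean = ${mean_a * 10_000:,.0f}, below threshold: {mean_a < THRESHOLD}")
print(f"Community b: mean = ${mean_b * 10_000:,.0f}, below threshold: {mean_b < THRESHOLD}")
print(f"Community b without household #8: mean = ${mean_b_without_outlier * 10_000:,.0f}")
```

Seven of the eight households in community b have the same incomes as in community a, yet one household moves the mean from below the threshold to well above it.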


Compute the mean from frequency table

Below is a frequency table showing the number of days the teams take to finish their projects.

Days to Complete    Frequency    Relative Frequency
5                   4            ?
6                   12           ?
7                   8            ?
8                   6            ?
9                   4            ?
10                  2            ?

How many days on average does a team take to finish one project?

\[ \text{mean} = \frac{\text{total days}}{\text{total teams}} \]

Create a histogram using the data in the table and locate the mean on the graph.

How would you describe the shape of the histogram?

What is the relationship between the mean and the peak?

Use relative frequency to find the mean.
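The two routes to the mean (total days over total teams, and the relative-frequency weighted sum) can be sketched in Python as follows. The variable names are illustrative; the numbers come from the frequency table.

```python
# Frequency table: days to complete a project and number of teams.
days = [5, 6, 7, 8, 9, 10]
frequency = [4, 12, 8, 6, 4, 2]

# Route 1: total days divided by total teams.
total_teams = sum(frequency)
total_days = sum(d * f for d, f in zip(days, frequency))
mean_days = total_days / total_teams

# Route 2: weight each value by its relative frequency.
relative_frequency = [f / total_teams for f in frequency]
mean_from_relative = sum(d * r for d, r in zip(days, relative_frequency))

for d, f, r in zip(days, frequency, relative_frequency):
    print(f"{d:>2} days: frequency {f:>2}, relative frequency {r:.3f}")
print(f"mean = {total_days} / {total_teams} = {mean_days}")
print(f"mean from relative frequencies = {mean_from_relative:.3f}")
```

Both routes give the same answer because dividing each frequency by the total number of teams is the same as dividing the whole sum by it.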


Compute the mean from Histogram

[Figure: Histogram. Vertical axis: Frequency (0 to 7). Horizontal axis: bins 5, 15, 25, 35, 45, 55, More, with frequencies 0, 3, 6, 5, 4, 2, 0.]

Histogram conveys the same information as the frequency table

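\[ \text{mean} = \frac{\text{total data value}}{\text{data size}} = \frac{15 \times 3 + 25 \times 6 + 35 \times 5 + 45 \times 4 + 55 \times 2}{3 + 6 + 5 + 4 + 2} = 33 \]

Mathematical expression: x̄ = 33 if sample, μ = 33 if population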


Weighted Mean

The simple mean treats each piece of information equally. E.g. the average score of the students.

Sometimes different data should be given different weights, because one may be more important than another.

• E.g. an instructor assigns 60% of the grade to the homework score and 40% to the final exam. If a student’s homework score is 84 and the exam score is 70, compute the student’s final score (the weighted mean of the homework score and the exam score).

-- This teacher thinks homework reveals more comprehensive information about a student’s knowledge, and hence puts more weight on it.
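The final-score example can be worked through with a small helper. This is a minimal sketch; `weighted_mean` is a hypothetical function name, not something defined in the course material.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [84, 70]      # homework score, final exam score
weights = [0.6, 0.4]   # 60% homework, 40% final exam

final_score = weighted_mean(scores, weights)
simple_mean = sum(scores) / len(scores)

print(f"weighted mean = {final_score:.1f}")  # 0.6 * 84 + 0.4 * 70
print(f"simple mean   = {simple_mean:.1f}")
```

The weighted mean sits closer to the homework score than the simple mean does, because the homework score carries the larger weight.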


When to use weighted mean?

Some other examples of weighted mean:

A student’s GPA. A course with more credits takes more weight.

An economic growth indicator. (Some industries affect the economy more than others.)

Crunch time leader: a player who performs the best in the last few minutes of the game. This can reveal the person’s performance under pressure.

Expectation – you will see it in chapter 4

• E.g. in a gambling game, if with 60% chance you lose one dollar, and with 40% chance you gain one dollar, the expectation is

60% × $(-1) + 40% × $1 = -$0.20

Other examples? (average Cal State tuition)

Always think about whether you should use the weighted mean or the simple mean.
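The gambling example above is a weighted mean in which the weights are probabilities. A minimal Python sketch, with illustrative variable names:

```python
# Each entry is (payoff in dollars, probability of that payoff).
outcomes = [(-1.0, 0.6), (1.0, 0.4)]

# The probabilities act as weights and must add up to 1.
total_probability = sum(p for _, p in outcomes)

# Expectation = sum of payoff * probability.
expectation = sum(x * p for x, p in outcomes)

print(f"total probability = {total_probability}")
print(f"expectation = ${expectation:.2f} per game")
```

Because the weights already sum to 1, no division by the total weight is needed. The result is rounded for display because floating-point arithmetic may not give exactly -0.2.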


Break


Variance

A measure of data spread.

Also called “the average of squared deviations from the mean”

The larger the variance, the fatter the histogram

-- population variance:

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

-- sample variance:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

Note the difference!


Steps to compute the variance

1. Identify whether the data are of a population or a sample (the formulae are different).

2. Use the following table to compute the deviation:

a) Find the mean:

\[ \text{mean} = \frac{5 + 4 + 4 + 5 + 3 + 2}{6} = 3.833 \]

b) Find the distance from the mean (fill out the 2nd column)

c) Find the squared distance (the 3rd column)

d) Add up the 3rd column

e) Divide by

i. population size; or

ii. sample size - 1

Data list    Distance from the mean    Square the distance
5            = 5 - mean = 1.167        = (1.167)^2 = 1.36
4
4
5
3
2
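Steps a) through e) can be followed line by line in Python. This is a sketch with illustrative variable names; it computes both versions of step e) because the slide does not say whether the six values are a population or a sample.

```python
data = [5, 4, 4, 5, 3, 2]

# a) Find the mean.
mean = sum(data) / len(data)

# b) Distance of each value from the mean (2nd column).
distances = [x - mean for x in data]

# c) Squared distances (3rd column).
squared_distances = [d ** 2 for d in distances]

# d) Add up the 3rd column.
sum_of_squares = sum(squared_distances)

# e) Divide by the population size, or by the sample size minus 1.
population_variance = sum_of_squares / len(data)
sample_variance = sum_of_squares / (len(data) - 1)

print(f"mean = {mean:.3f}")
for x, d, sq in zip(data, distances, squared_distances):
    print(f"{x}  {d:+.3f}  {sq:.3f}")
print(f"sum of squared distances = {sum_of_squares:.3f}")
print(f"population variance = {population_variance:.3f}")
print(f"sample variance     = {sample_variance:.3f}")
```

The sample variance is always the larger of the two, since the same sum is divided by a smaller number.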


Comparing variance vs. histogram

Find the variance for the following groups of sample data:

Group 1: 11 12 13 16 16 17 18 21

Group 2: 14 15 15 15 16 16 16 17

Group 3: 11 11 11 12 19 20 20 20

Compare the mean and variance.

Create the histogram to compare the distribution.
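The comparison can be sketched with Python's standard `statistics` module. It assumes the 24 values listed above form three groups of eight, in the order given; the group labels are illustrative. `statistics.variance` divides by n - 1, which matches the sample variance formula.

```python
from statistics import mean, variance

# Assumption: the values on the slide form three groups of eight.
groups = {
    "Group 1": [11, 12, 13, 16, 16, 17, 18, 21],
    "Group 2": [14, 15, 15, 15, 16, 16, 16, 17],
    "Group 3": [11, 11, 11, 12, 19, 20, 20, 20],
}

# For each group, store (mean, sample variance).
summary = {name: (mean(values), variance(values)) for name, values in groups.items()}

for name, (m, v) in summary.items():
    print(f"{name}: mean = {m}, sample variance = {v:.3f}")
```

Under this grouping the three means are identical, so the mean alone cannot tell the groups apart. The variance does: it is smallest for the tightly clustered group and largest for the group split between low and high values.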


What does variance mean?

Variance indicates variation:

The larger the variance, the more spread out the data.

Indicates unpredictability. E.g.

• Weather data: when the weather changes dramatically, it is hard to predict tomorrow’s temperature

(If you look at temperature data: which has the larger variance, Chicago or San Diego?)

• Stock: more risk on returns.

• A person’s performance: consistency, emotional…

• Other examples?


Use frequency table to compute the population variance:

Data: 14 15 15 15 16 16 16 17

Data value    Frequency    Relative Frequency
14            1            0.125
15            3            0.375
16            3            0.375
17            1            0.125

Using the raw data list:

Data    Distance    Square
14
15
15
15
16
16
16
17

Using the frequency table:

Data    Distance    Square
14
15
16
17

Compute the weighted average
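A minimal Python sketch of the weighted-average route, with illustrative variable names: the population variance is the weighted average of the squared distances, using the relative frequencies as weights. It is cross-checked against `statistics.pvariance` applied to the raw data list.

```python
from statistics import pvariance

values = [14, 15, 16, 17]
relative_frequency = [0.125, 0.375, 0.375, 0.125]

# Mean as a weighted average of the distinct data values.
mean = sum(v * r for v, r in zip(values, relative_frequency))

# Population variance as a weighted average of squared distances.
variance_from_table = sum(r * (v - mean) ** 2 for v, r in zip(values, relative_frequency))

# Cross-check with the raw data list (pvariance divides by N).
raw_data = [14, 15, 15, 15, 16, 16, 16, 17]
variance_from_raw = pvariance(raw_data)

print(f"mean = {mean}")
print(f"population variance from frequency table = {variance_from_table}")
print(f"population variance from raw data        = {variance_from_raw}")
```

The frequency table needs four rows instead of eight, which is the practical reason to use it when values repeat often.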


Standard Deviation

Square root of variance.

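An indicator of data deviation that can be directly compared to the mean, because it has the same units as the data.

\[ s = \sqrt{s^2} \quad \text{OR} \quad \sigma = \sqrt{\sigma^2} \]

s²: sample variance

s: sample standard deviation

σ²: population variance

σ: population standard deviation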

Exercise: compute the standard deviation from the histogram on slide no. 5 and locate it on the histogram.
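One way to approach this exercise is sketched below in Python. It assumes each histogram bar is represented by its label (15, 25, 35, 45, 55) with frequencies 3, 6, 5, 4, 2. Both the population and the sample version are shown, because the exercise does not say which applies.

```python
import math

# Histogram from slide 5: bar labels and their frequencies.
midpoints = [15, 25, 35, 45, 55]
frequency = [3, 6, 5, 4, 2]

n = sum(frequency)
mean = sum(m * f for m, f in zip(midpoints, frequency)) / n

# Sum of squared distances from the mean, counting each bar f times.
sum_of_squares = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequency))

population_sd = math.sqrt(sum_of_squares / n)
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(f"mean = {mean}")
print(f"population standard deviation = {population_sd:.2f}")
print(f"sample standard deviation     = {sample_sd:.2f}")
print(f"mean +/- one population SD: {mean - population_sd:.1f} to {mean + population_sd:.1f}")
```

The last line gives the two points to mark on the horizontal axis, one standard deviation on either side of the mean.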



Empirical Rule

If the data is bell shaped (most of the time), then

68% of all data will fall in the range of μ ± σ

95% of all data will fall in the range of μ ± 2σ

99.7% of all data will fall in the range of μ ± 3σ

[Figure: bell curve centered at μ, with 68% of the area within μ ± σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ.]
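The three percentages can be checked against an exact bell curve with Python's standard library (`statistics.NormalDist`, available from Python 3.8). This is only a sketch: it assumes a perfectly normal distribution, which real data only approximate.

```python
from statistics import NormalDist

# Standard normal curve: mu = 0, sigma = 1.
# The shares are the same for any other mu and sigma.
bell = NormalDist(mu=0, sigma=1)

# Share of the data within k standard deviations of the mean.
shares = {k: bell.cdf(k) - bell.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within mu +/- {k} sigma: {share:.2%}")
```

The rule's 68%, 95% and 99.7% are rounded versions of these values.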
