BUS304 – Data Characterization: Chapter 3 Data Characterization
BUS304 – Data Characterization 1
Chapter 3
Data Characterization
BUS304 – Data Characterization 2
Today: Mean and Variance
Mean:
Also called "average"
Formula: mean = (sum of data) / (number of data)
Characterizes the center of the data distribution
The most commonly used measure
Sample mean $\bar{x}$: the average derived from a sample
Population mean $\mu$: the average derived from the population
Exercise:
Compute the mean weight for the Chargers' offensive players and defensive players.
Which mean should be higher?
Why?
Are they population means or sample means?
Ways to compute the mean:
1. Use calculator.
2. Use Excel. (function: average)
BUS304 – Data Characterization 3
Sensitivity to outliers
Compute the mean for the following 2 groups of data.
Household income in community a (unit = $10,000):
#1 #2 #3 #4 #5 #6 #7 #8
5 4 3 4 3 5 4 5
Household income in community b (unit = $10,000):
#1 #2 #3 #4 #5 #6 #7 #8
5 4 3 4 3 5 4 100
Suppose the mayor decides to provide more public facilities to poor communities, and the decision is based on whether the mean income in the community is below $50,000 per year. Does such a decision make sense?
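The two means can be checked with a short script. This is only a sketch: it assumes the first row of incomes belongs to community a and the second to community b, and the names `mean` and `THRESHOLD` are chosen here for illustration.

```python
# Household incomes from the slide, in units of $10,000.
# Assumption: the first data row is community a, the second is community b.
community_a = [5, 4, 3, 4, 3, 5, 4, 5]
community_b = [5, 4, 3, 4, 3, 5, 4, 100]

THRESHOLD = 5  # $50,000 expressed in units of $10,000


def mean(values):
    """Return the sum of the data divided by the number of data points."""
    return sum(values) / len(values)


mean_a = mean(community_a)
mean_b = mean(community_b)

print(f"Community a: mean = {mean_a}, below threshold: {mean_a < THRESHOLD}")
print(f"Community b: mean = {mean_b}, below threshold: {mean_b < THRESHOLD}")

# Count the households in community b that are at or below the threshold.
at_or_below_b = sum(1 for income in community_b if income <= THRESHOLD)
print(f"Households in b at or below the threshold: {at_or_below_b} of {len(community_b)}")
```

A single household with income 100 moves the mean of community b above the threshold, even though the other seven households look like those in community a.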
BUS304 – Data Characterization 4
Compute the mean from frequency table
Below is a frequency table showing the number of days the teams take to finish their projects.
How many days on average does a team take to finish one project?
mean = (total days) / (total teams)
Create a histogram using the data in the table, and locate the mean on the graph.
How would you describe the shape of the histogram?
What is the relationship between the mean and the peak?
Use relative frequency to find the mean.
Days to Complete   Frequency   Relative Frequency
5                  4           ?
6                  12          ?
7                  8           ?
8                  6           ?
9                  4           ?
10                 2           ?
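One way to work through this slide in code is sketched below. It computes the mean twice, once as total days over total teams and once as a sum weighted by relative frequency. The variable names are illustrative, not from the slides.

```python
# Frequency table from the slide: days to complete -> number of teams.
frequency = {5: 4, 6: 12, 7: 8, 8: 6, 9: 4, 10: 2}

# Route 1: mean = total days / total teams.
total_teams = sum(frequency.values())
total_days = sum(days * count for days, count in frequency.items())
mean_from_counts = total_days / total_teams

# Route 2: weight each value by its relative frequency (count / total teams).
relative_frequency = {days: count / total_teams for days, count in frequency.items()}
mean_from_relative = sum(days * rf for days, rf in relative_frequency.items())

for days, rf in relative_frequency.items():
    print(f"{days:>2} days: relative frequency = {rf:.3f}")
print(f"Mean from counts:             {mean_from_counts:.3f}")
print(f"Mean from relative frequency: {mean_from_relative:.3f}")
```

Both routes give the same answer, because dividing each count by the total before summing is the same as dividing the sum by the total afterwards.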
BUS304 – Data Characterization 5
Compute the mean from Histogram
Histogram
[Figure: histogram with frequency on the vertical axis and bins 5, 15, 25, 35, 45, 55, More on the horizontal axis; bar heights 0, 3, 6, 5, 4, 2, 0.]
Histogram conveys the same information as the frequency table
mean = (total data value) / (data size) = $\frac{15 \cdot 3 + 25 \cdot 6 + 35 \cdot 5 + 45 \cdot 4 + 55 \cdot 2}{3 + 6 + 5 + 4 + 2} = 33$
Mathematical expression: $\bar{x} = 33$ if sample, $\mu = 33$ if population
BUS304 – Data Characterization 6
Weighted Mean
The mean assumes that each piece of information counts equally. E.g. the average score of the students.
Sometimes different data should be given different weights, because one may be more important than the other.
• E.g. an instructor assigns 60% of the grade to the homework score and 40% to the final exam. If a student's homework score is 84 and the exam score is 70, compute the student's final score (the weighted mean of the homework score and the exam score).
-- this teacher thinks homework reveals more comprehensive information about a student's knowledge, and hence gives it more weight.
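The homework/exam example can be sketched in code as follows. The helper `weighted_mean` is a hypothetical name introduced here; it divides by the total weight so that the weights do not have to sum to 1.

```python
def weighted_mean(values, weights):
    """Return sum(weight * value) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Scores and weights from the slide: homework 84 at 60%, final exam 70 at 40%.
scores = [84, 70]
weights = [0.6, 0.4]

final_score = weighted_mean(scores, weights)
simple_mean = sum(scores) / len(scores)

print(f"Weighted mean (final score): {final_score:.1f}")
print(f"Simple mean, for comparison: {simple_mean:.1f}")
```

The weighted mean sits closer to the homework score than the simple mean does, because homework carries the larger weight.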
BUS304 – Data Characterization 7
When to use weighted mean?
Some other examples of weighted mean:
A student's GPA. A course with more credits carries more weight.
An economic growth indicator. (Some industries affect the economy more than others.)
Crunch time leader: a player who performs the best in the last few minutes of the game. This can reveal the person's performance under pressure.
Expectation (you will see this in chapter 4)
• E.g. in a gambling game, if with 60% chance you lose one dollar, and with 40% chance you gain one dollar, the expectation is
60% × $(-1) + 40% × $1 = -$0.20
Other examples? (average Cal State tuition)
Always think about whether you should use the weighted mean or the simple mean.
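The gambling example is a weighted mean whose weights are probabilities. A minimal sketch, with the function name `expectation` chosen here for illustration:

```python
def expectation(outcomes, probabilities):
    """Return the probability-weighted mean of the outcomes."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(outcomes, probabilities))


# From the slide: lose $1 with 60% chance, gain $1 with 40% chance.
expected_gain = expectation(outcomes=[-1, 1], probabilities=[0.6, 0.4])
print(f"Expected gain per game: ${expected_gain:.2f}")
```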
BUS304 – Data Characterization 8
Break
BUS304 – Data Characterization 9
Variance
A measure of data spread.
Also called “the average of squared deviations from the mean”
The larger the variance, the fatter (wider) the histogram
-- sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
-- population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Note the difference!
BUS304 – Data Characterization 10
Steps to compute the variance
1. Identify whether the data are from a population or a sample (the formulae are different).
2. Use the following table to compute the deviations:
a) Find the mean: mean = (5 + 4 + 4 + 5 + 3 + 2) / 6 = 3.833
b) Find the distance of each value from the mean (fill out the 2nd column)
c) Find the squared distance (the 3rd column)
d) Add up the 3rd column
e) Divide by
i. the population size; or
ii. the sample size - 1
Data list   Distance from the mean   Squared distance
5           = 5 - mean = 1.167       = $(1.167)^2$ = 1.36
4
4
5
3
2
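The steps above can be sketched as a short script that fills in the table and then divides by either the population size or the sample size minus 1. The variable names are illustrative. The last lines cross-check against Python's standard `statistics` module.

```python
import statistics

data = [5, 4, 4, 5, 3, 2]

# a) Find the mean.
mean = sum(data) / len(data)

# b) Distance of each value from the mean (2nd column).
distances = [x - mean for x in data]

# c) Squared distance (3rd column).
squared_distances = [d ** 2 for d in distances]

# d) Add up the 3rd column.
sum_of_squares = sum(squared_distances)

# e) Divide by the population size, or by the sample size minus 1.
population_variance = sum_of_squares / len(data)
sample_variance = sum_of_squares / (len(data) - 1)

print("Data  Distance  Squared")
for x, d, sq in zip(data, distances, squared_distances):
    print(f"{x:>4}  {d:>8.3f}  {sq:>7.3f}")
print(f"Mean                = {mean:.3f}")
print(f"Sum of squares      = {sum_of_squares:.3f}")
print(f"Population variance = {population_variance:.3f}")
print(f"Sample variance     = {sample_variance:.3f}")

# Cross-check with the standard library.
print(f"statistics.pvariance = {statistics.pvariance(data):.3f}")
print(f"statistics.variance  = {statistics.variance(data):.3f}")
```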
BUS304 – Data Characterization 11
Comparing variance vs. histogram
Find the variance for the following groups of sample data:
Group 1: 11 12 13 16 16 17 18 21
Group 2: 14 15 15 15 16 16 16 17
Group 3: 11 11 11 12 19 20 20 20
Compare the mean and variance.
Create the histogram to compare the distribution.
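A possible way to run this comparison is sketched below. It assumes the 24 values form three groups of eight in the order listed, and the labels "Group 1" to "Group 3" are hypothetical names added here. Since the slide calls these sample data, the sketch uses `statistics.variance`, which divides by n - 1.

```python
import statistics

# Assumption: three groups of eight values, in the order they appear on the slide.
groups = {
    "Group 1": [11, 12, 13, 16, 16, 17, 18, 21],
    "Group 2": [14, 15, 15, 15, 16, 16, 16, 17],
    "Group 3": [11, 11, 11, 12, 19, 20, 20, 20],
}

summary = {
    name: (statistics.mean(values), statistics.variance(values))
    for name, values in groups.items()
}

for name, (mean, variance) in summary.items():
    print(f"{name}: mean = {mean:.2f}, sample variance = {variance:.2f}")
```

Under that grouping the three means are identical, so only the variance tells the groups apart.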
BUS304 – Data Characterization 12
What does variance mean?
Variance indicates variation:
The larger the variance, the more spread out the data.
Indicates unpredictability. E.g.
• Weather data: weather changes dramatically, so it is hard to predict tomorrow's temperature
(Looking at temperature data: which has the larger variance, Chicago or San Diego?)
• Stock: more risk on returns.
• A person's performance: consistency, emotional...
• Other examples?
BUS304 – Data Characterization 13
Use frequency table to compute the population variance:
Data: 14 15 15 15 16 16 16 17
Data value   Frequency   Relative Frequency
14 1 0.125
15 3 0.375
16 3 0.375
17 1 0.125
Data distance square
14
15
15
15
16
16
16
17
Data distance square
14
15
16
17
Compute the weighted average of the squared distances, using the relative frequencies as weights.
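A sketch of the frequency-table route follows: the mean and the population variance are both computed as averages weighted by relative frequency, then cross-checked against the full data list. Variable names are illustrative.

```python
import statistics

# Relative frequencies from the slide's table.
relative_frequency = {14: 0.125, 15: 0.375, 16: 0.375, 17: 0.125}

# Mean as a weighted average of the distinct data values.
mean = sum(value * rf for value, rf in relative_frequency.items())

# Population variance as a weighted average of the squared distances.
population_variance = sum(
    rf * (value - mean) ** 2 for value, rf in relative_frequency.items()
)

print(f"Mean                = {mean}")
print(f"Population variance = {population_variance}")

# Cross-check against the full list of eight data values.
data = [14, 15, 15, 15, 16, 16, 16, 17]
print(f"statistics.pvariance = {statistics.pvariance(data)}")
```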
BUS304 – Data Characterization 14
Standard Deviation
Square root of variance.
An indicator of data deviation that can be directly compared to the mean, since it is in the same units as the data.
Sample standard deviation: $s = \sqrt{s^2}$, where $s^2$ is the sample variance
OR
Population standard deviation: $\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance
Exercise: compute the standard deviation from the histogram on slide no. 5 and locate it on the histogram.
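One way to approach this exercise is sketched below, using the bin midpoints and frequencies from the histogram on slide 5. It treats every observation as if it sat at its bin midpoint, as the mean calculation on that slide does. The slide does not say whether the data are a sample or a population, so both versions are shown.

```python
import math

# Bin midpoints and frequencies from the histogram on slide 5.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

data_size = sum(frequencies)
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / data_size

# Frequency-weighted sum of squared distances from the mean.
sum_of_squares = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))

population_sd = math.sqrt(sum_of_squares / data_size)
sample_sd = math.sqrt(sum_of_squares / (data_size - 1))

print(f"Mean                          = {mean:.2f}")
print(f"Population standard deviation = {population_sd:.2f}")
print(f"Sample standard deviation     = {sample_sd:.2f}")
print(f"One standard deviation around the mean (population): "
      f"{mean - population_sd:.2f} to {mean + population_sd:.2f}")
```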
BUS304 – Data Characterization 15
Empirical Rule
If the data is bell shaped (most of the time), then
about 68% of all data will fall in the range of $\mu \pm \sigma$
about 95% of all data will fall in the range of $\mu \pm 2\sigma$
about 99.7% of all data will fall in the range of $\mu \pm 3\sigma$
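The three percentages can be checked against an exactly normal (bell-shaped) distribution. This is a sketch under that assumption, using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); real data will only match approximately. The function name `coverage_within` is chosen here for illustration.

```python
from statistics import NormalDist


def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: coverage_within(k) for k in (1, 2, 3)}

for k, probability in coverage.items():
    print(f"within {k} standard deviation(s) of the mean: {probability:.1%}")
```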