Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques.
Statistics Lecture Notes
Dr. Halil İbrahim CEBECİ
Chapter 03: Numerical Descriptive Techniques
Measures of Central Location: Mean, Median, Mode
Measures of Variability: Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing: Percentiles, Quartiles
Measures of Linear Relationship: Covariance, Correlation, Least Squares Line
Numerical Descriptive Techniques
Statistics Lecture Notes – Chapter 03
Measures of Central Location
Arithmetic Mean
It is computed by adding up all the observations and dividing by the total number of observations.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The arithmetic mean is seriously affected by extreme values, called "outliers".
E.g., as soon as a billionaire moves into a neighborhood, the average household income rises above what it was previously.
Ex3.1 - The weights (in lbs) of the students in an elementary school classroom are given below. Calculate the arithmetic mean. If a new student weighing 140 lbs joined the class, what would the new mean of the classroom be? Discuss the validity of the results.
89 77 90 101 66 66 76 59 64 75
88 65 75 72 70 64 66 68 82 80
Without the new student: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{89 + 77 + 90 + \dots + 80}{20} = \frac{1493}{20} = 74.65$
With the new student: $\mu = \frac{89 + 77 + 90 + \dots + 80 + 140}{21} = \frac{1633}{21} \approx 77.76$
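The calculation in Ex3.1 can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics.mean`; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean

# Weights (lbs) of the 20 students from Ex3.1
weights = [89, 77, 90, 101, 66, 66, 76, 59, 64, 75,
           88, 65, 75, 72, 70, 64, 66, 68, 82, 80]

mean_before = mean(weights)          # sum 1493 over 20 students
mean_after = mean(weights + [140])   # sum 1633 over 21 students

print(f"Mean without the new student: {mean_before:.2f}")
print(f"Mean with the new student:    {mean_after:.2f}")
print(f"Shift caused by one extreme value: {mean_after - mean_before:.2f}")
```

A single extreme observation moves the mean by roughly three pounds, which is the sensitivity to outliers described above.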
Median
The value that falls in the middle of the ordered (ascending or descending) list of observations.
When there is an even number of observations, the median is determined by averaging the two observations in the middle.
Ex3.2 - Find the median of the values in Ex3.1 for the two situations (without and with the new student).
59 64 64 65 66 66 66 68 70 72
75 75 76 77 80 82 88 89 90 101 140
Without the new student (n = 20): Median = (72 + 75) / 2 = 73.5
With the new student (n = 21): Median = 75 (the 11th ordered value)
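As a quick check of Ex3.2, the sketch below computes the median for both situations with `statistics.median`, which averages the two middle values when the number of observations is even. The variable names are illustrative.

```python
from statistics import median

# Weights (lbs) of the 20 students from Ex3.1
weights = [89, 77, 90, 101, 66, 66, 76, 59, 64, 75,
           88, 65, 75, 72, 70, 64, 66, 68, 82, 80]

median_20 = median(weights)           # even count: average of 10th and 11th values
median_21 = median(weights + [140])   # odd count: the 11th ordered value

print("Median without the new student:", median_20)
print("Median with the new student:   ", median_21)
```

The median moves far less than the mean when the 140 lbs observation is added, so it is the more robust measure for this classroom.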
Mode
The observation that occurs with the greatest frequency.
There are several problems with using the mode as a measure of central location:
In a small sample it may not be a very good measure.
It may not be unique.
Ex3.3 - Find the mode of the values in Ex3.1.
89 77 90 101 66 66 76 59 64 75
88 65 75 72 70 64 66 68 82 80
Mode = 66 (it occurs three times, more often than any other value).
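Because the mode may not be unique, a safe way to compute it is to ask for all most-frequent values. The sketch below uses `statistics.multimode` together with `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

# Weights (lbs) of the 20 students from Ex3.1
weights = [89, 77, 90, 101, 66, 66, 76, 59, 64, 75,
           88, 65, 75, 72, 70, 64, 66, 68, 82, 80]

modes = multimode(weights)   # list of every value tied for the highest frequency
counts = Counter(weights)    # frequency of each distinct weight

for value in modes:
    print(f"Mode: {value} (occurs {counts[value]} times)")
```

If two or more values were tied for the highest frequency, `modes` would contain all of them, which makes the non-uniqueness problem visible.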
Geometric Mean
To reduce the effect of outliers in a sample whose other values are similar, we can use the geometric mean.
Ex3.4 - The final exam scores of a master's class are given below. Calculate the arithmetic and geometric means.
45 37 40 30 35 45 50 95
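The notes do not show the geometric mean formula, so the sketch below assumes the usual definition, the n-th root of the product of the n observations, as implemented by `statistics.geometric_mean` (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import geometric_mean, mean

# Final exam scores from Ex3.4
scores = [45, 37, 40, 30, 35, 45, 50, 95]

arithmetic = mean(scores)            # 377 / 8
geometric = geometric_mean(scores)   # 8th root of the product of the scores

print(f"Arithmetic mean: {arithmetic:.2f}")
print(f"Geometric mean:  {geometric:.2f}")
```

The geometric mean comes out lower than the arithmetic mean because the single high score of 95 pulls it up less.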
Measures of Variability
Measures of central location fail to tell the whole story about the distribution: how much are the observations spread out around the mean value?
For example, two sets of class grades are shown. The mean (= 50) is the same in each case, but the red class has greater variability than the blue class.
Range:
The range is the simplest measure of variability, calculated as:
Range = Largest observation - Smallest observation
Advantage: Simplicity
Disadvantage: Simplicity (only the two extreme observations are used)
Set 1: 4, 4, 4, 4, 4, 50
Set 2: 4, 8, 15, 24, 39, 50
Both sets have the same range, 50 - 4 = 46, although their observations are spread very differently.
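The two sets listed under the range definition illustrate why simplicity is also its weakness. The following sketch computes the range of each set; `data_range` is a hypothetical helper name introduced only for this illustration.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

set_1 = [4, 4, 4, 4, 4, 50]
set_2 = [4, 8, 15, 24, 39, 50]

range_1 = data_range(set_1)
range_2 = data_range(set_2)

print("Range of Set 1:", range_1)
print("Range of Set 2:", range_2)
```

Both calls return the same value even though the observations between the extremes are distributed very differently, because the range ignores everything except the minimum and maximum.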
Variance:
A measure of the average squared distance between each data point and the mean of the data.
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Note: in budgeting, "variance" has a different meaning: the difference between what is expected (budget) and the actuals (expenditure), that is, between "should take" and "did take". That usage is not the statistical variance defined here.
Ex3.6 - The following are the numbers of summer jobs that a sample of six students applied for. Find the mean and variance of these data.
17 15 23 7 9 13
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(17 - 14)^2 + (15 - 14)^2 + \dots + (13 - 14)^2}{6 - 1} = \frac{166}{5} = 33.2$
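The steps of Ex3.6 can be reproduced directly. This sketch computes the sample variance by hand (dividing by n - 1) and compares it with `statistics.variance`, which uses the same divisor; the variable names are illustrative.

```python
from statistics import mean, variance

# Number of summer jobs applied for (Ex3.6)
jobs = [17, 15, 23, 7, 9, 13]

x_bar = mean(jobs)
squared_deviations = [(x - x_bar) ** 2 for x in jobs]

# Sample variance: sum of squared deviations divided by n - 1
s2_manual = sum(squared_deviations) / (len(jobs) - 1)
s2_library = variance(jobs)

print("Mean:", x_bar)
print("Sum of squared deviations:", sum(squared_deviations))
print("Sample variance (manual): ", s2_manual)
print("Sample variance (library):", s2_library)
```

For a full population rather than a sample, `statistics.pvariance` divides by N instead.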
Standard Deviation:
Shows how much variation or "dispersion" exists around the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Ex3.7 - Find the standard deviation of the values in Ex3.6.
17 15 23 7 9 13
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = 33.2$
$s = \sqrt{s^2} = \sqrt{33.2} \approx 5.76$
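A short check of Ex3.7, assuming the data are treated as a sample (divisor n - 1). The sketch takes the square root of the sample variance and compares it with `statistics.stdev`; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

# Number of summer jobs applied for (Ex3.6 and Ex3.7)
jobs = [17, 15, 23, 7, 9, 13]

s_from_variance = sqrt(variance(jobs))   # square root of the sample variance
s_library = stdev(jobs)                  # sample standard deviation directly

print(f"Sample standard deviation: {s_library:.2f}")
```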
Interpreting the Standard Deviation
If the histogram is bell-shaped:
Approximately 68% of all observations fall within one standard deviation of the mean.
Approximately 95% of all observations fall within two standard deviations of the mean.
Approximately 99.7% of all observations fall within three standard deviations of the mean.
Chebysheff’s Theorem
A more general interpretation of the standard deviation is derived from Chebysheff’s Theorem, which applies to all shapes of histograms (not just bell-shaped ones).
The proportion of observations in any sample that lie within $k$ standard deviations of the mean is at least:
$1 - \frac{1}{k^2}$ for $k > 1$
Ex3.8 - A professor has announced that the grades on a statistics exam have a mean value of 72 and a standard deviation of 6. Not knowing anything about the shape of the distribution of grades, what can we say about the proportion of grades that are between:
a. 66 and 78?
b. 60 and 84?
c. 54 and 90?
A3.8a - Since (66, 78) = (72 - 6, 72 + 6), we observe that k = 1. With k = 1, the bound is $1 - \frac{1}{1^2} = 0$, so Chebysheff’s theorem states only that at least 0% of the grades fall within the interval (66, 78); it gives no useful information in this case.
A3.8b - Since (60, 84) = (72 - 12, 72 + 12), we observe that k = 2. With k = 2, Chebysheff’s theorem states that at least $1 - \frac{1}{2^2} = 0.75$, or 75%, of the grades fall within the interval (60, 84).
A3.8c - Since (54, 90) = (72 - 18, 72 + 18), we observe that k = 3. With k = 3, Chebysheff’s theorem states that at least $1 - \frac{1}{3^2} \approx 0.889$, or about 88.9%, of the grades fall within the interval (54, 90).
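The bound used in Ex3.8 is easy to tabulate. The sketch below defines a hypothetical helper, `chebysheff_lower_bound`, that evaluates 1 - 1/k² and applies it to the three intervals. Accepting k = 1 (where the bound is zero) is a choice made for this illustration, since the theorem is stated for k > 1.

```python
def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

mean_grade = 72
std_dev = 6

bounds = {}
for low, high in [(66, 78), (60, 84), (54, 90)]:
    k = (high - mean_grade) / std_dev   # half-width of the interval in standard deviations
    bounds[(low, high)] = chebysheff_lower_bound(k)
    print(f"({low}, {high}): k = {k:.0f}, at least {bounds[(low, high)]:.1%} of the grades")
```

These are guaranteed minimums for any distribution shape; the actual proportions are usually higher.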
Ex3.9 - If we know that the distribution of the grades in Ex3.8 is bell-shaped, the answers become:
a. 66 and 78? Approximately 68% of the grades fall within this interval (1 standard deviation).
b. 60 and 84? Approximately 95% of the grades fall within this interval (2 standard deviations).
c. 54 and 90? Approximately 99.7% of the grades fall within this interval (3 standard deviations).
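The 68%, 95%, and 99.7% figures can be reproduced if "bell-shaped" is taken to mean exactly normal, which is a stronger assumption than the notes make. Under that assumption, the sketch below uses `statistics.NormalDist` (Python 3.8 or later) with the mean and standard deviation from Ex3.8; `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption: grades follow a normal distribution with mean 72 and standard deviation 6
grades = NormalDist(mu=72, sigma=6)

def proportion_between(low, high):
    """Proportion of the assumed normal distribution lying between low and high."""
    return grades.cdf(high) - grades.cdf(low)

p_one_sd = proportion_between(66, 78)
p_two_sd = proportion_between(60, 84)
p_three_sd = proportion_between(54, 90)

print(f"Within 1 standard deviation:  {p_one_sd:.1%}")
print(f"Within 2 standard deviations: {p_two_sd:.1%}")
print(f"Within 3 standard deviations: {p_three_sd:.1%}")
```

Comparing these proportions with the Chebysheff bounds shows how much extra information the shape of the distribution provides.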
Measures of Relative Standing
Percentile:
The $P$th percentile is the value for which $P$ percent of the observations are less than that value and $(100 - P)$% are greater than that value.
Ex3.10 - The weights in pounds of a group of workers are as follows:
173 165 171 175 188
183 177 160 151 169
162 179 145 171 175
168 158 186 182 162
154 180 164 166 157
a. Find the 25th percentile of the weights.
b. Find the 50th percentile of the weights.
c. Find the 75th percentile of the weights.
The weights ranked in ascending order:
Rank  Weight    Rank  Weight
1     145       14    171
2     151       15    171
3     154       16    173
4     157       17    175
5     158       18    175
6     160       19    177
7     162       20    179
8     162       21    180
9     164       22    182
10    165       23    183
11    166       24    186
12    168       25    188
13    169
A3.10a - $Q_1 = \frac{160 + 162}{2} = 161$
A3.10b - $Q_2 = \text{Median} = 169$
A3.10c - $Q_3 = \frac{177 + 179}{2} = 178$
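Several quartile conventions exist, and the notes do not say which one they use. The sketch below assumes the "exclusive" method of `statistics.quantiles`, which places the P-th percentile at position (n + 1)P/100 in the ranked list and reproduces the three answers of A3.10; the variable names are illustrative.

```python
from statistics import quantiles

# Weights (lbs) of the 25 workers from Ex3.10
weights = [173, 165, 171, 175, 188, 183, 177, 160, 151, 169,
           162, 179, 145, 171, 175, 168, 158, 186, 182, 162,
           154, 180, 164, 166, 157]

# n=4 splits the data into quartiles; "exclusive" uses positions (n + 1) * P / 100
q1, q2, q3 = quantiles(weights, n=4, method="exclusive")

print("25th percentile (Q1):", q1)
print("50th percentile (Q2):", q2)
print("75th percentile (Q3):", q3)
```

Other conventions (for example `method="inclusive"`) can give slightly different quartiles for the same data, so results from different software may not match exactly.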
Box Plots
A box plot is a graphical summary of quantitative data. It indicates what the two extreme values of a data set are, where the data are centered, and how spread out the data are.
It does this by plotting the values of five descriptive statistics of the data: the smallest value, the lower quartile ($Q_1$), the median ($Q_2$), the upper quartile ($Q_3$), and the largest value.
Interquartile Range:
The interquartile range measures the spread of the middle 50% of the observations: $IQR = Q_3 - Q_1$.
Whiskers:
The lines extending to the left and right of the box. The whiskers extend outward to the smaller of 1.5 times the IQR or to the most extreme point that is not an outlier.
Ex3.11 - Use the weights in pounds of the group of workers in Ex3.10 to construct a box plot. Compute the interquartile range and identify any outliers.
A3.11 - The first step in constructing a box plot is to rank the data, in order to determine the numerical values of the five descriptive statistics to be plotted.
Descriptive Statistic                          Numerical Value
Smallest value                                 145
Lower quartile (Q1)                            161
Median (Q2)                                    169
Upper quartile (Q3)                            178
Largest value                                  188
Interquartile range (Q3 - Q1)                  17
Outlier limits (Q1 - 1.5 IQR, Q3 + 1.5 IQR)    135.5 and 203.5
* There are no outliers: every weight lies between 135.5 and 203.5.
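The numbers behind the box plot in Ex3.11 can be computed as follows. This is a sketch that assumes the same quartile convention as above (`method="exclusive"`) and the 1.5 × IQR rule for outlier limits; the variable names are illustrative.

```python
from statistics import quantiles

# Weights (lbs) of the 25 workers from Ex3.10
weights = [173, 165, 171, 175, 188, 183, 177, 160, 151, 169,
           162, 179, 145, 171, 175, 168, 158, 186, 182, 162,
           154, 180, 164, 166, 157]

q1, q2, q3 = quantiles(weights, n=4, method="exclusive")
five_number_summary = (min(weights), q1, q2, q3, max(weights))

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr   # observations below this are flagged as outliers
upper_limit = q3 + 1.5 * iqr   # observations above this are flagged as outliers
outliers = [w for w in weights if w < lower_limit or w > upper_limit]

print("Five-number summary:", five_number_summary)
print("Interquartile range:", iqr)
print("Outlier limits:", lower_limit, "and", upper_limit)
print("Outliers:", outliers if outliers else "none")
```

Since no observation falls outside the limits, the whiskers of this box plot end at the smallest and largest weights.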
Example (box plots of restaurant service times; figure not reproduced):
Wendy’s service times are the shortest and least variable.
Hardee’s has the greatest variability.
Jack-in-the-Box has the longest service times.
Measures of Linear Relationship
Covariance:
A measure of how much two variables change together.
Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Coefficient of Correlation:
Correlation is a scaled version of covariance that takes on values in [−1, 1], with a correlation of ±1 indicating perfect linear association and 0 indicating no linear relationship.
Population coefficient of correlation: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Sample coefficient of correlation: $r = \frac{s_{xy}}{s_x s_y}$
Ex3.12 - Based on the sample of data shown below, measure how these two variables are related by computing their covariance and coefficient of correlation.
Years of education (x)   Income (y)
11                       25
12                       33
11                       22
15                       41
8                        18
10                       28
11                       32
12                       24
17                       53
11                       26
With $\bar{x} = 11.8$ and $\bar{y} = 30.2$, let $d_x = x_i - \bar{x}$ and $d_y = y_i - \bar{y}$:
x     y     dx      dx^2     dy       dy^2      dx*dy
11    25    -0.8    0.64     -5.2     27.04     4.16
12    33    0.2     0.04     2.8      7.84      0.56
11    22    -0.8    0.64     -8.2     67.24     6.56
15    41    3.2     10.24    10.8     116.64    34.56
8     18    -3.8    14.44    -12.2    148.84    46.36
10    28    -1.8    3.24     -2.2     4.84      3.96
11    32    -0.8    0.64     1.8      3.24      -1.44
12    24    0.2     0.04     -6.2     38.44     -1.24
17    53    5.2     27.04    22.8     519.84    118.56
11    26    -0.8    0.64     -4.2     17.64     3.36
Sum                 57.6              951.6     215.4
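$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{57.6}{9}} = 2.53$
$s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} = \sqrt{\frac{951.6}{9}} = 10.28$
$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{215.4}{9} = 23.93$
$r = \frac{s_{xy}}{s_x s_y} = \frac{23.93}{2.53 \times 10.28} = 0.92$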
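The covariance and correlation of Ex3.12 can be recomputed from the raw data. The sketch below follows the sample formulas step by step in plain Python, so it does not depend on any particular library version; the variable names are illustrative.

```python
from math import sqrt

# Ex3.12 sample: years of education (x) and income (y)
x = [11, 12, 11, 15, 8, 10, 11, 12, 17, 11]
y = [25, 33, 22, 41, 18, 28, 32, 24, 53, 26]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross products
sum_dx2 = sum((xi - x_bar) ** 2 for xi in x)
sum_dy2 = sum((yi - y_bar) ** 2 for yi in y)
sum_dxdy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

s_x = sqrt(sum_dx2 / (n - 1))    # sample standard deviation of x
s_y = sqrt(sum_dy2 / (n - 1))    # sample standard deviation of y
s_xy = sum_dxdy / (n - 1)        # sample covariance
r = s_xy / (s_x * s_y)           # sample coefficient of correlation

print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"s_xy = {s_xy:.2f}")
print(f"r = {r:.2f}")
```

A value of r close to +1 indicates a strong positive linear relationship between years of education and income in this sample.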
Least Squares Method:
Produces a straight line drawn through the points so that the sum of squared deviations between the points and the line is minimized.
$\hat{y} = b_0 + b_1 x$
$b_1 = \frac{s_{xy}}{s_x^2}$
$b_0 = \bar{y} - b_1 \bar{x}$
Ex3.13 - Find the least squares (regression) line for the data in Ex3.12 using the shortcut formulas for the coefficients.
$b_1 = \frac{s_{xy}}{s_x^2} = \frac{23.93}{2.53^2} = 3.74$
$b_0 = \bar{y} - b_1 \bar{x} = 30.2 - 3.74 \times 11.8 = -13.93$
$\hat{y} = -13.93 + 3.74x$
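The shortcut formulas of Ex3.13 translate directly into code. The sketch below computes the slope and intercept from the Ex3.12 data; `predict` is a hypothetical helper added only to show how the fitted line would be used.

```python
# Ex3.12 sample: years of education (x) and income (y)
x = [11, 12, 11, 15, 8, 10, 11, 12, 17, 11]
y = [25, 33, 22, 41, 18, 28, 32, 24, 53, 26]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample variance of x
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = s_xy / s_x2           # slope
b0 = y_bar - b1 * x_bar    # intercept

def predict(years_of_education):
    """Income predicted by the fitted least squares line."""
    return b0 + b1 * years_of_education

print(f"Least squares line: y_hat = {b0:.2f} + {b1:.2f} x")
print(f"Predicted income for 14 years of education: {predict(14):.2f}")
```

Predictions are only meaningful for values of x within the range of the sample (8 to 17 years here).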
Exercises
Q3.1 - Consider the following sample of measurements:
37 32 30 28 30 32 35 28 32 29
Compute each of the following:
a. the mean
b. the median
c. the mode
Q3.2 - Consider the following sample of data:
17 25 18 14 28 21
Compute each of the following for this sample:
a. the mean
b. the range
c. the variance
d. the standard deviation
Q3.3 - Consider the following sample of data:
208 160 175 334 228 211 179 354
265 215 191 239 298 226 220 260
173 163 226 165 252 422 284 232
225 348 290 180 300 200 245 204
256 281 230 275 158 224 315 217
a. Construct a box plot for these data.
b. Compute the interquartile range and identify any outliers.
Q3.4 - Based on the sample of data shown below:
Number of Ads (x)   Number of Customers (y)
5                   528
12                  876
8                   653
6                   571
4                   556
15                  1058
10                  963
7                   719
a. Calculate the covariance.
b. Calculate the coefficient of correlation.
c. Find the equation of the least squares line.