Lectures 1-13 +Outlines-6381-2016-Presentation
8/15/2019 Lectures 1-13 +Outlines-6381-2016-Presentation
1/160
Management 6381: Managerial Statistics
Lectures and Outlines
© 2016 Bruce Cooil
See the Bottom Right Corner of Each Page for the Document Page Numbers Listed Here.
TABLE OF CONTENTS
Lecture 1: Descriptive Statistics (Including Stem & Leaf Plots, Box Plots, Regression Example) 1
    Stem & Leaf Display 1
    Descriptive Statistics: Means, Median, Std.Dev., IQR 2
    Box Plot 3
    Regression 10
Lecture 2: Central Limit Theorem & CIs 12
    Statement of Theorem 12
    Simulations 13
    Practical Issues & Examples 15
    Tail Probabilities & Z-values 16
    Z-Value Notation 17
    Picture of CLT 19
    Everything There Is to Know 20
    Summary & 3 Types of CIs 21
    Glossary 22
Lecture 3: CIs & Introduction to Hypothesis Testing 23
    Examples of Two Main Types of CIs 23
    Hypothesis Testing 25
    Type I & Type II Error 27
    Pictures of the Right and Left Tail P-Values 29
    Big Picture Recap 30
    Glossary 31
Lecture 4: One- & Two-Tailed Tests, Tests on Proportion, & Two Sample Test 32
    When to Use t-Values (Case 2) 34
    Test on Sample Proportion (Case 3) 34
    Means from Two Samples (Case 4) 35
    About t-Distribution 38
Lecture 5: More Tests on Means from Two Samples 39
    Tests on Two Proportions 39
    Odds, Odds Ratio, & Relative Risk 44
    Tests on Paired Samples 45
    Finding the Right Case 47
Lecture 6: Simple Linear Regression 48
    Purposes 48
    The Three Assumptions, Terminology, & Notation 49
    Modeling Cost in Terms of Units 50
    Estimation & Interpretation of Coefficients 51
    Decomposition of SS(Total) 52
    Main Regression Output 53
    Measures of Fit 54
    Correlation 56
    Discussion Questions 57
    Interpretation of Plots 59
    How to Do Regression in Minitab 61
    How to Do Regression in Excel 62
Lecture 6 Addendum: Terminology, Examples & Notation 63
    Synonym Groups 63
    Main Ideas 63
    Examples of Correlation 64
    Notation for Types of Variation and R² 66
Lecture 7: Inferences About Regression Coefficients & Confidence/Prediction Intervals for μY/Y 67
    Modeling Home Prices Using … 68
    Regression Output 72
    Two Basic Tests 73
    Test for Lack-of-Fit 74
    Test on Coefficients 75
    Prediction Intervals & Confidence Intervals 76
    How to Generate These Intervals in Minitab 17 77
Lecture 8: Introduction to Multiple Regression 80
    Application to Predicting Product Share (Super Bowl Broadcast) 81
    3-D Scatterplot 82
    Regression Output 84
    Sequential Sums of Squares 85
    Squared Coefficient t-Ratio Measures Marginal Value 86
    Discussion Questions on Interpreting Output 88
Lecture 9: More Multiple Regression Examples 90
    Modeling Salaries (NFL Coaches, 2015) 90
    Modeling Home Prices 93
    Regression Dialog Boxes 99
Lecture 10: Strategies for Finding the Best Model 102
    Stepwise Approach 102
    Best Subsets Approach 103
    Procedure for Finding Best Model 104
    Studying Successful Products (TV Shows) 105
    Best Subsets Output 108
    Stepwise Options 109
    Stepwise Output 110
    Best Predictive Model 111
    Regression on All Candidate Predictors to Find Redundant Predictors 113
    Other Criteria for Selecting Models 114
    Discoverers 115
Lecture 11: 1-Way Analysis of Variance (ANOVA) as a Multiple Regression 116
    Comparing Different Types of Mutual Funds 116
    Meaning of the Coefficients 118
    Purpose of Overall F-test and Coefficient t-Tests 120
    Comparing Network Share by Location of Super Bowl 122
    Standard Formulation of 1-Way ANOVA 125
    Analysis of Covariance 126
    Looking Up F Critical Values 128
Lecture 12: Chi-Square Tests for Goodness-of-Fit & Independence 129
    Goodness-of-Fit Test 129
    Test for Independence 130
    Using Minitab 132
Lecture 13: Executive Summary & Notes for Final Exam, Outline of the Course & Review Questions 133
    Executive Summary & Notes for Final 133
    Outline of Course 135
    Review Questions with Answers 140
    Appendix for Review Questions 145
The Outlines: Tests Concerning Means and Proportions & Outline of Methods for Regression 149
    Tests Concerning Means and Proportions 151
    Confidence Intervals for the Seven Cases 153
    Outline of Methods for Regression 154
Managerial Statistics
Lecture 1: Descriptive Statistics
Reference: Ch. 2: 2.4, 2.6 (pp. 56-59, 67-68); Ch. 3: 3.1-3.4 (pp. 98-105, 108-116, 118-129)
Outline:
! Stem and Leaf displays
! Descriptive Statistics
    Measures of the Center: mean, quartiles, trimmed mean, median
    Measures of Dispersion: standard deviation, interquartile range
! Box plots & Regression as Descriptive Tools

Stem and Leaf Displays
The rules:
1) List extremes separately;
2) Divide the remaining observations into from 5 to 15 intervals;
3) The "stem" represents the first part of each observation and is used to label the interval, while the leaf represents the next digit of each observation;
4) Don't hesitate to "bend" or "break" these rules.
Famous Ancient Example (modified slightly): Salaries of 10 college graduates in thousands (1950’s):
2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.
Stem and Leaf
(With trimming)
Units: 0.10 Thousand $
2|19
3|235
4|68
5|5
6|
7|9
High: 500
MINITAB's Version: This is an option in the Graph Menu, or you can give the commands shown.
Stem-and-Leaf Displays
Stem and Leaf Display When Numbers Above are
in Units of $100,000 (i.e., same data X 100)
UNITS: 0.1 *100 = 10 Thousand $
Same Display
High: 500
No Trimming! (Here the extreme observations are
included in the main part of the display.)
MTB> Stem c1
Leaf Unit = 1.0
(9) 0 223334457
1 1
1 2
1 3
1 4
1 5 0
With Trimming
MTB > Stem c1;
SUBC> trim.
Leaf Unit = 0.10
2 2 19
5 3 235
5 4 68
3 5 5
2 6
2 7 9
HI 500
Page 1 of 156
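The trimmed display above can be reproduced with a short script. This is a minimal sketch, not a Minitab feature: the function `stem_and_leaf` and its `hi_cutoff` rule for listing extremes separately are my own illustration of rules 1-3.

```python
# Minimal stem-and-leaf sketch for the 10 salaries (leaf unit = 0.10 thousand $).
# The extreme value 50 is listed separately ("HI"), as in Minitab's trimmed display.
def stem_and_leaf(data, leaf_unit=0.1, hi_cutoff=10):
    lo = sorted(x for x in data if x < hi_cutoff)
    hi = [x for x in data if x >= hi_cutoff]
    stems = {}
    for x in lo:
        units = round(x / leaf_unit)            # observation in leaf units
        stems.setdefault(units // 10, []).append(units % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem}|{leaves}")        # empty stems are shown, e.g. "6|"
    if hi:
        lines.append("HI " + ", ".join(str(round(x / leaf_unit)) for x in hi))
    return lines

salaries = [2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50]
for line in stem_and_leaf(salaries):
    print(line)
```

Running this prints the same rows as the trimmed Minitab display, with 50 shown as HI 500 (i.e., 500 leaf units).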
Another Example: Make an S&L display of 11 customer expenditures at an electronics store (dollars): 235, 403, 539, 705, 248, 350, 909, 506, 911, 418, 283.
Units: $10
 3  2|348
 4  3|5
(2) 4|01
 5  5|03
 3  6|
 3  7|0
 2  8|
 2  9|01
Now reconsider the first example with the 10 salaries! I put those 10 observations into the first column of a Minitab spreadsheet (or worksheet) and then asked for descriptive statistics.
MTB > desc c1
Descriptive Statistics
Variable N Mean Median TrMean StDev SE Mean
Salaries 10 8.78 4.05 4.46 14.58 4.61
Variable Minimum Maximum Q1 Q3
Salaries 2.10 50.00 3.12 6.10
What do the "Mean," "TrMean," "Median," "Q1" and "Q3" represent?
Mean: Average of Sample
TrMean (5% Trimmed Mean): Average of middle 90% of sample
Median: Middle Observation (when n is even: average of middle two obs.), OR the 50th percentile.
Q1 & Q3 (1st and 3rd quartiles): the 25th and 75th percentiles.
Page 2 of 156
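These Minitab quantities can be checked with Python's standard library. This is a sketch: quartile conventions differ across packages, and the default 'exclusive' method of `statistics.quantiles` happens to match Minitab's (n+1)/4 rule here.

```python
import statistics

salaries = [2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50]

mean = statistics.mean(salaries)                  # 8.78
median = statistics.median(salaries)              # 4.05 (average of middle two)
s = statistics.stdev(salaries)                    # 14.58 (sample std. dev.)

# 5% trimmed mean: drop 5% of the observations from each end (1 of 10 here)
trmean = statistics.mean(sorted(salaries)[1:-1])  # 4.4625, reported as 4.46

q1, _, q3 = statistics.quantiles(salaries, n=4)   # Q1 = 3.125, Q3 = 6.1
se_mean = s / len(salaries) ** 0.5                # 4.61 (SE Mean)
robust_sd = 0.75 * (q3 - q1)                      # about 2.2: 0.75*IQR, insensitive to the 50
```

Note how the outlier 50 inflates the mean and standard deviation but barely moves the median, trimmed mean, and quartiles.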
2
Note how the median is a much better measure of a typical central value in this case.
Recall how standard deviation is calculated.
First the sample variance is calculated:
s² = estimate of average squared distance from the mean
   = {sum of squared differences (Obs − Mean)²}/(n − 1)
   = {(2.1 − 8.78)² + (2.9 − 8.78)² + ... + (50 − 8.78)²}/9 = 212.6.
Then the sample standard deviation is calculated as the square root of the sample variance:
s = (212.6)^½ = 14.58.
As a descriptive statistic, "s" is usually interpreted as the "typical distance of an observation from the mean." But what does "s" actually measure?
Square root of average squared distance from mean
What’s the disadvantage of S as a measure of dispersion (or spread)?
Sensitive to extreme observations (large and small)
What’s an alternative measure of dispersion that is insensitive to extremes?
0.75 * (Q3 - Q1)
[Q3-Q1] is referred to as the interquartile range (IQR). If the distribution is
approximately normal, then
(0.75)(Q3 - Q1) ≡ (0.75) IQR
provides an estimate of the population standard deviation (σ).
For the sample of 10 salaries: (0.75) IQR = 0.75(6.10 − 3.12) = 2.2. (Compare with s = 14.58.)

The Boxplot
Elements: Q1, median, and Q3 are represented as a box, and 2 sets of fences
(inner and outer) are graphed at intervals of 1.5 IQR below Q1 and above Q3.
The figures on pages 122-125 (in our text by Bowerman et al.) provide good illustrations.
Page 3 of 156
3
MINITAB Boxplot of the 10 Salaries
Result of Menu Commands: Graph Boxplot
[Figure: Boxplot of Salaries (vertical axis 0 to 50); the salary of 50 appears as an extreme outlier above the fences.]
Page 5 of 156
More Examples with Another Data Set Where We Compare Distributions
These data are from http://www.google.com/finance and consist of daily closing prices and
returns (in %) for Google stock and the S&P500 index (see the variables Google_Return and
S&P500_Return below), and a column of standard normal observations.
. . .
Page 6 of 156
Page 7 of 156
(Recall what the Standard Normal distribution looks like, e.g. http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg .)
MTB > describe c3 c5 c6
(Or to do the same analysis from the menus: start from the "Stat" menu, go to "Basic Statistics" and then to "Display Descriptive Statistics," then in the dialog box select c3, c5, and c6 as the "variables.")
Descriptive Statistics: Google_Return, S&P_Return, Standard_Normal
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
Google_Return 29 1 -0.207 0.260 1.401 -5.414 -0.772 -0.086 0.771 1.674
S&P_Return 29 1 0.026 0.116 0.624 -1.198 -0.414 0.017 0.534 1.051
Standard_Normal 30 0 -0.134 0.170 0.931 -1.778 -0.813 -0.184 0.598 1.871
[Figure: Boxplot of Google_Return, S&P_Return, Standard_Normal (22-Apr-16; vertical axis -6 to 2). Google_Return has a low outlier near -5.4, apparently due to announcement of disappointing quarterly results.]
Page 8 of 156
In contrast to the boxplots on the previous page, many business distributions are
positively skewed. For example, here is a comparison of the revenue distribution
for the largest firms in three health care industries.
[Figure: Boxplot of 2014 Revenues ($ Billions) in Three Health Care Industries, for Firms That Are Among the Largest 1000 in U.S.: Insurance & Managed Care (13 firms; high outlier United_Health_Group), Medical Facilities (13 firms; high outlier HCA_Holdings), and Pharmacy & Other Services (12 firms; high outlier Express_Scripts_Holding).]
Page 9 of 156
Page 10 of 156
Here is a picture of the parent distribution.
[Figure: Parent Distribution: Binomial (n=1, p=0.1). Horizontal axis: Value of Observation (1: Complaint; 0: No Complaint); the bar at 0 has height 0.9 and the bar at 1 has height 0.1.]
In a simulation, I repeatedly took a random sample of 100 observations from the parent distribution above, and calculated the mean of each sample of 100 observations. I did this 1000 times. Here is a histogram of those 1000 means.
[Figure: Histogram of 1000 Means (Each is the Average of 100 Observations), with a comparison to the normal distribution. Mean 0.09962, StDev 0.02918, N 1000.]
As predicted by the Central Limit Theorem: this distribution is approximately normal, the sample mean (the mean of the means) is approximately 0.1 (same as the parent), and the sample standard deviation (of the means) is approximately 0.03 (= [parent distribution's std. dev.]/√n = 0.3/√100).
Page 13 of 156
Note: the StDev of the 1000 means (0.02918) is approximately 0.03.
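The simulation itself is easy to re-run. A sketch using only the standard library (the seed is arbitrary, chosen just to make the run reproducible):

```python
import random
import statistics

random.seed(1)  # arbitrary fixed seed, so the run is reproducible

# 1000 replications: each is the mean of 100 draws from Binomial(n=1, p=0.1),
# i.e. 100 Bernoulli(0.1) observations (1: complaint, 0: no complaint)
means = [sum(random.random() < 0.1 for _ in range(100)) / 100
         for _ in range(1000)]

m = statistics.mean(means)    # close to 0.1, the parent mean
sd = statistics.stdev(means)  # close to 0.03 = 0.3/sqrt(100)
```

The mean of the means lands near 0.1 and their standard deviation near 0.03, just as the CLT predicts.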
Another simulation: suppose we toss one fair die. Here is the probability distribution of the outcome.
[Figure: Parent Distribution: Integers 1-6 Are Equally Probable. Horizontal axis: Outcome of Tossed Die; each bar has height 1/6 ≈ 0.167.]
I repeatedly take a random sample of 2 observations from the parent distribution above, and calculate the mean of each sample of 2 observations. I do this 1000 times. Here is a histogram of those 1000 means (each mean is of only 2 observations).
[Figure: Histogram of 1000 Means (Each is the Average of 2 Observations). Mean 3.539, StDev 1.184, N 1000.]
As predicted by the Central Limit Theorem: this distribution is approximately normal, the sample mean (the mean of the means) is approximately 3.5 (same as the parent), and the sample standard deviation (of the means) is approximately 1.2 (1.2 = [parent distribution's std. dev.]/√n = 1.7/√2). (Note: the StDev of the 1000 means, 1.184, is approximately 1.2.)
Page 14 of 156
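The die simulation works the same way (a sketch; even with n = 2, the distribution of the mean is already pulling toward the middle):

```python
import random
import statistics

random.seed(2)  # arbitrary fixed seed

# 1000 replications: each is the mean of 2 tosses of a fair die
means = [(random.randint(1, 6) + random.randint(1, 6)) / 2
         for _ in range(1000)]

m = statistics.mean(means)    # close to 3.5, the parent mean
sd = statistics.stdev(means)  # close to 1.2 = 1.7/sqrt(2)
```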
Practical Issues & Two Examples
How large should n be? Here are two guides:
1) for a typical sample mean: n > 30 (this is a conservative rule);
2) for a sample proportion: n large enough so that np̂ ≥ 5 and n(1 − p̂) ≥ 5.
Example 1
Here are descriptive statistics for 40 annual "returns" on the S&P500 (these "returns" are simple annual percent gain or loss on the index, without compounding or inclusion of dividends), 1975-2014.
MTB > desc 'S&P_Return'
(Note: SE Mean = StDev/√N = 16.57/√40 = 2.62.)
Descriptive Statistics: S&P_Return
Variable N N* Mean SE Mean StDev Minimum
S&P_Return 40 0 13.41 2.62 16.57 -36.55
Variable Q1 Median Q3 Maximum
S&P_Return 4.99 15.75 27.74 37.20
This summary shows that:
n = 40, x̄ = 13.41 (this is an estimate of μ), s = 16.57 (an estimate of σ); and s/√n = 16.57/(40)^½ = 2.62.
Describe the distribution of x̄ (the sample mean), assuming the actual distribution of S&P_Return remains unchanged during 1975-2014:
The distribution is approximately Normal with a mean of approximately 13.41 and std. dev. of approx. 2.62.
Example 2
Suppose I interview 100 people and 20 prefer a new product (to competing brands). I want to estimate: p ≡ proportion of the population that prefers the new brand. (Each customer preference is a Bernoulli observation, with an approx. mean of 0.20 and approx. variance of [0.20 ⋅ 0.8] = 0.16.)
In summary, the sample proportion, p̂, is: 20/100 = 0.2.
p̂ behaves as though it has a normal distribution, with a mean of approximately 0.2 (this is our estimate) and a standard deviation of approximately [0.2 ⋅ 0.8/100]^½ = 0.04.
(Recall that for the Bernoulli distribution μ = p and σ² = p(1 − p). Consequently: σ/√n = [p(1 − p)/n]^½.)
Page 15 of 156
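Example 2's standard error is a one-line check (a sketch):

```python
n, x = 100, 20                         # 20 of 100 interviewees prefer the product
p_hat = x / n                          # sample proportion: 0.2
se = (p_hat * (1 - p_hat) / n) ** 0.5  # [0.2*0.8/100]^(1/2) = 0.04
```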
Tail-Probabilities & the Corresponding Normal Values (Z-values)
[Figure: General Normal Distribution, with right-tail probabilities 0.10, 0.05, and 0.025 marked. Horizontal axis: Value of Normal Random Variable; vertical axis: density, 0.0 to 0.4.]
Page 16 of 156
Z-Value Notation
"zα" is used to represent the standard normal value above which there is a tail probability of α.
[Figure: standard normal curve; the tail probability above zα is α.]
Verify that z0.10 = 1.28, z0.05 = 1.645, and that z0.025 = 1.96. (Use a normal table, e.g., http://www2.owen.vanderbilt.edu/bruce.cooil/cumulative_standard_normal.pdf .)
Page 17 of 156
To verify that z0.10 = 1.28: the tail probability is 0.10, so find the z-value that corresponds to a cumulative probability of 0.90. => It's 1.28.
To verify that z0.05 = 1.645: the tail probability is 0.05, so find the z-value that corresponds to a cumulative probability of 0.95. => It's 1.645.
Verify that z0.025 = 1.96!
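Instead of the printed table, the same z-values can be obtained from the inverse normal CDF in Python's standard library (a sketch):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z_alpha is the value with right-tail probability alpha,
# i.e. cumulative probability 1 - alpha:
z_10 = std_normal.inv_cdf(0.90)    # z_0.10  = 1.28
z_05 = std_normal.inv_cdf(0.95)    # z_0.05  = 1.645
z_025 = std_normal.inv_cdf(0.975)  # z_0.025 = 1.96
```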
Cumulative probabilities for POSITIVE z-values are in the following table:
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
Page 18 of 156
Picture of the Central Limit Theorem
Acknowledgment: This picture of the Central Limit Theorem is based on a much prettier graph made for this course by Tim Keiningham, Global Chief Strategy Officer and Executive Vice President, Ipsos Loyalty (also a student in an earlier version of this course).
Page 19 of 156
Everything There is to Know About the Normal Distribution,
The Central Limit Theorem, and Confidence Intervals
The Central Limit Theorem states that the distribution of x̄ (the distribution of sample means) is approximately normal with mean μ and variance σ²/n, abbreviated:
"x̄ is approximately N(μ, σ²/n)," where:
μ is the mean of the distribution of x̄ (μ is also the mean of the population from which the observations were sampled).
x̄ is the sample mean. (The sample is taken from a population with mean μ and variance σ². Think of x̄ as an "estimate" of μ.)
σ²/n is the variance of the distribution of x̄, also referred to as the variance of x̄.
σ/√n is the standard error of x̄ (the sample mean). It is also sometimes called the "SE mean" or standard deviation of x̄.
The figure on the top of the previous page indicates:
x̄ is within 1.28 standard errors* of μ with probability 80%.
x̄ is within 1.645 standard errors* of μ with probability 90%.
x̄ is within 1.96 standard errors* of μ with probability 95%.
* Remember that the standard error of x̄ is σ/√n.
Another Way of Saying the Same Thing
x̄ ± (1.28)(σ/√n) is an 80% confidence interval for μ.
x̄ ± (1.645)(σ/√n) is a 90% confidence interval for μ.
x̄ ± (1.96)(σ/√n) is a 95% confidence interval for μ.
Page 20 of 156
Glossary
Reference: Chapter 5 (pp. 188, 190) versus Chapter 3 (pp. 100, 110)
The Mean of a Distribution: μ ≡ E(X) ≡ Σ x P(x).   (1)
The mean of a distribution (or a random variable X) is simply the weighted average of its realizable outcomes, where each realizable value is weighted by its probability, P(x).
Contrast this definition with the definition of a sample mean:
x̄ = (1/n) Σ x_i.   (2)
The only difference is that (1/n), the frequency with which each observation occurs in the sample, replaces P(x) in equation (1).
The Variance of a Distribution: σ² ≡ E[(X − μ)²] ≡ Σ (x − μ)² P(x).   (3)
The variance of a distribution (of a random variable X) may also be calculated as σ² = Σ x² P(x) − μ². Note the first term in this last expression is just the expectation or average value of X².
Standard Deviation of a Distribution:
σ = [Σ (x − μ)² P(x)]^½ = (σ²)^½.   (4)
Compare this with the definition of the sample standard deviation:
s = [Σ (x_i − x̄)²/(n − 1)]^½.
(The sample variance is: s² = Σ (x_i − x̄)²/(n − 1).)
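Equations (1), (3), and (4) can be checked on a concrete distribution, here a fair die, the same parent used in the Lecture 2 simulation (a sketch):

```python
# Fair die: outcomes 1..6, each with probability 1/6
outcomes = range(1, 7)
p = 1 / 6

mu = sum(x * p for x in outcomes)                     # E(X) = 3.5, equation (1)
var = sum((x - mu) ** 2 * p for x in outcomes)        # equation (3): 35/12 = 2.9167
var_alt = sum(x ** 2 * p for x in outcomes) - mu**2   # alternate form, same value
sigma = var ** 0.5                                    # equation (4): about 1.7
```

The σ ≈ 1.7 here is exactly the "parent distribution's std. dev." used in the die simulation of Lecture 2.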
_________________________________________________________________
ANSWERS to Examples (on Bottom of Previous Page)
Example 1
1) 90% CI: x̄ ± z0.05 (s/√n) = 13.41 ± 1.645 (2.62) = 13.41 ± 4.31, OR: (9.1, 17.7)
2) 95% CI: x̄ ± z0.025 (s/√n) = 13.41 ± 1.96 (2.62) = 13.41 ± 5.13, OR: (8.3, 18.5)
Example 2
1) 90% CI: p̂ ± z0.05 [p̂(1 − p̂)/n]^½ = 0.2 ± (1.645)[.2(.8)/100]^½ = 0.2 ± 0.066, OR: (13%, 27%)
2) 80% CI: p̂ ± z0.10 [p̂(1 − p̂)/n]^½ = 0.2 ± 1.28(0.04) = 0.2 ± 0.051, OR: (15%, 25%)
Example 3
99% CI: x̄ ± t_{α/2,(n−1)} (s/√n) = 6.93 ± 2.947 (4.69) = 6.93 ± 13.82, OR: (−6.9, 20.8)
Page 22 of 156
Find a 95% confidence interval for the real mean mpg (μ) andinterpret it.
C.I.: x̄ ± z0.025 (s/√n) = 81.37 ± 1.96 (1.24)
= 81.37 ± 2.43 or (78.9, 83.8)
Interpretation: This covers the real mean (μ) with 95% probability.
Would an 80% confidence interval be longer or shorter?
Shorter!
(Use z0.10 = 1.28, and the interval becomes (79.8, 83.0).)
(The convention is: Use t-values when n
Hypothesis Testing
Reconsider the new hybrid car example (example 1). Suppose that I want
to show that my new car has an average mpg (μ) that is better than that of
the best performing competitor, for which the average mpg is 78. Formally, I want
to "disprove" a null hypothesis
H0: μ = 78 (or sometimes written as μ ≤ 78)
in favor of the alternative hypothesis:
H1: μ > 78.
Note that: n = 30, x̄ = 81.37, s = 6.8, s/√n = 1.24. (For n < 30, the procedure is identical except when we find the critical value. That case will also be discussed.)
To build a case for H1, I follow 3 logical steps (typical of all hypothesis testing).
1) Assume H0 is true.
2) Construct a test statistic with a known distribution (using H0).
In this case I use the test statistic, z ≡ [x̄ − 78]/(s/√n),
which should have approximately a standard normal
distribution if H0 is true. (WHY? CLT, since n is large)
3) Reject H0 in favor of H1 if the value of z supports H1.
("Large" values of z support H1 in this case.)
Regarding step 3, if H0 is true, I would see values of z greater than z0.05 = 1.645 only 5% of the time. Such a value would be improbable under H0 and would support H1, so a reasonable decision rule is: reject H0 in favor of H1 if z is greater than 1.645. This assumes that I am willing to make a mistake 5% of the time.
Page 25 of 156
In this sample,
z = [x̄ − 78]/(s/√n) = [81.37 − 78]/1.24 = 2.72 > 1.645.
Therefore, I reject H0 in favor of H1.
SUMMARY: to test H0: μ = 78 versus H1: μ > 78
we use the decision rule: reject H0 if
z = [x̄ − 78]/(s/√n) > zα
or equivalently if: x̄ > 78 + zα(s/√n).
Otherwise we accept H0.
In this case z= 2.72, so I reject the null hypothesis H0 at the 0.05 level, and
conclude in favor of the alternative hypothesis H1 . That is, I conclude that
the average mpg of the new hybrid automobile is significantly greater than
78, but using this decision rule (i.e., rejecting H0 whenever z>z0.05) there is
a 5% chance that I have erroneously rejected H0 and that the real average
mpg (μ) really is only 78 (or less).
Above we chose α = 0.05, so that z0.05 = 1.645. This probability α is referred to as the significance level, and it is the maximum probability of making a type I error: type I error refers to the error we make if we reject H0 when H0 is in fact true. Typically we use α = 0.001, 0.010, 0.025, 0.05, 0.1, or 0.2, so that zα = 3.09, 2.33, 1.96, 1.645, 1.28, or 0.84, respectively.
(The corresponding t-values are very similar for moderate values of n:
for n = 20: t(19) = 3.6, 2.5, 2.1, 1.7, 1.3, or 0.86;
for n = 30: t(29) = 3.4, 2.5, 2.0, 1.7, 1.3, or 0.85.)
Suppose that I had chosen α = 0.001; then since z0.001 = 3.09 and z = 2.72, I would accept H0, because z = 2.72 does not exceed z0.001 = 3.09. In this case, I would be
concerned that I made a type II error. Type II error refers to the case where
the null hypothesis H0 is really false but I fail to reject it! The following
figure summarizes the situation with type I and II errors.
Page 26 of 156
DECISION        WHAT IS REALLY TRUE
                H0 IS TRUE          H1 IS TRUE
REJECT H0       Type I Error        Correct Decision
ACCEPT H0       Correct Decision    Type II Error
Good Lingo: “Cannot Reject H0” can be used for “Accept H0.”
Bad Lingo: “Accept H1” should not be used for “Reject H0.”
How do we protect against:
Type 1 Error? Small α Type II Error? Large n
Note that to make a decision on whether to reject or accept H0: μ = 78, we simply need to compare the test statistic z = [x̄ − 78]/(s/√n) with an appropriate normal value, zα, that corresponds to the significance level α that is chosen beforehand. If z > zα, we reject H0 (otherwise accept H0).
[Figure: distribution of the test statistic (Z) when H0 is true, with z0.05 = 1.645, z = 2.72, and z0.001 = 3.09 marked on the right tail.]
Alternatively, we could simply look up the tail probability that corresponds to
the test statistic z (this is called the p-value) and compare it to the
significance level α. If the p-value is less than α (p-value < α), we reject H0
(otherwise accept H0).
In this case z = 2.72, and the p-value for H0: μ = 78 versus H1: μ > 78 is the right tail-probability (because this is a one-tailed test where the alternative hypothesis goes to the right side). What is the p-value in this case?
P-value (probability to the right of 2.72) = 1 - [Cumulative probability at 2.72]
= 1 - 0.9967 ≈ 0.0033
Can we reject H0 at the 0.05 level? YES. At the 0.001 level? NO!
Page 27 of 156
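The z statistic and its p-value can be computed directly, rather than read from the printed table (a sketch):

```python
from statistics import NormalDist

x_bar, mu0, se = 81.37, 78, 1.24
z = (x_bar - mu0) / se             # 2.72
p_value = 1 - NormalDist().cdf(z)  # right-tail probability, about 0.0033

reject_at_05 = p_value < 0.05      # True: reject H0 at the 0.05 level
reject_at_001 = p_value < 0.001    # False: cannot reject at the 0.001 level
```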
Alternatively, we can find the p-value that corresponds to the test
statistic, z, for this hypothesis test and compare it with α, and (as always)
we only reject H0 if the p-value is less than α. Remember that when
the alternative hypothesis goes to the left side, the p-value refers to the
tail probability to the left of the test statistic z. Given the way the p-value
is calculated, we always reject H0 when p-value < α, and accept H0
otherwise.
Given the test statistic z = 1.10 for H0: μ = 80 versus H1: μ < 80, the p-value is the tail probability to the left of 1.10, which is about 0.86, so we accept H0.
What was the p-value when we tested H1: μ > 78 and the test statistic was z = 2.72?
This was calculated on page 5 as 0.0033.
[Figure: two standard normal density curves. In the first, the shaded p-value is the left-tail probability at z = 1.10, because H1 goes to the left. In the second, the shaded p-value is the right-tail probability at z = 2.72, because H1 goes to the right.]
Page 29 of 156
Big Picture Recap
Let μ0 represent the constant benchmark to which we wish to compare μ, and consider three scenarios.

                        1)                    2)                    3)
H0 (also written as):   μ = μ0 (μ ≤ μ0)       μ = μ0 (μ ≥ μ0)       μ = μ0 (no other way)
Alternative H1:         μ > μ0                μ < μ0                μ ≠ μ0
Critical value:         zα                    -zα                   zα/2
Reject H0 if:           z > zα                z < -zα               |z| > zα/2
p-value:                tail prob. above z    tail prob. below z    2 x tail prob. beyond |z|

(Note that "z" is the test statistic.)

                        Example 1             Example 2             Example 3
                        (see bottom p. 6)     (see bottom p. 6)     (new example)
Null hypothesis:        H0: μ = 78            H0: μ = 80            H0: μ = 80
Alternative:            H1: μ > 78            H1: μ < 80            H1: μ ≠ 80
Significance level:     α = 0.05              α = 0.10              α = 0.10
Test statistic:         z = (x̄-78)/(s/√n)     z = (x̄-80)/(s/√n)     |z| = |x̄-80|/(s/√n)
                          = 2.72                = 1.10                = 1.10
Critical value:         z0.05 = 1.645         -z0.10 = -1.28        z0.10/2 = z0.05 = 1.645
Decision:               Reject H0             Accept H0             Accept H0
                                                                    (because |1.10| < 1.645)
p-value:                0.0033                0.86                  0.27

Page 30 of 156
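The three p-value definitions in the recap can be written as one small function (a sketch; the `tail` argument encodes the direction of H1):

```python
from statistics import NormalDist

norm = NormalDist()

def p_value(z, tail):
    """Tail probability for test statistic z:
    '>' right-tailed H1, '<' left-tailed H1, '!=' two-tailed H1."""
    if tail == ">":
        return 1 - norm.cdf(z)
    if tail == "<":
        return norm.cdf(z)
    return 2 * (1 - norm.cdf(abs(z)))

p1 = p_value(2.72, ">")   # Example 1: about 0.0033 (reject H0 at alpha = 0.05)
p2 = p_value(1.10, "<")   # Example 2: about 0.86   (accept H0 at alpha = 0.10)
p3 = p_value(1.10, "!=")  # Example 3: about 0.27   (accept H0 at alpha = 0.10)
```

Note that the two-tailed p-value (0.27) is exactly twice the one-tailed probability beyond |z| = 1.10.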
Glossary
α = significance level = maximum probability of making a type I error.
p-value = the tail-probability that corresponds to the test statistic, calculated for the specific alternative hypothesis H1.
β = probability of making a type II error (not rejecting H0 when H1 is true).
Power = 1-β = probability of making correct decision when H1 is true.
How does power change with sample size?
Power increases as sample size increases (ceteris paribus).
Because as n increases, the test statistic becomes larger in absolute
value, and is more likely to exceed the critical value in the appropriate
direction. See the 3rd-to-last row of the table on the last page (i.e., the
test statistic formulas). Another way to think about it: as the test
statistic becomes larger in absolute value in the direction supporting H1, the p-value decreases.
How does power change with α?
Power increases as α increases (ceteris paribus).
Because as α increases, the critical value decreases in absolute value, and is more likely to be exceeded by the test statistic; see the penultimate row of the table on the last page (i.e., the critical values and how they change with α).
© Bruce Cooil, 2016
Page 31 of 156
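Both glossary claims can be checked numerically. For the one-sided test H0: μ = μ0 versus H1: μ > μ0, the power at a true mean μ1 > μ0 is 1 − Φ(zα − (μ1 − μ0)/(σ/√n)). This is a sketch; the numbers below (μ1 = 80, σ = 6.8) are hypothetical, chosen only to echo the earlier mpg example:

```python
from statistics import NormalDist

norm = NormalDist()

def power(mu0, mu1, sigma, n, alpha):
    """Power of the one-sided z-test H0: mu = mu0 vs H1: mu > mu0
    when the true mean is mu1 (hypothetical inputs, for illustration)."""
    z_alpha = norm.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    return 1 - norm.cdf(z_alpha - shift)

p_base = power(78, 80, 6.8, 30, 0.05)          # about 0.49
p_bigger_n = power(78, 80, 6.8, 60, 0.05)      # larger n -> more power
p_bigger_alpha = power(78, 80, 6.8, 30, 0.10)  # larger alpha -> more power
```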
Lecture 4
One and Two-Tailed Tests, Tests on a Sample
Proportion, & Introduction to Tests on Two Samples
Main References
(1) Ch. 9: 9.3-9.4, Summary, Glossary, App. 9.3; Ch. 10: 10.1
(2) The Outline "Tests on Means and
Proportions" (referred to as "The Outline")
Topics
I. Tests on Means and Proportions from One Sample (Reference: 9.3-9.4)
● Example of a two-tailed test (Case 1)
● When to use t-values (Case 2)
● Tests on a sample proportion (Case 3)
II. Tests on Means from Two Samples (Ref: 10.1)
● Tests on means from two large samples (Case 4)
● Tests on means when it is appropriate to assume variances are equal (Case 5)
I. Tests on Means & Proportions from One Sample
Summary of Last Time (1-Tailed Versions of Case 1)
Last time we first considered the one-tailed hypothesis test:
H0: μ = 78 versus H1: μ > 78.
(OR H0: μ ≤ 78.) In this case we use the decision rule: reject H0 if:
z = [x̄ − 78]/(s/√n) > zα,
or equivalently if x̄ > 78 + zα(s/√n). Otherwise we
accept H0.
Page 32 of 156
Then we considered the one-tailed test going the other way (μ
still represents the mean mpg of my new hybrid). I make the
claim that the average mpg is 80, and so my competitor wants to
test:
H0: μ = 80 versus H1: μ < 80 .
The decision rule will be to reject H0 in favor of H1 if:
z = (x̄ − 80)/(s/√n)    (here: z = (81.37 − 80)/1.24 = 1.10)
supports H1. If x̄ is calculated using observations from a distribution where μ < 80 (as my competitor believes is the case), then we will tend to get small values of z. So the decision rule would be: reject H0 in favor of H1 if
z = (x̄ − 80)/(s/√n) < −zα
(or equivalently if: x̄ < 80 − zα(s/√n)).
[Note that this is just Case 1 in the outline: μ0 refers to the constant used in the null hypothesis, which is "80" in this last case.]
Example of a 2-Tailed Test
A two-tailed test would be:
H0: μ = 80 versus H1: μ ≠ 80.
So, for example, if α = 0.05, we would reject H0 in favor of H1 if |z| > z0.025 (because α/2 = 0.025).
What do we conclude if we do this 2-tailed test?
(Recall that: n = … , x̄ = 81.37, s/√n = 1.24.)
Test Statistic: z = (81.37 - 80)/1.24 = 1.10 (SAME as above)
Critical Value: z0.025 = 1.96
Conclusion: Accept H0. (μ is not significantly different from 80.)
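The two-tailed decision above can be checked numerically. This is a minimal sketch using only Python's standard library; the values 81.37 and s/√n = 1.24 come from the example.

```python
from statistics import NormalDist

# Two-tailed z-test of H0: mu = 80 vs H1: mu != 80 at alpha = 0.05.
xbar, mu0, se = 81.37, 80, 1.24          # sample mean, null value, s/sqrt(n)
z = (xbar - mu0) / se                    # test statistic, ~1.10

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.025} = 1.96

# Two-sided p-value: twice the upper-tail area beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject H0" if abs(z) > z_crit else "accept H0"
```

Here z ≈ 1.10 < 1.96, so the two-sided p-value is about 0.27 and H0 is accepted, matching the conclusion above.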
II. Means from Two Samples
Case 4: What To Do When Both Samples Are Large
Example:
The owner of two fast-food restaurants wants to compare the average drive-thru "experience times" for lunch customers at each restaurant ("experience time" is the time from when the vehicle entered the line to when the entire order was received). There is reason to believe that Restaurant 1 has lower average experience times than Restaurant 2 because its staff has more training. Suppose n1 experience times during lunch are randomly selected for Restaurant 1, and n2 from Restaurant 2, with the following results (units: minutes):
n1 = 100, x̄1 = … , s1 = 0.7 ; n2 = 50, x̄2 = … , s2 = 0.5 .
Why do we use Case 4 on page 1 of the outline?
Both samples ≥ 30 (& independent).
If we want to show Restaurant 1 has a lower average experience time, what are the appropriate hypotheses and what can we conclude (at the 0.1 level)?
H0: μ1 - μ2 = 0 (OR: ≥ 0) (In Outline: D0 = 0.)
H1: μ1 < μ2, OR μ1 - μ2 < 0
Test Statistic:
z = (x̄1 - x̄2 - 0) / √( s1²/n1 + s2²/n2 ) = (x̄1 - x̄2) / √( (0.7)²/100 + (0.5)²/50 ) = (x̄1 - x̄2) / √0.0099 = -…
Critical Value: -z0.10 = -1.28. Conclusion: Reject H0. (YES!)
What would happen if we test at the 0.01 level?
New Critical Value: -z0.01 = -2.33. (Still Reject H0.)
Is there any reason to pick α in advance?
Yes, it’s more objective!
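The Case 4 calculation can be sketched as follows. The sample sizes and standard deviations are from the example, but the sample means are illegible in the original, so a hypothetical difference x̄1 - x̄2 = -0.3 is used purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Case 4: z = (xbar1 - xbar2 - D0) / sqrt(s1^2/n1 + s2^2/n2)
n1, s1 = 100, 0.7
n2, s2 = 50, 0.5
se = sqrt(s1**2 / n1 + s2**2 / n2)   # sqrt(0.0099) ~ 0.0995

diff = -0.3            # hypothetical xbar1 - xbar2 (NOT from the notes)
z = (diff - 0) / se

# Lower-tail critical values at alpha = 0.10 and alpha = 0.01.
z_10 = NormalDist().inv_cdf(0.10)    # ~ -1.28
z_01 = NormalDist().inv_cdf(0.01)    # ~ -2.33
```

Any difference this far below zero pushes z below both critical values, so H0 is rejected at either level, as in the notes.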
Would Welch's t-test (p. 376) make a difference?
In this case we use the same test statistic, but compare it with a critical value from the t-distribution whose degrees of freedom come from Welch's approximation (the df formula given below). So for α = 0.1 and α = 0.01, the critical values are -t0.10(df) = -1.29 and -t0.01(df) = -2.35, respectively, and the conclusions are the same in each case!
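The Welch (Satterthwaite) degrees-of-freedom approximation can be sketched for the drive-thru example as:

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom
    for a two-sample t-test with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = welch_df(0.7, 100, 0.5, 50)   # ~130 for the drive-thru example
```

With roughly 130 degrees of freedom, the t critical values are only slightly larger in absolute value than the z values (1.29 vs 1.28, and 2.35 vs 2.33), which is why the conclusions do not change.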
Case 5: What If We Are Willing To Assume Equal
Variances?
Example : I'm comparing weekly returns on the same stock
over two different periods. The average sample return is larger
during period 2. Can one show that the return during period 2
is significantly higher than during period 1 at the 0.01 level?
The data are: n1 = 21, x̄1 = … %, s1 = … ;
n2 = 11, x̄2 = … %, s2 = … .
What are the appropriate hypotheses?
H0: μ1 - μ2 = 0 (OR ≥ 0)
H1: μ1 < μ2 (i.e., μ1 - μ2 < 0).
It may be risky to rely only on the CLT. (Why? Because the second sample, n2 = 11, is small.)
Technically I make 3 additional assumptions if I use Case 5:
(1) observations are approximately normal,
(2) the two populations have equal variances, and
(3) samples are independent.
For the Case 4 example, the degrees of freedom for Welch's t-test are:
df = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1) ]
   = ( (0.7)²/100 + (0.5)²/50 )² / [ ((0.7)²/100)²/99 + ((0.5)²/50)²/49 ]
   ≈ 130 .
The test statistic in Case 5 allows us to use a pooled estimate of the variance:
sp² = [ (n1 - 1)s1² + (n2 - 1)s2² ] / (n1 + n2 - 2).
The test statistic is:
t = (x̄1 - x̄2) / [ sp √(1/n1 + 1/n2) ] = -2.6 .
(This is just like Case 4, with "sp" used in place of "s1" and "s2".)
Suppose I do this test at the 0.01 significance level. What would be the critical value for the test statistic "t" and what would be the conclusion?
Critical Value: -t0.01(n1 + n2 - 2) = -t0.01(30) = -2.457
Conclusion: Reject H0. (YES!)
What would be the two-tailed test in this case? (Specify H0 & H1.) Also give the critical value and conclusion if testing at the 0.01 level.
H0: μ1 - μ2 = 0 versus H1: μ1 - μ2 ≠ 0
Test Statistic: t = -2.6 (same as for the one-tailed test)
Critical Value: t0.01/2(30) = t0.005(30) = 2.75
Conclusion: Accept H0. (No! Because |t| = 2.6 < 2.75.)
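The Case 5 pooled calculation can be sketched as below. Only the sample sizes (n1 = 21, n2 = 11) and the critical value t0.01(30) = 2.457 are from the notes; the return summaries are hypothetical, since the original values are illegible.

```python
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Two-sample t statistic with a pooled variance estimate (Case 5)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (xbar1 - xbar2 - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df

# Hypothetical weekly-return summaries (percent); only n1 and n2 are from the notes.
t, df = pooled_t(xbar1=0.2, s1=1.0, n1=21, xbar2=1.1, s2=0.8, n2=11)

t_crit = 2.457            # t_{0.01}(30), from the t-table
reject = t < -t_crit      # one-tailed test: H1 is mu1 < mu2
```

With these hypothetical inputs t ≈ -2.58 < -2.457, so H0 would be rejected, mirroring the conclusion in the notes.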
About the t-Distribution (Reference: Bowerman, et al., pp. 344-346)
According to the Central Limit Theorem, x̄ (the sample mean of n observations) has approximately a normal distribution with mean μ and standard deviation σ/√n. Also, this approximation improves as the sample size, n, increases. Consequently, by the Central Limit Theorem, the standardized mean,
z = (x̄ - μ)/(σ/√n) ,
has approximately a standard normal distribution. We have been using this single result to justify the construction of confidence intervals and hypothesis tests.
When using this result, we have generally been approximating σ by substituting the sample standard deviation, "s," for it. If the sample is large enough, this doesn't impose much additional error. But when samples are smaller (e.g., n < 30), the convention is to accommodate the additional error (caused when using s for σ) by using the fact that if the original distribution was normal, then the t-statistic,
t = (x̄ - μ)/(s/√n) ,
really has what is referred to as a t-distribution with n-1 degrees of freedom. The degrees of freedom number, n-1, refers to the amount of information that the sample standard deviation, s, contains about the true standard deviation σ. If we have only 1 observation, we have no information about σ (n-1 = 1-1 = 0); if we have 2 observations, we have essentially 1 piece of information about σ, and so on. This is the reason we divide by the degrees of freedom, n-1, when calculating s:
s = [ Σ(xi - x̄)² / (n - 1) ]^(1/2) .
The real question becomes: why should we use the t-distribution when it relies on the strong assumption that the original distribution is normal, which is exactly the type of assumption we were trying to avoid by using the Central Limit Theorem?! The answer is essentially this: by using t-values in place of z-values we are doing something that accommodates the additional inaccuracy we generate by using s to estimate σ, and in practice it works quite well even when the parent distribution is not normal! Of course, t-values converge to z-values as the sample size increases: see the t-table.
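The convergence of t-values to z-values can be seen by comparing two-sided 5% critical values. In this sketch, the t-table entries are standard constants copied from a t-table (not computed), while the z-value comes from the standard library.

```python
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.975)   # z_{0.025} ~ 1.96

# t_{0.025}(df) from a standard t-table (constants, not computed here).
t_table = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

for df, t_val in t_table.items():
    # The gap between the t-value and 1.96 shrinks as df grows.
    print(df, t_val, round(t_val - z_crit, 3))
```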
Lecture 5: More Tests on Means from Two Samples
Outline: (Reference: Bowerman et al., 10.2-10.3, Appendix 10.3; the Outline "Tests on Means and Proportions")
● Tests on Two Proportions (Case 6, Ch. 10.3)
● Everything to Know About Odds, Odds Ratios and Relative Risk
● Tests on Paired Samples (Case 7, Ch. 10.2)
Tests on Two Proportions (Case 6: Large Samples)
This example comes from an article, "10 Most Popular Franchises," published in the "Small Business" section of CNN.com (April, 2010):
http://money.cnn.com/galleries/2010/smallbusiness/1004/gallery.Franchise_failure_rates/index.html .
(More recent data through early 2016 consist primarily of a smaller sample of settled loans from the same period:
http://fitsmallbusiness.com/best-franchises-sba-default-rates/# .)
It provides franchise failure rates based on loan data from the Small Business Administration (October, 2000 through September, 2009), and it illustrates all of the issues one will typically face when comparing rates (expressed as proportions).
The 10 most popular franchises are: 1) Subway, 2) Quiznos, 3) The UPS Store, 4) Cold Stone Creamery, 5) Dairy Queen, 6) Dunkin Donuts, 7) Super 8 Motel, 8) Days Inn, 9) Curves for Women, and 10) Matco Tools. Super 8 Motel and Days Inn have the highest start-up costs (average SBA loan sizes are 0.91 and 1.02 million dollars, respectively), and nominally Super 8 Motels seem to have a lower failure rate. Here are the data.

                SBA Loans   Failures*
Super 8 Motel      456         18
Days Inn           390         23
*Failures are loans in liquidation or charged off.
Is there a higher failure rate for SBA loans to Days Inn than for Super 8 Motel at the 0.05 level?
H0: p1 - p2 = 0 (OR ≤ 0) (In the Outline: D0 = 0.)
H1: p1 - p2 > 0
(Where p1 = proportion of Days Inn failures; p2 = proportion of Super 8 Motel failures.)
Are the sample sizes sufficiently large to use the normal approximation in CASE 6?
(In Case 6, the relevant sample sizes are the numbers of successes and failures in each sample; each must be at least 5, i.e., n1p̂1, n1(1 - p̂1), n2p̂2, n2(1 - p̂2).)
YES, all 4 groups ≥ 5.
The sample estimates of p1 and p2 are:
p̂1 = 23/390 = 0.0590 ; p̂2 = 18/456 = 0.0395 .
Consequently, the test statistic is:
z = (p̂1 - p̂2 - 0) / √[ p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2 ]
  = (0.0590 - 0.0395 - 0) / √[ 0.0590(0.9410)/390 + 0.0395(0.9605)/456 ]
  = 0.0195/0.0150 = 1.30 .
OR, following the text's approach (which is appropriate only when the null hypothesis states that the proportions are equal), we could also use the overall rate of failure to calculate the standard error of the test statistic. Since
p̄ = (x1 + x2)/(n1 + n2) = (23 + 18)/(390 + 456) = 41/846 = 0.0485 (see data on p. 1),
the test statistic becomes:
z = (0.0590 - 0.0395 - 0) / √[ 0.0485(0.9515)/390 + 0.0485(0.9515)/456 ]
  = 0.0195/0.0148 = 1.32 .
With either test statistic we get essentially the same result:
Critical Value: z0.05 = 1.645
Conclusion: Accept H0 (No, the failure rate at Days Inn is not significantly higher.)
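Both versions of the Case 6 statistic can be reproduced directly from the franchise counts; a sketch:

```python
from math import sqrt

x1, n1 = 23, 390   # Days Inn: failures / SBA loans
x2, n2 = 18, 456   # Super 8 Motel: failures / SBA loans
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error (the Outline's Case 6 form).
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_unpooled = (p1 - p2) / se_unpooled          # ~1.30

# Pooled standard error (the text's form, valid when H0 says p1 = p2).
pbar = (x1 + x2) / (n1 + n2)                  # 41/846 ~ 0.0485
se_pooled = sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
z_pooled = (p1 - p2) / se_pooled              # ~1.32
```

Neither statistic exceeds z0.05 = 1.645, so H0 is accepted either way.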
Which approach does MINITAB take?
[Minitab screenshot: the Stat > Basic Statistics menu, with the menu entries annotated by outline case: Case 1, Case 2, Cases 4 & 5, Case 7, Case 3, and Case 6.]
If We Do NOT Pool (which is the default unless we click on the "use pooled estimate..." option; note that the default setting is to not pool!):

Test and CI for Two Proportions

Sample   X    N  Sample p
1       23  390  0.058974
2       18  456  0.039474

Difference = p (1) - p (2)
Estimate for difference: 0.0195007
95% CI for difference: (-0.00992794, 0.0489293)
Test for difference = 0 (vs not = 0): Z = 1.30 P-Value = 0.194

Wait!! This p-value is for a two-sided test!
We need the p-value for H1: p1 > p2, which is: 0.194/2 = 0.097 => Accept H0.

[Notes on the Minitab dialog: three options are provided for entering the data: 1) both samples in one column, 2) each sample in its own column, 3) summarized data. I could have selected the appropriate one-sided alternative here, but instead used the default option (the two-sided test).]

[Figure: standard normal curve illustrating the two-sided P-value as the sum of two tail probabilities; each tail probability is 0.194/2 = 0.097.]
If We Pool:
Test and CI for Two Proportions
Sample X N Sample p
1 23 390 0.058974
2 18 456 0.039474
Difference = p (1) - p (2)
Estimate for difference: 0.0195007
95% CI for difference: (-0.00992794, 0.0489293)
Test for difference = 0 (vs not = 0): Z = 1.32 P-Value = 0.188
[1-sided p-value = 0.188/2 = 0.094.]
Other Caveats and Notes
1) p̂1 & p̂2 may seriously underestimate the actual rates of failure, since the study includes recent loans to franchises that probably will fail within 5 years (but had not yet failed during the study period). To get better estimates, each loan should be observed over a period of equal duration. For example, we might observe each over a 5-year period (from the time the loan is granted), and p̂1 & p̂2 would then be legitimate estimates of the failure rate of SBA loans to each franchise.
2) Sometimes data of this type are summarized in terms of
odds and odds ratios, especially in health/medical care
applications.
The Relative Size of These Numbers
Note that odds, odds ratios and relative risk ratios can each be anywhere between 0 and infinity. Also note the difference between the probability and odds scales.
Probability Scale (p):   0___1/4___1/2___3/4___1
Odds Scale (p/[1-p]):    0___1/3____1_____3____∞
Finally, whenever p̂1 > p̂2, the odds ratio will be greater than the relative risk:
[ p̂1/(1 - p̂1) ] / [ p̂2/(1 - p̂2) ] = (p̂1/p̂2) · [ (1 - p̂2)/(1 - p̂1) ] > p̂1/p̂2 .
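The inequality can be checked on the franchise failure rates (p̂1 ≈ 0.0590 for Days Inn, p̂2 ≈ 0.0395 for Super 8 Motel); a sketch:

```python
p1 = 23 / 390   # Days Inn failure rate, ~0.0590
p2 = 18 / 456   # Super 8 Motel failure rate, ~0.0395

relative_risk = p1 / p2                              # ~1.49
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))       # ~1.52

# Since p1 > p2, the odds ratio must exceed the relative risk.
```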
Tests on Paired Samples (Case 7: Large or Small Samples)

In late December of 2009, Forbes did a study of the best and worst mutual funds of the prior decade. I thought it would be interesting to compare the best and worst (among funds that still exist from that period) in terms of annual returns during the six subsequent years (2010-2015).

Fund                                        Annualized Return 1999-2009
Best: CGM Focus Fund (CGMFX)                        18.8%
  (Current Morningstar Rating: )
Worst: Fidelity Growth Strategies (FDEGX)           -9.5%
  (Current Morningstar Rating: )
S&P 500                                             -2.6%
If I expect the CGM Focus Fund to outperform the Fidelity Growth Strategies Fund during 2010-2015, I might ask the following research question: Does CGM Focus have an average return that is significantly more than 0.5% higher than the average annual return of Fidelity Growth Strategies during 2010-2015 (α = 0.1)?
Then: H0: μCGM - μFidelity = 0.5 (OR < 0.5); H1: μCGM - μFidelity > 0.5 .
The actual data are below.

Year   CGM Focus Fund   Fidelity Growth Strategies Fund   Difference: d = CGM - Fidelity
2010        16.94              25.63                            -8.69
2011       -26.29              -8.95                           -17.34
2012        14.23              11.78                             2.45
2013        37.61              37.87                            -0.26
2014         1.39              13.69                           -12.30
2015        -4.11               3.17                            -7.28
Mean         6.63              13.87                            -7.24

We can't apply Cases 4 or 5 to this problem because the annual returns are from the same years and are affected by the same market forces. Consequently, the two samples are not independent! But we can take differences (CGM minus Fidelity; see the last column in the table above) and apply Case 2 to the single sample of differences.
The following hypotheses are equivalent to the ones above but are written in terms of the differences:
H0: μDifferences = 0.5 (OR < 0.5); H1: μDifferences > 0.5 .
The mean and standard deviation of the six differences are:
d̄ = -7.24 ; sd = √[ Σ(di - d̄)²/(n - 1) ] = 7.38 .
Thus, the standard error of the mean is: sd/√n = 7.38/√6 = 3.01. Here are the details of the Case 2 test.
Test Statistic: t = (d̄ - μ0)/(sd/√n) = (-7.24 - 0.5)/3.01 = -2.57
Critical Value: tα(n-1) = t0.1(5) = 1.48
Conclusion: Accept H0 (No! Because t is not greater than 1.48. The average difference already makes it clear we cannot reject H0, since Fidelity outperformed CGM during this period; we formally apply the test anyway, as an illustration.)
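The Case 2 test on the differences can be reproduced from the table; a sketch using only the standard library:

```python
from math import sqrt
from statistics import mean, stdev

# Annual return differences, CGM minus Fidelity, 2010-2015 (from the table).
d = [-8.69, -17.34, 2.45, -0.26, -12.30, -7.28]

n = len(d)
dbar = mean(d)            # ~ -7.24
sd = stdev(d)             # ~ 7.38 (stdev divides by n-1)
se = sd / sqrt(n)         # ~ 3.01

mu0 = 0.5                 # hypothesized mean difference under H0
t = (dbar - mu0) / se     # ~ -2.57

t_crit = 1.476            # t_{0.10}(5), from the t-table
reject = t > t_crit       # False: accept H0
```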
Lecture 6: Simple Linear Regression
Outline: Main reference is Ch. 13, especially 13.1-13.2, 13.5, 13.8
The Why, When, What, and How of Regression
● Purposes of Regression
● Three Basic Assumptions: Linearity, Homoscedasticity, Random Error
● Estimation and Interpretation of the Coefficients
● Decomposition of SS(Total) = Σ(yi - ȳ)²
  (See the third equation on page 492: "SS(Total)" is referred to there as "Total Variation.")
● Measures of fit: MS(Error) (the variance of error), R²(Adjusted)
Purposes
1. To predict values of one variable (Y) given the values of
another (X). This is important because the value of X may
be easier to obtain, or may be known earlier.
2. To study the strength or nature of the relationship between
two variables.
3. To study the variable Y by controlling for the effects (or removing the effects) of another variable X.
4. To provide a descriptive summary of the relationship
between X and Y.
Assumptions
The basic model is of the form:
(1) y = β0 + β1x + ε ,
where β0 and β1 are called coefficients, and represent unknown constants (that will be estimated in the regression analysis), and "ε" is used to represent random error. The error, ε, is assumed
to come from a distribution with mean 0 and constant variance
σε². The main result of the regression analysis is to provide estimates of the coefficients so that we can use the estimated regression equation,
(2) ŷ = b0 + b1x ,
to predict Y.
Notes on Terminology and Notation
ŷ is the predicted value and is referred to as the "fit" or the
"fitted value."
The residuals, ei (the observed errors), are defined as the difference between the actual and the predicted value of Y, i.e.,
ei = [residual for observation i] = yi - ŷi .
Note that the theoretical error term, εi, from equation (1), is slightly different from the residuals:
εi ≡ yi - (β0 + β1xi) versus ei ≡ yi - (b0 + b1xi).
Formally the model makes the assumption that the errors (the εi) are a random sample from a distribution with mean 0 and variance σε². This one assumption is sometimes referred to in 3 parts.
1. Linearity: there is a basic linear relationship between y and x as shown in (1), which is equivalent to saying that the real mean of the errors (the εi) is 0.
2. Homoscedasticity: the variance of the errors εi is constant for all yi.
3. Random Error: the errors εi are independent from one another.
Two plots provide a way of checking these assumptions:
● To check linearity: the plot of y versus x;
● To check linearity, homoscedasticity and randomness: the plot of the residuals, (y - ŷ), versus the fitted values, ŷ. Plots of standardized residuals versus fits are especially useful.

Imagine I have developed a special new product and that I develop a model to estimate the cost of producing it using data from the first 5 orders.

Order  Units(x)  Cost(y)($1000)  Predicted Cost (or fit)  Residual (y - ŷ)
  1       1            6                   5                     1
  2       3           14                  11                     3
  3       4           10                  14                    -4
  4       5           14                  17                    -3
  5       7           26                  23                     3
Decomposition of SS(Total)

Without this regression model, we might be forced to use the average, ȳ, to predict future values of y. To get an indication of how well ȳ would do as a prediction, we can find the sum of squared differences between each yi & ȳ:
Σ(yi - ȳ)² = (6-14)² + (14-14)² + (10-14)² + (14-14)² + (26-14)² = 224
(see the 3rd column of the table on the next page). This sum of squares is referred to as SS(Total), i.e.,
SS(Total) = Σ(yi - ȳ)² = 224 .
The regression model succeeds in reducing the uncertainty about y if SS(Error) is significantly less than SS(Total). Also, regression models actually allow us to decompose SS(Total) into two parts, SS(Regression) and SS(Error):
SS(Total) = SS(Regression) + SS(Error) ;
where: SS(Regression) = Σ(ŷi - ȳ)²
= the sum of squares of the fitted values around their mean (the mean of the fitted values is also ȳ)
= (5-14)² + (11-14)² + (14-14)² + (17-14)² + (23-14)² = 180
(see the 4th column of the table on the next page).
So in this case, the decomposition of SS(Total) works out as follows:
SS(Total) = SS(Regression) + SS(Error)
    224   =      180       +    44 .

Summary of the Decomposition of SS(Total)

Units(x)  Cost(y)  (y - ȳ)²   (ŷ - ȳ)²   (y - ŷ)²
   1         6     (6-14)²    (5-14)²       1²
   3        14     (14-14)²   (11-14)²      3²
   4        10     (10-14)²   (14-14)²    (-4)²
   5        14     (14-14)²   (17-14)²    (-3)²
   7        26     (26-14)²   (23-14)²      3²
TOTALS:              224    =   180     +   44
Name of SS:       SS(Total) = SS(Regress.) + SS(Error)
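The whole example (the least-squares coefficients and the SS decomposition) can be reproduced from the five orders; a sketch:

```python
x = [1, 3, 4, 5, 7]        # Units
y = [6, 14, 10, 14, 26]    # Cost ($1000s)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares estimates: b1 = Sxy/Sxx, b0 = ybar - b1*xbar.
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx                         # 3.0
b0 = ybar - b1 * xbar                  # 2.0

fits = [b0 + b1 * xi for xi in x]      # 5, 11, 14, 17, 23
ss_total = sum((yi - ybar) ** 2 for yi in y)               # 224
ss_reg = sum((fi - ybar) ** 2 for fi in fits)              # 180
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))  #  44
```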
Minitab Summary: Main Regression Output of Version 17
(See Page 11 for a Comparison with Excel)

Regression Analysis: Cost(y) versus Units(x)

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1  180.00  180.00    12.27    0.039
 Units(x)    1  180.00  180.00    12.27    0.039
Error        3   44.00   14.67    [Adj MS for Error = Variance of Error]
Total        4  224.00            [SS(Total)/DF = 224/4 = Variance of Y]

Model Summary (Measures of Fit)
      S    R-sq  R-sq(adj)  R-sq(pred)
3.82971  80.36%     73.81%      38.11%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   2.00     3.83     0.52    0.638
Units(x)  3.000    0.856     3.50    0.039  1.00
[In the text's notation, "SE Coef" is s_b1 and "T-Value" is b1/s_b1.]

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)

["MS" refers to "Mean Square," which is always the corresponding SS (Sum of Squares) divided by its DF (degrees of freedom): MS = SS/DF.]
In this example:
R² = SS(Regression)/SS(Total) = 180/224 = 0.804 = 80.4%
(OR: R² = 1 - SS(Error)/SS(Total) = 1 - 44/224),
where "180," "44," and "224" are all shown in the "Analysis of Variance" table.
A better measure of fit is found by adjusting R² so that it estimates the proportion of the variance of y that is "explained" by the fitted values from the model. This proportion is referred to as "R²(Adjusted)":
R²(Adjusted) = 1 - MS(Error) / [ SS(Total)/(n - 1) ] .
In this case:
R²(Adjusted) = 1 - 14.67/(224/4) = 1 - 14.67/56 = 0.738 = 73.8% .
Note that "14.67" is shown in the "Analysis of Variance" table.
R²(Adj) represents the proportion of the variance of Y that is "explained" (or generated) by the regression equation, while sε represents the estimated standard deviation of the residuals. In this example, 73.8% of the variance in cost (Y) is "explained" by the model that uses units (X) as a predictor, and the standard deviation of the errors made by this model is 3.8 thousand dollars.
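The fit measures quoted above can be computed directly from the ANOVA numbers; a sketch:

```python
from math import sqrt

ss_total, ss_reg, ss_error = 224.0, 180.0, 44.0
n, n_params = 5, 2                    # 5 orders; constant + slope

r2 = ss_reg / ss_total                        # ~0.804 (80.4%)
ms_error = ss_error / (n - n_params)          # ~14.67 = variance of error
r2_adj = 1 - ms_error / (ss_total / (n - 1))  # ~0.738 (73.8%)
s_e = sqrt(ms_error)                          # ~3.83 thousand dollars
```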
In simple linear regression, R² (unadjusted) is generally written as "r²" and it represents the squared correlation coefficient (also see page 494 of the text). The estimated correlation between cost (Y) and units (X) is 0.896. See the correlation matrix below.

MTB > corr c1 c2 c3
Correlations (Pearson)
          Units(x)  Cost(y)
Cost(y)      0.896
             0.039
FITS1        1.000    0.896
                *     0.039

Cell Contents: Correlation
               P-value
(In the spreadsheet: c1: Units(x); c2: Cost(y); c3: FITS1.)

Formula for Correlation (Pearson Correlation):
r = [ Σ(xi - x̄)(yi - ȳ)/(n - 1) ] / { [ Σ(xi - x̄)²/(n - 1) ]^(1/2) · [ Σ(yi - ȳ)²/(n - 1) ]^(1/2) } .
(See pp. 125-127, 492-495 of the text for more examples and discussion.)
Alternatively:
r = (sign of b1) · [square root of R² (from simple regression)] .
In the example: r = +√0.804 = 0.896 .
Note that, in general, r is between -1 and +1.
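The correlation r = 0.896, and its link to R², can be checked directly on the cost data; a sketch:

```python
from math import sqrt

x = [1, 3, 4, 5, 7]          # Units(x)
y = [6, 14, 10, 14, 26]      # Cost(y), $1000s
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Pearson correlation: the (n-1) factors cancel, so the sums suffice.
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
r = sxy / sqrt(sxx * syy)    # ~0.896

r_squared = r ** 2           # ~0.804 = R2 (unadjusted)
```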
Discussion Questions
(The Regression Output Is Redisplayed on the Next Page.)
1. Use the regression equation to predict the cost(Y) when number of units(X) is 4.
ŷ = b0 + b1 X = 2 + 3(4) = 14 thousand dollars
2. What was the actual cost for an order when units = 4? (What is the residual or error at that point?)
From page 3: When Units (X) is 4, Y is 10,
thus: residual = y - ŷ = 10 - 14 = -4 .
3. What is the sample variance of cost (Y)? (See next page.)
S_Y² = SS(Total)/4 = 224/4 = 56. (S_Y = √56 = 7.5 thousand.)
4. What is the estimated variance of the residuals (or errors) of the regression?
Sε² = MS(ERROR) = 14.67 (Find this in the Analysis of Variance table!)
5. How good is the fit?
There are two ways of measuring fit (see the "Model Summary"):
Sε = 3.83 thousand dollars
R²(Adjusted) = 73.8%
(74% of the variance in cost(Y) is "explained" by the model.)
6. Show how R²(Adjusted) is related to the variance of cost and the variance of the residuals.
R²(Adjusted) = 1 - (Sε²/S_Y²) = 1 - 14.67/56 .
7. Show how R²(unadjusted) is related to the correlation between cost (Y) and units (X).
R² = r² ("r" represents the sample correlation; this only works in simple linear regression!!)
(On the next page: R² = 0.804; on page 9: r = 0.896.)
Interpreting the Plot of Residuals versus Fit
One way to check on the three assumptions (linearity,
homoscedasticity and random error) is to plot the residuals
(errors) against the predicted (or fitted) values, ŷ.
There are hardly enough observations here to be very confident
in the assumptions. But in general we look for symmetry around
the horizontal line through zero as an indication that the
assumptions of linearity and randomness are met. To confirm homoscedasticity, we look for roughly constant vertical
dispersion around the horizontal line through zero.
The ideal situation generally looks something like the following plot.
[Figure: "Residuals Versus the Fitted Values" (response is C3): residuals scattered symmetrically, with roughly constant spread, around the horizontal line through zero.]
Here is a situation where the linearity assumption is violated.

[Figure: "Residuals Versus the Fitted Values" (response is C3): the residuals follow a systematic curved pattern rather than scattering randomly around the horizontal line through zero.]

Here is a common situation (below) where homoscedasticity is violated: notice how the residuals show increasing vertical dispersion around the horizontal line through zero as the fitted values increase.

[Figure: "Residuals Versus the Fitted Values" (response is C3): the vertical spread of the residuals increases as the fitted values increase.]
How to Do This Regression Analysis in MINITAB (Minitab 17)
[Screenshots of the Minitab 17 regression menus and dialogs appeared here.]
How to Do a Regression Analysis in Excel

Click into the Data menu and check for the "Data Analysis" option (far right).
If the "Data Analysis" option is not there:
  Start from the File menu.
  Click on "Options."
  Click on "Add-Ins."
  Select "Analysis ToolPak" & hit "Go" near the bottom of the dialog box.
Otherwise start from the Data menu:
  Click on "Data Analysis" (far right), select "Regression," and then specify the Y- and X-range in the dialog box.
(You can simply click into each range box and then move the mouse directly into the spreadsheet to select the numerical data cells from the appropriate column(s) of the spreadsheet. The appropriate range of cells should then appear in the range box. The "Input X Range" may consist of several columns, each column for a different predictor.)

Other Good References
See page 519 of the text for an example with great screen pictures. Another good reference is:
www.wikihow.com/Run-Regression-Analysis-in-Microsoft-Excel
Lecture 6 Addendum: Terminology, Examples, and Notation

Regression Terminology
Synonym Groups
1) Y, Dependent Variable, Response Variable
2) X, Predictor Variable, "Independent" Variable
3) ŷ, Prediction, Predicted Value, Fit, Fitted Value
4) Variance of Y, MS(Total), Adj MS(Total)
5) Variance of Error, MS(Error), Adj MS(Error)
6) y - ŷ, Error, Residual
7) "Coefficients" are sometimes referred to using the more general term "parameters." Coefficients are the parameters that are used in linear models.
Main Ideas
Simple linear regression refers to a regression model with only one predictor. The underlying theoretical model is:
y = β0 + β1x + ε ,
where y represents a value of the dependent variable, x is a value of the predictor, ε represents random error, and β0 and β1 represent unknown constants. The corresponding estimated regression equation is:
ŷ = b0 + b1x .
The regression coefficients b0 and b1 refer to sample estimates of the true coefficients β0 and β1, respectively.
The sample correlation coefficient, r, estimates the true (or population) value of the correlation, ρ, which is a measure of the degree to which two variables (Y and X) are linearly related.
Of course, the sample correlation (r) and the slope coefficient (b1) are closely related:
b1 = r · (sY/sX) = Cov(X,Y)/Var(X) ,   (*)
where sY and sX are the sample standard deviations of Y and X, respectively.
The corresponding relationship between the "true" values, β1 and ρ, is:
β1 = ρ · (σY/σX) = Cov(X,Y)/σX² .
Examples of Correlation

Correlation: r = 0.725 (R²(unadjusted) = r² x 100% = 52.6%); y = -756.6 + 12.25x.
Change in GDP (Y): change in annual U.S. GDP in billions of dollars.
Consumer Sentiment (X): index of financial well-being and the degree to which consumer expectations are positive (based on five questions on a survey conducted by the University of Michigan: https://data.sca.isr.umich.edu/fetchdoc.php?docid=24770).
[Scatterplot: "Change in GDP vs Consumer Sentiment (1995-2015)," with the years 1999, 2009, and 2015 labeled.]

Correlation: r = 0.716 (R²(unadjusted) = r² x 100% = 51.3%); y = 44.45 + 0.3045x.
Data are from the World Health Organization (Life Expectancy (Y) as of 2015, Literacy Rate (X) for 2007-2012).
[Scatterplot: "Life Expectancy (Both Sexes in years) vs Literacy Rate (for People >= 15 years old) for 112 Nations."]
Correlation: r = -0.891 (R²(unadjusted) = r² x 100% = 79.4%); MPG = 41.71 - 0.006263 Weight.
Data are for 14 automobiles (2005) from www.chegg.com.
[Scatterplot: "MPG (City) vs Weight (Lbs)."]

Correlation: r = 0.018 (R²(unadjusted) = r² x 100% = 0.1%); Random Y = 0.06153 + 0.01688 Random X.
Y and X are two sets of 1000 standard normal random numbers.
[Scatterplot: "Random Y vs Random X."]
Notation for Types of Variation and R²

For linear regression models (with one or more predictor variables), the basic types of sums of squares represent three types of variation.
1. Total Variation = Σ(yi - ȳ)² ≡ SS(Total)
   The sum of squares of the observations of Y around their mean.
2. Explained Variation = Σ(ŷi - ȳ)² ≡ SS(Regression)
   The sum of squares of the predicted values of Y (ŷi) around their mean (which is also ȳ).
3. Unexplained Variation = Σ(yi - ŷi)² ≡ SS(Error)
   The sum of squares of the differences between each observation and the corresponding predicted value.
Note:
Total Variation = Explained Variation + Unexplained Variation
Or: SS(Total) = SS(Regression) + SS(Error)

The R²(Unadjusted) is sometimes called the simple coefficient of determination, and it is the square of the correlation:
R²(Unadjusted) = r² = SS(Regression)/SS(Total) = 1 - SS(Error)/SS(Total) .
R²(Adjusted) is a more accurate assessment of the strength of the relationship between Y and X. In general:
R²(Adjusted) = 1 - MS(Error)/MS(Total) = 1 - [ SS(Error)/(n - [# of parameters in model]) ] / [ SS(Total)/(n - 1) ] .
For simple linear regression, which includes a constant and a slope coefficient: [# of parameters in model] = 2.
[Reference: pp. 493-495, Essentials of Business Statistics (2015), 5th Edition, Bowerman et al.]
Lecture 7: Inferences About Regression Coefficients & Confidence/Prediction Intervals for μY /Y
Outline: (Ref: Ch. 13: 13.3-13.4, 13.6-13.7)
● Recap of Main Ideas from Lecture 6
● Testing Lack-of-Fit
● Inferences Based on Regression Coefficients (Ch. 13.3)
● Prediction Intervals versus Confidence Intervals for Y (Ch. 13.4)
  (Please read pp. 486-489, not for details on how PIs and CIs are calculated, but for the main idea of what they tell us about Y!)
Summary of Ideas from Lecture 6
3 Assumptions:
The basic relationship between Y and X is linear up to a
random error term that has mean 0 (linearity) and constant
variance (homoscedasticity). Errors are random in the sense
that they are independent of each other and do not depend on
the value of Y.
One way to check these assumptions is to plot residuals versus
fitted values.
The coefficient estimates, b0 and b1, are chosen to minimize the sum of squared errors (or residuals).
b1 represents the average change in Y that is associated with a one-unit change in X.
Regression is useful because it allows us to reduce the
uncertainty regarding Y. We can think about this is in terms of
the decomposition of SS(Total) (NOTE: “SS” is used in
regression to refer to “Sums of Squares”):
SS(Total) = SS(Regression) + SS(Error)
∑ ( )2= = ∑ (̂ ̅) 2= + ∑ ( ̂) 2=
= +
This decomposition of total uncertainty (SS(Total)) suggests two
useful summaries of how well the model fits:
1) 2( ) 1 ()/(−[# ])()/(−) 1 ()() . (Recall that:
2() 1 ())() . )2) √ ()
≡ √ {()/( # )} .
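A minimal check of the decomposition itself, using a small made-up data set (none of these numbers are from the lecture) and the usual least-squares formulas:

```python
# Small made-up data set (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of slope and intercept
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_err = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# The decomposition holds (up to rounding) whenever the model has an intercept
assert abs(ss_total - (ss_reg + ss_err)) < 1e-9
```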
Application
Data are available from Blackboard (with this lecture note). These data are from 43 urban communities (2012).
Citation: “Cost of Living Index,” Council for Community and
Economic Research, January 2013.
MTB > info c1-c3
Information on the Worksheet
Column  Count  Name
T C1    43     URBAN AREA AND STATE
C2      43     HOME PRICE (avg for 2,400 sq. ft. new home, 4 bed, 2 bath, on 8,000 sq. ft. lot)
C3      43     Apt Rent (avg for 950 sq. ft. unfurnished apt., 2 bed, 1.5-2 bath, excluding all utilities except water)
Other interesting data sets on home prices and rental rates by city:
https://smartasset.com/mortgage/price-to-rent-ratio-in-us-cities
https://smartasset.com/mortgage/rent-vs-buy#map
Page 68 of 156
Regression for All 43 Cities

[Fitted line plot: HOME PRICE = -61366 + 339.3 Apt Rent; S = 70561.6, R-Sq = 88.2%, R-Sq(adj) = 87.9%. Labeled high-rent points: New York (Manhattan) NY, New York (Brooklyn) NY, New York (Queens) NY, San Francisco CA, Honolulu HI.]

Regression Using 38 Cities Where Rent < $1500

[Fitted line plot: HOME PRICE = 1894 + 286.5 Apt Rent; S = 59728.1, R-Sq = 35.9%, R-Sq(adj) = 34.2%.]
Page 69 of 156
Residuals Versus Fits (response is HOME PRICE)

[Residual plot: residuals plotted against fitted values for the 38-city regression.]
Page 70 of 156
Do the assumptions of regression hold (approximately)?
Linearity (and randomness):
Residuals fall symmetrically around the horizontal line
where Residual = 0. This indicates that the linearity
and randomness assumptions hold (approximately).
Homoscedasticity:
There is approximately constant vertical dispersion
of residuals, which supports homoscedasticity.
How would one identify possible investment opportunities?
Negative residuals (Ŷ > Y), i.e., cities where the actual home
price is below the price the model predicts from rent.
Interpret the estimated slope coefficient, b1.
On average, home price (Y) increases by $286.50
per $1 increase in Rent (X).
Interpret the constant coefficient, b0.
Hypothetical home price when apartment rent is 0. (It’s an extrapolation, since no rents are close to zero!)
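To see the fitted equation in use, here is a small sketch that predicts home price from rent; the $1,200 rent value is chosen only for illustration, and remember the intercept itself is an extrapolation:

```python
# Fitted equation from the 38-city regression output
def predicted_home_price(apt_rent):
    return 1894 + 286.5 * apt_rent

# A city with $1,200 average apartment rent (value chosen for illustration)
print(predicted_home_price(1200))  # → 345694.0
```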
Page 71 of 156
Regression Analysis: HOME PRICE versus Apt Rent
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 72076453090 72076453090 20.20 0.000
Apt Rent 1 72076453090 72076453090 20.20 0.000
Error 36 1.28428E+11 3567451386
Lack-of-Fit 34 1.24610E+11 3664999695 1.92 0.401
Pure Error 2 3818260289 1909130145
Total 37 2.00505E+11
Model Summary
S R-sq R-sq(adj) R-sq(pred)
59728.1 35.95% 34.17% 28.48%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 1894 77528 0.02 0.981
Apt Rent 286.5 63.7 4.49 0.000 1.00
Regression Equation
HOME PRICE = 1894 + 286.5 Apt Rent
Recall how R², R²(adj), and sε summarize information provided
in the Analysis of Variance table.
R² = SS(Regression)/SS(Total) = (7.21 × 10¹⁰)/(2.01 × 10¹¹), OR
R² = 1 − SS(Error)/SS(Total) = 1 − (1.28 × 10¹¹)/(2.01 × 10¹¹) = 36%

sε = √MS(Error) = √(3.57 × 10⁹) = 59.7 thousand $

R²(adj) = 1 − MS(Error)/[SS(Total)/(n − 1)]
= 1 − (3.57 × 10⁹)/[(2.01 × 10¹¹)/37]
= 34.2%
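These three summaries can be reproduced from the ANOVA table above in a few lines (the numbers are copied from the output; n = 38 cities):

```python
from math import sqrt

# Numbers copied from the Analysis of Variance / Model Summary output
ss_regression = 72076453090.0
ss_total = 2.00505e11
ms_error = 3567451386.0
n = 38  # 38 cities, so SS(Total) has n - 1 = 37 degrees of freedom

r_sq = ss_regression / ss_total                 # unadjusted R^2
s_eps = sqrt(ms_error)                          # standard error of estimate
r_sq_adj = 1 - ms_error / (ss_total / (n - 1))  # adjusted R^2

print(round(100 * r_sq, 1), round(s_eps), round(100 * r_sq_adj, 1))  # → 35.9 59728 34.2
```

These match the Model Summary line (S = 59728.1, R-sq = 35.95%, R-sq(adj) = 34.17%).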
Interpret R²(adjusted):
34% of the variance in home price is “explained” by the
model. “R-sq(pred)” (28.48%) refers to the predicted R² and
represents the estimated proportion of the variance of
home price that the model would explain in a different
sample from the same population.
Interpret sε:
The estimated standard deviation of the theoretical error is about 60
thousand dollars.
Page 72 of 156
The analysis on the last page also includes two types of statistical tests.
1) Test for “Lack-of-Fit”
H0: The Model Is Appropriate versus H1: The Model Is Not Appropriate
Here we hope that we do not reject H0. That is, we hope to see a
large p-value (e.g., p-value > 0.2). If the p-value is small and
forces us to reject H0, then we should try to find another model.
If there is substantial information that the model is inappropriate,
the “Lack-of-Fit” variance will be significantly larger than the
“Pure-Error” variance, and the ratio of these two variances should
be significantly greater than 1. Here the p-value (0.4) indicates this
ratio, called the F-value (1.92), is not significantly greater than 1.
(Please find these numbers in the Analysis of Variance Table.)
F-value = MS(Lack-of-Fit)/MS(Pure Error) = (3.665 × 10⁹)/(1.909 × 10⁹) = 1.92
According to the p-value (0.4), there is a 40% probability that the
F-value would be 1.92 or larger, even when the real population
variances are equal. (The estimate of the “pure-error” variance is
not very accurate —it’s based on 2 degrees of freedom.) Thus, there
is no significant indication that the model is inappropriate.
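As a quick check, the F-value is just the ratio of the two mean squares from the ANOVA table (computing the p-value itself would require the F distribution with 34 and 2 degrees of freedom, which is not done here):

```python
# Mean squares copied from the ANOVA table above
ms_lack_of_fit = 3664999695.0  # = SS(Lack-of-Fit) / 34
ms_pure_error = 1909130145.0   # = SS(Pure Error) / 2
f_value = ms_lack_of_fit / ms_pure_error
print(round(f_value, 2))  # → 1.92
```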
2) Test of Whether Each Coefficient (β0 and β1) Is Significantly
Nonzero: H0: β = 0 versus H1: β ≠ 0.
Here we hope to at least be able to reject H0 when testing β1 (the
coefficient of the predictor), i.e., we hope to reject H0: β1 = 0 in
favor of H1: β1 ≠ 0. Consequently we would prefer to see a small
p-value in this case (e.g., p-value < 0.05).
If we test at the 0.05 level, we accept H0 for β0 (p-value = 0.981 > 0.05)
and reject H0 for β1 (p-value = 0.000 < 0.05).
The Test for Lack-of-Fit
It is only possible to do this test when there are two or more
observations that have the same predictor values (x-values). In these
cases, SS(Error) can be further decomposed into two components:
SS(Error) = ∑ᵢ(yᵢ − ŷᵢ)² = SS(Lack-of-Fit) + SS(Pure Error).
SS(Pure Error) is calculated as follows: for each group of observations
that have the same predictor values (i.e., same apartment rent value)
the sum of squares of the y-values around the mean is calculated, and
these sums are then added up across all groups. In this case there are
two groups of cities at the same rent level.
Urban Area                 Home Price (Y)   Apt Rent (X)
Jacksonville FL                218265           1019
Bryan-College Station TX       246858           1019     (Mean: 232562)
Rochester MN                   250723           1122
Minot ND                       333300           1122     (Mean: 292012)
Consequently:
SS(Pure Error) = (218265 − 232562)² + (246858 − 232562)²
+ (250723 − 292012)² + (333300 − 292012)² = 3.818 billion
SS(Lack-of-Fit) = SS(Error) − SS(Pure Error) = 128.4 billion − 3.8 billion
= 124.6 billion

Analysis of Variance Table (again)
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 72076453090 72076453090 20.20 0.000
Apt Rent 1 72076453090 72076453090 20.20 0.000
Error 36 1.28428E+11 3567451386
Lack-of-Fit 34 1.24610E+11 3664999695 1.92 0.401
Pure Error 2 3818260289 1909130145
Total 37 2.00505E+11
The degrees of freedom of SS(Pure Error) is the number of observations with
common predictor values (x-values) minus the number of group means
(4 − 2 = 2). The degrees of freedom for SS(Lack-of-Fit) is the number of
distinct predictor values minus the number of parameters in the model
(36 − 2 = 34).
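The pure-error arithmetic above can be verified directly from the four tied observations (numbers copied from the table; SS(Error) taken from the ANOVA output):

```python
# Home prices grouped by their common apartment-rent value (from the table)
groups = {
    1019: [218265, 246858],  # Jacksonville FL, Bryan-College Station TX
    1122: [250723, 333300],  # Rochester MN, Minot ND
}

ss_pure_error = 0.0
for prices in groups.values():
    mean = sum(prices) / len(prices)
    ss_pure_error += sum((p - mean) ** 2 for p in prices)

ss_error = 1.28428e11  # from the ANOVA table
ss_lack_of_fit = ss_error - ss_pure_error

# df = (# observations with tied x-values) - (# group means) = 4 - 2 = 2
df_pure_error = sum(len(v) for v in groups.values()) - len(groups)

print(round(ss_pure_error / 1e9, 3), round(ss_lack_of_fit / 1e9, 1), df_pure_error)
# → 3.818 124.6 2
```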
Page 74 of 156
Tests & Other Inferences Based on Regression Coefficients
Note that we can make inferences (i.e., do hypothesis tests and form
confidence intervals) about the coefficients from a regression
analysis by treating the coefficients as though they were CASE 2 sample means and using t-values based on the degrees of
freedom of SS(Error). This is also true in multiple regression.
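A sketch of this idea for the slope in the rent/home-price output: the t-value is the coefficient divided by its standard error, and a 95% CI uses a t critical value with the SS(Error) degrees of freedom (36 here; t* ≈ 2.028 is taken from a t-table). The printed t differs slightly from Minitab's 4.49 only because the coefficient and SE are rounded in the output:

```python
# Slope and its standard error, as reported in the regression output
b1 = 286.5
se_b1 = 63.7

# t statistic for H0: beta1 = 0 (treating b1 like a CASE 2 sample mean)
t_value = b1 / se_b1

# Approximate 95% CI for beta1; t* = 2.028 is the critical value
# for 36 degrees of freedom (looked up from a t-table)
t_star = 2.028
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)

print(round(t_value, 2), [round(v, 1) for v in ci])  # → 4.5 [157.3, 415.7]
```

Since the entire CI lies above 0, the test and the interval agree: the slope is significantly nonzero.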
We have already considered the basic significanc