700579308
8/3/2019 700579308

Lecture 11: Simple Regression
Statistics for Management Decisions

Regression
Regression analysis enables us to estimate the strength and direction of relations between variables
Specifically, between a dependent variable (Y) and independent variables (x1, x2, etc.)
For example:
The effect of years of education on income
The effect of engine size on gas mileage
The effect of house size on price
First: Covariance and Correlation (to see whether a relationship EXISTS)

Example
Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSX 300 Index
The next slide shows 25 monthly returns

Example Data
TSX (x)  CMP (y)    TSX (x)  CMP (y)    TSX (x)  CMP (y)
   3        4         -4       -3          2        4
  -1       -2         -1        0         -1        1
   2       -2          0       -2          4        3
   4        2          1        0         -2       -1
   5        3          0        0          1        2
  -3       -5         -3        1         -3       -4
  -5       -2         -3       -2          2        1
   1        2          1        3         -2       -2
   2       -1

Example
From the data, it appears that a positive relationship may exist
Most of the time when the TSX is up, CMP is up
Likewise, when the TSX is down, CMP is down most of the time
Sometimes, they move in opposite directions
Let's graph this data

Graph Of Data
[Figure: scatter plot of the monthly returns, TSX on the horizontal axis and CMP on the vertical axis, both axes running from -6 to 6.]

Example Summary Statistics
The data do appear to be positively related
Let's derive some summary statistics about these data:

       Mean    s^2     s
TSX    0.00    7.25    2.69
CMP    0.00    6.25    2.50

Observations
Both have means of zero and standard deviations just under 3
However, each data point does not have simply one deviation from the mean; it deviates from both means
Consider Points A, B, C and D on the next graph

Graph of Data
[Figure: scatter plot of CMP against TSX, both axes running from -6 to 6.]

Implications
When points in the upper right and lower left quadrants dominate, then the sums of the products of the deviations will be positive
When points in the lower right and upper left quadrants dominate, then the sums of the products of the deviations will be negative

An Important Observation
The sums of the products of the deviations will give us the appropriate sign of the slope of our relationship

Covariance
Population: $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)}{N}$
Sample: $s_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$
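
As a minimal sketch of the sample covariance, the following Python computes it for the 25 TSX/CMP monthly returns in the example data. It assumes the flattened table reads as consecutive (x, y) pairs; the function and variable names are illustrative and do not come from the lecture.

```python
# Monthly returns from the example data (assumed to be consecutive x, y pairs).
tsx = [3, -4, 2, -1, -1, -1, 2, 0, 4, 4, 1, -2, 5,
       0, 1, -3, -3, -3, -5, -3, 2, 1, 1, -2, 2]
cmp_returns = [4, -3, 4, -2, 0, 1, -2, -2, 3, 2, 0, -1, 3,
               0, 2, -5, 1, -4, -2, -2, 1, 2, 3, -2, -1]


def sample_covariance(x, y):
    """Sum of the products of deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


cov_xy = sample_covariance(tsx, cmp_returns)
var_x = sample_covariance(tsx, tsx)  # covariance of x with itself is its variance
print(cov_xy, var_x)
```

The positive result agrees with the sign suggested by the scatter plot, and the variance of the TSX series reproduces the 7.25 in the summary statistics.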

Covariance
In the same units as Variance (if both variables are in the same unit), i.e. units squared
Very important element of measuring portfolio risk in finance

Using Covariance
Very useful in Finance for measuring portfolio risk
Unfortunately, it is hard to interpret for two reasons:
What does the magnitude/size imply?
The units are confusing

A More Useful Statistic
We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations
This operation:
Removes the impact of size & scale
Eliminates the units

Correlation
Correlation indicates a positive/negative relation between two variables
Both variables move together, either in the same direction or in opposite directions
E.g. when one goes up so does the other

The Correlation Coefficient
The correlation coefficient measures the strength of the linear relationship between two variables.
Coefficient = -1: perfect negative
Coefficient = 0: no relation
Coefficient = 1: perfect positive

Calculating Correlation
Population: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Sample: $r = \frac{s_{xy}}{s_x s_y}$

Example
X     Y
20    16
18    12
24    18
20    17
22    21
14    10
18    10
Create a scatterplot, what type of relationship exists?
Compute the correlation coefficient
Test the significance of the correlation coefficient at the 0.05 level

Scatterplot
[Figure: scatter plot of Y against X for the seven data points.]

Correlation coefficient
         X      Y      X^2     Y^2     XY
         20     16     400     256     320
         18     12     324     144     216
         24     18     576     324     432
         20     17     400     289     340
         22     21     484     441     462
         14     10     196     100     140
         18     10     324     100     180
Total    136    104    2704    1654    2090
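
The column totals are all that is needed to compute r. The sketch below uses the standard computational form of the sample correlation coefficient; the function name is illustrative and not from the lecture.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation coefficient from column totals (computational form)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals from the table: n = 7 observations.
r = correlation_from_sums(7, 136, 104, 2704, 1654, 2090)
print(round(r, 3))
```

This gives r of roughly 0.847, a fairly strong positive linear relationship.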

Correlation coefficient
In Excel: use the CORREL function, =CORREL(A2:A8,B2:B8)

Significance
Hypothesis test on the true population parameter (rho, $\rho$)
$H_0: \rho = 0$
$H_A: \rho \neq 0$
Test statistic ($n-2$ degrees of freedom): $t = r\sqrt{\frac{n-2}{1-r^2}}$, where $r$ is estimated from the sample

Significance test
3.563 > 2.571 (t critical value, 5 df)
Reject the null hypothesis and conclude that the correlation coefficient is significant (significantly different from 0)
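
The test can be sketched in Python as follows, assuming the usual statistic t = r * sqrt((n - 2) / (1 - r^2)) and the seven (X, Y) pairs from the example. The critical value 2.571 is the one quoted on the slide; the function names are illustrative. Because r is not rounded here, t comes out near 3.564 rather than exactly 3.563.

```python
import math

x = [20, 18, 24, 20, 22, 14, 18]
y = [16, 12, 18, 17, 21, 10, 10]


def sample_correlation(x, y):
    """Sample correlation coefficient from raw data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = sample_correlation(x, y)
t_stat = correlation_t_statistic(r, len(x))
T_CRITICAL = 2.571  # two-tailed, alpha = 0.05, 5 df (value quoted on the slide)
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), reject_null)
```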

Correlation vs. Regression
Correlation indicates a relation between two variables
Regression models a directional relationship between an independent and a dependent variable: changes in the independent variable are treated as causing the change in the dependent variable. The direction comes from the theory behind the model; a significant regression does not by itself prove causality.
We'll start with simple regression (one independent variable) and then look at multiple variables.

Simple regression

A scatterplot
[Figure: scatter plot for the income/consumption example, vertical axis 0 to 4,000, horizontal axis (Income) 0 to 12,000.]
The income/consumption example
Dependent variable (y): consumption
Independent variable (x): income

The story
We think that income affects consumption
The more you make the more you buy
We are looking to study this relationship in more depth
Is there indeed a significant effect?
What is the magnitude of this effect?
(We limit our discussion to linear effects)
We'll create and test a regression model of the relationship between consumption and income

Modeling the linear relationships
Premise: there is a true relationship between income and consumption.
This relationship can be described in a linear form:
$\text{consumption} = \beta_0 + \beta_1 \cdot \text{income} + \varepsilon$
Or more generally:
$y = \beta_0 + \beta_1 x + \varepsilon$

Simple Linear Regression Model
Note that both $\beta_0$ and $\beta_1$ are population parameters which are usually unknown and hence estimated from the data.
[Figure: a line in the x-y plane with its rise and run marked.]
$\beta_1$ = slope (= rise/run)
$\beta_0$ = y-intercept

Modeling the linear relationships
With simple linear regression we try to capture the true relationship between the two variables with a single line.
The estimated regression model is: $\hat{y} = b_0 + b_1 x$
[Figure: the income/consumption scatter plot, vertical axis 0 to 4,000, horizontal axis (Income) 0 to 12,000.]

Understanding the error term
No line can hit all the points in the scatter plot, or even most of the points
The amount we miss by is called error or residual.
It is the difference between the predicted value (from the regression line) and the true value

[Figure: scatter plot with a regression line; the gap between a point and the line is labelled "error (residual, deviation)". Horizontal axis: Population (thousands), 0 to 35,000; vertical axis 0 to 4,500.]
A good regression line will be the one that minimizes the total of the squared errors (SSE).
More formally:

Regression Analysis
A statistical technique for determining the best fit line through a series of data
The regression line is the unique line that minimizes the total of the squared deviations (or errors).
The statistical term is Sum of Squared Errors or SSE
This line is called the least squares line

Required Conditions - $\varepsilon$
The probability distribution of $\varepsilon$ is normal
$E(\varepsilon) = 0$
$\sigma_\varepsilon$ is constant and independent of x, the independent variable
The value of $\varepsilon$ associated with any particular value of y is independent of the value of $\varepsilon$ associated with any other value of y

Finding the line equation
The equation (estimated): $\hat{y} = b_0 + b_1 x$
Where:
$b_1 = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$
$b_0 = \bar{Y} - b_1 \bar{X}$

Example
Data on refining capacity of an oil company's sites (the XY and X^2 columns are calculated)

Obs.     # of sites (X)    Capacity (Y)    XY          X^2
1        13                81.82           1063.66     169
2        10                81.82           818.2       100
3        13                58.18           756.34      169
4        8                 43.64           349.12      64
5        5                 40              200         25
6        7                 36.36           254.52      49
7        4                 34.55           138.2       16
8        8                 32.73           261.84      64
9        7                 29.09           203.63      49
10       3                 25.45           76.35       9
Total    ΣX = 78           ΣY = 463.64     ΣXY = 4121.86    ΣX^2 = 714
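
Applying the least squares formulas to the column totals can be sketched as follows. The function name is illustrative; the formulas assumed are b1 = (ΣXY - ΣXΣY/n) / (ΣX^2 - (ΣX)^2/n) and b0 = Y-bar - b1 * X-bar.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b0, b1) of the least squares line from column totals."""
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Totals from the refining capacity table (n = 10 sites).
b0, b1 = least_squares_from_sums(10, 78, 463.64, 4121.86, 714)
print(round(b0, 3), round(b1, 3))
```

The slope of about 4.787 is the figure interpreted on the next slide; the intercept of about 9.03 is computed here from the same totals.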

The least squares line
Meaning: capacity will increase by 4.787 units for every site added

Understanding and assessing the regression model

Example
The Harris Corporation has recently done a study of homes that have sold in the Detroit area within the past 18 months. Data were recorded for the asking price (x) and the number of weeks (y) each home was on the market before it sold.

Weeks on the Market    Asking Price
23                     $76,500
48                     $102,000
9                      $53,000
26                     $84,200
20                     $73,000
40                     $125,000
51                     $109,000
18                     $60,000
25                     $87,000
62                     $94,000
33                     $76,000
11                     $90,000
15                     $61,000
26                     $86,000
27                     $70,000
56                     $133,000
12                     $93,000

The model to be estimated
$y = \beta_0 + \beta_1 x + \varepsilon$, where y is the number of weeks on the market and x is the asking price

Excel: The Regression Tool
Tools > Data Analysis
Choose Regression from the dialogue box menu.

The regression output (Excel)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.705948422
R Square             0.498363174
Adjusted R Square    0.464920719
Standard Error       11.96417889
Observations         17

ANOVA
              df    SS             MS             F              Significance F
Regression    1     2133.111647    2133.111647    14.90211089    0.001541086
Residual      15    2147.123648    143.1415765
Total         16    4280.235294

                Coefficients     Standard Error    t Stat          P-value
Intercept       -16.22506178     12.20252667       -1.329647722    0.203501866
Asking Price    0.000528163      0.000136818       3.860325232     0.001541086
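
The key figures in this output can be reproduced from the 17 observations with a few lines of plain Python. This is only a sketch of the least squares arithmetic, not Excel's implementation; all names are illustrative.

```python
import math

# Harris Corporation data: asking price in dollars (x), weeks on the market (y).
price = [76500, 102000, 53000, 84200, 73000, 125000, 109000, 60000, 87000,
         94000, 76000, 90000, 61000, 86000, 70000, 133000, 93000]
weeks = [23, 48, 9, 26, 20, 40, 51, 18, 25, 62, 33, 11, 15, 26, 27, 56, 12]


def fit_simple_regression(x, y):
    """Least squares fit of y = b0 + b1*x plus the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = b1 * sxy                              # variation explained by x
    sse = sst - ssr                             # unexplained variation
    se = math.sqrt(sse / (n - 2))               # standard error of the estimate
    return {"b0": b0, "b1": b1, "sst": sst, "ssr": ssr, "sse": sse,
            "se": se, "r2": ssr / sst, "s_b1": se / math.sqrt(sxx)}


fit = fit_simple_regression(price, weeks)
for name, value in fit.items():
    print(f"{name}: {value:.9g}")
```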

The estimated model
$\hat{y} = -16.2251 + 0.00053x$
Intercept ($b_0$): -16.2251
Slope ($b_1$): 0.00053
What is the meaning of -16.2251?
Is the effect of asking price on number of weeks significant?

                Coefficients     Standard Error    t Stat          P-value
Intercept       -16.22506178     12.20252667       -1.329647722    0.203501866
Asking Price    0.000528163      0.000136818       3.860325232     0.001541086
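
Test 1: Testing the Slope
The hypotheses:
$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$
We follow a t-test: $t = \frac{b_1 - \beta_1}{s_{b_1}}$, where $b_1$ is the estimated slope and $\beta_1$ is the true (hypothesized) slope
$s_{b_1} = \frac{s_e}{\sqrt{(n-1)s_x^2}}$ OR $s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i-\bar{x})^2}}$, where $s_e$ is the standard error of the estimate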

The standard error of the estimate
The standard error of the estimate ($s_e$ or SEE) measures how the data vary around the regression line
Similar to the concept of standard deviation
$s_e = \sqrt{\frac{SSE}{n-k-1}}$
We would like $s_e$ to be small: the smaller it is, the larger the t-statistic is and the more likely we are to reject the null hypothesis that the slope is zero
k = 1 for simple regression

What do we have in the tables?
In the Excel regression output:
Standard Error (11.96417889) is S = SEE
Residual SS (2147.123648) is SSE
The Asking Price coefficient (0.000528163) is b1
The Asking Price standard error (0.000136818) is Sb1

Test 1: Testing the Slope
$t = \frac{0.000528163 - 0}{0.000136818} = 3.86$
Compare with the critical value $t_{\alpha/2}$ with $(n-2)$ df
=> Can conclude that the slope is different from zero
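
The slope test can be sketched in Python using the coefficient and standard error from the Excel output. The critical value 2.131 (two-tailed, alpha = 0.05, 15 df) is an assumption taken from a standard t table, since the slide does not state it; the names are illustrative.

```python
def slope_t_statistic(b1, s_b1, beta1_null=0.0):
    """t statistic for H0: beta1 = beta1_null."""
    return (b1 - beta1_null) / s_b1


B1 = 0.000528163     # Asking Price coefficient from the Excel output
S_B1 = 0.000136818   # its standard error
T_CRITICAL = 2.131   # assumed: t table value for alpha/2 = 0.025, n - 2 = 15 df

t_stat = slope_t_statistic(B1, S_B1)
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 4), reject_null)
```

The same function covers a test against any other hypothesized slope by changing `beta1_null`.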

Testing the Slope
If we wish to test for positive or negative linear relationships we conduct one-tail tests, i.e. our research hypothesis becomes:
$H_1: \beta_1 > 0$ (testing for a positive slope)
Of course, the null hypothesis remains: $H_0: \beta_1 = 0$.

Is the slope different from 1?
The hypotheses:
$H_0: \beta_1 = 1$
$H_A: \beta_1 \neq 1$
$t = \frac{0.000528163 - 1}{0.000136818}$

Test 2: model fit
Testing the overall significance of the model
$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = 0$
$H_1$: at least one $\beta$ is different from zero
We need to see that at least one of our independent variables has a significant effect
Note: we only have $\beta_1$, so this test should give us the same results as the previous t-test (and we'll see that it does)
The test statistic is an F-ratio
We'll have an ANOVA table (from Excel)

F Ratio
Mean Squares = SS/df
Significant model

ANOVA
              df    SS             MS             F              Significance F
Regression    1     2133.111647    2133.111647    14.90211089    0.001541086
Residual      15    2147.123648    143.1415765
Total         16    4280.235294

The general form for a simple regression:
              Degrees of Freedom    Sum of Squares    Mean Square          F-Statistic
Regression    1                     SSR               MSR = SSR/1          F = MSR/MSE
Residual      n-2                   SSE               MSE = SSE/(n-2)
Total         n-1                   SST
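
The general form translates directly into a few lines of Python. The sketch below rebuilds the ANOVA entries from the two sums of squares in the Excel output; the function name is illustrative.

```python
import math


def anova_simple_regression(ssr, sse, n):
    """Mean squares and F ratio for a simple (one-predictor) regression."""
    msr = ssr / 1          # regression df = 1
    mse = sse / (n - 2)    # residual df = n - 2
    return {"sst": ssr + sse, "msr": msr, "mse": mse, "f": msr / mse}


anova = anova_simple_regression(ssr=2133.111647, sse=2147.123648, n=17)
print(anova)
# With a single predictor, the square root of F matches the slope's |t| statistic.
print(math.sqrt(anova["f"]))
```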

Symmetry in Testing
In the Excel regression output, the F statistic (14.90211089) is the square of the slope's t statistic (3.860325232), and Significance F (0.001541086) equals the P-value of the Asking Price coefficient (0.001541086)

Test 3: $R^2$ - Coefficient of Determination
The $R^2$ tells us the proportion of the variability in the dependent variable that is explained by the independent variable
We would like to see high values (1 is the highest)
Note: for simple regression, R-squared is the square of the correlation coefficient (r): $R^2 = (r)^2$.

Coefficient of Determination
As we did with analysis of variance, we can partition the variation in y into two parts:
SST = Variation in y = SSE + SSR
SSE (Sum of Squares Error) measures the amount of variation in y that remains unexplained (i.e. due to error)
SSR (Sum of Squares Regression) measures the amount of variation in y explained by variation in the independent variable x.

In the Excel output
$R^2 = \frac{SSR}{SST} = \frac{2133.111647}{4280.235294} = 0.4984$

Coefficient of Determination
$R^2$ has a value of .4984. This means 49.84% of the variation in the weeks on market (y) is explained by the variation in the asking price (x). The remaining 50.16% is unexplained, i.e. due to error.
Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions.
In general the higher the value of $R^2$, the better the model fits the data.
$R^2 = 1$: Perfect match between the line and the data points.
$R^2 = 0$: There is no linear relationship between x and y.
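
As a quick check of the partition SST = SSE + SSR, the following sketch computes the coefficient of determination two ways from the sums of squares in the Excel output and recovers Multiple R as its square root. Variable names are illustrative.

```python
import math

SSR = 2133.111647   # explained variation (Regression SS)
SSE = 2147.123648   # unexplained variation (Residual SS)
SST = 4280.235294   # total variation (Total SS)

r_squared = SSR / SST            # proportion of variation explained
r_squared_alt = 1 - SSE / SST    # same quantity via the unexplained share
multiple_r = math.sqrt(r_squared)

print(f"R^2 = {r_squared:.4f}, unexplained = {1 - r_squared:.4f}, r = {multiple_r:.4f}")
```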

Summary of simple regression output
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.705948422    Correlation coefficient between x and y
R Square             0.498363174    Coefficient of determination
Adjusted R Square    0.464920719
Standard Error       11.96417889    S
Observations         17             N

ANOVA
              df    SS             MS             F              Significance F
Regression    1     2133.111647    2133.111647    14.90211089    0.001541086
Residual      15    2147.123648    143.1415765
Total         16    4280.235294

                Coefficients     Standard Error    t Stat          P-value
Intercept       -16.22506178     12.20252667       -1.329647722    0.203501866
Asking Price    0.000528163      0.000136818       3.860325232     0.001541086
Coefficients column: b0 and b1. Standard Error column: Sb0 and Sb1.
Learn the relationships between the three tables' components

Confidence and prediction intervals

Prediction
Suppose you wanted to know how many weeks it would take to sell a house priced at $100,000
The regression equation was: ŷ = -16.2251 + 0.00053x
Substitute x = 100,000
ŷ = -16.2251 + 0.00053*(100,000) = 36.7749
Important side note: pay attention to the units of measurement in the data
ŷ = 36.7749 is a point estimate of the number of weeks
Point estimates are subject to errors: what is the true price?
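
The substitution is a one-line function. The sketch below evaluates it with both the rounded coefficients used on this slide and the full-precision coefficients from the Excel output, which shows how much rounding the slope matters when x is in dollars. The function name is illustrative.

```python
def predict_weeks(asking_price, b0=-16.22506178, b1=0.000528163):
    """Point estimate of weeks on the market for a given asking price in dollars."""
    return b0 + b1 * asking_price


rounded = predict_weeks(100_000, b0=-16.2251, b1=0.00053)  # coefficients as rounded on the slide
full = predict_weeks(100_000)                              # Excel's unrounded coefficients
print(round(rounded, 4), round(full, 4))
```

The rounded slope gives 36.7749 weeks, while the unrounded one gives about 36.59 weeks.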

Scatterplot
[Figure: scatter plot of weeks on the market (vertical axis, 0 to 70) against asking price (horizontal axis, $50,000 to $140,000).]
Need to construct a prediction interval around this estimate

Prediction interval
Xp = 100000, ŷ = 36.59126539
[Figure: regression line with the prediction interval; vertical axis -40 to 100, horizontal axis (Price) 50000 to 125000.]
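
Prediction Interval
Two equivalent forms (one as given in the textbook, one derived):
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{1 + \frac{1}{n} + \frac{(x_g-\bar{x})^2}{(n-1)s_x^2}}$
OR
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{1 + \frac{1}{n} + \frac{(x_g-\bar{x})^2}{\sum (x_i-\bar{x})^2}}$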

In our example

A different question
Suppose I own several properties in Detroit and price them all at $100,000. What is the expected number of weeks for selling these homes?
Instead of predicting an individual value, I am asking for an expected value (i.e. the mean number of weeks)
We can use a confidence interval for the estimation of the mean.
The distinction between confidence interval and prediction interval is similar to the difference between the CI of the mean vs. the CI of an individual value

Confidence Interval
Narrower than the prediction interval
[Figure: regression line with the confidence interval; vertical axis -40 to 100, horizontal axis 50000 to 125000.]
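
Confidence Interval
Two equivalent forms (one as given in the textbook, one derived):
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_g-\bar{x})^2}{(n-1)s_x^2}}$
OR
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_g-\bar{x})^2}{\sum (x_i-\bar{x})^2}}$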
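
To see how the two intervals differ in practice, here is a sketch that computes both half-widths for the Harris data at an asking price of $100,000. It assumes the standard simple-regression formulas (the prediction interval has an extra 1 under the square root) and a 95% level with t = 2.131 for 15 df, a value taken from a standard t table rather than from the slides. All names are illustrative.

```python
import math

price = [76500, 102000, 53000, 84200, 73000, 125000, 109000, 60000, 87000,
         94000, 76000, 90000, 61000, 86000, 70000, 133000, 93000]
weeks = [23, 48, 9, 26, 20, 40, 51, 18, 25, 62, 33, 11, 15, 26, 27, 56, 12]


def interval_half_widths(x, y, x_g, t_crit):
    """Return (y_hat, prediction half-width, confidence half-width) at x_g."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_g
    core = 1 / n + (x_g - x_bar) ** 2 / sxx    # grows as x_g moves away from x_bar
    pred_half = t_crit * s_e * math.sqrt(1 + core)   # individual value
    conf_half = t_crit * s_e * math.sqrt(core)       # mean value
    return y_hat, pred_half, conf_half


T_CRIT = 2.131  # assumed: 95% level, 15 df, from a standard t table
y_hat, pred_half, conf_half = interval_half_widths(price, weeks, 100_000, T_CRIT)
print(f"prediction: {y_hat:.2f} +/- {pred_half:.2f}")
print(f"confidence: {y_hat:.2f} +/- {conf_half:.2f}")
```

Under these assumptions the prediction interval is roughly 36.6 plus or minus 26.5 weeks, while the confidence interval for the mean is roughly 36.6 plus or minus 7.3 weeks.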

In our example
Note: Point, Prediction and Confidence intervals in Excel are obtained by Add-Ins > Data Analysis Plus > Prediction Interval

The curve
Both intervals are curved, becoming narrower around the average value of x (x-bar).
The closer $x_g$ is to x-bar the better our estimate and thus the narrower the interval.

Example
The following summary statistics were obtained from a regression analysis:
Provide a 90% CI for the average y, given $x_g = 80$
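
Solution
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_g-\bar{x})^2}{(n-1)s_x^2}}$
Need to compute $s_e$ using SSE
Values shown on the slide: 80, 67.20
$\hat{y} = 9{,}784 - 345.50 \times 80 = -17{,}856$
$\alpha = 0.1$
$n - 2 = 18$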

Solution
Computing the standard error of the estimate
From the t table: $t_{0.05,\,18} = 1.734$

Solution
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_g-\bar{x})^2}{(n-1)s_x^2}}$

Regression Diagnostics
There are three conditions that are required in order to perform a regression analysis. These are:
The error variable must be normally distributed,
The error variable must have a constant variance, &
The errors must be independent of each other.
How can we diagnose violations of these conditions?
Residual Analysis, that is, examine the differences between the actual data points and those predicted by the linear equation

Residual Analysis
Recall the deviations between the actual data points and the regression line were called residuals. Excel calculates residuals as part of its regression analysis (the first 12 of the 17 observations are shown):

X         Y     Fitted         Residual        St. resid
76500     23    24.17942851    -1.179428507    -0.102346097
102000    48    37.64759194    10.35240806     0.906924006
53000     9     11.76759162    -2.767591621    -0.259720474
84200     26    28.2462857     -2.246285699    -0.193608631
73000     20    22.33085706    -2.330857056    -0.203458396
125000    40    49.79534718    -9.795347185    -0.946239962
109000    51    41.34473484    9.655265163     0.862374431
60000     18    15.46473452    2.535265477     0.230053966
87000     25    29.72514286    -4.72514286     -0.407099584
94000     62    33.42228576    28.57771424     2.471464614
76000     33    23.91534687    9.084653129     0.788907244
90000     11    31.30963267    -20.30963267    -1.751163484

We can use these residuals to determine whether the error variable is nonnormal, whether the error variance is constant, and whether the errors are independent
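
The residual table can be rebuilt from the raw data with the sketch below. The standardized residual here divides each residual by s_e * sqrt(1 - h), where h is the point's leverage. That formula is an assumption on my part, chosen because it reproduces the "St. resid" column; the slides do not state how the column was computed. All names are illustrative.

```python
import math

price = [76500, 102000, 53000, 84200, 73000, 125000, 109000, 60000, 87000,
         94000, 76000, 90000, 61000, 86000, 70000, 133000, 93000]
weeks = [23, 48, 9, 26, 20, 40, 51, 18, 25, 62, 33, 11, 15, 26, 27, 56, 12]


def residual_table(x, y):
    """Rows of (x, y, fitted, residual, standardized residual)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s_e = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    rows = []
    for xi, yi, fi, ei in zip(x, y, fitted, resid):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx   # assumed leverage adjustment
        rows.append((xi, yi, fi, ei, ei / (s_e * math.sqrt(1 - leverage))))
    return rows


rows = residual_table(price, weeks)
# Flag points whose standardized residual exceeds 2 in absolute value.
suspects = [(xi, yi) for xi, yi, _, _, sr in rows if abs(sr) > 2]
for row in rows:
    print("%7d %3d %10.4f %10.4f %8.4f" % row)
print("possible outliers:", suspects)
```

With this data, only the home priced at $94,000 that took 62 weeks to sell is flagged.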

Nonnormality
We can take the residuals and put them into a histogram to visually check for normality: we're looking for a bell-shaped histogram with the mean close to zero.

Heteroscedasticity
When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.
We can diagnose heteroscedasticity by plotting the residual against the predicted y.

Heteroscedasticity
If the variance of the error variable ($\varepsilon$) is not constant, then we have "heteroscedasticity". Here's the plot of the residual against the predicted value of y:
[Figure: residuals plotted against the predicted value of y.]
There doesn't appear to be a change in the spread of the plotted points, therefore no heteroscedasticity

Nonindependence of the Error Variable (for time series data: not in this course)
If we were to observe the number of weeks houses stay on the market week after week for, say, a year, that would constitute a time series.
When the data are time series, the errors often are correlated.
Error terms that are correlated over time are said to be autocorrelated or serially correlated.
We can often detect autocorrelation by graphing the residuals against the time periods. If a pattern emerges, it is likely that the independence requirement is violated.

Nonindependence of the Error Variable
Patterns in the appearance of the residuals over time indicate that autocorrelation exists:
Note the runs of positive residuals, replaced by runs of negative residuals
Note the oscillating behavior of the residuals around zero.

Outliers
An outlier is an observation that is unusually small or unusually large.
E.g. in our houses example the prices range from $53,000 to $133,000. Suppose we have a value of $1,000,000: this point is an outlier.

Outliers
Possible reasons for the existence of outliers include:
There was an error in recording the value
The point should not have been included in the sample
Perhaps the observation is indeed valid.
Outliers can be easily identified from a scatterplot.
If the absolute value of the standard residual is > 2, we suspect the point may be an outlier and investigate further.
They need to be dealt with since they can easily influence the least squares line

Outliers: our example
Olga Kaminer, 2009

Procedure for Regression Diagnostics
Develop a model that has a theoretical basis.
Gather data for the two variables in the model.
Draw the scatter diagram to determine whether a linear model appears to be appropriate. Identify possible outliers.
Determine the regression equation.
Calculate the residuals and check the required conditions (normality, homoscedasticity, independence)
Assess the model's fit (t-test for the slope, the overall F-ratio, $R^2$)
If the model fits the data, use the regression equation to predict a particular value (confidence/prediction intervals) of the dependent variable.