700579308

  • 8/3/2019 700579308

    1/22

    Lecture 11: Simple Regression

    Statistics for Management Decisions

    Regression

    Regression analysis enables us to estimate the strength and direction of relations between variables, specifically between a dependent variable (Y) and independent variables (x1, x2, etc.).

    For example:
      • The effect of years of education on income
      • The effect of engine size on gas mileage
      • The effect of house size on price

    First: Covariance and Correlation (to see whether a relationship EXISTS)

    Example

    Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSX 300 Index.
    The next slide shows 25 monthly returns.

    Example Data

      TSX (x)   CMP (y)     TSX (x)   CMP (y)     TSX (x)   CMP (y)
         3         4          -4        -3           2         4
        -1        -2          -1         0          -1         1
         2        -2           0        -2           4         3
         4         2           1         0          -2        -1
         5         3           0         0           1         2
        -3        -5          -3         1          -3        -4
        -5        -2          -3        -2           2         1
         1         2           1         3          -2        -2
         2        -1

    Example

    From the data, it appears that a positive relationship may exist:
      • Most of the time when the TSX is up, CMP is up
      • Likewise, when the TSX is down, CMP is down most of the time
      • Sometimes, they move in opposite directions

    Let's graph this data.

    Graph of Data

    [Figure: scatterplot of the 25 monthly returns, TSX on the horizontal axis and CMP on the vertical axis, both axes running from -6 to 6.]

    Example Summary Statistics

    The data do appear to be positively related.
    Let's derive some summary statistics about these data:

               Mean     s²       s
      TSX      0.00     7.25     2.69
      CMP      0.00     6.25     2.50

    Observations

    Both have means of zero and standard deviations just under 3.
    However, each data point does not have simply one deviation from the mean; it deviates from both means.
    Consider Points A, B, C and D on the next graph.

    Graph of Data

    [Figure: the TSX/CMP scatterplot again, TSX on the horizontal axis and CMP on the vertical axis, both from -6 to 6.]

    Implications

    When points in the upper right and lower left quadrants dominate, the sum of the products of the deviations will be positive.
    When points in the lower right and upper left quadrants dominate, the sum of the products of the deviations will be negative.

    An Important Observation

    The sum of the products of the deviations gives us the appropriate sign of the slope of our relationship.

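    Covariance

    Population:   \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}
    Sample:       s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}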
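    The sample covariance can be checked against the TSX/CMP example. The following is a minimal Python sketch, assuming the Example Data table reads row by row as (TSX, CMP) pairs; the helper name `sample_covariance` is illustrative and not part of the lecture.

```python
def sample_covariance(x, y):
    """Sum of the products of the deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# 25 monthly returns; the pairing assumes a row-by-row reading of the slide table
tsx = [3, -4, 2, -1, -1, -1, 2, 0, 4, 4, 1, -2, 5,
       0, 1, -3, -3, -3, -5, -3, 2, 1, 1, -2, 2]
cmp_returns = [4, -3, 4, -2, 0, 1, -2, -2, 3, 2, 0, -1, 3,
               0, 2, -5, 1, -4, -2, -2, 1, 2, 3, -2, -1]

cov_xy = sample_covariance(tsx, cmp_returns)
var_tsx = sample_covariance(tsx, tsx)                  # variance = covariance of x with itself
var_cmp = sample_covariance(cmp_returns, cmp_returns)
print(cov_xy, var_tsx, var_cmp)
```

    The two variances reproduce the 7.25 and 6.25 reported in the summary statistics. The covariance comes out positive, in line with the upward pattern in the data, but its exact value depends on the assumed pairing.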

    Covariance

    In the same units as variance (if both variables are in the same unit), i.e. units squared.
    A very important element of measuring portfolio risk in finance.

    Using Covariance

    Very useful in finance for measuring portfolio risk.
    Unfortunately, it is hard to interpret for two reasons:
      • What does the magnitude/size imply?
      • The units are confusing

    A More Useful Statistic

    We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations.
    This operation:
      • Removes the impact of size and scale
      • Eliminates the units

    Correlation

    Correlation indicates a positive or negative relation between two variables.
    Both variables move together, either in the same direction or in opposite directions.
    E.g. with a positive correlation, when one goes up so does the other.

    The Correlation Coefficient

    The correlation coefficient measures the strength of the linear relationship between two variables.
      • Coefficient = -1: perfect negative
      • Coefficient = 0: no linear relation
      • Coefficient = 1: perfect positive

    Calculating Correlation

    Population:   \rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}
    Sample:       r = \frac{s_{xy}}{s_x s_y}

    Example

       X     Y
      20    16
      18    12
      24    18
      20    17
      22    21
      14    10
      18    10

      • Create a scatterplot. What type of relationship exists?
      • Compute the correlation coefficient
      • Test the significance of the correlation coefficient at the 0.05 level

    Scatterplot

    [Figure: scatterplot of Y (0 to 25) against X (0 to 30) for the seven observations.]

    Correlation coefficient

                 X       Y       X²       Y²       XY
                20      16      400      256      320
                18      12      324      144      216
                24      18      576      324      432
                20      17      400      289      340
                22      21      484      441      462
                14      10      196      100      140
                18      10      324      100      180
      Total    136     104     2704     1654     2090
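    The column totals are all that is needed to compute r. The sketch below uses the computational form of the sample correlation, which is algebraically equivalent to dividing the sample covariance by the two standard deviations; the slide's own formula was not recovered, so the choice of this form is an assumption.

```python
import math

# Column totals from the table (n = 7 observations)
n = 7
sum_x, sum_y = 136, 104
sum_x2, sum_y2, sum_xy = 2704, 1654, 2090

# r = (n*SumXY - SumX*SumY) / sqrt((n*SumX2 - SumX^2) * (n*SumY2 - SumY^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))
```

    The result is a strong positive correlation of roughly 0.85, which agrees with the upward pattern expected from the scatterplot.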

    Correlation coefficient

    In Excel: use the CORREL function, =CORREL(A2:A8,B2:B8)

    Significance

    Hypothesis test on the true population parameter (rho, \rho):
      H0: \rho = 0
      HA: \rho \neq 0

    Test statistic (n - 2 degrees of freedom):
      t = r \sqrt{\frac{n - 2}{1 - r^2}}
    where r is estimated from the sample.

    Significance test

    3.563 > 2.571 (t critical value, 5 df)
    Reject the null hypothesis and conclude that the correlation coefficient is significant (significantly different from 0).
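    The test statistic quoted on the slide can be reproduced from the seven (X, Y) pairs of the example. This is a hedged sketch: the helper `pearson_r` is an illustrative name, and the critical value 2.571 is simply the one quoted on the slide for 5 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Sample correlation: covariance divided by the two standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [20, 18, 24, 20, 22, 14, 18]
y = [16, 12, 18, 17, 21, 10, 10]

n = len(x)
r = pearson_r(x, y)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))   # n - 2 = 5 degrees of freedom
T_CRITICAL = 2.571                               # from the slide: two-tailed, alpha = 0.05, 5 df
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), reject_h0)
```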

    Correlation vs. Regression

    Correlation indicates a relation between two variables.
    Regression assumes a direction of causality between an independent and a dependent variable: changes in the independent variables are treated as those causing the change in the dependent variable. The regression itself does not prove causation.
    We'll start with simple regression (one independent variable) and then look at multiple variables.

    Simple regression

    A scatterplot

    [Figure: scatterplot of consumption (0 to 4000) against income (0 to 12000).]

    The income/consumption example:
      • Dependent variable (y): consumption
      • Independent variable (x): income

    The story

    We think that income affects consumption: the more you make, the more you buy.
    We are looking to study this relationship in more depth:
      • Is there indeed a significant effect?
      • What is the magnitude of this effect?
      (We limit our discussion to linear effects.)
    We'll create and test a regression model of the relationship between consumption and income.

    Modeling the linear relationships

    Premise: there is a true relationship between income and consumption.
    This relationship can be described in a linear form:
      consumption = \beta_0 + \beta_1 \cdot income + \varepsilon
    Or more generally:
      y = \beta_0 + \beta_1 x + \varepsilon

    Simple Linear Regression Model

      y = \beta_0 + \beta_1 x + \varepsilon

      \beta_1 = slope (= rise/run)
      \beta_0 = y-intercept

    Note that both \beta_0 and \beta_1 are population parameters, which are usually unknown and hence estimated from the data.

    [Figure: a straight line on x and y axes with the rise and the run marked.]

    Modeling the linear relationships

    With simple linear regression we try to capture the true relationship between the two variables with a single line.
    The estimated regression model is:
      \hat{y} = b_0 + b_1 x

    [Figure: the income/consumption scatterplot (income 0 to 12000, consumption 0 to 4000) with the fitted line.]

    Understanding the error term

    No line can hit all the points in the scatterplot, or even most of the points.
    The amount we miss by is called the error or residual. It is the difference between the true value and the predicted value (from the regression line).

    [Figure: scatterplot with a fitted line; one axis runs from 0 to 4500, the other, labeled "Population (thousands)", from 0 to 35000. The vertical gap between a point and the line is labeled "error (residual, deviation)".]

    A good regression line will be the one that minimizes the total of the squared errors (SSE).

    More formally:

    Regression Analysis

    A statistical technique for determining the best-fit line through a series of data.
    The regression line is the unique line that minimizes the total of the squared deviations (or errors). The statistical term is Sum of Squared Errors, or SSE.
    This line is called the least squares line.

    Required Conditions - \varepsilon

      • The probability distribution of \varepsilon is normal
      • E(\varepsilon) = 0
      • \sigma_\varepsilon is constant and independent of x, the independent variable
      • The value of \varepsilon associated with any particular value of y is independent of the value of \varepsilon associated with any other value of y

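    Finding the line equation

    The equation:
      \hat{y} = b_0 + b_1 x        (\hat{y} is the estimated value of y)
    Where:
      b_1 = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}
      b_0 = \bar{Y} - b_1 \bar{X}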

    Example

    Data on refining capacity of an oil company's sites (the XY and X² columns are calculated):

      Obs.    # of sites (X)    Capacity (Y)    XY           X²
       1           13              81.82        1063.66      169
       2           10              81.82         818.2       100
       3           13              58.18         756.34      169
       4            8              43.64         349.12       64
       5            5              40            200          25
       6            7              36.36         254.52       49
       7            4              34.55         138.2        16
       8            8              32.73         261.84       64
       9            7              29.09         203.63       49
      10            3              25.45          76.35        9
      Total    ΣX = 78         ΣY = 463.64     ΣXY = 4121.86    ΣX² = 714
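    The totals in the table can be plugged into the usual least squares formulas for the slope and the intercept. A short Python sketch, assuming the standard computational forms b1 = (ΣXY - ΣXΣY/n) / (ΣX² - (ΣX)²/n) and b0 = mean(Y) - b1 * mean(X); the variable names are illustrative.

```python
# Totals from the refining capacity table (n = 10 sites)
n = 10
sum_x = 78          # total number of sites
sum_y = 463.64      # total capacity
sum_xy = 4121.86
sum_x2 = 714

# Least squares slope and intercept from the column totals
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n
print(round(b1, 3), round(b0, 3))
```

    The slope rounds to 4.787, the per-site increase in capacity that the lecture reports for the least squares line.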

    The least squares line

    Meaning: capacity will increase by 4.787 units for every site added.

    Understanding and assessing the regression model

    Example

    The Harris Corporation has recently done a study of homes that have sold in the Detroit area within the past 18 months. Data were recorded for the asking price (x) and the number of weeks (y) each home was on the market before it sold.

      Weeks on the Market    Asking Price
              23                $76,500
              48               $102,000
               9                $53,000
              26                $84,200
              20                $73,000
              40               $125,000
              51               $109,000
              18                $60,000
              25                $87,000
              62                $94,000
              33                $76,000
              11                $90,000
              15                $61,000
              26                $86,000
              27                $70,000
              56               $133,000
              12                $93,000

    The model to be estimated

      y = \beta_0 + \beta_1 x + \varepsilon        (y: weeks on the market; x: asking price)

    Excel: The Regression Tool

    Tools > Data Analysis
    Choose Regression from the dialogue box menu.

    The regression output (Excel)

    SUMMARY OUTPUT

    Regression Statistics
      Multiple R            0.705948422
      R Square              0.498363174
      Adjusted R Square     0.464920719
      Standard Error        11.96417889
      Observations          17

    ANOVA
                     df    SS             MS             F              Significance F
      Regression      1    2133.111647    2133.111647    14.90211089    0.001541086
      Residual       15    2147.123648    143.1415765
      Total          16    4280.235294

                       Coefficients    Standard Error    t Stat          P-value
      Intercept        -16.22506178    12.20252667       -1.329647722    0.203501866
      Asking Price     0.000528163     0.000136818       3.860325232     0.001541086

    The estimated model

      \hat{y} = -16.2251 + 0.00053x
      • Intercept (b0): -16.2251
      • Slope (b1): 0.00053

    What is the meaning of -16.2251?
    Is the effect of asking price on number of weeks significant?

                       Coefficients    Standard Error    t Stat          P-value
      Intercept        -16.22506178    12.20252667       -1.329647722    0.203501866
      Asking Price     0.000528163     0.000136818       3.860325232     0.001541086
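    The Excel coefficients can be reproduced directly from the 17 Harris Corporation observations. The sketch below fits the least squares line with plain Python; the helper name `fit_line` is illustrative and not part of the lecture.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Harris Corporation data: asking price (x, dollars) and weeks on the market (y)
price = [76500, 102000, 53000, 84200, 73000, 125000, 109000, 60000, 87000,
         94000, 76000, 90000, 61000, 86000, 70000, 133000, 93000]
weeks = [23, 48, 9, 26, 20, 40, 51, 18, 25, 62, 33, 11, 15, 26, 27, 56, 12]

b0, b1 = fit_line(price, weeks)
print(f"intercept = {b0:.5f}, slope = {b1:.9f}")
```

    Because x is measured in dollars, the slope is tiny: each additional $1,000 of asking price adds about half a week on the market.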

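    Test 1: Testing the Slope

    The hypotheses:
      H0: \beta_1 = 0
      HA: \beta_1 \neq 0
    We follow a t-test:
      t = \frac{b_1 - \beta_1}{s_{b_1}}        (b_1: estimated slope; \beta_1: true slope under H0)
    where
      s_{b_1} = \frac{s_e}{\sqrt{(n - 1) s_x^2}}    OR    s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}
    and s_e is the standard error of the estimate.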

    The standard error of the estimate

    The standard error of the estimate (Se or SEE) measures how the data vary around the regression line, similar to the concept of standard deviation.
    We would like Se to be small: the smaller it is, the larger the t-statistic is and the more likely we are to reject the null hypothesis that the slope is zero.
      s_e = \sqrt{\frac{SSE}{n - k - 1}}        (k = 1 for simple regression)
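    As a quick check, the standard error of the estimate can be recomputed from the residual sum of squares in the Excel ANOVA table. This sketch assumes the usual definition s_e = sqrt(SSE / (n - k - 1)), with k the number of independent variables.

```python
import math

sse = 2147.123648   # Residual SS from the Excel ANOVA table
n = 17              # observations
k = 1               # one independent variable in simple regression

s_e = math.sqrt(sse / (n - k - 1))   # n - k - 1 = 15 residual degrees of freedom
print(round(s_e, 5))
```

    The value agrees with the "Standard Error" line of the Regression Statistics block (11.96417889).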

    What do we have in the tables?

    In the Excel SUMMARY OUTPUT:
      • Standard Error (11.96417889) is S = SEE, the standard error of the estimate
      • Residual SS (2147.123648) is SSE
      • The Asking Price coefficient (0.000528163) is b1
      • The Asking Price standard error (0.000136818) is Sb1

    Test 1: Testing the Slope

      t = \frac{0.000528163 - 0}{0.000136818} = 3.8603

    Compare t with the critical value t_{\alpha/2, (n-2) df}.
    => Can conclude that the slope is different from zero.
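    The slope test can be sketched in a few lines of Python using the numbers from the Excel coefficients table. The critical value 2.131 is the two-tailed t table entry for 15 degrees of freedom at alpha = 0.05; the lecture does not state the significance level for this test, so that choice is an assumption.

```python
b1 = 0.000528163        # estimated slope (Asking Price coefficient)
s_b1 = 0.000136818      # standard error of the slope
beta1_null = 0.0        # slope under H0

t_stat = (b1 - beta1_null) / s_b1

T_CRITICAL = 2.131      # t_{0.025, 15} from a t table (assumed alpha = 0.05)
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 4), reject_h0)
```

    Changing `beta1_null` to another value tests the slope against that value instead of zero.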

    Testing the Slope

    If we wish to test for positive or negative linear relationships we conduct one-tail tests, i.e. our research hypothesis becomes:
      H1: \beta_1 > 0        (testing for a positive slope)
    Of course, the null hypothesis remains: H0: \beta_1 = 0.

    Is the slope different from 1?

    The hypotheses:
      H0: \beta_1 = 1
      HA: \beta_1 \neq 1

      t = \frac{0.000528163 - 1}{0.000136818}

    Test 2: model fit

    Testing the overall significance of the model:
      H0: \beta_1 = \beta_2 = \beta_3 = ... = 0
      H1: at least one \beta is different from zero
    We need to see that at least one of our independent variables has a significant effect.
    Note: we only have \beta_1, so this test should give us the same results as the previous t-test (and we'll see that it does).
    The test statistic is an F-ratio.
    We'll have an ANOVA table (from Excel).


    F Ratio

    Mean Squares = SS/df

    ANOVA
                     df    SS             MS             F              Significance F
      Regression      1    2133.111647    2133.111647    14.90211089    0.001541086
      Residual       15    2147.123648    143.1415765
      Total          16    4280.235294

    Significance F = 0.001541086: significant model.

    The general form for a simple regression:

                     Degrees of Freedom    Sum of Squares    Mean Square           F-Statistic
      Regression     1                     SSR               MSR = SSR/1           F = MSR/MSE
      Residual       n - 2                 SSE               MSE = SSE/(n - 2)
      Total          n - 1                 SST
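    The general form of the ANOVA table translates directly into a short calculation. The sketch below recomputes the mean squares and the F ratio from the sums of squares in the Excel output, then compares F with the square of the slope's t Stat; the variable names are illustrative.

```python
ssr = 2133.111647    # Regression SS
sse = 2147.123648    # Residual SS
n = 17               # observations

msr = ssr / 1            # regression df = 1 in simple regression
mse = sse / (n - 2)      # residual df = n - 2
f_ratio = msr / mse

t_slope = 3.860325232    # t Stat of the Asking Price coefficient
print(round(f_ratio, 5), round(t_slope ** 2, 5))
```

    With a single independent variable the F ratio equals the squared t statistic of the slope, which is why the two tests lead to the same conclusion.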

    Symmetry in Testing

    In the Excel output, Significance F (0.001541086) equals the P-value of the Asking Price slope (0.001541086), and F (14.90211089) is the square of the slope's t Stat (3.860325232).

    Test 3: R² - Coefficient of Determination

    R² tells us the proportion of the variability in the dependent variable that is explained by the independent variable.
    We would like to see high values (1 is the highest).
    Note: for simple regression, R-squared is the square of the correlation coefficient (r): R² = (r)².

    Coefficient of Determination

    As we did with analysis of variance, we can partition the variation in y into two parts:
      SST = Variation in y = SSE + SSR
      • SSE (Sum of Squares Error) measures the amount of variation in y that remains unexplained (i.e. due to error)
      • SSR (Sum of Squares Regression) measures the amount of variation in y explained by variation in the independent variable x

    In the Excel output

      R Square = \frac{SSR}{SST} = \frac{2133.111647}{4280.235294} = 0.498363174

    Coefficient of Determination

    R² has a value of .4984. This means 49.84% of the variation in the weeks on market (y) is explained by the variation in the asking price (x). The remaining 50.16% is unexplained, i.e. due to error.
    Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions.
    In general, the higher the value of R², the better the model fits the data.
      • R² = 1: Perfect match between the line and the data points
      • R² = 0: There is no linear relationship between x and y
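    The partition SST = SSE + SSR gives two equivalent ways to compute R². A minimal sketch using the sums of squares from the Excel ANOVA table (variable names are illustrative):

```python
import math

ssr = 2133.111647    # explained variation (Regression SS)
sse = 2147.123648    # unexplained variation (Residual SS)
sst = 4280.235294    # total variation in y

r_squared = ssr / sst
r_squared_alt = 1 - sse / sst        # same quantity via the unexplained share
multiple_r = math.sqrt(r_squared)    # equals |r| in simple regression
print(round(r_squared, 4), round(multiple_r, 4))
```

    The square root of R² matches the "Multiple R" line of the Excel output, consistent with R² = (r)² for simple regression.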

    Summary of simple regression output

    SUMMARY OUTPUT

    Regression Statistics
      Multiple R            0.705948422    Correlation coefficient between x and y
      R Square              0.498363174    Coefficient of determination
      Adjusted R Square     0.464920719
      Standard Error        11.96417889    S
      Observations          17             N

    In the coefficients table, the Coefficients column gives b0 and b1, and the Standard Error column gives Sb0 and Sb1.
    Learn the relationships between the three tables' components.

    Confidence and prediction intervals

    Prediction

    Suppose you wanted to know how many weeks it would take to sell a house priced at $100,000.
    The regression equation was: \hat{y} = -16.2251 + 0.00053x
    Substitute x = 100,000:
      \hat{y} = -16.2251 + 0.00053 * (100,000) = 36.7749
    Important side note: pay attention to the units of measurement in the data.
    \hat{y} = 36.7749 is a point estimate of the number of weeks. (With the unrounded coefficients from the Excel output, the estimate is 36.59.)
    Point estimates are subject to errors: what is the true value?

    Scatterplot

    [Figure: scatterplot of weeks on the market (0 to 70) against asking price ($50,000 to $140,000).]

    Need to construct a prediction interval around this estimate.

    Prediction interval

    Xp = 100000, \hat{y} = 36.59126539

    [Figure: fitted line with prediction interval bands; horizontal axis "Price" from 50000 to 125000, vertical axis from -40 to 100.]

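    Prediction Interval

      \hat{y} \pm t_{\alpha/2, n-2} \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n - 1) s_x^2}}        (textbook)
    OR
      \hat{y} \pm t_{\alpha/2, n-2} \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}        (derived)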

    In our example

    A different question

    Suppose I own several properties in Detroit and price them all at $100,000. What is the expected number of weeks for selling these homes?
    Instead of predicting an individual value, I am asking for an expected value (i.e. the mean number of weeks).
    We can use a confidence interval for the estimation of the mean.
    The distinction between confidence interval and prediction interval is similar to the difference between the CI of the mean vs. the CI of an individual value.

    Confidence Interval

    Narrower than the prediction interval.

    [Figure: fitted line with confidence interval bands; horizontal axis from 50000 to 125000, vertical axis from -40 to 100.]

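    Confidence Interval

      \hat{y} \pm t_{\alpha/2, n-2} \, s_e \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n - 1) s_x^2}}        (textbook)
    OR
      \hat{y} \pm t_{\alpha/2, n-2} \, s_e \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}        (derived)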

    In our example

    Note: Point, Prediction and Confidence intervals in Excel are obtained by Add-Ins > Data Analysis Plus > Prediction Interval
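    Outside Excel, both intervals can be sketched in plain Python from the Harris Corporation data. This is a hedged illustration: it assumes a 95% level with the t table value t_{0.025, 15} = 2.131 (the lecture does not state the level), uses the Σ(x_i - x̄)² form of the formulas, and the function name `intervals` is hypothetical.

```python
import math

price = [76500, 102000, 53000, 84200, 73000, 125000, 109000, 60000, 87000,
         94000, 76000, 90000, 61000, 86000, 70000, 133000, 93000]
weeks = [23, 48, 9, 26, 20, 40, 51, 18, 25, 62, 33, 11, 15, 26, 27, 56, 12]

n = len(price)
x_bar, y_bar = sum(price) / n, sum(weeks) / n
sxx = sum((x - x_bar) ** 2 for x in price)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, weeks))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate: sqrt(SSE / (n - 2))
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(price, weeks))
s_e = math.sqrt(sse / (n - 2))

T_CRIT = 2.131  # t_{0.025, 15} from a t table; assumes a 95% level

def intervals(x_g):
    """Return (point estimate, CI half-width, PI half-width) at x_g."""
    y_hat = b0 + b1 * x_g
    spread = 1 / n + (x_g - x_bar) ** 2 / sxx
    ci_half = T_CRIT * s_e * math.sqrt(spread)        # mean of y at x_g
    pi_half = T_CRIT * s_e * math.sqrt(1 + spread)    # individual y at x_g
    return y_hat, ci_half, pi_half

y_hat, ci_half, pi_half = intervals(100_000)
_, ci_half_at_mean, _ = intervals(x_bar)
print(f"y-hat = {y_hat:.4f}, CI = +/-{ci_half:.2f}, PI = +/-{pi_half:.2f}")
```

    The extra 1 under the square root is what makes the prediction interval wider than the confidence interval, and the (x_g - x̄)² term is what makes both intervals narrowest at the mean asking price.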

    The curve

    Both intervals are curved, becoming narrower around the average value of x (x-bar).
    The closer Xg is to X-bar, the better our estimate and thus the narrower the interval.

    Example

    The following summary statistics were obtained from a regression analysis:
    Provide a 90% CI for the average y, given xg = 80.

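    Solution

      \hat{y} \pm t_{\alpha/2, n-2} \, s_e \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n - 1) s_x^2}}

      • \hat{y} = 9,784 - 345.50 * 80 = -17,856
      • x_g = 80 (the slide also shows the value 67.20)
      • \alpha = 0.1
      • n - 2 = 18
      • s_e: need to compute using SSE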

    Solution

    Computing the standard error of the estimate
    From the t table: t_{0.05, 18} = 1.734

    Solution

      \hat{y} \pm t_{\alpha/2, n-2} \, s_e \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n - 1) s_x^2}}

    Regression Diagnostics

    There are three conditions that are required in order to perform a regression analysis. These are:
      • The error variable must be normally distributed,
      • The error variable must have a constant variance, and
      • The errors must be independent of each other.
    How can we diagnose violations of these conditions?
    Residual analysis, that is, examine the differences between the actual data points and those predicted by the linear equation.

    Res

    idua

    lAnalys

    is

    Reca

    llthe

    dev

    iations

    be

    tween

    theac

    tua

    lda

    tapo

    intsan

    dthe

    regress

    ion

    linewerecalle

    dresiduals

    .Exce

    lca

    lcu

    lates

    res

    idua

    lsasparto

    fitsregress

    ionana

    lys

    is:

    Wecanuse

    theseres

    iduals

    tode

    term

    inew

    he

    ther

    theerror

    varia

    bleisnonnorma

    l,w

    hether

    theerrorvariance

    iscons

    tant,

    an

    dw

    he

    ther

    theerrorsarein

    depen

    den

    t

    X        Y    Fitted        Residual        Std. residual
    76500    23   24.17942851   -1.179428507    -0.102346097
    102000   48   37.64759194   10.35240806      0.906924006
    53000    9    11.76759162   -2.767591621    -0.259720474
    84200    26   28.2462857    -2.246285699    -0.193608631
    73000    20   22.33085706   -2.330857056    -0.203458396
    125000   40   49.79534718   -9.795347185    -0.946239962
    109000   51   41.34473484   9.655265163      0.862374431
    60000    18   15.46473452   2.535265477      0.230053966
    87000    25   29.72514286   -4.72514286     -0.407099584
    94000    62   33.42228576   28.57771424      2.471464614
    76000    33   23.91534687   9.084653129      0.788907244
    90000    11   31.30963267   -20.30963267    -1.751163484
    -        -
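As a quick check on the table, the residual column can be recomputed as the observed Y minus the fitted value. The sketch below is only an illustration: the function name `residuals` is hypothetical, and the rows are the twelve listed in the table, not the full sample.

```python
# Rows copied from the table: (x, y, fitted value).
rows = [
    (76500, 23, 24.17942851),
    (102000, 48, 37.64759194),
    (53000, 9, 11.76759162),
    (84200, 26, 28.2462857),
    (73000, 20, 22.33085706),
    (125000, 40, 49.79534718),
    (109000, 51, 41.34473484),
    (60000, 18, 15.46473452),
    (87000, 25, 29.72514286),
    (94000, 62, 33.42228576),
    (76000, 33, 23.91534687),
    (90000, 11, 31.30963267),
]


def residuals(rows):
    """Return the residual (observed y minus fitted y) for each row."""
    return [y - fitted for _, y, fitted in rows]


# Print each row next to its recomputed residual.
for (x, y, fitted), e in zip(rows, residuals(rows)):
    print(f"{x:>7} {y:>3} {fitted:>12.6f} {e:>12.6f}")
```

The printed residuals should agree with the Residual column up to rounding in the last digits.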

    Nonnormality

    We can take the residuals and put them into a histogram to visually check for normality. We're looking for a bell-shaped histogram with the mean close to zero.
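A minimal sketch of this check, assuming we only have the twelve residuals listed in the table earlier in this section. The helper `histogram_counts` and the bin choices (width 10, from -30 to 30) are assumptions made for illustration. Because this is a subset of the sample, the mean of these twelve values need not be exactly zero.

```python
import math
from statistics import mean

# Residuals copied from the table (twelve listed rows only).
residuals = [
    -1.179428507, 10.35240806, -2.767591621, -2.246285699,
    -2.330857056, -9.795347185, 9.655265163, 2.535265477,
    -4.72514286, 28.57771424, 9.084653129, -20.30963267,
]


def histogram_counts(values, low, width, bins):
    """Count how many values fall in each bin [low + i*width, low + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        i = math.floor((v - low) / width)
        if 0 <= i < bins:
            counts[i] += 1
    return counts


counts = histogram_counts(residuals, low=-30, width=10, bins=6)

# Text histogram: one star per residual in the bin.
for i, c in enumerate(counts):
    left = -30 + 10 * i
    print(f"[{left:>4}, {left + 10:>4}) {'*' * c}")

print("mean of listed residuals:", round(mean(residuals), 3))
```

With so few points the shape is only suggestive. A histogram of the full set of residuals is what the visual check calls for.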

    Heteroscedasticity

    When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.

    We can diagnose heteroscedasticity by plotting the residuals against the predicted y.

    Heteroscedasticity

    If the variance of the error variable is not constant, then we have heteroscedasticity.

    Here's the plot of the residuals against the predicted value of y (plot not reproduced in this transcript): there doesn't appear to be a change in the spread of the plotted points, therefore no heteroscedasticity.
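The slides rely on a visual check. As a rough numeric stand-in (not the method used in the lecture), one can compare the spread of the residuals for low and high predicted values. The function `spread_ratio` and both data sets below are hypothetical and exist only to illustrate the idea.

```python
from statistics import pstdev


def spread_ratio(fitted, residuals):
    """Ratio of residual spread in the upper half of fitted values to the lower half.

    A ratio near 1 is consistent with constant variance. A ratio far from 1
    suggests the spread changes with the predicted value.
    """
    pairs = sorted(zip(fitted, residuals))  # order points by predicted y
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return pstdev(high) / pstdev(low)


# Hypothetical predicted values and two hypothetical residual patterns.
fitted = [10, 20, 30, 40, 50, 60, 70, 80]
even_spread = [1, -1, 1, -1, 1, -1, 1, -1]   # same spread everywhere
fanning_out = [1, -1, 2, -2, 4, -4, 8, -8]   # spread grows with fitted y

ratio_even = spread_ratio(fitted, even_spread)
ratio_fan = spread_ratio(fitted, fanning_out)
print(ratio_even, ratio_fan)
```

This is a crude summary. The residual plot remains the primary diagnostic, and a ratio computed from a handful of points should be read with caution.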

    Nonindependence of the Error Variable (for time series data; not in this course)

    If we were to observe the number of weeks houses stay on the market for many weeks for, say, a year, that would constitute a time series.

    When the data are time series, the errors often are correlated. Error terms that are correlated over time are said to be autocorrelated or serially correlated.

    We can often detect autocorrelation by graphing the residuals against the time periods. If a pattern emerges, it is likely that the independence requirement is violated.

    Nonindependence of the Error Variable

    Patterns in the appearance of the residuals over time indicate that autocorrelation exists:

        Note the runs of positive residuals, replaced by runs of negative residuals.

        Note the oscillating behavior of the residuals around zero.
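The two patterns above can also be summarized with the Durbin-Watson statistic, which the slides do not mention and is brought in here only as an illustration. It is the sum of squared successive differences of the residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values well below 2 correspond to long runs of same-signed residuals, and values well above 2 correspond to oscillation. Both residual series below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals ordered by time period.
runs = [1, 1, 1, -1, -1, -1]         # a run of positives, then a run of negatives
oscillating = [1, -1, 1, -1, 1, -1]  # sign flips every period

d_runs = durbin_watson(runs)
d_osc = durbin_watson(oscillating)
print(round(d_runs, 3), round(d_osc, 3))
```

Formal use of the statistic requires critical values from Durbin-Watson tables. This sketch only shows which direction each pattern pushes the statistic.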

    Outliers

    An outlier is an observation that is unusually small or unusually large.

    E.g. in our houses example the prices range from $53,000 to $133,000. Suppose we have a value of $1,000,000: this point is an outlier.

    Outliers

    Possible reasons for the existence of outliers include:

        There was an error in recording the value.

        The point should not have been included in the sample.

        Perhaps the observation is indeed valid.

    Outliers can be easily identified from a scatter plot.

    If the absolute value of the standard residual is > 2, we suspect the point may be an outlier and investigate further.

    They need to be dealt with since they can easily influence the least squares line.
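Applied to the standard residuals listed in the residual table earlier in this section, the rule of thumb can be sketched as follows. The variable names are hypothetical, and only the twelve listed rows are used.

```python
# (x, y, standard residual) copied from the residual table.
rows = [
    (76500, 23, -0.102346097),
    (102000, 48, 0.906924006),
    (53000, 9, -0.259720474),
    (84200, 26, -0.193608631),
    (73000, 20, -0.203458396),
    (125000, 40, -0.946239962),
    (109000, 51, 0.862374431),
    (60000, 18, 0.230053966),
    (87000, 25, -0.407099584),
    (94000, 62, 2.471464614),
    (76000, 33, 0.788907244),
    (90000, 11, -1.751163484),
]

THRESHOLD = 2  # rule of thumb from the slide: |standard residual| > 2

# Keep the points whose standard residual exceeds the threshold in absolute value.
suspected = [(x, y) for x, y, st_resid in rows if abs(st_resid) > THRESHOLD]
print(suspected)
```

A flagged point is only a candidate for investigation. Whether it is a recording error, a point that does not belong in the sample, or a valid observation still has to be decided case by case.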

    Outliers: our example

    Olga Kaminer, 2009

    Procedure for Regression Diagnostics

    Develop a model that has a theoretical basis.

    Gather data for the two variables in the model.

    Draw the scatter diagram to determine whether a linear model appears to be appropriate. Identify possible outliers.

    Determine the regression equation.

    Calculate the residuals and check the required conditions (normality, homoscedasticity, independence).

    Assess the model's fit (t-test for the slope, the overall F-ratio, R²).

    If the model fits the data, use the regression equation to predict a particular value (confidence/prediction intervals) of the dependent variable.
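The computational steps of the procedure (regression equation, residuals, fit measures) can be sketched in plain Python. This is an illustration under stated assumptions, not the lecture's Excel workflow: the function `simple_regression`, its return keys, and the five data points are all hypothetical.

```python
import math


def simple_regression(x, y):
    """Least squares fit of y = b0 + b1*x with basic fit measures."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)        # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total variation

    r2 = 1 - sse / sst
    mse = sse / (n - 2)                         # error variance estimate
    t_slope = b1 / math.sqrt(mse / sxx)         # t statistic for H0: slope = 0
    f_ratio = (sst - sse) / mse                 # overall F (1 and n-2 d.f.)

    return {
        "b0": b0, "b1": b1, "residuals": residuals,
        "r2": r2, "t_slope": t_slope, "f_ratio": f_ratio,
    }


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_regression(x, y)
print({k: v for k, v in fit.items() if k != "residuals"})
```

In simple regression the overall F-ratio equals the square of the slope's t statistic, so the two tests agree. The residuals returned here are the inputs to the normality, constant-variance, and independence checks described earlier.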