Multi Linear Regression Handout 2x1


Automation Lab, IIT Bombay

CL 202: Introduction to Data Analysis
Linear and Nonlinear Regression

Sachin C. Patawardhan and Mani Bhushan

Department of Chemical Engineering

I.I.T. Bombay

31-Mar-16 Regression 1


    Outline

    Mathematical Models in Chemical Engineering

    Linear Regression Problem

Ordinary and Weighted Least Squares formulations through algebraic viewpoint and geometric interpretations

Ordinary and Weighted Least Squares formulations through probabilistic viewpoint

Ordinary Least Squares and Minimum Variance Estimation

Ordinary Least Squares and Maximum Likelihood Estimation

Confidence intervals for parameter estimates and hypothesis testing

Nonlinear regression problem: nonlinear in parameter models and maximum likelihood parameter estimation

    Examples of linear and nonlinear regression

    Appendix: Ordinary Least Squares and Cramer-Rao Bound

    31-Mar-16 Regression 2


    Mathematical Models

    Mathematical Model: mathematical description of a real

    physical process

    Used in all fields: biology, physiology, engineering, chemistry,

    biochemistry, physics, and economics

    Deterministic models: each variable and parameter can be

    assigned a definite fixed number or a series of fixed

    numbers, for any given set of conditions.

    Stochastic models: variables or parameters used to

    describe the input-output relationships and the structure of

    the elements (and the constraints) are not precisely known

    31-Mar-16 Regression 3


    Elements of a Model

    31-Mar-16 Regression 4

    Independent inputs (x)

    Output (y) (dependent variable)

    Parameters (θ)

    Transformation operator (T) Algebraic

    Differential

Mathematical model in abstract form: y = T(x₁, …, xₙ, θ₁, …, θₘ)


    Mathematical Models

    Models are used for

    Behavior Prediction/Analysis: Understand the influence of the

    independent inputs to a system on the observed system output

    System/process/material design

    Catalyst design, membrane design

    Equipment Design: sizing of processing equipment

    Flow-sheeting: deciding flow of material and energy in a

    chemical plant

    System / process operation: monitoring and control, safety and

    hazard analysis, abnormal behavior diagnosis

    31-Mar-16 Regression 5


    Models in Chemical Engineering

    Models popularly used in chemical engineering

    Transport phenomena based models: continuum equations

    describing the conservation of mass, momentum, and energy

    Population balance models: Residence time distributions

    (RTD) and other age distributions

    Empirical models based on data fitting: Typical example-

    polynomials used to fit empirical data, thermodynamic

    correlations, correlations based on dimensionless groups

    used in heat, mass and momentum transfer, transfer

    function models used in process control

    31-Mar-16 Regression 6


    Empirical Modeling

    Exact expression relating the dependent and the

    independent variable may not be known

The Weierstrass approximation theorem states that any continuous function on a closed interval can be approximated by a polynomial to an arbitrary degree of accuracy.

Invoking the Weierstrass theorem, the relationship between the dependent and independent variables is approximated as a polynomial.

The order of the polynomial used typically depends on the range of values over which the approximation is constructed.

    31-Mar-16 Regression 7
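As an illustration of this polynomial-approximation idea (not part of the handout), a minimal numpy sketch that fits polynomials of increasing degree to a continuous function; the function exp(x), the interval [0, 1], and the degrees are arbitrary illustrative choices:

```python
import numpy as np

# Approximate a continuous function (exp(x), chosen only for illustration)
# by polynomials of increasing degree on [0, 1]; the Weierstrass theorem
# guarantees the error can be driven arbitrarily small.
x = np.linspace(0.0, 1.0, 200)
y = np.exp(x)

for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    max_err = np.max(np.abs(np.polyval(coeffs, x) - y))
    print(f"degree {degree}: max abs error = {max_err:.2e}")
```

The maximum error drops rapidly with degree, which is why low-order polynomials often suffice over a narrow range of the independent variable.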

Empirical Modeling Examples

31-Mar-16 Regression 8

Temperature dependence of resistance:
R = a + bT for T₁ ≤ T ≤ T₂
R = α + βT + γT² for T₃ ≤ T ≤ T₄

Temperature dependence of c_p:
c_p = a + bT for T₁ ≤ T ≤ T₂
c_p = α + βT + γT² for T₃ ≤ T ≤ T₄

Boiling point of hydrocarbons in a homologous series as a function of the no. of carbon atoms (n):
T = a + bn + cn²
T = α + βn + γn² + δn³


    Empirical Modeling Examples

    31-Mar-16 Regression 9

Temperature and pressure dependence of reaction yield:
Y = a + bT + cP for T₁ ≤ T ≤ T₂ and P₁ ≤ P ≤ P₂
Y = a + bT + cP + dT² + eP² + fPT for T₃ ≤ T ≤ T₄ and P₃ ≤ P ≤ P₄

Dimensionless group based models in heat transfer:
Nu = a Reᵅ Prᵝ

Reaction rate equations (Arrhenius form):
rate = k₀ exp(−E/RT) C_A

Simplified VLE model:
y = αx / (1 + (α − 1)x)
x: liquid mole fraction, y: vapor mole fraction, α: relative volatility

Linear in Parameter Models

31-Mar-16 Regression 10

To begin with, we consider models that can be represented in the following abstract form:

y = θ₁ f₁(x) + θ₂ f₂(x) + … + θ_p f_p(x), x = [x₁ x₂ … x_m]ᵀ

Defining new variables zᵢ = fᵢ(x), and letting v denote a combined error arising from errors in modeling and errors in the measurement of y, we can write

y = θ₁z₁ + θ₂z₂ + … + θ_p z_p + v

Sources of error
• Measurement errors in the dependent variable (y)
• Modeling or approximation errors


    Linear Regression Problem

    31-Mar-16 Regression 11

For the class of models considered till now, the dependent variable is a linear function of the model parameters.

Linear function definition: g(αx⁽¹⁾ + βx⁽²⁾) = α g(x⁽¹⁾) + β g(x⁽²⁾)

Defining z = [z₁ z₂ … z_p]ᵀ and θ = [θ₁ θ₂ … θ_p]ᵀ, the model becomes

y = θ₁z₁ + θ₂z₂ + … + θ_p z_p + v = zᵀθ + v

Given data sets {y₁, y₂, …, y_n} and S_z: {z⁽¹⁾, z⁽²⁾, …, z⁽ⁿ⁾} generated from n independent experiments, and model equations

yᵢ = (z⁽ⁱ⁾)ᵀθ + vᵢ, i = 1, 2, …, n

estimate θ such that some scalar objective S(v₁, v₂, …, v_n) is minimized.

Linear Regression Problem

31-Mar-16 Regression 12

Choice of Objective Function

2-norm: ‖V‖₂ = (v₁² + … + v_n²)^(1/2), or the weighted version Σᵢ wᵢvᵢ², where wᵢ > 0 for all i

1-norm: ‖V‖₁ = |v₁| + … + |v_n|, or the weighted version Σᵢ wᵢ|vᵢ|

∞-norm: ‖V‖∞ = Maxᵢ |vᵢ|

In practice, the 2-norm based formulation is preferred over the other two choices because of

(a) amenability to analytical treatment

(b) ease of geometric interpretation

(c) ease of interpretation from the viewpoint of probability and statistics


    Model Parameter Estimation

    31-Mar-16 Regression 13

Consider estimation of the parameters θ = (a, b) of a simple linear model of the form

y = a + bx + v = a z₁ + b z₂ + v, with f₁(x) = 1 and f₂(x) = x

from data {y₁, y₂, …, y_n} and {x₁, x₂, …, x_n}:

y₁ = a + bx₁ + v₁   …(1)
y₂ = a + bx₂ + v₂   …(2)
…
yᵢ = a + bxᵢ + vᵢ   …(i)
…
y_n = a + bx_n + v_n   …(n)

In matrix notation, Y = Aθ + V, with

Y = [y₁ y₂ … y_n]ᵀ, A = [1 x₁; 1 x₂; …; 1 x_n] (n×2), θ = [a b]ᵀ, V = [v₁ v₂ … v_n]ᵀ

Model Parameter Estimation

Number of unknown variables = 2 (parameters) + n (errors)

Number of equations = n

Number of equations < number of unknowns

The system of linear equations has an infinite number of solutions.

To estimate the model parameters, we resort to optimization.

The necessary conditions for optimality provide 2 additional constraints so that the combined system of equations has a unique solution.

    31-Mar-16 Regression 14


    Model Parameter Estimation

    31-Mar-16 Regression 15

Defining a scalar function J(v₁, …, v_n), where vᵢ = yᵢ − a z₁ᵢ − b z₂ᵢ for i = 1, …, n, the necessary conditions for optimality are

∂J/∂a = 0 and ∂J/∂b = 0

Most commonly used scalar measures:

Ordinary least squares: J = Σᵢ vᵢ²

Weighted least squares: J = Σᵢ wᵢvᵢ² (wᵢ > 0 for all i)

Quadratic objective function
(a) leads to an analytical solution
(b) has a nice geometric interpretation
(c) facilitates interpretation and analysis through statistics

Ordinary Least Squares

31-Mar-16 Regression 16

Rules of differentiation of a scalar function with respect to a vector:

∂(yᵀBx)/∂x = Bᵀy,  ∂(xᵀBx)/∂x = (B + Bᵀ)x = 2Bx when B is symmetric

Using vector-matrix notation,

J = Σᵢ vᵢ² = VᵀV = (Y − Aθ)ᵀ(Y − Aθ)

The necessary condition for optimality becomes

∂J/∂θ = −2Aᵀ(Y − Aθ) = 0

⇒ θ̂_OLS = (AᵀA)⁻¹AᵀY,  Ŷ = Aθ̂_OLS
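The closed-form OLS solution can be checked numerically; a minimal numpy sketch with made-up data (the true values a = 2, b = 0.5 and the noise level are arbitrary illustrative choices):

```python
import numpy as np

# Simple linear model y = a + b*x + v, estimated by OLS using the
# normal-equation form theta_hat = (A^T A)^{-1} A^T Y derived above.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
a_true, b_true = 2.0, 0.5                        # made-up "true" parameters
y = a_true + b_true * x + rng.normal(scale=0.1, size=x.size)

A = np.column_stack([np.ones_like(x), x])        # design matrix rows [1, x_i]
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)    # solves the normal equations
print("a_hat, b_hat =", theta_hat)
```

In practice `np.linalg.lstsq` (QR/SVD based) is preferred over explicitly forming AᵀA, for numerical stability, but the normal-equation form mirrors the derivation on this slide.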


    Geometric Interpretations

    31-Mar-16 Regression 17

Assumption: True behavior Y = Aθ* + V, θ*: true parameters.

Defining Ŷ = Aθ̂, we have the estimated model residuals V̂ = Y − Ŷ = Y − Aθ̂.

Ŷ = Aθ̂ = â [1 1 … 1]ᵀ + b̂ [x₁ x₂ … x_n]ᵀ

Vector Ŷ lies in the column space of matrix A.

Geometric Interpretations

31-Mar-16 Regression 18

The necessary condition for optimality implies AᵀV̂ = Aᵀ(Y − Aθ̂) = 0, i.e. the vector V̂ is perpendicular to the column space of A.

Ŷ = A(AᵀA)⁻¹AᵀY = HY: a projection of the vector Y onto the column space of A.

H = A(AᵀA)⁻¹Aᵀ is known as the Hat (or projection) matrix.
Note: H is an idempotent matrix, i.e. H² = H.

V̂ = Y − Ŷ = (I − H)Y

Vector Y is split into two orthogonal components:
Ŷ: lying in the column space of A
V̂: orthogonal to the column space of A
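The projection-matrix facts above (H idempotent, V̂ perpendicular to the column space of A) can be verified numerically; a small numpy sketch on random made-up data:

```python
import numpy as np

# Verify: H = A (A^T A)^{-1} A^T is idempotent (H @ H == H), and the
# residual V_hat = (I - H) @ Y is orthogonal to col(A), i.e. A^T V_hat = 0.
rng = np.random.default_rng(1)
A = np.column_stack([np.ones(20), rng.normal(size=20)])  # made-up design
Y = rng.normal(size=20)                                  # made-up data

H = A @ np.linalg.inv(A.T @ A) @ A.T
Y_hat = H @ Y              # projection onto the column space of A
V_hat = Y - Y_hat          # orthogonal residual component

print(np.allclose(H @ H, H))           # idempotency check
print(np.allclose(A.T @ V_hat, 0.0))   # perpendicularity check
```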


    Geometric Interpretations

    31-Mar-16 Regression 19


    Ethanol-Water Example

    31-Mar-16 Regression 20

Experimental data: density and weight percent of ethanol in an ethanol-water mixture

Ref.: Ogunnaike, B. A., Random Phenomena, CRC Press, London, 2010


    Ethanol-Water Example

    31-Mar-16 Regression 21

Ref.: Ogunnaike, B. A., Random Phenomena, CRC Press, London, 2010


    Quadratic Polynomial Model

    31-Mar-16 Regression 22

Consider the model for temperature and pressure dependence of reaction yield:

Y = a + bT + cP + dT² + eP² + fPT + v

Defining z⁽ⁱ⁾ = [1 Tᵢ Pᵢ Tᵢ² Pᵢ² TᵢPᵢ]ᵀ and θ = [a b c d e f]ᵀ,

Yᵢ = (z⁽ⁱ⁾)ᵀθ + vᵢ for i = 1, 2, …, n

Data available: (Y₁, T₁, P₁), (Y₂, T₂, P₂), …, (Y_n, T_n, P_n)

Corresponding modeling equations:

Yᵢ = a + bTᵢ + cPᵢ + dTᵢ² + ePᵢ² + fTᵢPᵢ + vᵢ for i = 1, 2, …, n

A = [1 T₁ P₁ T₁² P₁² T₁P₁; …; 1 T_n P_n T_n² P_n² T_nP_n] (n×6), θ (6×1), V = [v₁ … v_n]ᵀ (n×1)
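Assembling the 6-column design matrix for this quadratic yield model can be sketched as follows; the T and P values below are made up purely for illustration:

```python
import numpy as np

# Build the n x 6 design matrix A for the quadratic yield model
# Y = a + b*T + c*P + d*T^2 + e*P^2 + f*T*P + v, one row per experiment.
T = np.array([300.0, 310.0, 320.0, 330.0, 340.0, 350.0, 360.0])  # made up
P = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])                # made up

# Row i is [1, T_i, P_i, T_i^2, P_i^2, T_i*P_i], matching theta = [a..f]^T.
A = np.column_stack([np.ones_like(T), T, P, T**2, P**2, T * P])
print(A.shape)
```

The model is nonlinear in T and P but still linear in the parameters, which is why the same OLS machinery applies unchanged.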


    Generalization of OLS

    31-Mar-16 Regression 23

Thus, consider estimation of the parameter vector θ of a general multi-linear model of the form

y = θ₁z₁ + θ₂z₂ + … + θ_p z_p + v

from data sets {y₁, y₂, …, y_n} and S_z: {z⁽¹⁾, z⁽²⁾, …, z⁽ⁿ⁾}. Collecting the model equations,

Y = Aθ + V

Y = [y₁ y₂ … y_n]ᵀ (n×1), A = [z₁⁽¹⁾ … z_p⁽¹⁾; z₁⁽²⁾ … z_p⁽²⁾; …; z₁⁽ⁿ⁾ … z_p⁽ⁿ⁾] (n×p), θ = [θ₁ … θ_p]ᵀ (p×1), V = [v₁ … v_n]ᵀ (n×1)

Weighted Least Square

31-Mar-16 Regression 24

Defining the weighting matrix W = diag[w₁ w₂ … w_n], with wᵢ > 0 for all i, the multilinear regression problem can be formulated as

Min_θ J = Σᵢ wᵢvᵢ² = VᵀWV subject to V = Y − Aθ

Using the necessary condition for optimality,

∂J/∂θ = −2AᵀW(Y − Aθ) = 0

⇒ θ̂_WLS = (AᵀWA)⁻¹AᵀWY,  Ŷ = Aθ̂_WLS

Selecting W = I_{n×n} reduces WLS to OLS.
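A minimal numpy sketch of the WLS formula above; the heteroscedastic data and the inverse-variance weights are assumed, illustrative choices (not from the handout):

```python
import numpy as np

# Weighted least squares: theta_hat = (A^T W A)^{-1} A^T W Y, with
# W = diag(w_1, ..., w_n), w_i > 0.  Made-up data: the second half of
# the measurements is noisier, so it gets smaller weights.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
sigma = np.where(x < 0.5, 0.05, 0.5)             # heteroscedastic noise levels
y = 1.0 + 3.0 * x + rng.normal(scale=sigma)      # true model y = 1 + 3x (made up)

A = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma**2)                      # weight = inverse variance

theta_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
theta_ols = np.linalg.solve(A.T @ A, A.T @ y)    # W = I recovers OLS
print("WLS:", theta_wls, " OLS:", theta_ols)
```

Choosing the weights as inverse variances anticipates the Gauss-Markov result later in the handout.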


    Example: Multi-linear Regression

    31-Mar-16 Regression 25

Laboratory experimental data on yield obtained from a catalytic process at various temperatures and pressures.

Fitted multi-linear model: Ŷ = θ̂₀ + θ̂₁T + θ̂₂P

Ref.: Ogunnaike, B. A., Random Phenomena, CRC Press, London, 2010


    Reactor Yield Data

    31-Mar-16 Regression 26


    Estimated Model

    31-Mar-16 Regression 27

Fitted multi-linear model: Ŷ = θ̂₀ + θ̂₁T + θ̂₂P


    Example: Multi-linear Regression

    31-Mar-16 Regression 28

    Boiling points of a series of hydrocarbons

Ref.: Ogunnaike, B. A., Random Phenomena, CRC Press, London, 2010


    Candidate Models

    31-Mar-16 Regression 29

[Figure: Boiling point (°C) vs n, no. of carbon atoms, showing Data, Linear Model T = 39*n − 170, and Quadratic Model T = −3*n² + 67*n − 220]

Linear Model: T = a + bn
Quadratic Model: T = α + βn + γn²

Unaddressed Issues

Model parameter estimates change if

the data set size n, the matrix A, and the vector Y change

matrix A is the same but only Y changes (due to measurement errors)

n is the same but a different set of input conditions, i.e. a different A matrix, is chosen

    How do we compare estimates generated through two

    independent sets of experiments?

    Can we come up with confidence intervals for ‘true’

    parameters?

    31-Mar-16 Regression 30


    Need for Statistical Approach

If we have multiple candidate models, how does one systematically select the most suitable model?

If the identified model is used for prediction, how do we quantify the uncertainties in the model predictions?

Linear algebra/optimization based treatment of the model parameter estimation problem does not help in answering these questions systematically.

Remedy: formulate and solve the parameter estimation problem using the framework of probability and statistics.

    31-Mar-16 Regression 31

Notations

31-Mar-16 Regression 32

Consider n independent random variables Y₁, Y₂, …, Y_n. A data set {y₁, y₂, …, y_n} is collected from n independent experiments, one for each Yᵢ, i.e. a set of realizations of {Y₁, Y₂, …, Y_n}.

θ*: true parameter vector

Model for RV Yᵢ relating Yᵢ and the random error Vᵢ: Yᵢ = (z⁽ⁱ⁾)ᵀθ* + Vᵢ
Model relating realizations of the RVs: yᵢ = (z⁽ⁱ⁾)ᵀθ* + vᵢ

Model for RV Yᵢ relating the parameter estimates θ̂ (an RV) and the model residuals V̂ᵢ: Yᵢ = (z⁽ⁱ⁾)ᵀθ̂ + V̂ᵢ
Model relating realizations of the RVs: yᵢ = (z⁽ⁱ⁾)ᵀθ̂ + v̂ᵢ


    Context Sensitive Notations

    31-Mar-16 Regression 33

     

Y = Aθ + V
Note: bold Y and V represent vectors of random variables, Y = [Y₁ Y₂ … Y_n]ᵀ and V = [V₁ V₂ … V_n]ᵀ, with A = [z₁⁽¹⁾ … z_p⁽¹⁾; z₁⁽²⁾ … z_p⁽²⁾; …; z₁⁽ⁿ⁾ … z_p⁽ⁿ⁾] (n×p) and θ = [θ₁ … θ_p]ᵀ (p×1).

Y = Aθ + V
Note: Y and V (ordinary) represent vectors of "realizations" of the random variables Y and V, i.e. Y = [y₁ y₂ … y_n]ᵀ and V = [v₁ v₂ … v_n]ᵀ.


    Notations

    31-Mar-16 Regression 34

     

Note:
θ*: true parameter vector (fixed, NOT a RV)
θ̂ (bold): parameter estimates (random variable vector)
θ̂ (ordinary): parameter estimates (a realization of θ̂)

A major simplifying assumption: the set S_z: {z⁽¹⁾, z⁽²⁾, …, z⁽ⁿ⁾} consists of perfectly known vectors, i.e. there are no errors in the measurements or knowledge of z⁽ⁱ⁾, i = 1, 2, …, n.


    Regression Problem Formulation

    31-Mar-16 Regression 35

Let us assume that the modeling error V = Y − (θ₁*z₁ + … + θ_p*z_p) is a zero mean RV with variance σ², i.e.

E[V] = 0 and Var[V] = σ²

It is assumed that z is a deterministic vector and known exactly. Thus,

E[Y] = θ₁*z₁ + … + θ_p*z_p

where (θ₁*, …, θ_p*) represent the true model parameters.

Note: At this stage NO assumption has been made about the form of the density F_V(v).

Regression Problem Formulation

31-Mar-16 Regression 36

Now consider a data set {(yᵢ, z⁽ⁱ⁾): i = 1, 2, …, n} generated from n independent experiments and the corresponding modeling equations

Yᵢ = θ₁*z₁⁽ⁱ⁾ + … + θ_p*z_p⁽ⁱ⁾ + Vᵢ, i = 1, 2, …, n

It is further assumed that the Vᵢ for i = 1, 2, …, n are independent and identically distributed.

Note: the RVs Yᵢ = θ₁*z₁⁽ⁱ⁾ + … + θ_p*z_p⁽ⁱ⁾ + Vᵢ for i = 1, 2, …, n are independent but are NOT identically distributed (their means differ with z⁽ⁱ⁾).


    Regression Problem Formulation

    31-Mar-16 Regression 37

     

Using the vector notation and collecting all model equations, we have

Y = Aθ + V

Y = [Y₁ … Y_n]ᵀ (n×1), A = [z₁⁽¹⁾ … z_p⁽¹⁾; …; z₁⁽ⁿ⁾ … z_p⁽ⁿ⁾] (n×p), θ = [θ₁ … θ_p]ᵀ (p×1), V = [V₁ … V_n]ᵀ (n×1)

Regression Problem Formulation

31-Mar-16 Regression 38

Since E[Vᵢ] = 0 for i = 1, 2, …, n, it implies that E[V] = 0_{n×1} and E[Y] = Aθ*.

Let R = Cov(V) = E[VVᵀ]. Since the Vᵢ are assumed to be IID, Var(Vᵢ) = σ² and E[VᵢVⱼ] = 0 for i ≠ j, and it follows that

R = E[VVᵀ] = σ²I_{n×n}

Problem: Estimate the "unknown constant" θ* from the measurements Y = Aθ* + V.


    Ordinary Least Squares

    3/31/2016 State Estimation 39

The ordinary least square (OLS) estimate of θ is obtained by minimizing the objective function

J = VᵀV = (Y − Aθ)ᵀ(Y − Aθ)

with respect to θ:

θ̂_OLS = (AᵀA)⁻¹AᵀY,  Ŷ = Aθ̂_OLS

Note: (1/n)Σᵢ vᵢ² = VᵀV/n (i.e. the sample variance). Thus, OLS can be viewed as an estimator that minimizes the sample variance of the modeling errors.

Is θ̂ an unbiased estimate of θ*?

Ordinary Least Squares

31-Mar-16 Regression 40

Since Y is a realization of the RV Y, it follows that θ̂_OLS can be viewed as a realization of the RV θ̂_OLS, i.e.

θ̂_OLS = (AᵀA)⁻¹AᵀY = (AᵀA)⁻¹Aᵀ(Aθ* + V) = θ* + (AᵀA)⁻¹AᵀV

Taking expectations on both sides,

E[θ̂_OLS] = θ* + (AᵀA)⁻¹AᵀE[V] = θ*

Thus θ̂_OLS is an unbiased estimate of θ*.
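The unbiasedness result E[θ̂_OLS] = θ* can be illustrated by Monte-Carlo simulation: average the OLS estimate over many independent noise realizations and compare with the true parameters (all numbers below are made up):

```python
import numpy as np

# Monte-Carlo check of unbiasedness: repeat the experiment with fresh
# noise V each time, and average theta_hat over the replications.
rng = np.random.default_rng(3)
A = np.column_stack([np.ones(30), np.linspace(0.0, 5.0, 30)])  # fixed design
theta_star = np.array([1.0, -2.0])                             # made-up theta*
ols_map = np.linalg.inv(A.T @ A) @ A.T                         # (A^T A)^{-1} A^T

estimates = []
for _ in range(2000):
    Y = A @ theta_star + rng.normal(scale=0.5, size=30)        # new V realization
    estimates.append(ols_map @ Y)

print("mean estimate:", np.mean(estimates, axis=0))
```

The mean of the estimates approaches θ* as the number of replications grows, while each individual estimate scatters around it.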


    Ordinary Least Squares

    31-Mar-16 Regression 41

Defining the matrix L = (AᵀA)⁻¹Aᵀ, we have θ̂_OLS = θ* + LV and

Cov(θ̂_OLS) = E[(θ̂_OLS − θ*)(θ̂_OLS − θ*)ᵀ] = L E[VVᵀ] Lᵀ = LRLᵀ = σ²(AᵀA)⁻¹

Difficulty: σ² is not known in practice. Remedy: estimate from samples,

σ̂² = V̂ᵀV̂/(n − p), where V̂ = Y − Aθ̂_OLS

Estimate of Cov(θ̂_OLS): σ̂²(AᵀA)⁻¹
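The sample-based covariance estimate σ̂²(AᵀA)⁻¹ can be sketched as follows; the data, the true parameters, and the assumed noise level σ = 0.2 are all made up for illustration:

```python
import numpy as np

# Estimate Cov(theta_hat) = sigma^2 (A^T A)^{-1}, replacing the unknown
# sigma^2 by sigma_hat^2 = V_hat^T V_hat / (n - p).
rng = np.random.default_rng(4)
n, p = 50, 2
A = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
Y = A @ np.array([0.5, 2.0]) + rng.normal(scale=0.2, size=n)  # made-up data

theta_hat = np.linalg.solve(A.T @ A, A.T @ Y)
V_hat = Y - A @ theta_hat                       # estimated residuals
sigma2_hat = (V_hat @ V_hat) / (n - p)          # degrees-of-freedom corrected
cov_hat = sigma2_hat * np.linalg.inv(A.T @ A)

print("sigma_hat    =", np.sqrt(sigma2_hat))
print("std. errors  =", np.sqrt(np.diag(cov_hat)))
```

The square roots of the diagonal of the covariance estimate are the standard errors later used for confidence intervals and hypothesis tests.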

Minimum Variance Estimator

31-Mar-16 Regression 42

Suppose we want to find an unbiased estimate θ̂ of an unknown θ* ∈ Rᵖ such that Cov(θ̂) = E[(θ̂ − θ*)(θ̂ − θ*)ᵀ] is as small as possible.

Given measurements Y ∈ Rⁿ and a model

Y = Aθ + V

where A is a known n×p matrix and V is a vector of random variables such that

E[V] = 0 and Cov(V) = E[VVᵀ] = R

Note: Here R is a symmetric and positive definite matrix.


    Minimum Variance Estimator

    3/31/2016 State Estimation 43

Taking clues from the OLS solution, let us propose a linear estimator of the form

θ̂ = LY, where L is a (p×n) matrix

Minimum Variance Parameter Estimation Problem: Find the matrix L such that θ̂ is unbiased, E[θ̂] = θ*, and Cov(θ̂) = E[(θ̂ − θ*)(θ̂ − θ*)ᵀ] is as small as possible.

Minimum Variance Estimator

3/31/2016 State Estimation 44

The unbiasedness requirement implies

E[θ̂] = E[LY] = E[L(Aθ* + V)] = LAθ* + L E[V]

Since E[V] = 0, the unbiasedness condition E[θ̂] = θ* will hold if and only if we choose L such that LA = I.

Minimum variance parameter estimation problem:

Min_L J = Cov(θ̂) subject to θ̂ = LY and LA = I


    Minimum Variance Estimator

    3/31/2016 State Estimation 45

     

To formulate an optimization problem, we need to construct a scalar objective function. Consider the scalar function

J = tr[Cov(θ̂)] = tr E[(θ̂ − θ*)(θ̂ − θ*)ᵀ] = Var(θ̂₁) + Var(θ̂₂) + … + Var(θ̂_p)

Thus, the problem of finding L that minimizes Cov(θ̂) can be formulated as

Min_L J = (1/2) tr E[(θ̂ − θ*)(θ̂ − θ*)ᵀ] subject to LA = I

Minimum Variance Estimator

3/31/2016 State Estimation 46

Since LA = I, θ̂ = LY = L(Aθ* + V) = θ* + LV, and it follows that

(θ̂ − θ*)(θ̂ − θ*)ᵀ = LVVᵀLᵀ

This is equivalent to finding the matrix L such that

J = (1/2) tr E[(θ̂ − θ*)(θ̂ − θ*)ᵀ] + tr[Λ(I − LA)]

is minimized w.r.t. L, where Λ represents the matrix of Lagrange multipliers.


    Minimum Variance Estimator

    3/31/2016 State Estimation 47

     

Since E[VVᵀ] = R, it follows that

E[(θ̂ − θ*)(θ̂ − θ*)ᵀ] = L E[VVᵀ] Lᵀ = LRLᵀ

Thus, the optimization problem can be reformulated as minimizing the objective function

J = (1/2) tr[LRLᵀ] + tr[Λ(I − LA)]

with respect to L.

Minimum Variance Estimator

3/31/2016 State Estimation 48

Using the results

∂ tr[BAC]/∂A = BᵀCᵀ and ∂ tr[ABAᵀ]/∂A = A(B + Bᵀ)

the necessary conditions for optimality are

∂J/∂L = LR − ΛᵀAᵀ = 0 and ∂J/∂Λ = I − LA = 0

Thus, we have

L = ΛᵀAᵀR⁻¹ and LA = ΛᵀAᵀR⁻¹A = I


    Minimum Variance Estimator

    3/31/2016 State Estimation 49

This implies Λᵀ = (AᵀR⁻¹A)⁻¹, and it follows that

L = (AᵀR⁻¹A)⁻¹AᵀR⁻¹

Thus the minimum variance estimator is

θ̂_MV = LY = (AᵀR⁻¹A)⁻¹AᵀR⁻¹Y

For the minimum variance estimator,

θ̂_MV = θ* + (AᵀR⁻¹A)⁻¹AᵀR⁻¹V

which implies

Cov(θ̂_MV) = E[(θ̂_MV − θ*)(θ̂_MV − θ*)ᵀ] = (AᵀR⁻¹A)⁻¹

Gauss Markov Theorem

31-Mar-16 Regression 50

Comparing the minimum variance estimator

θ̂_MV = LY = (AᵀR⁻¹A)⁻¹AᵀR⁻¹Y

with the weighted least square solution

θ̂_WLS = (AᵀWA)⁻¹AᵀWY

indicates that selecting W = R⁻¹ yields the MV estimator.

Gauss-Markov theorem: The minimum variance unbiased linear estimator is identical to the weighted least square estimator when the weighting matrix is selected as the inverse of the "measurement error" covariance matrix.
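The Gauss-Markov statement can be illustrated numerically: under heteroscedastic noise, the W = R⁻¹ estimator exhibits smaller parameter variance than plain OLS in simulation. All data below are made up for illustration:

```python
import numpy as np

# Monte-Carlo comparison of OLS vs the W = R^{-1} (minimum variance)
# estimator when Cov(V) = R is diagonal but NOT sigma^2 * I.
rng = np.random.default_rng(5)
n = 30
A = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
theta_star = np.array([1.0, 2.0])                # made-up true parameters
sigma = np.linspace(0.1, 1.0, n)                 # heteroscedastic error std devs
W = np.diag(1.0 / sigma**2)                      # W = R^{-1}

ols, mv = [], []
for _ in range(3000):
    Y = A @ theta_star + rng.normal(scale=sigma)
    ols.append(np.linalg.solve(A.T @ A, A.T @ Y))
    mv.append(np.linalg.solve(A.T @ W @ A, A.T @ W @ Y))

print("OLS slope variance:", np.var([t[1] for t in ols]))
print("MV  slope variance:", np.var([t[1] for t in mv]))
```

Both estimators are unbiased, but the inverse-covariance weighting concentrates the estimates much more tightly around θ*, as the theorem predicts.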


    Regression: OLS as MV Estimator

    3/31/2016 State Estimation 51

Returning to the regression problem, R = Cov(V) = σ²I. It follows that the OLS estimator is the minimum variance estimator, i.e.

θ̂_MV = (Aᵀ(σ²I)⁻¹A)⁻¹Aᵀ(σ²I)⁻¹Y = (AᵀA)⁻¹AᵀY = θ̂_OLS

Cov(θ̂_MV) = Cov(θ̂_OLS) = σ²(AᵀA)⁻¹

Any other linear unbiased estimator of θ, say θ̃ = L̃Y with L̃A = I, will have

tr[Cov(θ̃)] ≥ tr[Cov(θ̂_MV)]

Insights

OLS is an unbiased parameter estimator. The variance of the errors in the parameter estimates can be reduced by increasing the sample size.

The OLS estimator can be viewed as an estimator

that minimizes the sample variance of the model residuals

that yields the parameter estimates with the minimum possible variance (the most efficient linear estimator)

    This is how far we can go without making any assumption

    about the distribution of the model residuals.

    31-Mar-16 Regression 52


Need to Choose Distribution

For selecting a suitable 'black-box' model that explains the data best from among candidate models, we need to test the hypothesis of whether an estimated model coefficient is 'close to zero' or 'not close to zero', i.e. whether the associated term in the model can be retained or neglected.

We need to generate confidence intervals for the true model parameters.

We need to use the estimated model for carrying out predictions.

Thus, we cannot proceed further unless we select a suitable distribution for the model residuals.

    31-Mar-16 Regression 53

Example: Global Temperature Rise

31-Mar-16 Regression 54

[Figure: Global temperature deviation vs year (1850-2000), showing the data together with the fitted linear and quadratic models.]

Models developed using OLS:

Linear: Y = θ₀ + θ₁t + V
Quadratic: Y = θ₀ + θ₁t + θ₂t² + V


Statistics

Linear model:

 θ̂_OLS = (AᵀA)⁻¹AᵀY = [-7.8187, 4.168×10⁻³]ᵀ

Quadratic model:

 θ̂_OLS = (AᵀA)⁻¹AᵀY  (three parameters; the coefficient of the quadratic term is θ̂₃ = 3.3053×10⁻⁵)

31-Mar-16 Regression 55

Example: Global Temperature Rise

[Histograms of the normalized residuals of the linear model and of the quadratic model; both are roughly bell shaped over the range -3 to +3]

Normalized residual: ṽᵢ = v̂ᵢ/σ̂, where σ̂ is the sample standard deviation of the residuals.

31-Mar-16 Regression 56


Choice of Distribution

 Least Squares (LS) estimation penalizes the square of the deviations from zero error (i.e. the mean)

 Thus, it 'favors' errors close to zero

 Moreover, positive and negative errors of equal magnitude are 'equally penalized'

 Consequence: histograms of the model residuals are approximately bell shaped in most LS estimation

 Thus, it is reasonable to assume that the model residuals have a Gaussian/normal distribution

 This choice also follows from a generalized version of the Central Limit Theorem

31-Mar-16 Regression 57

Regression Problem Reformulation

Up till now, it is assumed that each

 Yᵢ = θ₁ z₁,ᵢ + θ₂ z₂,ᵢ + ... + θ_p z_p,ᵢ + Vᵢ  for i = 1, 2, ..., n

where the Vᵢ are independent and identically distributed. No distribution was specified for Vᵢ.

Now we additionally assume that each Vᵢ is Gaussian, i.e.

 Vᵢ ~ N(0, σ²)  for i = 1, 2, ..., n

or, in other words, V = [V₁ V₂ ... V_n]ᵀ ~ N(0, σ²I)

31-Mar-16 Regression 58


Gaussian Assumption: Visualization

[Figure: modeling error densities visualized as Gaussian curves centered on the true regression line]

Ref.: Ogunnaike, B. A., Random Phenomenon, CRC Press, London, 2010

31-Mar-16 Regression 59

Consequences of Gaussianity

Under the assumption that the Vᵢ are normal and i.i.d., we can construct the likelihood function for the unknown parameters θ as follows:

 L(θ) = f(v₁, v₂, ..., v_n | θ) = N(v₁|θ) N(v₂|θ) ... N(v_n|θ)
      = (2πσ²)^(-n/2) exp[ -(1/(2σ²)) Σᵢ₌₁ⁿ vᵢ² ]

where vᵢ = yᵢ - (θ₁ z₁,ᵢ + ... + θ_p z_p,ᵢ)

Taking the logarithm,

 ln L(θ) = -(n/2) ln(2π) - (n/2) ln(σ²) - (1/(2σ²)) Σᵢ₌₁ⁿ vᵢ²

31-Mar-16 Regression 60


Maximum Likelihood Estimation

Alternatively, in vector-matrix notation,

 L(θ) = f(V|θ) = (2πσ²)^(-n/2) exp[ -(1/(2σ²)) VᵀV ]
      = (2πσ²)^(-n/2) exp[ -(1/(2σ²)) (Y - Aθ)ᵀ(Y - Aθ) ]

 ln L(θ) = -(n/2) ln(2π) - (n/2) ln(σ²) - (1/(2σ²)) (Y - Aθ)ᵀ(Y - Aθ)

Necessary condition for optimality (in vector-matrix notation):

 ∂[ln L(θ)]/∂θ = 0  ⇒  Aᵀ(Y - Aθ) = 0

31-Mar-16 Regression 61

OLS as ML Estimator

Thus, the maximum likelihood point estimate of θ is

 θ̂_ML = (AᵀA)⁻¹AᵀY = θ̂_OLS

Let Y represent a realization of the RV Y. Then, it follows that

 E[θ̂_ML] = θ* ,  Cov(θ̂_ML) = σ²(AᵀA)⁻¹  and  θ̂_ML ~ N(θ*, σ²(AᵀA)⁻¹)

Thus, if we assume that the modeling errors are i.i.d. samples from the Gaussian distribution, then the OLS estimator turns out to be identical to the Maximum Likelihood (ML) estimator.

31-Mar-16 Regression 62
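The OLS-ML equivalence can be checked numerically. The following is a minimal NumPy sketch on simulated data (the model, noise level, and seed are illustrative assumptions, not from the slides): the normal-equation solution is computed directly, and the Gaussian log-likelihood is shown to decrease under any perturbation of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated linear-in-parameter data: Y = 2 + 0.5*z + V, V ~ N(0, 0.1^2)
n = 100
z = np.linspace(0.0, 10.0, n)
A = np.column_stack([np.ones(n), z])      # regressor matrix
theta_true = np.array([2.0, 0.5])
y = A @ theta_true + rng.normal(0.0, 0.1, n)

# OLS estimate from the normal equations: theta = (A'A)^{-1} A'Y
theta_ols = np.linalg.solve(A.T @ A, A.T @ y)

def log_likelihood(theta, sigma2=0.01):
    """Gaussian log-likelihood of the model residuals."""
    v = y - A @ theta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (v @ v) / sigma2

# theta_ols maximizes the likelihood: any perturbation lowers it
for d in ([0.01, 0.0], [0.0, 0.01], [-0.01, 0.01]):
    assert log_likelihood(theta_ols + np.array(d)) < log_likelihood(theta_ols)

print(theta_ols)  # close to [2.0, 0.5]
```

Because the log-likelihood differs from the sum of squared residuals only by a negative scaling and an additive constant, the same θ̂ maximizes one and minimizes the other.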


Consequences of Gaussianity

From the properties of Gaussian RVs, it follows that

 θ̂_OLS = θ̂_MV = θ̂_ML = (AᵀA)⁻¹AᵀY = θ* + (AᵀA)⁻¹AᵀV

is a Gaussian random vector with

 E[θ̂] = θ*  and  Cov(θ̂) = σ²(AᵀA)⁻¹

Let us define the matrix P = (AᵀA)⁻¹ and let pᵢᵢ denote the i'th diagonal element of P.

From the properties of the multivariate Gaussian RV, it follows that the marginal pdf of θ̂ᵢ is univariate normal:

 θ̂ᵢ ~ N(θᵢ*, σ² pᵢᵢ)

31-Mar-16 Regression 63

Confidence Intervals on Parameters

Since θ̂ᵢ ~ N(θᵢ*, σ² pᵢᵢ), in principle we can construct the confidence intervals on θᵢ* as

 P( |θ̂ᵢ - θᵢ*| / (σ√pᵢᵢ) ≤ z_{a/2} ) = 1 - a

Difficulty: σ² is unknown.

Remedy: estimate σ² using the model residuals,

 σ̂² = V̂ᵀV̂/(n - p)

and use it to construct the CI.

Question: what is the distribution of (θ̂ᵢ - θᵢ*)/(σ̂√pᵢᵢ)?

31-Mar-16 Regression 64


Confidence Intervals on Parameters

Consider V = Y - Aθ* and V̂ = Y - Aθ̂. Then

 V = V̂ + A(θ̂ - θ*)
 VᵀV = V̂ᵀV̂ + (θ̂ - θ*)ᵀAᵀA(θ̂ - θ*) + 2 V̂ᵀA(θ̂ - θ*)

Since V̂ is orthogonal to the column space of A, it follows that V̂ᵀA(θ̂ - θ*) = 0, and

 VᵀV = V̂ᵀV̂ + (θ̂ - θ*)ᵀAᵀA(θ̂ - θ*)

Thus, from the properties of χ² RVs, it follows that

 VᵀV/σ² ~ χ²(n)  and  (θ̂ - θ*)ᵀAᵀA(θ̂ - θ*)/σ² ~ χ²(p)  ⇒  V̂ᵀV̂/σ² ~ χ²(n - p)

31-Mar-16 Regression 65

Confidence Intervals on Parameters

Now σ̂² = V̂ᵀV̂/(n - p), where V̂ = Y - Aθ̂. Thus, it follows that

 (n - p)σ̂²/σ² = V̂ᵀV̂/σ² = Σᵢ₌₁ⁿ v̂ᵢ²/σ² ~ χ²(n - p)

and

 T = (θ̂ᵢ - θᵢ*)/(σ̂√pᵢᵢ) = Z / √( χ²(n - p)/(n - p) ) ~ t(n - p)

Thus, the (1 - α)100% confidence interval for θᵢ* is

 P( θ̂ᵢ - t_{α/2, n-p} σ̂√pᵢᵢ ≤ θᵢ* ≤ θ̂ᵢ + t_{α/2, n-p} σ̂√pᵢᵢ ) = 1 - α

31-Mar-16 Regression 66
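The interval above is straightforward to compute with NumPy/SciPy. This is a minimal sketch on simulated data (model, noise level, and seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data: Y = 1 + 2*z + V, V ~ N(0, 0.5^2)
n = 60
z = np.linspace(0.0, 5.0, n)
A = np.column_stack([np.ones(n), z])
y = A @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.5, n)

p = A.shape[1]
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)
v_hat = y - A @ theta_hat
sigma2_hat = (v_hat @ v_hat) / (n - p)          # residual variance estimate
P = np.linalg.inv(A.T @ A)                      # P = (A'A)^{-1}

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)   # t_{alpha/2, n-p}
half_width = t_crit * np.sqrt(sigma2_hat * np.diag(P))
ci = np.column_stack([theta_hat - half_width, theta_hat + half_width])
print(ci)  # row i: 95% CI for parameter i
```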


Example: Global Warming

Linear model:

 θ̂_OLS = [-7.8187, 4.168×10⁻³]ᵀ ;  σ̂² = 1.8208×10⁻² ;  P = (AᵀA)⁻¹

 σ̂²(θ̂₂) = σ̂² p₂₂ = 1.8208×10⁻² × 4.19×10⁻⁶ = (2.763×10⁻⁴)²

Thus, the 95% confidence interval for θ₂*:

 P( 4.168×10⁻³ - t_{0.025,140} × 2.763×10⁻⁴ ≤ θ₂* ≤ 4.168×10⁻³ + t_{0.025,140} × 2.763×10⁻⁴ ) = 0.95
 P( 3.622×10⁻³ ≤ θ₂* ≤ 4.714×10⁻³ ) = 0.95   (t_{0.025,140} = 1.977)

31-Mar-16 Regression 67

Hypothesis Testing

While developing a black box model from data, we are often not clear about the terms to be included in the model. For example, for the global temperature data, should we develop a linear model or a quadratic model?

To measure the importance of the contribution of the i'th component of θ* to

 E[Y] = θ₁* z₁ + ... + θ_p* z_p

we can test the hypothesis

 Null hypothesis H₀: θᵢ* = 0
 Alternate hypothesis H₁: θᵢ* ≠ 0

31-Mar-16 Regression 68


Hypothesis Testing

If H₀ is true, then

 T = θ̂ᵢ/(σ̂√pᵢᵢ) ~ t(n - p)

and, at level of significance α, the test of H₀ is:

 Reject H₀ if |k| > t_{α/2, n-p}
 Accept H₀ (i.e. we fail to reject H₀) otherwise

where k is the observed value of the test statistic T. Then

 p-value = 2 P( T(n - p) ≥ |k| )

31-Mar-16 Regression 69
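The coefficient t-test is easy to script. A minimal sketch (simulated data and seed are illustrative assumptions): data are generated from a purely linear model, a quadratic model is fitted, and the significance of the quadratic coefficient is tested.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Data generated by a purely linear model: Y = 1 + 2*z + V
n = 50
z = np.linspace(0.0, 4.0, n)
y = 1.0 + 2.0 * z + rng.normal(0.0, 0.3, n)

# Fit a quadratic model and test H0: theta_3 = 0 (the z^2 term)
A = np.column_stack([np.ones(n), z, z ** 2])
p = A.shape[1]
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)
v_hat = y - A @ theta_hat
sigma_hat = np.sqrt(v_hat @ v_hat / (n - p))
P = np.linalg.inv(A.T @ A)

k = theta_hat[2] / (sigma_hat * np.sqrt(P[2, 2]))  # observed test statistic
p_value = 2 * stats.t.sf(abs(k), df=n - p)         # two-sided p-value
print(k, p_value)  # the quadratic term should usually not be significant here
```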

Example: Global Warming

We are interested in finding whether inclusion of the quadratic term is contributing to the mean of Y.

Quadratic model: θ̂_OLS and P = (AᵀA)⁻¹ are computed from the data; the coefficient of the quadratic term is θ̂₃ = 3.3053×10⁻⁵.

 Null hypothesis H₀: θ₃* = 0
 Alternate hypothesis H₁: θ₃* ≠ 0

31-Mar-16 Regression 70


Hypothesis Testing

If H₀ is true, then T = θ̂₃/(σ̂√p₃₃) ~ t(139). Let the level of significance be α = 0.05.

Test statistic:

 k = θ̂₃/(σ̂√p₃₃) = 4.705

Since |k| = 4.705 > t_{0.025,139} = 1.9772, we reject H₀.

 p-value = 2 P( T(139) ≥ 4.705 ) ≈ 0

Thus, there is strong evidence that the quadratic term contributes to the correlation.

31-Mar-16 Regression 71

Mean Response

Consider the model Yᵢ = zᵢᵀθ* + Vᵢ. Suppose we select z = z₀ and collect samples of Y at z₀. Since Y is an RV, we will get samples {y₀,₁, y₀,₂, ..., y₀,m}, where

 y₀,ⱼ = z₀ᵀθ* + v₀,ⱼ

Question: for fixed z = z₀, what is the mean of the RV Y₀?

 E[Y₀ | z₀] = z₀ᵀθ*

Since we do not know θ*, an estimate of E[Y₀] can be constructed using θ̂ as follows:

 Ŷ₀ = z₀ᵀθ̂

31-Mar-16 Regression 72


Mean Response

Since θ̂ is a random vector, Ŷ₀ = z₀ᵀθ̂ is a random variable. Is Ŷ₀ an unbiased estimate of E[Y₀]?

 E[Ŷ₀] = z₀ᵀ E[θ̂] = z₀ᵀθ*

Thus we have

 Var(Ŷ₀) = E[(Ŷ₀ - z₀ᵀθ*)²] = z₀ᵀ Cov(θ̂) z₀ = σ² z₀ᵀ(AᵀA)⁻¹z₀ = σ² z₀ᵀPz₀

Since θ̂ is a Gaussian RV, it follows that

 Ŷ₀ ~ N( z₀ᵀθ*, σ² z₀ᵀPz₀ )

31-Mar-16 Regression 73

Mean Response

Since, in practice, we rarely know the true σ², an estimate of Var(Ŷ₀) can be computed as follows:

 σ̂²(Ŷ₀) = σ̂² z₀ᵀPz₀ ,  where σ̂² = V̂ᵀV̂/(n - p) and V̂ = Y - Aθ̂

Then

 T = (Ŷ₀ - E[Y₀]) / (σ̂√(z₀ᵀPz₀)) = Z / √( χ²(n - p)/(n - p) ) ~ t(n - p)

Thus, the (1 - α)100% confidence interval on the true mean response E[Y₀] is

 P( Ŷ₀ - t_{α/2, n-p} σ̂√(z₀ᵀPz₀) ≤ E[Y₀] ≤ Ŷ₀ + t_{α/2, n-p} σ̂√(z₀ᵀPz₀) ) = 1 - α

31-Mar-16 Regression 74


Future Response

Apart from determining a single value to predict a response, we are interested in finding a prediction interval that, with a given degree of confidence, will contain the response.

Given the model Y₀ = z₀ᵀθ* + V₀ at a fixed z = z₀, in some situations we are interested in predicting Y₀.

Since we do not know θ*, an estimate of Y₀ can be constructed using θ̂ as follows:

 Ŷ₀ = z₀ᵀθ̂

31-Mar-16 Regression 75

Future Response

Consider Y₀ - Ŷ₀. Since

 Y₀ ~ N(z₀ᵀθ*, σ²) ,  Ŷ₀ ~ N(z₀ᵀθ*, σ² z₀ᵀPz₀) ,  V₀ ~ N(0, σ²)

and Y₀ (i.e. the future response) is independent of the past data {y₁, y₂, ..., y_n} used to obtain θ̂, it follows that

 Y₀ - Ŷ₀ ~ N( 0, σ²(1 + z₀ᵀPz₀) )

31-Mar-16 Regression 76


Future Response

Thus,

 T = (Y₀ - Ŷ₀) / (σ̂√(1 + z₀ᵀPz₀)) = Z / √( χ²(n - p)/(n - p) ) ~ t(n - p)

and for any α we have

 P( Ŷ₀ - t_{α/2, n-p} σ̂√(1 + z₀ᵀPz₀) ≤ Y₀ ≤ Ŷ₀ + t_{α/2, n-p} σ̂√(1 + z₀ᵀPz₀) ) = 1 - α

31-Mar-16 Regression 77

Prediction Interval

A (1 - α)100% prediction interval for the future response Y₀ at z = z₀ is:

 Ŷ₀ ± t_{α/2, n-p} σ̂√(1 + z₀ᵀPz₀)

Recall: the (1 - α)100% confidence interval on the mean response at z = z₀ (i.e. on E[Y₀]) is

 Ŷ₀ ± t_{α/2, n-p} σ̂√(z₀ᵀPz₀)

31-Mar-16 Regression 78


CI and PI

Difference between confidence interval (CI) and prediction interval (PI):

 Confidence interval (CI) is on a fixed parameter of interest (like E[Y₀])
 Prediction interval (PI) is on a random variable (like Y₀)

At any z₀, the prediction interval on the future response is wider than the confidence interval on the mean response.

31-Mar-16 Regression 79
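Both intervals can be computed side by side; the extra "1 +" under the square root is what makes the PI wider. A minimal sketch (simulated data, point z₀, and seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated data: Y = 5 + 1.5*z + V, V ~ N(0, 1)
n = 40
z = np.linspace(0.0, 10.0, n)
A = np.column_stack([np.ones(n), z])
y = A @ np.array([5.0, 1.5]) + rng.normal(0.0, 1.0, n)

p = A.shape[1]
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)
v_hat = y - A @ theta_hat
sigma_hat = np.sqrt(v_hat @ v_hat / (n - p))
P = np.linalg.inv(A.T @ A)
t_crit = stats.t.ppf(0.975, df=n - p)

z0 = np.array([1.0, 7.0])                      # regressor vector at the new point
y0_hat = z0 @ theta_hat                        # point prediction
q = z0 @ P @ z0                                # z0' P z0

ci_half = t_crit * sigma_hat * np.sqrt(q)      # CI half-width (mean response)
pi_half = t_crit * sigma_hat * np.sqrt(1 + q)  # PI half-width (future response)
assert pi_half > ci_half                       # the PI is always wider than the CI
print(y0_hat, ci_half, pi_half)
```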

Mileage Related to Engine Displacement
(Montgomery and Runger, 2003)

Consider the mileage (y, miles/gallon) and engine displacement (x, inch³) data for various cars. An expert car engineer insists that the mileage is related to displacement as: Y = mx + c

Proposed model: Y = c + mx + V

Estimated model parameters: θ̂ = [ĉ, m̂]ᵀ = (AᵀA)⁻¹AᵀY, with P = (AᵀA)⁻¹ computed from the data.

31-Mar-16 Regression 80


Mileage Related to Engine Displacement

[Figure: scatter of gasoline mileage y vs. engine displacement x, with the regression model, the CI for the mean response, and the prediction interval]

Note: the PI is narrowest at x = x̄ and increases as we move away from x̄.

31-Mar-16 Regression 81

Assessing Quality of Fit

How do we assess whether the fitted model is able to adequately explain the response variable Y?

Variability Analysis: given the data set {(zᵢ, yᵢ) : i = 1, 2, ..., n},

 SS_Y = Σᵢ₌₁ⁿ (yᵢ - ȳ)² = Σᵢ (yᵢ - ŷᵢ + ŷᵢ - ȳ)²
      = Σᵢ (yᵢ - ŷᵢ)² + Σᵢ (ŷᵢ - ȳ)² + 2 Σᵢ (yᵢ - ŷᵢ)(ŷᵢ - ȳ)
      = SS_E + SS_R   (Residuals + Regression)

where v̂ᵢ = yᵢ - ŷᵢ.

31-Mar-16 Regression 82


Variability Analysis

Note: models used in multilinear regression are typically of the form

 Yᵢ = θ₁ + θ₂ z₂,ᵢ + ... + θ_p z_p,ᵢ + Vᵢ  for i = 1, 2, ..., n

In OLS, the necessary condition for optimality,

 AᵀAθ̂ - AᵀY = 0 , i.e. Aᵀ(Y - Aθ̂) = AᵀV̂ = 0

implies that the residual vector V̂ = Y - Ŷ is orthogonal to the columns of A.

31-Mar-16 Regression 83

Variability Analysis

Since the model includes a constant term, the first column of A is 1 = [1 1 ... 1]ᵀ. Thus, the necessary condition for optimality, AᵀV̂ = 0, includes the constraint

 1ᵀV̂ = Σᵢ₌₁ⁿ v̂ᵢ = Σᵢ₌₁ⁿ (yᵢ - ŷᵢ) = 0

Total variability = {variability left unexplained} + {variability captured by the regression model}

 SS_Y = SS_E + SS_R

31-Mar-16 Regression 84


Variability Analysis

The variability left unexplained is also the variability of the residuals and is denoted as SS_E or SS_Res.

A good measure of the quality of the fit is

 R² = SS_R/SS_Y = 1 - SS_E/SS_Y

R² quantifies the proportion of the variability in the response variable explained by the input variable.

R² is called the coefficient of determination (a direct measure of the quality of fit).

A good fit should result in high R².

31-Mar-16 Regression 85

Variability Analysis

 R² = (Variation in Y explained by regression) / (Total observed variation in Y)

Note: 0 ≤ R² ≤ 1

A coefficient of determination close to 1 indicates that the model adequately captures the relevant information contained in the data.

Conversely, a coefficient of determination close to 0 indicates a model that is inadequate to capture the relevant information contained in the data.

In general, it is possible to improve R² by introducing additional parameters in a model. However, note that the improved R² can be, at times, misleading.

31-Mar-16 Regression 86


Variability Analysis

An alternate measure of model fit:

 R²_adj = 1 - [SS_E/(n - p)] / [SS_Y/(n - 1)]

 SS_E/(n - p): residual mean square
 SS_Y/(n - 1): this term remains constant regardless of the no. of variables in the model

R²_adj penalizes a model that improves R² through the inclusion of more parameters.

Relatively high and comparable values of R² and R²_adj indicate that the variability in the data has been captured adequately without using an excessive number of parameters.

31-Mar-16 Regression 87
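The two measures are one-liners once SS_E and SS_Y are available. A minimal sketch (the fitted example data and seed are illustrative assumptions):

```python
import numpy as np

def r_squared(y, y_hat, p):
    """Return (R^2, adjusted R^2) for predictions y_hat of observations y;
    p is the number of estimated parameters (including the intercept)."""
    n = len(y)
    ss_e = np.sum((y - y_hat) ** 2)        # variability left unexplained
    ss_y = np.sum((y - np.mean(y)) ** 2)   # total variability
    r2 = 1.0 - ss_e / ss_y
    r2_adj = 1.0 - (ss_e / (n - p)) / (ss_y / (n - 1))
    return r2, r2_adj

# Usage on a small OLS fit
rng = np.random.default_rng(4)
z = np.linspace(0.0, 5.0, 30)
y = 2.0 + 3.0 * z + rng.normal(0.0, 0.5, 30)
A = np.column_stack([np.ones(30), z])
theta = np.linalg.solve(A.T @ A, A.T @ y)
r2, r2_adj = r_squared(y, A @ theta, p=2)
print(r2, r2_adj)  # both close to 1 for this nearly linear data
```

Note that R²_adj ≤ R² always, since (n - 1)/(n - p) ≥ 1.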

Variability Analysis: Examples

Gasoline mileage example:

 SS_Y = 1237.54 , SS_R = 955.72 ⇒ R² = 0.77

Global Warming example:

 Linear model: SS_Y = 6.6934 , SS_E = 2.5491 ⇒ R² = 0.6192 , R²_adj = 0.6164
 Quadratic model: SS_Y = 6.6934 , SS_E = 2.1989 ⇒ R² = 0.6715 , R²_adj = 0.6668

Note: 0 ≤ R² ≤ 1

31-Mar-16 Regression 88



    Example: Multi-linear Regression

    31-Mar-16 Regression 89

    Boiling points of a series of hydrocarbons

    Ref.: Ogunnaike, B. A., Random Phenomenon, CRC Press, London, 2010

Candidate Models

[Figure: boiling point (°C) vs. n, the number of carbon atoms, showing the data with both fits; legend: Linear Model T = 39n - 170, Quadratic Model T = -3n² + 67n - 220]

 Linear model: T = a + bn
 Quadratic model: T = α + βn + γn²

31-Mar-16 Regression 90


Raw Model Residues

[Figure: model residuals v(k) (°C) vs. n, the number of carbon atoms, for the quadratic model]

Quadratic model: T = -218.1429 + 66.6667 n - 3.0238 n² + v(n)

 σ̂ = 6.3373

31-Mar-16 Regression 91

Confidence Interval

 P = (AᵀA)⁻¹ =
  [  1.9464  -0.9107   0.0893
    -0.9107   0.5060  -0.0536
     0.0893  -0.0536   0.0060 ]

Thus, the (1 - α)100% confidence intervals for θᵢ*, with α = 0.05 and t_{0.025,5} = 2.5706:

 Parameter 1: (-240.871, -195.415)
 Parameter 2: (55.0795, 78.2547)
 Parameter 3: (-4.2807, -1.7670)

To measure the importance of the contribution of the 3rd component of θ* to the quadratic model, we can test the hypothesis H₀: θ₃* = 0.

31-Mar-16 Regression 92


Hypothesis Testing

 Null hypothesis H₀: θ₃* = 0
 Alternate hypothesis H₁: θ₃* ≠ 0

If H₀ is true, then T = θ̂₃/(σ̂√p₃₃) ~ t(5), and at level of significance α = 0.01, the test of H₀ is:

 Reject H₀ if |k| > t_{0.005,5} = 4.0321
 Fail to reject H₀ otherwise

Test statistic:

 k = θ̂₃/(σ̂√p₃₃) = -3.0238/0.4889 = -6.1845

Since |k| = 6.1845 > 4.0321, the null hypothesis is rejected.

31-Mar-16 Regression 93

Hypothesis Testing

k = -6.1845 is the observed value of the test statistic.

 p-value = 2 P( T(5) ≥ 6.1845 ) = 0.0016

Note: p-value < level of significance (0.01)

Thus, there is strong evidence that the quadratic term contributes to the correlation between the boiling point and the carbon number.

Coefficient of determination:

 Linear model: R² = 0.974 , R²_adj = 0.9698
 Quadratic model: R² = 0.997 , R²_adj = 0.9958

31-Mar-16 Regression 94


Analysis of Residuals

[Figure: normalized model residuals v(k) vs. n, the number of carbon atoms, for both models]

 Linear model: the normalized residuals show a pattern.
 Quadratic model: the normalized residuals are randomly spread between +/- 2.

31-Mar-16 Regression 95

Example: Multi-linear Regression

Laboratory experimental data on Yield obtained from a catalytic process at various temperatures and pressures (n = 32)

Fitted multi-linear model:

 ŷ = 75.9 + 0.0757 x₁ + 3.21 x₂

Ref.: Ogunnaike, B. A., Random Phenomenon, CRC Press, London, 2010

31-Mar-16 Regression 96


Raw Model Residues

 θ̂ = [75.866, 0.0757, 3.2120]ᵀ ;  σ̂ = 0.9415

[Figure: model residuals v(k) vs. sample number, spread between roughly -2.5 and +2.5]

31-Mar-16 Regression 97

Confidence Interval

 P = (AᵀA)⁻¹ =
  [  9.6437  -0.0925  -0.6500
    -0.0925   0.0010   0.0000
    -0.6500   0.0000   0.4000 ]

Thus, the (1 - α)100% confidence intervals for θᵢ*, with α = 0.05 and t_{0.025,29} = 2.0452:

 Parameter 1: (69.89, 81.85)
 Parameter 2: (0.0150, 0.1370)
 Parameter 3: (1.9941, 4.4300)

To measure the importance of the contribution of the 2nd component of θ* to the proposed model, we can test the hypothesis H₀: θ₂* = 0.

31-Mar-16 Regression 98


Hypothesis Testing

 Null hypothesis H₀: θ₂* = 0
 Alternate hypothesis H₁: θ₂* ≠ 0

If H₀ is true, then T = θ̂₂/(σ̂√p₂₂) ~ t(29), and at level of significance α = 0.05, the test of H₀ is:

 Reject H₀ if |k| > t_{0.025,29} = 2.0452
 Fail to reject H₀ otherwise

Test statistic:

 k = θ̂₂/(σ̂√p₂₂) = 0.0757/0.0298 = 2.5439

Since |k| = 2.5439 > 2.0452, the null hypothesis is rejected.

31-Mar-16 Regression 99

Hypothesis Testing

k = 2.5439 is the observed value of the test statistic.

 p-value = 2 P( T(29) ≥ 2.5439 ) = 0.0166

Note: p-value < level of significance (0.05)

At level of significance α = 0.01, the test of H₀ is:

 Reject H₀ if |k| > t_{0.005,29} = 2.7564
 Fail to reject H₀ otherwise

Since |k| = 2.5439 < 2.7564, we fail to reject the null hypothesis.

Note: p-value > level of significance (0.01)

31-Mar-16 Regression 100


Nonlinear in Parameter Models

Reaction rate equations:

 (-R_A) = k₀ exp(-E/RT) C_A^n

Dimensionless-group based models in heat and mass transfer:

 Nu = a Re^b Pr^c ;  Sh = a Re^b Sc^c

Thermodynamic correlations:

 Redlich-Kwong model: P = RT/(V - b) - a/[√T V(V + b)]
 Van der Waals model: P = RT/(V - b) - a/V²
 Antoine equation: ln Pᵥ = A - B/(T + C)

Simplified VLE model:

 Y = αx/[1 + (α - 1)x]

31-Mar-16 Regression 101

Nonlinear-in-Parameter Models

Abstract model form:

 Y = g(x, θ*) + ε ,  x = [x₁ x₂ ... x_m]ᵀ

Defining the model residual εᵢ(θ) = yᵢ - g(xᵢ, θ), with θ* the true parameters,

Parameter estimation:

 Ordinary least squares: θ̂_OLS = min_θ Σᵢ₌₁ⁿ [εᵢ(θ)]²
 Weighted least squares: θ̂_WLS = min_θ Σᵢ₌₁ⁿ wᵢ [εᵢ(θ)]²  (wᵢ > 0 for all i)

The parameter estimation problem has to be solved using numerical optimization tools.

31-Mar-16 Regression 102


Regression Problem Formulation

Now consider a data set S = {(xᵢ, yᵢ) : i = 1, 2, ..., n} generated from n independent experiments and the model equations

 Yᵢ = g(xᵢ, θ) + εᵢ  for i = 1, 2, ..., n

It is assumed that each random modeling error εᵢ is independent and identically distributed.

It is further assumed that

 εᵢ ~ N(0, σ²)  for i = 1, 2, ..., n

31-Mar-16 Regression 103

Consequences of Gaussianity

Under the assumption that the εᵢ are normal and i.i.d., we can construct the likelihood function for the unknown parameters θ as follows:

 L(θ) = f(e₁, e₂, ..., e_n | θ) = N(e₁|θ) N(e₂|θ) ... N(e_n|θ)
      = (2πσ²)^(-n/2) exp[ -(1/(2σ²)) Σᵢ₌₁ⁿ eᵢ² ]

where eᵢ = yᵢ - g(xᵢ, θ)

 log L(θ) = -(n/2) log(2π) - (n/2) log(σ²) - (1/(2σ²)) Σᵢ₌₁ⁿ eᵢ²

31-Mar-16 Regression 104


Maximum Likelihood Estimation

 ln L(θ) = -(n/2) ln(2π) - (n/2) ln(σ²) - (1/(2σ²)) Σᵢ₌₁ⁿ [yᵢ - g(xᵢ, θ)]²

This implies that

 θ̂ = min_θ [-ln L(θ)] = min_θ Σᵢ₌₁ⁿ [yᵢ - g(xᵢ, θ)]²

Thus, the maximum likelihood point estimate of θ is

 θ̂_ML = min_θ Σᵢ₌₁ⁿ [yᵢ - g(xᵢ, θ)]² = θ̂_OLS

Thus, under the Gaussian assumption, the OLS estimator turns out to be identical to the Maximum Likelihood (ML) estimator.

31-Mar-16 Regression 105

Gauss-Newton Method

Consider the Taylor series based approximation in the neighborhood of a guess solution θ(k):

 g(xᵢ, θ) ≈ g(xᵢ, θ(k)) + [∂g(xᵢ, θ)/∂θ]ᵀ|_{θ(k)} (θ - θ(k))

For small Δθ(k) = θ - θ(k), the model equations can be approximated as

 Yᵢ = g(xᵢ, θ(k)) + [∂g(xᵢ, θ)/∂θ]ᵀ|_{θ(k)} Δθ(k) + Vᵢ  for i = 1, 2, ..., n

Defining Ỹᵢ(k) = Yᵢ - g(xᵢ, θ(k)) and zᵢ(k) = ∂g(xᵢ, θ)/∂θ|_{θ(k)},

 Ỹᵢ(k) = zᵢ(k)ᵀ Δθ(k) + Vᵢ  for i = 1, 2, ..., n

31-Mar-16 Regression 106


     Automation LabIIT Bombay

Gauss-Newton Method

31-Mar-16 Regression 107

Stacking the model equations,

$Z^{(k)} = \begin{bmatrix} z_1^{(k)} & \ldots & z_n^{(k)} \end{bmatrix}^{T}, \quad A^{(k)} = \begin{bmatrix} \varphi_1^{(k)} & \ldots & \varphi_n^{(k)} \end{bmatrix}^{T}, \quad V = \begin{bmatrix} v_1 & \ldots & v_n \end{bmatrix}^{T}$

$Z^{(k)} = A^{(k)} \Delta\theta^{(k)} + V$

Under the assumption that $V \sim N(0, \sigma^2 I)$, the maximum likelihood estimate of $\Delta\theta^{(k)}$ is

$\Delta\hat{\theta}^{(k)} = \left[ (A^{(k)})^{T} A^{(k)} \right]^{-1} (A^{(k)})^{T} Z^{(k)}$

Thus, starting from an initial guess $\theta^{(0)}$, we can generate a new guess for $\theta$ as

$\theta^{(k+1)} = \theta^{(k)} + \Delta\hat{\theta}^{(k)}$

and continue the iterations till the following termination criterion is satisfied:

$\left\| \theta^{(k+1)} - \theta^{(k)} \right\| < \varepsilon \qquad (\varepsilon = \text{tolerance})$
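The iteration can be sketched in a few lines. This is a minimal full-step Gauss-Newton implementation; the exponential model, its Jacobian, and the data below are illustrative assumptions, not from the handout.

```python
# Minimal Gauss-Newton: at each iteration linearize g about theta_k, solve the
# linear least-squares problem for delta_theta, and update until the step is small.
import numpy as np

def gauss_newton(g, jac, x, y, theta0, tol=1e-10, max_iter=100):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        z = y - g(x, theta)                 # z_i^(k) = y_i - g(x_i, theta_k)
        A = jac(x, theta)                   # row i is (dg/dtheta)^T at theta_k
        delta, *_ = np.linalg.lstsq(A, z, rcond=None)  # solves (A^T A) d = A^T z
        theta = theta + delta
        if np.linalg.norm(delta) < tol:     # termination criterion
            break
    return theta

# Assumed example model: g(x, theta) = theta_1 * exp(theta_2 * x)
g = lambda x, th: th[0] * np.exp(th[1] * x)
jac = lambda x, th: np.column_stack([np.exp(th[1] * x),
                                     th[0] * x * np.exp(th[1] * x)])

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = 3.0 * np.exp(1.5 * x) + rng.normal(0.0, 0.05, x.size)
theta_hat = gauss_newton(g, jac, x, y, theta0=[2.0, 1.0])
```

Note that full Gauss-Newton steps are taken here; practical codes often add damping (Levenberg-Marquardt) when the initial guess is poor.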

     Automation LabIIT Bombay

Covariance of Parameter Estimate

31-Mar-16 Regression 108

Let $\hat{\theta}_N$ represent the optimum solution obtained when the Gauss-Newton method terminates. From the properties of OLS, it follows that

$\text{Cov}(\Delta\hat{\theta}_N) = \hat{\sigma}^2 \left[ A_N^{T} A_N \right]^{-1}$

An estimate $\hat{\sigma}^2$ can be constructed as follows:

$\hat{\sigma}^2 = \dfrac{\hat{V}_N^{T} \hat{V}_N}{n - p}, \qquad \hat{V}_N = Y - G(\hat{\theta}_N)$

Since the optimum solution $\hat{\theta}_N$ is only a translation of the RV $\Delta\hat{\theta}_N$, we can argue that $\text{Cov}(\hat{\theta}_N)$ is identical to that of $\Delta\hat{\theta}_N$, i.e.

$\text{Cov}(\hat{\theta}_N) = \hat{\sigma}^2 \left[ A_N^{T} A_N \right]^{-1}$
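The covariance estimate is easy to compute at the converged solution. In this sketch the model, data, and the converged estimate are illustrative assumptions (the estimate is set to the true parameters for simplicity, not obtained from an actual Gauss-Newton run).

```python
# Estimate Cov(theta_hat) = sigma2_hat * (A^T A)^{-1}, with A the Jacobian of
# g evaluated at the converged estimate (illustrative model and data).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 60)
theta_hat = np.array([3.0, 1.5])     # stand-in for a converged GN estimate
y = 3.0 * np.exp(1.5 * x) + rng.normal(0.0, 0.05, x.size)

A = np.column_stack([np.exp(theta_hat[1] * x),                      # dg/dtheta_1
                     theta_hat[0] * x * np.exp(theta_hat[1] * x)])  # dg/dtheta_2
resid = y - theta_hat[0] * np.exp(theta_hat[1] * x)                 # V_hat
n, p = A.shape
sigma2_hat = resid @ resid / (n - p)             # sigma^2 estimate
cov_theta = sigma2_hat * np.linalg.inv(A.T @ A)  # Cov(theta_hat)
std_err = np.sqrt(np.diag(cov_theta))            # standard errors of theta_i
```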


     Automation LabIIT Bombay

Confidence Intervals on Parameters

31-Mar-16 Regression 109

Let us assume that the ML estimator is unbiased, i.e. $E[\hat{\theta}_N] = \theta^{*}$. Defining

$P_N = \sigma^2 \left[ A_N^{T} A_N \right]^{-1}$

from the assumption of Gaussian distribution of $V$ it follows that $\hat{\theta}_N \sim N(\theta^{*}, P_N)$.

Since $(n-p)\,\hat{\sigma}^2 / \sigma^2 \sim \chi^2(n-p)$, it follows that

$Z_i = \dfrac{\hat{\theta}_i - \theta_i^{*}}{\sqrt{[P_N]_{ii}}} \sim N(0, 1) \qquad \text{and} \qquad T_i = \dfrac{\hat{\theta}_i - \theta_i^{*}}{\hat{\sigma}\sqrt{\left[ (A_N^{T} A_N)^{-1} \right]_{ii}}} \sim t(n-p)$

Thus, the $100(1-\alpha)\%$ confidence interval for $\theta_i^{*}$ is

$P\left[ \hat{\theta}_i - t_{\alpha/2}(n-p)\, \hat{\sigma}\sqrt{\left[ (A_N^{T} A_N)^{-1} \right]_{ii}} \le \theta_i^{*} \le \hat{\theta}_i + t_{\alpha/2}(n-p)\, \hat{\sigma}\sqrt{\left[ (A_N^{T} A_N)^{-1} \right]_{ii}} \right] = 1 - \alpha$

     Automation LabIIT Bombay

    Linearizing Transformations

    31-Mar-16 Regression 110

In some special cases, a linear-in-parameter form can be derived using variable transformations.

Heat transfer correlation: $Nu = a\, Re^{b}\, Pr^{c}$

$\log(Nu) = \log(a) + b \log(Re) + c \log(Pr) + v$

Reaction rate (Arrhenius): $r = k_0 \exp(-E/RT)\, C_A^{n}$

$\log(r) = \log(k_0) - \dfrac{E}{R}\left(\dfrac{1}{T}\right) + n \log(C_A) + v$

Simplified VLE model: $y = \dfrac{\alpha x}{1 + (\alpha - 1)x}$. Defining $\tilde{Y} = 1/y$ and $\tilde{X} = 1/x$,

$\tilde{Y} = \dfrac{1}{\alpha}\tilde{X} + \dfrac{\alpha - 1}{\alpha} + \tilde{v}$

OLS/WLS methods developed for linear-in-parameter models can be used for estimating parameters of the transformed model.
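The first transformation above can be sketched as an OLS fit in log variables. The data are synthetic, generated from the Dittus-Boelter-style values $a = 0.023$, $b = 0.8$, $c = 0.4$ purely for illustration.

```python
# Fit Nu = a * Re^b * Pr^c by OLS on log Nu = log a + b log Re + c log Pr
# (synthetic data; the true coefficients are assumptions for the demo).
import numpy as np

rng = np.random.default_rng(3)
Re = 10 ** rng.uniform(3, 5, 40)     # Reynolds numbers over two decades
Pr = rng.uniform(0.7, 7.0, 40)       # Prandtl numbers
Nu = 0.023 * Re**0.8 * Pr**0.4 * np.exp(rng.normal(0, 0.01, 40))

# Regressor matrix for the log-transformed, linear-in-parameter model
X = np.column_stack([np.ones_like(Re), np.log(Re), np.log(Pr)])
coef, *_ = np.linalg.lstsq(X, np.log(Nu), rcond=None)
a_hat, b_hat, c_hat = np.exp(coef[0]), coef[1], coef[2]
```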


     Automation LabIIT Bombay

Nonlinear in Parameter Models

31-Mar-16 Regression 111

Difficulty: the residual of the original model, $v_i = y_i - g(\mathbf{x}_i, \theta)$, cannot be transformed.

Solving the transformed problem

$\hat{\tilde{\theta}} = \underset{\tilde{\theta}}{\text{Min}} \sum_{i=1}^{n} \tilde{v}_i^2$

and recovering estimates $\hat{\theta}$ from the transformed $\hat{\tilde{\theta}}$ is NOT equivalent to estimating $\theta$ by solving

$\hat{\theta} = \underset{\theta}{\text{Min}}\; V^{T}V = \underset{\theta}{\text{Min}} \sum_{i=1}^{n} \left[ y_i - g(\mathbf{x}_i, \theta) \right]^2$

Parameters estimated using the transformed model serve as a good initial guess for solving the nonlinear optimization problem.

     Automation LabIIT Bombay

A Fix using WLS

31-Mar-16 Regression 112

By this approach, we try to approximate the original OLS problem and solve

$\hat{\theta}_{WLS} = \underset{\theta}{\text{Min}} \sum_{i=1}^{n} w_i \tilde{v}_i^2$

with the weights $w_i$ chosen such that $w_i \tilde{v}_i^2 \simeq v_i^2$.

Note: $\tilde{v}_i$ is a complex function of $v_i$. Let us denote the transformation as $\tilde{Y}_i = G(y_i)$, where $y_i = g_i + v_i$ and $g_i = g(\mathbf{x}_i, \theta)$. Using a Taylor series expansion in the neighborhood of $v_i = 0$,

$\tilde{Y}_i = G(g_i + v_i) \simeq G(g_i) + G'(g_i)\, v_i$

$\tilde{v}_i = \tilde{Y}_i - G(g_i) \simeq G'(g_i)\, v_i, \qquad i = 1, 2, \ldots, n$

Choose

$w_i = \dfrac{1}{\left[ G'(g_i) \right]^2} \qquad \text{so that} \qquad w_i \tilde{v}_i^2 \simeq v_i^2$


     Automation LabIIT Bombay

WLS Example

31-Mar-16 Regression 113

Consider the model

$y = \beta_1\, x_1^{\beta_2}\, x_2^{\beta_3} \cdots x_p^{\beta_{p+1}} + v$

and the transformed model

$\ln(y) = \ln(\beta_1) + \beta_2 \ln(x_1) + \beta_3 \ln(x_2) + \ldots + \beta_{p+1} \ln(x_p) + \tilde{v}$

Transformed parameter estimation problem:

$\hat{\tilde{\theta}} = \underset{\tilde{\theta}}{\text{Min}} \sum_{i=1}^{n} \tilde{v}_i^2 = \underset{\tilde{\theta}}{\text{Min}} \sum_{i=1}^{n} \left[ \ln(y_i) - \tilde{\varphi}_i^{T} \tilde{\theta} \right]^2$

where $\tilde{\varphi}_i^{T} = \begin{bmatrix} 1 & \ln(x_{1i}) & \ldots & \ln(x_{pi}) \end{bmatrix}$ and $\tilde{\theta} = \begin{bmatrix} \ln(\beta_1) & \beta_2 & \ldots & \beta_{p+1} \end{bmatrix}^{T}$.

Since $G(y) = \ln(y)$ and $G'(y) = 1/y$, choose $w_i = y_i^2$.

     Automation LabIIT Bombay

WLS Example

31-Mar-16 Regression 114

$\hat{\theta}_{WLS} = \underset{\tilde{\theta}}{\text{Min}} \sum_{i=1}^{n} w_i \tilde{v}_i^2 = \underset{\tilde{\theta}}{\text{Min}} \sum_{i=1}^{n} y_i^2 \left[ \ln(y_i) - \tilde{\varphi}_i^{T} \tilde{\theta} \right]^2$

Stacking the transformed equations,

$\tilde{Y} = \begin{bmatrix} \ln(y_1) & \ln(y_2) & \ldots & \ln(y_n) \end{bmatrix}^{T}, \qquad W = \text{diag}\begin{bmatrix} y_1^2 & y_2^2 & \ldots & y_n^2 \end{bmatrix}$

$\hat{\theta}_{WLS} = \left[ \tilde{A}^{T} W \tilde{A} \right]^{-1} \tilde{A}^{T} W \tilde{Y}, \qquad \tilde{A} = \begin{bmatrix} \tilde{\varphi}_1 & \ldots & \tilde{\varphi}_n \end{bmatrix}^{T}$
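The weighted fit above can be sketched directly from the normal equations. The two-regressor power-law model, its coefficients, and the data here are illustrative assumptions, not from the handout.

```python
# WLS on the log-transformed power-law model with weights w_i = y_i^2, so that
# w_i * vtilde_i^2 approximates v_i^2 (since d(ln y)/dy = 1/y).
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(1.0, 5.0, 80)
x2 = rng.uniform(0.5, 2.0, 80)
y = 2.0 * x1**1.3 * x2**0.7 + rng.normal(0.0, 0.05, 80)  # additive noise on y

Yt = np.log(y)                                   # transformed response ln(y)
X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])  # A-tilde
W = np.diag(y**2)                                # weights w_i = y_i^2

# WLS normal equations: theta = (X^T W X)^{-1} X^T W Yt
theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Yt)
b1_hat = np.exp(theta[0])                        # recover beta_1 from ln(beta_1)
```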