07b_regression.pdf


Transcript of 07b_regression.pdf

  • Slide 1/24: Linear Regression

    Jeff Howbert, Introduction to Machine Learning, Winter 2012

  • Slides 2–6/24: slides thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 7/24: Loss function

    - Suppose target labels come from a set Y:
      binary classification: Y = {0, 1}
      regression: Y = ℝ (the real numbers)
    - A loss function maps decisions to costs: $L(\hat{y}, y)$ defines the penalty for predicting $\hat{y}$ when the true value is $y$.
    - Standard choice for classification: 0/1 loss (same as misclassification error):
      $L_{0/1}(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{otherwise} \end{cases}$
    - Standard choice for regression: squared loss, $L(\hat{y}, y) = (\hat{y} - y)^2$.

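    Both losses above are one-liners in code. A minimal NumPy sketch (the function names are my own, not from the lecture):

      import numpy as np

      def zero_one_loss(y_pred, y_true):
          # 0/1 loss: 0 where the prediction equals the label, 1 otherwise.
          return (y_pred != y_true).astype(float)

      def squared_loss(y_pred, y_true):
          # Squared loss (y_hat - y)^2, the standard regression choice.
          return (y_pred - y_true) ** 2

      # The mean 0/1 loss is exactly the misclassification error rate.
      print(zero_one_loss(np.array([0, 1, 0, 0]), np.array([0, 1, 1, 0])).mean())  # 0.25
      print(squared_loss(np.array([2.0]), np.array([1.5])))                        # [0.25]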
  • Slide 8/24: Least squares linear fit to data

    - The most popular estimation method is least squares:
      determine the linear coefficients $\alpha, \beta$ that minimize the sum of squared loss (SSL);
      use standard (multivariate) differential calculus: differentiate SSL with respect to $\alpha, \beta$, find the zeros of each partial derivative, and solve for $\alpha, \beta$.
    - One dimension:
      $\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - (\alpha + \beta x_j) \right)^2$, where $N$ = number of samples
      $\beta = \dfrac{\mathrm{cov}[x, y]}{\mathrm{var}[x]}$, $\alpha = \bar{y} - \beta \bar{x}$, where $\bar{x}, \bar{y}$ are the means of the training $x, y$
      Prediction for test sample $x_t$: $y_t = \alpha + \beta x_t$ (a code sketch follows below).

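    The one-dimensional closed form translates directly into NumPy; a minimal sketch (my own, not from the lecture):

      import numpy as np

      def fit_simple_ols(x, y):
          # beta = cov[x, y] / var[x]; alpha = mean(y) - beta * mean(x)
          beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
          alpha = y.mean() - beta * x.mean()
          return alpha, beta

      rng = np.random.default_rng(0)
      x = rng.uniform(0, 10, size=100)
      y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
      alpha, beta = fit_simple_ols(x, y)
      print(alpha, beta)          # close to the true values 2.0 and 3.0
      print(alpha + beta * 5.0)   # prediction for a test sample x_t = 5.0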
  • Slide 9/24: Least squares linear fit to data

    - Multiple dimensions:
      To simplify notation and derivation, change $\alpha$ to $\beta_0$ and add a new feature $x_0 = 1$ to the feature vector $\mathbf{x}$:
      $y = \beta_0 \cdot 1 + \sum_{i=1}^{d} \beta_i x_i = \boldsymbol{\beta} \cdot \mathbf{x}$
    - Calculate SSL and determine $\boldsymbol{\beta}$:
      $\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathsf{T}} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$
      where $\mathbf{y}$ = vector of all training responses $y_j$ and $\mathbf{X}$ = matrix of all training samples $\mathbf{x}_j$
      $\boldsymbol{\beta} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$
      Prediction for test sample $\mathbf{x}_t$: $y_t = \boldsymbol{\beta} \cdot \mathbf{x}_t$

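    The matrix closed form is equally short in code. A sketch, assuming NumPy (it solves the normal equations rather than forming the inverse explicitly, which is numerically safer):

      import numpy as np

      def fit_ols(X, y):
          # Prepend the constant feature x0 = 1 so beta[0] is the intercept.
          Xb = np.hstack([np.ones((X.shape[0], 1)), X])
          # Solve (X^T X) beta = X^T y, i.e. beta = (X^T X)^{-1} X^T y.
          return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 3))
      y = 1.0 + X @ np.array([0.5, -2.0, 3.0]) + rng.normal(scale=0.1, size=200)
      print(fit_ols(X, y))   # approximately [1.0, 0.5, -2.0, 3.0]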
  • Slides 10–11/24: Least squares linear fit to data (figures)

  • Slide 12/24: Extending application of linear regression

    - The inputs X for linear regression can be:
      original quantitative inputs
      transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
      polynomial transformations, example: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$ (sketched in code below)
      dummy coding of categorical inputs
      interactions between variables, example: $x_3 = x_1 \cdot x_2$
    - This allows use of linear regression techniques to fit much more complicated non-linear datasets.

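    A small sketch of the cubic example (coefficient values and names are my own), showing that after the transformation the fit is still ordinary least squares, since the model remains linear in the coefficients:

      import numpy as np

      rng = np.random.default_rng(2)
      x = rng.uniform(-2, 2, size=300)
      y = 1.0 - x + 0.5 * x**2 + 2.0 * x**3 + rng.normal(scale=0.2, size=300)

      # Non-linear in x, but linear in beta once the inputs are expanded.
      X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])
      beta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
      print(beta)   # approximately [1.0, -1.0, 0.5, 2.0]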
  • Slide 13/24: Example of fitting polynomial curve with linear model (figure)

  • Slide 14/24: Prostate cancer dataset

    - 97 samples, partitioned into 67 training samples and 30 test samples.
    - Eight predictors (features): 6 continuous (4 of them log transforms), 1 binary, 1 ordinal.
    - Continuous outcome variable: lpsa = log(prostate specific antigen level).

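    A hedged sketch of loading and partitioning the dataset; the file name, separator, and column layout are assumptions based on the commonly distributed ESL copy of the data, not details given in the lecture:

      import pandas as pd

      # Assumed: tab-separated file with a trailing "train" indicator column.
      df = pd.read_csv("prostate.data", sep="\t", index_col=0)
      train = df[df["train"] == "T"]    # 67 training samples
      test = df[df["train"] == "F"]     # 30 test samples
      predictors = ["lcavol", "lweight", "age", "lbph",
                    "svi", "lcp", "gleason", "pgg45"]   # the 8 features
      X_train, y_train = train[predictors], train["lpsa"]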
  • Slide 15/24: Correlations of predictors in prostate cancer dataset (figure)

  • Slide 16/24: Fit of linear model to prostate cancer dataset (figure)

  • Slide 17/24: Regularization

    - Complex models (with many parameters) are often prone to overfitting.
    - Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters.
    - Two common types of regularization in linear regression:
      L2 regularization (a.k.a. ridge regression): find $\boldsymbol{\beta}$ which minimizes
        $\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 + \lambda \sum_{i=1}^{d} \beta_i^2$
        where $\lambda$ is the regularization parameter: bigger $\lambda$ imposes more constraint (see the sketch after this slide)
      L1 regularization (a.k.a. lasso): find $\boldsymbol{\beta}$ which minimizes
        $\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 + \lambda \sum_{i=1}^{d} |\beta_i|$

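    The ridge objective keeps a closed form, $\boldsymbol{\beta} = (\mathbf{X}^{\mathsf{T}}\mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$, with the intercept left unpenalized as in the sum above. A sketch under those assumptions (my own code, not the lecture's):

      import numpy as np

      def fit_ridge(X, y, lam):
          Xb = np.hstack([np.ones((X.shape[0], 1)), X])
          penalty = lam * np.eye(Xb.shape[1])
          penalty[0, 0] = 0.0   # do not shrink the intercept beta_0
          return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

      rng = np.random.default_rng(3)
      X = rng.normal(size=(100, 5))
      y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)
      for lam in (0.0, 1.0, 100.0):
          print(lam, fit_ridge(X, y, lam))  # larger lambda -> smaller coefficients

    The lasso objective has no closed form; it is usually minimized iteratively, e.g. by coordinate descent (sketched after slide 19 below).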
  • Slide 18/24: Example of L2 regularization

    - L2 regularization shrinks coefficients towards (but not to) zero, and towards each other.

  • Slide 19/24: Example of L1 regularization

    - L1 regularization shrinks coefficients to zero at different rates; different values of $\lambda$ give models with different subsets of features.

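    The sparsity effect can be reproduced with a toy coordinate-descent solver for the lasso objective on slide 17; this sketch is my own illustration (intercept omitted for simplicity), not the method behind the lecture's figures:

      import numpy as np

      def soft_threshold(rho, t):
          return np.sign(rho) * max(abs(rho) - t, 0.0)

      def fit_lasso(X, y, lam, n_iter=200):
          beta = np.zeros(X.shape[1])
          for _ in range(n_iter):
              for i in range(X.shape[1]):
                  # Partial residual with feature i's contribution removed.
                  r = y - X @ beta + X[:, i] * beta[i]
                  # One-dimensional minimizer of ||r - X_i b||^2 + lam * |b|.
                  beta[i] = soft_threshold(X[:, i] @ r, lam / 2.0) / (X[:, i] @ X[:, i])
          return beta

      rng = np.random.default_rng(4)
      X = rng.normal(size=(100, 5))
      y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)
      print(fit_lasso(X, y, lam=50.0))  # weak coefficients driven exactly to 0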
  • Slide 20/24: Example of subset selection (figure)

  • Slide 21/24: Comparison of various selection and shrinkage methods (figure)

  • Slide 22/24: L1 regularization gives sparse models, L2 does not (figure)

  • Slide 23/24: Other types of regression

    - In addition to linear regression, there are:
      decision trees
      neural networks
      support vector machines
      locally linear regression
      etc.

  • Slide 24/24: MATLAB interlude