07b_regression.pdf


Transcript of 07b_regression.pdf

  • Slide 1/24: Linear Regression

    Jeff Howbert, Introduction to Machine Learning, Winter 2012

  • Slides 2–6/24: slides thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 7/24: Loss function

    - Suppose target labels come from a set Y:
      binary classification: Y = {0, 1}
      regression: Y = ℝ (the real numbers)
    - A loss function maps decisions to costs: $L(\hat{y}, y)$ defines the penalty for predicting $\hat{y}$ when the true value is $y$.
    - Standard choice for classification: 0/1 loss (same as misclassification error):
      $L_{0/1}(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{otherwise} \end{cases}$
    - Standard choice for regression: squared loss, $L(\hat{y}, y) = (\hat{y} - y)^2$.

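    Both losses above are one-liners in code. A minimal NumPy sketch (the function names are my own, not from the lecture):

      import numpy as np

      def zero_one_loss(y_pred, y_true):
          # 0/1 loss: 0 where the prediction equals the label, 1 otherwise.
          return (y_pred != y_true).astype(float)

      def squared_loss(y_pred, y_true):
          # Squared loss (y_hat - y)^2, the standard regression choice.
          return (y_pred - y_true) ** 2

      # The mean 0/1 loss is exactly the misclassification error rate.
      print(zero_one_loss(np.array([0, 1, 0, 0]), np.array([0, 1, 1, 0])).mean())  # 0.25
      print(squared_loss(np.array([2.0]), np.array([1.5])))                        # [0.25]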
  • Slide 8/24: Least squares linear fit to data

    - The most popular estimation method is least squares:
      determine the linear coefficients $\alpha, \beta$ that minimize the sum of squared loss (SSL);
      use standard (multivariate) differential calculus: differentiate SSL with respect to $\alpha, \beta$, find the zeros of each partial derivative, and solve for $\alpha, \beta$.
    - One dimension:
      $\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - (\alpha + \beta x_j) \right)^2$, where $N$ = number of samples
      $\beta = \dfrac{\mathrm{cov}[x, y]}{\mathrm{var}[x]}$, $\alpha = \bar{y} - \beta \bar{x}$, where $\bar{x}, \bar{y}$ are the means of the training $x, y$
      Prediction for test sample $x_t$: $y_t = \alpha + \beta x_t$ (a code sketch follows below).

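    The one-dimensional closed form translates directly into NumPy; a minimal sketch (my own, not from the lecture):

      import numpy as np

      def fit_simple_ols(x, y):
          # beta = cov[x, y] / var[x]; alpha = mean(y) - beta * mean(x)
          beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
          alpha = y.mean() - beta * x.mean()
          return alpha, beta

      rng = np.random.default_rng(0)
      x = rng.uniform(0, 10, size=100)
      y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
      alpha, beta = fit_simple_ols(x, y)
      print(alpha, beta)          # close to the true values 2.0 and 3.0
      print(alpha + beta * 5.0)   # prediction for a test sample x_t = 5.0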
  • Slide 9/24: Least squares linear fit to data

    - Multiple dimensions:
      To simplify notation and derivation, change $\alpha$ to $\beta_0$ and add a new feature $x_0 = 1$ to the feature vector $\mathbf{x}$:
      $y = \beta_0 \cdot 1 + \sum_{i=1}^{d} \beta_i x_i = \boldsymbol{\beta} \cdot \mathbf{x}$
    - Calculate SSL and determine $\boldsymbol{\beta}$:
      $\mathrm{SSL} = \sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathsf{T}} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$
      where $\mathbf{y}$ = vector of all training responses $y_j$ and $\mathbf{X}$ = matrix of all training samples $\mathbf{x}_j$
      $\boldsymbol{\beta} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$
      Prediction for test sample $\mathbf{x}_t$: $y_t = \boldsymbol{\beta} \cdot \mathbf{x}_t$

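    The matrix closed form is equally short in code. A sketch, assuming NumPy (it solves the normal equations rather than forming the inverse explicitly, which is numerically safer):

      import numpy as np

      def fit_ols(X, y):
          # Prepend the constant feature x0 = 1 so beta[0] is the intercept.
          Xb = np.hstack([np.ones((X.shape[0], 1)), X])
          # Solve (X^T X) beta = X^T y, i.e. beta = (X^T X)^{-1} X^T y.
          return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 3))
      y = 1.0 + X @ np.array([0.5, -2.0, 3.0]) + rng.normal(scale=0.1, size=200)
      print(fit_ols(X, y))   # approximately [1.0, 0.5, -2.0, 3.0]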
  • Slides 10–11/24: Least squares linear fit to data (figures)

  • Slide 12/24: Extending application of linear regression

    - The inputs X for linear regression can be:
      original quantitative inputs
      transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
      polynomial transformations, example: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$ (sketched in code below)
      dummy coding of categorical inputs
      interactions between variables, example: $x_3 = x_1 \cdot x_2$
    - This allows use of linear regression techniques to fit much more complicated non-linear datasets.

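    A small sketch of the cubic example (coefficient values and names are my own), showing that after the transformation the fit is still ordinary least squares, since the model remains linear in the coefficients:

      import numpy as np

      rng = np.random.default_rng(2)
      x = rng.uniform(-2, 2, size=300)
      y = 1.0 - x + 0.5 * x**2 + 2.0 * x**3 + rng.normal(scale=0.2, size=300)

      # Non-linear in x, but linear in beta once the inputs are expanded.
      X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])
      beta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
      print(beta)   # approximately [1.0, -1.0, 0.5, 2.0]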
  • Slide 13/24: Example of fitting polynomial curve with linear model (figure)

  • Slide 14/24: Prostate cancer dataset

    - 97 samples, partitioned into 67 training samples and 30 test samples.
    - Eight predictors (features): 6 continuous (4 of them log transforms), 1 binary, 1 ordinal.
    - Continuous outcome variable: lpsa = log(prostate specific antigen level).

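    A hedged sketch of loading and partitioning the dataset; the file name, separator, and column layout are assumptions based on the commonly distributed ESL copy of the data, not details given in the lecture:

      import pandas as pd

      # Assumed: tab-separated file with a trailing "train" indicator column.
      df = pd.read_csv("prostate.data", sep="\t", index_col=0)
      train = df[df["train"] == "T"]    # 67 training samples
      test = df[df["train"] == "F"]     # 30 test samples
      predictors = ["lcavol", "lweight", "age", "lbph",
                    "svi", "lcp", "gleason", "pgg45"]   # the 8 features
      X_train, y_train = train[predictors], train["lpsa"]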
  • Slide 15/24: Correlations of predictors in prostate cancer dataset (figure)

  • Slide 16/24: Fit of linear model to prostate cancer dataset (figure)

  • Slide 17/24: Regularization

    - Complex models (with many parameters) are often prone to overfitting.
    - Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters.
    - Two common types of regularization in linear regression:
      L2 regularization (a.k.a. ridge regression): find $\boldsymbol{\beta}$ which minimizes
        $\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 + \lambda \sum_{i=1}^{d} \beta_i^2$
        where $\lambda$ is the regularization parameter: bigger $\lambda$ imposes more constraint (see the sketch after this slide)
      L1 regularization (a.k.a. lasso): find $\boldsymbol{\beta}$ which minimizes
        $\sum_{j=1}^{N} \left( y_j - \sum_{i=0}^{d} \beta_i x_{ij} \right)^2 + \lambda \sum_{i=1}^{d} |\beta_i|$

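    The ridge objective keeps a closed form, $\boldsymbol{\beta} = (\mathbf{X}^{\mathsf{T}}\mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$, with the intercept left unpenalized as in the sum above. A sketch under those assumptions (my own code, not the lecture's):

      import numpy as np

      def fit_ridge(X, y, lam):
          Xb = np.hstack([np.ones((X.shape[0], 1)), X])
          penalty = lam * np.eye(Xb.shape[1])
          penalty[0, 0] = 0.0   # do not shrink the intercept beta_0
          return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

      rng = np.random.default_rng(3)
      X = rng.normal(size=(100, 5))
      y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)
      for lam in (0.0, 1.0, 100.0):
          print(lam, fit_ridge(X, y, lam))  # larger lambda -> smaller coefficients

    The lasso objective has no closed form; it is usually minimized iteratively, e.g. by coordinate descent (sketched after slide 19 below).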
  • Slide 18/24: Example of L2 regularization

    - L2 regularization shrinks coefficients towards (but not to) zero, and towards each other.

  • Slide 19/24: Example of L1 regularization

    - L1 regularization shrinks coefficients to zero at different rates; different values of $\lambda$ give models with different subsets of features.

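    The sparsity effect can be reproduced with a toy coordinate-descent solver for the lasso objective on slide 17; this sketch is my own illustration (intercept omitted for simplicity), not the method behind the lecture's figures:

      import numpy as np

      def soft_threshold(rho, t):
          return np.sign(rho) * max(abs(rho) - t, 0.0)

      def fit_lasso(X, y, lam, n_iter=200):
          beta = np.zeros(X.shape[1])
          for _ in range(n_iter):
              for i in range(X.shape[1]):
                  # Partial residual with feature i's contribution removed.
                  r = y - X @ beta + X[:, i] * beta[i]
                  # One-dimensional minimizer of ||r - X_i b||^2 + lam * |b|.
                  beta[i] = soft_threshold(X[:, i] @ r, lam / 2.0) / (X[:, i] @ X[:, i])
          return beta

      rng = np.random.default_rng(4)
      X = rng.normal(size=(100, 5))
      y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)
      print(fit_lasso(X, y, lam=50.0))  # weak coefficients driven exactly to 0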
  • Slide 20/24: Example of subset selection (figure)

  • Slide 21/24: Comparison of various selection and shrinkage methods (figure)

  • Slide 22/24: L1 regularization gives sparse models, L2 does not (figure)

  • Slide 23/24: Other types of regression

    - In addition to linear regression, there are:
      decision trees
      neural networks
      support vector machines
      locally linear regression
      etc.

  • Slide 24/24: MATLAB interlude