Regression “A new perspective on freedom”

Page 1:

Regression

“A new perspective on freedom”

Page 2:

Classification

Page 3:

[figure: an unlabeled example “?” to be classified as Cat or Dog]

Page 4:

[figure: examples plotted by Size and Cleanliness]

Page 5:

[figure: an unlabeled example “?” to be placed on a price scale from $ to $$$$]

Page 6:

Regression

Page 7:

[figure: plot with axes Price ($ to $$$$) and Top speed, labeled x and y]

Page 8:

Regression

Data: $(x_i, y_i)_{i=1,\dots,n}$

Goal: given $x$, predict $y$,

i.e. find a prediction function $y(x)$

Page 9:

Nearest neighbor

[figure: nearest neighbor prediction; x axis from -5 to 25, y axis from -10 to 15]

Page 10:

Nearest neighbor

• To predict at $x$

– Find the data point $x_i$ closest to $x$

– Choose $y = y_i$

+ No training

– Finding closest point can be expensive

– Overfitting
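The nearest neighbor rule above can be sketched in a few lines. This is a minimal illustration assuming one-dimensional inputs and absolute distance; the function name `nearest_neighbor_predict` and the sample data are made up for the example, not taken from the lecture demo.

```python
import numpy as np

def nearest_neighbor_predict(x_train, y_train, x_query):
    """Return y_i for the training point x_i closest to x_query (1-D inputs)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # No training step: all the work happens at prediction time,
    # here a linear scan over every stored point.
    i = np.argmin(np.abs(x_train - x_query))
    return y_train[i]

# Hypothetical data for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]
prediction = nearest_neighbor_predict(x, y, 1.2)  # closest point is x = 1.0
```

The linear scan is what makes finding the closest point expensive on large data sets, and copying a single $y_i$ is what makes the prediction follow the noise in the data.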

Page 11:

Kernel Regression

• To predict at $x$

– Give data point $x_i$ weight $m_i = k(x - x_i)$, e.g. $k(x) = e^{-\frac{x^2}{2\sigma^2}}$

– Normalize weights: $m'_i = \frac{m_i}{\sum_{j=1}^n m_j}$

– Let $y = \sum_{i=1}^n m'_i y_i$

Page 12:

Kernel Regression

[figure: kernel regression fit; x axis from -5 to 25, y axis from -10 to 15; label: k]

[matlab demo]

Page 13:

Kernel Regression

+ No training

+ Smooth prediction

– Slower than nearest neighbor

– Must choose width of $k$

$$y(x) = \frac{\sum_i y_i\, k(x_i - x)}{\sum_i k(x_i - x)}$$
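A minimal sketch of kernel regression as a weighted average, assuming one-dimensional inputs and a Gaussian kernel of width `sigma`. The function names and the default width are illustrative choices, not part of the original demo.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """k(u) = exp(-u^2 / (2 sigma^2))."""
    return np.exp(-u**2 / (2.0 * sigma**2))

def kernel_regression(x_train, y_train, x_query, sigma=1.0):
    """Predict y(x) = sum_i y_i k(x_i - x) / sum_i k(x_i - x)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    m = gaussian_kernel(x_query - x_train, sigma)  # unnormalized weights m_i
    m_normalized = m / m.sum()                     # weights now sum to 1
    return float(m_normalized @ y_train)

# Two points at equal distance from the query get equal weight.
midpoint_prediction = kernel_regression([0.0, 2.0], [1.0, 3.0], 1.0)
```

Every training point contributes to every prediction, which is why this is slower than nearest neighbor. A very small `sigma` can make all weights underflow to zero far from the data, so the width needs some care.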

Page 14:

Linear regression

Page 15:

Linear regression

[figure: 3-D plot of Temperature (20 to 26) against two inputs (0 to 40 and 0 to 30)]

[start Matlab demo lecture2.m]

Given examples $(x_i, y_i)_{i=1,\dots,n}$

Predict $y_{n+1}$ given a new point $x_{n+1}$

[figure: the same plot with $x_{n+1}$ and $y_{n+1}$ marked]

Page 16:

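Linear regression

[figure: 3-D plot of Temperature (20 to 26) against two inputs (0 to 40 and 0 to 30), with $x_{n+1}$ and $y_{n+1}$ marked]

Prediction: $y_i = w_0 + w_1 x_i$

Prediction:

$$y_i = w_0 + w_1 x_{i,1} + w_2 x_{i,2} = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \end{pmatrix} = X_i^\top w$$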

Page 17:

Linear Regression

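[figure: fitted line against $x$, showing an observation, its prediction, and the error or “residual” between them]

$$X_i = \begin{pmatrix} 1 \\ x_{i,1} \\ x_{i,2} \end{pmatrix}$$

Prediction: $y = X_i^\top w$

Sum squared error: $\sum_i (X_i^\top w - y_i)^2$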

Page 18:

Linear Regression

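$$X = \begin{pmatrix} - X_1^\top - \\ - X_2^\top - \\ \vdots \end{pmatrix} \quad (n \text{ rows},\ d \text{ columns})$$

$$E = \sum_i (X_i^\top w - y_i)^2 = \|Xw - y\|_2^2 = w^\top \underbrace{X^\top X}_{A}\, w - 2\, \underbrace{y^\top X}_{b^\top}\, w + \|y\|_2^2$$

$$\frac{\partial E}{\partial w} = 2Aw - 2b$$

Solve the system $Aw = b$ (it’s better not to invert the matrix)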
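A minimal sketch of solving the system $Aw = b$ with $A = X^\top X$ and $b = X^\top y$, using a linear solver rather than an explicit matrix inverse. The helper name `fit_linear` and the noise-free synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_linear(X, y):
    """Solve A w = b with A = X^T X and b = X^T y, without forming A^{-1}."""
    A = X.T @ X
    b = X.T @ y
    return np.linalg.solve(A, b)

# Hypothetical noise-free data: y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(50, 2))
X = np.column_stack([np.ones(len(inputs)), inputs])  # rows X_i^T = (1, x_i1, x_i2)
w_true = np.array([1.0, 2.0, -3.0])
y = X @ w_true

w = fit_linear(X, y)
```

When $X^\top X$ is badly conditioned, `np.linalg.lstsq(X, y, rcond=None)` works on $X$ directly and is usually the more robust choice.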

Page 19:

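LMS Algorithm (Least Mean Squares)

$$E = \sum_i (X_i^\top w - y_i)^2 = \sum_i E_i$$

$$\frac{\partial E}{\partial w} = \sum_i \frac{\partial E_i}{\partial w}$$

where

$$\frac{\partial E_i}{\partial w} = \frac{\partial}{\partial w}(X_i^\top w - y_i)^2 = 2 X_i (X_i^\top w - y_i)$$

Online algorithm

$$w^{t+1} = w^t + \alpha X_i (y_i - X_i^\top w^t)$$

[figure labels: $w$, $X_i$, the hyperplane $X_i^\top w = y_i$, $\alpha \frac{\partial E}{\partial w}$, $\frac{\partial E_i}{\partial w}$]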
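The online update can be sketched as a loop over single examples. This is an illustrative version under stated assumptions: a fixed step size `alpha`, a fixed number of passes `epochs`, and a random visiting order, none of which are specified in the slides.

```python
import numpy as np

def lms(X, y, alpha=0.01, epochs=50, seed=0):
    """Least Mean Squares: one gradient step per example."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # w^{t+1} = w^t + alpha * X_i * (y_i - X_i^T w^t)
            w = w + alpha * X[i] * (y[i] - X[i] @ w)
    return w

# Hypothetical noise-free data on the line y = 1 + 2x.
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
w_lms = lms(X, y, alpha=0.1, epochs=200)
```

The factor 2 from the gradient is absorbed into `alpha`. Too large a step size makes the iteration diverge, and with noisy data a fixed step size leaves the weights fluctuating around the least squares solution instead of settling on it.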

Page 20:

Beyond lines and planes

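$$y_i = w_0 + w_1 x_i + w_2 x_i^2$$

still linear in $w$

everything is the same with

$$X_i = \begin{pmatrix} 1 \\ x_i \\ x_i^2 \end{pmatrix}$$

[figure: curved fit to data; x axis from 0 to 20, y axis from 0 to 40]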
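A short sketch of the same idea in code: the model stays linear in $w$, only the feature vector changes. The helper `polynomial_features` and the quadratic test data are assumptions for illustration.

```python
import numpy as np

def polynomial_features(x, degree):
    """Build rows X_i^T = (1, x_i, x_i^2, ..., x_i^degree)."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)

# Hypothetical noise-free quadratic data.
x = np.linspace(0.0, 20.0, 30)
y = 3.0 - 2.0 * x + 0.1 * x**2

X = polynomial_features(x, 2)
# Ordinary least squares on the expanded features.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Any fixed set of functions of $x$ can replace the powers here, since the fitting step only ever sees the matrix $X$.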

Page 21:

Linear Regression [summary]

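Given examples $(x_i, y_i)_{i=1,\dots,n}$

Let $X_i^\top = \begin{pmatrix} f_1(x_i) & f_2(x_i) & \dots & f_d(x_i) \end{pmatrix}$

For example $X_i^\top = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} & x_{i,1}^2 & x_{i,2}^2 & x_{i,1} x_{i,2} \end{pmatrix}$

Let

$$X = \begin{pmatrix} - X_1^\top - \\ - X_2^\top - \\ \vdots \end{pmatrix} \ (n \text{ rows},\ d \text{ columns}), \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix}$$

Minimize $\|Xw - y\|_2^2$ by solving $(X^\top X)\, w = X^\top y$

Predict $y_{n+1} = X_{n+1}^\top w$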

Page 22:

Probabilistic interpretation

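[figure: $y_i$ distributed around $X_i^\top w$ at $x_i$]

$$y_i \mid x_i \sim N(X_i^\top w, \sigma^2)$$

Likelihood

$$L = \prod_i \exp\left(-\frac{1}{2\sigma^2}(X_i^\top w - y_i)^2\right) = \exp\left(-\frac{1}{2\sigma^2}\sum_i (X_i^\top w - y_i)^2\right) = \exp\left(-\frac{1}{2\sigma^2}\|Xw - y\|_2^2\right)$$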
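Taking the negative logarithm makes the link to least squares explicit. A short worked step, following the slide's convention of dropping the Gaussian normalizing constant and treating $\sigma^2 > 0$ as fixed:

```latex
\[
-\log L = \frac{1}{2\sigma^2}\,\|Xw - y\|_2^2
\quad\Longrightarrow\quad
\arg\max_w L = \arg\min_w \|Xw - y\|_2^2 .
\]
```

So the maximum likelihood estimate of $w$ under Gaussian noise is the least squares solution.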

Page 23:

Overfitting

[figure: Degree 15 polynomial fit; x axis from 0 to 20, y axis from -15 to 30]

[Matlab demo]

Page 24:

Ridge Regression (Regularization)

[figure: Effect of regularization (degree 19); x axis from 0 to 20, y axis from -10 to 15]

Minimize $\frac{1}{2}\left(\|Xw - y\|_2^2 + \epsilon \|w\|_2^2\right)$ with “small” $\epsilon$

Let $A = X^\top X$, $b = X^\top y$

Solve $(A + \epsilon I)\, w = b$
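A minimal sketch of ridge regression on polynomial features, solving $(A + \epsilon I) w = b$ with $A = X^\top X$ and $b = X^\top y$. The data, the input range $[-1, 1]$, the degree, and the two values of `eps` are assumptions chosen for the example, not the settings of the lecture demo.

```python
import numpy as np

def ridge_fit(X, y, eps):
    """Solve (A + eps*I) w = b with A = X^T X and b = X^T y."""
    d = X.shape[1]
    A = X.T @ X
    b = X.T @ y
    return np.linalg.solve(A + eps * np.eye(d), b)

# Hypothetical noisy data and degree 15 polynomial features.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=x.size)
X = np.vander(x, 16, increasing=True)

w_small = ridge_fit(X, y, eps=1e-8)  # almost unregularized
w_large = ridge_fit(X, y, eps=1e-2)  # stronger shrinkage toward zero
```

A larger `eps` shrinks the weight vector and smooths the fitted curve. Adding `eps * I` also improves the conditioning of the system, which matters for high polynomial degrees.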

Page 25:

Probabilistic interpretation

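Likelihood: $y_i \mid x_i \sim N(X_i^\top w, \sigma^2)$

Prior: $w \sim N\left(0, \frac{\sigma^2}{\epsilon}\right)$

Posterior:

$$P(w \mid x_1, \dots, x_n) = \frac{P(w, x_1, \dots, x_n)}{P(x_1, \dots, x_n)} \propto P(w, x_1, \dots, x_n)$$

$$P(w, x_1, \dots, x_n) = \exp\left\{-\frac{\epsilon}{2\sigma^2}\|w\|_2^2\right\} \prod_i \exp\left(-\frac{1}{2\sigma^2}(X_i^\top w - y_i)^2\right) = \exp\left(-\frac{1}{2\sigma^2}\left[\epsilon \|w\|_2^2 + \sum_i (X_i^\top w - y_i)^2\right]\right)$$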
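A short worked step connecting this posterior to ridge regression, again dropping terms that do not depend on $w$:

```latex
\[
-\log P(w \mid x_1,\dots,x_n)
  = \frac{1}{2\sigma^2}\left[\epsilon\|w\|_2^2 + \|Xw - y\|_2^2\right] + \text{const}
\]
\[
\frac{\partial}{\partial w}\left[\epsilon\|w\|_2^2 + \|Xw - y\|_2^2\right]
  = 2\epsilon w + 2X^\top (Xw - y) = 0
\;\Longrightarrow\;
(X^\top X + \epsilon I)\,w = X^\top y
\]
```

Maximizing the posterior therefore gives the ridge regression system, with $\epsilon$ set by the ratio of the noise variance to the prior variance.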

Page 26:

Locally Linear Regression

Page 27:

[source: http://www.cru.uea.ac.uk/cru/data/temperature]

[figure: Global temperature increase, 1840 to 2020; y axis from -0.8 to 0.6]

Page 28:

Locally Linear Regression

• To predict at a new point $x_{n+1}$

– Give data point $x_i$ weight $m_i = k(x_{n+1} - x_i)$, e.g. $k(x) = e^{-\frac{x^2}{2\sigma^2}}$

– Let $w = \arg\min_w \sum_{i=1}^n m_i (X_i^\top w - y_i)^2$

– Let $y_{n+1} = X_{n+1}^\top w$

Page 29:

Locally Linear Regression

+ Good even at the boundary (more important in high dimension)

– Solve linear system for each new prediction

– Must choose width of k

To minimize $\sum_{i=1}^n m_i (X_i^\top w - y_i)^2$

Solve $(X^\top M X)\, w = X^\top M y$ where $M = \begin{pmatrix} m_1 & & \\ & m_2 & \\ & & m_3 \end{pmatrix}$

Predict $y_{n+1} = X_{n+1}^\top w$
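A minimal sketch of locally linear regression for one query point, assuming one-dimensional inputs, features $(1, x)$, and a Gaussian kernel of width `sigma`. The function name and the test data are illustrative assumptions.

```python
import numpy as np

def locally_linear_predict(x_train, y_train, x_query, sigma=1.0):
    """Fit a weighted line around x_query and evaluate it there."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X = np.column_stack([np.ones_like(x_train), x_train])     # rows X_i^T = (1, x_i)
    m = np.exp(-(x_query - x_train) ** 2 / (2.0 * sigma**2))  # weights m_i
    XtM = X.T * m  # X^T M with M = diag(m), without building M
    # Solve (X^T M X) w = X^T M y for this query only.
    w = np.linalg.solve(XtM @ X, XtM @ y_train)
    return float(np.array([1.0, x_query]) @ w)

# Hypothetical data on the line y = 2 + 0.5x, queried just past the boundary.
x_train = np.linspace(0.0, 10.0, 21)
y_train = 2.0 + 0.5 * x_train
boundary_prediction = locally_linear_predict(x_train, y_train, 10.5, sigma=1.0)
```

Because a line is fitted rather than a weighted average taken, the prediction follows the local trend at the edge of the data. The cost is one linear solve per query.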

Page 30:

[source: http://www.cru.uea.ac.uk/cru/data/temperature]

Locally Linear Regression: Gaussian kernel

[figure: fit to the global temperature data, 1840 to 2020; y axis from -0.8 to 0.6; label: 180]

Page 31:

[source: http://www.cru.uea.ac.uk/cru/data/temperature]

Locally Linear Regression: Laplacian kernel

[figure: fit to the global temperature data, 1840 to 2020; y axis from -0.8 to 0.6; label: 180]

Page 32:

L1 Regression

Page 33:

Sensitivity to outliers

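[figure: Temperature at noon, with the fit $x_i^\top w$ and an outlying observation $y_i$]

High weight given to outliers

$$E = \sum_i (x_i^\top w - y_i)^2 = \sum_i E_i$$

Influence function: $\frac{\partial E_i}{\partial y_i}$

[figures: $E_i$ and the influence function, each plotted against $y_i$ around $x_i^\top w$]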

Page 34:

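L1 Regression

$$E' = \sum_i |x_i^\top w - y_i| = \sum_i E'_i$$

Influence function: $\frac{\partial E'_i}{\partial y_i}$

[figures: $E_i$ and $E'_i$, and the influence function, each plotted against $y_i$ around $x_i^\top w$]

Linear program

$$\min_{w,c} \sum_i c_i \quad \text{s.t.} \quad x_i^\top w - y_i \le c_i \ \ \forall i, \qquad y_i - x_i^\top w \le c_i \ \ \forall i$$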
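The linear program for L1 regression can be handed to a general LP solver. The sketch below assumes SciPy is available and uses `scipy.optimize.linprog`; the helper name `l1_regression` and the data with a single outlier are made up for the example.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """Minimize sum_i |X_i^T w - y_i| as a linear program in (w, c)."""
    n, d = X.shape
    # Objective: sum_i c_i (w does not appear in the cost).
    cost = np.concatenate([np.zeros(d), np.ones(n)])
    #  X w - c <=  y   encodes  X_i^T w - y_i <= c_i
    # -X w - c <= -y   encodes  y_i - X_i^T w <= c_i
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    # linprog defaults to nonnegative variables, so w must be declared free.
    bounds = [(None, None)] * d + [(0, None)] * n
    result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return result.x[:d]

# Hypothetical data: points on y = 1 + 2x with one large outlier.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[5] += 50.0

w_l1 = l1_regression(X, y)
```

On this data the L1 fit stays on the line through the nine clean points, whereas a least squares fit would be pulled toward the outlier.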

Page 35:

Spline Regression: regression on each interval

[figure: piecewise fit; x axis from 5200 to 5800, y axis from 50 to 70]

Page 36:

Spline Regression: with equality constraints

[figure: piecewise fit; x axis from 5200 to 5800, y axis from 50 to 70]

Page 37:

Spline Regression: with L1 cost

[figure: piecewise fit; x axis from 5200 to 5800, y axis from 50 to 70]

Page 38:

To learn more

• The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer