Advanced Lecture on Neural Information Processing Systems ...takeuchi/T/NIPm/NipM03_web.pdfAdvanced...

Advanced Lecture on Neural Information Processing Systems (Lecture 03) Ichiro Takeuchi Nagoya Institute of Technology Ichiro Takeuchi, Nagoya Institute of Technology 1/1


Advanced Lecture on

Neural Information Processing Systems

(Lecture 03)

Ichiro Takeuchi

Nagoya Institute of Technology


Nonlinear modeling

Consider training a model for the relationship between the elapsed time after a collision (x) and the passenger’s head acceleration (y).


Nonlinear modeling


Linear modeling is not helpful here


We want something like this

[Figure: acceleration [G] (axis from -150 to 100) versus time [ms] (axis from 0 to 60)]


Which nonlinear model should we use?

Consider the single-input case $x \in \mathbb{R}$, $y \in \mathbb{R}$:

▶ $y = w_1 \log x$

▶ $y = w_1 \sqrt{x} + w_2 \exp(-x^2)$

▶ $y = w_1 \cos 2\pi x + w_2 \sin 2\pi x^2 + w_3 \frac{1}{x}$

▶ $y = \log(w_1 + w_2 x)$

▶ $y = w_1 + x \exp(-w_2 x^2)$

▶ $y = \sin 2\pi(w_1 + w_2 x) + \cos 2\pi(w_3 + w_4 x)$

What’s the difference between the first 3 and the latter 3 models?
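One way to probe the question above is to test whether a model's output scales with its parameters, which is what linearity in the parameters requires for a model without a constant offset. The sketch below does this for the first and the fourth model in the list; the function names and the test point are assumptions made for illustration.

```python
import math

def model_first(x, w1):
    """First model in the list: y = w1 * log(x)."""
    return w1 * math.log(x)

def model_fourth(x, w1, w2):
    """Fourth model in the list: y = log(w1 + w2 * x)."""
    return math.log(w1 + w2 * x)

def is_homogeneous(model, x, w, scale=3.0, tol=1e-9):
    """Return True if scaling every parameter by `scale` scales the output by `scale`.

    This is a necessary condition for a model of the form sum_j w_j h_j(x)
    (no fixed offset) to be linear in its parameters.
    """
    base = model(x, *w)
    scaled = model(x, *[scale * wi for wi in w])
    return abs(scaled - scale * base) < tol

print(is_homogeneous(model_first, 2.0, [1.5]))        # parameters enter linearly
print(is_homogeneous(model_fourth, 2.0, [1.5, 0.5]))  # parameters sit inside the log
```

Models that pass this kind of check can be fitted with ordinary least squares; the others need a nonlinear optimizer.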


Basis function approach

▶ For the single-input case, i.e., when $x \in \mathbb{R}$, the basis function model is written as

$$y = f(x) = w_0 + w_1 h_1(x) + w_2 h_2(x) + \dots + w_q h_q(x),$$

where each $h_k$, $k = 1, \dots, q$, is a basis function.

▶ How can we estimate the parameters $w_0, w_1, \dots, w_q$ by the least squares method?

$$\min_{w_0 \in \mathbb{R},\, w \in \mathbb{R}^q} \sum_{i=1}^{n} \Big( y_i - \Big( w_0 + \sum_{j=1}^{q} w_j h_j(x_i) \Big) \Big)^2$$


Basis function approach as linear models

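▶ Original training set

$$\underset{n \times 1}{X} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

▶ Expanded training set

$$\underset{n \times (q+1)}{X} = \begin{bmatrix} 1 & h_1(x_1) & h_2(x_1) & \cdots & h_q(x_1) \\ 1 & h_1(x_2) & h_2(x_2) & \cdots & h_q(x_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & h_1(x_n) & h_2(x_n) & \cdots & h_q(x_n) \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$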
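The expanded training set can be built and fitted in a few lines. The following is a minimal sketch, assuming NumPy; the two basis functions and the noise-free toy data are hypothetical choices made only to show the mechanics.

```python
import numpy as np

def design_matrix(x, basis_functions):
    """Return the n x (q+1) matrix with rows [1, h_1(x_i), ..., h_q(x_i)]."""
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(x)] + [h(x) for h in basis_functions]
    return np.column_stack(columns)

def fit_least_squares(x, y, basis_functions):
    """Estimate (w_0, w_1, ..., w_q) by ordinary least squares on the expanded set."""
    H = design_matrix(x, basis_functions)
    w, *_ = np.linalg.lstsq(H, np.asarray(y, dtype=float), rcond=None)
    return w

# Hypothetical basis functions and toy data, for illustration only.
basis = [np.sqrt, lambda t: np.exp(-t ** 2)]
x = np.linspace(0.1, 3.0, 30)
y = 0.5 + 2.0 * np.sqrt(x) - 1.0 * np.exp(-x ** 2)

w_hat = fit_least_squares(x, y, basis)
print(np.round(w_hat, 6))
```

Because the data are generated without noise from the same basis, the estimate should recover the generating weights up to rounding.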


Basis function approach and linear model

▶ Basis function approach

$$y = f(x) = w_0 \cdot 1 + w_1 h_1(x) + w_2 h_2(x) + \dots + w_q h_q(x)$$

▶ Linear regression with multiple inputs

$$y = f(x) = w_0 \cdot 1 + w_1 x_1 + w_2 x_2 + \dots + w_q x_q$$


Which basis functions should we use?

▶ Radial basis function

$$h_k(x) = \exp\left( -\frac{(x - c_k)^2}{2\sigma^2} \right)$$

[Figure: basis function values $h_q(x)$ (axis from 0 to 1) versus input $x$ (axis from 0 to 100)]
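The radial basis function above can be evaluated for all inputs and all centers at once. This is a minimal sketch assuming NumPy; the function name `rbf_features`, the centers, and the width are illustrative choices, not values from the lecture.

```python
import numpy as np

def rbf_features(x, centers, sigma):
    """Evaluate h_k(x) = exp(-(x - c_k)^2 / (2 sigma^2)) for every input and center.

    Returns an array of shape (len(x), len(centers)).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    centers = np.asarray(centers, dtype=float).reshape(1, -1)
    return np.exp(-((x - centers) ** 2) / (2.0 * sigma ** 2))

# Illustrative values only: inputs on [0, 100] and five evenly spaced centers.
x = np.linspace(0.0, 100.0, 11)
centers = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
H = rbf_features(x, centers, sigma=10.0)
print(H.shape)
```

Each column of `H` is one bump centered at $c_k$; it equals 1 at the center and decays with distance at a rate set by $\sigma$.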


How to determine $q$, $\{c_k\}_{k=1}^{q}$, $\sigma^2$ in RBF

▶ Approach 1
  ▶ $q \leftarrow n$
  ▶ $c_k \leftarrow x_k$, $k = 1, \dots, q$
  ▶ $\sigma^2 \leftarrow$ cross-validation (explained later)

[Figure: basis function values $h_q(x)$ (axis from 0 to 1) versus input $x$ (axis from 0 to 100)]


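How to determine $q$, $\{c_k\}_{k=1}^{q}$, $\sigma^2$ in RBF

▶ Approach 2
  ▶ $q \leftarrow$ cross-validation
  ▶ $c_k \leftarrow$ the $\left(\frac{k}{n}\right)$th quantile of $\{x_i\}_{i=1}^{n}$
  ▶ $\sigma^2 \leftarrow$ cross-validation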

[Figure: basis function values $h_q(x)$ (axis from 0 to 1) versus input $x$ (axis from 0 to 100)]
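The two ways of placing the centers can be sketched as follows, assuming NumPy. Approach 1 copies the training inputs. For Approach 2 the slide writes the quantile level as $k/n$; this sketch instead uses evenly spaced levels $k/(q+1)$, which is an assumption of the example and not the lecture's definition. Both function names are hypothetical.

```python
import numpy as np

def centers_at_data(x):
    """Approach 1: one center per training input, so q = n."""
    return np.asarray(x, dtype=float).copy()

def centers_at_quantiles(x, q):
    """Approach 2 (sketch): q centers at the quantile levels k / (q + 1), k = 1..q.

    The evenly spaced levels are an assumption made for this example.
    """
    levels = np.arange(1, q + 1) / (q + 1)
    return np.quantile(np.asarray(x, dtype=float), levels)

x = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0])
print(centers_at_data(x))
print(centers_at_quantiles(x, q=3))
```

Quantile-based centers follow the density of the inputs, so regions with many observations receive more basis functions than sparse regions.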


RBF Approach for Collision Data

▶ If we select good hyper-parameters $(q, \{c_k\}_{k=1}^{q}, \sigma^2)$

[Figure: acceleration [G] (axis from -150 to 100) versus time [ms] (axis from 0 to 60)]


Overfitting

▶ If we do not select good hyper-parameters $(q, \{c_k\}_{k=1}^{q}, \sigma^2)$


Simulation Example for RBF

[Figure: four panels for q = 1, q = 10, q = 20, and q = 50, each showing the truth and the estimated function, output y (axis from -2 to 2) versus input x (axis from -1 to 1)]


Training Error and True Error

[Figure: training error and true error (axis from 0 to 0.7) versus the number of basis functions q = 1, 2, 4, 5, 10, 20, 40, 50]
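A comparison of this kind can be reproduced with a small simulation. The sketch below assumes NumPy, a sine-shaped ground truth, Gaussian noise, evenly spaced centers, and a fixed width; none of these settings are taken from the lecture, so the numbers will differ from the figure.

```python
import numpy as np

def rbf_design(x, centers, sigma):
    """Design matrix with an intercept column followed by Gaussian RBF columns."""
    diffs = x[:, None] - centers[None, :]
    return np.column_stack([np.ones_like(x), np.exp(-diffs ** 2 / (2.0 * sigma ** 2))])

def mean_squared_error(x, y, centers, sigma, w):
    """Mean squared error of the fitted RBF model on the points (x, y)."""
    return float(np.mean((y - rbf_design(x, centers, sigma) @ w) ** 2))

rng = np.random.default_rng(0)
truth = lambda t: np.sin(2.0 * np.pi * t)        # assumed ground truth
x_train = np.sort(rng.uniform(-1.0, 1.0, 50))
y_train = truth(x_train) + 0.3 * rng.standard_normal(50)
x_test = np.linspace(-1.0, 1.0, 500)             # dense grid to approximate the true error
y_test = truth(x_test)

errors = {}
for q in [1, 2, 4, 5, 10, 20, 40, 50]:
    centers = np.linspace(-1.0, 1.0, q)          # evenly spaced centers (an assumption)
    H = rbf_design(x_train, centers, sigma=0.1)
    w, *_ = np.linalg.lstsq(H, y_train, rcond=None)
    errors[q] = (mean_squared_error(x_train, y_train, centers, 0.1, w),
                 mean_squared_error(x_test, y_test, centers, 0.1, w))
    print(q, errors[q])
```

The first entry of each pair is the training error and the second approximates the true error on noise-free test targets.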


High dimensional problem

E.g., gene expression microarray

▶ $x_{ij}$: activity of the $j$th gene for the $i$th patient

▶ $y_i$: effectiveness of a medicine

$$y_i = f(x_i) = w_0 + w_1 x_{i1} + \dots + w_{10000} x_{i,10000}$$
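To see why many inputs are a problem, consider a sketch in which there are far fewer samples than inputs and the outputs are pure noise. It assumes NumPy, uses hypothetical sizes (n = 20, d = 200, much smaller than the 10000 genes above), and omits the intercept $w_0$ for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 200                       # hypothetical sizes with n much smaller than d
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)           # pure noise: there is nothing to learn

# Minimum-norm least-squares solution of the underdetermined problem.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
train_error = float(np.sum((y - X @ w) ** 2))

# Fresh data drawn from the same distribution.
X_new = rng.standard_normal((1000, d))
y_new = rng.standard_normal(1000)
new_error = float(np.mean((y_new - X_new @ w) ** 2))
print(train_error, new_error)
```

The linear model fits the training noise essentially exactly, yet its error on fresh data stays large, which is the overfitting that regularization is meant to control.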


How to avoid overfitting: Regularization

$$\min_{w \in \mathbb{R}^d} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} w_j^2 \le s$$
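The constraint can be traded for a penalty through a Lagrange multiplier. As a sketch, with the multiplier $\lambda \ge 0$ being a symbol introduced here:

```latex
L(w, \lambda) = \sum_{i=1}^{n} (y_i - w^\top x_i)^2
              + \lambda \Big( \sum_{j=1}^{d} w_j^2 - s \Big),
\qquad \lambda \ge 0 .
```

For a fixed $\lambda$ the term $-\lambda s$ does not depend on $w$, so minimizing $L$ over $w$ is the same as minimizing the squared error plus $\lambda \sum_{j} w_j^2$. A tighter bound $s$ corresponds to a larger (or equal) $\lambda$.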


Ridge regression

$$w^*_\lambda = \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \sum_{j=1}^{d} w_j^2,$$

where $\lambda > 0$ is the regularization parameter.


Simulation Example for Ridge Regression

[Figure: four panels for λ = 0, λ = 1.0, λ = 10, and λ = 100 (all with q = 50), each showing the truth and the estimated function, output y (axis from -2 to 2) versus input x (axis from -1 to 1)]


Solving Ridge regression

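▶ Training data

$$\underset{n \times d}{X} := \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nd} \end{bmatrix} = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{bmatrix}, \qquad \underset{n \times 1}{y} := \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

▶ Solution

$$w^*_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$$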
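The closed-form solution translates directly into code. The sketch below assumes NumPy and synthetic data; it solves the linear system rather than forming the inverse explicitly, which is a common numerical choice and not something the slide prescribes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Return w = (X^T X + lam I)^{-1} X^T y via a linear solve."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))     # synthetic data for illustration
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(30)

for lam in [0.1, 1.0, 10.0, 100.0]:
    w = ridge_fit(X, y, lam)
    # Gradient of the ridge objective up to a factor of 2; it vanishes at the solution.
    grad = X.T @ (X @ w - y) + lam * w
    print(lam, np.linalg.norm(w), np.linalg.norm(grad))
```

The printed weight norm shrinks as $\lambda$ grows, and the gradient norm stays at rounding level, which confirms that the formula gives a stationary point of the ridge objective.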


Model selection

▶ Example: how to select the regularization parameter λ

▶ Training error cannot be used for model selection because it cannot detect over-training (as we will see).


Training and validation data

[Figure: training data and validation data points, output y (axis from -2 to 2) versus input x (axis from -1 to 1)]


Training and validation data

[Figure: training error, true error, and validation error (axis from 0 to 0.7) versus the number of basis functions q = 1, 2, 4, 5, 10, 20, 40, 50]

▶ Training error decreases monotonically as q increases

▶ Validation error can be used as a proxy for the true error
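Selecting q with a single held-out validation set can be sketched as follows. The example assumes NumPy, sine-shaped toy data, evenly spaced centers, a fixed width, and a random split with a quarter of the points held out; all of these are illustrative assumptions.

```python
import numpy as np

def rbf_design(x, centers, sigma):
    """Design matrix with an intercept column followed by Gaussian RBF columns."""
    diffs = x[:, None] - centers[None, :]
    return np.column_stack([np.ones_like(x), np.exp(-diffs ** 2 / (2.0 * sigma ** 2))])

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 80)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(80)   # assumed toy data

# Hold out a random quarter of the points for validation.
perm = rng.permutation(80)
val_idx, train_idx = perm[:20], perm[20:]

val_error = {}
for q in [1, 2, 4, 5, 10, 20, 40, 50]:
    centers = np.linspace(-1.0, 1.0, q)
    H_train = rbf_design(x[train_idx], centers, sigma=0.1)
    w, *_ = np.linalg.lstsq(H_train, y[train_idx], rcond=None)
    residual = y[val_idx] - rbf_design(x[val_idx], centers, sigma=0.1) @ w
    val_error[q] = float(np.mean(residual ** 2))

best_q = min(val_error, key=val_error.get)
print(best_q, val_error[best_q])
```

The weights are fitted on the training part only, so the validation points play the role of unseen data when the candidates are compared.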


Cross-validation

[Figure: split of the data into training data and validation data for rounds R1 to R5]

▶ The model hyper-parameters ($q$, $\lambda$, etc.) are selected based on the average validation error.
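A minimal sketch of this selection rule for the ridge parameter $\lambda$, assuming NumPy, five folds, random fold assignment, and synthetic data; the helper names and the candidate grid are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution via a linear solve."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def kfold_cv_error(X, y, lam, n_folds=5, seed=0):
    """Average validation mean squared error over n_folds rounds."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False          # every point is validated exactly once
        w = ridge_fit(X[train_mask], y[train_mask], lam)
        errors.append(np.mean((y[val_idx] - X[val_idx] @ w) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 30))            # synthetic data for illustration
y = X[:, 0] - X[:, 1] + 0.5 * rng.standard_normal(40)

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = {lam: kfold_cv_error(X, y, lam) for lam in candidates}
best_lam = min(cv_errors, key=cv_errors.get)
print(best_lam, cv_errors[best_lam])
```

Using the same fold assignment for every candidate keeps the comparison between values of $\lambda$ fair.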


Cross-validation example

[Figure: five panels, Round 1 to Round 5, each showing the training data and the validation data of that round, output y (axis from -2 to 2) versus input x (axis from -1 to 1)]


Leave-one-out cross-validation (LOOCV)

[Figure: split of the data into training data and validation data for rounds R1, R2, ..., R n-1, R n]
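Leave-one-out cross-validation is the special case with n rounds and a single validation point per round. The sketch below applies it to ridge regression, assuming NumPy and synthetic data; it reports the sum of squared leave-one-out errors, and the function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution via a linear solve."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loocv_error(X, y, lam):
    """Sum of squared prediction errors, each point predicted by a model fitted without it."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i             # training set of round i: everything but point i
        w = ridge_fit(X[keep], y[keep], lam)
        total += (y[i] - X[i] @ w) ** 2
    return float(total)

rng = np.random.default_rng(3)
X = rng.standard_normal((25, 4))             # synthetic data for illustration
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.2 * rng.standard_normal(25)

for lam in [0.01, 1.0, 100.0]:
    print(lam, loocv_error(X, y, lam))
```

The loop refits the model n times, so the cost grows with the sample size; that is the price for using almost all of the data for training in every round.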


Final exercise I

Given the data $\{(x_i, y_i)\}_{i=1}^{n}$, consider a constant model that does not use the input $x$ (not useful in practice):

$$f(x) = w_0.$$

The parameter $w_0$ is estimated by solving the following minimization problem:

$$\arg\min_{w_0 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - f(x_i))^2 = \arg\min_{w_0 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - w_0)^2$$

▶ First, show that the optimal solution of the above problem is the sample mean, i.e.,

$$\arg\min_{w_0 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - w_0)^2 = \frac{1}{n} \sum_{i=1}^{n} y_i$$
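Before attempting the proof, the claim can be checked numerically. The sketch below uses plain Python with arbitrary example values and a coarse grid of candidates around the mean; it is a sanity check, not a substitute for the derivation.

```python
def sum_of_squares(y, w0):
    """Objective of the constant model: sum_i (y_i - w0)^2."""
    return sum((yi - w0) ** 2 for yi in y)

y = [2.0, 3.5, 1.0, 4.5, 4.0]            # arbitrary example values
mean = sum(y) / len(y)

# Grid of candidates from mean - 2 to mean + 2 in steps of 0.01.
candidates = [mean + step / 100.0 for step in range(-200, 201)]
best = min(candidates, key=lambda w0: sum_of_squares(y, w0))
print(mean, best)
```

The grid search returns the candidate that coincides with the sample mean, in line with the statement to be shown.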


Final exercise II

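▶ Next, confirm that the training error and the LOOCV error of the constant model are respectively written as

$$\mathrm{TrainEr} := \sum_{i=1}^{n} \Big( y_i - \arg\min_{w_0 \in \mathbb{R}} \sum_{j=1}^{n} (y_j - w_0)^2 \Big)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

$$\mathrm{LoocvEr} := \sum_{i=1}^{n} \Big( y_i - \arg\min_{w_0 \in \mathbb{R}} \sum_{j \ne i} (y_j - w_0)^2 \Big)^2 = \sum_{i=1}^{n} \Big( y_i - \frac{1}{n-1} \sum_{j \ne i} y_j \Big)^2,$$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ is the sample mean.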


Final exercise III

▶ Finally, show that the relation between these two errors is written as

$$\mathrm{LoocvEr} = \left( \frac{n}{n-1} \right)^2 \mathrm{TrainEr}.$$
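The relation can be checked on a small example before proving it. The sketch below uses plain Python and arbitrary example values; the function names are hypothetical, and both errors are computed as sums of squares for the constant model.

```python
def train_error(y):
    """Sum of squared deviations from the mean of all points."""
    mean = sum(y) / len(y)
    return sum((yi - mean) ** 2 for yi in y)

def loocv_error(y):
    """Sum of squared deviations of each point from the mean of the other n - 1 points."""
    n = len(y)
    total = 0.0
    for yi in y:
        mean_without_i = (sum(y) - yi) / (n - 1)
        total += (yi - mean_without_i) ** 2
    return total

y = [2.0, 3.5, 1.0, 4.5, 4.0]            # arbitrary example values
n = len(y)
ratio = loocv_error(y) / train_error(y)
print(ratio, (n / (n - 1)) ** 2)
```

Both printed numbers agree for this sample, and repeating the check with other values of y and n is a quick way to test a candidate proof.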
