Page 1
Source: ml.informatik.uni-freiburg.de/former/_media/teaching/ss...

Machine Learning: Exercise Sheet 5

Manuel Blum, AG Maschinelles Lernen und Natürlichsprachliche Systeme

Albert-Ludwigs-Universität Freiburg

[email protected]

Manuel Blum, Machine Learning Lab, University of Freiburg

Page 2

Exercise 1: k-Nearest Neighbor Classifier

A medical expert is going to build a case-based reasoning system for diagnosis tasks. Cases correspond to individual persons: the problem part of a case comprises a number of features describing possible symptoms, and the solution part represents the diagnosis (classification of disease). The case base contains the seven cases provided in the table below.

Training | Fever   | Vomiting | Diarrhea | Shivering | Classification
c1       | no      | no       | no       | no        | healthy (H)
c2       | average | no       | no       | no        | influenza (I)
c3       | high    | no       | no       | yes       | influenza (I)
c4       | high    | yes      | yes      | no        | salmonella poisoning (S)
c5       | average | no       | yes      | no        | salmonella poisoning (S)
c6       | no      | yes      | yes      | no        | bowel inflammation (B)
c7       | average | yes      | yes      | no        | bowel inflammation (B)

Page 3

Exercise 1: k-Nearest Neighbor Classifier

Moreover, the expert has specified a similarity measure reflecting his expertise, using local similarity measures and feature weights as specified in the figure below.

(a) Calculate the similarity between all cases from the case base and the query q = (high, no, no, no).

Page 4

Exercise 1: k-Nearest Neighbor Classifier

(a) Calculate the similarity between all cases from the case base and the query q = (high, no, no, no).

Case Representation

- Here, each case ci = (p, s) consists of a problem part p and a solution part s.

- An attribute-value-based case representation is used, with p = (pF, pV, pD, pSh), where pF ∈ {no, average, high} and pV, pD, pSh ∈ {yes, no}. For the solution part it holds that s ∈ {H, I, S, B} (classification with 4 classes).

- Thus, q = (qF, qV, qD, qSh) = (high, no, no, no).

Page 5

Exercise 1: k-Nearest Neighbor Classifier

(a) Calculate the similarity between all cases from the case base and the queryq = (high, no, no, no).

Similarity Assessment

- The similarity between q and all c ∈ CB = {ci | i = 1, …, 7} must be determined.

- It holds for all c ∈ CB:

  Sim(q, c) = ( Σ_{a ∈ {F,V,D,Sh}} w_a · sim_a(q_a, c.p_a) ) / ( Σ_{a ∈ {F,V,D,Sh}} w_a )

Page 6

Exercise 1: k-Nearest Neighbor Classifier

(a) Calculate the similarity between all cases from the case base and the queryq = (high, no, no, no).

Similarity Assessment

- Note that the weights here have already been normalized (they sum to 1), which is why the expression for the global weighted similarity simplifies to

  Sim(q, c) = Σ_{a ∈ {F,V,D,Sh}} w_a · sim_a(q_a, c.p_a)

Page 7

Exercise 1: k-Nearest Neighbor Classifier

(a) Calculate the similarity between all cases from the case base and the query q = (high, no, no, no).

- For c1 = ((no, no, no, no), H): Sim(q, c1) = 0.3 · 0.0 + 0.2 · 1.0 + 0.2 · 1.0 + 0.3 · 1.0 = 0.70
- For c2 = ((average, no, no, no), I): Sim(q, c2) = 0.3 · 0.3 + 0.2 · 1.0 + 0.2 · 1.0 + 0.3 · 1.0 = 0.79
- For c3 = ((high, no, no, yes), I): Sim(q, c3) = 0.3 · 1.0 + 0.2 · 1.0 + 0.2 · 1.0 + 0.3 · 0.2 = 0.76
- For c4 = ((high, yes, yes, no), S): Sim(q, c4) = 0.3 · 1.0 + 0.2 · 0.2 + 0.2 · 0.2 + 0.3 · 1.0 = 0.68
- For c5 = ((average, no, yes, no), S): Sim(q, c5) = 0.3 · 0.3 + 0.2 · 1.0 + 0.2 · 0.2 + 0.3 · 1.0 = 0.63
- For c6 = ((no, yes, yes, no), B): Sim(q, c6) = 0.3 · 0.0 + 0.2 · 0.2 + 0.2 · 0.2 + 0.3 · 1.0 = 0.38
- For c7 = ((average, yes, yes, no), B): Sim(q, c7) = 0.3 · 0.3 + 0.2 · 0.2 + 0.2 · 0.2 + 0.3 · 1.0 = 0.47

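The computations above can be reproduced in a short script. The local similarity values below are reconstructed from the worked numbers on the slides (the figure that specifies them is not part of this transcript), so treat them as assumptions; a minimal sketch:

```python
# Case base from the table; solution parts: H, I, S, B.
CASES = {
    "c1": ({"F": "no",      "V": "no",  "D": "no",  "Sh": "no"},  "H"),
    "c2": ({"F": "average", "V": "no",  "D": "no",  "Sh": "no"},  "I"),
    "c3": ({"F": "high",    "V": "no",  "D": "no",  "Sh": "yes"}, "I"),
    "c4": ({"F": "high",    "V": "yes", "D": "yes", "Sh": "no"},  "S"),
    "c5": ({"F": "average", "V": "no",  "D": "yes", "Sh": "no"},  "S"),
    "c6": ({"F": "no",      "V": "yes", "D": "yes", "Sh": "no"},  "B"),
    "c7": ({"F": "average", "V": "yes", "D": "yes", "Sh": "no"},  "B"),
}
WEIGHTS = {"F": 0.3, "V": 0.2, "D": 0.2, "Sh": 0.3}

def sim_fever(q, c):
    # reconstructed: high vs. average -> 0.3, high vs. no -> 0.0
    return 1.0 if q == c else {("high", "average"): 0.3}.get((q, c), 0.0)

def sim_binary(q, c):
    # reconstructed and asymmetric: query "no" vs. case "yes" -> 0.2,
    # query "yes" vs. case "no" -> 0.0
    if q == c:
        return 1.0
    return 0.2 if q == "no" else 0.0

LOCAL = {"F": sim_fever, "V": sim_binary, "D": sim_binary, "Sh": sim_binary}

def similarity(query, problem):
    # weighted average over the attributes that are known in the query
    known = [a for a in WEIGHTS if query[a] is not None]
    num = sum(WEIGHTS[a] * LOCAL[a](query[a], problem[a]) for a in known)
    return num / sum(WEIGHTS[a] for a in known)

q = {"F": "high", "V": "no", "D": "no", "Sh": "no"}
sims = {name: round(similarity(q, p), 2) for name, (p, _) in CASES.items()}
```

Since all attributes are known here and the weights sum to 1, the denominator is 1; it becomes relevant in task (b), where an attribute is missing.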

Page 10

Exercise 1: k-Nearest Neighbor Classifier

(b) Calculate the similarity between all cases from the case base and the query q = (?, yes, no, yes), where the question mark indicates that the value of the symptom Fever has not been determined yet.

- The calculation of Sim(q, c) now disregards the unknown attribute Fever, giving rise to

  Sim(q, c) = ( Σ_{a ∈ {V,D,Sh}} w_a · sim_a(q_a, c.p_a) ) / ( Σ_{a ∈ {V,D,Sh}} w_a )

Page 11

Exercise 1: k-Nearest Neighbor Classifier

We obtain for q = (?, yes, no, yes) (with renormalized weights w_V = w_D = 0.2/0.7 = 2/7 and w_Sh = 0.3/0.7 = 3/7):

- For c1 = ((no, no, no, no), H): Sim(q, c1) = 2/7 · 0.0 + 2/7 · 1.0 + 3/7 · 0.0 = 2/7 ≈ 0.29
- For c2 = ((average, no, no, no), I): Sim(q, c2) = 2/7 · 0.0 + 2/7 · 1.0 + 3/7 · 0.0 = 2/7 ≈ 0.29
- For c3 = ((high, no, no, yes), I): Sim(q, c3) = 2/7 · 0.0 + 2/7 · 1.0 + 3/7 · 1.0 = 5/7 ≈ 0.71
- For c4 = ((high, yes, yes, no), S): Sim(q, c4) = 2/7 · 1.0 + 2/7 · 0.2 + 3/7 · 0.0 ≈ 0.34
- For c5 = ((average, no, yes, no), S): Sim(q, c5) = 2/7 · 0.0 + 2/7 · 0.2 + 3/7 · 0.0 ≈ 0.06
- For c6 = ((no, yes, yes, no), B): Sim(q, c6) = 2/7 · 1.0 + 2/7 · 0.2 + 3/7 · 0.0 ≈ 0.34
- For c7 = ((average, yes, yes, no), B): Sim(q, c7) = 2/7 · 1.0 + 2/7 · 0.2 + 3/7 · 0.0 ≈ 0.34

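The renormalized weights follow directly from dropping Fever and dividing by the remaining weight mass; a quick numerical check:

```python
weights = {"F": 0.3, "V": 0.2, "D": 0.2, "Sh": 0.3}
known = [a for a in weights if a != "F"]          # Fever is unknown
total = sum(weights[a] for a in known)            # 0.7
renorm = {a: weights[a] / total for a in known}   # V, D -> 2/7; Sh -> 3/7

# e.g. case c3 = (high, no, no, yes) against q = (?, yes, no, yes):
# V mismatches (0.0), D matches (1.0), Sh matches (1.0)
sim_c3 = renorm["V"] * 0.0 + renorm["D"] * 1.0 + renorm["Sh"] * 1.0
```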

Page 14

Exercise 1: k-Nearest Neighbor Classifier

(c) Determine the nearest neighbors in (a) and (b) as well as the diagnosis the system outputs when using the k-NN method with k = 1.

- In task (a), the nearest neighbor is case c2 with Sim(q, c2) = 0.79.
- The corresponding diagnosis of the system is that the query person suffers from influenza.
- In task (b), where the value of the Fever attribute was not known, case c3 achieves maximal similarity (Sim(q, c3) ≈ 0.71).
- Here, the system's diagnosis is influenza as well.
- Of course, the absence of one (or more) attribute values reduces the reliability of the system's predictions.

Page 15

Exercise 2: Gradient Descent

(a) Execute four iterations of gradient descent with momentum to find the minimum of the function f(u) = u³/3 + 50u² − 100u − 30. Start with u = 20, use a learning rate ε = 0.01, and set the momentum parameter µ to 0.1.

- We know that ∂f(u)/∂u = u² + 100u − 100.

1: choose an initial point u
2: set Δ ← 0 (step width)
3: while ||grad f(u)|| is not close to 0 do
4:   Δ ← −ε · grad f(u) + µ · Δ
5:   u ← u + Δ
6: end while
7: return u
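The update rule above translates directly into code. A minimal sketch that tracks both the momentum run and a plain (vanilla, µ = 0) run for comparison:

```python
def grad_f(u):
    # derivative of f(u) = u**3/3 + 50*u**2 - 100*u - 30
    return u**2 + 100*u - 100

eps, mu = 0.01, 0.1
u, delta = 20.0, 0.0           # gradient descent with momentum
u_van = 20.0                   # vanilla gradient descent (mu = 0)
trace, trace_van = [u], [u_van]
for _ in range(4):
    delta = -eps * grad_f(u) + mu * delta
    u += delta
    trace.append(u)
    u_van += -eps * grad_f(u_van)
    trace_van.append(u_van)
```

Both runs approach the local minimum near u ≈ 0.99; the momentum run overshoots slightly because the accumulated step width carries it past the minimum.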

Page 16

Exercise 2: Gradient Descent

The function f(u) = u³/3 + 50u² − 100u − 30 ...

[Plot: f(u) on u ∈ [−300, 300]; gnuplot expression (x**3)*(1.0/3.0)+50*(x**2)-100*x-30]

Page 17

Exercise 2: Gradient Descent

The function ... and on [−20, 30]

[Plot: f(u) on u ∈ [−20, 30]]

Page 18

Exercise 2: Gradient Descent

- Extrema at u1 ≈ 0.99 (local minimum) and u2 ≈ −100.99 (local maximum)
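The two extrema are the roots of f'(u) = u² + 100u − 100 = 0; a quick check via the quadratic formula:

```python
import math

# roots of u**2 + 100*u - 100 = 0
disc = math.sqrt(100**2 + 4 * 100)
u_min = (-100 + disc) / 2   # local minimum: f''(u) = 2u + 100 > 0 here
u_max = (-100 - disc) / 2   # local maximum: f''(u) = 2u + 100 < 0 here
```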

[Plot: f(u) on u ∈ [−300, 300]]

Page 19

Exercise 2: Gradient Descent

- Trace (u_van and Δ_van denote the corresponding run of vanilla gradient descent without momentum)

1. u = 20; u_van = 20
2. Δ = 0
3. ||grad f(u)|| = ||20² + 100 · 20 − 100|| = 2300
4. Δ ← −ε · grad f(u) + µ · Δ = −0.01 · 2300 + 0.1 · 0 = −23; Δ_van = −23
5. u ← u + Δ = 20 + (−23) = −3; u_van = −3
3. grad f(u) = (−3)² + 100 · (−3) − 100 = −391
4. Δ ← −ε · grad f(u) + µ · Δ = −0.01 · (−391) + 0.1 · (−23) = 1.61; Δ_van = 3.91
5. u ← u + Δ = −3 + 1.61 = −1.39; u_van = −3 + 3.91 = 0.91

Page 20

Exercise 2: Gradient Descent

- Trace continued

3. grad f(u) = (−1.39)² + 100 · (−1.39) − 100 = −237.1; grad_van f(0.91) ≈ −8.0
4. Δ ← −ε · grad f(u) + µ · Δ = −0.01 · (−237.1) + 0.1 · 1.61 = 2.53; Δ_van = 0.08
5. u ← u + Δ = −1.39 + 2.53 = 1.14; u_van = 0.91 + 0.08 = 0.99
3. grad f(u) = 1.14² + 100 · 1.14 − 100 = 15.3; grad_van f(0.99) ≈ 0 → the vanilla run leaves the while loop
4. Δ ← −ε · grad f(u) + µ · Δ = −0.01 · 15.3 + 0.1 · 2.53 = 0.1
5. u ← u + Δ = 1.14 + 0.1 = 1.24

Page 21

Exercise 3: Multi-Layer Perceptrons

Examine the multi-layer perceptron given in the figure below with the weights in the accompanying table.

(a) Both neurons use the logistic activation function (u ↦ 1/(1 + e^(−u))). The network has a single input variable x and one output variable y. Calculate the output of both neurons and the error made by the MLP when applying a pattern with x = 0 and target value 0.5.

Page 22

Exercise 3: Multi-Layer Perceptrons

- Input: x = 0, target: d = 0.5
- Net input of neuron 2: net2 = w2,1 · x + w2,0 = −1 · 0 + 1 = 1
- Activation of neuron 2: a2 = f_sig(net2) ≈ 0.73
- Net input of neuron 3: net3 = w3,0 + w3,1 · a2 + w3,2 · x ≈ −2 + 1 · 0.73 + 2 · 0 = −1.27
- Output, i.e. activation of neuron 3: y = a3 = f_sig(net3) ≈ 0.22
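The forward pass can be checked numerically. The weight values (w2,0 = 1, w2,1 = −1, w3,0 = −2, w3,1 = 1 on a2, w3,2 = 2 on x) are read off the calculation above, since the original weight table is not in this transcript; a minimal sketch:

```python
import math

def f_sig(u):
    # logistic activation function
    return 1.0 / (1.0 + math.exp(-u))

x, d = 0.0, 0.5                  # input pattern and target value
w20, w21 = 1.0, -1.0             # neuron 2 (hidden): bias and input weight
w30, w31, w32 = -2.0, 1.0, 2.0   # neuron 3 (output): bias, a2 weight, x weight

net2 = w21 * x + w20
a2 = f_sig(net2)
net3 = w30 + w31 * a2 + w32 * x
y = a3 = f_sig(net3)
e = 0.5 * (a3 - d)**2            # squared error of the pattern
```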

Page 23

Exercise 3: Multi-Layer Perceptrons

Examine the multi-layer perceptron given in the figure below with the weights in the accompanying table.

(b) Calculate the partial derivatives of the error with respect to the weights for the pattern used in task (a).

Page 24

Exercise 3: Multi-Layer Perceptrons

- ∂e/∂y = ∂e/∂a3 = ∂(½(a3 − d)²)/∂a3 = a3 − d ≈ 0.22 − 0.5 = −0.28
- ∂e/∂net3 = ∂e/∂a3 · ∂a3/∂net3 = ∂e/∂a3 · a3 · (1 − a3) ≈ −0.28 · 0.22 · (1 − 0.22) ≈ −0.048
- ∂e/∂a2 = ∂e/∂net3 · ∂net3/∂a2 = ∂e/∂net3 · ∂(a2 + 2x − 2)/∂a2 ≈ −0.048 · 1 = −0.048
- ∂e/∂net2 = ∂e/∂a2 · ∂a2/∂net2 = ∂e/∂a2 · a2 · (1 − a2) ≈ −0.048 · 0.73 · (1 − 0.73) ≈ −0.0094

Page 25

Exercise 3: Multi-Layer Perceptrons

- ∂e/∂w3,0 = ∂e/∂net3 · ∂net3/∂w3,0 = ∂e/∂net3 · ∂(w3,0 + w3,1·a2 + w3,2·x)/∂w3,0 = ∂e/∂net3 · 1 ≈ −0.048
- ∂e/∂w3,1 = ∂e/∂net3 · ∂net3/∂w3,1 = ∂e/∂net3 · ∂(w3,0 + w3,1·a2 + w3,2·x)/∂w3,1 = ∂e/∂net3 · a2 ≈ −0.048 · 0.73 ≈ −0.035
- ∂e/∂w3,2 = ∂e/∂net3 · ∂net3/∂w3,2 = ∂e/∂net3 · ∂(w3,0 + w3,1·a2 + w3,2·x)/∂w3,2 = ∂e/∂net3 · x ≈ −0.048 · 0 = 0
- ∂e/∂w2,0 = ∂e/∂net2 · ∂net2/∂w2,0 = ∂e/∂net2 · ∂(w2,0 + w2,1·x)/∂w2,0 = ∂e/∂net2 · 1 ≈ −0.0094
- ∂e/∂w2,1 = ∂e/∂net2 · ∂net2/∂w2,1 = ∂e/∂net2 · ∂(w2,0 + w2,1·x)/∂w2,1 = ∂e/∂net2 · x ≈ −0.0094 · 0 = 0

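The chain-rule steps can be verified in code, using the same weight values assumed for the forward pass (the original weight table is not in this transcript). Because the gradients are computed from exact activations rather than rounded intermediates, the last digits differ slightly from the slide values:

```python
import math

def f_sig(u):
    return 1.0 / (1.0 + math.exp(-u))

x, d = 0.0, 0.5
w20, w21 = 1.0, -1.0             # neuron 2: bias, input weight
w30, w31, w32 = -2.0, 1.0, 2.0   # neuron 3: bias, a2 weight, x weight

# forward pass
a2 = f_sig(w21 * x + w20)
a3 = f_sig(w30 + w31 * a2 + w32 * x)

# backward pass (chain rule, logistic derivative a*(1-a))
de_da3 = a3 - d                  # d(0.5*(a3-d)**2)/da3
de_dnet3 = de_da3 * a3 * (1 - a3)
de_da2 = de_dnet3 * w31
de_dnet2 = de_da2 * a2 * (1 - a2)

grads = {
    "w3,0": de_dnet3 * 1.0,
    "w3,1": de_dnet3 * a2,
    "w3,2": de_dnet3 * x,
    "w2,0": de_dnet2 * 1.0,
    "w2,1": de_dnet2 * x,
}
```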

Page 26

Exercise 3: Multi-Layer Perceptrons

Now, consider the network structure of a multi-layer perceptron with 10 neurons given in the figure below. Each circle denotes a neuron, the arrows denote connections between neurons.

(c) Which of the neurons are input neurons, which ones are output neurons?

(d) How many layers does this MLP have? Which neurons belong to which layer?

[Figure: MLP with 10 neurons, labeled A through J, connected by arrows]

Page 27

Exercise 3: Multi-Layer Perceptrons

Now, consider the network structure of a multi-layer perceptron with 10 neurons given in the figure below. Each circle denotes a neuron, the arrows denote connections between neurons.

(c) Which of the neurons are input neurons, which ones are output neurons?

(d) How many layers does this MLP have? Which neurons belong to which layer?

[Figure: the same MLP annotated by layers — input: I, F; first hidden: H, B, A; second hidden: D, C; third hidden: J, G; output: E]

Page 28

Exercise 3: Multi-Layer Perceptrons

(e) Assume we are applying a pattern to the MLP. Give an order in which the neuron activations ai can be calculated.

[Figure: the MLP annotated by layers — input: I, F; hidden: H, B, A; hidden: D, C; hidden: J, G; output: E]

Possible order: I, F, H, B, A, D, C, J, G, E

- Determine the input values (I, F)
- Calculate the activations in the first hidden layer (H, B, A)
- Calculate the activations in the second hidden layer (D, C)
- Calculate the activations in the third hidden layer (J, G)
- Calculate the activations in the output layer (E)
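This layer-by-layer order generalizes to any feed-forward graph: any topological order of the connection graph works. A sketch using Kahn's algorithm; the edge set below is an assumption (full connectivity between consecutive layers), since the exact arrows exist only in the figure:

```python
from collections import deque

# assumed edges: every neuron feeds all neurons of the next layer
layers = [["I", "F"], ["H", "B", "A"], ["D", "C"], ["J", "G"], ["E"]]
edges = [(u, v) for a, b in zip(layers, layers[1:]) for u in a for v in b]

# Kahn's algorithm: repeatedly emit a node with no unprocessed predecessors
indeg = {n: 0 for layer in layers for n in layer}
succ = {n: [] for n in indeg}
for u, v in edges:
    indeg[v] += 1
    succ[u].append(v)
ready = deque(n for n, d in indeg.items() if d == 0)
order = []
while ready:
    n = ready.popleft()
    order.append(n)
    for v in succ[n]:
        indeg[v] -= 1
        if indeg[v] == 0:
            ready.append(v)
```

Any order produced this way (e.g. I, F, H, B, A, D, C, J, G, E) evaluates each neuron only after all of its inputs are available.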

Page 29

Exercise 3: Multi-Layer Perceptrons

(f) Which of the functions given by the plots below can be implemented by multi-layer perceptrons? The MLP should only contain neurons with logistic activation functions. (Note: the weights of the networks must be finite numbers.)

Page 30

Exercise 3: Multi-Layer Perceptrons

- TL: yes (see next slide)
- TR: no (the function computed by such an MLP must be differentiable)
- BL: no (since the logistic function only yields values between 0 and 1)
- BR: yes (a net with three hidden neurons and one output neuron can implement it)

Page 31

Exercise 3: Multi-Layer Perceptrons

- TL: yes, the network below can

[Figure: network with input x, two hidden neurons, and output y; extracted edge weights: 2, 1, 1, 1, −1, 5, 2, 2, 0.5, −0.5]

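The reason two hidden logistic neurons suffice for a bump-shaped function like TL: the difference of two shifted sigmoids is a bump. The shifts below are illustrative only, not the weights from the slide's network (which did not survive extraction cleanly):

```python
import math

def f_sig(u):
    # logistic activation function
    return 1.0 / (1.0 + math.exp(-u))

def bump(x):
    # rising edge at x = -2 minus rising edge at x = +2 -> bump around 0
    return f_sig(x + 2) - f_sig(x - 2)
```

An output neuron applied to a scaled and shifted version of this difference reproduces a TL-style curve.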

Page 32

Exercise 3: Multi-Layer Perceptrons

[Figure: the network again, with plots of the activations of hidden neuron 1 and hidden neuron 2 over x ∈ [−10, 10]]

Page 33

Exercise 3: Multi-Layer Perceptrons

[Figure: plots of the activations of hidden neuron 1 and hidden neuron 2, and of the entire net's output, over x ∈ [−10, 10]]
