-
Lecture Slides for
INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml
-
CHAPTER 12:
Local Models
-
Introduction
Divide the input space into local regions and learn simple (constant/linear) models in each patch
Unsupervised: Competitive, online clustering
Supervised: Radial-basis functions, mixture of experts
-
Competitive Learning
$$E(\{m_i\}_{i=1}^k \mid \mathcal{X}) = \sum_t \sum_i b_i^t \,\| x^t - m_i \|^2
\quad\text{where}\quad
b_i^t = \begin{cases} 1 & \text{if } \| x^t - m_i \| = \min_l \| x^t - m_l \| \\ 0 & \text{otherwise} \end{cases}$$

Batch k-means:
$$m_i = \frac{\sum_t b_i^t\, x^t}{\sum_t b_i^t}$$

Online k-means:
$$\Delta m_{ij} = -\eta\, \frac{\partial E^t}{\partial m_{ij}} = \eta\, b_i^t \,(x_j^t - m_{ij})$$
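As a concrete illustration of the online update, here is a minimal numpy sketch; the function name, learning rate, and epoch count are our own illustrative choices, not from the slides:

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: only the winning center moves toward each input."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].copy()  # init centers from data
    for _ in range(epochs):
        for x in rng.permutation(X):                      # present inputs one at a time
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # Delta m_ij = eta b_i^t (x_j - m_ij)
    return m
```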
-
Winner-take-all network
-
Adaptive Resonance Theory
Incremental; add a new cluster if not covered; defined by vigilance, ρ
(Carpenter and Grossberg, 1988)
$$b_i = \| x^t - m_i \| = \min_{l=1}^{k} \| x^t - m_l \|$$

$$\begin{cases} m_{k+1} \leftarrow x^t & \text{if } b_i > \rho \\ \Delta m_i = \eta\,(x^t - m_i) & \text{otherwise} \end{cases}$$
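A hedged sketch of this incremental rule, assuming Euclidean distance; art_cluster, eta, and seeding the first cluster with the first input are illustrative choices:

```python
import numpy as np

def art_cluster(X, rho, eta=0.1):
    """ART-style incremental clustering with vigilance rho."""
    centers = [X[0].astype(float).copy()]           # first input seeds the first cluster
    for x in X[1:]:
        d = np.linalg.norm(x - np.array(centers), axis=1)
        i = int(np.argmin(d))                       # closest existing cluster
        if d[i] > rho:
            centers.append(x.astype(float).copy())  # not covered: m_{k+1} <- x^t
        else:
            centers[i] += eta * (x - centers[i])    # covered: Delta m_i = eta (x^t - m_i)
    return np.array(centers)
```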
-
Self-Organizing Maps
Units have a defined neighborhood; m_i is "between" m_{i-1} and m_{i+1}, and all of them are updated together
One-dim map:
$$\Delta m_l = \eta\, e(l, i)\,(x^t - m_l)
\quad\text{where}\quad
e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(l-i)^2}{2\sigma^2} \right]$$
(Kohonen, 1990)
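A minimal sketch of one SOM update on a one-dimensional map; the function name and defaults are illustrative:

```python
import numpy as np

def som_update(m, x, eta=0.1, sigma=1.0):
    """One online update of a 1-D map m (H x d) for a single input x."""
    i = np.argmin(np.linalg.norm(x - m, axis=1))      # index of the winning unit
    l = np.arange(len(m))
    e = np.exp(-(l - i) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    m += eta * e[:, None] * (x - m)                   # Delta m_l = eta e(l,i)(x^t - m_l)
    return m
```

In practice both eta and sigma are decreased over time, so the map first orders globally and then fine-tunes locally.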
-
Radial-Basis Functions
Locally-tuned units:
$$p_h^t = \exp\!\left[ -\frac{\| x^t - m_h \|^2}{2 s_h^2} \right]$$

$$y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0$$
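The two equations translate directly into a forward pass; this numpy sketch assumes Gaussian units, and all names are ours:

```python
import numpy as np

def rbf_forward(X, m, s, w, w0):
    """X: (N,d) inputs, m: (H,d) centers, s: (H,) spreads, w: (H,) weights."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # ||x^t - m_h||^2, (N,H)
    p = np.exp(-d2 / (2 * s ** 2))                           # locally tuned units p_h^t
    return p @ w + w0                                        # y^t = sum_h w_h p_h^t + w_0
```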
-
Local vs Distributed Representation
-
Training RBF
Hybrid learning:
First layer centers and spreads:
Unsupervised k-means
Second layer weights: Supervised gradient-descent
Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
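A hedged sketch of the hybrid scheme for a single output: centers by a few batch k-means iterations, spreads from cluster scatter, and the second layer by linear least squares (which, for this linear layer, reaches the solution gradient descent would converge to). All names and defaults are illustrative:

```python
import numpy as np

def train_rbf(X, r, H, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=H, replace=False)].copy()
    for _ in range(iters):                                    # unsupervised k-means
        lab = np.argmin(((X[:, None] - m[None]) ** 2).sum(2), axis=1)
        for h in range(H):
            if np.any(lab == h):
                m[h] = X[lab == h].mean(axis=0)
    s = np.array([X[lab == h].std() if np.any(lab == h) else 1.0
                  for h in range(H)]) + 1e-6                  # spread per cluster
    P = np.exp(-((X[:, None] - m[None]) ** 2).sum(2) / (2 * s ** 2))
    P1 = np.hstack([P, np.ones((len(X), 1))])                 # bias column for w_0
    w = np.linalg.lstsq(P1, r, rcond=None)[0]                 # supervised second layer
    return m, s, w[:-1], w[-1]
```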
-
Regression
$$E(\{m_h, s_h, w_{ih}\}_h \mid \mathcal{X}) = \frac{1}{2} \sum_t \sum_i \left[ r_i^t - y_i^t \right]^2
\quad\text{where}\quad
y_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}$$

$$\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t)\, p_h^t$$

$$\Delta m_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t)\, w_{ih} \right] p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$

$$\Delta s_h = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t)\, w_{ih} \right] p_h^t\, \frac{\| x^t - m_h \|^2}{s_h^3}$$
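A minimal numpy sketch of one such gradient step for a single-output RBF regressor; eta and the function name are illustrative:

```python
import numpy as np

def rbf_grad_step(X, r, m, s, w, w0, eta=0.01):
    d = X[:, None, :] - m[None, :, :]                 # x^t - m_h, shape (N,H,dim)
    p = np.exp(-(d ** 2).sum(2) / (2 * s ** 2))       # p_h^t, shape (N,H)
    err = r - (p @ w + w0)                            # r^t - y^t
    w += eta * p.T @ err                              # Delta w_h
    w0 += eta * err.sum()
    g = (err[:, None] * w[None, :]) * p               # shared factor (r-y) w_h p_h
    m += eta * (g[:, :, None] * d / s[None, :, None] ** 2).sum(0)  # Delta m_hj
    s += eta * (g * (d ** 2).sum(2) / s ** 3).sum(0)               # Delta s_h
    return m, s, w, w0
```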
-
Classification
$$E(\{m_h, s_h, w_{ih}\}_h \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$$

$$y_i^t = \frac{\exp\!\left( \sum_h w_{ih}\, p_h^t + w_{i0} \right)}{\sum_k \exp\!\left( \sum_h w_{kh}\, p_h^t + w_{k0} \right)}$$
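A short sketch of the softmax output layer on top of the RBF units p (shape (N,H)); w is a (K,H) weight matrix, w0 a (K,) bias, and the names are ours:

```python
import numpy as np

def rbf_classify(p, w, w0):
    o = p @ w.T + w0                          # per-class scores sum_h w_ih p_h^t + w_i0
    o -= o.max(axis=1, keepdims=True)         # subtract max to stabilize exp
    e = np.exp(o)
    return e / e.sum(axis=1, keepdims=True)   # posterior probabilities y_i^t
```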
-
Rules and Exceptions

$$y^t = \sum_{h=1}^{H} w_h\, p_h^t + v^T x^t + v_0$$

The linear term $v^T x^t + v_0$ is the default rule; the Gaussian units $p_h^t$ model the exceptions.
-
Rule-Based Knowledge
Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules
IF $(x_1 \approx a$ AND $x_2 \approx b)$ OR $(x_3 \approx c)$ THEN $y \approx 0.1$

$$p_1 = \exp\!\left[ -\frac{(x_1 - a)^2}{2 s_1^2} \right] \cdot \exp\!\left[ -\frac{(x_2 - b)^2}{2 s_2^2} \right] \quad\text{with } w_1 = 0.1$$

$$p_2 = \exp\!\left[ -\frac{(x_3 - c)^2}{2 s_3^2} \right] \quad\text{with } w_2 = 0.1$$
-
Normalized Basis Functions
$$g_h^t = \frac{p_h^t}{\sum_l p_l^t} = \frac{\exp\!\left[ -\| x^t - m_h \|^2 / 2 s_h^2 \right]}{\sum_l \exp\!\left[ -\| x^t - m_l \|^2 / 2 s_l^2 \right]}$$

$$y_i^t = \sum_{h=1}^{H} w_{ih}\, g_h^t$$

$$\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t)\, g_h^t \qquad
\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih} - y_i^t)\, g_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}$$
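A sketch of the normalized forward pass; because the g_h^t sum to one, the output is a weighted average of the w_h. Names are illustrative:

```python
import numpy as np

def normalized_rbf(X, m, s, w):
    d2 = ((X[:, None] - m[None]) ** 2).sum(2)
    p = np.exp(-d2 / (2 * s ** 2))
    g = p / p.sum(axis=1, keepdims=True)      # g_h^t = p_h^t / sum_l p_l^t
    return g @ w                              # y^t = sum_h w_h g_h^t
```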
-
Competitive Basis Functions
Mixture model:
$$p(\mathbf{r}^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t)\, p(\mathbf{r}^t \mid h, x^t)$$

$$g_h^t \equiv p(h \mid x^t) = \frac{p(h)\, p(x^t \mid h)}{\sum_l p(l)\, p(x^t \mid l)} = \frac{a_h \exp\!\left[ -\| x^t - m_h \|^2 / 2 s_h^2 \right]}{\sum_l a_l \exp\!\left[ -\| x^t - m_l \|^2 / 2 s_l^2 \right]}$$
-
Regression

$$p(\mathbf{r}^t \mid x^t) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(r_i^t - y_i^t)^2}{2\sigma^2} \right]$$

$$\mathcal{L}(\{m_h, s_h, w_{ih}\}_h \mid \mathcal{X}) = \sum_t \log \sum_h g_h^t \exp\!\left[ -\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2 \right]$$

where $y_{ih}^t = w_{ih}$ is the constant fit, and

$$f_h^t = p(h \mid x^t, \mathbf{r}^t) = \frac{p(h \mid x^t)\, p(\mathbf{r}^t \mid h, x^t)}{\sum_l p(l \mid x^t)\, p(\mathbf{r}^t \mid l, x^t)} = \frac{g_h^t \exp\!\left[ -\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2 \right]}{\sum_l g_l^t \exp\!\left[ -\frac{1}{2} \sum_i (r_i^t - y_{il}^t)^2 \right]}$$

$$\Delta w_{ih} = \eta \sum_t f_h^t\,(r_i^t - y_{ih}^t) \qquad
\Delta m_{hj} = \eta \sum_t (f_h^t - g_h^t)\,\frac{x_j^t - m_{hj}}{s_h^2}$$
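A hedged one-step sketch for a single output with constant fits y_h = w_h; spreads are kept fixed for brevity, and all names are illustrative:

```python
import numpy as np

def competitive_rbf_step(X, r, m, s, w, eta=0.01):
    d2 = ((X[:, None] - m[None]) ** 2).sum(2)
    g = np.exp(-d2 / (2 * s ** 2))
    g /= g.sum(1, keepdims=True)                             # g_h^t
    f = g * np.exp(-0.5 * (r[:, None] - w[None, :]) ** 2)    # g_h exp[-(r - y_h)^2 / 2]
    f /= f.sum(1, keepdims=True)                             # posterior f_h^t
    w += eta * (f * (r[:, None] - w[None, :])).sum(0)        # Delta w_h
    m += eta * ((f - g)[:, :, None] * (X[:, None] - m[None])
                / s[None, :, None] ** 2).sum(0)              # Delta m_hj
    return m, w
```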
-
Classification
$$\mathcal{L}(\{m_h, s_h, w_{ih}\}_h \mid \mathcal{X}) = \sum_t \log \sum_h g_h^t \prod_i (y_{ih}^t)^{r_i^t} = \sum_t \log \sum_h g_h^t \exp\!\left[ \sum_i r_i^t \log y_{ih}^t \right]$$

$$y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}} \qquad
f_h^t = \frac{g_h^t \exp\!\left[ \sum_i r_i^t \log y_{ih}^t \right]}{\sum_l g_l^t \exp\!\left[ \sum_i r_i^t \log y_{il}^t \right]}$$
-
EM for RBF (Supervised EM)
E-step:
$$f_h^t \equiv p(h \mid x^t, \mathbf{r}^t)$$

M-step:
$$m_h = \frac{\sum_t f_h^t\, x^t}{\sum_t f_h^t} \qquad
S_h = \frac{\sum_t f_h^t\,(x^t - m_h)(x^t - m_h)^T}{\sum_t f_h^t} \qquad
w_{ih} = \frac{\sum_t f_h^t\, r_i^t}{\sum_t f_h^t}$$
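A hedged EM sketch for scalar targets, simplified to isotropic spreads instead of the full covariance S_h; all names and defaults are illustrative:

```python
import numpy as np

def em_rbf(X, r, H, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=H, replace=False)].copy()
    s = np.full(H, X.std() + 1e-6)
    w = np.full(H, r.mean())
    for _ in range(iters):
        d2 = ((X[:, None] - m[None]) ** 2).sum(2)
        logp = -d2 / (2 * s ** 2) - 0.5 * (r[:, None] - w[None, :]) ** 2
        f = np.exp(logp - logp.max(1, keepdims=True))
        f /= f.sum(1, keepdims=True)                  # E-step: f_h^t = p(h | x^t, r^t)
        n = f.sum(0)
        m = (f.T @ X) / n[:, None]                    # M-step: weighted means
        d2 = ((X[:, None] - m[None]) ** 2).sum(2)
        s = np.sqrt((f * d2).sum(0) / (n * X.shape[1]) + 1e-9)  # isotropic spreads
        w = (f * r[:, None]).sum(0) / n               # weighted fits w_h
    return m, s, w
```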
-
Learning Vector Quantization
H units per class prelabeled (Kohonen, 1990)
Given $x^t$, $m_i$ is the closest prototype:

$$\Delta m_i = \begin{cases} \eta\,(x^t - m_i) & \text{if } x^t \text{ and } m_i \text{ have the same class label} \\ -\eta\,(x^t - m_i) & \text{otherwise} \end{cases}$$
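An LVQ1-style sketch of this rule: the closest prototype is attracted when the labels agree and repelled otherwise. Names are illustrative:

```python
import numpy as np

def lvq_step(m, m_label, x, y, eta=0.05):
    """m: prototype matrix, m_label: their class labels, (x, y): one labeled input."""
    i = np.argmin(np.linalg.norm(x - m, axis=1))   # closest prototype m_i
    sign = 1.0 if m_label[i] == y else -1.0        # same class: attract; else: repel
    m[i] += sign * eta * (x - m[i])
    return m
```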
-
Mixture of Experts
In RBF, each local fit is a constant, $w_{ih}$, a second-layer weight
In MoE, each local fit is a linear function of x, a local expert:

$$w_{ih}^t = v_{ih}^T x^t$$

(Jacobs et al., 1991)
-
MoE as Models Combined
Radial gating:
$$g_h^t = \frac{\exp\!\left[ -\| x^t - m_h \|^2 / 2 s_h^2 \right]}{\sum_l \exp\!\left[ -\| x^t - m_l \|^2 / 2 s_l^2 \right]}$$

Softmax gating:
$$g_h^t = \frac{\exp\!\left( m_h^T x^t \right)}{\sum_l \exp\!\left( m_l^T x^t \right)}$$
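A sketch of a full MoE forward pass with softmax gating and linear experts; all names are ours:

```python
import numpy as np

def moe_forward(X, V, M):
    """X: (N,d) inputs, V: (H,d) expert weights, M: (H,d) gating weights."""
    o = X @ M.T                                   # gating scores m_h^T x^t
    o -= o.max(axis=1, keepdims=True)             # stabilize the softmax
    g = np.exp(o)
    g /= g.sum(axis=1, keepdims=True)             # g_h^t
    y_h = X @ V.T                                 # expert outputs w_h^t = v_h^T x^t
    return (g * y_h).sum(axis=1)                  # y^t = sum_h g_h^t w_h^t
```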
-
Cooperative MoE
Regression
$$E(\{m_h, s_h, v_{ih}\}_h \mid \mathcal{X}) = \frac{1}{2} \sum_t \sum_i \left( r_i^t - y_i^t \right)^2$$

$$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t)\, g_h^t\, x^t$$

$$\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih}^t - y_i^t)\, g_h^t\, x_j^t$$
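A hedged single-output sketch of one cooperative gradient step with softmax gating; eta and the names are illustrative:

```python
import numpy as np

def coop_moe_step(X, r, V, M, eta=0.01):
    o = X @ M.T
    o -= o.max(1, keepdims=True)
    g = np.exp(o); g /= g.sum(1, keepdims=True)       # gating g_h^t
    yh = X @ V.T                                      # expert fits w_h^t
    y = (g * yh).sum(1)                               # combined output y^t
    err = r - y
    V += eta * (err[:, None] * g).T @ X               # Delta v_h
    M += eta * (err[:, None] * (yh - y[:, None]) * g).T @ X   # Delta m_h
    return V, M
```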
-
Competitive MoE: Regression
$$\mathcal{L}(\{m_h, s_h, v_{ih}\}_h \mid \mathcal{X}) = \sum_t \log \sum_h g_h^t \exp\!\left[ -\frac{1}{2} \sum_i (r_i^t - y_{ih}^t)^2 \right]
\quad\text{where}\quad y_{ih}^t = v_{ih}^T x^t$$

$$\Delta v_{ih} = \eta \sum_t f_h^t\,(r_i^t - y_{ih}^t)\, x^t \qquad
\Delta m_h = \eta \sum_t (f_h^t - g_h^t)\, x^t$$
-
Competitive MoE: Classification
$$\mathcal{L}(\{m_h, s_h, v_{ih}\}_h \mid \mathcal{X}) = \sum_t \log \sum_h g_h^t \prod_i (y_{ih}^t)^{r_i^t} = \sum_t \log \sum_h g_h^t \exp\!\left[ \sum_i r_i^t \log y_{ih}^t \right]$$

$$y_{ih}^t = \frac{\exp\!\left( v_{ih}^T x^t \right)}{\sum_k \exp\!\left( v_{kh}^T x^t \right)}$$
-
Hierarchical Mixture of Experts
Tree of MoE where each MoE is an expert in a higher-level MoE
Soft decision tree: Takes a weighted (gating) average of all leaves (experts), as opposed to using a single path and a single leaf
Can be trained using EM (Jordan and Jacobs, 1994)