Internet Engineering
Jacek Mazurkiewicz, PhD

Softcomputing
Part 13: Radial Basis Function Networks
Contents
• Overview
• The Models of Function Approximator
• The Radial Basis Function Networks
• RBFN's for Function Approximation
• The Projection Matrix
• Learning the Kernels
• Bias-Variance Dilemma
• The Effective Number of Parameters
• Model Selection
RBF
• Linear models have been studied in statistics for about 200 years, and the theory applies to RBF networks, which are just one particular type of linear model.
• However, the fashion for neural networks that started in the mid-1980s has given rise to new names for concepts already familiar to statisticians.
Typical Applications of NN
• Pattern Classification: $l = f(\mathbf{x})$, $\mathbf{x} \in X \subset R^n$, $l \in C = \{1, \ldots, N\}$
• Function Approximation: $\mathbf{y} = f(\mathbf{x})$, $\mathbf{x} \in X \subset R^n$, $\mathbf{y} \in Y \subset R^m$
• Time-Series Forecasting: $\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{x}_{t-2}, \mathbf{x}_{t-3}, \ldots)$
Function Approximation
• Unknown function: $f : X \to Y$, $X \subset R^m$, $Y \subset R^n$
• Approximator: $\hat{f} : X \to Y$
Linear Models

$$f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$$

where the $\phi_i$ are fixed basis functions and the $w_i$ are weights.
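A minimal sketch of such a linear model, with three fixed 1-D Gaussian basis functions; the centers, width, and weight values below are arbitrary illustrative choices:

```python
import numpy as np

# A linear model in the RBF sense: fixed basis functions, trainable weights.
centers = np.array([-1.0, 0.0, 1.0])   # fixed basis centers (illustrative)
sigma = 0.5                            # fixed common width (illustrative)

def phi(x):
    """Evaluate the m fixed Gaussian basis functions at a scalar input x."""
    return np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))

def f(x, w):
    """Linearly weighted output: f(x) = sum_i w_i * phi_i(x)."""
    return np.dot(w, phi(x))

w = np.array([0.5, 1.0, -0.5])         # only these weights are learned
print(f(0.0, w))
```

Only the weights enter linearly, which is what makes the closed-form training discussed later possible.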
Linear Models

Network topology: the inputs $x_1, \ldots, x_n$ (feature vectors) feed $m$ hidden units $\phi_1, \ldots, \phi_m$, whose linearly weighted outputs with weights $w_1, \ldots, w_m$ give

$$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$$

The hidden layer performs decomposition, feature extraction, and transformation.
Example Linear Models

$$f(x) = \sum_i w_i \, \phi_i(x)$$

• Polynomial: $\phi_i(x) = x^i$, $i = 0, 1, 2, \ldots$
• Fourier Series: $\phi_k(x) = \exp(j 2\pi k \omega_0 x)$, $k = 0, 1, 2, \ldots$
Single-Layer Perceptrons as Universal Approximators

With a sufficient number of sigmoidal hidden units $\phi_i$, the network $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$ can be a universal approximator.
Radial Basis Function Networks as Universal Approximators

With a sufficient number of radial-basis-function units $\phi_i$, the network $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$ can also be a universal approximator.
Non-Linear Models

$$f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$$

Here both the weights $w_i$ and the basis functions $\phi_i$ are adjusted by the learning process.
Radial Basis Functions

Three parameters for a radial function $\phi_i(\mathbf{x}) = \phi(\|\mathbf{x} - \mathbf{x}_i\|)$:
• Center: $\mathbf{x}_i$
• Distance measure: $r = \|\mathbf{x} - \mathbf{x}_i\|$
• Shape: $\phi$
Typical Radial Functions
• Gaussian: $\phi(r) = e^{-r^2 / 2\sigma^2}$, $\sigma > 0$ and $r \in R$
• Hardy's Multiquadric (1971): $\phi(r) = \sqrt{r^2 + c^2}$, $c > 0$ and $r \in R$
• Inverse Multiquadric: $\phi(r) = c \,/ \sqrt{r^2 + c^2}$, $c > 0$ and $r \in R$
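The three radial functions can be sketched as follows; the width $\sigma$ and shape parameter $c$ are illustrative defaults, and the inverse multiquadric is written with $c$ in the numerator so that $\phi(0) = 1$:

```python
import numpy as np

# Each function depends on the input only through the distance r = ||x - x_i||.
def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

def multiquadric(r, c=1.0):            # Hardy's multiquadric (grows with r)
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c=1.0):    # normalized so phi(0) = 1
    return c / np.sqrt(r ** 2 + c ** 2)

print(gaussian(0.0), multiquadric(0.0), inverse_multiquadric(0.0))
```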
Gaussian Basis Function ($\sigma = 0.5, 1.0, 1.5$)

[Figure: the Gaussian $\phi(r) = e^{-r^2/2\sigma^2}$ plotted for $\sigma = 0.5$, $1.0$, $1.5$.]
Inverse Multiquadric

[Figure: $\phi(r) = c / \sqrt{r^2 + c^2}$ plotted for $c = 1, \ldots, 5$ over $r \in [-10, 10]$; values range from 0 to 1.]
Most General RBF

$$\phi_i(\mathbf{x}) = \phi\!\left( (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)$$

with centers $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\mu}_3, \ldots$ The basis $\{\phi_i : i = 1, 2, \ldots\}$ is 'nearly' orthogonal.
Properties of RBF's
• On-center, off-surround.
• Analogies with localized receptive fields found in several biological structures, e.g.:
– visual cortex;
– ganglion cells.
The Topology of RBF

As a function approximator: the inputs $x_1, \ldots, x_n$ (feature vectors) are projected onto the hidden units, and the output units $y_1, \ldots, y_m$ interpolate.
The Topology of RBF

As a pattern classifier: the hidden units capture subclasses, and the output units represent classes.
• Radial basis function (RBF) networks are feed-forward networks trained using a supervised training algorithm.
• The activation function is selected from a class of functions called basis functions.
• They usually train much faster than BP networks.
• They are less susceptible to problems with non-stationary inputs.
• Popularized by Broomhead and Lowe (1988), and Moody and Darken (1989), RBF networks have proven to be a useful neural network architecture.
• The major difference between RBF and BP is the behavior of the single hidden layer.
• Rather than using the sigmoidal or S-shaped activation function as in BP, the hidden units in RBF networks use a Gaussian or some other basis kernel function.
The idea

[Figure: training data sampled from an unknown function $y(x)$ that is to be approximated.]
The idea

[Figure: basis functions (kernels) placed along $x$; the function learned is $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$.]
The idea

[Figure: the learned function $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$ evaluated at a non-training sample, with the kernels shown.]
The idea

[Figure: the learned function evaluated at a non-training sample.]
Radial Basis Function Networks as Universal Approximators

$$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x})$$

Training set: $T = \left\{ \left( \mathbf{x}^{(k)}, y^{(k)} \right) \right\}_{k=1}^{p}$

Goal: $y^{(k)} \approx f(\mathbf{x}^{(k)})$ for all $k$:

$$\min \; SSE = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 = \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x}^{(k)}) \right)^2$$
Learn the Optimal Weight Vector

Training set: $T = \left\{ \left( \mathbf{x}^{(k)}, y^{(k)} \right) \right\}_{k=1}^{p}$; goal: $y^{(k)} \approx f(\mathbf{x}^{(k)}) = \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x}^{(k)})$ for all $k$:

$$\min \; SSE = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 = \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i \, \phi_i(\mathbf{x}^{(k)}) \right)^2$$
Regularization

$$\min \; C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2, \qquad \lambda_i \ge 0$$

If regularization is unneeded, set $\lambda_i = 0$.
Learn the Optimal Weight Vector

Minimize $C = \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$:

$$\frac{\partial C}{\partial w_j} = -2 \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right) \frac{\partial f(\mathbf{x}^{(k)})}{\partial w_j} + 2 \lambda_j w_j = -2 \sum_{k=1}^{p} \left( y^{(k)} - f(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) + 2 \lambda_j w_j = 0$$

$$\Rightarrow \quad \sum_{k=1}^{p} f^*(\mathbf{x}^{(k)}) \, \phi_j(\mathbf{x}^{(k)}) + \lambda_j w_j^* = \sum_{k=1}^{p} y^{(k)} \phi_j(\mathbf{x}^{(k)})$$
Learn the Optimal Weight Vector

Define $\boldsymbol{\phi}_j = \left( \phi_j(\mathbf{x}^{(1)}), \ldots, \phi_j(\mathbf{x}^{(p)}) \right)^T$, $\mathbf{f}^* = \left( f^*(\mathbf{x}^{(1)}), \ldots, f^*(\mathbf{x}^{(p)}) \right)^T$, $\mathbf{y} = \left( y^{(1)}, \ldots, y^{(p)} \right)^T$.

Then $\sum_{k=1}^{p} f^*(\mathbf{x}^{(k)}) \, \phi_j(\mathbf{x}^{(k)}) + \lambda_j w_j^* = \sum_{k=1}^{p} y^{(k)} \phi_j(\mathbf{x}^{(k)})$ becomes

$$\boldsymbol{\phi}_j^T \mathbf{f}^* + \lambda_j w_j^* = \boldsymbol{\phi}_j^T \mathbf{y}, \qquad j = 1, \ldots, m$$
Learn the Optimal Weight Vector

Stacking the $m$ equations $\boldsymbol{\phi}_j^T \mathbf{f}^* + \lambda_j w_j^* = \boldsymbol{\phi}_j^T \mathbf{y}$, $j = 1, \ldots, m$, and defining

$$\boldsymbol{\Phi} = \left( \boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_m \right), \qquad \boldsymbol{\Lambda} = \mathrm{diag}\left( \lambda_1, \lambda_2, \ldots, \lambda_m \right), \qquad \mathbf{w}^* = \left( w_1^*, w_2^*, \ldots, w_m^* \right)^T$$

gives

$$\boldsymbol{\Phi}^T \mathbf{f}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$$
Learn the Optimal Weight Vector

Since $f^*(\mathbf{x}^{(k)}) = \sum_{i=1}^{m} w_i^* \, \phi_i(\mathbf{x}^{(k)})$, the vector $\mathbf{f}^* = \left( f^*(\mathbf{x}^{(1)}), \ldots, f^*(\mathbf{x}^{(p)}) \right)^T$ can be written as $\mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$, so

$$\boldsymbol{\Phi}^T \boldsymbol{\Phi} \mathbf{w}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$$
Learn the Optimal Weight Vector

$$\left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda} \right) \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y} \quad \Rightarrow \quad \mathbf{w}^* = \left( \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda} \right)^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$$

$\mathbf{A}^{-1}$: variance matrix; $\boldsymbol{\Phi}$: design matrix.
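A minimal sketch of this closed-form solution for a 1-D RBFN with Gaussian kernels; the target function, centers, width, and regularization constant below are illustrative assumptions:

```python
import numpy as np

# Noisy samples of an illustrative target function.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(50)

# Fixed Gaussian kernels -> design matrix Phi of shape (p, m).
centers = np.linspace(-1, 1, 7)
sigma = 0.3
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

# w* = (Phi^T Phi + Lambda)^{-1} Phi^T y, with Lambda = lam * I here.
lam = 1e-3
A = Phi.T @ Phi + lam * np.eye(len(centers))
w = np.linalg.solve(A, Phi.T @ y)

sse = np.sum((y - Phi @ w) ** 2)
print(sse)
```

Solving the linear system directly (rather than inverting `A`) is the numerically preferable way to apply the formula.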
The Empirical-Error Vector

$$\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$$

[Figure: the network with kernels $\phi_1, \ldots, \phi_m$ and weights $w_1^*, \ldots, w_m^*$ produces the fitted values $\mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$, while the unknown function produces the targets $\mathbf{y}$.]
The Empirical-Error Vector

With $\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$ and $\mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$, the error vector is

$$\mathbf{e} = \mathbf{y} - \mathbf{f}^* = \mathbf{y} - \boldsymbol{\Phi} \mathbf{w}^* = \mathbf{y} - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \left( \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \right) \mathbf{y} = \mathbf{P} \mathbf{y}$$
Sum-Squared-Error

With $\mathbf{e} = \mathbf{P} \mathbf{y}$, $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T$, $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$, $\boldsymbol{\Phi} = \left( \boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_m \right)$:

$$SSE = \sum_{k=1}^{p} \left( y^{(k)} - f^*(\mathbf{x}^{(k)}) \right)^2 = \left( \mathbf{P} \mathbf{y} \right)^T \left( \mathbf{P} \mathbf{y} \right) = \mathbf{y}^T \mathbf{P}^2 \mathbf{y}$$

If $\boldsymbol{\Lambda} = \mathbf{0}$, the RBFN's learning algorithm minimizes SSE (MSE).
The Projection Matrix

When $\boldsymbol{\Lambda} = \mathbf{0}$, $\mathbf{P}$ is a projection matrix: $\mathbf{P} \left( \mathbf{P} \mathbf{y} \right) = \mathbf{P} \mathbf{e} = \mathbf{e} = \mathbf{P} \mathbf{y}$, and the error vector satisfies $\mathbf{e} \perp \mathrm{span}\left( \boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_m \right)$. Hence

$$SSE = \mathbf{y}^T \mathbf{P}^2 \mathbf{y} = \mathbf{y}^T \mathbf{P} \mathbf{y}$$
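These projection properties can be checked numerically. The sketch below assumes $\boldsymbol{\Lambda} = \mathbf{0}$ and uses a random illustrative design matrix:

```python
import numpy as np

# With Lambda = 0, P = I - Phi (Phi^T Phi)^{-1} Phi^T projects onto the
# orthogonal complement of span(phi_1, ..., phi_m).
rng = np.random.default_rng(1)
Phi = rng.standard_normal((10, 3))   # illustrative design matrix (p=10, m=3)
y = rng.standard_normal(10)

P = np.eye(10) - Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)
e = P @ y

print(np.allclose(P @ P, P))         # idempotent: P^2 = P
print(np.allclose(Phi.T @ e, 0))     # e is orthogonal to every basis column
print(np.isclose(e @ e, y @ P @ y))  # SSE = y^T P y
```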
RBFN's as Universal Approximators

Network: inputs $x_1, \ldots, x_n$; kernels $\phi_1, \ldots, \phi_m$; outputs $y_1, \ldots, y_l$ with weights $w_{ij}$.

Training set: $T = \left\{ \left( \mathbf{x}^{(k)}, \mathbf{y}^{(k)} \right) \right\}_{k=1}^{p}$

Kernels:

$$\phi_j(\mathbf{x}) = \exp\!\left( -\frac{\| \mathbf{x} - \boldsymbol{\mu}_j \|^2}{2 \sigma_j^2} \right)$$
What to Learn?
• Weights $w_{ij}$'s
• Centers $\boldsymbol{\mu}_j$'s of $\phi_j$'s
• Widths $\sigma_j$'s of $\phi_j$'s
• Number of $\phi_j$'s − Model Selection
One-Stage Learning

$$\Delta w_{ij} = \eta_1 \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)})$$

$$\Delta \boldsymbol{\mu}_j = \eta_2 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \frac{\mathbf{x}^{(k)} - \boldsymbol{\mu}_j}{\sigma_j^2}$$

$$\Delta \sigma_j = \eta_3 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \frac{\| \mathbf{x}^{(k)} - \boldsymbol{\mu}_j \|^2}{\sigma_j^3}$$
One-Stage Learning

The simultaneous updates of all three sets of parameters may be suitable for non-stationary environments or on-line settings:

$$\Delta w_{ij} = \eta_1 \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)})$$

$$\Delta \boldsymbol{\mu}_j = \eta_2 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \frac{\mathbf{x}^{(k)} - \boldsymbol{\mu}_j}{\sigma_j^2}$$

$$\Delta \sigma_j = \eta_3 \sum_{i=1}^{l} w_{ij} \left( y_i^{(k)} - f_i(\mathbf{x}^{(k)}) \right) \phi_j(\mathbf{x}^{(k)}) \, \frac{\| \mathbf{x}^{(k)} - \boldsymbol{\mu}_j \|^2}{\sigma_j^3}$$
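One step of the simultaneous update rules can be traced by hand for the smallest possible network: one Gaussian unit, one input, one output. The learning rates and the training pair below are arbitrary illustrative choices:

```python
import math

# Smallest-case one-stage learning step: one Gaussian kernel, scalar in/out.
mu, sigma, w = 0.0, 1.0, 0.5             # initial center, width, weight
eta1, eta2, eta3 = 0.1, 0.01, 0.01       # illustrative learning rates
x_k, y_k = 1.0, 1.0                      # one training pair (x^(k), y^(k))

phi = math.exp(-(x_k - mu) ** 2 / (2 * sigma ** 2))
err = y_k - w * phi                      # y^(k) - f(x^(k))

# Simultaneous updates, all computed from the pre-step parameters:
w_new = w + eta1 * err * phi
mu_new = mu + eta2 * err * w * phi * (x_k - mu) / sigma ** 2
sigma_new = sigma + eta3 * err * w * phi * (x_k - mu) ** 2 / sigma ** 3

phi_new = math.exp(-(x_k - mu_new) ** 2 / (2 * sigma_new ** 2))
print(abs(y_k - w_new * phi_new) < abs(err))  # the sample error shrinks
```

Each update descends the per-sample squared error, so for small learning rates all three moves push $f(\mathbf{x}^{(k)})$ toward $y^{(k)}$.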
Two-Stage Training

Step 1: Determine
• Centers $\boldsymbol{\mu}_j$'s of $\phi_j$'s.
• Widths $\sigma_j$'s of $\phi_j$'s.
• Number of $\phi_j$'s.

Step 2: Determine the $w_{ij}$'s, e.g., using batch learning.
Train the Kernels
Unsupervised Training
Methods
• Subset Selection
– Random Subset Selection
– Forward Selection
– Backward Elimination
• Clustering Algorithms
– KMEANS
– LVQ
• Mixture Models
– GMM
Subset Selection
Random Subset Selection
• Randomly choose a subset of points from the training set as centers.
• Sensitive to the initially chosen points.
• Use adaptive techniques to tune:
– centers
– widths
– number of points
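A minimal sketch of random subset selection followed by the second-stage weight fit; the target function, subset size, and width heuristic below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(np.pi * x)                           # illustrative noiseless target

# Stage 1: a random subset of training points becomes the kernel centers,
# with a common width taken heuristically from their average spacing.
centers = rng.choice(x, size=8, replace=False)
sigma = (x.max() - x.min()) / len(centers)

# Stage 2: with centers and widths fixed, the weights follow from
# ordinary linear least squares on the design matrix.
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.mean((y - Phi @ w) ** 2))
```

A different random draw would give different centers, which is exactly the sensitivity the slide points out.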
Clustering Algorithms
[Figures: a clustering algorithm applied to the training points, steps 1–4; the resulting cluster centroids become the kernel centers.]
Questions - How should the user choose the kernel?
• The problem is similar to that of selecting features for other learning algorithms:
– a poor choice makes learning very difficult;
– with a good choice, even poor learners can succeed.
• The requirement placed on the user is thus critical:
– Can this requirement be lessened?
– Is a more automatic selection of features possible?
Goal Revisit
• Ultimate Goal − Generalization: minimize the prediction error.
• Goal of Our Learning Procedure: minimize the empirical error.
Badness of Fit
• Underfitting
– A model (e.g., network) that is not sufficiently complex can fail to fully detect the signal in a complicated data set, leading to underfitting.
– Produces excessive bias in the outputs.
• Overfitting
– A model (e.g., network) that is too complex may fit the noise, not just the signal, leading to overfitting.
– Produces excessive variance in the outputs.
Underfitting/Overfitting Avoidance
• Model selection
• Jittering
• Early stopping
• Weight decay
– Regularization
– Ridge Regression
• Bayesian learning
• Combining networks
Best Way to Avoid Overfitting
• Use lots of training data, e.g.:
– 30 times as many training cases as there are weights in the network;
– for noise-free data, 5 times as many training cases as weights may be sufficient.
• Don't arbitrarily reduce the number of weights for fear of underfitting.
Badness of Fit

[Figures: examples of an underfit model and an overfit model.]
Bias-Variance Dilemma
• Underfit: large bias, small variance.
• Overfit: small bias, large variance.
More on Overfitting
• Overfitting easily leads to predictions that are far beyond the range of the training data.
• It produces wild predictions in multilayer perceptrons even with noise-free data.
Bias-Variance Dilemma

[Figure: increasing variance means the model overfits more; increasing bias means it underfits more; the best fit balances the two.]
Bias-Variance Dilemma

[Figure: the set of realizable functions (e.g., depending on the number of hidden nodes used) sits at some bias from the true model; solutions obtained with training sets 1, 2, and 3 scatter around it, and noise separates the observations from the true model.]
Bias-Variance Dilemma

[Figure: the same picture, with the scatter of the solutions across training sets marked as the variance.]
Model Selection

[Figure: the same bias/variance picture, used to motivate model selection.]
Bias-Variance Dilemma

[Figure: the true model $g(\mathbf{x})$ with noisy observations $y(\mathbf{x}) = g(\mathbf{x}) + \varepsilon(\mathbf{x})$, and an approximator $f(\mathbf{x})$ drawn from the set of functions.]
Bias-Variance Dilemma

$$E\!\left[ \left( y - f(\mathbf{x}) \right)^2 \right] = E\!\left[ \left( g(\mathbf{x}) + \varepsilon - f(\mathbf{x}) \right)^2 \right] = E\!\left[ \left( g(\mathbf{x}) - f(\mathbf{x}) \right)^2 + 2 \varepsilon \left( g(\mathbf{x}) - f(\mathbf{x}) \right) + \varepsilon^2 \right]$$

$$= E\!\left[ \left( g(\mathbf{x}) - f(\mathbf{x}) \right)^2 \right] + 2 E\!\left[ \varepsilon \left( g(\mathbf{x}) - f(\mathbf{x}) \right) \right] + E\!\left[ \varepsilon^2 \right]$$

The middle term is 0 and the last term is constant, so the goal $\min E\!\left[ \left( y - f(\mathbf{x}) \right)^2 \right]$ is equivalent to $\min E\!\left[ \left( g(\mathbf{x}) - f(\mathbf{x}) \right)^2 \right]$.
Bias-Variance Dilemma

$$E\!\left[ \left( g(\mathbf{x}) - f(\mathbf{x}) \right)^2 \right] = E\!\left[ \left( g(\mathbf{x}) - E[f(\mathbf{x})] + E[f(\mathbf{x})] - f(\mathbf{x}) \right)^2 \right]$$

$$= E\!\left[ \left( g(\mathbf{x}) - E[f(\mathbf{x})] \right)^2 \right] + E\!\left[ \left( f(\mathbf{x}) - E[f(\mathbf{x})] \right)^2 \right] - 2 E\!\left[ \left( g(\mathbf{x}) - E[f(\mathbf{x})] \right) \left( f(\mathbf{x}) - E[f(\mathbf{x})] \right) \right]$$

The cross term vanishes, since expanding it term by term gives

$$E\!\left[ g(\mathbf{x}) f(\mathbf{x}) \right] - E\!\left[ g(\mathbf{x}) \right] E\!\left[ f(\mathbf{x}) \right] - E\!\left[ E[f(\mathbf{x})] \, f(\mathbf{x}) \right] + E\!\left[ f(\mathbf{x}) \right] E\!\left[ f(\mathbf{x}) \right] = 0$$
Bias-Variance Dilemma
( ) ( )2 22( ) ( ) ( ) ( )E y f E E g f − = + −
x x x x
( ) ( )2 22 [ ( )] [ ( )( ( ]) )E fE E fE g E f = + + −
−
x x xx
noise bias2 variance
Cannot be minimized
Minimize both bias2 and variance
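The decomposition E[(y − f(x))²] = σ² + bias² + variance can be checked numerically. Below is a minimal Monte Carlo sketch in Python/NumPy; the target g(x) = sin(2πx), the noise level σ = 0.3, and the degree-3 polynomial model family are arbitrary illustrative choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    """True function g(x); sin is an arbitrary illustrative choice."""
    return np.sin(2 * np.pi * x)

sigma, degree, n_train, n_runs = 0.3, 3, 30, 2000
x_test = np.linspace(0.05, 0.95, 50)

# Train one model f per independently drawn training set
preds = np.empty((n_runs, x_test.size))
for r in range(n_runs):
    x = rng.uniform(0.0, 1.0, n_train)
    y = g(x) + rng.normal(0.0, sigma, n_train)   # y = g(x) + eps
    preds[r] = np.polyval(np.polyfit(x, y, degree), x_test)

f_bar = preds.mean(axis=0)                       # E[f(x)] over training sets
bias2 = np.mean((g(x_test) - f_bar) ** 2)        # bias^2 term
var = np.mean(preds.var(axis=0))                 # variance term

# Direct estimate of E[(y - f(x))^2] on fresh noisy targets
y_new = g(x_test) + rng.normal(0.0, sigma, preds.shape)
mse = np.mean((y_new - preds) ** 2)

print(f"mse={mse:.4f}  noise+bias2+var={sigma**2 + bias2 + var:.4f}")
```

With enough runs the two printed numbers agree closely, which is exactly the statement of the decomposition.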
Model Complexity vs. Bias-Variance

E[(y − f(x))²] = σ² + (g(x) − E[f(x)])² + E[(f(x) − E[f(x)])²]
                noise    bias²            variance

[Figure: as model complexity (capacity) grows, bias² decreases while
variance increases.]
Example (Polynomial Fits)

y = x² + exp(−16x²)
Example (Polynomial Fits)

[Figure: noisy samples of the target, fitted with polynomials of
degree 1, 5, 10, and 15.]
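The experiment behind these panels can be reproduced in a few lines of NumPy. The target function y = x² + exp(−16x²) is a reading reconstructed from the garbled slide, and the sample count and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    # Assumed reading of the slide's example function
    return x**2 + np.exp(-16.0 * x**2)

x_tr = np.linspace(-1.0, 1.0, 40)
y_tr = target(x_tr) + rng.normal(0.0, 0.05, x_tr.size)   # noisy samples
x_te = np.linspace(-1.0, 1.0, 400)                        # dense test grid

results = {}
for d in (1, 5, 10, 15):
    c = np.polyfit(x_tr, y_tr, d)                         # fit degree d
    train_mse = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(c, x_te) - target(x_te)) ** 2)
    results[d] = (train_mse, test_mse)
    print(f"degree {d:2d}: train MSE {train_mse:.5f}, test MSE {test_mse:.5f}")
```

The training error keeps falling as the degree grows, while the test error follows the bias-variance trade-off of the previous slides.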
Variance Estimation

Mean:      μ
Variance:  σ̂² = (1/p) Σᵢ₌₁ᵖ (xᵢ − μ)²

In general, the true mean μ is not available.
Variance Estimation
Mean1
1ˆ
p
i
i
x xp
=
= =
Variance ( )22 2
1
ˆ1
1ˆ
p
i
i
s xp
=
= = −−
Loss 1 degree of freedom
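A quick numerical check of the two estimators; the data distribution below (μ = 5, σ² = 4) is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, 10_000)     # samples with true mu = 5, sigma^2 = 4

mu_hat = x.sum() / x.size                          # x-bar
s2 = ((x - mu_hat) ** 2).sum() / (x.size - 1)      # divide by p - 1

print(f"mu_hat={mu_hat:.3f}  s2={s2:.3f}")         # near 5 and 4
```

NumPy encodes the lost degree of freedom via `ddof`: `x.var(ddof=1)` computes the same p − 1 estimator.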
Simple Linear Regression

y = β₀ + β₁x + ε,   ε ~ N(0, σ²)

[Figure: data scattered around the true line y = β₀ + β₁x.]
Simple Linear Regression
y
x
0
1ˆˆ
y
x
=
+0 1y x = + +
2~ (0, )N
Mean Squared Error (MSE)

y = β₀ + β₁x + ε,   ε ~ N(0, σ²),   fitted line ŷ = β̂₀ + β̂₁x

σ̂² = MSE = SSE / (p − 2)

Estimating the two coefficients loses 2 degrees of freedom.
Variance Estimation

σ̂² = MSE = SSE / (p − m)

Fitting loses m degrees of freedom, where m is the number of parameters
of the model.
The Number of Parameters

y = f(x) = Σᵢ₌₁ᵐ wᵢ φᵢ(x)

[Figure: RBF network with inputs x = (x₁, …, xₙ) and output weights
w₁, w₂, …, wₘ.]

#degrees of freedom: m
The Effective Number of Parameters (γ)

y = f(x) = Σᵢ₌₁ᵐ wᵢ φᵢ(x)

The projection matrix:

P = I_p − Φ A⁻¹ Φᵀ,   A = ΦᵀΦ + Λ

With Λ = 0:   γ = p − trace(P) = m
The Effective Number of Parameters (γ)

The projection matrix:  P = I_p − Φ A⁻¹ Φᵀ,  A = ΦᵀΦ + Λ

Proof that γ = m when Λ = 0 (so that A = ΦᵀΦ):

trace(P) = trace(I_p − Φ A⁻¹ Φᵀ)
         = p − trace(Φ (ΦᵀΦ)⁻¹ Φᵀ)
         = p − trace((ΦᵀΦ)⁻¹ ΦᵀΦ)     (cyclic property of the trace)
         = p − trace(I_m)
         = p − m

Hence γ = p − trace(P) = m.
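The identity trace(P) = p − m for Λ = 0 is easy to confirm numerically. In the sketch below, a random matrix stands in for a real RBF design matrix Φ (an assumption; any full-column-rank Φ works):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 50, 5                          # p data points, m basis functions
Phi = rng.normal(size=(p, m))         # stand-in design matrix, full column rank

A = Phi.T @ Phi                       # Lambda = 0 case
P = np.eye(p) - Phi @ np.linalg.inv(A) @ Phi.T   # projection matrix

gamma = p - np.trace(P)               # effective number of parameters
print(gamma)                          # equals m = 5 up to rounding error
```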
Regularization

Cost = Empirical Error + Model Penalty

Cost = Σₖ₌₁ᵖ (y⁽ᵏ⁾ − f(x⁽ᵏ⁾))² + Σᵢ₌₁ᵐ λᵢ wᵢ²

The first sum is the SSE (empirical error); the second penalizes models
with large weights.
Regularization

Cost = Σₖ₌₁ᵖ (y⁽ᵏ⁾ − f(x⁽ᵏ⁾))² + Σᵢ₌₁ᵐ λᵢ wᵢ²

Without the penalty (λᵢ = 0), there are m degrees of freedom available to
minimize the SSE, and the effective number of parameters γ = m.
Regularization

Cost = Σₖ₌₁ᵖ (y⁽ᵏ⁾ − f(x⁽ᵏ⁾))² + Σᵢ₌₁ᵐ λᵢ wᵢ²

With the penalty (λᵢ > 0), the liberty to minimize the SSE is reduced, and
the effective number of parameters γ < m.
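The shrinking of γ under a penalty can also be seen numerically. A sketch assuming the simple ridge case Λ = λI (a special case of the per-weight λᵢ above) and, as before, a random stand-in for Φ:

```python
import numpy as np

rng = np.random.default_rng(4)
p, m = 50, 5
Phi = rng.normal(size=(p, m))         # stand-in design matrix

def effective_params(lam):
    """gamma = p - trace(P) with the ridge penalty Lambda = lam * I."""
    A = Phi.T @ Phi + lam * np.eye(m)
    P = np.eye(p) - Phi @ np.linalg.inv(A) @ Phi.T
    return p - np.trace(P)

print(effective_params(0.0))    # = m
print(effective_params(10.0))   # < m: the penalty shrinks gamma
```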
Variance Estimation

σ̂² = MSE = SSE / (p − γ)

Fitting loses γ (the effective number of parameters) degrees of freedom.
Variance Estimation

σ̂² = MSE = SSE / trace(P)      (since trace(P) = p − γ)
Model Selection

• Goal
  – Choose the fittest model
• Criterion
  – Least prediction error
• Main tools (estimating model fitness)
  – Cross-validation
  – Projection matrix
• Methods
  – Weight decay (ridge regression)
  – Pruning and growing RBFN's
Empirical Error vs. Model Fitness

• Ultimate goal — generalization: minimize the prediction error.
• Goal of our learning procedure: minimize the empirical error (MSE), in
  the hope that a small empirical error also gives a small prediction error.
Estimating Prediction Error

• When you have plenty of data, use independent test sets
  – E.g., use the same training set to train different models, and choose
    the best model by comparing them on the test set.
• When data is scarce, use
  – Cross-validation
  – Bootstrap
Cross Validation

• The simplest and most widely used method for estimating prediction error.
• Partition the original set in several different ways and compute an
  average score over the different partitions, e.g.,
  – K-fold cross-validation
  – Leave-one-out cross-validation
  – Generalized cross-validation
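A minimal K-fold cross-validation sketch, using polynomial models on synthetic quadratic data (all specifics — target, noise, candidate degrees — are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 60)
y = x**2 + rng.normal(0.0, 0.1, x.size)   # illustrative quadratic data

def kfold_mse(degree, k=5):
    """Average held-out MSE of a degree-d polynomial over k folds."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        c = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(c, x[test]) - y[test]) ** 2))
    return float(np.mean(errs))

scores = {d: kfold_mse(d) for d in (1, 2, 6, 12)}
print(scores, "-> pick degree", min(scores, key=scores.get))
```

The model with the lowest cross-validated score is selected, which is exactly the "least prediction error" criterion from the previous slide.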
Comparison of RBF and MLP

MLP                               RBF
Global hyperplane                 Local receptive field
EBP learning                      LMS learning
Serious local minima              Local minima
Smaller number of hidden neurons  Larger number of hidden neurons
Shorter computation time          Longer computation time
Longer learning time              Shorter learning time
Comparison of RBF and MLP

                         RBF networks                         MLP
Learning speed           Very fast                            Very slow
Convergence              Almost guaranteed                    Not guaranteed
Response time            Slow                                 Fast
Memory requirement       Very large                           Small
Hardware implementation  IBM ZISC036, Nestor Ni1000           Voice Direct 364
                         (www-5.ibm.com/fr/cdlab/zisc.html)   (www.sensoryinc.com)
Generalization           Usually better                       Usually poorer