1 Pattern Classification
2 Content
- General Method
- K Nearest Neighbors
- Decision Trees
- Neural Networks
3 General Method
Training: learning knowledge or parameters from the training data.
Testing: applying what was learned to new instances.
5 K Nearest Neighbors
Advantages: nonparametric architecture; simple; powerful; requires no training time.
Disadvantages: memory intensive; classification/estimation is slow.
6 K Nearest Neighbors
The key issues in training this model include setting the variable K (using validation techniques, e.g. cross-validation) and choosing the type of distance metric, e.g. the Euclidean measure:

Dist(X, Y) = sqrt( Σ_{i=1..D} (X_i − Y_i)² )
7 Figure: K Nearest Neighbors example
Circles mark stored training set patterns; X marks the input pattern to classify; dashed lines show the Euclidean distance to the nearest three patterns.
8
1. Store all input data in the training set.
2. For each pattern in the test set:
   - Search for the K nearest patterns to the input pattern using a Euclidean distance measure.
   - For classification, compute the confidence for each class as C_i / K, where C_i is the number of patterns among the K nearest patterns belonging to class i.
   - The classification for the input pattern is the class with the highest confidence.
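A minimal sketch of the procedure above (the dataset and function names are illustrative, not from the slides):

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its K nearest training patterns.

    `train` is a list of (feature_vector, class_label) pairs.
    Returns (predicted_class, confidence) with confidence = C_i / K.
    """
    neighbors = sorted(train, key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Toy training set: two clusters in the plane
train = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"), ([0.2, 0.1], "A"),
         ([1.0, 1.0], "B"), ([0.9, 1.1], "B"), ([1.1, 0.9], "B")]
print(knn_classify(train, [0.15, 0.1], k=3))  # -> ('A', 1.0)
```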
9 Training parameters and typical settings
Number of nearest neighbors: the number of nearest neighbors (K) should be chosen by cross-validation over a range of K settings.
K = 1 gives a good baseline model to benchmark against.
A good rule of thumb is that K should be less than the square root of the total number of training patterns.
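Selecting K by validation, as suggested above, can be sketched with leave-one-out cross-validation (the dataset and candidate K values are made up for illustration):

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def loo_accuracy(train, k):
    """Leave-one-out accuracy of a K-NN classifier on `train`."""
    correct = 0
    for i, (x, t) in enumerate(train):
        rest = train[:i] + train[i + 1:]          # hold out pattern i
        neighbors = sorted(rest, key=lambda p: euclidean(p[0], x))[:k]
        votes = Counter(label for _, label in neighbors)
        if votes.most_common(1)[0][0] == t:
            correct += 1
    return correct / len(train)

train = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"), ([0.2, 0.1], "A"),
         ([1.0, 1.0], "B"), ([0.9, 1.1], "B"), ([1.1, 0.9], "B")]
# Rule of thumb: K < sqrt(6) ~ 2.4; a couple of odd values are tried anyway
best_k = max([1, 3], key=lambda k: loo_accuracy(train, k))
```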
10 Training parameters and typical settings
Input compression: since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification.
Using input compression will usually result in slightly worse performance.
Sometimes compression improves performance, because it performs an automatic normalization of the data, which can equalize the effect of each input in the Euclidean distance measure.
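The equalizing effect mentioned above can be shown directly with min-max normalization, which rescales each input dimension to [0, 1] so that no single feature dominates the Euclidean distance (the feature values below are invented):

```python
def minmax_normalize(patterns):
    """Rescale each feature column of `patterns` to the range [0, 1]."""
    dims = len(patterns[0])
    lo = [min(p[d] for p in patterns) for d in range(dims)]
    hi = [max(p[d] for p in patterns) for d in range(dims)]
    return [[(p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
             for d in range(dims)] for p in patterns]

# Raw scale: a height-in-cm feature would swamp a 0-1 feature in the distance
raw = [[150.0, 0.2], [190.0, 0.9], [170.0, 0.5]]
scaled = minmax_normalize(raw)  # both features now contribute comparably
```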
11 CPC group Seminar, Thursday, June 1, 2006
Where the Euclidean distance metric fails: comparing a pattern to be classified against Prototype A and Prototype B, Prototype B seems more similar than Prototype A according to Euclidean distance, and the digit "9" is misclassified as "4".
A possible solution is to use a distance metric that is invariant to irrelevant transformations.
12 Decision trees
Decision trees are popular for pattern recognition because the models they produce are easy to understand.
In the tree diagram:
A. Nodes of the tree (starting from the root node)
B. Leaves (terminal nodes) of the tree
C. Branches (decision points) of the tree
13 Decision trees: binary decision trees
Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf.
Each node of the tree computes an inequality (e.g. BMI < 24, yes or no) based on a single input variable.
Each leaf is assigned to a particular class.
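A minimal sketch of such a traversal; the tree shape, thresholds, and class labels below are invented for illustration:

```python
# An internal node is (feature_index, threshold, yes_subtree, no_subtree);
# a leaf is just a class label string.
tree = (0, 24.0,                      # BMI < 24 ?
        (1, 30.0, "low", "medium"),   # yes branch: age < 30 ?
        "high")                       # no branch: a leaf

def classify(tree, x):
    """Traverse from the root, following the inequality at each node, to a leaf."""
    while not isinstance(tree, str):
        feature, threshold, yes, no = tree
        tree = yes if x[feature] < threshold else no
    return tree

# x = [BMI, age]
print(classify(tree, [22.0, 25.0]))  # -> low
print(classify(tree, [27.5, 40.0]))  # -> high
```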
14 Decision trees: binary decision trees
Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.
15 Decision trees: linear decision trees
Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables, e.g. a·X1 + b·X2 compared against a threshold.
Biological Neural Systems
Neuron switching time: > 10^-3 seconds
Number of neurons in the human brain: ~10^10
Connections (synapses) per neuron: ~10^4 to 10^5
Face recognition: 0.1 seconds
High degree of distributed and parallel computation; highly fault tolerant; highly efficient; learning is key.
Excerpt from Russell and Norvig
A Neuron
Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.
Each unit j receives signals a_k over its input links, weighted by W_kj, and sends a_j over its output links:

in_j = Σ_k W_kj · a_k
a_j = output(in_j)
Part 1. Perceptrons: Simple NN
Inputs x1, x2, ..., xn (each xi in the range [0, 1]) feed the unit through weights w1, w2, ..., wn. The activation is

a = Σ_{i=1..n} wi xi

and the output is thresholded at θ:

y = 1 if a ≥ θ, 0 if a < θ
Decision Surface of a Perceptron
In the (x1, x2) input plane the decision line is

w1 x1 + w2 x2 = θ

with patterns labeled 1 on one side of the line and patterns labeled 0 on the other.
Linear Separability
Logical AND is linearly separable; for example w1 = 1, w2 = 1, θ = 1.5:

x1 x2  a  y
0  0   0  0
0  1   1  0
1  0   1  0
1  1   2  1

Logical XOR is not: no single line (w1 = ?, w2 = ?, θ = ?) separates its classes:

x1 x2  y
0  0   0
0  1   1
1  0   1
1  1   0
Threshold as Weight: w0
The threshold θ can be treated as one more weight by adding a constant input x0 = −1 with weight w0 = θ:

a = Σ_{i=0..n} wi xi

y = 1 if a ≥ 0, 0 if a < 0

Thus y = sgn(a) = 0 or 1.
Perceptron Learning Rule
w' = w + η (t − y) x

that is, wi := wi + Δwi = wi + η (t − y) xi  (i = 1..n)

The parameter η is called the learning rate (in Han's book it is a lowercase l); it determines the magnitude of the weight updates Δwi.
If the output is correct (t = y), the weights are not changed (Δwi = 0).
If the output is incorrect (t ≠ y), the weights are changed so that the output of the perceptron with the new weights w'i is closer to the target for the input xi.
Perceptron Training Algorithm
Repeat
  for each training vector pair (x, t)
    evaluate the output y when x is the input
    if y ≠ t then
      form a new weight vector w' according to w' = w + η (t − y) x
    else
      do nothing
    end if
  end for
Until y = t for all training vector pairs, or the number of iterations exceeds k
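A runnable sketch of this loop, using the threshold-as-weight trick (x0 = −1); the learning rate, dataset, and 0/1 target coding are illustrative choices, not from the slides:

```python
def train_perceptron(data, eta=0.5, max_epochs=100):
    """Perceptron learning rule; `data` is a list of (input_vector, target)
    pairs with targets in {0, 1}. Returns the learned weights [w0, w1, ..., wn]."""
    n = len(data[0][0])
    w = [0.0] * (n + 1)
    for _ in range(max_epochs):
        all_correct = True
        for x, t in data:
            xa = [-1.0] + list(x)                      # x0 = -1 carries the threshold
            a = sum(wi * xi for wi, xi in zip(w, xa))
            y = 1 if a >= 0 else 0
            if y != t:                                 # w' = w + eta * (t - y) * x
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xa)]
                all_correct = False
        if all_correct:                                # converged: y = t for all pairs
            break
    return w

# Logical AND is linearly separable, so the loop converges
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(and_data)
```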
Perceptron Learning Example (targets t = 1 or t = −1)
Start with w = [0.25, −0.1, 0.5], i.e. the decision line x2 = 0.2 x1 − 0.5.
(x, t) = ([−1, −1], 1): o = sgn(0.25 + 0.1 − 0.5) = −1, so update to w = [0.2, −0.2, −0.2].
(x, t) = ([2, 1], −1): o = sgn(0.45 − 0.6 + 0.3) = 1, so update to w = [−0.2, −0.4, −0.2].
(x, t) = ([1, 1], 1): o = sgn(0.25 − 0.7 + 0.1) = −1, so update to w = [0.2, 0.2, 0.2].
Part 2. Multi Layer Networks
The input vector enters at the input nodes, flows through one or more layers of hidden nodes, and the output nodes produce the output vector.
Multiple layers can be used to learn nonlinear functions such as XOR, which no single perceptron (w1 = ?, w2 = ?, θ = ?) can represent. But how to set the weights?

Logical XOR
x1 x2  y
0  0   0
0  1   1
1  0   1
1  1   0

In the example network, inputs x1 and x2 feed hidden units 3 and 4, which feed output unit 5; weights such as w23 and w35 label the connections.
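A hand-wired illustration (not from the slides) of how two layers of step units solve XOR, using the identity XOR = OR AND NOT AND; the weights and thresholds are chosen by hand rather than learned:

```python
def step(a, theta):
    """Threshold unit: 1 if the activation reaches theta, else 0."""
    return 1 if a >= theta else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2, 0.5)       # hidden unit: fires if either input is on
    h_and = step(x1 + x2, 1.5)       # hidden unit: fires only if both are on
    return step(h_or - h_and, 0.5)   # output unit: OR but not AND

table = [(x1, x2, xor_net(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
# -> [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```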
28
End