Statistical Classification Methods
1. Introduction
2. k-nearest neighbor
3. Neural networks
4. Decision trees
5. Support Vector Machine
What is classification?
• Assign new observations to known classes using a previously trained model
• The model is trained on existing data with known labels
Introduction
Machine Learning is the study of computer algorithms that
improve automatically through experience. [Machine Learning, Tom Mitchell, McGraw Hill, 1997]
Machine Classifier
Training Data: example input/output pairs
[Figure: a trained classifier mapping inputs to outputs.]
Machine Learning for Classification

[Figure: a learning algorithm performs induction on the training set to learn a model; the model is then applied to the test set (deduction).]

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Classification Steps
Examples of Classification Task
• Classifying cells as cancerous or non-cancerous
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Methods used in DM
• Prototype-based methods: k-Nearest Neighbour (kNN), Weighted kNN, Fuzzy kNN, etc.
• Boundary-based methods: neural networks, such as the Multi-Layer Perceptron (MLP), Back-Propagation (BP), and the Support Vector Machine (SVM)
• Rule-based methods: Decision Tree
Classification Method 1: kNN - Basic Information
• Training method: save the training examples
• At prediction time: find the k training examples (x1,y1), ..., (xk,yk) that are nearest to the test example x, and predict the most frequent class among those yi's
kNN Steps
1. Store all input data in the training set
2. For each sample in the test set:
   a. Search for the K nearest samples to the input sample using a Euclidean distance measure
   b. For classification, compute the confidence for each class as Ci/K, where Ci is the number of samples among the K nearest samples belonging to class i
   c. The classification for the input sample is the class with the highest confidence
(A code sketch of these steps follows the distance definition below.)
• An arbitrary instance x is represented by the feature vector $(a_1(x), a_2(x), a_3(x), \ldots, a_n(x))$, where $a_i(x)$ denotes the i-th feature
• Euclidean distance between two instances: $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2}$
• Continuous-valued target function: predict the mean value of the k nearest training examples
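As a minimal sketch of the steps above, here is a plain numpy kNN classifier. This is illustrative code, not the Weka IBk implementation used in the practice below; the function name and the toy data are assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Predict the class of x by majority vote among its k nearest neighbours.

    Returns (class, confidence), where confidence = Ci / K and Ci is the
    number of the k nearest samples that belong to the winning class."""
    # Euclidean distance d(x, xi) = sqrt(sum_r (ar(x) - ar(xi))^2)
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]              # indices of the k nearest samples
    votes = Counter(y_train[i] for i in nearest)
    cls, ci = votes.most_common(1)[0]
    return cls, ci / k

# Toy example with two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([1.1, 0.9])))  # -> (0, 0.666...)
```

For a continuous-valued target, the same neighbour search applies, but the prediction is the mean of the k nearest y values instead of a majority vote.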
kNN Calculation - 1-Nearest Neighbor
kNN Calculation - 3-Nearest Neighbor
[Figures: worked examples of 1-NN and 3-NN classification; not reproduced in this transcript.]
On Class Practice 1
• Data: Iris.arff and your own data (if applicable)
• Method: k-NN; parameters (select by yourself)
• Software: wekaclassalgos1.7
• Steps: Explorer -> Classify -> Classifier (Lazy - IBk)
Classification Method 2: Neural Networks – Biological inspiration
• Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to perform these behaviours.
• An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.
• The nervous system is built from relatively simple units, the neurons, so reproducing their behaviour and functionality should be a reasonable approach.
Neural Networks – Basic Structure
A neural network is an interconnected group of nodes.
Neural Networks – Biological inspiration

[Figure: a biological neuron, showing the dendrites, soma (cell body), and axon.]

[Figure: two connected neurons; the axon of the pre-synaptic neuron meets the dendrites of the post-synaptic neuron at synapses.]
The information transmission happens at the synapses.
Neural Networks – Biological inspiration
The spikes (signal) travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse.
The neurotransmitters cause excitation (+) or inhibition (-) in the dendrite of the post-synaptic neuron.
The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron.
The contribution of the signals depends on the strength of the synaptic connection.
Neural Networks – Artificial neurons
Neurons work by processing information. They receive and provide information in the form of spikes.
The McCulloch-Pitts model

[Figure: inputs x1, x2, ..., xn with synaptic weights w1, w2, ..., wn feeding a single output y.]

$z = \sum_{i=1}^{n} w_i x_i; \qquad y = H(z)$

where H is the threshold (Heaviside step) function.
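As a sketch, the model transcribes directly into code; the function name and the logical-AND example with a constant bias input are illustrative assumptions, not from the slides.

```python
import numpy as np

def mp_neuron(x, w):
    """McCulloch-Pitts neuron: z = sum_i w_i x_i, y = H(z)."""
    z = np.dot(w, x)          # weighted sum of the inputs
    return 1 if z > 0 else 0  # Heaviside step H(z)

# Example: logical AND using a constant bias input x0 = 1 (with weight w0 = -1)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, mp_neuron(np.array([1, x1, x2]), np.array([-1.0, 0.6, 0.6])))
```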
Neural Networks – Artificial neurons
The McCulloch-Pitts model:
• spikes are interpreted as spike rates;
• synaptic strengths are translated as synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.
Neural Networks – Artificial neurons
Nonlinear generalization of the McCulloch-Pitts neuron:

$y = f(x, w)$

where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.

Examples:

sigmoidal neuron: $y = \dfrac{1}{1 + e^{-w^T x - a}}$

Gaussian neuron: $y = e^{-\frac{\|x - w\|^2}{2 a^2}}$
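A minimal sketch of these two neuron types in numpy, assuming the sign conventions reconstructed above:

```python
import numpy as np

def sigmoidal_neuron(x, w, a):
    """y = 1 / (1 + exp(-(w^T x + a))); output grows with the weighted sum."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + a)))

def gaussian_neuron(x, w, a):
    """y = exp(-||x - w||^2 / (2 a^2)); output peaks when x is near w."""
    return np.exp(-np.linalg.norm(x - w) ** 2 / (2.0 * a ** 2))

x, w = np.array([0.5, -0.2]), np.array([1.0, 1.0])
print(sigmoidal_neuron(x, w, a=0.0))  # ~0.57
print(gaussian_neuron(x, w, a=1.0))   # ~0.43
```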
Neural Networks – Artificial neural networks

[Figure: many interconnected neurons between inputs and an output.]

An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
Neural Networks
Learning in biological systems
Learning = learning by adaptation

The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit-picking behaviour.

At the neural level, learning happens by changing the synaptic strengths, eliminating some synapses, and building new ones.
Neural Networks
Learning as optimisation

The objective of adapting the responses on the basis of the information received from the environment is to achieve a better state, e.g., the animal likes to eat many energy-rich, juicy fruits that fill its stomach and make it feel happy.

In other words, the objective of learning in biological organisms is to optimise the amount of available resources, happiness, or in general to achieve a state closer to optimal.
Neural Networks
Learning in biological neural networks
The learning rules of Hebb:
• synchronous activation increases the synaptic strength;
• asynchronous activation decreases the synaptic strength.
These rules fit with energy minimization principles: maintaining synaptic strength needs energy, so it should be maintained where it is needed and not where it is unneeded.
Neural Networks
Learning principle for artificial neural networks: ENERGY MINIMIZATION

We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons.

ENERGY = measure of task performance error
Neural Networks - mathematics

[Figure: a network with four inputs, a first layer of four neurons, a second layer of three neurons, and a single output neuron.]

First layer:
$y_{11} = f(x_1, w_{11}) \quad y_{12} = f(x_2, w_{12}) \quad y_{13} = f(x_3, w_{13}) \quad y_{14} = f(x_4, w_{14})$
$y^1 = (y_{11}, y_{12}, y_{13}, y_{14})$

Second layer:
$y_{21} = f(y^1, w_{21}) \quad y_{22} = f(y^1, w_{22}) \quad y_{23} = f(y^1, w_{23})$
$y^2 = (y_{21}, y_{22}, y_{23})$

Output:
$y_{Out} = f(y^2, w_{31})$
Neural Networks - mathematics
input / output transformation

$y_{out} = F(x, W)$

where W is the matrix of all weight vectors. F is actually the composition of two functions per neuron: the weighted sum of the inputs and the activation function:

$z = \sum_{i=1}^{n} w_i x_i; \qquad y = H(z)$
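A sketch of this input/output transformation for the network in the figure, assuming fully connected layers and the threshold activation H; all names and sizes are illustrative:

```python
import numpy as np

def neuron_layer(x, W):
    """One layer of neurons: z = W x (weighted sums), then y = H(z)."""
    z = W @ x
    return (z > 0).astype(float)       # Heaviside step activation H

rng = np.random.default_rng(0)
x = rng.standard_normal(4)             # four inputs, as in the figure
W1 = rng.standard_normal((4, 4))       # first layer:  y1 has 4 components
W2 = rng.standard_normal((3, 4))       # second layer: y2 has 3 components
W3 = rng.standard_normal((1, 3))       # output neuron: yOut = f(y2, w31)
y1 = neuron_layer(x, W1)
y2 = neuron_layer(y1, W2)
y_out = neuron_layer(y2, W3)
print(y_out)                           # the whole map is yOut = F(x, W)
```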
Neural Networks - Perceptron
• Basic unit in a neural network
• Linear separator
• Parts:
  – N inputs, x1 ... xn
  – Weights for each input, w1 ... wn
  – A bias input x0 (constant) and associated weight w0
  – Weighted sum of inputs, z = w0x0 + w1x1 + ... + wnxn
  – A threshold function, i.e. y = 1 if z > 0, y = -1 if z <= 0
Neural Networks - Perceptron

[Figure: perceptron diagram; inputs x0, x1, x2, ..., xn with weights w0, w1, w2, ..., wn feed a summation unit z = Σ wixi, followed by a threshold that outputs 1 if z > 0 and -1 otherwise.]
Neural Networks - Perceptron
Learning in Perceptron
• Start with random weights
• Select an input pair (x, f(x)), where f(x) is the desired output
• If y ≠ f(x), then modify each weight according to $\Delta w_i = f(x) \cdot x_i$
Note that the weights are not modified if the network gives the correct answer.
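A minimal sketch of this learning procedure, assuming the reconstructed update rule above; the logical-AND data and the function names are illustrative:

```python
import numpy as np

def predict(w, x):
    """Perceptron output: 1 if z = w.x > 0, else -1 (x includes the bias x0 = 1)."""
    return 1 if np.dot(w, x) > 0 else -1

def train(X, targets, epochs=20):
    rng = np.random.default_rng(0)
    w = rng.random(X.shape[1])                 # start with random weights
    for _ in range(epochs):
        for x, fx in zip(X, targets):
            if predict(w, x) != fx:            # only on a wrong answer ...
                w += fx * x                    # ... apply dw_i = f(x) * x_i
    return w

# Logical AND with -1/1 labels; first column is the constant bias input x0 = 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = train(X, t)
print([predict(w, x) for x in X])              # -> [-1, -1, -1, 1]
```

Multiplying a learning rate into the update (w += eta * fx * x) speeds up or stabilises learning, as noted on the next slide.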
Neural Networks - Perceptron
• Can add a learning rate to speed up the learning process; just multiply it in with the delta computation
• Essentially a linear discriminant
• Perceptron theorem: if a linear discriminant exists that can separate the classes without error, the training procedure is guaranteed to find that line or plane
• Has only one layer, so it has problems solving complex (non-linearly separable) problems
Neural Networks
MLP Backpropagation networks
• Attributed to Rumelhart and McClelland, mid-1980s
• Can construct multilayer networks; typically we have fully connected, feedforward networks

[Figure: a fully connected, feedforward multilayer network from inputs to output.]
Neural Networks – MLP BP
Learning Procedure:
• Randomly assign weights (between 0 and 1)
• Present inputs from the training data, propagate to the outputs
• Compute the outputs O, adjust the weights according to the delta rule, backpropagating the errors. The weights will be nudged closer so that the network learns to give the desired output.
• Repeat; stop when there are no errors, or enough epochs have been completed
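A minimal backpropagation sketch of this procedure for one hidden layer, assuming sigmoid activations and a squared-error delta rule; the XOR data, layer sizes, learning rate, and epoch count are illustrative choices, not parameters from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# XOR: a problem a single perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly assign weights (between 0 and 1)
W1, b1 = rng.random((2, 4)), rng.random(4)   # input -> hidden (4 units)
W2, b2 = rng.random((4, 1)), rng.random(1)   # hidden -> output
eta = 0.5                                    # learning rate

for epoch in range(20000):
    # Present inputs, propagate to the outputs
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    # Backpropagate the errors (delta rule with the sigmoid derivative)
    dO = (O - T) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    # Nudge the weights toward the desired output
    W2 -= eta * H.T @ dO; b2 -= eta * dO.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(O.round(2))   # should approach [[0], [1], [1], [0]]; exact values vary by run
```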
Neural Networks – MLP BP

[Figure: if an error is found at the output, a perceptron can only change the weights of the output layer, while MLP BP changes the weights of the hidden layers as well.]
Neural Networks – MLP BP
• Very powerful: with enough hidden units, the network can learn (approximate) any function.
• Has the same problems of generalization vs. memorization. With too many units, the network will tend to memorize the input and not generalize well. Some schemes exist to "prune" the neural network.
• Networks require extensive training and have many parameters to fiddle with. Can be extremely slow to train. May also fall into local minima.
• Inherently parallel algorithm, ideal for multiprocessor hardware.
• Despite the cons, a very powerful algorithm that has seen widespread successful deployment.
Neural Networks – MLP BP
Parameters:
• number of layers
• number of neurons per layer
• transfer function (activation function)
• number of iterations (cycles)
On Class Practice 2
• Data: Iris.arff (weka format) and your own data (if applicable); Iris.txt (Neucom format)
• Method: Back-Propagation and Multi-Layer Perceptron; parameters (select by yourself)
• Software: wekaclassalgos1.7
  Steps: Explorer -> Classify -> Classifier (Neural - MultilayerPerceptron - BackPropagation)
• Neucom Steps: Modeling Discovery -> Classification -> Neural Networks -> Multi-Layer Perceptron
Self Organizing Map (SOM)
Characteristics:
1. Uses a neighborhood function
2. Maps high-dimensional data to a low-dimensional space

Two concepts:
1. Training: builds the map using input examples
2. Mapping: classifies a new input vector

Components:
• Nodes or neurons: each has a weight vector of the same dimension as the input data vectors, and a position in the map space
Neighborhood Function
– Gaussian neighborhood function:
  $h_{j,i} = \exp\left(-\dfrac{d_{j,i}^2}{2\sigma^2}\right)$
– $d_{j,i}$: distance between neurons i and j
  • in a 1-dimensional lattice: $|j - i|$
  • in a 2-dimensional lattice: $\|r_j - r_i\|$, where $r_j$ is the position of neuron j in the lattice
[Figure: neighborhoods N13(1) and N13(2) of increasing radius around winning neuron 13 in the lattice.]

– The neighborhood function measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.
– In the learning algorithm, its width σ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters $\sigma_0$ and $T_1$:
  $\sigma(n) = \sigma_0 \exp\left(-\dfrac{n}{T_1}\right)$
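In code, the neighborhood function and its decay are one-liners; a sketch, with arbitrary illustrative values for sigma0 and T1:

```python
import numpy as np

def h(d_ji, sigma):
    """Gaussian neighborhood: exp(-d_ji^2 / (2 sigma^2))."""
    return np.exp(-d_ji ** 2 / (2.0 * sigma ** 2))

def sigma_at(n, sigma0=3.0, T1=1000.0):
    """Exponential decay of the neighborhood width during the ordering phase."""
    return sigma0 * np.exp(-n / T1)

# The effective neighborhood shrinks as training proceeds:
for n in (0, 1000, 3000):
    print(n, h(np.arange(5), sigma_at(n)).round(3))
```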
Neighborhood Function

[Figure: Gaussian neighborhood curves; degree of neighbourhood (0 to 1) plotted against distance from the winner (-10 to 10). The curve narrows over time.]
SOM – Algorithm Steps
1. Randomly initialise all weights
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set
3. Compare x with the weights $w_j$ of each neuron j
4. Determine the winner: find the unit j with the minimum distance
   $d_j = \sum_i (x_i - w_{ij})^2$
5. Update the winner so that it becomes more like x, together with the winner's neighbours within the radius, according to
   $w_{ij}(n+1) = w_{ij}(n) + \eta(n)\,[x_i - w_{ij}(n)]$
6. Adjust parameters: learning rate & neighbourhood function
7. Repeat from (2) until ... ?
Note that the learning rate generally decreases with time: $0 \le \eta(n) \le \eta(n-1) \le 1$ (see the code sketch below)
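Putting steps 1-7 together, a compact SOM training loop might look like this; a sketch, where the lattice size, the decay constants, and folding the Gaussian neighborhood h into the update of step 5 are illustrative assumptions:

```python
import numpy as np

def train_som(X, rows=10, cols=10, epochs=2000, eta0=0.1, sigma0=3.0, T1=1000.0):
    """Minimal SOM: a 2-d lattice of weight vectors trained on the samples X."""
    rng = np.random.default_rng(0)
    W = rng.random((rows, cols, X.shape[1]))           # 1. random initial weights
    pos = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                               indexing="ij"), axis=-1)
    for n in range(epochs):
        x = X[rng.integers(len(X))]                    # 2. pick an input vector
        d = ((W - x) ** 2).sum(axis=-1)                # 3. distance to every neuron
        win = np.unravel_index(d.argmin(), d.shape)    # 4. winner = minimum distance
        lat = np.linalg.norm(pos - np.array(win), axis=-1)  # lattice distance ||rj - ri||
        sigma = sigma0 * np.exp(-n / T1)               # 6. shrink neighborhood ...
        eta = eta0 * np.exp(-n / T1)                   #    ... and learning rate
        h = np.exp(-lat ** 2 / (2 * sigma ** 2))       # Gaussian neighborhood
        W += eta * h[..., None] * (x - W)              # 5. w(n+1) = w(n) + eta*h*(x - w)
    return W

# Map random 3-d points (e.g. colours) onto a 10x10 grid
W = train_som(np.random.default_rng(1).random((500, 3)))
print(W.shape)   # (10, 10, 3)
```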
[Figure: snapshots of the map unfolding over successive training steps.]
SOM - Architecture
• Lattice of neurons ('nodes') accepts and responds to a set of input signals
• Responses are compared; the 'winning' neuron is selected from the lattice
• The selected neuron is activated together with its 'neighbourhood' neurons
• The adaptive process changes the weights to more closely match the inputs

[Figure: a 2d array of neurons; the set of input signals x1, x2, x3, ..., xn connects to each neuron j through weights wj1, wj2, wj3, ..., wjn.]
On Class Practice 3
• Data: Iris.arff and your own data (if applicable)
• Method: SOM; parameters (select by yourself)
• Software: wekaclassalgos1.7
• Steps: Explorer -> Classify -> Classifier (Functions - SOM)