Statistical Classification Methods
1. Introduction
2. k-nearest neighbor
3. Neural networks
4. Decision trees
5. Support Vector Machine
What is classification?
• Assign new observations to known classes using a previously trained model
• The model is trained on existing data with known labels
Introduction
Machine Learning is the study of computer algorithms that
improve automatically through experience. [Machine Learning, Tom Mitchell, McGraw Hill, 1997]
Machine Classifier
Training Data: example input/output pairs
[Figure: a trained classifier mapping inputs to outputs.]
Machine Learning for Classification

[Figure: a learning algorithm performs induction on the training set to learn a model; the model is then applied to the test set (deduction).]

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Classification Steps
Examples of Classification Task
• Classifying cells as cancerous or non-cancerous
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Methods used in DM
• Prototype-based methods: k-Nearest Neighbour (kNN), Weighted kNN, Fuzzy kNN, etc.
• Boundary-based methods: neural networks, such as the Multi-Layer Perceptron (MLP), Back-Propagation (BP), and the Support Vector Machine (SVM)
• Rule-based methods: Decision Tree
Classification Method 1: kNN - Basic Information
• Training method: save the training examples
• At prediction time: find the k training examples (x1,y1), ..., (xk,yk) that are nearest to the test example x, and predict the most frequent class among those yi's
kNN Steps
1. Store all input data in the training set
2. For each sample in the test set:
   a. Search for the K nearest samples to the input sample using a Euclidean distance measure
   b. For classification, compute the confidence for each class as Ci/K, where Ci is the number of samples among the K nearest samples belonging to class i
   c. The classification for the input sample is the class with the highest confidence
(A code sketch of these steps follows the distance definition below.)
• An arbitrary instance x is represented by the feature vector $(a_1(x), a_2(x), a_3(x), \ldots, a_n(x))$, where $a_i(x)$ denotes the i-th feature
• Euclidean distance between two instances: $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2}$
• Continuous-valued target function: predict the mean value of the k nearest training examples
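As a minimal sketch of the steps above, here is a plain numpy kNN classifier. This is illustrative code, not the Weka IBk implementation used in the practice below; the function name and the toy data are assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Predict the class of x by majority vote among its k nearest neighbours.

    Returns (class, confidence), where confidence = Ci / K and Ci is the
    number of the k nearest samples that belong to the winning class."""
    # Euclidean distance d(x, xi) = sqrt(sum_r (ar(x) - ar(xi))^2)
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]              # indices of the k nearest samples
    votes = Counter(y_train[i] for i in nearest)
    cls, ci = votes.most_common(1)[0]
    return cls, ci / k

# Toy example with two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([1.1, 0.9])))  # -> (0, 0.666...)
```

For a continuous-valued target, the same neighbour search applies, but the prediction is the mean of the k nearest y values instead of a majority vote.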
kNN Calculation - 1-Nearest Neighbor
kNN Calculation - 3-Nearest Neighbor
[Figures: worked examples of 1-NN and 3-NN classification; not reproduced in this transcript.]
On Class Practice 1
• Data: Iris.arff and your own data (if applicable)
• Method: k-NN; parameters (select by yourself)
• Software: wekaclassalgos1.7
• Steps: Explorer -> Classify -> Classifier (Lazy - IBk)
Classification Method 2: Neural Networks – Biological inspiration
• Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to perform these behaviours.
• An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.
• The nervous system is built from relatively simple units, the neurons, so reproducing their behaviour and functionality should be a reasonable approach.
Neural Networks – Basic Structure
A neural network is an interconnected group of nodes.
Neural Networks – Biological inspiration

[Figure: a biological neuron, showing the dendrites, soma (cell body), and axon.]

[Figure: two connected neurons; the axon of the pre-synaptic neuron meets the dendrites of the post-synaptic neuron at synapses.]
The information transmission happens at the synapses.
Neural Networks – Biological inspiration
The spikes (signal) travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse.
The neurotransmitters cause excitation (+) or inhibition (-) in the dendrite of the post-synaptic neuron.
The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron.
The contribution of the signals depends on the strength of the synaptic connection.
Neural Networks – Artificial neurons
Neurons work by processing information. They receive and provide information in the form of spikes.
The McCulloch-Pitts model

[Figure: inputs x1, x2, ..., xn with synaptic weights w1, w2, ..., wn feeding a single output y.]

$z = \sum_{i=1}^{n} w_i x_i; \qquad y = H(z)$

where H is the threshold (Heaviside step) function.
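As a sketch, the model transcribes directly into code; the function name and the logical-AND example with a constant bias input are illustrative assumptions, not from the slides.

```python
import numpy as np

def mp_neuron(x, w):
    """McCulloch-Pitts neuron: z = sum_i w_i x_i, y = H(z)."""
    z = np.dot(w, x)          # weighted sum of the inputs
    return 1 if z > 0 else 0  # Heaviside step H(z)

# Example: logical AND using a constant bias input x0 = 1 (with weight w0 = -1)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, mp_neuron(np.array([1, x1, x2]), np.array([-1.0, 0.6, 0.6])))
```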
Neural Networks – Artificial neurons
The McCulloch-Pitts model:
• spikes are interpreted as spike rates;
• synaptic strengths are translated as synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.
Neural Networks – Artificial neurons
Nonlinear generalization of the McCulloch-Pitts neuron:

$y = f(x, w)$

where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.

Examples:

sigmoidal neuron: $y = \dfrac{1}{1 + e^{-w^T x - a}}$

Gaussian neuron: $y = e^{-\frac{\|x - w\|^2}{2 a^2}}$
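A minimal sketch of these two neuron types in numpy, assuming the sign conventions reconstructed above:

```python
import numpy as np

def sigmoidal_neuron(x, w, a):
    """y = 1 / (1 + exp(-(w^T x + a))); output grows with the weighted sum."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + a)))

def gaussian_neuron(x, w, a):
    """y = exp(-||x - w||^2 / (2 a^2)); output peaks when x is near w."""
    return np.exp(-np.linalg.norm(x - w) ** 2 / (2.0 * a ** 2))

x, w = np.array([0.5, -0.2]), np.array([1.0, 1.0])
print(sigmoidal_neuron(x, w, a=0.0))  # ~0.57
print(gaussian_neuron(x, w, a=1.0))   # ~0.43
```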
Neural Networks – Artificial neural networks

[Figure: many interconnected neurons between inputs and an output.]

An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
Neural Networks
Learning in biological systems
Learning = learning by adaptation

The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit-picking behaviour.

At the neural level, learning happens by changing the synaptic strengths, eliminating some synapses, and building new ones.
Neural Networks
Learning as optimisation

The objective of adapting the responses on the basis of the information received from the environment is to achieve a better state, e.g., the animal likes to eat many energy-rich, juicy fruits that fill its stomach and make it feel happy.

In other words, the objective of learning in biological organisms is to optimise the amount of available resources, happiness, or in general to achieve a state closer to optimal.
Neural Networks
Learning in biological neural networks
The learning rules of Hebb:
• synchronous activation increases the synaptic strength;
• asynchronous activation decreases the synaptic strength.
These rules fit with energy minimization principles: maintaining synaptic strength needs energy, so it should be maintained where it is needed and not where it is unneeded.
Neural Networks
Learning principle for artificial neural networks: ENERGY MINIMIZATION

We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons.

ENERGY = measure of task performance error
Neural Networks - mathematics

[Figure: a network with four inputs, a first layer of four neurons, a second layer of three neurons, and a single output neuron.]

First layer:
$y_{11} = f(x_1, w_{11}) \quad y_{12} = f(x_2, w_{12}) \quad y_{13} = f(x_3, w_{13}) \quad y_{14} = f(x_4, w_{14})$
$y^1 = (y_{11}, y_{12}, y_{13}, y_{14})$

Second layer:
$y_{21} = f(y^1, w_{21}) \quad y_{22} = f(y^1, w_{22}) \quad y_{23} = f(y^1, w_{23})$
$y^2 = (y_{21}, y_{22}, y_{23})$

Output:
$y_{Out} = f(y^2, w_{31})$
Neural Networks - mathematics
input / output transformation

$y_{out} = F(x, W)$

where W is the matrix of all weight vectors. F is actually the composition of two functions per neuron: the weighted sum of the inputs and the activation function:

$z = \sum_{i=1}^{n} w_i x_i; \qquad y = H(z)$
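A sketch of this input/output transformation for the network in the figure, assuming fully connected layers and the threshold activation H; all names and sizes are illustrative:

```python
import numpy as np

def neuron_layer(x, W):
    """One layer of neurons: z = W x (weighted sums), then y = H(z)."""
    z = W @ x
    return (z > 0).astype(float)       # Heaviside step activation H

rng = np.random.default_rng(0)
x = rng.standard_normal(4)             # four inputs, as in the figure
W1 = rng.standard_normal((4, 4))       # first layer:  y1 has 4 components
W2 = rng.standard_normal((3, 4))       # second layer: y2 has 3 components
W3 = rng.standard_normal((1, 3))       # output neuron: yOut = f(y2, w31)
y1 = neuron_layer(x, W1)
y2 = neuron_layer(y1, W2)
y_out = neuron_layer(y2, W3)
print(y_out)                           # the whole map is yOut = F(x, W)
```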
Neural Networks - Perceptron
• Basic unit in a neural network
• Linear separator
• Parts:
  – N inputs, x1 ... xn
  – Weights for each input, w1 ... wn
  – A bias input x0 (constant) and associated weight w0
  – Weighted sum of inputs, z = w0x0 + w1x1 + ... + wnxn
  – A threshold function, i.e. y = 1 if z > 0, y = -1 if z <= 0
Neural Networks - Perceptron

[Figure: perceptron diagram; inputs x0, x1, x2, ..., xn with weights w0, w1, w2, ..., wn feed a summation unit z = Σ wixi, followed by a threshold that outputs 1 if z > 0 and -1 otherwise.]
Neural Networks - Perceptron
Learning in Perceptron
• Start with random weights
• Select an input pair (x, f(x)), where f(x) is the desired output
• If y ≠ f(x), then modify each weight according to $\Delta w_i = f(x) \cdot x_i$
Note that the weights are not modified if the network gives the correct answer.
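A minimal sketch of this learning procedure, assuming the reconstructed update rule above; the logical-AND data and the function names are illustrative:

```python
import numpy as np

def predict(w, x):
    """Perceptron output: 1 if z = w.x > 0, else -1 (x includes the bias x0 = 1)."""
    return 1 if np.dot(w, x) > 0 else -1

def train(X, targets, epochs=20):
    rng = np.random.default_rng(0)
    w = rng.random(X.shape[1])                 # start with random weights
    for _ in range(epochs):
        for x, fx in zip(X, targets):
            if predict(w, x) != fx:            # only on a wrong answer ...
                w += fx * x                    # ... apply dw_i = f(x) * x_i
    return w

# Logical AND with -1/1 labels; first column is the constant bias input x0 = 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = train(X, t)
print([predict(w, x) for x in X])              # -> [-1, -1, -1, 1]
```

Multiplying a learning rate into the update (w += eta * fx * x) speeds up or stabilises learning, as noted on the next slide.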
Neural Networks - Perceptron
• Can add a learning rate to speed up the learning process; just multiply it in with the delta computation
• Essentially a linear discriminant
• Perceptron theorem: if a linear discriminant exists that can separate the classes without error, the training procedure is guaranteed to find that line or plane
• Has only one layer, so it has problems solving complex (non-linearly separable) problems
Neural Networks
MLP Backpropagation networks
• Attributed to Rumelhart and McClelland, mid-1980s
• Can construct multilayer networks; typically we have fully connected, feedforward networks

[Figure: a fully connected, feedforward multilayer network from inputs to output.]
Neural Networks – MLP BP
Learning Procedure:
• Randomly assign weights (between 0 and 1)
• Present inputs from the training data, propagate to the outputs
• Compute the outputs O, adjust the weights according to the delta rule, backpropagating the errors. The weights will be nudged closer so that the network learns to give the desired output.
• Repeat; stop when there are no errors, or enough epochs have been completed
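A minimal backpropagation sketch of this procedure for one hidden layer, assuming sigmoid activations and a squared-error delta rule; the XOR data, layer sizes, learning rate, and epoch count are illustrative choices, not parameters from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# XOR: a problem a single perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly assign weights (between 0 and 1)
W1, b1 = rng.random((2, 4)), rng.random(4)   # input -> hidden (4 units)
W2, b2 = rng.random((4, 1)), rng.random(1)   # hidden -> output
eta = 0.5                                    # learning rate

for epoch in range(20000):
    # Present inputs, propagate to the outputs
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    # Backpropagate the errors (delta rule with the sigmoid derivative)
    dO = (O - T) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    # Nudge the weights toward the desired output
    W2 -= eta * H.T @ dO; b2 -= eta * dO.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(O.round(2))   # should approach [[0], [1], [1], [0]]; exact values vary by run
```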
Neural Networks – MLP BP

[Figure: if an error is found at the output, a perceptron can only change the weights of the output layer, while MLP BP changes the weights of the hidden layers as well.]
Neural Networks – MLP BP
• Very powerful: with enough hidden units, the network can learn (approximate) any function.
• Has the same problems of generalization vs. memorization. With too many units, the network will tend to memorize the input and not generalize well. Some schemes exist to "prune" the neural network.
• Networks require extensive training and have many parameters to fiddle with. Can be extremely slow to train. May also fall into local minima.
• Inherently parallel algorithm, ideal for multiprocessor hardware.
• Despite the cons, a very powerful algorithm that has seen widespread successful deployment.
Neural Networks – MLP BP
Parameters:
• number of layers
• number of neurons per layer
• transfer function (activation function)
• number of iterations (cycles)
On Class Practice 2
• Data: Iris.arff (weka format) and your own data (if applicable); Iris.txt (Neucom format)
• Method: Back-Propagation and Multi-Layer Perceptron; parameters (select by yourself)
• Software: wekaclassalgos1.7
  Steps: Explorer -> Classify -> Classifier (Neural - MultilayerPerceptron - BackPropagation)
• Neucom Steps: Modeling Discovery -> Classification -> Neural Networks -> Multi-Layer Perceptron
Self Organizing Map (SOM)
Characteristics:
1. Uses a neighborhood function
2. Maps high-dimensional data to a low-dimensional space

Two concepts:
1. Training: builds the map using input examples
2. Mapping: classifies a new input vector

Components:
• Nodes or neurons: each has a weight vector of the same dimension as the input data vectors, and a position in the map space
Neighborhood Function
– Gaussian neighborhood function:
  $h_{j,i} = \exp\left(-\dfrac{d_{j,i}^2}{2\sigma^2}\right)$
– $d_{j,i}$: distance between neurons i and j
  • in a 1-dimensional lattice: $|j - i|$
  • in a 2-dimensional lattice: $\|r_j - r_i\|$, where $r_j$ is the position of neuron j in the lattice
[Figure: neighborhoods N13(1) and N13(2) of increasing radius around winning neuron 13 in the lattice.]

– The neighborhood function measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.
– In the learning algorithm, its width σ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters $\sigma_0$ and $T_1$:
  $\sigma(n) = \sigma_0 \exp\left(-\dfrac{n}{T_1}\right)$
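In code, the neighborhood function and its decay are one-liners; a sketch, with arbitrary illustrative values for sigma0 and T1:

```python
import numpy as np

def h(d_ji, sigma):
    """Gaussian neighborhood: exp(-d_ji^2 / (2 sigma^2))."""
    return np.exp(-d_ji ** 2 / (2.0 * sigma ** 2))

def sigma_at(n, sigma0=3.0, T1=1000.0):
    """Exponential decay of the neighborhood width during the ordering phase."""
    return sigma0 * np.exp(-n / T1)

# The effective neighborhood shrinks as training proceeds:
for n in (0, 1000, 3000):
    print(n, h(np.arange(5), sigma_at(n)).round(3))
```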
Neighborhood Function

[Figure: Gaussian neighborhood curves; degree of neighbourhood (0 to 1) plotted against distance from the winner (-10 to 10). The curve narrows over time.]
SOM – Algorithm Steps
1. Randomly initialise all weights
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set
3. Compare x with the weights $w_j$ of each neuron j
4. Determine the winner: find the unit j with the minimum distance
   $d_j = \sum_i (x_i - w_{ij})^2$
5. Update the winner so that it becomes more like x, together with the winner's neighbours within the radius, according to
   $w_{ij}(n+1) = w_{ij}(n) + \eta(n)\,[x_i - w_{ij}(n)]$
6. Adjust parameters: learning rate & neighbourhood function
7. Repeat from (2) until ... ?
Note that the learning rate generally decreases with time: $0 \le \eta(n) \le \eta(n-1) \le 1$ (see the code sketch below)
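Putting steps 1-7 together, a compact SOM training loop might look like this; a sketch, where the lattice size, the decay constants, and folding the Gaussian neighborhood h into the update of step 5 are illustrative assumptions:

```python
import numpy as np

def train_som(X, rows=10, cols=10, epochs=2000, eta0=0.1, sigma0=3.0, T1=1000.0):
    """Minimal SOM: a 2-d lattice of weight vectors trained on the samples X."""
    rng = np.random.default_rng(0)
    W = rng.random((rows, cols, X.shape[1]))           # 1. random initial weights
    pos = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                               indexing="ij"), axis=-1)
    for n in range(epochs):
        x = X[rng.integers(len(X))]                    # 2. pick an input vector
        d = ((W - x) ** 2).sum(axis=-1)                # 3. distance to every neuron
        win = np.unravel_index(d.argmin(), d.shape)    # 4. winner = minimum distance
        lat = np.linalg.norm(pos - np.array(win), axis=-1)  # lattice distance ||rj - ri||
        sigma = sigma0 * np.exp(-n / T1)               # 6. shrink neighborhood ...
        eta = eta0 * np.exp(-n / T1)                   #    ... and learning rate
        h = np.exp(-lat ** 2 / (2 * sigma ** 2))       # Gaussian neighborhood
        W += eta * h[..., None] * (x - W)              # 5. w(n+1) = w(n) + eta*h*(x - w)
    return W

# Map random 3-d points (e.g. colours) onto a 10x10 grid
W = train_som(np.random.default_rng(1).random((500, 3)))
print(W.shape)   # (10, 10, 3)
```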
[Figure: snapshots of the map unfolding over successive training steps.]
SOM - Architecture
• Lattice of neurons ('nodes') accepts and responds to a set of input signals
• Responses are compared; the 'winning' neuron is selected from the lattice
• The selected neuron is activated together with its 'neighbourhood' neurons
• The adaptive process changes the weights to more closely match the inputs

[Figure: a 2d array of neurons; the set of input signals x1, x2, x3, ..., xn connects to each neuron j through weights wj1, wj2, wj3, ..., wjn.]
On Class Practice 3
• Data: Iris.arff and your own data (if applicable)
• Method: SOM; parameters (select by yourself)
• Software: wekaclassalgos1.7
• Steps: Explorer -> Classify -> Classifier (Functions - SOM)