Machine Learning: Connectionist

Page 1: Machine Learning: Connectionist

Machine Learning: Connectionist

McCulloch-Pitts Neuron
Perceptrons
Multilayer Networks
Support Vector Machines
Feedback Networks
Hopfield Networks

Page 2: Machine Learning: Connectionist

Uses

Classification
Pattern Recognition
Memory Recall
Prediction
Optimization
Noise Filtering

Page 3: Machine Learning: Connectionist

Artificial Neuron

Input signals, x_i
Weights, w_i
Activation level, Σ w_i x_i
Threshold function, f
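As a concrete sketch of these quantities (the weights, inputs, and threshold value below are invented for illustration; the slides give no code), a single artificial neuron forms the activation level Σ w_i x_i and passes it through a hard-limit threshold function f:

```python
# Minimal artificial neuron: activation level = sum of w_i * x_i,
# output = f(activation), where f is a hard-limit threshold function.

def threshold(activation, t=0.0):
    """Hard-limit threshold function f: +1 at or above t, -1 below."""
    return 1 if activation >= t else -1

def neuron_output(x, w, f=threshold):
    """Compute f(sum_i w_i * x_i) for input signals x and weights w."""
    activation = sum(wi * xi for wi, xi in zip(w, x))
    return f(activation)

# Illustrative example: two inputs with weights 0.5 and -0.3.
print(neuron_output([1, 1], [0.5, -0.3]))   # activation 0.2  -> +1
print(neuron_output([1, 1], [0.1, -0.3]))   # activation -0.2 -> -1
```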

Page 4: Machine Learning: Connectionist

Neural Networks

Network Topology
Learning Algorithm
Encoding Scheme

Page 5: Machine Learning: Connectionist

McCulloch-Pitts Neuron

Output is either +1 or -1.
Computes weighted sum of inputs.
If weighted sum >= 0, outputs +1, else -1.
Can be combined into networks (multilayer).
Not trained.
Computationally complete.
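A brief sketch of these points (the weights are hand-picked for illustration, not taken from the slides): each unit below is a fixed-weight, untrained McCulloch-Pitts neuron over bipolar (+1/-1) signals, and wiring two layers of them together computes XOR, hinting at why such networks are computationally complete.

```python
# McCulloch-Pitts neuron: fixed weights, bipolar (+1/-1) signals,
# output +1 if the weighted sum is >= 0, else -1. No training involved.

def mp_neuron(inputs, weights):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else -1

# Hand-chosen weights (illustrative); the last input is a constant +1 "bias" line.
def AND(a, b): return mp_neuron([a, b, 1], [1, 1, -1])
def OR(a, b):  return mp_neuron([a, b, 1], [1, 1,  1])
def NOT(a):    return mp_neuron([a], [-1])

# A two-layer combination computes XOR, which no single unit can.
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

for a in (1, -1):
    for b in (1, -1):
        print(a, b, XOR(a, b))   # +1 only when exactly one input is +1
```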

Page 6: Machine Learning: Connectionist

Example

Page 7: Machine Learning: Connectionist

Perceptrons (Rosenblatt)

Similar to McCulloch-Pitts neuron
Single layer
Hard-limited threshold function: +1 if weighted sum >= t, -1 otherwise
Can use sign function if bias included
Allows for supervised training (Perceptron Training Algorithm)

Page 8: Machine Learning: Connectionist

Perceptron Training Algorithm

Adjusts weights by using the difference between the actual output and the expected output in a training example.
Rule: Δw_i = c (d_i – O_i) x_i
c is the learning rate
d_i is the expected output
O_i is the computed output, sign(Σ w_i x_i).

Example: Matlab nnd4pr function
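The slide points to Matlab's nnd4pr demo; as a language-neutral alternative, here is a minimal Python sketch of the same rule on made-up, linearly separable data, with a constant +1 input serving as the bias line:

```python
# Perceptron training: Delta w_i = c * (d - O) * x_i, with O = sign(sum w_i x_i).

def sign(s):
    return 1 if s >= 0 else -1

def train_perceptron(examples, c=0.1, epochs=100):
    n = len(examples[0][0]) + 1          # inputs plus the bias line
    w = [0.0] * n
    for _ in range(epochs):
        errors = 0
        for x, d in examples:
            x = list(x) + [1.0]          # constant +1 bias input
            o = sign(sum(wi * xi for wi, xi in zip(w, x)))
            if o != d:
                errors += 1
                w = [wi + c * (d - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:                  # converged on this linearly separable set
            break
    return w

# Illustrative toy data: class +1 when x1 + x2 > 1, else -1.
data = [((0, 0), -1), ((1, 0), -1), ((0, 1), -1), ((1, 1), 1), ((2, 1), 1)]
print(train_perceptron(data))
```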

Page 9: Machine Learning: Connectionist

Perceptron (Cont'd)

Simple training algorithm
Not computationally complete; counter-example: XOR function
Requires problem to be linearly separable
Threshold function not continuous (needed for more sophisticated training algorithms)

Page 10: Machine Learning: Connectionist

Generalized Delta Rule

Conducive to finer granularity in the error measurement
Form of gradient descent learning – consider the error surface, the map of the error vs. the weights.
The rule takes a step closer to a local minimum by following the gradient
Uses the learning parameter, c

Page 11: Machine Learning: Connectionist

Generalized Delta Rule (cont'd)

The threshold function must be continuous. We use a sigmoid function, f(x) = 1/(1 + e^(-λx)), instead of a hard limit function. The sigmoid function is continuous but approximates the hard limit function.
The rule is: Δw_k = c (d_i – O_i) f'(Σ w_k x_k) x_k = c (d_i – O_i) O_i (1 – O_i) x_k
Hill-climbing algorithm
c determines how much the weight changes in a single step
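A sketch of this rule for a single sigmoid unit (taking λ = 1, so f'(net) = O(1 – O)); the weights, input, target, and learning rate below are illustrative assumptions:

```python
import math

# Generalized delta rule for one sigmoid unit (lambda = 1):
# O = f(net), f(net) = 1 / (1 + exp(-net)), f'(net) = O * (1 - O),
# so Delta w_i = c * (d - O) * O * (1 - O) * x_i.

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def delta_rule_step(w, x, d, c=0.5):
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = sigmoid(net)
    return [wi + c * (d - o) * o * (1 - o) * xi for wi, xi in zip(w, x)]

# A few gradient-descent steps on an illustrative example.
w, x, d = [0.2, -0.4], [1.0, 0.5], 1.0
for _ in range(5):
    w = delta_rule_step(w, x, d)
print(w)   # weights move so that the output climbs toward d
```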

Page 12: Machine Learning: Connectionist

Multilayer Network

Since a single-layer perceptron network is not computationally complete, we allow for a multilayer network where the output of each layer is the input for the next layer (except for the final layer, the output layer). The first layer, whose input comes from the external source, is the input layer. All other layers are called hidden layers.
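A minimal forward-pass sketch of this layering (the layer sizes and weights are invented for illustration): each layer's output vector becomes the next layer's input vector.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer_output(inputs, weights):
    """One layer: each row of weights drives one unit in this layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(x, layers):
    """Feed x through the layers; each layer's output feeds the next layer."""
    for weights in layers:
        x = layer_output(x, weights)
    return x

# Illustrative network: 2 inputs -> 3 hidden units -> 1 output unit.
hidden = [[0.5, -0.2], [0.3, 0.8], [-0.7, 0.1]]
output = [[1.0, -1.0, 0.5]]
print(forward([1.0, 0.0], [hidden, output]))
```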

Page 13: Machine Learning: Connectionist

Training a ML Network

How can we train a multilayer network? Given a training example, the output layer can be trained like a single-layer network by comparing the expected output to the actual output and adjusting the weights on the lines going into the output layer accordingly. But how can the hidden layers (and the input layer) be trained?

Page 14: Machine Learning: Connectionist

Training an ML Network (cont'd)

The solution is to assign a certain amount of blame, delta, to each neuron in a hidden layer (or the input layer) based on its contribution to the total error. The blame is used to adjust the weights. The blame for a node in the hidden layer (or the input layer) is calculated by using the blame values for the next layer.

Page 15: Machine Learning: Connectionist

Backpropagation

To train a multilayer network we use the backpropagation algorithm. First we run the network on a training example. Then we compare the expected output to the actual output to calculate the error. The blame (delta) is attributed to the non-output-layer nodes by working backward, from the output layer to the input layer. Finally the blame is used to adjust the weights on the connections.

Page 16: Machine Learning: Connectionist

Backpropagation (cont'd)

Δw_ki = c (d_i – O_i) O_i (1 – O_i) x_k, for output nodes
Δw_ki = c O_i (1 – O_i) Σ_j (delta_j w_ij) x_k, for hidden and input nodes
where w_ki is the weight on the connection from node k into node i, and
delta_j = (d_j – O_j) O_j (1 – O_j) for an output node j, or
delta_j = O_j (1 – O_j) Σ_k (delta_k w_jk) for a hidden node j
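To make the deltas concrete, here is a sketch of backpropagation for one hidden layer of sigmoid units; the network size, the XOR training set, the learning rate, and the bias handling are illustrative assumptions rather than details taken from the slides.

```python
import math, random

# Backpropagation sketch: 2 inputs -> 3 sigmoid hidden units -> 1 sigmoid output.
# Output node:  delta = (d - O) * O * (1 - O)
# Hidden node:  delta_j = O_j * (1 - O_j) * sum_k(delta_k * w_jk)
# Weight step:  Delta w = c * delta * (input on that connection)

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def backprop_step(x, d, w_hidden, w_out, c=0.5):
    # Forward pass; a trailing 1.0 acts as a bias input to every unit.
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hidden]
    hb = h + [1.0]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))

    # Blame for the output node, then propagated back to the hidden nodes.
    delta_out = (d - o) * o * (1 - o)
    delta_hidden = [hj * (1 - hj) * delta_out * w_out[j] for j, hj in enumerate(h)]

    # Adjust the weights using the deltas.
    new_out = [w + c * delta_out * hi for w, hi in zip(w_out, hb)]
    new_hidden = [[w + c * dj * xi for w, xi in zip(row, xb)]
                  for row, dj in zip(w_hidden, delta_hidden)]
    return new_hidden, new_out, o

random.seed(1)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(4)]
data = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 0.0)]  # XOR

for _ in range(10000):                      # many passes through the training set
    for x, d in data:
        w_hidden, w_out, _ = backprop_step(x, d, w_hidden, w_out)

for x, d in data:
    *_, o = backprop_step(x, d, w_hidden, w_out, c=0.0)   # c = 0: evaluate only
    print(x, round(o, 2), "target", d)      # outputs should approach the targets
```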

Page 17: Machine Learning: Connectionist

Example - NETtalk

NETtalk is a neural net for pronouncing English text.
The input consists of a sliding window of seven characters. Each character may be one of 29 values (26 letters, two punctuation characters, and a space), for a total of 203 input lines.
There are 26 output lines (21 phonemes and 5 to encode stress and syllable boundaries).
There is a single hidden layer of 80 units.

Page 18: Machine Learning: Connectionist

NETtalk (cont'd)

Uses backpropagation to train
Requires many passes through the training set
Results comparable to ID3 (60% correct)
The hidden layers serve to abstract information from the input layers

Page 19: Machine Learning: Connectionist

Competitive Learning

Can be supervised or unsupervised, the latter usually for clustering
In Winner-Take-All learning for classification, one output node is considered the "winner." The weight vector of the winner is adjusted to bring it closer to the input vector that caused the win.
Kohonen Rule: ΔW^t = c (X^(t-1) – W^(t-1))
Don't need to compute f(x); the weighted sum is sufficient

Page 20: Machine Learning: Connectionist

Kohonen Network

Can be used to learn prototypes
Inductive bias in terms of the number of prototypes originally specified
Start with random prototypes
Essentially measures the distance between each prototype and the data point to select the winner
Reinforces the winning node by moving it closer to the input data
Self-organizing network
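A sketch of this winner-take-all prototype learning (the data, prototype count, and learning rate are illustrative, and the starting prototypes are fixed here for reproducibility even though the slide starts from random prototypes): the closest prototype wins and is moved toward the input by the Kohonen rule ΔW = c (X – W).

```python
# Winner-take-all (Kohonen-style) prototype learning:
# the prototype closest to the input wins and is moved toward it,
# W_winner <- W_winner + c * (X - W_winner). No f(x) is needed;
# comparing distances is enough to pick the winner.

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def winner_take_all(data, prototypes, c=0.2, epochs=20):
    for _ in range(epochs):
        for x in data:
            winner = min(prototypes, key=lambda w: squared_distance(w, x))
            for i in range(len(winner)):          # Kohonen rule: move winner toward x
                winner[i] += c * (x[i] - winner[i])
    return prototypes

# Two illustrative clusters, around (0, 0) and (5, 5).
points = [[0.1, 0.2], [-0.2, 0.1], [0.0, -0.1], [5.1, 4.9], [4.8, 5.2], [5.0, 5.0]]
print(winner_take_all(points, [[0.0, 0.0], [1.0, 1.0]]))   # prototypes drift to the cluster centers
```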

Page 21: Machine Learning: Connectionist

Support Vector Machines

Form of supervised competitive learning
Classifies data into one of two categories by finding a hyperplane (determined by the support vectors) between the positive and negative instances
Classifies elements by computing the distance from a data point to the hyperplane as an optimization problem
Requires training and linearly separable data; otherwise, it doesn't converge.
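The slides name no software; as an illustrative sketch, a linear-kernel SVM from scikit-learn (an assumed dependency, not mentioned in the slides) finds the separating hyperplane, reports the support vectors that determine it, and scores new points by a signed distance-like value from the hyperplane.

```python
# Illustrative sketch with scikit-learn: a linear-kernel SVM separates
# positive and negative instances with a maximum-margin hyperplane.
from sklearn.svm import SVC

X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]   # linearly separable toy data
y = [-1, -1, -1, 1, 1, 1]

clf = SVC(kernel="linear")
clf.fit(X, y)

print(clf.support_vectors_)              # the instances that determine the hyperplane
print(clf.decision_function([[2, 2]]))   # signed score relative to the hyperplane
print(clf.predict([[2, 2], [0.5, 0.5]]))
```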