Neural Networks
Pabitra Mitra
Computer Science and Engineering, IIT Kharagpur
pabitra@gmail.com
The Neuron
• The neuron is the basic information processing unit of a NN. It consists of:
1. A set of synapses, or connecting links, each characterized by a weight: w1, w2, …, wm
2. An adder function (linear combiner), which computes the weighted sum of the inputs: u = Σ_{j=1}^{m} w_j x_j
3. An activation function (squashing function) φ for limiting the amplitude of the neuron's output: y = φ(u + b), where b is the bias
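A minimal Python sketch of this computation (using the sigmoid as φ; the function name and the example numbers are illustrative assumptions, not from the slides):

```python
import math

def neuron_output(x, w, b):
    """One neuron: y = phi(u + b), where u = sum_j w_j * x_j."""
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))  # adder (linear combiner)
    v = u + b                                     # induced local field
    return 1.0 / (1.0 + math.exp(-v))             # sigmoid as squashing function

# Example with m = 3 inputs (values are arbitrary):
y = neuron_output(x=[1.0, 0.5, -1.0], w=[0.2, -0.4, 0.1], b=0.05)
```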
Computation at Units
• Compute a 0-1 or a graded function of the weighted sum of the inputs: w · x = Σ_i w_i x_i
• g(·) is the activation function; the unit outputs g(w · x)
[Figure: a single unit with inputs x1, …, xn, weights w1, …, wn, and output g(w · x)]
The Neuron
[Figure: input signals x1, x2, …, xm pass through synaptic weights w1, w2, …, wm into a summing function; together with the bias b this produces the local field v, which the activation function φ(·) maps to the output y]
Common Activation Functions
• Step function: g(x) = 1 if x ≥ t; g(x) = 0 if x < t (t is a threshold)
• Sign function: g(x) = 1 if x ≥ t; g(x) = -1 if x < t
• Sigmoid function: g(x) = 1 / (1 + exp(-x))
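A sketch of the three functions in Python (the default threshold t = 0 is an assumption for illustration):

```python
import math

def step(x, t=0.0):
    """Step: 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x, t=0.0):
    """Sign: 1 if x >= t, else -1."""
    return 1 if x >= t else -1

def sigmoid(x):
    """Sigmoid: a smooth, differentiable squashing of x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```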
Bias of a Neuron
• The bias b has the effect of applying an affine transformation to u: v = u + b
• v is the induced local field of the neuron, where u = Σ_{j=1}^{m} w_j x_j
[Figure: v plotted against u; the bias shifts the line v = u up or down]
Bias as extra input
• The bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:
v = Σ_{j=0}^{m} w_j x_j
[Figure: the same neuron diagram as before, with the extra input x0 = +1 and weight w0 feeding the summing function alongside x1, …, xm]
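A small sketch of this trick (the values are illustrative): prepending x0 = +1 and w0 = b makes the bias just another weight, so one update rule can handle both.

```python
x = [0.3, -1.2]   # original inputs x1, x2 (arbitrary values)
w = [0.5, 0.8]    # original weights w1, w2
b = -0.1          # bias

x_aug = [1.0] + x  # extra input x0 = +1
w_aug = [b] + w    # extra weight w0 = b

# v = sum_{j=0}^{m} w_j x_j equals (sum_{j=1}^{m} w_j x_j) + b:
v = sum(wj * xj for wj, xj in zip(w_aug, x_aug))
assert abs(v - (sum(wj * xj for wj, xj in zip(w, x)) + b)) < 1e-12
```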
Face Recognition
• 90% accuracy at learning head pose and at recognizing 1 of 20 faces
Handwritten digit recognition
Computing with spaces
• Perceptual features x1, x2 feed an output unit y, with +1 = cat, -1 = dog
• The network computes y = g(Wx)
• Error: E = (y - g(Wx))²
[Figure: cats and dogs as points in the (x1, x2) feature space, separated by the decision boundary of y = g(Wx)]
Can Implement Boolean Functions
• A unit can implement And, Or, and Not
• Need to map True and False to numbers: e.g., True = 1.0, False = 0.0
• (Exercise) Use a step function and show how to implement various simple Boolean functions (a sketch follows below)
• Combining units, we can get any Boolean function of n variables; logical circuits are obtained as a special case
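One way the exercise can be worked out (the weights and thresholds chosen here are one of many valid assignments): a step-function unit outputs 1 iff w · x ≥ t, and suitable choices of w and t give And, Or, and Not.

```python
def unit(x, w, t):
    """Linear threshold unit: 1 if w . x >= t, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= t else 0

def AND(a, b): return unit([a, b], w=[1, 1], t=2)  # fires only when both are 1
def OR(a, b):  return unit([a, b], w=[1, 1], t=1)  # fires when at least one is 1
def NOT(a):    return unit([a], w=[-1], t=0)       # fires only when the input is 0

assert [AND(a, b) for a in (0, 1) for b in (0, 1)] == [0, 0, 0, 1]
assert [OR(a, b) for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 1]
assert [NOT(a) for a in (0, 1)] == [1, 0]
```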
Network Structures
• Feedforward (no cycles): less powerful, but more easily understood
– Input units
– Hidden layers
– Output units
• Perceptron: no hidden layer, so it basically corresponds to one unit, and computes a linear threshold function (LTF)
• LTF: defined by a weight vector w and a threshold t; the value is 1 iff w · x ≥ t, otherwise 0
Single Layer Feed-forward
[Figure: an input layer of source nodes fully connected to an output layer of neurons]
Multi layer feed-forward
[Figure: a 3-4-2 network, with an input layer of 3 nodes, a hidden layer of 4 neurons, and an output layer of 2 neurons]
Network Structures
• Recurrent (cycles exist): more powerful, as they can implement state, but harder to analyze. Examples:
• Hopfield networks: symmetric connections, interesting properties, useful for implementing associative memory
• Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization
Simple recurrent networks (Elman, 1990)
[Figure: an Elman network with an input layer (x1, x2), a hidden layer, an output layer (z1, z2), and context units; the hidden state at step i-1 is copied into the context units and fed back as extra input at step i]
• Quite expressive: many, but not all Boolean functions can be expressed. Examples:– conjuncts and disjunctions, example
– more generally, can represent functions that are true if and only if at least k of the inputs are true:
– Can’t represent XOR
1)( 2121 xxxx
kxxx n ...21
Representable Functions
• Perceptrons have a monotonicity property: if a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of the other input values)
• They can't represent functions where input interactions can cancel one another's effect (e.g., XOR)
Representable Functions
• Can represent only linearly separable functions
• Geometrically: only if there is a line (plane) separating the positives from the negatives
• The good news: such functions are PAC learnable and learning algorithms exist
Linearly Separable
[Figure: a cloud of + points and a cloud of - points that can be separated by a single straight line]
NOT Linearly Separable
[Figure: + and - points interleaved so that no single straight line can separate them]
Problems with simple networks
• Some kinds of data are not linearly separable
[Figure: in the (x1, x2) plane, the positive examples of AND and OR can each be separated from the negatives by a line, but those of XOR cannot]
A solution: multiple layers
[Figure: a network with an input layer (x1, x2), a hidden layer (z1, z2), and an output layer (y); the hidden units give the network enough expressive power to compute XOR]
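A hand-wired sketch of such a network for XOR, using step units (the particular weights and thresholds are one illustrative choice, not from the slides): the hidden units compute OR and AND, and the output fires when OR is true but AND is not.

```python
def unit(x, w, t):
    """Step unit: 1 if w . x >= t, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= t else 0

def xor(x1, x2):
    z1 = unit([x1, x2], w=[1, 1], t=1)     # hidden unit z1: x1 OR x2
    z2 = unit([x1, x2], w=[1, 1], t=2)     # hidden unit z2: x1 AND x2
    return unit([z1, z2], w=[1, -1], t=1)  # output y: z1 AND NOT z2

assert [xor(a, b) for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 0]
```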
The Perceptron Learning Algorithm
• An example of current-best-hypothesis (CBH) search (so it is incremental, etc.)
• Begin with a hypothesis (a perceptron)
• Repeat over all examples, several times
– Adjust weights as examples are seen
• Until all examples are correctly classified, or a stopping criterion is reached
Method for Adjusting Weights
• One weight-update possibility:
• If the classification is correct, don't change
• Otherwise:
– If false negative, add the input: w_j ← w_j + x_j
– If false positive, subtract the input: w_j ← w_j - x_j
• Intuition: if the example is positive, strengthen (increase) the weights corresponding to the positive attributes of the example
Properties of the Algorithm
• In general, one also applies a learning rate η: w_j ← w_j ± η x_j
• The adjustment is in the direction of minimizing error on the example
• If the learning rate is appropriate and the examples are linearly separable, the algorithm converges to a linear separator after a finite number of iterations (a sketch of the full loop follows)
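A sketch of the whole loop under these rules (the function name, the {0, 1} label encoding, and the max_epochs bound are assumptions for illustration); the bias is folded in as an extra input x0 = +1:

```python
def train_perceptron(examples, n_features, eta=0.1, max_epochs=100):
    """examples: list of (x, label) pairs with label in {0, 1}."""
    w = [0.0] * (n_features + 1)             # w[0] is the bias weight
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            xa = [1.0] + list(x)             # x0 = +1
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xa)) >= 0 else 0
            if pred != label:                # misclassified
                mistakes += 1
                s = 1 if label == 1 else -1  # add on false negative,
                w = [wj + eta * s * xj       # subtract on false positive
                     for wj, xj in zip(w, xa)]
        if mistakes == 0:                    # all examples correct: stop
            break
    return w

# OR is linearly separable, so this converges:
w = train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)], 2)
```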
Another Algorithm (least-sum-squares algorithm)
• Define and minimize an error function
• S is the set of examples, f(·) is the ideal function, h(·) is the linear function corresponding to the current perceptron
• Error of the perceptron (over all examples): E(h) = (1/2) Σ_{e∈S} (f(e) - h(e))²
• Note: h(e) = w · x(e) = Σ_i w_i x_i(e)
The Delta Rule
• Perceptual features x1, x2 feed an output y, with +1 = cat, -1 = dog
• E = (y - g(Wx))²
• Δw_ij = -η ∂E/∂w_ij
• ∂E/∂w_ij = -2 (y - g(Wx)) g′(Wx) x_j
• Δw_ij ∝ (y - g(Wx)) g′(Wx) x_j, i.e., (output error) × (influence of input)
• Holds for any function g with derivative g′
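A one-unit sketch of the delta rule with a sigmoid g (the function names and the learning rate value are illustrative assumptions):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def delta_rule_step(w, x, y, eta=0.5):
    """One delta-rule update: w_j += eta * (y - g(w.x)) * g'(w.x) * x_j."""
    v = sum(wj * xj for wj, xj in zip(w, x))
    g = sigmoid(v)
    g_prime = g * (1.0 - g)  # derivative of the sigmoid at v
    error = y - g            # output error
    return [wj + eta * error * g_prime * xj for wj, xj in zip(w, x)]
```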
Derivative of Error
• Gradient (derivative) of E: ∇E = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]
• Take the steepest-descent direction: w_i ← w_i + Δw_i, where Δw_i = -η ∂E/∂w_i
• ∂E/∂w_i is the gradient along w_i, and η is the learning rate
Gradient Descent
• The algorithm: pick an initial random perceptron and repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient)
[Figure: the error surface E, with the gradient direction pointing uphill and the descent direction pointing downhill]
[Figure: the error E plotted against a weight w_ij; where ∂E/∂w_ij > 0 the weight should decrease, where ∂E/∂w_ij < 0 it should increase, and at a minimum ∂E/∂w_ij = 0]
Δw_ij = -η ∂E/∂w_ij (η is the learning rate)
General-purpose learning mechanisms
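A batch gradient-descent sketch for the least-sum-squares linear unit defined earlier (the function name, step count, and learning rate are illustrative assumptions):

```python
def gradient_descent(examples, n_features, eta=0.05, n_steps=1000):
    """Minimize E = 1/2 * sum_e (f(e) - h(e))^2 for h(e) = w . x(e).
    The gradient is dE/dw_i = -sum_e (f(e) - h(e)) * x_i(e)."""
    w = [0.0] * n_features
    for _ in range(n_steps):
        grad = [0.0] * n_features
        for x, f_e in examples:              # (x(e), f(e)) pairs
            h_e = sum(wj * xj for wj, xj in zip(w, x))
            for i in range(n_features):
                grad[i] -= (f_e - h_e) * x[i]
        w = [wj - eta * gj for wj, gj in zip(w, grad)]  # step against gradient
    return w
```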
Gradient Calculation
∂E/∂w_i = ∂/∂w_i [ (1/2) Σ_{e∈S} (f(e) - h(e))² ]
= (1/2) Σ_{e∈S} ∂/∂w_i (f(e) - h(e))²
= Σ_{e∈S} (f(e) - h(e)) · ∂/∂w_i (f(e) - h(e))
= Σ_{e∈S} (f(e) - h(e)) · (-∂h(e)/∂w_i)
Derivation (cont.)
∂E/∂w_i = Σ_{e∈S} (f(e) - h(e)) · (-∂(w · x(e))/∂w_i)   (f(e) is constant)
= -Σ_{e∈S} (f(e) - h(e)) x_i(e),
since ∂(w · x(e))/∂w_i = x_i(e), as ∂(w_j x_j(e))/∂w_i = 0 for j ≠ i
Properties of the algorithm
• The error function has no local minima (it is quadratic)
• The algorithm is a gradient-descent method to the global minimum, and will asymptotically converge
• Even if the data are not linearly separable, it can find a good (minimum-error) linear classifier
• Incremental?
Multilayer Feed-Forward Networks
• Multiple perceptrons, layered
• Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units)
[Figure: inputs x1, x2, x3 feeding a hidden layer of two units, which feeds a single output unit]
Power/Expressiveness
• Can represent interactions among inputs (unlike perceptrons)
• Two-layer networks can represent any Boolean function, and continuous functions (within a tolerance), as long as there are sufficiently many hidden units and appropriate activation functions are used
• Learning algorithms exist, but with weaker guarantees than the perceptron learning algorithm
Back-Propagation
• Similar to the perceptron learning algorithm and gradient descent for perceptrons
• Problem to overcome: how to adjust the internal links (how to distribute the "blame" or the error)
• Assumption: internal units use activation functions that are differentiable and nonlinear
• Sigmoid functions are convenient
Recurrent network with hidden neuron(s): the unit-delay operator z⁻¹ implies a dynamic system
[Figure: input, hidden, and output units, with z⁻¹ unit-delay elements on the feedback connections]
Back-Propagation (cont.)
• Start with a network with random weights
• Repeat until a stopping criterion is met:
– For each example, compute the network output and, for each unit i, its error term δ_i
– Update each weight w_ij (the weight of the link going from node i to node j):
w_ij ← w_ij + Δw_ij, where Δw_ij = η δ_j o(i) and o(i) is the output of unit i
The Error Term
• δ_i = Err(i) · o′(i)
• Err(i) = f_i(e) - h_i(e), if i is an output node
• Err(i) = Σ_j δ_j w_ij, if i is an internal node
• o′(i) is the derivative (of the activation) of node i
• For the sigmoid, o′(i) = o(i) (1 - o(i))
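Putting the update rule and the error terms together, a small sketch of back-propagation for a 2-2-1 sigmoid network trained on XOR (the network shape, seed, learning rate, and epoch count are illustrative assumptions; convergence is typical but not guaranteed from every random start):

```python
import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W1, W2):
    """W1: one weight list per hidden unit (index 0 is the bias); W2: output unit."""
    h = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) for w in W1]
    o = sigmoid(W2[0] + sum(wj * hj for wj, hj in zip(W2[1:], h)))
    return h, o

def backprop_step(x, target, W1, W2, eta=0.5):
    h, o = forward(x, W1, W2)
    delta_o = (target - o) * o * (1 - o)                 # output node: Err * o'
    delta_h = [W2[k + 1] * delta_o * h[k] * (1 - h[k])   # internal nodes:
               for k in range(len(h))]                   # (sum_j delta_j w_ij) * o'
    W2[0] += eta * delta_o                               # bias acts as input +1
    for k in range(len(h)):
        W2[k + 1] += eta * delta_o * h[k]                # w_ij += eta*delta_j*o(i)
    for k, w in enumerate(W1):
        w[0] += eta * delta_h[k]
        for j in range(len(x)):
            w[j + 1] += eta * delta_h[k] * x[j]

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]
for _ in range(10000):                                   # repeat over all examples
    for x, t in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
        backprop_step(x, t, W1, W2)
```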
Derivation
• Write the error for a single training example; as before, use the sum of squared errors (it is convenient for differentiation, etc.):
E = (1/2) Σ_{i ∈ output units} (f_i(e) - h_i(e))²
• Differentiate with respect to each weight; for example, for the weight w_ij connecting node j to output i, we get
∂E/∂w_ij = -o(j) o′(i) Err(i) = -o(j) δ_i
Properties
• Converges to a minimum, but it could be a local minimum
• Could be slow to converge (note: training even a three-node net is NP-complete!)
• Must watch for over-fitting, just as in decision trees (use validation sets, etc.)
• Network structure? Often two layers suffice; start with relatively few hidden units
Properties (cont.)
• Many variations on basic back-propagation exist, e.g., using momentum:
Δw_ij(n) = η δ_j o(i) + α Δw_ij(n-1), with 0 ≤ α < 1
(Δw_ij(n) is the nth update amount; α is a constant)
• Reduce the learning rate η with time (applies to perceptrons as well)
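A sketch of the momentum update (the names and default values are illustrative): the previous update is blended in, which smooths the trajectory and can speed convergence along shallow directions of the error surface.

```python
def momentum_update(dw_prev, delta_j, o_i, eta=0.5, alpha=0.9):
    """dw(n) = eta * delta_j * o(i) + alpha * dw(n-1), with 0 <= alpha < 1."""
    return eta * delta_j * o_i + alpha * dw_prev
```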
Networks, features, and spaces
• Artificial neural networks can represent any continuous function…
• Simple algorithms for learning from data
– fuzzy boundaries
– effects of typicality
NN properties
• Can handle domains with:
– continuous and discrete attributes
– many attributes
– noisy data
• Can be slow at training but fast at evaluation time
• Human understanding of what the network does can be limited
Networks, features, and spaces (cont.)
• A way to explain how people could learn things that look like rules and symbols…
• Big question: how much of cognition can be explained by the input data?
Challenges for neural networks
• Being able to learn anything can make it harder to learn specific things
– this is the "bias-variance tradeoff"