Part VII 1
CSE 5526: Introduction to Neural Networks
Hopfield Network for Associative Memory
The basic task
• Store a set of fundamental memories $\{\boldsymbol{\xi}^1, \boldsymbol{\xi}^2, \ldots, \boldsymbol{\xi}^M\}$ so that, when presented with a new pattern $\mathbf{x}$, the system outputs the stored memory most similar to $\mathbf{x}$
• Such a system is called a content-addressable memory
Architecture
• The Hopfield net consists of N McCulloch-Pitts neurons, recurrently connected among themselves
[Figure: fully recurrent architecture with neuron states $x_1, x_2, \ldots, x_N$ and stored pattern components $\xi_1, \xi_2, \ldots, \xi_N$]
Definition
• Each neuron is defined as
$$x_j = \varphi(v_j)$$
where
$$v_j = \sum_{i=1}^{N} w_{ji} x_i + b_j$$
and
$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
• Without loss of generality, let $b_j = 0$
Storage phase
• To store fundamental memories, the Hopfield model uses the outer-product rule, a form of Hebbian learning:
$$w_{ji} = \frac{1}{N} \sum_{\mu=1}^{M} \xi_{\mu,j}\, \xi_{\mu,i}$$
• Hence $w_{ji} = w_{ij}$, i.e., $\mathbf{W} = \mathbf{W}^T$, so the weight matrix is symmetric
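As a sketch of the outer-product rule (the pattern values and the size $N$ below are illustrative assumptions, not from the slides):

```python
# Storage phase via the outer-product (Hebbian) rule:
# w_ji = (1/N) * sum over mu of xi_mu_j * xi_mu_i
N = 4
memories = [[1, -1, 1, -1],
            [1, 1, -1, -1]]          # M = 2 bipolar fundamental memories

W = [[sum(xi[j] * xi[i] for xi in memories) / N for i in range(N)]
     for j in range(N)]

print(W[0])   # -> [0.5, 0.0, 0.0, -0.5]
```

Note that the matrix is symmetric by construction, and each diagonal entry equals $M/N$ (here $0.5$), a fact used later in the energy-function derivation.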
Retrieval phase
• The Hopfield model sets the initial state of the net to the input pattern. It adopts asynchronous (serial) updating, in which one neuron is updated at a time, until the network reaches a fixed point
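A minimal sketch of this retrieval loop (the stored pattern, probe, and sweep limit are illustrative assumptions):

```python
# Asynchronous (serial) retrieval: start from the input pattern and
# update one neuron at a time until a fixed point is reached.
def sign(v):
    return 1 if v >= 0 else -1

def recall(W, x, max_sweeps=10):
    x = list(x)
    N = len(x)
    for _ in range(max_sweeps):
        changed = False
        for j in range(N):                    # one neuron at a time
            new = sign(sum(W[j][i] * x[i] for i in range(N)))
            if new != x[j]:
                x[j], changed = new, True
        if not changed:                       # fixed point
            break
    return x

# Store a single memory xi and probe with a one-bit-corrupted copy
xi = [1, -1, 1, -1]
N = len(xi)
W = [[xi[j] * xi[i] / N for i in range(N)] for j in range(N)]
print(recall(W, [1, 1, 1, -1]))               # -> [1, -1, 1, -1]
```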
Hamming distance
• The Hamming distance between two binary/bipolar patterns is the number of differing bits
• Example (see blackboard)
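A small sketch of the definition (the two patterns are illustrative, since the blackboard example is not reproduced here):

```python
# Hamming distance between two equal-length bipolar patterns:
# the number of positions where they differ.
def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

print(hamming([1, -1, 1, 1], [1, 1, 1, -1]))  # -> 2
```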
One memory case
• Let the input $\mathbf{x}$ be the same as the single stored memory $\boldsymbol{\xi}$:
$$x_j = \varphi\Big(\sum_i w_{ji} x_i\Big) = \varphi\Big(\frac{1}{N}\sum_i \xi_j \xi_i \xi_i\Big) = \varphi(\xi_j) = \xi_j \qquad (\text{since } \xi_i^2 = 1)$$
• Therefore the memory is stable
One memory case (cont.)
• In fact, for any input pattern $\mathbf{x}$ whose Hamming distance from $\boldsymbol{\xi}$ is less than $N/2$, the net converges to $\boldsymbol{\xi}$; otherwise it converges to $-\boldsymbol{\xi}$
• Think about it
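This claim can be checked numerically for a single stored memory; all concrete values below are illustrative assumptions:

```python
# Probes closer than N/2 to xi converge to xi; probes farther than
# N/2 converge to -xi.
def sign(v):
    return 1 if v >= 0 else -1

def recall(W, x, max_sweeps=20):
    x = list(x)
    N = len(x)
    for _ in range(max_sweeps):
        changed = False
        for j in range(N):
            new = sign(sum(W[j][i] * x[i] for i in range(N)))
            if new != x[j]:
                x[j], changed = new, True
        if not changed:
            break
    return x

xi = [1, 1, 1, -1, -1, -1]          # N = 6
N = len(xi)
W = [[xi[j] * xi[i] / N for i in range(N)] for j in range(N)]

near = [-1, 1, 1, -1, -1, -1]       # Hamming distance 1 < N/2
far  = [-1, -1, -1, 1, 1, -1]       # Hamming distance 5 > N/2
print(recall(W, near) == xi)                    # -> True
print(recall(W, far) == [-v for v in xi])       # -> True
```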
Multiple memories
• The stability (alignment) condition for any memory $\boldsymbol{\xi}^\vartheta$ is
$$\varphi(v_j) = \xi_{\vartheta,j}$$
where
$$v_j = \sum_i w_{ji}\,\xi_{\vartheta,i} = \frac{1}{N}\sum_\mu \sum_i \xi_{\mu,j}\,\xi_{\mu,i}\,\xi_{\vartheta,i} = \xi_{\vartheta,j} + \frac{1}{N}\sum_{\mu\ne\vartheta}\sum_i \xi_{\mu,j}\,\xi_{\mu,i}\,\xi_{\vartheta,i}$$
• The second (double-sum) term is the crosstalk with the other stored memories
Multiple memories (cont.)
• If |crosstalk| < N for every bit, the memory is stable. In general, the fewer the stored memories, the more likely they are all stable
• Example 2 in book (see blackboard)
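The crosstalk term can be computed directly; the two correlated patterns below are an illustrative choice (this is not the book's Example 2):

```python
# Compute the crosstalk term for each memory and bit, and check the
# sufficient stability condition |crosstalk| < N.
N = 4
xis = [[1, 1, 1, -1],
       [1, 1, 1,  1]]   # inner product 2, so the crosstalk is nonzero

def crosstalk(nu, j):
    # sum over mu != nu and all i of xi_mu_j * xi_mu_i * xi_nu_i
    return sum(xis[mu][j] * xis[mu][i] * xis[nu][i]
               for mu in range(len(xis)) if mu != nu
               for i in range(N))

print([crosstalk(0, j) for j in range(N)])        # -> [2, 2, 2, 2]
print(all(abs(crosstalk(nu, j)) < N
          for nu in range(2) for j in range(N)))  # -> True: both stable
```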
Example (cont.)
Memory capacity
• Define
$$C_j^\vartheta = -\xi_{\vartheta,j} \sum_{\mu\ne\vartheta}\sum_i \xi_{\mu,j}\,\xi_{\mu,i}\,\xi_{\vartheta,i}$$
Then:
$C_j^\vartheta < 0$ ⇒ stable
$0 \le C_j^\vartheta < N$ ⇒ stable
$C_j^\vartheta > N$ ⇒ unstable
• What if $C_j^\vartheta = N$?
Memory capacity (cont.)
• Consider random memories where each element takes +1 or −1 with equal probability. We measure
$$P_{\text{error}} = \text{Prob}(C_j^\vartheta > N)$$
• To compute the capacity $M_{\max}$, we must decide on an error criterion
• For random patterns, $C_j^\vartheta$ is proportional to a sum of $N(M-1)$ random numbers of +1 or −1. For large $NM$, it can be approximated by a Gaussian distribution (central limit theorem) with zero mean and variance $\sigma^2 = NM$
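Under this Gaussian approximation, the tail probability has a closed form in terms of the complementary error function; the sketch below evaluates it (the loading value $0.185$ anticipates the capacity table on a later slide):

```python
import math

def p_error(alpha):
    # Prob(C > N) for zero-mean Gaussian C with variance sigma^2 = N*M;
    # it depends only on the loading alpha = M/N:
    # p = 0.5 * erfc(sqrt(N/(2M))) = 0.5 * erfc(sqrt(1/(2*alpha)))
    return 0.5 * math.erfc(math.sqrt(1.0 / (2.0 * alpha)))

print(round(p_error(0.185), 3))   # -> 0.01
```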
Memory capacity (cont.)
• So
$$P_{\text{error}} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_N^\infty \exp\left(-\frac{x^2}{2\sigma^2}\right) dx$$
$$= \frac{1}{2} - \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^N \exp\left(-\frac{x^2}{2\sigma^2}\right) dx$$
$$= \frac{1}{2}\left[1 - \frac{2}{\sqrt{\pi}} \int_0^{\sqrt{N/(2M)}} \exp(-u^2)\, du\right] \qquad (x = \sqrt{2}\,\sigma u)$$
where the bracketed integral is the error function, so
$$P_{\text{error}} = \frac{1}{2}\left[1 - \text{erf}\!\left(\sqrt{\frac{N}{2M}}\right)\right]$$
Memory capacity (cont.)
• Error probability plot
[Figure: Gaussian distribution $P(C_j^\vartheta)$; $P_{\text{error}}$ is the tail area beyond $N$]
Memory capacity (cont.)
P_error    M_max/N
0.001      0.105
0.0036     0.138
0.01       0.185
0.05       0.37
0.1        0.61

• So $P_{\text{error}} < 0.01 \Rightarrow M_{\max} = 0.185N$, an upper bound
Memory capacity (cont.)
• What if 1% of the bits flip? The initial errors can trigger further errors: an avalanche effect
• The real upper bound is 0.138N, which prevents the avalanche effect
Memory capacity (cont.)
• The above analysis is for one bit. If we want perfect retrieval of $\boldsymbol{\xi}^\vartheta$ with probability 0.99:
$$(1 - P_{\text{error}})^N > 0.99$$
• Approximately, $P_{\text{error}} < 0.01/N$
• For this case
$$M_{\max} = \frac{N}{2 \ln N}$$
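A quick numeric sanity check of the approximation (the value of $N$ is an illustrative choice):

```python
# With P_error = 0.01/N, the probability of retrieving all N bits
# correctly, (1 - P_error)^N, indeed exceeds 0.99.
N = 1000
p = 0.01 / N
print((1 - p) ** N > 0.99)   # -> True
```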
Memory capacity (cont.)
• But real patterns are not random (though they could be encoded to look random), and the capacity is worse for correlated patterns
• At the favorable extreme, if the memories are mutually orthogonal,
$$\sum_i \xi_{\mu,i}\,\xi_{\vartheta,i} = 0 \quad \text{for } \vartheta \ne \mu$$
then $C_j^\vartheta = 0$ and $M_{\max} = N$
• $N$ is the maximum number of mutually orthogonal patterns
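This can be verified directly; the 4×4 Hadamard-style patterns below are an illustrative choice:

```python
# With N mutually orthogonal bipolar patterns, every crosstalk term
# vanishes, so all N stored memories are exactly stable.
N = 4
xis = [[1,  1,  1,  1],
       [1, -1,  1, -1],
       [1,  1, -1, -1],
       [1, -1, -1,  1]]   # pairwise orthogonal, M = N = 4

# pairwise inner products are zero
assert all(sum(a[i] * b[i] for i in range(N)) == 0
           for a in xis for b in xis if a is not b)

def crosstalk(nu, j):
    return sum(xis[mu][j] * xis[mu][i] * xis[nu][i]
               for mu in range(N) if mu != nu for i in range(N))

print(all(crosstalk(nu, j) == 0
          for nu in range(N) for j in range(N)))   # -> True
```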
Memory capacity (cont.)
• But in reality a useful system stores a little fewer than $N$: with $M = N$ orthogonal memories the weight matrix becomes the identity, so every state is stable and the memory is useless because it never evolves
Coding illustration
Energy function (Lyapunov function)
• The existence of an energy (Lyapunov) function for a dynamical system ensures its stability
• The energy function for the Hopfield net is
$$E(\mathbf{x}) = -\frac{1}{2}\sum_j \sum_i w_{ji}\, x_j x_i$$
• Theorem: Given symmetric weights, $w_{ji} = w_{ij}$, the energy function does not increase as the Hopfield net evolves
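The theorem can be observed numerically by tracking $E(\mathbf{x})$ during asynchronous updates; the stored pattern and probe below are illustrative assumptions:

```python
# Track E(x) = -(1/2) * sum_ji w_ji x_j x_i during one asynchronous
# sweep and check that it never increases.
def sign(v):
    return 1 if v >= 0 else -1

def energy(W, x):
    N = len(x)
    return -0.5 * sum(W[j][i] * x[j] * x[i]
                      for j in range(N) for i in range(N))

xi = [1, -1, 1, -1, 1, -1]
N = len(xi)
W = [[xi[j] * xi[i] / N for i in range(N)] for j in range(N)]

x = [-1, -1, 1, -1, 1, -1]           # corrupted probe (one bit flipped)
energies = [energy(W, x)]
for j in range(N):                   # one asynchronous sweep
    x[j] = sign(sum(W[j][i] * x[i] for i in range(N)))
    energies.append(energy(W, x))

print(all(b <= a for a, b in zip(energies, energies[1:])))  # -> True
```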
Energy function (cont.)
• Let $x_j'$ be the new value of $x_j$ after an update:
$$x_j' = \varphi\Big(\sum_i w_{ji} x_i\Big)$$
• If $x_j' = x_j$, $E$ remains the same
Energy function (cont.)
• In the other case, $x_j' = -x_j$. Only terms containing the updated neuron change, so
$$E(x_j') - E(x_j) = -x_j'\sum_{i\ne j} w_{ji} x_i + x_j \sum_{i\ne j} w_{ji} x_i$$
(the factor $\tfrac{1}{2}$ drops out because, by symmetry $w_{ji} = w_{ij}$, the updated neuron appears in both the row and the column of the double sum)
$$= 2x_j \sum_{i\ne j} w_{ji} x_i \qquad (\text{since } x_j' = -x_j)$$
$$= 2x_j \sum_i w_{ji} x_i - 2w_{jj} \qquad (\text{since } x_j^2 = 1)$$
$$< 0$$
since $x_j$ and $\sum_i w_{ji} x_i$ have different signs (that is why $x_j$ flipped) and $w_{jj} = M/N > 0$
Energy function (cont.)
• Thus, $E(\mathbf{x})$ decreases every time $x_j$ flips. Since $E$ is bounded below, the Hopfield net is always stable
• Remarks: useful concepts from dynamical systems include attractors, basins of attraction, and the energy (Lyapunov) surface or landscape
[Figures: energy contour map, 2-D energy landscape, and memory recall illustration]
Remarks (cont.)
• Bipolar neurons can be extended to continuous-valued neurons by using the hyperbolic tangent activation function, and the discrete-time update can be extended to continuous-time dynamics (well suited to analog VLSI implementation)
• The concept of energy minimization has been applied to optimization problems (neural optimization)
Spurious states
• Not all local minima (stable states) correspond to fundamental memories. Typically $-\boldsymbol{\xi}^\mu$, linear combinations of an odd number of memories, and other uncorrelated patterns can also be attractors
• Such attractors are called spurious states
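A sketch of one such spurious state: the elementwise majority of an odd number of memories can itself be a fixed point. The three Hadamard-row patterns below are an illustrative choice:

```python
# The mixture (elementwise majority) of three stored memories is a
# stable state, yet it equals none of the memories or their negations.
def sign(v):
    return 1 if v >= 0 else -1

xis = [[1, 1, 1, 1, -1, -1, -1, -1],
       [1, 1, -1, -1, 1, 1, -1, -1],
       [1, -1, 1, -1, 1, -1, 1, -1]]
N = 8
W = [[sum(xi[j] * xi[i] for xi in xis) / N for i in range(N)]
     for j in range(N)]

mix = [sign(a + b + c) for a, b, c in zip(*xis)]   # majority of the three
update = [sign(sum(W[j][i] * mix[i] for i in range(N))) for j in range(N)]

print(update == mix)                               # -> True: mix is stable
print(any(mix == xi or mix == [-v for v in xi]
          for xi in xis))                          # -> False: spurious
```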
Spurious states (cont.)
• Spurious states tend to have smaller basins of attraction and occur higher on the energy surface
[Figure: energy landscape with local minima corresponding to spurious states]
Kinds of associative memory
• Autoassociative (e.g., the Hopfield net)
• Heteroassociative: store pairs $\langle \mathbf{x}^\mu, \mathbf{y}^\mu \rangle$ explicitly
[Figure: heteroassociative mapping from $\mathbf{x}$ to $\mathbf{y}$; examples include matrix memory and holographic memory]