Stochastic Optimization and Simulated Annealing Psychology 85-419/719 January 25, 2001.
Stochastic Optimization and Simulated Annealing
Psychology 85-419/719, January 25, 2001
In Previous Lecture...
• Discussed constraint satisfaction networks, having:
  – Units, weights, and a “goodness” function
• Updating states involves computing input from other units
  – Guaranteed to locally increase goodness
  – Not guaranteed to globally increase goodness
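The update rule recapped above can be sketched in a few lines. This is a minimal illustration, not the lecture's code; the tiny three-unit weight matrix is an arbitrary choice for demonstration.

```python
# A minimal sketch of one greedy constraint-satisfaction update; the tiny
# symmetric weight matrix here is illustrative, not from the lecture.
W = [[ 0.0, 1.0, -1.0],
     [ 1.0, 0.0,  1.0],
     [-1.0, 1.0,  0.0]]   # symmetric weights, zero self-connections

def goodness(a):
    """G = sum over pairs of w_ij * a_i * a_j (each pair counted once)."""
    return 0.5 * sum(W[i][j] * a[i] * a[j]
                     for i in range(len(a)) for j in range(len(a)))

def greedy_update(a, i):
    """Set unit i to the state favored by its net input from the others."""
    net = sum(W[i][j] * a[j] for j in range(len(a)))
    a[i] = 1 if net > 0 else 0
    return a

a = [1, 0, 1]
before = goodness(a)
a = greedy_update(a, 0)     # net input to unit 0 is -1, so it turns off
after = goodness(a)
assert after >= before      # locally guaranteed; globally it is not
```

Each such update can only raise (or keep) goodness, which is exactly why the network can get stuck: no single-unit flip escapes a local optimum.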
The General Problem: Local Optima
[Figure: goodness as a function of activation state, with a local optimum and the true (global) optimum marked]
How To Solve the Problem of Local Optima?

• Exhaustive search?
  – Nah. Takes too long. n units have 2 to the nth power possible states (if binary)
• Random re-starts?
  – Seems wasteful.
• How about something that generally goes in the right direction, with some randomness?
Sometimes It Isn’t Best To Always Go Straight Towards The Goal
• Rubik’s Cube: Undo some moves in order to make progress
• Baseball: sacrifice fly
• Navigation: move away from goal, to get around obstacles
Randomness Can Help Us Escape Bad Solutions
[Figure: goodness as a function of activation state; random jumps can knock the network out of a local optimum]
So, How Random Do We Want to Be?
• We can take a cue from physical systems
• In metallurgy, metals can reach a very strong (stable) state by:
  – Melting it; scrambles molecular structure
  – Gradually cooling it
  – Resulting molecular structure very stable
• New terminology: reduce energy (which is kind of like the negative of goodness)
Simulated Annealing
p[a_i = 1] = 1 / (1 + e^(-net_i / T))

Odds that a unit is on is a function of:

• The input to the unit, net_i
• The temperature, T
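The stochastic update probability can be sketched directly; the particular net and T values below are illustrative.

```python
import math

# Sketch of the update probability on the slide:
# p[a_i = 1] = 1 / (1 + e^(-net_i / T))
def p_on(net, T):
    return 1.0 / (1.0 + math.exp(-net / T))

# Zero input: a fair coin flip, regardless of temperature.
print(p_on(0.0, 1.0))             # 0.5
# Strong positive input at moderate temperature: almost surely on.
print(round(p_on(5.0, 1.0), 3))   # 0.993
```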
Picking it Apart...
• As net increases, the probability that the output is 1 increases
  – e is raised to the negative of net/T; so as net gets big, e^(-net/T) goes to zero, and the probability goes to 1/1 = 1.

p[a_i = 1] = 1 / (1 + e^(-net_i / T))
The Temperature Term
• When T is big, the exponent for e goes to zero.
• e (or anything) to the zero power is 1.
• So, the odds the output is 1 go to 1/(1+1) = 0.5.

p[a_i = 1] = 1 / (1 + e^(-net_i / T))
The Temperature Term (2)
p[a_i = 1] = 1 / (1 + e^(-net_i / T))

• When T gets small, the exponent gets big.
• The effect of net becomes amplified.
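Both temperature limits can be checked numerically; the fixed net input and the three temperatures below are illustrative choices.

```python
import math

# Sketch: how temperature reshapes p[a_i = 1] = 1/(1 + e^(-net/T))
# for a fixed net input (values here are illustrative).
def p_on(net, T):
    return 1.0 / (1.0 + math.exp(-net / T))

net = 2.0
for T in (100.0, 1.0, 0.01):
    print(T, round(p_on(net, T), 3))
# High T: near 0.5 (coin flip); low T: near 1.0 (net's effect amplified)
```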
Different Temperatures...
[Figure: probability the output is 1 (from 0 to 1, through .5) as a function of net input, plotted at high, medium, and low temperatures]

p[a_i = 1] = 1 / (1 + e^(-net_i / T))
Ok, So At What Rate Do We Reduce Temperature?
In general, we must decrease it very slowly to guarantee convergence to the global optimum:

T(t) = c / log(1 + t)

[Figure: T(t) plotted over t = 0 to 100]

In practice, we can get away with a more aggressive annealing schedule.
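The gap between the theoretical schedule and an aggressive one can be sketched as follows; the constant c, the starting temperature T0, and the decay rate r are illustrative, and the geometric schedule stands in for "more aggressive" schedules generally.

```python
import math

# Sketch: the theoretical logarithmic schedule, T(t) = c / log(1 + t),
# versus a common aggressive geometric schedule, T(t) = T0 * r^t.
# (c, T0, and r are illustrative choices.)
c, T0, r = 10.0, 10.0, 0.9

def log_schedule(t):
    return c / math.log(1 + t)

def geometric_schedule(t):
    return T0 * r ** t

for t in (1, 10, 100):
    print(t, round(log_schedule(t), 2), round(geometric_schedule(t), 4))
# The geometric schedule cools far faster: by t = 100 it is near zero,
# while the logarithmic schedule is still above 2.
```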
Putting it Together...
• We can represent facts, etc. as units
• Knowledge about these facts encoded as weights
• Network processing fills in gaps, makes inferences, forms interpretations
• Stable attractors form; the weights and input sculpt these attractors.
• Stability (and goodness) is enhanced by randomness in the updating process.
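The pieces above combine into a single settling loop: stochastic updates governed by the logistic rule, with temperature annealed downward. This is a minimal sketch on an arbitrary three-unit network; the weights, starting temperature, and cooling rate are all illustrative.

```python
import math, random

# A minimal sketch of stochastic settling with annealing, assuming a tiny
# symmetric network (weights and schedule here are illustrative).
random.seed(0)

W = [[ 0.0, 2.0, -2.0],
     [ 2.0, 0.0,  2.0],
     [-2.0, 2.0,  0.0]]

def goodness(a):
    return 0.5 * sum(W[i][j] * a[i] * a[j]
                     for i in range(len(a)) for j in range(len(a)))

a = [random.randint(0, 1) for _ in range(3)]
T = 10.0
while T > 0.05:
    i = random.randrange(3)                 # pick a unit at random
    net = sum(W[i][j] * a[j] for j in range(3))
    p = 1.0 / (1.0 + math.exp(-net / T))    # stochastic update rule
    a[i] = 1 if random.random() < p else 0
    T *= 0.95                               # anneal: gradually cool

print(a, goodness(a))
```

Early on, high T makes updates nearly random, letting the state jump between basins; as T falls, updates become nearly deterministic and the network settles into a (hopefully global) optimum.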
Stable Attractors Can Be Thought Of As Memories
• How many stable patterns can be remembered by a network with N units?
• There are 2 to the N possible patterns…
• … but only about 0.15*N will be stable
• To remember 100 things, we need 100/0.15 ≈ 667 units!
• (then again, the brain has about 10 to the 12th power neurons…)
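The capacity arithmetic above is easy to check; the 0.15 factor is the rule of thumb from the slide, and `units_needed` is a hypothetical helper name.

```python
import math

# Sketch of the capacity arithmetic: a network of N units stores only
# about 0.15 * N stable patterns (rule of thumb from the slide).
def units_needed(n_patterns, capacity=0.15):
    return math.ceil(n_patterns / capacity)

print(units_needed(100))   # about 667 units to hold 100 memories
print(2 ** 100)            # yet 100 units could take 2^100 distinct states
```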
Human Performance, When Damaged (some examples)
• Category coordinate errors
  – Naming a CAT as a DOG
• Superordinate errors
  – Naming a CAT as an ANIMAL
• Visual errors (deep dyslexics)
  – Naming SYMPATHY as SYMPHONY
  – or, naming SYMPATHY as ORCHESTRA
The Attractors We’ve Talked About Can Be Useful In Understanding This
[Figure: attractor basins for inputs CAT and COT settling to output “CAT”: Normal Performance (left) vs. A Visual Error (right)]

(see Plaut, Hinton, & Shallice)
Properties of Human Memory
• Details tend to go first, more general things next. Not all-or-nothing forgetting.
• Things tend to be forgotten, based on– Salience– Recency– Complexity– Age of acquisition?
Do These Networks Have These Properties?
• Sort of.
• Graceful degradation. Features vanish as a function of strength of input to them.
• Complexity: more complex / arbitrary patterns can be more difficult to retain
• Salience, recency, age of acquisition?
  – Depends on the learning rule. Stay tuned.
Next Time: Psychological Implications:
The IAC Model of Word Perception
• Optional reading: McClelland and Rumelhart ‘81 (handout)
• Rest of this class: Lab session. Help installing software, help with homework.