
Stochastic Neural Networks, Optimal Perceptual Interpretation, and the Stochastic Interactive Activation Model

PDP Class, January 15, 2010

Goodness Landscape of the Cube Network

The Boltzmann Machine: The Stochastic Hopfield Network

Units have binary states {0, 1}, and updates are asynchronous. The activation function is:

$$p(a_i = 1) = \frac{1}{1 + e^{-\mathrm{net}_i / T}}$$

where $\mathrm{net}_i = \sum_j w_{ij} a_j + \mathrm{bias}_i$ and $T$ is the temperature.

Suppose (as is the case in the cube network) that, for T > 0, it is possible to get from any state to any other state. If this condition holds, we say the network is ergodic. Although the process may start in some particular state, if we wait long enough, the starting state no longer matters. From that time on, we say the network is at equilibrium, and under these circumstances the probability of finding the network in any particular state is fixed and depends only on that state's goodness.

More generally, at equilibrium we have the Probability-Goodness Equation:

$$p(S) = \frac{e^{G(S)/T}}{\sum_{S'} e^{G(S')/T}}$$

or, equivalently, for any two states $S_1$ and $S_2$:

$$\frac{p(S_1)}{p(S_2)} = e^{(G_1 - G_2)/T}$$
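As a sanity check on the Probability-Goodness Equation, here is a minimal Python sketch: it runs the asynchronous stochastic update rule on a tiny three-unit network (the weights and biases are made-up illustrative values, not the cube network) and compares the observed state frequencies at equilibrium with the probabilities the equation predicts.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Illustrative symmetric weights and biases (not the cube network).
W = np.array([[0.0,  1.0, -1.0],
              [1.0,  0.0,  1.0],
              [-1.0, 1.0,  0.0]])
bias = np.array([0.2, -0.1, 0.0])
T = 1.0

def goodness(a):
    # G(a) = sum_{i<j} w_ij a_i a_j + sum_i bias_i a_i
    return 0.5 * a @ W @ a + bias @ a

def update(a):
    # Asynchronous update: pick one unit at random and set it on
    # with probability 1 / (1 + e^{-net/T}).
    i = rng.integers(len(a))
    net = W[i] @ a + bias[i]
    a[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-net / T)) else 0.0

a = rng.integers(0, 2, size=3).astype(float)
counts = {}
for step in range(100_000):
    update(a)
    if step >= 5_000:                     # discard burn-in before sampling
        key = tuple(int(x) for x in a)
        counts[key] = counts.get(key, 0) + 1

total = sum(counts.values())
states = list(itertools.product([0, 1], repeat=3))
Z = sum(np.exp(goodness(np.array(s, float)) / T) for s in states)
for s in states:
    predicted = np.exp(goodness(np.array(s, float)) / T) / Z
    observed = counts.get(s, 0) / total
    print(s, f"predicted={predicted:.3f}  observed={observed:.3f}")
```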

Why is This True? (Intuitive Explanation)

• Consider two states, $S_1$ and $S_2$, that differ only in the activation of a single unit (on in $S_1$, off in $S_2$).

• The two states differ in goodness by an amount equal to the net input to that unit: $G_1 - G_2 = \mathrm{net}$.

• The probability of the unit being on is $\dfrac{1}{1 + e^{-\mathrm{net}/T}}$.

• The probability of the unit being off is $\dfrac{1}{1 + e^{\mathrm{net}/T}}$.

• All the other units are in the same states in both cases, so they contribute identically to both probabilities.

• So the ratio of the probabilities of the two states is $e^{\mathrm{net}/T}$,

• which is $e^{(G_1 - G_2)/T}$, just as the Probability-Goodness Equation requires.
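A quick numerical check of the key step (a sketch; the temperature and net-input values are arbitrary): the odds of the unit being on versus off equal $e^{\mathrm{net}/T}$ for every net input.

```python
import numpy as np

T = 0.5
for net in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    p_on = 1.0 / (1.0 + np.exp(-net / T))
    p_off = 1.0 - p_on                    # equals 1 / (1 + e^{net/T})
    # The odds ratio matches e^{net/T} exactly:
    print(f"net={net:+.1f}  odds={p_on / p_off:.4f}  e^(net/T)={np.exp(net / T):.4f}")
```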

Simulated Annealing

• Start with high temperature. This means it is easy to jump from state to state.

• Gradually reduce temperature.

• In the limit of infinitely slow annealing, we can guarantee that the network will end up in the best possible state (or in one of them, if two or more are tied for best).

• Thus, the best possible interpretation (or one of the best, if there is a tie) can always be found, if you are patient!
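A minimal sketch of this procedure in Python, reusing the illustrative three-unit network from above (the geometric cooling schedule and its endpoints are arbitrary choices, not prescribed by the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

W = np.array([[0.0,  1.0, -1.0],
              [1.0,  0.0,  1.0],
              [-1.0, 1.0,  0.0]])
bias = np.array([0.2, -0.1, 0.0])

def anneal(a, t_start=5.0, t_end=0.05, steps=5000):
    # Start hot (easy to jump between states), cool geometrically toward t_end.
    for step in range(steps):
        T = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.integers(len(a))
        net = W[i] @ a + bias[i]
        a[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-net / T)) else 0.0
    return a

# With slow enough cooling, runs from random starts end in a best (or tied-best) state.
print(anneal(rng.integers(0, 2, size=3).astype(float)))
```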

Exploring Probability Distributions over States

• Imagine settling to a non-zero temperature, such as T = 0.5.

• At this temperature, there’s still some probability of being in a state that is less than perfect.

• Consider an ensemble of networks

– At equilibrium (i.e. after enough cycles, possibly with annealing) the relative frequencies of being in the different states will approximate the relative probabilities given by the Probability-Goodness equation.

• You will have an opportunity to explore this situation in the homework assignment.

A Problem For the Interactive Activation Model

• Bayes Rule, Massaro’s model, and the logistic activation function all give rise to a pattern of data we will call ‘logistic additivity’.

• Data from many experiments exhibit this pattern.

• Unfortunately, the interactive activation model does not exhibit this pattern.

• Does this mean that the interactive activation model is fundamentally wrong, i.e., that processing is strictly feedforward (as Massaro believed)? Or is there some other inadequacy in the model?

Joint Effect of Context and Stimulus Information in Phoneme Identification (/l/ or /r/)

From Massaro & Cohen (1991)

Massaro’s Model

• Joint effects of context and stimulus obey the fuzzy logical model of perception:

$$p(r \mid S_{ij}) = \frac{t_i\, c_j}{t_i\, c_j + (1 - t_i)(1 - c_j)}$$

• $t_i$ is the stimulus support for /r/ given input $i$, and $c_j$ is the contextual support for /r/ given context $j$.

• Massaro sees this model as having a strictly feed-forward organization:

Evaluate stimulus ─┐
                   ├─→ Integration ─→ Decision
Evaluate context ──┘

• Massaro’s model implies ‘logistic additivity’:

$$\log\!\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \log\!\left(\frac{t_i}{1 - t_i}\right) + \log\!\left(\frac{c_j}{1 - c_j}\right)$$

[Figure: logit(p_ij) plotted against the stimulus continuum from L-like to R-like. The p_ij on this graph corresponds to the p(r|S_ij) on the preceding slide. Different lines refer to different context conditions: r means 'favors r', l means 'favors l', n means 'neutral'.]
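To see why Massaro's model implies logistic additivity, here is a small Python sketch (the particular t and c values are arbitrary): it computes $p_{ij}$ from the fuzzy logical model and confirms that, on the logit scale, changing the context shifts the whole curve by a constant.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def flmp(t, c):
    # Fuzzy logical model of perception: multiplicative integration of
    # stimulus support t and context support c, normalized against the
    # alternative interpretation.
    return (t * c) / (t * c + (1.0 - t) * (1.0 - c))

t_values = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # stimulus support for /r/
for c in [0.2, 0.5, 0.8]:                        # contextual support for /r/
    p = flmp(t_values, c)
    # logit(p) - logit(t) is the same constant, logit(c), at every stimulus level:
    print(np.round(logit(p) - logit(t_values), 8), "-> logit(c) =", round(logit(c), 8))
```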

Massaro’s argument against the IA model

• In the IA model, feature information gets used twice, once on the way up and then again on the way back down.

• Feeding the activation back in this way, he suggested, distorts the process of correctly identifying the target phoneme.

• It appears from simulations that in a sense he is right (see next slide).

• Does this mean that processing is really not interactive? If not, why not?

Ideal logistic-additive pattern (upper right) vs. mini-IA simulation results (lower right).

What was wrong with the Interactive Activation model?

• The original interactive activation model ‘tacked the variability on at the end’ but neural activity is intrinsically stochastic.

• McClelland (1991) incorporated that intrinsic variability into the computation of the net input:

$$\mathrm{net}_i = \mathrm{bias}_i + \sum_j a_j w_{ij} + \epsilon_i$$

where $\epsilon_i$ is the intrinsic variability: noise freshly sampled on every update (a sketch follows below).

• Now we choose the alternative with the highest activation after settling.

• Logistic additivity is observed.

• The result holds up in full-scale models and can be proven to hold under certain constraints on network architecture (Movellan & McClelland, 2001).
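A minimal sketch of a noisy net input (the Gaussian noise and sigma value are illustrative assumptions; McClelland (1991) is the source for the general idea, not this particular code):

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_net_input(a, W, bias, sigma=1.0):
    # Net input with intrinsic variability: bias + weighted input plus
    # Gaussian noise sampled fresh on every call (sigma is illustrative).
    return bias + W @ a + rng.normal(0.0, sigma, size=len(bias))

# The same activations yield different net inputs on successive updates:
W = np.array([[0.0, 0.5], [0.5, 0.0]])
bias = np.array([0.1, -0.1])
a = np.array([1.0, 0.0])
print(noisy_net_input(a, W, bias))
print(noisy_net_input(a, W, bias))
```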

Why logistic additivity holds in a Boltzmann machine version of the IA model

• Suppose the task is to identify the letter in position 2 in the IA model. The model is allowed to run to equilibrium… then the states of the position-2 units are sampled until a state is found in which one and only one of the letters in this position is active. The probability that this letter is $l$ is given by the Probability-Goodness Equation:

$$p(l) = \frac{e^{G(S_l)/T}}{\sum_{l'} e^{G(S_{l'})/T}}$$

where $S_l$ is the state in which letter $l$ alone is active in position 2.

• Define the goodness of $S_l$ as the sum of a bias term, a stimulus term, and a context term:

$$G(S_l) = \mathrm{bias}_l + G_S(l) + G_C(l)$$

The probability then decomposes into:

$$p(l) = \frac{e^{\mathrm{bias}_l/T}\, e^{G_S(l)/T}\, e^{G_C(l)/T}}{\sum_{l'} e^{\mathrm{bias}_{l'}/T}\, e^{G_S(l')/T}\, e^{G_C(l')/T}}$$

Why logistic additivity holds in the IA Model

• This reduces to:

$$p(l) \propto e^{\mathrm{bias}_l/T} \cdot e^{G_S(l)/T} \cdot e^{G_C(l)/T}$$

• This consists of a factor for the bias, a factor for the stimulus, and a factor for the context, so on the logit scale the stimulus and context contributions are additive (see the numeric check below).
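A small numeric check under these assumptions (the bias, stimulus, and context goodness values are arbitrary; two letter alternatives for simplicity): the logit of the choice probability splits exactly into a bias difference, a stimulus difference, and a context difference.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

T = 1.0
bias = np.array([0.1, -0.1])                       # illustrative per-letter biases
for g_stim in (np.array([0.5, -0.5]), np.array([1.0, -1.0])):
    for g_ctx in (np.array([0.3, -0.3]), np.array([-0.2, 0.2])):
        G = bias + g_stim + g_ctx                  # goodness decomposes into 3 parts
        p = np.exp(G / T) / np.exp(G / T).sum()    # Probability-Goodness Equation
        lhs = logit(p[0])                          # logit of choosing letter 0
        rhs = ((bias[0] - bias[1]) + (g_stim[0] - g_stim[1])
               + (g_ctx[0] - g_ctx[1])) / T
        print(f"logit={lhs:.6f}  bias+stim+ctx={rhs:.6f}")
```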

Conditions on Logistic Additivity In Stochastic Interactive Models (Movellan & McClelland, 2001)

• Logistic additivity holds in a stochastic, bi-directionally connected neural network when two sources of input do not interact with each other except via the set of units that are the basis for specifying the response.

• This applies to the two sources of input to the identification of the letter in any position in the interactive activation model.

• Simulations suggest that the exact details of the activation function and source of variability are unimportant.

• Would the effects of two context letters on a third letter exhibit logistic additivity?

Effects of two different letters on a third letter can violate logistic additivity

Final D favors E when the first letter is R, but favors U when the first letter is M.

Consider the case in which the external input supports E and U equally in the middle letter position. Then: when the first letter is R, p(E|D) ≈ .6 and p(E|N) ≈ .4; when the first letter is M, p(E|D) ≈ .4 and p(E|N) ≈ .6.

Logistic additivity as a tool for analyzing network architectures

• Sternberg used additivity in reaction times to analyze cognitive architectures under the discrete stage model.

• We can use logistic additivity in response probability to analyze the structure of network architectures within an interactive activation framework.

– If two factors affect structurally separable pathways influencing the same response representations, they should have logistic-additive effects.

– If two factors have effects that are not logistically additive, this indicates that the processing pathways intersect someplace other than at the response representation.

Gilchrist, A. L. (1977). Perceived lightness depends on perceived spatial arrangement. Science, 195, 185-187.

• The experiment shows that subjects assign a surface a 'color' based on which other surfaces they see it as coplanar with. Thus perceived color depends on perceived depth, violating modularity.