Mathematical Methods in the Applied Sciences, Vol. 19, 651-677 (1996) MOS subject classification: 92 B 20, 35 K 57; 65 M 99

Approximation of Neural Network Dynamics by Reaction-Diffusion Equations

Roderick Edwards

Department of Mathematics and Statistics, University of Victoria, P.O. Box 3045, Victoria, B.C., Canada V8W 3P4

Communicated by H. Neunzert

The equations of the Hopfield network, without the constraint of symmetry, can have complex behaviours. Cottet borrowed techniques from particle methods to show that a class of such networks with symmetric, translation-invariant connection matrices may be approximated by a reaction-diffusion equation. This idea is extended to a wider class of network connections yielding a slightly more complex reaction-diffusion equation. It is also shown that the approximation holds rigorously only in certain spatial regions (even for Cottet’s special case) but the small regions where it fails, namely within transition layers between regions of high and low activity, are not likely to be critical.

1. Introduction

A good deal of attention has been paid in the last few years to the type of neural network model often referred to as the ‘Hopfield network’ since Hopfield’s important contribution to the study of such systems [27,28]. The equations describing the dynamics of these networks are of the form

du_i/dt = −α u_i + Σ_j T_ij g(λ u_j),   (1.1)

where i and j are indices over all neurons in the network, u_i represents the membrane potential of the ith neuron, α > 0 is a 'leakage' rate (or resistance parameter in Hopfield's formulation), T_ij is the 'synaptic efficacy' modulating the effect of neuron j on neuron i, and g is a sigmoid response function with 'gain' λ > 0, describing how a neuron's firing rate depends on its membrane potential. These equations may also have external input terms, c_i, and threshold terms, θ_i, describing signals to each neuron arriving from outside the net and firing thresholds other than zero. For example,

du_i/dt = −α u_i + Σ_j T_ij g(λ(u_j − θ_j)) + c_i.   (1.2)

As a model for biological neural networks this is clearly a great simplification but it nevertheless extracts some features of their design. Artificial neural networks using

CCC 0170-4214/96/080651-27 © 1996 by B. G. Teubner Stuttgart-John Wiley & Sons Ltd.

Received 8 March 1995


these dynamics have proven useful in some applications. There are, of course, many models of neural nets (see, for example, the survey papers [24, 31, 32, 44]). The dynamics of many of the models (especially those closer to biology) have the form of equations (1.2).
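Equation (1.2) is straightforward to integrate numerically. The following sketch (our own construction; the random, non-symmetric connection matrix, gain and rates are illustrative values, not taken from the paper) simulates a small network of this form with forward Euler:

```python
import numpy as np

# Forward-Euler integration of the Hopfield dynamics (1.2):
#   du_i/dt = -alpha*u_i + sum_j T_ij * g(lambda*(u_j - theta_j)) + c_i
# All parameter values and the random, non-symmetric T are illustrative.
rng = np.random.default_rng(0)
n = 10                         # number of neurons
alpha, lam = 1.0, 2.0          # leakage rate alpha and gain lambda
T = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))  # non-symmetric connections
theta = np.zeros(n)            # firing thresholds
c = np.zeros(n)                # external inputs
g = np.tanh                    # bounded, increasing sigmoid

u = rng.uniform(-0.1, 0.1, n)  # initial membrane potentials
dt = 0.01
for _ in range(5000):          # integrate to t = 50
    u = u + dt * (-alpha * u + T @ g(lam * (u - theta)) + c)

print(np.max(np.abs(u)))       # solutions remain bounded
```

Because g is bounded, |u_i| is eventually trapped below (Σ_j |T_ij| + |c_i|)/α, which is what the final print illustrates.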

The dynamical behaviour of these equations is in general complex and difficult to describe. Much of the literature has concentrated on the special case of a symmetric connection matrix, T (i.e. T_ij = T_ji). This is mathematically convenient as there exists an energy functional in this case which ensures convergent behaviour of the net [28]: all initial conditions approach a fixed point. Convergence is useful in the application of these neural nets to content-addressable memory or the retrieval of patterns ('memories') from distorted or similar versions of them; it is possible to construct the transition matrix so that selected patterns become fixed points of the dynamics.

However, there is no biological reality to the symmetry of synaptic connections and there are good reasons why neural nets (perhaps even artificial ones) should not simply converge to fixed points (see e.g. the paper of Skarda and Freeman and the discussion following [41]; also [35]). Biological neural nets, whether we refer to the entire brain of an organism or a subsystem of a brain, do not simply converge to fixed levels of neural activity: they respond to external input, they apparently behave chaotically (see e.g. [21, 29, 41]) and they learn without supervision. Hence, it seems likely that something can be learned by studying more general neural network models and in particular non-symmetric ones.

When the Hopfield network equations are allowed to have a non-symmetric transition matrix, the range of possible dynamics is much greater. It appears, on the basis of numerical studies [37, 38] and analytic studies (e.g. [42, 43]), that their behaviour can be chaotic. We would like to explore the types of dynamics possible and to develop analytic techniques to help do this. This paper explores one such technique, namely, approximation of the Hopfield network equations by a single partial differential equation. This blurring of the distinction between individual neurons is in the same spirit as the laws of thermodynamics and statistical mechanics, where important properties do not depend on keeping track of the motions of individual particles.

Neurobiologists often take measurements of neural activity by inserting electrodes which record activity of a pool of neurons in a general area. Although neurons are discrete entities, it appears that at least some (and perhaps all, according to Skarda and Freeman) of the interesting behaviour occurring in biological brains occurs at this level of averaged activity over an area (see e.g. [41, pp. 163, 190; 23, pp. 7-10]). This suggests that continuous space models might be capable of describing these behaviours.

Furthermore, it has been pointed out (e.g. by Cottet [13]) that it is appropriate to study the limiting system as the number of neurons in a network goes to infinity if one is interested in studying the behaviour of very large neural networks, particularly since the increase in size may cause changes in the type of dynamics and asymptotic behaviour.

The possibility of using a continuous space model with the same form as discrete space 'Hopfield' models has been explored in the literature (e.g. [3, 4, 11, 17-20, 25, 40]). Neural activity is essentially modelled by a 'neural field' rather than a neural


network. For example,

∂u/∂t (x, t) = −α u(x, t) + ∫ T(x, y) g(λ u(y, t)) dy.

Amari has given a detailed analysis of the relationship between discrete space and continuum (integro-differential equation) models in the case of random connectivities.

It has been observed that these integro-differential equation models, under some circumstances, are formally equivalent to certain reaction-diffusion equations (or pairs of them) [25, pp. 105-108; 34, pp. 481-489]. There has also been analysis of reaction-diffusion equation models of neural activity independent of the integro-differential equation models [22, 33]. However, a rigorous analysis of the approximation of the Hopfield network dynamics by reaction-diffusion equations has been lacking.

Cottet has suggested how this may be done [12]. He has shown at least formally that, in a restricted case of symmetric, translation-invariant connection functions T (i.e. T(x, y) = T(y, x) and T(x, y) = T(x + z, y + z)), the integro-differential equation may be replaced by a certain quasi-linear partial differential equation (PDE) of reaction-diffusion type. In fact, he claims (using techniques from particle methods in fluid dynamics, etc.) that solutions to Hopfield equations are approximated arbitrarily closely by the solution to a PDE in a fixed time interval. Thus, he has attempted to give a rigorous analysis of the relationship between his continuous space and discrete space models, rather than simply presenting a new continuous space model. Although it appears that the convergence of the approximation cannot be rigorously carried through in general, it holds in some spatial regions (and a weaker form of convergence may still hold everywhere).

Cottet was particularly interested in applications of these equations to image processing problems (see also [13]) and for these problems, his restrictions on the connection function T are appropriate. However, it is possible to apply the same methods to more general T and obtain PDEs whose solutions approximate those of the corresponding Hopfield nets over fixed time intervals.

In particular, it is not necessary to insist that T be symmetric or translation- invariant. There are still, of course, restrictions on the form of T (smoothness, moment conditions, concentration around the diagonal), but there is a class of such functions (and corresponding matrices) for which the Hopfield nets are approximated by the behaviour of PDEs. The particular PDEs which are obtained are of a type for which some theory exists so we have an idea as to how they behave. In any case, this opens up the possibility of bringing the analytic theory of PDEs to bear on the behaviour of some neural network models.

It is also of interest to observe for what types of connection function (or matrix) T the PDE approximation theorems break down. PDEs may not be capable of approximating all network dynamics, particularly complex, chaotic dynamics. The results of this paper, for example, lead us to look at connection matrices T with wildly varying entries (in other words, interspersed inhibitory and excitatory connections) to produce the kind of complex behaviour not possible for the PDEs.

Our main object, however, is to prove a theorem establishing the reaction-diffusion approximation of Hopfield net equations with a more general form of the function (or



matrix) T than covered by Cottet's result. A discussion of the implications and limitations follows.

2. The conventional Hopfield network equations

The Hopfield network equations with external inputs and thresholds may be written as in (1.2) or, if instead we let ui represent the amount by which the membrane potential of neuron i exceeds its threshold, we may write them alternately as

du_i/dt = Σ_j T_ij g(λ u_j) − α u_i + c_i − α θ_i.   (2.1)

We require α > 0, λ > 0 and g sigmoidal in shape, increasing on ℝ and bounded. Typically, we use g: ℝ → (0, 1) or g: ℝ → (−1, 1) (the effect of this choice is discussed below). To be precise we will now assume:

g: ℝ → (−1, 1),  g ∈ C¹,  g′(x) > 0,  g′(x) ≤ g′(0) = 1/β.   (2.2)

Hopfield [28] showed that there exists an energy functional (Lyapunov functional) for these equations in the case where T is symmetric:

E = −(1/2) Σ_{i,j} T_ij v_i v_j + (α/λ) Σ_i ∫₀^{v_i} G(s) ds − Σ_i (c_i − α θ_i) v_i,

where G = g⁻¹ and

v_i = g(λ u_i)   (2.3)

represents the firing rate of neuron i. In fact,

−dE/dt = λ Σ_i g′(λ u_i) (du_i/dt) (Σ_j T_ij g(λ u_j) − α u_i + c_i − α θ_i) = λ Σ_i g′(λ u_i) (du_i/dt)² ≥ 0,   (2.4)

since g′ > 0, so that energy decreases except at equilibria. If T is not symmetric, then the E above is not a Lyapunov functional for the system.
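This energy decrease is easy to observe numerically. The sketch below (our own; all parameter values are illustrative) integrates (2.1) with a symmetric T, g = tanh, and no inputs or thresholds, and monitors E written in terms of the firing rates v_i = g(λu_i), using ∫₀^v arctanh(s) ds = v arctanh(v) + (1/2) ln(1 − v²):

```python
import numpy as np

# Monitor the energy functional along a trajectory of (2.1) with symmetric T,
# g = tanh (so G = arctanh), and no inputs or thresholds. Illustrative values.
rng = np.random.default_rng(1)
n = 8
alpha, lam = 1.0, 2.0
A = rng.normal(0.0, 1.0, (n, n))
T = (A + A.T) / 2                    # symmetric connection matrix
np.fill_diagonal(T, 0.0)

def G_int(v):                        # integral of arctanh from 0 to v
    return v * np.arctanh(v) + 0.5 * np.log1p(-v * v)

def energy(u):
    v = np.tanh(lam * u)             # firing rates v_i = g(lambda*u_i)
    return -0.5 * v @ T @ v + (alpha / lam) * np.sum(G_int(v))

u = rng.uniform(-0.5, 0.5, n)
dt = 0.002
energies = [energy(u)]
for _ in range(5000):                # forward Euler to t = 10
    u = u + dt * (-alpha * u + T @ np.tanh(lam * u))
    energies.append(energy(u))

print(energies[0], energies[-1])     # E decreases along the trajectory
```

Up to the small discretization error of the Euler steps, the recorded energies are non-increasing, in line with (2.4).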

Convergent behaviour appears to be somewhat robust with regard to asymmetry [6] but in general non-convergent behaviour is to be expected (see e.g. [37, 38, 42, 43]).

We can re-express equation (2.1) in terms of firing rates, v_i, as follows:

u_i = (1/λ) G(v_i)

from (2.3), so du_i/dt = (1/λ) G′(v_i) dv_i/dt and (2.1) becomes

G′(v_i) dv_i/dt = λ Σ_j T_ij v_j − α G(v_i) + λ(c_i − α θ_i).   (2.5)

This is entirely equivalent to (2.1) for initial conditions v_i(0) ∈ range(g). It is easy to show that solutions to equations (2.1), and therefore to equations (2.5), are bounded and so exist globally in time and are unique.

In the literature g is often taken to be an odd function taking values in (−1, 1), such as tanh. In particular, such a function has g(0) = 0. Horizontal shifts in the response function may be accounted for by a threshold term. A more realistic sigmoid might, however, take values in (0, 1). For example, a logistic function is often used:

g(x) = 1/(1 + e^{−x}).

However, by a change of co-ordinates, the resulting Hopfield-type equation can be transformed into the original one with an additional threshold term. Equation (2.1) can still be transformed to equation (2.5) and then we let w = 2v − 1, so that dw/dt = 2 dv/dt. Then

G′((w_i + 1)/2) dw_i/dt = λ Σ_j T_ij w_j + λ Σ_j T_ij − 2α G((w_i + 1)/2) + 2λ(c_i − α θ_i).

Now let F(w) = 2G((w + 1)/2), so that F′(w) = G′((w + 1)/2). Then

F′(w_i) dw_i/dt = λ Σ_j T_ij w_j − α F(w_i) + λ(2c_i − α θ̃_i),

which is of the same form as before with gain λ, inputs 2c_i and thresholds

θ̃_i = (2α θ_i − Σ_j T_ij)/α.

Thus, horizontal and vertical shifts in the response function do not significantly alter the model, except in changing the threshold values. If g is taken to be an odd function with range (−1, 1), then the Hopfield equations without inputs or thresholds (1.1) have the steady-state solution u_i = 0. In biological nets, where firing rates should be strictly positive, this may not make sense, but there is no reason not to create artificial nets with this property if it is desired.

Learning in a conventional Hopfield net is accomplished by setting

T_ij = Σ_{m=1}^{s} σ_i^{(m)} σ_j^{(m)}

for i ≠ j, and T_ii = 0, where σ^{(m)} represents the mth pattern to be learned, with σ_i^{(m)} = ±1, say. If s is not too large and the patterns are approximately orthogonal, then the patterns will be close (in phase space) to fixed points of the dynamics [28]. This 'learning rule', loosely referred to as Hebbian learning, is not the only one which has been applied to the Hopfield network equations. Some others are described by Denker [15] and Michel and Farrell [32], but learning rules do not concern us further here.
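This storage-and-retrieval behaviour can be checked directly. In the sketch below (our own; the sizes, the gain, and the 1/n scaling of the Hebbian sum are illustrative choices), three random ±1 patterns are stored and a corrupted copy of one of them is restored by the dynamics (2.1):

```python
import numpy as np

# Hebbian storage: T_ij = (1/n) * sum_m sigma_i^(m) sigma_j^(m), T_ii = 0,
# then retrieval of a corrupted pattern under the dynamics (2.1).
rng = np.random.default_rng(2)
n, s = 100, 3
patterns = rng.choice([-1.0, 1.0], size=(s, n))
T = (patterns.T @ patterns) / n      # 1/n keeps the summed input O(1)
np.fill_diagonal(T, 0.0)

alpha, lam = 1.0, 10.0               # high gain drives firing rates toward +/-1
target = patterns[0]
u = 0.5 * target.copy()
flip = rng.choice(n, size=10, replace=False)
u[flip] *= -1.0                      # corrupt 10 of the 100 entries

dt = 0.05
for _ in range(2000):                # forward Euler to t = 100
    u = u + dt * (-alpha * u + T @ np.tanh(lam * u))

overlap = np.mean(np.sign(u) == np.sign(target))
print(overlap)                       # fraction of entries matching the pattern
```

With few, approximately orthogonal patterns the corrupted entries are flipped back, so the overlap returns to (or very near) 1.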

3. Approximation of an integral operator by a differential operator

The theorem of this section is a generalization of that presented in Degond and Mas-Gallic [14] (see their Remark 2, p. 491) and uses similar notation.


We let α = (α₁, α₂, …, α_n) denote a multi-index and e_i denote the standard basis vector in ℝ^n with 1 in the ith component. For x ∈ ℝ^n, we will be concerned with a function η(x) with the following properties (moment conditions):

∫_{ℝ^n} η(x) dx = τ₀,  ∫_{ℝ^n} x_i² η(x) dx = τ_i (i = 1, …, n),  ∫_{ℝ^n} x^α η(x) dx = 0 for 1 ≤ |α| ≤ r + 1, α ≠ 2e_i,   (3.1a)

and

∫_{ℝ^n} |x|^{r+2} |η(x)| dx = K_{r+2} < ∞,   (3.1b)

for some integer r > 1. For example, if η is even (η(x) = η(−x)), compactly supported and η ∈ L¹(ℝ^n), then the moment conditions are satisfied with r = 2. For ε > 0, we define the cut-off function

ζ^ε(x) = (1/ε^n) η(x/ε).   (3.1c)
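As a concrete one-dimensional illustration (our own; the box kernel is an assumed example, not one used in the paper), the even, compactly supported kernel η = 1 on [−1, 1] satisfies the moment conditions with r = 2, with τ₀ = 2, τ₁ = 2/3 and K₄ = 2/5:

```python
from scipy.integrate import quad

# Moments of the even box kernel eta(x) = 1 on [-1, 1] (n = 1, r = 2):
# zeroth moment tau_0, second moment tau_1, vanishing odd moments,
# and the finite moment K_{r+2} = K_4 from (3.1b).
eta = lambda x: 1.0 if abs(x) <= 1 else 0.0

tau0 = quad(lambda x: eta(x), -1, 1)[0]            # expect 2
tau1 = quad(lambda x: x**2 * eta(x), -1, 1)[0]     # expect 2/3
m1 = quad(lambda x: x * eta(x), -1, 1)[0]          # expect 0
m3 = quad(lambda x: x**3 * eta(x), -1, 1)[0]       # expect 0
K4 = quad(lambda x: abs(x)**4 * eta(x), -1, 1)[0]  # expect 2/5

print(tau0, tau1, m1, m3, K4)
```

The odd moments vanish by symmetry, which is exactly why an even kernel gives r = 2.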

We consider a continuous space analogue to the connection matrix of the Hopfield net of the following form. For x, y ∈ ℝ^n, define

T^ε(x, y) = μ(x, y) ζ^ε(x − y).   (3.2)

We define the integral operator I^ε, which corresponds to the summed inputs from other neurons, by

I^ε f(x) = (1/ε²) ∫_{ℝ^n} T^ε(x, y) f(y) dy.   (3.3)

We will also use the shorthand notation (following Degond and Mas-Gallic):

Ī^α(x) = ∫_{ℝ^n} (y − x)^α ζ^ε(x − y) dy.   (3.4)

Lemma 1. Let x ∈ ℝ^n. Suppose η satisfies conditions (3.1a) and (3.1b). Then

Ī^α(x) = τ₀ if α = 0,  Ī^α(x) = ε² τ_i if α = 2e_i,  Ī^α(x) = 0 if α ≠ 2e_i, 1 ≤ |α| ≤ r + 1,

and when |α| = r + 2,

|Ī^α(x)| ≤ ε^{r+2} K_{r+2},

where the Ī^α are defined by (3.4).


Proof. Let u = (y − x)/ε. Then ε^n du = dy and

Ī^α(x) = ∫_{ℝ^n} ε^{|α|} u^α (1/ε^n) η(−u) ε^n du = ε^{|α|} (−1)^{|α|} ∫_{ℝ^n} u^α η(u) du,

and the result follows directly from (3.1a) and (3.1b).

We now state and prove the theorem generalizing that of Degond and Mas-Gallic. The integral operator I^ε given by (3.3) will be approximated by the differential operator D₂ defined by

D₂ f(x) = (1/2) div_y(τ ∇_y[μ(x, y) f(y)])|_{y=x} + (τ₀/ε²) μ(x, x) f(x),   (3.5)

where τ = diag(τ₁, τ₂, …, τ_n), with τ₀, …, τ_n and η from equation (3.1).

Theorem 1. Suppose that T^ε is of the form given by (3.2) and I^ε f(x) by (3.3), with

μ ∈ L^∞(ℝ^n_x; W^{r+2,∞}(ℝ^n_y)),

and also that η satisfies conditions (3.1a) and (3.1b). Suppose also that D₂ is as given in (3.5). Then there exists a constant C > 0, depending on K_{r+2} (from (3.1b)) and ‖μ‖, the norm of μ in the above space, such that

‖D₂ f − I^ε f‖_{0,∞} ≤ C ε^r ‖f‖_{r+2,∞}

for any f ∈ W^{r+2,∞}(ℝ^n).

Proof. In (3.3), we consider x to be fixed for the time being and expand f(y) as a Taylor polynomial about f(x) with integral remainder:

f(y) = Σ_{|α|=0}^{r+1} (1/α!) (y − x)^α ∂^α f(x) + R_f(x, y),   (3.6)

where

R_f(x, y) = (r + 2) Σ_{|α|=r+2} ((y − x)^α/α!) ∫₀¹ (1 − θ)^{r+1} ∂^α f(x + θ(y − x)) dθ.   (3.7)

This gives the following representation for I^ε f(x):

I^ε f(x) = Ĩ f(x) + R₁^ε(x),

where

Ĩ f(x) = (1/ε²) Σ_{|α|=0}^{r+1} (1/α!) ∂^α f(x) ∫_{ℝ^n} T^ε(x, y)(y − x)^α dy,  R₁^ε(x) = (1/ε²) ∫_{ℝ^n} T^ε(x, y) R_f(x, y) dy.   (3.8)

Now we approximate Ĩ f(x) by expressing T^ε in terms of μ and η and expanding μ(x, y) about (x, x) for a particular α as follows:

μ(x, y) = Σ_{|β|=0}^{r+1−|α|} (1/β!) (y − x)^β ∂_y^β μ(x, x) + R_μ^α(x, y),   (3.9)

where R_μ^α(x, y) is the corresponding integral remainder. Substituting into Ĩ f(x) and using the notation (3.4), by Lemma 1 the only non-zero terms in the sum are the term with α = β = 0 and those with α + β = 2e_i, i = 1, …, n. Thus,

Ĩ f(x) = (τ₀/ε²) μ(x, x) f(x) + I_{2,0}(x) + I_{2,1}(x) + I_{2,2}(x) + R₂^ε(x),   (3.10)

where I_{2,0}, I_{2,1} and I_{2,2} collect the contributions with (α, β) = (0, 2e_i), (e_i, e_i) and (2e_i, 0) respectively (3.11):

I_{2,0}(x) = (1/2) Σ_i τ_i ∂_{y_i}² μ(x, x) f(x),  I_{2,1}(x) = Σ_i τ_i ∂_{y_i} μ(x, x) ∂_i f(x),  I_{2,2}(x) = (1/2) Σ_i τ_i μ(x, x) ∂_i² f(x),   (3.11a)

with

R^ε(x) = R₁^ε(x) + R₂^ε(x).   (3.12)

Note that the expressions in (3.11a) can be rewritten together as

I_{2,0}(x) + I_{2,1}(x) + I_{2,2}(x) = (1/2) div_y(τ ∇_y[μ(x, y) f(y)])|_{y=x},

so by (3.5) and (3.12),

I^ε f(x) = D₂ f(x) + R^ε(x).   (3.12a)

Thus, the theorem is proved by obtaining a bound on R^ε. To obtain this bound, first note that the condition on μ in the statement of the theorem says that there is a single (essential) bound on the magnitude of μ and its first (r + 2) derivatives in y everywhere. We will denote this bound, which is a norm on its space, by ‖μ‖, i.e.

‖μ‖ = ess sup_x ‖μ(x, ·)‖_{W^{r+2,∞}(ℝ^n_y)}.

Now we estimate each component of the error. From (3.6) and (3.8), and using (3.2), (3.4) and the hypotheses on μ and f,

|R₁^ε(x)| ≤ (r + 2) |f|_{r+2,∞} Σ_{|α|=r+2} (1/α!) (1/ε²) ∫_{ℝ^n} |T^ε(x, y)| |(y − x)^α| ∫₀¹ (1 − θ)^{r+1} dθ dy,

where |·|_{r+2,∞} denotes the usual seminorm in W^{r+2,∞}. But ∫₀¹ (1 − θ)^{r+1} dθ = 1/(r + 2), so

|R₁^ε(x)| ≤ |f|_{r+2,∞} Σ_{|α|=r+2} (1/α!) (1/ε²) ∫_{ℝ^n} |T^ε(x, y)| |(y − x)^α| dy ≤ ε^r S_{r+2,n} K_{r+2} |f|_{r+2,∞} ‖μ‖,

where S_{r+2,n} = Σ_{|α|=r+2} 1/α!. A similar estimate for the remainders in the expansion of μ gives

|R₂^ε(x)| ≤ ε^r C_{r+2,n} K_{r+2} ‖f‖_{r+1,∞} ‖μ‖,  where C_{r+2,n} = Σ_{|α|=0}^{r+1} Σ_{|β|=r+2−|α|} 1/(α! β!).

Thus, (3.12a) gives

|R^ε(x)| ≤ ε^r (S_{r+2,n} + C_{r+2,n}) K_{r+2} ‖f‖_{r+2,∞} ‖μ‖.

This is true for all x, so

‖D₂ f − I^ε f‖_{0,∞} = ‖R^ε‖_{0,∞} ≤ C ε^r ‖f‖_{r+2,∞}

for some C > 0, depending on ‖μ‖ and K_{r+2} = ∫_{ℝ^n} |x|^{r+2} |η(x)| dx.

Remark 1. If η is even and η ∈ L¹ then the above theorem is true with r = 2 (as long as μ also satisfies the conditions of the theorem).

Remark 2. The above theorem is also true if ℝ^n is replaced by a subset Ω ⊂ ℝ^n, as long as Ω is also used in the conditions of the theorem (moment conditions, etc.). An even function η will satisfy the moment conditions with r = 2 as long as Ω is symmetric about the origin (so that if x ∈ Ω, so is −x).


Remark 3. Simpler forms of T^ε lead to simpler forms of the differential operator D₂.

(i) If τ_i = τ*, a constant, then

D₂ f(x) = (τ*/2) Δ_y[μ(x, y) f(y)]|_{y=x} + (τ₀/ε²) μ(x, x) f(x).

(ii) If in addition μ = 1 then

D₂ f = (τ*/2) Δf + (τ₀/ε²) f.   (3.13)
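The rate ε^r in Theorem 1 can be seen numerically in the setting of Remark 3(ii). The sketch below (our own; it assumes the conventions used here, namely I^ε f(x) = ε⁻² ∫ ζ^ε(x − y) f(y) dy with the box kernel η = 1 on [−1, 1], so τ₀ = 2, τ* = 2/3 and r = 2) compares I^ε f with D₂ f = (τ*/2) f″ + (τ₀/ε²) f for f = sin; the sup error shrinks like ε²:

```python
import numpy as np

# Compare I^eps f(x) = eps^-2 * integral of eta((x-y)/eps)/eps * f(y) dy
# (box kernel eta) with D2 f = (tau*/2) f'' + (tau0/eps^2) f,
# where tau0 = 2, tau* = 2/3 and f = sin.
f = np.sin
fpp = lambda x: -np.sin(x)                 # f''
x = np.linspace(0.0, 2.0 * np.pi, 200)

du = 0.001                                 # midpoint rule on u in [-1, 1]
u = -1.0 + du * (np.arange(2000) + 0.5)    # I^eps f(x) = eps^-2 int f(x+eps*u) du

errors = []
for eps in [0.4, 0.2, 0.1]:
    Ieps = f(x[:, None] + eps * u[None, :]).sum(axis=1) * du / eps**2
    D2 = (1.0 / 3.0) * fpp(x) + (2.0 / eps**2) * f(x)
    errors.append(float(np.max(np.abs(Ieps - D2))))

print(errors)   # each halving of eps cuts the error by about 4
```

For this kernel the error can be computed in closed form as |sin x| |2 sin(ε)/ε³ − 2/ε² + 1/3| ≈ ε²/60, which is what the factor-of-four decay reflects.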

4. Quadrature for the integral

We have shown that the differential operator D₂ is approximated by the integral operator I^ε. We now wish to approximate I^ε by a sum over points on a grid with grid point spacing h. This can be done as in the literature on particle methods. Particle methods allow us to obtain approximations of arbitrarily high order, given enough smoothness in η and μ. (Of course, high-order approximations may involve very large constants.)

Let x_i = ih, i ∈ ℤ^n, and similarly for x_j. Our discrete analogue of the integral operator will be

I_h^ε f(x) = (h^n/ε²) Σ_{j∈ℤ^n} T^ε(x, x_j) f(x_j),

where h^n is the volume of a grid cell.

Lemma 2. If η ∈ W^{m,1}(ℝ^n) and μ ∈ W^{m,∞}(ℝ^n × ℝ^n) for m > n, then there exists a constant C, depending on m, n, ‖μ‖_{m,∞} and ‖η‖_{m,1} (but not h or ε), such that

‖I^ε f − I_h^ε f‖_{0,∞} ≤ C (h^m/ε^{m+2}) ‖f‖_{m,∞}

for all f ∈ W^{m,∞}(ℝ^n).

Proof. Define a cell, B_j, around each grid point by

B_j = {y ∈ ℝ^n : (j_k − 1/2)h < y_k < (j_k + 1/2)h, 1 ≤ k ≤ n},

where the subscript k here refers to a component of an n-vector. Thus, the grid point x_j = jh is the centre of B_j. Let

E(g) = ∫_{ℝ^n} g(y) dy − h^n Σ_{j∈ℤ^n} g(x_j),

the error of the particle approximation of the integral of g. We let g = f T^ε(x, ·), so that our quadrature error is

I^ε f(x) − I_h^ε f(x) = (1/ε²) E(f T^ε(x, ·)),

Raviart's [36] Theorem 3.1 with p = 1 says that there exists a C > 0 such that

|E(g)| ≤ C h^m ‖g‖_{m,1}

as long as m > n and g ∈ W^{m,1}(ℝ^n). With g(y) = f(y) T^ε(x, y), this gives a bound on the quadrature error of

|I^ε f(x) − I_h^ε f(x)| ≤ C (h^m/ε²) ‖f T^ε(x, ·)‖_{m,1} ≤ C (h^m/ε^{m+2}) ‖f‖_{m,∞} ‖μ‖_{m,∞} ‖η‖_{m,1}.


The quadrature error is a function of x, so the maximum error at a grid point, x_i, is bounded by the above:

sup_i |I^ε f(x_i) − I_h^ε f(x_i)| ≤ C (h^m/ε^{m+2}) ‖f‖_{m,∞}.

Note that although T^ε and f are only defined almost everywhere, μ ∈ W^{m,∞}(ℝ^n × ℝ^n), η ∈ W^{m,1}(ℝ^n) and f ∈ W^{m,∞}(ℝ^n) guarantee that f T^ε is in W^{m,1}(ℝ^n), and since by the Sobolev embedding theorems [1] W^{m,1}(Ω) ⊂ C⁰(Ω̄), we may assume that we have a representative of the equivalence class of f T^ε that is continuous on bounded sets. This establishes the result.

Remark 1. The result is also true if we restrict the spatial domain to Ω ⊂ ℝ^n.
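The h-dependence of the quadrature error can be seen in the same one-dimensional setting. The sketch below (our own; μ = 1 and the smooth, compactly supported kernel η(x) = (1 − x²)² on [−1, 1] are assumed choices) evaluates the grid sum I_h^ε at a point for fixed ε and several h, against a very fine grid as a stand-in for the exact integral:

```python
import numpy as np

# Grid-sum approximation I_h^eps f(x0) = (h/eps^2) * sum_j zeta_eps(x0 - x_j) f(x_j)
# with zeta_eps(x) = eta(x/eps)/eps and the smooth kernel eta(x) = (1 - x^2)^2.
eta = lambda x: np.where(np.abs(x) <= 1.0, (1.0 - x**2) ** 2, 0.0)
f = np.sin
eps, x0 = 0.3, 0.7

def I_h(h):
    # grid points x_j = j*h covering the support [x0 - eps, x0 + eps]
    j = np.arange(np.floor((x0 - eps) / h) - 1, np.ceil((x0 + eps) / h) + 2)
    xj = j * h
    return float(np.sum(h * eta((x0 - xj) / eps) / eps * f(xj)) / eps**2)

reference = I_h(1e-5)    # fine-grid stand-in for the exact integral I^eps f(x0)
errors = [abs(I_h(h) - reference) for h in [0.1, 0.05, 0.025]]
print(errors)            # shrinking as the grid spacing h decreases
```

This is the cell-by-cell (midpoint) particle approximation of the proof; for a fixed ε the error decays as h → 0, while shrinking ε at fixed h would inflate it, as the h^m/ε^{m+2} bound suggests.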

5. Generalization of Cottet's result

We now use the above results to generalize the theorem of Cottet [12]. This amounts to combining the two approximations above to obtain an integral operator close to the differential operator and then a sum close to the integral. These approximations are in terms of the solution to the PDE, so we must also have an a priori bound on the solution in terms of the initial data. These are then used to show that solutions to an equation involving the sum (the Hopfield network) are approximated by solutions to a partial differential equation involving the differential operator D₂.

Consider the Hopfield network equations expressed in terms of firing rates as in (2.5) but without threshold or input terms for the time being, i.e.

G′(v_i) dv_i/dt = λ Σ_j T_ij v_j − α G(v_i).   (5.1)

We will suppose that g is a strictly increasing odd function taking values in (−1, 1) and with a maximum slope at 0, as in conditions (2.2). Thus, G = g⁻¹ is well-defined and strictly increasing on (−1, 1). Also, G′(v) ≥ G′(0) = β for all v, since G′ has a minimum at 0 (see Fig. 1).

Consider also the partial differential equation

G′(w) ∂w/∂t = γ ε² D₂ w − α G(w),   (5.2)


Fig. 1. Plot of αG(v) against γv, showing the intersections at ±v_s. Here α = 1, γ = 2, G(v) = tanh⁻¹(v) and β = 1.
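The intersection value v_s of Fig. 1 is easily computed; the following sketch (our own) finds it by root finding with the caption's parameter values:

```python
import numpy as np
from scipy.optimize import brentq

# Positive solution v_s of gamma*v = alpha*G(v), G = arctanh,
# for the Fig. 1 values alpha = 1, gamma = 2 (so alpha*beta = 1 < gamma).
alpha, gamma = 1.0, 2.0
h = lambda v: gamma * v - alpha * np.arctanh(v)

# v = 0 is the trivial intersection; v_s is the nonzero root in (0, 1).
v_s = brentq(h, 0.5, 1.0 - 1e-12)
print(v_s)
```

Since arctanh grows without bound as v → 1⁻ while γv stays finite, a sign change is guaranteed inside the bracket and brentq converges to the nonzero intersection.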

where D₂w is as in (3.5) above, G is as described above for equation (5.1), and γ = λ/ε². Here, ε and h are the parameters involved in the two approximations: ε for replacing the differential operator by the integral and h for quadrature of the integral. Existence and uniqueness are also not difficult to establish for this equation under the conditions

τ_i μ(x, x) > 0, ∀x, ∀i,   (5.3)

v₀(x) ∈ (−1, 1).   (5.4)

In fact, there is a maximum principle guaranteeing boundedness and global existence of solutions as long as initially w ∈ (−1, 1). We can also recast the equation in terms of u, the membrane potential, giving another quasi-linear parabolic equation, and then global existence for this equation guarantees that w = g(λu) remains in (−1, 1). Proofs of global existence and uniqueness for such quasi-linear equations can be found, for example, in [30].


However, we also need control over derivatives of solutions to equation (5.2), since the approximations in Theorem 1 and Lemma 2 involve Sobolev norms of the solution. It is implied in Cottet's Theorem 1 that there exist bounds, independent of ε and γ, on derivatives of the solution to his equation, in terms of similarly bounded initial data and a fixed time interval [12]. It appears that such a condition is unlikely to hold globally (see discussion in section 6) but we may expect it to hold on some regions.

For the time being we will assume for the sake of simplicity that we have an a priori bound:

‖w‖_{m,∞} ≤ C ‖v₀‖_{m,∞},   (5.5)

for x ∈ ℝ^n, v₀ ∈ W^{m,∞}, where C is independent of ε and γ. Now we give our main result.

Theorem 2. Let τ, τ₀ and T^ε(x, y) be defined as above in (3.1a), (3.2) and Theorem 1, and let μ and η satisfy the conditions of that theorem with η ∈ W^{m,1}(ℝ^n) and μ ∈ W^{m,∞}(ℝ^n × ℝ^n) for m > n, m ≥ r + 2 (r from the moment conditions on η, (3.1)). Let v_i(t) be the solution to (5.1), with T_ij = (h^n/ε²) T^ε(x_i, x_j) and initial conditions

v_i(0) = v_{0,i}, i ∈ ℤ^n.

Let w(x, t) be the solution to (5.2) with initial conditions

w(x, 0) = v₀(x),

such that v₀ ∈ W^{m,∞}(ℝ^n) and

v₀(x_i) = v_{0,i}

(i.e. initial conditions coincide). Assume also that conditions (5.3)-(5.5) hold. Then, for t ∈ [0, M] these solutions satisfy

sup_i |v_i(t) − w(x_i, t)| ≤ C (ε^r + h^m/ε^{m+2})

for some constant C > 0, depending on the parameters α, λ, β and T_ij, as well as

‖μ‖_{m,∞}, ‖η‖_{m,1}, K_{r+2}, ‖v₀‖_{m,∞}, M.

Proof. By Theorem 1,

G′(w) w_t = γ ε² I^ε w − α G(w) + γ ε² R′(x),

where |R′(x)| ≤ C ε^r ‖w‖_{m,∞}. Note that W^{m,∞}(ℝ^n × ℝ^n) ⊂ L^∞(ℝ^n_x; W^{m,∞}(ℝ^n_y)), so that any μ satisfying the conditions of this theorem must also satisfy those of Theorem 1, as long as m ≥ r + 2, which we have assumed. Then, by Lemma 2,

G′(w_i) w_t(x_i, t) = λ Σ_j T_ij w_j − α G(w_i) + γ ε² R(x_i)

at grid points, where |R(x)| ≤ C(ε^r + h^m/ε^{m+2}) ‖w(t)‖_{m,∞} at any time t ∈ [0, M], and so by condition (5.5), |R(x)| ≤ C(ε^r + h^m/ε^{m+2}). So the solution w_i of the PDE at grid points and the solution v_i of the system of ODEs at grid points satisfy

G′(v_i) dv_i/dt − G′(w_i) dw_i/dt = λ Σ_j T_ij (v_j − w_j) − α (G(v_i) − G(w_i)) − γ ε² R.

Now integrate and take absolute values, noting that the initial conditions for v and w are equal:

|G(v_i(t)) − G(w_i(t))| ≤ ∫₀ᵗ [λ Σ_j |T_ij| |v_j(s) − w_j(s)| + α |G(v_i(s)) − G(w_i(s))| + γ ε² |R(s)|] ds.

Taking the supremum over all i, and writing

T̄ = sup_i Σ_j |T_ij|,

we have

‖G(v(t)) − G(w(t))‖_∞ ≤ ∫₀ᵗ [λ T̄ ‖v(s) − w(s)‖_∞ + α ‖G(v(s)) − G(w(s))‖_∞ + γ ε² ‖R(s)‖_∞] ds.

Note that the supremum above exists since μ ∈ W^{m,∞}(ℝ^n × ℝ^n) and η ∈ W^{m,1}(ℝ^n), and since W^{m,1}(Ω) ⊂ C⁰(Ω̄) for bounded Ω by a Sobolev embedding theorem, we may take η to be continuous.

Now observe that |v_i − w_i| ≤ (1/β) |G(v_i) − G(w_i)|, where β = G′(0), since by the mean value theorem

|G(v_i) − G(w_i)| = G′(ξ) |v_i − w_i| ≥ β |v_i − w_i|

for some ξ between v_i and w_i. Thus,

‖G(v(t)) − G(w(t))‖_∞ ≤ ∫₀ᵗ [(λ T̄/β + α) ‖G(v(s)) − G(w(s))‖_∞ + γ ε² ‖R(s)‖_∞] ds,

and now apply Gronwall's inequality to get

sup_{t∈[0,M]} ‖v(t) − w(t)‖_∞ ≤ C (ε^r + h^m/ε^{m+2}),

where C depends on M and the constants α, λ, β and T̄, as well as (through C₁ and C₂) ‖μ‖_{m,∞}, ‖η‖_{m,1}, K_{r+2} and ‖v₀‖_{m,∞}.

Remark 1. η, μ, v₀ are fixed functions, so making ε → 0 and h^m/ε^{m+2} → 0 (which implies γ → ∞) gives convergence of the approximation. This amounts to letting the size of the 'visible window' shrink to zero while at the same time letting the number of neurons in the 'visible window' increase to infinity.

Remark 2. If μ = 1, then

D₂ w = (1/2) div(τ ∇w) + (τ₀/ε²) w,

and if in addition τ_i = τ*, a constant, then

D₂ w = (τ*/2) Δw + (τ₀/ε²) w

and (5.2) becomes

w_t = (1/G′(w)) [γ ε² (τ*/2) Δw + (γ τ₀ w − α G(w))],   (5.7)

which is the equation of Cottet [12].


Remark 3. The initial value problem for equation (5.2) is well-posed if τ_i μ(x, x) > 0 for each i and x (recall (3.5)). Note that it is not well-posed if τ_i μ(x, x) < 0 for any i. For example, in equation (5.7), if τ* < 0, we have backwards diffusion.

Remark 4. Thresholds and external inputs in the Hopfield model may be incorporated easily into the above analysis with no significant change in the proofs. If c_i is the external input and θ_i is the firing threshold of the ith neuron, then the Hopfield network may be formulated as in (2.5) and the PDE becomes

G′(w) ∂w/∂t = γ ε² D₂ w − α G(w) + λ(c(x) − α θ(x)),

where c(x) and θ(x) interpolate the inputs and thresholds at the grid points.

Remark 5. A more biologically realistic sigmoid with range (0, 1) and maximum slope not at zero can be handled by recasting the model in terms of a sigmoid with range (−1, 1) and maximum slope at zero, as described in section 2. Thus, we can still carry out the approximation of this theorem with an extra threshold term.

Remark 6. ℝ^n may be replaced by Ω throughout if appropriate boundary conditions are imposed. Then if ‖w‖_{m,∞} ≤ C ‖v₀‖_{m,∞} only on a bounded set, the result still holds there.

6. Convergence of the approximation

Theorem 2 applies only where we have a bound on the solution independent of ε and γ (condition (5.5)). Under what circumstances can we expect this condition to hold? We attempt to answer this question by looking at a simple one-dimensional example.

One standard way to obtain the desired bounds is by energy estimates. We show what happens when we attempt to apply this technique to a simple form of equation (5.2). The basic idea is that, if the small ε is neglected, derivatives of the solution to the resulting equation can grow exponentially. For a fixed time interval, they could be bounded except that the factor γ occurs in the growth rate. There are regions, however, where the growth is stopped or reversed.

We take x ∈ Ω ⊂ ℝ¹, an interval, and D₂ given by (3.13) with τ* = 2, τ₀ = 1, and for simplicity, assume periodic boundary conditions. Thus,

v_t = ε² γ a(v) v_xx + γ b(v),   (6.1)

with

v(x, 0) = v₀(x) ∈ [−v_s, v_s],

where v_s is the positive solution to γv = αG(v), αβ < γ, and a(v) = 1/G′(v) > 0 and b(v) = [γv − αG(v)]/[γG′(v)]. (Note that if αβ ≥ γ, then there is no v_s > 0 (see Fig. 1), so all solutions collapse to 0. Under these circumstances, condition (5.5) holds globally, but for the neural network application we are only interested in the other case.)

First recall from section 5 that v itself is already bounded for v₀(x) ∈ (−1, 1). Also, v ∈ [−v_s, v_s] if v₀ ∈ [−v_s, v_s], by the maximum principle for this equation. Differentiating (6.1) by x gives

w_t = ε² γ [a(v) w_x]_x + γ b′(v) w,   (6.2)


where w = v_x. We can approximate the L² norm of w as follows:

(d/dt) ‖w‖₂² = (d/dt)(w, w) = 2(w, w_t) = 2 ∫ w [ε² γ [a(v) w_x]_x + γ b′(v) w] dx

= −2 ε² γ ∫ a(v) w_x² dx + 2γ ∫ b′(v) w² dx

≤ −2 ε² γ ∫ a(v) w_x² dx + 2γ max|b′(v)| ‖w‖₂²,

integrating by parts and using the periodic boundary conditions.

Let B = max, I b’(u) 1, which exists for u E [ - us, us], and is independent of y for y large, since b’(u) is even and decreasing on (0, u s ] (see Fig. 3) and

l a 1 0 < b’(0) = - - - < - ; P Y P

l a a 0 > b‘(u,)=--- > -- + O as y - + o o .

G ( u s ) Y Y Now applying variation of constants to the above, we obtain

so that

Furthermore,

so

Letting z = w, and differentiating (6.2) again, we can use (6.3) and (6.4) to estimate 11 z 11: in a similar way to show that

The same procedure works for still higher derivatives. These estimates give us an upper bound on L2 norms of u,, u,,, etc. for t E [O, M] in

terms of initial data, but this bound depends also on y. For convergence of the approximation we require y -, 00, so this technique does not give us an appropriate upper bound on derivatives of the solution everywhere. If a derivative is unbounded in L2 on a compact interval, then of course it is also unbounded in L“.

However, looking again at (6.2) and neglecting the small ε² term, we see that the growth rate depends both on γ and on b′(v). The latter quantity has a maximum of b′(0) = 1/β − α/γ at v = 0 and becomes negative for |v| near v_s (see Figs. 2 and 3). Thus, for larger

Fig. 2. Plot of b(v), with α = 1, γ = 2, G(v) = tanh⁻¹(v) and β = 1. Note that b(v) = 0 at v_s.

Fig. 3. Plot of b′(v), with α = 1, γ = 2, G(v) = tanh⁻¹(v) and β = 1. Note that b′(v) < 0 at ±v_s.
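The sign structure of b′ shown in Fig. 3 can be checked numerically. The following is a minimal sketch, assuming the parameter values of the figure captions (α = β = 1, γ = 2, G(v) = tanh⁻¹(v)); the fixed-point iteration used to locate v_s is an implementation choice, not part of the paper.

```python
import math

# Numerical check of the stated properties of b'(v), for the illustrative
# choice of Figs. 2-3: G(v) = atanh(v), alpha = beta = 1 (so G'(v) >= 1).
alpha, beta, gamma = 1.0, 1.0, 2.0

def G(v):  return math.atanh(v)
def Gp(v): return 1.0 / (1.0 - v * v)   # G'(v)

def b(v):
    # b(v) = [gamma*v - alpha*G(v)] / (gamma*G'(v))
    return (gamma * v - alpha * G(v)) / (gamma * Gp(v))

def bp(v, dv=1e-6):
    # central-difference approximation to b'(v)
    return (b(v + dv) - b(v - dv)) / (2 * dv)

# v_s: positive solution of gamma*v = alpha*G(v), by the fixed-point
# iteration v <- tanh(gamma*v/alpha), which converges when alpha*beta < gamma.
v = 0.5
for _ in range(200):
    v = math.tanh(gamma * v / alpha)
v_s = v

print(v_s, bp(0.0), bp(v_s))
```

The output confirms b′(0) = 1/β − α/γ > 0 while b′(v_s) lies in (−α/γ, 0), as claimed.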

values of |v|, w actually decays. Consider, for example, an initial condition, v₀(x), that is an odd function and monotone increasing on Ω. Then v(0, t) = 0 for all t and, neglecting the small ε² term in (6.2),

$$ w_t(0, t) = \gamma\, b'(0)\, w(0, t), $$

so w(0) grows exponentially with rate depending on γ. However, for x away from 0, |v| also increases, since

$$ v_t \approx \gamma\, b(v) $$

and b(v) has the sign of v for v ∈ (−v_s, v_s). Then, as |v| exceeds the value where b′(v) = 0, w_t becomes negative and v flattens out.

The above analysis suggests that in the region of transition layers we cannot expect to get a bound on v_x or higher derivatives independent of γ. However, away from transition layers, v flattens out and approaches ±v_s, so derivatives of v will be bounded.
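This behaviour can be observed in a direct simulation. The following is a minimal explicit finite-difference sketch of (6.1) with G(v) = tanh⁻¹(v) and α = 1, so that the equation becomes v_t = (1 − v²)[ε²γ v_xx + γv − tanh⁻¹(v)]; the parameter values, the odd monotone initial profile, and the Neumann boundary conditions are illustrative assumptions (the text uses periodic conditions for (6.1), but Neumann conditions accommodate a single layer more simply).

```python
import numpy as np

# Sketch: explicit finite differences for
#   v_t = (1 - v^2) [ eps^2 * gamma * v_xx + gamma * v - atanh(v) ]
# on [-1, 1] with homogeneous Neumann boundary conditions.
eps, gamma = 0.05, 2.0
N = 201
x = np.linspace(-1.0, 1.0, N)
h = x[1] - x[0]
v = 0.5 * np.tanh(3.0 * x)           # odd, monotone increasing initial data

dt = 0.2 * h**2 / (eps**2 * gamma)   # well under the diffusive stability limit
steps = int(4.0 / dt)                # integrate to t = 4
for _ in range(steps):
    vxx = np.empty_like(v)
    vxx[1:-1] = (v[2:] - 2*v[1:-1] + v[:-2]) / h**2
    vxx[0]  = 2*(v[1] - v[0]) / h**2     # Neumann: v_x = 0 at the ends
    vxx[-1] = 2*(v[-2] - v[-1]) / h**2
    v = v + dt * (1 - v**2) * (eps**2 * gamma * vxx + gamma * v - np.arctanh(v))
    v = np.clip(v, -0.999, 0.999)        # guard against atanh blow-up

# v_s for gamma = 2: positive root of 2v = atanh(v)
v_s = 0.5
for _ in range(100):
    v_s = np.tanh(2.0 * v_s)

print(v[x > 0.5].min(), v[x < -0.5].max())
```

Away from x = 0 the solution flattens toward ±v_s, while the slope at x = 0 grows until the layer equilibrates, exactly as the analysis above predicts.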

Another approach is to attempt to obtain a time-independent bound. In the transition layers, the reaction term in (6.1) drives the magnitude of v and its derivatives up, but eventually the small diffusion term dampens this growth. So there must be an upper bound, even in the transition layers, which holds for all time. However, such bounds must depend on ε. For example, let us look at equation (6.1) in the particular form

$$ v_t = (1 - v^2)\left[\varepsilon^2\gamma\, v_{xx} + \gamma v - \tanh^{-1}(v)\right]. \qquad (6.5) $$

This is similar to

$$ v_t = \varepsilon^2\gamma\, v_{xx} + \gamma v - \tanh^{-1}(v), \qquad (6.6) $$

which is an example of what is variously known as the Allen-Cahn equation or the Ginzburg-Landau equation. This type of equation has been studied by Carr and Pego [8-10], as well as Reyna and Ward [39] and Ward [45], for example, and it is clear that (6.5) and (6.6) have the same energy functional and the same equilibria. It is well known that equations of this type develop transition layers between positive and negative regions which are metastable in general (see section 8). The transition layers are approximated by the equilibrium solution which is monotonically increasing (or decreasing) in x (namely the heteroclinic orbit between the positive and negative solutions to γv = tanh⁻¹(v), −v_s and v_s). The norms of derivatives of solutions will be dominated by these transition layers, since elsewhere the solutions become quite flat. Thus, we expect that the norms of the transition layer in the equilibrium state will control the norms of all solutions to (6.5) and (6.6).
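The metastability of these layers is easy to observe numerically. The sketch below evolves (6.6) from a two-layer profile; the parameter values, the initial profile, and the Neumann boundary conditions are illustrative assumptions.

```python
import numpy as np

# Sketch: equation (6.6), v_t = eps^2*gamma*v_xx + gamma*v - atanh(v),
# develops transition layers that then persist for a long time (metastability).
eps, gamma = 0.05, 2.0
N = 201
x = np.linspace(-1.0, 1.0, N)
h = x[1] - x[0]
# two transition layers, near x = -0.5 and x = +0.5
v = 0.9 * (np.tanh((x + 0.5) / 0.1) - np.tanh((x - 0.5) / 0.1) - 1.0)

def step(v, dt):
    vxx = np.empty_like(v)
    vxx[1:-1] = (v[2:] - 2*v[1:-1] + v[:-2]) / h**2
    vxx[0]  = 2*(v[1] - v[0]) / h**2       # Neumann boundary conditions
    vxx[-1] = 2*(v[-2] - v[-1]) / h**2
    v = v + dt * (eps**2 * gamma * vxx + gamma * v - np.arctanh(v))
    return np.clip(v, -0.999, 0.999)       # guard against atanh blow-up

dt = 0.2 * h**2 / (eps**2 * gamma)
for _ in range(int(20.0 / dt)):            # integrate to t = 20
    v = step(v, dt)

# zero crossings: the two layers should still sit near -0.5 and +0.5
crossings = x[:-1][np.sign(v[:-1]) != np.sign(v[1:])]
print(crossings)
```

The plateaus saturate at ±v_s almost immediately, while the two layers barely move over the whole run, consistent with the metastability discussed in section 8.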

Consider this monotonic equilibrium, v(x), solving

$$ \varepsilon^2\gamma\, v_{xx} + \gamma v - \tanh^{-1}(v) = 0. \qquad (6.7) $$

Clearly v itself is bounded, with ‖v‖∞ = v_s. Also, v takes on all values in (−v_s, v_s) for x ∈ ℝ¹, so

$$ \|v_{xx}\|_\infty = \frac{1}{\varepsilon^2} \sup_{|v| < v_s} \left| \frac{1}{\gamma}\tanh^{-1}(v) - v \right|. $$

We are interested in what happens as ε → 0 and γ → ∞. It is not difficult to show that

$$ \|v_x\|_\infty \to \infty \quad \text{as } \varepsilon \to 0, $$

and that the rate of growth is O(ε⁻¹). In the limit as ε → 0, this equilibrium approaches the discontinuous function

$$ v(x) = \begin{cases} -v_s & \text{if } x < 0, \\ \phantom{-}v_s & \text{if } x > 0. \end{cases} $$

Thus, ‖v_x‖∞ must also grow without bound as ε → 0.

This example shows that L∞ norms of derivatives of solutions cannot in general be bounded independently of ε and t. The possibility remains that bounds exist in Lᵖ for p < ∞. Then we might hope for Theorem 2 to hold globally in space if L∞ norms were replaced by Lᵖ norms. This appears not to be the case, as we see below.

We examine the L¹ norms of derivatives of the solution to (6.6) (the transition layer) which is increasing and centred at 0, on an interval (−a, a). Note that if an L¹ norm is not bounded, neither is any Lᵖ norm for 1 < p < ∞. First,

$$ \|v_x\|_1 = \int_{-a}^{a} v_x\, dx = v(a) - v(-a) = 2v(a), $$

which is constant, independent of ε and γ. However,

$$ \|v_{xx}\|_1 = \int_{-a}^{a} |v_{xx}|\, dx = -2\int_0^a v_{xx}\, dx = 2[v_x(0) - v_x(a)]. $$

As ε → 0, v_x(a) → 0 and v_x(0) → ∞, as shown above. The order of growth of v_x(0) is O(ε⁻¹), shown as follows. In (6.7) let

$$ f(v) = \frac{1}{\gamma}\tanh^{-1}(v) - v, $$

so that the equation becomes

$$ \varepsilon^2 v_{xx}(x) = f(v(x)). $$

Let x = εy and define w(y) = v(εy) = v(x). Then

$$ w_y(y) = \varepsilon\, v_x(\varepsilon y), \qquad w_{yy}(y) = \varepsilon^2\, v_{xx}(\varepsilon y), $$

and so

$$ w_{yy}(y) = f(v(\varepsilon y)) = f(w(y)). $$

This implies that w is independent of ε (γ is not critical here).

Since we know that v_x is positive and takes its supremum at 0, letting x = εy we have

$$ \sup_x v_x = v_x(0) = \frac{1}{\varepsilon}\, w_y(0) = \frac{c}{\varepsilon} $$

for some c > 0, since w is independent of ε.
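The rescaling argument can be verified numerically. Since w_yy = f(w), a first integral gives ½w_y² = F(w) − F(v_s) with F′ = f, so the ε-independent slope is c = w_y(0) = √(−2F(v_s)), and v_x(0) = c/ε. The sketch below checks this for the illustrative value γ = 2 (an assumption, matching the figure captions); the RK4 integrator is an implementation choice.

```python
import math

# Sketch: in the rescaled variable y = x/eps the layer profile satisfies
# w_yy = f(w), f(w) = (1/gamma)*atanh(w) - w.  The first integral gives
# w_y(0) = sqrt(-2*F(v_s)) =: c, independent of eps, so v_x(0) = c/eps.
gamma = 2.0

def f(w):
    return math.atanh(w) / gamma - w

def F(w):
    # antiderivative of f with F(0) = 0
    return (w * math.atanh(w) + 0.5 * math.log(1 - w * w)) / gamma - 0.5 * w * w

# v_s: positive root of gamma*v = atanh(v)
v_s = 0.5
for _ in range(200):
    v_s = math.tanh(gamma * v_s)

c = math.sqrt(-2.0 * F(v_s))   # eps-independent layer slope w_y(0)

# Integrate w_yy = f(w) from (w, w_y) = (0, c) by RK4; the trajectory should
# follow the heteroclinic orbit and approach v_s.
def rk4_step(w, p, dy):
    def deriv(w, p):
        return p, f(w)
    k1w, k1p = deriv(w, p)
    k2w, k2p = deriv(w + 0.5*dy*k1w, p + 0.5*dy*k1p)
    k3w, k3p = deriv(w + 0.5*dy*k2w, p + 0.5*dy*k2p)
    k4w, k4p = deriv(w + dy*k3w, p + dy*k3p)
    return (w + dy*(k1w + 2*k2w + 2*k3w + k4w)/6,
            p + dy*(k1p + 2*k2p + 2*k3p + k4p)/6)

w, p = 0.0, c
dy = 0.001
for _ in range(5000):   # integrate to y = 5
    w, p = rk4_step(w, p, dy)

print(c, w, v_s)
```

Because c does not depend on ε, sup v_x = c/ε exactly reproduces the O(ε⁻¹) growth claimed above.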


We have already shown that ‖v_xx‖₁ = 2[v_x(0) − v_x(a)] and v_x(a) → 0 as ε → 0, so ‖v_xx‖₁ = O(1/ε) also. Similarly, denoting by b the value of x where v_xx (or f(v(x))) takes its minimum,

$$ \|v_{xxx}\|_1 = \int_{-a}^{a} |v_{xxx}|\, dx = 2v_{xx}(a) - 4v_{xx}(b) = 2v_{xx}(a) + 4\|v_{xx}\|_\infty. $$

Since ‖v_xx‖∞ = O(ε⁻²), so is ‖v_xxx‖₁.

The approximation error for the operator in Cottet's equation is O(ε^r + ε⁻²γ^{−1/2}) if the solution is bounded in W^{m,∞} by initial conditions. If not, and we try to use the above time-independent bounds on the solution, we lose three orders of ε by demanding that three derivatives of v be in L∞ (and in general m orders of ε for v ∈ W^{m,∞}). Recall that we need m ≥ r + 2 and r ≥ 1, so even with m = r + 2 the approximation becomes ε^{−m} O(ε^r + ε⁻²γ^{−1/2}) = O(ε⁻² + ε^{−2−m}γ^{−1/2}) and convergence is lost. Even if we could do the approximations in W^{m,1}, we only save one order of ε and convergence is still lost.

Thus, in the transition layers, time-independent bounds depend too strongly on ε, and energy estimates only give us a bound if γt is bounded, whereas for convergence we require ε → 0 and γ → ∞. Nevertheless, away from transition layers, we expect condition (5.5) to hold and therefore Theorem 2 as well. Moreover, numerical work appears to confirm that the approximation does hold reasonably well.

We speculate that the approximation is actually quite good as far as is important for the application to neural networks. The approximation holds in regions where the solution's derivatives do not grow, i.e. away from transition layers. Typically, we know that transition layers form quickly and then move only extremely slowly for a long time. If the transition layers are slightly offset by the approximation (a shift of order ε, say), it is not of critical importance for the neural network dynamics; positive and negative regions are still virtually the same. The transition layers which form in the evolution of the reaction-diffusion equation should thus closely model the corresponding neural network dynamics despite the lack of rigorous convergence in the entire domain. A rigorous analysis of this question would require properly accounting for boundary conditions. After a long period of time, when the layers start to collapse, we might no longer expect to have a close approximation, of course.

Since the transition layers become sharper as ε → 0 and γ → ∞, it is possible that the approximation converges in a weaker sense such as, for example,

$$ \mathrm{meas}\{x : \exists\, M,\ |M| \le m, \text{ such that } |D^M u(x, t)| \ge c\} \to 0 \quad \text{as } \varepsilon \to 0. $$

Finally, if we consider equation (5.2) in its general form, the above conclusions should be substantially the same. The presence of the function μ in (5.2) means that the solution will not necessarily become flat away from transitions, but it should still be controlled there by the norm of μ in W^{m,∞}(ℝⁿ × ℝⁿ).

To summarize, we expect that the reaction-diffusion equation will be good at defining regions of high and low activity in the network and at approximating the activity in these regions, but may not accurately model the structure of the transition layers themselves.


7. Discussion

Approximating a system of many ordinary differential equations (enormous systems for real biological brains) by a single partial differential equation has the advantages of allowing us to study a simpler model (even though we have gone from a finite-dimensional to an infinite-dimensional system) and of allowing us to bring a different body of theory to bear on the problem. It is similar in spirit to the approximation of gas dynamics by statistical equations and thermodynamics, where it is too difficult to keep track of vast numbers of individual particles. It has been argued (by Skarda and Freeman [41, p. 190], for instance) that the level of averaged neural activity is more significant for behaviour than the level of the individual neurons. It is possible that there is something crucial about the behaviour of neural nets at the level of individual neurons, at least for some purposes, but this itself may become clear from a study of their collective behaviour with an understanding of the relationship between the collective and individual levels.

What do the restrictions on the function T_ε in Theorem 2 mean? Also, what exactly does the approximation mean when, as we let ε get smaller and γ get larger (ε → 0, γ → ∞), the parameters of the PDE change and also the connection matrix of the Hopfield net changes? In fact, we cannot decide for a Hopfield net with a specific connection matrix whether it is a good approximation to a PDE. However, Theorem 2 does give a general idea of what the matrices for nets approximating PDEs will be like.

We must have ε small, h small and ε^{m+1}/h^m large. We also want η ∈ L¹, so we could take, for example, η with compact support, supp(η) = [−1, 1]ⁿ, scaled as η_ε(x) = ε⁻ⁿ η(x/ε), so that the cut-off function η_ε has supp(η_ε) = [−ε, ε]ⁿ. Thus, T_ε(x, y) and its derivatives effectively sample μ near its 'diagonal', i.e. near (x, x). We can think of this as a 'visible window' of size 2ε in each direction around each neuron, such that other neurons within its window are connected to it but neurons outside its window are not. The spacing between the neurons is determined by the step size used in the quadrature, h, so the number of neurons in a visible window will be of order (2ε/h)ⁿ.

In the case of one space dimension (n = 1), the connection matrix T will be large (h small), and will have a band about the diagonal (the visible window) which is wide (ε/h large) but narrow in relation to all of T (ε small). Also, the entries of the matrix will not fluctuate wildly; an entry will not differ too radically from neighbouring entries (as a result of the smoothness requirements on η and μ). Finally, in order to obtain a well-posed PDE, we require T_ε to be predominantly positive (τ̂ > 0). Note that specific connections, T_ij, may still be negative.
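Such a matrix is easy to construct explicitly. The sketch below builds T in the Cottet case μ ≡ 1 on a one-dimensional grid; the particular bump η (an Epanechnikov kernel normalized to integral 1) and the values of ε and h are illustrative assumptions.

```python
import numpy as np

# Sketch of the connection matrices described above, for n = 1 and mu = 1:
# T_ij = h * eta_eps(x_i - x_j) on a grid of spacing h, with
# eta_eps(u) = (1/eps) * eta(u/eps) and eta a nonnegative bump on [-1, 1].
eps, h = 0.1, 0.01
x = np.arange(0.0, 1.0 + h / 2, h)
N = len(x)

def eta(u):
    # Epanechnikov bump: supported on [-1, 1], integral 1 (an assumption)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def eta_eps(u):
    return eta(u / eps) / eps

T = h * eta_eps(x[:, None] - x[None, :])   # N x N connection matrix

# Band structure: each neuron sees only neighbours within distance eps,
# i.e. about 2*eps/h neurons; interior row sums approximate the zeroth
# moment of eta (tau_0 = 1 with this normalization).
row = T[N // 2]
window = np.count_nonzero(row)
row_sum = row.sum()

print(N, window, row_sum)
```

The matrix is banded (about 2ε/h = 20 nonzeros per row here), slowly varying along diagonals, and predominantly positive, matching the description above.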


This gives a degree of classification of Hopfield nets, some having PDE-like behaviour, others not. The restriction to banded matrices with wide bands is reasonable from the biological point of view since, typically, a neuron will be connected to many other neurons, but mainly to those in its vicinity.

Note that in the PDE obtained from the network, equation (5.2), the connection matrix has been 'squeezed' into the diagonal of μ, namely μ(x, x) itself and the first and second y-derivatives of μ(x, y) evaluated at (x, x), and into the parameters τ₀ and τ̂. In this sense, at least, the PDE-like nets are simpler than general ones. In the case studied by Cottet (μ ≡ 1), all the information of the connection matrix is squeezed into the two parameters, τ₀ and τ̂. In one space dimension, the matrix T has constant diagonals in Cottet's case (this is the translation invariance).

Another level of classification is obtained by this method. Any Hopfield net whose solutions closely approximate those of a PDE also closely approximate solutions to standard discretizations of the PDE, as long as appropriate discretizations are used. These standard discretizations are, of course, also Hopfield nets of a very simple type, so that all Hopfield nets which have the same PDE approximation form a class with similar behaviour and there is at least one very simple representative of each class.
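A net of this simple representative type can be simulated directly and exhibits the PDE-like behaviour described in section 6: plateaus of high and low activity separated by a transition layer. The sketch below integrates equations of the form (1.1) with a banded translation-invariant matrix; the sigmoid g, its gain, α = 1, and all grid parameters are illustrative assumptions.

```python
import numpy as np

# Sketch: a Hopfield net u' = -alpha*u + T g(u) with a banded, slowly
# varying connection matrix of the kind just described.
alpha, eps, h = 1.0, 0.1, 0.01
x = np.arange(-1.0, 1.0 + h / 2, h)
N = len(x)

def eta_eps(u):
    w = u / eps
    return np.where(np.abs(w) <= 1.0, 0.75 * (1.0 - w**2), 0.0) / eps

T = h * eta_eps(x[:, None] - x[None, :])   # interior row sums ~ 1

def g(u):
    return np.tanh(2.0 * u)   # sigmoid with gain 2 (an assumption)

u = 0.2 * np.tanh(5.0 * x)    # odd initial data: negative left, positive right
dt = 0.05
for _ in range(600):          # explicit Euler to t = 30
    u = u + dt * (-alpha * u + T @ g(u))

# Saturated plateaus near +/- u*, where u* = tanh(2*u*) (u* ~ 0.957),
# separated by a transition layer near x = 0; we check away from the
# boundary, where the truncated kernel weakens the coupling.
print(u[(x > 0.3) & (x < 0.8)].min(), u[(x > -0.8) & (x < -0.3)].max())
```

The network settles into the same high/low activity pattern as the reaction-diffusion equation, illustrating why all nets in one approximation class behave alike.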

Learning in the conventional Hopfield net consists in setting or modifying the connection matrix, T. There is still some room for making analogous modifications directly in the PDE formulation, by altering μ(x, x), the y-derivatives of μ at (x, x) and the parameters τ̂ and τ₀, though the range of possibilities for modification may be smaller. In Cottet's simpler case only the two parameters τ₀ and τ̂ are available for modification.

8. Behaviour of the reaction-diffusion equation

In the case studied by Cottet (μ ≡ 1), there is an energy functional corresponding to that for the Hopfield net. Assuming appropriate boundary conditions, we express equation (5.4) as

$$ v_t = \frac{\gamma}{G'(v)}\left[\frac{\varepsilon^2\hat\tau}{2}\,\Delta v - f(v)\right], \qquad (8.1) $$

where f(v) = αG(v)/γ − τ₀v, and we require τ₀ > αG′(0)/γ = αβ/γ. In one space dimension, for example, we have the energy functional

$$ E[v] = \int \left[\frac{\varepsilon^2\hat\tau}{4}\, v_x^2 + F(v)\right] dx, $$

where F′(v) = f(v). It is easy to show that

$$ \frac{d}{dt}E[v] = -\int \frac{G'(v)}{\gamma}\, v_t^2\, dx \le 0. $$

This energy functional can be modified in the obvious way if a threshold function and an external input function are included. The existence of this energy functional is not surprising, since Cottet's connection matrices are symmetric. However, it can be generalized for equation (5.2), still in one spatial dimension, as long as μ_yy(x, x) = μ_xx(x, x) and μ_y(x, x) = μ_x(x, x). Interestingly, since this is a weaker


condition than symmetry (μ(x, y) = μ(y, x)), and since also η need not be even as long as it satisfies the moment conditions, we have an energy functional which applies to a wider class of connection functions T_ε than the symmetric ones.
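The Lyapunov property of the energy functional in the case μ ≡ 1 can be checked numerically. The sketch below is a minimal verification, assuming G(v) = tanh⁻¹(v) (so γ/G′(v) = γ(1 − v²)), α = τ₀ = 1, τ̂ = 2, and illustrative values of γ, ε, the initial data, and Neumann boundary conditions.

```python
import numpy as np

# Sketch: check numerically that the energy decreases along solutions of the
# mu = 1 equation v_t = gamma*(1 - v^2)*[eps^2*v_xx - f(v)], with
# f(v) = atanh(v)/gamma - v and energy density eps^2/2 * v_x^2 + F(v).
eps, gamma = 0.05, 2.0
N = 101
x = np.linspace(-1.0, 1.0, N)
h = x[1] - x[0]

def f(v):
    return np.arctanh(v) / gamma - v

def F(v):  # antiderivative of f with F(0) = 0
    return (v * np.arctanh(v) + 0.5 * np.log(1 - v**2)) / gamma - 0.5 * v**2

def energy(v):
    vx = (v[1:] - v[:-1]) / h
    return 0.5 * eps**2 * np.sum(vx**2) * h + np.sum(F(v)) * h

v = 0.4 * np.sin(2 * np.pi * x) + 0.2 * np.cos(np.pi * x)
dt = 0.25 * h**2 / (eps**2 * gamma)
energies = [energy(v)]
for step in range(2000):
    vxx = np.empty_like(v)
    vxx[1:-1] = (v[2:] - 2*v[1:-1] + v[:-2]) / h**2
    vxx[0]  = 2*(v[1] - v[0]) / h**2       # Neumann boundary conditions
    vxx[-1] = 2*(v[-2] - v[-1]) / h**2
    v = v + dt * gamma * (1 - v**2) * (eps**2 * vxx - f(v))
    v = np.clip(v, -0.999, 0.999)
    if step % 100 == 99:
        energies.append(energy(v))

diffs = np.diff(energies)
print(energies[0], energies[-1], diffs.max())
```

The recorded energies are monotonically non-increasing, consistent with the dissipation identity above; the factor γ/G′(v) rescales time along trajectories without changing this Lyapunov structure.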

Note that if τ₀ < αβ/γ, the gain in equation (8.1) is too low and all solutions collapse immediately to zero. (This is not true, of course, when threshold and input terms are present.) Also, if τ̂ < 0, the diffusion operator is negative and the reaction-diffusion equation is ill-posed, as mentioned in the general case above. In Cottet's case, T is essentially just η, so this says that the neural connections should be predominantly excitatory for the reaction-diffusion equation to be well-posed.

Equation (8.1) is similar to the Ginzburg-Landau equation or Allen-Cahn equation [2] (see also [7], which deals with a different discrete form of this equation). The factor γ/G′(v) does not alter the dynamics significantly, as the energy functional is the same (recall that G′(v) ≥ β for all v ∈ (−1, 1)). In the one-dimensional case, an equation of this type has been studied by Carr and Pego [8-10], Reyna and Ward [39] and Ward [45]:

$$ v_t = \varepsilon^2 v_{xx} - f(v), $$

where f has the same form as above. They show that solutions typically develop transition layers which are metastable, eventually collapsing to a constant state. It is known that no stable patterns exist for this equation with Neumann or periodic boundary conditions [26]. Numerical evidence suggests that Cottet's equation behaves similarly. The image processing application suggested by Cottet for this type of equation depends on this development of transition layers: diffusion smooths out noise and the reaction term enhances contrast [12, 13].

It is possible that some of the metastable solutions of the reaction-diffusion equation correspond to stable states of corresponding Hopfield nets (recall that the approximation is only good for a fixed time interval). In fact there are PDEs of essentially the same type (therefore having no stable patterns) whose discretizations (by standard finite difference methods) do have stable patterns [16].

The generalized equation (5.2) has apparently not been studied. It should retain features of the specific equation in which μ ≡ 1, but the variation in μ must have a modulating effect. Analysis and numerical experimentation are required.

9. Conclusions

Reaction-diffusion equations of the type obtained in this paper give us approximations to the behaviour of Hopfield net equations with connection matrices of a certain form. Study of the reaction-diffusion equations can shed some light on the dynamics of such networks. Also, the conditions of the theorem giving the approximation give a rough classification of connection matrices: those which approximate the PDEs and those which do not. Those which are approximations to PDEs are also approximations to the discrete systems obtained by standard discretizations of the PDEs.

The theorems also provide a semi-rigorous analysis of the relationship between continuous and discrete space models and suggest what form of connection matrix is more likely to produce complex behaviour. Specifically, if we want a Hopfield net with more complex behaviour than the PDE-like ones, we should look at connection matrices which either vary wildly in nearby entries or perhaps deviate significantly from the banded structure. In fact, the ill-posedness of the PDE in the case of negative moments points to interesting and complex behaviour.

Acknowledgement

This work was supported by a Postgraduate Scholarship as well as research grants A-7847 and A-8965, all from the Natural Sciences and Engineering Research Council of Canada. I thank Dr. Reinhard Illner for his helpful comments on several drafts of this paper and for his inspiration during the course of this research.

References

1. Adams, R. A., Sobolev Spaces, Academic Press, New York, 1975.
2. Allen, S. M. and Cahn, J. W., 'A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening', Acta Metallurgica, 27, 1085-1095 (1979).
3. Amari, S.-i., 'Mathematical theory of self-organization in neural nets', in: Organization of Neural Networks: Structures and Models (W. von Seelen, G. Shaw and U. M. Leinhos, eds.), VCH Publishers, New York, 1988, pp. 399-413.
4. Amari, S.-i., 'Dynamical stability of formation of cortical maps', in: Dynamic Interactions in Neural Networks: Models and Data (M. A. Arbib and S.-i. Amari, eds.), Springer, Berlin, New York, 1989, pp. 15-34.
5. Amari, S.-i., Yoshida, K. and Kanatani, K.-i., 'A mathematical foundation for statistical neurodynamics', SIAM J. Appl. Math., 33, 95-126 (1977).
6. Amit, D. J., Modeling Brain Function: The World of Attractor Neural Networks, Cambridge University Press, Cambridge, 1989.
7. Cahn, J. W., Chow, S.-N. and Van Vleck, E. S., 'Spatially discrete nonlinear diffusion equations', Rocky Mountain J. Math., 25, 87-118 (1995).
8. Carr, J. and Pego, R. L., 'Very slow phase separation in one dimension', in: PDEs and Continuum Models of Phase Transitions, Lecture Notes in Physics 344 (M. Rascle et al., eds.), Springer, Berlin, New York, 1989, pp. 216-226.
9. Carr, J. and Pego, R. L., 'Metastable patterns in solutions of u_t = ε²u_xx − f(u)', Comm. Pure Appl. Math., 42, 523-576 (1989).
10. Carr, J. and Pego, R., 'Invariant manifolds for metastable patterns in u_t = ε²u_xx − f(u)', Proc. Roy. Soc. Edinburgh, Sect. A, 116, 133-160 (1990).
11. Coolen, A. C. C. and Ruijgrok, Th. W., 'Image evolution in Hopfield networks', Phys. Rev. A, 38, 4253-4255 (1988).
12. Cottet, G.-H., 'Modèles de réaction-diffusion pour des réseaux de neurones stochastiques et déterministes', C. R. Acad. Sci. Paris, 312-I, 217-221 (1991).
13. Cottet, G.-H., 'Diffusion approximation on neural networks and applications for image processing', in: Proc. 6th European Conf. on Mathematics in Industry (F. Hodnett, ed.), B. G. Teubner, Stuttgart, 1992.
14. Degond, P. and Mas-Gallic, S., 'The weighted particle method for convection-diffusion equations. Part 1: The case of an isotropic viscosity', Math. Comput., 53, 485-507 (1989).
15. Denker, J. S., 'Neural network models of learning and adaptation', Phys. D, 22, 216-232 (1986).
16. Edwards, R., A system of nonlinear differential equations arising from the dynamic Ising model, MSc thesis, Heriot-Watt University, Edinburgh, 1990.
17. Ermentrout, G. B., 'Asymptotic behaviour of stationary homogeneous neuronal nets', in: Competition and Cooperation in Neural Nets, Lecture Notes in Biomathematics, Vol. 45 (S.-i. Amari and M. A. Arbib, eds.), Springer, Berlin, New York, 1982.
18. Ermentrout, G. B. and Cowan, J. D., 'Temporal oscillations in neuronal nets', J. Math. Biology, 7, 265-280 (1979).
19. Ermentrout, G. B. and Cowan, J. D., 'Large scale spatially organized activity in neural nets', SIAM J. Appl. Math., 38, 1-21 (1980).
20. Ermentrout, G. B. and Cowan, J. D., 'Secondary bifurcation in neuronal nets', SIAM J. Appl. Math., 39, 323-340 (1980).
21. Evans, N. W., Illner, R. and Kwan, H. C., 'On information-processing abilities of chaotic dynamical systems', J. Statist. Phys., 66, 549-561 (1992).
22. Fife, P. C., 'Sigmoidal systems and layer analysis', in: Competition and Cooperation in Neural Nets, Lecture Notes in Biomathematics, Vol. 45 (S.-i. Amari and M. A. Arbib, eds.), Springer, Berlin, New York, 1982.
23. Freeman, W. J., Mass Action in the Nervous System, Academic Press, New York, 1975.
24. Grossberg, S., 'Nonlinear neural networks: principles, mechanisms, and architectures', Neural Networks, 1, 17-61 (1988).
25. Heiden, U. an der, Analysis of Neural Networks, Lecture Notes in Biomathematics, Vol. 35, Springer, Berlin, New York, 1980.
26. Henry, D., Geometric Theory of Semilinear Parabolic Equations, Springer, Berlin, New York, 1982.
27. Hopfield, J. J., 'Neural networks and physical systems with emergent collective computational abilities', Proc. Nat. Acad. Sci. U.S.A., 79, 2554-2558 (1982).
28. Hopfield, J. J., 'Neurons with graded response have collective computational properties like those of two-state neurons', Proc. Nat. Acad. Sci. U.S.A., 81, 3088-3092 (1984).
29. Kwan, H. C., Network relaxation as behavioral action, Research in Biological and Computational Vision, University of Toronto Technical Report No. RBCV-TR-88-26, 1988.
30. Ladyženskaja, O. A., Solonnikov, V. A. and Ural'ceva, N. N., Linear and Quasi-linear Equations of Parabolic Type, American Mathematical Society, Providence, RI, 1968.
31. Lippman, R. P., 'An introduction to computing with neural nets', IEEE ASSP Magazine, 4, 4-22 (1987).
32. Michel, A. N. and Farrell, J. A., 'Associative memories via artificial neural networks', IEEE Control Systems Magazine, 10, 6-17 (1990).
33. Mimura, M., 'Aggregation and segregation phenomena in reaction-diffusion equations', in: Competition and Cooperation in Neural Nets, Lecture Notes in Biomathematics, Vol. 45 (S.-i. Amari and M. A. Arbib, eds.), Springer, Berlin, New York, 1982.
34. Murray, J. D., Mathematical Biology, Springer, Berlin, New York, 1989.
35. Parisi, G., 'Asymmetric neural networks and the process of learning', J. Phys. A, 19, L675-L680 (1986).
36. Raviart, P. A., 'An analysis of particle methods', in: Numerical Methods in Fluid Dynamics (F. Brezzi, ed.), Springer, Berlin, New York, 1985, pp. 243-324.
37. Renals, S., 'Chaos in neural networks', in: Neural Networks: EURASIP Workshop 1990, Sesimbra, Portugal, February 15-17, 1990, Proc. (L. B. Almeida and C. J. Wellekens, eds.), Springer, Berlin, New York, 1990, pp. 90-99.
38. Renals, S. and Rohwer, R., 'A study of network dynamics', J. Statist. Phys., 58, 825-848 (1990).
39. Reyna, L. G. and Ward, M. J., 'Resolving weak internal layer interactions for the Ginzburg-Landau equation', Eur. J. Appl. Math., 5, 495-523 (1994).
40. Ruijgrok, Th. W. and Coolen, A. C. C., 'Generalized Hopfield models for neural networks', Acta Physica Polonica B, 21, 379-389 (1990).
41. Skarda, C. A. and Freeman, W. J., 'How brains make chaos in order to make sense of the world', Behavioural Brain Sci., 10, 161-195 (1987).
42. Sompolinsky, H., Crisanti, A. and Sommers, H. J., 'Chaos in random neural networks', Phys. Rev. Lett., 61, 259-262 (1988).
43. Tirozzi, B. and Tsodyks, M., 'Chaos in highly diluted neural networks', Europhys. Lett., 14, 727-732 (1991).
44. Toulouse, G., 'Perspectives on neural network models and their relevance to neurobiology', J. Phys. A, 22, 1959-1968 (1989).
45. Ward, M. J., 'Metastable patterns, layer collapses, and coarsening for a one-dimensional Ginzburg-Landau equation', Stud. Appl. Math., 91, 51-93 (1994).