

AN ALL-MOS ANALOG FEEDFORWARD NEURAL CIRCUIT WITH LEARNING

Fathi M.A. Salam and Myung-Ryul Choi

Systems and Circuits & Artificial Neural Nets Laboratories

Department of Electrical Engineering
Michigan State University

East Lansing, MI 48824

ABSTRACT

We describe an all-MOS circuit realization for a feedforward artificial neural network. We also introduce an all-MOS realization of a modified learning rule. In addition to analytical verification, the modified learning rule has been shown, via computer code as well as SPICE simulations, to successfully store into the network any given analog values (within the permissible range). An all-MOS architecture for a prototype two-layer artificial neural network has been specifically tested via SPICE simulations. The results demonstrate the learning capability of the all-MOS circuit realization and establish a VLSI modular architecture for composing a large-scale neural network system.

I. INTRODUCTION

Implementation in large scale is the vehicle through which artificial neural networks (ANNs) would reveal their computational powers. It is thus crucial that mutual accommodation is achieved between the medium of implementation and the proposed artificial neural network (ANN) models. Only through such an accommodation would the full advantage of the powers of ANNs be realized.

The main obstacle in the VLSI electronic implementation of neural models has been the monolithic implementation of the variable, changeable linear resistive element(s). In [1], a solution to this obstacle was proposed by introducing all-MOS four-quadrant vector multipliers, where a single all-MOS vector multiplier circuit executes the product of a vector of (output) signals and their corresponding weights. In [2], all-MOS feedforward and feedback ANN architectures employing the four-quadrant scalar multipliers have been introduced. SPICE simulation results of prototype ANN networks have demonstrated the successful operation of these all-MOS ANN architectures.

In this work, we introduce a modified learning rule for feedforward ANNs that requires less computation and that is suitable for analog circuit implementation employing all-MOS four-quadrant multiplier modules as building blocks. We then present SPICE simulations for an all-MOS feedforward prototype network with an all-MOS implementation of the modified learning rule. We demonstrate the functionality of this prototype network in learning a given pattern via SPICE simulation. It is important to point out that the SPICE simulation converges on the order of nanoseconds in learning each pattern. This order of magnitude represents the time constant(s) of the overall dynamics of the circuit implementation.

In the following sections, we briefly describe the basic mathematical model for feedforward ANNs and the mathematical model for the so-called delta-rule or error back-propagation. Then we describe the learning algorithm based on the theory of continuous-time gradient dynamical systems. Moreover, we introduce a modified version of the algorithm which reduces computation and eliminates the need for "computing" some derivatives of the sigmoidal neuron functions.

Acknowledgment: This work is supported in part by ONR Grant N00014-89-5-1833, the Michigan Research Excellence Fund (REF), and NSF Grant ECS-8814027.

We emphasize the differential equations form of the learning algorithm for basically two reasons: (i) the theory and construction of gradient dynamical systems in the differential equations form is well established; this feature simplifies the construction of mathematical models and it guarantees the global convergence (of all initial conditions) to only equilibrium points (for bounded-below energy functions) [5,6], and (ii) this form is naturally suitable for analog circuit (silicon VLSI) implementation, which is believed to be a natural medium for ANN implementation [4].

These two features enhance the potential for large-scale implementation with guaranteed global convergence properties to equilibria. The (global) convergence speed is independent of the number of neuron units in the network and is only governed by the circuit time-constants. That is, scalability to large networks would not cause any instability or convergence problems.

II. A MODIFIED LEARNING ALGORITHM FOR FEEDFORWARD ANNS

II.1 Feedforward Neural Networks

Consider the basic structure for multilayer feedforward ANNs, see [7], where outputs of one layer are weighted and summed as an input to a neuron in the next layer. This process begins with the first layer, which is called the input layer, and ends with the last layer, which is called the output layer. Any layer between the input and the output layers is often referred to as a hidden layer. This simple, regular, repetitive structure constitutes the feedforward ANN. The input layer is optional and its purpose is to uniformly squash each component of the real-world data via the sigmoidal neuron element. The governing static equation for each neuron unit in any layer may be described as

$y_j = S_j(u_j), \qquad u_j = \sum_i W_{ji}\, y_i + \theta_j,$    (2.1)

where $y_i$ is the i-th output of a neuron unit in the previous layer (for the first layer, it is the i-th output, i.e., the squashed external input to the network), $W_{ji}$ is the weight (or strength) of the connection from the i-th output of the previous layer to the j-th neuron input, $\theta_j$ is a bias at the input of the j-th neuron, and $y_j$ is the output of the j-th neuron. $S_j(\cdot)$ is a nonlinear (sigmoid) function [7].
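For concreteness, the static map of eqn (2.1) can be written out in a few lines of code. The sketch below is illustrative only: the logistic sigmoid, the function name, and the example numbers are assumptions, since the paper does not prescribe a particular $S_j(\cdot)$.

```python
import numpy as np

def layer_output(W, y_prev, theta):
    """Static map of eqn (2.1): y_j = S_j( sum_i W_ji * y_prev_i + theta_j ).

    W      : weight matrix, W[j, i] connects output i of the previous layer
             to the input of neuron j in this layer
    y_prev : outputs of the previous layer (or the squashed external inputs)
    theta  : bias vector, one entry per neuron in this layer
    """
    u = W @ y_prev + theta            # net input u_j of each neuron
    return 1.0 / (1.0 + np.exp(-u))   # illustrative logistic choice for S_j

# Example: two inputs feeding a two-neuron layer.
W = np.array([[0.5, -0.5],
              [-0.5, 0.5]])
y = layer_output(W, np.array([1.0, 0.0]), np.zeros(2))
```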

The Widrow-Hoff rule and its extension, the Delta-rule [7], recursively update the weights in the network so that a given
set of desired input-output pairs (or patterns) is realized via the mapping (2.1). The Delta-rule has also been popularized as the (Error) Back-Propagation learning rule [7].

II.2 The Error Back-Propagation Rule

The error back-propagation rule attempts to minimize the squared output error for training sample input-output pairs (or patterns). For every training sample, it modifies each weight in proportion to (the negative of) the partial derivative of the squared error with respect to that weight. For each desired target pattern p, say $t_p = (t_{p1}, \ldots, t_{pn})$, define the squared error function as follows [7]:

$E_p = \frac{1}{2} \sum_{j=1}^{n} (t_{pj} - y_{pj})^2.$    (2.2)

The error back-propagation or delta rule adjusts weights in the local direction of greatest error reduction. Let

$W_{ji}(k+1) = W_{ji}(k) + \Delta_p W_{ji}(k),$    (2.3)

where $\Delta_p W_{ji}$ denotes the change in the weight $W_{ji}$ at the k-th iteration due to the input-output pattern p. The delta rule defines the change due to applying pattern p by

$\Delta_p W_{ji} = -\eta\, \frac{\partial E_p}{\partial W_{ji}},$    (2.4)

where $\eta > 0$ is the (learning) rate. Observe that $\eta$ is assumed to be sufficiently small in order for eqn. (2.4) to be a truly gradient system. When the gradient in (2.4) is computed, it results in the same form for all weights [7], namely,

$\Delta_p W_{ji} = \eta\, \delta_{pj}\, y_{pi},$    (2.5)

where the index p signifies that we are computing the effect of pattern p alone, $y_{pi}$ is the output of the previous layer, and $\delta_{pj}$ is a function of the error. $\delta_{pj}$ is given as follows:

(a) if unit j is a member of the (last) output layer, then

$\delta_{pj} = (t_{pj} - y_{pj})\, \frac{dS_j}{du_{pj}};$    (2.6i)

(b) if unit j is not a member of the (last) output layer, then

$\delta_{pj} = \frac{dS_j}{du_{pj}} \sum_k \delta_{pk}\, W_{kj},$    (2.6ii)

where k is the index for the units in the next layer to which unit j is connected.

Using the update eqn (2.5) and the appropriate equation for $\delta_{pj}$ as specified by eqn (2.6), the update law is recursively computed in discrete time.
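For later comparison with the continuous-time rules, the per-pattern discrete-time updates (2.4)-(2.6) can be sketched as follows for a network with a single hidden layer. The logistic sigmoid (whose derivative is y(1 - y)), the omission of the bias terms $\theta_j$, and the specific array shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def delta_rule_step(W1, W2, x, t, eta=0.1):
    """One per-pattern update of eqns (2.4)-(2.6) for a two-layer network.

    W1 : hidden-layer weights (n_hidden x n_in)
    W2 : output-layer weights (n_out x n_hidden)
    x  : input pattern, t : target pattern, eta : learning rate
    """
    # forward pass, eqn (2.1) without bias terms
    y1 = sigmoid(W1 @ x)
    y2 = sigmoid(W2 @ y1)

    # eqn (2.6i): output-layer delta uses the sigmoid derivative y(1 - y)
    d2 = (t - y2) * y2 * (1.0 - y2)
    # eqn (2.6ii): hidden-layer delta back-propagates through W2
    d1 = (W2.T @ d2) * y1 * (1.0 - y1)

    # eqn (2.5): Delta_p W_ji = eta * delta_pj * y_pi
    W2 = W2 + eta * np.outer(d2, y1)
    W1 = W1 + eta * np.outer(d1, x)
    return W1, W2
```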

Note that the total error function due to all patterns is

$E = \sum_p E_p.$    (2.7)

Thus the total update for every weight is the sum of the updates due to each pattern. That is,

$\Delta W_{ji} = \sum_p \Delta_p W_{ji} = \eta \sum_p \delta_{pj}\, y_{pi},$    (2.8)

where we used a common (learning) rate.

Remarks:

1. It should be emphasized (as stressed in [7], page 680) that the (learning) rate must be small for convergence and for the prevention of oscillations. However, the speed of convergence would suffer considerably as $\eta$ is taken to be very small.

2. The discrete form of updating the weights, eqn (2.5), is influenced to some extent by the assumed implementation of the algorithm as a software digital computer program. It should be noted, however, that such an implementation would result in exponential growth of computations for the weights of the hidden layers farthest from the output layer. This can be seen from the repetitive application of the recursive formula (2.6ii).

3. The computation of the update laws (2.5), as suggested in [7], is sequential. That is, first the computation of the update laws for the weights feeding into the last (output) layer is executed, then the computation for the update laws for the weights feeding into the layer before last, and so on. This method of computation violates the gradient descent theory, and consequently the characteristics of the gradient dynamic system are not necessarily preserved.

4. In [7], the patterns are presented to the network sequentially. That is, one pattern is presented, then the network executes until learning or updating is completed for this pattern. Then, another pattern is presented and updating starts off from the weights attained previously until convergence of the weights is achieved, and so on. It should be remarked that this procedure would not necessarily retain memory of all patterns. Theoretically, the network is guaranteed to retain only the last pattern for which the weights have converged to an acceptable local minimum of $E_p$.

II.3 A Continuous-Time Gradient Update Rule

The correct gradient dynamic update law should be defined as

$\dot{W}_{ji} = \eta \sum_p e_{pj}\, \frac{dS_j}{du_{pj}}\, y_{pi},$    (2.9i)

where $\dot{W}_{ji}$ is the time-rate of change of the weight $W_{ji}$ and where we used equations (2.6) with $e_{pj}$ given by:

(a) if unit j is a member of the (last) output layer, then

$e_{pj} = t_{pj} - y_{pj};$    (2.9ii)

(b) if unit j is not a member of the (last) output layer, then

$e_{pj} = \sum_k e_{pk}\, W_{kj},$    (2.9iii)

where k is the index for neuron units in the next layer to which unit j is connected.
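One crude way to examine the behaviour of the continuous-time law is to integrate $\dot{W}_{ji} = -\eta\, \partial E / \partial W_{ji}$ numerically, with all patterns contributing simultaneously as in (2.7). The forward-Euler discretization, the logistic sigmoid, and the omitted bias terms in the sketch below are assumptions for illustration; the paper realizes these dynamics directly in analog hardware rather than by numerical integration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gradient_flow(W1, W2, patterns, eta=1.0, dt=1e-2, steps=5000):
    """Forward-Euler integration of W_dot = -eta * dE/dW (eqns 2.7, 2.9),
    summing the per-pattern gradient terms of eqn (2.6) over all patterns."""
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
        for x, t in patterns:
            y1 = sigmoid(W1 @ x)
            y2 = sigmoid(W2 @ y1)
            d2 = (t - y2) * y2 * (1.0 - y2)      # delta of eqn (2.6i)
            d1 = (W2.T @ d2) * y1 * (1.0 - y1)   # delta of eqn (2.6ii)
            dW2 += eta * np.outer(d2, y1)
            dW1 += eta * np.outer(d1, x)
        W1 += dt * dW1                            # Euler step of the flow
        W2 += dt * dW2
    return W1, W2
```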

Remarks:

1. The differential equation update laws of (2.9) represent a true (continuous-time) gradient dynamic system. Consequently, there exist no oscillations or complicated dynamics. Equilibria are the only steady states.

2. The (learning) rate $\eta$ may, in principle, take any arbitrary positive value without jeopardizing the gradient descent nature. If a digital computer integration routine is used, however, $\eta$ should be sufficiently small as compared to, and is dictated by, the integration step-size.

3. If the update law (2.9) is directly realized via an analog electronic circuit, then: (i) the (learning) rate can be as large as is physically possible, and (ii) there is no time-complexity
problem in "computing" the update laws. All of the update laws for all the weights will be governed by their time-constants (i.e., the products of the equivalent capacitors and their corresponding equivalent resistors).

4. For analog implementations, a neuron is often modeled as an operational amplifier or as a double inverter. An explicit expression for the op-amp function is not readily available in closed form. Consequently, the function $S_j$, or more specifically, $dS_j/du_j$ in the update eqn. (2.9), is not available (or computable) explicitly. This presents a potentially serious problem for implementing the differential equation update law via electronic or silicon analog circuits. One solution, which may require excessive computations, is to approximate this derivative as

$\frac{dS_j}{du_j} = \frac{dy_j}{du_j} \approx \frac{\Delta y_j}{\Delta u_j}.$

This entails a digital differentiation which must be executed with a fast enough sampling rate.
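A minimal sketch of that finite-difference approximation; the probe step du is an arbitrary illustrative value.

```python
def approx_dS_du(S, u, du=1e-3):
    """Approximate dS/du as Delta_y / Delta_u by sampling the neuron twice."""
    return (S(u + du) - S(u)) / du
```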

II.4 A Modified Differential Equation Update Law

We propose the following modified update law, with a particular view to analog electronic circuit implementation:

$\dot{W}_{ji} = \eta \sum_p e_{pj}\, y_{pi},$    (2.10)

where $e_{pj}$ is as given in (2.9ii) and (2.9iii).

Remark:

We have also used a fourth-order Runge-Kutta integration routine to simulate (2.9) and (2.10), with very fast convergence properties as compared to the discrete update rule (2.5).

The update law (2.10) may be rewritten in the general form

$\dot{W}_{ji} = -\eta \left(\frac{\partial y_j}{\partial u_j}\right)^{-1} \frac{\partial E_p}{\partial W_{ji}}.$    (2.11)

Equation (2.11), and consequently (2.10), is a gradient-like dynamic system, see [8].
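Under the same illustrative assumptions as before (logistic sigmoid, no bias terms, forward-Euler integration), the modified law (2.10) differs from the gradient-flow sketch above only in that no sigmoid-derivative factors appear: the errors of (2.9ii) and (2.9iii) multiply the presynaptic outputs directly.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def modified_flow(W1, W2, patterns, eta=1.0, dt=1e-2, steps=5000):
    """Forward-Euler integration of the modified law (2.10):
    W_dot_ji = eta * sum_p e_pj * y_pi, with no dS/du factors."""
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
        for x, t in patterns:
            y1 = sigmoid(W1 @ x)
            y2 = sigmoid(W2 @ y1)
            e2 = t - y2          # output-layer error, eqn (2.9ii)
            e1 = W2.T @ e2       # back-propagated error, eqn (2.9iii)
            dW2 += eta * np.outer(e2, y1)
            dW1 += eta * np.outer(e1, x)
        W1 += dt * dW1
        W2 += dt * dW2
    return W1, W2
```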

III. The Feedforward ANN All-MOS Circuit with Learning

A prototype all-MOS continuous-time analog feedforward ANN circuit with learning is designed to demonstrate the implementation of the learning capability using the proposed learning algorithm (2.10). The block diagram of the circuit is depicted in Fig. 1. The feedforward ANN all-MOS circuit consists of three layers: the input layer, the hidden layer, and the output layer. The input layer is simply the voltage nodes of the external input voltages denoted by $x_1$ and $x_2$. The hidden layer has two neurons with outputs denoted by $y_1$ and $y_2$; $w_{ji}$ denotes the interconnection between the i-th input of the input layer and the j-th neuron of the hidden layer. The output layer has one neuron which generates an output denoted by $\bar{y}$; $\bar{W}_{kj}$ is the interconnection between the j-th neuron of the hidden layer and the k-th neuron of the output layer (here k = 1). During learning, these weights are adjusted according to the learning rule (2.10), which now specializes to

(3. l i )

(3.lii)

(3.2i)

(3.2ii)

(3.2iii)

1 1 2 = x 2 ( f - P I W l l (3.2iv)

(3.7v)

(3.2vi)

In eqn (3.1), $s_j$ and $\bar{s}$ are the nondecreasing, differentiable, monotone sigmoid functions of a neuron for the hidden layer and the output layer, respectively.
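As a rough behavioral check of the learning dynamics (3.2), they can be integrated numerically for a single presented pattern. The sketch below ignores all circuit-level effects (voltage shifting, multiplier operating ranges, MOS nonidealities); the sigmoid scaled to a 0 V to 5 V swing, the time step, and the initial weights are illustrative assumptions only, loosely patterned on Table 2.

```python
import numpy as np

def sig(u, vmax=5.0):
    """Assumed neuron nonlinearity with a 0 V to vmax V output swing."""
    return vmax / (1.0 + np.exp(-u))

def learn_one_pattern(x, t_bar, w, W_bar, dt=1e-3, steps=20000):
    """Forward-Euler integration of the prototype learning dynamics (3.2).

    x     : (x1, x2) input voltages
    t_bar : desired target output
    w     : 2x2 hidden-layer weight matrix, w[j, i]
    W_bar : length-2 output-layer weight vector, W_bar[j]
    """
    w, W_bar = w.copy(), W_bar.copy()
    for _ in range(steps):
        y = sig(w @ x)                     # hidden outputs, eqn (3.1i)
        y_bar = sig(W_bar @ y)             # network output, eqn (3.1ii)
        e = t_bar - y_bar                  # output error
        dW_bar = e * y                     # eqns (3.2i), (3.2ii)
        dw = e * np.outer(W_bar, x)        # eqns (3.2iii)-(3.2vi)
        W_bar += dt * dW_bar
        w += dt * dw
    return w, W_bar, y_bar

# Example run for one pattern (initial values loosely patterned on Table 2).
w0 = np.array([[-0.5, 0.6],
               [-0.5, 0.5]])
W0 = np.array([0.7, 0.5])
w_ss, W_ss, out = learn_one_pattern(np.array([0.5, 0.5]), 1.0, w0, W0)
```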

Fig. 1 consists of two components, the feedforward ANN (map) subcircuit and the learning subcircuits. The feedforward ANN subcircuit executes the function of eqn (3.1). It consists of 2-dimensional vector multipliers to compute the 2-dimensional vector multiplications of eqn (3.1), double inverters as neurons, and voltage followers. The learning subcircuits execute the differential equations of eqn (3.2). They are composed of scalar multipliers to compute all the scalar multiplications, capacitors for the integration in eqn (3.2), voltage shifters, and voltage followers. In these subcircuits, voltage followers are used to increase the fanout capability of the outputs of multipliers when feeding into the driving inputs through the drain of the MOS transistors of another multiplier [1]. In addition, we used a voltage follower to isolate the output of the multiplier from the capacitor implementing the derivative of each weight according to the learning update laws (3.2). The voltage shifters are used to achieve output-input compatibility at the cascading junctions of multipliers while satisfying the operation constraints of the four-quadrant multipliers [1-3]. For example, the output of the multiplier leading into the hidden layer with output $y_j$ ranges from -2.5 V to 2.5 V. In order to satisfy the operating voltage range of the next multiplier, this value is shifted by 2.5 V to a range from 0 V to 5 V, passed to the sigmoidal nonlinearity of the neuron j in the hidden layer, and then connected to the compatible inputs of the multipliers in the next stage (with range from 0 V to 5 V). Fig. 2 depicts the schematic for the all-MOS analog four-quadrant 2-D vector multiplier which we have used in our implementation. See [1,2] and the references therein for more information on these vector multipliers.

Table 1 gives a set of fixed weights that (via SPICE simulations) solves the well-known XOR problem for the feedforward ANN (map) subcircuit. SPICE simulation has verified the performance of the feedforward ANN MOS circuit with the learning rule (see Fig. 1). For each pattern, all the weights of this circuit, namely, $w_{11}$, $w_{12}$, $w_{21}$, $w_{22}$, $\bar{W}_{11}$, and $\bar{W}_{12}$, are simultaneously initialized.

Table 2 summarizes the results of the SPICE simulations for each given pattern. The learning process evolves until, on the order of 10-100 nanoseconds, the weights converge to their steady states. These steady-state values achieve the learning of the given pattern. Observe that all variables are given in volts.

We remark that the steady-state weights in each case do not solve the XOR problem, however. To guarantee solving the XOR, we note that all patterns must be presented simultaneously, as the theory of continuous-time gradient dynamic systems dictates. This would require additional learning subcircuits and is presently being pursued. We finally remark that the building blocks of this prototype can be composed into regular structures to build large feedforward ANN circuits with analog learning capability.


IV. CONCLUSIONS

An analog all-MOS implementation of an artificial feedforward neural network with a modified learning rule is described. The implementation employs all-MOS analog four-quadrant multipliers as building blocks. SPICE simulations have been conducted to demonstrate the functionality of the MOS circuit with the modified continuous-time dynamic learning rule. The SPICE simulations have shown that learning each given pattern is achieved on the order of nanoseconds. The results quantify the advantages and demonstrate the feasibility of direct dedicated-hardware continuous-time analog VLSI silicon implementation of feedforward ANNs.


REFERENCES

[1] F. M. A. Salam, N. I. Khachab, M. Ismail, and Y. Wang, "An analog MOS implementation of the synaptic weights for feedback neural nets," Proc. of IEEE Int. Symp. Circuits and Systems, May 1989, pp. 1223-1226.

[2] F. M. A. Salam, M. R. Choi, and Y. Wang, "An Analog MOS Implementation of Synaptic Weights for Feedforward/Feedback Neural Nets," Proc. of the 32nd Midwest Symposium on Circuits and Systems, Champaign, Illinois, August 1989.

[3] N. I. Khachab and M. Ismail, "Novel Continuous-Time All MOS Four Quadrant Multipliers," Proc. of IEEE Int. Symp. Circuits and Systems, May 1987, pp. 762-765.

[4] C. Mead, Analog VLSI and Neural Systems, Prentice Hall, 1989.

[5] F. M. A. Salam and Y. Wang, "Some Properties of Dynamic Feedback Neural Nets," in the session on Neural Networks and Control Systems, the 27th IEEE Conference on Decision and Control, December 1988, pp. 337-342.

[6] M. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, 1974.

[7] D. Rumelhart, G. Hinton, and R. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. Rumelhart and J. McClelland (Eds.), Cambridge, MA: MIT Press, 1986, pp. 318-362.

[8] F. M. A. Salam, "A Modified Learning Rule for Feedforward Artificial Neural Nets for Analog Implementation," Memorandum No. MSU/EE/S 90102, Department of Electrical Engineering, Michigan State University, East Lansing, MI 48824-1226, 26 January 1990.

Table 2. SPICE simulation results of the learning circuit for each given pattern: initial (init.) and steady-state (s.s.) values of the weights and of the output (all values in volts).

Pattern      x1 = 0.5, x2 = 0.5    x1 = 0.5, x2 = 4.5    x1 = 4.5, x2 = 0.5    x1 = 4.5, x2 = 4.5
Target t̄          1.0                   4.5                   4.5                   1.0

weights      init.    s.s.         init.    s.s.         init.    s.s.         init.    s.s.
w11          -0.5     0.053         0.4     0.02          0.4    -0.019        -0.5    -0.052
w12           0.6     0.053        -0.4    -0.019        -0.4     0.02          0.6    -0.052
w21          -0.5     0.053        -0.5     0.02         -0.5    -0.019        -0.5    -0.052
w22           0.5     0.053         0.5    -0.019         0.5     0.02          0.5    -0.052
W̄11           0.7     0.276         0.7     0.168         0.7     0.168         0.7     0.276
W̄12           0.5     0.276         0.5     0.168         0.5     0.168         0.5     0.276

output ȳ      0.0     1.81          0.0     5.0           0.0     5.0           0.0     1.81

Fig. 1. The block diagram of the feedforward ANN all-MOS circuit with learning.

Fig. 2. The all-MOS analog four-quadrant 2-D vector multiplier.

Table 1. SPICE simulation of the feedforward MOS circuit with fixed weights solving the XOR problem (units are in volts): $(w_{11}, w_{12}, w_{21}, w_{22}, \bar{W}_{11}, \bar{W}_{12}) = (0.5, -0.5, -0.5, 0.5, 0.5, 0.5)$.
