Fast Backpropagation Learning Using Optimization of Learning Rate for Pulsed Neural Networks

KENJI YAMAMOTO, SEIICHI KOAKUTSU, TAKASHI OKAMOTO, and HIRONORI HIRATA
Chiba University, Japan

SUMMARY

Neural networks are widely applied to information processing because of their nonlinear processing capability. Digital hardware implementation of neural networks seems to be effective in the construction of neural network systems in which real-time operation and much broader applications are possible. However, the digital hardware implementation of analog neural networks is very difficult because of the need to satisfy restrictions concerning circuit resources such as circuit scale, arrangement, and wiring. A technique that uses a pulsed neuron model instead of an analog neuron model as a method for solving this problem has been proposed, and its effectiveness has been confirmed. To construct pulsed neural networks (PNN), backpropagation (BP) learning has been proposed. However, BP learning takes considerable time to construct a PNN compared with the learning of an analog neural network. Therefore, some method of speeding up BP learning in PNN is necessary. In this paper, we propose a fast BP learning method using optimization of the learning rate for a PNN. In the proposed method, the learning rate is optimized so as to speed up learning in every learning cycle. To evaluate the proposed method, we apply it to pattern recognition problems, such as XOR, 3-bit parity, and digit recognition. The results of the computer-based experiments demonstrate the validity of the proposed method. © 2011 Wiley Periodicals, Inc. Electron Comm Jpn, 94(7): 27–34, 2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ecj.10249

Key words: pulsed neural network; backpropagation learning; learning rate.

1. Introduction

Neural networks, an engineering model of the neuronal networks in the brains of living beings, are widely used, particularly in recognition, control, and prediction, because of their nonlinear processing capabilities [1]. However, most of the forms in which neural networks are implemented are based on software running on von Neumann computers. Implementing neural networks in hardware is desirable for the purpose of increasing the range of neural network applications and improving speed.

Moreover, in recent years, improvements in the hardware implementation of digital circuits, including programmable devices such as field programmable gate arrays (FPGA) and complex programmable logic devices (CPLD), have made it possible to implement circuits quickly and at low cost. In particular, dynamically reconfigurable hardware, which can change its circuit configuration dynamically in parallel with other processes, and evolvable hardware (EHW), which can autonomously acquire its own circuit configuration, are attracting attention. As a result of the implementation of neural networks in such devices, the range of applications of neural networks can be expanded, and faster processing as well as rapid response to the results of learning are possible. When considering the hardware implementation of a neural network in digital circuitry, constraints related to the scale of the circuit in the device, wiring, and other circuit resources must be satisfied, in contrast to computational processing using software. Moreover, the devices generally used as EHW are smaller than common VLSI in terms of circuit scale from considerations of reproducibility and replacement. As a result, it is not desirable to transfer the computations performed in software in unaltered form to a multibit configuration.

A method using a pulsed neural network, with bit-serial transmission of 1-bit pulse density signals over one signal connection between neurons, has been proposed as a method of resolving the above problems [2]. This approach appears to be highly effective when implementing multiple neurons because the wiring region and circuit scale can be reduced in hardware implementation. The pulsed neuron model is originally based on a biological prototype. However, in an engineering implementation it can be expected to yield superior neural networks from the standpoint of circuit scale and signal processing.

Electronics and Communications in Japan, Vol. 94, No. 7, 2011. Translated from Denki Gakkai Ronbunshi, Vol. 128-C, No. 7, July 2008, pp. 1137–1142.

Previously, Hebbian learning rules [3] suitable for a pulsed neuron model and the error backpropagation method [4] specific to pulsed neural networks (PNN) have been proposed. However, because the input and output values are represented as pulse frequencies in the pulsed neuron model, the pulse frequency cannot be determined unless a certain number of pulses is received, and so the input and output values cannot be determined immediately. As a result, more time is required for error backpropagation learning in a PNN than for learning in an analog neural network. One could consider accelerating learning by increasing the learning rate and thus the amount of change made in a single update. However, if the learning rate is increased, convergence fluctuates due to excessively large updates during initial learning, and the convergence rate deteriorates.

Methods of varying the learning rate in order to accelerate learning in an analog neural network have been proposed [5, 6]. Thus, in this paper, we propose a backpropagation learning method for a PNN that accelerates convergence by performing optimization during updates to the connection weights and attenuation rate, and that makes the previously fixed learning rate variable. With this method, the average number of learning cycles required can be reduced without a drop in the convergence rate. The validity of the proposed method is demonstrated through computer-based experiments.

Below, Section 2 describes the backpropagation learning method specific to PNNs. Section 3 explains the proposed method. In Section 4, computer-based experiments are described, and Section 5 summarizes the paper and identifies topics for the future.

2. Pulsed Neural Networks

2.1 The pulsed neuron model

Figure 1 shows a schematic diagram of the pulsed neuron model. This is a simulation of the electrical activity in a single neuron. The input signal and output signal are both pulse sequences having a time evolution. The magnitude of each signal is represented by the pulse frequency.

When a pulse arrives in the pulsed neuron model, the local membrane potential pn(t) at that location rises in accordance with the connection weight wn, then attenuates to the resting potential with a time constant τ. The internal potential I(t) of the pulsed neuron model is represented as the sum of all local membrane potentials at that time. The neuron fires (emits a pulse) when the internal potential exceeds the threshold Ith. This action is represented by the following four equations:

Here dt represents the minimum unit of the circuit activation time, and a represents the attenuation rate, governed by the attenuation time constant. A Heaviside function is used for the output function H(·).
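As a concrete illustration, the following is a minimal sketch of one clock step of such a neuron; it is not the authors' implementation, and the decay-then-accumulate update order, the member names, and the use of a single multiplicative decay factor a per step are assumptions based on the verbal description above.

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pulsed neuron (cf. Fig. 1): each local membrane potential p[n]
// decays by the attenuation rate a every clock and rises by the weight w[n]
// when input n carries a pulse; the neuron fires when the internal potential
// (the sum of the local potentials) exceeds the threshold Ith.
struct PulsedNeuron {
    std::vector<double> w;   // connection weights w_n
    std::vector<double> p;   // local membrane potentials p_n(t)
    double a;                // attenuation rate (assumed: p_n <- a * p_n each step)
    double Ith;              // firing threshold

    PulsedNeuron(std::vector<double> weights, double decay, double threshold)
        : w(std::move(weights)), p(w.size(), 0.0), a(decay), Ith(threshold) {}

    // x holds the 1-bit input pulses at this clock; returns the output pulse.
    int step(const std::vector<int>& x) {
        double I = 0.0;                       // internal potential I(t)
        for (std::size_t n = 0; n < w.size(); ++n) {
            p[n] = a * p[n] + w[n] * x[n];    // decay, then rise on an input pulse
            I += p[n];                        // I(t) is the sum of local potentials
        }
        return (I > Ith) ? 1 : 0;             // Heaviside output: fire if I(t) > Ith
    }
};

Driving such a neuron with pulse trains for N clocks and counting the output pulses gives the pulse-density value q/N used in the learning method of Section 2.2.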

2.2 Backpropagation learning specific to the pulsed neuron model

A backpropagation learning method [4] specific to a PNN has been proposed as a method suitable for the pulsed neuron model. Because the input and output values are represented as pulse frequencies in the pulsed neuron model, the frequency of the pulses cannot be determined unless a certain number of pulses has been received, and the input and output values cannot be determined. As a result, the input and output pulses in this model represent one set in N clocks, that is, one number. In specific terms, N firings in N clocks are represented by 1, 0 firings by 0, and q firings by q/N.

In this learning method, we use an approximation of the relationship between the output and the average net value in N clocks in each neuron by employing a sigmoid function as the output function during learning. Equation (5) gives the mean net value, and Eq. (6) gives the sigmoid approximation function used as the output function:
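Equations (5) and (6) themselves are not reproduced in this transcription. As a hedged reconstruction consistent with the description (β is the slope coefficient and u the average net value over the N clocks), the sigmoid approximation of Eq. (6) would take the standard form

o = f(u) = 1 / (1 + exp(−βu)),

although this exact form, like the precise averaging in Eq. (5), is an assumption rather than a quotation of the original.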

(Equations (1)–(5) are not reproduced in this transcription.)

Fig. 1. The pulsed neuron model.


In addition to x, which represents the real numerical input and output, we also use x(t), which represents the pulsed input and output at each clock cycle, as a representation specific to the pulsed neuron. Here t represents the clock time and takes an integer value from 1 to N; u is the net value, and β is the slope coefficient of the sigmoid approximation function. Equation (7) is the equation for updating the connection weights in the output layer neurons, and Eq. (8) is the equation for updating the connection weights in the hidden layer neurons in this learning method:

where α1 represents the learning rate during connection weight learning; γ is a coefficient that expresses the relationship between the input and the average net value; and P̄k represents the average value of the membrane potential, independent of the connection weight, in N clocks. These are given by the equations below:

2.3 Attenuation rate backpropagation learning method

An attenuation rate backpropagation learning method specific to the pulsed neuron model has been proposed, in which the attenuation rate is learned by the backpropagation method, utilizing the property that the output function of the pulsed neuron is dependent on the attenuation rate a. Because the output function of the pulsed neuron depends on a, Eq. (6), the sigmoid approximation function used during learning, is extended as follows:
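The extended function, Eq. (12) in the original, is not reproduced in this transcription. A form consistent with the description below, in which the slope of the sigmoid depends on the attenuation rate through g(a) with a new slope coefficient β′, would be

o = f(u) = 1 / (1 + exp(−β′ g(a) u)),

but this particular form is an assumption, not a quotation of the original equation.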

Here g(a) is the slope of the sigmoid approximation function, dependent on a, and β′ is the slope coefficient of the new sigmoid approximation function. Table 1 shows the values of the relationship between the attenuation rate and the sigmoid approximation function obtained in computer experiments.

Based on Table 1, the trend function g(a) is approximated as follows:

We have devised an attenuation rate backpropagation learning method using the extended output function. In addition to x, h, and o, representing the real numerical input and output, x(t), h(t), and o(t), representing the pulsed input and output in each clock cycle, are also used as quantities specific to the pulsed neuron. Here t is the clock value, an integer between 1 and N.

With the learning rate during attenuation rate learning set to α2, Eq. (14) represents the update equation for the attenuation rate in the output layer neurons, and Eq. (15) represents the update equation for the attenuation rate in the hidden layer neurons in this learning method:

(Equations (6)–(15) are not reproduced in this transcription.)

Table 1. Relationship between the attenuation rate and the approximation function


3. Learning Rate Optimization

In the backpropagation learning method for a PNN given in Section 2, the input and output values are represented as pulse frequencies. Unless a certain number of pulses is received, the pulse frequency cannot be determined and learning cannot be performed. As a result, compared to the backpropagation learning method in an ordinary analog neural network, the problem arises that far more learning time is required. However, if the learning rate is simply increased, convergence fluctuates due to excessively large updates during initial learning, and the convergence rate deteriorates. Thus, in this paper we propose a backpropagation learning method for a PNN that incorporates learning rate optimization: it accelerates convergence by performing optimization during updating of the connection weights and the attenuation rate, and uses a variable learning rate rather than the fixed rate used previously.

First, let us consider an optimization method for the learning rate during connection weight learning. The root mean square error is used as the error evaluation function in the backpropagation learning method for a PNN. If the training signal for the k-th output ok of the output layer is tk, then the root mean square error E is given by

Next, let us consider correction of the connection weight in the backpropagation learning method. If the learning rate is α1, then the correction ∆w of the connection weight w in the backpropagation learning method can be found as follows:

In the proposed method, ∂E/∂w is found first. Thus, ∂E/∂w is determined unambiguously and is constant. Consequently, the correction ∆w of the connection weight is a single-variable function of the learning rate α1. Therefore, the root mean square error E is also a single-variable function of the learning rate α1; E is a quadratic function, as shown in Eq. (16). Thus, the learning rate α1 that minimizes E can be found by satisfying
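The condition referred to here is not reproduced in the transcription; from the reasoning above it is the stationarity condition

dE(α1) / dα1 = 0,

and since E is quadratic in α1, solving this condition gives the minimizing learning rate in closed form at every learning cycle.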

Below we derive the update equation for the connection weight learning rate α1. During the r-th learning step, if the k-th output after the update is ok+(r), then the error ek+(r) after the update is given by

Here ok+(r) is the k-th output after the update; hj+(r) is the j-th output in the hidden layer after the update; and w̃(r) is the magnitude of the update for the connection weight. A first-order approximation is used:

The change in the net value resulting from the update to the connection weight is so small that it can be ignored:

Based on the above, the update equation for α1(r) is as follows:

(Equations (16)–(28) are not reproduced in this transcription.)
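Since the explicit update equations are not reproduced, the following is only a rough illustration of the underlying idea: the learning rate is chosen, at every learning cycle, by a one-dimensional minimization of the error along the fixed gradient direction. The sketch below uses a generic parabolic fit through three trial rates rather than the paper's closed-form expression; the function errorAt and the trial rates a0, a1, a2 are hypothetical.

#include <algorithm>
#include <functional>

// Hypothetical sketch: choose the learning rate by fitting a parabola to the
// error E(alpha) evaluated at three trial rates along the gradient direction
// and taking the parabola's minimizer. This illustrates per-cycle learning
// rate optimization; it is not the paper's analytic update rule.
// errorAt(alpha) is assumed to return the error after the tentative update
// dW = -alpha * dE/dW, without committing that update.
double optimizeLearningRate(const std::function<double(double)>& errorAt,
                            double a0, double a1, double a2) {
    const double e0 = errorAt(a0), e1 = errorAt(a1), e2 = errorAt(a2);
    const double num = (a1 - a0) * (a1 - a0) * (e1 - e2)
                     - (a1 - a2) * (a1 - a2) * (e1 - e0);
    const double den = (a1 - a0) * (e1 - e2)
                     - (a1 - a2) * (e1 - e0);
    if (den == 0.0) return a1;                  // degenerate fit: keep the middle rate
    return std::max(a1 - 0.5 * num / den, 0.0); // vertex of the parabola, clipped at 0
}

In the paper's method the minimizer is obtained analytically from the quadratic dependence of E on α1, so no trial evaluations of the error are needed; the numerical fit above merely shows the same line-search idea.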


The attenuation rate learning rate is also optimized by a similar method. The update equation for the attenuation rate learning rate α2(r) is

4. Computer Experiments

4.1 Experimental results

The error backpropagation learning method specific to a PNN is called "conventional method 1," the error backpropagation learning method using attenuation rate learning is called "conventional method 2," the error backpropagation learning method using learning rate optimization during connection weight learning is called "proposed method 1," and the error backpropagation learning method using learning rate optimization during attenuation rate learning is called "proposed method 2." We performed comparative experiments with these methods. In the experiments, we compared the cases in which the conventional methods were used, in which proposed method 1 alone was used, in which proposed method 2 alone was used, and in which both proposed methods were used, on three problems: the XOR problem, the 3-bit parity problem, and the digit recognition problem. Table 2 lists the truth values for the 3-bit parity problem, and Fig. 2 shows the input for the digit recognition problem. Table 3 lists the parameter values used in each experiment. The learning rate and attenuation rate given in the tables are the values used in the fixed-rate method.

Each method was implemented in C++. The computer experiments were performed on a personal computer (CPU: Pentium 4, 3.06 GHz; OS: Windows XP; compiler: Microsoft Visual Studio 2005 Ver. 8.0) and evaluated. In the experiments, we provided the learning set given by the truth tables, then performed one cycle of learning after the presentation of one set was completed. In the attenuation rate error backpropagation learning method, after the presentation of one set, learning of the connection weights was performed, after which attenuation rate learning was performed. In each instance of learning, learning was judged successful when the root mean square error fell below the maximum permissible error ε, and learning was judged to have failed when the number of learning cycles exceeded 5000 without the root mean square error falling below ε. A total of 100 trials were performed. The network configuration consisted of 2 input, 3 hidden, and 1 output neurons for the XOR problem; 3 input, 5 hidden, and 1 output neurons for the 3-bit parity problem; and 35 input, 5 hidden, and 10 output neurons for the digit recognition problem.
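As a rough sketch of this trial protocol (not the authors' code; the template parameters and the functions learnOneCycle and rmsError are assumed placeholders), one training run could be organized as follows.

#include <cstddef>

// Hypothetical trial loop: one BP cycle per presentation of the full learning
// set, success when the error drops below eps, failure after 5000 cycles.
template <class Net, class Data, class LearnFn, class ErrFn>
bool runTrial(Net& net, const Data& learningSet,
              LearnFn learnOneCycle, ErrFn rmsError,
              double eps, std::size_t maxCycles = 5000) {
    for (std::size_t cycle = 0; cycle < maxCycles; ++cycle) {
        learnOneCycle(net, learningSet);      // weights first, then attenuation rates
        if (rmsError(net, learningSet) < eps)
            return true;                      // learning judged successful
    }
    return false;                             // error never fell below eps within 5000 cycles
}

Repeating such a run 100 times, presumably with different initial conditions, yields the convergence-rate and average-cycle statistics reported in Tables 4 to 6.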

Tables 4, 5, and 6 list the results of the experiments on the XOR problem, the 3-bit parity problem, and the digit recognition problem. Figures 3, 4, and 5 show the variation in the learning rate in one trial for each problem.

(Equations (29)–(31) are not reproduced in this transcription.)

Table 2. The 3-bit parity problem

Table 3. Parameters

Fig. 2. The digit recognition problem.


4.2 Discussion

It is clear from Tables 4, 5, and 6 that using learning rate optimization during connection weight learning reduced the average number of learning cycles and the computation time in all problems. In addition, it is clear from Figs. 3, 4, and 5 that the learning rate during connection weight learning initially had a low value, then took higher values as the amount of connection weight updating fell with the progress of training. Thus, error backpropagation learning in a PNN can be accelerated by using learning rate optimization during connection weight learning. The average number of learning cycles was reduced substantially, by approximately 800, in the digit recognition problem, but only by approximately 200 in the 3-bit parity problem, a relatively small reduction compared to the digit recognition problem. The reason that the benefit of accelerating error backpropagation learning differs so much from problem to problem appears to be that the input and output patterns in the 3-bit parity problem are highly similar, as shown in Table 2, so that many local solutions were generated during error backpropagation learning and optimization became more difficult. Conversely, in the digit recognition problem the 10 outputs are well separated in the patterns, so that the similarity is low and optimization is simpler because there are few local solutions. This appears to be why the reduction in the average number of learning cycles is greatest in this problem.

Furthermore, only in the digit recognition problem did the convergence rate improve when using learning rate optimization during connection weight learning, reaching almost 100%. This appears to be because, in cases where convergence previously could not be achieved within 5000 cycles of learning, it was achieved in the method using learning rate optimization due to the acceleration of learning.

On the other hand, even when learning rate optimization was used during attenuation rate learning, the variation in the average number of learning cycles was minimal, as can be seen from the tables.

Table 4. Experimental results (XOR)

Table 5. Experimental results (3-bit parity)

Table 6. Experimental results (digit recognition)

Fig. 3. Variation of learning rate in connection weight learning (XOR).

Fig. 4. Variation of learning rate in connection weight learning (3-bit parity).


Furthermore, even when performing learning rate optimization during both connection weight learning and attenuation rate learning, the results were virtually the same as when performing learning rate optimization during connection weight learning alone. Hence, learning rate optimization during attenuation rate learning seems to be of little benefit to the acceleration of learning. The reason appears to be that the major contribution to convergence is made by the learning of the connection weights, and learning of the attenuation rate is at most supplementary, so that even when the learning rate of the attenuation rate is optimized, learning cannot be further accelerated.

5. Conclusions

We have devised an error backpropagation learning method using optimization of the learning rate during connection weight learning and attenuation rate learning, for the purpose of accelerating backpropagation learning in a PNN. We applied the proposed method to the XOR problem, the 3-bit parity problem, and the digit recognition problem, and performed computer-based experiments and a comparison with conventional methods. The results showed that the average number of learning cycles required in all of the problems was reduced by optimization of the learning rate during connection weight learning, indicating the validity of the proposed method. On the other hand, optimization of the learning rate during attenuation rate learning produced little change in the results, confirming a lack of benefit.

Future topics include further improvement of the method of learning rate optimization so that it works effectively even in problems with a high degree of similarity in the input and output patterns, and an analysis of the hardware implementation of the proposed method.

REFERENCES

1. Sakawa M, Tanaka Y. Introduction to neurocomputing. Morikita Publishing; 1997. (in Japanese)

2. Tanaka Y, Kuroyanagi S, Iwata A. A technique for hardware implementation of neural networks using FPGA. Technical Report of the Neuro-Computing Research Group, IEICE, NC 2000-179, p 175–182, 2001. (in Japanese)

3. Motoki M, Hamagami T, Koakutsu S, Hirata H. A Hebbian learning rule restraining catastrophic forgetting in pulse neural networks. Trans IEICE 2003;123:1124–1133. (in Japanese)

4. Yamane Y, Koakutsu S, Hirata H. Neural network model for evolvable hardware. 16th Electrical and Electronic Systems Division Conference, p 445–448, 2004. (in Japanese)

5. Murai H, Omatsu S, Oe S. Improvement of convergence speed of back-propagation method by genetic algorithm and its application to remote sensing analysis. Trans IEICE 1997;J80-D2:1311–1313. (in Japanese)

6. Yoshikawa T, Kawaguchi Y. A high speed learning method for backpropagation rules in neural networks. Trans IEICE 1992;J75-D2:837–840. (in Japanese)

Fig. 5. Variation of learning rate in connection weight learning (digit recognition).


AUTHORS (from left to right)

Kenji Yamamoto (student member) received a bachelor's degree from the Department of Electronic and Mechanical Engineering of Chiba University in 2007 and began the first part of the doctoral program in artificial and systems engineering at the Graduate School of Engineering. He is engaged in research on neural networks.

Seiichi Koakutsu (member) completed the doctoral program in manufacturing science at the Graduate School of Natural Science of Chiba University in 1992 and became a lecturer in the Faculty of Engineering. He was appointed an associate professor in 1997, and subsequently an associate professor in the Graduate School of Natural Science. In 2007 he became an associate professor in the Graduate School of Engineering. In 1994–1995 he was a visiting researcher at the University of California, Santa Cruz. He is engaged in research on VLSI layout, stochastic optimization methods, and neural networks. He holds a D.Eng. degree, and is a member of IEEE, INNS, IEICE, and SICE.

Takashi Okamoto (member) received a bachelor's degree from the Department of Physics and Informatics of Keio University in 2003 and completed the doctoral program at the Graduate School of Science and Engineering in 2007. He was a JSPS postdoctoral fellow in 2006 (DC2). In 2007 he was appointed a professor in the Graduate School of Engineering of Chiba University. He received a 2006 SICE Academic Scholarship and Research Award. He is engaged in research on optimization methods for computational modeling of nonlinear dynamic systems. He holds a D.Eng. degree, and is a member of SICE.

Hironori Hirata (senior member) completed the doctoral program in electrical engineering at Tokyo Institute of Technology in 1976 and became a lecturer in the Faculty of Engineering at Chiba University. He became an associate professor in 1981 and a professor in 1994. He became a professor in the Graduate School of Natural Science in 1997, and a professor in the Graduate School of Engineering in 2007. He is interested in the basic theory of modeling, analysis, and design of large-scale systems, in particular ecological systems, VLSI layout, and distributed systems. He received an IEEJ Progress Prize in 2001. He holds a D.Eng. degree, and is a member of IEEE (Fellow), INNS, IEICE, IPSJ, SICE, ISCIE, and the Japanese Society for Mathematical Biology (JSMB).
