Neural Network with Binary Activation for Efficient Neuromorphic Computing



Weier Wan, Ling Li

Contact

Weier Wan, PhD candidate, Dept. of Electrical Engineering, Stanford University. Email: [email protected]
Ling Li, PhD, Dept. of Electrical Engineering, Stanford University. Email: [email protected]

Background

Artificial neural networks (ANNs) have been adopted to achieve state-of-the-art performance across many tasks. However, deploying an ANN on conventional hardware such as a CPU or GPU remains highly inefficient in both power and speed, which largely limits its use in systems with tight power budgets and real-time processing requirements. The main issue is the frequent movement of data between memory (where data and parameters are stored) and logic (where computations are done), which is expensive in modern semiconductor technology. We therefore want a hardware architecture that eliminates as much data movement as possible.

Non-volatile Memory (NVM) Crossbar Array

Affine transformation using an NVM crossbar: encode the input vector V as voltage pulses applied to the rows, and the parameter matrix G as the conductances of the NVM elements. By Ohm's law (and current summation along each column wire), the crossbar computes the matrix-vector product I = G·V in O(1) time, independent of the input vector size. Moreover, all computation happens right where the parameters are stored: no data movement, and hence high energy efficiency.

Figure: left, structure of the NVM crossbar array, with red dots representing NVM devices; right, using the NVM crossbar to perform an affine transformation in O(1) time.
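A minimal NumPy sketch of this mapping, assuming normalized units: the function and variable names are ours, and the differential conductance pair follows the poster's "two devices per parameter" scheme described under technique 4 below.

```python
# Illustrative sketch only; names and units are assumptions, not from the poster.
# Each signed weight W[i, j] is stored as a pair of bounded, non-negative
# conductances (G_pos, G_neg) so that W = G_pos - G_neg.
import numpy as np

G_MAX = 1.0  # device conductance bound (assumed normalized units)

def weights_to_conductances(W):
    """Split a signed weight matrix into two non-negative, bounded
    conductance matrices, one per device in the differential pair."""
    G_pos = np.clip(W, 0.0, G_MAX)    # positive part on one device
    G_neg = np.clip(-W, 0.0, G_MAX)   # negative part on the other
    return G_pos, G_neg

def crossbar_mvm(G_pos, G_neg, v):
    """Analog matrix-vector multiply: rows are driven with voltages v, and
    each column wire sums currents I = G @ v. All columns settle in
    parallel, which is the O(1)-time claim."""
    return G_pos @ v - G_neg @ v

# Example: a 3x4 weight matrix applied to a binary (+/-1 volt) input vector.
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(3, 4))
v = np.array([1.0, -1.0, 1.0, 1.0])
G_pos, G_neg = weights_to_conductances(W)
print(np.allclose(crossbar_mvm(G_pos, G_neg, v), W @ v))  # True
```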

ANN with Binary Activations

Converting the summed current into a multi-bit value requires an ADC, which consumes large power and area. We therefore propose constraining the inputs and activations of the network to the binary values -1 and +1, which can be implemented with a simple integrator and comparator in hardware (see the sketch below).

Problems with a binary activation function:
1. It is non-differentiable: the gradient is 0 everywhere except at x = 0.
2. In the real world, most inputs take real values.
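A toy sketch of the 1-bit readout, reusing crossbar_mvm from the sketch above; the threshold-at-zero comparator is our illustrative assumption, not a hardware spec from the poster.

```python
# A binary activation replaces the per-column ADC with a single comparison.
import numpy as np

def comparator_readout(column_currents, threshold=0.0):
    """Integrate each column's current, then compare against a threshold:
    output +1 if above, -1 otherwise -- effectively a 1-bit 'ADC'."""
    return np.where(column_currents >= threshold, 1.0, -1.0)

# The catch for training: this is a step function, so its gradient is 0
# almost everywhere (and undefined at the threshold), which is why the
# surrogate-gradient technique below is needed.
print(comparator_readout(np.array([0.3, -1.2, 0.0])))  # [ 1. -1.  1.]
```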

Techniques for Binarizing ANN

1. Tanh neuron, used to approximate the gradient and encourage binarization (see the sketch after this list).
2. Binarizing while training (in contrast to binarizing only at test time).
3. Stochastic multi-sampling for real-valued inputs: normalize each input to the range [-1, 1] and treat the normalized value as the expectation of a random variable x taking values in {-1, +1}. With φ = P(x = +1),

   E(x) = φ − (1 − φ)  ⟹  φ = (E(x) + 1) / 2.

   Sample from this distribution multiple times and sum the samples at the output.
4. Bounded weights (clip weights at a large magnitude): the conductances of real devices are bounded, so two devices per parameter are used to implement both positive and negative values.
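A minimal PyTorch sketch of techniques 1-4. The poster gives no code, so this is our own reconstruction: all names, the exact surrogate form, and the clipping bound are assumptions.

```python
import torch

class BinaryAct(torch.autograd.Function):
    """Technique 1 + 2: the forward pass outputs a hard +/-1 activation
    during training itself; the backward pass substitutes the gradient of
    tanh(x) so backpropagation still receives a nonzero signal."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (1.0 - torch.tanh(x) ** 2)  # d/dx tanh(x)

def stochastic_multisample(x, n_samples=8):
    """Technique 3: x is normalized to [-1, 1]; phi = (x + 1) / 2 is the
    probability of drawing +1, so E(sample) = phi - (1 - phi) = x.
    Summing n_samples binary draws keeps each presentation binary."""
    phi = (x + 1.0) / 2.0
    samples = [2.0 * torch.bernoulli(phi) - 1.0 for _ in range(n_samples)]
    return torch.stack(samples).sum(dim=0)

def clip_weights_(layer, bound=1.0):
    """Technique 4: keep weights inside the range a bounded-conductance
    device pair can represent (the bound value is an assumption)."""
    with torch.no_grad():
        layer.weight.clamp_(-bound, bound)

# Usage: one layer trained with binary activations and stochastic inputs.
lin = torch.nn.Linear(784, 128)
x = torch.rand(32, 784) * 2 - 1            # inputs normalized to [-1, 1]
h = BinaryAct.apply(lin(stochastic_multisample(x)))
h.sum().backward()                          # surrogate gradient flows
clip_weights_(lin)                          # apply after each optimizer step
```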

Results

Test accuracy (%) for different training strategies:

           | Tanh w/o binarization | Train and Binarize | Binarize and Train
CIFAR-10   |         51.6          |        30.3        |        49.8
MNIST      |         97.7          |        80.7        |        97.4

Figure: training curves on CIFAR-10 and MNIST (tanh).

Test accuracy (%) for different input encodings:

           | Binarize and Train | Deterministic binarized input | Stochastic multi-sampling (8) | Stochastic multi-sampling (64)
CIFAR-10   |        49.8        |             35.6              |             47.9              |             49.8
MNIST      |        97.4        |             97.0              |             97.5              |             97.7

Figure: weight value distribution.

Conclusion

With the four techniques proposed above, we can implement a multi-layer neural network with binary activations with little or no loss in test performance. Such a network can be deployed efficiently onto the proposed NVM-crossbar-based hardware architecture.

Figure: training progress with (1) the approximated gradient, (2) stochastic multi-sampling, and (3) bounded weights.

Figure: proposed network architecture.