
A Low-Power Neurosynaptic Implementation of Local Binary Patterns for Texture Analysis

Alexander Andreopoulos†, Rodrigo Alvarez-Icaza‡, Andrew S. Cassidy‡, Myron D. Flickner†

†IBM Research - Almaden, 650 Harry Road, San Jose, CA 95120-6099
‡IBM Research - Austin, 11501 Burnet Road, Austin, TX 78758

Abstract-We demonstrate how to map Local Binary Patterns (LBP), a class of leading feature extractors, onto a neuromorphic processor such as TrueNorth, a silicon expression of a non-von Neumann, low-power, spiking-based, brain-inspired processor. The application is presented in the form of a texture feature extractor that can process 8-bit grayscale video at 30 fps. While consuming less than 140 mW of power, this neuromorphic implementation provides a rotation and contrast insensitive characterization of texture, with accuracy similar to that of a standard von Neumann implementation of the same algorithm. The successful mapping of an important vision routine onto a neuromorphic architecture is indicative of an alternative paradigm for addressing the von Neumann bottleneck, which is currently placing severe constraints on the processing speed, power consumption, reliability, scalability, programmability and mobility of vision algorithms. This also introduces a new methodology for the design of vision algorithms for power efficient, asynchronous, mobility-targeted applications.

I. INTRODUCTION

For more than fifty years, computer vision research was focused on the development of algorithms for von Neumann architectures, where progress is characterized by simultaneous increases in power densities and clock frequencies [1]. This has provided a sufficiently powerful hardware substrate that enabled significant progress for highly constrained vision problems where speed was of the utmost importance [2]. The brain, on the other hand, is characterized by low spiking frequencies and low power densities, and is capable of reasoning about extremely complex cognitive vision tasks whose solution eludes current state-of-the-art computer vision algorithms. At ~50 × 10^9 neurons, ~10^14 synapses, an energy efficiency of ~10 fJ per synaptic event, and a power consumption of ~20 W [3], the primate's visual system surpasses in robustness and complexity the capabilities of even the most advanced artificial vision system ever created.

Texture is often described as a property of "stuff", as compared to a property of "things" in an image [4], [5]. This is analogous to the difference between count nouns in linguistics, which are countable (e.g., "one balloon", "two pencils"), and mass nouns which are uncountable (e.g., "water", "cement", "gravel"). In other words, textures characterize regions which are not countable by numbers but by local statistics. By analyzing textured regions one can extract significant cues as to the identity of an object. Texture cues have been used in numerous applications such as industrial surface inspection,


remote sensing, bio-medical image analysis, eye localization, iris recognition, fingerprint recognition, palm-print recognition, gait recognition, facial age classification, content based image retrieval, action recognition, texture synthesis, texture segmentation, and robot navigation [6], [7], [8], [9], [10], [11], [12], [13]. However, the computational requirements for texture measures tend to be quite high [14], impeding their adoption in low-power high-mobility applications.

Within this context, we demonstrate how to map Local Binary Patterns (LBP) [15], a class of leading feature extractors, onto TrueNorth, a silicon expression of a non-von Neumann, low-power, spiking-based, brain-inspired kernel of computation [1], [3]. This work introduces a number of algorithms for programming neurosynaptic/spiking architectures (not necessarily TrueNorth-based) and demonstrates how the composition of such algorithms can lead to the development of novel classes of power efficient, asynchronous, mobility-targeted applications.

The paper is organized as follows: In Section II we describe the TrueNorth hardware and software substrates. Section III describes the neuromorphic algorithm. In Section IV we present texture classification accuracy and power consumption results. Section V concludes the paper.

II. SUBSTRATES

We first provide an overview of the TrueNorth architecture and then introduce the software environment used for application development. This material is a prerequisite for describing the implementation of the texture model.

A. The Neuron Model

TrueNorth is a low-power, brain-inspired, digital chip architecture [1], [3], [16] with one million spiking neurons and 256 million synapses organized in 4,096 neurosynaptic cores. TrueNorth is implemented in a 28 nm silicon process and has ~5.4 billion transistors (cf. Fig. 1). The cores are interconnected by a two-dimensional on-chip mesh network. Further, multiple TrueNorth chips can be seamlessly interconnected by grid tiling.

Each core contains a 256 × 256 binary crossbar interconnected by programmable binary synapses W_{i,j}, and consists of 256 input axons i ∈ {1, ..., 256} and 256 output neurons j ∈ {1, ..., 256} (cf. Fig. 1). To implement weighted synapses, each axon i has a type index G_i ∈ {a, b, c, d} and each neuron j assigns a 9-bit signed weight, s_j^{G_i}, to axon type G_i.


Fig. 1. A TrueNorth chip consists of 4,096 interconnected cores (right). Each core (top-left) receives input spikes through 256 input axons, and 256 neurons generate output spikes. Input axons are connected to output neurons through a 256 × 256 crossbar. Cores are connected via a 2D mesh network-on-chip (bottom left) using asynchronous routers.

This results in an effective weight of W_{i,j} × s_j^{G_i} from axon i to neuron j.

Neurons generate spikes which are sent to axon inputs via the on-chip/off-chip interconnection network, thus enabling the exchange of information. A spike encodes a value of 1 and is effectively a packet with a target delivery time. The value of 0 is communicated implicitly through the absence of a spike. An axon that receives a spike transfers it to each neuron it is connected to via the binary synaptic crossbar. Spikes can represent values using the rate, time, and/or place at which spikes are created - see the appendix for an overview. A global clock with a nominal 1 ms tick drives the core's operation, during which all spikes are delivered to their destinations.

The neuron equation, described in detail in [17], defines the computation performed by neuron j at tick t. It executes five operations in sequence, which correspond to an extension of the leaky integrate-and-fire neuron model.

(i) Synaptic integration: the membrane potential, or neuron state, V_j(t) is the sum of its value V_j(t-1) at the previous tick and the weighted sum of the input spikes A_i(t) ∈ {0, 1}. The input spikes arrive at the neuron from up to 256 input axons i, using the neuron's weight s_j^{G_i} associated with each axon's type G_i:

V_j(t) = V_j(t-1) + \sum_{i=1}^{256} A_i(t) \times W_{i,j} \times s_j^{G_i}    (1)

(ii) Leak integration: the neuron membrane potential V_j(t) is incremented (or decremented) by a signed leak λ_j. This leak acts as a constant bias on the neuron dynamics.

(iii) Threshold evaluation: V_j(t) is compared with a threshold α_j ≥ 0 after synaptic and leak integration.

(iv) Spike firing: if V_j(t) ≥ α_j, the neuron "fires", injecting a spike into the network bound for its destination axon. If V_j(t) does not reach its threshold, the neuron does not spike, or equivalently outputs a value of 0.

(v) Reset: if a spike is fired, the neuron either resets V_j(t) to a configurable reset value or decrements it by α_j.
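As a plain-software restatement of steps (i)-(v), the following minimal Python sketch (our own illustration, not TrueNorth code) updates one neuron for a single tick, assuming the hard reset-to-zero variant; the example axon types, weights, and threshold are hypothetical.

```python
# Illustrative sketch of one neuron's per-tick update (extended leaky
# integrate-and-fire), following steps (i)-(v) above. A hard reset to 0 is
# assumed; TrueNorth also supports other reset and leak modes.

def neuron_tick(V, spikes, weights, types, type_weights, leak, threshold):
    """V: membrane potential carried over from the previous tick.
    spikes[i] in {0, 1}: input on axon i this tick.
    weights[i] in {0, 1}: binary crossbar synapse W_{i,j}.
    types[i]: axon type G_i; type_weights[g]: signed weight s_j^g.
    Returns (new_V, fired)."""
    # (i) synaptic integration
    V += sum(a * w * type_weights[g] for a, w, g in zip(spikes, weights, types))
    # (ii) leak integration
    V += leak
    # (iii) threshold evaluation and (iv) spike firing
    fired = V >= threshold
    # (v) reset (hard reset to 0 on firing, one of the supported modes)
    if fired:
        V = 0
    return V, fired

# Example with hypothetical values: two excitatory inputs of type 'a', one
# silent inhibitory input of type 'c', threshold 2.
V, fired = neuron_tick(V=0, spikes=[1, 1, 0], weights=[1, 1, 1],
                       types=['a', 'a', 'c'],
                       type_weights={'a': 1, 'c': -1}, leak=0, threshold=2)
print(V, fired)  # -> 0 True (two excitatory spikes reach the threshold of 2)
```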

The neuron also supports different leakage modes, stochastic modes for synapses, leak, and thresholds, and more, thus further expanding its basic operation. This neuron model can implement a wide range of arithmetic, logical, and stochastic operations, and can emulate the twenty Izhikevich models of biological neurons [17]. A neuron's computational behavior is configured by setting the 23 neuron parameters (synaptic weights, leak, thresholds, stochastic operation, etc.). In this paper, we use several neuron configurations/types to implement the needed functionality.

A neuron outputs up to one spike per tick. Each neuron has a tick frequency of 1 kHz and can therefore output between 0 and 1,000 spikes per second. These spikes are sent to an axon on the same core or on a different core. An axon may receive its input from more than one neuron, which is referred to as a bus-OR. If more than one spike arrives at an axon in the same tick, these spikes are merged into a single spike (logical OR operation). Furthermore, each neuron has an associated delay value between 1 and 15, which is the number of ticks from the time a spike is generated by the neuron until the time the spike is consumed by the target axon. At the time of consumption, and based on the programmed crossbar connectivity, the axon distributes the spike to up to 256 neurons on the core.

B. The Software Model

This section briefly overviews the corelet programming paradigm [18] used to build the TrueNorth application. This paradigm supports the different programming style developers must learn for brain-inspired, neurosynaptic architectures and empowers their migration away from implementing sequential algorithms on von Neumann architectures.

The corelet programming paradigm offers:

(i) an abstraction for a TrueNorth program, named a corelet, which is an object-oriented paradigm for creating and hierarchically composing networks [18],

(ii) a library of reusable corelets for composing larger, more complex functional networks,

(iii) an end-to-end corelet programming environment (CPE) which integrates seamlessly with the TrueNorth simulator, Compass. Compass is a highly scalable, parallel, spike-for-spike equivalent simulator of the TrueNorth functional blueprint, which runs on Linux, OS X, and BlueGene supercomputers. It has been tested with networks of over 2 billion neurosynaptic cores, 500 billion neurons, and over 10^14 synapses [19].

A TrueNorth program is a specification of a network of neurosynaptic cores, and of the network's input axons and output neurons. A corelet is an abstraction for representing a parameterized TrueNorth program.


A corelet exposes only the network's external inputs and outputs while masking all other details of the neurosynaptic cores, their connectivity, and their configuration. The internal network connectivity behind these external inputs and outputs is hidden from the corelet user through the use of lookup tables, which are referred to as input connectors and output connectors, respectively. By specifying values for the corelet parameters, a TrueNorth program instantiation is generated which specifies the TrueNorth processor's behavior.
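For illustration only, the following Python pseudocode sketches the abstraction described above; the actual Corelet Programming Environment is MATLAB-based, and every class, method, and parameter name below is invented. The point is that a corelet exposes only pin-level connectors while hiding core-level wiring.

```python
# Loose structural analogy of the corelet abstraction (invented names; the real
# Corelet Programming Environment is MATLAB-based and differs in its details).

class ToyCorelet:
    """Exposes only named input/output pins; core-level wiring stays private."""
    def __init__(self, name, n_inputs, n_outputs):
        self.name = name
        # Lookup tables playing the role of input/output connectors:
        # external pin index -> hidden (core, axon/neuron) address.
        self._input_connector = {p: ("core0", p) for p in range(n_inputs)}
        self._output_connector = {p: ("core0", p) for p in range(n_outputs)}

    def inputs(self):
        return list(self._input_connector)    # only pin indices are exposed

    def outputs(self):
        return list(self._output_connector)


def compose(src, dst, wiring):
    """Wire src's output pins to dst's input pins (a pin -> pin map), yielding a
    description of the composite network without exposing either's internals."""
    return {"corelets": [src.name, dst.name], "wiring": dict(wiring)}


lbp_like = compose(ToyCorelet("ordered_set_indicator", 9, 8),
                   ToyCorelet("uniform_texture_detector", 8, 2),
                   wiring={k: k for k in range(8)})
```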

The texture feature extractor introduced in this paper is a corelet which consists of multiple sub-corelets. Some of the more prominent sub-corelets are: ordered set indicator, thresholded indicator, cyclical derivative, if-else with time-division demultiplexing, cumulative distribution estimator, and normalized histogram extraction corelets, as well as various spike coding conversion routines.

III. ALGORITHM DESCRIPTION

We first provide an architecture-independent description of the LBP algorithm and then describe the neurosynaptic implementation of the algorithm.

A. LBP formulation

The local binary pattern corelet takes as input a grayscale patch of image pixels, and produces as output a normalized histogram providing a discriminative characterization of the textured region. This histogram can then be used with a classifier, such as a nearest neighbor classifier, to determine the class label of the input patch.

More formally, an LBP is characterized by an ordered set (L_1, ..., L_n) of tuples, where L_i = (P_i, R_i), i = 1, ..., n, and P_i, R_i ∈ Z^+. Each L_i defines an ordered set of P_i coordinates (-R_i sin(2πk/P_i), R_i cos(2πk/P_i)), where k = 1, ..., P_i. For each L_i a normalized histogram is extracted from the image patch by extracting m pixels and sampling the circular neighborhood around each of these pixels, as specified by L_i. For each pixel j = 1, ..., m we extract

LBP_i^j = \sum_{k=1}^{P_i} s(g_{k,i,j} - g_{i,j}) \, 2^{k-1}

where

s(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases}    (2)

g_{i,j} denotes the intensity of the j-th pixel, and g_{k,i,j} denotes the pixel intensity corresponding to the k-th element in the uniformly-spaced circle centered on the j-th sample (the nearest pixel). Then the LBP feature is defined as

\overline{LBP}_i^j = \begin{cases} \sum_{k=1}^{P_i} s(g_{k,i,j} - g_{i,j}), & \text{if } U(LBP_i^j) \leq 2 \\ P_i + 1, & \text{otherwise} \end{cases}    (3)

where U(LBP_i^j) ≤ 2 is true for so-called uniform patterns [15], which correspond to microstructures in the image with limited complexity (i.e., at most two 0/1 transitions can exist in the binary representation of LBP_i^j). Intuitively, uniform patterns correspond to microstructures such as bright spots (\overline{LBP}_i^j = 0), flat areas or dark spots (\overline{LBP}_i^j = P_i), and edge-like structures (1 ≤ \overline{LBP}_i^j ≤ P_i - 1).


Fig. 2. The LBP corelet, its input and output, and its sub-corelets. H[·] is the Heaviside function, and Hist(b, i) is the value in bin b ∈ {1, ..., P_i + 2} of the normalized histogram corresponding to L_i. The composition of the ordered set indicator, uniform texture detector, if-else, and normalized histogram corelets shown above produces the LBP corelet.

The normalized histogram extracted from the m samples \overline{LBP}_i^1, ..., \overline{LBP}_i^m is the texture feature corresponding to L_i. By concatenating the n histograms corresponding to L_1, ..., L_n we obtain the texture signature in the form of a histogram with 2n + \sum_{i=1}^{n} P_i bins, which is illumination and rotation insensitive for small neighborhoods. The use of multiple histograms with varying radii R_i also gives a degree of resolution invariance. This texture signature can be used, for example, as a set of features in a standard classifier.
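As a cross-check of the formulation above, a conventional (von Neumann style) computation of Eqs. 2 and 3 for a single sample can be sketched as follows; nearest-pixel sampling is assumed, matching the nearest-neighbor variant evaluated later (von Neumann A), and the circle is assumed to lie inside the image.

```python
import numpy as np

def lbp_riu(img, y, x, P, R):
    """Rotation/contrast-insensitive LBP code (Eq. 3) for the pixel at (y, x),
    using nearest-pixel sampling of the P points on the circle of radius R.
    Assumes the whole circle falls inside `img`."""
    g_c = img[y, x]
    # s(g_k - g_c) for the P circle samples (Eq. 2)
    bits = []
    for k in range(1, P + 1):
        dy = int(round(-R * np.sin(2 * np.pi * k / P)))
        dx = int(round(R * np.cos(2 * np.pi * k / P)))
        bits.append(1 if img[y + dy, x + dx] >= g_c else 0)
    # U(.) counts circular 0/1 transitions; uniform patterns have U <= 2
    transitions = sum(bits[k] != bits[(k + 1) % P] for k in range(P))
    return sum(bits) if transitions <= 2 else P + 1   # Eq. 3

# Per-sample codes fall in {0, ..., P+1}; histogramming them over a patch
# gives the (P+2)-bin signature described above.
```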

Fig. 2 gives a conceptual overview of the constituent corelets needed to achieve the desired functionality. The Ordered Set Indicator corelet is responsible for producing a spike-based representation of LBP_i^j (see Sec. III-B). The Uniform Texture Detector corelet is responsible for producing a spike-based representation of the function U(·) in Eq. 3 (see Sec. III-C). The If-Else corelet is responsible for merging the results of these first two corelets and producing a spike-based representation of \overline{LBP}_i^j in Eq. 3 (see Sec. III-D). Finally, for each i, a Normalized Histogram corelet is responsible for producing the histogram representation of \overline{LBP}_i^j across all samples j (Sec. III-E). This consists of converting a rate-coded input into a thermometer code (see the Appendix), applying a derivative-like operation on the thermometer code to produce a single spike denoting the 0/1 transition point in the thermometer code, followed by an accumulator corelet that sums all these spikes and produces a normalized histogram representation of the input spikes.



Fig. 3. The subset of the Ordered Set Indicator corelet outputting LBP_i^j for (P_i, R_i) = (8, 1) (eight points at radius one), where index j maps to a unique 3 × 3 neighborhood in the source frame. Spikes enter via the axons on the left of the crossbar and exit via the neurons at the bottom of the crossbar. Black dots on the crossbar denote binary synapses set to 1. Lower-case letters ∈ {a, b, c, d} denote axon types. The bottom-left inset shows the decimal value 114, its binary string representation 01110010, and the number of spikes representing bits 1-4 (0010) and bits 5-8 (0111) using two rate-coded streams.


B. Ordered Set Indicator

Each 8-bit gray-scale frame is sent to the Ordered Set Indicator corelet (cf. Figs. 2, 3), which is responsible for producing a binary string representation of each LBP_i^j, for i = 1, ..., n and j = 1, ..., m. As shown in Fig. 3, during the transduction process each frame is converted to two streams encoding the first and last four bits of each 8-bit pixel value, using a rate code in a 15 tick window. This is a form of dynamic range extension via spatial mosaics encoded using 15 ms windows [20], partially enabling the processing, via spikes, of 8-bit video in real time.
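The nibble split used during transduction can be sketched as follows (our own illustration; the exact transducer lies outside the corelet, and the rate-coded spikes are simply placed at the start of the 15 tick window here).

```python
def transduce_pixel(value, window=15):
    """Encode an 8-bit pixel as two rate-coded spike streams over a 15-tick
    window: stream 1 carries bits 1-4 (low nibble), stream 2 bits 5-8."""
    low, high = value & 0x0F, (value >> 4) & 0x0F
    stream1 = [1 if t < low else 0 for t in range(window)]
    stream2 = [1 if t < high else 0 for t in range(window)]
    return stream1, stream2

s1, s2 = transduce_pixel(114)   # binary 01110010 -> low nibble 0010, high 0111
assert (sum(s1), sum(s2)) == (2, 7)   # matches the 2- and 7-spike example of Fig. 3
```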

Neurons A-H in Fig. 3, which correspond to the case where L_i = (8, 1), share four axon types a, b, c, d with signed weights of 1, 16, -1, -16, respectively. The centroid of each 3 × 3 patch of pixels is input via axons of types c, d, which results in an identical membrane potential decrease in the range [-255, 0] for all eight neurons A-H. Similarly, each neuron also takes as input a value in the range [0, 255] corresponding to the nearest pixel in the circular neighborhood defined by L_i, by setting the correct pair of axon type a, b binary synapses to a value of 1. In the case (P_i, R_i) = (8, 1) shown in Fig. 3, the eight nearest pixels to the central pixel are identical to its eight neighboring pixels. Thus, each neuron effectively computes Eq. 2: a neuron spikes if and only if g_{k,i,j} - g_{i,j} ≥ 0.

In order to synchronize the firing times of all the neurons in the corelet, we use control axons for probing and resetting the membrane potential of each neuron. Initially, each neuron has a membrane potential of 0 and a threshold value of α = 16 × n_p = 256 (Sec. II-A), where n_p is the number of probing axons used. This guarantees that a neuron can only fire a spike when the probing spikes enter the corresponding probing axons, increasing the potential of all neurons by 256. At the next tick, spikes enter the reset axons, which decrease the membrane potential by a sufficient amount to hit the neuron's negative potential threshold [17] and force a hard potential reset to zero. This removes any memory from the neurons and enables them to accept the next input frame. The control spikes are created by neurons that are set to spike periodically with the desired period and phase. The full set of neuron parameters used is provided in the appendix.

C. Uniform Texture Detector

The Uniform Texture Detector corelet (cf. Figs. 2, 4) takes as input the population/binary-code output of LBP_i^j computed by the ordered set indicator corelet, and outputs boolean values representing whether U(LBP_i^j) ≤ 2. While U(·) could have been represented as a lookup table, this would have been too expensive neuron-wise for large P_i, R_i, and therefore we calculate it online. This highlights one of the critical points of neuromorphic programming: solutions that are good for a von Neumann architecture, where random-access memory is plentiful, are not necessarily efficient on a neurosynaptic architecture, and vice versa. In neurons A, C, E, G the neuron weights of axon types a, b are -1, 1 respectively, while for neurons B, D, F, H axon types a, b have neuron weights of 1, -1 respectively. All neurons have a spiking threshold of α = 1 and reset to 0 upon spiking. As a result, a neuron spikes when its corresponding input axons have a 0 → 1 transition in a certain direction (top to bottom in Fig. 4). Another set of 8 neurons with an identical crossbar architecture is also used (not shown in Fig. 4) where the weight values are reversed, so that each neuron only fires on 1 → 0 transitions along the same direction. In unison, the sum of the spikes produced equals the number of 0 ↔ 1 transitions in the pixel space, i.e., the value of U(LBP_i^j). By assigning delays of 1-8 in the two sets of neurons and merging them using a bus-OR, we obtain a rate-coded representation of U(LBP_i^j). Notice, in Fig. 4, that the two neurons with the same delay never spike together, as they only respond to opposite polarities in their input. Notice also that no control axons are needed.
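In conventional terms, the quantity the two neuron sets compute together is simply the number of circular 0 ↔ 1 transitions; a minimal sketch (our own restatement, not the neuron-level implementation):

```python
def uniformity(bits):
    """U(.): number of circular 0<->1 transitions in the binary pattern,
    computed as the sum of 0->1 and 1->0 detections, mirroring the two
    neuron sets described above."""
    P = len(bits)
    rising  = sum(bits[k] == 0 and bits[(k + 1) % P] == 1 for k in range(P))
    falling = sum(bits[k] == 1 and bits[(k + 1) % P] == 0 for k in range(P))
    return rising + falling

assert uniformity([0, 0, 0, 1, 1, 1, 0, 0]) == 2   # uniform pattern (U <= 2)
assert uniformity([0, 1, 0, 1, 0, 1, 0, 1]) == 8   # non-uniform pattern
```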

The two sets of 8 neurons are merged via a bus-OR at the axon of type a in neurons I, J. These neurons output a single spike whenever their rate-coded input is at least 3 or at most 2, respectively. The operation of neuron I is similar to the ordered set indicator neurons, with the exception that there is no negative input and the neuron's α value is 3 more than the probing weight, so that a spike is produced only if at least 3 spikes enter.



Fig. 4. The Uniform Texture Detector corelet contains Cyclical Derivative neurons A-H and Indicator neurons I, J. The If-Else With Time Division Demultiplexing corelet contains neurons K-T for even and odd numbered frames (demultiplexing). See Secs. III-C and III-D for the data flow description.

Neuron J also makes use of a threshold initializer control axon d whose weight equals the desired threshold of 2 and sets the neuron potential to 2 at the start of each frame. An α = 127 is used and the other axons a, b, c have weights of -1, 127, -255, which enable neuron J to fire if and only if at most 2 spikes enter the neuron, corresponding to the case U(LBP_i^j) ≤ 2. These boolean values are used by the If-Else corelet described next.

D. If-Else With Time Division Demultiplexing

The If-Else With Time Division Demultiplexing corelet (cf. Figs. 2, 4) demonstrates how to implement a branching mechanism on networks of neurosynaptic cores. The corelet takes as input the boolean values produced by neurons I, J, a rate-coded version of \sum_{k=1}^{P_i} s(g_{k,i,j} - g_{i,j}) from the ordered set indicator output, as well as default else values of P_i + 1, and produces via neurons Q-T a burst-coded representation of \overline{LBP}_i^j. When the if conditional is true (i.e., when neuron J spikes), the value of \sum_{k=1}^{P_i} s(g_{k,i,j} - g_{i,j}) is routed to the output via an if neuron (one of Q, S); otherwise neuron I spikes and P_i + 1 is routed via an else neuron (one of R, T).

For larger P_i values (e.g., 16 and 24), the output is demultiplexed to achieve a 30 fps output (33 ticks per frame). This is due to the P_i + 1 tick window needed for \sum_{k=1}^{P_i} s(g_{k,i,j} - g_{i,j}) and the default else value to enter the corelet, the P_i + 1 tick window to output \overline{LBP}_i^j, and the 1 tick reset. Neurons Q, R output the result for even-numbered frames, and neurons S, T for odd-numbered frames. The default else neuron values are input via two axons, Else(even) and Else(odd), each of which receives P_i + 1 spikes every second frame.

Neurons K-P are AND gates. Each takes single-tick binary input from two axons, and outputs a single spike for that tick if and only if it received a spike from both axons. These neurons are implemented by setting the synapse weight of axon type a to 1, the spiking threshold to α = 1, and adding a negative leak of λ = -1, so that the membrane potential is decreased by 1 during each tick. A lower bound of 0 on the membrane potential guarantees that it never becomes negative. Periodic gating signals (Gating 1-4) are used to specialize each neuron for odd or even frames, ensuring that the boolean values produced by I, J are routed to the correct neuron Q-T.
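The AND behaviour follows directly from this weight/threshold/leak choice; a minimal single-tick check (our own illustration, ignoring any carried-over membrane potential):

```python
def and_gate_tick(spike_a, spike_b):
    """AND gate as a single neuron tick: weight 1 per input, leak -1,
    threshold 1, membrane potential floored at 0 (as described above)."""
    V = 0                       # carried-over potential is taken to be 0 here
    V += spike_a + spike_b      # synaptic integration, weight 1 on axon type a
    V += -1                     # leak integration
    V = max(V, 0)               # lower bound keeps the potential non-negative
    return 1 if V >= 1 else 0   # spikes iff both inputs arrived this tick

assert [and_gate_tick(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
```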

These latter four neurons are linear neurons, meaning that they spike if the potential exceeds the threshold α = 1, but upon spiking the potential is not reset; instead it is decreased by α, down to a non-negative lower bound. As a result, by setting axons a, b, c to synaptic weights of 1, 255, 255 and ensuring that each neuron resets to -255 (axon d), the bus-OR of neurons Q-T will represent \overline{LBP}_i^j.
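The behaviour of such a linear neuron can be restated in software as follows (our own sketch): the potential is decremented by the threshold on each spike rather than being reset, so over several ticks the number of emitted spikes tracks the integrated input.

```python
def linear_neuron(inputs, weight=1, threshold=1, lower_bound=0):
    """'Linear' neuron: at most one spike per tick; on firing the potential is
    decremented by the threshold instead of being reset, so the total spike
    count tracks the integrated input."""
    V, out = 0, []
    for x in inputs:                 # x: input arriving during this tick
        V += weight * x              # synaptic integration
        fired = V >= threshold
        if fired:
            V -= threshold           # decrease by alpha rather than hard reset
        V = max(V, lower_bound)      # non-negative lower bound
        out.append(int(fired))
    return out

assert sum(linear_neuron([2, 2, 0, 0])) == 4   # 4 units of input -> 4 spikes over 4 ticks
```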

E. Normalized Histogram

The Normalized Histogram corelet (cf. Figs. 2, 5) demonstrates how a fundamental routine in vision - normalized histograms - can be implemented in a neurosynaptic spiking architecture and provide state-of-the-art results in a vision application. The outputs of the If-Else corelet's neurons are merged by a bus-OR in axon a of neurons A-J. Each of these ten neurons outputs a single spike if the burst-coded input (000000111 in Fig. 5) is at least equal to a certain threshold (9, 8, 7, ..., 0 for neurons A, B, C, ..., J respectively). Their implementation is analogous to neuron I in Fig. 4. This effectively converts the burst code to a single-tick thermometer code used by neurons K-T. These latter neurons are also analogous to neurons A-H in Fig. 4 in that they respond to 1 → 0 transitions, with the exception that neuron T has a single input axon and responds to a single input spike, effectively replicating its input. In Fig. 5 this results in the output of the binary string 0001000000.
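In software terms, the cumulative and derivative stages turn a burst-coded count into a one-hot histogram vote; a minimal sketch follows (our own restatement, with the thresholds listed in ascending order rather than the A-to-J order of Fig. 5).

```python
def histogram_vote(n_spikes, n_bins=10):
    """Convert a burst-coded count (n_spikes in 0..n_bins-1) into a one-hot
    histogram vote, mirroring the cumulative (A-J) and derivative (K-T) stages."""
    # Cumulative stage: one indicator per threshold, fires iff n_spikes >= t.
    thermometer = [1 if n_spikes >= t else 0 for t in range(n_bins)]
    # Derivative stage: locate the single 1 -> 0 transition; the last neuron
    # simply replicates its input (covers the n_spikes = n_bins - 1 case).
    vote = [int(thermometer[k] and not thermometer[k + 1]) for k in range(n_bins - 1)]
    vote.append(thermometer[-1])
    return vote

assert histogram_vote(3) == [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # bin 4, cf. Fig. 5
```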

The length-ten binary string produced by neurons K-T for sample j of L_i is used as the sample's vote in the histogram. In other words, each string entry is sent to a neuron similar to U whose job is to accumulate the spikes from up to 256 × 15 samples. As the total number of samples m could potentially be much larger than 256 × 15, the corelet automatically determines how many neurons of type U it needs and distributes an approximately even number of samples across each of these neurons, thus creating c sub-averages.



Fig. 5. The Normalized Histogram corelet contains Cumulative Distribution neurons A-J, Derivative neurons K-T, and Accumulator neurons U, V for the first of ten histogram bins (Hist(1, i)). See Sec. III-E for a description of the data flow.

Each axon receives up to 15 bus-OR inputs, each from a neuron with a distinct delay 1-15. Thus neuron U effectively accepts up to 256 rate-coded inputs. U is a linear neuron (Sec. III-D) with the weight of axon type a and the threshold α appropriately set to output up to 2^q spikes proportional to its input. These c sub-averages are then averaged by another linear neuron V that uses weight 1 for axon type a and threshold α = c.

In Fig. 5 this results in \sum_{b=1}^{10} Hist(b, i) ≈ 2^q, where q is the desired number of bits in the histogram bin quantization. The rounding axon types b have a weight of ⌊α/2⌋ and add this bias to the membrane potential once per frame, since otherwise U, V would output a result closer to the floor of the averages. Through multiplexing techniques, as was demonstrated in Fig. 4, we can achieve a 30 fps output, despite the fact that typically 2^q > 33.

Fig. 6. Samples of the 180 × 180 pixel, 16-class test images from the Brodatz dataset (Contrib_TC_00000).

Fig. 7. Samples of the 128 × 128 pixel, 24-class test images from the Outex_TC_00010 dataset.

New coding techniques, such as binary codes, will result in further efficiencies and remove the need for multiplexing.

IV. EXPERIMENTS

We replicated the experimental method of [15] for classifying 16 texture classes from the Brodatz album (cf. Fig. 6) and 24 texture classes from the Outex dataset (cf. Fig. 7), providing a benchmark for investigating the features' contrast and rotational invariance.


TABLE I
ACCURACY RESULTS FOR THE 16-CLASS BRODATZ DATASET (Contrib_TC_00000).

P, R                 TrueNorth 9-bit   von Neumann A   von Neumann B
8,1                  86.9              87.1            88.2
16,2                 99.7              99.5            98.5
24,3                 98.1              97.8            99.1
8,1 + 16,2           99.7              99.7            99.0
8,1 + 24,3           98.8              98.2            99.6
16,2 + 24,3          99.7              99.8            99.0
8,1 + 16,2 + 24,3    100               99.8            99.1

TABLE II
ACCURACY RESULTS FOR THE 24-CLASS OUTEX DATASET (Outex_TC_00010).

P, R                 TrueNorth 10-bit   von Neumann A   von Neumann B
8,1                  79.1               80.4            85.1
16,2                 88.4               91.1            88.5
24,3                 91.9               94.1            94.6
8,1 + 16,2           89.2               90.8            93.1
8,1 + 24,3           94.6               94.9            96.3
16,2 + 24,3          94.8               96.1            95.4
8,1 + 16,2 + 24,3    95.6               96.2            96.1

We documented the performance of the neurosynaptic architecture in terms of its power consumption and accuracy vis-a-vis the von Neumann based implementations. As previously reported [15], in the case of the Brodatz dataset, a variant of the G-statistic [21] was used to assign a test sample S to the model class M that maximized the log-likelihood statistic L(S, M) = \sum_b S_b \log M_b, where S_b, M_b denote the values in bin b of the corresponding normalized histograms, and each model class M consisted of the average LBP histogram of the corresponding class training samples. In the case of the Outex dataset, each model M consisted of the LBP histogram of a single training set image, and each test sample was assigned the class of the majority of the three most similar models. The von Neumann implementations' and TrueNorth's outputs were sent to the same Matlab-based log-likelihood classifier.
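A minimal sketch of this off-chip classification step follows (our own illustration; the small eps term is only a numerical guard we add against empty bins).

```python
import numpy as np

def classify(S, models, eps=1e-12):
    """Assign the normalized test histogram S to the class whose model histogram
    M maximizes L(S, M) = sum_b S_b * log(M_b), as in the Brodatz protocol."""
    scores = {c: float(np.sum(S * np.log(M + eps))) for c, M in models.items()}
    return max(scores, key=scores.get)

# models: class label -> average normalized LBP histogram of that class's training
# samples; S: normalized LBP histogram of the test patch (e.g., the TrueNorth output).
```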

Table I shows the results on the Brodatz dataset, as reported in [15], where bilinear interpolation was used for sub-pixel sampling (referred to as von Neumann B)¹. In addition, the results of an implementation of the algorithm using nearest-neighbor interpolation for sub-pixel sampling are shown (referred to as von Neumann A). Finally, the column labelled TrueNorth 9-bit shows the neurosynaptic version of von Neumann A, where the TrueNorth architecture was used for the feature extraction (Fig. 2) and the output histograms were quantized to 9 bits (i.e., a total of 512 spikes were dispersed across the histogram representation of each L_i in the corresponding LBP feature).

¹ To validate the correctness of our experimental method, we also reproduced, within a negligible delta, the corresponding results reported in [15].


Fig. 8. Accuracy as a function of quantization for the Contrib_TC_00000 dataset and for all seven combinations of LBP parameters (P, R).


Fig. 9. Accuracy as a function of quantization for the Outex_TC_00010 dataset and for all seven combinations of LBP parameters (P, R).

Figure 8 shows the accuracy for various quantization levels. Similarly, Table II and Figure 9 show the corresponding results for the Outex dataset. Figure 10 shows active power consumption and core count benchmarks for three radii and six histogram quantization levels per radius, with 50 × 50 pixel frame inputs at 30 fps.

In terms of accuracy on the Brodatz dataset, TrueNorth performs as well as the von Neumann implementations of the same algorithm. In the case of the Outex dataset, TrueNorth provides performance similar to von Neumann A, while the use of multiple-radii features suffices to achieve performance within 0.5% of von Neumann B. We notice that a single radius value (24,3) is sufficient to provide performance similar to both von Neumann implementations at under 140 mW, implying that a typical smartphone battery can drive this algorithm for a week. The use of a joint histogram with 2 or 3 radii further increases TrueNorth's performance on the Outex dataset by 2-4%.



Fig. 10. Active power (left y-axis) for 18 models over a 1 s input sequence. Passive power was measured at ~100 mW at 1.0 V. Each model's core count (right y-axis) is represented by a plus (+). The data does not include the cost of converting pixel data to spikes and of interpreting the output spikes.

V. CONCLUSION

The von Neumann bottleneck is currently placing severe constraints on the processing speed, power consumption, reliability, scalability and programmability of the architectures that have driven the evolution of vision algorithms for over 50 years, motivating the search for alternate models of computation. We demonstrated the feasibility of using a network of spiking neurosynaptic cores to create a low-power texture feature extractor with accuracy similar to that of a standard von Neumann implementation of the same algorithm. In the process we introduced a number of algorithms for programming neurosynaptic architectures. Within this context, this work contributes to the development of novel classes of power efficient, asynchronous, mobility-targeted applications.

APPENDIX A

Fig. 11 depicts five different time-based coding schemes and three place-based codes. In the right subfigure we show the value 7 encoded by the five different time-based spike coding schemes, using a 1-pin input connector over a 15 tick window. In the left subfigure we show the encoding of the value 7 by three place-based codes, using a 10-pin input connector over 1 tick. In rate coding the information is encoded by the number of spikes over the time window. In burst coding the spikes are sequential and begin at the start of the window. In reverse burst coding the spikes are again sequential but begin at the end of the window. In time-to-spike coding the spike's time of occurrence within the time window denotes the value. In reverse time-to-spike coding the spike's time of occurrence from the end of the window denotes the value. In a thermometer code all spikes enter the connector at the same tick, but spikes enter using sequential pins starting from the first pin. In a reverse thermometer code the spikes enter using sequential pins starting from the last input connector pin. In binary population coding the spikes all enter at the same tick, and their pin order may or may not be important, depending on the context.
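For concreteness, the following sketch (our own illustration) encodes the value 7 under several of these schemes; the reverse variants simply mirror these in time or pin order.

```python
def rate_code(v, T=15):
    return [1] * v + [0] * (T - v)        # v spikes anywhere in the window (shown at the start)

def burst_code(v, T=15):
    return [1] * v + [0] * (T - v)        # v consecutive spikes from the start of the window

def time_to_spike(v, T=15):
    s = [0] * T
    s[v - 1] = 1                          # a single spike whose arrival tick encodes v
    return s

def thermometer(v, pins=10):
    return [1] * v + [0] * (pins - v)     # v sequential pins fire in the same tick

def binary_population(v, pins=10):
    return [(v >> p) & 1 for p in range(pins)]   # one pin per bit, all in the same tick

for code in (rate_code, burst_code, time_to_spike, thermometer, binary_population):
    print(code.__name__, code(7))
```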


Fig. 11. Three place-based (left) and five time-based (right) spike coding schemes. Spikes are represented as red dots. See text for details.


APPENDIX B

In Table III we show the neuron parameter values defining the dynamic behavior of the neurons, according to the neuron equation introduced in [17]. For neurons U, V of Figure 5, we show the case where 254 axons of neuron U are used (not counting the rounding and resetting control axons), where 15 source neurons are bus-OR-merged into each of these 254 axons, and q = 5, so that neuron U emits up to 2^5 = 32 spikes per frame. For neuron V we assume each non-control axon i receives up to x_i ≤ 32 spikes from each of c = 20 distinct source neurons, so that \sum_{i=1}^{c} x_i / c ≤ 32, and therefore neuron V gives the average response of its input.

TABLE III
THE NEURON PARAMETER VALUES OF THE NETWORKS.

Figure / Corelet                  Type j          [s_j^a, s_j^b, s_j^c, s_j^d]   ε_j  λ_j  c_j  α_j       β_j  M_j  R_j  κ_j  γ_j  delay  V_j(0)
Fig. 3 / Ordered Set Indicator    A-H             [1, 16, -1, -16]               0    0    0    16 × n_p  255  0    0    0    0    1      0
Fig. 4 / Cyclical Derivative      A               [-1, 1, 0, 0]                  0    0    0    1         0    0    0    1    0    1      0
                                  B               [1, -1, 0, 0]                  0    0    0    1         0    0    0    1    0    2      0
                                  C               [-1, 1, 0, 0]                  0    0    0    1         0    0    0    1    0    3      0
                                  D               [1, -1, 0, 0]                  0    0    0    1         0    0    0    1    0    4      0
                                  E               [-1, 1, 0, 0]                  0    0    0    1         0    0    0    1    0    5      0
                                  F               [1, -1, 0, 0]                  0    0    0    1         0    0    0    1    0    6      0
                                  G               [-1, 1, 0, 0]                  0    0    0    1         0    0    0    1    0    7      0
                                  H               [1, -1, 0, 0]                  0    0    0    1         0    0    0    1    0    8      0
Fig. 4 / Indicator                I               [1, 127, -255, 0]              0    0    0    130       0    0    0    0    0    1      0
                                  J               [-1, 127, -255, 2]             0    0    0    127       127  0    0    0    0    1      2
Fig. 4 / If-Else                  K-P             [1, 0, 0, 0]                   0    -1   0    1         0    0    0    0    0    1      0
                                  Q, S            [1, 255, 255, -255]            0    0    0    1         255  0    0    1    1    1      -255
                                  R, T            [1, 255, 255, -255]            0    0    0    1         255  0    0    1    1    1      -246
Fig. 5 / Cumulative Distribution  A-J             [1, 127, -255, 0]              0    0    0    136-127   0    0    0    0    0    1      0
Fig. 5 / Derivative               K, M, O, Q, S   [1, -1, 0, 0]                  0    0    0    1         0    0    0    1    0    1      0
                                  L, N, P, R, T   [-1, 1, 0, 0]                  0    0    0    1         0    0    0    1    0    1      0
Fig. 5 / Accumulator              U               [1, 59, -255, 0]               0    0    0    119       0    0    0    1    1    1      59
                                  V               [1, 10, -255, 0]               0    0    0    20        0    0    0    1    1    1      10

REFERENCES

[1] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, vol. 345, no. 6197, pp. 668-673, 2014.

[2] A. Andreopoulos and J. K. Tsotsos, "50 years of object recognition: Directions forward," Computer Vision and Image Understanding, vol. 117, no. 8, pp. 827-891, 2013.

[3] A. S. Cassidy, R. Alvarez-Icaza, et al., "Real-time scalable cortical computing at 46 giga-synaptic OPS/watt with ~100x speedup in time-to-solution and ~100,000x reduction in energy-to-solution," in International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2014.

[4] E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision," in Computational Models of Visual Processing. MIT Press, 1991, pp. 3-20.

[5] M. S. Landy and N. Graham, Visual Perception of Texture. MIT Press, 2004.

[6] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.

[7] M. Heikkilä and M. Pietikäinen, "A texture-based method for modeling the background and detecting moving objects," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 657-662, 2006.

[8] M. Heikkilä, M. Pietikäinen, and C. Schmid, "Description of interest regions with local binary patterns," Pattern Recognition, vol. 42, no. 3, pp. 425-436, 2009.

[9] S. Lazebnik, C. Schmid, and J. Ponce, "A sparse texture representation using affine-invariant regions," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2003, pp. II-319.

[10] J. Malik, S. Belongie, T. Leung, and J. Shi, "Contour and texture analysis for image segmentation," International Journal of Computer Vision, vol. 43, no. 1, pp. 7-27, 2001.

[11] X. Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1635-1650, 2010.

[12] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61-81, 2005.

[13] G. Zhao, T. Ahonen, J. Matas, and M. Pietikäinen, "Rotation-invariant image and video description with local binary pattern features," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1465-1477, 2012.

[14] T. Randen and J. Husoy, "Filtering for texture classification: A comparative study," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 291-310, 1999.

[15] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.

[16] A. Andreopoulos, B. Taba, A. Cassidy, R. Alvarez-Icaza, M. D. Flickner, W. P. Risk, A. Amir, P. A. Merolla, J. V. Arthur, D. J. Berg, J. A. Kusnitz, P. Datta, S. K. Esser, R. Appuswamy, D. R. Barch, and D. S. Modha, "Visual saliency on networks of neurosynaptic cores," IBM Journal of Research and Development, 2015.

[17] A. S. Cassidy, P. A. Merolla, J. V. Arthur, S. K. Esser, B. L. Jackson, R. Alvarez-Icaza, P. Datta, J. Sawada, T. M. Wong, V. Feldman, A. Amir, D. B.-D. Rubin, F. Akopyan, E. McQuinn, W. P. Risk, and D. S. Modha, "Cognitive computing building block: A versatile and efficient digital neuron model for neurosynaptic cores," in IEEE International Joint Conference on Neural Networks, 2013.

[18] A. Amir, P. Datta, W. P. Risk, A. S. Cassidy, J. A. Kusnitz, S. K. Esser, A. Andreopoulos, T. M. Wong, M. D. Flickner, R. Alvarez-Icaza, E. McQuinn, B. Shaw, N. Pass, and D. S. Modha, "Cognitive computing programming paradigm: A corelet language for composing networks of neurosynaptic cores," in IEEE International Joint Conference on Neural Networks, 2013.

[19] R. Preissl, T. M. Wong, P. Datta, M. D. Flickner, R. Singh, S. K. Esser, W. P. Risk, H. D. Simon, and D. S. Modha, "Compass: A scalable simulator for an architecture for cognitive computing," in International Conference on High Performance Computing, Networking, Storage and Analysis, 2012.

[20] B. Wandell, A. El Gamal, and B. Girod, "Common principles of image acquisition systems and biological vision," Proceedings of the IEEE, vol. 90, no. 1, pp. 5-17, 2002.

[21] R. Sokal and F. Rohlf, Biometry: The Principles and Practice of Statistics in Biological Research. New York: W. H. Freeman and Co., 1969.
