Memory Integrated Neural Network Accelerators
nice.sandia.gov/documents/Tarek_Taha_nice_2013.pdf
1/37 Towards Memristor based Neuromorphic Architectures Tarek Taha
Tarek Taha
Electrical and Computer Engineering Department,
University of Dayton
February 26, 2013
Memory Integrated Neural Network
Accelerators
Objective
[Figure: multicore neural network processor. A grid of cores (C) and routers (R) connected by buses, with input data fed into the array.]
Design multicore neural network processors
Examine:
• Routing
• Memory options
• Analog and Digital options
Related Work: SpiNNaker
Each chip contains 18 ARM cores and a network router
Geared towards brain scale simulation
Cores share 128MB SDRAM stacked within the package
Multiple chips connected in a 2D toroidal network
Related Work: FACETS
Mixed signal ASIC wafer
Geared towards brain scale simulation
200k neurons and 50 million synapses
Synaptic weight is represented in a 4-bit SRAM with a 4-bit DAC
Related Work: IBM design
Crossbar memory based digital core
Each core can simulate 256 integrate-and-fire neurons
1024 synapses per neuron
One bit per synapse
Implementing Synapses
Memory elements to implement synapses:
                    DRAM        SRAM       Memristor crossbar
Location            Off-chip    On-chip    On-chip
Density (Gb/cm2)    6.67        0.338      12
Read time           ~100        0.3        0.3
Memristors can also mimic neural computing circuits
[Figure: stacked memory layers labeled Level 1 and Level 2]
Digital Core Design
[Figure: digital core block diagram]
• Input buffer: pre-synaptic neuron inputs (i, xi) coming in over the routing network from other cores
• Decoder: takes the pre-synaptic neuron number i and selects the weights Wij in the SRAM synaptic memory array (N axons, M neurons)
• Multiply-add-accumulate units (×, +, reg) combine Wij with the pre-synaptic neuron value xi to calculate neural outputs
• Control unit
• Output buffer: post-synaptic neuron outputs (j, vj) sent to the routing network
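A minimal sketch of the multiply-add-accumulate flow described for the digital core, written in plain Python. The function name, array sizes, and weight values are hypothetical and chosen only for illustration; the real core does this in hardware, one SRAM row per incoming event.

```python
def digital_core_step(weights, events):
    """Accumulate Wij * xi into one register per post-synaptic neuron j.

    weights: N x M list of lists (N axons, M neurons)
    events:  list of (i, xi) pairs taken from the input buffer
    """
    num_neurons = len(weights[0])
    acc = [0] * num_neurons              # one accumulator register per neuron
    for i, x_i in events:
        row = weights[i]                 # decoder selects row i of the synaptic array
        for j in range(num_neurons):
            acc[j] += row[j] * x_i       # multiply-add-accumulate
    return acc

# Hypothetical example: N = 4 axons, M = 3 neurons.
W = [[1, 0, 2],
     [0, 3, 1],
     [2, 2, 0],
     [1, 1, 1]]
events = [(0, 1), (2, 1), (3, 2)]        # (pre-synaptic neuron number, value)
v = digital_core_step(W, events)
print(list(enumerate(v)))                # (j, vj) pairs for the output buffer
```

With 1-bit neuron values, xi is always 1 for an arriving event, so the multiply reduces to an add.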
Digital Core Configurations
Configurations:
• 2 bits per synapse, 1 bit per neuron: multipliers not needed
• 4 bits per synapse: multiplier and adder needed
Analysis:
• SRAM array power and area calculated from CACTI
• Only the SRAM array, predecoder, and sense amplifiers considered
• Both static and dynamic power considered
Routing Analysis
[Figure: cores assigned to network layers l-1, l, and l+1, with numeric annotations on the cores and links]
Model developed to determine routing network bandwidth requirements
Router and link powers calculated from Orion
New Memristor SPICE Model
A generalized memristor SPICE model has been developed
• Capable of modeling several different published memristor devices for a variety of voltage inputs
The results of our model have been compared to the published characterization data using a quantifiable error percentage
Variation tolerance studies have been performed at the device and circuit level
• Layer thickness variation
Variation in state variable dynamics has also been studied
• Caused by inconsistent stoichiometric ratio in the metal-oxide layer
• Leads to variability in the number of oxygen vacancies in each device on a wafer
Model Equations
State Variable Motion
Non-Linear Drift
Voltage Threshold
Current-Voltage (I-V) Relationship
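The equations for these four parts did not survive in this transcript. As a rough illustration only, the sketch below follows the general form of the generalized model as published in the Yakopcic et al. IEEE Electron Device Letters paper cited in this talk, as best it can be reconstructed here: a sinh I-V relationship, a thresholded exponential g(V), and a windowed non-linear drift f(x), combined as dx/dt = eta * g(V) * f(x). All parameter values are hypothetical and are not fitted to any device.

```python
import math

# Hypothetical parameter values, for illustration only (not fitted to a device).
P = dict(a1=0.1, a2=0.1, b=1.0, Vp=0.5, Vn=0.5, Ap=100.0, An=100.0,
         xp=0.3, xn=0.3, alpha_p=1.0, alpha_n=1.0, eta=1.0)

def current(v, x, p=P):
    """Current-voltage relationship: I = a * x * sinh(b * V)."""
    a = p["a1"] if v >= 0 else p["a2"]
    return a * x * math.sinh(p["b"] * v)

def g(v, p=P):
    """Voltage threshold: no state change unless V exceeds Vp or falls below -Vn."""
    if v > p["Vp"]:
        return p["Ap"] * (math.exp(v) - math.exp(p["Vp"]))
    if v < -p["Vn"]:
        return -p["An"] * (math.exp(-v) - math.exp(p["Vn"]))
    return 0.0

def f(v, x, p=P):
    """Non-linear drift: state motion slows as x approaches either boundary."""
    if p["eta"] * v > 0:
        if x >= p["xp"]:
            window = (p["xp"] - x) / (1 - p["xp"]) + 1
            return math.exp(-p["alpha_p"] * (x - p["xp"])) * window
        return 1.0
    if x <= 1 - p["xn"]:
        window = x / (1 - p["xn"])
        return math.exp(p["alpha_n"] * (x + p["xn"] - 1)) * window
    return 1.0

def simulate(freq_hz=100.0, amp=1.0, dt=1e-6, cycles=2, x0=0.1, p=P):
    """State variable motion, dx/dt = eta * g(V) * f(x), by forward Euler."""
    x, trace = x0, []
    steps = int(cycles / freq_hz / dt)
    for n in range(steps):
        v = amp * math.sin(2 * math.pi * freq_hz * n * dt)
        x += p["eta"] * g(v, p) * f(v, x, p) * dt
        x = min(max(x, 0.0), 1.0)        # keep the state variable in [0, 1]
        trace.append((v, current(v, x, p), x))
    return trace

trace = simulate()
print(len(trace), min(t[2] for t in trace), max(t[2] for t in trace))
```

Plotting current against voltage from `trace` should give a pinched hysteresis loop; forward Euler with a fixed step is used here for brevity, whereas a SPICE simulator would choose its own time steps.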
Modeling – Sinusoidal Inputs
[Figure: sinusoidal-input results (voltage and current vs. time, and I-V curves) for the Boise State device compared with the Pino compact model, Univ. of Dayton model, HP Labs model, and Biolek model. Legend: 100 Hz, 100 Hz targets, 100 kHz.]
Modeling – Sweeping Inputs
[Figure: I-V curves (current vs. voltage) for sweeping inputs, comparing the Boise State device with the Univ. of Dayton model*, HP Labs model, Joglekar model, and Biolek model]
*C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and S. Rogers, “A Memristor Device Model,” IEEE Electron Device Letters, 2011.
C. Yakopcic, T. M. Taha, G. Subramanyam, E. Shin, P. T. Murray, and S. Rogers, “Memristor-Based Pattern Recognition for Image Processing: an Adaptive Coded Aperture Imaging and Sensing Opportunity,” SPIE Adaptive Coded Aperture Imaging and Non-Imaging Sensors, (2010).
UD Model: Boise State Device
[Figure: model result vs. data from [1]. I-V curve and current/voltage waveforms over time (legend: 100 Hz, 100 Hz targets, 100 kHz), plus a swept-input I-V curve.]
Model matches the target data with 6.64% error [1]
Model matches the target data with 6.66% error [1]
[1] A. S. Oblea, A. Timilsina, D. Moore, and K. A. Campbell, “Silver Chalcogenide Based Memristor Devices,” International Joint Conference on Neural Networks (IJCNN), (2010).
UD Model: Univ of MI Device
Model matches the target data with 6.21% error
[Figure: our model result vs. characterization data from [1]. Current vs. voltage, and current and voltage vs. time.]
[1] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale Memristor Device as Synapse in Neuromorphic Systems,” Nano Letters, 10 (2010).
UD Model: HP Labs Device
Model matches the target data with 11.58% error [1]
Model matches the target data with 13.6% error (8.72% when not considering the largest outlier [2])
[Figure: for each of the two data sets, an I-V curve and current and voltage waveforms over time]
[1] G. S. Snider, “Cortical Computing with Memristive Nanodevices,” SciDAC Review, (2008).
[2] J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart and R. S. Williams, “Memristive switching mechanism for metal/oxide/metal nanodevices,” Nature Nanotechnology, 3, 429–433 (2008).
UD Model: HP Labs TaOx Device
[Figure: current (mA) and voltage (V) vs. time, and I-V curve]
Model matches the target data with 4.60% error
[1] F. Miao, J. P. Strachan, J. J. Yang, M.-X. Zhang, I. Goldfarb, A. C. Torrezan, P. Eschbach, R. D. Kelley, G. Medeiros-Ribeiro, R. S. Williams, “Anatomy of a Nanoscale Conduction Channel Reveals the Mechanism of a High-Performance Memristor,” Advanced Materials, vol. 23, no. 47, pp. 5633-5640, Nov. 2011.
UD Model: Univ. of Iowa Device
[Figure: current (mA) and voltage (V) vs. time, and I-V curve]
Matches physical characterization with 5.97% error
K. Miller, K. S. Nalwa, A. Bergerud, N. M. Neihart, and S. Chaudhary, “Memristive Behavior in Thin Anodic Titania,” IEEE Electron Device Letters 31(7), (2010).
K. Miller, “Fabrication and modeling of thin-film anodic titania memristors,” Master’s Thesis, Iowa State University, Electrical and Computer Engineering (VLSI), Ames, Iowa, (2010).
Memristor Modeling
Our model is more robust and accurate than the other models studied
Results are based on a 16-memristor crossbar circuit with a 10 ns switching time
Memristor models included in the study:
Measurement              Biolek   Joglekar   Laiho                     Chang    Our Model
Time before error (µs)   0.517    0.517      15 (finished: no error)   0.135    15 (finished: no error)
Accuracy                 Low      Low        Medium                    Medium   High
Generality               High     High       Medium                    Medium   High
[Figure: crossbar with rows a1 to a4 and columns b1 to b4]
Crossbar Write Technique
The write method used prevents state changes in unselected devices, although it does not solve the problem for reads
[Figure: two-step write of one crossbar row, with lines driven at Vw/2 and -Vw/2]
• Initial data from previous cycle: X X X X
• Step 1 (write 1s): data after step 1 is X 1 X 1
• Step 2 (write 0s): data after the complete write cycle is 0 1 0 1
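A small sketch of how a two-step half-voltage write of this kind can leave unselected devices alone. It assumes an idealized device that switches only when the voltage across it reaches a threshold between Vw/2 and Vw, and it assumes unselected rows are grounded; the voltage values and the function name are hypothetical.

```python
V_W = 2.0    # hypothetical write voltage
V_TH = 1.2   # hypothetical switching threshold, chosen so V_W/2 < V_TH < V_W

def write_row(array, selected, target):
    """Two-step write of target bits into one row; modifies and returns array."""
    half = V_W / 2
    # Columns that should hold a 1 go to -Vw/2, the others to +Vw/2.
    col_v = [-half if bit else half for bit in target]
    for sel_row_v in (half, -half):      # step 1 writes the 1s, step 2 writes the 0s
        for r, row in enumerate(array):
            # Assumption: unselected rows are grounded, so they see only +/- Vw/2.
            row_v = sel_row_v if r == selected else 0.0
            for c, v in enumerate(col_v):
                drop = row_v - v
                if drop >= V_TH:
                    row[c] = 1
                elif drop <= -V_TH:
                    row[c] = 0
    return array

cells = [[None, None, None, None],       # None marks unknown data (X) from the previous cycle
         [1, 0, 1, 0]]
write_row(cells, selected=0, target=[0, 1, 0, 1])
print(cells)                             # row 0 holds the target; row 1 is untouched
```

Only cells in the selected row ever see the full Vw, so only they change state; every other cell sees at most Vw/2, which stays below the assumed threshold.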
Unwanted Current Paths
Alternate current paths in the resistor array consume power and degrade the signal.
[Figure: two crossbar diagrams (rows a1 to a4, columns b1 to b4). One shows the correct read between a1 and b1; the other shows an incorrect read between a1 and b1 due to the alternate current path.]
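The effect of an alternate current path can be illustrated on the smallest possible case, a 2×2 resistive crossbar with the unselected lines left floating. This is a simplified sketch, not the SPICE setup used in the talk; the on/off resistance values are hypothetical.

```python
R_ON, R_OFF = 1e3, 1e6   # hypothetical on/off resistances in ohms

def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def read_a1_b1(r11, r12, r21, r22):
    """Resistance seen between a1 and b1 when a2 and b2 are left floating.

    The direct path is r11. The alternate path runs a1 -> b2 -> a2 -> b1,
    which is r12, r22, and r21 in series.
    """
    sneak = r12 + r22 + r21
    return parallel(r11, sneak)

isolated = R_OFF                                 # the target device on its own
in_array = read_a1_b1(R_OFF, R_ON, R_ON, R_ON)   # same device, neighbors all on
print(isolated, in_array)
```

With these values the off-state target reads as roughly 3 kΩ instead of 1 MΩ, so it is easily mistaken for an on-state device. Larger arrays add many more parallel paths, which makes the problem worse.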
Tiled Memristor Digital Memory
[Figure: untiled memory array vs. tiled memory array. The tiled array shows inputs, outputs, memristors, and access lines for tile row 0 and tile row 1.]
• Full crossbar simulated in SPICE
• Wire resistances simulated
• Untiled memory array write energy very high
• Read noise margin low
• Designed a tiled memristor memory array
Memristor Memory Properties
Crossbar size   Training energy per synapse   Read energy per synapse
4×4             1.90 pJ                       1.84 fJ
8×8             5.07 pJ                       2.00 fJ
• Training energy increases with crossbar size.
• Novel segmented memristor crossbar memory.
• 4x4 crossbar: 2 bits per transistor
• 8x8 crossbar: 4 bits per transistor
[Figure: segmented crossbar with inputs, outputs, memristors, and access lines for tile row 0 and tile row 1]
Classifier Simulation
2-layer neural classifier
Based on a 16-memristor crossbar
Iteratively trained through MATLAB and SPICE
[Figure: image classification set of 4×4 images in four classes (Class 1 to Class 4), and the crossbar circuit with inputs I1 to I4, outputs O1 to O4, and voltage input, write enable, read enable, and voltage output signals]
Simulation Setup
Simulation platform
• Uses SPICE to produce accurate crossbar circuit simulation
• Output voltages and synaptic weights are analyzed in MATLAB
MATLAB / C:
1. Generate weights and write to a SPICE configuration file
2. Call SPICE program with the new weight configuration file
SPICE:
3. Read weights from file
4. Simulate crossbar with different inputs and record outputs
5. Export output voltages to file
MATLAB / C:
6. Evaluate output voltages from export file
7. Revise weights based on outputs
8. Go to step 2
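The eight steps above can be sketched as a file-based loop. In this illustration the SPICE run is replaced by a plain weighted sum, the files are JSON rather than a SPICE configuration file, and the weight update is a simple delta rule; all three are stand-in assumptions, since the talk does not specify the file format or the update rule.

```python
import json
import os
import tempfile

def simulate_crossbar(weight_file, inputs, out_file):
    """Stand-in for the SPICE run (steps 3-5): a weighted sum, not a circuit simulation."""
    with open(weight_file) as fh:
        w = json.load(fh)                                    # step 3: read weights
    outs = [[sum(x[i] * w[i][j] for i in range(len(x)))
             for j in range(len(w[0]))] for x in inputs]     # step 4: simulate
    with open(out_file, "w") as fh:
        json.dump(outs, fh)                                  # step 5: export outputs

def train(inputs, targets, rate=0.1, max_iters=200, tol=0.05):
    n_in, n_out = len(inputs[0]), len(targets[0])
    w = [[0.0] * n_out for _ in range(n_in)]
    tmp = tempfile.mkdtemp()
    wf = os.path.join(tmp, "weights.json")
    of = os.path.join(tmp, "outputs.json")
    for it in range(max_iters):
        with open(wf, "w") as fh:
            json.dump(w, fh)                                 # step 1: write weights
        simulate_crossbar(wf, inputs, of)                    # step 2: call simulator
        with open(of) as fh:
            outs = json.load(fh)                             # step 6: evaluate outputs
        worst = 0.0
        for x, y, t in zip(inputs, outs, targets):           # step 7: revise weights
            for j in range(n_out):
                err = t[j] - y[j]
                worst = max(worst, abs(err))
                for i in range(n_in):
                    w[i][j] += rate * err * x[i]
        if worst < tol:
            break                                            # otherwise step 8: repeat
    return w, it + 1

# Hypothetical training set: four one-hot input patterns, one per class.
patterns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
w, iters = train(patterns, patterns)
print(iters, [round(w[i][i], 3) for i in range(4)])
```

Keeping the simulator behind a file interface mirrors the split on the slide: the training side never needs to know how the crossbar is evaluated, so the stand-in could be swapped for a real SPICE invocation.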
Image Conversion
Image data is converted to a SPICE waveform
Each row of the image is converted to a voltage with 16 possible levels
Entire image is represented by 4 voltage pulses
[Figure: example 4×4 image with rows 1111, 0001, 0001, 0000 and voltage labels 1 V, 1/15 V, and 0 V; plots of the input data set (voltage vs. time in ns for Neurons 1 to 4) and the class output (voltage vs. time in ns for Class 1 and Classes 2, 3, 4)]
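One plausible reading of the row-to-voltage conversion, consistent with the 1 V, 1/15 V, and 0 V labels in the figure, is that each 4-pixel row is treated as a 4-bit binary number and scaled to a 0 to 1 V range. The sketch below makes that assumption explicit (most significant bit first); the function name is hypothetical.

```python
def row_to_voltage(bits, v_max=1.0):
    """Map a row of binary pixels to one of 2**len(bits) evenly spaced voltage levels.

    Assumption: the row is read as a binary number, most significant bit first.
    """
    value = int("".join(str(b) for b in bits), 2)
    return v_max * value / (2 ** len(bits) - 1)

image = [[1, 1, 1, 1],
         [0, 0, 0, 1],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
pulses = [row_to_voltage(row) for row in image]   # one voltage pulse per image row
print(pulses)
```

For this example image the four pulses are 1 V, 1/15 V, 1/15 V, and 0 V.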
Simulation Result (1/2)
Crossbar is performing as a linear separator
Can differentiate between all 4 image classes
[Figure: input images for Class 1 and Class 2, each with bar charts of output voltage (V) per output neuron (1 to 4)]
Simulation Result (2/2)
Crossbar is performing as a linear separator
Can differentiate between all 4 image classes
[Figure: input images for Class 3 and Class 4, each with bar charts of output voltage (V) per output neuron (1 to 4)]
Parallel operation
[Figure: two crossbars sharing outputs O1 to O4, with inputs I0 to I7 and voltage input, write enable, read enable, and voltage output signals]
• Read enable signal and diode allow multiple crossbars to be read in parallel
• Multipliers and adders not needed
Analog Memristor Core
[Figure: analog memristor core. Pre-synaptic neuron values (PSNV) drive a memristor crossbar array through blocks labeled AC, in place of the SRAM array, decoder, and multiply-accumulate units of the digital core; outputs go to the output buffer.]
• Analog crossbar array replaces digital memory AND multiply-add units
• Full crossbar can be evaluated in parallel
• Most of the core can be shut down once computations are done
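The reason a crossbar can stand in for the multiply-add units is that, in the ideal case, each column current is a weighted sum: Ohm's law gives the current through each device and Kirchhoff's current law adds them on the column wire. A minimal sketch under that idealization (no wire resistance, no sneak paths, linear devices); the voltage and conductance values are hypothetical.

```python
def crossbar_currents(voltages, conductances):
    """Ideal crossbar read: I_j = sum over i of V_i * G_ij.

    voltages:     one input voltage per row (pre-synaptic neuron value)
    conductances: rows x columns matrix of device conductances (synaptic weights)
    """
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]

# Hypothetical 2x2 example: voltages in volts, conductances in siemens.
V = [1.0, 0.5]
G = [[1e-3, 2e-3],
     [4e-3, 0.0]]
I = crossbar_currents(V, G)
print(I)   # column currents in amperes
```

All columns are evaluated at once in the physical array, which is what lets the full crossbar be processed in parallel rather than one row per cycle.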
Digital Memristor Core
• Digital crossbar array replaces SRAM memory array
• One row of synapses processed per cycle
• Most of the core can be shut down once computations are done
[Figure: digital memristor core. A memristor crossbar array takes the place of the SRAM array; the decoder, pre-synaptic neuron number and value inputs, multiply-accumulate units, and output buffer remain.]
Core Characteristics
Cores for 1 bit per neuron case (multiplies not needed):
Config. | Synaptic memory device | Memory cells per synapse | Bits per synapse | Adder | Core area (mm2) | Energy per neuron (pJ) | Core throughput (neurons/s)
1 | Memristor | 1 | Analog | Current add | 0.037 | 23 | 263.9×10^6
2 | Memristor | 2 | 2 bits | 12 bit adder | 0.098 | 240 | 42.1×10^6
3 | SRAM | 2 | 2 bits | 12 bit adder | 0.288 | 377 | 42.1×10^6
Cores for 4 bits per neuron case:
Config. | Synaptic memory device | Memory cells per synapse | Bits per synapse | Multiplier | Adder | Core area (mm2) | Energy per neuron (pJ) | Core throughput (neurons/s)
4 | Memristor | 1 | Analog | Memr. resist. | Current add | 0.058 | 26 | 66.3×10^6
5 | Memristor | 4 | 4 bits | 4 bit multiplier | 18 bit adder | 0.179 | 391 | 28.6×10^6
6 | SRAM | 4 | 4 bits | 4 bit multiplier | 18 bit adder | 0.513 | 780 | 26.6×10^6
Cycle Time: 5ns (200MHz clock)
Technology: 40nm
Energy considers leakage, data transfer to/from router, and control circuits.
General Purpose Systems Comparison
Programmed commercial systems:
• Data set was small enough to fit in cache/onboard memory
• Simulated neural networks with 1024 synapses per neuron
NVIDIA Tesla M2070 GPGPU:
• 225W max power
• 448 cores
• 575 MHz
• CUDA program
Intel Xeon X5650 processor:
• 95W max power
• Six cores
• 2.66 GHz
• SSE instructions modeled
Comparison: 1 Bit Neurons
Example 1: 25,600 neurons, 100,000 iterations/s, 1024 synap./neuron
Configuration | # of chips | Chip area (mm2) | % Active | Power (W) | Power eff. over Xeon
Memristor Analog | 1 | 3.7 | 9.7% | 0.07 | 253,489
Memristor Digital | 1 | 9.7 | 60.8% | 0.62 | 27,546
SRAM | 1 | 35.2 | 60.8% | 1.13 | 15,099
NVIDIA M2070 | 12 | 529.0 | 99.2% | 2700.00 | 6
Intel Xeon X5650 | 179 | 240.0 | 99.9% | 17005.00 | 1
Example 2: 1,706,667 neurons, 1500 iterations/s, 1024 synap./neuron
Configuration | # of chips | Chip area (mm2) | % Active | Power (W) | Power eff. over Xeon
Memristor Analog | 2 | 248 | 0.15% | 0.70 | 24,395
Memristor Digital | 2 | 333 | 0.91% | 1.25 | 13,633
SRAM | 5 | 388 | 0.91% | 28.02 | 607
NVIDIA M2070 | 12 | 529 | 99.2% | 2700.00 | 6
Intel Xeon X5650 | 179 | 240 | 99.90% | 17005.00 | 1
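The "% Active" figures for the specialized cores appear to follow from the per-core throughput numbers given earlier, if one assumes 256 neurons per core (the size quoted in the digital vs. analog comparison). This is an inference from the numbers, not something the slides state; the sketch below checks it. Both examples also demand the same total rate of about 2.56×10^9 neuron updates per second, which may be why the GPGPU and Xeon rows are identical in the two examples.

```python
NEURONS_PER_CORE = 256   # assumption, taken from the digital vs. analog comparison

def percent_active(neurons, iterations_per_s, core_throughput):
    """Share of time the cores must be busy to sustain the requested update rate."""
    cores = neurons / NEURONS_PER_CORE
    required = neurons * iterations_per_s        # neuron updates per second
    return 100.0 * required / (cores * core_throughput)

ANALOG_1BIT = 263.9e6    # core throughput in neurons/s, memristor analog core
DIGITAL_1BIT = 42.1e6    # core throughput in neurons/s, memristor digital and SRAM cores

ex1_analog = percent_active(25_600, 100_000, ANALOG_1BIT)
ex1_digital = percent_active(25_600, 100_000, DIGITAL_1BIT)
ex2_analog = percent_active(1_706_667, 1_500, ANALOG_1BIT)
ex2_digital = percent_active(1_706_667, 1_500, DIGITAL_1BIT)
print(round(ex1_analog, 1), round(ex1_digital, 1),
      round(ex2_analog, 2), round(ex2_digital, 2))
```

Under this assumption the four results round to 9.7%, 60.8%, 0.15%, and 0.91%, matching the tabulated values. A low active percentage is what lets most of a core be shut down between iterations.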
Comparison: 4 Bit Neurons
Example 1: 25,600 neurons, 100,000 iterations/s, 1024 synap./neuron
Configuration | # of chips | Chip area (mm2) | % Active | Power (W) | Power eff. over Xeon
Memristor Analog | 1 | 5.9 | 38.6% | 0.07 | 234,859
Memristor Digital | 1 | 18.2 | 89.6% | 0.62 | 16,968
SRAM | 1 | 29.1 | 89.6% | 1.13 | 8,215
NVIDIA M2070 | 12 | 529.0 | 99.2% | 2700.00 | 6
Intel Xeon X5650 | 179 | 240.0 | 99.9% | 17005.00 | 1
Example 2: 1,706,667 neurons, 1500 iterations/s, 1024 synap./neuron
Configuration | # of chips | Chip area (mm2) | % Active | Power (W) | Power eff. over Xeon
Memristor Analog | 1 | 395 | 0.58% | 0.70 | 24,210
Memristor Digital | 4 | 303 | 1.34% | 1.63 | 10,419
SRAM | 9 | 383 | 1.34% | 48.67 | 349
NVIDIA M2070 | 12 | 529 | 99.2% | 2700.00 | 6
Intel Xeon X5650 | 179 | 240 | 99.9% | 17005.00 | 1
Comparison: Digital vs Analog cores
256 neurons with 1024 synapses each. Synapses hold 16 possible values.
Property | Digital SRAM core | Analog core
Area | 29.1 mm2 | 5.9 mm2
Arithmetic | Adder and multiplier needed | Current summation
Throughput (per neuron) | 1 synapse processed/cycle | 1024 synapses processed/cycle
Power | 1.13 W | 0.07 W
Digital SRAM core relative to analog: 16x more energy*, slower add/multiply, 6x area
*Energy calculation considers leakage and data routing power
[Figure: digital core (neuron number decoder, 1024x1024 SRAM array, ×,+ units, output buffer) beside analog core (inputs, crossbar with AC blocks, outputs)]
Conclusion
Specialized cores very efficient for neural network acceleration
Further improvements possible for both SRAM and memristor cores
Memristor cores efficient when carrying out recognition tasks
Acknowledgement: This work was partially supported by an NSF Award