Parallel Stateful Logic in RRAM: Theoretical Analysis and ...
Transcript of Parallel Stateful Logic in RRAM: Theoretical Analysis and ...
![Page 1: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/1.jpg)
Parallel Stateful Logic in RRAM: Theoretical Analysis and Arithmetic Design
Feng Wang, Guojie Luo, Guangyu Sun, Jiaxi Zhang, Peng Huang, Jinfeng Kang
2019/07/16
![Page 2: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/2.jpg)
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
![Page 3: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/3.jpg)
RRAM Resistance for Representing Logic Values
[1] Akinaga, H., & Shima, H. (2010). Resistive random access memory (ReRAM) based on metal oxides. Proceedings of the IEEE.
3
HRS (logical 0)
LRS(logical 1)
SET (πππΏ β ππ΅πΏ > πππΈπ )
RESET (ππ΅πΏ β πππΏ > ππ πΈππΈπ )
metal
metal
insulator
WL
BL
![Page 4: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/4.jpg)
RRAM State Switching as Primitive Logic Operations
[2] Kvatinsky, S., Member, S., Belousov, D., Liman, S., Satat, G., Member, S., β¦ Weiser, U. C. (2014). MAGIC β Memristor-Aided Logic. TCAS-II.
4
MAGIC NOR implementation π = NOR(π, π)
β ππΊ > 2 β ππ πΈππΈπ
Parallel execution over rows and columns
β WL parallelism: π ππ = NOR π π1, π π2 π β [1,π]
β BL parallelism: π ππ = NOR(π 1π , π 2π) π β [1, π]
VG VG GND R1
R1
R1
R1 R1
R1
R1
R1
R1
BL voltage controller
WL1
WL2
WLm
BL1 BL2 BLm
R11 R12 R1m
R21 R22 R2m
Rm1 Rm2 Rmm
RRAM cell
![Page 5: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/5.jpg)
RRAM-based Stateful Logic Families
[3] Borghetti, J., Snider, G. S., Kuekes, P. J., Yang, J. J., Stewart, D. R., & Williams, R. S. (2010). βMemristiveβ switches enable βstatefulβ logic operations via
material implication. Nature.
[4] Huang, P., Kang, J., Zhao, Y., Chen, S., Han, R., Zhou, Z., β¦ Liu, X. (2016). Reconfigurable Nonvolatile Logic Operations in Resistance Switching Crossbar
Array for Large-Scale Circuits. Advanced Materials.
[5] Avati, V., Eggert, K., & Taylor, C. (2018). FELIX: Fast and Energy-Efficient Logic in Memory. In ICCAD.
[6] Xu, L., Bao, L., Zhang, T., Yang, K., Cai, Y., Yang, Y., & Huang, R. (2018). Nonvolatile memristor as a new platform for non-von Neumann computing. In
ICSICT.
5
Support parallel execution
Functionally complete
Work Stateful logic operations
[3] IMP
[2] NOR, NOT
[4] NAND, NIMP
[5] NOR, NAND, Min, OR
[6] NOR, NOT, NAND, NIMP, XOR
![Page 6: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/6.jpg)
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
![Page 7: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/7.jpg)
SIML Computation Model7
All of the stateful logic families satisfy the four assumptions
β The latency of a single stateful logic operation is identical.
β The input number of a single operation cannot exceed a constant.
β The latency of WL and BL operations is identical.
β The degree of parallelism can reach the crossbar size. The crossbar size scales with the problem size.
![Page 8: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/8.jpg)
Lower Bounds of the Time Complexity (1)8
(a) Theorem 1 (b) Theorem 2
Condition bitwise functions most arithmetic functions
Parallelismupper bound
max(π€, β) ππ
π€ + β
Time complexity lower bound
trivial bound: ππ
max(π€,β)shape bound: π(π€ + β)
Example function ππ = ππ1 NOR ππ2 (π = 1, 2,β¦ , π) ππ = NORπ=1π ππ1 NOR ππ2 (π = 1, 2, β¦ , π)
Example algorithm (netlist)
Example layout in RRAM
π11 π21 β¦ ππ1
π12 π22 β¦ ππ2
π1 π2 β¦ ππ
π11 π21 β¦ ππ1
π12 π22 β¦ ππ2
π1 π2 β¦ ππ
π11
π12
π1 β¦ ππ1
ππ2
ππ
#1
#1 #1 #1
#n-1
#n #n #n
π11
π21 ππ1
β¦
ππ2
ππ
π(π): total cycles in the series implementation, π€: width of layout (# of BLs), β: height of layout (# of WLs), ππππππ₯: length of the critical path.
![Page 9: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/9.jpg)
Lower Bounds of the Time Complexity (2)9
(c) Corollary 1 (d) Theorem 3
Condition square layout a given algorithm
Parallelismupper bound
π(π
π€β) π
π
ππππππ₯
Time complexity lower bound
function bound: π( π€β) algorithm bound: ππππππ₯
Example function β π = NORπ=1π ππ1
Example algorithm (netlist)
β
Example layout in RRAM
β
π(π): total cycles in the series implementation, π€: width of layout (# of BLs), β: height of layout (# of WLs), ππππππ₯: length of the critical path.
π1 π2 β¦ ππ ππ
π1
π2 ππ
πβ¦
#1 #n-1
![Page 10: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/10.jpg)
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
![Page 11: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/11.jpg)
Ripple Carry Adder11
One-bit full adder
Pulse Logic operation
1 π1 = NOR(π΄, π΅)
2 π2 = NOR(π΄, π1)
3 π3 = NOR π΅, π1
4 π4 = NOR(π1, π2)
5 π5 = NOR(π4, πΆπ)
6 πΆπ = NOR(π1, π5)
7 π6 = NOR(π4, π5)
8 π7 = NOR(π5, πΆπ)
9 π = NOR(π5, π7)
π¨ππ π¨ππ β¦ π¨ππ
π¨ππ π¨ππ β¦ π¨ππ
π11 π12 β¦ π1π
β¦ β¦ β¦ β¦
π41 π42 β¦ π4π
π51 π52 β¦ π5π
πΆπ1 πΆπ2 β¦ πΆππ
πΆπ1 πΆπ1(πΆπ2) β¦ πΆπ πβ1 (πΆππ)
π61 π62 β¦ π6π
β¦ β¦ β¦ β¦
πΊπ πΊπ β¦ πΊπ
β‘
β’
β
β β’ add π BLs in parallel β‘ generate and propagate carries in series
time complexity: π(π)shape / algorithm lower bound
![Page 12: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/12.jpg)
Carry Select Adder12
π¨ππ β¦ π¨π π π¨ππ β¦ π¨π π 0 π1 β¦ π π
π¨π( π+π) β¦ π¨π(π π) π¨π( π+π) β¦ π¨π(π π) 0 π π+1 β¦ π2 π
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
π¨π(πβ π+π) β¦ π¨ππ π¨π(πβ π+π) β¦ π¨ππ 0 ππβ π+1 β¦ ππβ²
π΄1( π+1)β² β¦ π΄1(2 π)β² π΄2( π+1)β² β¦ π΄2(2 π)β² 1 π π+1β² β¦ π2 πβ²
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
π΄1(πβ π+1)β² β¦ π΄1πβ² π΄2(πβ π+1)β² β¦ π΄2πβ² 1 ππβ π+1β² β¦ ππβ²
β¦ β¦ β¦ β¦ β¦ β¦ πΊπ β¦ πΊ π
β¦ β¦ β¦ β¦ β¦ β¦ πΊ π+π β¦ πΊπ π
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ πΊπβ π+π β¦ πΊπ
β
β’
β‘
β copy the input β‘ add 2 π β 1 WLs in parallel β’ select the correct result
time complexity: π( π)function / algorithm lower bound
![Page 13: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/13.jpg)
Carry Save Adder13
π¨ππ π¨ππ β¦ π¨ππ
π¨ππ π¨ππ β¦ π¨ππ
π¨ππ π¨ππ β¦ π¨ππ
β¦ β¦ β¦ β¦
πΆπ1 πΆπ2 β¦ πΆππ
ππ1 ππ2 β¦ πππ
0 πΆπ1 β¦ πΆπ(πβ1)
ππ1 ππ2 β¦ πππ
β¦ β¦ β¦ β¦
πΊπ πΊπ β¦ πΊπ
shift πΆπ1, β¦,πΆππ right
β
β’
β add π BLs in parallel β’ add the last two addends πΆπ, ππ
β‘
time complexity: π( π)function lower bound
π1 β¦ π π π΄1
π π+1 β¦ π2 π π΄2
β¦ β¦ β¦ β¦
ππβ π+1 β¦ ππ π΄ π
β¦ β¦ β¦ π
β accumulate each WL in parallelβ‘ accumulate the sums in the same BLs
β β‘
![Page 14: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/14.jpg)
Adder Summary14
Function two π-bit integer addition π΄ integer addition
Algorithm ripple carry adder carry select adder carry select adder
Layout π π Γ π(1) π π Γ π( π) π π Γ π( π)
Time complexity π(π) π( π) π( π)
Lower bound type shape / algorithm bound function / algorithm bound function bound
![Page 15: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/15.jpg)
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
![Page 16: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/16.jpg)
Dot Product16
πππ β¦ ππ π΄ πππ β¦ ππ π΄ π1 β¦ π π π1
ππ( π΄+π) β¦ ππ(π π΄) ππ( π΄+π) β¦ ππ(π π΄) π π+1 β¦ π2 π π2
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
ππ(π΄β π΄+π) β¦ πππ΄ ππ(π΄β π΄+π) β¦ πππ΄ ππβ π+1 β¦ ππ π π
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ πΆπ
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ ππ
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ π¨π β π¨π
β
β’
β element-wise multiplication β‘ accumulate π WLs in parallel β’ accumulate ππs using the carry save adder
β‘
time complexity: π( π)function lower bound
![Page 17: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/17.jpg)
Real Data Type Comparison
[7] KΓΆster, U., Webb, T. J., Wang, X., Nassar, M., Bansal, A. K., Constable, W. H., β¦ Rao, N. (2017). Flexpoint: An Adaptive Numerical Format for Efficient Training of
Deep Neural Networks. arXiv.
17
Β·Β·Β·
Β·Β·Β·
Β·Β·Β·Β·Β·Β·
Β·Β·Β·
Β·Β·Β·
Β·Β·Β· Β·Β·Β·
Β·Β·Β·
fixed-point flex-point
float-point
exponent mantissa
![Page 18: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/18.jpg)
Flex-point Support18
πππ πππ β¦ π31
πππ πππ β¦ π32
β¦ β¦ β¦ β¦
πππ πππ β¦ π3π
Co
ntr
olle
rππππ ππππ β¦ ππ₯π3
β input exponent
β‘ arithmetic instruction
β’ output status
β£ shift instruction
β€ output exponent
Crossbar
![Page 19: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/19.jpg)
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
![Page 20: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/20.jpg)
Integer Algorithm Evaluation20
0
200
400
600
800
8 16 24 32 40 48 56 64
cycl
e
bit width n
MAGIC-adder [8]
ripple carry
carry select
100
1000
10000
100000
1000000
16 32 64 128 256 512 10242048
cycl
e
integer number M
MAGIC-adder [8]
APIM [9]
carry save
0
20000
40000
60000
80000
16 32 64 128 256 512 1024 2048
max
imal
M
crossbar size
APIM [9]
carry save
0
1000
2000
3000
4000
64 128 256 512
cycl
e
vector dimension M
IMAGING [10] Proposed
[8] Talati, N., Gupta, S., Mane, P., & Kvatinsky, S. (2016). Logic design within memristive memories using memristor-aided loGIC (MAGIC). TNANO.
[9] Imani, M., Gupta, S., & Rosing, T. (2017). Ultra-Efficient Processing In-Memory for Data Intensive Applications. In DAC.
![Page 21: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/21.jpg)
Flex-support Evaluation21
0
2
4
6
8
10
100 400 700 1000
Late
ncy
(s)
Matrix size M
RRAM ARM
0
1
2
3
4
100 400 700 1000
Ener
gy /
MΒ³
(nJ)
Matrix size M
RRAM ARM
![Page 22: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/22.jpg)
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
![Page 23: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/23.jpg)
Conclusion23
Theoretical analysis
β SIML model
β time complexity lower bound
Integer addition
β ripple carry adder
β carry select adder
β carry save adder
Extension
β multiplication support
β flex-point support
![Page 24: Parallel Stateful Logic in RRAM: Theoretical Analysis and ...](https://reader031.fdocuments.net/reader031/viewer/2022012019/61687f29d394e9041f6ff375/html5/thumbnails/24.jpg)
&A