A Multi-Ported Memory Compiler Utilizing True
Dual-port BRAMs
Ameer Abdelhadi and Guy Lemieux
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, Canada
May 3rd, 2016
Motivation (1): FPGAs as parallel accelerators
• Used as parallel accelerators
• Have dual-ported memories only
[Figure: FPGA resources: 1000's of dual-ported Block RAMs, 1,000,000's of logic elements, 1000's of multipliers/DSPs]
Motivation (2): Mixed port requirements
[Figure: example datapath with functional units (/, √x, 1/x, ALU, >>) and registers R0,0, R1,0, R2,0 on a shared read/write bus]
• Multi-porting approaches provide simple (fixed) ports only
• Resources are wasted if these ports are not active simultaneously
Live-Value Table (LVT)
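• Multi-read: replication (Easy!)
[Figure: a single write port W feeds 2-port RAM1 and 2-port RAM2; read ports R0 and R1 each read their own copy]
• Multi-write: LVT (Hard!)
[Figure: write ports W0 and W1 each write to their own 2-port RAM; an LVT with 2 write and 1 read ports selects the bank for read port R. W0 always writes 0's to the LVT and W1 always writes 1's]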
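The multi-write case can be sketched as a small behavioral model. This is a minimal illustration, not the authors' implementation: the class name `LvtMemory`, its methods, and the sizes are assumptions made for the example. Each write port owns one bank and records its own index in the LVT (port 0 always writes 0, port 1 always writes 1); a read uses the LVT entry to pick the bank that holds the live value.

```python
class LvtMemory:
    """Behavioral sketch of a multi-write RAM: one bank per write port plus an LVT."""

    def __init__(self, depth, n_write):
        # One data bank per write port (hypothetical layout for illustration).
        self.banks = [[0] * depth for _ in range(n_write)]
        # The LVT remembers, per address, which write port wrote last.
        self.lvt = [0] * depth

    def write(self, port, addr, data):
        self.banks[port][addr] = data
        self.lvt[addr] = port  # a port always records its own index

    def read(self, addr):
        # The LVT entry selects the bank that holds the live value.
        return self.banks[self.lvt[addr]][addr]


mem = LvtMemory(depth=8, n_write=2)
mem.write(0, 3, 0xAA)  # W0 writes address 3
mem.write(1, 3, 0xBB)  # W1 overwrites address 3 later
mem.write(1, 5, 0x11)
print(hex(mem.read(3)), hex(mem.read(5)))
```

The model ignores timing and same-cycle write conflicts; it only shows why the LVT is needed once more than one port can write.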
Data Banks Optimization
• LVT-based multi-ported RAM is composed of:
  1) LVT - tracks changes
  2) Data banks - store data copies
• Our previous work (I-LVT / FPGA'14) optimizes the LVT only
• This work
  • Optimizes the data banks (not the LVT!)
  • The first technique that requires a CAD tool
[Figure: data banks RAM 0, RAM 1, ..., RAM nW-1, each with 1 write / nR read ports, driven by WAddr and RAddr; the LVT supplies BankSel]
This work solves the final step and most important problem of Block RAM allocation
Mixed Port Requirements (1): Fixed ports
Fixed (simple) ports: the majority of multi-ported memories support fixed ports only
Mixed Port Requirements (2): True ports
True ports: some techniques support the construction of multiple true ports
BRAMs in FPGAs are true dual-ported
Mixed Port Requirements (3): Switched ports
Switched ports: a number of writes are switched with a number of reads
True ports are a special case of switched ports
Switched Ports (1): Example
Key observation: BRAMs' true ports can be utilized to optimize switched ports
Objective: optimize the construction of multiple switched ports
Switched Ports (2): Fixed ports abstraction
[Figure: the example datapath with its memory ports grouped into a "Fixed Ports" block]
Switched Ports (3): Fixed data banks
[Figure: the same datapath and "Fixed Ports" block, now with fixed data banks and an I-LVT]
Switched Ports (4): DFG modeling
Complete bigraph:
• Vertex: port
• Edge: 1W1R BRAM
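The modeling step can be sketched in a few lines, assuming ports are identified by plain strings. The port names and the function `complete_bigraph` are illustrative and not taken from the compiler. Every write port is connected to every read port, and in the fixed-port design each edge costs one 1W1R BRAM (assuming the memory fits in a single BRAM).

```python
from itertools import product


def complete_bigraph(writes, reads):
    """Return one (write, read) edge for every write/read port pair."""
    return list(product(writes, reads))


# Hypothetical port names: 2 write ports and 3 read ports.
edges = complete_bigraph(["W0", "W1"], ["R0", "R1", "R2"])
fixed_port_brams = len(edges)  # one 1W1R BRAM per edge
print(fixed_port_brams)
```

The edge count grows as the product of the write and read port counts, which is the cost the covering step tries to reduce.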
Switched Ports (5): Switched DFG
Complete bigraph:
• Vertex: port
• Biclique pattern: BRAM
Switched Ports (6): DFG covering
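[Figure: biclique patterns added one at a time until all edges are covered: one pattern with ports W1, W2, R1, R2; one with W1, R1, R2; and six with a single W and R]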
Switched Ports (7): Switched data banks
[Figure: I-LVT-based data banks for both designs, side by side]
• Fixed ports (complete bigraph): 12 BRAMs
• Switched ports (optimized bigraph): 8 BRAMs (33% reduction)
Multi-switched-ports Compiler
• A RAM compiler optimizes data banks construction
• Generates a DFG from port requirements
• Solves a set-covering problem on all edges
  • Covers are predefined biclique patterns
  • Solved as a BLP problem
• Generates Verilog modules based on the optimal covering
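The covering step can be illustrated with a tiny exhaustive solver. This is a sketch under stated assumptions, not the compiler's solver: the function `min_cover`, the port names, and the candidate patterns are hypothetical, and brute-force enumeration stands in for a real BLP solver. Each candidate pattern gets a binary variable, and the goal is to minimize the number of selected patterns (BRAMs) while every edge lies in at least one selected pattern.

```python
from itertools import product


def min_cover(edges, patterns):
    """Minimize sum(x) over binary x such that every edge is in a chosen pattern."""
    universe = set(edges)
    best = None
    for x in product((0, 1), repeat=len(patterns)):
        covered = set()
        for chosen, pattern in zip(x, patterns):
            if chosen:
                covered |= pattern
        if universe <= covered and (best is None or sum(x) < sum(best)):
            best = x
    return best  # None if the patterns cannot cover all edges


# Hypothetical instance: 2 write ports, 2 read ports, 4 edges.
edges = [("W0", "R0"), ("W0", "R1"), ("W1", "R0"), ("W1", "R1")]
# Assumed candidate patterns: four single-edge patterns plus two 2-edge patterns.
patterns = [{e} for e in edges] + [
    {("W0", "R0"), ("W0", "R1")},
    {("W1", "R0"), ("W1", "R1")},
]
best = min_cover(edges, patterns)
print(best, sum(best))
```

Enumeration is exponential in the number of patterns, so it only suits toy instances; it is used here because it makes the binary formulation explicit without depending on any solver library.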
Available as an open source contribution:
https://github.com/AmeerAbdelhadi
http://www.ece.ubc.ca/~lemieux/downloads/
Supports bypassing (RAW & RDW) and initialization
Graphical User Interface (GUI)
Source of inspiration: Multi-True-Ports by Choi et al. / UofT
• Provides true ports only (no simple/fixed ports)
• Is a special case of our generalized approach
• Doesn't need a CAD tool
[Figure: register-based LVT multi-true-port designs: a 3 read / 3 write RAM and the general n read / n write RAM, with R/W data ports and selects S0 ... Sn-1]
Experimental Results
• Run-in-batch flow manager
• Uses Altera's Quartus II for synthesis on Stratix V
• Uses Altera's ModelSim for verification with:
  • Random vectors
  • Over a million RAM access cycles
• Results on random test-cases
• Up to 8 switched ports
• Up to 4 writes and 4 reads per switched port
• Up to 28 writes/reads per test-case
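Random-vector verification can be mimicked in software as a sanity check. The sketch below is only an analogue of the idea, not the ModelSim flow used in the talk: the function `run_random_test`, the 8-bit data width, and the choice to give write ports distinct addresses each cycle are all assumptions. It drives an LVT-style model with random writes and compares every read against a single-array golden reference.

```python
import random


def run_random_test(cycles, depth=16, n_write=2, seed=0):
    """Drive random writes and reads; return True if every read matches the reference."""
    rng = random.Random(seed)
    banks = [[0] * depth for _ in range(n_write)]  # one bank per write port
    lvt = [0] * depth                              # last writer per address
    golden = [0] * depth                           # ideal multi-ported memory
    for _ in range(cycles):
        # Assumption: write ports never target the same address in one cycle.
        addrs = rng.sample(range(depth), n_write)
        for port, addr in enumerate(addrs):
            data = rng.getrandbits(8)
            banks[port][addr] = data
            lvt[addr] = port
            golden[addr] = data
        raddr = rng.randrange(depth)
        if banks[lvt[raddr]][raddr] != golden[raddr]:
            return False
    return True


print(run_random_test(cycles=10_000))
```

A fixed seed keeps failures reproducible; raising `cycles` toward the million-cycle scale mentioned above only costs run time.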
| Compared to | Average BRAM reduction | Average ALMs reduction | Average Fmax increase |
|---|---|---|---|
| Best of previous | 18% | -3% | -1% |
| True ports | 42% | 53% | 15% |
Conclusions
• A methodology to support switched write/read functionality
• True dual-ported BRAMs are utilized to optimize the RAM allocation
• A RAM compiler optimizes the problem
• An additional 18% average BRAM reduction compared to the best of other approaches
• Practical solution:
  • Initialization
  • Bypassing
  • Available as open source
Future Directions
• Applications
  • Parallel computation
  • HLS – storage binding
• Optimization of switched-port assignment
  • Extraction of mutually-exclusive functions from HDL
• Statistical approach
  • Ports which are mutually-exclusive in most cases can use a switched port
  • Access conflicts will be rare
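One way to make the statistical idea concrete is to measure how often two candidate ports are active in the same cycle. The sketch below is an assumption-laden illustration: the function `conflict_rate` and the per-cycle activity traces are hypothetical, and the talk does not specify how conflicts would be measured or resolved.

```python
def conflict_rate(write_active, read_active):
    """Fraction of cycles in which both ports are active at once."""
    if len(write_active) != len(read_active):
        raise ValueError("traces must cover the same number of cycles")
    if not write_active:
        return 0.0
    conflicts = sum(1 for w, r in zip(write_active, read_active) if w and r)
    return conflicts / len(write_active)


# Hypothetical per-cycle activity traces (1 = port used in that cycle).
write_trace = [1, 0, 0, 1, 0, 0, 0, 0]
read_trace = [0, 1, 1, 1, 0, 0, 1, 0]
print(conflict_rate(write_trace, read_trace))
```

A low rate would suggest the pair is a candidate for sharing a switched port, with the rare conflicting cycles handled separately.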
Thank You!