SmartCell Reconfigurable Architecture for Low-Power
Stream Processing
Cao Liang and Xinming Huang
Embedded Computing Lab
Worcester Polytechnic Institute
http://computing.wpi.edu
MAPLD Conference
September 15-18, 2008
Annapolis, MD
Xinming Huang SmartCell Reconfigurable Architecture for Low-Power Stream Processing 2
Outline
Introduction and Motivation
SmartCell Architecture
SmartCell Prototype with 64 PEs
Benchmark Applications and Performance
Conclusions
Challenges
Major driving forces in embedded computing:
Multimedia signal and image processing
Wireless communications
Military and space applications
Design challenges:
Low power (power efficiency)
High performance
Flexibility (programmability or reconfigurability)
[Figure: application examples, including game console, radar imaging, software radio, PDA, image processing, multimedia TV, data encryption, and scientific computing]
Existing Computing Platforms
General purpose processors (GPP)
Application specific integrated circuits (ASIC)
Reconfigurable architectures, dominated by Field Programmable Gate Arrays (FPGA)
New architectures: CellBE, GPU
[Figure: flexibility versus performance and power efficiency, positioning general purpose processors, reconfigurable systems (FPGA), and customized ASICs]
Coarse-Grained Reconfigurable Architecture (CGRA)
Motivations of the SmartCell architecture:
Coarse-grained computing operators
Reconfigurable interconnection
Domain specific, e.g. stream processing
Bridging the gap between FPGA and ASIC
[Bjerregaard 2006]
Overview of SmartCell Architecture
Computing units are tiled in a 2D structure
Design of Processor Element
Processor Element (PE)
16-bit input, 36-bit output
Logic, shift, and arithmetic operations
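The 16-bit input and 36-bit output widths are consistent with a multiply-accumulate style datapath, although the slide does not say so. The following is a minimal sketch under that assumption (signed operands, a 36-bit accumulator); the function names `to_signed` and `mac` are hypothetical and not taken from the SmartCell design.

```python
# Hypothetical model of a PE datapath: signed 16-bit inputs, 36-bit result.
# The multiply-accumulate behaviour is an assumption; the slide only lists
# the port widths and the operation classes.

IN_BITS = 16
OUT_BITS = 36

def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mac(acc: int, a: int, b: int) -> int:
    """Multiply two 16-bit operands and add the product into a 36-bit accumulator."""
    product = to_signed(a, IN_BITS) * to_signed(b, IN_BITS)
    return to_signed(acc + product, OUT_BITS)

# The largest product magnitude is 2**30 (from -32768 * -32768). A signed
# 36-bit accumulator tops out at 2**35 - 1, so it holds 31 such products.
acc = 0
for _ in range(31):
    acc = mac(acc, -(1 << 15), -(1 << 15))
```

Under these assumptions the extra four bits beyond a 32-bit product act as guard bits, so a short chain of accumulations cannot overflow.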
Design of Cell Unit
Includes 4 PEs to form a quad structure
Fully connected crossbar (S_Box) for data exchange
Serial peripheral interface (SPI) for instruction configuration
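The quad structure can be pictured as four PEs whose inputs are selected through a fully connected crossbar. The sketch below is only an illustrative software model: the `Cell` class, the routing-table representation, and the single-input PEs are all assumptions made for the example.

```python
# Hypothetical model of one cell: four PEs fed through a fully connected
# crossbar. Any PE output may drive any PE input, including its own.

from typing import Callable, List

class Cell:
    def __init__(self, ops: List[Callable[[int], int]], routes: List[int]):
        assert len(ops) == 4 and len(routes) == 4
        self.ops = ops              # one operation per PE
        self.routes = routes        # routes[i] = index of the PE output feeding PE i
        self.outputs = [0, 0, 0, 0]

    def step(self) -> List[int]:
        """Advance one cycle: all PEs read their routed source, then update together."""
        inputs = [self.outputs[src] for src in self.routes]
        self.outputs = [op(x) for op, x in zip(self.ops, inputs)]
        return list(self.outputs)

# PE0 counts up from its own output; PE1, PE2, PE3 form a short pipeline.
cell = Cell(
    ops=[lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x],
    routes=[0, 0, 1, 2],
)
trace = [cell.step() for _ in range(3)]
```

Changing only the `routes` list re-wires the four PEs without touching their operations, which is the role the crossbar plays in the cell.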
On-chip Interconnection Design
Modified CMesh On-chip Network
Traditional CMesh
Hierarchical CMesh
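In a concentrated mesh, several terminals share one router. As a rough sketch for a 4 by 4 grid of cells with 4 PEs each, the code below maps a PE number to its cell and counts router hops. The PE numbering and the plain dimension-ordered routing are assumptions; the modified and hierarchical variants named on the slide may route differently.

```python
# Sketch of addressing in a concentrated mesh: 4 PEs share each cell router,
# and the cells form a 4 by 4 grid. The numbering scheme is hypothetical.

MESH_DIM = 4        # cells per row and per column
PES_PER_CELL = 4

def locate(pe_id: int):
    """Return (row, col, local index) for a PE numbered 0..63."""
    cell, local = divmod(pe_id, PES_PER_CELL)
    row, col = divmod(cell, MESH_DIM)
    return row, col, local

def mesh_hops(src: int, dst: int) -> int:
    """Router-to-router hops assuming plain dimension-ordered (XY) routing."""
    r1, c1, _ = locate(src)
    r2, c2, _ = locate(dst)
    return abs(r1 - r2) + abs(c1 - c2)
```

PEs in the same cell are zero router hops apart in this model, since they exchange data through the local crossbar rather than the mesh.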
Control Logic Design
Four types of control signals:
Program counter control
Datapath/delay control
Operation control
Network-on-Chip control
Format of the instruction code (64 bits per instruction)
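The instruction format itself appears only as a figure, so the field widths are not recoverable from the transcript. The sketch below shows how four control fields could be packed into one 64-bit word; the field names follow the four control types above, but every bit width is invented for illustration.

```python
# Hypothetical 64-bit instruction layout. The four field names mirror the
# control types on the slide; the widths are NOT from the source.

FIELDS = [("pc_ctrl", 12), ("datapath", 16), ("operation", 20), ("noc", 16)]
assert sum(width for _, width in FIELDS) == 64

def encode(**values: int) -> int:
    """Pack the named fields into one 64-bit word, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode(word: int) -> dict:
    """Unpack a 64-bit word back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

A round trip such as `decode(encode(pc_ctrl=5, noc=7))` returns the same field values, with unspecified fields defaulting to zero.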
Configuration Structure
System configuration
Prototype Chip Design
Implementation of a seedling SmartCell system with 16 cell units in a 4 by 4 mesh structure, with a total of 64 PEs
RTL level design and simulation
FPGA prototyping
Standard cell ASIC implementation with TSMC 0.13 μm technology
Total area is about 8.2 mm²
Runs up to 107 MHz
Configuration time is within 12 μs
SmartCell Features
A combination of the following features makes SmartCell a unique approach among CGRA families:
Dynamic reconfiguration
Deep pipeline and parallelism
Hardware virtualization
Explicit synchronization
Unique system topology
[Figure: PEs (PE1 to PE4 per cell) configured as a 1D systolic array, a 2D systolic array, and a SIMD structure]
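As an illustration of the 1D array configuration, the sketch below simulates a FIR filter on a linear chain of PEs, one coefficient per PE, with partial sums passed between neighbours each cycle. It uses the transposed form, in which the input sample is broadcast to every PE; this is one common mapping chosen for the example, not necessarily the one used on SmartCell, and `linear_array_fir` is a hypothetical name.

```python
# Minimal cycle-level model of a FIR filter on a linear array of PEs.
# Each PE holds one coefficient and one partial-sum register (assumed layout).

def linear_array_fir(coeffs, samples):
    n = len(coeffs)
    regs = [0] * n            # partial-sum register held in each PE
    out = []
    for x in samples:
        # PE k multiplies the incoming sample by its coefficient and adds the
        # partial sum that PE k+1 produced on the previous cycle.
        regs = [
            coeffs[k] * x + (regs[k + 1] if k + 1 < n else 0)
            for k in range(n)
        ]
        out.append(regs[0])   # PE 0 emits the filter output
    return out
```

Feeding a unit impulse through the chain returns the coefficients in order, which is a quick check that the neighbour-to-neighbour data movement is wired correctly.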
Application Domain and Benchmarks
| Application Domain | Test Benches |
| --- | --- |
| Signal processing | 64-tap FIR; 64-tap IIR |
| Multimedia and image processing | 32-point FFT; 8 by 8 2D-DCT; 8 by 8 Motion Estimation (ME) in a 24 by 24 search area |
| Scientific computing | 128 by 128 Matrix Multiplication (MMM); 64th-order Polynomial Evaluation (PoE); RC5 Data Encryption |
Benchmark Mapping
Infinite Impulse Response (IIR) filter
Biquad cascaded-IIR structure on a single cell
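A biquad cascade splits a high-order IIR filter into second-order sections connected in series. The sketch below is a plain software reference for that structure, using the transposed direct form II for each section; the choice of form and the function name `biquad_cascade` are assumptions, and the code does not model how the sections are placed on the PEs.

```python
# Reference model of a cascaded-biquad IIR filter (transposed direct form II).
# Each section is (b0, b1, b2, a1, a2) with the leading denominator
# coefficient a0 normalized to 1.

def biquad_cascade(sections, samples):
    state = [[0.0, 0.0] for _ in sections]   # two delay registers per section
    out = []
    for x in samples:
        for (b0, b1, b2, a1, a2), s in zip(sections, state):
            y = b0 * x + s[0]
            s[0] = b1 * x - a1 * y + s[1]
            s[1] = b2 * x - a2 * y
            x = y                             # output feeds the next section
        out.append(x)
    return out
```

For example, a single section `(1, 0, 0, -0.5, 0)` implements y[n] = x[n] + 0.5 y[n-1], so a unit impulse decays as 1, 0.5, 0.25.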
Benchmark Mapping (cont'd)
2D Discrete Cosine Transform (2D DCT)
Decomposed into two 1D DCTs
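The decomposition relies on the 2D DCT being separable: a 1D DCT is applied to every row, and then a 1D DCT is applied to every column of the result. The sketch below shows this row-column method in plain Python, assuming the orthonormal DCT-II; the normalization and the function names are choices made for the example, not details from the slides.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence (direct O(n^2) evaluation)."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(block):
    """2D DCT as two passes of the 1D DCT: rows first, then columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in transpose(rows)]
    return transpose(cols)

# A constant 8 by 8 block has all of its energy in the DC coefficient.
flat = dct_2d([[1.0] * 8 for _ in range(8)])
```

Because both passes reuse the same 1D kernel, a hardware mapping only needs one 1D DCT datapath plus a transpose step between the passes.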
Experimental Setup
Evaluation metrics:
Area and timing
Power consumption
Throughput and power efficiency
Comparison with RaPiD, Altera's Stratix II FPGA, and ASIC

| Parameter | Value |
| --- | --- |
| System dimension | 4 by 4 |
| Design tools | ModelSim, Synopsys |
| Library | TSMC 0.13 μm process |
| Voltage | 1 V |
| Simulation frequency | 100 MHz |
Area and Power Consumption
Generated at 100 MHz with fully operational circuits
Power Consumption and Efficiency
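On average, 156 mW power consumption at 100 MHz
31 GOPS/W energy efficiency
Only arithmetic and logic operations are counted, excluding I/O power

| Metric | FIR | IIR | 2D-DCT | RC5 | MMM | FFT | PoE | ME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PDyn (mW) | 144 | 181 | 153 | 132 | 135 | 161 | 142 | 137 |
| PCore (mW) | 152 | 189 | 161 | 140 | 143 | 169 | 150 | 145 |
| EEff (GOPS/W) | 42.1 | 33.9 | 39.8 | 45.7 | 11.2 | 18.9 | 42.7 | 11.0 |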
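The headline averages can be reproduced directly from the per-benchmark figures. The short script below recomputes them from the PDyn, PCore, and EEff rows; the variable names are illustrative, and the reading of PCore minus PDyn as the non-dynamic share of core power is an interpretation rather than something the slide states.

```python
# Recompute the slide's averages from the per-benchmark table values.

benchmarks = ["FIR", "IIR", "2D-DCT", "RC5", "MMM", "FFT", "PoE", "ME"]
p_dyn = [144, 181, 153, 132, 135, 161, 142, 137]             # mW
p_core = [152, 189, 161, 140, 143, 169, 150, 145]            # mW
e_eff = [42.1, 33.9, 39.8, 45.7, 11.2, 18.9, 42.7, 11.0]     # GOPS/W

avg_core = sum(p_core) / len(p_core)    # 156.125 mW, reported as 156 mW
avg_eff = sum(e_eff) / len(e_eff)       # about 30.66 GOPS/W, reported as 31

# Difference between core and dynamic power for each benchmark (assumed to
# be the non-dynamic share); it is the same 8 mW in every column.
non_dynamic = [core - dyn for core, dyn in zip(p_core, p_dyn)]

print(f"average core power: {avg_core:.1f} mW")
print(f"average efficiency: {avg_eff:.1f} GOPS/W")
```

The averages round to the 156 mW and 31 GOPS/W quoted above, which indicates that the 156 mW figure is the mean of the PCore row.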
Comparison with RaPiD, FPGA, and ASIC
Power and system throughput comparison
Power consumption of RaPiD has been scaled down to the same process technology as the SmartCell system
Comparison with RaPiD and FPGA
52% average power reduction compared with RaPiD
75% average power reduction compared with FPGA
Power Efficiency Comparison
* Compared with 90 nm Stratix II FPGAs
Conclusions
A CGRA architecture named SmartCell is proposed and developed
The architecture is reconfigurable and can be targeted at different computing systems
A prototype design with 64 PEs demonstrates both throughput and power efficiency in benchmarks of data streaming applications
SmartCell may have the potential to bridge the gap between high-power FPGAs and inflexible ASICs
The following are backup slides
Existing CGRAs
Characteristic Comparison
SmartCell: Tiled Architecture, Processor Design and Interconnect
Many cells are tiled in a 2D layout
Each cell has 4 PEs (N, W, S, E)
Simplified processor with memory
A crossbar within the cell; on-chip interconnect uses CMesh
Features and Application Domain
FFT Butterfly
RC5 data encryption
Polynomial Evaluation
FIR filter
2D DCT
System Design and Performance
Prototype chip design: 4x4 cells (64 PEs), TSMC 0.13 μm, 8.2 mm², 1 V, about 156 mW at 100 MHz
Benchmarked against RaPiD, Stratix II (90 nm), and ASIC
Acknowledgement:
Dr. Michael Fritz, DARPA/MTO YFA Program
Cao Liang, WPI graduate assistant, now with AMD