Tutorial HotChips 01
Jan M. Rabaey
BWRC
University of California @ Berkeley
http://www.eecs.berkeley.edu/~jan
Silicon Architectures for Wireless Systems – Part 2
Configurable Processors
With contributions from J. Wawrzynek and A. Dehon

The Energy-Flexibility Gap
[Figure: energy efficiency in MOPS/mW (or MIPS/mW), log scale from 0.1 to 1000, plotted against flexibility (coverage)]
• Embedded processors: SA110, 0.4 MIPS/mW
• ASIPs, DSPs: 2 V DSP, 3 MOPS/mW
• Reconfigurable processor/logic: Pleiades, 10-80 MOPS/mW
• Dedicated HW

Session: DAC 2001

The Growth of Reconfigurable
Source: Schaumont et al., DAC 2001

(Re)configurable Computing: Merging Efficiency and Versatility
“Hardware” customized to specifics of problem. Direct map of problem-specific dataflow, control.
Circuits “adapted” as problem requirements change.
Spatially programmed connection of processing elements.

Spatial vs. Temporal Computing
[Figure panels: Spatial, Temporal]
Source: A. Dehon and J. Wawrzynek

Benefits of Programmable
• Non-permanent customization and application development after fabrication
– “Late Binding”
• Economies of scale (amortize large, fixed design costs)
• Time-to-market (evolving requirements and standards, new ideas)
Disadvantages
• Efficiency penalty (area, performance, power)
• Correctness verification

Spatial/Configurable Benefits
• 10x raw density advantage over processors (and increasing)
• Energy efficiency (potentially)
• Locality, regularity, and predictability
• Ultimate distributed architecture
• Scalable with technology
– Relies mostly on increase in computational density
– Avoids most of the physics pitfalls threatening high-performance computing

Spatial/Configurable Drawbacks
• Resource management
– Each compute/interconnect resource dedicated to single function
– Must dedicate resources for every computational subtask
– Infrequently needed portions of a computation sit idle --> inefficient use of resources
– But … not a real issue when transistors are abundant
• Potential mismatch between operations and operators
• Interconnect plays dominant role

Density Comparison

Processor vs. FPGA Area

Processors and FPGAs

Issues in Configurable Design
• Choice and Granularity of Computational Elements
• Choice and Granularity of Interconnect Network
• (Re)configuration Time and Rate
– Fabrication time --> Fixed function devices
– Beginning of product use --> Actel/Quicklogic FPGAs
– Beginning of usage epoch --> (Re)configurable FPGAs
– Every cycle --> traditional Instruction Set Processors

Granularity of Computational Elements
The FPGA Approach: The Logic Level
[Figure: 2-LUT, a memory (Mem) addressed by inputs In1 and In2 that drives Out]
In1 In2   Out
 0   0     0
 0   1     1
 1   0     1
 1   1     0
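The lookup-table idea on this slide can be sketched in a few lines of Python. This is an illustrative model only: the class name `Lut2` and the choice of In1 as the high address bit are assumptions, not taken from the slides. The four stored bits reproduce the slide's truth table (00 gives 0, 01 gives 1, 10 gives 1, 11 gives 0), which is XOR.

```python
class Lut2:
    """2-input lookup table: a 4-entry memory addressed by the two inputs."""

    def __init__(self, contents):
        if len(contents) != 4 or any(bit not in (0, 1) for bit in contents):
            raise ValueError("a 2-LUT holds exactly four bits")
        self.mem = list(contents)

    def evaluate(self, in1, in2):
        # Assumption: In1 is the high address bit, In2 the low address bit.
        return self.mem[(in1 << 1) | in2]


# Configuring the same hardware with different memory contents
# yields different logic functions.
xor_lut = Lut2([0, 1, 1, 0])   # the truth table shown on the slide
and_lut = Lut2([0, 0, 0, 1])   # a second configuration, for comparison
```

Changing only the memory contents changes the function, which is the sense in which the logic level is "configurable".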

Granularity of Computational Elements
• Reconfigurable Logic: bit-level operations, e.g. encoding
• Reconfigurable Datapaths: dedicated data paths, e.g. filters, AGU
• Reconfigurable Arithmetic: arithmetic kernels, e.g. convolution
• Reconfigurable Control: RTOS, process management
[Figure labels: CLB array; adder, buffer, reg0, reg1, mux; AddrGen, Memory, MAC; Data Memory, Program Memory, Instruction Decoder & Controller, Datapath]

For Spatial Architectures
• Interconnect dominant
– area
– power
– time
• …so need to understand in order to optimize architectures

Dominant in Area

Dominant in Time

Dominant in Power
[Figure: pie chart of power breakdown]
Interconnect 65%
Clock 21%
IO 9%
CLB 5%
XC4003A data from Eric Kusse (UCB MS 1997)

Interconnect Design Issues
• Flexibility -- route “anything”
– (within reason?)
• Area -- wires, switches
• Delay -- switches in path, stubs, wire length
• Power -- switch, wire capacitance
• Routability -- computational difficulty finding routes

A Naïve Approach: Crossbar
• Any operator may consume output from any other operator
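To see why the crossbar is called naïve, a rough count of programmable crosspoints helps. The sketch below assumes every operator output can reach every input of every operator, with two inputs per operator; both the function name and the two-input default are illustrative assumptions rather than figures from the slides.

```python
def crossbar_crosspoints(num_operators, inputs_per_operator=2):
    """Crosspoints in a full crossbar: every output meets every operator input."""
    outputs = num_operators
    inputs = num_operators * inputs_per_operator
    return outputs * inputs


# Doubling the number of operators quadruples the switch count.
small = crossbar_crosspoints(4)
large = crossbar_crosspoints(8)
```

Under these assumptions the switch count grows quadratically with the number of operators, which is the cost that motivates exploiting locality instead.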

Avoiding Crossbar Costs
• Good architectural design
– Optimize for the common case
• Designs have spatial locality
• We have freedom in operator placement
• Thus: Place connected components “close” together
– don’t need full interconnect?

Exploiting Locality – The Mesh
[Figure labels: Switch Box, Connect Box]

Meshes don’t scale
Typical Extensions
• Local neighbor-to-neighbor Interconnections
• Segmented Interconnect
• Hierarchical Network (tree, mesh)

Example 1: The Pleiades Reconfigurable Architecture
[Figure labels: Embedded Processor, Configuration Bus, Reconfigurable Interconnect Network, Satellite Processor, Dedicated Arithmetic, Arithmetic Processor, FPGA, Memory, Address Generator, Network Interface, Configuration]
• Computational kernels are “spawned” to satellite processors
• Control processor supports RTOS and reconfiguration
• Order(s) of magnitude energy reduction over traditional programmable architectures

Matching Computation and Architecture
[Figure: Convolution kernel; labels: Control Processor, AddressGen, Memory, MAC, L CG]
Two models of computation: communicating processes + data-flow
Two architectural models: sequential control + data-driven

Execution Model of a Data-Flow Kernel
for (i = 1; i <= L; i++)
  for (k = i; k <= L; k++)
    phi[i][k] = phi[i-1][k-1]
              + in[NP-i] * in[NP-k]
              - in[NA-1-i] * in[NA-1-k];
[Figure labels: start, end, Embedded processor, Code seg, AddrGen, MEM: in, AddrGen, MEM: phi, MPY, MPY, ALU, ALU]
• Distributed control and memory
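The kernel on this slide can be transcribed into Python to make the operand flow explicit. This is a sketch under stated assumptions: the function name, the `phi_row0` boundary argument, and the comments mapping operations onto the two MPY and two ALU blocks are one reading of the figure, not something the slide states.

```python
def covariance_kernel(inp, phi_row0, L, NP, NA):
    """Run the slide's doubly nested loop; row 0 of phi is supplied by the caller."""
    phi = [[0] * (L + 1) for _ in range(L + 1)]
    phi[0] = list(phi_row0)
    for i in range(1, L + 1):
        for k in range(i, L + 1):
            # Two multiplies per iteration (assumed to map onto the two MPY
            # blocks), with operands fetched from the memory holding `in`.
            product_new = inp[NP - i] * inp[NP - k]
            product_old = inp[NA - 1 - i] * inp[NA - 1 - k]
            # One add and one subtract (assumed to map onto the two ALUs),
            # combined with the previous diagonal entry read from MEM: phi.
            phi[i][k] = phi[i - 1][k - 1] + product_new - product_old
    return phi


example = covariance_kernel([1, 2, 3, 4, 5, 6], [10, 20, 30], L=2, NP=5, NA=3)
```

In the sequential version a single processor interleaves address arithmetic, loads, multiplies, and adds. The slide's point is that each of these roles can instead be given to its own block, so the blocks run concurrently.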

Reconfigurable Kernels for W-CDMA
• Dominant kernel M(MTX) requires array of MACs and segmented memories
• Additional operations such as sqrt(x), 1/x, and Trellis decoding may be implemented using FPGA or CORDIC satellite

Impact of Architectural Choice
Example: 16 point Complex Radix-2 FFT (Final Stage)
Processor      Normalized Energy/stage [nJ]   Normalized Delay/stage [s]   Normalized Energy*Delay/stage [Js*e-14]
StrongARM      1870                           21u                          3970
TMS320C2xx     131                            10u                          137
TMS320LC54x    49                             3.8u                         18.5
Pleiades       13                             570n                         0.75
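As a consistency check on this slide, the energy-delay figures can be recomputed from the energy and delay figures. The pairing of values to processors below is an interpretation of the chart labels, and the names `measurements` and `energy_delay_e14` are illustrative.

```python
# (energy per stage in J, delay per stage in s, quoted energy*delay in 1e-14 Js)
measurements = {
    "StrongARM": (1870e-9, 21e-6, 3970),
    "TMS320C2xx": (131e-9, 10e-6, 137),
    "TMS320LC54x": (49e-9, 3.8e-6, 18.5),
    "Pleiades": (13e-9, 570e-9, 0.75),
}


def energy_delay_e14(energy_j, delay_s):
    """Energy-delay product expressed in units of 1e-14 joule-seconds."""
    return energy_j * delay_s / 1e-14


for name, (energy, delay, quoted) in measurements.items():
    print(f"{name}: computed {energy_delay_e14(energy, delay):.3g}, quoted {quoted}")
```

The recomputed products fall within a few percent of the quoted values, which supports this pairing of the numbers.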

Architecture Comparison
LMS Correlator at 1.67 MSymbols Data Rate
Complexity: 300 Mmult/sec and 357 Macc/sec
Note: TMS implementation requires 36 parallel processors to meet data rate - validity questionable
16 Mmacs/mW!

Inter-Satellite Communication
• Data-driven execution
– A satellite processor is enabled only when input data is ready
• Data sources generate data of different types: scalars, vectors, matrices
• Data computing processors handle data inputs of different types
[Figure: data sources (Embedded processor, AddrGen, Memory) feeding data computing processors (MPY, MAC); streams are annotated with dimensions 1 and n and carry an end-of-vector token]
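The data-driven rule can be sketched with Python iterators. This is a sketch only: the `END_OF_VECTOR` sentinel and the `mac_satellite` name are hypothetical stand-ins for the token mechanism named on the slide. The satellite makes progress only when both input streams deliver a token, and it emits a result when it sees the end-of-vector token.

```python
END_OF_VECTOR = object()  # hypothetical stand-in for the end-of-vector token


def mac_satellite(stream_a, stream_b):
    """Multiply-accumulate two token streams; yield one sum per vector."""
    acc = 0
    # zip() blocks on both inputs, so the unit fires only when both are ready.
    for a, b in zip(stream_a, stream_b):
        if a is END_OF_VECTOR or b is END_OF_VECTOR:
            if a is not b:
                raise ValueError("input streams disagree on vector length")
            yield acc
            acc = 0
        else:
            acc += a * b


source_a = iter([1, 2, 3, END_OF_VECTOR, 4, END_OF_VECTOR])
source_b = iter([4, 5, 6, END_OF_VECTOR, 2, END_OF_VECTOR])
dot_products = list(mac_satellite(source_a, source_b))
```

Because the vector length travels with the data as a token, the same satellite handles scalars (length 1) and vectors (length n) without being told the length in advance.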

AGP Satellite
• Address generator for the SRAM satellite
• Generates data streams of different types; all other satellites process data streams
• Uses loop counters and stride counters to support 2 levels of nesting
• Control information sent in parallel with the data using 2 additional control bits
for n=1 to 3 {
  for k=1 to 3 {
    addr_read(n,k);
  }
}
[Figure labels: End of vector, End of matrix]
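A two-level address generator of this kind can be sketched as a Python generator. The parameter names and the encoding of the two control bits as booleans are assumptions for illustration; the slides state only that loop and stride counters are used and that two control bits travel with the data.

```python
def agp_addresses(base, outer_count, inner_count, outer_stride, inner_stride):
    """Yield (address, end_of_vector, end_of_matrix) for a 2-level loop nest."""
    for n in range(outer_count):
        for k in range(inner_count):
            address = base + n * outer_stride + k * inner_stride
            # Assumed meaning of the two control bits sent alongside the data.
            end_of_vector = k == inner_count - 1
            end_of_matrix = end_of_vector and n == outer_count - 1
            yield address, end_of_vector, end_of_matrix


# A 3 x 3 matrix stored row by row, matching the slide's 3-by-3 loop nest.
stream = list(agp_addresses(base=0, outer_count=3, inner_count=3,
                            outer_stride=3, inner_stride=1))
```

Swapping the two strides would stream the same matrix column by column without touching the consumers, which see only data plus end-of-vector and end-of-matrix markers.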

Satellite Processors: FPGA
• Reconfigurable for both logic function and interface control
• 4 x 9 CLB array in total
• 5-input 3-output CLBs
• 3 levels of interconnect hierarchy
• Mapped to various arithmetic functions and control
• Programmable clock generator
[Figure: 4 x 8 FPGA Array and 4 x 1 Interface Control; signals IN1, IN2, OUT, REQ, ACK; Programmable delay]

Low-Energy Embedded FPGA
• Test chip
– 8x8 CLB array
– 5 in - 3 out CLB
– 3-level interconnect hierarchy
– 4 mm² in 0.25 µm ST CMOS
– 0.8 and 1.5 V supply
• Simulation Results
– 125 MHz Toggle Frequency
– 50 MHz 8-bit adder
– energy 70 times lower than comparable Xilinx
• Parameterized module generator available

Reconfigurable Interconnect Network
Irregular mesh for heterogeneous blocks
• A channel along every side of each block
• A switch box at every cross-point
Building hierarchy by clustering
• Intra-cluster: mesh structure
• Inter-cluster: larger-granularity mesh
Saves energy by a factor of 7 compared to straightforward crossbar network!
[Figure labels: Universal Switchbox, Hierarchical Switchbox, Cluster, Level-1 Mesh, Level-2 Mesh]

Fast Design Space Exploration
Interconnect Models
[Figure: Multi-Bus (N Inputs, B Buses, M Outputs), Mesh (Module), Hierarchical Mesh (cluster)]
Model:
• Interconnect energy and delay model
• Algorithm mapping
• Graph-based place and route

Reconfiguration Model
• Configuration codes are created statically at compile time
• Every configuration memory is reset and rewritten with new configuration code before each kernel
• The core processor uses memory read/write instructions to perform the reconfiguration
[Figure labels: Core processor, Configuration codes, Reconfiguration Interface Unit, Distributed configuration memories (mem), Hardware modules]
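The reconfiguration steps listed on this slide can be sketched as follows. The class and function names, the module names, and the configuration words are all hypothetical; only the sequence (precompiled codes, reset, then rewrite by ordinary memory writes before each kernel) comes from the slide.

```python
class ConfigMemory:
    """One of the distributed configuration memories, modelled as a word array."""

    def __init__(self, size):
        self.words = [0] * size

    def reset(self):
        self.words = [0] * len(self.words)

    def write(self, offset, word):
        self.words[offset] = word


def reconfigure(memories, kernel_codes):
    """Reset every configuration memory, then write the precompiled codes."""
    writes = 0
    for name, memory in memories.items():
        memory.reset()
        for offset, word in enumerate(kernel_codes.get(name, [])):
            memory.write(offset, word)   # stands in for a core-processor store
            writes += 1
    return writes


memories = {"mac1": ConfigMemory(4), "agp1": ConfigMemory(4)}
memories["mac1"].write(3, 0xFF)                      # leftover from an earlier kernel
codes_for_kernel = {"mac1": [1, 2], "agp1": [7]}     # made-up configuration words
store_count = reconfigure(memories, codes_for_kernel)
```

The number of stores gives a first-order feel for reconfiguration cost, since each configuration word costs the core processor one memory write.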

Kernel Execution & Configuration
[Figure labels: ARM8, SRAM, Interface, AGP1, MEM1, AGP2, MEM2, MAC1, OPORT1, IPORT1, Interconnect]

Maia: Reconfigurable Baseband Processor for Wireless
• 0.25 µm tech: 4.5 mm x 6 mm
• 1.2 million transistors
• 40 MHz at 1 V
• 1 mW VCELP voice coder
• Hardware
– 1 ARM-8
– 8 SRAMs & 8 AGPs
– 2 MACs
– 2 ALUs
– 2 In-Ports and 2 Out-Ports
– 14x8 FPGA

Results of VCELP Voice Coder
79.7% of VCELP code maps onto reconfigurable datapath
[Figure panels: VCELP code breakdown, VCELP energy breakdown]
Compared to state-of-art 17 mW DSP
Functionality                       Energy (mJ) for 1 sec of VCELP speech processing
Kernels:
  Dot product                       0.738
  FIR filter                        0.131
  IIR filter                        0.021
  Vector sum with scalar multiply   0.042
  Compute code                      0.011
  Covariance matrix compute         0.006
Program control                     0.838
Total                               1.787
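The arithmetic behind this table can be checked directly. The sketch below uses the energy values from the slide; the variable names are illustrative. Since the energies are quoted per second of speech, the total in mJ equals the average power in mW, which can be compared with the 17 mW DSP the slide mentions.

```python
kernel_energy_mj = {
    "dot product": 0.738,
    "FIR filter": 0.131,
    "IIR filter": 0.021,
    "vector sum with scalar multiply": 0.042,
    "compute code": 0.011,
    "covariance matrix compute": 0.006,
}
program_control_mj = 0.838

kernels_mj = sum(kernel_energy_mj.values())
total_mj = kernels_mj + program_control_mj

# Energy is quoted for 1 second of speech, so mJ per second equals mW.
average_power_mw = total_mj / 1.0
advantage_over_dsp = 17.0 / average_power_mw
kernel_share = kernels_mj / total_mj
```

The entries sum to the quoted 1.787 mJ, roughly a factor of 9.5 below 17 mW. Program control on the core processor accounts for nearly half of the total, so the kernels mapped to satellites are no longer the dominant cost.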

Design Methodology and Flow
• Requires architecture exploration over heterogeneous implementation fabrics
• Should support refinement and co-design of hardware and software, as well as behavior and architecture
• Should consider all important metrics, and present PDA (Power-Delay-Area) perspective

Software Methodology Flow
[Figure: flow stages: Algorithms; Kernel Detection; Estimation/Exploration; Partitioning; Software Compilation; Reconfig. Hardware Mapping; Interface Code Generation]
[Figure: supporting blocks: Power & Timing Estimation of Various Kernel Implementations; PDA Models; Premapped Kernels; µproc & Accelerator; Behavioral; C++ Module Libraries; C++; SUIF + C-IF]

Hardware-Software Exploration
[Figure label: Macromodel call]

Industrial Example 1: Xtensa Configurable Processor
Source: Tensilica, Inc
Combines spatial and temporal processing
Small core: 0.7 mm² in 0.18 µm
~ 3 MIPS/mW
Core processor with extendible instruction set

Design Methodology – a Crucial Component
Source: Tensilica, Inc

Example: A DES Encryption Extension
4 extra instructions
1700 additional gates
No cycle time impact
Code size reduction
Source: Tensilica, Inc

Improvement over GP 32-bit processor
Source: Tensilica, Inc

Industrial Example 2: Chameleon RCP (Reconfigurable Communications Processor)
24 multipliers
128 DPUs
Source: Chameleon, Inc

Reconfigurable Processing Fabric
Source: Chameleon, Inc

CS2000 Performance Numbers

CS2000 Performance
Source: Chameleon, Inc
Power efficiency?

Design Methodology
Source: Chameleon, Inc

Industrial Example 3: The MorphICs Dynamically Reconfigurable Architecture (DRA)
[Figure labels: DSP Core, Memory, MCU Core; fixed logic for WCDMA, CDMA, IS-136, GSM, …]
DRA Processor
Software programmable
Hardware reconfigurable
Software Download: WCDMA (mode, param), CDMA (mode, param), WTDMA (mode, param), TDMA (mode, param)
• SIM Card
• Handset Memory
• POS Programming
• Network Download
• OTA Download
Realizes cost, size and power targets similar to traditional core+hardwired
Source: Morphics Technology

Basestation of the Next Generation Wireless
[Figure: two antenna systems (multiple sectors, multi-band), one at 800 MHz (bands A B A B) and one at 1900 MHz (bands A D B C F E), each feeding three chains of RF/IF Tuner and Block-Spectrum A/D; other labels: Wideband RF, Data Networks, 10/100 or Gbit, ATM]

HW Multistandard Solutions
The common approach to hardware design involves multiple ASICs to support each standard.
• Hardwired implementation is not scalable or upgradeable to new standards.
• This approach costs time in a time-to-market dominated world.
• Creating new chipsets for every technology combination critically challenges available design resources!
[Figure: three RF / IF / Digital Hardwired ASIC chains (Unique Combinations); other labels: Analog, Programmable, DSP, Control Processor]

SW Multistandard Solution
Applying instruction-set processor architectures to all baseband processing would be desirable...
…but is simply not a good implementation for base stations:
- Unacceptably high cost per channel
- Unacceptably large power per channel
This is definitely not a viable implementation for terminals
[Figure: three RF / IF chains; other labels: Analog, Programmable, DSP, Control Processor]

FPGA the Solution?
Cellular Handset Using Current FPGA
Source: Morphics Technology
![Page 58: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/58.jpg)
Successfully Using Reconfigurability
Define optimal architecture for efficient implementation
Application-Specific Leverage
Focus first on applications and constituent algorithms, not the silicon architecture!
Wireless Communications Transceiver Signal Processing
Minimize the hardware reconfigurability to a constrained set
Maximize the software parameterizability and ease of use of the programmer’s model for flexibility
Source: Morphics Technology
![Page 59: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/59.jpg)
Application-Specific MOPS in Digital Communications
[Figure: block diagram with RF/IF, Digital Downconversion and Channelization, TDMA Wideband Signal Processing Engine, CDMA Wideband Signal Processing Engine, Wideband Channel Decoder Engine, Programmable DSP, and Microprocessor]
Source: Morphics Technology
![Page 60: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/60.jpg)
Morphics’ DRL Architecture
Heterogeneous Multiprocessing Engine Using Application-Specific Reconfigurable Logic
[Figure: dataflow through three kernels, each drawn with two inputs, an output, and Clk and Enable signals. Labels: Small Granularity Kernel, Large Granularity Kernel, DATAFLOW]
Source: Morphics Technology
![Page 61: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/61.jpg)
DRL Kernels
[Figure: kernel block diagram. Labels: Data Sequencer, Data Memory, Parameterizable, Configurable, ALU]
Source: Morphics Technology
![Page 62: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/62.jpg)
Key Pieces of Design Methodology
System-level Profiling
• Analyze sequences of operations (arithmetic, memory access, etc.)
• Analyze communication bottlenecks
• Key flexible parameters (algorithm vs. architecture parameters)
Architecture-level Profiling
• ALU/kernel definition (sequences of operators)
• Memory profile
• Type of configurability required for flexibility
• Macro-sequencer development
Implementation
• SW: programmer’s model developed at architecture specification stage
• SW: API proven out via behavioral models & demonstrator hardware
• VLSI: focus on regular, predictable timing and routability
• VLSI: embedded reconfigurability in an ASIC flow
Source: Morphics Technology
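The system-level profiling step, analyzing sequences of operations, can be illustrated with a small sketch. This is not the Morphics tool flow, which the slide does not describe. It only assumes that an application run can be captured as a flat list of operation names. The function `count_op_sequences` and the example trace are hypothetical, introduced here for illustration.

```python
from collections import Counter


def count_op_sequences(trace, length):
    """Count every contiguous run of `length` operations in an operation trace."""
    # zip over shifted copies of the trace yields all sliding windows.
    windows = zip(*(trace[i:] for i in range(length)))
    return Counter(windows)


# Hypothetical trace of an inner loop (illustrative only, not measured data).
trace = ["load", "mul", "add", "mul", "add", "store"] * 4

pairs = count_op_sequences(trace, 2)

# Sequences that recur often are candidates for a dedicated ALU/kernel operator.
for sequence, occurrences in pairs.most_common(3):
    print(" -> ".join(sequence), occurrences)
```

In this made-up trace the multiply-then-add pair dominates, which is the kind of evidence that would argue for a fused multiply-accumulate operator in the kernel definition.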
![Page 63: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/63.jpg)
Architectural Implementation Comparison: Reconfigurable FFT
[Figure: two log-scale plots vs. FFT size: Energy per Transform (with a lower-limit curve) and Transforms per Second per mm². Curves: Function-specific reconfigurable hardware, Data-path reconfigurable processor, FPGA, Low-power DSP, High-performance DSP. All results are scaled to 0.18 µm.]
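To see why both metrics are plotted against FFT size, it helps to count the work per transform. The sketch below assumes a textbook radix-2 decimation FFT. The slide does not say which FFT algorithm the compared implementations use, so the function `radix2_fft_ops` and its counts are an illustrative assumption, not the data behind the plots.

```python
def radix2_fft_ops(n):
    """Operation counts for one radix-2 FFT of size n (n a power of two >= 2)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1          # log2(n) for a power of two
    butterflies = (n // 2) * stages      # n/2 butterflies in each stage
    return {
        "stages": stages,
        "butterflies": butterflies,
        # One twiddle multiply per butterfly, counting trivial factors too.
        "complex_mults": butterflies,
        # Each butterfly produces a sum and a difference.
        "complex_adds": 2 * butterflies,
    }


for size in (16, 64, 256, 1024):
    print(size, radix2_fft_ops(size)["butterflies"])
```

Because the butterfly count grows as (N/2)·log2(N), energy per transform rises and transforms per second per unit area fall as the FFT size increases, whatever the architecture.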
![Page 64: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/64.jpg)
Architectural Implementation Comparison: Reconfigurable Viterbi Decoder
[Figure: two log-scale plots vs. Number of States: Energy per Decoded Bit (with a lower-limit curve) and Decoding Rate per mm². Curves: Function-specific reconfigurable hardware, Data-path reconfigurable processor, FPGA, Low-power DSP, High-performance DSP. All results are scaled to 0.18 µm.]
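The number of states is the natural x-axis here because a Viterbi decoder performs one add-compare-select (ACS) operation per state for every trellis step. The sketch below shows one such step for a binary-input code, assuming a shift-register state convention in which the next state is `((state << 1) | bit) & (num_states - 1)`. The function `acs_step` and the `branch_metric` callback are hypothetical names, and nothing here reflects how the compared implementations are built.

```python
def acs_step(path_metrics, branch_metric):
    """One add-compare-select pass over all states of a binary-input trellis.

    path_metrics: current metric per state (length must be a power of two >= 2).
    branch_metric: function (prev_state, next_state) -> cost of that transition.
    Returns the updated metrics and, per state, the surviving predecessor.
    """
    num_states = len(path_metrics)
    new_metrics, survivors = [], []
    for state in range(num_states):
        # The two predecessors differ only in the bit that is shifted out.
        predecessors = (state >> 1, (state >> 1) | (num_states >> 1))
        # Add the branch cost, compare the two candidates, select the smaller.
        best_metric, best_prev = min(
            (path_metrics[prev] + branch_metric(prev, state), prev)
            for prev in predecessors
        )
        new_metrics.append(best_metric)
        survivors.append(best_prev)
    return new_metrics, survivors


# Hypothetical 4-state example with a constant branch cost (illustrative only).
metrics, survivors = acs_step([0, 5, 2, 7], lambda prev, nxt: 1)
print(metrics, survivors)
```

The loop body runs once per state, so work per trellis step grows linearly with the number of states, consistent with energy per decoded bit rising and decoding rate per unit area falling along the x-axis.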
![Page 65: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/65.jpg)
Source: Schaumont et al., DAC 2001
![Page 66: Silicon Architectures for Wireless Systems – Part 2 ... · Silicon Architectures for Wireless Systems – Part 2 Configurable Processors With contributions from J. Wawrzynek and](https://reader033.fdocuments.net/reader033/viewer/2022042018/5e75c19cc28f400691296600/html5/thumbnails/66.jpg)
Summary
• Configurable computing is finding its way into the embedded processor space
• Best suited (so far) for
– Flexible I/O and interface functionality
– Providing task-level acceleration of “parameterizable” functions
• Software flow still subject to improvement
DO NOT FORGET CONFIGURATION OVERHEAD
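The warning about configuration overhead can be made concrete with a simple amortization model. The sketch assumes a one-time configuration cost followed by repeated runs of the accelerated function. The names `break_even_runs` and `effective_speedup` and all numbers are hypothetical, and real systems may overlap or cache configurations in ways this model ignores.

```python
import math


def break_even_runs(t_config, t_sw, t_hw):
    """Smallest number of runs for which configuring the fabric pays off.

    t_config: one-time configuration cost; t_sw, t_hw: time per run in
    software and on the configured hardware (same unit for all three).
    Returns None when the hardware is not faster per run.
    """
    if t_hw >= t_sw:
        return None
    return math.ceil(t_config / (t_sw - t_hw))


def effective_speedup(t_config, t_sw, t_hw, runs):
    """Speedup over software once the configuration cost is included."""
    return (runs * t_sw) / (t_config + runs * t_hw)


# Hypothetical numbers in arbitrary time units (illustrative only).
print(break_even_runs(1000, 10, 2))
print(effective_speedup(1000, 10, 2, 1000))
```

With these made-up numbers the raw per-run speedup is 5x, yet the effective speedup after 1000 runs is still only about 3.3x, and below the break-even count the configured hardware is a net loss.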