1Tang P187_MAPLD2004
High-Performance SEE-Hardened Programmable DSP Array
Larry McMurchie, Carl Sechen
Students: Victor Tang, James Lan, Duncan Lam
Dept. of EE, Univ. of Washington
Outline
• The RADAR architecture
  – Why coarse-grained programmable architectures
  – Features of the RADAR architecture
  – Examples of FIR filter
  – Benchmarks
• Radiation Hardening of RADAR
  – SETs in combinational logic and pipeline registers
  – Register filtering technique
Current Commercial FPGAs – “One Size Fits All”
• Flexibility: they can implement any digital function
• Commodities: not cheap ones, but not nearly as expensive as ASICs to design and fabricate
• Fewer man hours to design than ASICs
• Reprogrammable in situ, allowing updates and bug fixes to be made easily
Downside of “One Size Fits All”
• Power can be 10X that of an ASIC that performs the same function
• Area/weight can be many times an equivalent ASIC
• Performance may not meet requirements
• Varying degrees of susceptibility to radiation effects
– Particularly as process feature sizes decrease
A Critical Observation!
An FPGA in a given system will generally be used only for a limited set of related functions.
Example: an FPGA that performs high-throughput DSP applications, e.g. a FIR filter
- May be reprogrammed to perform a variant of the FIR, e.g. a different number of taps, or an IIR filter
- But not a totally different operation, e.g. random logic required for a control block
- Result for this example is that all the fine-grained, bit-level flexibility in an FPGA is wasted
Is There a Better Way?
If we can identify the domain of applications that will be used in a given environment ...
Then we can create a customized programmable device (CPD) that will:
• Approach ASIC performance in terms of power, area and throughput
• Retain sufficient programmability to enable all applications within the domain
ASIC/CPD/FPGA Comparison
[Figure: two scales, Flexibility and Area/Power, each spanning ASICs to FPGAs, with the Customized PD marked on both]
RADAR is a Programmable Device Customized for DSP
• Based upon Reconfigurable Pipelined Datapaths (RaPiD)
• Linear bus-based datapath (as opposed to crossbar)
  – Provides efficient local interconnect, which is dominant in DSP applications
• Many registers (in the right places) to allow intensive pipelining
• Combination of static and dynamic control
  – Static to determine the particular application
  – Dynamic to control multiple phases within the application
D. Cronquist, P. Franklin, C. Fisher, M. Figueroa and C. Ebeling, “Architecture Design of Reconfigurable Pipelined Datapaths,” 20th Anniversary Conf. on Advanced Research in VLSI, 1999.
Example of RADAR Datapath
[Figure: linear datapath of four cells (Cell #1 to Cell #4) between input streams and output streams; each cell has Memory, Multiply and ALU blocks attached to w-bit buses through multiplexors, drivers and bus connectors]
4 cells – each containing local memory, multiply, ALU and register plus input and output streams
Bus Multiplexor and Drivers
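[Figure: bus multiplexor selecting among w-bit buses (and GND) to feed a functional unit, and drivers placing a functional unit's output onto w-bit buses; labels: s, w, dynamic control, "To functional unit", "From functional unit"]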
Bus Connectors
[Figure: bus connector joining two w-bit bus segments; labels: w, s]
Example #1 – 4 Tap FIR Filter
• Given a vector of coefficient weights
• Compute the dot product of the coefficient weights and a vector of inputs
• Easily maps to a linear pipeline
[Figure: input stream x0 ... x9 passing weights w0, w1, w2, w3; four products are combined to form the output y7]
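The dot-product formulation above can be sketched in a few lines of Python. This is only an illustration, not the RADAR mapping itself: the function name `fir_filter`, the weight and sample values, and the index convention (w0 multiplies the newest sample in the window) are all assumptions made for the example.

```python
def fir_filter(weights, samples):
    """Return y[n] = sum over k of weights[k] * samples[n - k].

    Only outputs with a full window of inputs are produced, so the
    first output corresponds to n = len(weights) - 1.
    """
    taps = len(weights)
    outputs = []
    for n in range(taps - 1, len(samples)):
        acc = 0
        # One multiply-accumulate per tap; in a linear pipeline each
        # tap would be handled by its own cell.
        for k in range(taps):
            acc += weights[k] * samples[n - k]
        outputs.append(acc)
    return outputs


weights = [1, 2, 3, 4]      # hypothetical w0..w3
samples = list(range(10))   # hypothetical x0..x9
outputs = fir_filter(weights, samples)
print(outputs)
```

Each output needs one multiply and one add per tap, which is why the computation lines up naturally with a row of multiply/ALU cells.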
Following slides courtesy of Carl Ebeling, Dept. of CSE, UW
RADAR Datapath Programmed for 4-tap FIR filter
[Figure: the four-cell RADAR datapath (input streams, Memory, Multiply and ALU blocks per cell, output streams) configured as a 4-tap FIR filter]
RADAR Performance Benchmarks
• Assume 16 RaPiD cells, each containing a 16x16 multiplier, and 16-bit buses in the communication network
• Applications: 8x8 DCT, motion estimation, FIR filter, matrix multiply, 2D convolution
• Experiments so far in 0.18 micron CMOS show that 1 GHz is achievable, giving 16 GOPS
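The 16 GOPS figure is consistent with a simple peak-rate calculation. The sketch below assumes one operation per cell per clock cycle, which the slide does not state explicitly; `peak_gops` is a hypothetical helper name.

```python
def peak_gops(cells, clock_hz, ops_per_cell_per_cycle):
    """Peak throughput in giga-operations per second."""
    return cells * clock_hz * ops_per_cell_per_cycle / 1e9


# 16 cells at 1 GHz; one operation per cell per cycle is an assumption.
print(peak_gops(cells=16, clock_hz=1e9, ops_per_cell_per_cycle=1))
```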
Common Techniques for SETs
• TMR-in-Hardware for logic and memory
  – 3X in power/area
  – Voting circuitry must be hardened
• Using larger gate widths
  – Increased current flow suppresses transients
  – Also increases power/area
  – Equivalent to using feature sizes of previous generation processes
• Adding resistors and capacitors
  – Low pass filtering of SETs
  – Increases power/area
In general, circuit design techniques such as these increase area, delay and power, are difficult to design, and do not transfer well between processes!
TMR-in-Time and SETs
A single event transient in a pipelined computation may be filtered using TMR-in-time, a simple temporal voting scheme:
[Figure: clocked combinational stages C1, C2 and C3 followed by a clocked majority element M]
Same data is applied on successive clock cycles, resulting in three threads of computation followed by a majority function
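A minimal behavioral sketch of the temporal voting idea follows. It is not a model of the RADAR circuit: the names `majority`, `tmr_in_time` and `square`, and the way an upset is injected (XOR with a mask on one chosen cycle), are assumptions for illustration only.

```python
def majority(a, b, c):
    """Bitwise majority of three words."""
    return (a & b) | (a & c) | (b & c)


def tmr_in_time(compute, value, upset_cycle=None, upset_mask=0):
    """Apply the same value on three successive cycles and vote.

    upset_cycle picks the cycle (0, 1 or 2) whose result is corrupted
    by XOR with upset_mask, standing in for a single event transient.
    """
    results = []
    for cycle in range(3):
        result = compute(value)
        if cycle == upset_cycle:
            result ^= upset_mask
        results.append(result)
    return majority(*results)


def square(x):
    """Hypothetical 16-bit combinational function."""
    return (x * x) & 0xFFFF


clean = tmr_in_time(square, 12)
hit = tmr_in_time(square, 12, upset_cycle=1, upset_mask=0x0040)
print(clean, hit)
```

A transient that corrupts only one of the three threads is outvoted, so both calls return the same value.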
TMR-in-Time
This simple scheme works, provided transients are no longer than the clock period.
It suffers from a ~3X latency relative to the singlet (unhardened) circuit, but requires one third the hardware of the TMR-in-hardware approach.
In the RADAR architecture (where throughput is determined by the number of clock cycles that critical functional units are busy), throughput is the same for TMR-in-time and TMR-in-hardware.
TMR-in-time approaches the singlet (unhardened) case in energy consumption per computation.
Data switching activity occurs only during the first of three cycles! Of course, clock power is 3X that of the singlet.
Filtering SETs at Registers
[Figure: pipeline stages C1 and C2, each followed by three registers clocked by clk1a, clk1b and clk1c and a majority element M; the clocks are separated by DT]
Sampling data at every register and applying the majority function yields an optimized form of TMR-in-time.
D. Mavis and P. Eaton, “Soft Error Rate Mitigation Techniques for Modern Microcircuits,” Proc. of the 40th Annual Int. Reliability Physics Symposium, 2002, pp. 216-225.
Filtering SETs at Registers (cont.)
• If the duration of the transient is less than the clock separation time DT, only one of the three registers will latch incorrect data and the majority function will filter it out
• Note that SETs created in the majority function itself will be filtered out at the following register.
• By increasing DT, the circuit can be made immune to transients caused by radiation of increasing LET values.
The means of generating clocks delayed by DT can be made a programmable feature in the architecture, i.e. the degree of radiation hardening is programmable!
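The effect of DT can be sketched with a small timing model. Everything here is an assumption made for illustration: time is in arbitrary integer units, the transient is modeled as an interval during which the data line reads a wrong value, and the names `sample` and `filtered_register` are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority of three words."""
    return (a & b) | (a & c) | (b & c)


def sample(t, good, bad, set_start, set_width):
    """Value seen on the data line at time t; wrong during the transient."""
    return bad if set_start <= t < set_start + set_width else good


def filtered_register(t_clk, dt, good, bad, set_start, set_width):
    """Latch the data line at t_clk, t_clk + dt and t_clk + 2*dt, then vote."""
    samples = [
        sample(t_clk + i * dt, good, bad, set_start, set_width)
        for i in range(3)
    ]
    return majority(*samples)


GOOD, BAD = 0b1010, 0b0101   # hypothetical correct and corrupted words
T_CLK, DT = 1000, 200        # hypothetical clock edge and separation

# Transient shorter than DT: at most one sample is wrong, so it is filtered.
short = filtered_register(T_CLK, DT, GOOD, BAD, set_start=950, set_width=150)
# Transient longer than DT: two samples can be wrong, and the vote fails.
long_ = filtered_register(T_CLK, DT, GOOD, BAD, set_start=950, set_width=300)
print(bin(short), bin(long_))
```

In this model, raising `DT` widens the range of transient durations that the vote can reject, which mirrors the programmable-hardening point above.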
Power/Throughput Comparisons of Hardening Techniques
[Figure: Power and Throughput compared for Singlet, TMR-in-hardware, TMR-in-time and Register Filtering]
As applied to a fixed size RADAR array
Implementation assumes static CMOS
Throughput is measured in output data values / unit time
Application of Register Filtering to RADAR
Register filtering is well suited to the RADAR architecture
• Better power/throughput characteristics than other methods
• The degree of radiation hardening can be programmable through adjustment of DT
Summary
• RADAR
  – Programmable architecture customized for DSP applications
  – Capable of 16 GOPS in 0.18 micron CMOS
• Radiation hardening of combinational logic
  – Using register filtering
  – Achieves near-ideal power/throughput characteristics
  – Degree of radiation hardness programmable