The CMU Reconfigurable Computing Project
![Page 1: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/1.jpg)
SSS 4/9/99 CMU Reconfigurable Computing 1
The CMU Reconfigurable Computing Project
April 9, 1999
Mihai Budiu
![Page 2: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/2.jpg)
Current Project Members
ECE Department
Herman Schmit, Srihari Cadambi, Matt Moe, Robert Taylor, Ronald Laufer
CS Department
Seth Copen Goldstein, Mihai Budiu
![Page 3: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/3.jpg)
Why Study Reconfigurable Hardware?
It is a nice computation paradigm (wire your own computer)
![Page 4: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/4.jpg)
Why Study Reconfigurable Hardware

| Algorithm | Year | System | Versus | Speedup (x) |
|---|---|---|---|---|
| DNA matching | 1992 | SPLASH 2 | SPARC 10 | 4300 |
| FIR Filter | 1998 | PipeRench | UltraSparc 300 MHz | 90 |
| IDEA Encryption | 1998 | PipeRench | UltraSparc 300 MHz | 61 |
| SAT solver | 1997 | Pamette | SPARC 5 110 MHz | 17 to 1100 |
| Ray Casting | 1995 | RIPP-10 | Pentium 75 MHz | 33.8 |
| Hidden Markov Model | 1996 | 1 Xilinx FPGA | SPARC 10 | 24.4 |
| DES Encryption | 1996 | GARP | UltraSparc 170 MHz | 24 |
| SPEC92 | 1994 | MIPS+RC | MIPS | 1.22 |
![Page 5: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/5.jpg)
Commercial Players
Source: In-stat, April 1998
*Does not include software, hardwire or support EPROMs
![Page 6: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/6.jpg)
What Is “Reconfigurable Hardware?”
Universal gates and/or storage elements
Interconnection network
Switches
![Page 7: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/7.jpg)
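Basic Ingredient: RAM cell
Universal gate = RAM
[Figure: a RAM holding the contents 0001, addressed by two input bits (labeled a0, a1); its data output is the AND of the two inputs (labeled "a1 & a2" on the slide)]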
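The "universal gate = RAM" idea on this slide can be sketched in a few lines of Python: the RAM contents are a truth table, and the gate inputs form the address. This is only an illustration; the names `make_lut`, `and_gate`, and `xor_gate` are hypothetical and do not come from the slides.

```python
def make_lut(truth_table):
    """Return a gate whose behavior is defined entirely by RAM contents.

    truth_table[i] is the output bit for the input combination whose
    bits, read most-significant first, spell the address i.
    """
    def gate(*inputs):
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return truth_table[address]
    return gate


# Contents 0001 (as on the slide) give a 2-input AND.
and_gate = make_lut([0, 0, 0, 1])
# Rewriting the RAM to 0110 turns the same structure into XOR.
xor_gate = make_lut([0, 1, 1, 0])
```

Reprogramming the device amounts to rewriting `truth_table`; the lookup structure itself never changes.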
![Page 8: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/8.jpg)
Basic Ingredients (ctd)
A switch is controlled by a 1-bit RAM cell
[Figure: switches with control bits 0, 1, 1, 1]
![Page 9: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/9.jpg)
Outline
• What is reconfigurable hardware
• RH vs other computation paradigms
• Challenges in RH research
• PipeRench: the CMU project:
  – the hardware
  – the software
• Conclusions
![Page 10: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/10.jpg)
RH vs ASICs
• Generally Application-Specific Integrated Circuits will be faster than RH:
  – RH wires are slow & big
  – RH bit-slices are costly to interconnect
  – RH devices must store configuration on the chip
but
• RH can be reprogrammed
  – new algorithms
  – to fix bugs
• RH cheaper in small production
• RH tolerates faults better
• RH sometimes faster with staged computation
![Page 11: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/11.jpg)
RH vs Microprocessors
• RH less flexible (like a VLIW with fixed instructions)
but
• RH provides more (customized) computation elements
• RH can decrease memory traffic
• RH can be tailored for specific algorithms and data types
RH will not replace µPs, but complement them
![Page 12: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/12.jpg)
Types of RH
• FPGAs: bit-level logic functionality
  (the basic processing elements compute on 1 bit)
• word-based architectures: PipeRench (CMU)
  (basic PE operates on 8 bits)
  (basic PE is a small ALU)
• coarse architectures: RAW (MIT)
  (basic PE is a MIPS 2000 core)
![Page 13: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/13.jpg)
RH In A System
[Figure: coupling]
![Page 14: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/14.jpg)
Challenges In RC
• Software tools:
  – Programming RC like software development
  – Automatic compilation from HLL
  – Automatic program partitioning
• Mapping algorithms efficiently (no ISA)
• System issues
  – interfaces
  – find “ideal” RC fabric
![Page 15: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/15.jpg)
The CMU Reconfigurable Computing Project
![Page 16: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/16.jpg)
Hardware Goals
• To build a complete reconfigurable hardware device
• To build the system integration hardware
• To host the device in a PC
![Page 17: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/17.jpg)
Our Device:
• Word processing elements
• Pipelined architecture
• Virtualized hardware
• Local interconnection network
• Wide pipelined bus
![Page 18: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/18.jpg)
[Figure labels: Configuration memory; Stripes; Data & Config controller; Processing elements]
![Page 19: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/19.jpg)
Hardware Virtualization
[Figure labels: Program; Instructions currently in hardware; Instructions paged out; Actual available hardware]
![Page 20: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/20.jpg)
Hardware Virtualization (2)
[Figure labels: Program in configuration memory; hardware; Page in; configure; compute; Page out]
Overlap configuration with computation.
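The paging idea on the two virtualization slides can be illustrated with a toy schedule: a program with more stripes than the fabric holds is paged in one stripe per cycle, and the stripes already resident keep computing while the new one is configured. The round-robin replacement order and the name `virtualization_schedule` are assumptions made for this sketch; the slides do not specify the actual policy.

```python
def virtualization_schedule(virtual_stripes, physical_stripes, cycles):
    """Toy model of hardware virtualization.

    Each cycle, one virtual stripe is configured into a physical slot
    (assumed round-robin) while every other occupied slot computes.
    Returns a list of (cycle, paged_in, paged_out, computing) tuples.
    """
    fabric = [None] * physical_stripes  # which virtual stripe sits in each slot
    trace = []
    for cycle in range(cycles):
        paged_in = cycle % virtual_stripes
        slot = cycle % physical_stripes
        paged_out = fabric[slot]        # None while the fabric is still filling
        fabric[slot] = paged_in
        # Slots other than the one being configured are free to compute.
        computing = sorted(
            stripe for index, stripe in enumerate(fabric)
            if stripe is not None and index != slot
        )
        trace.append((cycle, paged_in, paged_out, computing))
    return trace


# A 5-stripe program on a 3-stripe fabric.
for step in virtualization_schedule(5, 3, 6):
    print(step)
```

In this model the fabric never idles waiting for configuration once the first stripe is loaded, which is the point of "overlap configuration with computation".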
![Page 21: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/21.jpg)
Processing Elements
• Look-up table
• Any 3-to-1 function
[Figure labels: a, b, Cin, out; PE2, PE1, PE0]
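As a hedged illustration of how a 3-input look-up table with inputs a, b and Cin can do arithmetic, the sketch below programs one table with the sum function and a second with the carry function, then chains the bit positions. Using a separate table for the carry is an assumption of this sketch (the slide only states "any 3-to-1 function"), and `lut3`, `SUM_TABLE`, `CARRY_TABLE`, and `ripple_add` are hypothetical names.

```python
# Truth tables indexed by the address (a << 2) | (b << 1) | cin.
SUM_TABLE = [a ^ b ^ c for a in (0, 1) for b in (0, 1) for c in (0, 1)]
CARRY_TABLE = [(a & b) | (a & c) | (b & c)
               for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def lut3(table, a, b, cin):
    """Evaluate a 3-input, 1-output look-up table."""
    return table[(a << 2) | (b << 1) | cin]


def ripple_add(x, y, width=8):
    """Add two width-bit values one bit at a time, passing the carry along."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        result |= lut3(SUM_TABLE, a, b, carry) << i
        carry = lut3(CARRY_TABLE, a, b, carry)
    return result, carry
```

Swapping in different table contents would give a subtractor or a bitwise logic operation without changing the structure.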
![Page 22: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/22.jpg)
The Interconnection Network
[Figure labels: Word-level cross-bar; Pass Registers; PE N, PE 1, PE 0; widths B bits, P*B bits, P*B*N bits]
![Page 23: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/23.jpg)
The PCI Board
[Figure: chip.eps]
![Page 24: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/24.jpg)
Software Goal
To program reconfigurable devices using the standard software development processes:
– Compile C or Java
– Do it quickly
[Figure labels: Java; Partitioner; DIL (Data-flow Intermediate Language); Configuration; Reconfigurable HW; CPU; Built]
![Page 25: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/25.jpg)
Building Circuits From DIL
a = b + c * d;
e = c - d;
• variables → wires
• operators → gates
[Figure: circuit with operators +, *, - over wires b, c, d, producing a and e]
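The mapping from variables to wires and operators to gates can be sketched as follows. The two statements on the slide happen to be valid Python, so this sketch borrows Python's own `ast` parser rather than a real DIL parser, which the slides do not describe. `build_circuit` and the temporary wire names `t1`, `t2`, ... are hypothetical.

```python
import ast

# Operator node types mapped to gate names.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def build_circuit(source):
    """Turn assignments into a list of gates (op, in_wire, in_wire, out_wire)."""
    gates = []
    temp_count = 0

    def wire_for(node, out=None):
        nonlocal temp_count
        if isinstance(node, ast.Name):
            return node.id                      # a variable is just a wire
        if isinstance(node, ast.BinOp):
            left = wire_for(node.left)
            right = wire_for(node.right)
            if out is None:                     # intermediate result
                temp_count += 1
                out = f"t{temp_count}"
            gates.append((OPS[type(node.op)], left, right, out))
            return out
        raise ValueError("unsupported expression")

    for statement in ast.parse(source).body:
        wire_for(statement.value, out=statement.targets[0].id)
    return gates


for gate in build_circuit("a = b + c * d; e = c - d;"):
    print(gate)
```

Each tuple is one gate; wires `c` and `d` appear as inputs to two gates, which matches the fan-out suggested by the slide's figure.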
![Page 26: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/26.jpg)
Mapping Circuits To
[Figure: four panels, each showing operators - and + over wires a, b, c]
![Page 27: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/27.jpg)
The DIL Compiler Front-End
[Figure labels: Dil input file; Parser; Evaluator; Loader; Loader; Circuit; Component library; Component circuits; Backend]
![Page 28: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/28.jpg)
The DIL Compiler Backend
[Figure labels: Front-end; Circuit (expanded); Optimizer; Placer-Router; Circuit; Circuit (placed); Code generator; Asm; C++; C++; xfig]
The whole compilation process is very fast (compared to classical CAD tools).
We can compile two orders of magnitude faster.
![Page 29: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/29.jpg)
Processing Element Size Tradeoffs

| Small | Big |
|---|---|
| Efficient usage | Wasteful |
| Slower | Faster bit-slice |
| Flexible interconnect | Coarse routing |
| Bigger configuration | Fewer configuration bits |
| Place and route easier | Constrains the compiler |
![Page 30: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/30.jpg)
Stripe Width Tradeoffs

| Wider | Narrower |
|---|---|
| Fewer stripes | More will fit |
| Virtualize more | Fewer page-ins |
| Bandwidth waste | Less bandwidth available |
| Placer freedom | Placement constrained |
![Page 31: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/31.jpg)
Bus Width Tradeoffs

| Wider | Narrower |
|---|---|
| More area | Less area |
| High bandwidth | Time-mux bus |
![Page 32: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/32.jpg)
Clock Speed Tradeoffs (run-time)

| Faster | Slower |
|---|---|
| Short critical path | Big chains |
| Long pipeline built | Compact circuits |
| Decomposition overhead | Little decomposition |
| Virtualized more | Less virtualized |

[Figure: adders (+) with wire widths labeled 24 and 8]
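The figure labels (widths 24 and 8 around + operators) suggest a wide addition being decomposed into narrower ones so that each piece fits a short critical path. Assuming that reading, the sketch below splits a 24-bit add into 8-bit chunks with an explicit carry between them; `decomposed_add` and its parameters are hypothetical names, and each chunk stands in for one pipeline stage.

```python
def decomposed_add(x, y, total_bits=24, chunk_bits=8):
    """Add two total_bits-wide values using chunk_bits-wide adders.

    Returns (sum modulo 2**total_bits, carry out, number of chunk adders).
    More, narrower chunks mean a shorter critical path per stage but a
    longer pipeline, which is the tradeoff listed in the table.
    """
    mask = (1 << chunk_bits) - 1
    carry = 0
    result = 0
    stages = 0
    for shift in range(0, total_bits, chunk_bits):
        partial = ((x >> shift) & mask) + ((y >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> chunk_bits
        stages += 1
    return result, carry, stages
```

Calling it with `chunk_bits=24` models the undecomposed adder (one stage, long carry chain); `chunk_bits=8` models the three-stage version.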
![Page 33: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/33.jpg)
Configuration Bits per Stripe
[Chart: Configuration Bits (y-axis, 0 to 1600) vs. Stripe Width (x-axis: 64, 80, 96, 112, 128, 144), one series per PE bit width (2, 4, 8, 16, 32)]
![Page 34: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/34.jpg)
[Figure: fir-throughput.eps]
![Page 35: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/35.jpg)
Project Status
• Operational:
  – Behavioral and structural models of PipeRench in Verilog
  – Assembler, simulator
  – Tools for visualization and debugging
  – One tile fabricated and tested
  – Very fast compiler from intermediate language
• In work:
  – Prototype PipeRench to be taped out this summer
  – PCI board to host PipeRench in a PC
![Page 36: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/36.jpg)
Simulated Speed-up vs. UltraSparc @ 300Mhz
[Bar chart, logarithmic axis from 1.0 to 1000.0. Bars: ATR, Cordic, DCT, FIR, IDEA, Nqueens, Over. Data labels as extracted: 328.8, 29.0, 20.6, 90.9, 61.8, 26.0, 76.1]
![Page 37: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/37.jpg)
Future Work
• Build the PCI board
• Build the OS device drivers
• Start investigating HLL issues:
  – automatic partitioning
  – translation to DIL
  – special code transformations
![Page 38: The CMU Reconfigurable Computing Project](https://reader035.fdocuments.net/reader035/viewer/2022070419/56815bef550346895dc9e06f/html5/thumbnails/38.jpg)
Conclusions
• A set of important applications can benefit from RC devices
• RC offers potential for substantial performance improvement at a low cost
• RC devices will soon be mainstream in the embedded computing world; perhaps in the future they will also permeate the desktop
[Figure labels: Pentium V; UVR]