Compilation for Embedded Reconfigurable Computing Architectures: Part A
João M. P. Cardoso and Pedro C. Diniz
3rd Summer School on
Generative and Transformational Techniques in Software Engineering
6–11 July, 2009, Braga, Portugal
Outline
• First part of the Tutorial: Architectures
– Motivation
– Technology Trends
– Reconfigurable Computing
– Embedded and Reconfigurable Architectures
– Reconfiguration
– Improving Performance
• Second part of the Tutorial: Compiling
– Main Compiler and Execution Concepts
– Compiling to Fine-Grained Reconfigurable Architectures
– Optimizations
• Conclusions
Embedded Computing
• High-performance with low
energy and at low cost
• Short time-to-market
• Upgrades during the product’s lifetime (implying reprogrammability)
Motivation
• Speedups achieved by Reconfigurable Computing
• Speedup of Cray XD1 (FPGAs @ 200 MHz) over an
Opteron CPU (@ 2.4 GHz)
– DNA sequencing
• 695× using 1 FPGA
• 2,794× using 6 FPGAs
– Data Encryption Standard (DES) cipher
• 12,162×
• Power savings of 148× and 608×, respectively
El-Ghazawi et al., IEEE Computer, 2008
The Reconfigurable Computing
Paradigm
• Traditional computing
– start: algorithm (variable)
– architecture: fixed structure
• Reconfigurable Computing
– start: algorithm (variable)
– architecture: variable structure
See Nick Tredennick’s Paradigm Classification Scheme
TECHNOLOGY TRENDS
Why higher levels of abstraction?
• Hardware and software design gaps
Source: THE INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS: 2007
System-level design requirements
• near-term years
Source: THE INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS: 2007
System-level design requirements
• long-term years
Source: THE INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS: 2007
Reconfigurability is seen as a “must” for future embedded computing systems
RECONFIGURABLE COMPUTING
Reconfigurable (Custom) Computing
• Hardware resources can be “configured” for a specific architecture
– Specialized Functional Elements and Processing Elements
– Interconnect between “Nodes” custom to the data flow in the application
– Configurable on-chip memories (size, data-width, indexing, etc.)
– Execution Models (Pipelined, Multithreading, VLIW)
All possible in the same reconfigurable fabric
Reconfigurable (customizable) Fabrics
[Figure: a reconfigurable fabric with customized functions F(a,b,c,d), customized memories, and customized interconnects]
Reconfigurable (customizable) Fabrics
• Data travel on paths statically or dynamically defined
• Many on-chip memories
– Parallel accesses
• Native support for data streaming applications
• Custom pipelining
– On-chip configurable memories can be adapted to communication needs
[Figure: computing engines connected through on-chip memories and FIFOs]
Reconfigurable (Custom) Computing
• Orders of magnitude speed-ups over traditional computing systems
• Why? Customization is the key:
– High operation- and task-level parallelism
• Increased by storage organization (data replication/distribution over multiple on-chip memories)
– Non-Standard Numeric Formats (fixed-point, etc.)
– Custom Routing
Reconfigurable (Custom) Computing
• Benefits:
– Reconfiguration is ideal for fast prototyping and early evaluation of realistic performance
– Performance
– Tolerate Defects
• Costs:
– Added complexity of execution models makes programming very hard (we have not yet solved the parallel programming problem, sort of…)
Reconfigurable Computing
Many companies: Cray,
SGI, SRC, ARC, PACT,
PicoChip, Tilera, etc.
Based on source: Bezdek, J.C, Fuzzy
models - what are they, and why,
IEEE Trans. on Fuzzy Systems, 1993.
Reconfigurable Computing has
already achieved this point!
Reconfigurable Computing
• The Sony PSP Example
– Reconfigurable Architecture: Virtual Mobile Engine (VME): audio
• 24-bit data width
• 166 MHz
• Single-cycle context switch
http://www.hotchips.org/archives/hc16/3_Tue/8_HC16_Sess8_Pres1_bw.pdf
Are Architectures Merging?
Multi-(Many)-core vs. Reconfigurable
• Regularity of Reconfigurable Fabrics (e.g., FPGAs) allows them to ride Moore's Law
– Unbelievably large number of devices
– Hard-macro cores can be plugged in
[Figure: Multicore, Manycore, Reconfigurable Fabrics]
RECONFIGURABLE COMPUTING
ARCHITECTURES
Basic Concepts
• Fine-Grained Reconfigurable Arrays:
– Main Cell: Logic block
– Main example: FPGAs (Field Programmable Gate Arrays)
FPGA Example
• Virtex-5 (5VLX330):
– 207,360 6-LUTs/FFs
– 10,368 Kbits BRAM
– 3,420 Kbits Distributed RAM
– 192 DSP48E Slices (includes a 25 x 18 multiplier)
• multiplier (32x32 → 32):
– 3 DSP48E (∼9.7 ns) ⇒ 64 multipliers!, or
– 754 LUTs (∼11 ns) ⇒ 275 multipliers!
• adder (32+32 → 32):
– 32 LUTs (∼5 ns) ⇒ 6,480 adders!
• RISC processor (Microblaze), @200 MHz and about 1,400 6-LUTs in Virtex-5 (1650 w/ FPU)
– Virtex-5 (5VLX330) has LUTs for 148 microprocessors!
http://www.xilinx.com
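The instance counts on the FPGA example slide appear to come from integer division of the available resources by the per-unit cost. The Python sketch below reproduces that arithmetic under this assumption. The helper name `max_instances` is hypothetical, and the estimate ignores routing, placement, and timing constraints, which a real design would have to respect.

```python
# Resource figures quoted on the slide for the Virtex-5 (5VLX330).
LUTS = 207_360          # 6-input LUTs
DSP48E_SLICES = 192     # DSP48E slices


def max_instances(available: int, cost_per_instance: int) -> int:
    """Upper bound on how many units fit, ignoring routing and timing."""
    return available // cost_per_instance


# Per-unit costs are the ones quoted on the slide.
estimates = {
    "32x32 multipliers (3 DSP48E each)": max_instances(DSP48E_SLICES, 3),
    "32x32 multipliers (754 LUTs each)": max_instances(LUTS, 754),
    "32-bit adders (32 LUTs each)": max_instances(LUTS, 32),
    "MicroBlaze cores (1,400 LUTs each)": max_instances(LUTS, 1_400),
}

for unit, count in estimates.items():
    print(f"{unit}: {count}")
```

These are optimistic ceilings: each one assumes the whole device is devoted to a single kind of unit.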
Basic Concepts
• Coarse-Grained Reconfigurable Arrays:
– Main cell: Functional Unit
[Figure: a CGRA cell with a Functional Unit (FU) whose inputs A and B are selected by two MUXes from the sources, a result register driving the destinations, a Register File (RF), and a Configuration Controller]
Coarse-Grained Reconfigurable
Array Example
• The PACT XPP-3C: high-performance fixed-point DSP
http://www.pactxpp.com
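VLIW vs. Coarse-Grained Reconfigurable Arrays
• Example: r=a×c+b×d;
– r6=r1*r2;
– r7=r3*r4;
– r5=r6+r7;
• VLIW: Multi-Port Register File shared by PE 1, PE 2, ..., PE N
[Figure: dataflow graph with two multiplications (a×c and b×d) feeding an addition that produces r]
[Figure: VLIW schedule on PE1 and PE2 containing r6=r1*r2;, r7=r3*r4;, r5=r6+r7;, and nop;]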
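To make the VLIW case concrete, the Python sketch below packs the three operations of r=a×c+b×d into VLIW words using a simple greedy list scheduler. It only illustrates the idea: the `OPS` table and the `schedule` function are hypothetical names, and the assignment of operations to PE slots is an assumption rather than the mapping drawn on the slide.

```python
from typing import Dict, List, Tuple

# Destination register -> (operator, (left source, right source)).
OPS: Dict[str, Tuple[str, Tuple[str, str]]] = {
    "r6": ("*", ("r1", "r2")),
    "r7": ("*", ("r3", "r4")),
    "r5": ("+", ("r6", "r7")),
}


def schedule(ops: Dict[str, Tuple[str, Tuple[str, str]]],
             num_pes: int) -> List[List[str]]:
    """Greedy list scheduling: one VLIW word per cycle, one slot per PE."""
    done = set()            # results produced in earlier cycles
    pending = list(ops)     # destinations still to issue, in program order
    words: List[List[str]] = []
    while pending:
        # An operation is ready when every source is a live-in register
        # (not produced by any operation) or was produced in an earlier cycle.
        ready = [d for d in pending
                 if all(s not in ops or s in done for s in ops[d][1])]
        issued = ready[:num_pes]
        if not issued:
            raise ValueError("cyclic dependence: nothing can be issued")
        word = [f"{d}={ops[d][1][0]}{ops[d][0]}{ops[d][1][1]};" for d in issued]
        word += ["nop;"] * (num_pes - len(issued))   # fill unused slots
        words.append(word)
        done.update(issued)
        pending = [d for d in pending if d not in issued]
    return words


for cycle, word in enumerate(schedule(OPS, num_pes=2), start=1):
    print(f"cycle {cycle}: {' | '.join(word)}")
```

With two PEs the multiplications share the first word and the addition is paired with a nop in the second, which matches the nop that appears in the slide's schedule.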
• CGRA:
[Figure: the same dataflow graph mapped onto a row of FUs interleaved with Register Files, with inputs a, b, c, d and output r]
[Figure: a CGRA drawn as a 3×3 array of PEs (PE 1 to PE 9)]
[Figure: the operations r6=r1*r2;, r7=r3*r4;, and r5=r6+r7; placed on PE1, PE2, and PE4 of the array]
Past and Notable Architectures
• Xilinx XC6200 Field Programmable Gate Arrays, 1995-2001
• Chameleon (1997) RCP architecture, 2000-2003
• Triscend (1997), 2001-2004
Granularity in RC Architectures
• Fine-Grained Fabrics are able to host a wide
range of architectures
• Fine-Grained Fabrics are not tied to a
computational model
– E.g., load/store (a) vs. data streaming (b)
RECONFIGURATION
Reconfiguration
• Possibility to modify the computing structures
in the field (i.e., after fabrication)
• Static
– During setup or before the beginning of the computations
• Dynamic or in Runtime
– During the computations
Reconfiguration
• The PACT XPP example (a Coarse-Grained Reconfigurable Architecture)
[Figure: a Configuration Manager (CM) with a Configuration Cache (CC) fetches configurations and configures the array of PEs; ports CMPort0 and CMPort1 connect the array to the CM]
Reconfiguration
• Coarse-Grained Reconfigurable Architectures
– The PACT XPP reconfiguration flow
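• Legend: Fetch (f), Configure (c), Compute (comp)
• Configuration sequence: c0; If(CMPort0) then c1; If(CMPort1) then c2;
[Figure: the Configuration Manager (CM) and Configuration Cache (CC) fetching and configuring c0, c1, and c2, with CMPort0 and CMPort1 selecting the next configuration]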
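The configuration sequence `c0; If(CMPort0) then c1; If(CMPort1) then c2;` can be read as a small control program run by the Configuration Manager. The Python sketch below is a loose model of that reading and not the PACT XPP implementation. The function name, the dictionaries standing in for the cache and external memory, and the treatment of the ports as booleans sampled after c0 are all assumptions made for illustration.

```python
from typing import Dict, List


def run_configuration_manager(ports: Dict[str, bool],
                              cache: Dict[str, str],
                              memory: Dict[str, str]) -> List[str]:
    """Model 'c0; If(CMPort0) then c1; If(CMPort1) then c2;' and log each step."""
    log: List[str] = []

    def load(name: str) -> None:
        # Fetch into the configuration cache only on a miss (assumed policy).
        if name not in cache:
            cache[name] = memory[name]
            log.append(f"fetch {name}")
        log.append(f"configure {name}")

    load("c0")                          # c0 is always configured first
    if ports.get("CMPort0", False):     # signal assumed to come from the array
        load("c1")
    if ports.get("CMPort1", False):
        load("c2")
    return log


# Hypothetical configuration images held in external memory.
memory = {"c0": "bits-c0", "c1": "bits-c1", "c2": "bits-c2"}
cache: Dict[str, str] = {}
print(run_configuration_manager({"CMPort0": True, "CMPort1": False}, cache, memory))
```

A second run that reuses the same `cache` skips the fetch of c0, which is the benefit a configuration cache is meant to provide.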
Reconfiguration
• The PACT XPP reconfiguration flow
[Figure: flow graph from begin to end through configurations Conf. 1 and Conf. 2]
[Figure: flow graph from begin to end through configurations Conf. 1 to Conf. 5]
Remarks about Reconfigurable
Computing Architectures
• Reconfigurable fine-grained architectures
– have the potential to implement virtually any architecture
• Reconfigurable coarse-grained architectures
– are more computing-oriented, with granularity close to the data widths used in data processing
• Both permit customization and allow energy savings and high performance
IMPROVING PERFORMANCE
Main Target Architecture
• Microprocessor extended with Hardware
Accelerators (e.g., coarse-grained
reconfigurable arrays)
Improving Performance
• For a given input application and a target embedded computing system:
• If requirements are not satisfied on:
– execution time, power dissipation, energy consumption, memory bandwidth
• Need to:
– perform code optimizations
– migrate sections of the code to a hardware accelerator
– redesign the entire system (e.g., including different processors)
Improve Execution Time
• Delegating to compiler optimizations such as -O3 might not be enough!
• Find the most critical sections of the code and then try to optimize those sections
– Use profiling tools (e.g., gprof) to identify them
– 90-10 rule of thumb: “90% of global execution time is spent in 10% of the code”
Amdahl's Law
• The global speedup we can achieve is limited
– by the fraction (f) of the application's execution time that is accelerated, and
– by the speedup (S) we can achieve for that fraction
$$\text{Speedup} = \frac{1}{(1 - f) + \frac{f}{S}}$$
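Amdahl's Law is easy to evaluate numerically. The Python sketch below is a direct transcription of the formula, where the function name `amdahl_speedup` and the sample values of f and S are chosen only for illustration.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Global speedup when a fraction f of execution time is sped up by s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Illustrative values: accelerate 90% of the execution time by 10x and 100x.
for s in (10.0, 100.0):
    print(f"f=0.9, S={s:g}: global speedup = {amdahl_speedup(0.9, s):.2f}")
```

Even a 100x accelerator leaves the global speedup below 10 when 10% of the execution time is untouched.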
Lessons from Amdahl's Law
• Two corollaries of this law:
– Improve fractions of the application that reflect a significant part of the global execution time
• Small f: optimizations will have a minor impact
– But the segments that we ignore also limit the speedup
• As S increases, the speedup will tend to
$$\text{Speedup} = \frac{1}{1 - f}$$
[Figure: plot of Speedup (0 to 100) versus f (0 to 0.96)]
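The upper bound in the second corollary follows from Amdahl's Law by letting S grow without limit, so that the f/S term vanishes. The derivation below is a short sketch, and the values f = 0.9 and f = 0.1 are chosen only as illustrations.

```latex
\lim_{S \to \infty} \frac{1}{(1 - f) + \frac{f}{S}} = \frac{1}{1 - f},
\qquad
\frac{1}{1 - 0.9} = 10,
\qquad
\frac{1}{1 - 0.1} \approx 1.11
```

Accelerating 90% of the execution time can never yield more than a 10x global speedup, while accelerating only 10% is capped at about 1.11x.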
Example
• Libmad for MP3 decoding
• Analysing theoretical limits for possible accelerations:
– considering dct32, a speedup upper bound of 1.11 is expected!
– to achieve an upper bound of 10 we need to take into account the first 9 functions!
– Note that the analysis considers zero execution time for each accelerated function and zero communication costs (it is always useful as a first analysis…)
[Figure: bar chart of % Libmad execution time per function, in decreasing order: 37.51, 16.48, 10.02, 5.37, 5.01, 4.8, 4.68, 3.59, 3.3, 2.24, 2.18, 1.65, 0.66, 0.56, 0.49, 0.4, 0.23, 0.23, 0.2, 0.1]
Libmad: http://www.underbit.com/products/mad/.
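The Libmad bounds can be reproduced from the per-function percentages in the bar chart. The Python sketch below assumes the ideal case stated on the slide (accelerated functions take zero time, communication is free). It also assumes that the 10.02% bar is dct32, which is inferred from the quoted bound of 1.11 and is not labelled in the transcript. The names `PERCENT`, `upper_bound`, and `functions_needed` are hypothetical.

```python
from typing import List, Optional

# % of Libmad execution time per function, as read from the bar chart.
PERCENT: List[float] = [
    37.51, 16.48, 10.02, 5.37, 5.01, 4.8, 4.68, 3.59, 3.3, 2.24,
    2.18, 1.65, 0.66, 0.56, 0.49, 0.4, 0.23, 0.23, 0.2, 0.1,
]


def upper_bound(fraction: float) -> float:
    """Speedup limit when `fraction` of the time is reduced to zero."""
    return 1.0 / (1.0 - fraction)


def functions_needed(percentages: List[float], target: float) -> Optional[int]:
    """Fewest functions (heaviest first) whose removal reaches `target`."""
    covered = 0.0
    for count, pct in enumerate(sorted(percentages, reverse=True), start=1):
        covered += pct / 100.0
        if covered >= 1.0 or upper_bound(covered) >= target:
            return count
    return None  # target unreachable even if every listed function is removed


print(f"bound for the 10.02% function: {upper_bound(0.1002):.2f}")
print(f"functions needed for a bound of 10: {functions_needed(PERCENT, 10.0)}")
```

Because the bound depends only on the fraction of time removed, sorting the functions by weight gives the smallest set that reaches a given target.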
Example
• Tools, such as compilers and Design Space Exploration (DSE) environments, should give programmers an easy way to evaluate different solutions
CONCLUSIONS – PART A
Conclusions
• Reconfigurable computing architectures offer
flexible, low-cost, powerful, notable hardware
accelerators
• Unfortunately,
– Efficient compilation is hard
– Current topic of research, requiring a
multidisciplinary approach,…