Compiler Design for Digital Signal Processors · LSI Proprietary 3 Storage Networking Systems...
Transcript of Compiler Design for Digital Signal Processors · LSI Proprietary 3 Storage Networking Systems...
Compiler Design
for Digital Signal Processors
Erik Eckstein
03/25/2010
2LSI Proprietary
LSI Corporation
• LSI is a global leader in semiconductors and software solutions
• Solutions for storage and networking markets
• Founded 1981
• Headquarter: Milpitas, California
• Design centers around the world
– Small design center in Vienna: development of DSP software tools
3LSI Proprietary
StorageNetworking
Systems Semiconductors
Entry RAID Systems
Mid-range RAID
Systems
Host RAID Adapters
Host RAID SW
Data Management
SW
HDD ICs: HDCs, SoCs,
RCs, Pre-Amps
SAS & SAS ROC
Controllers
SAS Expanders, Bridges
SAN Fabric Products
Custom Storage
Products
Custom Products –
Enterprise Datacom
Network Processors
DSPs
Framers/Mappers
Link Layer Products
Modems, USB,
Firewire
Market Focus
4LSI Proprietary
• Digital Signal Processors– Used in mobile phones, wireless devices, infrastructure, etc.
– Audio, video processing
– High performance for signal processing algorithms
– Low power consumption
• Example: StarPro2716– 16 StarCore SC3400e DSP cores @ 750 MHz
– 8 ARM cores for networking packet processing
– Ethernet I/O support
– 6MB on chip memory
– 96 GMACs (96*109 multiply-accumulate instructions/sec)
– Multichannel speech processing in voice gateways
DSP Overview
5LSI Proprietary
DSP Characteristics
• RISC-like architecture
• Multiple parallel units
• VLIW (very long instruction word)
• Multiple memory buses
• 40-bit registers
• Complex addressing modes
• Hardware loops
• Fixed-point arithmetic
6LSI Proprietary
SC3400e Core
PCUprogram control unit
AGUaddress generation unit
DALUdata ALU
ALU
ALU
ALU
ALU
AA
U
AA
U
BM
U
PROGRAM
XPA (32)
XPD (128)
DATA
XAA (32)
XAD.in XAD.out (64) (64)
XBA (32)
2 FETCH BUFFERS
16DATA
REGISTERS
27ADDRESS
REGISTERS
7LSI Proprietary
DSP-C Compiler Requirements
• Support for all DSP features– with language extensions (fractional data types)
– with automatic recognition
• Optimizing for speed– time critical code
• Optimizing for code size– control code
8LSI Proprietary
high intermediate representation(general)
Compiler Design
low intermediate representation(processor specific)
component component component...
component
component
component
9LSI Proprietary
Code Generator
C Code Parser
Optimizations on Low Level IR
Optimizations on High Level IR
Front End
Back End
Compilation Stages
10LSI Proprietary
addr of addr of
High Level IR Example
assign
addr of
obj "b"
+
read read
obj "c"
obj "a"
a = b + c;
11LSI Proprietary
Low Level IR Example
move.w #<32,r0
suba r1,r0
asll d7,d1
move.w #<-1,d5 move.l (r6),d4
12LSI Proprietary
Optimizations are Important!
• Compiler contains approx. 50 different optimizations– 70% high level optimizations
– 30% low level optimizations
• Most optimization are invoked several times– up to 17 times!
13LSI Proprietary
Optimization Example: FIR Filter
long fir(short *x, short *y)
{
long sum = 0;
int i;
for (i = 0; i < 128; i++) {
sum += x[i] * y[i];
}
return sum;
}
14LSI Proprietary
Loop Strength Reduction
long fir(short *x, short *y)
{
long sum = 0;
int i;
for (i = 0; i < 128; i++) {
sum += *x * *y;
x++;
y++;
}
return sum;
}
15LSI Proprietary
Hardware Loop Support
long fir(short *x, short *y)
{
long sum = 0;
int i;
loop 128 {
sum += *x * *y;
x++;
y++;
}
return sum;
}
16LSI Proprietary
Instruction Selection
_fir
doensh3 #128
clr sum
loopstart3
move.w (x),tmp1
move.w (y),tmp2
imac tmp2,tmp1,sum
adda #4,x
adda #4,y
loopend3
rts
17LSI Proprietary
Register Allocation
_fir
doensh3 #128
clr d0
loopstart3
move.w (r0),d1
move.w (r1),d2
imac d2,d1,d0
adda #4,r0
adda #4,r1
loopend3
rts
5 cycles/iteration
18LSI Proprietary
Addressing Mode Optimization
_fir
doensh3 #128
sub d0,d0,d0
loopstart3
move.w (r0)+,d1
move.w (r1)+,d2
imac d2,d1,d0
loopend3
rts
3 cycles/iteration
19LSI Proprietary
Instruction Scheduling
_fir
doensh3 #128
sub d0,d0,d0
loopstart3
[ move.w (r0)+,d1
move.w (r1)+,d2 ]*)
imac d2,d1,d0
loopend3
rts
2 cycles/iteration*) executed in parallel
20LSI Proprietary
Software Pipelining
1 cycles/iteration
_fir
sub d0,d0,d0
doensh3 #127
[ move.w (r0)+,d1 move.w (r1)+,d2 ]
loopstart3
[ move.w(r0)+,d1
move.w(r1)+,d2
imac d2,d1,d0 ]
loopend3
imac d2,d1,d0
rts
21LSI Proprietary
Loop Transformation
0.25 cycles/iteration
_fir
...
loopstart3
[ move.4w (r0)+,d0:d1:d2:d3
move.4w (r1)+,d8:d9:d10:d11
imac d0,d8,d12
imac d1,d9,d13
imac d2,d10,d14
imac d3,d11,d1 ]
loopend3
...
Questions?
23LSI Proprietary