To DSP or Not to DSP?

Chad Erven


Page 1: To DSP or Not to DSP?

To DSP or Not to DSP?

Chad Erven

Page 2: To DSP or Not to DSP?

Words to Bits – Your Options

• ASIC
• FPGA
• DSP
• Embedded RISC
• General Purpose Processor (GPP)

Page 3: To DSP or Not to DSP?

Why Go Programmable?

1. Building the chip wrong
– Systems are increasingly too complex to be described efficiently by RTL designers
– Errors are orders of magnitude more difficult to find in hardware than in software
– Defects are extremely costly in hardware

2. Building the wrong chip
– Only software is flexible enough to adapt during and after system design

HARDWARE IS TOO HARD!

Page 4: To DSP or Not to DSP?

So Software and Processors, Right?

Using processors has its drawbacks, especially in SoC designs:

– Never a perfect match between the application and the hardware
– Performance costs, power penalties, and wasted silicon will ALWAYS happen to some extent
– Integrating multiple disparate cores with each other

Page 5: To DSP or Not to DSP?

Splitting the Difference – ASIPs

Ever wish you were the processor designer?

Now you are! Write the exact instructions you need and nothing more.

An Application-Specific Instruction-set Processor (ASIP) offers the best of both worlds.

Page 6: To DSP or Not to DSP?

Back Up!

Isn’t hardware too much work?
– Yes

So doesn’t an ASIP defeat the purpose?
– No

Why not?
– Extending a base processor is much easier than designing hardware from scratch
– Readily amenable to automation
– You only have to verify the instruction description; integration into the processor is guaranteed

Page 7: To DSP or Not to DSP?

Cool, Show Me How It Works

ASIPs derive their performance from addressing three problems that a conventional processor faces:

1. Operations that are innately parallel must be expressed serially
– Somewhat solved by SIMD or MIMD processors

2. Memory space is addressed as one continuous space
– Somewhat solved by modifiers and/or pragmas (dm/pm)

3. Applications are complicated by their expression as operations on C types
– Somewhat alleviated by powerful instructions in hardware

Page 8: To DSP or Not to DSP?

Working with the Innate Nature of the Algorithm

Example – byte swap (common telecom task)

    unsigned int *a, *b;   /* unsigned, so every shift below is well defined */
    …

    for (int i = 0; i < 4096; i++)
    {
        a[i] = ( ((b[i] & 0x000000ff) << 24) |
                 ((b[i] & 0x0000ff00) << 8)  |
                 ((b[i] & 0x00ff0000) >> 8)  |
                 ((b[i] & 0xff000000) >> 24) );
    }
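As a worked example of what the loop above computes, here is a minimal C sketch of the same mask-and-shift swap applied to one word and to an array. The names byte_swap32 and byte_swap_array are hypothetical helpers, not from the slides, and the fixed-width uint32_t type is an assumption that the data are 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word with masks and shifts.
 * byte_swap32 is a hypothetical helper name, not from the slides. */
static uint32_t byte_swap32(uint32_t w)
{
    return ((w & 0x000000ffu) << 24) |
           ((w & 0x0000ff00u) << 8)  |
           ((w & 0x00ff0000u) >> 8)  |
           ((w & 0xff000000u) >> 24);
}

/* Apply the swap to n words, as the slide's loop does for 4096 words. */
static void byte_swap_array(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = byte_swap32(src[i]);
}
```

For instance, the word 0x11223344 becomes 0x44332211, and swapping twice returns the original word. Each word costs four masks, four shifts, and three ORs when expressed this way in C.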

Page 9: To DSP or Not to DSP?

Working with the Innate Nature of the Algorithm

Write your own instruction:

    operation swap {in AR x, out AR y} {}
    {
        y = {x[7:0], x[15:8], x[23:16], x[31:24]};
    }

Making the C Code:

    for (int i = 0; i < 4096; i++)
        a[i] = swap(b[i]);

Execution Cycles without TIE Extension    Execution Cycles with TIE Extension
4,915,300                                 1,638,524

About a 3X SPEED UP!!! (4,915,300 / 1,638,524 ≈ 3.0)
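Since only the instruction description has to be verified, one plausible way to check it is against a small reference model in C. The sketch below mirrors the slice concatenation of the swap operation. The names bit_slice and swap_model are hypothetical, and the model is an illustration only, not output from any TIE tool.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of x, like the slice x[hi:lo]. */
static uint32_t bit_slice(uint32_t x, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (x >> lo) & mask;
}

/* C model of y = {x[7:0], x[15:8], x[23:16], x[31:24]}:
 * the leftmost slice lands in the most significant byte. */
static uint32_t swap_model(uint32_t x)
{
    return (bit_slice(x, 7, 0)   << 24) |
           (bit_slice(x, 15, 8)  << 16) |
           (bit_slice(x, 23, 16) << 8)  |
            bit_slice(x, 31, 24);
}
```

Read this way, the custom instruction is pure wiring: no masking or shifting work remains, only a reordering of the four bytes.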

Page 10: To DSP or Not to DSP?

Instruction Fusion

[Figure: two dataflow diagrams]

Unfused operation: op1 reads reg1 and reg2 and writes reg3; op2 reads reg3 and reg4 and writes reg5.

Fused operation: op1 and op2 execute as one instruction that reads reg1, reg2, and reg4 and writes reg5, with no intermediate reg3.
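The two diagrams can be sketched in C as follows. The slide does not say what op1 and op2 compute, so an add and a left shift are assumed here purely as placeholders; the register names follow the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder operations: an add and a left shift are assumptions,
 * chosen only so the sketch has something concrete to compute. */
static uint32_t op1(uint32_t reg1, uint32_t reg2) { return reg1 + reg2; }
static uint32_t op2(uint32_t reg3, uint32_t reg4) { return reg3 << (reg4 & 31u); }

/* Unfused: two instructions, with reg3 written and then read back. */
static uint32_t unfused(uint32_t reg1, uint32_t reg2, uint32_t reg4)
{
    uint32_t reg3 = op1(reg1, reg2);   /* first instruction */
    return op2(reg3, reg4);            /* second instruction, result is reg5 */
}

/* Fused: one instruction with three inputs; the intermediate value
 * stays inside the datapath and never occupies a register. */
static uint32_t fused(uint32_t reg1, uint32_t reg2, uint32_t reg4)
{
    return (reg1 + reg2) << (reg4 & 31u);
}

/* Sanity check: both forms must produce the same reg5. */
static void check_equivalence(uint32_t reg1, uint32_t reg2, uint32_t reg4)
{
    assert(fused(reg1, reg2, reg4) == unfused(reg1, reg2, reg4));
}
```

The fused form saves one instruction issue and one register write and read, at the price of an instruction with three source operands.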

Page 11: To DSP or Not to DSP?

Example

    for (i = 0; i < n; i++)
        c[i] = (a[i] * b[i]) >> 4;

Assembly:

    loop:
        l8ui    a12, a11, 0
        l8ui    a13, a10, 0
        addi    a11, a11, 1
        addi    a10, a10, 1
        mul16u  a8, a12, a13
        srai    a8, a8, 4
        s8i     a8, a9, 0
        addi    a9, a9, 1
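To make the correspondence between the C loop and the eight-instruction listing concrete, here is a hedged sketch with one C statement per arithmetic instruction. The uint8_t element type is an assumption read off the 8-bit loads (l8ui) and store (s8i); the slide does not declare the arrays. The function name scale_products is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c[i] = (a[i] * b[i]) >> 4 on byte arrays (element type assumed). */
static void scale_products(uint8_t *c, const uint8_t *a,
                           const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t x = a[i];               /* l8ui: zero-extending byte load */
        uint32_t y = b[i];               /* l8ui */
        uint32_t product = x * y;        /* multiply, at most 255 * 255 */
        uint32_t shifted = product >> 4; /* shift right by 4 */
        c[i] = (uint8_t)shifted;         /* s8i: store the low 8 bits */
    }
}
```

With a[i] = 16 and b[i] = 16 the stored byte is 16. With 255 and 255 the product is 65025, the shifted value is 4064, and only its low byte, 224, is stored. The three pointer increments (addi) are implicit in the array indexing.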

Page 12: To DSP or Not to DSP?

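Example

[Figure: dataflow graph of the loop body. Two l8ui loads read from a11 and a10 at offset 0, and an addi advances each pointer by 1. The loaded values feed mul16u, then srai by 4, then s8i through a9, and a final addi advances a9 by 1.]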

Page 13: To DSP or Not to DSP?

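Example

[Figure: the same dataflow graph after fusion. The two l8ui loads and the addi pointer updates on a11 and a10 are unchanged; the mul16u, srai, s8i, and final addi nodes on a9 collapse into a single fusion.mul16u.srai.s8i.addi node.]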

Page 14: To DSP or Not to DSP?

Example

New assembly code:

    loop:
        l8ui    a12, a11, 0
        l8ui    a13, a10, 0
        addi    a10, a10, 1
        addi    a11, a11, 1
        fusion.mul16u.srai.s8i.addi    a9, a12, a13
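A rough C model of the fused loop may help show what the single new instruction does. The function fused_mul_shift_store is a hypothetical stand-in for the fused instruction, and the uint8_t element type is again an assumption based on the 8-bit loads and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the fused instruction: multiply, shift right by 4,
 * store the low byte through *dst, then advance the pointer by 1. */
static void fused_mul_shift_store(uint8_t **dst, uint32_t x, uint32_t y)
{
    **dst = (uint8_t)((x * y) >> 4);
    *dst += 1;
}

static void scale_products_fused(uint8_t *c, const uint8_t *a,
                                 const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t x = *a;                  /* l8ui */
        uint32_t y = *b;                  /* l8ui */
        a++;                              /* addi */
        b++;                              /* addi */
        fused_mul_shift_store(&c, x, y);  /* the one fused instruction */
    }
}
```

Counting instructions in the two listings, the loop body shrinks from eight instructions to five. That is an instruction count, not a cycle count; the slides give no cycle figures for this example.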

Page 15: To DSP or Not to DSP?

Benchmarking

• Hand-coded assembly for the other processors

[Figure: EEMBC ConsumerMarks (performance). From [Rowen].]

[Figure: EEMBC Summary (Performance/MHz). From [Rowen].]

Page 16: To DSP or Not to DSP?

And I Haven’t Even Gotten To…

• Sharing input operands
• Substituting variables with constants
• Replacing memory tables with logic
• Limiting immediate values to the minimum required width
• Placing operands in special registers
• Creating SIMD instructions
• Reducing the size of operand specifiers
• Custom input/output queues
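One item from the list above, creating SIMD instructions, can be illustrated in plain C. The sketch below adds four packed 8-bit lanes held in one 32-bit word, which is the kind of operation a custom SIMD instruction could perform in a single step. The slides give no detail on this technique, so the lane layout and the names add_4x8, add_4x8_reference, and check_add_4x8 are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Add four packed 8-bit lanes; each lane wraps modulo 256 and no
 * carry crosses into the neighbouring lane. */
static uint32_t add_4x8(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu); /* low 7 bits of each lane */
    uint32_t top  = (a ^ b) & 0x80808080u;                 /* top bit of each lane, no carry-out */
    return low7 ^ top;
}

/* Scalar reference: one lane at a time, as a serial program would do. */
static uint32_t add_4x8_reference(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xffu;
        uint32_t y = (b >> (8 * lane)) & 0xffu;
        result |= ((x + y) & 0xffu) << (8 * lane);
    }
    return result;
}

/* Sanity check: the packed form must agree with the scalar reference. */
static void check_add_4x8(uint32_t a, uint32_t b)
{
    assert(add_4x8(a, b) == add_4x8_reference(a, b));
}
```

In software the packed form still needs masking to keep carries out of neighbouring lanes; in a custom instruction the lanes would simply be separate adders.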

Page 17: To DSP or Not to DSP?

Ok, Let Me Have It Dr. Smith

(The rest of you can ask questions too)