Leveraging DSP: Basic Optimization
![Page 1: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/1.jpg)
Leveraging DSP: Basic Optimization
FAE Summit 2015
Embedded Processing
![Page 2: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/2.jpg)
Agenda
• C6000 VLIW Architecture
• Software Pipeline
• Software Pipeline Optimization
– Estimate performance
– Using CCS to optimize code
– Software pipeline issues
• Hands‐on Lab: Optimize FIR filter
![Page 3: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/3.jpg)
C6000 VLIW Architecture
TI DSP: Basic Optimization
![Page 4: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/4.jpg)
C6000 DSP Core Architecture
• VLIW (Very Long Instruction Word) architecture:
– Two (almost independent) sides, A and B
– 8 functional units: M, L, S, D
– Up to 8 instructions sustained dispatch rate
[Diagram: memory, register files A0–A31 and B0–B31, functional units .D1/.D2, .S1/.S2, .M1/.M2 (MACs), and .L1/.L2, and the controller/decoder]
![Page 5: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/5.jpg)
C6000 Cross‐Path
[Diagram: Register File A (A0–A31) and Register File B (B0–B31), each feeding its functional units (.D, .S, .M, .L), with cross paths between the A and B sides]
![Page 6: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/6.jpg)
Some C6000 Family Members
TMS320C6424, TMS320C6748, TMS320C6678
![Page 7: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/7.jpg)
Partial List of .M Instructions
![Page 8: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/8.jpg)
Partial List of .D Instructions
![Page 9: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/9.jpg)
Partial List of .L Instructions
![Page 10: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/10.jpg)
Partial List of .S Instructions
![Page 11: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/11.jpg)
Software Pipeline
![Page 12: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/12.jpg)
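Non‐Pipelined vs. Pipelined CPU

| CPU Type | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Non‐Pipelined | F1 | D1 | E1 | F2 | D2 | E2 | F3 | D3 | E3 |
| Pipelined, instruction 1 | F1 | D1 | E1 | | | | | | |
| Pipelined, instruction 2 | | F2 | D2 | E2 | | | | | |
| Pipelined, instruction 3 | | | F3 | D3 | E3 | | | | |

The columns are clock cycles. The pipeline is full from cycle 3.

| Stage | Pipeline Function |
|---|---|
| F (Fetch) | Generate program fetch address; read opcode |
| D (Decode) | Route opcode to functional units; decode instructions |
| E (Execute) | Execute instructions |

Now look at the C66x pipeline.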
![Page 13: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/13.jpg)
Program Fetch Phases

| Phase | Description |
|---|---|
| PG | Generate fetch address |
| PS | Send address to memory |
| PW | Wait for data ready |
| PR | Read opcode |

[Diagram: C66x core with its functional units, and memory; the fetch phases PG, PS, PW, and PR run between the core and memory]
![Page 14: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/14.jpg)
Pipeline Phases: Review
[Diagram: successive instructions, each passing through program fetch (PG PS PW PR), decode (D), and execute (E), staggered by one cycle]
Single‐cycle performance is not affected by adding three program fetch phases.
That is, there is still an execute every cycle.
How about decode? Is it only one cycle?
![Page 15: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/15.jpg)
Decode Phases

| Decode Phase | Description |
|---|---|
| DP | Intelligently routes instruction to functional unit (dispatch) |
| DC | Instruction decoded at functional unit (decode) |

[Diagram: C66x core with its functional units, and memory; the fetch phases PG, PS, PW, and PR are followed by the decode phases DP and DC]
![Page 16: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/16.jpg)
Pipeline Phases
[Diagram: successive instructions, each passing through program fetch (PG PS PW PR), decode (DP DC), and execute (E1), staggered by one cycle]
Pipeline Full
How many cycles does it take to execute an instruction?
![Page 17: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/17.jpg)
Instruction Delays
All C66x instructions require only one cycle to execute, but some results are delayed.

| Description | Instruction Example | Delay |
|---|---|---|
| Single cycle | All instructions except the following | 0 |
| Integer multiplication and new floating point | MPY, FMPYSP | 1 |
| Legacy floating point multiplication | MPYSP | 2 |
| Load | LDW | 4 |
| Branch | B | 5 |
![Page 18: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/18.jpg)
Software Pipeline Optimization
• Estimating performance
• Using CCS to optimize code
• Software pipeline issues
![Page 19: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/19.jpg)
Software Pipeline Example
    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + x;
            *out++ = sum;
        }
    }

How many cycles would it take to perform the loop five times?
![Page 20: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/20.jpg)
Non‐Pipeline Code Flow
Implementation of the loop in the following code:

    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + x;
            *out++ = sum;
        }
    }
![Page 21: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/21.jpg)
Software Pipeline Code Flow
Implementation of the loop in the following code:

    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + x;
            *out++ = sum;
        }
    }

The compiler knows all the delays and is smart enough to build the correct software pipeline.
![Page 22: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/22.jpg)
Software Pipeline Support
• The compiler is smart enough to schedule instructions efficiently.
• Software pipeline is the major speed‐up mechanism for VLIW architecture.
• Software pipeline requires deterministic execution:
– No if, branch, or call
– No interrupts
– No dependencies
![Page 23: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/23.jpg)
Software Pipeline Example: Interrupt
Implementation of the loop in the following code:

    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + x;
            *out++ = sum;
        }
    }
![Page 24: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/24.jpg)
Software Pipeline Example: SPLOOP
[Diagram: cycle‐by‐cycle schedule (cycles 1–21) on units .D1, .D2, .M1, and .L1 showing overlapped LD, MPY, ADD, and ST; an interrupt arrives ("Interrupt"), the interrupt is served ("Serving The Interrupt"), and the loads, multiplies, adds, and stores then resume]
Implementation of the loop in the following code:

    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + x;
            *out++ = sum;
        }
    }
![Page 25: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/25.jpg)
Code Development
• Code Generation Tools can build executables from different code types:
– Generic C or C++ code
– C with intrinsics
– Linear Assembly
– Assembly (DETAI)
• Optimization is performed:
– In the front end
– Using the intrinsics
– Resource allocation and software pipeline search in optimized linear assembly
• To understand the quality of the optimization of a loop, compare the theoretical iteration interval (II: the actual number of cycles between two results of the loop) to the result of the assembler/optimizer.
– Was the software pipeline successful (if not, why)?
– Is the usage balanced between the two sides (if not, can it be improved)?
– What are the bottlenecks and how to mitigate them?
• To keep the assembly file, set the –k option
![Page 26: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/26.jpg)
Keep Generated Assembly File
![Page 27: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/27.jpg)
Build Options: Optimization and Debug
![Page 28: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/28.jpg)
‐S and ‐MW Setting
![Page 29: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/29.jpg)
And if You Don’t Find the GUI?
![Page 30: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/30.jpg)
Dependencies
[Diagram: cycle‐by‐cycle schedule (cycles 1–21) on units .D1, .D2, .M1, and .L1 showing LD, MPY, ADD, and ST issued in sequence, iteration after iteration]
What if out = in + 1?
In that case, the code cannot start loading the next input before the previous output is ready.
Unless the compiler knows otherwise, the compiler assumes dependencies.
Implementation of the loop in the following code:

    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + x;
            *out++ = sum;
        }
    }
![Page 31: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/31.jpg)
Dependencies
The compiler knows that there are no dependencies in the following cases:
• It can understand it from the code (e.g., the calling function is in the same file as the routine).
• The code uses the restrict keyword.
• A compiler switch tells the compiler that there is no overlap between vector pointers (‐mt).
![Page 32: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/32.jpg)
IF and Conditional Execution
• All assembly instructions are conditional instructions.
• In a conditional instruction, the functional unit executes the instruction, but the result is written to the output register ONLY if the condition is true.
• The condition needs to be known ONLY by the cycle before the result is written to the output register.
• Conditional execution can replace if statements as follows:
if (x < 1000.0) sum = sum + x --> [x < 1000.0] sum = sum + x
• The compiler is smart enough to convert "simple" if statements into conditional execution.
• The result of x < 1000.0 should be known just one cycle before the last step of execution.
![Page 33: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/33.jpg)
Function Calls

    void example(float *in, float *out, int N, float V)
    {
        int i;
        float x, sum;

        sum = 1.0;
        for (i = 0; i < N; i++)
        {
            x = *in++ * V;
            sum = sum + f(x);   /* f() is defined elsewhere */
            *out++ = sum;
        }
    }

• A function call prevents the compiler from generating the software pipeline.
• Inlining the function removes this limitation.
• The compiler does not inline functions (unless it is told to). It is up to the user.
![Page 34: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/34.jpg)
Software Pipeline Example

    void copyFunction(int *p1, int *p2, int N)
    {
        int i;
        for (i = 0; i < N; i++)
        {
            *p2++ = *p1++;
        }
        return;
    }
![Page 35: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/35.jpg)
Software Pipeline Example: Reminder

    ;*----------------------------------------------------------------------------*
    ;*   SOFTWARE PIPELINE INFORMATION
    ;*
    ;*      Loop found in file               : ../utility.c
    ;*      Loop source line                 : 12
    ;*      Loop opening brace source line   : 13
    ;*      Loop closing brace source line   : 15
    ;*      Known Minimum Trip Count         : 1
    ;*      Known Max Trip Count Factor      : 1
    ;*      Loop Carried Dependency Bound(^) : 6
    ;*      Unpartitioned Resource Bound     : 1
    ;*      Partitioned Resource Bound(*)    : 2
    ;*      Resource Partition:
    ;*                                A-side   B-side
    ;*      .L units                     0        0
    ;*      .S units                     0        0
    ;*      .D units                     0        2*
    ;*      .M units                     0        0
    ;*      .X cross paths               0        0
    ;*      .T address paths             0        2*
    ;*      Long read paths              0        0
    ;*      Long write paths             0        0
    ;*      Logical  ops (.LS)           0        0     (.L or .S unit)
    ;*      Addition ops (.LSD)          0        0     (.L or .S or .D unit)
    ;*      Bound(.L .S .LS)             0        0
    ;*      Bound(.L .S .D .LS .LSD)     0        1
    ;*
    ;*      Searching for software pipeline schedule at ...
    ;*         ii = 6  Schedule found with 2 iterations in parallel
    ;*      Done
    ;*
    ;*      Loop will be splooped
![Page 36: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/36.jpg)
Restrict Qualifiers
• Loop iterations cannot be overlapped unless input and output are independent (do not reference the same memory locations).
• Most users write their loops so that loads and stores do not overlap.
• Compiler does not know this unless the compiler sees all callers or user tells compiler.
• Use restrict qualifiers to notify compiler.
• Restrict tells the compiler that any location addressed by the following pointer WILL NOT be accessed by any other vector.

    void copyFunction(int *restrict p1, int *p2, int N)
    {
        int i;
        for (i = 0; i < N; i++)
        {
            *p2++ = *p1++;
        }
        return;
    }
![Page 37: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/37.jpg)
    ;*----------------------------------------------------------------------------*
    ;*   SOFTWARE PIPELINE INFORMATION
    ;*
    ;*      Loop found in file               : ../utility.c
    ;*      Loop source line                 : 12
    ;*      Loop opening brace source line   : 13
    ;*      Loop closing brace source line   : 15
    ;*      Known Minimum Trip Count         : 1
    ;*      Known Max Trip Count Factor      : 1
    ;*      Loop Carried Dependency Bound(^) : 0
    ;*      Unpartitioned Resource Bound     : 1
    ;*      Partitioned Resource Bound(*)    : 1
    ;*      Resource Partition:
    ;*                                A-side   B-side
    ;*      .L units                     0        0
    ;*      .S units                     0        0
    ;*      .D units                     1*       1*
    ;*      .M units                     0        0
    ;*      .X cross paths               0        1*
    ;*      .T address paths             1*       1*
    ;*      Long read paths              0        0
    ;*      Long write paths             0        0
    ;*      Logical  ops (.LS)           0        0     (.L or .S unit)
    ;*      Addition ops (.LSD)          0        1     (.L or .S or .D unit)
    ;*      Bound(.L .S .LS)             0        0
    ;*      Bound(.L .S .D .LS .LSD)     1*       1*
    ;*
    ;*      Searching for software pipeline schedule at ...
    ;*         ii = 1  Schedule found with 7 iterations in parallel
    ;*      Done
    ;*
    ;*      Loop will be splooped
![Page 38: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/38.jpg)
Agenda
• C6000 VLIW Architecture
• Software Pipeline
• Software Pipeline Optimization
– Estimate performance
– Using CCS to optimize code
– Software pipeline issues
• Hands‐on Lab: Optimize FIR filter
![Page 39: LiLeveraging DSP BiBasic Oti i tiO ptimizationsoftware-dl.ti.com/.../external_files/DSP_Optimization.pdfTI DSP: Basic Optimization C6000 DSP Core Memory AhiArchitecture A0 B0 • VLIW](https://reader033.fdocuments.net/reader033/viewer/2022060905/60a06e1af3eb0525c37ffb07/html5/thumbnails/39.jpg)
DSP Lab Instructions