S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and...


s3.kth.se

DSP Lecture 30/3-2010

Per Zetterberg


Agenda

• General.
• Starting CCS.
• Comparing matlab and DSP results.
• Profiling when comparing matlab and DSP results.
• Matlab<->DSP communication.
• EDMA.
• QUAD_DAC_ADC (headphones).
• _empty.
• State-machine using case statement.
• Data formats.
• Overlap and add.
• Stack and heap.
• Simple optimization rules.
• Cache.
• Some advice.


DSP Programming

Setup in the project course:

PC or ”host”

DSP or ”target” (or DSK)


What is a DSP?

A CPU which is optimized for signal processing:

• Special instructions for common signal processing operations, e.g. multiply and accumulate: $\sum_n x_n y_n$.
• Often on-chip circuits that handle input/output (IO).
• Low power consumption.
• Cheap (compared to processors in e.g. desktop computers).
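As an illustration of the multiply-and-accumulate operation mentioned above, here is a minimal C sketch. The function name dot_product is hypothetical and not taken from the course skeletons; on the DSP the compiler decides how the loop is mapped to instructions.

```c
#include <stddef.h>

/* Multiply-and-accumulate: returns the sum over n of x[n]*y[n].
 * The loop body contains no function calls, which is one of the
 * conditions for the compiler to be able to pipeline it. */
float dot_product(const float *x, const float *y, size_t len)
{
    float acc = 0.0f;
    size_t n;

    for (n = 0; n < len; n++) {
        acc += x[n] * y[n];   /* one multiply and one add per sample */
    }
    return acc;
}
```

A FIR filter output sample is one such sum, with x holding the most recent input samples and y the filter coefficients.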


Project Prototype: DSP versus PC

•Concurrently running programs on both the DSP and the PC.

•DSP-card used for:

•Signal processing

•IO (sampling/playback)

•PC used for:

•Graphical User Interface (GUI)

•Controlling the application, receiving results.


The DSP in the project course

You will use a Texas Instruments C6713 floating point digital signal processor.

• Massively parallel architecture (VLIW): up to eight 32-bit instructions are executed simultaneously.

• Running at 225 MHz, giving about 1.35 GFLOPS peak performance (up to six floating-point operations per cycle).

Belongs to the TI C6x family of DSPs

Widely used in industry


Software pipelining

• The processor can be programmed to perform eight operations in parallel (e.g. MULT, ADD, MV).
• Every instruction has a certain latency.
• The compiler will pipeline code, i.e. perform several instructions in parallel in loops, if:
  – There are no function calls in the loop.
  – Optimization -o3 is selected.
  – ..
• Check that important loops are pipelined.


Technical Requirements of Prototype

• Real-time functionality.
• DSP-card: signal processing.
• PC: user interface.
• User interface through a GUI (windows style) implemented in matlab.
• No unnecessary use of processor time on the PC.
• Well structured and adequately commented source code.

For more details see

www.s3.kth.se/signal/edu/projekt/examination.shtml


Development Tools

• Matlab
  – Algorithm development.
  – Prototype verification.
  – User interface development (GUI).
  – Control of DSP card.
  – Control of code profiling.

• DSP: Code Composer Studio
  – Algorithm implementation in C/Assembler.
  – Debugging in conjunction with Matlab implementation.
  – Code profiling.


How to learn …

How to Quickly Learn DSP Programming :

http://www.s3.kth.se/signal/edu/projekt/DSPsupport/getting_started.shtml

Our web-pages:

http://www.s3.kth.se/signal/edu/projekt/DSPsupport/

Ask me:

[email protected]

Search on the net, newsgroups, ….


PC programming (GUI)

Two methods:

• Using GUIDE (a GUI for creating a GUI).
• Programmatically.


Starting CCS

• CCStudio v3.3 is the code development environment.
• Use Setup CCStudio v3.3 when you need to change between targets:
  – C6713 DSK-USB (the hardware)
  – C6713 Device Cycle Accurate Simulator (little endian)
  – C6416 Device Cycle Accurate Simulator (little endian)
  The simulators are used when doing the tutorial.
• Connect to matlab:
  – cc=ccsdsp;
  – cc.visible(0), cc.run, cc.isrunning.


Comparing matlab and DSP results

Principle for testing isolated functions, e.g. a decoder:

• Generate input in matlab.
• Write input to the DSP.
• Call DSP version of function.
• Read output from the DSP.
• Call matlab version of function.
• Compare results.
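In the skeleton the comparison step is done in matlab. Purely as an illustration of what "compare results" can mean for floating-point data, here is a small C sketch that computes the largest absolute deviation between two result vectors. The name max_abs_error is hypothetical and not part of compare_with_matlab_31.

```c
#include <stddef.h>

/* Returns the largest absolute difference between the DSP output
 * and the reference output (e.g. the matlab result). */
float max_abs_error(const float *dsp_out, const float *ref_out, size_t len)
{
    float worst = 0.0f;
    size_t n;

    for (n = 0; n < len; n++) {
        float err = dsp_out[n] - ref_out[n];
        if (err < 0.0f) {
            err = -err;       /* absolute value without calling a math function */
        }
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Single-precision results on the DSP rarely match double-precision matlab results bit for bit, so comparing this value against a small tolerance is usually more meaningful than testing for exact equality.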

Let’s have a look at the compare_with_matlab_31 skeleton!


Test important functions by:

• Copying the entire compare_with_matlab_31.pjt project.
• Replacing FuncionToBeTested with your code:
  – In the C-code.
  – In the matlab code.
• Defining input and output data and parameters as relevant for your function.
• Changing the matlab code to generate relevant input data.

This is sometimes called a ”test harness” in industry.


Matlab <-> DSP communication 1(3)

Sending data between matlab and DSP when the DSP is not running (matlab code):

Input_obj=createobj(cc,'Input'); % Input is a global in the DSP code.

write(Input_obj,Input); % write data

Input=read(Input_obj); % read data


DSP -> PC communication 2(3)

When the DSP is running (RTDX):

On the DSP side:

RTDX_write(&ctrl_chan_dsp2pc, &data_to_matlab, sizeof(float)*NO_FLOATS_TO_MATLAB );

On the matlab side:

data_from_DSP=readmsg(cc.rtdx,'ctrl_chan_dsp2pc', 'single')

Recommendation: Re-use code in the ”_empty” skeletons.


Matlab <-> DSP communication 3(3)

The PC<->DSP interface is slow.

Allowed cheating (if necessary):

• Pre-read data into memory before real-time processing.
• Read result from memory after real-time processing.

Large memory areas are available in external memory:

#pragma DATA_SECTION(Data,".external_mem") // On DSP

short Data[1000]; // On DSP

write(cc,h_Data.address(1), int16(Data)); %% In matlab

The data is not cleared when the program is reloaded.


Enhanced Direct Memory Access (EDMA)

[Figure: ADC -> McBSP (DRR) -> EDMA channel -> RX buffer in memory; TX buffer in memory -> EDMA channel -> McBSP (DXR) -> DAC.]

Triggers interrupt HWI_INT8 when ready.

Leaves DSP free from moving data back and forth to ADC/DAC!


EDMA PaRAM


Ping-Pong Buffering

        hEdmaReloadXmtPing      hEdmaReloadXmtPong
SRC     &gBufferXmtPing         &gBufferXmtPong
DST     DXR                     DXR
LINK    hEdmaReloadXmtPong      hEdmaReloadXmtPing
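The idea behind the two linked parameter sets can be sketched in plain C. This is only a software model under stated assumptions, not the EDMA configuration used in the skeletons: the buffer names come from the slide, while edma_uses_ping, on_transfer_complete and the small BUFFSIZE value are hypothetical.

```c
#define BUFFSIZE 8   /* small value for illustration only */

short gBufferXmtPing[BUFFSIZE];
short gBufferXmtPong[BUFFSIZE];

/* 1 while the (modelled) EDMA is transmitting the ping buffer. */
static int edma_uses_ping = 1;

/* Models the transfer-complete event: the EDMA follows its LINK to the
 * other buffer, and the buffer it just finished becomes free for the
 * CPU to refill. Returns that free buffer. */
short *on_transfer_complete(void)
{
    short *finished = edma_uses_ping ? gBufferXmtPing : gBufferXmtPong;

    edma_uses_ping = !edma_uses_ping;   /* mimics the LINK reload */
    return finished;
}
```

The CPU always works on the buffer that the EDMA is not using, so the data stream to the DAC is never interrupted while new samples are being computed.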

Let me show you EDMA_RTDX_GPIO_empty and QUAD_DAC_ADC_empty!


Skeleton programs handling EDMA+RTDX

”Single-antenna”:
• EDMA_RTDX_GPIO_31_empty (code development)
• EDMA_RTDX_GPIO_31 (Matlab prototype)

”Dual-antenna”:
• QUAD_ADC_DAC_31_empty (code development)
• QUAD_ADC_DAC_31 (Matlab prototype)


QUAD_DAC_ADC_31

Let’s go through QUAD_DAC_ADC_31_empty

Then go through QUAD_DAC_ADC_31

This is the DSP<->matlab interface to be used in the matlab prototype!!

Note: Documentation in “main.c”!


State Machine using Case Statement in appl_Process

int State=START; // The state values may be integers defined using #define in a header file.

void appl_Process(short *receive_buffer, short *transmit_buffer) {
  switch(State) {
    case START:
      res1=func1(arg1,arg2);
      res2=func2(signal1);
      State=WAITING;
      break;
    case WAITING:
      sync=func3(arg1,res2);
      if (sync) {State=RUNNING; transmit(signal2);}
      else {State=WAITING; transmit(signal1);}
      break;
    case RUNNING:
      .....
  }
}


Data formats

• C-types: char=8 bits, short=16 bits, int=32 bits, float=32 bits.
• Integers are signed or unsigned.
• Float: sign=1 bit, exponent=8 bits, fraction=23 bits.
• In C, conversion is automatic (when pointers are not involved…).
• However, note the range …..
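The range issue matters most when a float result is written back to a 16-bit buffer: a value outside [-32768, 32767] does not fit in a short. One common way to handle this is to saturate before converting. The sketch below is an assumption about how one might do it; float_to_short_sat is a hypothetical name, not a function from the skeletons.

```c
/* Converts a float to a 16-bit short, clipping to the representable
 * range instead of relying on an out-of-range conversion.
 * NaN inputs are not handled. */
short float_to_short_sat(float x)
{
    if (x >= 32767.0f) {
        return 32767;          /* clip at 2^15 - 1 */
    }
    if (x <= -32768.0f) {
        return -32768;         /* clip at -2^15 */
    }
    return (short)x;           /* in range: truncates toward zero */
}
```

Clipping distorts the signal, so it is a safety net rather than a substitute for scaling the signal to fit the range.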


The buffers in QUAD_DAC_ADC …

appl_Process(short *receive_buffer,short *transmit_buffer)

• The buffers consist of BUFFSIZE shorts (range [-2^15,2^15-1]).
• BUFFSIZE is defined in EDMA_RTDX_GPIO.h to be 1024.
• The number of bytes is 2*BUFFSIZE=2048.
• In EDMA_RTDX_GPIO there are 4 channels (i.e. ADC and DAC converters) which are interleaved.
• Thus the number of 4-dimensional vector samples is BUFFSIZE/4=256.
• BUFFSIZE can be changed.
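With 4 interleaved channels, a buffer of 1024 shorts holds 1024/4 = 256 samples per channel. The following C sketch separates the channels. It assumes the ordering ch0, ch1, ch2, ch3, ch0, ... within the buffer, which should be checked against the documentation in main.c; deinterleave, NUM_CHANNELS and FRAMES are hypothetical names.

```c
#define BUFFSIZE     1024
#define NUM_CHANNELS 4
#define FRAMES       (BUFFSIZE / NUM_CHANNELS)   /* 256 samples per channel */

/* Splits an interleaved buffer into one array per channel.
 * Assumed layout: buffer[4*n + c] is sample n of channel c. */
void deinterleave(const short *buffer, short channels[NUM_CHANNELS][FRAMES])
{
    int n, c;

    for (n = 0; n < FRAMES; n++) {
        for (c = 0; c < NUM_CHANNELS; c++) {
            channels[c][n] = buffer[NUM_CHANNELS * n + c];
        }
    }
}
```

The same index expression, used in the other direction, fills the transmit buffer from per-channel arrays.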


Overlap and add

Say we want to implement an FIR filter.

• The input buffer is 128 samples.
• The filter is 10 samples.
• The filtered signal is 128+10-1=137 samples.
• But the output buffer is 128 samples ….
• Solution: overlap and add.
• Variant 1: Save the last 9 samples. Add them to the next buffer.
• Variant 2: Overlap-and-add. See next slide.


Overlap and Add: With additional buffer

[Figure: an additional buffer of 128 + 128 + 9 samples, shown before and after an update. Labels: "Move 128+9 samples", "Zero these samples", "Add the new signal".]

Good if transmit signal is 128 samples and unsynchronized!
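As a concrete illustration of the simpler variant (save the last 9 samples and add them to the next buffer), here is a minimal C sketch using the sizes from the slides. It is not the course implementation: fir_overlap_add, BLOCK, TAPS, TAIL and the static tail array are hypothetical names, and the straightforward double loop is chosen for clarity rather than speed.

```c
#include <string.h>

#define BLOCK 128            /* input and output buffer length */
#define TAPS  10             /* FIR filter length */
#define TAIL  (TAPS - 1)     /* 9 samples that spill into the next block */

static float tail[TAIL];     /* overlap saved between calls, starts at zero */

/* Filters one block of BLOCK samples with the TAPS coefficients in h. */
void fir_overlap_add(const float *in, float *out, const float *h)
{
    float full[BLOCK + TAIL];   /* full convolution: 128 + 10 - 1 = 137 samples */
    int n, k;

    memset(full, 0, sizeof full);
    for (n = 0; n < BLOCK; n++) {
        for (k = 0; k < TAPS; k++) {
            full[n + k] += in[n] * h[k];
        }
    }

    /* Add the 9 samples left over from the previous block. */
    for (n = 0; n < TAIL; n++) {
        full[n] += tail[n];
    }

    memcpy(out, full, BLOCK * sizeof(float));            /* first 128 go out now */
    memcpy(tail, full + BLOCK, TAIL * sizeof(float));    /* last 9 are saved */
}
```

Calling the function once per buffer gives the same output as filtering the whole signal in one go, because every output sample receives the contributions from both blocks it depends on.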


Stack and Heap

float myfunction(short *buffer)

{

float internal_buffer[1000]; // This data is stored on the stack. At least 4000 bytes needed.

The stack size is set in ”build options”. No warning is given by the compiler if the stack size is too small!!!

float *internal_buffer;

internal_buffer = (float *) malloc(1000*sizeof(float)); // Allocated on the heap.

The heap size is also set in ”build options”. Also no warning!!!
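Since nothing warns about a heap that is too small at build time, the one check available in standard C is the return value of malloc, which is NULL when the request cannot be satisfied. The sketch below shows that check; sum_via_heap and WORK_LEN are hypothetical names and the summation is only a stand-in for real processing.

```c
#include <stdlib.h>

#define WORK_LEN 1000

/* Sums len samples through a heap-allocated work buffer.
 * Returns 0 on success, -1 if len is out of range or malloc fails. */
int sum_via_heap(const short *buffer, int len, float *sum_out)
{
    float *internal_buffer;
    float sum = 0.0f;
    int n;

    if (len < 0 || len > WORK_LEN) {
        return -1;
    }

    internal_buffer = (float *)malloc(WORK_LEN * sizeof(float));
    if (internal_buffer == NULL) {
        return -1;            /* heap too small: only detectable at run time */
    }

    for (n = 0; n < len; n++) {
        internal_buffer[n] = (float)buffer[n];
        sum += internal_buffer[n];
    }

    free(internal_buffer);
    *sum_out = sum;
    return 0;
}
```

No comparable run-time check exists for stack overflow in plain C, so large buffers are often declared as globals instead, where their size shows up in the .map file.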


Code Optimization

Let me show you optimization_example.


Simple Optimization Rules 1(2)

• Turn optimization on. Flags ”-o3”, program mode compilation ”-pm” and ”-op3” if possible.
• Turn debug off, i.e. do not use ”-g”.
• Avoid function calls inside loops!
• Use of division ”/” is a function call! Use _rcpsp instead. For other intrinsics see table 8-6 in spru187n.
• Avoid math functions such as ”sin(x)”; use look-up tables instead.
• Check that all important loops are pipelined by searching for "SOFTWARE PIPELINE INFORMATION" in the generated ”.asm” files.
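One portable way to follow the division rule is to divide once outside the loop and multiply inside it. The sketch below does that with an ordinary division so that it also runs on a PC; it does not use the _rcpsp intrinsic named above. The function name scale_by_inverse is hypothetical.

```c
/* Divides every sample by gain without a division inside the loop.
 * The caller must make sure that gain is not zero. */
void scale_by_inverse(float *x, int len, float gain)
{
    float inv_gain = 1.0f / gain;   /* single division, outside the loop */
    int n;

    for (n = 0; n < len; n++) {
        x[n] *= inv_gain;           /* multiply only: no function call in the loop */
    }
}
```

Multiplying by the reciprocal can differ from a true division in the last bit of the result, which is normally acceptable for signal processing but worth remembering when comparing against matlab.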


Simple Optimization Rules 2(2)

• Allocate all time-critical code and data in internal memory (in our skeletons this is the default; allocating to external memory requires a #pragma statement).

• Use the touch function in an initialization routine to have the most important data structure cached in internal memory. (This function can be copied from the cache_miss_example skeleton.)

float ImportantData[100];

….

touch(ImportantData,100);
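The real touch function should be copied from the cache_miss_example skeleton. To show the idea only, here is a hypothetical stand-in that reads one byte from every 32-byte line of a buffer, so that each line is brought into the data cache. The name touch_sketch, its byte-count argument and its return value are assumptions and may differ from the skeleton's function.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 32   /* L1D line size given in the slides */

/* Reads one byte per cache line covering data[0 .. nbytes-1].
 * The sum is returned so the reads cannot be optimized away. */
unsigned touch_sketch(const void *data, size_t nbytes)
{
    const volatile unsigned char *p = (const volatile unsigned char *)data;
    unsigned sink = 0;
    size_t i;

    for (i = 0; i < nbytes; i += CACHE_LINE_BYTES) {
        sink += p[i];
    }
    if (nbytes > 0) {
        sink += p[nbytes - 1];   /* covers the last line of an unaligned buffer */
    }
    return sink;
}
```

For the array above, a call would look like touch_sketch(ImportantData, sizeof ImportantData), placed in the initialization routine before real-time processing starts.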


TMS320C6713 cache

[Figure: CPU core <-> L1P (program cache, 4 kB) and L1D (data cache, 4 kB) <-> memory: 256 kB internal, 16 MB external.]


One-way cache (L1P)

Cache line    SDRAM addresses mapped to the line
Line 0        0x0000-0x001F, 0x1000-0x101F, ...
Line 1        0x0020-0x003F, 0x1020-0x103F, ...
...
Line 127      0x0FE0-0x0FFF, 0x1FE0-0x1FFF, ...


Two-way cache (L1D)

Set    Way A       Way B       Memory addresses mapped to the set
0      Line 0A     Line 0B     0x000-0x01F, 0x800-0x81F, ...
1      Line 1A     Line 1B     0x020-0x03F, 0x820-0x83F, ...
...
63     Line 63A    Line 63B    0x7E0-0x7FF, 0xFE0-0xFFF, ...


L1D cache

L1D address allocation: Tag = bits 31-11, Set index = bits 10-5, Offset = bits 4-0.

• A new line of 32 bytes is loaded on a read miss, with a penalty of 4 clock cycles.

• If two words (8 bytes) are loaded per clock cycle (reading sequentially from a memory segment), a miss occurs every 32/8 = 4 cycles, so the overhead is 8/32*4 = 1 clock cycle per instruction cycle.

• A write miss does not lead to the loading of a new line. A write buffer of four words handles up to four misses without penalty.


cache_miss_example

main.c: Illustrates the impact of L1D write and read misses (compulsory misses).

main2.c: Illustrates the problem with several data objects in the same set (thrashing).

Two data objects are in the same set if Aa = K*2048 + Ab for some addresses Aa and Ab in object A and B respectively, and for some integer K.

Two code objects are in the same set if Aa = K*4096 + Ab for some addresses Aa and Ab in object A and B respectively, and for some integer K.
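The data-object condition can be checked numerically. With 32-byte lines and 64 sets, as described for L1D in these slides, the set index repeats every 32*64 = 2048 bytes, which is where the K*2048 comes from. The helper names below are hypothetical, and addresses are passed as plain integers so the sketch also runs on a PC.

```c
#define L1D_LINE_BYTES 32ul   /* line size from the slides */
#define L1D_NUM_SETS   64ul   /* 4 kB / (2 ways * 32 bytes) */

/* Set index of an address: address bits 10..5. */
unsigned l1d_set_index(unsigned long address)
{
    return (unsigned)((address / L1D_LINE_BYTES) % L1D_NUM_SETS);
}

/* Non-zero if the two addresses compete for the same L1D set. */
int same_l1d_set(unsigned long a, unsigned long b)
{
    return l1d_set_index(a) == l1d_set_index(b);
}
```

Addresses of globals for such a check can be read from the .map file. Because L1D is two-way, two conflicting objects still fit, and trouble starts with a third one in the same set.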


What to consider when programming to make good use of the cache

• Align all data buffers on 32-byte boundaries (#pragma DATA_ALIGN).

• Avoid allocating more than two objects that map to the same set in the same algorithm.

• Avoid having two or more computationally complex algorithms that map to the same set.

• Profile the algorithms with and without cached data and program (see cache_miss_example).

• Force caching of important data and code before the real-time program starts (e.g. in appl_Init()) by reading the data (touch) and calling the functions.

• Test processing data in smaller buffers to see if performance improves.


Some advice 1(2)

• Start with a skeleton.
• Only insert functions which have been checked against matlab.
• Make one change at a time => much easier to find out what went wrong.
• Save ”before” and ”after” code.
• Don’t use printf.


Some advice 2(2)

• Check that all pointers are initialized.
• If a variable is corrupted, check the .map file to see how it could be overwritten.
• Use an extern declaration both in the file where the variable is declared and where it is used.
• In real-time debugging, store results to ”debug-globals”.
• When using sqrt, log, log10, use ”#include <math.h>”.