Introduction to the C2000 Control Law Accelerator Part 1

Ver 6, 08 April 2009

Slide 1

Introduction to the C2000 Control Law Accelerator

Part 1
Lori Heustess
C2000 Applications
April 8, 2009


Slide 2

Session Agenda

- Introduction: What is it? Why is it?
- Architecture: Floating-Point Format, Tasks, CLA Execution Flow, Time Slicing, Register Set, Program and Data Bus, Memory and Register Access
- Instructions: Format, Addressing Modes, Types of Instructions, Parallel Instructions, Status Flags
- Pipeline: Pipeline Stages, Effects on Instructions
- CLA Compared to C28x+FPU
- CLA in a Control System: Code Partitioning, “Just in Time” ADC Sampling
- Code Development and Debug: Anatomy of CLA Code, Initialization, Code Debug


Slide 3



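Session Agenda

(Agenda repeated as a section divider; only this entry survives in the transcript.)
Instructions: Format, Addressing Modes, Types of Instructions, Parallel Instructions, Status Flags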


Slide 4

What is the Control Law Accelerator (CLA)?

[Block diagram: C28x CPU, CLA, 12-bit ADC, CMP, High-Res PWM; 3.3V]

An independent 32-bit floating-point math accelerator

- Operates independently of the C28x CPU
- Clocked at the CPU frequency (SYSCLKOUT)
- Independent register set, memory bus structure and processing unit
- Direct access to ePWM+HRPWM, ADC result and comparator registers
- Low interrupt response time – no nesting of tasks
- Can read an ADC result “just-in-time”
- Execution of algorithms in parallel with the C28x CPU
- Executes time-critical control loops concurrently with the main CPU


Slide 5

What is the Control Law Accelerator (CLA)?

[Block diagram: C28x CPU, CLA, 12-bit ADC, CMP, High-Res PWM; 3.3V]

Fully programmable: IEEE 32-bit floating-point

- Easier to code than fixed-point
- Inherently more robust
- Removes scaling and saturation burden
- Sign inversion problems go away
- Supports instructions to convert fixed-point to float when reading in the value (ex: MUI16TOF32)
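
To illustrate the scaling and saturation burden mentioned above, the following is a minimal sketch in plain C that runs on a host PC, not CLA code. It contrasts a Q15 fixed-point multiply, which needs an explicit rescaling shift and a saturation check, with the floating-point equivalent. The function names are hypothetical, and the Q15 format is chosen here only as a typical fixed-point example.

```c
#include <stdint.h>

/* Q15 fixed-point multiply: operands represent values in [-1, 1).
   Assumes >> on a negative int32_t is an arithmetic shift, which is
   the behavior of common compilers. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;  /* rescale product to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;             /* saturate: -1 * -1 overflows */
    if (p < INT16_MIN) p = INT16_MIN;
    return (int16_t)p;
}

/* Floating-point multiply: no scaling or saturation code is needed. */
float f32_mul(float a, float b)
{
    return a * b;
}

/* Convert an unsigned 16-bit reading (for example an ADC count) to float,
   similar in spirit to an unsigned-integer-to-float conversion instruction. */
float u16_to_f32(uint16_t raw)
{
    return (float)raw;
}
```

For example, `q15_mul(16384, 16384)` multiplies 0.5 by 0.5 in Q15, while `f32_mul(0.5f, 0.5f)` does the same with no format bookkeeping.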


Slide 6

Benefits of the CLA

[Block diagram: C28x CPU, CLA, 12-bit ADC, CMP, High-Res PWM; 3.3V]

Key Drivers:
- Reduced sample-to-output delay (& jitter)
- Faster system response & higher MHz control loops
- Improved system robustness (IEC60730, SIL-3)
- Free up C28x CPU for other tasks (comms, diagnostics)
- Improved support for multi-channel (phase/freq) loops

Applications: Automotive, White-goods; Digital Power; All Applications


Slide 7

PICCOLO Device With CLA (F2803x series)

[Block diagram. Blocks and labels recoverable from the slide:]
- Processing: 32-bit C28x CPU, 60 MHz; 32-bit CLA, 60 MHz
- CLA memory: Prog RAM 8 KByte; Data0 RAM 2 KByte; Data1 RAM 2 KByte; Msg RAM 256 Byte
- CPU memory: Flash 64/128 KByte (4/8 sectors); M0, M1 RAM 4 KByte; L0 RAM 4 KByte; OTP 2 KByte; Boot ROM; some blocks marked Secure
- Analog: 12-bit ADC, 2 S/H, 4.6 MSPS; 3 x comparator with 10-bit DAC; AIO mux
- Control peripherals: EPWM1 to EPWM7, HRPWM, ECAP, EQEP
- Communications: SPI (2 x), SCI, LIN (SCI), I2C, CAN
- Clocking and system: OSC1 10 MHz, OSC2 10 MHz, external crystal (X1, X2), XCLKIN, PLL, WD, LPM, POR/BOR, XRSn
- Digital power: VDD (core voltage), VDDIO, VREG (VREGENZ), VSS; 3.3V +/-10%
- Analog power: VDDA, VSSA; 3.3V +/-10%
- Other: GPIO mux (GPIO0 to GPIOx), 3 external interrupts, JTAG, 32-bit buses, peripheral buses


Slide 8



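Session Agenda

(Agenda repeated as a section divider; only this entry survives in the transcript.)
Instructions: Format, Addressing Modes, Types of Instructions, Parallel Instructions, Floating-Point Flags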


Slide 9

IEEE Single-Precision Floating-Point Format

Bit fields (32 bits): S | E | M
- S: 1 Sign Bit (0 = Positive, 1 = Negative)
- E: 8-bit Exponent (Biased)
- M: 23-bit Mantissa (Implicit Leading Bit + Fraction Bits)

S        E           M             Value
0 or 1   0           0             Positive or Negative Zero
0 or 1   0           Non-Zero      Denormalized Number
0 or 1   1–254       0–0x7FFFFF    Positive or Negative Values*
0 or 1   255 (max)   0             Positive or Negative Infinity
0 or 1   255 (max)   Non-Zero      Not a Number (NaN)

*Normal Positive and Negative Values are Calculated as: $(-1)^{S} \times 2^{(E-127)} \times 1.M$, giving a range of $\pm\,{\sim}1.2 \times 10^{-38}$ to $\pm\,{\sim}3.4 \times 10^{+38}$

The normalized IEEE numbers have a hidden 1. Thus the equivalent signed integer resolution is the number of mantissa bits + sign + 1 (23 + 1 + 1 = 25 bits).
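
The field layout above can be checked on a host PC with a short C sketch. It assumes the host `float` is an IEEE 754 single-precision value, which holds on common desktop compilers; the type and function names are hypothetical and not part of any TI library.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 fraction bits (hidden leading 1 not stored) */
} f32_fields;

typedef enum { F32_ZERO, F32_DENORMAL, F32_NORMAL, F32_INFINITY, F32_NAN } f32_class;

/* Split a float into its S, E and M fields. */
f32_fields f32_decode(float x)
{
    uint32_t bits;
    f32_fields f;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the 32 bits safely */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Build a float from S, E and M fields. */
float f32_encode(f32_fields f)
{
    uint32_t bits = (f.sign << 31) | ((f.exponent & 0xFFu) << 23)
                  | (f.mantissa & 0x7FFFFFu);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Classify a value according to the E and M rows of the format table. */
f32_class f32_classify(float x)
{
    f32_fields f = f32_decode(x);
    if (f.exponent == 0u)   return f.mantissa == 0u ? F32_ZERO : F32_DENORMAL;
    if (f.exponent == 255u) return f.mantissa == 0u ? F32_INFINITY : F32_NAN;
    return F32_NORMAL;
}
```

For instance, decoding `1.0f` gives S = 0, E = 127, M = 0, since 1.0 is 1.0 x 2^(127-127).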


Slide 10

IEEE Single-Precision Floating-Point Format

IEEE 754: Most Widely Used Standard for Floating Point
- Standard number formats, special values (NaN, infinity)
- Rounding modes & floating-point operations
- Used on many CPUs including C67x, C28x+FPU

Note: The C3x used a different format.

Simplifications for the CLA (Same as C28x+FPU):
- Flags & compare operations: Negative zero is treated as positive zero
- Denormalized values are treated as zero
- Not-a-number (NaN) is treated as infinity
- Round-to-zero mode supported (truncate)
- Round-to-nearest mode supported (even)

These formats are commonly handled this way on embedded processors.
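
The three value simplifications listed above can be modeled on raw 32-bit patterns. The sketch below is a host-side illustration only, with a hypothetical function name. Two details are assumptions not stated on the slide: that a negative denormal maps to positive zero, and that a NaN keeps its sign bit when it is treated as infinity.

```c
#include <stdint.h>

/* Return the bit pattern a value is treated as under the simplified rules:
   - negative zero is treated as positive zero
   - denormalized values are treated as zero
   - NaN is treated as infinity */
uint32_t cla_simplify_bits(uint32_t bits)
{
    uint32_t sign = bits & 0x80000000u;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;

    if (exp == 0u)
        return 0u;                       /* +/-0 and denormals -> +0 (sign handling assumed) */
    if (exp == 0xFFu && frac != 0u)
        return sign | 0x7F800000u;       /* NaN -> infinity (sign kept, an assumption) */
    return bits;                         /* normal values and infinities unchanged */
}
```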


Slide 11

[Block diagram: CLA architecture. Recoverable content:]
- Task triggers from peripheral interrupts or software: ADCINT1 to ADCINT8, EPWM1_INT to EPWM7_INT and T0INT (CPU Timer 0) feed MPERINT1 to MPERINT8; the main CPU can also trigger tasks with IACK #16bit
- CLA to PIE: CLA1_INT1 to CLA1_INT8 and LVF, LUF (INT11, INT12)
- CLA Program Bus (CLA Prog Addr Bus, CLA Prog Data Bus) to CLA Program Memory
- CLA Data Bus (CLA Data Read Addr Bus, CLA Data Read Data Bus, CLA Data Write Addr Bus, CLA Data Write Data Bus) to Data RAMs (RAM0, RAM1), Message RAMs (CLA to CPU, CPU to CLA) and Registers (ADC Result, ePWM + HRPWM, COMP)
- Main C28x CPU Bus (Main CPU Read/Write Data Bus, Main CPU Data Read Bus); memories map to CPU or CLA space
- CLA Execution Registers: MR0 to MR3 (32), MAR0, MAR1, MPC, MSTF (32)
- CLA Configuration Registers: MVECT1 to MVECT8, MCTL, MEMCFG, MPISRCSEL1, MIFR, MICLR, MIFRC, MIOVF, MICLROVF, MIER, MIRUN; MEALLOW


Slide 12

What is a CLA Task?

CLA Task: CLA assembly code routine
- CLA supports 8 interrupts (Task1 to Task8)
- The start address of the task is configurable (MVECTx)
- The end address is marked with an MSTOP instruction
- Executed by the CLA in response to an interrupt event

Tasks can also be started via the main CPU’s IACK instruction
- For example: IACK #0x0003 will flag Task1 and Task2

The task executed depends on the interrupt received:
- Interrupt 1 = Task1: ADCINT1 or EPWM1_INT (Highest Priority)
- Interrupt 2 = Task2: ADCINT2 or EPWM2_INT
- …
- Interrupt 7 = Task7: ADCINT7 or EPWM7_INT
- Interrupt 8 = Task8: ADCINT8 or CPU Timer 0 (Lowest Priority)

Once a task begins it runs to completion (no task/interrupt nesting)
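
The IACK example and the priority ordering above can be sketched in a few lines of host-side C. The bit-to-task mapping (bit 0 = Task1, bit 1 = Task2, and so on) is inferred from the statement that `IACK #0x0003` flags Task1 and Task2; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the IACK operand flags the given task (1..8), else 0.
   Assumes bit (task - 1) of the operand corresponds to that task. */
int iack_flags_task(uint16_t iack_operand, int task)
{
    return (int)((iack_operand >> (task - 1)) & 1u);
}

/* Returns the highest-priority flagged task in a pending mask, or 0 if none.
   Task1 is the highest priority and Task8 the lowest. */
int highest_priority_task(uint8_t pending)
{
    for (int task = 1; task <= 8; ++task) {
        if (pending & (1u << (task - 1)))
            return task;
    }
    return 0;
}
```

With a pending mask of `0x06` (Task2 and Task3 flagged), `highest_priority_task` selects Task2.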


Slide 13

CLA Time Slicing

[Timing diagram: CLA Task 1, CLA Task 2 … CLA Task N run in sequence on the CLA while CPU Task 1 to CPU Task 4 run on the main CPU.]

- The CLA performs multiple tasks using the "Time Slicing" method
- The main CPU handles other system tasks; the two work in parallel
- Communication between CPU & CLA is via shared RAM


Slide 14

CLA Register Set

CLA Execution Registers (CSM Protected; Main CPU has Read Only Access):
- Four 32-bit Result Registers: MR0 – MR3
- Two 16-bit Auxiliary Registers: MAR0, MAR1. Used for indirect addressing
- MPC: 12-bit Program Counter. Offset from the start of CLA program memory. Indicates instruction in the D2 phase
- MSTF: Status Register. Zero, negative, overflow, underflow; Rounding mode; RPC: Return PC; MEALLOW

CLA Configuration Registers (CSM and EALLOW Protected; Main CPU has Read and Write Access):
- Eight Interrupt (Task) Vectors: MVECT1 to MVECT8. Offset from the start of CLA Program Memory to the beginning of the task
- Interrupt/Task Source Selection: MPISRCSEL1
    Task1: ADCINT1 or EPWM1_INT
    Task2: ADCINT2 or EPWM2_INT
    …
    Task7: ADCINT7 or EPWM7_INT
    Task8: ADCINT8 or CPU Timer 0
- Interrupt/Task Control: MIFR: Flag; MICLR: Clear; MIFRC: Force; MIOVF: Overflow flag; MICLROVF: Overflow clear; MIER: Interrupt enable/disable; MIRUN: Which task is running
- Configuration and Control: MEMCFG: Memory config; MCTL: CLA control


Slide 15

CLA Execution Flow

[Flowchart, reconstructed as steps:]
- Task request? A task request is via software or interrupt assigned in MPISRCSEL1:
    Task1: ADCINT1 or EPWM1_INT
    Task2: ADCINT2 or EPWM2_INT
    …
    Task7: ADCINT7 or EPWM7_INT
    Task8: ADCINT8 or CPU Timer 0
- MIFR bit set? If yes, set the MIOVF bit (overflow flagged). If no, set the MIFR bit (task pending). Note: Software task requests will not set MIOVF
- Task pending (MIFR)? Task enabled (MIER)? If yes to both, x = highest priority task both enabled and pending (Priority: Task1: Highest … Task8: Lowest)
- Clear MIFR.x bit, set MIRUN.x bit, MPC = MVECTx
- Run CLA until end of task (MSTOP). The task runs to completion (no task nesting)
- Clear MIRUN.x bit; Task x interrupt to PIE. When a task completes a task-specific interrupt is sent to the PIE

The main CPU continues code execution in parallel with the CLA
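
The execution flow can be approximated by a small state model in C, using the register names from the slides as struct fields. This is a host-side sketch under stated assumptions, not a description of the hardware implementation: the struct, the function names, and the choice to treat each register as an 8-bit mask with bit 0 = Task1 are all hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t  mifr;      /* task pending flags */
    uint8_t  mier;      /* task enable bits */
    uint8_t  miovf;     /* overflow flags */
    uint8_t  mirun;     /* bit set for the task that is running */
    uint16_t mvect[8];  /* task start offsets, MVECT1..MVECT8 */
    uint16_t mpc;       /* program counter */
} cla_model;

/* A task request arrives (task = 1..8). A hardware request for a task that
   is already pending flags an overflow; a software request never does. */
void cla_request(cla_model *c, int task, int from_software)
{
    uint8_t bit = (uint8_t)(1u << (task - 1));
    if ((c->mifr & bit) && !from_software)
        c->miovf |= bit;
    else
        c->mifr |= bit;
}

/* Start the highest-priority task that is both enabled and pending.
   Returns the task number started, or 0 if nothing starts. */
int cla_dispatch(cla_model *c)
{
    if (c->mirun)
        return 0;                           /* a task runs to completion: no nesting */
    for (int task = 1; task <= 8; ++task) {
        uint8_t bit = (uint8_t)(1u << (task - 1));
        if (c->mifr & c->mier & bit) {
            c->mifr &= (uint8_t)~bit;       /* clear MIFR.x */
            c->mirun = bit;                 /* set MIRUN.x */
            c->mpc = c->mvect[task - 1];    /* MPC = MVECTx */
            return task;
        }
    }
    return 0;
}

/* MSTOP reached: clear MIRUN.x and return the task whose
   completion interrupt would be sent to the PIE (0 if none was running). */
int cla_mstop(cla_model *c)
{
    int task = 0;
    for (int t = 1; t <= 8; ++t) {
        if (c->mirun & (1u << (t - 1)))
            task = t;
    }
    c->mirun = 0;
    return task;
}
```

A typical sequence is `cla_request`, then `cla_dispatch`, then `cla_mstop` once the task body has finished, after which `cla_dispatch` can start the next pending task.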


Slide 16

Piccolo (2803x) CLA Memory and Register Access

[Block diagram: CLA Program Memory; Data RAMs (RAM0, RAM1); Message RAMs (CLA to CPU, CPU to CLA); Registers (ADC Result, ePWM + HRPWM, COMP).]

CLA Program Memory:
- Mapped to CPU program and data space at reset
- CLA code must be even aligned (all instructions are 32-bits)
- 4K x 16 (2048 CLA instructions), single cycle

CLA Data Memory:
- Two blocks: RAM0 and RAM1. 1K x 16 each, single cycle
- Mapped to CPU program and data space at reset
- Each block can be independently mapped to CLA data space

Message RAMs:
- Used to pass data between the CLA and CPU
- CPU to CLA message RAM (Ignores CLA writes)
- CLA to CPU message RAM (Ignores CPU writes)
- Always mapped to both CPU and CLA memory space
- 128 x 16 each, single cycle

Registers the CLA can Directly Access:
- ePWM + HRPWM, Comparator and ADC Result registers
- CLA MEALLOW protects EALLOW registers from CLA writes
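
The memory sizes quoted in 16-bit words can be related to byte counts and instruction counts with simple arithmetic. The sketch below is a host-side check only; the constant and function names are hypothetical, and the even-alignment test assumes addresses are counted in 16-bit words.

```c
#include <stdint.h>

enum {
    BYTES_PER_WORD        = 2,     /* memories are organized as N x 16 bits */
    WORDS_PER_INSTRUCTION = 2,     /* every CLA instruction is 32 bits */
    CLA_PROG_WORDS        = 4096,  /* program memory: 4K x 16 */
    CLA_DATA_WORDS        = 1024,  /* each data RAM block: 1K x 16 */
    CLA_MSG_WORDS         = 128    /* each message RAM: 128 x 16 */
};

/* Size in bytes of a memory given in 16-bit words. */
uint32_t words_to_bytes(uint32_t words)
{
    return words * BYTES_PER_WORD;
}

/* Number of 32-bit instructions that fit in a memory given in 16-bit words. */
uint32_t words_to_instructions(uint32_t words)
{
    return words / WORDS_PER_INSTRUCTION;
}

/* Returns 1 if a word address is even aligned, as required for CLA code. */
int is_even_aligned(uint16_t word_addr)
{
    return (word_addr & 1u) == 0u;
}
```

For example, 4K x 16 of program memory is 8192 bytes and holds 2048 instructions, and each 128 x 16 message RAM is 256 bytes.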


Slide 17



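Session Agenda

(Agenda repeated as a section divider; only this entry survives in the transcript.)
Instructions: Format, Addressing Modes, Types of Instructions, Parallel Instructions, Floating-Point Flags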


Slide 18

CLA Instructions

- Same instruction format as the C28x and C28x+FPU
- Destination operand is always on the left

Type             Instruction   Destination   Source Operands
Fixed Point      MPY           ACC           T, loc16
Floating Point   MPYF32        R0H           R1H, R2H
CLA              MMPYF32       MR0           MR1, MR2

Same mnemonics as the C28x+FPU but with a leading “M”

To enable support for CLA instructions, use the switch: --cla_support=cla0
(C2800 codegen tools v5.2.x or later)


Slide 19

CLA Addressing Modes

CLA has only two addressing modes:

Direct Addressing:
- Encodes the 16-bit Address of the Variable:
    MMOV32 MR1, @_Var1

Indirect Addressing with 16-bit Post Increment:
- Syntax: MAR0[#imm16]++ or MAR1[#imm16]++
- Uses the address in MAR0 or MAR1 to access memory
- After the read or write, MAR0/MAR1 is incremented by #imm16
    MMOV32 MR1, MAR0[-2]++   ; Load MR1 with what MAR0 points to
                             ; & post increment MAR0 by -2

Both modes can access the low 64k of memory, which includes:
- All of the CLA data space
- Both message RAMs
- Shared peripheral registers

No stack pointer or data page pointer
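
The two addressing modes can be modeled on a host as operations on an array of 16-bit words. This is a sketch with hypothetical function names. It assumes a 32-bit value is stored with its low 16 bits at the lower word address, and that the auxiliary register wraps around at 16 bits; neither detail is stated on the slide.

```c
#include <stdint.h>

/* Direct addressing: the instruction encodes the 16-bit address itself. */
uint32_t load32_direct(const uint16_t *mem, uint16_t addr)
{
    uint32_t lo = mem[addr];
    uint32_t hi = mem[(uint16_t)(addr + 1u)];
    return (hi << 16) | lo;                 /* low word first (assumed order) */
}

/* Indirect addressing with post increment: read through the auxiliary
   register, then add the signed 16-bit immediate to it. */
uint32_t load32_post_inc(const uint16_t *mem, uint16_t *mar, int16_t imm16)
{
    uint32_t value = load32_direct(mem, *mar);   /* access uses the old address */
    *mar = (uint16_t)(*mar + (uint16_t)imm16);   /* post increment, 16-bit wrap */
    return value;
}
```

With the auxiliary register holding 4 and an immediate of -2, the load reads the 32-bit value at word address 4 and leaves the register at 2, so repeated calls walk backward through an array of 32-bit values. The `mem` array must cover every address that is accessed.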


Slide 20

Types of CLA Instructions

Type                          Example                    Cycles
Load (Conditional)            MMOV32 MRa,mem32{,CNDF}    1
Store                         MMOV32 mem32,MRa           1
Load With Data Move           MMOVD32 MRa,mem32          1
Store/Load MSTF               MMOV32 MSTF,mem32          1
Compare, Min, Max             MCMPF32 MRa,MRb            1
Absolute, Negative Value      MABSF32 MRa,MRb            1
Unsigned Integer To Float     MUI16TOF32 MRa,mem16       1
Integer To Float              MI32TOF32 MRa,mem32        1
Float To Integer & Round      MF32TOI16R MRa,MRb         1
Float To Integer              MF32TOI32 MRa,MRb          1
Multiply, Add, Subtract       MMPYF32 MRa,MRb,MRc        1
1/X (16-bit Accurate)         MEINVF32 MRa,MRb           1
1/Sqrt(x) (16-bit Accurate)   MEISQRTF32 MRa,MRb         1
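
Because the 1/X instruction returns an estimate of limited accuracy, a common way to refine such an estimate is a Newton-Raphson step, y = y * (2 - y * x), which roughly doubles the number of accurate bits per iteration. The sketch below is a generic host-side C illustration of that technique, not code taken from the presentation; the function name and the starting estimate are hypothetical.

```c
/* Refine an estimate of 1/x with Newton-Raphson iterations.
   Each iteration roughly squares the relative error of the estimate. */
float recip_newton(float x, float estimate, int iterations)
{
    float y = estimate;
    for (int i = 0; i < iterations; ++i)
        y = y * (2.0f - y * x);
    return y;
}
```

Starting from 0.33 as a rough estimate of 1/3 (about 1% error), one iteration brings the error to roughly 0.01%, and a second iteration brings it close to single-precision accuracy.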


Slide 21

Types of CLA Instructions

Type                                     Example                     Cycles
Integer Load/Store                       MMOV16 MRa,mem16            1
Load/Store Auxiliary Register            MMOV16 MAR,mem16            1
Branch/Call/Return Conditional Delayed   MBCNDD 16bitdest {,CNDF}    1-7 *
Integer Bitwise AND, OR, XOR             MAND32 MRa,MRb,MRc          1
Integer Add and Subtract                 MSUB32 MRa,MRb,MRc          1
Integer Shifts                           MLSR32 MRa,#SHIFT           1
Write Protection Enable/Disable          MEALLOW                     1
Halt Code or End Task                    MSTOP                       1
No Operation                             MNOP                        1

* Number of cycles varies based on how many of the delay slots can be used up


Slide 22

Parallel Instructions

Instruction                         Example                     Cycles
Multiply & Parallel Add/Subtract    MMPYF32 MRa,MRb,MRc         1/1
                                    || MSUBF32 MRd,MRe,MRf
Multiply, Add, Subtract             MADDF32 MRa,MRb,MRc         1/1
& Parallel Store                    || MMOV32 mem32,MRe
Multiply, Add, Subtract, MAC        MADDF32 MRa,MRb,MRc         1/1
& Parallel Load                     || MMOV32 MRe, mem32

Both Operations Complete in a Single Cycle!

Example: Add + parallel store

    MADDF32 MR3, MR3, MR1
    || MMOV32 @_Var, MR3

- Parallel bars indicate a parallel instruction
- Single instruction
- Single opcode
- Performs 2 operations
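
The add + parallel store example uses MR3 as both the destination of the add and the source of the store. The host-side C sketch below models one plausible reading of that instruction, in which both operations read their source registers before either result is written, so the store writes the value MR3 held before the add. That ordering is an assumption made for illustration and should be checked against the CLA reference guide; the struct and function names are hypothetical.

```c
typedef struct {
    float mr[4];   /* result registers MR0..MR3 */
} cla_regs;

/* Models: MADDF32 MRa, MRb, MRc || MMOV32 mem32, MRe
   Assumption: sources are read first, then both results are written. */
void madd_parallel_store(cla_regs *r, int a, int b, int c, float *mem32, int e)
{
    float sum    = r->mr[b] + r->mr[c];  /* add reads its operands */
    float stored = r->mr[e];             /* store reads its source (old value) */
    *mem32   = stored;                   /* parallel store result */
    r->mr[a] = sum;                      /* add result */
}
```

With MR3 = 1.5 and MR1 = 2.0, the call `madd_parallel_store(&regs, 3, 3, 1, &var, 3)` leaves `var` holding 1.5 and MR3 holding 3.5 under this assumption.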


Slide 23

Look for the next presentation

Introduction to the C2000 Control Law Accelerator

Part 2

Thank you!
