Post on 18-Oct-2020
Concerto™ College
Topic 4: Control Subsystem
Agenda
• Concerto architecture
• C28 subsystem overview
• Processing units
• Interrupt mechanism
• Memory
• DMA
• ePWM
• Other peripherals
• Benchmarks
Concerto architecture
F28M35x block diagram

Control Subsystem
• C28x 32-bit CPU, up to 150 MHz
• FPU
• VCU: Viterbi, CRC, Complex MPY, FFT
• Memory: 256-512 KB ECC Flash, 20 KB ECC RAM, 16 KB Parity RAM, 64 KB ROM, 128-bit Security
• System: 6Ch DMA
• Comms: McBSP/I2S/SPI, SPI, I2C, UART
• Control Modules: 9x ePWM Modules (18x Outputs / 16x HR), Fault Trip Zones, 6x 32-bit eCAP, 3x 32-bit eQEP

Host Subsystem (marked out of scope for this topic)
• ARM Cortex-M3 32-bit CPU, up to 100 MHz
• Memory: 256-512 KB ECC Flash, 16 KB ECC RAM, 16 KB Parity RAM, 64 KB ROM, 2x128-bit Security
• System & Clocking: 32Ch DMA, 4 Timers, 2 Watchdogs, External Interface
• Communications: 4x SSI, 2x I2C, 5x UART, 2x CAN, USB OTG FS PHY, 10/100 Ethernet MAC 1588 w/ MII

Shared
• Memory: Up to 64 KB Masterable, 2 KB Message Parity RAM (two blocks)
• Debug: RT JTAG
• Analog: 12b, 10ch, 2SH, 3MSPS; 3ch Analog Comparator; Temp Sense
• Pwr & Clocking: 10 MHz / 30 KHz INT OSC, 4-20 MHz EXT, Clock Fail Detect, 3.3V VREG, POR/BOR
C28 subsystem overview
Block diagram of the C28 subsystem:
• CPU: 32x32 bit Multiplier, R-M-W Atomic ALU, 32-bit Auxiliary Registers, FPU, VCU
• 3 32-bit Timers, Watchdog
• Memory: Sectored Flash, RAM, Boot ROM
• PIE Interrupt Manager
• DMA 6 Ch.
• Peripherals: ePWM1-9, eCAP1-6, eQEP1-3, I2CA, SCIA, SPIA, McBSPA
• Analog: ADC, COMP
• Buses: Program Bus, Data Bus, Register Bus, DMA Bus, Common Interface Bus, M3 Bus, µDMA Bus
VCU Module
Block diagram labels: C28x Core, CPI I/F, C28x Mem Bus, VCU
VCU Execution Registers: VR0-VR8, VT0, VT1, VSTATUS, VCRC

Viterbi Unit (VU)
• Supports efficient SW implementation of Viterbi decoder by performing the add-compare-select and traceback operations in hardware
– 1-cycle branch metrics initialization for CR=1/2 and 2-cycle branch metrics initialization for CR=1/3
– 2-cycle Viterbi butterfly operation
– 3-cycle Viterbi traceback operation per Viterbi stage

CRC Unit (CU)
• Supports generation of CRC8, CRC16 and CRC32 on data stored in memory
– Byte-wise calculation to support PRIME

Arithmetic Unit (AU)
• Supports complex number arithmetic and FFT calculation
– 2-cycle complex-number multiplication with 16-bit x 16-bit = 32-bit real and imaginary parts
– 1-cycle complex-number addition
– 2-cycle complex multiply-and-accumulate (MAC)
– A repeat complex-MAC operation
– Instruction to support 5-cycle 16-bit FFT butterfly

Resources: VCU Overview Video, VCU Whitepaper
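The complex multiplication listed for the Arithmetic Unit can be sketched in software as follows. This is only a model of the arithmetic (16-bit parts in, wider real and imaginary parts out); the function name `complex_mul_16` is hypothetical, and the sketch does not model VCU instructions, cycle counts, or any overflow handling the hardware may apply.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def complex_mul_16(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im), all parts signed 16-bit.

    Returns (real, imag) as plain Python integers. Each partial product of
    two 16-bit values fits in 32 bits; overflow of the sums is not modeled.
    """
    for value in (a_re, a_im, b_re, b_im):
        if not INT16_MIN <= value <= INT16_MAX:
            raise ValueError("operand does not fit in 16 bits")
    real = a_re * b_re - a_im * b_im   # Re = ac - bd
    imag = a_re * b_im + a_im * b_re   # Im = ad + bc
    return real, imag


# (3 + 4j) * (1 - 2j)
print(complex_mul_16(3, 4, 1, -2))
```

On a plain C28x this takes four multiplies and two additions, which is the work the slide says the VCU performs in 2 cycles.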
VCU benefits
Power line modem example (figure labels: Real, Imag, Out)

Chart: cycles, C28x implementation vs VCU implementation, for
• Viterbi Butterfly
• Viterbi TraceBack (cycles per stage)
• 16-bit Complex MPY/ADD/SUB
• 16-bit Complex FFT Butterfly
• 16-bit CRC for block length of 10 bytes
Speedup annotations on the chart: ~7X, ~7X, ~10X, ~4X and ~25X faster

Viterbi Unit (VU)
• Optimized implementation of Viterbi decoder
– Performs add-compare-select & traceback operation in H/W

Complex Math Unit (CMU)
• Complex number arithmetic and FFT calculation
– 2-cycle complex-number multiplication; 1-cycle complex-number addition
– 2-cycle complex-MAC

CRC Unit (CU)
• Generate CRC8, CRC16 and CRC32 on data stored in memory
– Byte-wise calculation to support PRIME
C28 + FPU + VCU pipeline
C28 interrupt
• 2 non-maskable interrupts (RS, “selectable” NMI)
• 14 maskable interrupts (INT1 – INT14)
– 12 multiplexed interrupts through PIE
• Interrupt priority is fixed

Interrupt path (block diagram):
• Sources: peripherals (SCI, ePWM, eCAP, eQEP, ADC, McBSP, DMA, FPU), CPU Timer 0 (TINT0), memory wrapper (Flash - RAM), WAKEINT (Low Power), XINT1-XINT3 (each with a Ctl block fed by GPIO0int … GPIO63int)
• PIE: PIECTRL, PIEACK, PIEIER1 / PIEIFR1 … PIEIER12 / PIEIFR12 registers; PIE RAM vector table (128 vectors)
• PIE inputs INT1.1n to INT1.8n … INT12.1n to INT12.8n drive INT1n (highest) to INT12n (lowest)
• CPU Timer 1 (TINT1) and CPU Timer 2 (TINT2) drive INT13 and INT14
• NMI drives NMINT
• C28 CPU: IER, IFR, DBGIER and ST1 (INTM flag) registers
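The enable chain in the diagram (PIEIFRx/PIEIERx in the PIE, then IER and the INTM flag in the CPU) can be sketched as a small priority resolver. This is a simplified model, not the hardware behaviour: it ignores PIEACK, IFR latching and INT13/INT14, the function name `next_pie_interrupt` is hypothetical, and the ordering of INTx.1 ahead of INTx.8 inside a group is an assumption (the slide only states INT1 highest, INT12 lowest).

```python
def next_pie_interrupt(pieifr, pieier, ier, intm):
    """Return (group, channel) of the interrupt that would be taken, or None.

    pieifr, pieier: dicts mapping PIE group 1..12 to an 8-bit flag / enable mask
    ier:            CPU-level enable mask, bit (group - 1) enables INTgroup
    intm:           global mask flag from ST1 (1 blocks maskable interrupts)
    """
    if intm:
        return None
    for group in range(1, 13):              # INT1 first (highest priority)
        if not (ier >> (group - 1)) & 1:
            continue                        # group disabled at CPU level
        pending = pieifr.get(group, 0) & pieier.get(group, 0) & 0xFF
        for channel in range(1, 9):         # assumed: INTx.1 before INTx.8
            if (pending >> (channel - 1)) & 1:
                return group, channel
    return None


flags = {1: 0b01000000, 3: 0b00000001}      # INT1.7 and INT3.1 pending
print(next_pie_interrupt(flags, flags, ier=0b101, intm=0))
print(12 * 8, "multiplexed PIE inputs")
```

With 12 groups of 8 inputs, the PIE multiplexes 96 sources onto INT1 to INT12, which fits inside the 128-entry vector table shown on the slide.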
Control side Memory map
• 64 kB ROM
• 512 kB Flash
• Up to 100 kB RAM
• 4 kB RAM for IPC

Address      Region
0x0000 0000  RAM
0x0000 0AE0  Periph Reg
0x0000 8000  RAM
0x0000 A000  RAM
0x0000 C000  shared RAM
0x0003 F800  IPC RAM
0x0010 0000  Flash
0x003F 8000  ROM
Control side ROM
• Size: 64 kBytes (starting at 0x003F 8000)
• Access: single cycle
• Contains
» C28 bootloader code
» IPC handling code
» math tables for IQ, FPU and PLC
• Access from C28 only
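As a quick consistency check on the memory map, the end address of a region can be derived from its start address and size. The sketch assumes that C28x addresses count 16-bit words (so 64 kBytes span 32K addresses); the helper name `region_end` is hypothetical.

```python
WORD_BYTES = 2  # assumption: each C28x address holds one 16-bit word


def region_end(start, size_kbytes):
    """Last address of a region that starts at `start` and holds size_kbytes."""
    words = size_kbytes * 1024 // WORD_BYTES
    return start + words - 1


rom_end = region_end(0x3F8000, 64)       # ROM: 64 kBytes from 0x003F 8000
flash_end = region_end(0x100000, 512)    # Flash: 512 kBytes from 0x0010 0000
print(hex(rom_end), hex(flash_end))
```

Under that assumption the ROM runs up to 0x003F FFFF, the top of a 22-bit address space, and the Flash ends well below the ROM start.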
Control side Flash
• Size: 512 kBytes, 14 sectors
• Access: 25 nsec (expected)
• Access from C28 only

Flash performance
C28 frequency  Wait state  Efficiency  Effective speed
40 MHz         0           100 %       40 MHz
80 MHz         1           96 %        77 MHz
100 MHz        2           94 %        94 MHz
150 MHz        3           91 %        137 MHz

Flash interface blocks: FLASH Bank (with ECC), Pre-Fetch Buffer (128-bit), Fetch Buffer (128-bit), Data Cache, C28 core; bus widths shown: 64, 64, 32
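The wait-state column follows from the flash access time and the CPU clock: an access must span enough whole CPU cycles to cover 25 ns, and every cycle beyond the first is a wait state. A minimal sketch of that rule (the helper name `wait_states` is hypothetical, and real devices may require extra margin):

```python
def wait_states(cpu_mhz, access_ns):
    """Wait states needed so that (wait_states + 1) CPU cycles >= access time.

    Integer arithmetic avoids floating-point rounding: MHz * ns / 1000 is the
    access time expressed in CPU cycles, rounded up to a whole cycle.
    """
    cycles = (cpu_mhz * access_ns + 999) // 1000
    return max(cycles, 1) - 1


for mhz in (40, 80, 100, 150):
    print(mhz, "MHz ->", wait_states(mhz, 25), "wait state(s)")
```

For a 25 ns access this reproduces the 0, 1, 2 and 3 wait states listed in the table.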
Control side RAM
• Size: 100 kB
• Access: single cycle
• Each block is SARAM

         M0-M1   L0-L1   L2-L3   S0-S7 shared   C2M     M2C
Size     2 kB    8 kB    8 kB    8 kB           2 kB    2 kB
Safety   ECC     ECC     parity  parity         parity  parity
M3       No      No      No      Shared         Read    Yes
DMA      No      No      Yes     Yes            Yes     Yes

Sizes are per block. M2C is read-only from the C28 side.

Diagram legend: Dedicated to C28; Shared with M3; ECC safety; Accessible from DMA; RR markers on the C28 and M3 access paths.
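The per-block sizes can be cross-checked against the totals quoted elsewhere in the deck (100 kB of RAM plus 4 kB for IPC). The sketch below reads the table as M0-M1 and L0-L1 being ECC, L2-L3 and S0-S7 being parity, with eight shared blocks S0 to S7 as drawn in the diagram; that reading of the flattened table is an assumption.

```python
# name: (number of blocks, kB per block, safety)
RAM_BLOCKS = {
    "M0-M1": (2, 2, "ECC"),
    "L0-L1": (2, 8, "ECC"),
    "L2-L3": (2, 8, "parity"),
    "S0-S7": (8, 8, "parity"),   # shared with M3
}
IPC_BLOCKS = {"C2M": 2, "M2C": 2}  # message RAMs, kB each


def total_kb(blocks, safety=None):
    """Sum block sizes, optionally restricted to one safety scheme."""
    return sum(n * kb for n, kb, s in blocks.values()
               if safety is None or s == safety)


print("RAM total:", total_kb(RAM_BLOCKS), "kB")
print("ECC RAM:  ", total_kb(RAM_BLOCKS, "ECC"), "kB")
print("IPC RAM:  ", sum(IPC_BLOCKS.values()), "kB")
```

Read this way, the ECC blocks add up to 20 kB and the dedicated parity blocks L2-L3 to 16 kB, the same figures that appear in the F28M35x block diagram.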
Direct Memory Access (DMA)
Concerto control subsystem contains a 6-channel DMA

DMA Controller
• 6 channels with independent PIE interrupts
• Event-based machine
• Channel 1 has the ability to be a high-priority channel
• 4 cycles/word throughput (5 for McBSP)
• Arbitration handling when CPU and DMA both try to access the same interface concurrently

Peripheral Interrupt Trigger Sources
• ADC sequencers
• External interrupts (XINT1-3)
• McBSP
• CPU Timers
• ePWM start-of-conversion signals

Data Sources / Destinations
• Some RAM zones
• ADC result/COMP registers
• McBSP transmit/receive buffers
• ePWM 1-9 registers

Block diagram labels: trigger sources (CPU Timers, McBSP, XINTn, ePWM, ADC), 6-ch DMA Controller, DMA Bus, SARAM zones, McBSP module, ePWM modules, ADC/COMP module, CIB, PIE interrupt
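The throughput figure translates directly into a best-case transfer time. The sketch assumes the DMA runs at the C28 clock and ignores arbitration stalls and trigger latency, so it gives a lower bound rather than a measured figure; the function name `dma_transfer_time_us` is hypothetical.

```python
def dma_transfer_time_us(words, cpu_mhz, cycles_per_word=4):
    """Best-case time in microseconds to move `words` 16-bit words.

    cycles_per_word: 4 per the slide, 5 when the McBSP is involved.
    Assumes the DMA is clocked at cpu_mhz and is never stalled.
    """
    return words * cycles_per_word / cpu_mhz


print(dma_transfer_time_us(1024, 150))                      # RAM block
print(dma_transfer_time_us(1024, 150, cycles_per_word=5))   # McBSP
```

At 150 MHz the 4 cycles/word rate corresponds to 37.5 million words per second.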
EPWM Module Enhancements
Block diagram:
• Time base: 16-bit COUNTER (TBCNT), TBPRD / TBPRDHR, TBPHS / TBPHSHR
• Compare: CMPA / CMPAHR, CMPB / CMPBHR, CMPC, CMPD
• Action Qualifier A and Action Qualifier B
• Dead-Band (with HR): RED / REDHR, FED / FEDHR
• High-resolution blocks: HR-A, HR-B
• CHOPPER
• Trip Action: Multiple Async Trip Inputs and Analog Comp, with Event Filtering, Qualification & Blanking; EPWMTZINTn (to CPU)
• SOC & Interrupt Gen: EPWMINTn (to CPU), SOCA/C (to ADC), SOCB/D (to ADC)
• Outputs: PWMA, PWMB

Enhancements:
• High Res Duty On B Also (~150 ps resolution)
• High Res Dead-Band (~150 ps resolution)
• Use Dead-Band to phase shift B output relative to A output (~150 ps resolution)
• CMPC & CMPD for finer positioning of ADC start of conversion or interrupts
• Increased number of trigger inputs and logical combinations
• Simultaneous update of PRD/CMP registers across multiple modules
• 24 bits of PWM step range in integer.fraction format: bits 31-16 hold integer cycles, bits 15-8 hold the fraction of a cycle, bits 7-0 are 0
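The 32-bit integer.fraction layout drawn on the slide (integer cycles in bits 31-16, fraction of a cycle in bits 15-8, zeros in bits 7-0) can be illustrated with a pack/unpack pair. The helper names are hypothetical and the sketch only models the bit layout as drawn; it is not the register programming sequence for any specific ePWM register.

```python
def pack_hr_value(cycles):
    """Pack a cycle count into the layout: [31:16] integer, [15:8] fraction."""
    if cycles < 0:
        raise ValueError("cycle count must be non-negative")
    integer = int(cycles)
    if integer > 0xFFFF:
        raise ValueError("integer part does not fit in 16 bits")
    frac = int((cycles - integer) * 256)     # 8-bit fraction, truncated
    return (integer << 16) | (frac << 8)     # bits 7-0 stay zero


def unpack_hr_value(value):
    """Inverse of pack_hr_value: return the cycle count as a float."""
    return (value >> 16) + ((value >> 8) & 0xFF) / 256


word = pack_hr_value(100.5)
print(hex(word), unpack_hr_value(word))
```

The 16 integer bits plus 8 fraction bits give the 24 bits of PWM step range mentioned on the slide.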
Other peripherals
• CPU Timer 2: independent clocking capabilities
• eCAP: no change
• eQEP: no change
• Other COM ports: no change
FLASH Technology & Performance

180nm, F05 FLASH Technology (all C2000 devices to date)
• FLASH Bank (Flash: 36ns access time, 27.7MHz)
• Pre-Fetch Buffer (64-bit), Fetch Buffer (64-bit)
• Bus widths shown: 64, 64, 32 to C28-CPU/FPU

65nm, F021 FLASH Technology (future C2000 devices, starting with F28M35x)
• FLASH Bank (with ECC) (Flash: 25.0ns access time, 40MHz)
• Pre-Fetch Buffer (128-bit), Fetch Buffer (128-bit)
• Data cache
• Bus widths shown: 64, 64, 32 to C28-CPU/FPU

Math Benchmark: PID (effective MHz, wait states, efficiency)
Device MHz  180nm, F05 FLASH, 64-bit Fetch  65nm, F021 FLASH, 128-bit Fetch
            (36ns, 27.7MHz)                 (25.0ns, 40MHz)
50          48.0 (1-wait) 96%               48.0 (1-wait) 96%
80          73.6 (2-wait) 92% *             76.8 (1-wait) 96%
100         81.0 (3-wait) 81%               94.0 (2-wait) 94%
150         93.0 (5-wait) 62%               136.5 (3-wait) 91%
* 76% for FPU (see next slide)
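The effective-MHz figures in the PID benchmark are the device clock scaled by the flash efficiency. A minimal sketch that recomputes them from the percentages on the slide (the dictionary and function names are hypothetical):

```python
# Device MHz -> flash efficiency in percent, as listed on the slide
F05_EFFICIENCY = {50: 96, 80: 92, 100: 81, 150: 62}    # 180nm, 36 ns
F021_EFFICIENCY = {50: 96, 80: 96, 100: 94, 150: 91}   # 65nm, 25 ns


def effective_mhz(cpu_mhz, efficiency_pct):
    """Clock rate that zero-wait memory would need for the same throughput."""
    return cpu_mhz * efficiency_pct / 100


for mhz in (50, 80, 100, 150):
    old = effective_mhz(mhz, F05_EFFICIENCY[mhz])
    new = effective_mhz(mhz, F021_EFFICIENCY[mhz])
    print(f"{mhz} MHz: F05 {old:.1f}, F021 {new:.1f}, gain x{new / old:.2f}")
```

The gap widens with clock speed: the two technologies are level at 50 MHz, while at 150 MHz the F021 flash delivers roughly 1.5 times the effective speed of F05.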
TMS320F28335 benchmarks (180nm, F05 FLASH Technology)
Arithm Benchmark (with div, no atan, sin, cos), cycle counts

                           Code running from FLASH (2-wait)  Code running from RAM (0-wait)
                           FLOAT math    IQ math             FLOAT math    IQ math
Single iteration           278           394                 210           372
  FLASH efficiency         76%           94%
  FLOAT faster than IQ     29%                               44%
Loop mode (10 iterations)  2859          4096                2161          3781
  FLASH efficiency         76%           92%
  FLOAT faster than IQ     30%                               43%
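The percentages quoted with the F28335 cycle counts can be reproduced from the counts themselves: flash efficiency is the RAM cycle count divided by the flash cycle count, and the float-over-IQ gain is the fraction of cycles saved. A minimal sketch using the numbers from the slide (the data layout and helper names are hypothetical):

```python
# (memory, math) -> cycles, for the single-iteration and 10-iteration runs
SINGLE = {("flash", "float"): 278, ("flash", "iq"): 394,
          ("ram", "float"): 210, ("ram", "iq"): 372}
LOOP = {("flash", "float"): 2859, ("flash", "iq"): 4096,
        ("ram", "float"): 2161, ("ram", "iq"): 3781}


def flash_efficiency_pct(run, math):
    """RAM cycles as a percentage of flash cycles for the same code."""
    return round(100 * run[("ram", math)] / run[("flash", math)])


def float_gain_pct(run, memory):
    """Percentage of cycles saved by FLOAT math relative to IQ math."""
    iq, fl = run[(memory, "iq")], run[(memory, "float")]
    return round(100 * (iq - fl) / iq)


for name, run in (("single", SINGLE), ("loop", LOOP)):
    print(name,
          flash_efficiency_pct(run, "float"), flash_efficiency_pct(run, "iq"),
          float_gain_pct(run, "flash"), float_gain_pct(run, "ram"))
```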
Concerto benchmarks (65 nm, F021 technology)
Benchmark analysis
• Arithm Benchmark (with div, no atan, sin, cos), float math, code running from flash (3-wait)
• Using C28-FPU improves benchmark by 44% over IQ math
• Use parallel instructions (supported by compiler)
• CPU frequencies shown: 250 MHz, 150 MHz
• Cycle counts shown: 247 cyc, 231 cyc, 175 cyc, 198 cyc
Thank you