Advanced Microcontrollers

Grzegorz Budzyń

Lecture 11: Digital Signal Controllers & Digital Signal Processors

Plan

• Digital Signal Controllers

– Introduction

• Digital Signal Controllers vs Microcontrollers

• Digital Signal Controllers vs Digital Signal Processors

– DSC by Texas Instruments

– DSC by Freescale

• Digital Signal Processors

Digital Signal Controllers

Introduction

A Digital Signal Controller (DSC) is a combination of a microcontroller and a digital signal processor (DSP)

Introduction

- Like microcontrollers, DSCs have:

- Fast interrupt response

- Control-oriented peripherals (PWM, watchdog, etc.)

- Usually programmed in C (although assembler programming is possible)

Introduction

- Like digital signal processors, DSCs have:

- Single-cycle multiply-and-accumulate (MAC) instructions

- Barrel shifters

- Large Accumulators

Introduction

- Main applications of DSCs:

- Motor control

- Power conversion

- Sensor processing applications

Introduction

- Main DSC vendors:

- Texas Instruments

- NXP

- Microchip

- Infineon

- Renesas

DSC by Microchip

dsPIC – application areas

Source: [1]

dsPIC - Architecture

• dsPIC – family of 16-bit RISC controllers with DSP features

• Two subfamilies:

– dsPIC30F – smaller, slower

– dsPIC33F – highest performance

• The only digital signal controller on the market available in QFN-28 packages, at prices down to $3

dsPIC - Architecture

• Main features:

– Modified Harvard architecture,

– Optimized for C-compilers

– Two 40-bit accumulators with rounding

– Memory options as in PIC24

– Many single-cycle MAC operations

– Packages with 18 to 110 pins

DSC by Texas Instruments

DSC - TI portfolio

• Four main subfamilies:

– 24x 16-bit Series

– 28x Fixed point Series

– 28x Piccolo Series

– 28x Delfino Floating-point Series

DSC - TI portfolio

Source: [2]

Piccolo Series

• TMS320F2802x:

– fixed-point microcontrollers

– 40-60 MHz performance

– up to 64 KB of on-chip flash

– small 38-pin package options

– feature-rich peripherals:

• 150-ps high-resolution enhanced pulse width modulators (ePWMs)

• high-speed 4.6 MSPS 12-bit ADC

• high-precision on-chip oscillators, analog comparators

• support for I2C, SPI, and SCI.

Piccolo Series

- TMS320F2803x:

- fixed point 32-bit microcontrollers

- 60 MHz speed

- up to 128KB flash memory

- 64 or 80-pin packages

- peripherals and features of the 2802x devices plus:

- control law accelerator (CLA) for high efficiency control loops

- QEP module

- CAN and LIN interfaces

Delfino Series

• TMS320F2833x:

– Integrated floating-point unit simplifies development and speeds up control applications by an average of 50%

– F2833x devices run at up to 150 MHz (300 MFLOPS), with two package offerings that are pin-for-pin compatible across all F2833x and F2823x controllers

– Features up to 512 KB of on-chip flash and a DMA controller for high-speed memory access.

Delfino Series

• TMS320C2834x:

– delivers up to 600 MFLOPS of floating-point performance

– up to 516 KB of single-access RAM

– PWMs with 65-ps resolution

– Direct Memory Access and a low-latency core make the C2834x an excellent solution for performance-hungry real-time control applications.

28x Fixed-Point Series

• TMS320F2823x:

– The F2823x generation of controllers is a fixed-point version of the F2833x devices

– Pin-to-pin compatible with the F2833x series; all of the peripherals and features remain the same except for the floating-point unit.


• TMS320F280x:

– devices offer 60-100 MHz performance

– were the first generation to feature:

• an on-chip 12.5 MSPS 12-bit ADC

• multiple high-resolution PWM peripherals

• QEP (quadrature encoder pulse)

– F280xx devices have up to 256 KB of flash memory.


• TMS320F281x:

– The F281x device generation features:

• 150 MHz core

• flexible Event Managers that provide access to timers, compare/PWM units, captures, and quadrature-encoder units

C2000 Architecture

Source: [3]

C2000 core features

• Main features:

– Efficient C engine with hardware that allows a C compiler to generate compact code, resulting in industry-leading code density

– Single-cycle read-modify-write instructions and single-cycle 32-bit multiply

– Fast interrupt service time (down to 9 cycles) with automatic zero-cycle context save

– 96 dedicated interrupt vectors that require no software decision making

C2000 core features

• Main features:

– 32-bit floating-point unit on Delfino controllers

– On select Piccolo devices, an independent Control Law Accelerator (CLA) processes floating-point control loops to free the CPU for other purposes

– Three 32-bit general-purpose CPU timers bring accuracy and flexibility to any application

– Code Security Module prevents reverse engineering and protects valuable intellectual property

DSC by Freescale

DSC by Freescale

Freescale microcontrollers portfolio

Source: [4]

DSC by Freescale

Freescale DSC portfolio

Source: [4]

DSC by Freescale

56F8000 block diagram

Source: [5]

DSC by Freescale

56F8000 details

DSC by Freescale

56F8000 application

Source: [5]

56F8XXX comparison

Source: [5]

C2000 core features

Source: [5]

MC56F8357

• Main features:

– On-chip memory includes high-speed volatile and nonvolatile components:

• 512 KB of Program Flash

• 4 KB of Program RAM (836X Devices)

• 32 KB of Data RAM

• 32 KB of Data Flash (836X Devices)

• 32 KB of Boot Flash

– Access to up to 4 MB of off-chip program memory and 32 MB of data memory

– Up to 60 MIPS at 60 MHz execution frequency

MC56F8357

• Main features:

– Four 12-bit Analog-to-Digital Converters

– Temperature Sensor

– Up to two FlexCAN modules (CAN Version 2.0 B-compliant)

– Two Serial Communication Interfaces (SCIs)

– Up to two Serial Peripheral Interfaces (SPIs)

– Two dedicated external interrupt pins

– Software-programmable Phase-Locked Loop

MC56F8357

MC56F8357 - Core

• Main features 1/2:

– Efficient 16-bit 56800E family controller engine with dual Harvard architecture

– Single-cycle 16 × 16-bit parallel Multiplier-Accumulator (MAC)

– Four 36-bit accumulators, including extension bits

– Arithmetic and logic multi-bit shifter

– Parallel instruction set with unique DSP addressing modes

– Hardware DO and REP loops

MC56F8357 - Core

• Main features 2/2:

– Three internal address buses and one external address bus

– Four internal data buses and one external data bus

– Instruction set supports both DSP and controller functions

– Controller-style addressing modes and instructions for compact code

– Efficient C compiler and local variable support

– Software subroutine and interrupt stack with depth limited only by memory

MC56F8357 – Memory

MC56F8357 – Core architecture

MC56F8357 – Core pipeline

Digital Signal Processors

DSP Introduction

- Digital Signal Processing: application of mathematical operations to digitally represented signals

- Signals represented digitally as sequences of samples

- Digital signals obtained from physical signals via transducers (e.g., microphones) and analog-to-digital converters (ADC)

DSP Introduction

- Digital signals converted back to physical signals via digital-to-analog converters (DAC)

- Digital Signal Processor (DSP): electronic system that processes digital signals

DSP Introduction

- Most DSP tasks require:

- Repetitive numeric computations

- Attention to numeric fidelity

- High memory bandwidth, mostly via array accesses

- Real-time processing

DSP Introduction

- DSPs must perform these tasks efficiently while minimizing:

- Cost

- Power

- Memory use

- Development time

Common DSP applications

- Applications:

- Instrumentation and measurement

- Communications

- Audio and video processing

- Graphics, image enhancement, 3-D rendering

- Navigation, radar, GPS

- Control: robotics, machine vision, guidance

Common DSP algorithms

- Algorithms

- Frequency domain filtering - FIR and IIR

- Frequency-time transformations - FFT

- Correlation

FIR algorithm
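As a rough illustration of the FIR algorithm named above, the sketch below computes y[n] as the sum over k of h[k]·x[n-k] in plain C. The function name `fir_filter` and the choice to treat samples before the start of the input as zero are assumptions made for this example, not taken from the lecture. On a real DSP the inner loop would map onto MAC instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter (illustrative sketch).
 * y[n] = sum_{k=0}^{ntaps-1} h[k] * x[n-k]
 * Assumption: input samples before x[0] are treated as zero. */
void fir_filter(const float *h, size_t ntaps,
                const float *x, float *y, size_t nsamples)
{
    assert(h != NULL && x != NULL && y != NULL && ntaps > 0);

    for (size_t n = 0; n < nsamples; ++n) {
        float acc = 0.0f;                     /* accumulator for one output sample */
        for (size_t k = 0; k < ntaps && k <= n; ++k) {
            acc += h[k] * x[n - k];           /* one multiply-accumulate per tap */
        }
        y[n] = acc;
    }
}
```

For instance, with the two-tap moving average h = {0.5, 0.5} and input x = {1, 2, 3, 4}, the output is {0.5, 1.5, 2.5, 3.5}.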

Common DSP architecture

Requirements <-> Realisations

Fast data access

- Need to transfer data to/from memory or DSP peripherals

- Need to retrieve instructions from memory

- Three main implementations:

– high-bandwidth memory architectures

– specialized addressing modes

– direct memory access

High-bandwidth memory architectures


- Only Harvard (b) and Super-Harvard (c) are used in DSPs

• Super-Harvard modification: adding to the DSP core a small bank of fast memory, called an ‘instruction cache’

• Data are also allowed to be stored in the program memory

• The most recently executed program instructions are relocated at run time to the instruction cache

• A data cache for fast access to data is also sometimes present


- Cache drawbacks:

– Problems caused by the lack of full predictability of cache hits

– A cache miss happens when the data or instructions needed by the DSP are not stored in cache memory; they then have to be fetched from slower memory, with an execution speed penalty

– A cache miss can be caused, for instance, by a change of program flow due to branch instructions.

Specialized addressing modes

• Address generator blocks control address generation for:

– specialized addressing modes such as indexed addressing, circular buffers, and bit-reversed addressing

Specialized addressing modes

• Circular buffers – used, for example, in the implementation of digital filters
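A minimal software sketch of a circular buffer used as a filter delay line follows. A DSP address generator performs the wrap-around in hardware; here it is emulated with a bit mask. The type `delay_line`, the functions `delay_push` and `delay_get`, and the power-of-two length `DELAY_LEN` are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_LEN 4u   /* hypothetical length; must be a power of two for the mask trick */

typedef struct {
    int    data[DELAY_LEN];
    size_t head;        /* index where the next sample will be written */
} delay_line;

/* Store a new sample, overwriting the oldest one. */
void delay_push(delay_line *d, int sample)
{
    assert(d != NULL);
    d->data[d->head] = sample;
    d->head = (d->head + 1u) & (DELAY_LEN - 1u);   /* wrap around at the end */
}

/* Return the sample written 'age' pushes ago (0 = newest). */
int delay_get(const delay_line *d, size_t age)
{
    assert(d != NULL && age < DELAY_LEN);
    return d->data[(d->head + DELAY_LEN - 1u - age) & (DELAY_LEN - 1u)];
}
```

No samples are ever moved in memory: only the index advances, which is why this structure suits FIR filters that need the last N input samples on every step.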

Specialized addressing modes

• Bit-reversed addressing – necessary for the FFT (butterfly)
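To make the bit-reversed ordering concrete, the sketch below reorders a block of 2^bits samples in software, which is what a radix-2 FFT needs before or after its butterfly stages. DSP address generators produce these indices in hardware; the function names `bit_reverse` and `bit_reverse_permute` are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the lowest 'bits' bits of 'index' (e.g. 3 bits: 001 -> 100). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; ++i) {
        r = (r << 1) | (index & 1u);   /* shift the lowest bit of index into r */
        index >>= 1;
    }
    return r;
}

/* Reorder n = 2^bits samples in place into bit-reversed order. */
void bit_reverse_permute(float *x, unsigned bits)
{
    assert(x != NULL && bits < 16);    /* keep 1u << bits within any unsigned */
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; ++i) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {                   /* swap each pair only once */
            float tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}
```

For eight samples numbered 0 to 7, the resulting order is 0, 4, 2, 6, 1, 5, 3, 7.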

Direct memory access

• The DMA controller is a second processor working in parallel with the DSP core

• It is dedicated to transferring information between two memory areas or between peripherals and memory

• The DMA controller frees the DSP core for other processing tasks

Direct memory access

Fast computation – MAC centered

• The MAC operation is used by many digital processing algorithms

• The basic DSP arithmetic processing blocks are:

– a) many registers

– b) one or more multipliers

– c) one or more Arithmetic Logic Units (ALUs)

– d) one or more shifters
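The way these blocks cooperate can be sketched as a fixed-point dot product: the multiplier forms each product, a wide accumulator sums them without overflow, and a shifter scales the result back. This is only a sketch under stated assumptions: the Q15 format, the function name `q15_dot`, and the use of a 64-bit integer to stand in for a DSP's 36- or 40-bit accumulator are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two Q15 vectors (illustrative sketch).
 * Each product is Q30; a 64-bit accumulator stands in for the guard bits
 * of a DSP accumulator. The result is rounded and saturated to Q15.
 * Assumption: >> on a negative value is an arithmetic shift, as on
 * mainstream compilers (it is implementation-defined in ISO C). */
int16_t q15_dot(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* multiply-accumulate */
    }

    acc = (acc + (1 << 14)) >> 15;              /* round, then shift Q30 -> Q15 */

    if (acc > INT16_MAX) acc = INT16_MAX;       /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With a = b = {0.5, 0.5} in Q15 (16384 each) the result is 16384, that is 0.5. When the sum exceeds the Q15 range, the result clamps to 32767 rather than wrapping to a negative value.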

Fast computation – MAC centered

Instruction pipelining

• Instruction pipelining consists of:

– dividing the execution of instructions into different stages

– executing different instructions in different stages in parallel.

• The net result is an increased throughput of the instruction execution.
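The throughput gain can be estimated with a simple cycle count. The sketch below assumes an ideal pipeline in which every stage takes one cycle and there are no stalls, hazards, or branch penalties; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles needed when each instruction passes through all stages
 * before the next one starts (no pipelining). */
unsigned long cycles_unpipelined(unsigned long n, unsigned long stages)
{
    return n * stages;
}

/* Cycles needed in an ideal pipeline: 'stages' cycles for the first
 * instruction, then one further instruction completes every cycle.
 * Assumption: no stalls or flushes. */
unsigned long cycles_pipelined(unsigned long n, unsigned long stages)
{
    assert(stages > 0);
    return (n == 0) ? 0 : stages + (n - 1);
}
```

Under these assumptions, 100 instructions on a 4-stage machine take 400 cycles without pipelining and 103 cycles with it. Real code falls short of this ideal whenever a branch or data dependency forces the pipeline to wait.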

Parallel architectures

• Parallel-enhanced DSP architectures started to appear on the market in the mid-1990s and were based on:

– instruction-level parallelism (VLIW),

– data-level parallelism (SIMD),

– a combination of both

Parallel architectures - VLIW

Parallel architectures - VLIW

• In VLIW, many instructions are issued at the same time and are executed in parallel by multiple execution units

• Characteristics of VLIW architectures include simple and regular instruction sets

• Instruction scheduling is done at compile time and not at run time

• Writing assembly code for a VLIW architecture is very complex, and the optimization is often better left to the compiler

Parallel architectures - SIMD

Parallel architectures - SIMD

• SIMD architectures are based on data-level parallelism

• Only one instruction is issued at a time

• The same operation specified by the instruction is performed on multiple data sets
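The idea of one operation acting on several data items can be imitated in portable C by packing two 16-bit lanes into a 32-bit word, a technique often called SWAR (SIMD within a register). This is an emulation for illustration only, not the instruction set of any particular DSP; `pack2x16` and `add2x16` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit lanes into one 32-bit word (hi lane in the upper half). */
uint32_t pack2x16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Add both lanes with a single sequence of word-wide operations.
 * Each lane wraps modulo 2^16 and no carry crosses from the low lane
 * into the high lane. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    /* Add the low 15 bits of each lane; carries stay inside the lane. */
    uint32_t low  = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    /* Combine the top bit of each lane without generating a carry. */
    uint32_t high = (a ^ b) & 0x80008000u;
    return low ^ high;
}
```

Adding the packed pairs (1, 2) and (3, 4) gives the packed pair (4, 6) in one call. Hardware SIMD units do the same across wider registers and with dedicated instructions.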

Numerical fidelity

• It is essential that the numerical fidelity be maximized

• The errors due to the finite number of bits used in the number representation and in the arithmetic operations should be minimized

• Improving numerical fidelity can be done by changing the numeric representation or by dedicated hardware features

Numerical fidelity

• DSPs can be categorized into:

– Fixed point (up to 64-bit, fractional arithmetic)

– Floating point (32- or 64-bit)
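Fractional fixed-point arithmetic can be illustrated with the common Q15 format, in which a 16-bit integer q represents the value q / 32768 in the range [-1, 1). The sketch below is an assumption-laden example rather than any vendor's library: the function names and the round-to-nearest, saturating conversion are choices made here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to Q15, rounding to nearest and saturating
 * at the ends of the representable range. */
int16_t float_to_q15(float v)
{
    float scaled = v * 32768.0f;
    if (scaled >= 32767.0f)  return INT16_MAX;   /* +1.0 is not representable */
    if (scaled <= -32768.0f) return INT16_MIN;
    /* Add or subtract 0.5 so that the truncating cast rounds to nearest. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a Q15 value back to a real value. */
float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

The value 0.5 converts exactly to 16384, whereas 0.1 becomes 3277, which converts back to roughly 0.100006. That small difference is the quantization error the previous slides refer to.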

Fast execution control

• It is important that the program in the DSP is executed in a deterministic way

• Interrupts have to be serviced with minimal latency

• An important DSP feature is the hardware implementation of looping constructs, referred to as a ‘zero-overhead hardware loop’, e.g. RPT #2 || NOP

Digital Signal Processor example

TI Multicore DSP+ARM KeyStone II System-on-Chip (SoC)

TI 66AK2H12

• Up to 5.6 GHz of ARM and 9.6 GHz of DSP processing coupled with:

– security,

– packet processing,

– Ethernet

• The raw computational performance is 38.4 GMACS/core and 19.2 GFLOPS/core (at a 1.2 GHz operating frequency)
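The headline figures are aggregates and can be checked with simple arithmetic, using the core counts and clock rates listed on the following slides (four ARM cores at up to 1.4 GHz, eight DSP cores at up to 1.2 GHz). The helper names below are hypothetical; values are kept in MHz and millions of operations per second so that integer arithmetic is exact.

```c
#include <assert.h>

/* Aggregate clock rate in MHz: number of cores times the per-core clock. */
unsigned long aggregate_mhz(unsigned long cores, unsigned long core_mhz)
{
    return cores * core_mhz;
}

/* Operations per clock cycle implied by a per-core rate given in
 * millions of operations per second at a clock of core_mhz. */
unsigned long ops_per_cycle(unsigned long mops, unsigned long core_mhz)
{
    assert(core_mhz > 0);
    return mops / core_mhz;
}
```

This gives 4 × 1400 MHz = 5.6 GHz for the ARM cores and 8 × 1200 MHz = 9.6 GHz for the DSP cores. Per DSP core, 38.4 GMACS at 1.2 GHz corresponds to 32 MACs per cycle, and 19.2 GFLOPS to 16 floating-point operations per cycle.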

TI 66AK2H12

• Eight TMS320C66x™ DSP Core Subsystems, Each With

– Up to 1.2 GHz C66x Fixed/Floating-Point DSP Cores

• 38.4 GMACS/Core for Fixed Point @ 1.2 GHz

• 19.2 GFLOPS/Core for Floating Point @ 1.2 GHz

– Memory

• 32K Byte L1P Per Core

• 32K Byte L1D Per Core

• 1024K Byte Local L2 Per Core

TI 66AK2H12

• ARM® Cortex™-A15 MPCore™ Processors Containing Four ARM Cortex-A15 Cores

– Up to 1.4-GHz Cortex-A15 Processor Core Speed

– 4MB L2 Cache Memory Shared by All ARM Cores

– Full Implementation of ARMv7-A Architecture Instruction Set

– 32KB L1 Instruction Cache and Data Cache per Cortex-A15 Processor Core

– AMBA 4.0 AXI Coherency Extension (ACE) Master Port, Connected to MSMC (Multicore Shared Memory Controller) for Low Latency Access to Shared MSMC SRAM

TI 66AK2H12

• Network Coprocessor

– Packet Accelerator Enables Support for

• Transport Plane IPsec, GTP-U, SCTP, PDCP

• L2 User Plane PDCP (RoHC, Air Ciphering)

• 1 Gbps Wire Speed Throughput at 1.5 MPackets Per Second

– Security Accelerator Engine Enables Support for

• IPSec, SRTP, 3GPP and WiMAX Air Interface, and SSL/TLS Security

• ECB, CBC, CTR, F8, A5/3, CCM, GCM, HMAC, CMAC, GMAC, AES,

DES, 3DES, Kasumi, SNOW 3G, SHA-1, SHA-2 (256-bit Hash), MD5

• Up To 6.4 Gbps IPSec and 3 Gbps Air Ciphering

– Ethernet Subsystem

• Five SGMII Port Switch

TI 66AK2H12

• Peripherals

– Four Lanes of SRIO 2.1

• 5 Gbps Operation Per Lane

• Supports Direct I/O, Message Passing

– Two Lanes PCIe Gen2

• Supports Up To 5 GBaud

– Two HyperLink Interfaces

• Supports Connections to Other KeyStone Architecture Devices

• Supports Up To 50 GBaud

– Five Enhanced Direct Memory Access (EDMA) Modules

TI 66AK2H12

• Peripherals

– Two 72-Bit DDR3 Interfaces with Speeds Up To 1600 MHz

– USB 3.0

– Two UART Interfaces

– Three I2C Interfaces

– 32 GPIO Pins

– Three SPI Interfaces

– Semaphore Module

– Twenty 64-Bit Timers

– Five On-Chip PLLs

KeyStone architecture

• High-performance structure for integrating RISC and DSP cores with application-specific coprocessors and I/O

• Four main hardware elements:

– Multicore Navigator,

– TeraNet,

– Multicore Shared Memory Controller

– HyperLink

KeyStone architecture

• Multicore Navigator:

– A packet-based manager that controls 16k queues

– When tasks are allocated to the queues, Multicore Navigator provides hardware-accelerated dispatch that directs tasks to the appropriate available hardware

• TeraNet:

– central resource to move packets, with 2 Tbps capacity

KeyStone architecture

• Multicore Shared Memory Controller:

– enables processing cores to access shared memory directly without drawing on TeraNet’s capacity, so memory accesses do not block packet movement

• HyperLink:

– provides a 50-GBaud chip-level interconnect

– Working with Multicore Navigator, HyperLink dispatches tasks to tandem devices transparently and executes tasks as if they were running on local resources

Thank you for your attention

Interesting parameters


References

[1] dsPIC family documentation; www.microchip.com

[2] www.ti.com

[3] C2000 family documentation; www.ti.com

[4] www.freescale.com

[5] 56F8000 family documentation; www.freescale.com

[6] http://www.coe.pku.edu.cn/tpic/2010913102418831.pdf

[7] http://www.dspguide.com/CH28.PDF

[8] http://www.cs.berkeley.edu/~pattrsn/252S98/Lec08-dsp.pdf

[9] http://www.ti.com/lit/ds/symlink/66ak2h12.pdf
