Advanced Microcontrollers
Grzegorz Budzyń
Lecture 11: Digital Signal Controllers & Digital Signal Processors
Plan
• Digital Signal Controllers
– Introduction
• Digital Signal Controllers vs Microcontrollers
• Digital Signal Controllers vs Digital Signal Processors
– DSC by Texas Instruments
– DSC by Freescale
• Digital Signal Processors
Digital Signal Controllers
Introduction
A Digital Signal Controller (DSC) combines the features
of a Microcontroller and a Digital Signal
Processor (DSP)
Introduction
- Like microcontrollers, DSCs have:
- Fast interrupt response
- Control-oriented peripherals (PWM, watchdog,
etc.)
- Usually programmed in C (although
assembly programming is also possible)
Introduction
- Like digital signal processors, DSCs have:
- Single-cycle multiply-and-accumulate (MAC)
instructions
- Barrel shifters
- Large Accumulators
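The three DSP features above work together in a typical inner loop. The following is a minimal sketch in portable C, not vendor code: the function name dot_q15 is hypothetical, and int64_t merely stands in for a wide hardware accumulator. It also assumes that right-shifting a negative value is an arithmetic shift, which mainstream compilers provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two Q15 vectors (hypothetical helper).
 * The accumulator is wider than a single 32-bit product, so many
 * products can be summed before any overflow handling is needed;
 * int64_t is used here in place of a hardware accumulator. */
int16_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;                              /* wide accumulator */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];     /* one MAC per iteration */
    }
    acc >>= 15;                                   /* shifter: Q30 back to Q15 */
    if (acc > INT16_MAX) acc = INT16_MAX;         /* saturate, do not wrap */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

On a DSC the multiply and the add inside the loop would typically map to a single MAC instruction, and the final shift to the barrel shifter.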
Introduction
- Main applications of DSCs:
- Motor control
- Power conversion
- Sensor processing applications
Introduction
- Main DSC vendors:
- Texas Instruments
- NXP
- Microchip
- Infineon
- Renesas
DSC by Microchip
dsPIC – application areas
Source: [1]
dsPIC - Architecture
• dsPIC – family of 16 bit RISC controllers with
DSP features
• Two subfamilies:
– dsPIC30F – smaller, slower
– dsPIC33F – highest performance
• The only digital signal controller on the
market available in QFN-28 packages, at prices
down to $3
dsPIC - Architecture
• Main features:
– Modified Harvard architecture,
– Optimized for C-compilers
– Two 40-bit accumulators with rounding
– Memory options as in PIC24
– Many single-cycle MAC operations
– Packages with 18 to 110 pins
DSC by Texas Instruments
DSC - TI portfolio
• Four main subfamilies:
– 24x 16-bit Series
– 28x Fixed point Series
– 28x Piccolo Series
– 28x Delfino Floating-point Series
DSC - TI portfolio
Source: [2]
Piccolo Series
• TMS320F2802x:
– fixed point microcontrollers
– 40-60MHz performance
– up to 64KB of on-chip flash
– small 38-pin package options
– feature rich peripherals:
• 150-ps high resolution enhanced pulse width modulators
(ePWMs)
• 4.6 MSPS 12-bit ADC
• high precision on-chip oscillators, analog comparators
• high speed 12-bit ADC
• support for I2C, SPI, and SCI.
Piccolo Series
- TMS320F2803x:
- fixed point 32-bit microcontrollers
- 60 MHz speed
- up to 128KB flash memory
- 64 or 80-pin packages
- peripherals and features of the 2802x devices plus:
- control law accelerator (CLA) for high efficiency control loops
- QEP module
- CAN and LIN interfaces
Delfino Series
• TMS320F2833x:
– Integrated floating point unit simplifies
development and speeds control applications up
by an average of 50%
– F2833x devices run at up to 150 MHz (300
MFLOPS) with two package offerings that are pin-
for-pin compatible within all F2833x and F2823x
controllers
– Features up to 512KB of on-chip flash and a DMA
for high speed memory access.
Delfino Series
• TMS320C2834x:
– delivers up to 600 MFLOPS of floating-point
performance
– up to 516KB of single-access RAM
– PWMs with 65-ps resolution
– Direct Memory Access and a low-latency core
make the C2834x an excellent solution for
performance-hungry real-time control
applications.
28x Fixed-Point Series
• TMS320F2823x:
– F2823x generation of controllers is a fixed point
version of the F2833x devices
– Pin-to-pin compatible with the F2833x series, all
of the peripherals and features remain the same
except for the floating point unit.
28x Fixed-Point Series
• TMS320F280x:
– devices offer 60-100 MHz performance
– the first generation to feature:
• the on-chip 12.5 MSPS 12-bit ADC
• multiple high resolution PWM peripherals
• QEP (quadrature encoder pulse)
– F280xx devices have up to 256KB of flash
memory.
28x Fixed-Point Series
• TMS320F281x:
– F281x device generation features:
• 150 MHz core
• flexible Event Managers that provide access to timers,
compare/PWM units, captures, and quadrature-
encoder units
C2000
Architecture
Source: [3]
C2000 core features
• Main features:
– Efficient C engine with hardware that allows a C
compiler to generate compact code, resulting in
industry-leading code density
– Single cycle read-modify-write instructions, single
cycle 32-bit multiply.
– Fast interrupt service time (down to 9 cycles) with
automatic zero-cycle context save.
– 96 dedicated interrupt vectors that require no
software decision making
C2000 core features
• Main features:
– 32-bit floating-point unit on Delfino controllers
– On select Piccolo devices, an independent Control Law
Accelerator (CLA) processes floating-point control loops to
free the CPU for other purposes.
– Three 32-bit general purpose CPU timers bring accuracy
and flexibility to any application.
– Code Security Module prevents reverse engineering and
protects valuable intellectual property
DSC by Freescale
DSC by Freescale
Freescale microcontrollers
portfolio
Source: [4]
DSC by Freescale
Freescale DSC portfolio
Source: [4]
DSC by Freescale
56F8000 block diagram
Source: [5]
DSC by Freescale
56F8000 details
DSC by Freescale
56F8000 application
Source: [5]
56F8XXX comparison
Source: [5]
C2000 core features
Source: [5]
MC56F8357
• Main features:
– On-chip memory includes high-speed volatile and
nonvolatile components:
• 512 KB of Program Flash
• 4 KB of Program RAM (836X Devices)
• 32 KB of Data RAM
• 32 KB of Data Flash (836X Devices)
• 32 KB of Boot Flash
– Access up to 4MB of off-chip program and 32MB
of data memory
– Up to 60 MIPS at 60 MHz execution frequency
MC56F8357
• Main features:
– Four 12-bit, Analog-to-Digital Converters
– Temperature Sensor
– Up to two FlexCAN (CAN Version 2.0 B-compliant)
– Two Serial Communication Interfaces (SCIs)
– Up to two Serial Peripheral Interfaces (SPIs)
– Two dedicated external interrupt pins
– Software-programmable Phase-Lock Loop
MC56F8357
MC56F8357 - Core
• Main features 1/2:
– Efficient 16-bit 56800E family controller engine
with dual Harvard architecture
– Single-cycle 16 × 16-bit parallel Multiplier-Accumulator
(MAC)
– Four 36-bit accumulators, including extension bits
– Arithmetic and logic multi-bit shifter
– Parallel instruction set with unique DSP
addressing modes
– Hardware DO and REP loops
MC56F8357 - Core
• Main features 2/2:
– Three internal address buses and one external
address bus
– Four internal data buses and one external data bus
– Instruction set supports both DSP and controller
functions
– Controller-style addressing modes and instructions for
compact code
– Efficient C compiler and local variable support
– Software subroutine and interrupt stack with depth
limited only by memory
MC56F8357 – Memory
MC56F8357 – Core architecture
MC56F8357 – Core pipeline
Digital Signal Processors
DSP Introduction
- Digital Signal Processing: application of
mathematical operations to digitally
represented signals
- Signals represented digitally as sequences of
samples
- Digital signals obtained from physical signals
via transducers (e.g., microphones) and
analog-to-digital converters (ADC)
DSP Introduction
- Digital signals converted back to physical
signals via digital-to-analog converters (DAC)
- Digital Signal Processor (DSP): electronic
system that processes digital signals
DSP Introduction
- Most DSP tasks require:
- Repetitive numeric computations
- Attention to numeric fidelity
- High memory bandwidth, mostly via array
accesses
- Real-time processing
DSP Introduction
- DSPs must perform these tasks efficiently
while minimizing:
- Cost
- Power
- Memory use
- Development time
Common DSP applications
- Applications:
- Instrumentation and measurement
- Communications
- Audio and video processing
- Graphics, image enhancement, 3-D rendering
- Navigation, radar, GPS
- Control - robotics, machine vision, guidance
Common DSP algorithms
- Algorithms
- Frequency domain filtering - FIR and IIR
- Frequency-time transformations - FFT
- Correlation
FIR algorithm
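As an illustration of the FIR algorithm, the sketch below computes the standard direct-form sum y[n] = h[0]·x[n] + h[1]·x[n-1] + ... in floating point. The function name fir_filter and its signature are hypothetical, and samples before the start of the input are assumed to be zero.

```c
#include <stddef.h>

/* Direct-form FIR filter (hypothetical helper):
 * y[n] = sum over k of h[k] * x[n - k], for k = 0 .. ntaps-1.
 * Input samples before x[0] are treated as zero. */
void fir_filter(const float *h, size_t ntaps,
                const float *x, float *y, size_t nsamples)
{
    for (size_t n = 0; n < nsamples; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps && k <= n; k++) {
            acc += h[k] * x[n - k];   /* one multiply-accumulate per tap */
        }
        y[n] = acc;
    }
}
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a quick way to check an implementation.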
Common DSP architecture
Requirements <-> Realisations
Fast data access
- Data must be transferred to and from memory
or DSP peripherals
- Instructions must be fetched from memory
- Three main implementations:
– high-bandwidth memory architectures
– specialized addressing modes
– direct memory access
High-bandwidth memory architectures
- Only Harvard (b) and Super-Harvard (c) architectures are used in DSPs
• Super-Harvard modification - adding to the DSP core a small bank of fast memory, called ‘instruction cache’
• Data are also allowed to be stored in the program memory
• The last-executed program instructions are relocated at run time in the instruction cache
High-bandwidth memory architectures
• A data cache for fast access to data is
sometimes present as well
High-bandwidth memory architectures
- Cache drawbacks:
– Problems caused by the lack of full predictability of
cache hits
– A cache miss happens when the data or the
instructions needed by the DSP are not stored in
cache memory, hence they have to be fetched from a
slower memory with an execution speed penalty
– A cache miss can be caused, for instance, by
the flow change due to branch instructions.
Specialized addressing modes
• Address generator blocks control address
generation for:
– specialized addressing modes such as indexed
addressing, circular buffers, and bit-reversal
addressing
Specialized addressing modes
• Circular buffers – used, for example, in the
implementation of digital filters
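What the address generator does in hardware can be imitated in software. The sketch below is a software stand-in only: the names delay_line_t, delay_push and delay_get and the length DELAY_LEN are hypothetical. The length is assumed to be a power of two so that wrap-around reduces to a bit mask.

```c
#include <stdint.h>

#define DELAY_LEN 8u   /* hypothetical length; must be a power of two */

typedef struct {
    int16_t  data[DELAY_LEN];
    unsigned head;                 /* index of the newest sample */
} delay_line_t;

/* Store a new sample, overwriting the oldest one. */
void delay_push(delay_line_t *d, int16_t sample)
{
    d->head = (d->head + 1u) & (DELAY_LEN - 1u);   /* wrap around */
    d->data[d->head] = sample;
}

/* Read the sample that is `age` pushes old (0 = newest). */
int16_t delay_get(const delay_line_t *d, unsigned age)
{
    return d->data[(d->head - age) & (DELAY_LEN - 1u)];
}
```

A filter then reads its taps with delay_get(&d, k) instead of shifting the whole delay line on every new sample; with hardware circular addressing the masking step costs no extra instructions.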
Specialized addressing modes
• Bit-reversal addressing – necessary for FFT
(butterfly)
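Radix-2 FFT algorithms consume or produce their data in bit-reversed index order, which is why dedicated addressing support helps. The reordering can be sketched in plain C as follows; the function names are hypothetical and the array length is assumed to be 2^bits.

```c
/* Reverse the lowest `bits` bits of `index` (hypothetical helper). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);   /* move the lowest bit of index into r */
        index >>= 1;
    }
    return r;
}

/* Reorder x[0 .. 2^bits - 1] in place into bit-reversed order. */
void bit_reverse_permute(float *x, unsigned bits)
{
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {                   /* swap each pair only once */
            float tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}
```

For eight points the index order becomes 0, 4, 2, 6, 1, 5, 3, 7. A DSP with bit-reversal addressing generates these addresses directly, so the software loop above is not needed.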
Direct memory access
• The DMA controller is a second processor
working in parallel with the DSP core
• It is dedicated to transferring information
between two memory areas or between
peripherals and memory
• The DMA controller frees the DSP core for
other processing tasks
Fast computation – MAC centered
• The MAC operation is used by many digital
processing algorithms
• The basic DSP arithmetic processing blocks are:
– a) many registers
– b) one or more multipliers
– c) one or more Arithmetic Logic Units (ALUs)
– d) one or more shifters
Instruction pipelining
• Instruction pipelining consists of:
– dividing the execution of instructions into different
stages
– executing different instructions in different stages
in parallel.
• The net result is an increased throughput of the
instruction execution.
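The throughput gain can be estimated with an idealized model. The sketch below assumes that every stage takes one cycle and that there are no stalls or branch penalties, which real pipelines do not guarantee; the function names are hypothetical.

```c
/* Ideal cycle counts for n instructions on a pipeline with `stages`
 * one-cycle stages (assumption: no stalls, no branch penalties). */
unsigned long cycles_unpipelined(unsigned long n, unsigned long stages)
{
    return n * stages;                      /* each instruction runs alone */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long stages)
{
    return n == 0 ? 0 : stages + (n - 1);   /* fill once, then one per cycle */
}
```

Under these assumptions 100 instructions on a 4-stage pipeline need 103 cycles instead of 400, so throughput approaches one instruction per cycle.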
Parallel architectures
• Parallel-enhanced DSP architectures started to
appear on the market in the mid 1990s and
were based on:
– instruction-level parallelism (VLIW),
– data-level parallelism (SIMD),
– a combination of both
Parallel architectures - VLIW
• In VLIW many instructions are issued at the
same time and are executed in parallel by multiple execution units
• Characteristics of VLIW architectures include simple and regular instruction sets
• Instruction scheduling is done at compile-time and not at run-time
• Writing assembly code for VLIW architectures is very complex and the optimization is often better left to the compiler
Parallel architectures - SIMD
• SIMD architectures are based on data-level
parallelism
• Only one instruction is issued at a time
• The same operation specified by the instruction
is performed on multiple data sets
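The idea of one operation acting on several data items can be imitated in portable C by packing two 16-bit lanes into one 32-bit word. This is only an emulation under that packing assumption, with a hypothetical function name; SIMD hardware performs the lane-wise addition as a single instruction.

```c
#include <stdint.h>

/* Add two pairs of 16-bit lanes packed into 32-bit words.
 * Each lane wraps around independently: a carry out of the low
 * lane must not spill into the high lane. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;                  /* low lane, carry dropped */
    uint32_t hi = (a & 0xFFFF0000u) + (b & 0xFFFF0000u);  /* high lane only */
    return hi | lo;
}
```

For example, adding 0x0001FFFF and 0x00010001 yields 0x00020000: the low lane wraps from 0xFFFF to 0x0000 while the high lane is computed as 1 + 1 without receiving the carry.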
Numerical fidelity
• It is essential that the numerical fidelity be
maximized
• The errors due to the finite number of bits used
in the number representation and in the
arithmetic operations should be minimized
• Improving numerical fidelity can be done by
changing the numeric representation or by
dedicated hardware features
Numerical fidelity
• DSPs can be categorized into:
– Fixed point (up to 64-bit, fractional arithmetic)
– Floating point (32- or 64-bit)
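Fractional fixed-point arithmetic is commonly done in the Q15 format, where a 16-bit integer raw represents the value raw / 32768. The sketch below shows a Q15 multiply with rounding and saturation. The function name is hypothetical, and the code assumes an arithmetic right shift for negative values, which mainstream compilers provide.

```c
#include <stdint.h>

/* Multiply two Q15 fractions (value = raw / 32768)
 * with round-to-nearest and saturation. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    p = (p + (1 << 14)) >> 15;             /* round, then scale back to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* only -1.0 * -1.0 can overflow */
    return (int16_t)p;
}
```

With this encoding 0.5 × 0.5 is q15_mul(16384, 16384) and returns 8192, which is 0.25. Fixed-point DSPs usually provide the rounding and saturation steps in hardware.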
Fast execution control
• It is important that the program in the DSP is
executed in a deterministic way
• Interrupts have to be serviced with minimal
latency
• An important DSP feature is the
implementation by hardware of looping
constructs, referred to as ‘zero-overhead
hardware loop’ - e.g. RPT #2 || NOP
Digital Signal Processor example
TI Multicore DSP+ARM KeyStone II System-on-Chip (SoC)
TI 66AK2H12
• Up to 5.6 GHz of ARM and 9.6 GHz of DSP
processing coupled with:
– security,
– packet processing,
– Ethernet
• The raw computational performance is 38.4
GMACS/core and 19.2 Gflops/core (@ 1.2 GHz
operating frequency)
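The headline figures above are aggregates and can be unpacked with simple arithmetic. The sketch below uses the core counts and clock rates quoted in the device feature list (eight DSP cores at 1.2 GHz, four ARM cores at 1.4 GHz); the function names are hypothetical and all values are in MHz or millions of operations per second.

```c
/* Operations completed per clock cycle, from a rate in millions of
 * operations per second and a clock in MHz. */
unsigned ops_per_cycle(unsigned mega_ops_per_s, unsigned clock_mhz)
{
    return mega_ops_per_s / clock_mhz;
}

/* Sum of the clock rates of identical cores, in MHz. */
unsigned aggregate_mhz(unsigned cores, unsigned clock_mhz)
{
    return cores * clock_mhz;
}
```

ops_per_cycle(38400, 1200) gives 32 fixed-point MACs per cycle per core and ops_per_cycle(19200, 1200) gives 16 floating-point operations per cycle per core. aggregate_mhz(8, 1200) is 9600 and aggregate_mhz(4, 1400) is 5600, which matches the quoted 9.6 GHz of DSP and 5.6 GHz of ARM: these are sums over cores, not single-core clock rates.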
TI 66AK2H12
• Eight TMS320C66x™ DSP Core Subsystems,
Each With
– Up to 1.2 GHz C66x Fixed/Floating-Point DSP
Cores
• 38.4 GMacs/Core for Fixed Point @ 1.2 GHz
• 19.2 GFlops/Core for Floating Point @ 1.2 GHz
– Memory
• 32K Byte L1P Per Core
• 32K Byte L1D Per Core
• 1024K Byte Local L2 Per Core
TI 66AK2H12
• ARM® Cortex™-A15 MPCore™ Processors Containing
Four ARM Cortex-A15 Cores
– Up to 1.4-GHz Cortex-A15 Processor Core Speed
– 4MB L2 Cache Memory Shared by All ARM Cores
– Full Implementation of ARMv7-A Architecture Instruction
Set
– 32KB L1 Instruction Cache and Data Cache per Cortex-A15
Processor Core
– AMBA 4.0 AXI Coherency Extension (ACE) Master Port,
Connected to MSMC (Multicore Shared Memory
Controller) for Low Latency Access to Shared MSMC SRAM
TI 66AK2H12
• Network Coprocessor
– Packet Accelerator Enables Support for
• Transport Plane IPsec, GTP-U, SCTP, PDCP
• L2 User Plane PDCP (RoHC, Air Ciphering)
• 1 Gbps Wire Speed Throughput at 1.5 MPackets Per Second
– Security Accelerator Engine Enables Support for
• IPSec, SRTP, 3GPP and WiMAX Air Interface, and SSL/TLS Security
• ECB, CBC, CTR, F8, A5/3, CCM, GCM, HMAC, CMAC, GMAC, AES,
DES, 3DES, Kasumi, SNOW 3G, SHA-1, SHA-2 (256-bit Hash), MD5
• Up To 6.4 Gbps IPSec and 3 Gbps Air Ciphering
– Ethernet Subsystem
• Five SGMII Port Switch
TI 66AK2H12
• Peripherals
– Four Lanes of SRIO 2.1
• 5 Gbps Operation Per Lane
• Supports Direct I/O, Message Passing
– Two Lanes PCIe Gen2
• Supports Up To 5 GBaud
– Two HyperLink
• Supports Connections to Other KeyStone Architecture Devices
• Supports Up To 50 GBaud
– Five Enhanced Direct Memory Access (EDMA) Modules
TI 66AK2H12
• Peripherals
– Two 72-Bit DDR3 Interfaces with Speeds Up To 1600 MHz
– USB 3.0
– Two UART Interfaces
– Three I2C Interfaces
– 32 GPIO Pins
– Three SPI Interfaces
– Semaphore Module
– Twenty 64-Bit Timers
– Five On-Chip PLLs
KeyStone architecture
• High performance structure for integrating
RISC and DSP cores with application-specific
coprocessors and I/O
• Four main hardware elements:
– Multicore Navigator,
– TeraNet,
– Multicore Shared Memory Controller
– HyperLink
KeyStone architecture
• Multicore Navigator:
– A packet-based manager that controls 16k queues
– When tasks are allocated to the queues, Multicore
Navigator provides hardware-accelerated dispatch
that directs tasks to the appropriate available
hardware
• TeraNet:
– central resource to move packets with 2 Tbps
capacity!
KeyStone architecture
• Multicore Shared Memory Controller:
– enables processing cores to access shared memory directly without drawing from the TeraNet’s capacity – no blocking of packet movement by memory access
• HyperLink:
– provides a 50-GBaud chip-level interconnect
– Working with Multicore Navigator, HyperLink dispatches tasks to tandem devices transparently and executes tasks as if they are running on local resources
Thank you for your attention
References
[1] dsPIC family documentation; www.microchip.com
[2] www.ti.com
[3] C2000 family documentation; www.ti.com
[4] www.freescale.com
[5] 56F8000 family documentation; www.freescale.com
[6] http://www.coe.pku.edu.cn/tpic/2010913102418831.pdf
[7] http://www.dspguide.com/CH28.PDF
[8] http://www.cs.berkeley.edu/~pattrsn/252S98/Lec08-dsp.pdf
[9] http://www.ti.com/lit/ds/symlink/66ak2h12.pdf