October 11, 2019 Sam Siewert
CEC 320 and 322 Microprocessor Systems
Class and Lab
Lecture 8 - DSP MCUs
Lab (old #6) #7 Demo - Video
Use 4 GPIO pins to activate 4 phases of a stepper motor with a timer ISR
Stepper motor has 4 phases, but N steps per rotation (e.g. 48 teeth, 7.5 degrees apart)
– Turn Motor Off (no coils activated)
– Low Torque - one coil activated (4 steps)
– Full Torque - 2 adjacent coils activated (4 steps)
– Half Step - one coil, then same coil with adjacent (8 steps)
RPM Mode: RPM = N steps/sec / 48 steps/rev x 60 sec/min
Follow Mode: steps commanded as potentiometer position changes
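The RPM Mode relationship can be sketched in a few lines. This is a minimal illustration, not the lab's code: the function names are hypothetical, and the default of 48 steps per revolution is the 7.5-degree motor described above.

```python
STEPS_PER_REV = 48  # 7.5-degree stepper from the lab description

def rpm_from_step_rate(steps_per_sec, steps_per_rev=STEPS_PER_REV):
    """RPM = steps/sec / steps/rev x 60 sec/min."""
    return steps_per_sec * 60.0 / steps_per_rev

def step_period_s(rpm, steps_per_rev=STEPS_PER_REV):
    """Seconds between steps (timer ISR period) for a target RPM."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / (rpm * steps_per_rev)

if __name__ == "__main__":
    print(rpm_from_step_rate(400))      # step rate of 400 pulses per second
    print(step_period_s(120) * 1e3)     # period in milliseconds at 120 RPM
```

The second function is the one a timer ISR setup would need: it converts a commanded RPM into the interval between step interrupts.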
Example Menu and Commands
4 phase Stepper_motor
Full Torque GPIO Signal Verification with AD2
Full Torque: 0xC, 0x6, 0x3, 0x9
Coils 1-2, 2-3, 3-4, 4-1: 1100, 0110, 0011, 1001
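The drive modes can be written as lookup tables of 4-bit GPIO patterns. Only the Full Torque table (0xC, 0x6, 0x3, 0x9) comes from the slide. The Low Torque and Half Step tables are assumptions derived from the mode descriptions, and `next_pattern` is a hypothetical helper standing in for what a timer ISR would do on each tick.

```python
# Bit 3..0 correspond to coils 1..4, so 0xC = 1100 energizes coils 1 and 2.
LOW_TORQUE = [0x8, 0x4, 0x2, 0x1]    # assumed: one coil at a time
FULL_TORQUE = [0xC, 0x6, 0x3, 0x9]   # from the slide: two adjacent coils
# Assumed: one coil, then the same coil with its neighbor (8 steps).
HALF_STEP = [0x8, 0xC, 0x4, 0x6, 0x2, 0x3, 0x1, 0x9]
MOTOR_OFF = 0x0                      # no coils activated

def next_pattern(sequence, index, direction=1):
    """Advance one step (+1 forward, -1 reverse); return (index, pattern)."""
    index = (index + direction) % len(sequence)
    return index, sequence[index]

if __name__ == "__main__":
    idx = 0
    for _ in range(4):
        idx, pattern = next_pattern(FULL_TORQUE, idx)
        print(f"{pattern:04b}")
```

Reversing the motor only requires walking the same table backward, which is why the direction is a parameter rather than a second table.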
Board Configuration & OLED Display - Verify
Darlington Array Stepper Motor Circuit
conversion-calculator-resistor-color-code-4-band
Darlington Drives Stepper Motor (5VDC for AIRPAX, 150mA)
GPIO controls coil activation for Low, Full Torque and ½ step
Lab calls for resistors, but shows inductors in-line
Demonstrate
1) Fwd at RPM
2) Rev at RPM
3) Follow Fwd
4) Follow Rev
5) Turn Off Motor
Video Demonstration for AIRPAX Stepper Motor
http://mercury.pr.erau.edu/~siewerts/cec320/video/Lab-7/
Inductors pictured (not resistors) - https://www.electronicshub.org/inductor-color-code/
Unipolar 4-phase Stepper
Rcoil = 9.1 Ohms
5 VDC, rated to 550 mA
E.g. Rseries + Rcoil = 50 Ohms, 100 mA
Step angle = 7.5 deg
– 48 steps / revolution
– 10.4 ounce-inch torque
Lcoil = 10 mH
1 RPM to 120 RPM for demo (limit?)
Follow mode
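The 50 Ohm / 100 mA example follows from Ohm's law at 5 VDC. The sketch below works that out and also computes the coil's L/R time constant, which gives a rough sense of how fast the coil current can build up. It assumes an ideal driver (the Darlington's voltage drop is ignored), and the function names are hypothetical.

```python
def series_resistance_ohms(v_supply, i_target_a, r_coil_ohms):
    """Series resistor that limits steady-state coil current to i_target_a."""
    r_total = v_supply / i_target_a   # Ohm's law: R = V / I
    return r_total - r_coil_ohms

def coil_time_constant_s(l_coil_h, r_total_ohms):
    """L/R time constant of the coil plus series resistance."""
    return l_coil_h / r_total_ohms

if __name__ == "__main__":
    r_series = series_resistance_ohms(5.0, 0.100, 9.1)
    tau = coil_time_constant_s(10e-3, 50.0)
    print(f"Rseries = {r_series:.1f} Ohms, tau = {tau * 1e3:.2f} ms")
```

With the slide's values, the series resistor comes to about 40.9 Ohms and the time constant to about 0.2 ms.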
8.33 rev/sec at 400 pps
8.33 rev/sec x 60 sec/min = 500 RPM
Precise 4-Phase Unipolar Stepper
Rcoil = 7.5 Ohms, 6.0 VDC, rated 800 mA, Lcoil = 6.6 mH
Step = 1.8 deg, 200 steps / revolution, 30.5 oz-inch torque
Small Unipolar 4-phase Stepper
Rcoil = 20.0 Ohms, 5 VDC, rated 250 mA, Lcoil = 3.9 mH
Step = 18 deg, 20 steps / revolution, 1.11 oz-inch torque
20 rev/sec at 400 pps
20 rev/sec x 60 sec/min = 1200 RPM
Chapter 2: Instruction sets
2a Preliminaries
  – Video 2.1.1 Computer architecture taxonomy.
  – 2.1.2 Assembly language.
2b ARM Processor
2c TI DSP
  – C64x family – Modern High Performance DSP
  – C55x – Older Low Performance DSP
2d x86/AMD64
13-Feb-20 /erau/cec320/s19/btd
DSP Introduction
• Digital Signal Processing: application of mathematical operations to digitally represented signals
• Signals represented digitally as sequences of samples
• Digital signals obtained from physical signals via transducers (e.g., microphones) and analog-to-digital converters (ADC)
• Digital signals converted back to physical signals via digital-to-analog converters (DAC)
• Digital Signal Processor (DSP): electronic system that processes digital signals
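To make "sequences of samples" concrete, here is a minimal sketch of an idealized ADC and DAC. It assumes a unipolar input range of 0 to `v_ref` and a uniform quantizer that clips out-of-range inputs. The names `adc_sample` and `dac_output` and the 8-bit default are illustrative choices, not taken from the slides.

```python
def adc_sample(signal, sample_rate_hz, n_samples, bits=8, v_ref=1.0):
    """Sample signal(t) at a fixed rate and quantize to unsigned integer codes."""
    levels = 1 << bits
    codes = []
    for n in range(n_samples):
        v = signal(n / sample_rate_hz)        # value at sample time n / fs
        code = int(v / v_ref * levels)        # uniform quantization
        codes.append(min(max(code, 0), levels - 1))  # clip to the code range
    return codes

def dac_output(codes, bits=8, v_ref=1.0):
    """Map integer codes back to voltages (ideal DAC, no reconstruction filter)."""
    return [c * v_ref / (1 << bits) for c in codes]

if __name__ == "__main__":
    ramp = lambda t: 100.0 * t                # hypothetical slowly rising input
    codes = adc_sample(ramp, 1000.0, 5)
    print(codes, dac_output(codes))
```

Everything a DSP does happens between these two functions: it operates on the list of integer codes, not on the physical signal.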
DSP algorithms and applications
• Applications
  – Instrumentation and measurement
  – Communications
  – Audio and video processing
  – Graphics, image enhancement, 3-D rendering
  – Navigation, radar, GPS
  – Control - robotics, machine vision, guidance
• Algorithms
  – Frequency domain filtering - FIR and IIR
  – Frequency-time transformations - FFT
  – Correlation
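As one instance of the algorithms listed, a direct-form FIR filter computes y[n] = sum over k of h[k] * x[n-k]. The sketch below is a plain reference version in floating point, with samples before the start of the input assumed to be zero. The function name is hypothetical, and a real DSP would run this with multiply-accumulate hardware rather than nested Python loops.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * samples[n - k], x[n<0] = 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]   # one multiply-accumulate per tap
        out.append(acc)
    return out

if __name__ == "__main__":
    # 4-tap moving average applied to a step input
    print(fir_filter([4, 4, 4, 4, 4, 4], [0.25, 0.25, 0.25, 0.25]))
```

Feeding in a unit impulse returns the coefficients themselves, which is a quick way to check any FIR implementation.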
What Do DSPs Need to Do Well?
• Most DSP tasks require:
  – Repetitive numeric computations
  – Attention to numeric fidelity
  – High memory bandwidth, mostly via array accesses
  – Real-time processing
• DSPs must perform these tasks efficiently while minimizing:
  – Cost
  – Power
  – Memory use
  – Development time
Who Cares?
• DSP is a key enabling technology for many types of electronic products
• DSP-intensive tasks are the performance bottleneck in many computer applications today
• Computational demands of DSP-intensive tasks are increasing very rapidly
• In many embedded applications, general-purpose microprocessors are not competitive with DSP-oriented processors today
• 1997 market for DSP processors: $3 billion
A Tale of Two Cultures
• General Purpose Microprocessor traces roots back to Eckert, Mauchly, Von Neumann (ENIAC)
• DSP evolved from Analog Signal Processors, using analog hardware to transform physical signals (classical electrical engineering)
• ASP to DSP because
  – DSP insensitive to environment (e.g., same response in snow or desert if it works at all)
  – DSP performance identical even with variations in components; the behavior of 2 analog systems varies even if built with the same components at a tight 1% tolerance
• Different history and different applications led to different terms, different metrics, some new inventions
• Increasing markets leading to cultural warfare
TI Product Family
• Tiva C Series
  – Embedded ARM Microcontrollers
• Sitara
  – ARM Cortex-A multiprocessors for general purpose computing
  – Used in BeagleBone products
• MSP430
  – Low Power Embedded
• Hercules
  – Safety Critical / Real Time ARM Cortex-R
• DaVinci & C6000
  – DSP for specific applications
• Keystone
  – Heterogeneous Multiprocessors encompassing both ARM & DSP processors
TI DSP Products
[Figure: TI DSP product lines, labeled C55x and C64x]
DSP SoC approach
• TI “Keystone” Family
  – DSP first, other 2nd
  – Preliminary Data
• DSP as a co-processor to the CPU in the same IC
• Product available with double this processing
• Computational performance: 38.4 GMACs/core and 19.2 GFLOPS/core
• C66x is 100% backward compatible with software for C64x+ devices.
• C66x incorporates 90 new instructions targeted for floating point (FPi) and vector math oriented (VPi) processing.
• $165 ea in 1k units (Mar. 2015)
“General Use” DSP approach
• TI “Sitara” Family
• Released Oct ’15
• ARM first, DSP for specific algorithm optimization
• Used in BeagleBoard x15
• IC cost ~$75
• More than 10 processing cores
DSP Summary Viewing
• Dream It. Do It. DSP It. Learn How TI DSPs have evolved to unlock your designs.
  – https://youtu.be/vaxsnYFeqaY
  – Summary of DSP Evolution & Application
  – In the context of “IoT” applications
  – I _Like_ the logical placement of the DSP in the interface to sensors
• C64x Instruction Set
  – https://youtu.be/HRqIxtBi_E4
  – Marilyn Wolf – Textbook Author
  – Monotone delivery of slides as provided (some overlap)
TI C64x instruction set
• Specific IC
• C64x Architectural organization
• C64x Instruction set
Range of TI C6xx products
C64x IC – TMS320C6455
• TMS320C64x+™ DSP Core
  – C64x Product evolution
• 8 32-bit instructions per cycle
C64x+ core
• 8 instructions per cycle, but they are of a restricted type
• Instructions must also be partitioned to use separate register sets
C64x Datapath
• Eight parallel execution functional units
  – Each functional unit has a unique set of instructions which it can execute
• Two separate datapaths
  – Each has a unique & separate set of registers
C64x Instruction Set
• Instructions fetched in 256-bit (8-word; 32-byte) fetch packets
  – Conventional instructions are 32 bits long
  – Some 16-bit compact instructions supported
  – 8 instructions in packed VLIW word
.D functional unit Opcodes
• Fundamentally Load &/or Store
• Only the (2) D functional units have access to the memory subsystem
• Ability to load/store bytes, half-words, words or arrays of words, long, double-words and non-aligned
Addressing Modes
• For DSP applications – lots of pre/post incrementing for continuous operation loops
.L functional unit opcodes
• ALU, Logical, min/max, packing & unpacking vectors
  – Catch-all for simple computational operations
• Low hardware resources as compared to some other functional units
.M functional unit Opcodes
• M for Multiply
• Most instructions utilize the multiply functional units
• Lots of variants on multiply, multiply-accumulate and dot-product
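A dot product is simply a chain of multiply-accumulate (MAC) steps, which is why these instructions dominate DSP inner loops. The sketch below models 16-bit operands feeding an accumulator wide enough not to overflow. It illustrates the arithmetic only and does not model any particular C64x instruction; the function name is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mac_dot_product(a, b):
    """Dot product of two 16-bit integer vectors via repeated multiply-accumulate."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0  # Python ints never overflow; stands in for a wide accumulator
    for x, y in zip(a, b):
        if not (INT16_MIN <= x <= INT16_MAX and INT16_MIN <= y <= INT16_MAX):
            raise ValueError("operands must fit in 16 bits")
        acc += x * y  # one MAC step
    return acc

if __name__ == "__main__":
    print(mac_dot_product([1, 2, 3], [4, 5, 6]))
```

Each 16 x 16 product needs up to 32 bits, and summing many of them needs more, which motivates the wide accumulators and guard bits found in DSP datapaths.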
.S functional unit Opcodes
• Shift & Branch, compare, packing & unpacking instructions
Non-specific unit opcodes
• High level functions & software pipelined loops
C64x Pipeline
• Four Fetch Stages
• Two Decode stages
• Variable Execution stages
• Shading indicates not all functional units in use each cycle
Relevant Terms
• Software Pipelining (SPLOOP)
  – Writing a loop in such a way that a new result can be generated each cycle by distributing the operations of a single high-level loop iteration across the functional unit(s) available in a single execution cycle
• Delay Slots
  – Violation of von Neumann architecture
  – Machine code requires understanding that the result of an instruction cannot be used by the next instruction
  – The delay between specifying a destination and being able to use the (new) result as an operand is the number of delay slots associated with the generating instruction
• Circular Addressing
  – Allows for a continuous evaluation of incoming & outgoing buffers being produced &/or consumed by independent parallel threads
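Circular addressing can be emulated in software with modular index arithmetic. The sketch below is a minimal delay-line buffer: the class and method names are hypothetical, and on a DSP the wrap-around would be done by the address-generation hardware rather than by an explicit `%` operation.

```python
class CircularBuffer:
    """Fixed-size delay line; indices wrap around modulo the buffer size."""

    def __init__(self, size):
        self.data = [0] * size
        self.head = 0  # index of the next write

    def push(self, sample):
        """Store a new sample, overwriting the oldest one once full."""
        self.data[self.head] = sample
        self.head = (self.head + 1) % len(self.data)  # modular wrap

    def tap(self, delay):
        """Return the sample written `delay` pushes ago (0 = newest)."""
        return self.data[(self.head - 1 - delay) % len(self.data)]

if __name__ == "__main__":
    buf = CircularBuffer(4)
    for s in range(1, 7):
        buf.push(s)
    print([buf.tap(d) for d in range(4)])  # newest to oldest
```

Because no samples are ever moved, a producer can keep pushing while a consumer reads older taps, which matches the produce/consume pattern described above.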
Software Pipelining
As compared with Hardware Pipelining
• Allows the utilization of all resources and inner-loop bodies to complete in minimum time
• Typically coded by compiler – but important to be able to write the high-level description using correct syntax
C64x Summary
• A “modern” DSP architecture with multiple evolutionary improvements
  – Instructions as needed for algorithm optimization
  – Floating point added at the C67x stage
• Each core has 8-wide execution, pipeline depth > 7 for in excess of 54 instructions in-flight
• Support of an advanced compiler is necessary for optimization of a complex architecture (or many man-years of effort)
DSP Marketing Viewing
• Industry's first 10-GHz fixed/floating point DSP
  – https://youtu.be/pcGggktOZL8
  – Element14 – HIGH quality information
  – Good use / application ideas & discussion
• TI Keystone II ARM+DSP Server for World's Most Power Efficient Super Computers
  – https://youtu.be/WtF3aXXzb9Y
  – Interview at a conference or trade-show
  – “green 500” supercomputers
  – Optional
Summary: How are DSPs different?
• Essentially infinite streams of data which need to be processed in real time
• Relatively small programs and data storage requirements
• Intensive arithmetic processing with low amount of control and branching (in the critical loops)
• High amount of I/O with analog interface
Summary: How are DSPs different?
• High speed of operation: multiply-accumulate
• Complex instructions for standard DSP functions (IIR and FIR filters, convolvers)
• Specialized memory addressing
  – Modular arithmetic for circular buffers (delay lines)
  – Bit reversal (FFT)
• Zero overhead loops and repeat instructions
• I/O support – Serial and parallel ports
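Bit-reversed addressing, used to reorder data for a radix-2 FFT, can be sketched as follows. The function names are hypothetical, and the software loop stands in for what a DSP's address generator does in hardware.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (e.g. 001 -> 100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the low bit to the output
        index >>= 1
    return result

def bit_reversed_order(n):
    """Index permutation for an n-point radix-2 FFT; n must be a power of two."""
    bits = n.bit_length() - 1
    if n <= 0 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]

if __name__ == "__main__":
    print(bit_reversed_order(8))
```

Applying the permutation twice returns the original order, so the same addressing mode serves for both scrambling the input and unscrambling the output.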
Unique Features in DSP architectures
• Continuous I/O stream, real time requirements
• Multiple memory accesses
• Autoinc/autodec addressing
• Datapath
  – Multiply width
  – Wide accumulator
  – Guard bits/shifting
  – Rounding
  – Saturation
• Weird things
  – Circular addressing
  – Reverse addressing
• Special instructions
  – Shift left and saturate (arithmetic left-shift)
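Saturation is easiest to see next to ordinary wrap-around arithmetic. The sketch below models 16-bit signed values. The helper names are hypothetical and do not correspond to specific DSP instructions.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(value):
    """Two's-complement wrap-around, as in ordinary 16-bit integer arithmetic."""
    return ((value + 32768) % 65536) - 32768

def saturate16(value):
    """Clamp to the 16-bit signed range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))

def sat_add16(a, b):
    """Saturating add of two 16-bit values."""
    return saturate16(a + b)

def sat_shl16(value, shift):
    """Arithmetic left shift followed by saturation."""
    return saturate16(value << shift)

if __name__ == "__main__":
    print(wrap16(30000 + 10000), sat_add16(30000, 10000))
    print(sat_shl16(0x4000, 1))
```

With wrap-around, a slightly too loud signal flips sign and becomes a large error. With saturation it merely clips, which is usually far less harmful in audio and control loops.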