w3 2004-09-29 Blackfin_Architecture v1
-
8/12/2019 w3 2004-09-29 Blackfin_Architecture v1
1/56
ACCESS IC LAB
Graduate Institute of Electronics Engineering, NTU
Blackfin Processor Architecture
Instructor: Prof. Andy Wu
-
Introduction
Blackfin Processor
Blackfin Processor Product Highlights
-
Berkeley incorporated a Reduced Instruction Set Computer (RISC) architecture
It has the following key features:
A fixed (32-bit) instruction size with few formats
CISC processors typically had variable-length instruction sets with many formats
A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory
CISC processors typically allowed values in memory to be used as operands in data processing instructions
A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently
CISC register sets were getting larger, but none was this large and most had different registers for different purposes
-
Hard-wired instruction decode logic
CISC processors used large microcode ROMs to decode their instructions
Pipelined execution
CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)
Single-cycle execution
CISC processors typically took many clock cycles to complete a single instruction
-
Single memory space for program and data
Shared global bus
-
Separate program and data memory spaces
Usually implies separate program and data buses
-
The program bus can be used to load coefficients for MAC operations
-
Introduction
Blackfin Processor
Blackfin Processor Product Highlights
-
Made by Analog Devices, Inc. (ADI)
A new breed of embedded media processor designed specifically for today's embedded audio, video and communication applications
Combines a 32-bit RISC-like instruction set with dual 16-bit multiply-accumulate (MAC) signal processing functionality
Performs equally well in signal processing and control processing applications, in many cases eliminating the need for separate heterogeneous processors
-
Two 16-bit MACs, two 40-bit ALUs, four 8-bit Video ALUs
Support for 8/16/32-bit integer and 16/32-bit fractional data types
Concurrent fetch of one instruction and two unique data elements
Two loop counters that allow for nested zero-overhead looping
A modified Harvard architecture in combination with a hierarchical memory structure
-
Arbitrary bit and bit field manipulation, insertion and extraction
Two data address generator (DAG) units with circular and bit-reversed addressing
The data address generators contain two 32-bit address ALUs and an address register file
The address register file consists of six 32-bit general-purpose pointer registers and four sets of 32-bit circular buffer addressing registers
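The two addressing modes named above can be illustrated with a minimal Python sketch. It assumes the usual circular-buffer rule (an index stepped by a modify value wraps inside a window defined by a base and a length) and shows the bit-reversed ordering itself rather than the hardware mechanism that produces it. The function and parameter names are hypothetical.

```python
def circular_update(index, modify, base, length):
    """Step `index` by `modify`, wrapping inside [base, base + length).

    Assumes |modify| <= length, so one correction is enough.
    """
    new_index = index + modify
    if new_index >= base + length:
        new_index -= length      # ran off the end: wrap to the start
    elif new_index < base:
        new_index += length      # ran off the start: wrap to the end
    return new_index


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value` (FFT-style reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# A 12-byte buffer at 0x1000 stepped 4 bytes at a time
print(hex(circular_update(0x1008, 4, 0x1000, 12)))
# Access order for an 8-point bit-reversed sequence
print([bit_reverse(i, 3) for i in range(8)])
```

Circular addressing suits delay lines and FIR filter taps, while bit-reversed ordering is what a radix-2 FFT needs for its input or output.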
-
Unified 4GB memory space
Mixed 16/32-bit instruction encoding for best code density
Memory protection for support of OS operation
-
Three modes of operation
User mode
User mode has restricted access to a subset of system resources, thus providing a protected software environment
User mode is considered the domain of application programs
Supervisor mode and Emulation mode
Supervisor mode and Emulation mode have unrestricted access to the core resources
Supervisor mode and Emulation mode are usually reserved for the kernel code of an operating system
-
Blackfin Architecture Support (Single Cycle)
The following parallel operations can be processed in one clock cycle
Execution of a single instruction operating on both MACs or ALUs
Execution of two 32-bit data moves (2 reads or 1 read/1 write)
Execution of two pointer updates
Execution of hardware loop updates
-
Blackfin Processor Compute Unit
-
BF533 Memory Access
Under the right conditions, 4 memory accesses at the same time: 64-bit instruction fetch, 2 x 32-bit data loads, 32-bit data store
PLUS up to 2 ALU (32-bit) and 2 MAC (16-bit) operations at the same time
PLUS background DMA activity
-
Compute Unit Architecture
-
Register File
Data Register Syntax
R0, R1 etc. refer to 32-bit registers
R0.L refers to the low 16 bits of the 32-bit register R0
R0.H refers to the high 16 bits of the R0 register
Accumulator Syntax
A0.L => low 16 bits
A0.H => next 16 bits
A0.W => least significant 32-bit word
A0.X => most significant 8-bit extension
Data registers: 8 x 32 bit OR 16 x 16 bit
Accumulators: 2 x 40 bit
For comparison, SHARC has 16 32-bit data registers (integer and float) and a pair of accumulator registers too
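The register naming above maps onto simple bit fields. A small Python sketch of that mapping, treating register contents as plain unsigned bit patterns; the helper names are hypothetical and only model the field layout described on the slide.

```python
def data_register_halves(r):
    """Split a 32-bit data register value into its (R.H, R.L) halves."""
    r &= 0xFFFFFFFF
    return (r >> 16) & 0xFFFF, r & 0xFFFF


def accumulator_fields(a):
    """Split a 40-bit accumulator value into its X, W, H and L fields."""
    a &= 0xFFFFFFFFFF
    return {
        "X": (a >> 32) & 0xFF,    # most significant 8-bit extension
        "W": a & 0xFFFFFFFF,      # least significant 32-bit word
        "H": (a >> 16) & 0xFFFF,  # high half of W
        "L": a & 0xFFFF,          # low half of W
    }


high, low = data_register_halves(0x12345678)
print(hex(high), hex(low))
print({k: hex(v) for k, v in accumulator_fields(0xAB12345678).items()})
```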
-
A & B registers must stay on the same side of the | for both instructions
For dual and quad 16-bit operations, the (CO) option causes the destination registers to cross
-
Multiplies are signed fractional by default
Signed fractional multiply result is automatically left shifted 1 bit
Signed fractional multiply != signed integer multiply
Rounding is available on fractional multiplies and, as a special option, on integer multiplies
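The difference between the two multiply modes can be shown with a short Python sketch, assuming 16-bit operands in signed 1.15 fractional format and a 32-bit result. The clamp for the single overflowing case (-1.0 times -1.0) is an assumption about saturation behaviour, and the function names are hypothetical.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def multiply_integer(a, b):
    """Signed integer multiply: the plain 32-bit product."""
    return to_signed16(a) * to_signed16(b)


def multiply_fractional(a, b):
    """Signed fractional multiply: product shifted left by 1 bit.

    The shift removes the duplicated sign bit so that 1.15 x 1.15
    gives a 1.31 result. Clamping -1.0 x -1.0 is an assumption.
    """
    product = (to_signed16(a) * to_signed16(b)) << 1
    return min(product, 0x7FFFFFFF)


# 0x4000 is 0.5 in 1.15 format, or 16384 as an integer
print(hex(multiply_integer(0x4000, 0x4000)))
print(hex(multiply_fractional(0x4000, 0x4000)))
```

The fractional result 0x20000000 is 0.25 in 1.31 format, which is why the extra shift is needed: without it the same bit pattern would read as 0.125.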
-
Two cases
Rounding adds 0x8000 to the 32-bit multiplier result or accumulator value before extracting a 16-bit value to the destination register
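A minimal Python sketch of the rounding step described above, compared with plain truncation. It works on unsigned 32-bit bit patterns and does not model saturation or any alternative rounding mode; the function names are hypothetical.

```python
def truncate_to_16(value32):
    """Keep the high 16 bits and discard the rest."""
    return (value32 >> 16) & 0xFFFF


def round_to_16(value32):
    """Add 0x8000 (half of one output LSB), then keep the high 16 bits."""
    return ((value32 + 0x8000) >> 16) & 0xFFFF


for value in (0x12347FFF, 0x12348000):
    print(hex(value), hex(truncate_to_16(value)), hex(round_to_16(value)))
```

Adding 0x8000 means any value whose discarded low half is at least 0x8000 rounds up to the next 16-bit value.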
-
When extracting a 16-bit fractional value from an accumulator, the high 16 bits are taken
Where in the destination register it goes depends on which accumulator is being extracted from
-
When extracting a 16-bit integer value from an accumulator, the low 16 bits are taken
Where in the destination register the 16-bit value goes depends on which accumulator is being extracted from
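The two extraction rules can be sketched in Python as follows. The sketch assumes that a result from A0 lands in the low half of the destination register and a result from A1 in the high half, and it ignores rounding and saturation. All names are hypothetical.

```python
def extract_16(accumulator, fractional):
    """Pick 16 bits from the accumulator's 32-bit word.

    Fractional values keep the high half, integer values the low half.
    """
    word = accumulator & 0xFFFFFFFF
    return (word >> 16) & 0xFFFF if fractional else word & 0xFFFF


def write_half(dest, value16, accumulator_number):
    """Write value16 into one half of a 32-bit destination register.

    Assumption: A0 results go to the low half, A1 results to the high half.
    """
    if accumulator_number == 0:
        return (dest & 0xFFFF0000) | (value16 & 0xFFFF)
    return (dest & 0x0000FFFF) | ((value16 & 0xFFFF) << 16)


acc = 0x0012345678
print(hex(extract_16(acc, fractional=True)))
print(hex(extract_16(acc, fractional=False)))
print(hex(write_half(0xAAAABBBB, 0x1234, 1)))
```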
-
In general there are 16-bit and 32-bit versions of the arithmetic instructions
Most of the 32-bit instructions can be executed in parallel with two 16-bit memory/index operations
Exceptions are DIVS, DIVQ and MULTIPLY with 32-bit operands
|| means parallel
Examples:
A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
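The second example pairs a dual 16-bit add with a dual 16-bit subtract. A Python sketch of that arithmetic, assuming both results are computed from the register values held before the instruction and that each 16-bit half wraps modulo 2^16 (no saturation option is specified in the example). The function name is hypothetical.

```python
def dual_add_sub(r2, r4):
    """Model R2 = R2 +|+ R4, R4 = R2 -|- R4 on 32-bit register values.

    Returns the new (R2, R4). High and low halves are processed
    independently, and both results use the original operands.
    """
    def split(r):
        return (r >> 16) & 0xFFFF, r & 0xFFFF

    def join(high, low):
        return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

    h2, l2 = split(r2)
    h4, l4 = split(r4)
    return join(h2 + h4, l2 + l4), join(h2 - h4, l2 - l4)


new_r2, new_r4 = dual_add_sub(0x00030005, 0x00010002)
print(hex(new_r2), hex(new_r4))
```

Computing a sum and a difference of the same operand pair in one step is the core of an FFT butterfly, which is a typical use for this instruction form.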
-
Blackfin Processor Memory Architecture
-
A single, unified 4G byte address space using 32-bit addresses
The L1 memory system is the highest-performance memory available to the core and is faster than the L2 memory system
The L2 memory system is off-chip and has longer access latencies
-
Introduction
Blackfin Processor
Blackfin Processor Product Highlights
-
Analog Devices CROSSCORE Tools
CROSSCORE, the Analog Devices development tools product line, provides easier and more robust methods for engineers to develop and optimize systems, shortening product development cycles for faster time-to-market
VisualDSP++ software development and debugging environment
An integrated software development and debugging environment allowing for fast and easy development, debug, and deployment
EZ-KIT Lite evaluation systems
Provide an easy way to investigate the power of ADI's family of embedded processors and DSPs and to develop applications
Emulators
Emulators are available for PCI and USB host platforms
-
ADSP-BF535 Blackfin Processor
Key features
High performance 16-bit dual MAC processor core up to 350 MHz
Flexible, software controlled Dynamic Power Management
Optimized RISC instruction set for high code density and C/C++ programming
Enhanced media instructions to process audio, image, and video for multimedia applications
Integrated system peripherals including USB device, PCI, serial ports, UARTs, SPIs, 32-bit timers, and more
Blackfin processors utilize
Single processor core
Single instruction set
Single programming model
Single set of development tools
-
ADSP-BF535 Blackfin Processor
Target applications
Automotive
Broadband access
Central office/network switch
Digital imaging and printing
Global positioning systems
Industrial signal processing
Instrumentation/telemetry
Internet appliances
Modem solutions
Private branch exchanges (PBX)
POS terminals
Telecommunications
Video conferencing
VoIP phone solutions
-
ADSP-BF535 Blackfin Processor
Blackfin Processor System Environment
-
ADSP-BF535 Blackfin Processor
Blackfin Processor Memory Hierarchy
L1 instruction and data memories can be dynamically configured as SRAM, cache, or a combination of both
L2 serves the larger storage needs of instructions and data
-
ADSP-BF535 Blackfin Processor
Portable Low Power Architecture
Dynamic power management
-
ADSP-BF561 Blackfin Symmetric Multi-Processor
ADSP-BF561 Symmetric Multi-Processor Block Diagram
-
ADSP-BF561 Blackfin Symmetric Multi-Processor
Key features
Blackfin Symmetric Multi-Processor
Dual high performance Blackfin Processors up to 756 MHz
Capable of over 3000 MMACs
Independent processor cores for image processing and system control functions
RISC-like register and instruction model for ease of programming and C/C++ compiler-friendly support
Enhanced media instructions process audio, image, and video data for multimedia applications
Software controlled Dynamic Power Management with on-chip voltage regulation minimizes power consumption
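The "over 3000 MMACs" figure is consistent with two cores, each with two 16-bit MACs, running at 756 MHz. A quick arithmetic check in Python, assuming each MAC unit completes one multiply-accumulate per clock cycle; the function name is hypothetical.

```python
def peak_mmacs(clock_mhz, macs_per_core=2, cores=1):
    """Peak million multiply-accumulates per second.

    Assumes every MAC unit retires one operation per clock cycle.
    """
    return clock_mhz * macs_per_core * cores


print(peak_mmacs(756, cores=2))  # dual-core part
print(peak_mmacs(756))           # single-core part at the same clock
```

The single-core result matches the 756 MHz/1512 MMAC pairing quoted for the ADSP-BF531/BF532/BF533 series.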
-
ADSP-BF561 Blackfin Symmetric Multi-Processor
Key features
Highest level of integration
328 Kbytes of total on-chip memory
Dual Parallel Peripheral Interfaces supporting ITU-R 656 video data formats
External memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, FLASH, or ROM memory
High bandwidth, two-dimensional internal DMA controllers
UART with support for IrDA
Integrated on-chip voltage regulator
256-ball Pb-Free Mini-BGA, and 297-ball Sparse PBGA package options
-
ADSP-BF561 Blackfin Symmetric Multi-Processor
Key features
Target Applications
Digital still cameras
Digital video cameras
Hybrid digital video/still cameras
Video security/surveillance systems
Portable multimedia players
-
ADSP-BF531/BF532/BF533 Blackfin Processor Series
Key features
Blackfin Processors Offer Features Attractive to a Broad Application Base
Performance to 756 MHz/1512 MMAC enables multichannel audio plus VGA/D1 video processing in multimedia applications
Enhanced Dynamic Power Management with on-chip voltage regulation allows operation to 0.8V, extending battery life in portable applications
Application-tuned peripherals provide glueless connectivity to general-purpose converters in data acquisition applications
Multiple low cost, pin and code compatible derivatives enable software differentiation in cost-sensitive consumer applications
-
ADSP-BF531/BF532/BF533 Blackfin Processor Series
Key features
High Level of Integration
Up to 148 Kbytes of on-chip SRAM
Parallel Peripheral Interface supporting ITU-R 656 video data formats
Two dual-channel, full duplex synchronous serial ports supporting eight stereo I2S channels
12 DMA channels supporting one- and two-dimensional data transfers
Memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, flash, or ROM
Three timers supporting PWM and pulse width/event count modes
UART with support for IrDA
SPI-compatible port
Real-time clock
Watchdog timer
PLL capable of 1x to 63x frequency multiplication
160-ball mini-BGA, 169-ball Pb-Free PBGA and 176-lead LQFP packages
Commercial and industrial temperature ranges
-
ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture
Key features
Two 16-bit multipliers
Two 40-bit accumulators
Two 40-bit arithmetic logic units (ALU)
Four 8-bit video ALUs
One 40-bit shifter
Compute register file
Contains eight 32-bit registers
Can be operated as 16 independent 16-bit registers
MAC
Can perform a 16-bit by 16-bit multiply per cycle, with accumulation to a 40-bit result
Signed and unsigned formats, rounding, and saturation are supported
-
ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture
Key features
Program sequencer
Controls the instruction execution flow, including instruction alignment and decoding
For program flow control, the sequencer supports PC-relative and indirect conditional jumps (with static branch prediction) and subroutine calls
Hardware is provided to support zero-overhead looping
The architecture is fully interlocked, meaning there are no visible pipeline effects when executing instructions with data dependencies
Address arithmetic unit
Provides two addresses for simultaneous dual fetches from memory
Contains a multiported register file consisting of four sets of 32-bit Index, Modify, Length, and Base registers (for circular buffering) and eight additional 32-bit pointer registers (for C-style indexed stack manipulation)
-
ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture
Key features
Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure
Level 1 (L1) memories typically operate at the full processor speed with little or no latency
At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information
Three modes of operation
User mode has restricted access to a subset of system resources, thus providing a protected software environment
Supervisor and Emulation modes have unrestricted access to the system and core resources
-
References
[1] Analog Devices Web Site, http://www.analog.com/
[2] Blackfin Processor, http://www.analog.com/processors/processors/blackfin/
[3] ADSP-BF533 Blackfin Processor Hardware Reference, Rev 1.0, December 2003, Analog Devices. Section 2
[4] Blackfin Processor Instruction Set Reference, Rev 3, June 2004, Analog Devices. Sections 8 ~ 10, 14 & 15
I suggest that students who want to become familiar with the Blackfin Processor read references 3 and 4 thoroughly.