w3 2004-09-29 Blackfin_Architecture v1

  • 8/12/2019 w3 2004-09-29 Blackfin_Architecture v1

    ACCESS IC LAB

    Graduate Institute of Electronics Engineering, NTU

    Blackfin Processor Architecture

    Instructor: Prof. Andy Wu

    Introduction

    Blackfin Processor

    Blackfin Processor Product Highlights

    The Berkeley design incorporated a Reduced Instruction Set Computer (RISC) architecture

    It has the following key features:

    A fixed (32-bit) instruction size with few formats

        CISC processors typically had variable-length instruction sets with many formats

    A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory

        CISC processors typically allowed values in memory to be used as operands in data processing instructions

    A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently

        CISC register sets were getting larger, but none was this large and most had different registers for different purposes

    Hard-wired instruction decode logic

        CISC processors used large microcode ROMs to decode their instructions

    Pipelined execution

        CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)

    Single-cycle execution

        CISC processors typically took many clock cycles to complete a single instruction

    Von Neumann architecture: single memory space for program and data

    Shared global bus

    Harvard architecture: separate program and data memory spaces

    Usually implies separate program and data buses

    Program bus can be used for coefficient loading for the MAC
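As a rough illustration of why the bus organization matters for a MAC loop, the sketch below counts the bus cycles one filter tap needs under each memory organization described above. The access lists and the `cycles_per_tap` helper are illustrative assumptions, not taken from the slides: the model allows one access per bus per cycle and, in the last case, assumes the MAC instruction repeats from a loop buffer so that the program bus is free to carry the coefficient.

```python
from collections import Counter


def cycles_per_tap(accesses):
    """Return the bus cycles needed for one MAC step.

    `accesses` maps each memory access to the bus that carries it.
    Accesses on the same bus are serialized; different buses work in parallel.
    """
    per_bus = Counter(accesses.values())
    return max(per_bus.values())


# Single memory space, one shared bus: every access is serialized.
von_neumann = {"instruction": "shared", "sample": "shared", "coefficient": "shared"}

# Separate program and data buses: the two data reads still share one bus.
harvard = {"instruction": "program", "sample": "data", "coefficient": "data"}

# Assumption: the instruction repeats from a loop buffer, so the program bus
# can load the coefficient while the data bus loads the sample.
coefficient_on_program_bus = {"sample": "data", "coefficient": "program"}

for name, accesses in [
    ("shared bus", von_neumann),
    ("separate buses", harvard),
    ("coefficient on program bus", coefficient_on_program_bus),
]:
    print(name, cycles_per_tap(accesses))
```

Under these simplifying assumptions the per-tap cost drops from three bus cycles to two to one, which is the motivation for routing coefficients over the program bus.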

    Blackfin Processor

    Made by Analog Devices, Inc.

    A new breed of embedded media processor designed specifically for today's embedded audio, video and communication applications

    Combines a 32-bit RISC-like instruction set with dual 16-bit multiply-accumulate (MAC) signal processing functionality

    Performs equally well in both signal processing and control processing applications, in many cases eliminating the need for separate heterogeneous processors

    Two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs

    Support for 8/16/32-bit integer and 16/32-bit fractional data types

    Concurrent fetch of one instruction and two unique data elements

    Two loop counters that allow for nested zero-overhead looping

    A modified Harvard architecture in combination with a hierarchical memory
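To make the 16-bit fractional data type concrete, here is a minimal sketch that assumes the common signed 1.15 interpretation (one sign bit, fifteen fraction bits, range [-1, 1)). The helper names `to_q15` and `from_q15` are hypothetical and only serve the illustration.

```python
Q15_SCALE = 1 << 15  # 1.15 format: 1 sign bit, 15 fraction bits


def to_q15(x):
    """Convert a float to a signed 16-bit fractional value, saturating at the range limits."""
    value = round(x * Q15_SCALE)
    return max(-Q15_SCALE, min(Q15_SCALE - 1, value))


def from_q15(value):
    """Interpret a signed 16-bit integer as a 1.15 fraction."""
    return value / Q15_SCALE


print(hex(to_q15(0.5)))   # 0.5 maps to 0x4000
print(from_q15(0x4000))   # and back to 0.5
print(to_q15(1.0))        # +1.0 is not representable, so it saturates to 32767
```

The same 16 bits therefore mean 16384 as an integer and 0.5 as a fraction; only the interpretation differs.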

    Arbitrary bit and bit-field manipulation, insertion and extraction

    Two data address generator (DAG) units with circular and bit-reversed addressing

        The data address generators contain two 32-bit address ALUs and an address register file

        The address register file consists of six 32-bit general-purpose pointer registers and four 32-bit circular buffer addressing registers
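Bit-reversed addressing is mainly used to reorder FFT data. The sketch below only shows the access order that results; it reverses index bits in software, whereas address hardware typically produces the same sequence with a reverse-carry add. The function names are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Access order for an n-point buffer, where n is a power of two."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```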

    Unified 4GB memory space

    Mixed 16/32-bit instruction encoding for best code density

    Memory protection for support of OS operation

    Three modes of operation

    User mode

        User mode has restricted access to a subset of system resources, thus providing a protected software environment

        User mode is considered the domain of application programs

    Supervisor mode and Emulation mode

        Supervisor mode and Emulation mode have unrestricted access to the core resources

        Supervisor mode and Emulation mode are usually reserved for the kernel code of an operating system

    Blackfin Architecture Support (Single Cycle)

    The following parallel operations can be processed in one clock cycle:

    Execution of a single instruction operating on both MACs or ALUs

    Execution of 2 x 32-bit data moves

        2 reads or 1 read/1 write

    Execution of two pointer updates

    Execution of hardware loop updates

    Blackfin Processor Compute Unit

    BF533 Memory Access

    Under the right conditions, 4 memory accesses at the same time: a 64-bit instruction fetch, 2 x 32-bit data loads, and a 32-bit data store

    PLUS up to 2 ALU (32-bit) and 2 MAC (16-bit) operations at the same time

    PLUS background DMA activity

    Compute Unit Architecture

    Register File

    Data Register Syntax

        R0, R1 etc. refer to 32-bit registers

        R0.L refers to the low 16 bits of the 32-bit R0 register

        R0.H refers to the high 16 bits of the R0 register

    Accumulator Syntax

        A0.L => low 16 bits

        A0.H => next 16 bits

        A0.W => least significant 32-bit word

        A0.X => most significant 8-bit extension

    For comparison, SHARC has sixteen 32-bit data registers (integer and float); there is a pair of SHARC accumulator registers too

    Data registers: 8 x 32 bit OR 16 x 16 bit

    Accumulators: 2 x 40 bit
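The register syntax above amounts to selecting bit fields. A minimal sketch with plain masks and shifts follows; the helper names are hypothetical, and register values are modeled as unsigned Python integers.

```python
def reg_low(r):
    """R0.L: low 16 bits of a 32-bit data register."""
    return r & 0xFFFF


def reg_high(r):
    """R0.H: high 16 bits of a 32-bit data register."""
    return (r >> 16) & 0xFFFF


def acc_parts(a):
    """Split a 40-bit accumulator into its .L, .H, .W and .X fields."""
    a &= (1 << 40) - 1
    return {
        "L": a & 0xFFFF,          # low 16 bits
        "H": (a >> 16) & 0xFFFF,  # next 16 bits
        "W": a & 0xFFFFFFFF,      # least significant 32-bit word
        "X": (a >> 32) & 0xFF,    # most significant 8-bit extension
    }


r0 = 0x12345678
print(hex(reg_low(r0)), hex(reg_high(r0)))
print({name: hex(value) for name, value in acc_parts(0x7F12345678).items()})
```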

    A and B registers must stay on the same side of the | for both instructions

    For dual and quad 16-bit operations, the (CO) option causes the destination registers to cross

    Multiplies are signed fractional by default

        Signed fractional multiply result is automatically left shifted 1 bit

        Signed fractional multiply != signed integer multiply

    Rounding is available on fractional multiplies and, as a special option, on integer multiplies
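The one-bit left shift follows from the number format: multiplying two 1.15 values gives a 2.30 product, and shifting left by one bit turns it into 1.31. The sketch below illustrates this with hypothetical helpers; saturating the single overflow case (-1 x -1) is an assumption of this sketch.

```python
INT32_MAX = 0x7FFFFFFF


def int_mul(a, b):
    """Signed integer multiply of two 16-bit values."""
    return a * b


def frac_mul(a, b):
    """Signed fractional multiply: 1.15 x 1.15 -> 1.31.

    The raw product is shifted left by one bit. Only -1.0 * -1.0 exceeds the
    32-bit range, and it is saturated here (an assumption of this sketch).
    """
    return min((a * b) << 1, INT32_MAX)


half = 0x4000  # 0.5 in 1.15 format
print(hex(int_mul(half, half)))     # 0x10000000
print(hex(frac_mul(half, half)))    # 0x20000000
print(frac_mul(half, half) / 2**31) # 0.25 when read as a 1.31 fraction
```

The same operands thus give different bit patterns in the two modes, which is why a fractional multiply cannot stand in for an integer multiply.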

    Two cases

    Rounding adds 0x8000 to the 32-bit multiplier result or accumulator value before extracting a 16-bit value to the destination register
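A minimal sketch of this add-then-extract rounding is shown below. The function name is hypothetical, values are modeled as signed Python integers, and clamping the result to the signed 16-bit range is an assumption of the sketch rather than something stated on the slide.

```python
def round_to_16(value32):
    """Round a signed 32-bit value to its upper 16 bits.

    Adding 0x8000 (half of the lowest kept bit) before dropping the low
    16 bits rounds to nearest instead of truncating.
    """
    rounded = (value32 + 0x8000) >> 16
    # Assumption: clamp to the signed 16-bit range in case the add overflows.
    return max(-0x8000, min(0x7FFF, rounded))


print(hex(round_to_16(0x12347FFF)))  # 0x1234, low half below one half
print(hex(round_to_16(0x12348000)))  # 0x1235, low half exactly one half rounds up
```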

    When extracting a 16-bit fractional value from an accumulator, the high 16 bits are taken

    Where in the destination register the value goes depends on which accumulator is being extracted from

    When extracting a 16-bit integer value from an accumulator, the low 16 bits are taken

    Where in the destination register the 16-bit value goes depends on which accumulator is being extracted from
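The fractional and integer extraction rules can be sketched together as follows. The helper names are hypothetical, rounding and saturation are left out, and the placement rule used here (A0 results go to the low half of the destination, A1 results to the high half) is stated as an assumption of this sketch.

```python
def extract16(acc, fractional):
    """Take 16 bits from the 32-bit word of an accumulator.

    Fractional extraction keeps the high 16 bits, integer extraction the low 16.
    """
    word = acc & 0xFFFFFFFF
    return (word >> 16) & 0xFFFF if fractional else word & 0xFFFF


def deposit(dest, value16, accumulator):
    """Write a 16-bit value into one half of a 32-bit destination register.

    Assumption: A0 targets the low half, A1 the high half.
    """
    if accumulator == "A0":
        return (dest & 0xFFFF0000) | value16
    if accumulator == "A1":
        return (dest & 0x0000FFFF) | (value16 << 16)
    raise ValueError("accumulator must be 'A0' or 'A1'")


acc = 0x0012345678
print(hex(extract16(acc, fractional=True)))   # 0x1234
print(hex(extract16(acc, fractional=False)))  # 0x5678
print(hex(deposit(0xAAAABBBB, 0x1234, "A0")))
print(hex(deposit(0xAAAABBBB, 0x1234, "A1")))
```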

    In general there are 16- and 32-bit versions of the arithmetic instructions

    Most of the 32-bit instructions can be executed in parallel with 2 x 16-bit memory/index operations

        Exceptions are DIVS, DIVQ and MULTIPLY with 32-bit operands

        || means parallel

    Examples:

        A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;

        R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
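To show what the dual 16-bit add/subtract in the second example computes, here is a small software model. It assumes that both results are formed from the original register values (the two operations belong to one instruction) and that each 16-bit lane wraps around without saturation; the function names are hypothetical.

```python
MASK16 = 0xFFFF


def halves(r):
    """Split a 32-bit register into (high, low) 16-bit halves."""
    return (r >> 16) & MASK16, r & MASK16


def pack(high, low):
    """Combine two 16-bit lanes into a 32-bit register, wrapping each lane."""
    return ((high & MASK16) << 16) | (low & MASK16)


def dual_add_sub(x, y):
    """Model `Rx = Rx +|+ Ry, Ry = Rx -|- Ry` on the original x and y."""
    xh, xl = halves(x)
    yh, yl = halves(y)
    total = pack(xh + yh, xl + yl)
    diff = pack(xh - yh, xl - yl)
    return total, diff


r2, r4 = 0x00030005, 0x00010002
r2, r4 = dual_add_sub(r2, r4)
print(hex(r2), hex(r4))  # 0x40007 0x20003
```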

    Blackfin Processor Memory Architecture

    A single, unified 4 GB address space using 32-bit addresses

    The L1 memory system is the primary, highest-performance memory available to the core and is faster than the L2 memory system

    The L2 memory system is off-chip and has longer access latencies

    Blackfin Processor Product Highlights

    Analog Devices CROSSCORE Tools

    CROSSCORE, Analog Devices' development tools product line, provides easier and more robust methods for engineers to develop and optimize systems, shortening product development cycles for faster time-to-market

    VisualDSP++ software development and debugging environment

        An integrated software development and debugging environment allowing for fast and easy development, debug, and deployment

    EZ-KIT Lite evaluation systems

        Provide an easy way to investigate the power of ADI's family of embedded processors and DSPs and to develop applications

    Emulators

        Emulators are available for PCI and USB host platforms

    ADSP-BF535 Blackfin Processor

    Key features

        High performance 16-bit dual MAC processor core up to 350 MHz

        Flexible, software controlled Dynamic Power Management

        Optimized RISC instruction set for high code density and C/C++ language programming

        Enhanced media instructions to process audio, image, and video for multimedia applications

        Integrated system peripherals including USB device, PCI, serial ports, UARTs, SPIs, 32-bit timers, and more

    Blackfin processors utilize

        Single processor core

        Single instruction set

        Single programming model

        Single set of development tools

    Target applications

    Automotive

    Broadband access

    Central office/network switch

    Digital imaging and printing

    Global positioning systems

    Industrial signal processing

    Instrumentation/telemetry

    Internet appliances

    Modem solutions

    Personal branch exchanges (PBX)

    POS terminals

    Telecommunications

    Video conferencing

    VoIP phone solutions

    Blackfin Processor System Environment

    Blackfin Processor Memory Hierarchy

    L1 instruction and data memories can be dynamically configured as SRAM, cache, or a combination of both

    L2 provides larger storage for instructions and data

    Portable Low Power Architecture

    Dynamic power management

    ADSP-BF561 Blackfin Symmetric Multi-Processor

    ADSP-BF561 Symmetric Multi-Processor Block Diagram

    Key features

    Blackfin Symmetric Multi-Processor

        Dual high performance Blackfin processors up to 756 MHz

        Capable of over 3000 MMACs

        Independent processor cores for image processing and system control functions

        RISC-like register and instruction model for ease of programming and compiler-friendly C/C++ support

        Enhanced media instructions process audio, image, and video data for multimedia applications

        Software controlled Dynamic Power Management with on-chip voltage regulation minimizes power consumption
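The "over 3000 MMACs" figure is consistent with simple peak-rate arithmetic: cores times MACs per core times clock rate, with one multiply-accumulate per MAC unit per cycle. The helper below is a hypothetical illustration of that arithmetic, using the two 16-bit MACs per core and the 756 MHz clock quoted in the slides.

```python
def peak_mmacs(cores, macs_per_core, clock_mhz):
    """Peak multiply-accumulates per second, in millions.

    Assumes every MAC unit completes one operation per clock cycle.
    """
    return cores * macs_per_core * clock_mhz


print(peak_mmacs(cores=2, macs_per_core=2, clock_mhz=756))  # dual core: 3024
print(peak_mmacs(cores=1, macs_per_core=2, clock_mhz=756))  # single core: 1512
```

The single-core result matches the 1512 MMAC figure quoted for the ADSP-BF531/BF532/BF533 series at the same clock rate.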

    Highest level of integration

        328 Kbytes of total on-chip memory

        Dual Parallel Peripheral Interface supporting ITU-R 656 video data formats

        External memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, flash, or ROM memory

        High bandwidth, two-dimensional internal DMA controllers

        UART with support for IrDA

        Integrated on-chip voltage regulator

        256-ball Pb-free Mini-BGA and 297-ball Sparse PBGA package options

    Target applications

    Digital still cameras

    Digital video cameras

    Hybrid digital video/still cameras

    Video security/surveillance system

    Portable multimedia players

    ADSP-BF531/BF532/BF533 Blackfin Processor Series

    Key features

    Blackfin processors offer features attractive to a broad application base

        Performance up to 756 MHz/1512 MMACs enables multichannel audio plus VGA/D1 video processing in multimedia applications

        Enhanced Dynamic Power Management with on-chip voltage regulation allows operation down to 0.8 V, extending battery life in portable applications

        Application-tuned peripherals provide glueless connectivity to general-purpose converters in data acquisition applications

        Multiple low-cost, pin- and code-compatible derivatives enable software differentiation in cost-sensitive consumer applications

    High level of integration

        Up to 148 Kbytes of on-chip SRAM

        Parallel Peripheral Interface supporting ITU-R 656 video data formats

        Two dual-channel, full-duplex synchronous serial ports supporting eight stereo I2S channels

        12 DMA channels supporting one- and two-dimensional data transfers

        Memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, flash, or ROM

        Three timers supporting PWM and pulse-width/event-count modes

        UART with support for IrDA

        SPI-compatible port

        Real-time clock

        Watchdog timer

        PLL capable of 1x to 63x frequency multiplication

        160-ball mini-BGA, 169-ball Pb-free PBGA and 176-lead LQFP packages

        Commercial and industrial temperature ranges

    ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

    Key features

        Two 16-bit multipliers

        Two 40-bit accumulators

        Two 40-bit arithmetic logic units (ALU)

        Four 8-bit video ALUs

        One 40-bit shifter

    Compute register file

        Contains eight 32-bit registers

        Can be operated as 16 independent 16-bit registers

    MAC

        Can perform a 16-bit by 16-bit multiply per cycle, with accumulation to a 40-bit result

        Signed and unsigned formats, rounding, and saturation are supported
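A minimal software model of a multiply-accumulate with a saturating 40-bit accumulator is sketched below. It uses signed integer products for simplicity, and the names `mac` and `dot` are hypothetical; the eight bits above the 32-bit word act as headroom, so many products can be summed before saturation comes into play.

```python
ACC_MAX = (1 << 39) - 1  # largest signed 40-bit value
ACC_MIN = -(1 << 39)     # smallest signed 40-bit value


def mac(acc, a, b):
    """Add the product of two 16-bit operands to a 40-bit accumulator, saturating."""
    return max(ACC_MIN, min(ACC_MAX, acc + a * b))


def dot(xs, ys):
    """Dot product built from one MAC per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(dot([1, 2, 3], [4, 5, 6]))      # 32
print(mac(ACC_MAX, 1, 1) == ACC_MAX)  # True: the accumulator saturates instead of wrapping
```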

    Program sequencer

        Controls the instruction execution flow, including instruction alignment and decoding

        For program flow control, the sequencer supports PC-relative and indirect conditional jumps (with static branch prediction) and subroutine calls

        Hardware is provided to support zero-overhead looping

        The architecture is fully interlocked, meaning there are no visible pipeline effects when executing instructions with data dependencies

    Address arithmetic unit

        Provides two addresses for simultaneous dual fetches from memory

        Contains a multiported register file consisting of four sets of 32-bit Index, Modify, Length, and Base registers (for circular buffering) and eight additional 32-bit pointer registers (for C-style indexed stack manipulation)
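The role of the Index, Modify, Length, and Base registers in circular buffering can be sketched as a post-modify address update with wraparound. This is a simplified software model under the assumption that the magnitude of the modify value does not exceed the buffer length; the function name is hypothetical.

```python
def circular_update(index, modify, length, base):
    """Advance a circular-buffer address.

    index  - current address (Index register)
    modify - signed step (Modify register), assumed |modify| <= length
    length - buffer size in bytes (Length register)
    base   - buffer start address (Base register)
    """
    index += modify
    if index >= base + length:
        index -= length  # stepped past the end: wrap to the start
    elif index < base:
        index += length  # stepped before the start: wrap to the end
    return index


base, length = 0x1000, 16
address = 0x1008
for _ in range(4):
    address = circular_update(address, 4, length, base)
    print(hex(address))  # 0x100c, 0x1000, 0x1004, 0x1008
```

With the wrap handled in the address update, a filter loop can step through its delay line without any explicit bounds check.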

    Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure

        Level 1 (L1) memories typically operate at the full processor speed with little or no latency

        At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information

    Three modes of operation

        User mode has restricted access to a subset of system resources, thus providing a protected software environment

        Supervisor and Emulation modes have unrestricted access to the system and core resources

    [1] Analog Devices Web Site, http://www.analog.com/

    [2] Blackfin Processor, http://www.analog.com/processors/processors/blackfin/

    [3] ADSP-BF533 Blackfin Processor Hardware Reference, Rev 1.0, December 2003, Analog Devices. Section 2

    [4] Blackfin Processor Instruction Set Reference, Rev 3, June 2004, Analog Devices. Sections 8 ~ 10, 14 & 15

    I suggest that students who want to become familiar with the Blackfin Processor read references 3 and 4 thoroughly.