Lecture 9: Coarse Grained FPGA Architecture October 6, 2004

ECE 697F

Reconfigurable Computing

Lecture 9

Coarse Grained FPGA Architecture

Overview

• Review 5 different coarse-grained architectures

• How do they vary?

- Basic block

- Inter-block communication

- Communication protocol

- Software summary

- Results

• Device trends

Coarse-grained Architectures

• DP-FPGA

- LUT-based

- LUTs share configuration bits

• Rapid

- Specialized ALUs, multipliers

- 1D pipeline

• Matrix

- 2-D array of ALUs

• Chess

- Augmented, pipelined matrix

• Raw

- Full RISC core as basic block

- Static scheduling used for communication

DP-FPGA

• Break FPGA into datapath and control sections

• Save storage for LUTs and connection transistors

• Key issue is grain size

• Cherepacha/Lewis – U. Toronto

Configuration Sharing

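• MC = LUT SRAM bits

• CE = connection block pass transistors

• A(N) = CE*N + MC (area of N bit-slices sharing one set of LUT bits)

• Set MC = 2-3 CE

[Figure: two 3-input LUTs, with inputs A0, B0, C0 and A1, B1, C1 and outputs Y0 and Y1, sharing one set of configuration bits: 0 1 1 1 0 0 1 0]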
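
The effect of sharing configuration bits can be sketched numerically. This is only an illustrative model built from the slide's symbols: it assumes one set of MC LUT bits is stored once for N bit-slices, that every slice still pays its own CE connection cost, and that an unshared design pays MC + CE per slice. The function names and the default MC = 2.5 CE are assumptions, not values from the DP-FPGA work.

```python
def shared_area(n, mc, ce):
    """Area of n bit-slices that share a single set of LUT SRAM bits (assumed model)."""
    return mc + ce * n


def unshared_area(n, mc, ce):
    """Area of n bit-slices that each store their own LUT SRAM bits (assumed model)."""
    return n * (mc + ce)


def area_ratio(n, mc=2.5, ce=1.0):
    """Shared area divided by unshared area; below 1.0 means sharing saves area."""
    return shared_area(n, mc, ce) / unshared_area(n, mc, ce)


if __name__ == "__main__":
    # Print how the ratio shrinks as more bit-slices share one configuration.
    for n in (1, 2, 4, 8):
        print(n, round(area_ratio(n), 3))
```

Under these assumptions the saving grows with N but flattens out, because the CE term is paid per bit regardless of sharing.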

Two-dimensional Layout

• Control network supports distributed signals.

• Data routed as four-bit values.

DP-FPGA Technology Mapping

• Ideal case: every datapath width is divisible by 4, with no “irregularities”

• Area improvement includes logic values only.

• Shift logic included
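
A small sketch of why widths divisible by 4 are the ideal case: with data routed as four-bit values, any other width leaves part of the last block unused. The function name and the `grain` parameter are hypothetical; the real mapping tool is not described on the slide.

```python
import math


def map_datapath(width, grain=4):
    """Return (blocks needed, unused bits) when a datapath is packed into grain-bit blocks."""
    blocks = math.ceil(width / grain)
    unused_bits = blocks * grain - width
    return blocks, unused_bits


if __name__ == "__main__":
    # Compare widths that fit the 4-bit grain with widths that do not.
    for width in (8, 10, 16, 17):
        print(width, map_datapath(width))
```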

Rapid

• Reconfigurable Pipeline Datapath

• Ebeling – University of Washington

• Uses hard-coded functional units (ALU, memory, multiplier)

• Good for signal processing

• Linear array of processing elements.

[Figure: cells arranged in a linear chain]

Rapid Datapath

• Segmented linear architecture

• All RAMs and ALUs are pipelined

• Bus connectors also contain registers

Rapid Control Path

• In addition to static control, control pipeline allows dynamic control.

• LUTs provide simple programmability.

• Cells can be chained together to form continuous pipe.

FIR Filter Example

• Measure system response to input impulse

• Coefficients used to scale input.

• Running sum determines the total.
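
The three points above can be sketched in software. This is a minimal reference model, not the Rapid implementation: each output is the running sum of the most recent inputs scaled by the coefficients, and feeding in an impulse reads the coefficients back out. The function name `fir` is an illustrative choice.

```python
def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over j of coeffs[j] * x[n - j]."""
    history = [0] * len(coeffs)  # most recent sample first
    outputs = []
    for x in samples:
        history = [x] + history[:-1]  # shift the new sample into the delay line
        outputs.append(sum(c * h for c, h in zip(coeffs, history)))
    return outputs


if __name__ == "__main__":
    # An impulse input makes the filter emit its own coefficients in order.
    print(fir([1, 0, 0, 0], [3, 2, 1]))
```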

Mapping a Tap to Rapid

• Chain multiple taps together (one multiplier per tap)
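
Chaining taps can be modelled as a row of identical cells, each with one multiplier and one register. The sketch below is a simplification under stated assumptions: the `Tap` class and `run_chain` function are hypothetical names, the sum path is treated as combinational, and the pipeline registers a real Rapid mapping would place on the sum path are ignored.

```python
class Tap:
    """One filter tap: a coefficient, a multiplier, and a one-sample delay register."""

    def __init__(self, coeff):
        self.coeff = coeff
        self.reg = 0  # sample delayed by one cycle, passed on to the next tap

    def step(self, x_in, sum_in):
        x_out = self.reg          # previous sample goes to the next tap
        self.reg = x_in           # latch the current sample
        return x_out, sum_in + self.coeff * x_in


def run_chain(coeffs, samples):
    """Clock a chain of taps once per input sample and collect the final sums."""
    taps = [Tap(c) for c in coeffs]
    outputs = []
    for x in samples:
        acc = 0
        for tap in taps:
            x, acc = tap.step(x, acc)
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    print(run_chain([3, 2, 1], [1, 0, 0, 0]))
```

Adding a tap to the filter means adding one more cell to the chain, which is the property that suits a linear array.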

Matrix

• DeHon and Mirsky – MIT

• 2-dimensional array of ALUs

• Each Basic Functional Unit contains “processor” (ALU + SRAM)

• Ideal for systolic and VLIW computation.

• 8-bit computation

• Forerunner of the Silicon Spice product.

Basic Functional Unit

• Two inputs from adjacent blocks.

• Local memory for instructions, data.

BFU Interconnect

• Near-neighbor and quad connectivity

• Pipelined interconnect at ALU inputs

• Data transferred in 8-bit groups.

• Interconnect otherwise not pipelined

BFU Inputs

• Each ALU input can come from several sources.

• Note that source is locally configurable based on data values.

Filtering Example

• y_i = w_1 * x_i + w_2 * x_(i+1) + w_3 * x_(i+2) + … + w_k * x_(i+k-1)

• For a k-weight filter, 4k cells are needed

- One result every 2 cycles

• k/2 8x8 multiplies per cycle

[Figure: filter mapping shown for k = 8]
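
The filter equation and the resource figures can be checked with a short sketch. The first function evaluates the weighted sum with 0-based indices; the second only restates the slide's arithmetic (4 cells per weight, one result every 2 cycles, so k multiplies spread over 2 cycles). Both function names are hypothetical, and the estimate says nothing about how Matrix actually places the cells.

```python
def filter_outputs(weights, xs):
    """y[i] = sum over j of weights[j] * xs[i + j], for every full window of xs."""
    k = len(weights)
    return [
        sum(weights[j] * xs[i + j] for j in range(k))
        for i in range(len(xs) - k + 1)
    ]


def matrix_estimate(k, cells_per_weight=4, cycles_per_result=2):
    """Return (cells needed, 8x8 multiplies per cycle) for a k-weight filter."""
    cells = cells_per_weight * k
    multiplies_per_cycle = k / cycles_per_result
    return cells, multiplies_per_cycle


if __name__ == "__main__":
    print(filter_outputs([1, 2], [1, 2, 3]))
    print(matrix_estimate(8))
```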

Chess

• HP Labs – Bristol, England

• 2-D array – similar to Matrix

• Contains more “FPGA-like” routing resources.

• No reported software or application results

• Doesn’t support incremental compilation

Chess Interconnect

• More like an FPGA

• Takes advantage of near-neighbor connectivity

Chess Basic Block

• Switchbox memory can be used as storage

• ALU core for computation

Chess Statistics

• Use metrics to evaluate computational power.

• Efficient multiplies due to embedded ALU

• Process independent.

Reconfigurable Architecture Workstation

• MIT Computer Architecture Group

• Full RISC processor serves as the processing element.

• Routing decoupled into switch mode

• Parallelizing compiler used to distribute work load.

• Large amount of memory per tile.

RAW Tile

• Full functionality in each tile

• Static router in each tile handles near-neighbor communication

RAW Datapath

Raw Compiler

• Parallelizes compilation across multiple tiles

• Orchestrates communication between tiles

• Some dynamic (data dependent) routing possible.
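
Orchestrating communication at compile time can be illustrated with a toy scheduler. Everything here is an assumption made for illustration: tiles are (x, y) grid coordinates, routes go X-first then Y, each link carries one word per cycle, and a message is delayed until all its links are free. The slide does not describe the Raw compiler's actual algorithm or the switch instruction format, and the function names are hypothetical.

```python
def static_route(src, dst):
    """Return (tile, direction) hops from src to dst, moving along X first, then Y."""
    (x, y), (dst_x, dst_y) = src, dst
    hops = []
    while x != dst_x:
        step = 1 if dst_x > x else -1
        hops.append(((x, y), "E" if step == 1 else "W"))
        x += step
    while y != dst_y:
        step = 1 if dst_y > y else -1
        hops.append(((x, y), "S" if step == 1 else "N"))
        y += step
    return hops


def build_schedule(messages):
    """Assign every hop of every (src, dst) message a cycle so no link is used twice."""
    busy = set()      # (cycle, tile, direction) slots already claimed
    schedule = []
    for src, dst in messages:
        hops = static_route(src, dst)
        start = 0
        # Delay the whole message until each of its links is free when needed.
        while any((start + i, tile, d) in busy for i, (tile, d) in enumerate(hops)):
            start += 1
        for i, (tile, d) in enumerate(hops):
            busy.add((start + i, tile, d))
            schedule.append((start + i, tile, d))
    return sorted(schedule)


if __name__ == "__main__":
    for slot in build_schedule([((0, 0), (2, 1)), ((0, 0), (1, 0))]):
        print(slot)
```

Because the schedule is fixed before execution, each switch only needs to replay its own list of (cycle, direction) entries; data-dependent traffic would need a separate dynamic mechanism.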

Summary

• Architectures moving in the direction of coarse-grained blocks.

• Latest trend is functional pipeline.

• Communication determined at compile time

• Software support still a major issue.