FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011

About FPGA

- The FPGA was invented by Ross Freeman; his patent was granted in 1989†
  - SRAM-based
- FPGA properties:
  - Standard parts
  - Allow multi-level logic implementation
  - Composed of programmable logic blocks and interconnects
  - Some complex FPGAs also include non-programmable logic blocks (such as processor cores, MAC units, and SRAMs) to improve efficiency; these are called “Platform FPGAs”

†R. H. Freeman, “Configurable Electrical Circuit Having Configurable Logic Elements and Configurable Interconnect,” U.S. Patent 4,870,302, Sep. 26, 1989

Electronic Logic Components

Logic
  General-purpose ICs
  ASICs
    Gate arrays
    Cell-based ICs
    Full-custom ICs
  Programmable logic devices
    SPLDs (PALs)
    CPLDs
    FPGAs

Programmable Array Logic (PAL)

- PAL is a special case of sum-of-products logic in which the AND array is programmable and the OR array is fixed
- Each input is buffered, with a non-inverted and an inverted output, and drives many AND gates

[Figure: input buffer with non-inverted and inverted outputs; AND gate symbols in PAL notation for inputs A, B, C]

Function Implementation Using PAL

- Combinational PALs have 10 to 20 inputs and 2 to 10 outputs, with 2 to 8 AND gates driving each OR gate
- Sequential PALs have extra D flip-flops whose inputs are driven from the programmable array logic

[Figure: a full adder implemented in a PAL]
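
The full-adder figure did not survive extraction, so the following is a minimal sketch of the same idea rather than the slide's circuit. It assumes a hypothetical PAL model in which each product term is a dictionary of required literal values (the programmable AND array) and each output ORs a fixed group of terms. The names `and_term`, `pal_output`, `SUM_TERMS`, and `COUT_TERMS` are illustrative only.

```python
from itertools import product

# Hypothetical PAL model (not taken from the slides):
# a product term such as {"A": 1, "B": 0} means A AND (NOT B).
def and_term(term, inputs):
    """One AND gate of the programmable AND array."""
    return all(inputs[name] == value for name, value in term.items())

def pal_output(terms, inputs):
    """Fixed OR gate: ORs the product terms wired to this output."""
    return int(any(and_term(term, inputs) for term in terms))

# "Programming" of the AND array for a full adder with inputs A, B, Cin.
SUM_TERMS = [
    {"A": 1, "B": 0, "Cin": 0},
    {"A": 0, "B": 1, "Cin": 0},
    {"A": 0, "B": 0, "Cin": 1},
    {"A": 1, "B": 1, "Cin": 1},
]
COUT_TERMS = [
    {"A": 1, "B": 1},
    {"A": 1, "Cin": 1},
    {"B": 1, "Cin": 1},
]

def full_adder(a, b, cin):
    inputs = {"A": a, "B": b, "Cin": cin}
    return pal_output(SUM_TERMS, inputs), pal_output(COUT_TERMS, inputs)

if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        # The two output bits must encode the arithmetic sum.
        assert 2 * cout + s == a + b + cin
        print(a, b, cin, "->", "sum", s, "cout", cout)
```

The sum output needs four product terms and the carry output needs three, both within the 2 to 8 AND gates per OR gate quoted above.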

Complex Programmable Logic Devices

- If several PLDs, along with some flip-flops, are put into a single IC, we have a complex programmable logic device (CPLD) that can be used to implement a small digital system
- Example: Xilinx CoolRunner

[Figure: Xilinx CoolRunner structure, showing a PAL block and a macrocell (MUXs and buffers)]

Field Programmable Gate Arrays

The basic idea of FPGAs is to interconnect small “truth tables” to form complex digital circuits

[Figure: three small truth tables (table 1, table 2, table 3), each mapping inputs A, B, C, D to an output Q, connected to one another]
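
As a rough illustration of interconnected truth tables, the sketch below wires three 4-input tables into an 8-bit parity circuit. The choice of parity and the names `make_table`, `parity4`, `combine`, and `parity8` are assumptions for this example; the slide's own table contents are not recoverable.

```python
from itertools import product

def make_table(func, n_inputs=4):
    """Tabulate func over every input combination, i.e. build a truth table."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}

# Tables 1 and 2 hold the same contents: the parity of four inputs.
parity4 = make_table(lambda a, b, c, d: a ^ b ^ c ^ d)
# Table 3 XORs its first two inputs; the other two inputs are unused.
combine = make_table(lambda a, b, c, d: a ^ b)

def parity8(bits):
    """8-bit parity built purely from table lookups and wiring."""
    q1 = parity4[tuple(bits[:4])]   # table 1
    q2 = parity4[tuple(bits[4:])]   # table 2
    return combine[(q1, q2, 0, 0)]  # table 3

if __name__ == "__main__":
    for bits in product((0, 1), repeat=8):
        assert parity8(list(bits)) == sum(bits) % 2
    print("8-bit parity matches for all 256 inputs")
```

No logic gate is evaluated at "run time": every table is just stored data, and the circuit's behavior comes from the table contents plus the connections between tables.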

Logic Design with FPGA

- A digital design on an FPGA is composed of three parts:
  - Logic elements (LE)
  - Interconnect
  - I/O blocks (IOB)
- An FPGA configuration is similar to a program for a microprocessor
  - It specifies the “functional units” and the “interconnects” between functional units

[Figure: a 3×3 array of LEs joined by interconnect and surrounded by IOBs]

CPU vs. FPGA

Microprocessors and FPGAs are programmed in different ways

[Figure: a CPU connected to a memory holding instructions and data, next to an array of logic blocks set up by FPGA program bits]

Logic Elements

- A logic element (LE) is more capable than a logic gate
  - A simple LE can be programmed to behave as an n-input, m-output function (for example, n = 4, m = 1); such LEs are called “fine-grained” LEs (relatively speaking, these LEs are still “coarse” compared to a gate)
  - Many FPGAs include distributed register bits around the LE
- An FPGA may provide specialized complex LE blocks, such as multipliers, SRAMs, or processors
  - These are all called coarse-grained LEs
- A “platform FPGA” is composed of both fine-grained and coarse-grained LEs

Generic Logic Elements

Example of a fine-grained logic element structure:

[Figure: one LE picked out of the array of LEs, IOBs, and interconnect. Labeled parts of the LE: inputs, lookup table (LUT), D flip-flop (D, Q), configuration bit, LE out]
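
A minimal behavioral sketch of such a logic element follows. It assumes that the configuration bit selects between the combinational LUT output and the flip-flop output; this is a common arrangement, but the figure's exact wiring is not recoverable. The class `LogicElement` and its methods are hypothetical names.

```python
class LogicElement:
    """Toy LE: a 4-input LUT, a D flip-flop, and one output-select bit."""

    def __init__(self, lut_bits, registered):
        assert len(lut_bits) == 16
        self.lut_bits = list(lut_bits)  # configuration: the truth table
        self.registered = registered    # configuration bit (assumed role)
        self.q = 0                      # flip-flop state

    def lut(self, a, b, c, d):
        """Look up the table entry addressed by the four inputs."""
        return self.lut_bits[(a << 3) | (b << 2) | (c << 1) | d]

    def out(self, a, b, c, d):
        """LE output: flip-flop Q if registered, else the raw LUT output."""
        return self.q if self.registered else self.lut(a, b, c, d)

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)

# Combinational use: a 4-input AND (only entry 1111 is 1).
and4 = LogicElement([0] * 15 + [1], registered=False)

# Sequential use: the LUT inverts input d, and d is fed back from Q,
# so the LE toggles on every clock edge.
toggle = LogicElement([1 - (i & 1) for i in range(16)], registered=True)
seq = []
for _ in range(4):
    toggle.clock(0, 0, 0, toggle.q)
    seq.append(toggle.out(0, 0, 0, 0))

if __name__ == "__main__":
    print(and4.out(1, 1, 1, 1), and4.out(1, 0, 1, 1), seq)
```

The same hardware acts as a gate or as a one-bit state machine depending only on its 16 table bits and one select bit.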

Function Implementation with LUT

- The datapath that implements F = ABC + ABC + AB and the corresponding LUT4 entries are shown below
- A function with more than 4 variables can always be decomposed into a network of functions of at most 4 variables (for example, by Shannon expansion), each of which fits in one LUT4

[Figure: datapath for F]

[Table: LUT4 entries with columns X1, X2, X3, X4, and F; red entries mark don’t-care inputs]
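
One standard way to split a function of more than 4 variables across LUT4s is Shannon expansion, F = (NOT x5)·F0 + x5·F1, where F0 and F1 are F with x5 fixed to 0 and 1. The sketch below applies it to a hypothetical 5-input majority function (not the slide's F, whose complement bars were lost); `lut4_bits`, `lut4_read`, and `majority5` are illustrative names.

```python
from itertools import product

def lut4_bits(func):
    """Return the 16 LUT entries of a 4-input function (index = x1 x2 x3 x4)."""
    return [func(*bits) for bits in product((0, 1), repeat=4)]

def lut4_read(bits, x1, x2, x3, x4):
    """Read one LUT entry; the inputs form the table address."""
    return bits[(x1 << 3) | (x2 << 2) | (x3 << 1) | x4]

# Hypothetical 5-variable function used only for illustration.
def majority5(x1, x2, x3, x4, x5):
    return int(x1 + x2 + x3 + x4 + x5 >= 3)

# Shannon expansion on x5: two LUTs hold the cofactors F0 and F1.
f0 = lut4_bits(lambda a, b, c, d: majority5(a, b, c, d, 0))
f1 = lut4_bits(lambda a, b, c, d: majority5(a, b, c, d, 1))
# A third LUT acts as a 2:1 multiplexer: inputs (F0 out, F1 out, x5, unused).
mux = lut4_bits(lambda p, q, sel, _unused: q if sel else p)

def majority5_from_luts(x1, x2, x3, x4, x5):
    p = lut4_read(f0, x1, x2, x3, x4)
    q = lut4_read(f1, x1, x2, x3, x4)
    return lut4_read(mux, p, q, x5, 0)

if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        assert majority5_from_luts(*bits) == majority5(*bits)
    print("three LUT4s reproduce the 5-input function")
```

Note that the combining stage is a multiplexer, not a plain OR: a 5-input AND, for instance, cannot be written as an OR of functions that each see only 4 of the inputs.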

Carry Chains in FPGA

Since addition is a very important operation, many FPGAs have dedicated circuitry for carry-bit calculation and propagation.
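
The slide does not show the carry circuit, so the following is only a sketch of one common arrangement: the LUT computes a per-bit propagate signal (a XOR b), and a dedicated multiplexer next to it either passes the incoming carry along or generates a new one. The function names are hypothetical.

```python
def add_with_carry_chain(a_bits, b_bits, cin=0):
    """Add two little-endian bit lists; returns (sum_bits, carry_out)."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                   # computed in the LUT
        sum_bits.append(propagate ^ carry)  # sum bit for this position
        # Dedicated carry mux: forward the carry when propagate is 1;
        # otherwise a == b, so the carry out equals a.
        carry = carry if propagate else a
    return sum_bits, carry

def to_bits(value, width):
    """Little-endian bit list of value."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

if __name__ == "__main__":
    width = 4
    for x in range(1 << width):
        for y in range(1 << width):
            s, cout = add_with_carry_chain(to_bits(x, width), to_bits(y, width))
            assert from_bits(s) + (cout << width) == x + y
    print("4-bit carry chain matches integer addition")
```

Only the carry path runs from bit to bit, which is why a short, dedicated wire for it pays off compared with routing the carry through general interconnect.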

Example: Spartan 2 Architecture (1/2)

A Xilinx Spartan device is composed of a 2-D array of Configurable Logic Blocks (CLBs)

Example: Spartan 2 Architecture (2/2)

In Spartan II, each CLB has two identical slices; each slice contains two logic cells, each with a LUT, carry logic, and a register

[Figure: one Spartan II slice. Each of the two logic cells has a lookup table, carry/control logic, and a D flip-flop. Inputs: G1 to G4, F1 to F4, F5IN, BY, BX, SR, CE, CLK, CIN. Outputs: Y, YB, YQ, X, XB, XQ, COUT]

Example: Spartan 2 I/O Blocks

Supports multiple I/O standards (PCI, AGP, etc.)

Logic Implementation on FPGA

- Logic synthesis
  - How do we break down a function and map it to logic elements?
  - How do we implement an operation within a logic element?
- Logic placement
  - Where do we put each piece of logic in the array of logic elements?

[Figure: 3×3 array of LEs]
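
To make the placement question concrete, here is a toy sketch that scores a placement by half-perimeter wirelength (a commonly used estimate, not something the slides specify) and searches a 3×3 LE array exhaustively. The netlist `NETS` and the block names are made up for illustration; real tools use heuristics because exhaustive search does not scale.

```python
from itertools import permutations

# Hypothetical netlist: each net lists the logic blocks it connects.
NETS = [("a", "b"), ("b", "c", "d"), ("a", "d")]
BLOCKS = ["a", "b", "c", "d"]
SITES = [(x, y) for y in range(3) for x in range(3)]  # 3x3 array of LEs

def wirelength(placement):
    """Half-perimeter wirelength: bounding-box width + height per net."""
    total = 0
    for net in NETS:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def best_placement():
    """Try every assignment of blocks to distinct sites; keep the cheapest."""
    best = None
    for sites in permutations(SITES, len(BLOCKS)):
        placement = dict(zip(BLOCKS, sites))
        cost = wirelength(placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

if __name__ == "__main__":
    corners = {"a": (0, 0), "b": (2, 0), "c": (0, 2), "d": (2, 2)}
    print("corner placement cost:", wirelength(corners))
    cost, placement = best_placement()
    print("best cost:", cost, "placement:", placement)
```

Placing the four blocks in the corners costs 10 units, while packing them into a 2×2 neighborhood brings the estimate down to 4.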

Interconnect Architecture

- On an FPGA, we must be able to control:
  - Connections from wiring channels to LEs
  - Connections between wires in the wiring channels

[Figure: two LEs attached to a wiring channel, with horizontal and vertical channels around them]

Programmable Wiring

- Wiring among LEs is organized into channels
  - Channels are arranged horizontally and vertically on the chip
  - There are many wires per channel
- Connections between wires are made at programmable interconnection points
- An EDA tool must choose:
  - Channels from source to destination
  - Wires within the channels

[Figure: array of LEs with labeled vertical channels 1, 3, and 5 and horizontal channels 2 and 3]
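
The routing choice an EDA tool makes can be sketched with a breadth-first (Lee-style) maze search on a grid, where each cell stands for a routing resource and `blocked` marks resources already in use. This is a simplified stand-in for a real FPGA router, and the function `route` and its parameters are hypothetical.

```python
from collections import deque

def route(grid_w, grid_h, source, dest, blocked):
    """Breadth-first search for a shortest free path; returns a list of cells."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == dest:
            # Walk the predecessor links back to the source.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None  # no free route exists

if __name__ == "__main__":
    used = {(2, 0), (2, 1)}  # resources taken by an earlier net
    path = route(5, 5, (0, 0), (4, 0), used)
    print("detour path:", path)
    # After routing a net, its cells become unavailable to later nets.
    used |= set(path)
```

The detour around the occupied cells illustrates the "extra length" cost of programmable wiring: the route needs 8 hops where a direct connection would need 4.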

Programmable vs Fixed Interconnect

Compared to the fixed-layout wiring of custom logic, FPGA interconnect has two major disadvantages:
- Switches add delay
- FPGA interconnect has extra length
  - The problem becomes worse as the logic becomes larger

Interconnect Strategies

- Types of wires:
  - Short wires: local LE connections
  - Global wires: long-distance, buffered communication
  - Special wires: clocks, etc.
- Use design hierarchy to guide the placement search
- Use hard macros where possible
  - A macro is a larger module designed to fit into a particular FPGA (similar to IP blocks for platform-based SoC)
  - A hard macro includes placement
  - A soft macro does not include placement
- Add placement constraints

FPGAs and I/O Pins

- Chip capacity is growing faster than package pinout
  - We can now put many hardware functions in an FPGA, but the total number of I/O pins is limited
  - We must try to share a small number of interface pins among functions
- Alternatively, one can use multiple smaller FPGAs to compose the same functions
  - It is harder to break down a design across FPGAs
  - The performance may be better due to shorter routing lengths

FPGA Configuration Technologies

An FPGA’s logic elements, interconnect switches, and I/O pins can be programmed using one of the following three technologies:
- SRAM-based
  - Can be programmed many times
  - Must be programmed after power-up
- Antifuse-based
  - Programmed once via a burn-in step
- Flash-based
  - Reprogrammable like SRAM, but the configuration is held in non-volatile flash memory

SRAM-based FPGAs

- Program logic functions and interconnect using SRAM to store Boolean tables and switch on/off states
- Advantages:
  - Re-programmable
  - Dynamically reconfigurable
  - Uses standard processes
- Disadvantages:
  - SRAM burns power
  - Configuration is lost at power-down (but not on reset!)
  - Possible to steal or disrupt configuration bits, just like the piracy and virus issues of software

Configuring SRAM-based FPGA

There are several ways to configure an FPGA:
- JTAG interface; not good for “turn-key” systems
- FPGA in master mode: the FPGA reads configuration data from a PROM
- FPGA in slave mode: a microcontroller configures the FPGA

Features of SRAM-based LUT

- An n-input LUT stores 2^n entries and can implement any function of n inputs
  - All logic functions take the same amount of space
  - All functions have the same delay
  - With CMOS custom logic, XOR is much slower than NAND; with an SRAM LUT, XOR is as fast (or as slow) as NAND
- SRAM is larger than the static-gate equivalent of a function
  - “Gate count” is not a good measure of FPGA logic cost
  - For a static gate, an n-input NAND/NOR gate has 2n transistors
  - For an FPGA LE, a 4-input LUT has 128 transistors in SRAM and 96 in the multiplexer
- Burns power even at idle
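
The figures above can be checked with a little arithmetic. The sketch below uses only the transistor counts quoted on the slide (128 for the SRAM and 96 for the multiplexer of a 4-input LUT, 2n for a static NAND/NOR gate); the helper names are hypothetical.

```python
def lut_entries(n_inputs):
    """Number of stored bits in an n-input LUT."""
    return 2 ** n_inputs

def lut_function_count(n_inputs):
    """Number of distinct Boolean functions an n-input LUT can hold."""
    return 2 ** lut_entries(n_inputs)

def static_gate_transistors(n_inputs):
    """Transistors in a static CMOS n-input NAND or NOR gate."""
    return 2 * n_inputs

# Transistor counts for a 4-input LUT, as quoted on the slide.
LUT4_SRAM_TRANSISTORS = 128
LUT4_MUX_TRANSISTORS = 96

lut4_total = LUT4_SRAM_TRANSISTORS + LUT4_MUX_TRANSISTORS
ratio = lut4_total / static_gate_transistors(4)

# XOR and NAND differ only in the stored bits, not in size or delay.
xor4 = [bin(i).count("1") % 2 for i in range(16)]
nand4 = [0 if i == 15 else 1 for i in range(16)]

if __name__ == "__main__":
    print("entries in a LUT4:", lut_entries(4))
    print("functions a LUT4 can implement:", lut_function_count(4))
    print("LUT4 transistors:", lut4_total)
    print("static NAND4 transistors:", static_gate_transistors(4))
    print("LUT4 / NAND4 ratio:", ratio)
```

With these numbers a LUT4 costs 224 transistors against 8 for a static 4-input NAND, a factor of 28, which is why gate count is a poor measure of FPGA logic cost.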

Platform FPGAs

- A complex system must be composed of hardware and software components
- To reduce system development and integration time, some chip companies have started to push “Platform FPGA” visions
- Two examples:
  - Xilinx has Virtex-II Pro, which provides a PowerPC-based platform FPGA
  - Altera has Excalibur, which features an ARM-based platform FPGA (a.k.a. System-on-Programmable-Chip, SoPC)

Xilinx Platform FPGA Vision

- Processing Platform:
  - PowerPC
  - D/I caches
  - Controllers
  - Interfaces
- DSP Platform:
  - Distributed RAM
  - 18×18 multipliers
  - 600 billion MACs/sec
- Connectivity Platform:
  - 100+ Gb bandwidth (I/O interfaces of the chip)
  - Rocket I/O (3.125 Gbps serial port)
  - High-speed parallel

Four Generations of Virtex Devices

[Figure: device complexity over time]

Year   Devices                              Role
1985   XC2000-XC3000                        Glue logic
1992   XC4000, Virtex                       System-level function blocks
2000   Virtex-II                            Platform FPGA
2002   Virtex-II Pro, Virtex-4, Virtex-5    Platform for programmable systems

Example: Platform FPGA Systems

A platform implementation with remote configuration capabilities†

†K. Park and H. Kim, “Remote FPGA Reconfiguration Using MicroBlaze or PowerPC Processors,” Xilinx XAPP441, Sep. 2006

FPGA Implementation Process

- Step 1: Design
  - Design entry methods: HDL (Verilog or VHDL) or schematic drawings
- Step 2: Create netlist (synthesis)
  - Translates V, VHD, and SCH files into a standard-format EDIF file
- Step 3: Physical design (implementation)
  - Translate, map, and place & route the netlist into the target device configuration bits
- Step 4: Configure the FPGA
  - Download the BIT file into the FPGA

FPGA Design Flow

In this class, Xilinx ISE Foundation is used as the logic design toolchain

[Figure: design flow. Specification -> Design Entry -> Synthesis -> Mapping -> Place & Route -> bit file -> FPGA, with testbench simulation, timing constraints, and static timing analysis as supporting steps]