Complex Programmable Logic Device

ECE 506 Reconfigurable Computing http://www.ece.arizona.edu/~ece506 Lecture 3 Reconfigurable Architectures Ali Akoglu


Transcript of Complex Programmable Logic Device

Page 1: Complex Programmable Logic Device

ECE 506

Reconfigurable Computing

http://www.ece.arizona.edu/~ece506

Lecture 3

Reconfigurable Architectures

Ali Akoglu

Page 2: Complex Programmable Logic Device

Complex Programmable Logic Device

° Hierarchical design against size explosion of PLAs
• Combinational logic with flip-flops (registered output)
• Organized into logic blocks connected by an interconnect matrix
• Usually enough logic for simple counters, state machines, decoders, etc.

Page 3: Complex Programmable Logic Device

Xilinx CoolRunner II CPLD

° PLA and macrocell combination
° 1.8 V device, estimated current consumption of less than 100 µA
° Up to 12,000 gates, 512 macrocells

Page 4: Complex Programmable Logic Device

CPLD

° Multiple Function Blocks (FBs) and I/O Blocks (IOBs)
• Fully interconnected (FB outputs and input signals routed to the FB inputs)
• Each FB provides programmable logic with 54 inputs and 18 outputs.
° The IOBs provide buffering for device inputs and outputs.
° Output enable signals drive directly to the IOBs.

Page 5: Complex Programmable Logic Device

Function Block

° Composed of 18 independent macrocells
• Each can implement a combinatorial or registered function.
° Logic within the FB is implemented using a sum-of-products representation.
• Fifty-four inputs (108 true and complement signals) feed the programmable AND-array to form 90 product terms.
• Any number of these product terms can be allocated to each macrocell by the product term allocator.

How many product terms would you assign for each Macrocell?
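The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes an even split of the product terms across the macrocells; the constant names are illustrative and not taken from any vendor tool.

```python
# Figures taken from the Function Block slide.
FB_INPUTS = 54
MACROCELLS_PER_FB = 18
PRODUCT_TERMS_PER_FB = 90

# Each input is available in true and complemented form.
and_array_signals = 2 * FB_INPUTS

# Assumption: product terms are split evenly across the macrocells.
direct_terms_per_macrocell = PRODUCT_TERMS_PER_FB // MACROCELLS_PER_FB

print(and_array_signals)           # 108
print(direct_terms_per_macrocell)  # 5
```

An even split gives five direct product terms per macrocell; the product term allocator exists for functions that need more than that.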

Page 6: Complex Programmable Logic Device

Macrocell

° The product term allocator selects how the 5 direct product terms are used:
• as primary data inputs to the OR gate for combinatorial functions, or
• as control inputs (clock, clock enable, set, reset, output enable).
° Each macrocell can be configured for a combinatorial or registered function.

Page 7: Complex Programmable Logic Device

Product Term Allocator

° Controls how the five direct product terms are assigned to each macrocell (MC).
• For example, all five direct terms can drive the OR function.

Page 8: Complex Programmable Logic Device

Product Term Allocator

° Can re-assign other product terms within the FB to increase the logic capacity of a macrocell beyond five direct terms.

° Any macrocell requiring additional product terms can access uncommitted product terms in other macrocells within the FB.

° Up to 15 product terms can be available to a single macrocell with only a small incremental delay (tPTA)

Page 9: Complex Programmable Logic Device

Product Term Allocator

Page 10: Complex Programmable Logic Device

Product Term Allocator

° Can re-assign product terms from any macrocell within the FB by combining partial sums of products over several macrocells
• What is the incremental delay in this example? 2 × tPTA

If all 90 product terms are available to any macrocell, what is the maximum incremental delay?

Page 11: Complex Programmable Logic Device

Programmability Options

° PLDs and CPLDs have different types of programmability.
• initial programming and reprogramming
° One-time programmable:
• device is programmed once and holds its programming "forever"
• usually uses fuses to make/break links
• not reusable, but usually the cheapest
• discard device if changes are to be made

Page 12: Complex Programmable Logic Device

Programmability Options

° UV-Erasable (EPROM)
• a floating gate is positioned between the regular MOS transistor control gate and the channel.
• initially, the floating gate is uncharged
° To program the cell:
• a high voltage (e.g. 14 volts) is applied to the control gate (drain is at ~12 volts).
• this causes current to flow between the source and drain.
• electrons are accelerated to high velocity, and a small fraction of them traverse the thin oxide and become trapped on the floating gate.
• the floating gate, surrounded by an insulating layer, becomes “permanently” negatively charged and the transistor is permanently turned off.
° “Permanent” means about 10 years at 125 degrees C; at higher temperatures this time is reduced.
° Cells are erased by ultraviolet (UV) light.
• electrons on the floating gates are excited and discharged to the substrate.

Page 13: Complex Programmable Logic Device

Programmability Options

° Electrically Erasable (EEPROM)
• uses a floating gate structure with a control gate on top.
• both erasing and reprogramming are accomplished with an electrical current
• device can be programmed/erased on the circuit board; no special packaging or IC socket is needed
• erase time is much faster than UV erase
• programming is retained after power down
- non-volatile
• programming/erasing limited to 1000s of cycles

Page 15: Complex Programmable Logic Device

Electrically Erasable PLDs

° Conventional PLDs are either
• One-time programmable
• UV erasable
° Conventional PLDs must be placed in a programmer to program them
° EE PLDs can be programmed and erased in place
• A small (four-wire) connection to a computer is needed
• Once programmed, will retain program indefinitely
• Never have to take the chip out of its circuit

Page 16: Complex Programmable Logic Device

FPGA

° Introduced in 1985 by Xilinx
° Similar to CPLDs
° A function to be implemented in an FPGA is
• Partitioned into modules, each implemented in a logic block.
• Logic blocks are connected with the programmable interconnect.

Page 17: Complex Programmable Logic Device

FPGA Technology

° 1) Antifuse-based
• Realization of interconnections
° 2) Memory-based
• Realization of interconnections and computation
• Flash, SRAM

Page 18: Complex Programmable Logic Device

FPGA Technology

° Antifuse FPGAs:
• configured by burning a set of antifuses.
• once configured, cannot be altered
• bug fixes and updates are possible for new PCBs, but hardly for already manufactured boards.
• ASIC replacement for small volumes.
° Flash FPGAs
• may be re-programmed several thousand times and are non-volatile
• expensive; re-configuration takes several seconds
° SRAM FPGAs
• dominant technology
• unlimited re-programming
• additional circuitry is required to load the configuration into the FPGA after power on
• re-configuration is very fast
• some devices even allow partial re-configuration during operation

Page 19: Complex Programmable Logic Device

Antifuse (Actel FPGA)

° An antifuse is normally an open circuit.
° Two-terminal element connected to the upper and lower layers of the antifuse; in the middle is a dielectric (oxide-nitride-oxide, ONO) layer
° Initial state:
• High resistance of the dielectric does not allow any current to flow.
° Applying a high voltage:
• causes large power dissipation and melts the dielectric
• drastically reduces the resistance
• a link is built, which permanently connects the two layers.

Page 20: Complex Programmable Logic Device

Antifuse chips

° Advantages
• Small area
- With metal-to-metal antifuses, no silicon area is required to make connections, decreasing the area overhead of programmability.
• Much lower resistance and parasitic capacitance than transistors.
- possible to include more switches per device
- reduces the RC delays in the routing.
• No bitstream can be intercepted in the field (no bitstream transfer)
- Need a scanning electron microscope to try to determine antifuse states (an Actel AX2000 antifuse FPGA contains 53 million antifuses with only 2-5% programmed in an average design)
• Interconnect structure is naturally “rad hard”
- relatively immune to the effects of radiation (except flip-flops!)
- an SRAM-based component can be “flipped” if hit by radiation

Page 21: Complex Programmable Logic Device

Antifuse chips

° Disadvantages
• not suitable for devices that must be frequently reprogrammed
• one-time programmable FPGAs.
• special programmers must be used to program a device before it is mounted on a final product
• programming involves significant changes to the properties of the materials in the fuse
- leads to scaling challenges when new IC fabrication processes are considered

Page 22: Complex Programmable Logic Device

Programmability Options

° Static Random Access Memory (SRAM) Programming:
• Switch is a pass transistor controlled by the state of the SRAM bit
• Logic block configuration bits are stored in SRAM
• can be reprogrammed an unlimited number of times
• use of standard CMOS process technology
- SRAM cells are created using exactly the same CMOS technologies as the rest of the device.
- No special processing steps are required in order to create these components.
- benefit from the increased integration, higher speeds and lower dynamic power consumption of new processes with smaller minimum geometries.

Page 23: Complex Programmable Logic Device

Programmability Options

° SRAM Volatility
• programming contents NOT retained after power down
• external non-volatile memory device required on power up
° SRAM Size
• An SRAM cell requires either 5 or 6 transistors, and the programmable element used to interconnect signals requires at least a single transistor.
° SRAM Security
• Since the configuration information must be loaded into the device at power up, there is the possibility that the configuration information could be intercepted and stolen for use in a competing system.

Page 24: Complex Programmable Logic Device

Programmability Options

° Flash Programming:
• alternative that addresses some of the shortcomings of SRAM
° Use of floating gate programming technologies
• inject charge onto a gate that “floats” above the transistor.
° Non-volatile
• eliminates the need for external storage of configuration data
• can function immediately upon power-up
° Area efficiency
• Area overhead: the programming circuitry (high and low voltage buffers) needed to program the cell
• Cost is relatively modest as it is amortized across numerous programmable elements.

Page 25: Complex Programmable Logic Device

Programmability Options

° Flash-based devices cannot be reprogrammed an infinite number of times.
• Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed
° Non-standard CMOS process.
• around five additional process steps on top of standard CMOS
• process technology lags behind SRAM-based devices by one or more generations.
° Programming time is about three times that of an SRAM-based component.
° High resistance and capacitance due to the use of transistor-based switches.
° Solution: on-chip flash memory to provide non-volatile storage, with SRAM cells to control the programmable elements in the design.

Page 26: Complex Programmable Logic Device

Programmability Options

° An ideal technology would be
• non-volatile
• reprogrammable
• built using a standard CMOS process
• able to offer low on-resistances and low parasitic capacitances.

Page 27: Complex Programmable Logic Device

FPGA Components

° How can we implement any circuit in an FPGA?
• Example: Half adder
- Combinational logic represented by a truth table
- What kind of hardware can implement a truth table?

Input   Out
A  B    S  C
0  0    0  0
0  1    1  0
1  0    1  0
1  1    0  1

Page 28: Complex Programmable Logic Device

FPGA Components

° Lookup Table (LUT)
° Implement truth tables in small memories (LUTs)
• Usually SRAM
° A function is implemented by writing all possible values that the function can take into the LUT
° The input values are used to address the LUT and retrieve the value of the function corresponding to those input values

[Figure: two 2-input, 1-output LUTs, each addressed by A and B]

Addr (A B)   S LUT output   C LUT output
00           0              0
01           1              0
10           1              0
11           0              1
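The addressing idea on this slide can be sketched in a few lines of Python. This is only an illustrative software model, with A taken as the most significant address bit; the helper names `lut_read` and `half_adder` are made up for the example.

```python
# Contents of the two 2-input, 1-output LUTs from the slide,
# indexed by the address formed from inputs A (MSB) and B (LSB).
S_LUT = [0, 1, 1, 0]  # sum   = A XOR B
C_LUT = [0, 0, 0, 1]  # carry = A AND B


def lut_read(lut, a, b):
    """Use the input values as an address and return the stored bit."""
    addr = (a << 1) | b
    return lut[addr]


def half_adder(a, b):
    """Half adder built from two LUT reads sharing the same address."""
    return lut_read(S_LUT, a, b), lut_read(C_LUT, a, b)


for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, "->", s, c)
```

No gate is ever evaluated: changing the function only means writing different bits into the list, which is what configuring an SRAM-based LUT amounts to.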

Page 29: Complex Programmable Logic Device

FPGA Components

° Alternatively, could have used a 2-input, 2-output LUT
• Outputs commonly use the same inputs

[Figure: the two 2-input, 1-output LUTs merged into one 2-input, 2-output LUT addressed by A and B]

Addr (A B)   S  C
00           0  0
01           1  0
10           1  0
11           0  1

Page 30: Complex Programmable Logic Device

FPGA Components

° Slightly bigger example: Full adder
• Combinational logic can be implemented in a LUT with the same number of inputs and outputs
- 3-input, 2-output LUT

Truth table (also the contents of the 3-input, 2-output LUT, addressed by A, B, Cin)

Inputs       Outputs
A  B  Cin    S  Cout
0  0  0      0  0
0  0  1      1  0
0  1  0      1  0
0  1  1      0  1
1  0  0      1  0
1  0  1      0  1
1  1  0      0  1
1  1  1      1  1
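As a rough illustration of how LUT contents follow directly from the truth table, the sketch below fills a 3-input, 2-output LUT by evaluating the full adder on every input combination. The function names are hypothetical, and A is assumed to be the most significant address bit.

```python
from itertools import product


def build_lut(func, num_inputs):
    """Fill a LUT by evaluating func on every input combination.

    Entry i holds the output tuple for the inputs whose bits,
    read MSB first, spell the address i.
    """
    return [func(*bits) for bits in product((0, 1), repeat=num_inputs)]


def full_adder_logic(a, b, cin):
    """Reference behaviour used only to generate the LUT contents."""
    total = a + b + cin
    return total & 1, total >> 1  # (S, Cout)


FULL_ADDER_LUT = build_lut(full_adder_logic, 3)


def full_adder(a, b, cin):
    """Evaluate the full adder by a single LUT read."""
    addr = (a << 2) | (b << 1) | cin
    return FULL_ADDER_LUT[addr]


print(FULL_ADDER_LUT)
```

The printed list reproduces the S and Cout columns of the truth table row by row.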

Page 31: Complex Programmable Logic Device

FPGA Components

° LUT Example: Implement the function ABD + BCD + ABC
• 2-input LUTs
• 3-input LUTs
• 4-input LUTs
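One possible way to work this exercise is sketched below. It takes the expression exactly as printed (any complement bars lost in the transcript are not modelled, which is an assumption), builds the single 4-input LUT, and checks one hypothetical decomposition into two 3-input LUTs based on the factoring B(AD + CD + AC). The decomposition is an illustration, not the solution given in the lecture.

```python
from itertools import product


def f(a, b, c, d):
    # Function exactly as printed on the slide: ABD + BCD + ABC.
    return (a & b & d) | (b & c & d) | (a & b & c)


# One 4-input LUT: 16 entries, address = A B C D (A is the MSB).
LUT4 = [f(*bits) for bits in product((0, 1), repeat=4)]

# Two 3-input LUTs (hypothetical decomposition, not from the slide):
# f = B * (AD + CD + AC), so the first LUT computes g(A, C, D) and
# the second combines g with B (its third input is unused).
G_LUT3 = [(a & d) | (c & d) | (a & c) for a, c, d in product((0, 1), repeat=3)]
F_LUT3 = [b & g for b, g, _unused in product((0, 1), repeat=3)]


def f_two_level(a, b, c, d):
    """Evaluate f with two chained 3-input LUT reads."""
    g = G_LUT3[(a << 2) | (c << 1) | d]
    return F_LUT3[(b << 2) | (g << 1) | 0]


minterms = [addr for addr, bit in enumerate(LUT4) if bit]
print(minterms)  # [7, 13, 14, 15]
```

The 4-input LUT needs 16 SRAM bits and one LUT delay; the two 3-input LUTs also need 16 bits in total but add a second level of delay.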

Page 32: Complex Programmable Logic Device

FPGA Components

° LUTs are used as function generators
° How many SRAM locations does a k-input LUT have? $2^k$
° How many different functions can a k-input LUT implement? $2^{2^k}$

[Figure: 2-input LUTs for S and C, addressed by A and B]
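The two answers can be tabulated for small k. This is a minimal sketch for single-output LUTs; the function names are chosen for the example.

```python
def lut_locations(k):
    """Number of SRAM locations in a k-input, 1-output LUT."""
    return 2 ** k


def lut_functions(k):
    """Number of distinct Boolean functions of k inputs.

    Each of the 2**k locations can independently hold 0 or 1.
    """
    return 2 ** lut_locations(k)


for k in range(1, 7):
    print(k, lut_locations(k), lut_functions(k))
```

For k = 2 this gives 4 locations and 16 functions; for k = 4 it gives 16 locations and 65,536 functions.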

Page 33: Complex Programmable Logic Device

FPGA Components

° Why aren’t FPGAs just a big LUT?
° Size of truth table
• grows exponentially with the # of inputs
• 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
• Same number of rows in truth table and LUT
• LUTs grow exponentially with the # of inputs
° Number of SRAM bits in a LUT
• = $2^i \times o$
• i = # of inputs, o = # of outputs
• Example: 64-input combinational logic with 1 output would require $2^{64}$ SRAM bits
- $1.84 \times 10^{19}$
° Clearly, not feasible to use large LUTs
• So, how do FPGAs implement logic with many inputs?
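The growth is easy to reproduce numerically. The sketch below simply evaluates the bit-count formula from this slide; `lut_sram_bits` is a name chosen for the example.

```python
def lut_sram_bits(num_inputs, num_outputs=1):
    """SRAM bits in a LUT: 2**i rows, one bit per output in each row."""
    return (2 ** num_inputs) * num_outputs


for i in (3, 4, 5, 6, 64):
    print(i, lut_sram_bits(i))

# The 64-input case in scientific notation.
print(f"{lut_sram_bits(64):.2e}")  # 1.84e+19
```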

Page 34: Complex Programmable Logic Device

FPGA Components

° Fortunately, we can map circuits onto multiple LUTs
• Divide circuit into smaller circuits that fit in LUTs (same # of inputs and outputs)
• Example: 3-input, 2-output LUTs
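The circuit used in the lecture figure is not recoverable from the transcript, so as a stand-in the sketch below maps a 4-bit ripple-carry adder onto four 3-input, 2-output LUTs, one full adder per bit. The choice of circuit and the name `ripple_add` are assumptions made for illustration.

```python
# Contents of one 3-input, 2-output LUT configured as a full adder.
# Address = (A, B, Cin) with A as the MSB; each entry is (S, Cout).
FULL_ADDER_LUT = [
    (0, 0), (1, 0), (1, 0), (0, 1),
    (1, 0), (0, 1), (0, 1), (1, 1),
]


def ripple_add(x, y, width=4):
    """Add two width-bit numbers using one LUT read per bit position."""
    carry = 0
    result = 0
    for bit in range(width):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        # The carry out of this LUT feeds the Cin of the next one.
        s, carry = FULL_ADDER_LUT[(a << 2) | (b << 1) | carry]
        result |= s << bit
    return result, carry


print(ripple_add(5, 6))  # (11, 0)
print(ripple_add(9, 8))  # (1, 1)
```

A single LUT for the same 8-input, 5-output adder would need 2^8 × 5 = 1280 SRAM bits, whereas the four small LUTs need 4 × 2^3 × 2 = 64 bits, at the cost of four LUT delays along the carry chain.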

Page 35: Complex Programmable Logic Device

FPGA Components

° Large LUTs
• Fast when using all inputs
• Waste transistors otherwise
° Must also consider total chip area
• Wasting transistors may be OK if there are plenty of LUTs

Page 36: Complex Programmable Logic Device

FPGA Components

° What if the circuit doesn’t map perfectly?
• More inputs in LUT than in circuit
- Truth table handles this problem
• More outputs in LUT than in circuit
- Extra outputs simply not used
– Space is wasted, so should use multiple outputs whenever possible
° Important Point
• The number of gates in a circuit has no effect on the mapping into a LUT
- All that matters is the number of inputs and outputs
- Unfortunately, it isn’t common to see large circuits with a few inputs

[Figure: 1 gate vs. 1,000,000 gates]

Page 37: Complex Programmable Logic Device

FPGA Components

° LUT Realization
° A LUT is basically a multiplexer that evaluates the truth table stored in the configuration SRAM cells (can be seen as a one-bit-wide ROM).
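The multiplexer view can be modelled as a tree of 2-to-1 multiplexers whose leaves are the SRAM cells. This is a behavioural sketch only; the real circuit structure varies by device, and the function names are illustrative.

```python
def mux2(sel, in0, in1):
    """2-to-1 multiplexer."""
    return in1 if sel else in0


def lut_mux_tree(sram_bits, inputs):
    """Evaluate a k-input LUT as a tree of 2-to-1 multiplexers.

    sram_bits: 2**k configuration bits (the truth table column).
    inputs: k input bits, most significant address bit first.
    """
    level = list(sram_bits)
    # The last input selects between neighbouring cells; the first
    # input drives the final multiplexer at the root of the tree.
    for sel in reversed(inputs):
        level = [mux2(sel, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


XOR_BITS = [0, 1, 1, 0]
print(lut_mux_tree(XOR_BITS, [1, 0]))  # 1
```

A k-input LUT built this way uses 2^k - 1 two-input multiplexers in k levels, so its delay grows with k while its storage grows with 2^k.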

Page 38: Complex Programmable Logic Device

° QUIZ2

Page 39: Complex Programmable Logic Device

FPGA Components

° Example:
• Determine the best LUTs for the following circuit
- Choices
– 4-input, 2-output LUT (delay = 2 ns)
– 6-input, 2-output LUT (delay = 3 ns)
- Assume each SRAM cell is 6 transistors
– 4-input LUT = $6 \times 2^4 \times 2 = 192$ transistors
– 6-input LUT = $6 \times 2^6 \times 2 = 768$ transistors

Page 40: Complex Programmable Logic Device

FPGA Components

° Example (continued): 6-input, 2-output LUT solution
• Propagation delay = 3 ns
• Total transistors = $6 \times 2^6 \times 2 = 768$

Page 41: Complex Programmable Logic Device

FPGA Components

° Example (continued): 4-input, 2-output LUT solution
• Propagation delay = 4 ns (two LUTs in series at 2 ns each)
• Total transistors = $2 \times 192 = 384$
° The 6-input LUT solution is about 1.3x faster (3 ns vs. 4 ns) but uses twice the transistors ($6 \times 2^6 \times 2 = 768$ vs. 384)
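The comparison can be recomputed from the stated formula (6 transistors per SRAM cell, 2^i rows, o outputs). The sketch assumes the circuit maps to two 4-input LUTs in series or to a single 6-input LUT, which is what the 4 ns and 3 ns delays imply; multiplexer transistors are ignored, as on the slide.

```python
TRANSISTORS_PER_SRAM_CELL = 6  # assumption stated on the slide


def lut_transistors(num_inputs, num_outputs):
    """Transistors in the SRAM cells of one LUT (mux logic ignored)."""
    return TRANSISTORS_PER_SRAM_CELL * (2 ** num_inputs) * num_outputs


def solution_cost(num_inputs, num_outputs, lut_delay_ns, luts_used, levels):
    """Return (delay in ns, total transistors) for a mapping."""
    delay = lut_delay_ns * levels
    area = lut_transistors(num_inputs, num_outputs) * luts_used
    return delay, area


# Assumed mappings: two 4-input LUTs in series, or a single 6-input LUT.
four_input = solution_cost(4, 2, 2, luts_used=2, levels=2)
six_input = solution_cost(6, 2, 3, luts_used=1, levels=1)
print(four_input)  # (4, 384)
print(six_input)   # (3, 768)
```

Under these assumptions the 6-input mapping is faster by a factor of 4/3 and larger by a factor of 2, so the better choice depends on whether delay or area matters more.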

Page 42: Complex Programmable Logic Device

FPGA Components

° Problem: How to handle sequential logic
• Truth tables don’t work
° Possible solution:
• Add a flip-flop to the output of the LUT
° BLE: the basic logic element
• Circuit can now use output from LUT or from FF
• Where does select come from?
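A behavioural sketch of a BLE is given below. It assumes the multiplexer select is itself a configuration bit stored alongside the LUT contents, which is the usual arrangement in SRAM-based FPGAs; the class and attribute names are invented for the example.

```python
class BLE:
    """Basic logic element: k-input LUT, D flip-flop, output mux."""

    def __init__(self, lut_bits, use_ff):
        self.lut_bits = list(lut_bits)  # configuration: truth table
        self.use_ff = use_ff            # configuration: mux select bit
        self.ff = 0                     # flip-flop state, reset to 0

    def lut(self, inputs):
        addr = 0
        for bit in inputs:              # first input is the MSB
            addr = (addr << 1) | bit
        return self.lut_bits[addr]

    def output(self, inputs):
        """BLE output: registered or combinational, per the select bit."""
        return self.ff if self.use_ff else self.lut(inputs)

    def clock(self, inputs):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut(inputs)


# A 1-bit toggle counter: the LUT inverts its input, which is fed
# back from the BLE's own registered output.
toggle = BLE(lut_bits=[1, 0], use_ff=True)
states = []
for _ in range(4):
    toggle.clock([toggle.output([0])])
    states.append(toggle.output([0]))
print(states)  # [1, 0, 1, 0]
```

With `use_ff=False` the same element behaves as a plain LUT, so one configuration bit decides whether the BLE implements combinational or sequential logic.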