L15: Custom and ASIC VLSI Integration
web.mit.edu/6.111/www/s2004/LECTURES/l15.pdf

Page 1: L15: Custom and ASIC VLSI Integration

6.111 Spring 2004, Introductory Digital Systems Laboratory

Acknowledgement:

J. Rabaey, A. Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective” Prentice Hall, 2003.

Curt Schurgers

[Figure: Average cost of one transistor ($, log scale from 0.0000001 to 10) versus year, 1968 to 2002. Source: Gordon Moore, Keynote Presentation at ISSCC 2003]

Page 2: Moore’s Law

In 1965, Gordon Moore was preparing a speech and made a memorable observation. When he started to graph data about the growth in memory chip performance, he realized there was a striking trend. Each new chip contained roughly twice as much capacity as its predecessor, and each chip was released within 18-24 months of the previous chip. If this trend continued, he reasoned, computing power would rise exponentially over relatively brief periods of time.

[Figure: Log2 of the number of components per integrated function (0 to 16) versus year, 1959 to 1975. Source: Electronics, April 19, 1965]

[Figure: Transistors shipped per year (log scale, 10^9 to 10^18) versus year, 1968 to 2002. Source: Dataquest/Intel, 12/02]

“…E.O. Wilson, the famous Harvard biologist who is an expert on ants, estimates that there are 10 to the 16th and 10 to the 17th ants on earth. But if you look at this curve, this year we’re making one transistor for every ant.” –Gordon Moore, “An Update on Moore’s Law”

Gordon Moore, Keynote Presentation at ISSCC 2003

[Figure: Transistors per chip (MT, log scale from 0.001 to 1000) versus year, 1970 to 2010, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, and P6]

2X growth in 1.96 years!

Courtesy of S. Borkar (Intel)

Page 3: Layout 101

[Figure: N-channel MOSFET and P-channel MOSFET shown as a 3-D cross-section, a circuit representation, and a layout. Labels: GND, VDD, IN, OUT, G, S, D, metal, poly, n+ diff, p+ diff, n-type well, p-type substrate, contact from metal to ndiff, metal/pdiff contact, transistor dimensions Wn, Ln, Wp, Lp]

Follow simple design rules (contract between process and circuit designers)

Page 4: Custom Design/Layout

[Figure: Die photograph of the Itanium integer datapath. Labels: bit slices 0 to 63; adder stages 1, 2, and 3 with wiring between them; sum select; shifter; multiplexers; loopback bus; from register files / cache / bypass; to register files / cache]

Bit-slice Design Methodology

Hand crafting the layout to achieve maximum clock rates (> 1 GHz)

Exploits regularity in datapath structure to optimize interconnects

[Figure: Adder schematic. Labels: 9-1 mux, 5-1 mux, 2-1 mux, ck1, CARRYGEN, SUMGEN + LU (LU: Logical Unit), SUM SEL, REG, a, b, s0, s1, g64, sum, sumb, node1, to Cache, 1000 um]

Itanium has 6 integer execution units like this

Page 5: The ASIC Approach

Design flow:

1. Design Capture: Verilog (or VHDL)
2. Logic Synthesis
3. Floorplanning
4. Placement
5. Routing
6. Tape-out

Abstraction levels: Behavioral, Structural, Physical

Verification steps: Pre-Layout Simulation, Circuit Extraction, Post-Layout Simulation, with Design Iteration loops back to earlier steps

Most Common Design Approach for Designs up to 500 MHz Clock Rates

Page 6: Standard Cell Example

3-input NAND cell (from ST Microelectronics):

C = Load capacitance

T = input rise/fall time

Power Supply Line (VDD)

Ground Supply Line (GND)

Each library cell (FF, NAND, NOR, INV, etc.), along with its size variations (strength of the gate), is fully characterized across temperature, loading, etc.

Delay in (ns)!!

Page 7: Standard Cell Layout Methodology

2-level metal technology: With limited interconnect layers, dedicated routing channels between rows of standard cells are needed

Current Day Technology: Cell-structure hidden under interconnect layers

Width of the cell allowed to vary to accommodate complexity

Interconnect plays a significant role in speed of a digital circuit

Page 8: Verilog to ASIC Layout (the push button approach)

module adder64 (a, b, sum);
  input [63:0] a, b;
  output [63:0] sum;

  assign sum = a + b;
endmodule

[Figure: The adder64 layout After Synthesis, After Placement, and After Routing]

Page 9: The “Design Closure” Problem

Iterative Removal of Timing Violations (white lines)

Wire-to-wire capacitance causes inter-wire delay dependencies

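[Figure: Bus wires with capacitance to ground CL and inter-wire capacitance CI. Labels: BUS, d1, d2, l1, l2, VDD]

$\lambda = \dfrac{C_I}{C_L} = 5$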
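To see why the ratio of inter-wire to ground capacitance matters, the following sketch applies the common Miller-factor approximation. This is an assumption brought in for illustration and is not stated on the slide: a neighbor switching in the same direction contributes no coupling load, a quiet neighbor contributes CI, and a neighbor switching in the opposite direction contributes 2·CI. The names `MILLER` and `effective_cap` are hypothetical.

```python
# Effective load seen by one bus wire with two neighbours, using the
# Miller-factor approximation (an assumption, not taken from the slide).
MILLER = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_cap(c_l, c_i, left, right):
    """Return C_L plus the coupling contribution of both neighbours."""
    return c_l + (MILLER[left] + MILLER[right]) * c_i


C_L = 1.0            # normalised capacitance to ground
LAMBDA = 5.0         # C_I / C_L, the ratio quoted on the slide
C_I = LAMBDA * C_L   # inter-wire capacitance

best = effective_cap(C_L, C_I, "same", "same")
worst = effective_cap(C_L, C_I, "opposite", "opposite")
print(best, worst)
```

Under this model the load on a wire, and so its delay, depends on what its neighbors are doing, which is one reason timing violations can reappear after each routing change.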

Page 10: Macro Modules

256×32 (or 8192 bit) SRAM Generated by hard-macro module generator

Generate highly regular structures (entire memories, multipliers, etc.) with a few lines of code

Verilog models for memories automatically generated based on size

Page 11: Clock Distribution

For a 1 GHz clock, the skew budget is 100 ps.

Variations along different paths arise from:

• Device: VT, W/L, etc.
• Environment: VDD, °C
• Interconnect: dielectric thickness variation

[Figure: Clock skew between registers (D, Q), courtesy Alpha]

[Figure: IBM Clock Routing]

Page 12: The Power Supply Wires are Not Ideal!

[Figure: Power distribution network. Labels: Pad, GROUND GRID, To VDD Grid, Driver, Receiver, Rd, Cd, Ccoup, Cint]

The IR-drop problem causes internal power supply voltage to be less than the external source

Courtesy David Blaauw

Page 13: Analog Circuits: Clock Frequency Multiplication (Phase Locked Loop)

Courtesy M. Perrott

VCO produces high frequency square wave

Divider divides down VCO frequency

PFD compares phase of ref and div

Loop filter extracts phase error information

Used widely in digital systems for clock synthesis (a standard IP block in most ASIC flows)

[Figure labels: PLL; Intel 486, 50 MHz; up; down]
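The loop described above can be sketched with a phase-domain (averaged) model. This is only a rough illustration: the proportional-only loop filter, the function name `simulate_pll`, its parameter names, and all numeric values are assumptions and are not taken from the slide.

```python
def simulate_pll(f_ref, n_div, f_free, k_vco, k_p, steps=200):
    """Averaged PLL model; returns the VCO frequency after `steps` reference cycles.

    f_free is the VCO frequency at zero control voltage, k_vco is in Hz/V,
    and k_p (V per cycle of phase error) stands in for the loop filter.
    """
    dt = 1.0 / f_ref              # update once per reference cycle
    phase_err = 0.0               # accumulated phase error, in cycles
    f_vco = f_free
    for _ in range(steps):
        f_div = f_vco / n_div                  # divider divides down the VCO frequency
        phase_err += (f_ref - f_div) * dt      # PFD: ref phase minus div phase
        v_ctrl = k_p * phase_err               # loop filter reduced to a plain gain
        f_vco = f_free + k_vco * v_ctrl        # VCO frequency follows the control voltage
    return f_vco


# Hypothetical numbers: 10 MHz reference, divide-by-5, VCO idling at 40 MHz.
f_locked = simulate_pll(f_ref=10e6, n_div=5, f_free=40e6, k_vco=20e6, k_p=0.25)
print(f_locked)
```

In this model the loop settles where the divided VCO frequency equals the reference, so the VCO runs at `n_div` times `f_ref`, which is the frequency multiplication named in the slide title.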

Page 14: Behavioral Transformations

There are a large number of implementations of the same functionality

Each implementation presents a different point in the area-time-power design space

Behavioral transformations allow exploring the design space at a high level

[Figure: Design space with axes area, time, and power]

Optimization metrics:

1. Area of the design
2. Throughput or sample time TS
3. Latency: clock cycles between the input and associated output change
4. Power consumption
5. Energy of executing a task
6. …

Page 15: Fixed-Coefficient Multiplication

Conventional Multiplication: Z = X · Y

                                 X3     X2     X1     X0
                                 Y3     Y2     Y1     Y0
                              X3·Y0  X2·Y0  X1·Y0  X0·Y0
                       X3·Y1  X2·Y1  X1·Y1  X0·Y1
                X3·Y2  X2·Y2  X1·Y2  X0·Y2
         X3·Y3  X2·Y3  X1·Y3  X0·Y3
     Z7     Z6     Z5     Z4     Z3     Z2     Z1     Z0

Constant multiplication (becomes hardwired shifts and adds): $Z = X \cdot (1001)_2$

                                 X3     X2     X1     X0
                                  1      0      0      1
                                 X3     X2     X1     X0
            X3     X2     X1     X0
     Z7     Z6     Z5     Z4     Z3     Z2     Z1     Z0

$Y = (1001)_2 = 2^3 + 2^0$

[Figure: X and X << 3 feed an adder that produces Z; shifts using wiring]
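As a small illustration of the point above, the constant multiplication can be written as one shift and one add. The function names `times_1001` and `const_multiply` are hypothetical, and the general version assumes a non-negative constant.

```python
def times_1001(x: int) -> int:
    """Multiply by the constant (1001)_2 = 2**3 + 2**0: one shift, one add."""
    return (x << 3) + x


def const_multiply(x: int, coeff: int) -> int:
    """General shift-and-add: one shifted copy of x per 1 bit in coeff (coeff >= 0)."""
    total = 0
    shift = 0
    while coeff:
        if coeff & 1:
            total += x << shift   # in hardware this shift is just wiring
        coeff >>= 1
        shift += 1
    return total


print(times_1001(7), const_multiply(13, 0b1001))
```

The number of adders equals the number of 1 bits in the constant minus one, which is why reducing the count of non-zero digits is worthwhile.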

Page 16: Transform: Canonical Signed Digits (CSD)

Canonical signed digit representation is used to increase the number of zeros. It uses digits {-1, 0, 1} instead of only {0, 1}.

Iterative encoding: replace a string of consecutive 1’s

0 1 1 … 1 1 becomes 1 0 0 … 0 -1, since $2^{N-2} + \dots + 2^1 + 2^0 = 2^{N-1} - 2^0$

Example: 0 1 1 0 1 1 1 1 → 0 1 1 1 0 0 0 -1 → 1 0 0 -1 0 0 0 -1

Worst case CSD has 50% non-zero bits

[Figure: Z formed from X << 7, X << 4, and X for the example above]

Shift translates to re-wiring
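The encoding can be sketched in a few lines. This uses a standard right-to-left recoding rather than the string-replacement procedure on the slide, so treat it as an equivalent illustration; the names `to_csd` and `csd_multiply` are hypothetical, and the input is assumed to be a non-negative integer.

```python
def to_csd(value: int) -> list:
    """Return the CSD digits of value, least significant first, each in {-1, 0, 1}."""
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1; value % 4 == 3 -> digit -1 (start of a run of 1's)
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, digits: list) -> int:
    """Multiply x by the constant encoded in digits using shifts, adds, and subtracts."""
    return sum(d * (x << i) for i, d in enumerate(digits))


csd_111 = to_csd(0b01101111)
print(csd_111[::-1])            # most significant digit first
print(csd_multiply(3, csd_111))
```

For the constant 01101111 this gives non-zero digits at positions 7, 4, and 0, so the product needs only the terms X << 7, X << 4, and X, combined as (X << 7) - (X << 4) - X.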

Page 17: Algebraic Transformations

Commutativity: A + B = B + A

Associativity: (A + B) + C = A + (B + C)

Distributivity: (A + B) C = AC + BC

Common sub-expressions

[Figure: Signal-flow graphs on A, B, C and on X, Y, before and after each transformation]

Page 18: Transforms for Efficient Resource Utilization

[Figure: Dataflow graphs on inputs A, B, C; D, E, F; and G, H, I, with steps labeled 1 and 2, before and after applying distributivity]

Time multiplexing: mapped to 3 multipliers and 3 adders

Reduce number of operators to 2 multipliers and 2 adders

Page 19: A Very Useful Transform: Retiming

Retiming is the action of moving delays around in the system

Delays have to be moved from ALL inputs to ALL outputs or vice versa

Cutset retiming: A cutset is a set of edges whose removal splits the graph into two disjoint partitions. To retime, delays are moved from the ingoing to the outgoing edges of the cutset or vice versa.

Benefits of retiming:

• Modify critical path delay
• Reduce total number of registers

[Figure: Signal-flow graphs with delay elements (D) before and after retiming]

Page 20: Retiming Example: FIR Filter

$$y(n) = h(n) \otimes x(n) = \sum_{i=0}^{K} h(i) \cdot x(n-i)$$

[Figure: FIR filter with input x(n), coefficients h(0) to h(3), delay elements D, and output y(n), shown in direct form, after applying associativity of the addition, and in transposed form after retiming. Legend: symbol for multiplication; operator delays (10) and (4)]

Tclk = 22 ns before retiming

Tclk = 14 ns after retiming

Note: here we use a first-cut analysis that assumes the delay of a chain of operators is the sum of their individual delays. This is not accurate.
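The two filter structures can be compared with a short behavioral model. It assumes that the annotated delays are 10 ns for a multiplier and 4 ns for an adder, and that the direct form sums its taps through a linear chain of adders; under the first-cut analysis this reproduces the 22 ns and 14 ns clock periods. All function names are hypothetical.

```python
T_MULT, T_ADD = 10, 4   # ns; assumed reading of the (10) and (4) annotations


def direct_form(x, h):
    """y(n) = sum_i h(i) * x(n - i), with the delay line on the input."""
    taps = [0] * len(h)                 # taps[i] holds x(n - i)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]     # shift the delay line by one sample
        out.append(sum(hi * ti for hi, ti in zip(h, taps)))
    return out


def transposed_form(x, h):
    """Same filter with the registers moved onto the adder chain."""
    state = [0] * len(h)                # state[j] is the registered partial sum added at tap j
    out = []
    for sample in x:
        sums = [hi * sample + s for hi, s in zip(h, state)]
        out.append(sums[0])
        state = sums[1:] + [0]          # each register captures the next tap's sum
    return out


def critical_path_direct(n_taps):
    """One multiplier followed by a chain of n_taps - 1 adders."""
    return T_MULT + (n_taps - 1) * T_ADD


def critical_path_transposed():
    """One multiplier followed by a single adder between registers."""
    return T_MULT + T_ADD


h = [1, 2, 3, 4]
x = [1, 2, 3, 4, 5, 0, 0]
print(direct_form(x, h) == transposed_form(x, h))
print(critical_path_direct(len(h)), critical_path_transposed())
```

Both forms produce the same output sequence; only the register placement, and with it the longest register-to-register path, differs.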

Page 21: Pipelining, Just Another Transformation (Pipelining = Adding Delays + Retiming)

How to pipeline:

1. Add extra registers at all inputs
2. Retime

[Figure: A circuit before pipelining, after the step "add input registers", and after the step "retime"]

Contrary to retiming, pipelining adds extra registers to the system
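A minimal behavioral sketch of the two steps, assuming a circuit made of two combinational stages `f` and `g` (both hypothetical, as are the function names and the reset value): the input register added in step 1 is retimed across `f` in step 2, so it ends up between the two stages.

```python
def unpipelined(xs, f, g):
    """Combinational chain g(f(x)): the result is available in the same cycle."""
    return [g(f(x)) for x in xs]


def pipelined(xs, f, g, reset=0):
    """Register between f and g: each result appears one cycle later."""
    reg = reset                 # pipeline register contents after reset
    out = []
    for x in xs:
        out.append(g(reg))      # g works on the value latched in the previous cycle
        reg = f(x)              # f's result is latched at the clock edge
    return out


def triple(v):
    return 3 * v


def increment(v):
    return v + 1


xs = [1, 2, 3, 4]
plain = unpipelined(xs, triple, increment)
piped = pipelined(xs, triple, increment)
print(plain, piped)
```

The pipelined output is the original sequence delayed by one cycle. Under the first-cut delay analysis the clock period drops from the sum of the two stage delays to the larger of the two, at the cost of one extra register and one cycle of latency.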

Page 22: The Power of Transforms: Lookahead

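[Figure: A first-order recursive structure with input x(n), output y(n), coefficient A, and one delay D in the loop, transformed in turn by loop unrolling, distributivity, associativity, and retiming into a structure with two delays in the loop and a precomputed coefficient A²]

Original structure: y(n) = x(n) + A y(n-1)

Try pipelining this structure

After loop unrolling: y(n) = x(n) + A[x(n-1) + A y(n-2)]

How about pipelining this structure!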
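The equivalence of the two recurrences can be checked numerically. The sketch below assumes zero initial conditions, meaning y and x are zero before n = 0; the function names are hypothetical.

```python
def first_order(x, a):
    """y(n) = x(n) + a * y(n-1), with y(-1) = 0."""
    y = []
    prev = 0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


def lookahead(x, a):
    """y(n) = x(n) + a * x(n-1) + a^2 * y(n-2), with a^2 precomputed."""
    a_squared = a * a                        # computed once, outside the loop
    y = []
    for n, sample in enumerate(x):
        x_prev = x[n - 1] if n >= 1 else 0
        y_prev2 = y[n - 2] if n >= 2 else 0
        y.append(sample + a * x_prev + a_squared * y_prev2)
    return y


x = [1, 2, 3, 4, 5]
print(first_order(x, 3), lookahead(x, 3))
```

The look-ahead form has two delays in its feedback loop instead of one, so one of them can be retimed into the loop's arithmetic, which is not possible in the original structure.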

Page 23: Scan Testing

[Figure: Registers clocked by CLK, each with a 0/1 multiplexer controlled by ScanShift that selects between the normal input and the shift in from the previous register; the last register drives shift out]

Idea: have a mode in which all registers are chained into one giant shift register which can be loaded/read-out bit serially. Test remaining (combinational) logic by

(1) in “test” mode, shift in new values for all register bits thus setting up the inputs to the combinational logic

(2) clock the circuit once in “normal” mode, latching the outputs of the combinational logic back into the registers

(3) in “test” mode, shift out the values of all register bits and compare against expected results.

Courtesy of C. Terman and IEEE Press
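The three steps can be sketched as a small behavioral model. The class `ScanChain`, the helper `scan_test`, and the example logic function are all hypothetical and only meant to show the order of operations, not any particular tool or cell library.

```python
class ScanChain:
    """Registers that either capture logic outputs or shift serially."""

    def __init__(self, n_bits, logic):
        self.regs = [0] * n_bits
        self.logic = logic          # combinational logic: list of bits -> list of bits

    def shift(self, bit_in):
        """Test mode: one clock moves every bit one position along the chain."""
        bit_out = self.regs[-1]
        self.regs = [bit_in] + self.regs[:-1]
        return bit_out

    def capture(self):
        """Normal mode: one clock latches the logic outputs into the registers."""
        self.regs = list(self.logic(self.regs))


def scan_test(chain, pattern):
    """Run steps (1) to (3) and return the captured register values."""
    # (1) shift in, last register's bit first, so that regs == pattern afterwards
    for bit in reversed(pattern):
        chain.shift(bit)
    # (2) clock once in normal mode
    chain.capture()
    # (3) shift out; the last register's bit emerges first
    observed = [chain.shift(0) for _ in pattern]
    return observed[::-1]


def example_logic(r):
    """Hypothetical 3-bit combinational block used as the circuit under test."""
    return [r[0] ^ r[1], r[1] & r[2], r[0] | r[2]]


chain = ScanChain(3, example_logic)
pattern = [1, 0, 1]
observed = scan_test(chain, pattern)
print(observed == example_logic(pattern))
```

Comparing `observed` with the expected response for each pattern is what reveals a fault in the combinational logic.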

Page 24: Trends: “Chip in a Day” (Matlab/Simulink to Silicon…)

[Figure: Chip layout with blocks labeled Mult1, Mult2, Mac1, Mac2, S reg, X reg, and Add, Sub, Shift]

Courtesy of R. Brodersen

Map algorithms directly to silicon - bypass writing Verilog!

Page 25: Trends: Watermarking of Digital Designs

(courtesy of G. Qu, M. Potkonjak)

Fingerprinting is a technique to deter people from illegally redistributing legally obtained IP by enabling the author of the IP to uniquely identify the original buyer of the resold copy. The essence of the watermarking approach is to encode the author's signature. The selection, encoding, and embedding of the signature must result in minimal performance and storage overhead.

[Figure: Two layouts with the same functionality, same area, and same performance; watermark of 4768 bits embedded]