Circuit Performance and Adders


Transcript of Circuit Performance and Adders

Page 1: Circuit Performance and Adders

CS 150 – Spring 2008 – Lec #15: Ckt Performance - 1

Circuit Performance and Adders

Recap from last time

Hardware Design is Complicated Because We Want Circuits to Go Fast

Combinational Logic: Used a Simple Model of Delay
• Integer delay on each gate
• Reduction of circuit to directed acyclic graph
• Delay of circuit (= clock period) is longest path in graph

Making Circuits Go Fast = Shortening Longest Path
• Exploit asymmetry between path lengths
• Shorten longest path by
  • Introducing redundant logic
  • Moving logic from long to short paths

We will see a different technique today!

Page 2: Circuit Performance and Adders


Delay Model of a Circuit

Translate circuit into graph

Weights on nodes are delay through gates

Delay through circuit is longest path through graph

Easy, linear-time algorithm
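The longest-path computation can be sketched as a single pass over the gates in topological order. This is a minimal illustration, not the course's own code: the function name `circuit_delay`, the dictionary layout, and the four-gate wiring are all assumptions made for the example.

```python
from collections import deque

def circuit_delay(gate_delay, fanout):
    """Longest node-weighted path in a DAG, in O(gates + wires) time."""
    # Count how many fan-in wires each gate has.
    indegree = {g: 0 for g in gate_delay}
    for src in fanout:
        for dst in fanout[src]:
            indegree[dst] += 1
    # arrival[g] = latest time at which the output of gate g settles.
    arrival = dict(gate_delay)
    ready = deque(g for g in gate_delay if indegree[g] == 0)
    while ready:
        g = ready.popleft()
        for succ in fanout.get(g, []):
            arrival[succ] = max(arrival[succ], arrival[g] + gate_delay[succ])
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return max(arrival.values())

# Hypothetical circuit: gate names, delays, and wiring are illustrative only.
gate_delay = {"A": 2, "B": 1, "C": 1, "D": 1}
fanout = {"A": ["C"], "B": ["C", "D"], "C": ["D"]}
print(circuit_delay(gate_delay, fanout))
```

Each gate and each wire is visited once, which is where the linear running time comes from.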

[Figure: example circuit and its delay graph, with nodes A, B, C, D and gate delays 2, 1, 1 as node weights]

Page 3: Circuit Performance and Adders

Circuit Performance Model

[Figure: Latches -> Combinational Logic -> Latches]

Inputs stabilize at t = 0

Logic finishes when last output stabilizes

Page 4: Circuit Performance and Adders

Circuit Performance Model

Outputs of latches are stable only at clock edge

Inputs to latches must be stable by next clock edge

Time between clock edges must be > delay of combinational logic

[Figure: Latches -> Combinational Logic -> Latches]

Page 5: Circuit Performance and Adders

Adders

Highly-studied circuit, so a case study in design

“Ripple-carry” adder: standard adder where carry ripples from one bit to another
• Longest path for n-bit adder is O(n)
• Number of gates for n-bit adder is O(n)

“Carry Lookahead”: accelerate carry chain
• Collapse carry into all bits
• O(log n) delay (optimal!)
• O(n^3) gates (terrible!)

Practical compromise is block-accelerated adders
• Block-carry lookahead
• Carry-select adder

Page 6: Circuit Performance and Adders

Hierarchical Carry lookahead

PG, GG used as propagate, generate inputs to hierarchical block

[Figure: three m-bit CLA adders, each producing a group propagate (PG0, PG1, PG2) and a group generate (GG0, GG1, GG2) that feed a Carry Lookahead Block]

Page 7: Circuit Performance and Adders

Synopsis of Hierarchical Carry-Lookahead

n-bit adder, m-bit blocks, n/m blocks
• Delay is 2 log n + 2 log m
• Size is max(nm^2, (n/m)^3)
• Best is m = n^(2/5)
• Delay is 14/5 log n, size is O(n^(9/5))

Page 8: Circuit Performance and Adders

Analysis of the Carry-Lookahead Adder

n-bit adder, m-bit blocks, n/m blocks

Delay through the adder: 2 * delay through the lookahead block + delay through the super-lookahead block
• Lookahead block: 2 log m
• Super-block: 2 log(n/m) = 2 log n – 2 log m
• Total: 2 * (2 log m) + 2 log n – 2 log m = 2 log n + 2 log m

Logic: scales like the lookahead blocks
• Size p block: O(p^3) from before
• Two sizes of blocks: n/m blocks of size m, one block of size n/m
• Total: n/m * m^3 = nm^2 for the m-bit blocks, (n/m)^3 for the super-block
• Choose m to minimize max(nm^2, (n/m)^3)
• Solution at m = n^(2/5), where nm^2 = (n/m)^3 = n^(9/5). Total is O(n^(9/5))
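The choice of block size can be checked numerically. The sketch below simply evaluates the size model max(nm^2, (n/m)^3) for every integer m and compares the minimizer with n^(2/5). The function names are made up for this example, and logarithms are assumed to be base 2.

```python
import math

def cla_size(n, m):
    """Gate-count model from the analysis: max(n*m^2, (n/m)^3)."""
    return max(n * m**2, (n / m) ** 3)

def cla_delay(n, m):
    """Delay model from the analysis: 2 log n + 2 log m (base 2 assumed)."""
    return 2 * math.log2(n) + 2 * math.log2(m)

def best_block_size(n):
    """Integer block size m in 1..n that minimizes the size model."""
    return min(range(1, n + 1), key=lambda m: cla_size(n, m))

for n in (32, 1024):
    m = best_block_size(n)
    print(n, m, round(n ** 0.4, 2), cla_delay(n, m), cla_size(n, m))
```

For n = 32 and n = 1024 the exhaustive search lands on m = 4 and m = 16, which are exactly n^(2/5) because both values of n are fifth powers of a power of two.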

Page 9: Circuit Performance and Adders

Carry Select Adder

“Combinational Speculative Execution”

Basic intuition: Adders spend time waiting to see what carry-in is

Therefore
• Go ahead and guess each way
• Pick the right answer when the carry comes by

Page 10: Circuit Performance and Adders

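Carry-Select adder

Each block is doubled
• One block computes with carry-in = 0, the other with carry-in = 1
• Actual carry-in (carry-out from previous block) selects the result

[Figure: chain of m-bit blocks, starting with Block 0 and Block 1. Each doubled block has one copy with carry-in 0 and one with carry-in 1, each producing m sum bits and 1 carry-out bit; a 0/1 multiplexer driven by the previous block's carry-out picks between the two copies.]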

Page 11: Circuit Performance and Adders

Analysis of Carry-Select Adder

Delay analysis: worst-case path is through Block0, then control of multiplexer chain
• O(m) gates in Block0
• O(p = n/m) gates in multiplexer chain

[Figure: Block0 followed by the doubled blocks Block1_0/Block1_1, Block2_0/Block2_1, ..., Blockp_0/Blockp_1]

Choose m to minimize max(n/m, m)

Minimum is to choose m = sqrt(n)

Page 12: Circuit Performance and Adders

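Twelve-bit Carry-Select Example

Problem: add -3 (0xffd, 111111111101) to 17 (0x011, 000000010001)

Use 4-bit carry-select blocks

[Figure: three 4-bit carry-select blocks. The low block adds d + 1; the middle block adds f + 1 for carry-in 0 and for carry-in 1; the high block adds f + 0 for carry-in 0 and for carry-in 1; multiplexers pick the sums using the incoming carries.]

Result is 0x00e (14)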
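The twelve-bit example can be replayed in software. The following is a behavioral sketch only: `ripple_block` and `carry_select_add` are names invented here, and for simplicity the model computes both guesses for every block, including the first one, which real hardware would not need to double.

```python
def ripple_block(a, b, carry_in, width):
    """Add two width-bit values one bit at a time; return (sum, carry_out)."""
    total, carry = 0, carry_in
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry

def carry_select_add(a, b, n=12, m=4):
    """Behavioral model of an n-bit carry-select adder with m-bit blocks."""
    mask = (1 << m) - 1
    result, carry, trace = 0, 0, []
    for k in range(n // m):
        xa, xb = (a >> (k * m)) & mask, (b >> (k * m)) & mask
        guess0 = ripple_block(xa, xb, 0, m)  # speculate carry-in = 0
        guess1 = ripple_block(xa, xb, 1, m)  # speculate carry-in = 1
        block_sum, carry_out = guess1 if carry else guess0  # the multiplexer
        trace.append((k, guess0, guess1, carry))
        result |= block_sum << (k * m)
        carry = carry_out
    return result, carry, trace

total, carry_out, trace = carry_select_add(0xFFD, 0x011)
print(hex(total), carry_out)
for step in trace:
    print(step)
```

Under these assumptions the call yields the sum 0x00e with a final carry-out of 1, which is discarded in 12-bit two's-complement arithmetic.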

Page 13: Circuit Performance and Adders

Hardware for the Carry Select Adder

sqrt(n) blocks, each of sqrt(n) gates

Additional hardware is sqrt(n) multiplexers + an additional adder for each block but the first
• n - sqrt(n) additional adder bits

Therefore sqrt(n) + 2n - sqrt(n) = 2n gates

Exactly twice the size of an ordinary adder, but delay is sqrt(n) instead of n

Page 14: Circuit Performance and Adders

Carry-Bypass Adder

Like the carry-select adder, has O(sqrt(n)) delay

But even more efficient (in terms of gates) than the carry-select
• Has only n + sqrt(n) log n gates

However, it broke every timing analyzer…

Instead of shortening the longest path, made it longer!

How can this be? Isn’t the delay of the circuit the length of the longest path?...

Page 15: Circuit Performance and Adders

What is the delay of the Circuit?

The delay of a circuit is the time at which the last output settles

This can be the length of the longest path, but sometimes isn’t

The longest path is an upper bound on the delay of the circuit, but the bound is not always tight

Page 16: Circuit Performance and Adders

Example

Long paths are from x, y -> out through the bottom of the circuit

But no signal can travel down these paths!

[Figure: example circuit with inputs x, y, z; one gate of delay 1 and six gates of delay 2]

Page 17: Circuit Performance and Adders

Example

[Figure: the example circuit (inputs x, y, z; one gate of delay 1 and six gates of delay 2) annotated with signal values at t = 0, 1, 2, 3, 4 and 6]

Page 18: Circuit Performance and Adders

Timing Analysis

[Figure: the example circuit with inputs x, y, z; one gate of delay 1 and six gates of delay 2]

x  y  z  delay
0  0  0  6 (z->out)
0  0  1  5 (z->z’->out)
0  1  0  6 (z->out)
0  1  1  5 (z->z’->out)
1  0  0  6 (z->out)
1  0  1  6 (y->out)
1  1  0  6 (z->out)
1  1  1  6 (x,y->out)

Longest path is 8, but no signal ever travels down it!

Page 19: Circuit Performance and Adders

What happened?

Long paths are false
• A->B requires z=1
• B->C requires z=0
• Conflict! No signal can propagate down this path

This analysis doesn’t quite work
• Analysis has to take into account delays
• Complete theory not understood till 1993

This is good enough for carry-bypass adder

[Figure: the example circuit (inputs x, y, z) with points A, B, C marked along the long path]

Page 20: Circuit Performance and Adders

Announcements

Prof. Pister will lecture on wireless protocol Thursday
• Need this for your project

Spring Break

Tuesday 4/1 – TBD

Thursday 4/3 – MT review

Tuesday 4/8 – MT 2

Page 21: Circuit Performance and Adders

False Paths and Adders

Key idea: Don’t make critical paths in adder short
• That was the idea behind Carry Lookahead and Carry-Select adders
• Instead, make long paths false

Critical Path is Through the Carry Chain
• Only exercised when propagate bit through every block is set (Question: is this likely?)
• Therefore: when signal would propagate through carry chain, skip the block!

Recall from block carry-lookahead adder:
• Group Propagate PG = P0 P1 P2 P3
• When PG = 1 have the carry skip the whole block!

Page 22: Circuit Performance and Adders

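Carry-Skip Block

[Figure: m-bit ripple-carry adder followed by a 2:1 multiplexer selected by PG. The 0 input takes the adder's ripple carry-out, the 1 input takes the block's Carry-in, and the multiplexer output is the Carry-in to next block.]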

Page 23: Circuit Performance and Adders

Suppose Carry-in Propagates to Carry-Out…

[Figure: carry-skip block: m-bit ripple-carry adder with a PG-selected 0/1 multiplexer between Carry-in and Carry-in to next block]

Page 24: Circuit Performance and Adders

Then PG=1

[Figure: carry-skip block: m-bit ripple-carry adder with a PG-selected 0/1 multiplexer between Carry-in and Carry-in to next block]

Page 25: Circuit Performance and Adders

So Path goes Through the 1-port of the MUX

[Figure: carry-skip block: m-bit ripple-carry adder with a PG-selected 0/1 multiplexer between Carry-in and Carry-in to next block]

Delay is 1-MUX delay, not 4 propagate delays!
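A behavioral sketch of one carry-skip block may make the argument concrete. It assumes the per-bit propagate is Pi = Ai xor Bi and that PG selects the multiplexer's 1 input. The helper names `to_bits` and `carry_skip_block` are hypothetical.

```python
def to_bits(value, width):
    """Little-endian list of bits (index 0 is the least significant bit)."""
    return [(value >> i) & 1 for i in range(width)]

def carry_skip_block(a_bits, b_bits, carry_in):
    """One m-bit carry-skip block; returns (sum_bits, carry_out, skipped)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]  # assumed Pi = Ai xor Bi
    generate = [a & b for a, b in zip(a_bits, b_bits)]
    carry = carry_in
    sum_bits = []
    for p, g in zip(propagate, generate):  # ordinary ripple-carry chain
        sum_bits.append(p ^ carry)
        carry = g | (p & carry)
    group_propagate = all(propagate)  # PG = P0 P1 ... P(m-1)
    # 2:1 multiplexer: when PG = 1 the carry-in bypasses the ripple chain.
    carry_out = carry_in if group_propagate else carry
    return sum_bits, carry_out, group_propagate

print(carry_skip_block(to_bits(0b1010, 4), to_bits(0b0101, 4), 1))
print(carry_skip_block(to_bits(0b1100, 4), to_bits(0b0100, 4), 0))
```

The multiplexer never changes the logical value of the carry-out, since the ripple chain would deliver the same bit when PG = 1. What changes is how soon that bit is available.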

Page 26: Circuit Performance and Adders

Full Carry-Bypass Adder

[Figure: chain of carry-skip blocks, Block 0, Block 1, ..., Block n/m. Each block's PG drives a 0/1 multiplexer that passes either the block's ripple carry-out or its Carry-in on to the next block.]

As before, n/m array of m-bit blocks

Page 27: Circuit Performance and Adders

Full Carry-Bypass Adder: Worst-case path

[Figure: chain of carry-skip blocks, Block 0, Block 1, ..., Block n/m - 1, each with a PG-selected 0/1 multiplexer]

Worst-case path goes through m-1 bits of block 0, n/m - 2 multiplexers, and m-1 bits of block n/m - 1

Page 28: Circuit Performance and Adders

Timing and Size Analysis

Delay = 2 * (m – 1) + n/m – 2
• Choose m to minimize delay => m = sqrt(n)
• We have Delay = 2 * (sqrt(n) – 1) + sqrt(n) – 2 = 3 sqrt(n) – 4

What’s the additional circuitry?
• log m gates to build PG (1 per block)
• 1 two-input multiplexer per block
• n/m blocks => n/m (log m + 1)
• m = sqrt(n) => sqrt(n) ((log n)/2 + 1)

Same delay as carry-select, but much smaller: (n + sqrt(n)) vs 2n

Page 29: Circuit Performance and Adders

Full Carry-Bypass Adder: Longest path

[Figure: chain of carry-skip blocks, Block 0, Block 1, ..., Block n/m - 1, each with a PG-selected 0/1 multiplexer]

Longest path goes through all blocks and all multiplexers: m * n/m + n/m

Page 30: Circuit Performance and Adders

Longest Path vs Circuit Delay

Longest path is n + sqrt(n)

Worst-case path is sqrt(n)

Worst-case path for ripple-carry is n

Made things better, but a timing analyzer thinks it’s worse!

Stimulated tremendous interest in timing analyzers!

Page 31: Circuit Performance and Adders

Adder Summary

Adder                     Delay         Size
Ripple-Carry              n             n
Carry-Lookahead (full)    log n         n^3
Carry-Lookahead (block)   14/5 log n    n^(9/5)
Carry-Select              sqrt(n)       2n
Carry-Bypass              sqrt(n)       n + sqrt(n)

Page 32: Circuit Performance and Adders

A comment on n

Asymptotic results tell us what happens at infinity

For our purposes, n = 16, 32, 64
• Means: sqrt(n) = 4 – 8
• Means: log n = 4 – 6

For the sizes we are interested in, carry-select and carry-bypass are as fast as block CLA
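To put numbers on this comparison, the sketch below evaluates the two delay expressions from the lecture at the word sizes of interest. It assumes base-2 logarithms, a carry-bypass delay of 3 sqrt(n) - 4, and that both expressions count comparable gate delays, which is only roughly true. The function names are invented for the example.

```python
import math

def block_cla_delay(n):
    """Block carry-lookahead delay model: 14/5 log n (base 2 assumed)."""
    return 14 / 5 * math.log2(n)

def carry_bypass_delay(n):
    """Carry-bypass delay model with m = sqrt(n): 3 sqrt(n) - 4."""
    return 3 * math.sqrt(n) - 4

for n in (16, 32, 64):
    print(n, round(math.sqrt(n), 1), math.log2(n),
          round(block_cla_delay(n), 1), round(carry_bypass_delay(n), 1))
```

At these sizes the two models stay within a few gate delays of each other, which is the point of the slide: the asymptotic gap between log n and sqrt(n) has not yet opened up.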

Page 33: Circuit Performance and Adders


Remaining Questions (just for fun)

How often does worst-case delay path occur in Carry-bypass adder?

How do we automatically analyze for false paths?

Page 34: Circuit Performance and Adders

How often does (near) worst-case delay occur?

Worst case delay: Pi = 1 for all i > j, small j
• Pi = Ai xor Bi

How often is Pi = Ai xor Bi = 1?

[Table: 3 x 3 grid of cases. Columns: Ai large; Ai small, negative; Ai small, positive. Rows: Bi large; Bi small, negative; Bi small, positive]

Only two of nine cases, but they happen frequently

Page 35: Circuit Performance and Adders

How hard is it to analyze false paths?

Hard!
• Problem noticed in early timing verifiers in the 1970’s
• Early researchers (Hitchcock, Jouppi, Ousterhout) used hand-done rules
• Often wrong (if it’s hard to analyze automatically, it’s hard to guess right by hand)

Next: “Static sensitization”
• Assert “non-controlling” values on side inputs (0 for OR/NOR, 1 for AND/NAND)
• Make sure assignments are consistent
• Problem: Values are changing!

Page 36: Circuit Performance and Adders

Example

To sensitize a->d->f->g, note: a->d requires b=1

But b=1 => e=0, and f->g requires b=1

Similar argument says you can’t set b->d->f->g

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

Page 37: Circuit Performance and Adders

But…

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

t  a  b  c  d  e  f  g
0  0  0  0
1  0  0  0  0  1
2  0  0  0  0  1  0
3  0  0  0  0  1  0  0

Delay of the circuit is 3!

Path a->d->f->g really was true

Page 38: Circuit Performance and Adders

Key Problem

All inputs are changing…
• a->d requires b=1 means b=1 stable at t=0
• But b changes to 0 at t=0
• Therefore, value of b is unknown (X)

Also, delays of gates are unknown
• “1” really means [0,1]

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

Page 39: Circuit Performance and Adders

Key Idea: Derive Function for each time

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

[Table: signals a, b, c, d, e, f, g against times 0, 1, 2, 3]

Three functions for d at time 1:
• d = 1 at 1
• d = 0 at 1
• d = X at 1

Page 40: Circuit Performance and Adders

Key Idea: Derive Function for each time

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

[Table: signals a, b, c, d, e, f, g against times 0, 1, 2, 3]

(d = 1 at 1) = (a = 1 at 0) and (b = 1 at 0)

Page 41: Circuit Performance and Adders

Key Idea: Derive Function for each time

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

[Table: signals a, b, c, d, e, f, g against times 0, 1, 2, 3]

(d = 0 at 1) = (a = 0 at 0) or (b = 0 at 0)

Page 42: Circuit Performance and Adders

Key Idea: Derive Function for each time

[Figure: circuit with inputs a, b, c and gates d, e, f, g, each of delay 1]

[Table: signals a, b, c, d, e, f, g against times 0, 1, 2, 3]

(d = X at 1) = (d = 1 at 1) nor (d = 0 at 1)
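The three per-time functions for d can be written out directly. The sketch below assumes d is a two-input AND of a and b with unit delay, which is what the equations for "d = 1 at 1" and "d = 0 at 1" imply. The function name and the use of the string "X" for an unknown value are choices made only for this illustration.

```python
def and_gate_next_step(a, b):
    """Given a and b at time t (each 0, 1, or "X"), return the three
    predicates (d = 1, d = 0, d = X) at time t + 1 for d = a AND b."""
    d_is_1 = (a == 1) and (b == 1)      # both inputs settled at 1
    d_is_0 = (a == 0) or (b == 0)       # one controlling 0 is enough
    d_is_x = not (d_is_1 or d_is_0)     # neither: the output is still unknown
    return d_is_1, d_is_0, d_is_x

for a in (0, 1, "X"):
    for b in (0, 1, "X"):
        print(a, b, and_gate_next_step(a, b))
```

Exactly one of the three predicates holds for every input combination, and a settled 0 on either input settles d even while the other input is still X.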

Page 43: Circuit Performance and Adders

Delay of the Circuit

Delay of the circuit is the latest t such that (“output = X at t”) is not identically 0

Problem is NP-complete

Size of problem is linear in number of time slices x number of gates

Mathematical machinery fairly massive
• “Special Theory”: 1989 – handled symmetric gates, zero-lower-bounded delays (all signals were X until they hit their final values)
  • Other cases were conservatively approximated
• “General Theory”: 1993 – handled all gates, general delay models
  • Gave exact answers for all delay types

Still hasn’t quite reached industrial practice!