Full Chip Analysis Chung-Kuan Cheng Computer Science and Engineering Department University of...

Post on 24-Dec-2015

226 views 1 download

Tags:

Transcript of Full Chip Analysis Chung-Kuan Cheng Computer Science and Engineering Department University of...

Full Chip Analysis

Chung-Kuan Cheng

Computer Science and Engineering Department

University of California, San Diego

La Jolla, CA 92093-0114

Kuan@cs.ucsd.edu

2

Outlines

I. IntroductionII. Circuit Level AnalysisIII. Logic Level Analysis

I. Timing AnalysisII. Functional Analysis

IV. Mixed Signal AnalysisV. Research DirectionsVI. Conclusion

3

I. Introduction

1. Trends of On-Chip Technologies

2. Statistics About Design Flaws

3. Spectrum of Analysis

4

I.1 Trends of On-Chip Technologies

System: Huge Numbers of Devices and Wires

Power/Ground Distribution: Low Voltage, High Current

Wires: Lateral Coupling, Fragmented ParasiticsDevices: Modeling, NoiseMixed Signal Design: RF+Analog+Digital

5

6

7

Power/Ground Distribution (ITRS)

2002 2003 2004 2005

Supply

Voltage(V)

1.5 1.5 1.2 1.2

Max

Power

130 140 150 160

On-Chip

Freq(MHz)

1,600 1,724 1,857 2,000

Off-Chip

Freq(MHz)

885 932 982 1,035

Lower V margin: Higher I & Inductance x Freq.

8

Static Vs. Dynamic Voltage Drop

Current envelope Dynamic - peak current

nT (n+1)T (n+2)T

Static- average current

Wire sizing can be used to control static drop Precise de-cap insertion filters peak current spikes Courtesy of Apache

9

Flip Chip Dynamic Effects

F:MHz

IRStatic vs.Dynamic

Ri(t) Ldi/dt

9.9 mV

17.3 mV

1.2 mV

16.2 mV

29.3 mV

12.3 mV

19.3 mV

36 mV

28.5 mV

22 mV

38.8 mV

41.6 mV

0.5%1.02%

1.08%2.77%

1.6%5.37%

2.2%8.0%

DynamicTotal18.5 mV

1.8

1.5

1.2

1.0

41.6 mV

64.5 mV

80.4 mV

250

500

750

1,000

Static

Dynamic

Vdd

Courtesy of Apache

10

Wire-bond Dynamic Effects

IR R i(t) Ldi/dt

103 mV

147 mV

3 mV

181 mV

275 mV

13 mV

200 mV

276 mV

75 mV

5.7% 8.3%

12% 19.2%

16.6%29.2%

DynamicTotal

150 mV

1.8

1.5

1.2

288 mV

351 mV

133

250

400

Static

Dynamic

Static vs.Dynamic

F:MHz Volt

Courtesy of Apache

11

12

Increasing System Complexity

Courtesy of Mentor

RF front end

DSP

Memory

Complexconverters

13

I.2 Statistics about Design Flaws Percent of Total Flaws Fixed in IC/ASIC Designs Having Two or More Silicon Spins

2%

3%

4%

4%

5%

5%

7%

12%

13%

47%

0% 10% 20% 30% 40% 50%

Other flaws

IR Drops

Mixed-Signal Interface

Power

Race Condition

Clocking

Yield

Noise

Slow Path

Logical or Functional

Collett Intl. 2000 Survey

4%

13%

17%

17%

20%

21%

23%

25%

28%

29%

35%

67%

0% 10% 20% 30% 40% 50% 60% 70% 80%

Other flaws

Firmware

Power

Race Condition

IR Drops

Mixed-Signal Interface

Yield

Clocking

Slow Path

Noise

Analog Circuit

Logical or Functional

Collett Intl. 2001 Survey

Logical or Functional Analog Noise Slow Path Mixed-signal interface Clock, Power/Ground Firmware

Logical or Functional Slow Path Noise

14

I.3 Spectrum of Analysis

Device

Circuit Logic System

Electrical Behavior Timing Switches Gate RTL

Physics EngineeringEE CS

Circuit theoryAlgorithmsDatabaseProgramming

Extraction

PrecisionMath DiscreteComplex, Real

15

I.3 Spectrum of Analysis(flow)

System

Software Hardware

Floorplan

Logic

Layout

Library

IP blocks

Analog

Architect

Chip

FunctionTimingCircuit Anal.

emulation

powerclock

critical paths

function

characterization

freq

power noise

global wirescross talk

mixed signal

16

I.3 Spectrum of Analysis(coverage)

Circuit Size

Coverage

Circuit Analysis

Logic: Static Timing (sign off)

Logic: FunctionalMixed Signal

Spice Hspice

ASX EldoApache

NassdaIota

Cadence

Synopsys

Mentor

Celestry

Cadence

Synopsys

Mentor

Axis

IBM

Aptix

CadenceSynopsys Mentor

Mentor

Cadence

17

I.3 Spectrum of Analysis(trend)

Layout Dominated Analysis Power/Ground, Clock Wires Pre-layout, Post-layout

Layout Oriented Analysis EE + CS

EE=> CS High Complexity CS=>EE Deep Submicron Effect Accuracy and Efficiency

18

II. Circuit Level Analysis

1. Circuit Analysis Advancement

2. Circuit Analysis Techniques

3. Examples

4. Tasks

19

1st Generation (SPICE) 2nd Generation (Fast SPICE) Next Generation (HSIM)

Circuit Size

Memory Usage

CPU Time

512M Bytes

Circuit Size

2 hrs

300M elements

Circuit Size

Memory Usage

CPU Time

512M Bytes

Circuit Size

2 hrs

300M elements

Circuit Size

Circuit Size

Memory Usage

CPU Time

1G Bytes

2M elements

20 hrs

Circuit Size

Circuit Size

Memory Usage

CPU Time

Bytes

20 hrs

2M elements

Circuit Size

Memory Usage

CPU Time

100M

Bytes

100K elements

100 hrs

Circuit Size

Circuit Size

Memory Usage

CPU Time

100 hrs

Circuit Size

100K elements

II.1 Circuit Analysis Advancement

Courtesy of Nassda

20

II.2 Circuit Analysis Techniques

Memory: Hierarchical Database Circuit Size: Parasitic Reduction Device Complexity: Table Model Simulation:

Backward Euler, Trapezoidal Integration Hierarchical Flow Event Driven (ignoring miller effect) Mixed Rate, Multiple step sizes (partition)

21

Circuit Type

(#MOS, #R,#C,#L)

Total

Elements

Memory

Usage

CPU

Time (hrs)

Memory A

(159M, 159M, 155M,0)

473M 775MB 1.65

Memory B

(3.1M, 5.4M, 4.5M, 88)

13M 195MB 0.69

D/A

(9K,65K,47K,0)

121K 42MB 1.11

PLL

(2K, 8K, 23K, 0)

51K 15MB 0.21

Analog

(119K, 175K, 232K,0)

525K 111MB 0.37

II.3 Examples (HSIM)

Courtesy of Nassda

22

II.4. Tasks

Convergence

Matrix SolverIntegration Partition

Hierarchy DatabaseInput Patterns

EE CSSpeed

Accuracy

Circuit Red.Device Mod.

Event DrivenHierarchical Flow

Math

23

III. Logic Level Analysis

1.Separation of Timing and Function

2.Static Timing Analysis

Algorithms, Gate Models, Path, Cross Talks

3.Functional Analysis

Event Driven, Cycle Based

4.Tasks

24

III.1 Separation of Timing and Function

FunctionalAnalysis

Simple timingmodel

Timing Analysis

Input vectordriven

Inputindependent

Slew, RC treecross talk

Function +Timing

High Complexity!

25

III.2 Static Timing Analysis

i. Algor.: Shortest and Longest Paths Search

ii. Gate Model: • Logic: Unate, Binate Signal Propagations• Timing: functions of Input Slope and Output Load

iii. Path Model:• Logic: False Path, Multiple Cycle Path, Cycles of

Combinational Logic, Multiple Clock Frequencies• Timing: RC Tree

iv. Cross Talks: Timing Window, ATPG

v. Tasks

26

III.2.i Algor.: Path Search

ArrivalTime,Slew Rate

Required Arrival Time

Static Timing Analysis: Worst Case Analysis, Independent of Input Patterns

0->1 slew rate window1->0 slew rate window

0->1 arrival time window1->0 arrival time window

PI1

PI2

PI3

A

G

F

B

H

E

C

D

J PO

Longest &Shortest Paths

27

III.2.I Algor: Path Search(cont)

0/0

0/0

0/0

1/2PI1

PI2

PI3

A

G

F

B

H

E

C

D

J PO3

21

2 2

1

1

2

3 21

2

32

23/3

2/4min/max

28

III.2.I Algor: Path Search(cont)

aminj, amaxj

amini,amaxidji

amini=minj aminj+dji

amaxi=maxj amaxj+dji

29

III.2i Algor.: Path Search

PI1

PI2

PI3

A

G

F

B

H

E

C

D

J PO3

21

2 2

1

1

2

3 21

2

32

2

0/0

0/0

0/0

1/2

3/3

2/4

4/5 6/7

4/4

4/6 6/8

6/10 8/12

min/maxLongest: PI2,G,F,E,D,J,POShortest: PI2,G,H,J,PO

30

III.2.ii Gate Logic Model: Unate & Binate Signals

a y

a y

a yNAND

Unateness: a 0->1 => y 1->0

XNOR

Binateness: a 0->1 => y 0->1 & 1->0

BDD

Check unateness based on BDD

31

III.2.ii Gate Timing Model

Slew rate of y Delay of y

Slew rate of a Slew rate of a

CeffCeff

a y

Ceff

Interconnect

32

III.2.iii Path Logic Model: False Path

4 bit adder 4 bit adder c0

c4c8

p[0,3]

y

Carry skip adderz

101101001111

C0=11

P[0:3]=110000 Z=1

+

33

III.2.iii Path Logic Model: False Path

False path: c0->y->c4 ->c8

Assumption: z->c4 ->c8 derives results faster

If we erase all false paths, we can identify the true critical paths and the corresponding input patterns

4 bit adder 4 bit adder c0

c4c8

p[0,3]

y

Carry skip adderz

34

III.2.iii Path Logic Model: False Path

False path b->c->d->e

3

10

410

2

a

b

c d

e

f

3,10 7,14

17,24

16

red+red=>redred+blue=>blueblue +blue=>blue

35

III.2.iv Cross Talk

WCN: worst-case noise: Delay & GlitchNoise with maximum

pulse height Fixed circuit structure

and parameters Fixed transition time of

input signals Variable arrival time of

input signals

WCN

36

Aggressor / Victim Input Victim Output

III.2.iv Cross Talk: Timing Window

P1

P2

P3

P4

P5

Aligned arrival time Skewed peak noise

A1

A2

A3

A4

V0

37 Aggressor Alignment WITHOUT Timing Constraints

III.2.iv Cross Talk: Timing Window

Skewed arrival time Aligned peak noise

Victim Output

P1

P2

P3

P4

P5

A1

A2

A3

A4

V0Sweep line

Aggressor / Victim Input

38 Aggressor Alignment WITH Timing Constraints

III.2.iv Cross Talk: Timing Window

Victim Output

A1

A2

A3

A4

V0

P1

P2

P5

Sweep Line

Aggressor / Victim Input

39

III.2.iv Cross Talk: Effective Timing Window

Timing window for aggressor input

Earliest arrival time

Timing window for victim output

Latest peak noise occurring time

aT bTLaT

RbT at bt

Lat

Rbt

maxV

iP

aT bTiA

Latest arrival time

at bt

iP

Earliest peak noise occurring time

40

Aggressor Alignment with Timing Constraints -- Reformulation

A1

A2

A3

A4

V0

(a) Original timing window (b) Shifted timing window

Old Sweep Line New Sweep Line

(c) Expanded timing window

New Sweep Line

41

III.2.v Tasks

Path model: special cases

ATPG

Path search in hierarchy

Gate model: power, noise

Path model: RCLK reduction

Cross talk

Timing window+pattern

EE CSSpeed

Accuracy

Math

42

III.3 Logic Level: Functional Analysis

i. Functional Analysis Techniques

ii. Event Driven Analysis

iii. Cycle Based Analysis

iv. Tasks

43

III.3.i Functional Analysis Techniques

Event Driven Simulation VCS, Verilog-XL, VSS, ModelSim

Cycle Based Simulation Frontline, Speedsim, Cyclone

Domain Specific Simulation SPW, COSSAP

44

III.3.ii Event Driven Analysis

Event Wheel Maintains schedules of events Enables sub-cycle timing

Advantages Timing accuracy Good Debug Capability Handles asynchronous

Disadvantages Performance

45

III.3.iii Cycle Based Analysis

RTL Description All gates evaluated every cycle Schedule is determined at compile time No timing No asynchronous feedback, latches Regression Phase High Performance High Capacity

46

III.3.iv Tasks

Pattern generation

Dynamic timing model

coverage

Hardware acceleration

EE CSSpeed

Accuracy

Math

47

IV. Mixed Signal Analysis

RF front-end

RF Simulator

Analog IP

VHDL-AMSVerilog-AMS

DSP

HDLVHDL/Verilog

Embedded

MemoriesHierarchical Fast SPICE

Complex

AnalogTraditional

SPICE

Custom Logic & Mixed-SignalFast SPICE

Single-Kernel simulator for full-chip SoC Verification

Courtesy of Mentor

48

IV. Mixed Signal Analysis: Interface

Digital AnalogD/A

A/D

D/A A/D

Analog Signal 0, 1, X

Threshold Detector

Rise, Fall TimeRise, Fall Resistance

49

Spice VHDL-AMS VHDL Verilog Verilog-AMS

VerilogSpice VHDLVHDL-AMS

VHDL-AMS Verilog-ASpice Spice Verilog-A

CVerilog VHDL

VHDL-AMS SpiceC Verilog-A

Mixed Signal: Mixed Languages

•Single Kernel Architecture•Single Netlist Hierarchy•Automatic D/A and A/D converter insertion

Courtesy of Mentor

50

IV. Tasks

EE CSSpeed

Accuracy

LanguageRF, Analog,

Power, Noise,

Convergence Partition

Interface Compiler

Math

51

IV. Research Directions

Hierarchy Management Analysis + Optimization Layout Oriented Analysis Circuit Reduction Spice

52

Physical objAlgor

Floorplan

Partition,Bus

LogicRep, buffer

PlaceWire-length,Congestion,

Block,alignment,match,size

RoutePattern,vias

AnalysisSignal integrity

Ir drop, didtmanufacturability

verification

CircuitComp,dynamic,

pass

Power, clockpackaging

View, hierarchy

Hierarchy Management

53

Hierarchy management

logic layout

Design process

algor

54

Hierarchy Management (cont.)

•Hierarchy Tree Construction•Hierarchy Tree Transformation•Incremental Changes•Graph Process on Tree Structure

55

Analysis and Optimization

i. Circuit Reductionii. Transient Analysisiii. Optimization of

• power/ground: pads, decoup caps, network

• clock networks: topology, shield, decoup caps

• Buses: shield, topology

56

•Huge Circuitry•Millions of nodes

•Whole Chip Analysis•Power/Ground, Substrate, Analog

•Guaranteed Accuracy •Accuracy vs Execution Time

•Construction or Incremental Changes

Layout Oriented Analysis

57

Layout Based Signal Analysis

  

•Generalized Y-Delta Transformation

•R,C,L,Coupling, Sources•Natural Frequency•Realizability•Hierarchical Circuit Analysis

58

2112 ggy

1 21g

2g

1 212y

Conductance in parallel

59

21

2112 gg

ggy

1 212y1 21g 2g0

Conductance in series

60

1

2 3

12y 13y

23y

321 ggg

ggy jiij

1g

2g 3g

1

2 3

0

Conductance in Y-structure

61

212 11

1

clsgls

g

csls

g

lsg

y

1

2 3

12y 13y

23y

e.g.

g

ls

1cs

1

2 3

0

Admittance in Y-structure

62

1

2 3

12y 13y

23y

22 11

1

clsgls

I

csls

g

IlsI

is the same , and

g

ls

1cs

1

2 3

I

0

44

2I

12y

Admittance in Y-structure, with current source

63

skm1

sk11

1

4

31

2

skm1

skm1

sk22

1

skm1

221

11

121

11

22

11

IVV

IVV

sksk

sksk

m

m

4

31

2

sk11

1

skm1

sk22

11V

2V

1I 2I

K-element

64

Reduction example

65

Waveform Estimation

Transient response evaluated using Y-Δ transformation with Hurwitz polynomial approximation.

8th order stabilized Y-Δ models are used for near-end and far-end node waveform evaluation.

Only a 3rd order AWE model is obtained.

66

Efficiency Comparison

Circuit type

Elements CPU time (s)

#R #L #C Stabilized Y-Δ *

Spice3f4 Efficiency

Tree-like 1035 1034 1001 0.34 3.94 11.58

16397 16394 14299 11.42 134.05 11.74

Mesh-like 1675 2439 733 5.22 73.19 14.02

8035 0 8038 2.07 25.95 12.54

66941 0 67119 41.25 1536.77 37.26

*15th order Y-Δ transformation is used.

67

Delay Accuracy Vs. Efficiency

Model order50% delay 90% delay

Delay Accuracy Delay Accuracy Efficiency

3rd 28.9 95% 33.4 95% 29.3

6th 27.9 99% 31.9 99.6% 27.9

3rd 49.5 96% 72.5 96% 14.7

6th 52.7 97% 71.5 97% 11.5

12th 51.2 99.8% 69.9 99.7% 10.1

RC*

RLC**

* 50% delay is 27.6ps, and 90% delay is 31.7ps.** 50% delay is 51.3ps, and 90% delay is 69.7ps.

.10,10,10 pHLfFCR

68

Observe Overshooting

Model Order * Overshooting ** Accuracy Efficiency

3rd 2.627 97.8% 11.1

6th 2.699 99.5% 10.7

12th 2.683 99.9% 10.3

* Mesh-like RLC circuit is tested, with ** Waveform will converge to DC 2.5v.

.10,10,10 pHLfFCR

69

Pole Analysis

Both AWE and Y-Δ transformation have artificial positive poles;

High order AWE tends to collapse approximate poles, hiding other less dominant ones.

Y-Δ transformation with model stabilization yields no positive poles, and has broader band in pole estimation.

70

V. Conclusion

Layout Oriented Analysis Unified tools combining EE and CS

with Math as foundation New Methodologies

Larger Circuits, Shorter Product Turnaround

71

References

M. Marek-Sadowska, UCSB L.T. Pileggi, CMU CK Cheng, et al, Interconnect Analysis and

Synthesis, John Wiley ACM/IEEE Design Automation Conf. IEEE/ACM Int. Conf. On CAD Apache, Nassda, Mentor, Synopsys,

Cadence, Celestry, IBM, and etc.