Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction...

31
Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported by NSF. Partially supported by NSF. Address comments to [email protected] Address comments to [email protected]
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    218
  • download

    0

Transcript of Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction...

Page 1: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power ReductionLeakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction

Yan Lin and Lei He

EE Department, UCLA

Partially supported by NSF. Partially supported by NSF.

Address comments to [email protected] comments to [email protected]

Page 2: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

OutlineOutline

Review and Motivation

Chip-level Vdd-level Assignment Algorithms

Experimental Results

Conclusions

Page 3: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

FPGA Power ReductionFPGA Power Reduction

Existing FPGAs are power inefficient compared to ASICs [kussy, ISLPED’98]

Power aware FPGA CAD algorithms for existingFPGA architectures CAD algorithms to minimize power-delay product

[Lamoureux et al, ICCAD’03] Configuration inversion for leakage reduction

[Anderson et al, FPGA’04]

Power efficient FPGA circuits and architectures Dual-Vdd and Vdd-programmable FPGA logic blocks

[Li et al, FPGA’04][Li et al, DAC’04] Vdd-programmable FPGA interconnects

[Li et al, ICCAD’04] [Gayasen et al, FPL’04] [Anderson et al, ICCAD’04]

Page 4: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Vdd-programmable switch Vdd selection for used switch Power-gating unused switch

Reduce leakage by 300X Configurable Vdd-level conversion

Avoid excessive leakage when low-Vdd switch drives high-Vdd switches

Vdd-programmable Interconnects [Li et al, ICCAD’04]Vdd-programmable Interconnects [Li et al, ICCAD’04]

Power transistor

Segment based Vdd-level converter insertion (SLC) Area overhead

35% area overhead for MCNC benchmark circuits Leakage overhead

29% leakage overhead for MCNC benchmark circuits

Conventional routing switch

Page 5: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Previous Approaches w/o LCsPrevious Approaches w/o LCs

[Gayasen et al, FPL’04] Level converters inserted at CLB inputs (outputs) All the routing trees driven by (driving) the source (sink)

CLB have the same Vdd-level as the source (sink) CLB Lacking in flexibility

A path-based Vdd-level assignment is performed for CLBsand interconnects

[Anderson et al, ICCAD’04] VT drop of NMOS is used to generate low-Vdd Positive feedback PMOS is used to tolerate low-Vdd switch

driving high-Vdd switches Alternative design of level converter Still has delay and power penalty

Page 6: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Our Major ContributionsOur Major Contributions

Proposed a few Vdd-level assignment algorithms Sensitivity based algorithms

TLC-S and dTLC-S for TLC and dTLC, respectively Linear programming (LP) based algorithm

dTLC-LP for dTLC

Proposed two ways to avoid using level converters in interconnects Tree based level converter insertion (TLC)

All the switches in one routing tree have same Vdd-level

Dual-Vdd tree based level converter insertion (dTLC) Only high-Vdd switch drives low-Vdd switches in one tree

Page 7: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Tree based LC insertion (TLC) allows one type of Vdd-level within one routing tree

Problem FormulationsProblem Formulations

Assign Vdd-level to each interconnect switch to minimize interconnect power Meet the delay target Tspec

Vdd-level converters are removed within interconnects are inserted at CLB inputs/outputs and can be used when needed

Dual-Vdd tree based LC insertion (dTLC) allows high-Vdd switch drives low-Vdd switches, but not vice versa

Page 8: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

OutlineOutline

Review and Motivation

Chip-level Vdd-level Assignment Algorithms

Experimental Results

Conclusions

Page 9: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Delay & Power Model with Dual-VddDelay & Power Model with Dual-Vdd

To incorporate dual-Vdd into timing analysis Pre-characterize the intrinsic delay and effective driving

resistance of switch using SPICE Calculate routing delay using Elmore delay model

Interconnect power

2)(5.0)( ijsclkijd VddifcfVddP Dynamic power

)( ijs VddP Leakage power is pre-characterized using SPICE

Page 10: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Chip-level Assignment AlgorithmsChip-level Assignment Algorithms

Tree based level converter insertion (TLC) Sensitivity based algorithm TLC-S

Dual-Vdd tree based level converter insertion (dTLC) Sensitivity based algorithm dTLC-S Linear programming (LP) based algorithm dTLC-LP

Page 11: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Sensitivity Based Algorithm TLC-SSensitivity Based Algorithm TLC-S

Iterative assignment Assign low-Vdd to the ‘untried’ tree with maximum power sensitivity

in each iteration Reject the assignment if critical path increases Iteration terminates after all trees are ‘tried’

Power sensitivity The power reduction by changing Vdd from high-Vdd to low-Vdd Power includes both dynamic and leakage power

ddVP /

Page 12: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Sensitivity Based Algorithm dTLC-SSensitivity Based Algorithm dTLC-S

A “candidate switch” is defined as A switch does not drive any switch Low-Vdd has been assigned to all of its fanout switches

Iterative assignment Assign low-Vdd to a candidate switch with maximum power

sensitivity in each iteration Reject assignment if critical path increases Iteration terminates when there is no candidate switch

Page 13: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

LP Based Algorithm dTLC-LP: OverviewLP Based Algorithm dTLC-LP: Overview

Chip-levelTime Slack Allocation

Net-levelBottom-up Assignment

Refinement

Single-Vdd placed and routed netlist

Dual-Vdd netlist

Page 14: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP: Single-Net EstimationdTLC-LP: Single-Net Estimation

Slack is represented in multiples of is delay increase of an interconnect segment by changing

Vdd from high-Vdd to low-Vdd

dd

b1

b2

b3

b4

sink1sink2s1

s2

s1=1s2=1

b1

b2

b3

b4

s2=1s1=2

b1

b2

b3

b4

s2=3s1=2

b1

b2

b3

b4

An example

Page 15: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP: Single-Net Estimation (Cont.)dTLC-LP: Single-Net Estimation (Cont.)

sik: Slack for kth sink in ith routing tree lik: Number of switches in the path from source to kth sink in ith tree SLij: Set of sinks in the fanout cone of jth switch in ith tree

Given the allocated slacks, estimate number of low-Vdd switches

An exampleSource

Page 16: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP: Single-Net Estimation (Cont.)dTLC-LP: Single-Net Estimation (Cont.)

sik: Slack for kth sink in ith routing tree lik: Number of switches in the path from source to kth sink in ith tree SLij: Set of sinks in the fanout cone of jth switch in ith tree

Given the allocated slacks, estimate number of low-Vdd switches

An example

s1

s1/l1

Source

Page 17: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP: Single-Net Estimation (Cont.)dTLC-LP: Single-Net Estimation (Cont.)

sik: Slack for kth sink in ith routing tree lik: Number of switches in the path from source to kth sink in ith tree SLij: Set of sinks in the fanout cone of jth switch in ith tree

Given the allocated slacks, estimate number of low-Vdd switches

An exampleSource

s2

s2/l2

Page 18: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP: Single-Net Estimation (Cont.)dTLC-LP: Single-Net Estimation (Cont.)

sik: Slack for kth sink in ith routing tree lik: Number of switches in the path from source to kth sink in ith tree SLij: Set of sinks in the fanout cone of jth switch in ith tree

Given the allocated slacks, estimate number of low-Vdd switches

An exampleSource

s3

s3/l3

Page 19: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP: Single-Net Estimation (Cont.)dTLC-LP: Single-Net Estimation (Cont.)

sik: Slack for kth sink in ith routing tree lik: Number of switches in the path from source to kth sink in ith tree SLij: Set of sinks in the fanout cone of jth switch in ith tree

Given the allocated slacks, estimate number of low-Vdd switches

An example

Min(sk/lk)

Source

Theorem: The estimation gives a lower bound of number of low-Vdd switches that can be achieved

Page 20: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP : Full-chip Time Slack AllocationdTLC-LP : Full-chip Time Slack Allocation

Objective function

fs(i): transition density of ith tree Fn(i): estimated number of low-Vdd switches in ith tree Directly minimize dynamic power May help minimizing leakage power that exponentially depends on Vdd-level

Constraints Net-based timing constraints

For edges other than routing

For edges corresponding to routing

For PIs and POs

Page 21: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP : Full-chip Time Slack AllocationdTLC-LP : Full-chip Time Slack Allocation

Objective function

fs(i): transition density of ith tree Fn(i): estimated number of low-Vdd switches in ith tree Directly minimize dynamic power May help minimizing leakage power that exponentially depends on Vdd-level

Constraints

Constraints due to transforming min function to linear function

Upper bound for useful slack

Theorem: The time slack allocation problem is an LP problem

Page 22: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP : OverviewdTLC-LP : Overview

Chip-levelTime Slack Allocation

Net-levelBottom-up Assignment

Refinement

Single-Vdd placed and routed netlist

Dual-Vdd netlist

Page 23: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP : Net-level Bottom-up AssignmentdTLC-LP : Net-level Bottom-up Assignment

Theorem: the bottom-up assignment is optimal

Perform bottom-up assignment within each tree to leverage the allocated slacks

d

Bottom-up assignment Assign low-Vdd to switches in the routing tree in a bottom-up fashion Slack is reduced by in each step Stop the process until no slack left

Page 24: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP : OverviewdTLC-LP : Overview

Chip-levelTime Slack Allocation

Net-levelBottom-up Assignment

Refinement

Single-Vdd placed and routed netlist

Dual-Vdd netlist

Page 25: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

OutlineOutline

Review and Motivation

Modeling and Problem Formulations

Chip-level Vdd-level Assignment Algorithms

Experimental Results

Conclusions

Page 26: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Experimental SettingExperimental Setting

Cluster-based Island Style FPGA Structure 100% buffered interconnects, subset switch block Uniform length 4 for all wire segments

ITRS 100nm technology

Use VPR [Betz-Rose-Marquardt] for placement and routing

Use fpgaEva-LP2 [Lin et al, FPGA’05] for power calculation Considering short-circuit power, glitch power and input vector 8% average error compared to SPICE simulation

Page 27: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Interconnect Power Comparison between TLC-S, dTLC-S and dTLC-LPInterconnect Power Comparison between TLC-S, dTLC-S and dTLC-LP

dTLC-S and dTLC-LP achieve 6.7% and 6.9% less interconnect power compared to TLC-S, respectively

Interconnect power breakdown TLC-S, dTLC-S and dTLC-LP have almost the same leakage dTLC-S and dTLC-LP achieve 13.8% and 15.8% less interconnect dynamic

power compared to TLC-S, respectively

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

0.05

TLC-S dTLC-S dTLC-LP

Inte

rcon

nec

t P

ower

(w

att) Leakage power

Dynamic power

Page 28: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

dTLC-LP compared to SLC and h2lLCidTLC-LP compared to SLC and h2lLCi

SLC [Li et al, ICCAD ’04] Segment based level converter inserted in interconnects Sensitivity based assignment algorithm

h2lLCi [Gayasen et al, FPL’04] All the routing tree driven by source CLB have the same Vdd-level as the source CLB Path based assignment algorithm

dTLC-LP, SLC and h2lLCi achieve 77.54%, 74.70% and 41.80% low-Vdd switches w/o relaxing Tspec

At different delays, dTLC-LP achieves The highest number of low-Vdd switches The lowest power consumption

30%

40%

50%

60%

70%

80%

90%

100%

12.00 12.50 13.00 13.50 14.00 14.50 15.00 15.50

Critical Path Delay (ns)

% o

f V

dd

L

Sw

itch

es

dTLC-LP SLC h2lLCi0%

5%

10%

15%

20%25%

0.02

0.04

0.06

0.08

0.1

0.12

0.14

12.00 12.50 13.00 13.50 14.00 14.50 15.00 15.50

Critical Path Delay (ns)

Inte

rcon

nec

t P

ower

(w

att)

h2lLCi dTLC-LPSLC

0%5%

10% 15% 20% 25%

64%

19%

Page 29: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Runtime Comparison between TLC-S, dTLC-S and dTLC-LPRuntime Comparison between TLC-S, dTLC-S and dTLC-LP

0.E+001.E+032.E+033.E+034.E+035.E+036.E+037.E+038.E+039.E+031.E+04

alu4 apex2 apex4 elliptic ex1010 frisc pdc s38417 s38584

MCNC Benchmarks

Ru

ntim

e (s

)

TLC-SdTLC-SdTLC-LP

TLC-S runs the fastest

dTLC-S versus dTLC-LP Runs 3X faster than dTLC-LP But achieves similar power consumption

Page 30: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Conclusions and Future WorkConclusions and Future Work

Proposed two ways to avoid using level converters in Vdd-programmable interconnects Tree based level converter insertion (TLC) Dual-Vdd tree based level converter insertion (dTLC)

Developed chip-level dual-Vdd assignment algorithms w/o level converters Sensitivity based algorithms TLC-S and dTLC-S LP based algorithm dTLC-LP

Developed dTLC-LP that reduces interconnect power by 64%

Developed dTLC-S that obtains slightly smaller power reduction with 3X speedup compared to dTLC-LP

Extend chip-level Vdd-level assignment to interconnects using wire segments of different lengths

Allocate time slack to logic blocks and interconnects in a uniform fashion

Page 31: Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.

Thank you!