C. H. Ziesler et al., 2003 Energy Recovering Computers 1 Advanced Computer Architecture Laboratory...

38
C. H. Ziesler et al., 20 Energy Recovering Computers 1 Advanced Computer Architecture Laboratory University of Michigan Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 1 2 Marios C. Papaefthymiou 1 2 T. J. Watson Research Center IBM Research Division
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    216
  • download

    3

Transcript of C. H. Ziesler et al., 2003 Energy Recovering Computers 1 Advanced Computer Architecture Laboratory...

C. H. Ziesler et al., 2003

Energy Recovering Computers

1 Advanced Computer Architecture LaboratoryUniversity of Michigan

Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 1 2

Marios C. Papaefthymiou 1

2 T. J. Watson Research CenterIBM Research Division

C. H. Ziesler et al., 2003

Conventional CMOS Operation

_+

Vdd

Vss

R

CR

time

• Constant supply voltage• One-way street to ground• High voltage drops across conducting devices• Energy dissipation grows with CV2

Vdd

IR

VR

PR

VC

ER

C. H. Ziesler et al., 2003

Charge Recovery

Vpc

R

C

time

• Time-varying voltage supply Vpc• Charge recovered from load C • Charge transfer through resistor is spread over available time• Energy dissipation grows with (RC/T) CV2

~ Vpc

IR

VR PR

ER

Vdd

VC

C. H. Ziesler et al., 2003

Quick Glance at History

● Physics (1970s)

– Logical reversibility of computation– Connection to thermodynamics (“adiabatic” computing)– No absolute minimum to energy dissipation, if computing

is arbitrarily slow. (Does not have to be slow, however.)

● Engineering (1990s)

– Logic circuitry– Energy recycling circuitry– VLSI prototyping

C. H. Ziesler et al., 2003

The Design Space

Complex web of intertwined issues

1. Which capacitance to recover from?

2. How to store/reuse recovered energy?

3. What circuits to do the recovery?

1. 2. 3.

C. H. Ziesler et al., 2003

• The case for reversible computation P. Solomon and D. Frank – SLPE'94

• Asymptotically zero-energy split-level charge recovery logic S.G. Younis and T. F. Knight, Jr. – SPLE’94

• Clock-powered CMOS: A hybrid adiabatic logic style for energy-efficient computing N. Tzartzanis and W. C. Athas – ARVLSI'99

And many, many others.... Few real working chips, however.

Sample Design Points from’90s

C. H. Ziesler et al., 2003

Our Contributions

• Multiplier chip (2001) Custom dynamic circuitry with fine-grain recovery from outputs of individual gates

• Discrete wavelet transform chip (2003) Standard-cell ASIC design with recovery of clock power

• Guiding principles– Simplicity– Minimal control and hardware overhead

• Key attributes– Single-phase sinusoidal waveform for power and synchronization– Integrated power-clock generators with inductive elements– Relatively high operating speeds (200MHz+ in silicon)– Higher energy efficiency than conventional counterparts

C. H. Ziesler et al., 2003

•Minimalist approach– Simple tools: magic and spice– Low-cost standard CMOS process: HP 0.5m, 40-pin DIP package, through MOSIS.

•Operational chip demonstrates practicality of energy-recovering circuit design

– Non-trivial size (8-bit operands, on-chip clock, self-test)– High throughput– Low energy dissipation

Multiplier Chip

• Suhwan Kim (while still at UM) and Conrad Ziesler received First Prize in VLSI Design Contest, DAC 2001.

C. H. Ziesler et al., 2003

• Working 1st silicon• Simple yet effective experimental setup.• Correct operation verified @ 130 MHz.• Energy efficient operation validated by measurements.

Low-Energy High-Speed Operation

C. H. Ziesler et al., 2003

Correct Function

130MHz

signatureoutput

C. H. Ziesler et al., 2003

Energy Comparison

2.7V

1.9V

2.2V1.9V

1.6V

2.9V

50 100 200

Frequency (MHz)

100

200

300

400

500

Dis

sipa

tion

per

Cyc

le (

pJ)

0

140

Energy recovery

4 stage static CMOS

2.3V

3.0V

3.0V

2.0V

8 stage static CMOS

2 stage static CMOS

4x

C. H. Ziesler et al., 2003

Simulations vs. Measurements

020406080

40 50 60 70 90 100 110 120 130

Frequency (MHz)

Dis

crep

ancy

(%

)

• Good correspondence up to 100MHz• Simulations gave consistently higher dissipation

C. H. Ziesler et al., 2003

Test Chip Overview

•Two multipliers with self-test per chip (minimum size die)• Integrated power-clock generator•Resonant LC oscillator

C. H. Ziesler et al., 2003

Input BILBO (self-test)

Product array

Multiplicand buffers

Result summation

Result buffers

Self-test Control

Output BILBO (self-test)

Multiplier and Self-Test

• 9,048 devices in multiplier array, 2,806 devices in self-test circuitry• Implemented entirely in energy-recovering dynamic logic family.

C. H. Ziesler et al., 2003

Energy-Recovering Dynamic Logic

• Sense amplifier• Precharge diodes• Current switches• Evaluation tree• Current tail• Power clock

Vss

bias

at

btaf bf

afot

NMOS NAND gatePower clock

• Single-phase sinusoidal power-clock• Minimum-size low-swing evaluation tee• Built-in state element• Dual-rail noise-tolerant design

C. H. Ziesler et al., 2003

Cascade Operation

• NMOS and PMOS gates are cascaded like domino logic.

• Animation shows tokens propagating through buffer chain.

C. H. Ziesler et al., 2003

• Zero-voltage switching

• LC- resonant clock generation

• External/bondwire inductor L

• Resistive/capacitive adiabatic load

• Compact: 170 x 115 m

Single-Phase Power-Clock Generator

Power Clock

Pulse Gen

Ring Osc

Gate Drive

Power Switches: S1, S2

25 tr. 19 tr. 10 tr.

RingOsc.

PulseGen.

GateDrive

_+_+

loadS2

S1

PCL

Vdd

Vss

Vbp

Vbn

C. H. Ziesler et al., 2003

Switch Timings

• Inductor current builds linearly when switches are on.

• Peak switch current less than peak inductor current.

• Switch S1 turned on at positive voltage peak.

• Switch S2 turned on at negative voltage peak.

• Fixed “on-window” controlled by pulse generator.

Indu

ctor

cur

rent

Out

put v

olta

ge

C. H. Ziesler et al., 2003

• Single-phase sinusoidal waveform @140MHz• ~60pF load, ~10nH external inductor• One DC supply (Vdd, Vss), two DC biases (NMOS, PMOS)

Power-Clock Waveform

Vdd

Vss

PMOS Bias

NMOS Bias

Power-Clock

C. H. Ziesler et al., 2003

The Automation Question

C. H. Ziesler et al., 2003

Resonant Clock ASIC

• Compatible with ASIC flow

• Synthesized by Conrad Ziesler and Joohee Kim using in-house standard-cells library and commercial tools

• Energy recovering clock tree and SRAM word/bit lines

• Low-cost bulk CMOS process: TSMC 0.25m, 108-pin PGA package, through MOSIS

• High frequency (300MHz)

• Low voltage (1.0-1.5V)

C. H. Ziesler et al., 2003

Chip Microphotograph

C. H. Ziesler et al., 2003

System Overview

C. H. Ziesler et al., 2003

The Energy Recovering Flip-Flop

• Clock signal: Single-phase resonant sinusoid• Probe activates state element only if next state differs from present state.• Low voltage operation at high speeds• Delay similar to conventional flip-flops• Fully compatible with standard-cell ASIC flow

16 transistors84 m2

probe state element

C. H. Ziesler et al., 2003

Flip-Flop Operation

C. H. Ziesler et al., 2003

Flip-Flop Power Characterization

Order of magnitude difference between idle (D, Q constant) and active (D, Q changing) dissipation.

200MHz 300MHz 500MHz1

10

100

1000Flip-Flop Energy

activeidle

frequency

ene

rgy

(fJ)

frequency (MHz)

Flip-Flip Energy per Cycle1000

100

10

1

ener

gy (

fJ)

200 300 500

C. H. Ziesler et al., 2003

Resonant Clock Generator

Resonate entire clock capacitance with small inductor

Pump resonant system with NMOS switch at appropriate times

NMOS switch only conducts incremental losses whenever “on”

NMOS Switch

Control

Pre-driver

Driver

C. H. Ziesler et al., 2003

Clock Generator Operation

C. H. Ziesler et al., 2003

Recovering vs. Conventional Hardware

Synthesized dual-mode ASIC– conventional– energy recovery

Dual-mode flip-flop cell

Conventional clock treewith conventional flip-flop

Resonant clock tree withenergy-recovering flip-flop

Direct comparison of dissipation at target throughput using identical hardware structures

C. H. Ziesler et al., 2003

ASIC Statistics

Discrete wavelet transform (DWT)

3,897 gates, 413 ffs

15,571 transistors

400m x 900m

13.6 pF , 21 nH

300 MHz , 1.5V

0.25m logic process

Dual-modeDWT

Clock generator

C. H. Ziesler et al., 2003

Simulation Results

Total system dissipation

Conventional mode includes clock tree

Energy recovery mode includes on-chip clock generator

300MHz , 1.5V Energy Recovery Conventional

Idle 6.74pJ 29.72pJ

Active 68.47pJ 78.28pJ

Per Cycle Energy Dissipation

C. H. Ziesler et al., 2003

Lab Setup

C. H. Ziesler et al., 2003

Correct Function

• 115MHz• 5nH high-Q external inductor

signature output(conventional)

signature output(energy recovery)

C. H. Ziesler et al., 2003

Self-Resonance at 225MHz

• No external inductor!• Clock resonates using parasitic inductance of package• Lossy parasitics limit power savings

200,000 samples@200ps intervals

signature output(energy recovery)

buffered power clock

8ns/div

4s/div

C. H. Ziesler et al., 2003

Preliminary Power Measurements

115MHz , 1.5V Energy Recovery Conventional

Idle 2.6mW 4.3mW

Active 8.9mW 11.7mW

Total system measurements by monitoring DC supplies

– Conventional mode includes clock tree

– Energy recovery mode includes on-chip clock generator

Not included: I/O supply, test circuitry, ring oscillator

Energy recovery data is conservative (obtained when simultaneously driving two DWT cores, one always in idle)

C. H. Ziesler et al., 2003

• Working silicon that realizes the potential of energy recovery– power savings over conventional CMOS– operating speeds in the 100s of Mhz– low control and hardware overhead– integrated power-clock generators

• Multiplier chip in 0.5m CMOS– Fine-grain recovery at the level of individual gate output– True singe-phase energy recovering dynamic logic– 2—4x energy savings over static CMOS in 50—200MHz

• DWT chip in 0.25m CMOS– ASIC-compatible design flow– Correct operation all the way up to self-resonance at 225MHz– Direct hardware comparison with dual-mode chip:

• 60—75% of conventional at 115MHz (measurement) • 1.2—4x savings at 300MHz (simulation)

Summary

C. H. Ziesler et al., 2003

Ongoing Research

• Continue chip testing– Break 300MHz barrier by bonding chip onto PCB– Test SRAM

• Experimentally assess EM noise – Clock EM power is centered around the resonant

frequency of the sinusoid

• Scaling– How large a clock network with single-phase sinusoid?– Skew/jitter?

• Ultra high speed (multiple GHz)– How fast with current clocking scheme?– Recovery circuitry with self-resonating clock network.

C. H. Ziesler et al., 2003

Links and Acknowledgments

Web Sitehttp://www.eecs.umich.edu/acal/energyrecovery

Energy recovery groupSuhwan Kim (emeritus)

Conrad ZieslerJoohee KimJuang-Ying ChuehVisvesh Sathe

Funded in part by U.S. Army Research Office and DARPADAAG-55-97-1-0250, DAAD-19-99-1-0340, N66001-02-C-8059