Post on 07-Apr-2015
Adiabatic Logic
Power Consumption in CMOS Circuits
- Dynamic power dissipation
  1. Switching power
  2. Short-circuit power
- Static power dissipation
  1. DC power
  2. Leakage power
Low-Power Design Techniques
Implementation at different levels of abstraction:
- Technology: scaling, VT optimization, alternative technology
- Physical design: power-driven P&R, low-power layout optimization
- Circuit/Logic: logic style (adiabatic), multi- & variable supply
- Architecture: parallelization & pipelining, retiming, glitch minimization
- System: partitioning, component choice, power management
Principles
Adiabatic Logic
Adiabatic: a reversible thermodynamic process that occurs without gain or loss of heat.
- Ideal adiabatic logic: charge can be recycled (reused) an infinite number of times.
- Real adiabatic logic: charge is recycled many times, so a significant power reduction is possible.
To achieve the charge saving expected from adiabatic logic, one of two kinds of power supply must be used:
- Constant-current power supply
- Variable-voltage supply
Adiabatic Charging Principle
Energy can be traded for delay by increasing the charge transport time.
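A minimal sketch of the energy-for-delay trade, assuming a constant-current (linear-ramp) charge of a capacitance C to voltage V through a resistance R over a time T that is much longer than RC. Under that assumption the current is I = CV/T and the dissipation is I²RT = (RC/T)·CV², compared with ½CV² for conventional abrupt charging from a fixed supply. The function names and component values are illustrative only.

```python
def conventional_energy(c: float, v: float) -> float:
    """Energy dissipated when C is charged abruptly from a fixed supply V."""
    return 0.5 * c * v ** 2


def adiabatic_energy(r: float, c: float, v: float, t: float) -> float:
    """Energy dissipated in R when C is charged to V by a constant current over time t.

    Assumes t >> RC so the constant-current picture holds: I = CV/t, E = I^2 * R * t.
    """
    current = c * v / t
    return current ** 2 * r * t


if __name__ == "__main__":
    R, C, V = 1e3, 1e-12, 1.0  # hypothetical values: 1 kOhm, 1 pF, 1 V (RC = 1 ns)
    for T in (1e-8, 2e-8, 4e-8):  # 10, 20, 40 ns charging times
        ratio = adiabatic_energy(R, C, V, T) / conventional_energy(C, V)
        print(f"T = {T:.0e} s: adiabatic/conventional = {ratio:.3f}")
```

Each doubling of T halves the adiabatic dissipation, while the conventional figure does not depend on T at all.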
Adiabatic Recovery Requirements
Three principles:
- Devices turn on only when Vds = 0.
- Source-drain voltage changes only while the device is off.
- Voltage changes are gradual.
Carnot Cycle
In 1822-24, Sadi Carnot analyzed the efficiency of an ideal heat engine all of whose steps were reversible, and furthermore proved that:
- Any reversible engine (regardless of details) would have the same efficiency, (TH − TL)/TH.
- No engine could have greater efficiency than a reversible engine without producing work from nothing.
- Temperature itself could be defined on a thermodynamic scale based on the heat recoverable by a reversible engine operating between TH and TL.
Degree of Reversibility
The degree of reversibility (a.k.a. reversibility, a.k.a. thermodynamic efficiency) of any quasi-adiabatic process is defined as the ratio of:
- the total free energy at the start of the process, $E_{\text{free}}(0)$,
- divided by the total energy spent in the process, $\Delta E_{\text{spent}}(t)$.
Or, equivalently, the ratio of:
- the known, accessible information at the start, $K(0)$,
- divided by the amount of it that is converted to entropy, $\Delta S(t)$.
$$Q = \frac{E_{\text{free}}(0)}{\Delta E_{\text{spent}}(t)} = \frac{K(0)}{\Delta S(t)}$$
This same quantity is referred to as the (per-cycle) "quality factor" Q for any resonant element (e.g., an LC oscillator) in EE.
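As a rough worked example (an assumption-laden sketch, not taken from the slides): consider one full charge-and-recover cycle of a node of capacitance C, driven through resistance R by linear ramps of duration T much longer than RC. If each of the two ramps dissipates about (RC/T)·CV², and the free energy at the start is taken to be the ½CV² of signal energy that the cycle moves into and back out of the node, the definition gives

```latex
Q = \frac{E_{\text{free}}(0)}{\Delta E_{\text{spent}}(t)}
  \approx \frac{\tfrac{1}{2}CV^2}{2\cdot\frac{RC}{T}\,CV^2}
  = \frac{T}{4RC}
```

so, in this regime, Q grows linearly with the ramp time T.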
Electrical Resistance
$P_{\text{spent}} = I^2R = (Q/t)^2R$, or $E_{\text{spent}} = Pt = Q^2R/t$, which scales as $1/t$ (here Q is the charge transferred in time t).
Charge transfer through a resistor obeys the adiabatic principle!
Why?
- Conduction electrons have a large Fermi velocity or thermal velocity relative to their drift velocity.
- They scatter off lattice-atom cross-sections with a mean free time $t_f$ that is fairly independent of drift velocity.
- Each scattering event thermalizes the electron's drift kinetic energy, a fraction f of the current's total $E_k$.
- Since the drift kinetic energy scales as the square of the drift velocity (which is proportional to $1/t$) while the scattering rate does not depend on it, the energy lost over the transfer time t scales as $1/t$.
Some Loss-Inducing Interactions
For ordinary voltage-coded electronics:
Interactions whose dissipation scales with speed:
- Parasitic EM emission from dynamic (C, L) reactances
- Scattering of ballistic electrons from lattice imperfections, causing Ohmic resistance
Interactions having different scaling laws:
- Interference from outside EM sources
- Thermally activated leakage of electrons over potential-energy barriers
- Quantum tunneling of electrons through narrow barriers (sub-Fermi wavelength)
- Losses due to intentional treatment of known physical information as entropy (bit erasure)
Some Ways to Reduce Losses
- EM interference/emission: add shielding; use high-Q MEMS/NEMS oscillators
- Scattering/resistance: ballistic FETs, superconductors
- Thermal leakage: avoid low VT and/or high temperatures
- Tunneling: thick tunnel barriers, high-κ dielectrics, conductors with low Fermi level/high electron affinity, vacuum-gap barriers?
- Intentional bit erasure: reduce voltages; use mostly-reversible adiabatic logic designs
Adiabatic Electronics & CMOS Implementations
Conventional Gates are Irreversible
Logic gate behavior (on receiving new input):
- Many-to-one transformation of local state!
- Required to dissipate on the order of $k_BT$ by Landauer's principle (at least $k_BT \ln 2$ per bit of information lost).
- Incurs ½CV² dissipation in 2 out of 4 cases.
Example: Static CMOS inverter. Transformation of local state (in, out):
Just before transition → After transition
in = 0, out = 0 → in = 0, out = 1
in = 0, out = 1 → in = 0, out = 1
in = 1, out = 0 → in = 1, out = 0
in = 1, out = 1 → in = 1, out = 0
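The inverter table can be checked mechanically. The sketch below encodes it as a Python dict (the names INVERTER, is_many_to_one and dissipating_cases are illustrative, not from the slides) and confirms both claims made for conventional gates: the mapping is many-to-one, and the output node has to switch in exactly 2 of the 4 cases.

```python
# Static CMOS inverter: (in, out) just before the transition -> (in, out) just after.
INVERTER = {
    (0, 0): (0, 1),
    (0, 1): (0, 1),
    (1, 0): (1, 0),
    (1, 1): (1, 0),
}


def is_many_to_one(transform: dict) -> bool:
    """True if two or more distinct prior states collapse onto the same next state."""
    return len(set(transform.values())) < len(transform)


def dissipating_cases(transform: dict) -> list:
    """Prior states in which the output node must actually switch.

    Each such switch either charges C from the supply or dumps the stored
    charge to ground, and abruptly costs about 1/2 C V^2.
    """
    return [before for before, after in transform.items() if before[1] != after[1]]


if __name__ == "__main__":
    print("many-to-one:", is_many_to_one(INVERTER))
    print("switching cases:", dissipating_cases(INVERTER))
```

Running it reports that the mapping is many-to-one and that the switching cases are (0, 0) and (1, 1).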
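Exact formula: $E_{\text{diss}} = 2f\left(1 + f\left(e^{-1/f} - 1\right)\right)\cdot\tfrac{1}{2}CV^2$, for frequency reduction $f :\equiv RC/t$, where t is the transition (ramp) time.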
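A minimal sketch that evaluates the exact expression for charging C to V through R with a linear ramp of duration t, E_diss = 2f(1 + f(e^(−1/f) − 1))·½CV² with f ≡ RC/t, and checks its two limits: a slow ramp (f → 0) gives about f·CV² = (RC/t)·CV², and a fast ramp (f → ∞) approaches the conventional ½CV². The function name is a hypothetical helper.

```python
import math


def ramp_dissipation(f: float, c: float, v: float) -> float:
    """Energy dissipated charging C to V through R with a linear ramp of duration t.

    f = RC / t is the frequency-reduction (speed) fraction; uses
    E_diss = 2f * (1 + f * (exp(-1/f) - 1)) * (1/2) C V^2.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    # expm1 keeps exp(-1/f) - 1 accurate when f is large (fast ramps).
    return 2 * f * (1 + f * math.expm1(-1 / f)) * 0.5 * c * v ** 2


if __name__ == "__main__":
    C, V = 1.0, 1.0  # normalized so that 1/2 C V^2 = 0.5
    for f in (0.01, 0.1, 1.0, 10.0, 1000.0):
        print(f"f = {f:>7}: E_diss / (1/2 C V^2) = {ramp_dissipation(f, C, V) / 0.5:.4f}")
```

Using `math.expm1` rather than `math.exp(...) - 1` is a numerical-accuracy choice for the fast-ramp end of the range.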
Adiabatic Logic Families
Partially adiabatic circuits: some energy is recovered
- 2N2P / 2N-2N2P
- CAL (Clocked CMOS Adiabatic Logic)
- TSEL (True Single-Phase Adiabatic)
- SCAL (Source-coupled Adiabatic Logic)
Fully adiabatic circuits: dissipate little energy, but are very slow
- PAL (Pass-transistor Adiabatic Logic)
- SCRL (Split-level Charge Recovery Logic)
Complementary Pass-Transistor Energy Recovery Adiabatic Logic (CPERL)
CPERL (1)
All NMOS. The gate consists of two parts:
1) Charge/discharge function part (M1-M6)
2) Logic function part (M9-M12)
CPERL (2)
CPERL (3)
An assumption was made that ϕ1 and IN are in the same phase.
- As ϕ1 ramps up, IN rises also.
- INbar remains low.
- M9 and M11 turn on.
- BN1 is precharged to (Vdd − Vth).
- BN2 is still at a low voltage.
CPERL (4)
- When ϕ1 ramps down, IN goes down also, causing M9 and M11 to turn off.
- As ϕ2 ramps up, the gate-to-channel capacitance of M1 pushes BN1 higher than Vdd, so M1 turns on.
- ϕ2 charges node OUT to Vdd in an adiabatic manner.
- As ϕ2 ramps down, OUT goes down also.
- The charge stored on OUT is recovered to the supply through this discharge process.
CPERL (5)
Two stages of a CPERL inverter chain are shown; only half of each circuit is drawn, for simplicity.
- During period t1, A is assumed high and BN2 is at (Vdd − Vth).
- During t2, ϕ2 ramps down and the charge is trapped at BN2.
- During t3, ϕ2 rises again. Assuming A is low and /A is high, M10 of stage 2 turns on.
CPERL (6)
- M3 of stage 1 turns on also.
- Current flows through M10 and M3 due to the voltage difference.
- This charge sharing stops when the voltages on the nodes balance.
- M5 works as a diode-connected transistor:
  - If the voltage difference is still higher than Vth, M5 turns on until the difference falls below Vth.
  - If the difference is already less than Vth, M5 stays off.
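Charge sharing of the kind described for M10 and M3 is not itself adiabatic: whenever two charged nodes are connected and allowed to settle, some energy is lost no matter how the switch is sized. A minimal lumped-capacitance sketch, assuming two ideal capacitors joined by a resistive switch (the capacitance values are hypothetical, and the diode-connected M5 is not modeled), computes the settled voltage and the loss, which equals ½·C1C2/(C1 + C2)·ΔV² independent of the switch resistance.

```python
def charge_share(c1: float, v1: float, c2: float, v2: float) -> tuple:
    """Connect two capacitors through a switch and let them settle.

    Returns (final_voltage, energy_lost). Charge is conserved; the energy
    difference is dissipated in the switch resistance, regardless of its value.
    """
    v_final = (c1 * v1 + c2 * v2) / (c1 + c2)
    e_before = 0.5 * c1 * v1 ** 2 + 0.5 * c2 * v2 ** 2
    e_after = 0.5 * (c1 + c2) * v_final ** 2
    return v_final, e_before - e_after


if __name__ == "__main__":
    # Hypothetical node capacitances and voltages (not taken from the CPERL slides).
    v, loss = charge_share(c1=20e-15, v1=1.0, c2=10e-15, v2=0.0)
    print(f"settled at {v:.3f} V, lost {loss * 1e15:.3f} fJ")
```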
CPERL (7)
The Brent-Kung adder has three units:
- Propagate and generate unit
- Carry parallel-prefix unit
- Sum unit
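The three units can be illustrated with a purely behavioral model. This sketch is a bit-level Python model of a Brent-Kung adder, not a CPERL circuit; it assumes a carry-in of 0 and a width n that is a power of two, and the function name is hypothetical. Stage 1 forms per-bit generate/propagate, stage 2 runs the Brent-Kung up-sweep (reduction tree) and down-sweep (distribution tree) of the prefix operator (g_hi, p_hi) ∘ (g_lo, p_lo) = (g_hi | p_hi & g_lo, p_hi & p_lo), and stage 3 XORs propagate with the incoming carries.

```python
def brent_kung_add(a: int, b: int, n: int = 8):
    """Return (sum mod 2**n, carry_out) for n-bit a + b with carry-in 0."""
    assert n >= 1 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    # 1) Propagate and generate unit
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    G, P = g[:], p[:]

    def combine(i: int, j: int) -> None:
        # (G, P)[i] <- (G, P)[i] o (G, P)[j]; j covers the lower-order bits.
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]

    # 2) Carry parallel-prefix unit: up-sweep (reduction tree)
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    # ... then down-sweep (distribution tree) fills in the remaining prefixes
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2

    # 3) Sum unit: carry into bit i is the prefix generate over bits i-1..0
    s = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[n - 1]


if __name__ == "__main__":
    print(brent_kung_add(200, 100, 8))
```

For example, 200 + 100 = 300 wraps to 44 with a carry-out of 1 in 8 bits, so the call prints `(44, 1)`.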
CPERL (8)
CPERL (9)
CPERL (10)
SCRL: Split-level Charge Recovery Logic
Transformation of local state (in, out), SCRL inverter:
Just before transition → After transition
in = 0, out = ½ → in = 0, out = 1
in = 1, out = ½ → in = 1, out = 0
Input-Barrier, Clocked-Bias Retractile
Cycle of operation:
- Inputs raise or lower barriers.
  * Must reset output prior to input.
- Do logic with series/parallel barriers.
  * Combinational logic only!
- Clock applies a bias force, which changes the state, or not.
(Figure: input barrier height vs. clocked force applied.)
Examples: Hall's logic, SCRL gates, Rod logic interlocks
Retractile Logic with SCRL gates
Simple combinational logic of any depth N:
- Requires N timing phases
- Non-pipelined
- No sequential reuse of HW (even worse)
We need sequential logic!
Sequential Retractile Logic
Approach #1 (Hall '92):
- After every N stages, invoke an irreversible latch that stores the output of the last stage.
- Then retract all the stages and begin a new cycle.
Problems:
- Reduces dissipation by at most a factor of N.
- Also reduces HW efficiency by order N! (In the worst case, compared to a pipelined, sequential circuit.)
Approach #2 (Knight & Younis, '93):
- The "store output" stage can also be reversible!
- Gives fully adiabatic, sequential, pipelined circuits!
- N can be as small as 1 or 2 and still have arbitrarily high Q.
Simple Reversible CMOS Latch
Uses a standard CMOS transmission gate.
Sequence of operation:
(1) Input initially matches latch contents (output).
(2) Input changes → output changes.
(3) Latch closes.
(4) Input removed.
(Figure: latch switched by P. Before input: in = a, out = a. Input arrived: in = b, out = b. Input removed: in = a, out = b.)
Resetting a Reversible Latch
Can reversibly unlatch data as follows (exactly the reverse of the latching process):
(1) Data value d is stored on memory node M.
(2) Present an exact copy of d on the input.
(3) Open the latch (connecting the input to M). No dissipation, since the voltage levels match.
(4) Retract the copy of d from the input. This retracts the copy stored in the latch also.
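The latch and unlatch sequences can be checked with a toy state model. The class and function names are hypothetical. Voltages are reduced to three normalized levels (0, ½, 1) with neutral at ½, which is an assumption borrowed from the split-level convention, and a "ramp" is modeled as an instantaneous but lossless level change. The only rule enforced is the one both latch slides rely on: the T-gate may open or close only when its two sides are already at the same level.

```python
NEUTRAL, LOW, HIGH = 0.5, 0.0, 1.0  # normalized voltage levels (assumed)


class ReversibleLatch:
    """Toy model of a T-gate latch: input node, memory node M, and gate state.

    Hypothetical helper for illustration. It only checks the adiabatic rule
    that the T-gate may open or close when both sides are at the same level.
    """

    def __init__(self, level: float = NEUTRAL, is_open: bool = True):
        self.inp = level
        self.mem = level
        self.is_open = is_open

    def _require_match(self, action: str) -> None:
        if self.inp != self.mem:
            raise ValueError(f"{action} with mismatched levels would dissipate energy")

    def open_gate(self) -> None:
        self._require_match("opening")
        self.is_open = True

    def close_gate(self) -> None:
        self._require_match("closing")
        self.is_open = False

    def ramp_input(self, level: float) -> None:
        """Ramp the input gradually; while the gate is open, M follows it."""
        self.inp = level
        if self.is_open:
            self.mem = level


def latch_value(latch: ReversibleLatch, d: float) -> None:
    # (1) input matches contents with the gate open; (2) input changes and
    # the output follows; (3) latch closes; (4) input removed.
    latch.ramp_input(d)
    latch.close_gate()
    latch.ramp_input(NEUTRAL)


def unlatch_value(latch: ReversibleLatch) -> None:
    # Exactly the reverse: present a copy of d, open the latch, retract the copy.
    latch.ramp_input(latch.mem)
    latch.open_gate()
    latch.ramp_input(NEUTRAL)
```

After `latch_value` the memory node holds d while the input is back at neutral; `unlatch_value` returns both nodes to neutral without ever opening the gate across a mismatch. Trying to open the gate without first presenting the copy raises the `ValueError`.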
Input-Bias Clocked-Barrier Logic
Cycle of operation:
- Data input applies bias; add forces to do logic.
- Clock signal raises barrier; can amplify/restore the input signal in the clocking step.
- Data input bias removed (retract input).
- Can reset the latch reversibly, given a copy of its contents.
(Figure: for input "0" and input "1": input applied, clock up with barrier up, input retracted.)
Examples: Adiabatic QDCA, SCRL latch, Rod logic latch, PQ logic, Buckled logic
SCRL 6-tick clock cycle
Initial state: all gates off, all nodes neutral.
Tick #1: Input goes valid, forward T-gate opens.
Tick #2: Forward gate charges, output goes valid. (Tick #1 of the subsequent gate.)
Tick #3: Forward T-gate closes, reverse gate charges.
Tick #4: Reverse T-gate opens, forward gate discharges.
Tick #5: Reverse gate discharges, input goes neutral.
Tick #6: Reverse T-gate closes, output goes neutral. Ready for next input!
(Figure: in/out state of the gate at each tick.)
24 ticks/cycle in this version (includes 2-level retractile stages).
Some Timing Terminology
For sequential adiabatic circuits:
- 1 Tick: Time for a single ramp transition, i.e. the RC gate delay divided by the frequency-reduction factor f (f ≡ RC/t).
- 1 Phase: Latency for a data value to propagate forward by 1 pipeline stage.
- 1 Cycle: Minimum period for all timing information to return to its initial state.
- Diadic: Two retractile levels per gate; permits inverting or non-inverting logic.
- Dual rail: Two wires per logic value; permits universal logic with monadic gates.
- Monadic: Only 1 level.
Some Figures of Demerit
Some quantities we may wish to minimize:
- Ticks/phase: proportional to logic propagation latency
- Ticks/cycle: reciprocal to rate of data throughput
- Transistor-ticks/cycle: reciprocal to HW cost-efficiency
- Number of required clock/power input signals: supplying these may be a significant component of system cost
- Number of distinct voltage levels required: may affect the reliability/power tradeoff
Some Interesting Questions
About pipelined, sequential, fully-adiabatic CMOS logic:
Q: Does it require an intermediate voltage level?
A: No, you can get by with only 2 different levels.
Q: What is the minimum number of externally provided timing signals you can get away with?
A: ≤4 (≤12 if split levels are used).
Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit?
A: Yes, but not statically, unless N² hardware is used, where N is the number of stages per full sequential cycle.
We now demonstrate these answers.
Some SCRL timing diagrams
2LAL: 2-Level Adiabatic Logic
Use a simplified T-gate symbol: a gate labeled P :≡ a CMOS transmission gate switched by P (implementable using ordinary CMOS transistors).
Basic buffer element: cross-coupled T-gates.
Only 4 timing signals, 4 ticks per cycle:
- φi rises during tick i
- φi falls during tick (i+2) mod 4
(Figure: buffer with input in, output out, and clocks φ0 and φ1; waveforms of φ0, φ1, φ2, φ3 over ticks 0 to 3.)
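A sketch of the four 2LAL clock phases, assuming ideal linear ramps and levels normalized to 0..1 (the function name is hypothetical). Following the rule above, φi rises during tick i, stays high for one tick, falls during tick (i+2) mod 4, and stays low for one tick. One consequence of this schedule is easy to check: φi and φ(i+2) mod 4 are complements of each other at every instant.

```python
def phase_level(i: int, t: float) -> float:
    """Normalized level (0..1) of 2LAL clock phi_i at time t, measured in ticks.

    Assumes linear ramps: phi_i rises during tick i, stays high for one tick,
    falls during tick (i + 2) mod 4, and stays low for one tick.
    """
    k = (t - i) % 4  # position within phi_i's own 4-tick cycle
    if k < 1:
        return k  # rising
    if k < 2:
        return 1.0  # high
    if k < 3:
        return 3.0 - k  # falling
    return 0.0  # low


if __name__ == "__main__":
    for t in (0.5, 1.5, 2.5, 3.5):
        print(t, [phase_level(i, t) for i in range(4)])
```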
2LAL Cycle of Operation
(Figure: tick-by-tick operation of the buffer over ticks #0 to #3, for in = 1 and for in = 0, showing transitions such as in→1, φ0→1, out→1, φ1→1, in→0, φ0→0, out→0, φ1→0.)
2LAL Shift Register Structure
1-tick delay per logic stage.
(Figure: chain of buffer stages from in to out, clocked in sequence by φ1, φ2, φ3, φ0 and φ0, φ1, φ2, φ3; a second panel shows logic pulse timing & propagation over ticks 0, 1, 2, 3, ...)
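More complex logic functions
Non-inverting Boolean functions: T-gate networks switched by the inputs and driven by a clock φ; series T-gates controlled by A and B compute A∧B, and parallel T-gates compute A∨B.
For inverting functions, you must use quad-rail logic encoding:
- To invert, just swap the rails! (Zero-transistor "inverters.")
(Figure: quad-rail encoding of A on rails A0 and A1, for A = 0 and A = 1.)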
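A behavioral sketch of why rail swapping is enough for inversion. It simplifies the quad-rail scheme to a one-hot pair of logical rails (rail0, rail1), with exactly one rail active per value; in 2LAL each of these rails would also carry its complement for T-gate control, which is not modeled here. All helper names are hypothetical. Only non-inverting AND/OR of individual rails is used, yet NOT, AND and OR are all available, so the set is universal.

```python
# Hypothetical encoding helpers: a logical value rides on a pair of rails
# (rail0, rail1); exactly one rail is active. This simplifies 2LAL quad-rail
# encoding, where each rail also carries its complement for T-gate control.
def encode(x: bool) -> tuple:
    return (0, 1) if x else (1, 0)


def decode(rails: tuple) -> bool:
    if rails not in ((0, 1), (1, 0)):
        raise ValueError(f"invalid rail pair: {rails}")
    return rails[1] == 1


def rail_not(a: tuple) -> tuple:
    # Inversion is a zero-transistor rail swap.
    return (a[1], a[0])


def rail_and(a: tuple, b: tuple) -> tuple:
    # Result is 1 only if both inputs are 1, and 0 if either input is 0;
    # both rails are produced by non-inverting AND/OR of input rails.
    return (a[0] | b[0], a[1] & b[1])


def rail_or(a: tuple, b: tuple) -> tuple:
    return (a[0] & b[0], a[1] | b[1])
```

For example, a NAND is just `rail_not(rail_and(a, b))`, which costs no extra transistors for the inversion step.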
Hardware Efficiency Issues
- Hardware efficiency: # of logic operations / hardware / time
- Hardware space-time complexity: how much hardware, for how much time, per logic op?
Minimizing: (# of transistors) × (# of ticks) / (gate cycle)
- SCRL inverter, with return path: (8 transistors) × (6 ticks) = 48 transistor-ticks
- Quad-rail 2LAL buffer stage: (16 transistors) × (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
- SCRL reversible NAND, with all inverters: (23 transistors) × (6 ticks) = 138 T-ticks
- Quad-rail 2LAL AND: (48 transistors) × (4 ticks) = 192 T-ticks
Result of comparison: although 2LAL minimizes the # of rails and the # of ticks/cycle, it does not minimize overall spacetime complexity.
The question of whether 6-tick SCRL minimizes per-op spacetime complexity among pipelined adiabatic CMOS logics is still open.
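The space-time figures on the two comparison slides can be reproduced directly. This is only arithmetic on the slide's own transistor and tick counts; the dictionary and function names are illustrative.

```python
# Transistor and tick counts as given on the two comparison slides.
DESIGNS = {
    "SCRL inverter, with return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, with all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}


def transistor_ticks(transistors: int, ticks_per_cycle: int) -> int:
    """Space-time cost per gate cycle: (# of transistors) x (# of ticks)."""
    return transistors * ticks_per_cycle


if __name__ == "__main__":
    for name, (n_t, n_ticks) in DESIGNS.items():
        print(f"{name:42s} {transistor_ticks(n_t, n_ticks):4d} T-ticks")
```

The fewer ticks per cycle in 2LAL do not make up for its larger quad-rail transistor counts, which is the slide's conclusion.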
Minimizing Power-Clock Signals
How many external clock signals are required?
- N-level-deep retractile cascade logic: 2N waveforms × 1 phase = 2N signals
- 6 tick/cycle, 6-phase dynamic SCRL: 6 waveforms × 6 phases = 36 signals
- 24 tick/cycle, 3-phase static SCRL: 12 waveforms × 3 phases = 36 signals
- 4 tick/cycle, 2LAL: 1 waveform × 4 phases = 4 signals!
It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
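The signal counts follow from one rule: external power-clock signals = distinct waveforms × phases per waveform. A short sketch of that bookkeeping, using only the counts listed on the slide (helper names are illustrative):

```python
def clock_signals(waveforms: int, phases: int) -> int:
    """External power-clock signals needed: distinct waveforms x phases of each."""
    return waveforms * phases


def retractile_cascade_signals(n_levels: int) -> int:
    # N-level-deep retractile cascade: 2N waveforms, 1 phase.
    return clock_signals(2 * n_levels, 1)


SCHEMES = {
    "6 tick/cycle, 6-phase dynamic SCRL": clock_signals(6, 6),
    "24 tick/cycle, 3-phase static SCRL": clock_signals(12, 3),
    "4 tick/cycle, 2LAL": clock_signals(1, 4),
}

if __name__ == "__main__":
    for name, count in SCHEMES.items():
        print(f"{name}: {count} signals")
    print("10-level retractile cascade:", retractile_cascade_signals(10), "signals")
```

The retractile count grows with depth N, whereas the 2LAL count stays at 4 regardless of how many stages are chained.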
How to Do It
Circular 2LAL shifter; pulse-gated clocks.
(Figure: circular 2LAL shift register with stages P0, P1, P2, P3 gating the clocks φ0, φ1, φ2, φ3, with timing over ticks 0 to 3.)
GCAL: General CMOS Adiabatic Logic
A general CMOS adiabatic design methodology.
Currently under development at UF. Combines the best features of SCRL, 2LAL, and retractile logics:
- Permits designs attaining asymptotically optimal cost-efficiency, for any combination of time, space, spacetime, and energy costs
- Arbitrarily high degree of reversibility
- Permits using minimal 2-level and 3-level adiabatic gates
- Requires only 4 externally supplied clock/power signals for 2-level logic, and only 12 in total for mixed 2-level + 3-level logic
- Supports mixtures of fully-pipelined and retractile logic
- Supports quiescent dynamic/static latches & RAM cells
Tools currently under development:
- A new HDL specialized for describing adiabatic designs
- Digital circuit simulator with adiabaticity checker
- Adiabatic logic synthesis tool, with automatic legacy design converter
GCAL DRAM/SRAM cells
- GCAL DRAM cell: 4 transistors, 4 word lines/row, 2 bit lines/col (or 1)
- GCAL SRAM cell: 8 transistors, 6 word lines/row, 2 bit lines/col (or 1)
DRAM Cell Write Cycle
1. All nodes initially ½.
   - T-gate initially closed (off).
2. Transmission gate opens.
   - Internal node is connected to the bit line (at matching voltage).
3. Bit line transitions to 0 or 1.
   - Pulls internal node to matching level.
4. Transmission gate closes.
   - Internal node latched to new level.
5. Bit line transitions back to ½.
   - Prepares for a new cycle.
6. Use the reverse sequence of operations to unwrite.
DRAM Cell Read Cycle
1. All external nodes initially ½.
   - T-gate initially off.
   - Internal node contains data.
2. Inverter rails split.
   - Bit line set to (inverted) data.
3. T-gate at end of column latches bit-line data.
4. Inverter rails merge.
   - Bit line restored to ½ level.
   - Copy of data available at end of column.
5. Can use the reverse sequence of operations to unread.
Fully-Adiabatic DRAM cell
6T, 6 lines/row, 1 line/column (in/out together)
Read cycle:
- Initially: φ lines neutral, out neutral, R off
- R for desired row turns on
- φ for desired row splits, driving out column
- R turns off; out is read
- φ merges; out is reset
Write cycle:
- First, do read cycle
- in is set to out
- W turns on
- in changed to new value
- ...
Fully-Adiabatic SRAM
10T, 10 lines/row, 1 line/column
Operation similar to DRAM, except:
- Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
- Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off; N1 asserts; N2 asserts; T2 on
(Figure: cell with nodes N1, N2, memory node M, T-gates T1, T2, T3, and lines in and out.)
Limits of Adiabatics
Structured Systems
A structured system is defined as a system about whose state we have some knowledge.
- Some of its physical information is known.
- ∴ Its entropy is not at a maximum (by defn.).
- ∴ It is not at equilibrium (by defn.).
For states with a given energy E, we say the system's energy is distributed among those states, in proportion to their probability.
(Figure: among all states of the abstract system having energy E, the states with prob. > 0 are where the system's energy is "in here.")
Desired Trajectories
Any structured system we build to serve some purpose has some desired trajectory, or set of trajectories, through its configuration space that we would ideally like it to follow at all times.
(Figure: desired trajectories plotted as configuration vs. time.)
Think of any given state as having a specific "desirability" at any given time.
Energy Losses
Energy dissipation can be viewed as a departure of part of the system's energy away from the system's desired trajectory.
E.g., 1 of 10⁶ electrons leaking out of a DRAM cell means the system's energy has departed from the desired trajectory (all 10⁶ stay) by a small amount.
(Figure: configuration vs. time, showing energy that has departed from the desired trajectories.)