Bubble Razor: An Architecture-Independent Approach to Timing-Error Detection and Correction
Matthew Fojtik, David Fick, Yejoong Kim, Nathaniel Pinckney, David Harris, David Blaauw, Dennis Sylvester
Electrical Engineering & Computer Science Department
The University of Michigan, Ann Arbor
Outline
Issues with Prior Razor
Bubble Razor Algorithm
Circuitry and Implementation
Area Overhead Tradeoffs
Test Chip Results
Timing Margins
[Figure: clock and data waveforms comparing the actual circuit delay with the clock period; the added voltage, process, aging, and temperature margins appear as lost performance/energy]
Margins for uncertainty:
Process Variation
Temperature Variation
Voltage Variation
Aging Effects
Associated Costs:
Lost performance
Lost energy
Tester time (tradeoff)
Columns: Technique; Process (Global, Local); Ambient (Global Slow, Global Fast, Local Slow, Local Fast); Data
Table Lookup X X
Table & Sensors X X X
Canary Circuit X X
Razor Designs X X X X X X X
Eliminating Margins
Always Correct: Tables, Canaries
Detect and Correct: Razor Style
[Figure: Razor flip-flop with main DFF (D, Q, CLK), shadow latch (DCLK), and Error output. S. Das, et al. [VLSI 2005]]
Speculation Window and Hold Time
[Figure: DFF A and DFF B clocked by CLK A and CLK B, with the speculation window marked]
The speculation window is linked to the minimum delay constraint (hold time)
Architectural Invasiveness
Razor I Style – All Flops Reload Previous Values
[Figure: pipeline IF ID EX MEM WB]
Razor II Style – Check Stage and Architectural Replay
[Figure: pipeline IF ID EX MEM WB CHK]
S. Das, et al. [VLSI 2005]; D. Blaauw, et al. [ISSCC 2008]; K. Bowman, et al. [ISSCC 2008]
• Requires Designer Effort
• RTL written with Razor in mind
Fundamentals of Bubble Razor
Two-Phase Latch Timing
Automatically convert Flip-Flop based design
Time Borrowing as Correction Mechanism
Does not modify design architecture
Does not require reloading / replaying instructions
Local Correction (Bubbles)
Break requirement of stalling entire chip at once
Two Phase Latch Razor Timing
[Figure: latches LD A and LD B clocked by CLK A and CLK B]
Larger Speculation Window
Minimum delay constraint the same as conventional design
Time Borrowing as Error Correction
[Figure: flip-flop (DFF) pipeline compared with a latch (LD) pipeline with TD blocks; waveforms for the latch gate G (closed/open), D, and Error]
Bubble Razor – Switch to Latches, Borrow Time
• No Hold Time Issues
• Architecture Agnostic
• Push-button approach
• No metastability on datapath
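The time-borrowing behaviour listed above can be illustrated with a small timing model. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `arrival_times`, the integer time units, and the rule that latch i is transparent for exactly one phase starting at i × phase are hypothetical simplifications. Data that arrives after a latch opens but before it closes passes straight through and borrows time from the next stage; the sketch treats that late arrival as the timing-error event.

```python
def arrival_times(stage_delays, phase=100):
    """Return (arrival, borrowed) for each downstream latch.

    Assumed model: latch i opens at i * phase and closes one phase later.
    stage_delays[i] is the logic delay between latch i and latch i + 1.
    """
    results = []
    depart = 0  # data leaves latch 0 at the moment it opens
    for i, delay in enumerate(stage_delays):
        opens = (i + 1) * phase
        closes = opens + phase
        arrive = depart + delay
        if arrive > closes:
            # Too late even for time borrowing: the latch has already closed.
            raise ValueError(f"data misses latch {i + 1} entirely")
        borrowed = max(0, arrive - opens)  # time taken from the next stage
        results.append((arrive, borrowed))
        depart = max(arrive, opens)  # a latch cannot pass data before it opens
    return results


# Made-up delays, in percent of one phase: the middle stage is 30% too slow.
print(arrival_times([90, 130, 80]))
```

In this example the second latch borrows 30 units and the third still sees 10 units of lateness, which shows why a late arrival has to be resolved before the borrowed time accumulates down the pipeline.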
Stalling Locally with Bubbles
[Figure: latch stages 1, 3, 5, 7 and 2, 4, 6, 8 plotted against time]
Eventually it all resolves
Stalling the Clock Locally
• With flops, all registers hold data
• With latches, half registers hold bubbles
• Every latch stalls exactly once
• Communication only between neighbors
Blue tells Green to stall
Purple tells Blue to stall
Yellow takes off again
Red tells Purple to stall
Yellow tells Red to stall
Yellow tells downstream no new data exists
Yellow stalls
Not immediately overwritten
Timing of Clock Waveforms
[Figure: clock waveforms for latches 1 to 4 over time steps 1 to 10]
Should Arrive
Timing violation
Give time to Recover
Prevent Double Sampling inst1
Prevent Losing inst2
Prevent Losing inst3
Timing violation
Stall Neighbors
Stall 3
The Required Circuitry
[Figure: latch stages labeled 2, 3, 1, 2, each with a clock gate (CG), a TD block, and a B block]
Error Detection And OR Circuitry
[Figure: TD error signals feeding the OR circuitry]
Clock Gate Control Logic
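A cluster stalls and sends bubbles to all of its neighbors if both conditions hold:
It was told to stall by a neighboring cluster
It did not stall in the previous cycle
This is equivalent to sending bubbles only to its “other” neighbors, because the neighbor that sent the bubble has just stalled and ignores the one it gets back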
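The stall rule above can be exercised with a short simulation. The sketch below is illustrative only and rests on assumptions that are not in the slides: the function name `propagate_bubbles`, a half-cycle time step, a neighbor graph in which every neighbor is of the opposite phase, and an originating cluster that is simply treated as having been told to stall. Because a cluster only acts on every second half-cycle, "the previous cycle" is modeled as two steps back.

```python
from collections import Counter


def propagate_bubbles(neighbors, origin):
    """Return the set of clusters that stall at each half-cycle step."""
    history = []     # history[k] = clusters stalling at half-cycle k
    told = {origin}  # clusters asked to stall at the current step
    while True:
        step = len(history)
        # A cluster's own previous cycle is two half-cycles back.
        stalled_last_cycle = history[step - 2] if step >= 2 else set()
        stalling = told - stalled_last_cycle
        if not stalling:
            return history
        history.append(stalling)
        # Each stalling cluster sends a bubble to every one of its neighbors.
        told = set().union(*(neighbors[c] for c in stalling))


# Hypothetical five-cluster chain; cluster 3 starts the stall.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
steps = propagate_bubbles(chain, origin=3)
stall_counts = Counter(c for stalling in steps for c in stalling)
print(steps, dict(stall_counts))
```

On this chain the stall spreads outward from cluster 3 to clusters 2 and 4, then to 1 and 5, and every cluster stalls exactly once, consistent with the "every latch stalls exactly once" property stated earlier in the deck.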
Clustering with hMETIS
Widely used hypergraph partitioning program, hMETIS
Clusters must only contain members with the same phase
Create two graphs, and partition independently
Connected in hMETIS graph if transitively connected in circuit
Edge Weight = number of latches that form transitive connection
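One way to read the graph construction above is sketched below. It is an interpretation, not the authors' script: the function name `build_phase_graphs`, the `"pos"`/`"neg"` phase labels, and the input format are hypothetical, and "transitively connected" is taken to mean that two same-phase latches share at least one adjacent latch of the opposite phase, with the edge weight counting those shared latches. The sketch only builds the two weighted graphs; it does not call hMETIS.

```python
from collections import defaultdict
from itertools import combinations


def build_phase_graphs(phase, connections):
    """Build one weighted graph per clock phase.

    phase: dict mapping latch name to "pos" or "neg" (assumed labels).
    connections: iterable of (src, dst) latch pairs joined through logic.
    Returns {"pos": {(a, b): weight}, "neg": {(a, b): weight}} with a < b.
    """
    adjacent = defaultdict(set)
    for src, dst in connections:
        adjacent[src].add(dst)
        adjacent[dst].add(src)

    graphs = {"pos": defaultdict(int), "neg": defaultdict(int)}
    for middle, nbrs in adjacent.items():
        # Neighbors of the opposite phase are linked to each other through `middle`.
        linked = sorted(n for n in nbrs if phase[n] != phase[middle])
        for a, b in combinations(linked, 2):
            graphs[phase[a]][(a, b)] += 1  # one more latch forms this connection
    return {name: dict(edges) for name, edges in graphs.items()}


# Made-up netlist fragment: three positive and two negative latches.
phase = {"p1": "pos", "p2": "pos", "p3": "pos", "n1": "neg", "n2": "neg"}
connections = [("p1", "n1"), ("p2", "n1"), ("p1", "n2"), ("p2", "n2"), ("n2", "p3")]
print(build_phase_graphs(phase, connections))
```

Here `p1` and `p2` are linked through both `n1` and `n2`, so their edge has weight 2, which would encourage a partitioner to keep them in the same cluster.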
Clustering Results
Tradeoff between sizes of OR gates:
Combining errors
Combining bubbles
100 negative clusters
70 positive clusters
Two Port Memory Boundary Approach
Must fit edge-triggered memory into stalling algorithm
“Managing” the Synthesis/APR Tools
Want balanced pipelines, no time borrowing
Model razor latches as flip flops
Dynamic OR always followed by latch
Model dynamic OR as static
Model latch as flip flop (captures when latch closes)
Use regular ICG cells
Can use conventional clock tree synthesis
Final design appears to be relatively “normal”
Flip-flop based design with clock gating
Everything is timing constrained
“Razorization” process is entirely automated
Synthesis and netlist transformation scripts
Retiming And Number of Latches
Retiming can increase the number of latches
Results in area overhead
Area Overhead of Latch Transformation
Speculation Window Size
Full Clock Phase (100%) Minus Delay of Error Propagation Circuits
Maximum allowed by technique
Number / Location of Latches with Error Checking
Maximum slowdown that does not result in unchecked error
Where Error Checking is Needed
If circuit delay suddenly becomes 130% of its nominal value, all timing errors will be detected before the circuit fails
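The selection of latches that need error checking can be sketched as a simple filter. This is a deliberately simplified illustration, not the method used for the test chip: the function name `latches_needing_checks`, the percentage units, and the example delays are assumptions, and the sketch ignores both time borrowed by upstream stages and the delay of the error propagation circuits. A latch is flagged if its slowest incoming path, slowed to 130% of nominal, would take longer than one clock phase.

```python
def latches_needing_checks(nominal_delay_pct, window_pct=30):
    """Return the latches whose data could arrive late under the slowdown.

    nominal_delay_pct maps a latch name to the nominal delay of its slowest
    incoming path, as an integer percentage of one clock phase (assumed units).
    """
    scale = 100 + window_pct  # 130 means delays grow to 130% of nominal
    return sorted(
        latch
        for latch, delay in nominal_delay_pct.items()
        if delay * scale > 100 * 100  # slowed delay exceeds one full phase
    )


paths = {"B": 90, "C": 60, "D": 77}  # made-up example values
print(latches_needing_checks(paths))  # ['B', 'D']
```

Latch `C` has enough slack to absorb a 30% slowdown in this simplified model, so it would not need a detector, which is the kind of saving that keeps the error-checking area overhead below that of instrumenting every latch.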
[Figure: latches A, B, C, D with a 30% speculation window; path delays at the point of first failure (PoFF) and at worst case, with arrival times at each latch checked against the 50% mark]
Path Distribution for Cortex-M3
[Figure: path distribution for positive latches, negative latches, flip-flops, and all latches]
Area Increase from Error Checking
20% Area Overhead
30% Timing Speculation
Implementation on ARM Cortex-M3
Characterizing Throughput / Energy
Operating Point Set for Worst Case Operation
85°C
10% Supply Droop
2σ Process
5% Safety Margin
200 MHz at 1.0 V
Gains from Bubble Razor
Bubble Razor Results
Slow Average Fast
Operating point   Frequency   Throughput
Worst Case        200 MHz     8.5 FFT/ms
First Failure     333 MHz     14.2 FFT/ms
Optimum           425 MHz     17.3 FFT/ms

Operating point   Voltage     Energy
Worst Case        1.0 V       3.08 μJ/FFT
First Failure     0.775 V     1.42 μJ/FFT
Optimum           0.725 V     1.18 μJ/FFT
Conclusion
First Razor style implementation on a complete, commercial processor (ARM Cortex-M3).
Proposed two-phase latch based Razor technique
Novel local replay algorithm
Demonstrated automated nature of technique
Successfully implemented and fabricated in 45nm
60% energy efficiency improvement or 100% throughput increase over worst-case margining
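The headline figures can be checked against the measured operating points reported in the results. The short calculation below is a sanity check, assuming that "energy efficiency" refers to the reduction in energy per FFT and that the throughput increase compares FFT/ms at the optimum point with the worst-case point; the helper name `percent_gain` is hypothetical.

```python
def percent_gain(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Values taken from the results slide.
throughput_fft_per_ms = {"worst": 8.5, "first_failure": 14.2, "optimum": 17.3}
energy_uj_per_fft = {"worst": 3.08, "first_failure": 1.42, "optimum": 1.18}

throughput_gain = percent_gain(
    throughput_fft_per_ms["optimum"], throughput_fft_per_ms["worst"]
)
energy_saving = -percent_gain(
    energy_uj_per_fft["optimum"], energy_uj_per_fft["worst"]
)

print(f"throughput increase: {throughput_gain:.1f}%")  # 103.5%
print(f"energy reduction:    {energy_saving:.1f}%")    # 61.7%
```

Both results line up with the rounded 100% and 60% figures quoted in the conclusion.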