AnalogCircuitWorks
50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL
Bill Ellersick, Analog Circuit Works
Dylan Hand, Reduced Energy Microsystems
Peter A. Beerel, University of Southern California
April 19, 2018
BIG PICTURE
Power efficiency is key to mobile devices and datacenters
Industry is shifting from raw performance to performance/W
REM and ACW are fabless semiconductor companies collaborating on a project using unique technology to build ultra-efficient processors
Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works
POWER EFFICIENT MICROPROCESSOR PROJECT
[Figure: Synchronous µProcessor block diagram with Sync Core, Memory, SERDES, I/O, and Sync. Logic blocks. Legend: Analog, Digital, Interface.]
• Asynchronous logic to reduce power (same functionality from outside)
• Integrated power management with 5 zones to optimize different circuit types
[Figure: Asynchronous µProcessor block diagram with Async Core, Memory, Power, SERDES, I/O, and Async Logic blocks, supplied by Vdd<4:0>. Labels: Sync to Async Scripts; Integrated Power Mgmt; Optimize/Verify vs. Vdd.]
WORST-CASE PERFORMANCE
[Figure: Instruction time for instructions 1, 2, 3 on a traditional synchronous processor. Label: Wasted Time.]
Synchronous designs are synthesized to meet worst-case timing
Area and power overhead required to hit performance target can be substantial
[Figure: Block size, split into Power-Optimized Block Size and Extra Area to Meet Timing.]
OVERVIEW
• Introduction
• Asynchronous bundled data technology
• Supply voltage scaling
• Synchronous to asynchronous flow
• Summary
RESILIENT BUNDLED DATA
Static combinational logic, with error detecting latches to pass data to next block
Delay line covers the combinational logic delay 90% of the time
The other 10% of the time, outputs change after the delay expires: the error-detecting latches flag this and control holds off the next stage
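The 90%/10% split implies a simple estimate of average stage time. The sketch below is illustrative only: the delay-line length, the worst-case delay, and the extra time charged to a late cycle are assumed numbers, not figures from this talk.

```python
def expected_cycle_time(delay, late_fraction, late_penalty):
    """Average stage time when a fraction of cycles overruns the delay line.

    delay         -- nominal delay-line length (assumed units: ns)
    late_fraction -- fraction of cycles whose outputs change after the delay
    late_penalty  -- extra time control holds off the next stage (assumed)
    """
    if not 0.0 <= late_fraction <= 1.0:
        raise ValueError("late_fraction must be between 0 and 1")
    return delay + late_fraction * late_penalty


# Hypothetical numbers: worst-case logic delay 1.0 ns, delay line cut to
# 0.7 ns, and each late cycle costs an extra 0.7 ns of hold-off.
worst_case = 1.0
average = expected_cycle_time(delay=0.7, late_fraction=0.10, late_penalty=0.7)
print(f"average stage time: {average:.2f} ns vs. worst case {worst_case:.2f} ns")
```

Under these assumptions the occasional late cycle costs less than the margin removed from every cycle, which is the trade the slide describes.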
AVERAGE-CASE PERFORMANCE
[Figure: Instruction time for instructions 1, 2, 3 on a traditional synchronous processor (label: Wasted Time) and on the ACW/REM asynchronous processor.]
Relaxing timing constraints allows some instructions to take longer
On average, computational performance does not change
Smaller logic blocks are more power and area efficient
[Figure: Block size. Synchronous: Power-Optimized Block Size plus Extra Area to Meet Timing. Asynchronous: Power-Optimized Block Size plus Async Control Logic.]
DATA DEPENDENCY
[Figure: Instruction time with Resilient BD for operations a + b, c + d, and e + f. Label: Time Saved.]
Resilient BD allows cycle time to change depending on input data
Assumes long data-dependent cycles are rare, so average performance improves
[Figure: Block size, split into Power-Optimized Block Size and Async Control Logic.]
SELECTABLE DELAYS
Identify blocks with variable delay based on operation (manually)
Change delay of stage on cycle-by-cycle basis
Achieve averaging over time
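A minimal sketch of the averaging effect, assuming a single stage whose delay is chosen per operation. The operation names, delays, and instruction trace are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical per-operation delays (ns) for one pipeline stage; not from the talk.
OP_DELAY_NS = {"add": 0.6, "shift": 0.4, "mul": 1.0}


def stage_time(ops, selectable):
    """Total stage time for a sequence of operations.

    With selectable delays, each cycle uses the delay of its own operation;
    otherwise every cycle pays the slowest operation's delay.
    """
    worst = max(OP_DELAY_NS.values())
    return sum(OP_DELAY_NS[op] if selectable else worst for op in ops)


trace = ["add", "add", "shift", "mul", "add", "shift", "add", "add"]
fixed = stage_time(trace, selectable=False)
variable = stage_time(trace, selectable=True)
print(f"fixed delay: {fixed:.1f} ns, selectable delay: {variable:.1f} ns")
```

The gain depends entirely on the instruction mix: a trace dominated by the slowest operation would see little benefit.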
RESILIENT BD ADVANTAGES
Exploits variations in delay regardless of logical function, enabling a more generic approach
Allows designers to cut margins as timing violations will be detected and corrected
Tolerates and averages random delay variations
DELAY DISTRIBUTION
[Figure: Number of operations vs. delay (slower to one side). Annotations: Traditional design runs at “worst-case” performance; REM design runs at “average-case” performance.]
Worst-case operations occur infrequently. Wide variability in delay shows potential to optimize designs for average-case performance, saving time.
Distribution of Operation Delays in Synchronous Processor
VOLTAGE SCALING
Dropping voltage trades time savings for power savings
Matches synchronous performance at a lower voltage
[Figure: Power (H/L) vs. instruction time for instructions 1, 2, 3, drawn twice. Label: Time Saved.]
TIME SAVINGS CAN TRADE FOR EVEN LARGER POWER SAVINGS
• When peak speed not needed, run at lower Vdd
• Power drops faster than speed as Vdd drops => efficiency increases
• Chart shows 28nm normalized logic efficiency (Gops/W) vs. Vdd
• Blue curve includes efficient, integrated buck and charge pump converters
• Async designs use 2-3x less power than sync designs [D. Hand, et al., "Blade -- A Timing Violation Resilient Asynchronous Template," ASYNC 2015]
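Why power falls faster than speed can be seen with a first-order CMOS model. This sketch assumes dynamic power proportional to Vdd² times frequency and an alpha-power-law gate speed. The threshold voltage and exponent are assumed values, leakage and converter losses are ignored, and the result is not the chart from the slide.

```python
def relative_speed(vdd, vth=0.35, alpha=1.3):
    """Alpha-power-law gate speed, up to a constant factor (vth, alpha assumed)."""
    if vdd <= vth:
        raise ValueError("model only valid for vdd > vth")
    return (vdd - vth) ** alpha / vdd


def relative_power(vdd, vth=0.35, alpha=1.3):
    """Dynamic power ~ C * Vdd^2 * f, with the constant C dropped."""
    return vdd ** 2 * relative_speed(vdd, vth, alpha)


nominal = 1.0  # assumed nominal supply in volts
for vdd in (1.0, 0.8, 0.6):
    speed = relative_speed(vdd) / relative_speed(nominal)
    power = relative_power(vdd) / relative_power(nominal)
    print(f"Vdd={vdd:.1f} V  speed={speed:.2f}  power={power:.2f}  "
          f"ops/W={speed / power:.2f}")
```

In this simplified model, operations per watt scale as 1/Vdd², so any time saved at nominal voltage can be spent on a lower supply for a larger efficiency gain.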
EDA FLOW OVERVIEW
Design flow on top of existing synchronous flow
Custom Tcl/Perl scripts glue the tools together and generate constraints
Benefits from advancements in tools & synchronous IP (e.g. RISC-V core)
[Figure: Flow diagram with stages Sync RTL (Verilog); Async Synthesis; Place & Route, Optimization; Timing Analysis; Verification; Analog Verification.]
SYNC TO ASYNC
• Automated conversion of synchronous netlist to asynchronous
• Choose asynchronous logic domains
• Replace flip-flops with latches
• Add custom cells: delay lines, error detecting latches, and control logic
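The conversion steps can be pictured with a toy model. This is a sketch only: the real flow runs Tcl scripts on a synthesized netlist, while the `Cell` type, the cell kind names, and the rule that every flip-flop becomes one error-detecting latch are simplifying assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    kind: str    # e.g. "DFF", "NAND2"; kind names are made up for this sketch
    domain: str  # asynchronous logic domain chosen for this cell


def sync_to_async(cells):
    """Toy version of the listed steps on a flat list of cells."""
    out = []
    for cell in cells:
        if cell.kind == "DFF":
            # Replace each flip-flop with a latch (here: error-detecting).
            out.append(Cell(cell.name, "EDL_LATCH", cell.domain))
        else:
            out.append(cell)
    # Add one delay line and one controller per asynchronous domain.
    for domain in sorted({c.domain for c in cells}):
        out.append(Cell(f"{domain}_delay", "DELAY_LINE", domain))
        out.append(Cell(f"{domain}_ctrl", "ASYNC_CTRL", domain))
    return out


netlist = [
    Cell("r0", "DFF", "fetch"),
    Cell("g0", "NAND2", "fetch"),
    Cell("r1", "DFF", "execute"),
]
for cell in sync_to_async(netlist):
    print(cell.domain, cell.kind, cell.name)
```

Combinational cells pass through unchanged, which mirrors the point that the logic itself stays close to its synchronous form.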
[Figure: Conversion flow. Blocks: Sync RTL, Sync Synthesis, Sync Netlist, Sync Timing (.sdf), Async Tcl Scripts, ReSynthesis, Async Netlist, Custom Cell Design/Verif/Char, Sync Constraints File (Tcl), Async Constraints File (Tcl).]
INTERACTIVE OPTIMIZATION, TIMING CLOSURE
Define virtual clock for each async controller to establish constraints
Clock periods and delays are based on analysis of the asynchronous circuit
Async scripts analyze average-case performance and generate constraints
Iterate to obtain desired performance post-synthesis and post-place/route
Delay lines are synthesized, then modified during P&R
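As a rough illustration of the constraint generation step, the sketch below emits one SDC-style virtual clock per asynchronous controller. The controller names and periods are hypothetical, and the talk does not show the actual script output. The only tool syntax used is the standard `create_clock -name ... -period ...` form, which defines a virtual clock when no source object is given.

```python
def virtual_clock_constraints(controllers):
    """Return SDC-style Tcl lines defining one virtual clock per controller.

    controllers -- mapping of controller name to target period in ns
                   (names and periods used below are hypothetical)
    """
    lines = []
    for name, period_ns in sorted(controllers.items()):
        if period_ns <= 0:
            raise ValueError(f"period for {name} must be positive")
        # create_clock with -name and no source object defines a virtual clock.
        lines.append(f"create_clock -name vclk_{name} -period {period_ns:.3f}")
    return lines


targets = {"fetch_ctrl": 0.80, "exec_ctrl": 1.25}
print("\n".join(virtual_clock_constraints(targets)))
```

In an iterative flow, a script along these lines would be rerun with updated periods after each timing report until the performance target is met.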
[Figure: Iterative flow. Blocks: Async Netlist, Power Intent (UPF), Performance Target (Tcl), Async Constraints File (Tcl), Async Scripts, Resynthesis, Modified Netlist, Place/Route Optimization, Intermediate Layout, Timing Report, Final Layout. Branches: Target Hit, Target Missed.]
TECHNOLOGY SUMMARY
[Figure: Power Savings from Smaller Logic, Less Wasted Time, Lower Voltage.]
RECENT RESULTS
• 130nm computational logic ASIC
• Flawless operation from 0.65 to 1.3V, silicon-verified
• 3.39M transistors in 10.35 mm² with 6 power domains
• Excellent power efficiency, matching simulation
• 28nm 3-stage OpenCore MIPS CPU
• 14k gates
• 15,226 µm²: 9% area increase vs. 14,000 µm² synchronous design
• 35% speedup vs. synchronous design
UPCOMING SILICON
• REM’s upcoming 22nm Machine Vision processor
• Custom architecture tailored for edge computing
• Expected order-of-magnitude smaller power envelope than existing solutions
• Heterogeneous processors for maximum versatility
• Dual RISC-V processors
• Powerful Tensilica Vision DSP
• Proprietary Asynchronous Neural Network Accelerator designed in-house
[Figure: Chip block diagram. Blocks: Vision DSP, AI Core, R1 Memory System, two RISC-V CPUs, ISP. Interfaces: DDR, USB, GPIO, I2C, I2S, UART, SPI, CSI.]
22FDX PROCESS
• 22nm Fully-Depleted Silicon-on-Insulator
• FinFET performance at 28nm planar cost
• Energy efficient for logic, analog, RF
• Roadmap to 12nm FD-SOI
• Ultra-low power dissipation with supply voltage as low as 0.4V
• Body biasing can super-charge or throttle with no layout changes
Reference: https://www.globalfoundries.com/technology-solutions/cmos/fdx/22fdx
WHY ASYNCHRONOUS LOGIC
• Eliminates inefficiencies in synchronous RTL methodology
• Synchronous logic waits for worst-case transistor, process, voltage, temperature, noise
• It may finally be time
• ILLIAC II was an early asynchronous processor, in 1962
• In ILLIAC’s first 3 weeks of operation, it discovered 3 new Mersenne primes
• (In one of my early school projects, I built the WILLIAC, powered by paper tape)
• Asynchronous arithmetic units are widely used
• Intel recently announced the Loihi neural asynchronous processor
WHY RESILIENT BUNDLED DATA
• Automatic conversion from sync RTL: no costly manual design
• Tolerates variations and recovers in real time
• Performance improves as supply drops (variations increase)
• Handles rapid changes in supply voltage gracefully
• 100s of timing constraints vs. 100K in previous async flows
• Combinational blocks are similar to synchronous combinational blocks
• Meets urgent need for improved power efficiency
SUMMARY
• REM and ACW are building highly power-efficient processors
• REM’s asynchronous technology enables average-case design
• Dramatic power-efficiency gains
• Can optimize asynchronous flow with latest IP advances
• Analog Circuit Works enhances power efficiency, enables SoCs
• DDR Memory I/F, CSI Camera I/F, USB
• Integrated power management optimizes each circuit
• We are open to partnerships on products
ANALOG CIRCUIT WORKS