STATISTICAL COMPUTING THE ALTERNATIVE ROAD TO LOW ENERGY
Jan M. Rabaey Donald O. Pederson Distinguished Prof.
University of California at Berkeley
DAC 2009
With major contributions from Doug Jones, Subhasish Mitra, and Naresh Shanbhag. All research sponsored by the Gigascale Systems Research Center (GSRC)
It Is All About Energy …
Further progress in all aspects of the future information-technology platform requires a continuing increase in energy efficiency
[Figure: the compute cloud and mobile devices]
But … We are running out of options
Waste has been largely eliminated (…)
[Figure: normalized energy per operation vs. VDD (log scale, 0 to 1.2 V): roughly a 12x reduction down to the minimum energy point near 0.3 V, which is set by leakage]
Technology scaling may not help much anymore
[Figure: energy per operation (EOP, fJ) vs. technology node, 90 nm down to 20 nm]
Process variations and random upsets dictate noise and timing margins
The ways out … New devices that lower the minimum energy point
Example: NEMS Relay Logic (King, Alon)
Others: TFETs, IGFETs
Probably more than a decade out
Cut the margins in a major way and absorb the consequences
[Figure: robustness vs. efficiency trade-off: statistical computing gives up some of the robustness of conventional design in exchange for efficiency]
Possible Today!
First Step: Better-than-Worst-Case Computing
Example: RAZOR (T. Austin et al., Michigan)
Scale voltage further than worst-case timing allows and deal with the circumstances (through error-trapping and correction)
[Figure: a "razorized" pipeline stage: the main flip-flop samples D on clk, a shadow latch samples on the delayed clock clk_del, and an error comparator raises Error_L on a mismatch]
Reduced margins bring throughput uncertainty, but the computation remains functionally deterministic
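The better-than-worst-case trade-off can be sketched numerically. The model below is purely illustrative (the V² energy scaling is standard, but the error-rate curve, replay cost, and voltage range are invented): it finds the supply voltage where the energy saved by over-scaling outweighs the cost of Razor-style replays.

```python
import math

def effective_energy_per_op(vdd, v_nominal=1.0, replay_cost=10.0):
    """Energy per *useful* operation under voltage over-scaling.

    Dynamic energy scales roughly as VDD^2; the timing-error rate is
    modeled as an exponential that explodes as VDD drops (a toy model,
    not measured data). Each error triggers a replay that wastes
    `replay_cost` operations' worth of energy.
    """
    energy = (vdd / v_nominal) ** 2                 # normalized E ~ V^2
    p_err = min(1.0, math.exp(-30 * (vdd - 0.6)))   # illustrative curve
    return energy * (1.0 + p_err * replay_cost)

# Sweep VDD: the optimum sits well below nominal, but above the point
# where replays start to dominate.
best_e, best_v = min(
    (effective_energy_per_op(v / 100), v / 100) for v in range(60, 101)
)
```

With these made-up constants the sweep lands around 0.75 V, at roughly 60% of the nominal energy per useful operation, which is the qualitative shape of the Razor argument.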
The Opportunity: Functional Non-Determinism
[Figure: efficiency vs. required accuracy: redundancy/overdesign where required accuracy is high, application-domain solutions in between, and statistical computing where accuracy requirements can be relaxed]
Statistical Computing
• Statistical performance metrics
• Statistical model of the implementation platform
Statistical Computation
Inputs: deterministic or stochastic variables. Outputs: stochastic variables with guaranteed properties (mean, distribution, bounds)
The implementation adds randomness (errors); the system is designed such that the output metrics are accomplished in spite of that randomness
Requires error models to help design the compensation techniques
Examples: synthesis, classification, modeling, search, recognition
Not to be confused with …
Probabilistic algorithms: Algorithms that have element of randomness
Given deterministic inputs and implementation, outputs are random variables
May lead to better performance (search, optimization, polynomial factoring). Examples: simulated annealing, genetic algorithms
No specific benefits related to nanoscale computing
Not to be Confused With … Probabilistic Boolean Networks
All signals in logic network considered as stochastic variables.
Noise is added into the process; each logic network is essentially a stochastic process, producing stochastic variables at the output
Soft data is turned into Boolean variables with an error probability at decision points (e.g., a latch with sharp timing edges)
Equivalent to the discrete communication channel known as the Binary Symmetric Channel (BSC), studied extensively in information theory (von Neumann, Winograd, Hajek). Coding can be applied, but at large overhead and latency
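The BSC abstraction, and the overhead of coding over it, fits in a few lines. The repetition code below is the simplest example of the large-overhead coding referred to above; the channel flip probability and bit counts are arbitrary.

```python
import random

def bsc(bits, p, rng):
    """Binary Symmetric Channel: each bit is flipped with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def repeat3(bits, p, rng):
    """3x repetition code over the BSC with majority-vote decoding.

    Drives the bit-error probability from p down to about 3*p**2, at
    the cost of tripling the traffic: exactly the kind of overhead and
    latency penalty noted above.
    """
    out = []
    for b in bits:
        r = bsc([b, b, b], p, rng)
        out.append(int(sum(r) >= 2))
    return out

rng = random.Random(7)
bits = [rng.randint(0, 1) for _ in range(20000)]
raw_err = sum(a != b for a, b in zip(bits, bsc(bits, 0.1, rng))) / len(bits)
cod_err = sum(a != b for a, b in zip(bits, repeat3(bits, 0.1, rng))) / len(bits)
```

At p = 0.1 the raw error rate stays near 10% while the coded rate falls to roughly 3%, but only by spending 3x the bandwidth and energy per delivered bit.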
Statistical Computing … What it is! Computational engines that, given the properties and statistics of the input signals and the physical implementation, ensure that the outputs fall within the desired specifications
Basic Tools: • Algorithmic resilience • Estimation • Detection
Example: Error-Resilient System Architecture (ERSA)
RMS: Recognition, Mining, Synthesis
Emerging killer applications: cognition, vision, genomics
Large data sets, highly parallel
Core algorithms
Probabilistic belief propagation, K-means clustering, Bayesian networks
[S. Mitra et al., Stanford University]
Cognitive resilience: "acceptable" results are OK
Algorithmic resilience: low-order bit errors have minimal effect
But intolerant to control errors and higher-order bit errors
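The low-order/high-order asymmetry is easy to demonstrate on IEEE-754 data. The snippet below is illustrative, not from the talk: it flips single bits of a double and compares the damage. A low mantissa bit perturbs a value negligibly, while an exponent bit, like a control error, wrecks it.

```python
import struct

def flip_bit(x, bit):
    """Flip one bit (0 = LSB) of a float64's IEEE-754 representation."""
    (i,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", i ^ (1 << bit)))
    return y

x = 3.141592653589793
low_err = abs(flip_bit(x, 2) - x)    # low-order mantissa bit: ~1e-15
high_err = abs(flip_bit(x, 58) - x)  # exponent bit: scales x by 2**64
```

This is why probabilistic workloads like K-means or belief propagation tolerate data-path noise in the low bits yet fail outright when errors land in exponents, pointers, or control state.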
RMS Workload Model
[Figure: a main thread performs setup, work assignment, a barrier, data reduction, and a convergence test, iterating as needed; worker threads calculate, pulling from a work queue]
ERSA Vision: Asymmetric Reliability
[Figure: one Super Reliable Core (SRC) with its L1 cache and a reliable L2 bank, plus many Relaxed Reliability Cores (RRCs), each with an unreliable L1 cache, sharing unreliable L2 banks over an interconnect. The supervisor SRC is OS-visible; the RRCs are sequestered from the OS]
Relaxed Reliability Cores: Specification
• Inexpensive and unreliable, without expensive error detection
• Run the worker threads, which constitute most of the workload
• Reliable parts: memory bound check, restart
Super Reliable Core: Specification
• Highly reliable (expensive), with proper error protection
• Executes the main thread: assigns worker threads, performs reduction
• Supervises the RRCs: timeout check
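In software, the SRC-side supervision can be sketched as a validate-and-restart wrapper around work dispatched to an RRC. Everything here (names, retry limit, the shape of the check) is an illustrative stand-in for the memory-bound-check, timeout, and restart mechanisms listed above.

```python
def run_on_rrc(task, args, sanity_check, retries=3):
    """Run `task` on an unreliable core; validate and restart on failure.

    The SRC trusts nothing an RRC returns: a crash or a failed bounds/
    sanity check simply causes the work unit to be reissued. After
    `retries` failures the SRC escalates (e.g., runs the task itself).
    """
    for _ in range(retries):
        try:
            result = task(*args)
        except Exception:          # RRC "crashed": restart the thread
            continue
        if sanity_check(result):   # cheap bounds check on the output
            return result
    raise RuntimeError("RRC failed repeatedly; escalate to the SRC")

# Example: a worker that suffers one transient error before succeeding.
attempts = []
def flaky_worker():
    attempts.append(1)
    if len(attempts) < 2:
        raise ValueError("corrupted state")
    return 42

answer = run_on_rrc(flaky_worker, (), lambda r: 0 <= r < 1000)
```

The key design point matches the slide: error *detection* on the RRC side is replaced by cheap, coarse validation on the SRC side, so the unreliable cores stay inexpensive.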
RMS on ERSA
[Figure: the RMS workload mapped onto ERSA. The main thread on the SRC performs setup, work assignment, a barrier with "basic" checks (work, memory bounds, timeout), data reduction, and the convergence test across iterations; worker threads with bounds checks calculate on the RRCs]
Simplistic ERSA is inadequate; it needs convergence filtering heuristics and convergence damping, which amount to estimation
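One concrete reading of "convergence filtering + damping" (the thresholds and update rule below are illustrative choices, not the Stanford implementation): reject worker results implausibly far from the running estimate, then move only part of the way toward the survivors' mean, so a corrupted value that slips through cannot derail the iteration.

```python
def robust_reduce(current, worker_results, max_jump=10.0, damping=0.5):
    """One data-reduction step hardened against RRC errors.

    Filtering: drop results further than `max_jump` from the current
    estimate (likely corrupted). Damping: blend only `damping` of the
    way toward the filtered mean.
    """
    kept = [r for r in worker_results if abs(r - current) <= max_jump]
    if not kept:                   # everything rejected: hold position
        return current
    target = sum(kept) / len(kept)
    return current + damping * (target - current)

# A huge outlier from a faulty core barely perturbs the estimate.
step = robust_reduce(5.0, [5.2, 4.9, 1e6])
```

This is estimation in the literal sense: the reduction step treats worker outputs as noisy observations of the true update rather than as ground truth.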
ERSA Prototyping
[Figure: prototyping stack: the application program runs on an OS + many-core runtime over MISP [Hankins, ISCA 06] emulation firmware with hardware error injection (virtualization), emulating one SRC and seven RRCs]
Error model: Random register bits flipped per RRC at random intervals (consistent with mean error rate)
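That error model can be sketched as a small injector; the register-file size, register width, and flip count below are arbitrary, and the caller would draw the flip count per interval to match the target mean error rate.

```python
import random

def inject_bit_flips(registers, n_flips, rng, width=32):
    """Flip `n_flips` randomly chosen bits across a register file.

    Mirrors the experiment's error model: random register bits flipped
    at random intervals, consistent with a configured mean error rate.
    """
    regs = list(registers)
    for _ in range(n_flips):
        i = rng.randrange(len(regs))          # pick a victim register
        regs[i] ^= 1 << rng.randrange(width)  # flip one random bit
    return regs

corrupted = inject_bit_flips([0, 0, 0, 0], 3, random.Random(0))
```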
ERSA Results
[Figure: three plots vs. error rate (errors/RRC/sec), each comparing No ERSA, naïve ERSA, and optimized ERSA. Output quality: error % (probability distance) for Bayesian network inference, and successful decoding % for LDPC decoding; execution time: normalized execution time]
Algorithmic Noise-Tolerance (ANT) Combining estimation and detection
• Main Block designed for average case – Makes intermittent errors (reduced margins)
• Estimator approximates Main Block output
• Detector compares and replaces
• Assumes algorithmic knowledge for designing efficient estimators [Courtesy: Shanbhag et al, UIUC]
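The detect-and-replace core of ANT fits in a few lines. The functions and threshold below are illustrative stand-ins: in the real scheme the main block is a reduced-margin circuit and the estimator a low-cost approximation derived from algorithmic knowledge.

```python
def ant_output(x, main_block, estimator, threshold):
    """Algorithmic Noise-Tolerance: estimate, detect, replace.

    The main block is efficient but may err intermittently; the
    estimator is cheap but always roughly right. When the two disagree
    by more than `threshold`, the main output is declared erroneous
    and the estimate is used instead.
    """
    y = main_block(x)
    y_est = estimator(x)
    return y if abs(y - y_est) <= threshold else y_est

estimator = lambda x: 0.98 * x * x        # coarse approximation of x^2
good = ant_output(10.0, lambda x: x * x, estimator, threshold=10.0)
bad = ant_output(10.0, lambda x: -777.0, estimator, threshold=10.0)
```

The threshold sets the quality/coverage trade-off: too tight and the estimator overrides correct outputs, too loose and small errors pass through, which is why designing efficient estimators needs algorithmic knowledge.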
Results: ANT Motion Estimation
2.5X energy savings; 7X reduction in PSNR variance
[Figure: peak-SNR distributions for the ideal, conventional, and ANT implementations; ANT shows a PSNR increase over the conventional reduced-margin design]
Using Estimation Only “Sensor Networks on a Chip (SNOC)”
Stochastic model: Y_i = θ + N_i, where the Y_i are the observations, θ is the estimate, and N_i is noise
Estimation theory applied to the computational cores. Requires:
• Efficient and robust estimators
• Favorable error statistics, e.g., independent, identically distributed errors
[Shanbhag, Jones et al, UIUC]
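In its simplest form the estimation idea reduces to fusing noisy core outputs under the Y_i = θ + N_i model: with zero-mean i.i.d. noise the sample mean is unbiased and its variance falls as 1/M. A minimal sketch, with the noise level and core count invented:

```python
import random

def snoc_estimate(core_outputs):
    """Fuse unreliable core outputs Y_i = theta + N_i by averaging.

    With independent, identically distributed, zero-mean noise, the
    sample mean estimates theta with variance sigma^2 / M, so adding
    cheap noisy cores buys accuracy.
    """
    return sum(core_outputs) / len(core_outputs)

rng = random.Random(42)
theta = 2.0
observations = [theta + rng.gauss(0.0, 0.5) for _ in range(1000)]
estimate = snoc_estimate(observations)
```

Real SNOC estimators are more sophisticated than a plain mean, but the 1/M variance reduction is the property that makes favorable (i.i.d.) error statistics so valuable.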
SNOC-based PN-Code Acquisition
A common wireless CDMA receiver kernel, using polyphase decomposition
[Figure: probability of detection: 800X better performance, 300X reduced performance variation, 40% energy savings]
Statistical Computing: Quo Vadis?
So far: pretty much ad-hoc
The quest for a generalized strategy
Input descriptions that capture the intended statistical behavior
(General-purpose) statistical processors with known error models
Algorithm optimization and software generation (a.k.a. compilers) so that the intended behavior is obtained
Statistical Processors – Thinking aloud
Reliable simple CPU:
• Calibration: collect (Vdd, f) statistics for cores and interconnect
• Statistics selection: application-dependent (QoS); choose (Vdd, f) settings with 'good' statistics
• Application-dependent reconfiguration and adaptation
Energy-efficient unreliable IP cores:
• Perform the majority of the computation
• Make intermittent errors
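Statistics selection could amount to a lookup over calibration data; the table format and values below are invented for illustration.

```python
def select_operating_point(calibration, max_error_rate):
    """Pick the lowest-energy (Vdd, f) point meeting the QoS bound.

    `calibration` lists candidate operating points with the error rate
    and energy measured during the calibration phase.
    """
    feasible = [p for p in calibration if p["err"] <= max_error_rate]
    if not feasible:
        raise ValueError("no (Vdd, f) point satisfies the QoS bound")
    return min(feasible, key=lambda p: p["energy"])

calibration = [  # hypothetical calibration table (normalized energy)
    {"vdd": 1.0, "f": 400e6, "err": 1e-9, "energy": 1.00},
    {"vdd": 0.8, "f": 400e6, "err": 1e-4, "energy": 0.64},
    {"vdd": 0.7, "f": 300e6, "err": 1e-2, "energy": 0.49},
]
point = select_operating_point(calibration, max_error_rate=1e-3)
```

A tolerant application relaxes `max_error_rate` and slides down the table to a cheaper operating point; reconfiguration is just re-running the selection with a new QoS bound.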
General-Purpose Statistical Computing Soft NMR (N-way Modular Redundancy)
Soft voter: combines multiple observations with observed error profiles and multiple hypotheses to provide the output that minimizes error; it does not need algorithmic information
Challenges: generation of error profiles; hypothesis synthesis
[Courtesy: Shanbhag, Kim, UIUC]
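A minimal soft voter in log-likelihood form (the squared-distance error model below is a placeholder for a measured error profile): score each candidate output by how well it explains all N module outputs, and return the best. Unlike a hard majority voter, it can pick an answer that no clean majority supports when the error profile says so.

```python
def soft_vote(observations, hypotheses, log_likelihood):
    """Soft NMR voter: maximum-likelihood choice among hypotheses.

    `log_likelihood(y, h)` is log p(module output y | true output h),
    derived from the modules' observed error profiles. Needs no
    algorithmic knowledge of the computation itself.
    """
    def score(h):
        return sum(log_likelihood(y, h) for y in observations)
    return max(hypotheses, key=score)

# Placeholder error model: likelihood falls with squared distance.
log_lik = lambda y, h: -((y - h) ** 2)

# Two of three modules cluster near 30; the voter agrees.
winner = soft_vote([9.0, 31.0, 29.0], [10.0, 30.0], log_lik)
```

The two challenges named above map directly onto the arguments: `log_likelihood` must come from error-profile generation, and `hypotheses` from hypothesis synthesis.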
Error Profiles (pdf)
[Figure: error rate (0.01% to 100%, log scale) vs. supply voltage for random input patterns and for the bzip and ammp benchmarks]
Example: Errors resulting from Voltage Over-Scaling (VOS)
Can be obtained from simulation or on-chip test
Kogge-Stone Adder with realistic patterns
[Courtesy: T. Austin, Umich]
Example: Soft Multiplier • Error Statistics
– VOS: 16 bit RCA with 66% of Vdd-crit
• N=3: 10X in Psys, 3X in Pe
• N=7: 800X in Psys at Pe = 0.2
Major Take-Away’s
Energy rules
Reductions in energy/op are not quickly forthcoming
Statistical computing allows for major reduction in margins and eliminates over-design
Initial prototypes show very promising potential
The Million $ Proposal: "General-purpose statistical computing" (or is this an oxymoron?)