Mohsen Imani · moimani.weebly.com/uploads/2/3/8/6/23860882/iot_seminar_talk.pdf · System Energy...
System Energy Efficiency Lab · seelab.ucsd.edu
Mohsen Imani
University of California San Diego
Winter 2016
Technology Trend for IoT
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140807_304C_Hill.pdf
Motivation
IoT significantly increases the amount of
computation and data generation
The amount of generated information has surpassed 1.8
zettabytes and is expected to increase by 50% by 2020!
The rate of data generation is beyond the capability
of current computing systems.
Energy efficient data storage and computing!
Motivation
How to improve IoT storage and computation?
Efficient large storage to store big data
Non-volatile memory
Energy efficient computing
Approximate computing
Near data computing (in-memory processing)
Neuromorphic computing (parallel processing)
Efficient computing with NVMs!
NVM Requirements in IoT Context
Non-volatile memory requirements for IoT devices:
Minimize cost and area
Field programmability
Minimize start-up time
Low voltage, low power
Provide secure data storage
NVM Requirements in IoT Context
Minimize Cost and Area
Because many IoT devices will have to be very inexpensive and
small
Minimize any additional wafer processing cost due to extra
masks or processing
Field Programmability
To set user preferences or update keys, the NVM needs to be
programmable during chip manufacturing, during test, and when
installed in end-user equipment
Minimize start-up time
NVM should be fast enough to allow executing code directly
Avoiding the need to copy code to RAM for execution and
reducing boot-up time
NVM Requirements in IoT Context
Low Voltage, Low Power
IoT ecosystem will run on small batteries
Battery replacement may be difficult or even impossible
Devices convert motion, light, heat or an electromagnetic field
into the electrical energy needed to power the sensor
Embedded memory with low standby and operating power
dissipation
Provide Secure Data Storage
Many applications involve the exchange of sensitive data, such
as financial transactions
Memory must have a high level of physical security and be
extremely difficult to reverse engineer
Traditional Memory Hierarchies
Why SRAM as cache? Speed
Why DRAM as main memory? Density
Why Flash as SSD? Non-volatility + price
Why HDD as secondary storage? Price
NVRAM Comparison
http://web.engr.oregonstate.edu/~sllu/xie.pdf
High density, low leakage, non-volatile
STT-RAM: Spin-Transfer Torque RAM
Uses the spin torque of electrons to flip a bit in a magnetic tunneling
junction (MTJ)
(a) The Structure of MTJ
(b) Parallel: bit 0 (low Resistance)
(c) Anti-Parallel: bit 1 (high Resistance)
Advantage:
High read performance
High endurance!
Disadvantage:
Write energy: high amount of current
needed to reorient the magnetization for
most commercial applications
Write latency: low ON/OFF resistance ratio
(~2)
Asymmetric write: writing a “1” needs much
more time and energy than writing a “0”
Domain Wall Memory (DWM)
Similar to STT-RAM structure
Advantage:
Needs only one tunneling barrier and fixed layer → area saving
Disadvantage:
Complexity in design
Write delay
[Figure: DWM structure, labeled with ferromagnetic tape, free layer, domain wall, domain, fixed layer, extra domains, and MTJ]
Shift-based DWM
Write by shifting data of one of the two fixed layers with the
desirable direction comp
Advantage: faster write operation than DWM
Disadvantage: design complexity
(a) 1-bit DWM: fast
(b) Multi-bit DWM: area efficient, but needs extra latency for shifting
Polarized direction
PCM: Phase Change Memory
Flips a bit by changing the state of the material
Crystalline (SET) and amorphous (RESET) phases
PCM Cell Phases
PCM Operations
Advantage:
Better scalability than other
emerging technologies.
Very high density!
Disadvantage:
Slow in write (asymmetric write
operation)
Low endurance (10^7)
Candidate for DRAM replacement
ReRAM: Resistive RAM
Types: Access-based and crossbar ReRAM
Access-based ReRAM (1T-1R)
A dielectric, which is normally insulating, can be made to conduct
after application of a sufficiently high voltage
Advantage:
Very fast in both read and write (~20 ns)
Very high density
Disadvantage:
Limited endurance (10^5)
Working mechanism of ReRAM
ReRAM: Resistive RAM
Crossbar ReRAM
Replace cache? DRAM? Flash? Hard disk?
Crossbar ReRAM (1T-nR)
Advantage:
Highly scalable
Can be implemented on top of the chip in a 3D architecture
Very low energy consumption
Low cost
Disadvantage:
Much slower than 1T-1R (~µs)
Crossbar RRAM in IoT
https://www.crossbar-inc.com/assets/resource/presentation/FMS2014-Slides-RRAM-in-IoT.pdf
Crossbar 1T-nR 1T-1R
NVMs Comparison
STT-RAM: SRAM cache replacement
PCRAM: DRAM main memory and storage
ReRAM: NAND Flash, embedded NOR
Approximate Computing
Why do today’s systems waste time, energy, and
complexity to provide uniformly exact operation for
applications that do not require it?
The idea: we are hindering computer systems’
efficiency by demanding too much accuracy from
them.
IoT applications such as machine learning, speech
recognition, search, graphics, and physical simulation
are fundamentally approximate
Approximation
Where can approximation apply?
CPU
GPU
Accelerators
Storage
At which level?
Circuit
Architecture
Software
Circuit Level Approximation
Applying voltage overscaling on circuits
New nano-scaled technologies and variability issues!
Probability of having multiple errors in different
process corners
Designing approximate building blocks with much
lower energy consumption
E.g. 4-bit XOR gates that accept a wrong answer for some
set of inputs
Mostly focused on adders and multipliers, which are
building blocks of DSPs, ALUs, etc.
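One way to picture an approximate adder is the sketch below. It is not taken from the talk: it follows the general shape of a lower-part OR adder, where the low bits are combined with OR gates (no carry chain) and only the upper bits are added exactly. The name `loa_add` and the parameters `width` and `approx_bits` are assumptions chosen for illustration.

```python
def loa_add(a: int, b: int, width: int = 8, approx_bits: int = 3) -> int:
    """Approximate unsigned add: OR the low bits, add the high bits exactly."""
    if not 0 <= approx_bits <= width:
        raise ValueError("approx_bits must be between 0 and width")
    full_mask = (1 << width) - 1
    a &= full_mask
    b &= full_mask
    if approx_bits == 0:
        return a + b  # nothing is approximated
    low_mask = (1 << approx_bits) - 1
    # Approximate part: bitwise OR stands in for addition, so no carries ripple.
    low = (a | b) & low_mask
    # Carry into the exact part: AND of the top bits of the approximate part.
    carry_in = ((a & b) >> (approx_bits - 1)) & 1
    high = (a >> approx_bits) + (b >> approx_bits) + carry_in
    return (high << approx_bits) | low


print(loa_add(20, 3))  # low bits do not overlap, so the result is exact
print(loa_add(7, 7))   # exact sum is 14; ORing the low bits introduces an error
```

With `approx_bits=0` the function reduces to an exact add; larger values shorten the exact carry chain at the cost of accuracy in the low bits.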
GPU-Acceleration
Several streaming IoT applications need to be
accelerated using parallel processors such as GPUs
Requires energy efficient computing
Lookup table (associative memory):
Promising memory to reduce the energy consumption
of parallel processing
Pre-stores frequent patterns and their corresponding
output
Retrieves them at runtime when a pattern repeats
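The lookup-table idea above can be sketched in software as follows. This is only an illustration under assumptions: a dictionary stands in for the associative memory, a profiling trace (`profile`) decides which patterns are frequent, and the names `build_table` and `lookup_or_compute` are hypothetical.

```python
from collections import Counter


def build_table(observed_inputs, fn, size):
    """Pre-store the `size` most frequent input patterns with their outputs."""
    frequent = Counter(observed_inputs).most_common(size)
    return {pattern: fn(*pattern) for pattern, _count in frequent}


def lookup_or_compute(pattern, fn, table, stats):
    """Return the stored output on a hit; fall back to computing on a miss."""
    if pattern in table:
        stats["hits"] += 1
        return table[pattern]
    stats["misses"] += 1
    return fn(*pattern)


def multiply(x, y):
    return x * y


# Hypothetical profiling trace of operand pairs seen by a multiplier.
profile = [(2, 3), (2, 3), (1, 1), (4, 5), (2, 3), (1, 1)]
table = build_table(profile, multiply, size=2)
stats = {"hits": 0, "misses": 0}
results = [lookup_or_compute(p, multiply, table, stats)
           for p in [(2, 3), (4, 5), (1, 1)]]
```

Every hit skips the computation entirely, which is where the energy saving would come from in hardware.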
Associative Memory Integration
Searches associative memory
(TCAM) in parallel with FPU
processing in a single cycle
Hit in TCAM stops FPU
computation using clock gating
This hit activates the
corresponding row of ReRAM
memory to read the result of
computation
Is there any approximation?
Limitation of Associative Memory
TCAM consumes high energy for each
search, with high switching activity
It needs to search the entire table in a single
cycle!
How to reduce its energy? Use NVM-based
TCAM
Zero leakage power for keeping the data
Very high density
The energy is still high because of high match-line
activity
Approximate Computing
Applying voltage overscaling on TCAM
Accept data that matches within a 1-2 bit Hamming
distance
E.g. 11000000 input matches with 11000100
Pros:
Very low TCAM search energy under voltage
overscaling
Increased TCAM hit rate → higher average time that the
FPU is clock gated! High energy saving!!
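A rough software model of this approximate match is shown below. It is a sketch, not the hardware: a real TCAM compares all rows in parallel, whereas the loop here visits them one by one. The names `hamming_distance`, `approximate_search`, and `max_distance` are assumptions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def approximate_search(rows, key, max_distance=1):
    """Index of the closest stored row within max_distance bits, else None."""
    best_index, best_distance = None, max_distance + 1
    for index, stored in enumerate(rows):
        distance = hamming_distance(stored, key)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


rows = [0b11000100, 0b00110011]
# The slide's example: input 11000000 against stored 11000100 (1 bit apart).
hit = approximate_search(rows, 0b11000000, max_distance=1)
exact_only = approximate_search(rows, 0b11000000, max_distance=0)
```

Raising `max_distance` turns more searches into hits, which mirrors the higher hit rate described above.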
Approximate Computing
Approximate matches degrade computation
accuracy, because we consider 2+2.01=4!
For multimedia applications, PSNR > 30 dB guarantees
acceptable accuracy
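For reference, PSNR can be computed as 10·log10(peak² / MSE). The sketch below assumes 8-bit samples (peak value 255) and flat lists of pixel values. The function name `psnr` and the sample data are illustrative and not from the talk.

```python
import math


def psnr(reference, approximate, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if not reference or len(reference) != len(approximate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


exact_pixels = [100, 150, 200, 250]
approx_pixels = [101, 149, 202, 250]  # hypothetical approximate output
quality_ok = psnr(exact_pixels, approx_pixels) > 30  # the 30 dB threshold from the slide
```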
Approximate Computing
How to reduce accuracy degradation?
Relaxing the computation on least significant bits
Accepting NO mismatch on MSBs
E.g. 11000000 ~ 11000001
but 11000000 is NOT 01000000
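The rule above (exact on the MSBs, relaxed on the LSBs) behaves like a masked comparison. A minimal sketch, assuming 8-bit patterns and treating the lowest `relaxed_lsbs` bits as don't-care; the function and parameter names are hypothetical.

```python
def msb_exact_match(stored: int, key: int, width: int = 8, relaxed_lsbs: int = 1) -> bool:
    """True if stored and key agree on every bit above the relaxed LSBs."""
    if not 0 <= relaxed_lsbs <= width:
        raise ValueError("relaxed_lsbs must be between 0 and width")
    # Bits set in care_mask must match exactly; the low bits are ignored.
    care_mask = ((1 << width) - 1) & ~((1 << relaxed_lsbs) - 1)
    return (stored & care_mask) == (key & care_mask)


lsb_differs = msb_exact_match(0b11000000, 0b11000001)  # differs only in the LSB
msb_differs = msb_exact_match(0b11000000, 0b01000000)  # differs in the MSB
```

Because only the lowest bits may differ, the mismatch between a matched operand and the stored one is bounded by 2^relaxed_lsbs - 1 for unsigned values.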
[Figure: TCAM array built from TCAM cells, with buffer, match lines (MLs), EnL, and sense amplifiers]
Applications have different accuracy
requirements
Tunable approximation
Framework to support new applications
Voltage Relaxation
Approximate Computing
Bitline-configurable achieves 43.6% energy
savings
Row-configurable achieves 44.5% energy savings
Acceptable quality loss of 10%
Other State of The Art Techniques
How can we speed up computation with less impact
on accuracy?
Approximation reduces computation energy
consumption. What about data movement energy?
Neuromorphic Computing
Near Data Computing
Neuromorphic Computing
Computation modeled on human neurons,
also called brain-inspired computing
All bits have the same impact on computation
Fast and parallel computing
High potential to reduce the
computation error on approximation
or process variation
Can be implemented on crossbar
memristive devices
Requires building blocks for basic
computations such as dot product,
XOR, etc.
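The dot-product building block maps naturally onto a crossbar. If each cell stores a conductance and the inputs are applied as row voltages, each column current is the sum of conductance × voltage down that column. The sketch below is an idealized model under that assumption: it ignores wire resistance, sneak paths, and device variation. The name `crossbar_dot` and the example values are illustrative.

```python
def crossbar_dot(conductances, voltages):
    """Column currents of an ideal crossbar.

    conductances[i][j] is the cell conductance (siemens) at row i, column j;
    voltages[i] is the voltage applied to row i. Each column current is the
    dot product of the input voltages with that column's conductances.
    """
    if len(conductances) != len(voltages):
        raise ValueError("one voltage is needed per crossbar row")
    n_cols = len(conductances[0])
    return [
        sum(conductances[i][j] * voltages[i] for i in range(len(voltages)))
        for j in range(n_cols)
    ]


g = [[1e-3, 2e-3],
     [3e-3, 4e-3]]   # hypothetical cell conductances
v = [1.0, 0.5]       # hypothetical input voltages
currents = crossbar_dot(g, v)
```

All columns are evaluated at once in the physical array, which is the source of the parallelism mentioned above; the nested loop here only models the arithmetic.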
Near Data Computing
Processing in-memory
Bring computation closer to data
From computer-centric to data-centric model
Old concept, but renewed interest due to:
New technologies (NVM, 3D)
Technology trends
Big data
Summary
IoT increases the rate of data generation
worldwide, which requires:
Energy efficient computing
Large and efficient storage
Non-volatile memories can be used to improve both
energy efficiency and storage systems
High density NVM storage with nearly zero leakage
power
Approximate computing, near data computing and
neuromorphic computing using NVM devices