Mohsen Imanimoimani.weebly.com/uploads/2/3/8/6/23860882/iot_seminar_talk.pdf · System Energy...

System Energy Efficiency Lab

seelab.ucsd.edu

Mohsen Imani

University of California San Diego

Winter 2016


Technology Trend for IoT

http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140807_304C_Hill.pdf


Motivation

IoT significantly increases the amount of

computation and data generation

The amount of generated information has surpassed 1.8 zettabytes and is expected to increase by 50% by 2020!

The rate of data generation is beyond the capability

of current computing systems.

Energy efficient data storage and computing!



Motivation

How to improve IoT storage and computation?

Efficient large storage to store big data

Non-volatile memory

Energy efficient computing

Approximate computing

Near data computing (in-memory processing)

Neuromorphic computing (parallel processing)

Efficient computing with NVMs!



NVM Requirements in IoT Context

What are the non-volatile memory requirements for IoT devices?

Minimize cost and area

Field programmability

Minimize start-up time

Low voltage, low power

Provide secure data storage




Minimize Cost and Area

Because many IoT devices will have to be very inexpensive and

small

Minimize any additional wafer processing cost due to extra

masks or processing

Field Programmability

To set user preferences or update keys, the memory needs to be programmable during chip manufacturing, during test, and when installed in end-user equipment

Minimize start-up time

NVM should be fast enough to allow executing code directly

Avoiding the need to copy code to RAM for execution and

reducing boot-up time




Low Voltage, Low Power

The IoT ecosystem will run on small batteries

Battery replacement may be difficult or even impossible

Devices convert motion, light, heat or an electromagnetic field

into the electrical energy needed to power the sensor

Embedded memory with low standby and operating power

dissipation

Provide Secure Data Storage

Many applications involve the exchange of sensitive data, such as financial transactions

Memory must have a high level of physical security and be

extremely difficult to reverse engineer



Traditional Memory Hierarchies


Why SRAM as cache? Speed

Why DRAM as main memory? Density

Why Flash as SSD? Non-volatility + price

Why HDD as secondary storage? Price


NVRAM Comparison

http://web.engr.oregonstate.edu/~sllu/xie.pdf

High density, low leakage, non-volatile


STT-RAM: Spin-Transfer Torque RAM

Uses the spin-torque direction of electrons to flip a bit in a magnetic tunnel junction (MTJ)

(a) The Structure of MTJ

(b) Parallel: bit 0 (low Resistance)

(c) Anti-Parallel: bit 1 (high Resistance)

Advantage:

High read performance

High endurance!

Disadvantage:

Write energy: high amount of current

needed to reorient the magnetization for

most commercial applications

Write latency: low ON/OFF resistance ratio

(~2)

Asymmetric write: writing a 1 needs much more time and energy than writing a 0
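
The parallel/anti-parallel encoding above can be sketched as a resistance comparison. This is only an illustrative model: the resistance values are hypothetical, chosen so that the ON/OFF ratio is about 2 as stated on the slide, and the midpoint reference and the names `read_bit` and `sense_margin` are assumptions rather than part of any real sensing circuit.

```python
# Hypothetical resistances, chosen only so that R_AP / R_P is about 2.
R_PARALLEL = 2000.0        # ohms, low resistance -> bit 0
R_ANTI_PARALLEL = 4000.0   # ohms, high resistance -> bit 1
R_REFERENCE = (R_PARALLEL + R_ANTI_PARALLEL) / 2  # assumed midpoint reference


def read_bit(cell_resistance):
    """Return the stored bit by comparing the cell against the reference."""
    return 1 if cell_resistance > R_REFERENCE else 0


def sense_margin(variation):
    """Worst-case gap to the reference when both states drift toward it.

    `variation` is a fraction, e.g. 0.1 for a 10% resistance drift.
    """
    low_worst = R_PARALLEL * (1 + variation)
    high_worst = R_ANTI_PARALLEL * (1 - variation)
    return min(R_REFERENCE - low_worst, high_worst - R_REFERENCE)
```

In this toy model a small ON/OFF ratio leaves little margin: a 25% drift in the high-resistance state already brings it down to the reference.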


Domain Wall Memory (DWM)


Similar to STT-RAM structure

Advantage:

Needs only one tunneling barrier and fixed layer → area saving

Disadvantage:

Complexity in design

Write delay


[Figure: DWM structure. Labels: ferromagnetic tape, free layer, domain wall, domain, fixed layer, extra domains, MTJ]


Shift-based DWM


Write by shifting data of one of the two fixed layers with the

desirable direction comp

Advantage: faster write operation than DWM

Disadvantage: complexity in design


(a) 1-bit DWM: fast

(b) Multi-bit DWM: area efficient, but needs extra latency for shifting

[Figure label: polarized direction]
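
The extra shifting latency of a multi-bit DWM track can be illustrated with a toy model. The class below is a sketch under stated assumptions: a single access port, one shift step per domain moved, and the hypothetical name `DomainWallTrack`. It does not model any real device timing.

```python
class DomainWallTrack:
    """Toy model of a multi-bit DWM track with a single access port (assumed)."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.port = 0     # index of the domain currently under the access port
        self.shifts = 0   # total shift steps performed so far

    def read(self, index):
        # Moving the tape by one domain costs one shift step in this model.
        self.shifts += abs(index - self.port)
        self.port = index
        return self.bits[index]
```

A 1-bit cell never shifts, while reading scattered positions on a longer track accumulates shift steps, which is the latency cost traded for area.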


PCM: Phase Change Memory

Flips a bit by changing the state of the material

Crystalline (SET) and amorphous (RESET) phases

PCM Cell Phases

PCM Operations

Advantage:

Better scalability than other

emerging technologies.

Very high density!

Disadvantage:

Slow in write (asymmetric write

operation)

Low endurance (10^7)

Candidate for DRAM replacement
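
To see what an endurance limit means in practice, a back-of-the-envelope estimate helps. The sketch below reads the slide's PCM endurance figure as 10^7 writes per cell. The write rates are hypothetical, and the model assumes a constant write rate to a single cell with no wear leveling.

```python
PCM_ENDURANCE = 10**7  # writes per cell, the slide's figure read as 10^7


def lifetime_seconds(endurance_writes, writes_per_second_per_cell):
    """Seconds until one cell reaches its endurance limit at a constant rate."""
    return endurance_writes / writes_per_second_per_cell


def lifetime_days(endurance_writes, writes_per_second_per_cell):
    """Same estimate expressed in days (86400 seconds per day)."""
    return lifetime_seconds(endurance_writes, writes_per_second_per_cell) / 86400
```

Under these assumptions, a cell written once per second wears out in roughly 116 days, which is why write-heavy uses of such memories usually rely on spreading writes across cells.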


ReRAM: Resistive RAM

Types: Access-based and crossbar ReRAM

Access-based ReRAM (1T-1R)

A dielectric, which is normally insulating, can be made to conduct after application of a sufficiently high voltage

Advantage:

Very fast in both read and write (~20 ns)

Very high density

Disadvantage:

Limited endurance (10^5)


Working mechanism of ReRAM


ReRAM: Resistive RAM

Crossbar ReRAM

What can it replace: cache? DRAM? Flash? Hard disk?

Crossbar ReRAM (1T-nR)

Advantage:

Highly scalable

Can be implemented on top of the chip in a 3D architecture

Very low energy consumption

Low cost

Disadvantage:

Much slower than 1T-1R (~µs)


Crossbar RRAM in IoT

https://www.crossbar-inc.com/assets/resource/presentation/FMS2014-Slides-RRAM-in-IoT.pdf

Crossbar 1T-nR 1T-1R


Existing NV Memory Technology Comparison



NVMs Comparison

STT-RAM: SRAM cache replacement

PCRAM: DRAM main memory and storage

ReRAM: NAND Flash, embedded NOR


Approximate Computing

Why do today’s systems waste time, energy, and complexity to provide uniformly precise operation for applications that do not require it?

The idea is that we are hindering computer systems’ efficiency by demanding too much accuracy from them.

Many IoT applications, such as machine learning, speech recognition, search, graphics, and physical simulation, are fundamentally approximate



Approximation

Where can approximation apply?

CPU

GPU

Accelerators

Storage

At which level?

Circuit

Architecture

Software



Circuit Level Approximation

Applying voltage overscaling on circuits

New nano-scaled technologies and variability issues!

Probability of having multiple errors in different

process corners

Designing approximate building blocks with much lower energy consumption

E.g. a 4-bit XOR gate that accepts a wrong answer for some set of inputs

Mostly focuses on adders and multipliers, which are the building blocks of DSPs, ALUs, etc.
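
As an illustration of an approximate building block, the sketch below models one well-known style of approximate adder, in which the low bits are combined with a bitwise OR instead of a carry chain. This is a simplified software model chosen for illustration, not a design taken from the talk. The function name `approx_add` and the parameters `k` and `width` are assumptions.

```python
def approx_add(a, b, k, width=8):
    """Add two unsigned `width`-bit ints with an approximate low part.

    The low k bits are computed with OR (no carry chain, so cheaper hardware);
    the remaining high bits are added exactly.
    """
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask              # approximate low part, no carries
    high = ((a >> k) + (b >> k)) << k     # exact high part
    return (high | low) & ((1 << (width + 1)) - 1)


def error(a, b, k):
    """Difference between the exact sum and the approximate one."""
    return (a + b) - approx_add(a, b, k)
```

Because a + b = (a | b) + (a & b), the error of this model is exactly the AND of the two low parts, so it is always below 2^k. For example, `approx_add(5, 5, 2)` gives 9 instead of 10.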



GPU-Acceleration

Several streaming IoT applications need to be

accelerated using parallel processors such as GPUs

Requires energy efficient computing

Lookup table (associative memory):

Promising memory to reduce the energy consumption

of parallel processing

Pre-stores frequent patterns and their corresponding

output

Retrieves them at runtime when a pattern repeats
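
The lookup-table idea can be sketched in software as a two-step process: profile a trace to find frequent operand patterns, then serve repeats from the table. The names `build_lookup_table` and `run`, and the use of a profiling trace, are illustrative assumptions, not details from the talk.

```python
from collections import Counter


def build_lookup_table(trace, op, table_size):
    """Pre-store results for the most frequent operand tuples in a trace."""
    frequent = Counter(trace).most_common(table_size)
    return {operands: op(*operands) for operands, _count in frequent}


def run(trace, op, table):
    """Return all results plus how many were served from the table."""
    hits = 0
    results = []
    for operands in trace:
        if operands in table:
            hits += 1
            results.append(table[operands])   # reuse the stored output
        else:
            results.append(op(*operands))     # fall back to computing
    return results, hits
```

Each hit stands for one computation that the processing unit did not have to perform.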



Associative Memory Integration


Searches associative memory

(TCAM) in parallel with FPU

processing in a single cycle

Hit in TCAM stops FPU

computation using clock gating

This hit activates the

corresponding row of ReRAM

memory to read the result of

computation
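
The hit path described above can be modeled in a few lines. This is a behavioral sketch only: real hardware compares all rows in parallel within a cycle, while the loop below just reproduces the outcome. The string encoding of ternary rows (with `x` as a don't-care bit) and the names `tcam_search` and `execute` are assumptions, and keys are assumed to have the same length as the stored rows.

```python
def tcam_search(rows, key):
    """Return the index of the first row matching `key`, or None.

    Rows are strings over '0', '1' and 'x', where 'x' matches either bit.
    """
    for index, pattern in enumerate(rows):
        if all(p in ("x", k) for p, k in zip(pattern, key)):
            return index
    return None


def execute(key, rows, result_memory, fpu):
    """Return (result, hit). On a hit the FPU callable is never invoked."""
    row = tcam_search(rows, key)
    if row is not None:
        return result_memory[row], True   # hit: read the stored result row
    return fpu(key), False                # miss: compute as usual
```

Skipping the `fpu` call on a hit plays the role that clock gating plays in the hardware description.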

Is there any approximation?


Limitation of Associative Memory

TCAM consumes high energy for each search, due to high switching activity

It needs to search the entire table in a single cycle!

How to reduce its energy? Use NVM-based TCAM

Zero leakage power for keeping the data

Very high density

The energy is still high because of high match-line

activity




Applying voltage overscaling on TCAM

Accept data that matches within a Hamming distance of 1-2 bits

E.g. 11000000 input matches with 11000100

Pros:

Very low TCAM search energy under voltage

overscaling

Increased TCAM hit rate → higher average time that the FPU is clock gated! High energy savings!
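
The approximate matching rule can be expressed as a Hamming-distance threshold. The sketch below is a software model of the matching behavior only; it does not model voltage overscaling itself. The function names and the choice to return the closest row are assumptions.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two non-negative ints differ."""
    return bin(a ^ b).count("1")


def approximate_search(table, key, max_distance):
    """Return the index of the closest stored pattern within max_distance.

    Returns None when no stored pattern is close enough (a miss).
    """
    best = None
    for index, pattern in enumerate(table):
        distance = hamming_distance(pattern, key)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (index, distance)
    return None if best is None else best[0]
```

With `max_distance=0` this reduces to exact matching, so the input `11000000` misses a table holding `11000100`, while a threshold of 1 or 2 turns it into a hit.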




Approximate matches degrade computation

accuracy, because we consider 2+2.01=4!

For multimedia applications, PSNR > 30 dB guarantees acceptable accuracy
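
For reference, PSNR can be computed from the mean squared error between the exact and the approximate output. The sketch below uses the standard definition for 8-bit samples; the function name and the flat-list input format are assumptions made for illustration.

```python
import math


def psnr(reference, approximate, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    squared_errors = [(r - a) ** 2 for r, a in zip(reference, approximate)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf   # identical outputs
    return 10 * math.log10(peak ** 2 / mse)


def is_acceptable(reference, approximate, threshold_db=30):
    """Apply the slide's 30 dB rule of thumb."""
    return psnr(reference, approximate) > threshold_db
```

Small per-pixel deviations keep the PSNR well above 30 dB, while large ones drive it toward 0 dB.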




How to reduce accuracy degradation?

Relaxing the computation on least significant bits

Accepting NO mismatch on MSBs

E.g. 11000000 ~ 11000001

but 11000000 is NOT 01000000
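
The rule of relaxing only the least significant bits can be sketched as a masked comparison, using 8-bit patterns in the spirit of the example above. The function name, the `relaxed_bits` parameter, and the default of one allowed mismatch are assumptions for illustration.

```python
def matches_with_relaxed_lsbs(stored, key, relaxed_bits, max_mismatches=1):
    """Match exactly on the MSBs and tolerate a few mismatches in the LSBs.

    `relaxed_bits` is how many low-order bits may be compared approximately.
    """
    low_mask = (1 << relaxed_bits) - 1
    difference = stored ^ key
    if difference & ~low_mask:
        return False   # any mismatch in the MSB region is rejected
    return bin(difference & low_mask).count("1") <= max_mismatches
```

A flipped lowest bit is accepted, while a flipped highest bit is rejected no matter how many mismatches are allowed.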


[Figure: TCAM array. Labels: buffer, match lines (MLs), TCAM cells, EnL, sense amplifiers]

Applications have different accuracy

requirements

Tunable approximation

Framework to support new applications

Voltage Relaxation



Bitline-configurable achieves 43.6% energy savings

Row-configurable achieves 44.5% energy savings

Acceptable quality loss of 10%



Other State of The Art Techniques

How can we speed up computation with less impact on accuracy?

Approximation reduces computation energy

consumption. What about data movement energy?


Neuromorphic Computing

Near Data Computing


Neuromorphic Computing

Computation modeled on human neurons, also called brain-inspired computing

All bits have the same impact on computation


Fast and parallel computing

High potential to reduce the computation error caused by approximation or process variation

Can be implemented on crossbar

memristive devices

Requires building blocks for basic computations such as dot product, XOR, etc.
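
The dot-product building block maps naturally onto a crossbar: if each cell stores a conductance and each row is driven with a voltage, the current collected on a column is the sum of voltage times conductance along that column. The sketch below models an ideal crossbar only, with no wire resistance, sneak paths, or device variation. The function name and the list-of-rows layout are assumptions.

```python
def crossbar_dot_product(conductances, voltages):
    """Column currents of an ideal crossbar: I[j] = sum_i V[i] * G[i][j].

    `conductances` is a list of rows, one row per input line;
    `voltages` has one entry per row.
    """
    columns = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(columns)
    ]
```

All columns are evaluated from the same applied voltages, which is where the parallelism of the crossbar comes from.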


Near Data Computing

Processing in-memory

Bring computation closer to data

From computer-centric to data-centric model

Old concept, but renewed interest due to:

New technologies (NVM, 3D)

Technology trends

Big data



Summary

IoT increases the rate of data generation worldwide, which requires:

Energy efficient computing

Large and efficient storage

Non-volatile memories can be used to improve both

energy efficiency and storage systems

High density NVM storage with nearly zero leakage

power

Approximate computing, near data computing and

neuromorphic computing using NVM-devices

