MS Thesis Defense

MS Thesis Defense “Improving Performance, Power, and Security of Multicore Systems using Cache Organization” By Tania Jareen CoE EECS Department April 21, 2014



Page 1: MS Thesis Defense

MS Thesis Defense

“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”

By

Tania Jareen

CoE EECS Department

April 21, 2014

Page 2: MS Thesis Defense

Jareen 2

About Me

Tania Jareen

MS in Electrical Engineering (with Thesis)

GTA for Routing and Switching II

Publications:

“An Effective Locking-Free Caching Technique for Power-Aware Multicore Computing Systems,” accepted in the IEEE ICIEV-2014 conference.

“A Novel Level-1 Cache Mapping Approach to Improve System Security without Compromising Performance to Power Ratio,” in preparation.

Page 3: MS Thesis Defense


Committee Members

Dr. Abu Asaduzzaman, EECS Dept.

Dr. Ramazan Asmatulu, ME Dept.

Dr. Zheng Chen, EECS Dept.

Page 4: MS Thesis Defense


“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”

Outline ►

Introduction
Problem Statement
Some Important Terms
Previous Work
Proposal
Simulation
Simulation Results
Conclusions
Future Work

Q U E S T I O N S ? Any time, please.

Page 5: MS Thesis Defense


Introduction

Multicore System

A multicore system is a collection of parallel or concurrent processing units that divides a large, complex problem into many small tasks

Main goal: to solve a complex problem faster

Dual-core System

Page 6: MS Thesis Defense


Problem Statement

Challenges for Multicore System

High Average Memory Latency

High Total Power Consumption

Cache Side Channel Security Attack

Page 7: MS Thesis Defense


Contributions

Propose a multicore system design to reduce the average memory latency

Propose a multicore system design to reduce the total power consumption

Propose a multicore system design to provide hardware level security

Page 8: MS Thesis Defense


Some Important Terms

■ Cache

A small buffer that stores recently used information

Helps to mitigate the speed gap between the processor and main memory

Increases the overall performance of the system significantly

Logically, the cache is placed between the CPU and main memory

Cache and Main Memory (Computer Desktop Encyclopedia)

Page 9: MS Thesis Defense


Some Important Terms

■ Cache Organization

Cache Hit – the requested data is found in the cache

Cache Miss – the requested data is not found in the cache

Cache Organization
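The hit/miss distinction above can be sketched with a minimal direct-mapped cache model (an illustrative toy with made-up sizes, not the thesis simulator):

```python
# Minimal direct-mapped cache model showing what a hit and a miss are.
# (Illustrative sketch with assumed sizes, not the thesis simulator.)
BLOCK_SIZE = 128  # bytes, matching the block size used later in the thesis
NUM_LINES = 16

cache = [None] * NUM_LINES  # each line holds the tag of its resident block

def access(addr):
    """Return True on a cache hit, False on a miss (which fills the line)."""
    block = addr // BLOCK_SIZE
    index = block % NUM_LINES
    tag = block // NUM_LINES
    if cache[index] == tag:
        return True           # cache hit: the requested data is in the cache
    cache[index] = tag        # cache miss: fetch the block, overwrite the line
    return False

print(access(0x1000))  # False: first touch is a miss
print(access(0x1004))  # True: same 128-byte block, so a hit
```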

Page 10: MS Thesis Defense


Some Important Terms

■ Cache Replacement Policy

Some blocks in the cache must be replaced to make room for new blocks, because cache memory size is limited

Replacement should be done so that the miss ratio stays low

Some well-known cache replacement policies: Least Recently Used (LRU), Random, Most Recently Used (MRU), First In First Out (FIFO)

Cache Replacement Policy (Aaron Toponce)
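LRU, the policy used in the simulations later, can be sketched in a few lines (an illustrative model only; the thesis results come from a separate simulator):

```python
# Tiny LRU replacement sketch: an ordered dict doubles as the recency list.
# (Illustrative; the thesis simulations use LRU but not this code.)
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # block -> data, least recent first

    def access(self, block):
        """Return True on a hit; on a miss, evict the least recently used."""
        if block in self.lines:
            self.lines.move_to_end(block)   # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the LRU block
        self.lines[block] = None
        return False

c = LRUCache(2)
c.access('A'); c.access('B'); c.access('A')  # A is now most recently used
c.access('C')                                # evicts B (the LRU), keeps A
print(c.access('A'))  # True: A survived the replacement
print(c.access('B'))  # False: B was evicted
```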

Page 11: MS Thesis Defense


Some Important Terms

■ Memory Update Policy

This is a combination of the read policy and the write policy

Read Policy – indicates how a word is read

Write Policy – indicates how a write to a memory block is handled. Examples: Write-Through, Write-Back
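The difference between the two write policies can be illustrated by counting main-memory writes for a hypothetical workload (the numbers below are invented for illustration):

```python
# Contrast of the two write policies by counting main-memory writes for a
# hypothetical workload (numbers are illustrative only).

def memory_writes(policy, stores_per_block, num_dirty_blocks):
    """Main-memory writes under each policy for the given workload."""
    if policy == 'write-through':
        # every store is propagated to main memory immediately
        return stores_per_block * num_dirty_blocks
    if policy == 'write-back':
        # each dirty block is written back once, when it is evicted
        return num_dirty_blocks
    raise ValueError(policy)

print(memory_writes('write-through', 10, 4))  # 40 memory writes
print(memory_writes('write-back', 10, 4))     # 4 memory writes
```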

Page 12: MS Thesis Defense


Some Important Terms

■ Cache Locking

Locks the most frequently used data for future use

Locked blocks are not evicted during replacement

Increases the hit ratio and performance

Reduces average memory access time and power consumption

Problems: hard to predict which blocks to lock, not all processor configurations support it, and it reduces the effective cache size

Locked Cache

Page 13: MS Thesis Defense


Some Important Terms

■ Victim Cache

One of the oldest and most popular techniques for improving performance

Placed between CL1 and CL2

Holds the victim blocks during cache replacement

Reduces average memory latency and total power consumption

Victim cache Organization
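The victim-cache behavior described above can be sketched as a toy model in which a block evicted from a tiny direct-mapped CL1 lands in a small fully associative buffer, so a quick re-reference avoids a trip to CL2 (made-up sizes, not the proposed SVC itself):

```python
# Sketch of a victim cache between CL1 and CL2.
# (Illustrative model with assumed sizes, not the proposed SVC.)
from collections import deque

NUM_L1_LINES = 4
l1 = {}                   # index -> block number currently cached
victim = deque(maxlen=2)  # small fully associative victim cache (FIFO)

def access(block):
    """Return 'L1', 'victim', or 'miss' for an access to a block number."""
    index = block % NUM_L1_LINES
    if l1.get(index) == block:
        return 'L1'
    hit_victim = block in victim
    if hit_victim:
        victim.remove(block)      # promote the block back into CL1
    if index in l1:
        victim.append(l1[index])  # the displaced block becomes the victim
    l1[index] = block
    return 'victim' if hit_victim else 'miss'

print(access(0))  # miss
print(access(4))  # miss: conflicts with block 0, which moves to the victim cache
print(access(0))  # victim: served without a trip to CL2
```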

Page 14: MS Thesis Defense


Some Important Terms

■ Stream Buffering

During a cache miss, the required blocks along with some additional blocks are brought from main memory into CL2 and then copied into CL1

The additional blocks are kept in Stream Buffer

Helps to reduce average memory latency and total power consumption
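The effect can be sketched with a toy stream buffer: a miss fetches the demanded block and prefetches the next few sequential blocks, so later sequential accesses hit the buffer instead of main memory (the depth of 3 is an assumption, not the thesis configuration):

```python
# Stream-buffering sketch. (Illustrative; the prefetch depth of 3 is an
# assumption, not the thesis configuration.)
fetches = 0
stream_buffer = []

def read_block(block, depth=3):
    """Return the block, counting main-memory fetches and prefetching."""
    global fetches, stream_buffer
    if block in stream_buffer:
        return block  # served from the stream buffer, no memory trip
    fetches += 1      # one trip to main memory for this block...
    stream_buffer = [block + i for i in range(1, depth + 1)]  # ...plus prefetch
    return block

for b in range(8):
    read_block(b)
print(fetches)  # 2 memory fetches for 8 sequential block reads instead of 8
```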

Page 15: MS Thesis Defense


Some Important Terms

■ Cache Side Channel Attack

Hardware attack, mainly on cache

Extracts sensitive information from the cache by passive monitoring

Uses physical properties (e.g., timing variation, power consumption, sound variation, heat production) [1,2,3,4]

A silent attack, but among the most dangerous

Page 16: MS Thesis Defense


Some Important Terms

■ Asymmetric Encryption

Step 1: Receiver generates a private and public key pair and shares the public key with the sender.

Step 2: Sender encrypts the information using the public key.

Step 3: Sender sends the encrypted information to the receiver.

Step 4: Receiver decrypts the information using its own private key.

Asymmetric Encryption
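The four steps can be walked through with textbook RSA and tiny primes (toy numbers purely for illustration; real keys are thousands of bits):

```python
# The four asymmetric-encryption steps, walked through with textbook RSA.
# (Toy numbers purely for illustration; never use keys this small.)

# Step 1: receiver generates a key pair and publishes the public part (e, n)
p, q = 61, 53
n = p * q                  # modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e

# Step 2: sender encrypts with the receiver's public key
message = 65
ciphertext = pow(message, e, n)

# Step 3: the ciphertext is sent over the (untrusted) channel

# Step 4: receiver decrypts with its own private key
recovered = pow(ciphertext, d, n)
print(recovered == message)  # True
```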

Page 17: MS Thesis Defense


Previous Work

■ To Improve Average Memory Latency and Total Power Consumption:

Victim cache between CL1 and main memory, with stream buffering [6]

Problem – no guarantee that the victim blocks are the ones with the highest miss counts

Selective Victim Caching [7]

Problem – risk of polluting the cache; requires prediction

Page 18: MS Thesis Defense


Previous Work

Selective Pre-Fetching [8]

Problem – needs a history of references

Cache Locking [9]

Problem – hard to predict which blocks cause high cache misses; not all processor configurations support it

■ To Improve Cache Level Security:

Partitioned Cache [1]

Problem – cache underutilization; depends on software support

Dynamic Memory-to-Cache Remapping [5]

Page 19: MS Thesis Defense


Proposed Mechanism

■ Smart Victim Cache (SVC)

MCB = Miss Cache Block

VCB = Victim Cache Block

SBB = Stream Buffering Block

BACMI = Block Address and Cache Miss Information

SLLC = Shared Last Level Cache

Proposed Cache Organization with SVC

Page 20: MS Thesis Defense


Work Flow Diagram

Work Flow Diagram

Page 21: MS Thesis Defense


Proposed Mechanism

Block size = 128 Bytes; main memory = 4 GB

SVC Size (KB) | Num. of Blocks | SVC1: MCB (Blocks) | SVC2: VCB + SBB (Blocks) | Max. Num. of BACMIs (MCB x 16)
2             | 16             | 8                  | 5 + 3                    | 128
2             | 16             | 5                  | 8 + 3                    | 80
4             | 32             | 8                  | 21 + 3                   | 128
4             | 32             | 16                 | 13 + 3                   | 256
8             | 64             | 8                  | 53 + 3                   | 128
8             | 64             | 48                 | 13 + 3                   | 768
16            | 128            | 8                  | 117 + 3                  | 128
16            | 128            | 112                | 13 + 3                   | 1792
32            | 256            | 8                  | 245 + 3                  | 128
32            | 256            | 240                | 13 + 3                   | 3840

Maximum Number of BACMI entries for a given SVC with various MCB

MCB = Miss Cache Block

VCB = Victim Cache Block

SBB = Stream Buffering Block

BACMI = Block Address and Cache Miss Information
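The rows of the table above follow from simple arithmetic: an SVC of S KB with 128-byte blocks holds S*1024/128 blocks, split between MCB and VCB + SBB, and each MCB block can index up to 16 BACMI entries. A small check (the helper name is mine, not from the thesis):

```python
# Reproduce table rows from the SVC sizing arithmetic described in the text.
BLOCK_SIZE = 128      # bytes
BACMI_PER_MCB = 16

def svc_row(svc_kb, mcb_blocks, sbb_blocks=3):
    """Return (total blocks, VCB blocks, max BACMIs) for one SVC split."""
    total = svc_kb * 1024 // BLOCK_SIZE
    vcb = total - mcb_blocks - sbb_blocks
    return total, vcb, mcb_blocks * BACMI_PER_MCB

print(svc_row(2, 8))     # (16, 5, 128): the first 2 KB row
print(svc_row(16, 112))  # (128, 13, 1792): the second 16 KB row
```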

Page 22: MS Thesis Defense


Simulation

■ Assumptions

SVC can be enabled and disabled

All cores equally share SVC

LRU replacement policy is used

Write-Back update policy is used

Page 23: MS Thesis Defense


Simulation

■ Workload

Moving Picture Experts Group – 4 (MPEG-4)

Advanced Video Coding (H.264/AVC)

Matrix Inversion (MI)

Fast Fourier Transform (FFT)

H.264/AVC behaves similarly to MPEG-4; MI behaves similarly to FFT

Page 24: MS Thesis Defense


Simulation

■ Input Parameters

Number of cores = 4

SVC size = 2, 4, 8, 16, 32 KB

I1/D1 size of CL1 = 8/8, 16/16, 32/32, 64/64, 128/128 KB

CL2 size = 256, 512, 1024, 2048, 4096 KB

Line size = 16, 32, 64, 128, 256 B

Associativity level = 1-, 2-, 4-, 8-, 16-way

Page 25: MS Thesis Defense


Simulation

■ Assumption for Delay Penalty

Number of cycles to load or store = 100

Number of cycles for a branch = 150

Satisfy Any Instruction at | Number of Cycles
ALU                        | 1
Private CL1                | 3
Shared CL2                 | 10
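Combining these cycle counts gives a back-of-the-envelope average memory access time; the hit rates fed in below are invented inputs purely for illustration:

```python
# Back-of-the-envelope average memory access time from the cycle counts
# above. The hit rates are invented inputs for illustration.
CYCLES = {'CL1': 3, 'CL2': 10, 'memory': 100}  # load/store from memory = 100

def avg_latency(h1, h2):
    """Average cycles per access given CL1 hit rate h1 and CL2 hit rate h2."""
    return (h1 * CYCLES['CL1']
            + (1 - h1) * h2 * CYCLES['CL2']
            + (1 - h1) * (1 - h2) * CYCLES['memory'])

print(round(avg_latency(0.90, 0.80), 2))  # 5.5 cycles per access on average
```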

Page 26: MS Thesis Defense


Simulation

■ Assumption for Power Consumption

Component        | Power Consumption (mWatts/Operation)
CPU              | 3.6
I1               | 2.7
Other Components | 2.1
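A rough energy tally from these per-operation figures might look like the following; the operation counts are invented for illustration:

```python
# Rough energy tally from the per-operation figures above.
# (Operation counts are invented for illustration.)
MW_PER_OP = {'CPU': 3.6, 'I1': 2.7, 'other': 2.1}

def total_energy(ops):
    """Sum of mW-per-operation times operation count over all components."""
    return sum(MW_PER_OP[c] * n for c, n in ops.items())

print(round(total_energy({'CPU': 1000, 'I1': 400, 'other': 200})))  # 5100
```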

Page 27: MS Thesis Defense


Simulation Results

■ Impact of SVC Size

Page 28: MS Thesis Defense


Simulation Results

■ Impact of SVC and CL1 Size

Both latency and total power consumption for MPEG-4 decrease as cache size increases. For MPEG-4, latency and power decrease most with SVC and no locking.

Impact of SVC and CL1 Size on Memory Latency and Total Power Consumption

Page 29: MS Thesis Defense


Simulation Results

■ Impact of SVC and Line Size

As line size increases, latency and power consumption for MPEG-4 decrease. For MPEG-4, both decrease most with SVC and no locking.

Impact of SVC and Line Size on Memory latency and Total power Consumption

Page 30: MS Thesis Defense


Simulation Results

■ Impact of SVC and Associativity Level

For MPEG-4, latency and power consumption decrease as the associativity level increases. Both decrease most with SVC and no locking.

Impact of SVC and Associativity Level on Memory Latency and Total Power Consumption

Page 31: MS Thesis Defense


Simulation Results

■ Impact of SVC and CL2/SLLC Size

As CL2 size increases, latency for MPEG-4 levels off but power consumption increases. Both latency and power consumption for MPEG-4 decrease most with SVC and no locking.

Impact of SVC and CL2/SLLC Size on Memory Latency and Total power Consumption

Page 32: MS Thesis Defense


Simulation Results

■ Comparison of SVC and Cache Line Locking

Average memory latency and total power consumption decrease as the locked portion of CL2 increases from 0% to 25%. Both decrease more with SVC and no locking than with locking, or with neither SVC nor locking.

Comparison of SVC and Cache Line Locking

Page 33: MS Thesis Defense


Proposed Solution for Security Improvement

■ Randomized Cache Mapping Between D1X and CL1 (Solution-1)

Randomized Cache Mapped Between D1X and CL1

Page 34: MS Thesis Defense


Proposed Solution for Security Improvement

■ Problem with Solution-1

Requires extra hardware to implement D1X

Increases memory latency due to extra processing

Increases total power consumption by about 17%

Page 35: MS Thesis Defense


Proposed Modified Solution for Security Improvement

■ Randomized Cache Mapping Between Main Memory and CL1 (Solution-2)

Randomized Cache Mapped between CL1 and Main Memory

It is expected that the probability of a successful cache side-channel attack decreases to about 1 in 40K for a CL1 with 16 blocks
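The idea behind Solution-2 can be sketched as a secret random permutation between memory-block index and CL1 set, which breaks the fixed address-to-set mapping a side-channel attacker relies on (an illustrative model; the set count and seeding scheme here are assumptions, not the thesis implementation):

```python
# Sketch of randomized memory-to-CL1 mapping via a secret permutation.
# (Illustrative model; set count and seeding are assumptions.)
import math
import random

NUM_SETS = 16
rng = random.Random()            # in hardware, this seed would be secret per boot
perm = list(range(NUM_SETS))
rng.shuffle(perm)                # the secret mapping table

def cache_set(block):
    """Map a memory block to a CL1 set through the secret permutation."""
    return perm[block % NUM_SETS]

# With 16 sets there are 16! possible mappings, so blindly guessing the whole
# permutation succeeds with probability 1/16!.
print(math.factorial(NUM_SETS))  # 20922789888000 candidate mappings
```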

Page 36: MS Thesis Defense


Conclusions

Using several levels of cache in multicore systems causes serious performance and power issues

A cache shared among the cores of a multicore system poses a hardware-level security threat

The proposed SVC significantly improves system performance by reducing memory latency and power consumption

The proposed cache randomization technique between main memory and CL1 significantly reduces the probability of a cache attack

Page 37: MS Thesis Defense


Conclusions

With SVC, average memory latency is reduced by 17% compared to CL2 cache locking

With SVC, total power consumption is reduced by 21% compared to CL2 cache locking

According to our estimates, the probability of a successful cache side-channel attack drops to about 1 in 40K for a CL1 with 16 blocks

Page 38: MS Thesis Defense


Future Work

Explore the impact of SVC on average memory latency and total power consumption for real-time embedded systems and handheld computers

Explore the randomized cache mapping technique between CL1 and main memory on real-time embedded systems and handheld computers

Page 39: MS Thesis Defense


QUESTION

“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”

Page 40: MS Thesis Defense


Thank You

Contact:

Full Name: Tania Jareen

Telephone: (316) 516-8516

E-mail: [email protected]

“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”

Page 41: MS Thesis Defense


References

1. D. Page, “Partitioned Cache Architecture as a Side-Channel Defense Mechanism,” in Cryptology ePrint Archive, Report 2005/280, 2005.

2. O. Aciicmez, “Yet Another Microarchitectural Attack: Exploiting I-Cache,” in CSAW ’07: Proceedings of the 2007 ACM Workshop on Computer Security Architecture, pp. 11-18, DOI: 10.1145/1314466.1314469, 2007.

3. P.C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” in Advances in Cryptology – CRYPTO ’96, Springer Berlin Heidelberg, pp. 104-113, DOI: 10.1007/3-540-68697-5_9, 1996.

4. P. Kocher, et al., "Differential Power Analysis," in Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, 1999.

Page 42: MS Thesis Defense


References

5. Z. Wang and R.B. Lee, "A novel cache architecture with enhanced performance and security," in Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture, 2008.

6. N.P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Western Research Laboratory (WRL), Digital Equipment Corporation, URL:https://www.cis.upenn.edu/~cis501/papers/joupp victim.pdf, 1990.

7. D. Stiliadis and A. Varma, “Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches,” in IEEE Transactions on Computers, Vol. 46, No. 5, pp. 603-610, DOI: 10.1109/12.589235, 1997.

Page 43: MS Thesis Defense


References

8. R. Pendse and H. Katta, “Selective Prefetching: Prefetching when only required,” in the 42nd Midwest Symposium on Circuits and Systems, Vol. 2. pp. 866-869, DOI: 10.1109/MWSCAS.1999.867772, 1999.

9. A. Asaduzzaman, F.N. Sibai, and M. Rani, “Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level,” in the EUROMICRO Journal of Systems Architecture, Vol. 56, Issue 4-6. pp 151-162, 2010.