Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel, Jorge Albericio, Andreas Moshovos, Natalie Enright Jerger

Page 1: Doppelgänger: A Cache for Approximate Computing

Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel

Jorge Albericio

Andreas Moshovos

Natalie Enright Jerger

Page 2: Doppelgänger: A Cache for Approximate Computing

Cache Hierarchy

2

main memory

processor core

private caches

shared last-level cache

Page 5: Doppelgänger: A Cache for Approximate Computing

Cache Hierarchy

5

processor core

private caches

main memory

shared last-level cache

Accessing main memory costs 10x–100x more latency and energy than accessing a private cache!

We need a hierarchy of large caches…

Page 9: Doppelgänger: A Cache for Approximate Computing

Cache Hierarchy

9

main memory

processor core

private caches

shared last-level cache

But the last-level cache consumes substantial energy and takes up 30%–50% of the chip area!

Higher efficiency via Approximate Computing…

Page 12: Doppelgänger: A Cache for Approximate Computing

Summary

12

Doppelgänger Cache: Identifies approximate similarity in data block values.

77% cache storage savings of approximable data.

Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.

Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.

Page 13: Doppelgänger: A Cache for Approximate Computing

Outline

Approximate Computing

Approximate Similarity

Doppelgänger Cache

Cache Architecture

Similarity Mapping

Evaluation

13

Page 14: Doppelgänger: A Cache for Approximate Computing

Approximate Computing

Not all data/computations need to be precise.

14

http://www.zentut.com/

http://www.businessweek.com/

http://www.cc.gatech.edu/~cnieto6/

http://www.analyticbridge.com/

http://themusicparlour.blogspot.ca/

http://www.scientific-computing.com/

Data mining

Computer vision

Audio and video processing

Gaming

Machine learning

Dynamical simulation

Page 21: Doppelgänger: A Cache for Approximate Computing

Approximate Similarity

21

1: 92 131 183 91 132 186

2: 90 131 185 93 133 184

3: 35 31 29 43 38 37

approximately similar

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Allows for 77% cache storage savings of approximable data!

Page 22: Doppelgänger: A Cache for Approximate Computing

Outline

Approximate Computing

Approximate Similarity

Doppelgänger Cache

Cache Architecture

Similarity Mapping

Evaluation

22

Page 24: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache

24

main memory

processor core

private caches

shared last-level cache

How can we exploit approximate similarity to save area and energy in the last-level cache?

Page 26: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache

26

main memory

processor core

private caches

shared LLC

precise LLC

Doppelgänger LLC

Page 29: Doppelgänger: A Cache for Approximate Computing

Conventional Cache

29

tag array

data array

address from L2

data from memory

One-to-one mapping of data values to memory locations.

But the fundamental goal of a processor is to process data values, not memory locations…

Page 34: Doppelgänger: A Cache for Approximate Computing

Conventional Cache

34

tag array

data array

address from L2

data from memory

Multiple copies of approximately similar blocks.

Page 37: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache

37

tag array

approximate data array

address from L2

data from memory

Smaller data array allows for substantial area and energy savings.

Page 39: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache

39

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address from L2

data from memory

Page 43: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache - Lookups

43

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 0 from L2

map X from tag array

data A to L2
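
The lookup path on this slide can be sketched in a few lines of Python. This is an illustration, not the authors' implementation: plain dictionaries stand in for the two hardware arrays (an assumption; the real structures are set-associative), and the names `tag_array`, `data_array`, and `lookup` are hypothetical.

```python
# Hypothetical stand-ins for the two arrays shown on the slide.
tag_array = {0: "X", 1: "X", 2: "X", 3: "X"}   # address tag -> map value
data_array = {"X": "A"}                         # map value -> data block


def lookup(address):
    """Return the (approximate) block for an address, or None on a tag miss."""
    map_value = tag_array.get(address)   # step 1: the tag array yields the map value
    if map_value is None:
        return None                      # no tag entry: the request would go to memory
    return data_array[map_value]         # step 2: the map value indexes the data array


print(lookup(0))   # address 0 -> map X -> data block A
```

Because tags 0 to 3 all carry map X, any of those addresses returns the single stored block A, which is where the storage saving comes from.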

Page 51: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache - Insertions (Miss)

51

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 5 from L2

data B from memory

generate map Y from data B

data B to L2

Page 60: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache - Insertions (Hit)

60

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6 from L2

data C from memory

generate map X from data C

data C to L2

Data block A serves as an acceptable approximation of data block C.
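
The two insertion cases (a miss for address 5, a hit for address 6) can be replayed in a short Python sketch. It is an illustration under assumptions rather than the hardware design: dictionaries model the arrays, `toy_map` is a hypothetical stand-in for the similarity map that simply returns the map values shown on the slides, and evictions, writes, and coherence are left out.

```python
# State before the insertions, as drawn on the slides.
tag_array = {0: "X", 1: "X", 2: "X", 3: "X"}   # address tag -> map value
data_array = {"X": "A"}                         # map value -> data block


def toy_map(block):
    """Hypothetical similarity map: returns the map values used on the slides."""
    return {"B": "Y", "C": "X"}[block]


def insert(address, block):
    """Insert a block fetched from memory; report a data-array hit or miss."""
    map_value = toy_map(block)        # generate the map value from the block's data
    tag_array[address] = map_value    # the new tag always records its map value
    if map_value in data_array:
        return "hit"                  # a similar block is already stored; reuse it
    data_array[map_value] = block     # no similar block yet: store this one
    return "miss"


first = insert(5, "B")    # map Y is new, so block B is stored
second = insert(6, "C")   # map X already holds block A, so C is not stored
print(first, second, data_array)
```

After both insertions the data array holds only blocks A and B, while six tags point into it; block C is never stored because block A stands in for it.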

Page 64: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache - Similarity Mapping

64

data block A: A[0] A[1] … A[n]

hash function (aggregates the values in the block): hash = AVG(A[0], …, A[n])

mapping: hash → map

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

Page 66: Doppelgänger: A Cache for Approximate Computing

Doppelgänger Cache - Similarity Mapping

66

data block A: A[0] A[1] … A[n]

hash function: data block → hash

mapping (discretizes the hash value): all possible hash values → map (M-bit)

Hash values that land on the same map value are marked approximately similar.

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
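
As a rough illustration of the average hash and the M-bit discretization, the Python sketch below maps the three example blocks from the Approximate Similarity slide. Several things here are assumptions and not taken from the talk: equal-width bins, a value range of [0, 256), and M = 4 (the evaluated configuration uses a 14-bit map). The names `block_hash` and `block_map` are hypothetical.

```python
def block_hash(block):
    """Aggregate the values in a block into one number (here: the average)."""
    return sum(block) / len(block)


def block_map(block, m_bits, lo, hi):
    """Discretize the hash into one of 2**m_bits equal-width bins over [lo, hi)."""
    bins = 1 << m_bits
    index = int((block_hash(block) - lo) / (hi - lo) * bins)
    return min(max(index, 0), bins - 1)   # clamp values that fall outside [lo, hi)


# The three example blocks from the Approximate Similarity slide.
block1 = [92, 131, 183, 91, 132, 186]
block2 = [90, 131, 185, 93, 133, 184]
block3 = [35, 31, 29, 43, 38, 37]

for name, block in (("1", block1), ("2", block2), ("3", block3)):
    print(name, block_map(block, m_bits=4, lo=0, hi=256))
```

Under these assumptions blocks 1 and 2 receive the same map value and would share one data-array entry, while block 3 maps elsewhere. A larger M narrows each bin, so fewer blocks are treated as similar and the approximation error shrinks.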

Page 69: Doppelgänger: A Cache for Approximate Computing

uniDoppelgänger Cache

69

main memory

processor core

private caches

shared LLC

uniDoppelgänger LLC

Precise blocks simply use physical address as the map value.

Page 70: Doppelgänger: A Cache for Approximate Computing

70

Doppelgänger Cache

More details in paper:

Cache writes, replacements and coherence.

Details on hash functions and mapping.

Sensitivity to size of map space and data array.

Evaluation of uniDoppelgänger.

Page 71: Doppelgänger: A Cache for Approximate Computing

Outline

Approximate Computing

Approximate Similarity

Doppelgänger Cache

Cache Architecture

Similarity Mapping

Evaluation

71

Page 72: Doppelgänger: A Cache for Approximate Computing

Evaluation

72

Applications: PARSEC and AxBench

Performance: Full-system cycle-level simulation

Error: Pin simulation

Area and Energy: CACTI

Configuration: 4 cores, private L1 and L2

2MB shared LLC (1MB precise, 1MB Doppelgänger)

Doppelgänger: 14-bit similarity map, 1/4 data array
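
To get a feel for these numbers, the short Python calculation below works out entry counts for the Doppelgänger half of the LLC. It rests on assumptions the slide does not state: a 64-byte block size, that the 1MB figure describes how many blocks the tag array can track, and that "1/4 data array" means the data array holds a quarter as many blocks as there are tags.

```python
BLOCK_BYTES = 64                  # assumption: block size is not given on the slide
DOPPEL_BYTES = 1 * 1024 * 1024    # 1MB Doppelgänger portion of the shared LLC
MAP_BITS = 14                     # 14-bit similarity map

tag_entries = DOPPEL_BYTES // BLOCK_BYTES    # blocks the tag array can track
data_entries = tag_entries // 4              # "1/4 data array" (assumed meaning)
data_bytes = data_entries * BLOCK_BYTES      # storage actually spent on data
map_values = 1 << MAP_BITS                   # distinct similarity map values

print(tag_entries, data_entries, data_bytes // 1024, map_values)
```

Under these assumptions the tag array tracks 16384 blocks, the data array stores 4096 of them (256KB), and the 14-bit map distinguishes 16384 similarity classes.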

Page 73: Doppelgänger: A Cache for Approximate Computing

73

Evaluation - Compression Ratio

[Figure: compression ratio (axis 0x to 6x, higher is better) for BΔI, exact deduplication, Doppelgänger, and Doppelgänger + BΔI.]

Page 76: Doppelgänger: A Cache for Approximate Computing

76

Evaluation

[Figure: application output accuracy, application performance, total cache dynamic energy reduction, total cache leakage energy reduction, and total cache area reduction (axis 0.8x to 1.4x, higher is better).]

Page 78: Doppelgänger: A Cache for Approximate Computing

Conclusion

78

Doppelgänger Cache: Identifies approximate similarity in data block values.

77% cache storage savings of approximable data.

Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.

Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.

Page 79: Doppelgänger: A Cache for Approximate Computing

Thank you

Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel

Jorge Albericio

Andreas Moshovos

Natalie Enright Jerger