
Post on 04-Mar-2018


CS 61C: Great Ideas in Computer Architecture (Machine Structures)

Caches Part 2

Instructors: Bernhard Boser & Randy H. Katz

http://inst.eecs.berkeley.edu/~cs61c/

10/18/16, Fall 2016, Lecture #15

Outline

• Cache Organization and Principles
• Write Back vs. Write Through
• Cache Performance
• Cache Design Tradeoffs
• And in Conclusion …


Typical Memory Hierarchy

[Figure: On-Chip Components (Control, Datapath, RegFile, Instr Cache, Data Cache), followed by Second-Level Cache (SRAM), Third-Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk or Flash)]

From the level closest to the processor to the level farthest from it:
Speed (cycles): ½'s, 1's, 10's, 100's-1000, 1,000,000's
Size (bytes): 100's, 10K's, M's, G's, T's
Cost/bit: highest to lowest

• Principle of locality + memory hierarchy presents the programmer with ≈ as much memory as is available in the cheapest technology at ≈ the speed offered by the fastest technology

Adding Cache to Computer

[Figure: Processor (Control; Datapath with PC, Registers, and Arithmetic & Logic Unit (ALU)) connected through a Cache to Memory (Program, Data, Bytes), plus Input and Output. The Processor-Memory Interface carries Enable?, Read/Write, Address, Write Data, and Read Data; Input and Output connect through I/O-Memory Interfaces.]

• Processor organized around words and bytes
• Memory (including cache) organized around blocks, which are typically multiple words

Key Cache Concepts

• Principle of Locality
– Temporal Locality and Spatial Locality
• Hierarchy of Memories (speed/size/cost per bit) to exploit locality
• Cache – copy of data in lower level of memory hierarchy
• Direct Mapped to find block in cache using Tag field and Valid bit for Hit
• Cache Design Organization Choices:
– Fully Associative, Set-Associative, Direct-Mapped

Cache Organizations

• "Fully Associative": Block placed anywhere in cache
– First design last lecture
– Note: No Index field, but one comparator/block
• "Direct Mapped": Block goes only one place in cache
– Note: Only one comparator
– Number of sets = number of blocks
• "N-way Set Associative": N places for block in cache
– Number of sets = Number of Blocks / N
– N comparators
– Fully Associative: N = number of blocks
– Direct Mapped: N = 1

Memory Block vs. Word Addressing

[Figure: the same 5-bit addresses (00000 through 11111) shown three ways: as Byte addresses; as Word addresses, where the 2 LSBs are 0; and as 8-Byte Block addresses, where the 3 LSBs are 0. Each address splits into a Block # (0 to 3) and a Byte offset in block (0 to 7).]
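The relationship between byte, word, and block addresses can be sketched in a few lines of Python. This is only an illustration under the slide's parameters (4-byte words, 8-byte blocks); the function name `split_byte_address` is hypothetical and not part of the lecture.

```python
def split_byte_address(addr: int, block_bytes: int = 8, word_bytes: int = 4):
    """Return (word address, block number, byte offset in block).

    Assumes block_bytes and word_bytes are powers of two.
    """
    word_addr = addr & ~(word_bytes - 1)  # clear the 2 LSBs for 4-byte words
    block_num = addr // block_bytes       # drop the 3 offset bits for 8-byte blocks
    offset = addr % block_bytes           # byte position inside the block
    return word_addr, block_num, offset


for a in (0, 5, 13, 31):
    print(f"{a:05b}", split_byte_address(a))
```

With 5-bit addresses there are 32 bytes, so the block numbers run from 0 to 3 and the offsets from 0 to 7, matching the figure.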

Memory Block Number Aliasing

12-bit memory addresses, 16 Byte blocks

Address         Block #   Block # mod 8   Block # mod 2
010100100000    82        2               0
010100110000    83        3               1
010101000000    84        4               0
010101010000    85        5               1
010101100000    86        6               0
010101110000    87        7               1
010110000000    88        0               0
010110010000    89        1               1
010110100000    90        2               0
010110110000    91        3               1
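The aliasing above can be reproduced with a short sketch. It assumes the slide's 16-byte blocks and takes the ten addresses starting at 010100100000; the names `BLOCK_BYTES`, `block_number`, and `addresses` are illustrative only.

```python
BLOCK_BYTES = 16  # from the slide: 16 Byte blocks


def block_number(addr: int) -> int:
    """Drop the 4 block-offset bits of a byte address."""
    return addr // BLOCK_BYTES


# Ten consecutive block-aligned addresses, starting at 0b010100100000.
addresses = [0b010100100000 + i * BLOCK_BYTES for i in range(10)]

for addr in addresses:
    b = block_number(addr)
    # An 8-block cache sees b % 8, a 2-block cache sees b % 2.
    print(f"{addr:012b}  block {b}  mod 8 = {b % 8}  mod 2 = {b % 2}")
```

Blocks 82 and 90 both land on index 2 of an eight-block cache, which is the aliasing the slide illustrates.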

Processor Address Fields used by Cache Controller

• Block Offset: Byte address within block
• Set Index: Selects which set
• Tag: Remaining portion of processor address

• Size of Index = log2(number of sets)
• Size of Tag = Address size – Size of Index – log2(number of bytes/block)

Processor Address (32 bits total): Tag | Set Index | Block offset
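As a minimal sketch of how a cache controller might carve up an address, the helper below applies the two size formulas above. The name `split_address` and its parameters are assumptions made for illustration, and both `num_sets` and `block_bytes` are assumed to be powers of two.

```python
def split_address(addr: int, num_sets: int, block_bytes: int, addr_bits: int = 32):
    """Return (tag, set index, block offset, tag size in bits)."""
    offset_bits = block_bytes.bit_length() - 1  # log2(bytes/block) for a power of two
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits

    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset, tag_bits


# 1024 sets of one 4-byte block each: a 20-bit tag, 10-bit index, 2-bit offset.
tag, index, offset, tag_bits = split_address(0x12345678, num_sets=1024, block_bytes=4)
print(hex(tag), hex(index), offset, tag_bits)
```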

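Direct-Mapped Cache Revisited

• One word blocks, cache size = 1K words (or 4 KB)

[Figure: the 32-bit address is split into a 20-bit Tag (bits 31 to 12), a 10-bit Index (bits 11 to 2), and a Byte offset (bits 1 to 0). The Index selects one of 1024 entries (0 to 1023), each holding Valid, Tag, and Data. A Comparator checks the stored 20-bit Tag against the address Tag to produce Hit, and the entry supplies 32 bits of Data.]

• Valid bit ensures something useful in cache for this index
• Compare Tag with upper part of Address to see if a Hit
• Read data from cache instead of memory if a Hit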

Four-Way Set-Associative Cache

• 2^8 = 256 sets each with four ways (each with one block)

[Figure: the 32-bit address is split into a 22-bit Tag, an 8-bit Index, and a Byte offset. The Set Index selects entry 0 to 255 in each of Way 0, Way 1, Way 2, and Way 3; each entry holds V, Tag, and Data. The four tag comparisons produce Hit and drive a 4x1 select that outputs 32 bits of Data.]


Handling Stores with Write-Through

• Store instructions write to memory, changing values
• Need to make sure cache and memory have same values on writes: two policies

1) Write-Through Policy: write cache and write through the cache to memory
– Every write eventually gets to memory
– Too slow, so include Write Buffer to allow processor to continue once data in Buffer
– Buffer updates memory in parallel to processor

Write-Through Cache

• Write both values in cache and in memory
• Write buffer stops CPU from stalling if memory cannot keep up
• Write buffer may have multiple entries to absorb bursts of writes
• What if store misses in cache?

[Figure: Processor connected to Cache by a 32-bit Address and 32-bit Data; Cache connected to Memory by a 32-bit Address and 32-bit Data through a Write Buffer holding Addr/Data pairs]
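A minimal sketch of the write-buffer idea follows. The class `WriteThroughCache` and its methods are hypothetical, memory is modeled as a plain dictionary, and the sketch assumes that a store miss does not allocate a cache entry; the slide leaves that question open, so this is only one possible answer.

```python
from collections import deque


class WriteThroughCache:
    """Toy write-through cache with a write buffer (word-granularity, no tags)."""

    def __init__(self, memory: dict):
        self.memory = memory         # address -> value, stands in for main memory
        self.lines = {}              # address -> value, stands in for cache storage
        self.write_buffer = deque()  # pending (addr, data) writes, oldest first

    def store(self, addr, data):
        if addr in self.lines:       # store hit: update the cached copy
            self.lines[addr] = data
        # Assumption: on a store miss the cache is left alone (no write-allocate).
        self.write_buffer.append((addr, data))  # processor continues immediately

    def drain_one(self):
        """Memory side: retire the oldest buffered write, if any."""
        if self.write_buffer:
            addr, data = self.write_buffer.popleft()
            self.memory[addr] = data

    def load(self, addr):
        # The newest pending write to this address wins over cache and memory.
        for a, d in reversed(self.write_buffer):
            if a == addr:
                return d
        if addr not in self.lines:   # load miss: fetch from memory
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]
```

The store returns as soon as the write sits in the buffer; `drain_one` models memory catching up in parallel with the processor.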

Handling Stores with Write-Back

2) Write-Back Policy: write only to cache and then write cache block back to memory when evict block from cache
– Writes collected in cache, only single write to memory per block
– Include bit to see if wrote to block or not, and then only write back if bit is set
• Called "Dirty" bit (writing makes it "dirty")

Write-Back Cache

• Store/cache hit, write data in cache only and set dirty bit
– Memory has stale value
• Store/cache miss, read data from memory, then update and set dirty bit
– "Write-allocate" policy
• Load/cache hit, use value from cache
• On any miss, write back evicted block, only if dirty. Update cache with new block and clear dirty bit

[Figure: Processor connected to Cache by a 32-bit Address and 32-bit Data; Cache connected to Memory by a 32-bit Address and 32-bit Data; each cache entry carries a Dirty Bit (D)]
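The four cases above can be sketched as a toy direct-mapped, one-word-block cache. The class `WriteBackCache`, its field layout, and the `writebacks` counter are assumptions made for illustration, not the lecture's implementation.

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache with write-allocate and dirty bits."""

    def __init__(self, memory: dict, num_blocks: int = 4):
        self.memory = memory               # word address -> value
        self.num_blocks = num_blocks
        self.lines = [None] * num_blocks   # each entry: [tag, data, dirty]; None = invalid
        self.writebacks = 0                # number of blocks written back to memory

    def _lookup(self, word_addr):
        index = word_addr % self.num_blocks
        tag = word_addr // self.num_blocks
        line = self.lines[index]
        if line is None or line[0] != tag:          # miss
            if line is not None and line[2]:        # evicted block is dirty
                old_addr = line[0] * self.num_blocks + index
                self.memory[old_addr] = line[1]     # write it back
                self.writebacks += 1
            # Bring in the new block with the dirty bit cleared.
            line = [tag, self.memory.get(word_addr, 0), False]
            self.lines[index] = line
        return line

    def load(self, word_addr):
        return self._lookup(word_addr)[1]

    def store(self, word_addr, data):
        line = self._lookup(word_addr)  # write-allocate on a store miss
        line[1] = data                  # write the cache only
        line[2] = True                  # memory is now stale
```

Two stores to the same word cost a single memory write at eviction time, which is the traffic reduction write-back aims for.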

Write-Through vs. Write-Back

• Write-Through:
– Simpler control logic
– More predictable timing simplifies processor control logic
– Easier to make reliable, since memory always has copy of data (big idea: Redundancy!)
• Write-Back:
– More complex control logic
– More variable timing (0, 1, 2 memory accesses per cache access)
– Usually reduces write traffic
– Harder to make reliable, sometimes cache has only copy of data

Administrivia

• Midterm #2 2 weeks away! November 1!
– In class! 3:40-5 PM
– Synchronous digital design and Project 3 (processor design) included
– Pipelines and Caches
– ONE Double sided Crib sheet
– Review Session, Sunday, 10/30, 1-3 PM, 10 Evans

iClicker Saga

iClicker and EPA

• No longer taking attendance in lecture, but we hope you will continue to come anyway
• Continue to use Clicker questions in lecture to help you test your understanding
• EPA will be based on a "holistic" assessment of lecture, piazza, guerrilla and tutoring sessions, office hours, discussion, and lab participation
• EPA will be calculated so as to only help your course grade, never hurt it


Cache (Performance) Terms

• Hit rate: fraction of accesses that hit in the cache
• Miss rate: 1 – Hit rate
• Miss penalty: time to replace a block from lower level in memory hierarchy to cache
• Hit time: time to access cache memory (including tag comparison)
• Abbreviation: "$" = cache (a Berkeley innovation!)

Average Memory Access Time (AMAT)

• Average Memory Access Time (AMAT) is the average time to access memory considering both hits and misses in the cache

AMAT = Time for a hit + Miss rate × Miss penalty

Clickers/Peer Instruction

AMAT = Time for a hit + Miss rate × Miss penalty

• Given a 200 psec clock, a miss penalty of 50 clock cycles, a miss rate of 0.02 misses per instruction and a cache hit time of 1 clock cycle, what is AMAT?
A: ≤ 200 psec
B: 400 psec
C: 600 psec
D: 800 psec

Answer: 1 clock cycle + 0.02 × 50 clock cycles = 2 clock cycles, which at 200 psec per cycle is 400 psec (B)
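The same arithmetic can be checked with a one-line function. The name `amat` and its argument names are illustrative; the numbers are the ones given in the question.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


CLOCK_PSEC = 200  # clock period from the question

cycles = amat(hit_time=1, miss_rate=0.02, miss_penalty=50)
print(f"AMAT = {cycles:.1f} cycles = {cycles * CLOCK_PSEC:.0f} psec")
```

Keeping the hit time and miss penalty in clock cycles and converting to psec only at the end avoids mixing units.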

Ping Pong Cache Example: Direct-Mapped Cache w/ 4 Single-Word Blocks, Worst-Case Reference String

• Consider the main memory address reference string of word numbers: 0 4 0 4 0 4 0 4
• Start with an empty cache - all blocks initially marked as not valid

Reference:  0     4     0     4     0     4     0     4
Result:     miss  miss  miss  miss  miss  miss  miss  miss

[Figure: after each reference, cache block 0 alternates between tag 00 holding Mem(0) and tag 01 holding Mem(4)]

• 8 requests, 8 misses
• Ping-pong effect due to conflict misses - two memory locations that map into the same cache block
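The ping-pong behavior can be reproduced with a very small simulation. This is a sketch under the slide's assumptions (direct-mapped, four single-word blocks, references given as word numbers); the function name `simulate_direct_mapped` is hypothetical.

```python
def simulate_direct_mapped(refs, num_blocks=4):
    """Return a list of 'hit'/'miss' strings for a sequence of word numbers."""
    tags = [None] * num_blocks  # None marks a block as not valid
    results = []
    for word in refs:
        index = word % num_blocks   # which cache block the word maps to
        tag = word // num_blocks    # remaining upper address bits
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag       # replace whatever was there
    return results


trace = simulate_direct_mapped([0, 4, 0, 4, 0, 4, 0, 4])
print(trace, "misses:", trace.count("miss"))
```

Words 0 and 4 both map to block 0 (tags 0 and 1), so each reference evicts the word the next reference needs.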


Alternative Block Placement Schemes

• DM placement: mem block 12 in 8 block cache: only one cache block where mem block 12 can be found: (12 modulo 8) = 4
• SA placement: four sets x 2-ways (8 cache blocks), memory block 12 in set (12 mod 4) = 0; either element of the set
• FA placement: mem block 12 can appear in any cache block

Example: 2-Way Set Associative $ (4 words = 2 sets x 2 ways per set)

[Figure: Cache with Set 0 and Set 1, each with Way 0 and Way 1 holding V, Tag, and Data; Main Memory with 16 one-word blocks at addresses 0000xx through 1111xx]

• One word blocks. Two low order bits define the byte in the word (32b words)

Q: How do we find it?
Use next 1 low order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache)

Q: Is it there?
Compare all the cache tags in the set to the high order 3 memory address bits to tell if the memory block is in the cache

Ping Pong Cache Example: 4-Word 2-Way SA $, Same Reference String

• Consider the main memory word reference string 0 4 0 4 0 4 0 4
• Start with an empty cache - all blocks initially marked as not valid

Reference:  0     4     0     4
Result:     miss  miss  hit   hit

[Figure: after the first two references, set 0 holds tag 000 with Mem(0) in one way and tag 010 with Mem(4) in the other, so every later reference hits]

• 8 requests, 2 misses
• Solves the ping-pong effect in a direct-mapped cache due to conflict misses since now two memory locations that map into the same cache set can co-exist!
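A sketch of the same experiment for a set-associative cache is shown below. It assumes one-word blocks and LRU replacement within each set; the function name `simulate_set_associative` and its parameters are illustrative.

```python
def simulate_set_associative(refs, num_sets=2, ways=2):
    """Return 'hit'/'miss' per reference for an LRU set-associative cache."""
    # Each set is a list of tags ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    results = []
    for word in refs:
        index = word % num_sets
        tag = word // num_sets
        lines = sets[index]
        if tag in lines:
            results.append("hit")
            lines.remove(tag)   # re-appended below as most recently used
        else:
            results.append("miss")
            if len(lines) == ways:
                lines.pop(0)    # evict the least recently used tag
        lines.append(tag)
    return results


refs = [0, 4, 0, 4, 0, 4, 0, 4]
print(simulate_set_associative(refs, num_sets=2, ways=2).count("miss"))  # 2-way SA
print(simulate_set_associative(refs, num_sets=4, ways=1).count("miss"))  # direct mapped
```

Setting `ways=1` with four sets turns the same four-word cache back into the direct-mapped organization, which makes the two designs easy to compare on one reference string.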


Alternative Organizations of an Eight-Block Cache

Total size of $ in blocks is equal to number of sets × associativity. For fixed $ size and fixed block size, increasing associativity decreases number of sets while increasing number of elements per set. With eight blocks, an 8-way set-associative $ is same as a fully associative $.

Range of Set-Associative Caches

• For a fixed-size cache and fixed block size, each increase by a factor of two in associativity doubles the number of blocks per set (i.e., the number of ways) and halves the number of sets – decreases the size of the index by 1 bit and increases the size of the tag by 1 bit

Address fields: Tag | Index | Word offset | Byte offset
– Tag: used for tag compare
– Index: selects the set
– Word offset: selects the word in the block

• Decreasing associativity, lower way, more sets
– Direct mapped (only one way): smaller tags, only a single comparator
• Increasing associativity, higher way, fewer sets
– Fully associative (only one set): Tag is all the bits except block and byte offset

Total Cache Capacity = Associativity × # of sets × block_size
Bytes = blocks/set × sets × Bytes/block

C = N × S × B

Address fields: Tag | Index | Byte Offset

address_size = tag_size + index_size + offset_size = tag_size + log2(S) + log2(B)

• Double the Associativity: number of sets? tag_size? index_size? # comparators?
– Halve the number of sets; tag_size + 1 while index_size – 1; 2× comparators
• Double the Sets: associativity? tag_size? index_size? # comparators?
– Halve the associativity; tag_size – 1 while index_size + 1; ½× comparators
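The capacity and address-size relations can be turned into a small calculator. This is a sketch that assumes power-of-two sizes and a 32-bit address; the function name `field_sizes` and the dictionary keys are made up for illustration.

```python
def field_sizes(capacity_bytes: int, ways: int, block_bytes: int, addr_bits: int = 32):
    """Derive S, index size, tag size, and comparator count from C = N * S * B."""
    sets = capacity_bytes // (ways * block_bytes)  # S = C / (N * B)
    index_bits = sets.bit_length() - 1             # log2(S) for a power of two
    offset_bits = block_bytes.bit_length() - 1     # log2(B) for a power of two
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "comparators": ways,                       # one comparator per way
    }


# 4 KB cache with one-word (4-byte) blocks at increasing associativity.
for n in (1, 2, 4):
    print(n, field_sizes(4096, ways=n, block_bytes=4))
```

Each doubling of `ways` halves `sets`, moves one bit from the index to the tag, and doubles the comparator count, as stated in the answers above.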

Your Turn

• For a cache of 64 blocks, each block four bytes in size:

1. The capacity of the cache is: 256 bytes.
2. Given a 2-way Set Associative organization, there are 32 sets, each of 2 blocks, and 2 places a block from memory could be placed.
3. Given a 4-way Set Associative organization, there are 16 sets each of 4 blocks and 4 places a block from memory could be placed.
4. Given an 8-way Set Associative organization, there are 8 sets each of 8 blocks and 8 places a block from memory could be placed.

Clicker/Peer Instruction

• For S sets, N ways, B blocks, which statements hold?

(i) The cache has B tags
(ii) The cache needs N comparators
(iii) B = N x S
(iv) Size of Index = Log2(S)

A: (i) only
B: (i) and (ii) only
C: (i), (ii), (iii) only
D: All four statements are true
E: None are true

Costs of Set-Associative Caches

• N-way set-associative cache costs
– N comparators (delay and area)
– MUX delay (set selection) before data is available
– Data available after set selection (and Hit/Miss decision). DM $: block is available before the Hit/Miss decision
• In Set-Associative, not possible to just assume a hit and continue and recover later if it was a miss

• When miss occurs, which way's block selected for replacement?
– Least Recently Used (LRU): one that has been unused the longest (principle of temporal locality)
• Must track when each way's block was used relative to other blocks in the set
• For 2-way SA $, one bit per set → set to 1 when a block is referenced; reset the other way's bit (i.e., "last used")

Cache Replacement Policies

• Random Replacement
– Hardware randomly selects a cache entry to evict
• Least-Recently Used
– Hardware keeps track of access history
– Replace the entry that has not been used for the longest time
– For 2-way set-associative cache, need one bit for LRU replacement
• Example of a Simple "Pseudo" LRU Implementation
– Assume 64 Fully Associative entries
– Hardware replacement pointer points to one cache entry
– Whenever access is made to the entry the pointer points to:
• Move the pointer to the next entry
– Otherwise: do not move the pointer
– (example of "not-most-recently used" replacement policy)

[Figure: Entry 0, Entry 1, ..., Entry 63, with the Replacement Pointer pointing at one entry]
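The replacement-pointer scheme can be sketched as follows. The class name `ReplacementPointer` is hypothetical, and wrapping from the last entry back to Entry 0 is an assumption; the slide only says the pointer moves to the next entry.

```python
class ReplacementPointer:
    """Not-most-recently-used replacement for a fully associative cache."""

    def __init__(self, num_entries: int = 64):
        self.num_entries = num_entries
        self.pointer = 0  # index of the entry that would be replaced next

    def access(self, entry: int) -> None:
        # Only an access to the pointed-to entry moves the pointer, so the
        # pointer never rests on the entry that was used most recently.
        if entry == self.pointer:
            # Assumption: wrap around after the last entry.
            self.pointer = (self.pointer + 1) % self.num_entries

    def victim(self) -> int:
        """Entry to evict on the next miss."""
        return self.pointer


p = ReplacementPointer(num_entries=64)
p.access(5)        # pointer stays at entry 0
p.access(0)        # pointer moves to entry 1
print(p.victim())
```

This needs only a single pointer for the whole cache instead of per-entry usage history, at the price of approximating true LRU.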

Benefits of Set-Associative Caches

• Choice of DM $ versus SA $ depends on the cost of a miss versus the cost of implementation
• Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)


And in Conclusion …

• Name of the Game: Reduce AMAT
– Reduce Hit Time
– Reduce Miss Rate
– Reduce Miss Penalty
• Balance cache parameters (Capacity, associativity, block size)