L15 Caches2 - University of California, Berkeleycs61c/fa17/lec/15/L15... · 2017. 10. 17. · –4...
Transcript of L15 Caches2 - University of California, Berkeleycs61c/fa17/lec/15/L15... · 2017. 10. 17. · –4...
-
10/16/17
1
CS61C:GreatIdeasinComputerArchitecture(MachineStructures)
CachesPart2Instructors:
Krste Asanović &RandyH.Katzhttp://inst.eecs.berkeley.edu/~cs61c/
110/16/17 Fall2017 - Lecture#15
Outline
• CacheOrganizationandPrinciples• WriteBackvs.WriteThrough• CachePerformance• CacheDesignTradeoffs• AndinConclusion…
210/16/17 Fall2017 – Lecture#15
Outline
• CacheOrganizationandPrinciples• WriteBackvs.WriteThrough• CachePerformance• CacheDesignTradeoffs• AndinConclusion…
310/16/17 Fall2017 – Lecture#15
Second-LevelCache(SRAM)
TypicalMemoryHierarchy
Control
Datapath
SecondaryMemory(Disk
OrFlash)
On-ChipComponents
RegFile
MainMemory(DRAM)Data
CacheInstrCache
Speed(cycles):½’s1’s10’s100’s-10001,000,000’s
Size(bytes): 100’s 10K’sM’sG’sT’s
• Principleoflocality+memoryhierarchypresentsprogrammerwith≈asmuchmemoryasisavailableinthecheapest technologyatthe≈speedofferedbythefastest technology
Cost/bit:highestlowest
Third-LevelCache(SRAM)
10/16/17 Fall2017 - Lecture#15 4
Processor
Control
Datapath
AddingCachetoComputer
PC
Registers
Arithmetic&LogicUnit(ALU)
MemoryInput
Output
Bytes
Enable?Read/Write
Address
WriteData
ReadData
Processor-MemoryInterface I/O-MemoryInterfaces
Program
Data
Cache
10/16/17 Fall2017 - Lecture#15 5
Processororganizedaroundwordsand bytes
Memory(includingcache)organizedaroundblocks,
whicharetypicallymultiplewords
KeyCacheConcepts
• PrincipleofLocality– TemporalLocalityandSpatialLocality
• HierarchyofMemories (speed/size/costperbit)toexploitlocality
• Cache– copyofdatainlowerlevelofmemoryhierarchy• DirectMappedtofindblockincacheusingTagfieldandValid
bitforHit• CacheDesignOrganizationChoices:– FullyAssociative,Set-Associative,Direct-Mapped
610/16/17 Fall2017 - Lecture#15
-
10/16/17
2
CacheOrganizations• “FullyAssociative”:Blockplacedanywhereincache– Firstdesignlastlecture– Note:NoIndexfield,butonecomparator/block
• “DirectMapped”:Blockgoesonlyoneplaceincache– Note:Onlyonecomparator– Numberofsets=numberblocks
• “N-waySetAssociative”:Nplacesforblockincache– Numberofsets=NumberofBlocks/N– Ncomparators– FullyAssociative:N=numberofblocks– DirectMapped:N=1
710/16/17 Fall2017 - Lecture#15
0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111
0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111
0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111
8 88Byte
Word8-Byte Block
address address address
2 LSBs are 0 3 LSBs are 0
0
1
2
3
01234567012345670123456701234567
Byte offset in blockBlock #
MemoryBlockvs.WordAddressing
810/16/17 Fall2017 - Lecture#15
010100100000
010100110000
010101000000
010101010000
010101100000
010101110000
010110000000
010110010000
010110100000
010110110000
010100100000
010100110000
010101000000
010101010000
010101100000
010101110000
010110000000
010110010000
010110100000
010110110000
82
83
84
85
86
87
88
89
90
91
2
3
4
5
6
7
0
1
2
3
0
1
0
1
0
1
0
1
0
1
010100100000
010100110000
010101000000
010101010000
010101100000
010101110000
010110000000
010110010000
010110100000
010110110000
MemoryBlockNumberAliasing
Block# Block#mod8 Block#mod2
12-bitmemoryaddresses,16Byteblocks
10/16/17 Fall2017 - Lecture#15 9
ProcessorAddressFieldsUsedbyCacheController
• BlockOffset:Byteaddresswithinblock• SetIndex:Selectswhichset• Tag:Remainingportionofprocessoraddress
• SizeofIndex=log2(numberofsets)• SizeofTag=Addresssize– SizeofIndex
– log2(numberofbytes/block)
BlockoffsetSetIndexTag
ProcessorAddress(32-bitstotal)
10/16/17 Fall2017 - Lecture#15 10
WhatLimitsNumberofSets?
• Foragiventotalnumberofblocks,wesavecomparatorsifhavemorethantwosets
• Limit:AsManySetsasCacheBlocks=>onlyoneblockperset–onlyneedsonecomparator!
• Called“Direct-Mapped”Design
11
BlockoffsetIndexTag
10/16/17 Fall2017 - Lecture#15
DirectMappedCacheExample:Mappinga6-bitMemoryAddress
• Inexample,blocksizeis4bytes/1word• Memoryandcacheblocksalwaysthesamesize,unitoftransferbetweenmemoryandcache• #Memoryblocks>>#Cacheblocks
– 16Memoryblocks=16words=64bytes=>6bitstoaddressallbytes– 4Cacheblocks,4bytes(1word)perblock– 4Memoryblocksmaptoeachcacheblock
• Memoryblocktocacheblock,akaindex:middletwobits• Whichmemoryblockisinagivencacheblock,akatag:toptwobits
12
05 1
ByteWithinBlock
ByteOffset
23
BlockWithin$
4
Mem BlockWithin$Block
Tag Index
10/16/17 Fall2017 - Lecture#15
-
10/16/17
3
OneMoreDetail:ValidBit
• Whenstartanewprogram,cachedoesnothavevalidinformationforthisprogram
• Needanindicatorwhetherthistagentryisvalidforthisprogram
• Adda“validbit”tothecachetagentry0=>cachemiss,evenifbychance,address=tag1=>cachehit,ifprocessoraddress=tag
10/16/17 Fall2017 - Lecture#15 13
CacheOrganization:SimpleFirstExample
00011011
Cache
MainMemory
Q:Whereinthecacheisthemem block?
Usenext2low-ordermemoryaddressbits– theindex– todeterminewhichcacheblock(i.e.,modulothenumberofblocksinthecache)
Tag Data
Q:Isthememoryblockincache?Comparethecachetagtothehigh-order2memoryaddressbitstotellifthememoryblockisinthecache(providedvalidbitisset)
Valid
0000xx0001xx0010xx0011xx0100xx0101xx0110xx0111xx1000xx1001xx1010xx1011xx1100xx1101xx1110xx1111xx
OnewordblocksTwoloworderbits(xx)definethebyteintheblock(32bwords)Index
10/16/17 Fall2017 - Lecture#15 14
Example:Alternativesinan8BlockCache• DirectMapped:8blocks,1way,1tagcomparator,8sets• FullyAssociative:8blocks,8ways,8tagcomparators,1set• 2WaySetAssociative:8blocks,2ways,2tagcomparators,4sets• 4WaySetAssociative:8blocks,4ways,4tagcomparators,2sets
1510/16/17 Fall2017 - Lecture#15 15
0
1
2
3
DM:8sets1way
4
5
6
7
0
1
2
3
FA:1set8ways
4
5
6
7
0
1
2
3
2WaySA:4sets Set0
Set1
Set2
Set3
4
5
6
7
0
1
2
3
4WaySA:2sets
Set0
Set1
4
5
6
7
• Onewordblocks,cachesize=1Kwords(or4KB)
Direct-MappedCache
20Tag 10Index
DataIndex TagValid012...
102110221023
3130...131211...210Byteoffset
20
Data
32
HitValidbitensures
somethingusefulincacheforthisindex
CompareTagwithupperpartofAddresstoseeifa
Hit
Readdatafromcache
insteadofmemoryif
aHit
Comparator
10/16/17 Fall2017 - Lecture#15 16
PeerInstruction
• Foracachewithconstanttotalcapacity, ifweincreasethenumberofwaysbyafactoroftwo,whichstatementisfalse:A:ThenumberofsetscouldbedoubledB:ThetagwidthcoulddecreaseC:Theblocksizecouldstaythesame:Theblocksizecouldbehalved
1710/16/17 Fall2017 - Lecture#15
Break!
1810/16/17 Fall2017 - Lecture#15
-
10/16/17
4
Outline
• CacheOrganizationandPrinciples• WriteBackvs.WriteThrough• CachePerformance• CacheDesignTradeoffs• AndinConclusion…
1910/16/17 Fall2017 – Lecture#15
HandlingStoreswithWrite-Through
• Storeinstructionswritetomemory,changingvalues• Needtomakesurecacheandmemoryhavesamevaluesonwrites:twopolicies
1)Write-ThroughPolicy:writecacheandwritethroughthecachetomemory– Everywriteeventuallygetstomemory– Tooslow,soincludeWriteBuffertoallowprocessortocontinueoncedatainBuffer
– Bufferupdatesmemoryinparalleltoprocessor
10/16/17 Fall2017 - Lecture#15 20
Write-ThroughCache
• Writebothvaluesincacheandinmemory
• WritebufferstopsCPUfromstallingifmemorycannotkeepup
• Writebuffermayhavemultipleentriestoabsorbburstsofwrites
• Whatifstoremissesincache?
Processor
32-bitAddress
32-bitData
Cache
32-bitAddress
32-bitData
Memory
1022 99252
720
12
1312041 Addr Data
WriteBuffer
10/16/17 Fall2017 - Lecture#15 21
HandlingStoreswithWrite-Back
2)Write-BackPolicy:writeonlytocacheandthenwritecacheblockbacktomemorywhenevictblockfromcache–Writescollectedincache,onlysinglewritetomemoryperblock– Includebittoseeifwrotetoblockornot,andthenonlywritebackifbitisset• Called“Dirty”bit(writingmakesit“dirty”)
10/16/17 Fall2017 - Lecture#15 22
Write-BackCache
• Store/cachehit,writedataincacheonlyandsetdirtybit– Memoryhasstalevalue
• Store/cachemiss,readdatafrommemory,thenupdateandsetdirtybit– “Write-allocate”policy
• Load/cachehit,usevaluefromcache
• Onanymiss,writebackevictedblock,onlyifdirty.Updatecachewithnewblockandcleardirtybit
Processor
32-bitAddress
32-bitData
Cache
32-bitAddress
32-bitData
Memory
1022 99252
720
12
1312041
DDDD
DirtyBits
10/16/17 Fall2017 - Lecture#15 23
Write-Throughvs.Write-Back
• Write-Through:– Simplercontrollogic– Morepredictabletimingsimplifiesprocessorcontrollogic
– Easiertomakereliable,sincememoryalwayshascopyofdata(bigidea:Redundancy!)
• Write-Back– Morecomplexcontrollogic– Morevariabletiming(0,1,2memoryaccessespercacheaccess)
– Usuallyreduceswritetraffic– Hardertomakereliable,sometimescachehasonlycopyofdata
10/16/17 Fall2017 - Lecture#15 24
-
10/16/17
5
Administrivia• Midterm#22weeksaway!October31!
– Inclass!8-9:30AM– SynchronousdigitaldesignandProject2(processordesign)included– PipelinesandCaches– ONEDoublesidedCribsheet– ReviewSession:Saturday,Oct28(LocationTBA)
• 5-10opendrop-inseatsforthesetutoringsessions:• M 1-2Soda611• Th3-4Soda380• F 5-6Soda651
• GuerrillaSessiontonight7-9pminCory293• Project2-1Partytomorrow7-9pmCory293• IfyouwouldliketochangeyourpartnershipforProject2,emailyourlabTA
– WewillsendoutaGoogleformtotrackallProject2partnerships
2510/16/17 Fall2017 - Lecture#15
Outline
• CacheOrganizationandPrinciples• WriteBackvs.WriteThrough• CachePerformance• CacheDesignTradeoffs• AndinConclusion…
10/16/17 Fall2017 – Lecture#15 26
Cache(Performance) Terms
• Hitrate:fractionofaccessesthathitinthecache• Missrate:1– Hitrate• Misspenalty:timetoreplaceablockfromlowerlevelinmemoryhierarchytocache
• Hittime:timetoaccesscachememory(includingtagcomparison)
• Abbreviation:“$”=cache(aBerkeleyinnovation!)10/16/17 Fall2017 - Lecture#15 27
AverageMemoryAccessTime(AMAT)• AverageMemoryAccessTime(AMAT)istheaveragetimetoaccessmemoryconsideringbothhitsandmissesinthecacheAMAT=Timeforahit+Missrate× Misspenalty
10/16/17 Fall2017 - Lecture#15 28
PeerInstruction
AMAT=Timeforahit+MissratexMisspenalty• Givena200psec clock,amisspenaltyof50clockcycles,amissrateof0.02missesperinstructionandacachehittimeof1clockcycle,whatisAMAT?A:≤200psecB:400psecC:600psec: 800psec
2910/16/17 Fall2017 - Lecture#15
PingPongCacheExample:Direct-MappedCachew/4Single-WordBlocks,Worst-CaseReferenceString
0 4 0 4
0 4 0 4
• Considerthemainmemoryaddressreferencestringofwordnumbers:04040404
Startwithanemptycache- allblocksinitiallymarkedasnotvalid
10/16/17 Fall2017 - Lecture#15 30
-
10/16/17
6
0 4 0 4
0 4 0 4
miss miss miss miss
miss miss miss miss
00Mem(0) 00Mem(0)01 4
01Mem(4)000
00Mem(0)01 4
00Mem(0)01 4
00Mem(0)01 4
01Mem(4)000
01Mem(4)000
Startwithanemptycache- allblocksinitiallymarkedasnotvalid
Ping-pong effectduetoconflictmisses- twomemorylocationsthatmapintothesamecacheblock
• 8requests,8misses
• Considerthemainmemoryaddressreferencestringofwordnumbers:04040404
10/16/17 Fall2017 - Lecture#15 31
PingPongCacheExample:Direct-MappedCachew/4Single-WordBlocks,Worst-CaseReferenceString Outline
• CacheOrganizationandPrinciples• WriteBackvs.WriteThrough• CachePerformance• CacheDesignTradeoffs• AndinConclusion…
3210/16/17 Fall2017 – Lecture#15
Example:2-WaySetAssociative$(4words=2setsx2waysperset)
0
Cache
MainMemory
Q:Howdowefindit?
Usenext1lowordermemoryaddressbittodeterminewhichcacheset(i.e.,modulothenumberofsetsinthecache)
Tag Data
Q:Isitthere?
Compareall thecachetagsinthesettothehighorder3memoryaddressbits totellifthememoryblockisinthecache
V
0000xx0001xx0010xx0011xx0100xx0101xx0110xx0111xx1000xx1001xx1010xx1011xx1100xx1101xx1110xx1111xx
Set
1
01
Way
0
1
OnewordblocksTwoloworderbitsdefinethebyteintheword(32bwords)
10/16/17 Fall2017 - Lecture#15 33
PingPongCacheExample:4Word2-WaySA$,SameReferenceString
0 4 0 4
• Considerthemainmemorywordreferencestring04040404Startwithanemptycache- allblocks
initiallymarkedasnotvalid
10/16/17 Fall2017 - Lecture#15 34
PingPongCacheExample:4-Word2-WaySA$,SameReferenceString
0 4 0 4
• Considerthemainmemoryaddressreferencestring04040404
miss miss hit hit
000Mem(0) 000Mem(0)
Startwithanemptycache- allblocksinitiallymarkedasnotvalid
010Mem(4) 010Mem(4)
000Mem(0) 000Mem(0)
010Mem(4)
• Solvestheping-pong effectinadirect-mappedcacheduetoconflictmissessincenowtwomemorylocationsthatmapintothesamecachesetcanco-exist!
• 8requests,2misses
10/16/17 Fall2017 - Lecture#15 35
Four-WaySet-AssociativeCache• 28 =256setseachwithfourways(eachwithoneblock)
3130...131211...210 Byteoffset
DataTagV012...
253254255
DataTagV012...
253254255
DataTagV012...
253254255
Index DataTagV012...
253254255
8Index
22Tag
Hit Data
32
4x1select
Way0 Way1 Way2 Way3
10/16/17 Fall2017 - Lecture#15 36
-
10/16/17
7
Break!
3710/16/17 Fall2017 - Lecture#15
RangeofSet-AssociativeCaches• Forafixed-sizecacheandfixedblocksize,eachincreasebyafactoroftwoinassociativitydoublesthenumberofblocksperset(i.e.,thenumberorways)andhalvesthenumberofsets– decreasesthesizeoftheindexby1bitandincreasesthesizeofthetagby1bit
Wordoffset ByteoffsetIndexTag
10/16/17 Fall2017 - Lecture#15 38
RangeofSet-AssociativeCaches• Forafixed-sizecacheandfixedblocksize,eachincreasebyafactoroftwoinassociativitydoublesthenumberofblocksperset(i.e.,thenumberorways)andhalvesthenumberofsets– decreasesthesizeoftheindexby1bitandincreasesthesizeofthetagby1bit
Wordoffset ByteoffsetIndexTag
Decreasingassociativity,lowerway,moresets
Fullyassociative(onlyoneset)Tagisallthebitsexceptblockandbyteoffset
Directmapped(onlyoneway)Smallertags,onlyasinglecomparator
Increasingassociativity,higherway,lesssets
SelectsthesetUsedfortagcompare Selectsthewordintheblock
10/16/17 Fall2017 - Lecture#15 39
TotalCacheCapacity=Associativity× #ofsets× block_sizeBytes=blocks/set× sets× Bytes/block
ByteOffsetTag Index
C=N× S× B
address_size =tag_size +index_size +offset_size=tag_size +log2(S)+log2(B)
10/16/17 Fall2017 - Lecture#15 40
TotalCacheCapacity=
41
Associativity*#ofsets*block_sizeBytes=blocks/set*sets*Bytes/block
ByteOffsetTag Index
C=N*S*B
address_size =tag_size +index_size +offset_size=tag_size +log2(S)+log2(B)
DoubletheAssociativity:Numberofsets?tag_size?index_size?#comparators?
DoubletheSets:Associativity?tag_size?index_size?#comparators?
10/16/17 Fall2017 - Lecture#15
YourTurn• Foracacheof64blocks,eachblockfourbytesinsize:1. Thecapacityofthecacheis:____ bytes.2. Givena2-waySetAssociativeorganization,thereare___ sets,eachof__
blocks,and__ placesablockfrommemorycouldbeplaced.3. Givena4-waySetAssociativeorganization,thereare____ setseachof__
blocksand__ placesablockfrommemorycouldbeplaced.4. Givenan8-waySetAssociativeorganization,thereare____ setseachof__
blocksand___ placesablockfrommemorycouldbeplaced.
10/16/17 Fall2017 - Lecture#15 42
-
10/16/17
8
PeerInstruction• ForSsets,Nways,Bblocks,whichstatementshold?
(i)ThecachehasBtags(ii)ThecacheneedsNcomparators(iii)B=NxS(iv)SizeofIndex=Log2(S)
A:(i)onlyB:(i)and(ii)onlyC:(i),(ii),(iii)only:Allfourstatementsaretrue
10/16/17 Fall2017 - Lecture#15 43
PeerInstruction• ForSsets,Nways,Bblocks,whichstatementshold?
(i)ThecachehasBtags(ii)ThecacheneedsNcomparators(iii)B=NxS(iv)SizeofIndex=Log2(S)
A:(i)onlyB:(i)and(ii)onlyC:(i),(ii),(iii)only:Allfourstatementsaretrue
10/16/17 Fall2017 - Lecture#15 44
CostsofSet-AssociativeCaches• N-wayset-associativecachecosts– Ncomparators(delayandarea)–MUXdelay(setselection)beforedataisavailable– Dataavailableaftersetselection(andHit/Missdecision).DM$:blockisavailablebeforetheHit/Missdecision• InSet-Associative,notpossibletojustassumeahitandcontinueandrecoverlaterifitwasamiss
• Whenmissoccurs,whichway’sblockselectedforreplacement?– LeastRecentlyUsed(LRU):onethathasbeenunusedthelongest(principleoftemporallocality)• Musttrackwheneachway’sblockwasusedrelativetootherblocksintheset• For2-waySA$,onebitperset→setto1whenablockisreferenced;resettheotherway’sbit(i.e.,“lastused”)
10/16/17 Fall2017 - Lecture#15 45
CacheReplacementPolicies• RandomReplacement
– Hardwarerandomlyselectsacacheevict• Least-RecentlyUsed
– Hardwarekeepstrackofaccesshistory– Replacetheentrythathasnotbeenusedforthelongesttime– For2-wayset-associativecache,needonebitforLRUreplacement
• ExampleofaSimple“Pseudo”LRUImplementation– Assume64FullyAssociativeentries– Hardwarereplacementpointerpointstoonecacheentry– Wheneveraccessismadetotheentrythepointerpointsto:
• Movethepointertothenextentry– Otherwise:donotmovethepointer– (exampleof“not-most-recentlyused”replacementpolicy)
46
:
Entry0Entry1
Entry63
ReplacementPointer
10/16/17 Fall2017 - Lecture#15
BenefitsofSet-AssociativeCaches• ChoiceofDM$versusSA$dependsonthecostofamissversusthecostof
implementation
• Largestgainsareingoingfromdirectmappedto2-way(20%+reductioninmissrate)
10/16/17 Fall2017 - Lecture#15 47
Outline
• CacheOrganizationandPrinciples• WriteBackvs.WriteThrough• CachePerformance• CacheDesignTradeoffs• AndinConclusion…
4810/16/17 Fall2017 – Lecture#15
-
10/16/17
9
ChipPhotos
4910/16/17 Fall2017- Lecture#15
And inConclusion…
• NameoftheGame:ReduceAMAT–ReduceHitTime–ReduceMissRate–ReduceMissPenalty
• Balancecacheparameters(Capacity,associativity,blocksize)
10/16/17 Fall2017 - Lecture#15 50