CS 230: Computer Organization and Assembly Language
CMLCML
CS 230: Computer Organization and Assembly Language
Aviral Shrivastava
Department of Computer Science and Engineering
School of Computing and Informatics
Arizona State University
Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB
Announcements
• This Lecture: Caches
• Next Lecture: More Caches, Virtual Memory
• Finals
– Tuesday, Dec 08, 2009
– Please come on time (you’ll need all the time)
– Open book, notes, and internet
– No communication with any other human
Time, Time, Time
• Making a single-cycle implementation is very easy
– The difficulty and excitement are in making it fast
• Two fundamental methods to make computers fast
– Pipelining
– Caches
[Figure: single-cycle datapath with PC, Instruction Memory, Register File, ALU, and Data Memory]
Kinds of Memory
Level          Capacity     Access time   Cost
CPU Registers  100s Bytes   <10s ns
SRAM           K Bytes      10-20 ns      $.00003/bit
DRAM           M Bytes      50-100 ns     $.00001/bit
Disk           G Bytes      ms            10^-6 cents
Tape           infinite     sec-min
[Figure: hierarchy of Flipflops, SRAM, DRAM, Disk, Tape; faster toward Flipflops, larger toward Tape]
Memory Hierarchy: Insights
• Temporal Locality (Locality in Time):
=> Keep the most recently accessed data items closer to the processor
• Spatial Locality (Locality in Space):
=> Move blocks consisting of contiguous words to the upper levels
[Figure: Upper Level Memory (Blk X) and Lower Level Memory (Blk Y), with data paths to and from the processor]
Memory Hierarchy: Terminology
• Hit: data appears in some block in the upper level (Block X)
– Hit Rate: fraction of memory accesses found in the upper level
– Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level (Block Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
– Hit Time << Miss Penalty
[Figure: Upper Level Memory (Blk X) and Lower Level Memory (Blk Y), with data paths to and from the processor]
Memory Hierarchy: Show me numbers
• Consider an application
− 30% of instructions are loads/stores
− Suppose memory latency = 100 cycles
− Time to execute 100 instructions = 70*1 + 30*100 = 3070 cycles
• Add a cache with a latency of 2 cycles
− Suppose the hit rate is 90%
− Time to execute 100 instructions = 70*1 + 27*2 + 3*100 = 70+54+300 = 424 cycles
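The arithmetic on this slide can be reproduced with a short sketch. The function name `total_cycles` and its parameters are illustrative and do not come from the slides. It assumes, as the slide does, that every non-memory instruction takes 1 cycle and that a miss costs only the memory latency.

```python
def total_cycles(n_instr, n_mem, mem_latency, cache_latency=None, hit_percent=0):
    """Cycles to run n_instr instructions, n_mem of which are loads/stores.

    Assumptions (taken from the slide's own arithmetic): non-memory
    instructions cost 1 cycle, a cache hit costs cache_latency cycles,
    and a miss costs mem_latency cycles.
    """
    non_mem = n_instr - n_mem
    if cache_latency is None:
        # No cache: every load/store pays the full memory latency.
        return non_mem * 1 + n_mem * mem_latency
    hits = n_mem * hit_percent // 100
    misses = n_mem - hits
    return non_mem * 1 + hits * cache_latency + misses * mem_latency


no_cache = total_cycles(100, 30, 100)
with_cache = total_cycles(100, 30, 100, cache_latency=2, hit_percent=90)
print(no_cache, with_cache)  # 3070 424
```

With these numbers the cache cuts execution time by roughly a factor of seven, even though only 30% of instructions touch memory.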
Direct-Mapped Cache (1/2)
• In a direct-mapped cache, each memory address is associated with exactly one possible block within the cache
– Therefore, we only need to look in a single location in the cache to see whether the data is there
– The block is the unit of transfer between cache and memory
Direct-Mapped Cache (2/2)
Cache location 0 can be occupied by data from:
– Memory locations 0, 4, 8, ...
– 4 blocks => any memory location that is a multiple of 4
[Figure: memory with addresses 0-F mapped onto a 4-byte direct-mapped cache with cache indices 0-3]
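The mapping in this figure is address modulo the number of cache blocks. A minimal sketch, assuming the 4-entry cache with 1-byte blocks shown on the slide (the names `CACHE_BLOCKS` and `cache_index` are illustrative):

```python
CACHE_BLOCKS = 4  # the 4-byte direct-mapped cache from the slide, 1-byte blocks


def cache_index(address):
    """Return the single cache location a memory address can occupy."""
    return address % CACHE_BLOCKS


# Group memory addresses 0x0..0xF by the cache index they map to.
by_index = {
    i: [a for a in range(16) if cache_index(a) == i]
    for i in range(CACHE_BLOCKS)
}
print(by_index[0])  # [0, 4, 8, 12]
```

Every cache index is shared by four of the sixteen memory addresses, which is why a tag is needed to tell them apart.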
Addressing in Direct-Mapped Cache
• Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
• What if we have a block size > 1 byte?
• Answer: divide the memory address into three fields
ttttttttttttttttt iiiiiiiiii oooo
– tag: to check if we have the correct block
– index: to select the block
– byte offset: within the block
Direct-Mapped Cache Terminology
• All fields are read as unsigned integers.
• Index: specifies the cache index (which “row” of the cache we should look in)
• Offset: once we’ve found correct block, specifies which byte within the block we want
• Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location
Direct-Mapped Cache Example (1/3)
• Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks
• Determine the size of the tag, index and offset fields if we’re using a 32-bit architecture
• Offset
– need to specify the correct byte within a block
– block contains 4 words = 16 bytes = 2^4 bytes
– need 4 bits to specify the correct byte
Direct-Mapped Cache Example (2/3)
• Index: (~index into an “array of blocks”)
– need to specify the correct row in the cache
– cache contains 16 KB = 2^14 bytes
– block contains 2^4 bytes (4 words)
– # blocks/cache = (bytes/cache) / (bytes/block) = (2^14 bytes/cache) / (2^4 bytes/block) = 2^10 blocks/cache
– need 10 bits to specify this many rows
Direct-Mapped Cache Example (3/3)
• Tag: use remaining bits as tag
– tag length = addr length - offset - index = 32 - 4 - 10 bits = 18 bits
– so the tag is the leftmost 18 bits of the memory address
• Why not use the full 32-bit address as the tag?
– All bytes within a block share the same block address, so the offset (4 bits) need not be stored
– The index must be the same for every address within a block, so it’s redundant in the tag check and can be left off to save memory (here 10 bits)
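The three-slide calculation above can be checked with a small sketch. The helper `field_widths` is a hypothetical name, and it assumes that the cache size and block size are both powers of two.

```python
def field_widths(cache_bytes, block_bytes, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two, so that
    log2(n) can be computed as n.bit_length() - 1.
    """
    offset_bits = block_bytes.bit_length() - 1      # bytes within a block
    num_blocks = cache_bytes // block_bytes         # rows in the cache
    index_bits = num_blocks.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits  # whatever is left over
    return tag_bits, index_bits, offset_bits


# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses.
print(field_widths(16 * 1024, 16))  # (18, 10, 4)
```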
TIO
AREA (cache size, B) = HEIGHT (# of blocks) * WIDTH (size of one block, B/block)
2^(H+W) = 2^H * 2^W
[Figure: cache drawn as a rectangle with HEIGHT (# of blocks), WIDTH (size of one block, B/block), and AREA (cache size, B), above an address split into Tag, Index, Offset]
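As a quick check of the area relation, here is a sketch using the 16 KB example. It reads H as the number of index bits and W as the number of offset bits, which is how the slide appears to use them; the variable names are illustrative.

```python
# Values from the 16 KB, 16-byte-block example; names are hypothetical.
index_bits = 10   # H: log2(# of blocks)
offset_bits = 4   # W: log2(bytes per block)

height = 2 ** index_bits    # number of blocks (rows)
width = 2 ** offset_bits    # bytes per block
area = height * width       # cache size in bytes

# 2^(H+W) = 2^H * 2^W
assert area == 2 ** (index_bits + offset_bits)
print(area)  # 16384
```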
Accessing data in a direct mapped cache
• Ex.: 16KB of data, direct-mapped, 4 word blocks
• Read 4 addresses
1. 0x00000014
2. 0x0000001C
3. 0x00000034
4. 0x00008014
• Memory values (table below):
– only cache/memory level of hierarchy
Memory
Address (hex)  Value of Word
00000010       a
00000014       b
00000018       c
0000001C       d
...            ...
00000030       e
00000034       f
00000038       g
0000003C       h
...            ...
00008010       i
00008014       j
00008018       k
0000801C       l
...            ...
Accessing data in a direct mapped cache
• 4 Addresses:
– 0x00000014, 0x0000001C, 0x00000034, 0x00008014
• 4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields
Tag                 Index       Offset
000000000000000000  0000000001  0100
000000000000000000  0000000001  1100
000000000000000000  0000000011  0100
000000000000000010  0000000001  0100
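The split shown above is a matter of shifting and masking. A minimal sketch, assuming the 18/10/4 layout derived earlier (the function name `split_address` is illustrative):

```python
OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 10   # 1024 rows


def split_address(addr):
    """Split a 32-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for addr in (0x00000014, 0x0000001C, 0x00000034, 0x00008014):
    tag, index, offset = split_address(addr)
    # Print each field in binary at its full width, as on the slide.
    print(f"{tag:018b} {index:010b} {offset:04b}")
```

Note that the first and last addresses share index 1 and differ only in the tag, so they compete for the same cache row.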
16 KB Direct Mapped Cache, 16B blocks
• Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid)
[Figure: cache table with rows Index 0 to 1023 and columns Valid, Tag, 0x0-3, 0x4-7, 0x8-b, 0xc-f; every Valid bit is 0]
1. Read 0x00000014
• Tag field 000000000000000000, Index field 0000000001, Offset 0100
So we read block 1 (0000000001)
No valid data
Load data into cache, setting tag & valid
Read from cache at offset, return word b
Cache state after this read (all other rows still invalid):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      0    a      b      c      d
2. Read 0x0000001C = 0…00 0..001 1100
• Tag field 000000000000000000, Index field 0000000001, Offset 1100
Index is Valid
Index valid, Tag Matches
Index Valid, Tag Matches, return d
Cache state (unchanged by this hit):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      0    a      b      c      d
3. Read 0x00000034 = 0…00 0..011 0100
• Tag field 000000000000000000, Index field 0000000011, Offset 0100
So read block 3
No valid data
Load that cache block, return word f
Cache state after this read (all other rows still invalid):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      0    a      b      c      d
3      1      0    e      f      g      h
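The reads walked through above can be replayed with a small simulation. This is a sketch under stated assumptions, not the course's reference code: the class `DirectMappedCache` and the `MEMORY` dictionary are illustrative names, and the memory contents are the letters from the "Memory" table. The fourth address (0x00008014) is included for completeness; its outcome is what this sketch computes, since the slides above do not step through it.

```python
OFFSET_BITS, INDEX_BITS = 4, 10
BLOCK_BYTES = 1 << OFFSET_BITS
WORD_BYTES = 4

# Word values from the slides' memory table (address -> value).
MEMORY = {
    0x10: "a", 0x14: "b", 0x18: "c", 0x1C: "d",
    0x30: "e", 0x34: "f", 0x38: "g", 0x3C: "h",
    0x8010: "i", 0x8014: "j", 0x8018: "k", 0x801C: "l",
}


class DirectMappedCache:
    def __init__(self):
        # One row per index; every row starts out invalid.
        self.rows = [{"valid": False, "tag": None, "words": None}
                     for _ in range(1 << INDEX_BITS)]

    def read(self, addr):
        """Return ("hit" or "miss", word) for a word read at addr."""
        offset = addr & (BLOCK_BYTES - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        row = self.rows[index]
        hit = row["valid"] and row["tag"] == tag
        if not hit:
            # Miss: load the whole block from memory, set tag and valid.
            base = addr - offset
            row["words"] = [MEMORY.get(base + i)
                            for i in range(0, BLOCK_BYTES, WORD_BYTES)]
            row["tag"] = tag
            row["valid"] = True
        return ("hit" if hit else "miss"), row["words"][offset // WORD_BYTES]


cache = DirectMappedCache()
results = [cache.read(a) for a in (0x14, 0x1C, 0x34, 0x8014)]
print(results)
```

In this sketch the fourth read lands on row 1 with a different tag, so it replaces the block that the first read loaded.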
A Direct Mapped Cache for MIPS
• What is the block size?
– 32 bits, 4 bytes
• How many blocks are in the cache?
– 1024
• How long is the index?
– 10 bits
• How many blocks are in memory?
– 2^32/4 = 2^30
• How many memory blocks map to the same cache block?
– 2^30/1024 = 2^20
• How long is the tag?
– 20 bits
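The answers on this slide follow from three parameters. A short sketch, with illustrative variable names, assuming 32-bit byte addresses and power-of-two sizes:

```python
ADDR_BITS = 32
BLOCK_BYTES = 4       # one 32-bit word per block
CACHE_BLOCKS = 1024

memory_blocks = 2 ** ADDR_BITS // BLOCK_BYTES    # blocks in memory
sharers = memory_blocks // CACHE_BLOCKS          # memory blocks per cache block
index_bits = CACHE_BLOCKS.bit_length() - 1       # log2(1024)
offset_bits = BLOCK_BYTES.bit_length() - 1       # log2(4)
tag_bits = ADDR_BITS - index_bits - offset_bits

# The tag must distinguish among all memory blocks sharing one cache block.
assert 2 ** tag_bits == sharers
print(index_bits, tag_bits)  # 10 20
```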
Yoda says…
You will find only what you bring in