ESE570_MemCkts15
Transcript of ESE570_MemCkts15
Kenneth R. Laker, University of Pennsylvania, updated 02Apr15
ESE 570 SEMICONDUCTOR MEMORIES
A Typical Computer System
[Figure: the CPU chip (L1-D and L1-I caches, L2 cache) connects over the system bus to the memory controller, which drives two channels (Ch 1, Ch 2) of DRAM DIMMs; the GPU/video RAM sits on the AGP bus; the I/O controller bridges the USB bus, PCI bus, and other buses to the disk adapter and Ethernet adapter.]
Semiconductor memories divide into non-volatile (no power required to hold data), e.g. ROM, and volatile (requires power to hold data).
CPU Memory Hierarchy
L1: on-CPU cache, 1k to 64k SRAM (register file)
L2: 64k to 4M
L3: 4M to 32M (multi-core shared)
L4: 8M off-chip cache memory, SRAM or DRAM
Locality and Caching
Memory hierarchies exploit locality by caching (keeping close to the processor) data likely to be used again.
This is done because we can build large, slow memories and small, fast memories, but we can't build large, fast memories.
If it works, we get the illusion of SRAM access time with disk-based memory capacity.
SRAM (static RAM) -- 5-20 ns access time, very expensive (on-CPU faster).
DRAM (dynamic RAM) -- 60-100 ns, cheaper.
Disk -- access time measured in milliseconds, very cheap.
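The payoff of caching can be made concrete with the standard average-access-time calculation. A minimal sketch in Python, using the access times quoted above; the 95% hit rate is an assumed illustration, not a figure from the slides:

```python
# Average memory access time (AMAT) for a two-level hierarchy:
# AMAT = t_hit + miss_rate * t_penalty.

def amat(t_hit_ns, miss_rate, t_penalty_ns):
    """Average access time of a cache backed by a slower level."""
    return t_hit_ns + miss_rate * t_penalty_ns

t_sram = 10.0   # ns, SRAM cache access (text: 5-20 ns)
t_dram = 80.0   # ns, DRAM access (text: 60-100 ns)

# With a 95% hit rate (assumed), most accesses see SRAM speed:
print(amat(t_sram, 0.05, t_dram))   # 14.0
# Without a cache, every access pays the DRAM access time:
print(t_dram)                       # 80.0
```

With a high hit rate the hierarchy delivers close to SRAM speed even though most of the capacity sits in the slower level.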
Why Do We Care about Memory Hierarchy?
[Figure: relative performance of CPU vs. memory, 1980 to 2010, log scale from 1 to 100,000. The processor-memory performance gap grew about 50%/year.]
Access Time (tAC): the time required to read data from a single memory cell.
Cycle Time (tC): the time required to perform a read or write operation plus any recovery time before the next read/write operation can begin (a measure of overall data rate).
[Table residue: relative rankings of memory technologies (1 = best) for access time, cycle time, etc.; typical applications include caches and PDAs.]
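Since the cycle time bounds how often a new read/write can start, it fixes the sustained data rate. A small sketch; the 64-bit word width and 10 ns cycle time are assumed example values:

```python
# One word can be transferred per cycle time t_C, so t_C bounds the
# sustained data rate. Word width and t_C below are assumed examples.

def data_rate_bits_per_s(word_bits, t_cycle_s):
    """Sustained data rate: one word per cycle time."""
    return word_bits / t_cycle_s

# A 64-bit-wide memory with a 10 ns cycle time:
rate = data_rate_bits_per_s(64, 10e-9)
print(f"{rate / 1e9:.1f} Gbit/s")   # 6.4 Gbit/s
```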
TYPICAL RANDOM ACCESS MEMORY ARRAY ORGANIZATION (ONE (1) 2^M-BIT WORD PER ROW)
[Figure: memory array with a row decoder driven by N address bits selecting one of the 2^N word lines; sense amplifiers/drivers on the bit lines; chip I/O interface carrying the (N + M) address bits, data, and chip control signals; G = word-line (gate) connection, S,D = bit-line (source/drain) connection.]
Practical Issues:
1. N >> M
   a. Long, thin layout => awkward to fit into a system chip floor-plan.
   b. Long bit lines slow memory access, i.e. more parasitic capacitance.
2. 2^N × 2^M is very large, say 10^10 to 10^12 cells.
   a. Long bit lines slow memory access, i.e. more parasitic capacitance.
Remedies:
1. Reorganize the memory by reducing the number of rows to 2^(N-k) and increasing the number of columns to 2^(M+k), i.e. make N - k ≈ M + k (complicates the column decoder).
2. Construct large memories from smaller modular blocks.
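Remedy 1 can be sketched numerically: moving k address bits from the row decoder to the column decoder squares up the array without changing its capacity. The N and M values below are assumed for illustration:

```python
# Remedy 1: move k address bits from the row decoder to the column
# decoder, turning a tall 2^N x 2^M array into a squarer
# 2^(N-k) x 2^(M+k) array with shorter bit lines. The total cell
# count is unchanged. N and M here are assumed example values.

def reorganize(N, M, k):
    """(rows, cols) after moving k address bits from rows to columns."""
    return 2 ** (N - k), 2 ** (M + k)

N, M = 20, 3                     # 2^20 rows of 2^3-bit words: long and thin
print(reorganize(N, M, 0))       # (1048576, 8)

# Choosing k so that N - k is close to M + k balances the aspect ratio:
k = (N - M) // 2
print(k, reorganize(N, M, k))    # 8 (4096, 2048)
```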
DRAM Chip Partitioned into Supercells
[Figure: a 128-bit DRAM chip organized as 4 rows × 4 cols of supercells, each 8 bits wide; the memory controller (to CPU) sends a 2-bit addr and transfers data 8 bits at a time; an internal row buffer holds the selected row. Supercell (2, 1) is highlighted.]
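The two-step supercell access can be sketched as follows; the byte labels and the RAS/CAS step names are illustrative assumptions matching the generic DRAM access protocol, not anything specific on the slide:

```python
# Two-step supercell access for a 128-bit chip organized as a 4 x 4
# array of 8-bit supercells, addressed over shared 2-bit addr pins.

supercells = [[f"byte({r},{c})" for c in range(4)] for r in range(4)]

def read_supercell(row, col):
    """Row address first fills the internal row buffer (RAS step),
    then the column address picks one supercell out of it (CAS step)."""
    row_buffer = supercells[row]   # step 1: entire row copied to buffer
    return row_buffer[col]         # step 2: one 8-bit supercell selected

print(read_supercell(2, 1))   # byte(2,1)
```

Reusing the row buffer for several column accesses to the same row is what makes row-local access patterns fast on real DRAMs.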
NONVOLATILE MEMORY: ROM (implemented as a pseudo-nMOS NOR gate)
Pseudo-nMOS NAND gate
DESIGN OF ROW AND COLUMN DECODERS
MEMORY: storing L = 2^N = 4 words, each 2^M bits wide.
N decoder address bits are needed to select a specific one of the 2^N words (rows) in memory, one at a time.
Purpose of the ROW DECODER: reduce the number of external signals (bits) needed to select a word (row) from memory.
Example: L = 2^N = 4, so N = 2 address bits access each of the 2^N = 4 word lines.
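The row decoder's job can be shown as Boolean logic: for N = 2 address bits, each of the 2^N = 4 word lines is the AND of the true or complemented address bits. A minimal sketch:

```python
# 2-to-4 row decoder in Boolean form: each word line is the AND of the
# address bits or their complements, so exactly one of the 2^N = 4
# word lines is active for any 2-bit address.

def row_decoder(a1, a0):
    """Word-line values W0..W3 for address bits (a1, a0)."""
    na1, na0 = 1 - a1, 1 - a0
    return [na1 & na0,   # W0: address 00
            na1 & a0,    # W1: address 01
            a1 & na0,    # W2: address 10
            a1 & a0]     # W3: address 11

for addr in range(4):
    a1, a0 = (addr >> 1) & 1, addr & 1
    print(addr, row_decoder(a1, a0))   # exactly one word line high
```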
Row decoder transistor count: (N·L) nMOS + (L) pMOS + 2N inverters (4N transistors).
TOTAL = NL + L + 4N transistors, for L = 2^N = 4 rows.
(2^(M-1) + 5M) transistors.
Let there be one (1) 2^M-bit word per row. The column decoder selects one (1) of the 2^M bit lines.
Tree-based column decoder: complementary column address bits (CA1, CA1', CA2, CA2', ..., CAM, CAM') steer a binary tree of pass transistors from the 2^M bit lines (C1 ... C2^M) to the data line. No separate decoder is needed, at the cost of M series-connected nMOS pass transistors in each path.
Word-line RC parameters (example): Wpoly = 1.5 μm, Lpoly = 6 μm, Cox = 3.5 fF/μm², Cdbn = 1.8 fF.
Rrow = Rsheet-poly (Lpoly/Wpoly) = 20 Ω (6/1.5) = 80 Ω per cell
Crow = Cox (LnMOS × WnMOS) = 3.5 fF/μm² (2 × 1.5) μm² = 10.5 fF per cell
N = 9, i.e. 2^9 = 512 rows; M = 6, i.e. 2^6 = 64 cols.
[Figure: distributed RC ladder model of the line, with section labels R511, R512 and C63, C64.]
Word-line delay:
Rrow = Rsheet-poly (Lpoly/Wpoly) = 20 Ω (6/1.5) = 80 Ω per cell
Crow = Cox (LnMOS × WnMOS) = 3.5 fF/μm² (2 × 1.5) μm² = 10.5 fF per cell
Lumped estimate at node VG256: R = 256 × 80 Ω = 20.48 kΩ, C = 256 × 10.5 fF = 2688 fF => τ_row ≈ 20.9 ns
Elmore Delay Formula for a distributed RC line of N identical sections:
τ_D(N) = Σ_{j=1..N} C_j Σ_{k=1..j} R_k = Rrow Crow N(N + 1)/2
τ_row ≈ 0.69 τ_D(64) = 1.2 ns, where N = 64 (delay evaluated at the far node VG64).
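The Elmore result above can be checked numerically; this sketch just evaluates the formula with the per-cell R and C from the slide:

```python
# Elmore delay of the word line: for a uniform RC ladder of N identical
# sections, tau_D(N) = R_row * C_row * N * (N + 1) / 2; the slide's
# 50% delay estimate is 0.69 * tau_D.

R_row = 80.0        # ohms per cell (from the slide)
C_row = 10.5e-15    # farads per cell (from the slide)

def elmore_delay(n_cells):
    """Elmore delay of a uniform RC ladder with n_cells sections."""
    return R_row * C_row * n_cells * (n_cells + 1) / 2

tau_row = 0.69 * elmore_delay(64)
print(f"{tau_row * 1e9:.2f} ns")   # 1.21 ns, i.e. ~1.2 ns as on the slide
```

Note the quadratic growth with N: doubling the line length roughly quadruples the delay, which is why long word and bit lines dominate access time.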
REVIEW
τ_column: bit-line capacitance for a 512-row column:
C_column = 512 C_dbn = 512 × 0.0118 pF ≈ 6 pF; 512 × 1.8 fF = 0.9 pF
C_column = 128 C_dbn = 128 × 11.8 fF = 1.47 pF ≈ 1.5 pF
Other Parameters: VOH = VDD = 5 V, VT0n = -VT0p = 1 V, μnCox = 20 μA/V², VOL ≈ 0 V
τ_column = τ_PHL ≈ 11 ns
τ_access = τ_row + τ_column = 1.2 ns + 11 ns = 12.2 ns
(compare: 20.9 ns + 18 ns = 38.9 ns)
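The τ_PHL ≈ 11 ns figure can be reproduced from the listed parameters with the standard hand-analysis delay formula for an nMOS discharging a capacitive load (saturation region, then triode, to the 50% point). The driver W/L = 2.2 below is an assumed value chosen so the numbers work out; it is not given on the slide:

```python
import math

# tau_PHL of an nMOS pulling a capacitive load low, using the slide's
# parameters: V_OH = 5 V, V_Tn = 1 V, un*Cox = 20 uA/V^2, and
# C_column ~ 1.5 pF. The driver W/L = 2.2 is an ASSUMED value (not on
# the slide) chosen to illustrate how the ~11 ns figure can arise.

def tau_phl(C_load, k_n, V_OH, V_T, V_50):
    """50% high-to-low delay: saturation-region term plus the
    logarithmic triode-region term."""
    sat = 2 * V_T / (V_OH - V_T)
    tri = math.log((2 * (V_OH - V_T) - V_50) / V_50)
    return C_load / (k_n * (V_OH - V_T)) * (sat + tri)

k_n = 20e-6 * 2.2                 # un*Cox * (W/L), W/L assumed
tau = tau_phl(1.5e-12, k_n, 5.0, 1.0, 2.5)
print(f"{tau * 1e9:.1f} ns")      # 11.0 ns, consistent with the slide
```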
SRAM READ: ACCESS DATA WHILE NOT MODIFYING THE DATA IN THE SRAM CELL (or Differential Column)
6T CMOS SRAM Cell
Rk (from the ROW Decoder):
Rk = 0: M3 & M4 are OFF.
If Rk = 0 for ALL rows (all k = 1 ... 2^N), the bit line capacitances CC and CNOT-C are pre-charged.
With Rk = 1 (row selected), four operations are possible:
a. WRITE 1 OP
b. READ 1 OP
c. WRITE 0 OP
d. READ 0 OP
Rk -> 1
The sense circuitry detects the bit-line voltages and interprets the result as a 1 data bit.
Rk -> 1
The sense circuitry detects the bit-line voltages and interprets the result as a 0 data bit.
CMOS SRAM WRITE CIRCUIT
[Figure: write circuit (one per column). The WRITE CKT derives WB and WB' from W and DATA and drives them onto the bit lines; Rk from the ROW DECODER turns on the access transistors (M3 ON) during the write operation. Example stored and written logic levels shown as (1)s and (0)s.]
CMOS SRAM READ CIRCUIT
Differential Sense Amplifier (one per column)
[Figure: sense amplifier (transistors MA1-MA5, MP2) with inputs VNOT-C and VC from the bit lines, Read Select signal S, and row select Rk; Sense Amp Gain annotated.]
6T 1-bit CMOS SRAM Cell (row select Rk)
HISTORICAL EVOLUTION OF THE DRAM CELL: from static to dynamic RAM. Pull-up transistors (two per column).
[Figure: 4-T DRAM cell (M1-M4) and 3-T DRAM cell (M1-M3).]
INDUSTRY STANDARD 1T-1C DRAM CELL
NOTE: Two-poly capacitors have very low dissipation.
[Figure: access transistor M1 and storage capacitor C.]
3-T DRAM CELL: uses a three-phase non-overlapping clock scheme with PC = pre-charge, φ1 = RS = read select, and φ2 = WS = write select. NOTE: the bit lines are no longer complements.
[Figure: 3-T cell with write select (WS), storage node VB, and bit lines VC and VNOT-C.]
3-T DRAM CELL OPERATION - cont.
WRITE 0 OP: WS = 1, RS = 0, DATA = 0; precharge PC = 1; I = 0.
[Figure: write-side (WS) and read-side (RS) bit-line capacitances CNOT-C and CC with CC >> C; node voltages VC and VB at the gate of M2.]
CHARGE SHARING IN 3T DRAM WRITE 1
When WS = 1: V_R = (CC VC + C VB) / (CC + C)
Since CC >> C, V_R ≈ V_C, independent of VB.
[Figure: CC (at VC) connected through the WS pass transistor to C (at VB).]
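The charge-sharing expression can be evaluated directly; the capacitor values below are assumed examples, chosen only so that CC >> C holds:

```python
# Charge sharing on a 3T write: when WS = 1 the bit line (CC at VC)
# and the storage node (C at VB) equalize to
#   V_R = (CC*VC + C*VB) / (CC + C).
# Capacitor values are assumed examples; only CC >> C matters.

def charge_share(CC, VC, C, VB):
    """Voltage after connecting capacitance CC (at VC) to C (at VB)."""
    return (CC * VC + C * VB) / (CC + C)

CC, C = 1e-12, 50e-15    # CC >> C
# Writing a 1 (VC = 5 V) over a stored 0 (VB = 0 V):
print(round(charge_share(CC, 5.0, C, 0.0), 2))   # 4.76 -- close to VC,
                                                 # nearly independent of VB
```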
3-T DRAM CELL OPERATION - cont.
READ 1 OP: after WRITE 1 (WS = 1, RS = 0, DATA = 1), set WS = 0, RS = 1.
The stored VB keeps M2 ON, so the pre-charged data-out capacitance CC discharges (CC >> C, and C holds its charge): a falling Vdata-out is interpreted as a 1.
The 3-T DRAM cell is inverting. Read 1 is non-destructive.
3-T DRAM CELL OPERATION - cont.
READ 0 OP: WS = 0, RS = 1.
The stored VB keeps M2 OFF, so CC stays charged: a high Vdata-out is interpreted as a 0.
The 3-T DRAM cell is inverting. Read 0 is non-destructive.
3-T DRAM CELL OPERATION - cont.
[Figure: timing waveforms of Din, Dout, and WS for write 1 and write 0 followed by reads; CC >> C.]
1T-1C DRAM CELL OPERATION
WRITE 1 OP: D = 1, R = 1 (M1 ON) => C CHARGES to 1
WRITE 0 OP: D = 0, R = 1 (M1 ON) => C DISCHARGES to 0
READ OP charge sharing:
STEP 1: Pre-charge the column capacitance CC HIGH (VBL0 = VPRE = VDD/2).
STEP 2: Set R = 1 and detect ΔVBL on CC + C due to charge sharing.
R = 0 => VBL0 = VPRE
R = 1 => VBL1 = (CC VPRE + C VB) / (CC + C)
ΔVBL = VBL1 - VBL0 = (CC VPRE + C VB)/(CC + C) - VPRE = [C/(CC + C)] (VB - VPRE)
Since CC >> C, the swing ΔVBL is small.
READ 0 OP: DESTROYS THE BIT STORED ON C => REFRESH IS NEEDED.
EXAMPLE: 1T DRAM Read OP:
Assume a bit line capacitance of CC = 1 pF, storage capacitance C = 50 fF, and a bit line pre-charge VPRE = 1.25 V. Let the voltage stored on C be VB = 2.5 V for a 1 and VB = 0 V for a 0. Determine ΔVBL1 for reading a 1 and ΔVBL0 for reading a 0.
ΔVBL1 = [C/(CC + C)] (VB - VPRE) = [0.05 pF/(1 pF + 0.05 pF)] (2.5 V - 1.25 V) = +60 mV
ΔVBL0 = [C/(CC + C)] (VB - VPRE) = [0.05 pF/(1 pF + 0.05 pF)] (0 V - 1.25 V) = -60 mV
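A quick numerical check of this example:

```python
# dV_BL = C / (CC + C) * (VB - V_PRE), with CC = 1 pF, C = 50 fF,
# V_PRE = 1.25 V, as in the example.

def delta_v_bl(C, CC, VB, V_PRE):
    """Bit-line swing after charge sharing on a 1T DRAM read."""
    return C / (CC + C) * (VB - V_PRE)

CC, C, V_PRE = 1e-12, 50e-15, 1.25
print(f"{delta_v_bl(C, CC, 2.5, V_PRE) * 1e3:+.0f} mV")   # +60 mV (read 1)
print(f"{delta_v_bl(C, CC, 0.0, V_PRE) * 1e3:+.0f} mV")   # -60 mV (read 0)
```

A swing of only tens of mV on a 5 V supply is why every bit line needs a sense amplifier, and why the destructive read must be followed by a refresh.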
SRAM vs. DRAM

Static RAM (SRAM)
+ Data is stored as long as the power supply is applied
+ Volatile or nonvolatile
++ Fast due to a simple interface and efficient read/write operations
+ Used where speed is important, e.g. caches
+ Differential outputs
o Uses sense amplifiers to increase read performance
- Large cells (6 T per cell) => fewer bits per unit chip area
- 4x to 10x larger than comparable DRAM
- 10x more expensive than comparable DRAM

Dynamic RAM (DRAM)
++ Small cells (1 T to 3 T per cell) => more bits per unit chip area
+ 4x to 10x higher density than SRAM with the same chip area
- Periodic refresh required if data is stored for > 1 msec
- Volatile
- Slow due to a very complex interface
- Row/column access is multiplexed
o Used where speed is less important than high capacity, e.g. main memory