Transcript of ESE570_MemCkts15

Kenneth R. Laker, University of Pennsylvania, updated 02Apr15

Slide 1

    ESE 570 SEMICONDUCTOR MEMORIES

Slide 2

    A Typical Computer System

    [Figure: CPU (with L1-D and L1-I caches and an L2 cache) connected over the system bus to a memory controller serving two channels (Ch 1, Ch 2) of DRAM DIMMs; a GPU with video RAM on the AGP bus; and an I/O controller serving the USB bus, PCI bus, disk adapter, Ethernet adapter, and other buses.]

Slide 3

    [Figure: memory classification — Non-Volatile (e.g. ROM; no power required to hold data) vs. Volatile (requires power to hold data).]

Slide 4

    CPU Memory Hierarchy

    [Figure: CPU chip with L1 on-CPU cache (1 k to 64 k SRAM, register file); L2 (64 k to 4 M); L3 (4 M to 32 M, multi-core shared); L4 (8 M); off-chip cache (SRAM or DRAM); main memory.]

Slide 5

    Locality and Caching

    Memory hierarchies exploit locality by caching (keeping close to the processor) data likely to be used again.

    This is done because we can build large, slow memories and small, fast memories, but we can't build large, fast memories.

    If it works, we get the illusion of SRAM access time with disk-based memory capacity.

    SRAM (static RAM) -- 5-20 ns access time, very expensive (on-CPU is faster).

    DRAM (dynamic RAM) -- 60-100 ns, cheaper.

    Disk -- access time measured in milliseconds, very cheap.
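    The effect of caching on average access time can be made concrete with a small calculation. A minimal Python sketch follows; the hit rates and the specific 10 ns / 70 ns / 5 ms latencies are illustrative assumptions drawn from the ranges above, not values from the slides.

    def effective_access_time(levels):
        """levels: list of (hit_rate, access_time_seconds), tried in order;
        the last level must have hit_rate = 1.0 so every access eventually succeeds."""
        t = 0.0
        p_reach = 1.0                      # fraction of accesses that reach this level
        for hit_rate, access_time in levels:
            t += p_reach * access_time     # each access reaching this level pays its latency
            p_reach *= (1.0 - hit_rate)    # the misses continue to the next level
        return t

    hierarchy = [
        (0.95, 10e-9),   # SRAM cache: 10 ns, 95% hit rate (assumed)
        (0.99, 70e-9),   # DRAM main memory: 70 ns, 99% hit rate (assumed)
        (1.00, 5e-3),    # disk: ~5 ms, always succeeds
    ]
    print(f"average access time = {effective_access_time(hierarchy) * 1e9:.1f} ns")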

Slide 6

    Why Do We Care about Memory Hierarchy?

    [Figure: relative performance of CPU vs. memory, 1980-2010 (log scale, 1 to 100,000); the processor-memory performance gap grew about 50%/year.]

Slide 7

    Access Time (tAC): the time required to read data from a single memory cell.

    Cycle Time (tC): the time required to perform a read or write operation plus any recovery time before the next read/write operation can begin (a measure of overall data rate).

    [Table: relative rankings (1* = best, 4 = worst) of memory technologies for access time and cycle time, with typical applications such as cache and PDAs.]

Slide 8

    TYPICAL RANDOM ACCESS MEMORY ARRAY ORGANIZATION (ONE (1) 2^M-BIT WORD PER ROW)

    [Figure: memory chip block diagram — CHIP I/O INTERFACE with Data lines, an (N + M)-bit Address, and Chip Control Signals; the cell array stores one (1) 2^M-bit word per row (cell terminals G and S,D), with SENSE AMPLIFIERS/DRIVERS on the bit lines.]

Slide 9

    Practical Issues:
    1. N >> M
       a. Long, thin layout => awkward to fit into the system chip floor-plan.
       b. Long bit lines slow memory access, i.e. more parasitic capacitance.
    2. 2^N × 2^M is very large, say 10^10 to 10^12 cells.
       a. Long bit lines slow memory access, i.e. more parasitic capacitance.

    Remedies:
    1. Reorganize the memory by reducing the number of rows to 2^(N-k) and increasing the number of columns to 2^(M+k), i.e. make N − k ≈ M + k (this complicates the column decoder). See the sketch below.
    2. Construct large memories from smaller modular blocks.
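    A minimal sketch of Remedy 1, assuming the goal is simply to balance the exponents so the array is roughly square; the function name and the choice k = (N − M)/2 are illustrative, not from the slides.

    def squared_organization(N, M):
        """Move k address bits from the row decoder to the column decoder so that
        2**(N - k) rows and 2**(M + k) columns are as close to equal as possible."""
        k = max(0, (N - M) // 2)
        return k, 2 ** (N - k), 2 ** (M + k)

    # Example: a 2^9-row x 2^6-column (512 x 64) array re-balanced with k = 1
    k, rows, cols = squared_organization(9, 6)
    print(f"k = {k}: {rows} rows x {cols} columns ({rows * cols} cells total)")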

Slide 10

    DRAM Chip Partitioned into Supercells

    [Figure: a 128-bit DRAM chip organized as a 4 × 4 array (rows 0-3, cols 0-3) of supercells, each 8 bits wide, e.g. supercell (2, 1); addr (2 bits) and data (8 bits) connect to the memory controller (to the CPU), and the chip has an internal row buffer.]
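    A behavioral sketch of the organization in the figure, assuming the 4 × 4 arrangement of 8-bit supercells shown; the function names are illustrative.

    ROWS, COLS, BITS = 4, 4, 8                     # 4 x 4 supercells of 8 bits = 128 bits
    chip = [[0] * COLS for _ in range(ROWS)]       # each entry models one supercell

    def write_supercell(row, col, value):
        chip[row][col] = value & ((1 << BITS) - 1)   # keep only 8 bits

    def read_supercell(row, col):
        row_buffer = chip[row][:]    # the whole row is first copied to the internal row buffer
        return row_buffer[col]       # then the column select picks one supercell

    write_supercell(2, 1, 0xA5)
    print(hex(read_supercell(2, 1)))   # 0xa5 -- supercell (2, 1), 8 bits wide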

Slide 11

    NONVOLATILE MEMORY: ROM

    Pseudo-nMOS NOR gate

Slide 12

Slide 13

Slide 14

    Pseudo-nMOS NAND gate

Slide 15

    DESIGN OF ROW AND COLUMN DECODERS

    MEMORY storing L = 2^N = 4 words, each 2^M bits wide.

    N decoder address bits are needed to select a specific one of the 2^N words (rows) in memory, one at a time.

    Purpose of the ROW DECODER -> reduce the number of external signals (or bits) needed to select a word or row from memory.

    L = 2^N = 4, N = 2: N = 2 address bits access each of the 2^N = 4 word lines.
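    A purely logical sketch of the row decoder's job (N address bits select exactly one of 2^N word lines, one-hot); it is not a transistor-level model.

    def row_decoder(address, N):
        """Return the 2**N word-line levels for an N-bit row address (one-hot)."""
        return [1 if row == address else 0 for row in range(2 ** N)]

    # N = 2 address bits select one of 2^2 = 4 word lines, as on this slide
    for addr in range(4):
        print(addr, row_decoder(addr, 2))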

Slide 16

    Row decoder transistor count: (N × L) nMOS + (L) pMOS loads + 2N INVs (= 4N Xstrs)

    TOTAL = NL + L + 4N Xstrs

    L = 2^N = 4 rows
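    A small sketch that evaluates the transistor-count formula above for a few values of N, assuming L = 2^N rows.

    def nor_row_decoder_transistors(N):
        """Transistor count from the formula above: N*L nMOS + L pMOS + 2N inverters (4N)."""
        L = 2 ** N                    # number of word lines (rows)
        return N * L + L + 4 * N

    for N in (2, 8, 10):
        print(f"N = {N:2d}: L = {2 ** N:4d} rows, {nor_row_decoder_transistors(N):6d} transistors")
    # N =  2: L =    4 rows,     20 transistors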

Slide 17

    (2M-1 + 5M) Xstrs

    Let there be one (1) 2^M-bit word per row.

    The row decoder selects one (1) row, placing its 2^M bits on the bit lines.

Slide 18

    [Figure: tree-style column MUX — each bit line passes through M series-connected nMOS pass transistors gated by the column-address bits CA1/~CA1, CA2/~CA2, …, CAM/~CAM, so no separate column decoder is needed.]

Slide 19

    Given values: 1.5 μm, 3.5 fF/μm², 1.8 fF.

    Rrow = Rsheet-poly (Lpoly/Wpoly) = 20 Ω/sq × (6/1.5) = 80 Ω per cell
    Crow = Cox (LnMOS × WnMOS) = 3.5 fF/μm² × (2 × 1.5) μm² = 10.5 fF per cell

    N = 9, i.e. 2^9 = 512 rows; M = 6, i.e. 2^6 = 64 cols.

    [Figure: distributed RC ladder model of the word line (…, R511, R512; …, C63, C64).]

Slide 20

    Crow = Cox (LnMOS × WnMOS) = 3.5 fF/μm² × (2 × 1.5) μm² = 10.5 fF per cell
    Rrow = Rsheet-poly (Lpoly/Wpoly) = 20 Ω/sq × (6/1.5) = 80 Ω per cell

    At VG256: R = 256 × 80 Ω = 20.48 kΩ; C = 256 × 10.5 fF = 2688 fF; τ = 20.9 ns

    Elmore Delay Formula (RC ladder with N identical sections):

    τDN = Σ_{j=1..N} Cj (Σ_{k=1..j} Rk) = Rrow Crow N(N+1)/2

    τrow ≈ 0.69 τD64 = 1.2 ns at VG64, where N = 64.

    [Figure: word-line RC ladder evaluated at node VG64.]
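    A short sketch that plugs the per-cell Rrow and Crow values into the Elmore delay formula above and reproduces the 1.2 ns word-line delay for N = 64.

    R_ROW = 80.0        # ohms per cell:  20 ohm/sq * (6 / 1.5) squares of poly
    C_ROW = 10.5e-15    # farads per cell: 3.5 fF/um^2 * (2 um * 1.5 um) of gate area

    def elmore_word_line_delay(n_cells, r=R_ROW, c=C_ROW):
        """Elmore delay of an RC ladder of n_cells identical sections:
        tau = sum_j C_j * (sum_{k<=j} R_k) = R * C * n * (n + 1) / 2."""
        return r * c * n_cells * (n_cells + 1) / 2

    tau_d64 = elmore_word_line_delay(64)
    print(f"tau_D64 = {tau_d64 * 1e9:.2f} ns")           # about 1.75 ns
    print(f"tau_row = {0.69 * tau_d64 * 1e9:.2f} ns")    # about 1.21 ns, matching the slide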

Slide 21

    REVIEW

Slide 22

    Ccolumn = 512 Cdbn = 512 × 0.0118 pF ≈ 6 pF

    512 × 1.8 fF = 0.9 pF

    [Figure: bit-line (column) RC model — τcolumn, R512, 512 cells per column.]

Slide 23

    Ccolumn = 128 Cdbn = 128 × 11.8 fF ≈ 1.5 pF

    Other parameters: VOH = VDD = 5 V; VT0n = −VT0p = 1 V; μnCox = 20 μA/V²; VOL ≈ 0 V.

    τcolumn = τPHL ≈ 11 ns

    τaccess = τrow + τcolumn = 1.2 ns + 11 ns = 12.2 ns
    (compare: 20.9 ns + 18 ns = 38.9 ns)
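    A small sketch of the access-time bookkeeping on this slide; the 1.2 ns row delay and the 11 ns column delay are taken from the slides, the rest is straightforward arithmetic.

    C_DBN = 11.8e-15      # drain-junction capacitance per cell (farads)
    CELLS_PER_COLUMN = 128

    c_column = CELLS_PER_COLUMN * C_DBN     # bit-line capacitance
    tau_row = 1.2e-9                        # word-line (Elmore) delay from slide 20
    tau_column = 11e-9                      # bit-line pull-down delay tau_PHL, per this slide

    print(f"C_column   = {c_column * 1e12:.2f} pF")                  # about 1.5 pF
    print(f"tau_access = {(tau_row + tau_column) * 1e9:.1f} ns")     # 12.2 ns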

Slide 24

    ACCESS DATA WHILE NOT MODIFYING THE DATA IN SRAM CELL

    (or Differential Column)

Slide 25

Slide 26

    6T CMOS SRAM Cell

Slide 27

    Rk (from the ROW Decoder)

    Rk = 0: M3 & M4 are OFF.

    If Rk = 0 for ALL rows (all 2^N of them), the bit-line capacitances CC and CNOT-C are pre-charged.

Slide 28

    a. WRITE 1 OP

    b. READ 1 OP

    c. WRITE 0 OP

    d. READ 0 OP

    Rk = 1

Slide 29

Slide 30

    [Figure: READ 1 operation — the read circuit senses the bit lines and interprets the result as a 1 data bit.]

Slide 31

Slide 32

    [Figure: READ 0 operation — the read circuit senses the bit lines and interprets the result as a 0 data bit.]

Slide 33

Slide 34

Slide 35

    CMOS SRAM — WRITE OPERATION (M3 ON)

    [Figure/Table: the WRITE CKT, driven by Rk from the ROW DECODER, sets the bit lines WB and WB-bar from the W and DATA inputs; the table lists the (1)/(0) levels of W, DATA, WB and WB-bar for each write operation.]

Slide 36

    CMOS SRAM — SRAM READ CIRCUIT

    Differential Sense Amplifier (one per column)

    [Figure: sense-amplifier schematic — transistors MA1-MA5 (with MP2 and M2) sense VC and VNOT-C from the WB / WB-bar bit lines; the amplifier is enabled by the Read Select signal S together with the row select Rk, and the sense-amp gain is annotated on the schematic.]

Slide 37

    6T 1-bit CMOS SRAM Cell (row select Rk)

Slide 38

    HISTORICAL EVOLUTION OF THE DRAM CELL

    Static RAM → Dynamic RAM

    Pull-up transistors (two per column)

Slide 39

    [Figure: DRAM cell evolution — a 4-T cell (M1-M4) and a 3-T cell (M1-M3).]

Slide 40

    INDUSTRY STANDARD 1T-1C DRAM CELL

    NOTE: Two-poly capacitors have very low dissipation.

    [Figure: 1T-1C cell with access transistor M1 and storage capacitor C.]

Slide 41

    3-T DRAM CELL OPERATION

    Uses a three-phase non-overlapping clock scheme with PC = pre-charge, phase 1 = RS = read select, and phase 2 = WS = write select. NOTE: the bit lines are no longer complements.

    [Figure: 3-T DRAM cell — write select (WS), storage-node voltage VB, bit-line voltages VC and VNOT-C.]

Slide 42

    3-T DRAM CELL OPERATION - cont.

    [Figure: WRITE 0 — WS = 1, RS = 0, DATA = 0, pre-charge PC = 1; (WRITE) and (READ) bit lines with capacitances CC and CNOT-C (CC >> C), storage-node voltage VB, bit-line voltage VC, transistor M2; I = 0.]

Slide 43

    CHARGE SHARING IN 3T DRAM WRITE 1

    When WS = 1: VR = (CC VC + C VB) / (CC + C)

    Since CC >> C, VR ≈ VC, independent of VB.
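    A numerical sketch of this charge-sharing result, assuming illustrative capacitor values (1 pF bit line, 50 fF storage node) to show that VR is set almost entirely by VC when CC >> C.

    def charge_share(c1, v1, c2, v2):
        """Final voltage after connecting two charged capacitors together."""
        return (c1 * v1 + c2 * v2) / (c1 + c2)

    C_C, C = 1e-12, 50e-15        # bit-line and storage capacitance (assumed values)
    V_C = 5.0                     # write bit line driven to VDD to write a 1
    for v_b in (0.0, 5.0):        # previously stored 0 or 1 on the storage node
        print(f"V_B = {v_b} V  ->  V_R = {charge_share(C_C, V_C, C, v_b):.2f} V")
    # V_R stays at about 4.8-5.0 V in both cases: set by V_C, nearly independent of V_B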

Slide 44

    3-T DRAM CELL OPERATION - cont.

    READ 1 (cell previously written with DATA = 1 during WS = 1, RS = 0):

    With WS = 0 and RS = 1, VB keeps M2 ON, so the pre-charged data-out line (CC, Vdata-out) is DISCHARGED.

    Falling Vdata-out is interpreted as a 1; the 3-T DRAM cell is inverting.

    (Read 1 is non-destructive.)  CC >> C.

Slide 45

    3-T DRAM CELL OPERATION - cont.

    READ 0: with WS = 0 and RS = 1, VB keeps M2 OFF, so the pre-charged data-out line (CC, Vdata-out) stays HIGH.

    High Vdata-out is interpreted as a 0; the 3-T DRAM cell is inverting.

    (Read 0 is non-destructive.)

Slide 46

    3-T DRAM CELL OPERATION - cont.

    [Figure: write/read waveforms — Din, Dout and WS for a write-1 / read and a write-0 / read sequence ((WRITE) then (READ)); CC >> C.]
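    A purely behavioral (logic-level) sketch of the 3-T cell operations described on the preceding slides; class and signal names are illustrative, and leakage/refresh is not modelled.

    class ThreeTransistorCell:
        """Logic-level model: the bit is stored as charge on C (the gate of M2)."""

        def __init__(self):
            self.v_b = 0              # storage-node voltage (0 or 1)

        def write(self, d_in):        # WS = 1, RS = 0
            self.v_b = d_in           # C charges (1) or discharges (0) from the write bit line

        def read(self):               # WS = 0, RS = 1; data-out line pre-charged HIGH
            data_out_line = 1
            if self.v_b == 1:         # V_B keeps M2 ON, so the line is discharged
                data_out_line = 0
            return data_out_line      # falling line -> stored 1 (inverting, non-destructive)

    cell = ThreeTransistorCell()
    cell.write(1)
    print(cell.read())   # 0 -- the raw data-out line is the complement of the stored bit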

Slide 47

    1T-1C DRAM CELL OPERATION

    WRITE 1 OP: D = 1, R = 1 (M1 ON) => C CHARGES to 1.
    WRITE 0 OP: D = 0, R = 1 (M1 ON) => C DISCHARGES to 0.

    READ OP (charge sharing):
    STEP 1: pre-charge the column capacitance CC HIGH, so with R = 0, VBL0 = VPRE = VDD/2.
    STEP 2: set R = 1 and detect VBL on CC + C due to charge sharing.

    R = 1 => VBL1 = (CC VPRE + C VB) / (CC + C)

    ΔVBL = VBL1 − VBL0 = (CC VPRE + C VB)/(CC + C) − VPRE = [C/(CC + C)] (VB − VPRE)

    CC >> C, so the bit-line swing ΔVBL is small.

    READ 0 OP: DESTROYS the BIT STORED ON C => REFRESH is NEEDED.

Slide 48

    EXAMPLE: 1T DRAM Read OP:

    Assume a bit-line capacitance CC = 1 pF, a storage capacitance C = 50 fF, and a bit-line pre-charge VPRE = 1.25 V. Let the voltage stored on C be VB = 2.5 V for a 1 and VB = 0 V for a 0. Determine ΔVBL1 for reading a 1 and ΔVBL0 for reading a 0.

    ΔVBL1 = [C/(CC + C)] (VB − VPRE) = [0.05 pF/(1 pF + 0.05 pF)] (2.5 V − 1.25 V) = +60 mV

    ΔVBL0 = [C/(CC + C)] (VB − VPRE) = [0.05 pF/(1 pF + 0.05 pF)] (0 V − 1.25 V) = −60 mV
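    A short check of this example in Python, using the ΔVBL expression from the previous slide; it reproduces the roughly +60 mV and −60 mV bit-line swings.

    C_C   = 1e-12     # bit-line capacitance, 1 pF
    C     = 50e-15    # storage capacitance, 50 fF
    V_PRE = 1.25      # bit-line pre-charge voltage, volts

    def delta_v_bl(v_b):
        """Bit-line swing after charge sharing: C/(C_C + C) * (V_B - V_PRE)."""
        return C / (C_C + C) * (v_b - V_PRE)

    print(f"read 1 (V_B = 2.5 V): {delta_v_bl(2.5) * 1e3:+.0f} mV")   # about +60 mV
    print(f"read 0 (V_B = 0.0 V): {delta_v_bl(0.0) * 1e3:+.0f} mV")   # about -60 mV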

Slide 49

    SRAM vs. DRAM

    Static RAM (SRAM)
    + Data is stored as long as the power supply is applied
    + Volatile or nonvolatile
    - Large cells (6-T per cell) => fewer bits per unit chip area
    - 4x to 10x larger than comparable DRAM
    - 10x more expensive than comparable DRAM
    ++ Fast due to simple interface and efficient read/write operations
    + Used where speed is important, e.g. caches
    + Differential outputs
    o Uses sense amplifiers to increase read performance

    Dynamic RAM (DRAM)
    ++ Small cells (1-T to 3-T per cell) => more bits per unit chip area
    + 4x to 10x higher density than SRAM with the same chip area
    - Periodic refresh required if DATA is stored for > 1 msec
    - Volatile
    - Slow due to very complex interface
    - Row/column access is multiplexed
    o Used where speed is less important than high capacity, e.g. main memory
