EE-382M VLSI–II Early Planning for Memory Array Design

Foil # 1 / 58 The University of Texas at Austin EE 382M Class Notes

Early Planning for Memory Array Design

EE-382M VLSI–II

Steven C. Sullivan
Gian Gerosa

Foil # 2 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (5 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 3 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Hierarchy

Level            Access Time   Capacity
Register File    0.25-1 ns     0.5-1 KB
Level 1 Cache    1-4 ns        8-64 KB
Level 2 Cache    5-20 ns       256 KB-2 MB
Main Memory      35-50 ns      128-256 MB
Hard Drive       5-10 ms       10-50 GB

Memory hierarchy gives the appearance of large capacity and fast access time.

Foil # 4 / 58 The University of Texas at Austin EE 382M Class Notes

Processor-Memory Performance Gap

[Figure: relative performance vs. year, 1980-2007, on a log scale from 1 to 10000. Processor performance (µProc) improves ~60%/yr while DRAM improves ~7%/yr, so the performance gap grows about 50% per year; later segments of the processor curve are annotated 1.55X/yr and 1.35X/yr.]

The need for memory hierarchy is steadily increasing.

Foil # 5 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Hierarchy Evolution

• 386: No on-die cache; the L1 cache sits on the motherboard, reached through the chipset along with DRAM.

• 486: Level 1 cache on-die; Level 2 on the motherboard behind the chipset.

• Pentium: Separate on-die instruction (I) and data (D) caches; L2 on the motherboard behind the chipset.

Foil # 6 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Hierarchy Evolution

• Pentium II: Separate bus to the L2 cache in the same package.

• Pentium III: L2 cache on-die.

• Pentium 4 (Foster): L3 cache on-die, alongside the on-die L2.

Recent development: 3-D packaging allows more integration

Foil # 7 / 58 The University of Texas at Austin EE 382M Class Notes

P4

Foil # 8 / 58 The University of Texas at Austin EE 382M Class Notes

Functional Block Diagram

[Block diagram: an N-bit row address feeds the row decoder, which asserts one of the 2^N word lines into the 2^N x 2^M cell array. The 2^M bitlines feed the multiplexors and sense amplifiers; the (M-K)-bit column address feeds the column decoder, whose 2^(M-K) "1-hot" select lines choose 2^K columns that connect through the read/write buffer to the 2^K-bit data bus.]
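As a quick check on this address breakdown, here is a minimal Python sketch (the function name and the example N/M/K values are illustrative, not taken from the foil) that computes the block sizes for a 2^N x 2^M array:

```python
def array_dimensions(n_row_bits, m_col_bits, k_data_bits):
    """Block sizes for the generic 2^N x 2^M memory array diagram."""
    wordlines = 2 ** n_row_bits                       # 1-hot outputs of the row decoder
    bitlines = 2 ** m_col_bits                        # columns into the muxes / sense amps
    column_selects = 2 ** (m_col_bits - k_data_bits)  # "1-hot" outputs of the column decoder
    data_width = 2 ** k_data_bits                     # bits through the read/write buffer
    return wordlines, bitlines, column_selects, data_width

# Example: N = 7, M = 7, K = 5 -> 128 wordlines, 128 bitlines, 4:1 column muxing, 32-bit data
print(array_dimensions(7, 7, 5))
```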

Foil # 9 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (4 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 10 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Cell Overview

• A memory cell array has the following capabilities:
  – A means of storing bits of information (storage elements)
  – A means of selecting the stored information (wordlines)
  – A means of transferring data to/from the storage elements (bitlines)

• The 1T/1C memory cell is the simplest implementation
  – Only requires 1 W/L and 1 B/L metalization

• The 6T SRAM cell consumes more area and requires true & complement bitlines, but is more stable and develops a sensing voltage faster than a DRAM cell

• Register file cells allow multiple entries to be accessed or written simultaneously
  – However, this requires multiple wordlines and bitlines and becomes metal-limited
  – Used for integer/floating-point registers, single- & multiple-cycle queues and buffers

Foil # 11 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Cell Types

• Schematic of 1-T DRAM cell, 6T dual ended SRAM cell

[Schematics: a 1-transistor DRAM cell (access transistor gated by WL, connecting BL to the storage cap) and a 6-transistor SRAM cell (cross-coupled inverters with passgates gated by WL onto BL and #BL).]

1-transistor DRAM
• Industry standard DRAM cell
• Smallest area per bit
• Explicit storage capacitor
• Destructive READ

6-transistor SRAM
• Industry standard SRAM cell
• Used for FAST static arrays
• Cross-coupled inverters
• Non-destructive READ with proper stability analysis

Foil # 12 / 58 The University of Texas at Austin EE 382M Class Notes

6-transistor SRAM cell

[Schematic and layout: cross-coupled PFET/NFET inverters between VDD and GND, with NFET passgates from the storage nodes to BL and #BL, gated by WL. The 65 nm layout measures 1.0 μm by 0.68 μm.]

In 65 nm CMOS, a typical 6T bitcell area = 0.68 μm².

Foil # 13 / 58 The University of Texas at Austin EE 382M Class Notes

Multi-Port Memory Cell Types

[Schematic: storage nodes D/#D with a differential write port (WWL selecting WBL/#WBL) and a differential read port (RWL selecting RBL/#RBL).]

1 Read (DE), 1 Write (DE)

Foil # 14 / 58 The University of Texas at Austin EE 382M Class Notes

Multi-Port Memory Cell Types

[Schematic: storage nodes D/#D with a differential write port (WWL selecting WBL/#WBL) and a single-ended read port (RWL selecting RBL).]

1 Write (DE), 1 Read (SE)

Foil # 15 / 58 The University of Texas at Austin EE 382M Class Notes

Register File Multi-Ported Bitcell

[Schematic and layout: storage nodes D/#D with two differential write ports (WL0 with BL0/BL0B, WL1 with BL1/BL1B) and one single-ended read port (RWL driving RBL). Layout tracks include VDD, GND, wl0, wl1, rwl, bl0, bl0b, bl1, bl1b, and rbl.]

2 Write (DE), 1 Read (SE)

Foil # 16 / 58 The University of Texas at Austin EE 382M Class Notes

Multi-Port Memory Cell Types

[Schematic: storage nodes D/#D with a single-ended write port (WWL selecting WBL) and a differential read port (RWL selecting RBL/#RBL).]

1 Write (SE), 1 Read (DE)

Foil # 17 / 58 The University of Texas at Austin EE 382M Class Notes

Multi-Port Memory Cell Types

[Schematic: the same cell as on the previous foil, with a slight modification: a single-ended write port (WWL, WBL) and a differential read port (RWL, RBL/#RBL).]

1 Write (SE), 1 Read (DE); slight modification

Foil # 18 / 58 The University of Texas at Austin EE 382M Class Notes

Multi-Port Memory Cell Types

[Schematic: storage nodes D/#D with a single-ended write port (WWL selecting WBL) and two single-ended read ports (RWL0 driving RBL0, RWL1 driving RBL1).]

1 Write (SE), 2 Read (SE)

Foil # 19 / 58 The University of Texas at Austin EE 382M Class Notes

Relative Memory Cell Sizes
Dimensions in M1 pitches (assume the same M1 pitch for all cells).

Cell     WL Dir   BL Dir   Area
1T       1        1.5      1.5
4T       3        4        12
6T       4        6        24
4R/2W    9        9        81
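The Area column is simply the product of the two dimensions; a minimal Python sketch (values copied from the table above) that recomputes it:

```python
# Relative cell dimensions in M1 pitches: (wordline direction, bitline direction)
cells = {"1T": (1, 1.5), "4T": (3, 4), "6T": (4, 6), "4R/2W": (9, 9)}

for name, (wl_dir, bl_dir) in cells.items():
    area = wl_dir * bl_dir  # relative area in M1-pitch^2
    print(f"{name:6s} {wl_dir} x {bl_dir} = {area:g}")
```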

Foil # 20 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (4 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 21 / 58 The University of Texas at Austin EE 382M Class Notes

Array Design Choices

• Decoders
  – Predecoder & banked WL drivers - for a large number of rows
  – Hierarchical WL & WL repeaters - for a large number of columns

• Cells
  – Differential - for few ports and large array size
  – Single-ended - for many ports or small array size

• Bitlines
  – Hierarchical - for many rows & available higher metal
  – Serial - for a large number of rows & no higher metal

• Column Muxing
  – Differential - group by bit
  – Single-ended - group by entry

Foil # 22 / 58 The University of Texas at Austin EE 382M Class Notes

Basic Array Characteristics

• Array Size
  – Number of entries
  – Bits per entry

• Number of Ports
  – Number of simultaneous reads
  – Number of simultaneous writes

• Latency
  – Cycles from address to read data
  – Cycles from address to write completed

Foil # 23 / 58 The University of Texas at Austin EE 382M Class Notes

Basic Array Layout

[Figure: a grid of cells arranged in rows and columns. The address enters a pre-decoder and decoder that drive the wordlines (one per row); precharge circuits sit at the top of the bitlines (one per column); bitline receivers and write buffers at the bottom connect the columns to the read data and write data.]

Foil # 24 / 58 The University of Texas at Austin EE 382M Class Notes

Large Signal vs Small Signal Arrays

Small Signal Arrays
• Differential bitlines (Bit/Bit#)
• Dual-ended sense amplifier driving the data output

Large Signal Arrays
• Single-ended bitline (Bit#)
• Inverter threshold sense driving the data output

Foil # 25 / 58 The University of Texas at Austin EE 382M Class Notes

Large Signal vs Small Signal Arrays

• Small Signal Arrays:
  – DRAM and SRAM chips
  – Processor D-cache and I-cache

• Large Signal Arrays:
  – Processor register files
  – Multi-ported data structures

• Small Signal Arrays are less common because:
  – Sense amps require special characterization
  – More sensitive to noise
  – Area and timing overhead of the differential sense amp
  – May not scale well to low supply voltage

Foil # 26 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (5 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 27 / 58 The University of Texas at Austin EE 382M Class Notes

Register File Bitline Segmentation

• Problem: In general, long bitlines cause very slow edge rates
  – May consider converting to an SSA design approach

• However, very short bitlines cause the overall area to increase
  – Array efficiency goes down; wastes valuable silicon area

• Solution: Break up the bitline depth to determine the optimal design point
  – Divide it into smaller sections & recombine with a "wire-OR"

• Example #1 shows 16 memory cells on a bitline which drives a dynamic "wire-OR" global bitline

• Example #2 shows a "serial" global bitline structure
  – The lower global bitline is in series with the upper global bitline, with a receiver and NMOS pulldown device in the center (acts like a "repeater")

Foil # 28 / 58 The University of Texas at Austin EE 382M Class Notes

Register File Segmentation Example #1

[Figure: 16 memory cells share a local bitline; the local-bitline receiver drives a dynamic "wire-OR" global bitline, precharged by #pc and captured in a dynamic latch at the global bitline receiver.]

The global bitline acts as a dynamic "wire-OR" of the 16-cell local bitline segments.

Foil # 29 / 58 The University of Texas at Austin EE 382M Class Notes

Register File Segmentation Example #2

• Serial global bitline

[Figure: two local-bitline segments, each with its own dynamic "wire-OR" and #pc precharge. The lower global bitline feeds a receiver and NMOS pulldown in the center, which drives the upper global bitline in series; the upper global bitline ends in a dynamic latch at the global bitline receiver.]

Foil # 30 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (5 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 31 / 58 The University of Texas at Austin EE 382M Class Notes

Array Area Estimation

• Cell Area
  – 6T bitcell dimensions are strongly dependent on technology
    • Need an actual layout study to determine the area
  – Multiported cells are wire-limited and can be easily calculated
    • Cell height is a function of {MH_Pitch*(Wordlines + Shields)}
    • Cell width is a function of {MV_Pitch*(Bitlines + Datalines + Shields)}

• Local Bitline Receivers and Dataline Drivers
  – Height of array is increased by local bitline receivers
    • NumReadPorts*NumEntries/CellPerLBL
  – Height of array is increased by local dataline drivers
    • NumWritePorts*NumEntries/CellPerLBL

Foil # 32 / 58 The University of Texas at Austin EE 382M Class Notes

Array Area Estimation

• Decoder & Wordline Repeaters

– Width of array is increased by the decoder

• Decoder width is a function of number of ports

• 20% of total array width is a reasonable estimate

– Width of array is increased by wordline repeaters

• Typically no more than 32 to 64 bitcells on a single wordline (limits rise/fall time of selected row)

Foil # 33 / 58 The University of Texas at Austin EE 382M Class Notes

Array Area Estimation
Cell Height & Width Calculation

Recall:

Cell Height = MH_Pitch*(Wordlines + Shields)
            = MH_Pitch*[(#R + #W) + WL_shield*(#R + #W + 1)]

Cell Width  = MV_Pitch*(Bitlines + Datalines + Shields)
            = MV_Pitch*[(#R + Rd_shield*#R + 1) + (#W + Wr_shield*#W + 1)]

Where:
  #R          Number of read ports
  #W          Number of write ports
  WL_shield   Read wordline shield factor
  Rd_shield   Read bitline shield factor
  Wr_shield   Write dataline shield factor
  MH_Pitch    Wordline pitch
  MV_Pitch    Bitline pitch

Foil # 34 / 58 The University of Texas at Austin EE 382M Class Notes

Array Area Estimation

Consider: 3 read ports & 2 write ports, 16 bits, 64 entries

Cell Height = MH_Pitch*(Wordlines + Shields)
            = MH_Pitch*[(#R + #W) + WL_shield*(#R + #W + 1)]
            = 0.2um * [(3 + 2) + (5 shields + 1)] = 2.20um

Cell Width  = MV_Pitch*(Bitlines + Datalines + Shields)
            = MV_Pitch*[(#R + Rd_shield*#R + 1) + (#W + Wr_shield*#W + 1)]
            = 0.2um * [(3 + 0.5*3 + 1) + (2 + 0.5*2 + 1)] = 1.90um

• Sub-array dimensions are:

X = 16 * Cell_Width  = 16 * 1.90um = 30.4um
Y = 64 * Cell_Height = 64 * 2.20um = 140.8um
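A minimal Python sketch of the same calculation (the function and argument names are mine; the pitch, shield, and port values are taken from the example above):

```python
def rf_cell_dims(n_read, n_write, mh_pitch, mv_pitch,
                 wl_shield=1.0, rd_shield=0.5, wr_shield=0.5):
    """Multiported register-file cell height/width (um) from wire pitches."""
    # Height: one wordline track per port, plus shields between/around them.
    height = mh_pitch * ((n_read + n_write) + wl_shield * (n_read + n_write + 1))
    # Width: read bitlines + write datalines, each with their shield factors.
    width = mv_pitch * ((n_read + rd_shield * n_read + 1) +
                        (n_write + wr_shield * n_write + 1))
    return height, width

# 3 read ports, 2 write ports, 0.2 um metal pitches, 16 bits x 64 entries
h, w = rf_cell_dims(3, 2, mh_pitch=0.2, mv_pitch=0.2)
print(f"cell: {w:.2f} um wide x {h:.2f} um tall")        # 1.90 um x 2.20 um
print(f"sub-array: {16 * w:.1f} um x {64 * h:.1f} um")   # 30.4 um x 140.8 um
```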

Foil # 35 / 58 The University of Texas at Austin EE 382M Class Notes

SRAM Array Area Estimation

Estimate the subarray first:
1. # of 6T bitcells * bitcell area + wordline & column decoders + sense amps + read/write sequentials.
2. The decoders + sense amps + sequentials are typically 15% of the subarray bitcell area.
3. Use an "array efficiency" factor to calculate the total SRAM array area; this includes clock buffers, address decoders, control logic, repeaters, routing, etc.; typical numbers are in the range of ~60%.

EXAMPLE:

• A 16KB L1 cache with four 4KB subarrays; each subarray is comprised of 128 bitcells/column and 256 bitcells/wordline; the 6T bitcell area in this 65 nm CMOS technology is 0.68 μm².

Bitcell subarray = 0.68 μm² * 128 * 256 = 22,282 μm²

Subarray = 1.15 * 22,282 = 25,624 μm²

4 subarrays = 4 * 25,624 = ~102,500 μm²

16KB L1 cache = 102,500 / 0.60 = 170,833 μm² or ~0.17 mm²
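A minimal Python sketch of this estimation flow (the function name and keyword arguments are mine; the 15% periphery overhead and ~60% array efficiency are the rules of thumb from this foil):

```python
def sram_area_um2(bitcell_area_um2, rows, cols, n_subarrays,
                  periphery_overhead=0.15, array_efficiency=0.60):
    """Early SRAM macro area estimate from bitcell area and organization."""
    bitcell_array = bitcell_area_um2 * rows * cols       # raw bitcell area
    subarray = bitcell_array * (1 + periphery_overhead)  # + decoders, sense amps, sequentials
    all_subarrays = subarray * n_subarrays
    return all_subarrays / array_efficiency              # + clocks, control, routing, etc.

# 16KB L1: four 4KB subarrays, 128 rows x 256 columns, 0.68 um^2 bitcell (65 nm)
area = sram_area_um2(0.68, rows=128, cols=256, n_subarrays=4)
print(f"{area:,.0f} um^2  (~{area / 1e6:.2f} mm^2)")  # ~0.17 mm^2, matching the estimate above
```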

Foil # 36 / 58 The University of Texas at Austin EE 382M Class Notes

Floorplan Options

[Figure: possible large-signal array floorplans assembled from DECODE, precharge (Pchg), sub-array, read block, write driver, and CTL blocks in different arrangements.]

Possible Large-Signal Array Floorplans
• The Array Area Calculator provides dimensions for these blocks

Foil # 37 / 58 The University of Texas at Austin EE 382M Class Notes

Floorplanning Tool: Structured Datapath

Foil # 38 / 58 The University of Texas at Austin EE 382M Class Notes

Sample Floorplan
Generated from a floorplanning CAD tool

[Figure: floorplan showing bitslices, rwldrv, wwldrv, decode, and merge logic blocks.]

Foil # 39 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (5 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 40 / 58 The University of Texas at Austin EE 382M Class Notes

Access Time Estimation

Break into components:
  access time = wordline driver + wordline RC delay + column fall time + column mux + setup

Wordline RC delay (example): 128 bitcells in a row

[Figure: distributed RC line driven by clk, with per-cell segments R1..R128 and C1..C128, observed at node V128.]

• RT = Σ Ri = 140 mΩ/sq * 348μm / 0.1μm = 487.2 Ω

• CT = Σ Ci = CM1 + Cgate
     = 348μm * 0.23fF/μm + 128*(2*0.5μm)*2.0fF/μm
     = 80fF + 256fF = 336fF

• trow = 0.38 * RT * CT = 62ps (50% point of the rising waveform)
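A minimal Python sketch of this lumped wordline RC estimate (names are mine; the sheet resistance, wire and gate capacitance, and geometry values are those used above):

```python
def wordline_delay_ps(n_cells=128, length_um=348.0, wire_width_um=0.1,
                      rsheet_ohm_sq=0.140, cwire_ff_per_um=0.23,
                      gate_width_um=2 * 0.5, cgate_ff_per_um=2.0):
    """50% delay of a distributed RC wordline: t ~= 0.38 * R_total * C_total."""
    r_total_ohm = rsheet_ohm_sq * length_um / wire_width_um      # 487.2 ohm
    c_total_ff = (length_um * cwire_ff_per_um                    # metal-1 wire cap
                  + n_cells * gate_width_um * cgate_ff_per_um)   # 128 passgate loads
    return 0.38 * r_total_ohm * c_total_ff * 1e-3                # ohm*fF = fs; /1000 -> ps

print(f"t_row ~= {wordline_delay_ps():.0f} ps")   # ~62 ps
```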

Foil # 41 / 58 The University of Texas at Austin EE 382M Class Notes

Access Time Estimation
Column Fall Time

• Assume the bitline is discharged linearly; then we can use dV/dt = Iread / CBL

• With WL = VDD, Iread = 0.5μm * 600μA/μm and CBL = 68fF (CJ = 1.25fF/μm²):

  dV/dt = (0.5μm * 600μA/μm) / 68fF = 4.41 V/ns

• The bitline falls to VDD/2 = 1.0V/2 in 113ps

[Figure: bitline voltage vs. time, falling from 1.0V at 4.41 V/ns and crossing the VDD/2 (50%) point at 113ps.]
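The matching sketch for the column fall time, using the linear-discharge assumption above (names are mine; the device width, drive current, bitline capacitance, and VDD are the foil's values):

```python
def bitline_fall_time_ps(vdd_v=1.0, width_um=0.5, idsat_ua_per_um=600.0,
                         c_bitline_ff=68.0):
    """Time for the bitline to fall to VDD/2, assuming a constant read current."""
    i_read_a = width_um * idsat_ua_per_um * 1e-6        # 300 uA
    dv_dt_v_per_s = i_read_a / (c_bitline_ff * 1e-15)   # ~4.41e9 V/s (4.41 V/ns)
    return (vdd_v / 2) / dv_dt_v_per_s * 1e12           # seconds -> ps

print(f"column fall time ~= {bitline_fall_time_ps():.0f} ps")   # ~113 ps
```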

Foil # 42 / 58 The University of Texas at Austin EE 382M Class Notes

Access Time Estimation

Sum up the components of delay; assume an inverter delay of 40ps, a nand2 delay of about 60ps, and a setup into the latch of 30ps:

Taccess = wordline driver + wordline delay + column delay + column mux + setup

        = (60ps + 40ps) + 62ps + 113ps + 60ps + 30ps

        = 365ps

This should easily meet the machine cycle time since the frequency is low. However, the calculated 365ps is only the READ-ACCESS time; wire routing and data-capture budgets have not been factored in yet. It may be possible to use a "high Vt" device if it is available from the fab.
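The same sum as a short Python sketch (component values as assumed above):

```python
# Read-access components from the foils above, in picoseconds
components_ps = {"nand2": 60, "inverter": 40, "wordline_rc": 62,
                 "column_fall": 113, "column_mux": 60, "latch_setup": 30}
print(f"T_access = {sum(components_ps.values())} ps")  # 365 ps
```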

Foil # 43 / 58 The University of Texas at Austin EE 382M Class Notes

Preliminary Power Estimation

• Most power dissipation for an array occurs in the bitlines and sense amplifiers

• Calculate the total bitline capacitance
  – {Metal2 bitline cap} + {junction cap} x {number of bitcells}

• Calculate the sense-node capacitive load to include in the power dissipation

• For power dissipation, use the approximation:

  Pdiss = a * Ctotal * (Vsupply)² * frequency

  where a is the "Activity Factor", 0 < a < 1

• Memory cells can contribute significant D.C. power due to leakage from many cells in standby; be sure to take that into account

Pstatic = Ileakage * VDD
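A minimal sketch of these two approximations (the function name and the example capacitance, activity factor, frequency, and leakage values are mine, for illustration only):

```python
def array_power_w(c_total_f, vdd, freq_hz, activity, i_leak_a):
    """P_dyn = a*C*V^2*f plus P_static = I_leak*VDD."""
    p_dyn = activity * c_total_f * vdd ** 2 * freq_hz
    p_static = i_leak_a * vdd
    return p_dyn, p_static

# Illustrative numbers: 50 pF switched, 1.0 V, 2 GHz, a = 0.2, 5 mA standby leakage
p_dyn, p_static = array_power_w(50e-12, 1.0, 2e9, 0.2, 5e-3)
print(f"P_dyn = {p_dyn * 1e3:.0f} mW, P_static = {p_static * 1e3:.0f} mW")  # 20 mW, 5 mW
```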

Foil # 44 / 58 The University of Texas at Austin EE 382M Class Notes

Class Agenda

• Memory Hierarchy (6 foils)

• Memory Cell Types (9 foils)

• Basic Array Structure (5 foils)

• Bitline Segmentation (3 foils)

• Area Estimation (7 foils)

• Access Time & Power Estimation (4 foils)

• Clock & Power Distribution (4 foils)

Foil # 45 / 58 The University of Texas at Austin EE 382M Class Notes

Local Clock Distribution

• At high frequencies, clock uncertainties become a significant portion of the cycle time (10-15% of cycle time or more)

• Important to define the overall clocking scheme and distribution before implementation begins

• Clock inaccuracy is composed of 2 major sources:
  – Clock jitter: due to the PLL, DLL, etc.
  – Clock skew: mismatches in the clock buffer tree, load, inductance, or variances due to process (Leff is not constant), VDD (it is not constant), and local temperature

• A global clock grid that distributes to local clock buffers requires large overhead but helps minimize clock skew
  – LCBs are evenly distributed within the array block and tap off of the global clock grid with a minimum route

Foil # 46 / 58 The University of Texas at Austin EE 382M Class Notes

LCB Placement

[Figure: two array floorplans with local clock buffers (LCBs) distributed throughout: next to the bitcell arrays, the port 0 and port 1 decoders, the port 0/1 input data latches, the port 0/1 read/write circuits, and the port 0/1 output latches.]

A large number of LCBs minimizes the wire load from each LCB to its sequentials, thus reducing skew variance.

Foil # 47 / 58 The University of Texas at Austin EE 382M Class Notes

SAMPLE Power/Ground GRID

Shielding takes up significant routing resources. Global M6 routes over the array should have minimal coupling noise to the array bitlines.

[Figure: cross-section of a 48λ-wide power/ground grid with signal tracks interleaved between VSS and VDD rails; each signal is flanked by Vss shields (full shielding, MCF = 1.0), with track widths/spaces of λ and 2λ.]

* Where λ is the minimum critical dimension for width/space

Foil # 48 / 58 The University of Texas at Austin EE 382M Class Notes

Power/Clock Grid

• The clock grid is interleaved between VDD and VSS on metal 6

[Figure: the same LCB-placement floorplan as the earlier foil, shown with the power/clock grid routed over the bitcell arrays, decoders, latches, and read/write circuits.]

Foil # 49 / 58 The University of Texas at Austin EE 382M Class Notes

BACKUP

Foil # 50 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Array Performance

• Optimization of memory arrays and caches requires careful analysis of:
  – Size and speed of the array, which impacts:
    • Power: static and dynamic
    • Latency: number of clocks to access the memory cell
    • Area and aspect ratios
    • Redundancy
  – Hit rate (caches): requires additional logic and tag arrays
  – Architecture: How many levels of caching?

• In addition need to account for array BIST. This requires additional logic and impacts performance.

Foil # 51 / 58 The University of Texas at Austin EE 382M Class Notes

Memory Array Performance

Foil # 52 / 58 The University of Texas at Austin EE 382M Class Notes

Array Redundant Elements

[Figure: the basic array layout (pre-decoder, decoder, wordlines, rows and columns of cells, precharge, bitline receivers and write buffers, read/write data) augmented with a redundant address & enable, a redundant wordline & driver, and a redundant column & bitslice.]

Account for the area overhead if redundancy is used for repair.

Foil # 53 / 58 The University of Texas at Austin EE 382M Class Notes

Trade-offs

Large Signal Arrays:
• Simplest sense scheme (single-ended bitlines)
• Good noise margin (Vdd/2 threshold)
• Lower bitcell density (used for small queues & register files; 8 ~ 32 cells on a bitline)
• Static timing analysis works
• Multi-ported: usually single-ended, many READ/WRITE ports

Small Signal Arrays:
• Need a sense amplifier (dual-ended bitlines)
• Noise-sensitive (few hundred millivolts ΔV)
• Highest bitcell density (used for large 1st & 2nd level cache arrays; 64, 128, 256 or more cells on a bitline)
• Static timing analysis difficult
• Single-ported: usually dual-ended, 1 ~ 3 ports

Foil # 54 / 58 The University of Texas at Austin EE 382M Class Notes

Dual-Ended Cell Column Muxing

[Figure: two organizations of a 128-entry x 2-bit array. Left: 128 rows x 2 columns, with read and write decoders on Addr[6:0] and Data[1:0] at the bottom of the columns. Right: 32 rows x 8 columns, with read and write decoders on Addr[6:2], cells grouped by bit (Bit 0, Bit 1), and 4:1 column muxes selected by Addr[1:0] producing Data[0] and Data[1].]

For minimum delay, the cell array should be roughly square.

Foil # 55 / 58 The University of Texas at Austin EE 382M Class Notes

Single Ended Cell Column Muxing

Single-ended arrays must group bits of the same entry together, to drive write wordlines only on the cells of one entry.

[Figure: left, 128 rows x 2 columns with read and write decoders on Addr[6:0] and Data[1:0]. Right, 32 rows x 8 columns with the read decoder on Addr[6:2], cells grouped by entry (Entry A, B, C, D), per-entry write decoders, and 4:1 column muxes selected by Addr[1:0] producing Data[0] and Data[1].]

Foil # 56 / 58 The University of Texas at Austin EE 382M Class Notes

Dual Ended vs Single Ended Column Muxing

Dual-Ended Cells
• Same bit of different entries grouped together (A0 B0 C0 D0 | A1 B1 C1 D1)
• Write data driven only on some columns
• Write wordline "on" for the entire row

Single-Ended Cells
• Different bits of the same entry grouped together (A0 A1 | B0 B1 | C0 C1 | D0 D1)
• Write data can be driven on every column
• Write wordline "on" for only one entry

[Figure: both organizations share a read WL and use 4:1 column muxes to produce Data[0] and Data[1]; the dual-ended case has a single write WL per row, while the single-ended case has separate write WLs per entry.]

Foil # 57 / 58 The University of Texas at Austin EE 382M Class Notes

Segmentation Guidelines

• Design considerations for segmenting the bitlines are based on variables such as:
  – Number of entries
  – Number of ports
  – Number of bits

• Processor architecture and manufacturing technology also contribute to design decisions
  – For example, a high-leakage process may limit the number of cells on a bitline before losing state

• The following table is a guideline to help determine how to divide up the bitlines for optimum performance
  – The final decision will be based on careful HSPICE simulations of the different options over PVT variations

Foil # 58 / 58 The University of Texas at Austin EE 382M Class Notes

Table of Guidelines
(rows: number of ports; columns: number of entries)

Ports 1-7:
• <=64 entries: Single array; split LBL with a maximum of 8 bits per LBL in M2; each to a NAND2 receiver followed by a latch; GBL to the input of a latch at the bottom in M4; 1-cycle latency is assumed.
• <=128 entries: Split into 2 sub-arrays with 64 entries each; LBL and GBL should follow the guidelines for similar ports; output of GBL to a NAND2 between sub-arrays; single-cycle latency is assumed.
• <=256 entries: LBL and GBL guidelines are the same as <=64 entries with similar ports; stacked twice for 256 entries; 2:1 mux between the two 128-entry sub-arrays; at least two-cycle latency is required.

Ports 8-16:
• <=64 entries: Single array; split LBL with a maximum of 8 bits per LBL in M2; each to a NAND2 receiver followed by a latch; split GBLs are routed in M4 to a NAND2 with a dynamic latch in the middle; latched outputs to destination drivers in M4 (or M3).
• <=128 entries: Split into 2 sub-arrays with up to 64 entries each; LBL and GBL should follow the guidelines for entries with similar ports; output of GBL to a dynamic latch followed by latches; two-cycle latency is assumed.
• <=256 entries: LBL and GBL guidelines are the same as <=64 entries with similar ports; stacked twice for 256 entries; 2:1 mux between the two 128-entry sub-arrays; more than 2-cycle latency is required.

Ports 17-21:
• <=64 entries: Single array; split LBL with a maximum of 8 bits per LBL in M2; each to a NAND2 receiver followed by a dynamic wire-OR; split GBLs are routed in M4 (or M2) to a NAND2 with a latch in the middle; latched outputs to destination drivers in M4 (or M3); a maximum of 48 entries can be supported for this many ports.
• <=128 entries: Split into 2 sub-arrays with up to 48 entries each; LBL and GBL should follow the guidelines for similar ports; output of GBL to a dynamic latch followed by latches; at least 2-cycle latency is assumed.
• <=256 entries: LBL and GBL guidelines are the same as <=64 entries with similar ports; stacked twice for 256 entries; 2:1 mux between the two 128-entry sub-arrays; more than 2-cycle latency is required.