CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications

Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem

Dept. of Computer Science, University of Pittsburgh


ISPASS 2007

Search ops in applications

Search (or lookup) operations are an important, common function in many applications

Network packet processing
• For each arriving packet, determine the output port
• Given packet information, find a matching classification rule
• Each lookup can incur many memory accesses

Speech recognition
• Searching (e.g., dictionary lookup) takes up ~24% of CPU cycles

Forthcoming RMS (Recognition, Mining, and Synthesis) apps


Search performance and power

Search performance must match increasing line speeds
• For OC-768, up to 104M packets must be processed per second
• Network traffic has doubled every year [McKeown03]
• Routing tables (~200K prefixes in a core router) are growing [RIS]
• IPv6
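The 104M figure is consistent with the OC-768 line rate if each minimum-size packet occupies about 48 bytes on the wire. That 48-byte figure (a 40-byte IP packet plus roughly 8 bytes of framing) is our assumption, not something stated on the slide. A quick check, which also gives the per-packet time budget a lookup engine would have to meet:

```python
# Back-of-the-envelope lookup budget for OC-768.
# Assumption: ~48 bytes per minimum-size packet on the wire (40-byte IP packet + framing).
OC768_LINE_RATE_BPS = 39_813.12e6   # SONET OC-768 line rate in bits/s
MIN_PACKET_BYTES = 48               # assumed minimum on-the-wire packet size

packets_per_second = OC768_LINE_RATE_BPS / (MIN_PACKET_BYTES * 8)
budget_ns = 1e9 / packets_per_second  # time available per lookup at full line rate

print(f"{packets_per_second / 1e6:.1f} Mpps, {budget_ns:.2f} ns per lookup")
# prints "103.7 Mpps, 9.65 ns per lookup"
```

With under 10 ns per packet, a lookup that needs many sequential memory accesses quickly exceeds the budget.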

Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]

Search in battery-operated devices should be energy-efficient

Conventional search solutions
• Software methods (tries, hash tables, …)
• Hardware methods (CAM, TCAM, …)


IP lookup using a trie

Consider an IP address: 0 1 0 0 0 1 1 0

Software approach is “flexible”

high memory capacity requirement

high memory bandwidth requirement

not SCALABLE
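To make the cost concrete, here is a minimal binary-trie longest-prefix match. It reuses the prefix set listed on the TCAM slide that follows and the example address 01000110; using each prefix string as its own "port" label is a hypothetical stand-in for real output ports. In a software implementation, each trie node visited corresponds to one memory access.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # '0' / '1' -> TrieNode
        self.port = None     # set when a stored prefix ends at this node

def insert(root, prefix, port):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.port = port

def longest_prefix_match(root, addr):
    """Walk the trie bit by bit, remembering the last prefix seen. Returns (port, nodes visited)."""
    node, best, visited = root, None, 0
    for bit in addr:
        node = node.children.get(bit)
        if node is None:
            break
        visited += 1          # one node fetch = one memory access (root not counted)
        if node.port is not None:
            best = node.port
    return best, visited

PREFIXES = ["110100", "110101", "110111", "01000", "01100", "01101",
            "11011", "0100", "0110", "1101", "10", "0"]
root = TrieNode()
for p in PREFIXES:
    insert(root, p, p + "*")  # hypothetical: the prefix itself serves as the port label

print(longest_prefix_match(root, "01000110"))  # ('01000*', 5)
```

The number of accesses grows with prefix length (up to 32 levels for IPv4 in a one-bit trie), which is the source of the capacity and bandwidth concerns listed above.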


IP lookup using TCAM

Consider an IP address: 0 1 0 0 0 1 1 0

TCAM entries: 110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*

• Sort entries before storing (longest prefix first)
• Choose the first among the matched entries
• High-bandwidth, constant-time lookup

TCAMs are relatively small and expensive

Power consumption is very high

not SCALABLE
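A small software model of the TCAM scheme above. Entries are stored longest-first as ternary strings ('*' = don't care), every entry is compared against the key (in hardware all at once, which is where the power goes), and a priority encoder returns the first hit. The 8-bit width and the padding with don't-care bits are illustrative choices.

```python
WIDTH = 8  # illustrative key width

# Entries sorted longest-prefix-first before storing, padded with don't-care bits.
ENTRIES = [p.ljust(WIDTH, "*") for p in
           ["110100", "110101", "110111", "01000", "01100", "01101",
            "11011", "0100", "0110", "1101", "10", "0"]]

def ternary_match(entry, key):
    """A stored symbol matches if it equals the key bit or is a don't-care."""
    return all(e == "*" or e == k for e, k in zip(entry, key))

def tcam_lookup(entries, key):
    match_lines = [ternary_match(e, key) for e in entries]  # every row is searched
    for index, hit in enumerate(match_lines):               # priority encoder: first hit wins
        if hit:
            return index, entries[index]
    return None

print(tcam_lookup(ENTRIES, "01000110"))  # (3, '01000***')
```

Note that all twelve entries are evaluated even though only one result is used; that per-entry work is what limits TCAM capacity and drives its power.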


CA-RAM – a hybrid approach

Can we do better than the existing conventional schemes?
• CAM-like search performance
• RAM-like cost and power

CA-RAM combines hashing w/ hardware parallel matching

CA-RAM design goals
• High lookup performance
• Low power consumption
• Smaller chip area per stored datum
• Straightforward system-level integration


Talk roadmap

What is CA-RAM?

Prototype design

Case study 1: IP lookup

Case study 2: Trigram lookup for speech recognition


CA-RAM – Content Addressable RAM

• Separate match logic and memory
• Match logic for a single row, not every row
• Allows the use of dense RAM technology
• Enables highly reconfigurable match logic
• Keep keys sorted in each row, not in the entire array

[Figure: conventional CAM/TCAM vs. CA-RAM, contrasting how match logic and memory cells are arranged]


Very simple, yet efficient

• Use hashing to store keys in a particular row
• To look up, hash the search key and retrieve one row
• Perform matching on the entire row in parallel
• Achieve full content addressability without paying the CAM overhead of match logic on every row!

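[Figure: the search key enters an index generator, which selects one row (Key_i1, Key_i2, … or Key_j1, Key_j2, …); match processors 1, 2, … compare the search key against that row's keys in parallel]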
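The lookup flow above can be modeled in a few lines. This is a toy sketch, not the prototype: the CARAM class, its row_width parameter, and the identity hash in the usage are hypothetical choices for illustration. It shows that a lookup reads exactly one row and matches every key in it, so match logic is needed for one row only.

```python
class CARAM:
    """Toy CA-RAM: a hash selects one row; all keys in that row are matched together."""

    def __init__(self, num_rows, row_width, hash_fn):
        self.rows = [[] for _ in range(num_rows)]  # each row holds (key, data) pairs
        self.row_width = row_width                  # key slots per row (bucket size)
        self.hash_fn = hash_fn                      # index generator: key -> integer

    def _row(self, key):
        return self.rows[self.hash_fn(key) % len(self.rows)]

    def insert(self, key, data):
        row = self._row(key)
        if len(row) >= self.row_width:
            return False          # bucket overflow; must be handled separately
        row.append((key, data))
        return True

    def lookup(self, key):
        row = self._row(key)      # a single memory row access
        # Hardware compares all slots of the row in parallel; Python just scans them.
        for stored_key, data in row:
            if stored_key == key:
                return data
        return None

table = CARAM(num_rows=4, row_width=2, hash_fn=lambda k: k)
print(table.insert(1, "a"), table.insert(5, "b"), table.insert(9, "c"))  # True True False
print(table.lookup(5), table.lookup(9))                                # b None
```

The rejected insert of key 9 (all three keys hash to row 1, which holds only two) is the bucket-overflow case addressed under "Dealing w/ bucket overflows".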


Pipelined CA-RAM operation

[Figure: pipelined CA-RAM; successive search keys flow through the index generator, the memory rows, and match processors 1–3 to produce results]

Step 1: Index generation
Step 2: Memory access
Step 3: Key matching
Step 4: Result forwarding
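Assuming each of the four steps takes one cycle and no lookup stalls (for example, no overflow chaining), a new lookup can enter the pipeline every cycle. The sketch below lists which lookup occupies each stage in each cycle; the one-cycle-per-stage timing is our simplification.

```python
STAGES = ["Index generation", "Memory access", "Key matching", "Result forwarding"]

def pipeline_schedule(num_lookups):
    """Per cycle, the lookup number in each stage (None = stage idle)."""
    total_cycles = num_lookups + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        # Lookup k enters stage s at cycle k + s.
        schedule.append([cycle - s if 0 <= cycle - s < num_lookups else None
                         for s in range(len(STAGES))])
    return schedule

for cycle, stages in enumerate(pipeline_schedule(5)):
    print(cycle, stages)
```

Five lookups finish in 8 cycles rather than 20: in steady state one result emerges per cycle, while each individual lookup still takes four cycles.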


Dealing w/ bucket overflows

Careful design of the hash function

Increase bucket size
• Reduce the load factor (α); α = # of occupied entries / # of total entries

Use “chaining”; store overflows in subsequent rows
• Multiple accesses per lookup

Use a small overflow CAM, accessed in parallel
• Similar to popular “victim caching”

Use two-level hashing and employ multiple CA-RAM banks

……
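The chaining option and the load factor α can be illustrated with a small model. Assumptions not taken from the slides: keys are integers, the hash is key mod #rows, and an overflowing key goes to the next row with a free slot. Counting rows read per lookup shows why chaining costs multiple accesses.

```python
def build_chained(keys, num_rows, row_width):
    """Store each key in its home row; on overflow, chain into the following rows."""
    rows = [[] for _ in range(num_rows)]
    for key in keys:
        home = key % num_rows                      # hypothetical hash function
        for step in range(num_rows):
            row = rows[(home + step) % num_rows]
            if len(row) < row_width:
                row.append(key)
                break
        else:
            raise ValueError("table is full")
    return rows

def row_accesses(rows, key):
    """Rows read, starting at the home row, until the key is found (None if absent)."""
    home = key % len(rows)
    for step in range(len(rows)):
        if key in rows[(home + step) % len(rows)]:  # whole row matched at once
            return step + 1
    return None

def load_factor(rows, row_width):
    """alpha = occupied entries / total entries."""
    return sum(len(r) for r in rows) / (len(rows) * row_width)

rows = build_chained([0, 4, 8, 1], num_rows=4, row_width=2)
print(rows)                                            # [[0, 4], [8, 1], [], []]
print([row_accesses(rows, k) for k in (0, 4, 8, 1)])   # [1, 1, 2, 1]
print(load_factor(rows, 2))                            # 0.5
```

Even at α = 0.5, key 8 needs two accesses because its home row filled up. With a small overflow CAM searched in parallel instead, key 8 would be found in the CAM during the same access, keeping every lookup at one row read.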


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

……


Adapting key size

[Figure: reconfigurable match logic selects which key bits of a row (Key_i1, Key_i2, Key_i3, …) to use for matching and outputs match information]

• Adapting key size is straightforward
• Will benefit support for multiple apps/standards
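One way to picture adapting key size: the same physical row is sliced into more or fewer key slots depending on a configured key width. In this sketch the 32-bit row and the function names are illustrative assumptions; in CA-RAM the selection of key bits happens in the match logic hardware.

```python
def split_row(row_bits, row_width_bits, key_bits):
    """Slice a memory row into equal key slots; key_bits is the reconfigurable parameter."""
    slots = row_width_bits // key_bits
    mask = (1 << key_bits) - 1
    return [(row_bits >> (i * key_bits)) & mask for i in range(slots)]

def match_vector(row_bits, row_width_bits, key_bits, search_key):
    """One match bit per slot, as produced by the match logic."""
    return [slot == search_key for slot in split_row(row_bits, row_width_bits, key_bits)]

ROW = 0x12345678  # one 32-bit row
print([hex(k) for k in split_row(ROW, 32, 8)])   # ['0x78', '0x56', '0x34', '0x12']
print([hex(k) for k in split_row(ROW, 32, 16)])  # ['0x5678', '0x1234']
print(match_vector(ROW, 32, 8, 0x34))            # [False, False, True, False]
```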


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

Binary and ternary matching
• Some apps require ternary matching, some don’t

……


Supporting binary/ternary matching

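[Figure: reconfigurable match logic compares the search key against each stored key (Key_i1, Key_j1, …) together with its mask (Mask_i1, Mask_j1, …) and outputs match information]

• Developed a configurable comparator
• Ternary matching requires 2 bits per symbol
• Supporting different types of matching in different bit positions is feasible

[Figure: comparator circuit; each key bit K_i is XORed with the search key bit, a 2:1 ternary multiplexer driven by mask bit M_i and mode bit TM_i decides whether the mask bit is considered, and the per-bit outputs MATCH_0 … MATCH_N-1 are combined into MATCH]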
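A bit-level model of the configurable comparator, under our reading of the figure: each key bit is XORed with the search bit, and a per-bit ternary-mode signal (TM_i) selects whether the stored mask bit (M_i, taken here as 1 = don't care) can force a match. Storing a value bit plus a mask bit is why ternary matching needs 2 bits per symbol. Signal polarities and function names are assumptions.

```python
def bit_match(k, s, m, tm):
    """MATCH_i: key bit equals search bit, or ternary mode is on and the mask marks don't-care."""
    return (k ^ s) == 0 or (tm == 1 and m == 1)

def configurable_match(key, search, mask, ternary_mode, width):
    """AND the per-bit MATCH_0 .. MATCH_{N-1} signals into one MATCH output."""
    return all(
        bit_match((key >> i) & 1, (search >> i) & 1, (mask >> i) & 1, (ternary_mode >> i) & 1)
        for i in range(width)
    )

# Stored key 1010 with bit 0 masked; search key 1011 differs only in bit 0.
print(configurable_match(0b1010, 0b1011, 0b0001, ternary_mode=0b1111, width=4))  # True
print(configurable_match(0b1010, 0b1011, 0b0001, ternary_mode=0b0000, width=4))  # False
```

Because ternary_mode is set per bit, some bit positions can use ternary matching while others use exact binary matching.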


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

Binary and ternary matching
• Some apps require ternary matching, some don’t

Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for a lookup by half

……


Simult. key matching & data access

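[Figure: reconfigurable match logic matches the search key against stored keys (Key_i1, Key_j1, …) while the data stored next to them (Data_i1, Data_j1, …) bypasses the match logic; the output is the match result together with its data]

• In a TCAM-based system, the data access follows the TCAM lookup
• CA-RAM supports data embedding
• Cuts memory traffic and latency by half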


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

Binary and ternary matching
• Some apps require ternary matching, some don’t

Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half

Providing range checking capabilities
• Beneficial for rule-based packet filtering

……


Supporting range checking

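[Figure: reconfigurable match logic matches the search key against each stored key (Key_i1, Key_j1, …) and checks it against the range stored with that key (Range_i1, Range_j1, …)]

• In a TCAM, range checking causes trouble: entries must be expanded
• CA-RAM can support range checking efficiently: match the key and check the range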
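Why entries must be expanded in a TCAM: an arbitrary range has to be split into prefixes, each needing its own entry, whereas a CA-RAM entry can store the range bounds and compare against them directly. The sketch below uses a standard greedy range-to-prefix split (our choice of technique, not from the slides) to count the TCAM entries needed for a range.

```python
def range_to_prefixes(lo, hi, width):
    """Split [lo, hi] into aligned blocks, each expressible as one ternary prefix."""
    prefixes = []
    while lo <= hi:
        size = (lo & -lo) if lo else (1 << width)  # largest aligned block starting at lo
        while size > hi - lo + 1:                  # shrink until it fits inside the range
            size >>= 1
        wild = size.bit_length() - 1               # trailing don't-care bits
        fixed = format(lo >> wild, f"0{width - wild}b") if wild < width else ""
        prefixes.append(fixed + "*" * wild)
        lo += size
    return prefixes

def caram_range_match(key, lo, hi):
    """CA-RAM style: one stored entry holds (lo, hi); the match logic compares directly."""
    return lo <= key <= hi

print(range_to_prefixes(1, 14, 4))           # ['0001', '001*', '01**', '10**', '110*', '1110']
print(len(range_to_prefixes(1, 65534, 16)))  # 30 TCAM entries for one 16-bit port range
print(caram_range_match(1024, 1, 65534))     # True, with a single entry
```

The range [1, 65534] is the well-known worst case of 2W-2 prefixes for a W-bit field, which is typical of port ranges in packet filtering rules.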


CA-RAM-based memory subsystem

[Figure: CA-RAM-based memory subsystem; requests enter a request queue and an input controller that feeds an array of CA-RAM slices, an output controller gathers results into a result queue, and a configuration input sets up the slices]


Prototype implementation

We implemented a prototype CA-RAM slice design (w/ a degree of reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs

We used a standard-cell-based ASIC design flow (0.16 µm)

Step                     # cells   Area (µm²)   Delay (ns)
Expand search key          3,804       66,228       (0.89)
Calculate match vector     5,252       10,591         0.95
Decode match vector          899        1,970         1.91
Extract result             6,037       21,775         1.99
Total                     15,992      100,564         4.85
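The totals in the table can be re-derived as a sanity check. Our reading (an assumption) is that the parenthesized 0.89 ns for expanding the search key is not on the critical path, so the 4.85 ns total is the sum of the other three steps. The implied clock rates below are simple arithmetic, not figures reported for the prototype.

```python
# (cells, area in µm², delay in ns); None marks the parenthesized delay excluded from the total.
STEPS = {
    "Expand search key": (3_804, 66_228, None),
    "Calculate match vector": (5_252, 10_591, 0.95),
    "Decode match vector": (899, 1_970, 1.91),
    "Extract result": (6_037, 21_775, 1.99),
}

cells = sum(c for c, _, _ in STEPS.values())
area = sum(a for _, a, _ in STEPS.values())
delays = [d for _, _, d in STEPS.values() if d is not None]
total_delay = sum(delays)

print(cells, area, round(total_delay, 2))                   # 15992 100564 4.85
print(f"unpipelined: ~{1e3 / total_delay:.0f} MHz")         # ~206 MHz
print(f"one step per cycle: ~{1e3 / max(delays):.0f} MHz")  # ~503 MHz
```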


Area and power: CA-RAM vs. TCAM

[Figure: two bar charts comparing 16T SRAM-based TCAM, 8T DRAM-based TCAM, 6T DRAM-based TCAM, and DRAM-based ternary CA-RAM. Left: cell area (µm²) at 130 nm CMOS, annotated 4.5x and 11x. Right: power (W) for 4.5 Mb at 143 MHz, annotated 14x and 4x.]

CA-RAM area advantage 4.5x~11x

CA-RAM power advantage 4x~14x


Performance: CA-RAM vs. (T)CAM

Case study 1: IP lookup


Problem description

Given
• A set of prefixes (each prefix is associated with an output port number)
• An IP address

Find a prefix that matches the input IP address and return the output port number associated with it
• In the presence of multiple matching prefixes, choose the longest

Procedure
• Find a good hash function to distribute prefixes
• Determine the CA-RAM organization


Data set and hashing method

An IP core router’s table with 186,760 entries

Bit selection scheme [Zane et al. ’03]
• 98% of prefixes are at least 16 bits long
• Select hash bits from the first 16 bits (their low-order bits)
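A sketch of bit-selection hashing for IPv4. The slide does not say which bits are used; the 11 positions below (bits 26 down to 16, the low-order end of the first 16 bits, enough to index 2,048 rows) are a hypothetical choice. The second function shows why the 16-bit boundary matters: a selected bit that falls beyond a short prefix's length is a don't-care, so that prefix must be replicated into every row it could hash to.

```python
HASH_BITS = list(range(26, 15, -1))  # hypothetical: bits 26..16 of the address (bit 31 = MSB)

def bit_select_hash(addr, positions=HASH_BITS):
    """Concatenate the selected address bits into a row index."""
    index = 0
    for p in positions:
        index = (index << 1) | ((addr >> p) & 1)
    return index

def rows_for_prefix(prefix, length, positions=HASH_BITS):
    """Rows a prefix (32-bit value, host bits zero) must occupy; bits past its length are don't-cares."""
    free = [p for p in positions if p < 32 - length]  # selected bits not fixed by the prefix
    rows = set()
    for combo in range(1 << len(free)):
        addr = prefix
        for i, p in enumerate(free):
            addr |= ((combo >> i) & 1) << p  # enumerate every value of the free bits
        rows.add(bit_select_hash(addr, positions))
    return sorted(rows)

print(bit_select_hash(0xC0A80101))      # 168  (192.168.1.1)
print(rows_for_prefix(0xC0A80000, 16))  # [168]  (/16: one row)
print(rows_for_prefix(0xC0A80000, 14))  # [168, 169, 170, 171]  (/14: replicated)
```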


Shaping CA-RAM

Consider multiple design points, Designs A–F, built from 2,048-row (32 entries per row) and 4,096-row (64 entries per row) arrays:
Design A (α = 0.47), Design B (α = 0.40), Design C (α = 0.36), Design D (α = 0.36), Design E (α = 0.24), Design F (α = 0.36)


[Figure: performance, fraction of spilled entries (0–40%), and average memory access latency for Designs A–F (α = 0.47, 0.40, 0.36, 0.36, 0.24, 0.36) under “uniform” and “skewed” traffic]

With a properly chosen α, CA-RAM achieves a near-constant average memory access latency (AMAL)
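AMAL is a traffic-weighted average of the accesses each lookup needs. A minimal sketch, measuring latency in row accesses (our simplification) with made-up access counts and traffic weights: spilled entries hurt more when skewed traffic favors them, and with no spilled entries AMAL stays at one access whatever the traffic pattern.

```python
def amal(accesses, weights=None):
    """Traffic-weighted mean of per-entry access counts (uniform traffic if weights is None)."""
    if weights is None:
        weights = [1] * len(accesses)
    return sum(a * w for a, w in zip(accesses, weights)) / sum(weights)

accesses = [1, 1, 2, 1]  # hypothetical: the third entry spilled and needs a second access
skewed = [1, 1, 6, 1]    # hypothetical skewed traffic concentrated on that entry

print(amal(accesses))                    # 1.25
print(round(amal(accesses, skewed), 3))  # 1.667
print(amal([1, 1, 1, 1], skewed))        # 1.0
```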


Area and power

[Figure: relative area and power of TCAM vs. CA-RAM for Design B]

CA-RAM is advantageous over TCAM

Case study 2: Trigram lookup in speech recognition


Problem, data set, and hashing

Problem
• Look up a trigram in the trigram database

Data set
• A subset of the Sphinx trigram database
• We picked entries having 13~16 characters
• Still 5,385,231 entries, or 86 MB

Hashing
• DJB, an efficient string hash function
• (Used in Sphinx)
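For reference, the classic DJB string hash (djb2) starts from 5381 and computes h = h * 33 + c for each character c. The version below truncates to 32 bits and maps a trigram to a CA-RAM row by taking the hash modulo the number of rows. The exact variant and row mapping used with Sphinx are assumptions here, trigram_row is a hypothetical helper, and the sample trigram is made up.

```python
def djb2(text):
    """Classic DJB string hash: h = h * 33 + c, starting at 5381, kept to 32 bits."""
    h = 5381
    for c in text.encode("utf-8"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def trigram_row(trigram, num_rows):
    """Hypothetical mapping from a trigram string to a CA-RAM row index."""
    return djb2(trigram) % num_rows

print(djb2(""), djb2("a"))  # 5381 177670
print(trigram_row("one two three", 4096))
```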


Result


Data distribution


Area comparison

[Figure: relative area of CAM vs. CA-RAM]


CA-RAM conclusions

Compared w/ software methods
• Fewer memory accesses; higher lookup performance

Compared w/ CAM or TCAM
• Higher density, matching that of DRAM, enabling large lookup tables
• Competitive performance
• Low power, a critical advantage for cost-effective system design
• Reconfigurable
  • Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
  • Can adopt new standards much more easily, e.g., IPv6

Two case studies show the efficacy of the CA-RAM approach
• 3~5× improvement in area and power, compared with CAM/TCAM

CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications

Questions?